Jobs: PBS
PBS (Portable Batch System) is a batch queuing system used to create batch jobs and submit them to a large number of cluster machines. A batch job is a shell script containing the commands you want to run on a set of execution machines. The script can also specify the job's characteristics (attributes) and its resource requirements (such as memory, CPU, and time).
Although the focus of the TeraGrid is testing grid functionality, running jobs directly through PBS may be useful if you have problems running jobs with the Grid Tools. The qsub command submits a PBS batch job to a queue. There are currently no dedicated interactive nodes for debugging; you must go through the PBS batch system instead (for example, with "qsub -I", described below).
PBS Commands
Some PBS commands and their functions are as follows:
| Function | PBS example |
|---|---|
| Submit a batch job to a queue | qsub [list of qsub options] script_name (see "man qsub" for more options) |
| Create your own interactive nodes | qsub -I -V -l walltime=00:30:00 -l nodes=4:ppn=2 puts you on one of the compute nodes. The interactive nodes expire when you exit the node or when the wall time limit (30 minutes in this example) is reached. |
| Display the status of PBS batch jobs | qstat -a (see "man qstat" for more options) |
| Delete (cancel) a queued job | qdel PBS_JOBID |
| Show all running jobs on the system | qstat -r |
| Show detailed information about the specified job | qstat -f PBS_JOBID |
| Show all queues on the system | qstat -q |
| Show queue limits for all queues | qstat -Q |
| Show quick information about the server | qstat -B |
| Show node status | pbsnodes -a |

NOTE: When monitoring jobs with qstat, look at "Elap Time" (elapsed time) rather than "Time Use". "Elap Time" is the wall-clock time since the job started, while "Time Use" is the CPU time used by the job's user process. Because that process is the script that launches the MPI job, not the MPI job itself, "Time Use" is usually zero or close to it.
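When a script has to wait for a job to finish before doing something else, qstat can be polled in a loop. The helper below is a minimal sketch: the name wait_for_job and its optional polling interval are hypothetical, and it assumes qstat exits with a non-zero status once the server no longer knows the job ID. On servers that keep completed jobs listed for a while, the loop ends only after the job is purged.

```shell
#!/bin/sh
# Hypothetical helper: block until PBS no longer reports the given job.
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_CHECKS]
# Assumes qstat returns a non-zero exit status for an unknown job ID.
wait_for_job() {
  job_id="$1"
  interval="${2:-60}"   # default: check once a minute
  # Keep polling while qstat still recognizes the job
  while qstat "$job_id" > /dev/null 2>&1; do
    sleep "$interval"
  done
}
```

For example, "wait_for_job 12345 30" checks job 12345 every 30 seconds and returns once it has left the queue.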
NOTE: Use the numerical portion of your PBS job ID when running PBS commands, or use qstat -f to obtain the full job ID. Both qstat -a and plain qstat print only a limited number of characters in the job ID field, so the job ID string they display may be truncated and therefore invalid.
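A full PBS job ID has the form sequence_number.server_name, so the numerical portion is everything before the first dot. The sketch below uses a hypothetical helper name, pbs_job_number, and plain POSIX parameter expansion; the server name in the comment is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip the server suffix from a full PBS job ID,
# e.g. "12345.server.example.org" -> "12345".
pbs_job_number() {
  # ${1%%.*} removes the longest suffix that starts at the first '.'
  printf '%s\n' "${1%%.*}"
}

# Example use (qsub prints the full job ID of the job it submits):
#   JOBID=$(qsub my_job.csh)
#   qdel "$(pbs_job_number "$JOBID")"
```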
Site-Specific Commands
| Site | Function | Command or option |
|---|---|---|
| NCSA | Submit a PBS job to Phase 2 nodes | #PBS -q phase2 |
| NCSA | Set the PBS wall clock limit | maxTime or maxWallTime |
| PSC | Request interactive nodes | -I rmsnodes=$nodes:$processors |
The following is an example of a PBS batch script for the C shell. The comment line above each directive or command explains what it does:

#!/bin/csh
## Use the queue called "dque"
#PBS -q dque
## Set the job name to "my_job"
#PBS -N my_job
## Request 10 nodes and 2 processors per node
#PBS -l nodes=10:ppn=2
## Reserve the requested nodes for 50 minutes
#PBS -l walltime=0:50:00
## Send standard output to a file called "file.out"
#PBS -o file.out
## Send standard error to a file called "file.err"
#PBS -e file.err
## Export all my environment variables to the job
#PBS -V
## Change to my working directory
cd /work/username
## Run my parallel job on 20 processes (10 nodes x 2 processors per node)
mpirun -v -machinefile $PBS_NODEFILE -np 20 ./a.out
## Alternative to the line above: take the process count from the number
## of lines in $PBS_NODEFILE instead of hard-coding it
## set NP = `wc -l < $PBS_NODEFILE`
## mpirun -machinefile $PBS_NODEFILE -np $NP ./a.out
Standard PBS node properties
These are the standard PBS node properties for IA-64 and IA-32 machines.
| Property | Description |
|---|---|
| ia64-cpu13 | 1.3 GHz Tiger 4 with CTSS |
| ia64-cpu15 | 1.5 GHz Tiger 2 or Tiger 4 with CTSS |
| ia64-compute | All IA-64 nodes with CTSS |
| ia32-compute | All IA-32 nodes with CTSS |
Note: sites may define additional properties. The standard properties let users submit to any DTF site and request the type of node they want in a consistent way.
For example:
globusrun -o -r <any_dtf_gatekeeper>.teragrid.org/jobmanager-pbs \
'&(executable=<myprogram>)\
(jobType=mpi)\
(host_types=ia64-cpu15)\
(host_xcount=16)(xcount=2)\
(maxtime=10)'
will work at all DTF sites and run on 16 nodes of type 'ia64-cpu15'
with two processes per node (PPN).
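For comparison, the same node request can be written directly as PBS directives in a batch script, using the nodes=N:property:ppn=M syntax of the UC/ANL example script. This is a sketch of a config fragment only: it assumes the RSL maxtime value is in minutes (as in Globus GRAM), so it maps to a 10-minute walltime, and it omits the queue and any site-specific options such as an account given with -A.

```shell
# Hypothetical PBS directives roughly equivalent to the globusrun request above:
# 16 nodes with the 'ia64-cpu15' property, 2 processors per node,
# and a 10-minute wall clock limit (assuming maxtime is in minutes).
#PBS -l nodes=16:ia64-cpu15:ppn=2
#PBS -l walltime=00:10:00
```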
Using PBS spool
The PBS spool lets you see a job's output while the job is still running. To have output spooled to .pbs_spool:
- Create a directory named .pbs_spool in your home directory. Both your home directory and your .pbs_spool directory must be world-executable.
- While the job is running, its output is spooled to the .pbs_spool directory. Without this directory, job output is spooled on the job's head node and copied to your directory only when the job completes.
- If the directory does not exist, there is therefore no guarantee that you will see any job output until the job has completed.
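The setup in the steps above amounts to two commands. This minimal sketch interprets "world-executable" as the execute bit for "other" users; it does not change any other permission bits.

```shell
#!/bin/sh
# Create the spool directory in your home directory (no error if it exists)
mkdir -p "$HOME/.pbs_spool"
# Make both your home directory and the spool directory world-executable
# (execute bit for "other" users)
chmod o+x "$HOME" "$HOME/.pbs_spool"
```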
Example Scripts
PSC
#!/bin/sh
#
#PBS -N example
#PBS -l rmsproject=st7ac1p
#PBS -l rmsnodes=2:4
#PBS -l walltime=00:05:00
#PBS -o ring.out
#PBS -e ring.err
#
## Export all my environment variables to the job
#PBS -V
#
## Change to my working directory
cd /usr/users/5/brieger/tutorial/pbs_example
#
## Run my parallel job (the job environment provides RMS_NODES and RMS_PROCS)
prun -N ${RMS_NODES} -n ${RMS_PROCS} ./ring
SDSC
#!/bin/sh
#
#PBS -q dque
#PBS -N example
#PBS -l nodes=2:ppn=2
#PBS -l walltime=0:05:00
#PBS -o ring.out
#PBS -e ring.err
#
## Export all my environment variables to the job
#PBS -V
#
## Change to my working directory
cd $HOME/tutorial/pbs_example
#
## Run my parallel job (the PBS shell knows PBS_NODEFILE)
mpirun -machinefile $PBS_NODEFILE -np 4 ./ring
UC/ANL
#!/bin/sh
#
#PBS -q dque
#PBS -N example
#PBS -l nodes=2:ia64-compute:ppn=2
#PBS -l walltime=0:05:00
#PBS -A TG-STA040001N
#PBS -o ring.out
#PBS -e ring.err
#
## Export all my environment variables to the job
#PBS -V
#
## Change to my working directory
cd $HOME/tutorial/pbs_example
#
## Run my parallel job (the PBS shell knows PBS_NODEFILE)
mpirun -machinefile $PBS_NODEFILE -np 4 ./ring
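The SDSC and UC/ANL scripts hard-code -np 4 to match nodes=2:ppn=2. A minimal sketch for deriving the count instead, assuming the site's PBS writes one line per allocated processor to $PBS_NODEFILE (as OpenPBS and Torque do; the PSC script relies on RMS variables instead). The function name count_slots is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the processor slots listed in a PBS node file.
# With nodes=2:ppn=2, each node appears twice, giving 4 lines.
count_slots() {
  # Some wc implementations pad the count with spaces, so strip them
  wc -l < "$1" | tr -d ' '
}

# Inside a job script this would replace the hard-coded "-np 4":
#   NP=$(count_slots "$PBS_NODEFILE")
#   mpirun -machinefile "$PBS_NODEFILE" -np "$NP" ./ring
```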