User Support & Documentation
Jobs: Condor-G

Condor-G combines the inter-domain resource management protocols of the Globus Toolkit with the intra-domain resource and job management methods of Condor to manage Grid jobs. As with Globus, you write a single script and run the job on the grid from a single site. In addition, Condor-G provides advanced job submission and monitoring capabilities. The Condor-G job manager automatically handles file transfers and job I/O, and uses the Globus Toolkit to launch jobs. The Condor-G distribution also provides a useful tool called DAGMan for defining job dependencies.

Submit a Condor-G Job

The following are two example Condor-G scripts. One is a single-process job and the other is an MPI job. The single-process script submits through the fork jobmanager, and the MPI script submits to the PBS queue. Both assume that the executable, and any necessary input files, are already on the remote machine.

To submit a job to a specific project, add the following line to the submit description file:

globusrsl = (project=<projectid>)

To determine your projectid, check the resource-specific user guide.

Note: To configure your environment for Condor-G, you must either add +condor-g and +tgproxy to your .soft file (see "Modifying your default login environment" on the Using SoftEnv page for more information), or interactively execute "soft add +condor-g" and "soft add +tgproxy".

Single-Process Job

# Lines beginning with the # symbol are comments

# The following line is only required if you have access to multiple
# projects and want to choose which one to charge.
globusrsl = (project=<projectid>)

# Submissions to TeraGrid must be through the globus universe
universe = globus

# Name of the executable to run. A full path is required; ~ does not work.
executable = /home/ncsa/jdoe/single/a.out

# Command-line argument list
arguments = 100 210

# false means that the executable is already on the remote machine
# true means to copy the executable from the local machine to the remote machine
transfer_executable = false

# Where to submit the job - NCSA's TeraGrid fork jobmanager
globusscheduler = tg-login1.ncsa.teragrid.org/jobmanager

# Set up names for the standard output, standard error, and log files
output = condor1.out
error = condor1.err
log = condor1.log

# The following line is required. It is the command that
# actually submits this job to the Condor-G queue.
queue

MPI

# Lines beginning with the # symbol are comments

# Submissions to TeraGrid must be through the globus universe
universe = globus

# Name of the executable to run. A full path is required; ~ does not work.
executable = /home/ncsa/jdoe/mpi/a.out

# Command-line argument list
arguments = 100 210

# false means that the executable is already on the remote machine
# true means to copy the executable from the local machine to the remote machine
transfer_executable = false

# Where to submit the job - NCSA's TeraGrid PBS jobmanager
globusscheduler = tg-login1.ncsa.teragrid.org/jobmanager-pbs

# Set up names for the standard output, standard error, and log files
output = condor1.out
error = condor1.err
log = condor1.log

# The following line is what makes it an MPI job.
# If you have access to multiple projects and want to choose which one to
# charge, put the project clause on this same line, because a later
# globusrsl line replaces an earlier one:
# globusrsl = (project=<projectid>)(jobType=mpi)(count=4)
globusrsl = (jobType=mpi)(count=4)

# The following line is required. It is the command that
# actually submits this job to the Condor-G queue.
queue

Pick one of the example scripts and save it in a file called condor1. Submit it to the Condor-G queue:
% condor_submit condor1
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 4.
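
When several similar jobs are needed, the submit description file can be generated instead of edited by hand. The following is a minimal Python sketch, not part of the Condor-G distribution: the function name build_submit_description and its parameters are hypothetical, and it emits only submit commands that appear in the example scripts above. It writes all RSL clauses on a single globusrsl line and rejects executables that are not given as a full path.

```python
def build_submit_description(executable, scheduler, arguments="",
                             rsl_clauses=None, basename="condor1"):
    """Return the text of a Condor-G submit description file.

    rsl_clauses maps RSL attribute names to values, for example
    {"jobType": "mpi", "count": 4}. All clauses go on one globusrsl line.
    This helper is illustrative only; it is not a Condor-G API.
    """
    if not executable.startswith("/"):
        raise ValueError("executable must be a full path (~ does not work)")

    lines = [
        "universe = globus",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # The executable is assumed to be on the remote machine already.
        "transfer_executable = false",
        f"globusscheduler = {scheduler}",
        f"output = {basename}.out",
        f"error = {basename}.err",
        f"log = {basename}.log",
    ]
    if rsl_clauses:
        rsl = "".join(f"({name}={value})" for name, value in rsl_clauses.items())
        lines.append(f"globusrsl = {rsl}")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the MPI example: PBS jobmanager, four processes.
    print(build_submit_description(
        "/home/ncsa/jdoe/mpi/a.out",
        "tg-login1.ncsa.teragrid.org/jobmanager-pbs",
        arguments="100 210",
        rsl_clauses={"jobType": "mpi", "count": 4},
    ), end="")
```

The returned text can be written to a file such as condor1 and passed to condor_submit as usual.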
Notes:
Notice the different globusscheduler lines in the two scripts. Be sure to use the one appropriate for your job.
The output and log files will be located in the directory from which you submit the job.
The log file contains information from Condor-G about the job. Check there first for error messages.

Check Job Status
% condor_q
-- Submitter: ncsa-box1.ncsa.uiuc.edu :
<141.142.65.2:1535> : ncsa-box1.ncsa.uiuc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
4.0 jdoe 9/6 09:06 0+00:00:00 I 0 0.0 a.out
1 jobs; 1 idle, 0 running, 0 held
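
The ST column holds a one-letter job state: I is idle, R is running, H is held, X is removed, and C is completed. As a rough illustration of reading that column from a script, here is a minimal Python sketch. The function name parse_condor_q is hypothetical, and the parsing assumes the column layout shown in the listings on this page (a job ID first, and the state immediately after the RUN_TIME value); other Condor versions may format the output differently.

```python
import re

# Common one-letter codes in the ST column of condor_q output.
STATUS_NAMES = {"I": "idle", "R": "running", "H": "held",
                "X": "removed", "C": "completed"}

JOB_ID = re.compile(r"^\d+\.\d+$")                  # e.g. 4.0
RUN_TIME = re.compile(r"^\d+\+\d{2}:\d{2}:\d{2}$")  # e.g. 0+00:00:00


def parse_condor_q(text):
    """Map each job ID found in condor_q output to a readable state."""
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not JOB_ID.match(fields[0]):
            continue  # skip the submitter banner, header, and summary lines
        for i, field in enumerate(fields[:-1]):
            if RUN_TIME.match(field):
                code = fields[i + 1]  # the ST column follows RUN_TIME
                jobs[fields[0]] = STATUS_NAMES.get(code, code)
                break
    return jobs


if __name__ == "__main__":
    sample = """\
-- Submitter: ncsa-box1.ncsa.uiuc.edu :
<141.142.65.2:1535> : ncsa-box1.ncsa.uiuc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
4.0 jdoe 9/6 09:06 0+00:00:00 I 0 0.0 a.out
1 jobs; 1 idle, 0 running, 0 held
"""
    print(parse_condor_q(sample))  # {'4.0': 'idle'}
```

To use it on live data, capture the text printed by condor_q (for example with Python's subprocess module) and pass it to the function.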

Cancel a Job

To cancel a job, run the condor_rm command on the job that you want to cancel. You need the job ID from the condor_q output. The following example cancels the job submitted above.
% condor_rm 4.0
Job 4.0 marked for removal
% condor_q
-- Submitter: ncsa-box1.ncsa.uiuc.edu :
<141.142.65.2:1535> : ncsa-box1.ncsa.uiuc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
4.0 jdoe 9/6 09:06 0+00:00:00 X 0 0.0 a.out
0 jobs; 0 idle, 0 running, 0 held
Wait a couple of minutes, then check the queue again:
% condor_q
-- Submitter: ncsa-box1.ncsa.uiuc.edu :
<141.142.65.2:1535> : ncsa-box1.ncsa.uiuc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
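
The wait-and-recheck step can also be scripted. The following is a minimal Python sketch under stated assumptions: wait_until_removed is a hypothetical helper, and list_job_ids is a caller-supplied function that returns the set of job IDs currently shown by condor_q (how that set is obtained is left to the caller). The helper counts sleep intervals rather than wall-clock time, which keeps it easy to try out without a Condor installation.

```python
import time


def wait_until_removed(job_id, list_job_ids, interval=30.0, timeout=600.0,
                       sleep=time.sleep):
    """Poll until job_id is no longer in the queue.

    Returns True if the job disappeared, or False if it was still listed
    after roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        if job_id not in list_job_ids():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Stand-in for repeated condor_q calls: job 4.0 is listed twice, then gone.
    snapshots = iter([{"4.0"}, {"4.0"}, set()])
    gone = wait_until_removed("4.0", lambda: next(snapshots),
                              interval=0.0, timeout=10.0)
    print(gone)  # True
```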

Condor DAGMan

A DAGMan script manages a set of tasks, each of which is a single Condor job, and enforces order-of-execution dependencies among them.

Example DAGMan Script

In this example, there are three Condor-G scripts: staging.condor, setup.condor, and exec.condor. They must be run in this order. DAGMan expresses these dependencies in a script:

Job stage staging.condor
Job setup setup.condor
Job run exec.condor
parent stage child setup
parent setup child run

To run a script named my_job.dag:

% condor_submit_dag my_job.dag
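
Before submitting a DAG, it can help to check that its parent and child lines really produce the intended order. The following is a minimal Python sketch, assuming Python 3.9 or later for the standard graphlib module. The function name dag_order is hypothetical, and the parser handles only the Job and parent ... child lines used in the example on this page, not the full DAGMan file syntax. The sample text encodes the staging, setup, exec order as a strict chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dag_order(dag_text):
    """Return (job name, submit file) pairs in one valid execution order.

    Raises graphlib.CycleError if the dependencies contain a cycle, and
    KeyError if a parent or child name was never declared with a Job line.
    """
    submit_files = {}
    sorter = TopologicalSorter()
    for line in dag_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].lower()
        if keyword == "job":
            submit_files[words[1]] = words[2]
            sorter.add(words[1])
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            parents, children = words[1:split], words[split + 1:]
            for child in children:
                sorter.add(child, *parents)  # each child waits for all parents
    return [(name, submit_files[name]) for name in sorter.static_order()]


if __name__ == "__main__":
    sample = """\
Job stage staging.condor
Job setup setup.condor
Job run exec.condor
parent stage child setup
parent setup child run
"""
    for name, submit_file in dag_order(sample):
        print(name, submit_file)
    # Prints:
    # stage staging.condor
    # setup setup.condor
    # run exec.condor
```

If a DAG lists only stage as the parent of both setup and run, the sorter is free to place setup and run in either order, which is a sign that a dependency is missing.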

The TeraGrid project is funded by the National Science Foundation and includes 11 partners. Please email help@teragrid.org with questions or comments.