Last Updated September 16th, 2011
This page describes how to submit a job from Condor to a remote PBS cluster using SSH credentials.
To set up Condor to submit jobs to a remote machine via SSH, you need to install software on both the submit host and the remote machine that runs PBS/Torque.
Installation on the Remote PBS machine
- On the PBS/Torque machine, install the latest version of Condor, but don't start any of the daemons. Then, create a script with these contents:
#!/bin/sh
exec $GLITE_LOCATION/bin/batch_gahp "$@"
- Name the script batch_gahp.wrapper and make it executable. Modify the path in the script ($GLITE_LOCATION) so that it points into your Condor installation.
- NOTE: The PBS/LSF glite code is only available on Linux.
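The wrapper can also be generated by a small shell function, which helps if you set up several PBS machines. This is a minimal sketch: make_wrapper is a hypothetical helper, and setting GLITE_LOCATION inside the wrapper (instead of editing the exec line by hand) is an assumption. Pass the glite directory of your own Condor installation; as in the script above, batch_gahp is expected under its bin subdirectory.

```shell
#!/bin/sh
# Sketch: write batch_gahp.wrapper and make it executable.
# make_wrapper is a hypothetical helper, not part of Condor.
#   $1 = where to write the wrapper on the PBS machine
#   $2 = glite directory inside your Condor installation
#        (batch_gahp is expected at $2/bin/batch_gahp)
make_wrapper() {
    target=$1
    glite_dir=$2
    # Unquoted EOF: $glite_dir is expanded now; the escaped \$ stay literal
    # so the wrapper itself expands them when it runs.
    cat > "$target" <<EOF
#!/bin/sh
GLITE_LOCATION="$glite_dir"
export GLITE_LOCATION
exec "\$GLITE_LOCATION/bin/batch_gahp" "\$@"
EOF
    chmod 755 "$target"
}
```

For example, make_wrapper "$HOME/batch_gahp.wrapper" /path/to/condor/glite, where the second argument is a placeholder for the glite directory in your installation.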
Installation on the submit machine
The submit machine needs modified versions of remote_gahp and condor_gridmanager.
We have binaries for Mac OS X 10.6.8 (Snow Leopard):
corbusier:local.corbusier vahi$ uname -a
Darwin corbusier.isi.edu 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
Note: Make sure you set the executable bits on both downloaded files.
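A minimal sketch of that step, assuming both files were downloaded into the same directory; make_gahp_executable is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: set the executable bits on the modified submit-side binaries.
# ASSUMPTION: remote_gahp and condor_gridmanager both live in directory $1.
make_gahp_executable() {
    dir=$1
    for f in remote_gahp condor_gridmanager; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
        chmod 755 "$dir/$f"
    done
}
```

For example, make_gahp_executable "$HOME/Downloads" if that is where you saved them.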
Condor Configuration on submit host
- Add lines to your Condor config file so that PBS and LSF submissions use the modified remote_gahp.
- Set up ssh keys so that you can ssh from your Condor submit machine to the PBS machine by entering a passphrase that decrypts the ssh private key. Place the passphrase in a file that only you can read.
- Modify the downloaded remote_gahp to tell it about the PBS machine and how to contact it. Near the top, you'll see a data structure named REMOTE_HOSTS. Add a new entry containing:
  - the PBS machine's hostname
  - the path to the batch_gahp.wrapper script on the PBS machine
  - the path to the passphrase file on your submit machine.
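The first two steps above can be sketched in shell. This assumes you append to a local config file that your Condor configuration already reads (for example one named by LOCAL_CONFIG_FILE), and all paths are placeholders. PBS_GAHP, LSF_GAHP and GRIDMANAGER are standard Condor config macros; pointing GRIDMANAGER at the modified condor_gridmanager is an assumption based on the note that the submit machine needs a modified gridmanager. The helper names are hypothetical, and whether remote_gahp expects a trailing newline in the passphrase file is not stated here (this sketch writes none). Editing REMOTE_HOSTS is not shown because its exact format depends on the remote_gahp you downloaded.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the submit-host setup.

# Append GAHP settings to a Condor local config file.
#   $1 = config file
#   $2 = path to the modified remote_gahp
#   $3 = path to the modified condor_gridmanager
add_remote_gahp_config() {
    cat >> "$1" <<EOF
# Send PBS and LSF grid-universe jobs through the SSH-based remote_gahp
PBS_GAHP = $2
LSF_GAHP = $2
# Use the modified gridmanager (assumed to be needed alongside remote_gahp)
GRIDMANAGER = $3
EOF
}

# Store the ssh key passphrase in a file only the current user can read.
#   $1 = passphrase file, $2 = passphrase
write_passphrase_file() {
    # Create the file under a restrictive umask (in a subshell, so the
    # caller's umask is unchanged), then tighten mode in case it existed.
    ( umask 077; : > "$1" ) || return 1
    chmod 600 "$1" || return 1
    printf '%s' "$2" > "$1"
}
```

The key pair itself can be created with ssh-keygen and installed on the PBS machine with ssh-copy-id (or by appending the public key to ~/.ssh/authorized_keys there). Passing the passphrase as an argument makes it briefly visible in the process list, so you may prefer to write the file with an editor instead.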
Now you're ready to submit a job. File transfer isn't supported at the moment, so you'll first need to set up the executable, the input files, and a directory to hold output files on the PBS machine. Then you can submit a job using a submit description file like the sample described below.
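Because files are not transferred, the remote working directory has to be prepared by hand. A minimal sketch using ssh and scp, assuming the passphrase-protected key described above is already set up; stage_job, the example host and the output subdirectory name are hypothetical.

```shell
#!/bin/sh
# Sketch: stage an executable and its inputs on the PBS machine before
# submitting. stage_job is a hypothetical helper.
#   $1    = remote host (e.g. user@pbs.example.org, a placeholder)
#   $2    = remote working directory (later used as +remote_iwd)
#   $3... = local executable and input files to copy
stage_job() {
    host=$1
    remote_dir=$2
    shift 2
    # Create the working directory plus an "output" subdirectory for
    # stdout/stderr (the subdirectory name is an arbitrary choice).
    ssh "$host" "mkdir -p '$remote_dir/output'" || return 1
    # Copy the files; -p preserves modes, so the executable stays executable.
    scp -p "$@" "$host:$remote_dir/"
}
```

For example, stage_job user@pbs.example.org /home/user/job ./my_exe ./input.dat copies both files into /home/user/job on the PBS machine.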
Polling Interval for status of jobs in the remote queue
Condor periodically polls the remote batch queue for the status of all jobs. The default interval is 5 minutes. To change it, set INFN_JOB_POLL_INTERVAL in your Condor config file to the number of seconds between polls.
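A minimal sketch of that change, assuming you keep site settings in a local config file; set_poll_interval is a hypothetical helper that rejects anything other than a positive whole number of seconds.

```shell
#!/bin/sh
# Sketch: set INFN_JOB_POLL_INTERVAL (seconds between polls of the remote
# queue) in a Condor config file. set_poll_interval is a hypothetical helper.
#   $1 = config file, $2 = interval in seconds (positive integer)
set_poll_interval() {
    case $2 in
        ''|*[!0-9]*)
            echo "interval must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$2" -eq 0 ]; then
        echo "interval must be a positive integer" >&2
        return 1
    fi
    # Within one file a later definition overrides an earlier one,
    # so appending is enough.
    printf 'INFN_JOB_POLL_INTERVAL = %s\n' "$2" >> "$1"
}
```

For example, set_poll_interval /path/to/condor_config.local 60 polls once a minute; run condor_reconfig (or restart Condor) afterwards so the running daemons reread the configuration.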
Sample Submit File to submit to PBS using SSH
- +remote_queue names the PBS queue on the remote machine.
- +remote_iwd is the directory on the remote machine in which you want the job to execute.
- output and error give paths on the remote PBS machine where you want stdout and stderr to go.
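A submit description using these keys might look roughly like the one generated below. This is an unverified sketch: universe, grid_resource, executable, transfer_executable, output, error, log and queue are standard Condor submit commands, but the value grid_resource = pbs is an assumption about how the PBS GAHP is selected here, and write_pbs_submit, the queue name and all paths are placeholders. Attributes set with + are ClassAd expressions, so their string values are quoted.

```shell
#!/bin/sh
# Sketch: write a submit description for a PBS job reached over SSH.
# write_pbs_submit is hypothetical; all values passed in are placeholders.
#   $1 = submit file to write
#   $2 = executable path on the PBS machine
#   $3 = remote working directory (+remote_iwd)
#   $4 = PBS queue name (+remote_queue)
write_pbs_submit() {
    cat > "$1" <<EOF
universe = grid
# ASSUMPTION: selects the PBS GAHP configured via PBS_GAHP
grid_resource = pbs
# The executable already lives on the PBS machine (no file transfer)
executable = $2
transfer_executable = false
+remote_iwd = "$3"
+remote_queue = "$4"
# stdout/stderr paths on the remote PBS machine
output = $3/output/job.out
error = $3/output/job.err
# Job event log, written on the submit machine
log = pbs_job.log
queue
EOF
}
```

For example, write_pbs_submit pbs.sub /home/user/job/my_exe /home/user/job batch, followed by condor_submit pbs.sub.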