Last Updated September 16th, 2011
Introduction
This page describes how to submit a job from Condor to a remote PBS cluster using SSH credentials.
Setup
To set up Condor to submit a job to a remote machine via SSH, software must be installed on both the submit host and the remote machine that runs PBS/Torque.
Installation on the Remote PBS machine
- On the PBS/Torque machine, install the latest version of Condor, but don't start any of the daemons. Then, create a script with these contents:
#!/bin/sh
export GLITE_LOCATION=/path/to/condor/lib/glite
exec $GLITE_LOCATION/bin/batch_gahp "$@"
- Name the script batch_gahp.wrapper. Modify the file path in the script to point into your Condor installation.
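The wrapper step above could be scripted roughly as follows. This is only a sketch: CONDOR_DIR and WRAPPER are hypothetical variable names introduced here, and making the wrapper executable is an assumption (the page does not say so explicitly, but the script is presumably run directly over SSH).

```shell
#!/bin/sh
# Hypothetical variables (not from the original page); adjust to your site.
CONDOR_DIR="${CONDOR_DIR:-/path/to/condor}"    # Condor install on the PBS machine
WRAPPER="${WRAPPER:-./batch_gahp.wrapper}"     # where to write the wrapper

# Unquoted EOF: $CONDOR_DIR is expanded now, while the escaped
# \$GLITE_LOCATION and \$@ are written literally into the wrapper.
cat > "$WRAPPER" <<EOF
#!/bin/sh
export GLITE_LOCATION=$CONDOR_DIR/lib/glite
exec \$GLITE_LOCATION/bin/batch_gahp "\$@"
EOF

# Assumption: the wrapper must be executable to be invoked remotely.
chmod 755 "$WRAPPER"
```

Run it on the PBS/Torque machine with CONDOR_DIR set to the real installation path, then note the wrapper's absolute path for later use on the submit machine.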
Installation on the submit machine
The submit machine needs modified versions of remote_gahp and condor_gridmanager.
We have binaries for Mac OS X 10.6.8 (Snow Leopard):
corbusier:local.corbusier vahi$ uname -a
Darwin corbusier.isi.edu 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
- Download condor_gridmanager and install it in the $CONDOR_HOME/sbin directory
- Download remote_gahp and install it in the $CONDOR_HOME/sbin directory
Make sure you set the executable bits on the files.
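As a rough illustration of the two install steps and the executable bits, a small helper could look like this. The function name install_binaries and the example download directory are hypothetical; only the two file names and $CONDOR_HOME/sbin come from the steps above.

```shell
#!/bin/sh
# install_binaries SRC_DIR SBIN_DIR
# Copies the two downloaded binaries into SBIN_DIR and sets their
# executable bits. Hypothetical helper, not part of Condor.
install_binaries() {
    src_dir=$1
    sbin_dir=$2
    for name in condor_gridmanager remote_gahp; do
        cp "$src_dir/$name" "$sbin_dir/$name" || return 1
        chmod 755 "$sbin_dir/$name" || return 1
    done
}

# Example call (the download location is an assumption):
# install_binaries "$HOME/Downloads" "$CONDOR_HOME/sbin"
```

Since this overwrites the stock condor_gridmanager, you may want to keep a copy of the original before running it.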
Then, add these lines to your Condor config file:
PBS_GAHP=$(SBIN)/remote_gahp
LSF_GAHP=$(SBIN)/remote_gahp
Set up ssh keys such that you can ssh from your Condor submit machine to the PBS machine by entering a passphrase to decrypt the ssh private key. Place the passphrase in a file with read permissions restricted to you.
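One possible way to create the restricted passphrase file is sketched below. The helper name store_passphrase, the key file name, and the passphrase file location are all hypothetical, and whether remote_gahp expects a trailing newline in the file is not stated here, so treat that as an open assumption.

```shell
#!/bin/sh
# store_passphrase FILE
# Reads a passphrase from standard input and writes it to FILE so that
# only the owner can read it. Hypothetical helper, not part of Condor.
store_passphrase() {
    # umask 077 in a subshell: the file is created without group/other access.
    ( umask 077 && cat > "$1" ) && chmod 600 "$1"
}

# Example session (key and file names are assumptions):
# ssh-keygen -t rsa -f "$HOME/.ssh/id_rsa_pbs"        # prompts for a passphrase
# ssh-copy-id -i "$HOME/.ssh/id_rsa_pbs.pub" user@remote.machine.name
# store_passphrase "$HOME/.ssh/pbs_passphrase"        # type it, then Ctrl-D
```

Reading from standard input keeps the passphrase out of the shell history and the process list.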
Next, modify remote_gahp to tell it about the PBS machine and how to contact it. Near the top, you'll see a data structure named REMOTE_HOSTS. Add a new entry containing the PBS machine's hostname, the path to the batch_gahp.wrapper script on the PBS machine, and the path to the passphrase file on your submit machine.
Now, you're ready to submit a job. File transfer isn't supported at the moment, so you'll need to set up the executable, input files, and directory to hold output files on the PBS machine first. Then, you can submit a job using a description file like this:
universe=grid
grid_resource=pbs remote.machine.name
skip_filechecks=true
transfer_executable=false
+remote_iwd="/path/on/remote/machine"
executable=/remote/path/myjob
arguments=300
output=/remote/path/out.$(cluster).$(process)
error=/remote/path/err.$(cluster).$(process)
queue
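Because file transfer is not supported, the remote directory and executable have to be staged by hand before submitting. A minimal sketch of that step, assuming plain ssh and scp access to the PBS machine, is shown here; the function name stage_job and the submit file name pbs_job.sub are hypothetical.

```shell
#!/bin/sh
# stage_job HOST REMOTE_DIR LOCAL_EXE
# Creates REMOTE_DIR on HOST and copies the executable into it.
# Hypothetical helper, not part of Condor.
stage_job() {
    host=$1
    remote_dir=$2
    local_exe=$3
    # Create the directory that will hold the executable and the output files.
    ssh "$host" "mkdir -p '$remote_dir'" || return 1
    # -p preserves the file mode, so the executable bit survives the copy.
    scp -p "$local_exe" "$host:$remote_dir/" || return 1
}

# Example, matching the description file above:
# stage_job remote.machine.name /remote/path ./myjob
# condor_submit pbs_job.sub     # pbs_job.sub holds the description file
# condor_q                      # check the job's status
```

Input files can be copied the same way with additional scp calls.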