Last Updated September 16th, 2011

Introduction

This page describes how to submit a job from Condor to a remote PBS cluster using SSH credentials.

Setup

To set up Condor to submit a job to a remote machine via SSH, software must be installed on both the submit host and the remote machine that runs PBS/Torque.

Installation on the Remote PBS machine

  • On the PBS/Torque machine, install the latest version of Condor, but don't start any of the daemons. Then, create a script with these contents:
    #!/bin/sh
    export GLITE_LOCATION=/path/to/condor/lib/glite
    exec $GLITE_LOCATION/bin/batch_gahp "$@"
  • Name the script batch_gahp.wrapper. Modify the filepath in the script to point into your Condor installation.
  • Note: The PBS/LSF glite code is only available on Linux.
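
The wrapper steps above can be sketched as a short shell session. This is only an illustration: the variable names WRAPPER and CONDOR_GLITE are made up for this example, the glite path is the same placeholder used in the script above, and marking the wrapper executable is an assumption (it is needed if the script is invoked directly by path).

```shell
#!/bin/sh
# Sketch: write batch_gahp.wrapper and mark it executable.
# WRAPPER and CONDOR_GLITE are illustrative variables, not Condor settings.
set -e

WRAPPER="${WRAPPER:-./batch_gahp.wrapper}"
# Placeholder path from the text above; point it into your Condor installation.
CONDOR_GLITE="${CONDOR_GLITE:-/path/to/condor/lib/glite}"

# Unquoted heredoc: $CONDOR_GLITE is expanded now, while the escaped \$
# references stay literal so they are evaluated when the wrapper runs.
cat > "$WRAPPER" <<EOF
#!/bin/sh
export GLITE_LOCATION=$CONDOR_GLITE
exec \$GLITE_LOCATION/bin/batch_gahp "\$@"
EOF

# Assumption: the wrapper is run directly, so it needs the executable bit.
chmod 755 "$WRAPPER"
ls -l "$WRAPPER"
```

Run it with CONDOR_GLITE set to the real glite directory, then note the final location of the wrapper, since the submit machine needs that path later.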

Installation on the submit machine

The submit machine needs modified versions of remote_gahp and condor_gridmanager.

We have binaries for Mac OS X 10.6.8 (Snow Leopard):

corbusier:local.corbusier vahi$ uname -a
Darwin corbusier.isi.edu 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386

Note: Make sure you set the executable bits on the files.

Condor Configuration on submit host

  • Add lines to your Condor config file that point to the remote_gahp to use for PBS or LSF submissions:
    PBS_GAHP=$(SBIN)/remote_gahp
    LSF_GAHP=$(SBIN)/remote_gahp
  • Set up ssh keys such that you can ssh from your Condor submit machine to the PBS machine by entering a passphrase to decrypt the ssh private key. Place the passphrase in a file with read permissions restricted to you.
  • Modify the downloaded remote_gahp to tell it about the PBS machine and how to contact it. Near the top, you'll see a data structure named REMOTE_HOSTS. Add a new entry containing:
    • the PBS machine's hostname
    • the path to the batch_gahp.wrapper script on the PBS machine
    • the path to the passphrase file on your submit machine.
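
The SSH key and passphrase file described above might be prepared roughly as follows. This is a sketch under stated assumptions: KEY_DIR, KEY_FILE, PASS_FILE and PASSPHRASE are names invented for this example, the default passphrase is a placeholder to replace, and whether remote_gahp tolerates a trailing newline in the passphrase file has not been checked here.

```shell
#!/bin/sh
# Sketch: passphrase file readable only by you, plus a passphrase-protected key.
# KEY_DIR, KEY_FILE, PASS_FILE and PASSPHRASE are illustrative names.
set -e

KEY_DIR="${KEY_DIR:-./condor-pbs-ssh}"
KEY_FILE="$KEY_DIR/id_rsa_pbs"
PASS_FILE="$KEY_DIR/passphrase"
PASSPHRASE="${PASSPHRASE:-replace-with-your-own-passphrase}"

# Create the directory and passphrase file with owner-only permissions.
umask 077
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"
printf '%s\n' "$PASSPHRASE" > "$PASS_FILE"
chmod 600 "$PASS_FILE"

# Generate the key pair only if ssh-keygen is present and no key exists yet.
if command -v ssh-keygen >/dev/null 2>&1 && [ ! -f "$KEY_FILE" ]; then
    ssh-keygen -q -t rsa -N "$PASSPHRASE" -f "$KEY_FILE" \
        || echo "ssh-keygen failed; generate the key manually" >&2
fi

# Next, install the public key on the PBS machine, for example:
#   ssh-copy-id -i "$KEY_FILE.pub" user@pbs-host   (user@pbs-host is a placeholder)
ls -l "$PASS_FILE"
```

The path stored in PASS_FILE is what goes into the REMOTE_HOSTS entry as the passphrase file on the submit machine.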

Now, you're ready to submit a job. File transfer isn't supported at the moment, so you'll need to set up the executable, input files, and directory to hold output files on the PBS machine first. Then, you can submit a job using a description file like the sample at the end of this page.

  • Polling Interval for status of jobs in the remote queue
    Condor polls the remote batch queue periodically for the status of all the jobs. The default interval is 5 minutes. You can change this by setting INFN_JOB_POLL_INTERVAL in your Condor config file. The value is the time between polls, in seconds.
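
Because the setting is given in seconds, it is easy to get the units wrong. As a small illustration, the snippet below converts an interval in minutes into the config line to add. The 2-minute value and the variable names POLL_MINUTES, POLL_SECONDS and CONFIG_LINE are arbitrary choices for this example, not recommendations.

```shell
#!/bin/sh
# Sketch: convert a polling interval in minutes to the seconds value
# expected by INFN_JOB_POLL_INTERVAL.
POLL_MINUTES="${POLL_MINUTES:-2}"   # illustrative choice only
POLL_SECONDS=$((POLL_MINUTES * 60))
CONFIG_LINE="INFN_JOB_POLL_INTERVAL = $POLL_SECONDS"

# With the default of 2 minutes this prints: INFN_JOB_POLL_INTERVAL = 120
echo "$CONFIG_LINE"
```

Add the printed line to your Condor config file, then have Condor reread its configuration (for example with condor_reconfig).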

Sample Submit File to submit to PBS using SSH

universe=grid
grid_resource=pbs sukhna.isi.edu
skip_filechecks=true
transfer_executable=false
+remote_iwd="/lfs1/work/pbs/condor-ssh"
+remote_queue="batch"
executable=/bin/date
#arguments=300
output=/lfs1/work/pbs/condor-ssh/out.$(cluster).$(process)
error=/lfs1/work/pbs/condor-ssh/err.$(cluster).$(process)
log=ssh.log
queue

  • +remote_queue names the remote PBS queue.
  • +remote_iwd is the remote directory in which you want the job to execute.
  • output and error point to files in a directory on the remote PBS machine where you want stdout and stderr to go.
