This document lists issues that algorithm developers should keep in mind while
developing their codes. Following these guidelines will avoid many problems
when running the codes on the Grid.
Most of the hosts making up a Grid run variants of Linux or, in some cases, Solaris. For the
purposes of this project, we should narrow down to a manageable list of supported Linux
versions and hardware platforms.
At the very least the algorithm codes should be able to run on the following Grids, during
the first phase of the project.
Running on Windows
The majority of the machines making up the various Grid sites run Linux. In fact, there is
no widespread deployment of a Windows-based Grid. Currently, the server side software
of Globus does not run on Windows. Only the client tools can run on Windows.
The algorithm developers should not code exclusively for the Windows platforms. They
must make sure that their codes run on Linux or Solaris platforms. If the code is written
in a portable language like Java, then porting should not be an issue.
Packaging of software
As far as possible, binary packages (preferably statically linked) of the codes should be
provided. If for some reason the codes need to be built from source, they should
have an associated makefile (for C/C++ based tools) or an ant file (for Java tools). The
build process should refer to the standard libraries that are part of a normal Linux
installation. If the codes require non-standard libraries, clear documentation needs to be
provided on how to install those libraries and how to make the build process refer to those
libraries.
Further, installing software as root is not a possibility. Hence, all the external libraries
that need to be installed can only be installed as non-root in non-standard locations.
If any of the algorithm codes are MPI based, their developers should contact the Grid group. MPI can
be run on the Grid but the codes need to be compiled against the installed MPI libraries
on the various Grid sites. The Grid group has some experience running MPI code through
Maximum Running time of the algorithm codes
Each of the Grid sites has a policy on the maximum time for which they will allow a job
to run. The algorithms catalog should have the maximum time (in minutes) that the job
can run for. This information is passed to the Grid sites while submitting a job, so that
the Grid site does not kill a job before that published time expires. (It’s OK if the job runs
for only a fraction of the maximum time.)
Codes cannot specify the directory in which they should be run
Codes are installed in some standard location on the Grid Sites or staged on demand.
However, they are not invoked from directories where they are installed. The codes
should be able to be invoked from any directory, as long as one can access the directory
where the codes are installed.
This is especially relevant when writing scripts around the algorithm codes. In that case,
relative paths do not work, because a relative path is resolved against the directory from
which the script is invoked. A suggested workaround is to pick up the base directory where
the software is installed from the environment, or to derive it with the dirname command or
an equivalent API. The workflow system can set appropriate environment variables
while launching jobs on the Grid.
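The workaround above could look roughly like the following sketch. The variable name ALGO_HOME and the function name install_dir are assumptions made for illustration; the actual variable would be whatever the workflow system is configured to set.

```python
import os


def install_dir(script_path, env_var="ALGO_HOME"):
    """Return the base directory where the software is installed.

    ALGO_HOME is a hypothetical variable name used for illustration.
    """
    from_env = os.environ.get(env_var)
    if from_env:
        # Preferred: the workflow system told us where the code lives.
        return os.path.abspath(from_env)
    # Fallback: the directory containing the script itself, which does
    # not depend on the directory the script was invoked from.
    return os.path.dirname(os.path.abspath(script_path))
```

A wrapper script would typically call install_dir(__file__) once at startup and join every helper file name onto the returned directory.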
No hard-coded paths
The algorithms should not hard-code any directory paths in the code. All directory
paths should be picked up explicitly, either from the environment (through environment
variables) or from command line options passed to the algorithm code.
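As an illustration of this rule, the sketch below reads two directories from command line options and falls back to environment variables. The option names and the variables ALGO_INPUT_DIR and ALGO_OUTPUT_DIR are hypothetical and not part of any agreed interface.

```python
import argparse
import os


def parse_dirs(argv, environ=os.environ):
    """Return (input_dir, output_dir) from options or the environment."""
    parser = argparse.ArgumentParser(description="hypothetical algorithm code")
    # Command line options win; environment variables are the fallback.
    parser.add_argument("--input-dir", default=environ.get("ALGO_INPUT_DIR"))
    parser.add_argument("--output-dir", default=environ.get("ALGO_OUTPUT_DIR"))
    args = parser.parse_args(argv)
    missing = [name for name in ("input_dir", "output_dir")
               if getattr(args, name) is None]
    if missing:
        # parser.error prints a message to stderr and exits with status 2.
        parser.error("no value given for: " + ", ".join(missing))
    return args.input_dir, args.output_dir
```

No directory appears literally in the code, so the same binary can run unchanged on any Grid site.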
Propagating back the right exitcode
A job in the workflow is only released for execution if its parents have executed
successfully. Hence, it is very important that the algorithm codes exit with the correct
exit code on both success and failure. The algorithms should exit with a status of 0 in
case of success, and a non-zero status in case of error. Failure to do so will result in
erroneous workflow execution where jobs might be released for execution even though
their parents had exited with an error.
The algorithm codes should catch all errors and exit with a non-zero exitcode.
The successful execution of the algorithm code can only be determined by an exitcode of
0. The algorithm code should not rely upon something being written to stdout to
designate success. For example, if the algorithm code writes SUCCESS to stdout and
exits with a non-zero status, the job will be marked as failed.
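The exit status rules above can be sketched as follows. The function run_algorithm is a hypothetical stand-in for the real computation; only the pattern of catching every error and mapping it to a non-zero status is the point.

```python
import sys


def run_algorithm(values):
    """Hypothetical stand-in for the real computation."""
    if not values:
        raise ValueError("no input values")
    return sum(values) / len(values)


def main(values):
    """Return 0 on success and a non-zero status on any error."""
    try:
        result = run_algorithm(values)
    except Exception as exc:
        # Catch everything so that no failure can end with status 0.
        print(f"ERROR: {exc}", file=sys.stderr)
        return 1
    print(f"mean={result}", file=sys.stderr)
    return 0
```

A real script would end with sys.exit(main(...)) so that the returned status becomes the exit code of the process seen by the workflow system.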
If the algorithm codes create temporary files during execution, the codes should remove
them on both successful and failed termination. The algorithm codes will run on
scratch file systems that will also be used by others. The scratch directories fill up
very easily, and jobs will fail if a directory runs out of free space. Temporary
files are the files that are not being tracked explicitly through the workflow.
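One way to get this cleanup behaviour, shown here only as a sketch, is to keep all temporary files inside a directory that is removed automatically. The function name process and the file name intermediate.bin are illustrative assumptions. Note that this does not cover a job that is killed outright by the Grid site.

```python
import os
import tempfile


def process(data, scratch_root):
    """Write an intermediate file and return its size in bytes."""
    # TemporaryDirectory deletes the directory and its contents when the
    # with-block exits, whether normally or through an exception.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        part = os.path.join(tmp, "intermediate.bin")
        with open(part, "wb") as fh:
            fh.write(data)
        if not data:
            raise ValueError("empty input")
        return os.path.getsize(part)
```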
The stdout and stderr should be used for logging purposes only. Any result of the
algorithm codes should be saved to data files that can be tracked through the workflow.
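A minimal sketch of this separation, assuming a hypothetical write_result function and logger name: progress messages go to stderr, while the actual result goes to a data file whose path is supplied by the caller.

```python
import logging
import sys

# Log messages go to stderr; they are never parsed for results.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("algo")  # hypothetical logger name


def write_result(values, output_path):
    """Sum the values and save the result to a data file."""
    log.info("processing %d values", len(values))
    total = sum(values)
    with open(output_path, "w") as fh:
        fh.write(f"{total}\n")
    log.info("result written to %s", output_path)
    return total
```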
If your code requires a configuration file to run and the configuration changes from one
run to another, then this file needs to be tracked explicitly via the workflow system. The
configuration file should not contain any absolute paths to any data or libraries used by
the code. If any libraries, scripts, etc. need to be referenced, they should be given as relative
paths of the form ./xyz, where xyz is a tracked file (defined in the workflow), or as
$ENV-VAR/xyz, where $ENV-VAR is set at execution time and evaluated internally by your
application code.
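The way an application might evaluate such configuration entries can be sketched as below. ALGO_HOME is a hypothetical variable name standing in for the $ENV-VAR placeholder, and resolve_config_path is an illustrative helper, not an existing API.

```python
import os


def resolve_config_path(value):
    """Resolve a path read from a configuration file."""
    if os.path.isabs(value):
        raise ValueError(f"absolute path not allowed in configuration: {value}")
    # Expand $NAME or ${NAME} from the environment set at execution time.
    expanded = os.path.expandvars(value)
    if "$" in expanded:
        # expandvars leaves unknown variables untouched.
        raise ValueError(f"unset environment variable in: {value}")
    return os.path.normpath(expanded)
```

With ALGO_HOME set to /opt/algo, the entry $ALGO_HOME/xyz resolves to /opt/algo/xyz, while ./xyz stays relative to the directory in which the job runs.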
Logical file naming
The logical file names used by your code can be of two types.
- Without a directory path e.g. f.a, f.b etc
- With a directory path e.g. a/1/f.a, b/2/f.b
Both types of files are supported. We will create any directory structure mentioned in
your logical files on the remote execution site when we stage in data as well as when we
store the output data to a permanent location.
An example invocation of a code that consumes and produces files will be
Or
Note: A logical file name should never be an absolute file path, e.g. /a/1/f.a (there
should not be a leading /).
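To illustrate how a logical file name with a directory part maps onto a site, the sketch below rejects absolute names and creates the directory structure under a given site directory. The function stage_path and its arguments are assumptions for illustration and do not describe the workflow system's actual implementation.

```python
import os


def stage_path(lfn, site_dir):
    """Map a logical file name to a path under site_dir."""
    if lfn.startswith("/"):
        raise ValueError(f"logical file name must not be absolute: {lfn}")
    target = os.path.join(site_dir, *lfn.split("/"))
    # Create any directory structure mentioned in the logical file name.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    return target
```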