Schema

The XML schema for a site catalog is defined at

http://pegasus.isi.edu/mapper/docs/schemas/sc-3.0/sc-3.0.html

Grid Site

A grid site is any site to which you want to submit jobs.

Site handle

The site handle is an identifier for the site that must match the identifier in the Process Catalog.
The architecture and OS of the site also need to be defined and must match the PC.
If a site has mixed architectures but only a single gateway (e.g. one jobmanager), use a common identifier that can describe the whole site.
If the site has multiple gateways or multiple jobmanagers that handle the different architectures, then multiple sites should be defined.
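
For instance, a mixed-architecture cluster with a separate gateway per architecture could be split into two site entries as sketched below (the handles, hostnames, and jobmanagers are illustrative placeholders, not real deployments):

```xml
<!-- Two entries for one physical cluster: each handle, hostname, and
     jobmanager below is a made-up example, not a real site. -->
<site handle="example_cluster_x86" arch="x86" os="LINUX">
        <grid type="gt2" contact="gw32.example.edu/jobmanager-pbs"
              scheduler="PBS" jobtype="compute"/>
</site>
<site handle="example_cluster_x86_64" arch="x86_64" os="LINUX">
        <grid type="gt2" contact="gw64.example.edu/jobmanager-pbs"
              scheduler="PBS" jobtype="compute"/>
</site>
```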

options for arch are

  • x86
  • x86_64
  • ppc
  • ppc_64
  • ia64
  • sparcv7
  • sparcv9

options for os are

  • LINUX
  • SUNOS
  • AIX
  • MACOSX
  • WINDOWS
<site  handle="isi_viz" arch="x86" os="LINUX">

Grid

The grid entry defines the Globus jobmanager/gatekeeper interface used to submit remote jobs.

For Tangram SE18 purposes the type is always gt2.

The contact string is the Globus jobmanager contact in the format hostname/jobmanager-<scheduler>

where scheduler can be fork (for the head node), condor, pbs, lsf, or sge.

The scheduler entry should match the scheduler named in the contact string.

Scheduler types are
PBS, Fork, LSF, or Condor.
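
As a sanity check, the rule that the scheduler attribute must match the scheduler embedded in the contact string can be sketched in a few lines of Python. This helper is not part of Pegasus; it is just an illustration of the naming convention described above:

```python
import re

# Hypothetical helper (not a Pegasus tool): split a GT2 contact string of
# the form hostname/jobmanager-<scheduler> and check that it agrees with
# the scheduler attribute of the <grid> entry.
CONTACT_RE = re.compile(r"^(?P<host>[^/]+)/jobmanager-(?P<scheduler>\w+)$")

def check_contact(contact, scheduler_attr):
    m = CONTACT_RE.match(contact)
    if not m:
        raise ValueError(f"malformed contact string: {contact!r}")
    # Compare case-insensitively: the attribute uses PBS/Fork/Condor,
    # while the contact string uses pbs/fork/condor.
    return m.group("scheduler").lower() == scheduler_attr.lower()

print(check_contact("viz-login.isi.edu/jobmanager-pbs", "PBS"))   # True
print(check_contact("viz-login.isi.edu/jobmanager-fork", "PBS"))  # False
```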

The jobtype defines which type of job can run on this jobmanager contact string.

The various jobtypes are

compute = for all types of compute jobs
transfer = for running stage-in and stage-out transfer jobs
cleanup = for running cleanup jobs
auxillary = for running all other maintenance jobs
register = for running file registration jobs

For Tangram SE18 you need to define at least the compute, transfer, and auxillary jobtypes.
Additionally, the auxillary jobtype should use the fork scheduler, and the transfer jobtype should use a scheduler other than fork.

<grid type="gt2" contact="viz-login.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute" idle-nodes="5"/>
<grid type="gt2" contact="viz-login.isi.edu/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
<grid type="gt2" contact="viz-login.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="transfer"/>

File System definitions

The file system definition describes the file servers available on the head node and on the cluster, as well as the shared and local file systems available.

The head-fs is a required element. The worker-fs is optional and is only needed when there is no shared file system.

The head-fs has two types

Scratch: the filesystem on which the jobs are run.
Storage: the filesystem where output data is staged. (For Tangram purposes during SE18 this may not have any meaning.)

Each type of filesystem needs one file server and its external and internal mount points.

e.g.

<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" mount-point="/nfs/shared-scratch/gmehta/">
</file-server>
<internal-mount-point mount-point="/nfs/shared-scratch/gmehta/" free-size="null" total-size="null"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" mount-point="/nfs/shared-scratch/gmehta/data">
</file-server>
<internal-mount-point mount-point="/nfs/shared-scratch/gmehta/data" free-size="null" total-size="null"/>
</shared>
</storage>
</head-fs>

Profiles

Profiles are a way to pass environment variables and other special keys to Pegasus and on to the running job.
If an env profile is specified with a key=value pair, the environment variable is set for any job that runs on the site.

The required env profiles for Tangram are listed below. The paths need to be changed to match the installation on each grid site.
The profiles maxwalltime and queue under the globus namespace may also be required when the jobmanager type for the grid is PBS (see the jobmanager types above).

<profile namespace="env" key="DC_HOME" >/nfs/software/anchor/se18/dc</profile>
<profile namespace="env" key="GLOBUS_LOCATION" >/nfs/software/globus/globus-4.0.1</profile>
<profile namespace="env" key="GU_HOME" >/nfs/software/anchor/se18/wrapped</profile>
<profile namespace="env" key="JAVA_HOME" >/nfs/software/java/default</profile>
<profile namespace="env" key="LD_LIBRARY_PATH" >/nfs/software/globus/globus-4.0.1/lib</profile>
<profile namespace="env" key="PEGASUS_HOME" >/nfs/home/vahi/PEGASUS/default</profile>

<profile namespace="globus" key="maxwalltime">43200</profile>
<profile namespace="globus" key="queue">normal</profile>

The local site

The local site is a special site that needs to be present in the site catalog.
The local site represents the submission site.

Example

<?xml version="1.0" encoding="UTF-8"?>
<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog Z:\Pegasus\pegasus\etc\sc-3.0.xsd" version="3.0">

<site  handle="isi_viz" arch="x86" os="LINUX">
        <grid  type="gt2" contact="viz-login.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute" idle-nodes="5"/>
        <grid  type="gt2" contact="viz-login.isi.edu/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
        <grid  type="gt2" contact="viz-login.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="transfer"/>
        <head-fs>
                <scratch>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" mount-point="/nfs/shared-scratch/gmehta/">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/shared-scratch/gmehta/" free-size="null" total-size="null"/>
                        </shared>
                </scratch>
                <storage>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" mount-point="/nfs/shared-scratch/gmehta/data">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/shared-scratch/gmehta/data" free-size="null" total-size="null"/>
                        </shared>
                </storage>
        </head-fs>
        <replica-catalog  type="LRC" url="rls://sukhna.isi.edu">
        </replica-catalog>
        <profile namespace="env" key="DC_HOME" >/nfs/software/anchor/se18/dc</profile>
        <profile namespace="env" key="GLOBUS_LOCATION" >/nfs/software/globus/globus-4.0.1</profile>
        <profile namespace="env" key="GU_HOME" >/nfs/software/anchor/se18/wrapped</profile>
        <profile namespace="env" key="JAVA_HOME" >/nfs/software/java/default</profile>
        <profile namespace="env" key="LD_LIBRARY_PATH" >/nfs/software/globus/globus-4.0.1/lib</profile>
        <profile namespace="env" key="PEGASUS_HOME" >/nfs/home/vahi/PEGASUS/default</profile>
        <profile namespace="globus" key="maxwalltime">43200</profile>
        <profile namespace="globus" key="queue">normal</profile>
        <profile namespace="pegasus" key="bundle.stagein" >3</profile>
        <profile namespace="pegasus" key="bundle.stageout" >1</profile>
        <profile namespace="pegasus" key="chain.stagein" >4</profile>
        <profile namespace="pegasus" key="change.dir" >false</profile>

</site>

<site  handle="isi_wind" arch="x86_64" os="LINUX">
        <grid  type="gt2" contact="wind.isi.edu/jobmanager-condor" scheduler="Condor" jobtype="compute" idle-nodes="6"/>
        <grid  type="gt2" contact="wind.isi.edu/jobmanager-fork" scheduler="Fork" jobtype="auxillary" idle-nodes="6"/>
        <grid  type="gt2" contact="wind.isi.edu/jobmanager-condor" scheduler="Condor" jobtype="transfer" idle-nodes="6"/>
        <head-fs>
                <scratch>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://wind.isi.edu" mount-point="/nfs/shared-scratch/gmehta/">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/shared-scratch/gmehta/" free-size="null" total-size="null"/>
                        </shared>
                </scratch>
                <storage>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://wind.isi.edu" mount-point="/nfs/shared-scratch/gmehta/data">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/shared-scratch/gmehta/data" free-size="null" total-size="null"/>
                        </shared>
                </storage>
        </head-fs>
        <replica-catalog  type="LRC" url="rls://sukhna.isi.edu">
        </replica-catalog>
        <profile namespace="env" key="GLOBUS_LOCATION" >/nfs/software/globus/default</profile>
        <profile namespace="env" key="GU_HOME" >/nfs/software/anchor/se18/wrapped</profile>
        <profile namespace="env" key="DC_HOME" >/nfs/software/anchor/se18/dc</profile>
        <profile namespace="env" key="JAVA_HOME" >/nfs/software/java/default</profile>
        <profile namespace="env" key="LD_LIBRARY_PATH" >/nfs/software/globus/default/lib</profile>
        <profile namespace="env" key="PEGASUS_HOME" >/nfs/software/pegasus/default</profile>
</site>


<site  handle="local" arch="x86" os="LINUX">
        <grid  type="gt2" contact="localhost/jobmanager-condor" scheduler="Condor" jobtype="compute"/>
        <grid  type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
        <head-fs>
                <scratch>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://smarty.isi.edu/nfs/cgt-scratch/vahi/LOCAL" mount-point="/nfs/cgt-scratch/vahi/LOCAL">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/cgt-scratch/vahi/LOCAL" free-size="null" total-size="null"/>
                        </shared>
                </scratch>
                <storage>
                        <shared>
                                <file-server protocol="gsiftp" url="gsiftp://smarty.isi.edu/nfs/cgt-scratch/vahi/LOCAL" mount-point="/nfs/cgt-scratch/vahi/LOCAL">
                                </file-server>
                                <internal-mount-point mount-point="/nfs/cgt-scratch/vahi/LOCAL" free-size="null" total-size="null"/>
                        </shared>
                </storage>
        </head-fs>
        <replica-catalog  type="LRC" url="rls://localhost">
        </replica-catalog>
        <profile namespace="env" key="GLOBUS_LOCATION" >/nfs/asd2/pegasus/software/linux/globus/default</profile>
        <profile namespace="env" key="LD_LIBRARY_PATH" >/nfs/asd2/pegasus/software/linux/globus/default/lib</profile>
        <profile namespace="env" key="PEGASUS_HOME" >/nfs/asd2/vahi/montage/tutorial/pegasus/default</profile>
</site>
</sitecatalog>

Automated Site Catalog

Here we try to explain a possible automated site catalog scenario. It should make use of the existing Nagios test framework.

  1. Each site that deploys a cluster should create an XML snippet for the site as per the site catalog schema.
    1. The site administrator is responsible for maintaining the information in the site catalog snippet.
    2. This XML snippet should be available via HTTP on their site.
    3. The site administrator will record the URL of this XML snippet and the site id in a central DB, specifying whether the site is currently active or inactive.
    4. This registration can be done by running the SQL command directly or via a simple wrapper script.
  2. A Nagios plugin needs to be written that will query this database at regular intervals to collect the list of active sites and their site catalog snippet URLs.
    1. This plugin/script will fetch the site catalog snippet from each active site.
  3. The Nagios plugin will parse the site catalog snippet and run the tests that are currently statically fed to Nagios, i.e. tests for GridFTP, tests for jobmanagers, etc.
    1. Once the tests for a site succeed, the Nagios plugin will add the XML snippet to a master site catalog file that it maintains and makes available via a webserver.
  4. A simple wrapper script can be provided around standard tools like wget or curl to fetch this master site catalog at regular intervals by running cron jobs on each submit node.
    1. The user could also use the download script to download the latest site catalog.
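
The parsing half of step 3 can be sketched with Python's standard library: given a fetched snippet that follows the elements shown earlier, collect the jobmanager contacts and GridFTP URLs that the Nagios plugin would probe. The snippet string and the function name are illustrative, not part of any existing tool:

```python
import xml.etree.ElementTree as ET

# Illustrative snippet a site administrator might publish (step 1b);
# the hostnames and mount points are placeholders.
SNIPPET = """
<site handle="isi_viz" arch="x86" os="LINUX">
    <grid type="gt2" contact="viz-login.isi.edu/jobmanager-pbs"
          scheduler="PBS" jobtype="compute"/>
    <head-fs>
        <scratch>
            <shared>
                <file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu"
                             mount-point="/tmp/scratch"/>
                <internal-mount-point mount-point="/tmp/scratch"/>
            </shared>
        </scratch>
    </head-fs>
</site>
"""

def endpoints_to_test(snippet_xml):
    """Collect the jobmanager contacts and GridFTP URLs that the Nagios
    plugin would test before merging the snippet into the master catalog."""
    site = ET.fromstring(snippet_xml)
    contacts = [g.get("contact") for g in site.iter("grid")]
    servers = [fs.get("url") for fs in site.iter("file-server")]
    return contacts, servers

contacts, servers = endpoints_to_test(SNIPPET)
print(contacts)  # ['viz-login.isi.edu/jobmanager-pbs']
print(servers)   # ['gsiftp://viz-login.isi.edu']
```

Only snippets whose endpoints all pass the tests would be appended to the master site catalog served over HTTP.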