SCB Ocean Workflows
Input and Output Data
- The input data for the workflow was staged in from corbusier.isi.edu
- Size of the raw input data for the workflow: 1.7 GB
- The output data generated was staged back to the ISI machine (corbusier.isi.edu)
- Corbusier runs GridFTP Server 3.11 (gcc32, 1213742010-78), Globus Toolkit 4.2.0 ready
- Final output of the interpolation job that is staged out: 8.7 MB
Workflow Information
DAX/Abstract Workflow
- SCB Test Workflow DAX
- Image of the workflow DAX
http://www.isi.edu/~vahi/scb/roms_dax_v5.jpg
Number of Jobs by Type

| Type of Job | Number |
|---|---|
| das_tide | 1 |
| fcst_tide | 1 |
| interpolate | 1 |
| Total | 3 |
List of Filenames

| Type of Job | Input LOF | Output LOF |
|---|---|---|
| das_tide | | |
| fcst_tide | | [fcst_output_lof](http://www.isi.edu/~vahi/scb/fcst_output_lof) |
| interpolate | [inter_input_lof](http://www.isi.edu/~vahi/scb/inter_input_lof) | |
Pegasus Configuration Files
The Pegasus configuration files for the runs on the cobalt system are below:
- SCB Pegasus User Properties
- SCB Pegasus Transformation Catalog
- SCB Pegasus Site Catalog
- SCB Pegasus Replica Catalog
- SCB Pegasus Original Replica Catalog
Executable Workflow (Generated by Pegasus)
- Each job has its own data stage-in job.
- Only the final output of the interpolate job is staged out.
- Image of the workflow
- Compute jobs (SCB jobs): 3
Number of Compute/SCB Jobs by Type

| Type of Job | Number | Processors per job |
|---|---|---|
| das_tide | 1 | 8 |
| fcst_tide | 1 | 8 |
| interpolate | 1 | 4 |
| Total | 3 | |
Number of Jobs by Type

| Type of Job | Number |
|---|---|
| Compute/SCB Jobs | 3 |
| Data Stage-in | 3 |
| Data Stage-out | 1 |
| Directory Creation and Sync Jobs | 1 |
| Total | 8 |
Pegasus Workflow Job States and Delays
Workflow Runtimes
- Number of test workflows executed: 4
- Data generated by executing the gensim utility on the submit directory

Each individual run:

```
$PEGASUS_HOME/contrib/showlog/gensim --dag scb-0.dag --output /lfs1/work/jpl/scb_results/run0001 --jobstate-log jobstate.log
```

All runs:

```
$PEGASUS_HOME/contrib/showlog/gentimes -x run*
```
Visualization of runs over time
X axis - time in seconds
Y axis - number of jobs
Runs On Teragrid
Legend
- Job - the name of the job
- Site - the site where the job ran
- Kickstart - the actual duration of the job in seconds on the remote compute node
- Post - the postscript time as reported by DAGMan
- Condor - the time between submission by DAGMan and the remote Grid submission. It is an estimate of the time spent in the Condor queue on the submit node.
- Resource - the time between the remote Grid submission and the start of remote execution. It is an estimate of the time the job spent in the remote queue.
- Runtime - the time spent on the resource as seen by Condor DAGMan. It is always >= Kickstart.
- CondorQLen - the number of outstanding jobs in the queue when this job was released.
run0001
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen |
|---|---|---|---|---|---|---|---|---|
| create_dir_scb_0_cobalt | cobalt | 0.32 | 5.00 | 13.00 | 15.00 | 0.00 | 15.00 | 1 |
| das_tide_ID000001 | cobalt | 3806.65 | 5.00 | 5.00 | 15.00 | 3906.00 | 3855.00 | 1 |
| fcst_tide_ID000002 | cobalt | 346.39 | 5.00 | 5.00 | 15.00 | 90.00 | 465.00 | 1 |
| interpolate_ID000003 | cobalt | 134.49 | 5.00 | 5.00 | 15.00 | 155.00 | 160.00 | 1 |
| stage_in_das_tide_ID000001_0 | cobalt | 2805.49 | 5.00 | 5.00 | 20.00 | 5.00 | 2946.00 | 1 |
| stage_in_fcst_tide_ID000002_0 | cobalt | 1665.65 | 5.00 | 5.00 | 20.00 | 5.00 | 1805.00 | 2 |
| stage_in_interpolate_ID000003_0 | cobalt | 318.15 | 5.00 | 5.00 | 15.00 | 0.00 | 435.00 | 3 |
| stage_out_interpolate_ID000003_0 | cobalt | 13.31 | 5.00 | 5.00 | 15.00 | 0.00 | 135.00 | 1 |
run0002
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen |
|---|---|---|---|---|---|---|---|---|
| create_dir_scb_0_cobalt | cobalt | 0.34 | 5.00 | 13.00 | 15.00 | 0.00 | 15.00 | 1 |
| das_tide_ID000001 | cobalt | 3811.54 | 5.00 | 5.00 | 20.00 | 13806.00 | 3915.00 | 1 |
| fcst_tide_ID000002 | cobalt | 344.90 | 5.00 | 5.00 | 10.00 | 175.00 | 405.00 | 1 |
| interpolate_ID000003 | cobalt | 128.56 | 5.00 | 5.00 | 20.00 | 225.00 | 160.00 | 1 |
| stage_in_das_tide_ID000001_0 | cobalt | 2740.53 | 5.00 | 5.00 | 20.00 | 0.00 | 2890.00 | 2 |
| stage_in_fcst_tide_ID000002_0 | cobalt | 1670.40 | 5.00 | 5.00 | 15.00 | 0.00 | 1815.00 | 3 |
| stage_in_interpolate_ID000003_0 | cobalt | 341.18 | 5.00 | 5.00 | 20.00 | 0.00 | 490.00 | 1 |
| stage_out_interpolate_ID000003_0 | cobalt | 12.97 | 5.00 | 5.00 | 10.00 | 5.00 | 155.00 | 1 |
run0003
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen |
|---|---|---|---|---|---|---|---|---|
| create_dir_scb_0_cobalt | cobalt | 0.32 | 5.00 | 14.00 | 15.00 | 0.00 | 15.00 | 1 |
| das_tide_ID000001 | cobalt | 3794.60 | 5.00 | 5.00 | 10.00 | 1650.00 | 3916.00 | 1 |
| fcst_tide_ID000002 | cobalt | 492.81 | 5.00 | 6.00 | 15.00 | 110.00 | 520.00 | 1 |
| interpolate_ID000003 | cobalt | 108.58 | 5.00 | 5.00 | 10.00 | 120.00 | 160.00 | 1 |
| stage_in_das_tide_ID000001_0 | cobalt | 2797.34 | 5.00 | 5.00 | 15.00 | 0.00 | 2955.00 | 2 |
| stage_in_fcst_tide_ID000002_0 | cobalt | 1658.32 | 5.00 | 5.00 | 10.00 | 5.00 | 1755.00 | 3 |
| stage_in_interpolate_ID000003_0 | cobalt | 348.12 | 5.00 | 5.00 | 15.00 | 0.00 | 495.00 | 1 |
| stage_out_interpolate_ID000003_0 | cobalt | 8.22 | 5.00 | 5.00 | 15.00 | 0.00 | 95.00 | 1 |
run0004
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen |
|---|---|---|---|---|---|---|---|---|
| create_dir_scb_0_cobalt | cobalt | 0.29 | 5.00 | 13.00 | 15.00 | 0.00 | 125.00 | 1 |
| das_tide_ID000001 | cobalt | 3861.94 | 5.00 | 5.00 | 10.00 | 2735.00 | 3916.00 | 1 |
| fcst_tide_ID000002 | cobalt | 348.76 | 5.00 | 5.00 | 15.00 | 150.00 | 405.00 | 1 |
| interpolate_ID000003 | cobalt | 139.54 | 5.00 | 5.00 | 15.00 | 150.00 | 165.00 | 1 |
| stage_in_das_tide_ID000001_0 | cobalt | 2686.90 | 5.00 | 5.00 | 10.00 | 5.00 | 2821.00 | 3 |
| stage_in_fcst_tide_ID000002_0 | cobalt | 1641.17 | 5.00 | 5.00 | 10.00 | 5.00 | 1806.00 | 1 |
| stage_in_interpolate_ID000003_0 | cobalt | 333.03 | 5.00 | 5.00 | 10.00 | 5.00 | 455.00 | 2 |
| stage_out_interpolate_ID000003_0 | cobalt | 12.13 | 5.00 | 5.00 | 15.00 | 0.00 | 135.00 | 1 |
All Runs
| Transformation | Count | Mean | Variance |
|---|---|---|---|
| pegasus::transfer | 16 | 1190.81 | 1279724.52 |
| scb::das_tide | 4 | 3818.68 | 882.31 |
| pegasus::dirmanager | 4 | 0.32 | 0.00 |
| scb::fcst_tide | 4 | 383.22 | 5341.00 |
| scb::interpolate | 4 | 127.79 | 184.18 |
Runs On Pollux
run0001
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen | Seqexec | Seqexec-Delay |
|---|---|---|---|---|---|---|---|---|---|---|
| create_dir_scb-gemini_0_pollux | pollux | 1.19 | 5.00 | 60.00 | 0.00 | 0.00 | 100.00 | 0 | - | - |
| das_tide_ID000001 | pollux | 4561.04 | 10.00 | 108.00 | 0.00 | 0.00 | 4430.00 | 0 | - | - |
| fcst_tide_ID000002 | pollux | 956.74 | 5.00 | 6.00 | 0.00 | 0.00 | 935.00 | 0 | - | - |
| interpolate_ID000003 | pollux | 262.02 | 10.00 | 7.00 | 0.00 | 0.00 | 255.00 | 0 | - | - |
| stage_in_das_tide_ID000001_0 | local | 160.00 | 10.00 | 125.00 | 0.00 | 0.00 | 115.00 | 0 | | |
| stage_in_fcst_tide_ID000002_0 | local | 160.00 | 10.00 | 125.00 | 0.00 | 0.00 | 115.00 | 0 | | |
| stage_in_interpolate_ID000003_0 | local | 160.00 | 5.00 | 125.00 | 0.00 | 0.00 | 160.00 | 0 | | |
| stage_out_interpolate_ID000003_0 | local | 126.00 | 5.00 | 5.00 | 0.00 | 0.00 | 126.00 | 0 | | |
run0002
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen | Seqexec | Seqexec-Delay |
|---|---|---|---|---|---|---|---|---|---|---|
| create_dir_scb-gemini_0_pollux | pollux | 1.19 | 10.00 | 121.00 | 0.00 | 0.00 | 215.00 | 0 | - | - |
| das_tide_ID000001 | pollux | 4496.42 | 5.00 | 65.00 | 0.00 | 0.00 | 4485.00 | 0 | - | - |
| fcst_tide_ID000002 | pollux | 1271.32 | 5.00 | 5.00 | 0.00 | 0.00 | 1255.00 | 0 | - | - |
| interpolate_ID000003 | pollux | 256.19 | 5.00 | 6.00 | 0.00 | 0.00 | 260.00 | 0 | - | - |
| stage_in_das_tide_ID000001_0 | local | 360.00 | 5.00 | 306.00 | 0.00 | 0.00 | 265.00 | 0 | | |
| stage_in_fcst_tide_ID000002_0 | local | 255.00 | 5.00 | 306.00 | 0.00 | 0.00 | 155.00 | 0 | | |
| stage_in_interpolate_ID000003_0 | local | 100.00 | 5.00 | 306.00 | 0.00 | 0.00 | 100.00 | 0 | | |
| stage_out_interpolate_ID000003_0 | local | 100.00 | 5.00 | 5.00 | 0.00 | 0.00 | 100.00 | 0 | | |
run0003
| Job | Site | Kickstart | Post | DAGMan | Condor | Resource | Runtime | CondorQLen | Seqexec | Seqexec-Delay |
|---|---|---|---|---|---|---|---|---|---|---|
| create_dir_scb-gemini_0_pollux | pollux | 1.17 | 10.00 | 120.00 | 0.00 | 0.00 | 155.00 | 0 | - | - |
| das_tide_ID000001 | pollux | 4359.44 | 5.00 | 5.00 | 0.00 | 0.00 | 4345.00 | 0 | - | - |
| fcst_tide_ID000002 | pollux | 864.60 | 10.00 | 5.00 | 0.00 | 0.00 | 856.00 | 0 | - | - |
| interpolate_ID000003 | pollux | 305.44 | 5.00 | 57.00 | 0.00 | 0.00 | 290.00 | 0 | - | - |
| stage_in_das_tide_ID000001_0 | local | 421.00 | 5.00 | 245.00 | 0.00 | 0.00 | 260.00 | 0 | | |
| stage_in_fcst_tide_ID000002_0 | local | 311.00 | 5.00 | 245.00 | 0.00 | 0.00 | 155.00 | 0 | | |
| stage_in_interpolate_ID000003_0 | local | 161.00 | 5.00 | 245.00 | 0.00 | 0.00 | 161.00 | 0 | | |
| stage_out_interpolate_ID000003_0 | local | 170.00 | 5.00 | 5.00 | 0.00 | 0.00 | 170.00 | 0 | | |
DAX Generator
Input
- current simulation time, YYYYMMDDHH. HH is usually 03, 09, 15, or 21.
- forecast length (for fcst jobs). It is usually 6 hours but should be configurable.
Notes
Vahi 14:33, 3 March 2009 (PST)
- For files ending in monMM, MM is the month from the simulation time passed in.
- In the sample DAX, the files that end in 06 are the ones that have 3 hours subtracted from the HH of the time passed in the input.
- The roms bulk file and the scbclim file that are input to the forecast job are always from the day before.
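The filename time arithmetic in the notes above can be sketched as follows; the function name and the returned keys are illustrative, not part of the actual DAX generator:

```python
from datetime import datetime, timedelta

def derived_times(sim_time):
    """Compute the derived timestamps used in SCB filenames from a
    YYYYMMDDHH simulation time (a sketch; key names are illustrative)."""
    t = datetime.strptime(sim_time, "%Y%m%d%H")
    return {
        # month token for files ending in monMM
        "monMM": t.strftime("%m"),
        # 3 hours subtracted from the HH of the input time
        "das_start": (t - timedelta(hours=3)).strftime("%Y%m%d%H"),
        # the day before, for the roms bulk and scbclim forecast inputs
        "prev_day": (t - timedelta(days=1)).strftime("%Y%m%d"),
    }

print(derived_times("2008060409"))
```

For the sample time 2008060409 this yields the month token 06, the 3-hours-earlier stamp 2008060406, and the previous day 20080603.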
CODE DOWNLOAD
DAX Generator
Building from source
- svn checkout https://pegasus.isi.edu/svn/scb/trunk scb
- cd scb
- export SCB_HOME=`pwd`
- unset CLASSPATH
- source setup-devel.sh
- ant dist
This will create a binary distribution
scb-binary-1.0.tar.gz
Installing the binary distribution
- tar zxvf scb-binary-1.0.tar.gz
- cd scb-1.0
- export SCB_HOME=`pwd`
- unset CLASSPATH
- source setup.sh
Setting up the user environment
- cd scb-1.0
- this is the directory that was created when you untarred the scb-binary-1.0.tar.gz file.
- export SCB_HOME=`pwd`
- unset CLASSPATH
- source setup.sh
DAX Generator Description
The SCB DAX generator is a Java program that uses the Pegasus Java DAX API to generate SCB DAXes (abstract workflows). To generate a DAX the user needs to specify the data assimilation time and, optionally, the forecast duration; the forecast duration defaults to 6 hours unless specified. In addition to the SCB DAX, the DAX generator generates the following:
- the input and output LOF (List Of Filenames) files for the jobs in the DAX
- a file-based replica catalog that catalogs the locations of the LOF files, so that Pegasus can transfer them as part of the workflow
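As a rough illustration of the second point, the entries such a file-based replica catalog might contain can be sketched like this. The helper name and the example URL are made up, and the `LFN PFN pool="site"` line format follows the classic Pegasus file-based replica catalog convention, which may differ across Pegasus versions:

```python
def rc_entries(lof_files, url_prefix, site="local"):
    """Emit one replica catalog line per LOF file, mapping the logical
    filename (LFN) to a physical URL (PFN) under url_prefix.
    Sketch only; attribute syntax is an assumption, not generator output."""
    lines = []
    for lof in lof_files:
        # e.g.: das_input_2008060409_lof gsiftp://.../das_input_2008060409_lof pool="local"
        lines.append('%s %s/%s pool="%s"' % (lof, url_prefix.rstrip("/"), lof, site))
    return "\n".join(lines)

print(rc_entries(["das_input_2008060409_lof", "fcst_input_2008060409_lof"],
                 "gsiftp://dummy.isi.edu/scb/lof"))
```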
Generating a SCB DAX
scb-dax-gen
USAGE
```
$Id: DAXGenerator.java 1717 2009-03-10 04:09:08Z vahi $ 1.0

scb-dax-gen - The main class used to run the SCB DAX generator

Usage: scb-dax-gen [-Dprop [..]] --time <YYYYMMDDHH> [--dir <output directory>]
       [--name <dax basename>] [--fcst-duration <forecast duration>]
       [-u <url-prefix>] [--verbose] [--version] [-h]

Mandatory Options
 -t |--time           the time at which data assimilation took place, in YYYYMMDDHH format.

Other Options
 -n |--name           the basename to be given to the DAX file that is generated.
 -D |--dir            the directory where to generate the DAX and the LOF files.
 -f |--fcst-duration  the duration in hours of the forecast. Defaults to 6.
 -u |--url-prefix     the URL prefix of the server hosting the LOF files, e.g. gsiftp://server.isi.edu
 -v |--verbose        increases the verbosity of messages about what is going on.
 -V |--version        displays the version of the SCB DAX Generator.
 -h |--help           generates this help.

The following exit codes are produced:
 0  the DAX generator was able to generate the DAX and associated LOF files
 1  an error occurred. In most cases the error message logged should give a clear indication of where things went wrong.
 2  an error occurred while loading a specific module implementation at runtime
```
EXAMPLE
```
corbusier:scb-1.0 vahi$ scb-dax-gen --time 2008060409 --dir dax --fcst-duration 6 --url-prefix "gsiftp://dummy.isi.edu"
2009.03.09 21:06:40.968 PDT: [INFO] event.scb.dax-generator scb.version 1.0 - STARTED
2009.03.09 21:06:41.011 PDT: [INFO] Time taken to execute is 0.043 seconds
2009.03.09 21:06:41.011 PDT: [INFO] event.scb.dax-generator scb.version 1.0 - FINISHED
corbusier:scb-1.0 vahi$ ls dax/
das_input_2008060409_lof          fcst_output_2008060409_lof        scb-2008060409-6.cache
fcst_input_2008060409_lof         interpolate_input_2008060409_lof  scb-2008060409-6.dax
```
SCB Wrapper Scripts
Vahi 12:20, 2 November 2009 (PDT)
Each job in the SCB workflow has a wrapper associated with it that prepares the input for the OpenMP code to execute.
The wrappers figure out the required input files and the other arguments and pass them on to the OpenMP code. To get the codes executed as part of a workflow, the wrapper scripts needed to be modified as follows:
- Remove Hardcoded Paths
- The wrapper scripts expected the input data to be available in a fixed location relative to where the codes are installed. This is not feasible in the Grid environment, as jobs are usually executed on a scratch filesystem. Pegasus is able to transfer the input data to the scratch directories; however, the scripts needed to be modified to pick up the input data from the workflow-specific scratch directory. The modifications involved passing a list-of-filenames file (LOF file) for inputs and one for outputs to the wrapper scripts. These LOF files identify the input/output data that a job requires/produces.
- Remove a layer of scripts
- The wrapper scripts were launched by another wrapper script that set the arguments for the jobs as environment variables, along with certain environment variables for the OpenMP system. The outermost script was removed, and the immediate wrapper around the codes was modified to take the arguments on the command line. The OpenMP variables are specified in the DAX and are set in the job environment when it is launched on the remote site.
- Name of SCB wrapper scripts
- scb_pegasus_run_das_tide
- scb_pegasus_interscript_das
- scb_pegasus_run_fcst_tide
The SCB wrapper scripts are available in the SVN checkout.
The SVN checkout has a tar file scb-codes-050409.tgz.
Untar it and the scripts will be in the JPL/bin directory.
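Since the outermost script was removed, the OpenMP control variables now reach the code through the job environment. A minimal sketch of that launch pattern, assuming `OMP_NUM_THREADS` is the variable of interest (the helper name is illustrative; in the real workflow Pegasus sets these variables from profiles in the DAX):

```python
import os
import subprocess

def launch(code, args, num_threads):
    """Run an executable with the OpenMP thread count set in its
    environment. Sketch only: shows the effect at launch time, not
    how Pegasus actually injects the DAX profiles."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)  # standard OpenMP control variable
    return subprocess.call([code] + list(args), env=env)
```

For example, `launch("./das_tide", ["input_lof", "output_lof"], 8)` would run the code on 8 threads (paths here are hypothetical).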
Instructions for running workflows on pollux
- Log onto pollux as user gmehta
- check for a grid proxy using grid-proxy-info. This is required to stage the input data from the GridFTP server on pollux.
```
pollux scb_test/run0005> grid-proxy-info
subject  : /DC=org/DC=doegrids/OU=People/CN=Karan Vahi 476301/CN=200344285
issuer   : /DC=org/DC=doegrids/OU=People/CN=Karan Vahi 476301
identity : /DC=org/DC=doegrids/OU=People/CN=Karan Vahi 476301
type     : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path     : /tmp/x509up_u41244
timeleft : 73:35:04  (3.0 days)
```
- change to pegasus submit directory
```
pollux /home/gmehta> cd ~/pegasus-submit-dir/
pollux gmehta/pegasus-submit-dir> pwd
/workp/oba/gmehta/SUBMIT
pollux gmehta/pegasus-submit-dir> ls
conf  dags  dax  dax-gen  EXEC  pegasus-plan.txt  pegasus-plan.txt~  STORAGE
```
- pegasus-plan.txt contains the commands that we run
- generating the DAX using scb-dax-gen
```
# generating dax using dax generator
pollux gmehta/pegasus-submit-dir> scb-dax-gen --time 2008060409 --dir dax-gen --fcst-duration 6 --url-prefix "file:///" -p pollux
2009.05.07 14:06:35.154 PDT: [INFO] event.scb.dax-generator scb.version 1.0 - STARTED
2009.05.07 14:06:35.328 PDT: [INFO] Time taken to execute is 0.156 seconds
2009.05.07 14:06:35.329 PDT: [INFO] event.scb.dax-generator scb.version 1.0 - FINISHED
pollux gmehta/pegasus-submit-dir> ls dax-gen/
das_input_2008060409_lof          fcst_output_2008060409_lof        scb-2008060409-6.cache
fcst_input_2008060409_lof         interpolate_input_2008060409_lof  scb-2008060409-6.dax
```
This will create the DAX in the dax-gen directory along with the associated LOF files required for the workflow.
- Edit the DAX file to give it a shorter label, due to a bug in the fcst code. Change the label to scb-test.
- planning and submitting the workflow
```
pollux gmehta/pegasus-submit-dir> pegasus-plan -Dpegasus.user.properties=./conf/properties --dax dax-gen/scb-2008060409-6.dax --cache dax-gen/scb-2008060409-6.cache -s pollux -o local --dir dags --nocleanup --force --submit
2009.05.07 14:09:14.780 PDT: [INFO] event.pegasus.planner planner.version 2.4.0cvs - STARTED
2009.05.07 14:09:16.152 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.174 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.287 PDT: [INFO] event.pegasus.parse.dax dax.id /workp/oba/gmehta/SUBMIT/dax-gen/scb-2008060409-6.dax - STARTED
2009.05.07 14:09:16.584 PDT: [INFO] event.pegasus.parse.dax dax.id /workp/oba/gmehta/SUBMIT/dax-gen/scb-2008060409-6.dax - FINISHED
2009.05.07 14:09:16.645 PDT: [INFO] event.pegasus.refinement dax.id scb-test_1 - STARTED
2009.05.07 14:09:16.683 PDT: [INFO] event.pegasus.load.cache dax.id scb-test_1 - STARTED
2009.05.07 14:09:16.695 PDT: [INFO] event.pegasus.load.cache dax.id scb-test_1 - FINISHED
2009.05.07 14:09:16.704 PDT: [INFO] event.pegasus.siteselection dax.id scb-test_1 - STARTED
2009.05.07 14:09:16.732 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.738 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.742 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.746 PDT: [INFO] event.pegasus.siteselection dax.id scb-test_1 - FINISHED
2009.05.07 14:09:16.763 PDT: [INFO] Grafting transfer nodes in the workflow
2009.05.07 14:09:16.763 PDT: [INFO] event.pegasus.generate.transfer-nodes dax.id scb-test_1 - STARTED
Ignoring PFN file:////workp/oba/gmehta/SUBMIT/STORAGE/2008060409_rst.nc
2009.05.07 14:09:16.827 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.836 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.841 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.843 PDT: [WARNING] profile condor.grid_resource is empty, Removing!
2009.05.07 14:09:16.843 PDT: [WARNING] profile pegasus.style is empty, Removing!
2009.05.07 14:09:16.855 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.861 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.866 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.867 PDT: [WARNING] profile condor.grid_resource is empty, Removing!
2009.05.07 14:09:16.868 PDT: [WARNING] profile pegasus.style is empty, Removing!
2009.05.07 14:09:16.877 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.881 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.885 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.886 PDT: [WARNING] profile condor.grid_resource is empty, Removing!
2009.05.07 14:09:16.887 PDT: [WARNING] profile pegasus.style is empty, Removing!
2009.05.07 14:09:16.895 PDT: [INFO] event.pegasus.generate.transfer-nodes dax.id scb-test_1 - FINISHED
2009.05.07 14:09:16.913 PDT: [INFO] event.pegasus.generate.workdir-nodes dax.id scb-test_1 - STARTED
2009.05.07 14:09:16.922 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.924 PDT: [INFO] event.pegasus.generate.workdir-nodes dax.id scb-test_1 - FINISHED
2009.05.07 14:09:16.925 PDT: [INFO] event.pegasus.generate.cleanup-wf dax.id scb-test_1 - STARTED
2009.05.07 14:09:16.928 PDT: [WARNING] unknown profile condor.grid_resource, using anyway
2009.05.07 14:09:16.931 PDT: [INFO] event.pegasus.generate.cleanup-wf dax.id scb-test_1 - FINISHED
2009.05.07 14:09:16.931 PDT: [INFO] event.pegasus.refinement dax.id scb-test_1 - FINISHED
2009.05.07 14:09:17.152 PDT: [INFO] Generating codes for the concrete workflow
2009.05.07 14:09:17.524 PDT: [INFO] Generating codes for the concrete workflow -DONE
2009.05.07 14:09:17.525 PDT: [INFO] Generating code for the cleanup workflow
2009.05.07 14:09:17.801 PDT: [INFO] Generating code for the cleanup workflow -DONE
2009.05.07 14:09:19.181 PDT: [ERROR] Rescued /tmp/scb-test-17713699551907048940.log as /tmp/scb-test-17713699551907048940.log.000
2009.05.07 14:09:19.212 PDT: [INFO]
2009.05.07 14:09:19.224 PDT: [INFO] Checking all your submit files for log file names.
2009.05.07 14:09:19.236 PDT: [INFO] This might take a while...
2009.05.07 14:09:19.248 PDT: [INFO] Done.
2009.05.07 14:09:19.263 PDT: [INFO] -----------------------------------------------------------------------
2009.05.07 14:09:19.280 PDT: [INFO] File for submitting this DAG to Condor : scb-test-1.dag.condor.sub
2009.05.07 14:09:19.310 PDT: [INFO] Log of DAGMan debugging messages : scb-test-1.dag.dagman.out
2009.05.07 14:09:19.320 PDT: [INFO] Log of Condor library output : scb-test-1.dag.lib.out
2009.05.07 14:09:19.332 PDT: [INFO] Log of Condor library error messages : scb-test-1.dag.lib.err
2009.05.07 14:09:19.344 PDT: [INFO] Log of the life of condor_dagman itself : scb-test-1.dag.dagman.log
2009.05.07 14:09:19.356 PDT: [INFO]
2009.05.07 14:09:19.368 PDT: [INFO] -no_submit given, not submitting DAG to Condor. You can do this with:
2009.05.07 14:09:19.380 PDT: [INFO] "condor_submit scb-test-1.dag.condor.sub"
2009.05.07 14:09:19.392 PDT: [INFO] -----------------------------------------------------------------------
2009.05.07 14:09:19.404 PDT: [INFO] Submitting job(s).
2009.05.07 14:09:19.416 PDT: [INFO] Logging submit event(s).
2009.05.07 14:09:19.428 PDT: [INFO] 1 job(s) submitted to cluster 347.
2009.05.07 14:09:19.440 PDT: [INFO]
2009.05.07 14:09:19.452 PDT: [INFO] I have started your workflow, committed it to DAGMan, and updated its
2009.05.07 14:09:19.464 PDT: [INFO] state in the work database. A separate daemon was started to collect
2009.05.07 14:09:19.476 PDT: [INFO] information about the progress of the workflow. The job state will soon
2009.05.07 14:09:19.488 PDT: [INFO] be visible. Your workflow runs in base directory.
2009.05.07 14:09:19.500 PDT: [INFO]
2009.05.07 14:09:19.512 PDT: [INFO] cd /workp/oba/gmehta/SUBMIT/dags/gmehta/pegasus/scb-test/run0003
2009.05.07 14:09:19.524 PDT: [INFO]
2009.05.07 14:09:19.536 PDT: [INFO] *** To monitor the workflow you can run ***
2009.05.07 14:09:19.548 PDT: [INFO]
2009.05.07 14:09:19.560 PDT: [INFO] pegasus-status -w scb-test-1 -t 20090507T140914-0700
2009.05.07 14:09:19.572 PDT: [INFO] or
2009.05.07 14:09:19.584 PDT: [INFO] pegasus-status /workp/oba/gmehta/SUBMIT/dags/gmehta/pegasus/scb-test/run0003
2009.05.07 14:09:19.596 PDT: [INFO]
2009.05.07 14:09:19.608 PDT: [INFO] *** To remove your workflow run ***
2009.05.07 14:09:19.620 PDT: [INFO]
2009.05.07 14:09:19.632 PDT: [INFO] pegasus-remove -d 347.0
2009.05.07 14:09:19.644 PDT: [INFO] or
2009.05.07 14:09:19.656 PDT: [INFO] pegasus-remove /workp/oba/gmehta/SUBMIT/dags/gmehta/pegasus/scb-test/run0003
2009.05.07 14:09:19.668 PDT: [INFO]
2009.05.07 14:09:19.681 PDT: [INFO] Time taken to execute is 4.871 seconds
2009.05.07 14:09:19.681 PDT: [INFO] event.pegasus.planner planner.version 2.4.0cvs - FINISHED
```
Voicecall on May 12th, 2009
Topics to discuss after demonstration
Condor installation
Right now Condor runs as user gmehta, so no other user can use it.
Options
- One way around this is to run Condor as root, so that multiple users can share the same Condor installation.
- Alternatively, the installation is copied to Peggy's account, so that Condor also runs as peggy.
If Condor runs as a regular user, it runs at a lower priority, so occasionally Condor is unresponsive and Condor commands take longer to execute.
Part of the reason is that we are running Condor directly on the machine instead of going via PBS.
Population of Replica Catalog
Right now we have the mappings for the input data for the sample workflow in a file-based replica catalog.
The input data is pulled from a GridFTP server at ISI.
Where is the input data hosted as and when it is generated? We need to stand up a GridFTP server in front of it to stage data for the workflows.
Also, the locations of the input data need to be catalogued in the replica catalog for DB to use.
Proxy for Peggy
Right now user gmehta uses Karan's proxy to stage in the data. One option is for Gaurang to generate a user certificate for Peggy from the CA he runs at ISI.
If Peggy has DOE certs we can use them.
Missing output files
Sometimes a job does not create an output file that is referenced in the DAX for that job.
For example, in the sample workflow the fcst_tide job does not create the output file 2008060409_2008060415_avg.nc.
Hence the stage-out job for fcst_tide fails.
One way to handle this is for the wrapper script to create an empty output file if the code exits successfully but some output files are not created.
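The proposed wrapper fix can be sketched as follows, assuming the output LOF lists one filename per line (the helper name and LOF layout are assumptions, not existing wrapper code):

```python
import os

def ensure_outputs(output_lof, exit_code):
    """If the code exited successfully but some outputs listed in the
    output LOF were not produced, create empty placeholder files so the
    stage-out job does not fail. Returns the list of placeholders made."""
    if exit_code != 0:
        return []  # a failed code should still fail the job
    created = []
    with open(output_lof) as f:
        for line in f:
            path = line.strip()
            if path and not os.path.exists(path):
                open(path, "w").close()  # empty placeholder file
                created.append(path)
    return created
```

Note the trade-off: downstream consumers must then tolerate empty files, so this is a stopgap rather than a real fix.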
DOCUMENTS