LOGIN INFORMATION

There is another server, which is now used by Jonathan:

  • ssh vahi@newbio.cs.wisc.edu
    Password:******

NB: older versions were run on the bio machine (replace newbio with bio everywhere).

There is one main script (pegasus_sipht.pl) that can:

  • generate the dax;
  • generate the dax and plan it;
  • generate the dax and run it;

Code source location

svn co https://pegasus.isi.edu/svn/sipht/auto_sRNAPredict/ auto_sRNAPredict

All the scripts used by Pegasus have a p_* prefix.

Make sure the xbit is set for the executables on newbio

The application code on newbio is usually installed as user jlivny in:
/scratch/auto_sRNAPredict/exe

If an executable is missing the xbit, set it with chmod 777.
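The check above can be scripted. A minimal sketch, assuming the exe directory path from this page; the fix_xbit helper name is mine, not part of the SIPHT scripts:

```shell
# Sketch: make sure everything in the exe directory has the execute bit.
# fix_xbit is a hypothetical helper; EXE_DIR is the path from this page.
fix_xbit() {
  for f in "$1"/*; do
    [ -e "$f" ] || continue          # glob matched nothing
    [ -x "$f" ] || chmod 777 "$f"    # this page uses chmod 777
  done
}

EXE_DIR=/scratch/auto_sRNAPredict/exe
if [ -d "$EXE_DIR" ]; then
  fix_xbit "$EXE_DIR"
fi
```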

SUBMITTING SIPHT WORKFLOWS USING PEGASUS

  • ls /scratch/auto_sRNAPredict/pegasus (= $PEGASUS_CONFIG)
    • setup-with-pegasus - the file to source to set up the environment variables (tcsh)
    • tc.data - the transformation catalog
    • site.config.xml - the site catalog

pegasus_sipht.pl

Can generate the DAX, plan it, and run it for one SOI

  • run a simple SOI:
    newbio(1)% source /scratch/auto_sRNAPredict/pegasus/setup-with-pegasus
    newbio(2)% /scratch/auto_sRNAPredict/pegasus/pegasus_sipht.pl -soi NC_007146 -all \
    -c /scratch/auto_sRNAPredict/config/default.config -pegasus run
    

pegasus_multiple_vs_all.pl

Can generate the DAX, plan it, and run it for all the SOIs. It creates one outer DAX with inner DAXs (the number of inner DAXs depends on the number of SOIs).

  • run multiple SOI
    • We have implemented a 'multiple_soi_vs_all.pl' script which creates a DAX that contains inner DAXs.
      The way this script works is quite simple and relies on the single-SOI version, 'pegasus_sipht.pl'.
      First, the script calls 'pegasus_sipht.pl' to create just the DAX for each SOI; then it creates the outer DAX, which references all of the inner DAXs.
      The outer DAX is thus a bag of DAXs to execute. A simple pegasus-plan and pegasus-submit is then called on the outer DAX (hdax.xml).
    • The 'multiple_soi_vs_all.pl' script has 2 options to control the execution:
      -maxjobs maximum number of SIPHT workflows run concurrently, by default 50
      -maxpre maximum number of pegasus-plan invocations at the same time, by default 5
    • maxpre is used to avoid too many pegasus-plan processes running at the same time (too many instances of the JVM), which may cause a high load on the submit host.

A typical command line to invoke this script is:

#>source /scratch/auto_sRNAPredict/pegasus/setup-with-pegasus
#>/scratch/auto_sRNAPredict/pegasus/pegasus_multiple_vs_all.pl -all -c /scratch/auto_sRNAPredict/config/default_all_search.config -maxjobs 40

Pegasus workflow submit dir and output dir

Submit Dir

The submit dir depends on the variable OUTPUT_DIR in the config file and is created by the pegasus_sipht.pl script. For example, in /scratch/auto_sRNAPredict/config/default.config:
OUTPUT_DIR = /scratch/auto_sRNAPredict/workspaces/$(USER)
The script 'pegasus_sipht.pl' then generates a unique submit dir based on the SOI and an id,
for example (SOI=NC_009925 and id=1244759803):

newbio(3)% ls /scratch/auto_sRNAPredict/workspaces/vahi/NC_009925.1244759803/
output/  submit/
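The run-directory name follows the pattern <OUTPUT_DIR>/<SOI>.<id>. A hedged sketch of that convention; the assumption that the id is a Unix epoch timestamp (1244759803 falls in June 2009) is mine, and pegasus_sipht.pl may generate it differently:

```shell
# Sketch of the submit/output dir naming convention: <OUTPUT_DIR>/<SOI>.<id>.
# Assumption: the id is a Unix epoch timestamp; the real script may differ.
OUTPUT_DIR=${OUTPUT_DIR:-/scratch/auto_sRNAPredict/workspaces/$USER}
SOI=NC_009925
ID=$(date +%s)
RUN_DIR="$OUTPUT_DIR/$SOI.$ID"
echo "$RUN_DIR"
# The script creates output/ and submit/ underneath:
# mkdir -p "$RUN_DIR/output" "$RUN_DIR/submit"
```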
newbio(4)% ls -l /scratch/auto_sRNAPredict/workspaces/vahi/NC_009925.1244759803/submit/
NC_009925.1244759803/
NC_009925_cache_file
patser.in
pegasus_planlog_1244759803
pegasus_runlog_1244759803
NC_009925_cache_exec
NC_009925_dax.xml
pegasus_plancmd_1244759803
pegasus_runcmd_1244759803
sRNAPredict.in

  • patser.in - input file for the workflow; generated/copied by the DAX generator code
  • sRNAPredict.in - input file for the workflow; generated/copied by the DAX generator code
  • NC_009925_cache_exec - the file-based Pegasus replica catalog for the workflow
  • NC_009925_cache_file - the file-based Pegasus replica catalog for the workflow
  • pegasus_planlog_1244759803 - pegasus-plan log
  • pegasus_plancmd_1244759803 - pegasus-plan invocation
  • pegasus_runlog_1244759803 - pegasus run log
  • pegasus_runcmd_1244759803 - pegasus run invocation
  • NC_009925.1244759803/ - has the .dag and Condor submit files (.sub)
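For reference, the two cache files listed above use Pegasus's file-based replica catalog format, one line per logical file. The lines below are hypothetical examples: the LFNs are made up, and the attribute name ("pool" vs "site") depends on the Pegasus version:

```
# Hypothetical lines from NC_009925_cache_file (file-based replica catalog);
# the LFNs here are illustrative only.
NC_009925.fna file:///scratch/auto_sRNAPredict/genomes_files/NC_009925.fna pool="local"
patser.in file:///scratch/auto_sRNAPredict/workspaces/vahi/NC_009925.1244759803/submit/patser.in pool="local"
```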

Output Dir

The result files and outputs are stored in the output directory. Continuing the previous example:

newbio(5)% ls /scratch/auto_sRNAPredict/workspaces/vahi/NC_009925.1244759803/output/

A successful workflow run should produce a file named <SOI>_sRNA.out_annotated; in our example:
NC_009925_sRNA.out_annotated
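A quick way to check a finished run for this file; the check_soi_output helper name is mine, and the example path is the one from this page:

```shell
# Sketch: check whether a run produced <SOI>_sRNA.out_annotated.
# check_soi_output is a hypothetical helper, not part of the SIPHT scripts.
check_soi_output() {   # usage: check_soi_output <run-dir> <SOI>
  if [ -f "$1/output/${2}_sRNA.out_annotated" ]; then
    echo "OK: $2 produced annotated output"
  else
    echo "MISSING: $2 annotated output not found"
  fi
}

# Example with the run from this page:
check_soi_output /scratch/auto_sRNAPredict/workspaces/vahi/NC_009925.1244759803 NC_009925
```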

The executables

The executables are located in /scratch/auto_sRNAPredict/exe/. The compute jobs refer to these executables.

newbio(5)% ls /scratch/auto_sRNAPredict/exe

  • sRNA_Annotate
    • the only code where we needed to change the source to get it to compile correctly without the Condor compiler
    • a hardcoded file path had to be removed

Precomputed Input Files

are available at /scratch.1/auto_sRNAPredict/genomes_files/igr

These are the files that Jonathan fetches from the NCBI database. Fetching them is not part of the workflow.

WEB server ( Maintained by Zachary Miller )

Internally the Web APP calls out to pegasus_sipht.pl to generate the DAX and submit the workflow.
All jobs submitted via the web interface are run as user www-cndr

The content of the web server is stored in the directory:

/local.newbio/public/html/sRNA/

The web link is:

http://newbio.cs.wisc.edu/sRNA

You can log in as 'guest' without a password.

There is a web page to view the content of the directory:

http://newbio.cs.wisc.edu/sRNA/manage.php

Submit log of the web portal:

/scratch.1/auto_sRNAPredict/workspaces/submit_log

NB: older versions were run on the bio machine (replace newbio with bio everywhere).

Debugging SIPHT workflows

Pegasus has a collection of tools that can help a user debug workflows.

PRE-REQUISITE

By default pegasus-status and pegasus-analyzer are not in the system path. To add them, the user needs to source the setup script in the terminal:

source /scratch/auto_sRNAPredict/pegasus/setup-with-pegasus

pegasus-status, pegasus-remove and pegasus-analyzer take in the submit directories generated by pegasus as arguments.

Pegasus Workflow Submit Directory

This is the directory where Pegasus generates the Condor submit files for the workflow. pegasus_sipht.pl prints this directory on stdout.

example

/scratch/auto_sRNAPredict/pegasus/pegasus_sipht.pl -soi NC_002505 -all -c /scratch/auto_sRNAPredict/config/default.config -pegasus run

...

### SUBMIT_PATH ###
/scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit

### CACHE_FILE ###
/scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505_cache_file
### CACHE_EXEC ###
/scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505_cache_exec
Successful PEGASUS planning
check /scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505_cache_file
check /scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505_cache_exec

 ### PEGASUS WORKFLOW SUBMIT DIRECTORY  ###
 /scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505.1278713005
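Since the script prints the submit directory between marker lines, it can be captured for scripting. A sketch, assuming the script's output has been saved to a file; the extract_submit_dir helper name is mine, and the marker text is copied from the example above:

```shell
# Extract the workflow submit directory from saved pegasus_sipht.pl output.
# extract_submit_dir is a hypothetical helper: awk finds the marker line,
# then prints the following line with leading spaces stripped.
extract_submit_dir() {
  awk '/PEGASUS WORKFLOW SUBMIT DIRECTORY/ { getline; gsub(/^ +/, ""); print; exit }' "$1"
}

log=$(mktemp)
cat > "$log" <<'EOF'
### SUBMIT_PATH ###
/scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit

 ### PEGASUS WORKFLOW SUBMIT DIRECTORY  ###
 /scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505.1278713005
EOF
extract_submit_dir "$log"
rm -f "$log"
```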

DISABLING TRANSFER_OUTPUT IN SUBMIT FILES

Checking status of a running workflow

Run pegasus-status to check the state of a workflow

Usage: pegasus-status <workflow-submit-directory>

By default pegasus-status shows only the jobs of a particular workflow that are currently in the condor_q.

Optional arguments

  • user - By default pegasus-status can only track jobs of the user you are logged in as. Use this option to specify the user whose jobs you want to track. For example, if you are logged in as jlivny and want to track www-cndr jobs, run:
    pegasus-status --user www-cndr <path to workflow submit directory>

  • long - Gives more detailed information, such as how many jobs succeeded or failed.

Example Usage

  • Without the long option
    newbio(75)% pegasus-status --user www-cndr /scratch.1/auto_sRNAPredict/workspaces/www-cndr/NC_002505.1278562560/submit/NC_002505.1278562560
    Warning: run directory mismatch, using /scratch.1/auto_sRNAPredict/workspaces/www-cndr/NC_002505.1278562560/submit/NC_002505.1278562560
    
    
    -- Submitter: newbio.cs.wisc.edu : <128.105.147.100:55928> : newbio.cs.wisc.edu
     ID      OWNER/NODENAME   SUBMITTED     RUN_TIME ST PRI SIZE CMD
    339451.0   www-cndr        7/7  23:16   1+17:40:59 R  0   97.7 condor_dagman -f -
    339597.0    |-SRNA_ID0000  7/7  23:51   1+16:43:43 R  0   97.7 sRNAPredict.sh acc
    
    
  • With the long option
    newbio(76)% pegasus-status --long --user www-cndr /scratch.1/auto_sRNAPredict/workspaces/www-cndr/NC_002505.1278562560/submit/NC_002505.1278562560
    Warning: run directory mismatch, using /scratch.1/auto_sRNAPredict/workspaces/www-cndr/NC_002505.1278562560/submit/NC_002505.1278562560
    sipht-0.dag is running.
    07/07 23:51:52  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
    07/07 23:51:52   ===     ===      ===     ===     ===        ===      ===
    07/07 23:51:52    25       0        1       0       0          8        0
    
    WORKFLOW STATUS : 25/34 ( 74% ) RUNNING (condor processing workflow)
    
    

jobstate.log file

Each workflow submit directory contains a jobstate.log file. It is created by a process that Pegasus launches when a workflow is submitted; this process parses the Condor DAGMan logs into an easy-to-read format.

Example
/scratch.1/auto_sRNAPredict/workspaces/www-cndr/NC_013316.1277999914/submit/NC_013316.1277999914/jobstate.log

Here is a snippet from there

1278000187 Patser_ID000005 EXECUTE 326166.0 wisc_bio -
1278000192 Patser_ID000005 JOB_TERMINATED 326166.0 wisc_bio -
1278000192 Patser_ID000005 JOB_SUCCESS - wisc_bio -
1278000192 Patser_ID000018 EXECUTE 326161.0 wisc_bio -
1278000192 Patser_ID000018 JOB_TERMINATED 326161.0 wisc_bio -
1278000192 Patser_ID000018 JOB_SUCCESS - wisc_bio -
1278000197 Blast_ID000023 EXECUTE 326149.0 wisc_bio -
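The format above (timestamp, job name, event, Condor id, site) makes quick summaries easy with awk. A sketch that counts events by type, using sample lines from this page; the summarize_jobstate helper name is mine:

```shell
# Count jobstate.log events by type (the third field in the format above).
# summarize_jobstate is a hypothetical helper, not a Pegasus tool.
summarize_jobstate() {
  awk '{ count[$3]++ } END { for (e in count) print e, count[e] }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
1278000187 Patser_ID000005 EXECUTE 326166.0 wisc_bio -
1278000192 Patser_ID000005 JOB_TERMINATED 326166.0 wisc_bio -
1278000192 Patser_ID000005 JOB_SUCCESS - wisc_bio -
1278000192 Patser_ID000018 EXECUTE 326161.0 wisc_bio -
EOF
summarize_jobstate "$sample"
rm -f "$sample"
```

To use it on a real run, point it at a jobstate.log path such as the one shown above.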

Removing a running workflow

pegasus-remove allows you to remove a running workflow

Usage: pegasus-remove <workflow-submit-directory>

You need to be logged in as the user who is running the workflow.

newbio(86)% pegasus-remove  /scratch/auto_sRNAPredict/workspaces/vahi/NC_002505.1278713005/submit/NC_002505.1278713005
Job 342141.0 marked for removal

pegasus-analyzer

Debugging a Single SOI Run

Once pegasus-status tells you that the workflow has finished, you can use pegasus-analyzer to debug the workflow.

Usage: pegasus-analyzer -i <workflow submit directory>

example usage

newbio(98)% pegasus-analyzer -i /scratch.1/auto_sRNAPredict/workspaces/vahi/NC_002505.1278701378/submit/NC_002505.1278701378
pegasus-analyzer: initializing...

************************************Summary*************************************

 Total jobs         :     33 (100.00%)
 # jobs succeeded   :     25 (75.76%)
 # jobs failed      :      0 (0.00%)
 # jobs unsubmitted :      7 (21.21%)
 # jobs unknown     :      1 (3.03%)

*****************************Unknown jobs' details******************************

=================================SRNA_ID000029==================================

 last state: EXECUTE
       site: wisc_bio
submit file: /scratch.1/auto_sRNAPredict/workspaces/vahi/NC_002505.1278701378/submit/NC_002505.1278701378/SRNA_ID000029.sub
output file: /scratch.1/auto_sRNAPredict/workspaces/vahi/NC_002505.1278701378/submit/NC_002505.1278701378/SRNA_ID000029.out
 error file: None

-------------------------------SRNA_ID000029.out--------------------------------


**************************************Done**************************************

pegasus-analyzer: end of status report

The above shows that one of the jobs has status unknown.

Let's debug that job further.

Usage: pegasus-analyzer --debug-job <submit-file> --debug-dir <directory where the input files will be copied and the executable run>

Example usage

newbio(99)% pegasus-analyzer --debug-job SRNA_ID000029.sub --debug-dir ./debug-srna
pegasus-analyzer: initializing...
info: debugging condor type workflow

pegasus-analyzer: finished generating job debug script!

To run it, you need to type:
   $ cd /scratch.1/auto_sRNAPredict/workspaces/vahi/NC_002505.1278701378/submit/NC_002505.1278701378/debug-srna
   $ ./debug_SRNA_ID000029.sh

newbio(100)%

The above creates a shell script in the debug-srna directory that the user can run locally in that directory.
It copies the input files over and launches the job.

Debugging an ALL Run

Step 1) Run pegasus-analyzer on the outer-level workflow.
It will tell you which sub-workflows failed, and the command to run for each failed sub-workflow.

pegasus-analyzer -i /scratch.1/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010 | more
pegasus-analyzer: initializing...

************************************Summary*************************************

Total jobs         :   1913 (100.00%)
# jobs succeeded   :   1860 (97.23%)
# jobs failed      :     36 (1.88%)
# jobs unsubmitted :      0 (0.00%)
# jobs unknown     :     17 (0.89%)

******************************Failed jobs' details******************************

==============================pegasus-plan_ID1569===============================

last state: JOB_FAILURE
      site: local
submit file: /scratch.1/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/pegasus-plan_ID1569.sub
output file: /scratch.1/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/pegasus-plan_ID1569.out
error file: /scratch.1/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/pegasus-plan_ID1569.err
This job contains sub workflows!
Please run the command below for more information:
pegasus-analyzer -t  -d /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/.

Step 2) We run pegasus-analyzer on the sub-workflow that failed.

The above tells us that a sub-workflow failed. To debug it, we run the command listed above:

pegasus-analyzer -t  -d /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/.
pegasus-analyzer: initializing...
running: tailstatd -n --nodatabase -r -j /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/./jobstate.log /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/./sipht-0.dag.dagman.out

************************************Summary*************************************

Total jobs         :     31 (100.00%)
# jobs succeeded   :     22 (70.97%)
# jobs failed      :      0 (0.00%)
# jobs unsubmitted :      8 (25.81%)
# jobs unknown     :      1 (3.23%)

*****************************Unknown jobs' details******************************

===============================RNAMotif_ID000002================================

last state: JOB_TERMINATED
      site: wisc_bio
submit file: /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/./RNAMotif_ID000002.sub
output file: None
error file: None

**************************************Done**************************************

pegasus-analyzer: end of status report

Step 3) We now know what job in the workflow failed. We use the debug-job feature to debug that job.

pegasus-analyzer --debug-job /scratch/auto_sRNAPredict/workspaces/jlivny/ALL/Jul_15_2010/NC_011971.1279254593/submit/NC_011971.1279254593/./RNAMotif_ID000002.sub --debug-dir /tmp/karan
pegasus-analyzer: initializing...
info: debugging condor type workflow

pegasus-analyzer: finished generating job debug script!

To run it, you need to type:
  $ cd /tmp/karan
  $ ./debug_RNAMotif_ID000002.sh

pegasus-workflow-notify

pegasus-workflow-notify provides a mechanism for emailing the status of a workflow execution to a list of subscribed users.

Usage: pegasus-workflow-notify <condor-output-file>

By default pegasus-workflow-notify runs pegasus-analyzer and sends the result of the workflow execution to the list of email users defined in the environment variable EMAIL.
Optional arguments

  • email – Comma-separated list of email users to notify of the result of the workflow execution. By default it picks up the list of email users defined in the environment variable EMAIL.
  • subject – Subject of the email sent to the subscribed users.
  • notify – Specify the condition to be met to send the email. Available options are ONSUCCESS|ONFAILURE|ALL. Default is ONFAILURE.
  • keep – Keeps the pegasus-analyzer run output file.

Example Usage

pegasus-workflow-notify /scratch/H1L1V1-s6_lowmass_ihope_small-932255943-86400.3IF4OT/pegasus-plan_ID000000.out --subject 'SIPHT workflow run result' --email 'admin@isi.edu' --notify ONFAILURE --keep
Successfully ran pegasus analyzer command :- ' /usr/bin/pegasus-analyzer -t -i /scratch/H1L1V1-s6_lowmass_ihope_small-932255943-86400.3IF4OT/nsbhlininj/inspiral_hipe_nsbhlininj.NSBHLININJ '.
Pegasus workflow succeeded.
******
sending email:
/usr/bin/mutt -a /tmp/pegasus_5G05Vx_analyzer.out -s 'Pegasus workflow succeeded (Submit dir : /usr1/vahi/work/ihope/s6/hm-hour-osg-itb/pegasus-submit-dir/H1L1V1-s6_lowmass_ihope_small-932255943-86400.3IF4OT/nsbhlininj/inspiral_hipe_nsbhlininj.NSBHLININJ) ' admin@isi.edu
Successfully ran and sent the results as email attachment to admin@isi.edu.
Output written to the file :- ' /tmp/pegasus_5G05Vx_analyzer.out '.

GIDEON test

The configuration was put in

/scratch/auto_sRNAPredict/pegasus/test

modified files:

  • tc.data, add :
    local kickstart file:///scratch/auto_sRNAPredict/pegasus/PEGASUS/default/bin/kickstart STATIC_BINARY INTEL32::LINUX PEGASUS::style="condor";CONDOR::universe="vanilla",should_transfer_files="YES",when_to_transfer_output="ON_EXIT",transfer_executable="true"
    
  • site.config.xml
    • for the 'pprof' test, add within the local site:
      <!-- ORIGINAL <profile namespace="pegasus" key="gridstart" >none</profile> -->
          <!-- GIDEON's -->
          <profile namespace="pegasus" key="gridstart">Kickstart</profile>
          <profile namespace="pegasus" key="gridstart.path">/scratch/auto_sRNAPredict/pegasus/test/wfprof/pprof</profile>
          <!-- /GIDEON'S -->
      
    • for the 'ioprof' test, add within the local site:
      <!-- ORIGINAL <profile namespace="pegasus" key="gridstart" >none</profile> -->
          <!-- GIDEON's -->
          <profile namespace="pegasus" key="gridstart">Kickstart</profile>
          <profile namespace="pegasus" key="gridstart.path">/scratch/auto_sRNAPredict/pegasus/test/wfprof/ioprof</profile>
          <!-- /GIDEON'S -->
      
    • for the 'kickstart' test, add within the local site:
      <!-- ORIGINAL <profile namespace="pegasus" key="gridstart" >none</profile> -->
          <!-- GIDEON's -->
          <profile namespace="pegasus" key="gridstart">Kickstart</profile>
          <profile namespace="pegasus" key="gridstart.path">/scratch/auto_sRNAPredict/pegasus/PEGASUS/default/bin/kickstart</profile>
          <!-- /GIDEON'S -->
      
  • make a copy of the 'pegasus_sipht.pl' script and add, for each job, a dependency on kickstart:
    # Gideon test
    	print DAX "  <uses file=\"kickstart\" link=\"input\" type=\"executable\" transfer=\"true\"/>\n";
    # Gideon test
    