Last Updated March 25th, 2011

Main Focus

  • Add notification support to Pegasus
  • Moving auxiliary tools to Stampede DB
  • Other Stampede Related Changes
  • User Guide Reorganization 
  • Testing Framework and Testing.

Notification Support in Pegasus via monitord

  • Monitord needs to be managed by Condor. What happens if monitord crashes, or if Condor or the system crashes? We want monitord to come up automatically as Condor recovers after a restart. [Fabio,Gaurang]

  • Monitord needs to support notifications [Fabio]

  • Requires changes to Pegasus to generate input file for monitord [Rajiv,Karan]

  • Come up with default notify scripts in the toolkit that notify the user and generate some status reports. [Gaurang]

  • Changes to DAX Schema
    • Addition of invoke element at the workflow level
    • Changes to Python API [Gideon]

    • Changes to Perl API [Jens]

    • Changes to Java API [Gaurang]

    • Change to Java Parser [Karan]

    • Instead of pegasus-run launching monitord, monitord should appear as an independent job in the workflow with the highest priority [Karan]

    • Fabio needs to make sure exit codes are returned correctly and restarts are handled correctly. [Fabio]
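
As a rough illustration of the notification items above, here is a minimal sketch of how a list of per-job notifications might be matched against a job event and dispatched to notify scripts. Everything in it is an assumption for discussion: the (job, event, command) tuples stand in for the input file Pegasus would generate, and the event names and message format are hypothetical, not the agreed design.

```python
import subprocess


def build_message(workflow, job, event, exitcode=None):
    """Return a one-line status message for a job event (hypothetical format)."""
    parts = ["workflow=%s" % workflow, "job=%s" % job, "event=%s" % event]
    if exitcode is not None:
        parts.append("exitcode=%d" % exitcode)
    return " ".join(parts)


def select_notifications(notifications, job, event):
    """Pick the commands registered for this job and event.

    `notifications` is a list of (job, event, command) tuples, standing in
    for the input file that Pegasus would generate for monitord.
    """
    return [cmd for (j, e, cmd) in notifications if j == job and e == event]


def notify(notifications, workflow, job, event, exitcode=None):
    """Run each matching command, passing the status message on stdin.

    Returns a list of (command, return code) pairs so the caller can log
    notify scripts that failed.
    """
    message = build_message(workflow, job, event, exitcode)
    results = []
    for cmd in select_notifications(notifications, job, event):
        proc = subprocess.run(cmd, input=message, text=True, capture_output=True)
        results.append((cmd, proc.returncode))
    return results
```

Returning the notify script's exit code keeps a failing script from being silently ignored, which matters if missed notifications are considered critical.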

Monitord Management [Fabio,Gaurang]

https://confluence.pegasus.isi.edu/display/pegasus/Monitord+Management+via+Condor

Auxiliary Tools to Stampede DB

  • pegasus-statistics [Prasanth]

  • pegasus-plots [Prasanth]

  • pegasus-analyzer [Fabio]

Monitord Changes [Fabio]

  • Monitord also needs to be able to account for newer versions of Condor DAGMan creating a jobstate.log file.

Stampede Related Changes

  • Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records [Fabio,Monte]

  • Addition of workflow metrics file containing distribution of jobs into the DB [Fabio,Karan]

    • Revive the metrics file created by Pegasus. It should be populated in the submit directory. [Karan]

    • Monitord picks up the metrics file and stores it in the DB. Dependent on the schema redesign. [Fabio]

Usability Changes

  • Refactoring of clustered jobs (internal to Pegasus)
  • Addition of -conf option
    • Java Clients [Prasanth,Rajiv,Karan]

    • Python Clients [Prasanth,Fabio]

      • pegasus-statistics, pegasus-plots, pegasus-analyzer, monitord
    • Perl Clients [Gaurang]

  • Improvements to pegasus-tc-client [Prasanth]

    • The pegasus-tc-client output is currently in the old, deprecated format.
  • Improvements to pegasus-rc-client [Rajiv]

    • Investigation of RLS compatibility issues
  • Addition of default categories to allow for easier specification of category-based knobs at the DAGMan level
    • cleanup jobs
    • subdax jobs
  • Improve Rescue DAG semantics [Rajiv]
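
For the Python clients, the -conf option listed above could be wired up along these lines. This is only a sketch under stated assumptions: the option spellings, the `example-client` program name, and the simple key=value properties parsing are illustrative and not the actual client code.

```python
import argparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def build_parser():
    """Build a parser that accepts the properties file via -conf or --conf."""
    parser = argparse.ArgumentParser(prog="example-client")
    parser.add_argument("-conf", "--conf", dest="conf", default=None,
                        help="path to the properties file to use")
    return parser


def load_properties(argv):
    """Return the properties named by -conf, or an empty dict if not given."""
    args = build_parser().parse_args(argv)
    if args.conf is None:
        return {}
    with open(args.conf) as handle:
        return parse_properties(handle.read())
```

Sharing one helper such as `load_properties` across pegasus-statistics, pegasus-plots, pegasus-analyzer and monitord would keep the option consistent between the clients.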

Open Question

Improve the Condor File IO mode in Pegasus? It is not clear how to do this without going down the staging-sites route.
The user experience can be improved, but doing it in Pegasus without the staging-sites option would be a hack.

User Guide Reorganization [Bill]

  • Dependent on Bill

Testing Framework and Testing

People Involved [Jens,Gaurang]

Porting the VM to 3.1.0

People Involved [Karan,Rajiv]

  • Addition of new exercises

Timeline / Sequence of changes

For Stampede related changes

  1. The Stampede DB schema redesign comes first; support for it then needs to be implemented by Fabio and Monte.
    1. Fabio's original estimate for the DB changes and the porting of pegasus-analyzer was the end of April.
    2. Depending on the scale of the schema redesign, this may have to be extended.
  2. Only after that can pegasus-analyzer, pegasus-statistics and pegasus-plots be ported for 3.1.

Notification Related Changes

  1. First, monitord needs to be managed by Condor.
  2. Create the input file for monitord containing the notifications for the jobs.
  3. Fabio puts in support for notifications in monitord.

Fabio should first work on getting monitord managed, and then move to the Stampede changes.
This tells us up front whether monitord can do notifications. The assumption is that it is critical not to miss notifications in case of a system crash.
If we are OK with the as-is approach, then management of monitord is not an issue.

While Fabio works on Stampede, we can work on the creation of the input file for monitord.

If no major Stampede DB changes are decided, then the following might be feasible:

  • Fabio gets the Stampede-related changes done by the end of April.
  • Fabio gets the whole of May to put in notification support.
  • By the end of May we may have a first beta, where everything identified above is done.