Last Updated March 25th, 2011
- Add notification support to Pegasus
- Moving auxiliary tools to Stampede DB
- Other Stampede Related Changes
- User Guide Reorganization
- Testing Framework and Testing
Notification Support in Pegasus via monitord
Monitord needs to be managed by Condor. What happens if monitord crashes, or if Condor or the system crashes? We want monitord to come up automatically as Condor recovers after a restart. [Fabio,Gaurang]
Monitord needs to support notifications [Fabio]
Requires changes to Pegasus to generate the input file for monitord [Rajiv,Karan]
Come up with default notify scripts in the toolkit that notify the user and generate some status reports. [Gaurang]
- Changes to DAX Schema
- Addition of invoke element at the workflow level
Changes to Python API [Gideon]
Changes to Perl API [Jens]
Changes to Java API [Gaurang]
Change to Java Parser [Karan]
Instead of pegasus-run launching monitord, monitord should appear as an independent job in the workflow with the highest priority [Karan]
Fabio needs to make sure exit codes are returned correctly and restarts are handled correctly. [Fabio]
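As a rough illustration of the notification input file discussed above, the sketch below shows one way monitord could look up which notify scripts to run when a job finishes. The file layout (`<job_id> <condition> <command>` per line), the condition names (`on_success`, `on_error`, `at_end`) and the function names are all assumptions made for this example; the actual format still has to be decided.

```python
from collections import namedtuple

# Hypothetical record: which command to run for a job under which condition.
Notification = namedtuple("Notification", ["job_id", "condition", "command"])


def parse_notifications(text):
    """Parse lines of the assumed form: <job_id> <condition> <command...>."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        job_id, condition, command = line.split(None, 2)
        entries.append(Notification(job_id, condition, command))
    return entries


def commands_for(entries, job_id, exitcode):
    """Return the commands to invoke once job_id ends with the given exit code."""
    outcome = "on_success" if exitcode == 0 else "on_error"
    return [
        e.command
        for e in entries
        if e.job_id == job_id and e.condition in (outcome, "at_end")
    ]
```

For example, with the entries `jobA on_error /bin/mail-admin --urgent` and `jobA at_end /bin/report` (made-up commands), a failing `jobA` would select both commands, while a successful one would select only `/bin/report`.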
Monitord Management [Fabio,Gaurang]
Auxiliary Tools to Stampede DB
Monitord Changes [Fabio]
- Monitord also needs to be able to account for newer versions of Condor DAGMan creating a jobstate.log file.
Stampede Related Changes
Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records [Fabio,Monte]
Addition of workflow metrics file containing distribution of jobs into the DB [Fabio,Karan]
Revive the metrics file created by Pegasus. It should be written to the submit directory. [Karan]
Monitord picks up the metrics file and stores it in the DB. Dependent on the schema redesign. [Fabio]
- Refactoring of clustered jobs (internal to Pegasus)
- Addition of -conf option
Java Clients [Prasanth,Rajiv,Karan]
Python Clients [Prasanth,Fabio]
- pegasus-statistics, pegasus-plots, pegasus-analyzer, monitord
Perl Clients [Gaurang]
Improvements to pegasus-tc-client [Prasanth]
- The pegasus-tc-client output is still in the old, deprecated format.
Improvements to pegasus-rc-client [Rajiv]
- Investigation of RLS compatibility issues
- Addition of default categories to allow for easier specification of category based knobs at DAGMan level
- cleanup jobs
- subdax jobs
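The default-category idea above can be sketched as follows. `CATEGORY` and `MAXJOBS` are existing DAGMan DAG-file commands; the category names, the job-type labels and the helper function are assumptions made only for this illustration, not the planned Pegasus implementation.

```python
# Hypothetical mapping from Pegasus job type to a default DAGMan category.
DEFAULT_CATEGORIES = {"cleanup": "cleanup", "subdax": "subdax"}


def category_lines(jobs, limits):
    """Build DAG-file lines assigning default categories and throttles.

    jobs   -- list of (job_name, job_type) tuples
    limits -- dict mapping category name to the maximum number of running jobs
    """
    lines = []
    used = []  # categories actually assigned, in first-seen order
    for name, job_type in jobs:
        category = DEFAULT_CATEGORIES.get(job_type)
        if category is None:
            continue  # job types without a default category are left alone
        lines.append("CATEGORY %s %s" % (name, category))
        if category not in used:
            used.append(category)
    for category in used:
        if category in limits:
            lines.append("MAXJOBS %s %d" % (category, limits[category]))
    return lines
```

With defaults like these, a user would only need to supply the per-category limit (for instance, at most 4 concurrent cleanup jobs) rather than tagging each job by hand.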
Improve Rescue DAG semantics [Rajiv]
Improve the Condor File IO mode in Pegasus? It is not clear how to do this without going down the staging-sites route.
User experience can be improved, but doing so in Pegasus without the staging-sites option would be a hack.
User Guide Reorganization [Bill]
- Dependent on Bill
Testing Framework and Testing
People Involved [Jens,Gaurang]
Porting the VM to 3.1.0
People Involved [Karan,Rajiv]
- Addition of new exercises
Timeline / Sequence of Changes
For Stampede related changes
- The Stampede DB schema redesign, and then support for it, needs to be implemented by Fabio and Monte.
- Fabio's original estimate for the DB changes and the porting of pegasus-analyzer was the end of April.
- Depending on the scale of the schema redesign, this may have to be extended.
- Only after that can pegasus-analyzer, pegasus-statistics and pegasus-plots be ported for 3.1.
Notification Related Changes
- First monitord needs to be managed
- Create input file for monitord containing the notifications for the jobs.
- Fabio puts in support for notifications in monitord.
Fabio should first work on getting monitord managed, then move on to the Stampede changes.
This tells us up front whether monitord can do notifications. The assumption is that it is critical not to miss notifications in case of a system crash.
If we are OK with the as-is approach, then management of monitord is not an issue.
While Fabio works on stampede, we can do the creation of the input file for monitord.
If no major Stampede DB changes are decided on, then the following might be feasible:
- Fabio gets the Stampede related changes done by the end of April.
- Fabio gets the whole of May to put in notification support.
- At the end of May we may have a first beta in which everything identified above is done.