Table Of Contents:
Last Updated April 5th, 2011 (previously March 17th)
- Add notification support to Pegasus
- Moving auxiliary tools to Stampede DB
- Other Stampede Related Changes
- Usability Changes
- User Guide Reorganization
- Testing Framework and Testing
Data to Worker Nodes
This is the main change for 3.1. The developers document has complete details: [Download Developers Document|^Data_To_Worker_Nodes-v3.pdf] *People Involved \[Mats,Karan,Gaurang\]*
Refactor how clustered jobs are handled *\[Karan\]*
Addition of staging-sites option to pegasus-plan *\[Karan\]*
New Shell Job Wrapper for jobs when running on worker nodes *\[Karan,Mats\]*
Creation of pegasus-cleanup jobs *\[Karan\]*
New pegasus-cleanup client that runs locally *\[Mats\]*
New pegasus-createdir client *\[Mats\]*
New java createdir client *\[Gaurang\]*
Special S3 support in pegasus-transfer, pegasus-cleanup and pegasus-createdir *\[Mats\]*
Bypassing of staging-site while staging in input data *\[Karan\]*
Bypassing of staging-site while staging out output data *\[Karan\]*
JIRA Task Board
Stampede Related Changes
- Main JIRA Issue
- Updated Database Schema
- Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records [Karan,Fabio,Monte]
- Identify the new events for the updated schema [Karan,Fabio]
- Pegasus to generate netlogger events to a file in the submit directory [Karan]
- Changes to monitord to conform to new schema, populate netlogger stream from Pegasus. [Fabio]
- Addition of workflow metrics file containing distribution of jobs into the DB [Fabio,Karan]
- Revive the metrics file created by Pegasus. It should be populated in the submit directory. [Karan]
Notification Support in Pegasus via monitord
- Main JIRA Issue
- Monitord needs to be managed by Condor. What happens if monitord crashes or condor/system crashes? We want monitord to come up automatically as Condor recovers after a restart. [Fabio,Gaurang]
- Monitord needs to support notifications [Fabio]
- Requires changes to Pegasus to generate input file for monitord [Rajiv,Karan]
- Come up with default notify scripts in the toolkit that notify the user and generate some status reports. [Gaurang]
- Changes to DAX Schema
- Main JIRA Item PM-350
- Addition of invoke element at the workflow level
- Changes to python API [Gideon]
- Changes to Perl API [Jens]
- Changes to JAVA API [Gaurang]
- Change to JAVA Parser [Karan]
- Fabio needs to make sure exitcodes are thrown correctly and restarts are handled correctly. [Fabio]
Monitord Management [Fabio,Gaurang]
Auxiliary Tools to Stampede DB
- pegasus-statistics *\[Prasanth, Mats\]*
- pegasus-plots *\[Prasanth\]*
- pegasus-analyzer *\[Fabio\]*
- Monitord also needs to be able to account for newer versions of Condor DAGMan creating a jobstate.log file.
Stampede Related Changes
Improve Rescue DAG semantics *\[Rajiv\]*
Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records *\[Fabio,Monte,Karan\]*
Addition of workflow metrics file containing distribution of jobs into the DB *\[Fabio,Karan\]*
Revive the metrics file created by Pegasus. It should be populated in the submit directory. *\[Karan\]*
Monitord picks up the metrics file and stores it in the DB *\[Fabio\]*
DAX API changes
- Feedback from Duncan Brown while using the Python API
- Executable handling in Python API and JAVA API
- Internally, executables are handled as lists, not as Sets.
- The Python API does not allow adding edges based on IDs.
- Also add a getJob function based on ID.
- Escape function in the Python API: is it applied to all strings?
- JAVA API [Gaurang]
- Python API [Gideon]
- Perl API [Jens]
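The feedback items above (executables kept in a list, a getJob lookup by ID, and edges added by ID) can be illustrated with a small sketch. This is not the Pegasus DAX API: the `Workflow` class and its `addExecutable`, `addJob`, `getJob` and `depends` methods are hypothetical names chosen only to show one possible shape for the requested behaviour.

```python
class Workflow:
    """Hypothetical stand-in for a DAX container; not the Pegasus API."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}          # job ID -> job description
        self.executables = []   # a list, not a set: keeps the order of addition
        self.edges = []         # (parent_id, child_id) pairs

    def addExecutable(self, name):
        # A list does not deduplicate on its own, so check explicitly.
        if name in self.executables:
            raise ValueError("duplicate executable: %s" % name)
        self.executables.append(name)

    def addJob(self, job_id, executable):
        if job_id in self.jobs:
            raise ValueError("duplicate job ID: %s" % job_id)
        self.jobs[job_id] = {"id": job_id, "executable": executable}

    def getJob(self, job_id):
        # Look a job up by its ID; raises KeyError if the ID is unknown.
        return self.jobs[job_id]

    def depends(self, child_id, parent_id):
        # Add an edge using IDs only; both jobs must already exist.
        self.getJob(parent_id)
        self.getJob(child_id)
        edge = (parent_id, child_id)
        if edge not in self.edges:
            self.edges.append(edge)


wf = Workflow("example")
wf.addExecutable("preprocess")
wf.addExecutable("analyze")
wf.addJob("ID0001", "preprocess")
wf.addJob("ID0002", "analyze")
wf.depends(child_id="ID0002", parent_id="ID0001")
```

Keeping jobs in a dictionary keyed by ID makes both the lookup and the ID-based edge check straightforward; whether the real APIs should store jobs this way is an open design question for the API owners listed above.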
S3 support [Mats, Karan]
- pegasus-transfer to support pegasus-s3 tool Gideon wrote [Mats]
- changes to Pegasus to use pegasus-transfer for S3 [Karan]
- Handle transfer of the S3 config file etc.
- Get S3 to work with the SeqExec launcher.
- refactoring of clustered jobs (internal to pegasus)
- Addition of -conf option
- Java Clients [Prasanth,Rajiv,Karan]
- Python Clients [Prasanth,Fabio]
- pegasus-statistics, pegasus-plots, pegasus-analyzer, monitord
- Perl Clients [Gaurang]
- Improvements to pegasus-tc-client [Prasanth]
- the pegasus-tc-client output is in the old deprecated format.
- Improvements to pegasus-rc-client [Rajiv]
- Investigation of RLS compatibility issues
- Addition of default categories to allow for easier specification of category based knobs at DAGMan level
- cleanup jobs
- subdax jobs
- Improve Rescue DAG semantics [Rajiv]
- Pegasus should not require the jobmanager compute to be present in site catalog. [Karan] https://jira.isi.edu/browse/PM-277
Condor Common Log Handling?
- Condor Common Log Handling to be discussed. https://jira.isi.edu/browse/PM-222
Improve the Condor File IO mode in Pegasus? It is not clear how to do this without going down the staging-sites option.
User experience can be improved, but doing so in Pegasus without the staging-sites option would be a hack.
User Guide Reorganization [Bill]
- Dependent on Bill
Testing Framework and Testing
People Involved [Jens,Gaurang]
Porting the VM to 3.1.0
People Involved [Karan,Rajiv]
- Addition of new exercises
Timeline / Sequence of Changes
For Stampede related changes
- The Stampede DB schema redesign, and then support for it, needs to be implemented by Fabio and Monte.
- Original estimate by Fabio for DB changes and porting of pegasus-analyzer was end of April.
- Depending on the scale of the schema redesign, this may have to be extended.
- Only after that can pegasus-analyzer, pegasus-statistics and pegasus-plots be ported for 3.1.
Notification Related Changes
- First monitord needs to be managed
- Create input file for monitord containing the notifications for the jobs.
- Fabio puts in support for notifications in monitord.
Fabio should first work on making monitord managed, then move to the Stampede changes.
This tells us up front whether monitord can do notifications. The assumption is that it is critical not to miss notifications in case of a system crash.
If we are OK with the as-is approach, then management of monitord is not an issue.
While Fabio works on Stampede, we can do the creation of the input file for monitord.
If no major Stampede DB changes are decided on, then the following might be feasible:
- Fabio gets Stampede related changes done by the end of April.
- Fabio gets the whole of May to put in notification support.
- At the end of May we may have a first beta, where everything identified above is done.
- June is spent testing the release.