Table of Contents:
Last Updated April 5th, 2011
- Improving how data gets to worker nodes, including being able to retrieve data on the worker nodes from various sources
- Refactoring how clustered jobs are handled
- Addition of a staging-sites option to pegasus-plan
- New shell job wrapper for jobs running on worker nodes
- Changes to pegasus-cleanup
- Changes to pegasus-createdir
- S3 support in pegasus-transfer, pegasus-cleanup and pegasus-createdir
- Bypassing the staging site when staging in input data
- Bypassing the staging site when staging out output data
- Transfer of the braindump file to remote workflow execution directories
- Notification support in Pegasus (monitord needs to support notifications)
- Moving auxiliary tools to the Stampede DB
- Other Stampede Related Changes
- Usability Changes
- User Guide Reorganization
- Testing Framework and Testing
JIRA Task Board
Stampede Related Changes
- Main JIRA Issue
- Updated Database Schema
- Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records [Karan,Fabio,Monte]
- Identify the new events for the updated schema [Karan,Fabio]
- Pegasus to write NetLogger events to a file in the submit directory [Karan]
- Changes to monitord to conform to the new schema and to populate the DB from the NetLogger event stream generated by Pegasus. [Fabio]
- Loading the workflow metrics file, containing the distribution of jobs, into the DB [Fabio,Karan]
- Revive the metrics file created by Pegasus; it should be written to the submit directory. [Karan]
Notification Support in Pegasus via monitord
- Main JIRA Issue
- Monitord needs to be managed by Condor. What happens if monitord, Condor, or the whole system crashes? We want monitord to come back up automatically when Condor recovers after a restart. [Fabio,Gaurang]
- Monitord needs to support notifications [Fabio]
- Requires changes to Pegasus to generate an input file for monitord [Rajiv,Karan]
- Provide default notification scripts in the toolkit that notify the user and generate status reports. [Gaurang]
- Changes to DAX Schema
- Main JIRA Item PM-350
- Addition of an invoke element at the workflow level
- Changes to Python API [Gideon]
- Changes to Perl API [Jens]
- Changes to Java API [Gaurang]
- Changes to Java parser [Karan]
- Fabio needs to make sure exit codes are propagated correctly and that restarts are handled correctly. [Fabio]
Monitord Management [Fabio,Gaurang]
Auxiliary Tools to Stampede DB
- pegasus-statistics [Prasanth, Mats]
- pegasus-plots [Prasanth]
- Other Stampede Related Changes
- Improve Rescue DAG semantics
- Additional DB schema changes to be able to connect jobs/tasks in the DAX with corresponding kickstart records
- User Guide Reorganization: dependent on Bill [Fabio]
Monitord Changes [Fabio]
- Monitord also needs to account for newer versions of Condor DAGMan that create a jobstate.log file.
DAX API changes
- Feedback from Duncan Brown while using the Python API
- Executable handling in the Python API and Java API
- Internally, executables are handled as lists, not as sets
- The Python API does not allow adding edges based on job IDs.
- Also provide a getJob function that looks up a job by ID.
- Escape function in the Python API: is it applied to all strings?
- Java API [Gaurang]
- Python API [Gideon]
- Perl API [Jens]
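The Python API feedback above (executables kept in a list rather than a set, no way to add edges by ID, no getJob-style lookup by ID) can be illustrated with a minimal sketch. The classes `Executable`, `Job` and `ADAG` and the methods `add_executable`, `add_job`, `get_job` and `add_edge` are hypothetical stand-ins, not the actual Pegasus DAX Python API; the sketch only shows one way the requested behaviour could work, assuming executables are compared by all of their fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Executable:
    """Hypothetical executable entry; frozen so it is hashable and can live in a set."""
    namespace: str
    name: str
    version: str
    site: str


@dataclass
class Job:
    """Hypothetical job; only the ID and transformation name matter here."""
    id: str
    name: str


@dataclass
class ADAG:
    """Hypothetical abstract workflow container (not the real Pegasus class)."""
    name: str
    executables: set = field(default_factory=set)  # a set, so equal entries collapse
    jobs: dict = field(default_factory=dict)       # job ID -> Job
    edges: set = field(default_factory=set)        # (parent ID, child ID) pairs

    def add_executable(self, exe):
        self.executables.add(exe)

    def add_job(self, job):
        if job.id in self.jobs:
            raise ValueError(f"duplicate job id {job.id!r}")
        self.jobs[job.id] = job

    def get_job(self, job_id):
        """Look a job up by its ID (the requested getJob-by-ID behaviour)."""
        return self.jobs[job_id]

    def add_edge(self, parent_id, child_id):
        """Add a dependency using job IDs rather than Job objects."""
        for job_id in (parent_id, child_id):
            if job_id not in self.jobs:
                raise KeyError(f"unknown job id {job_id!r}")
        self.edges.add((parent_id, child_id))


if __name__ == "__main__":
    dax = ADAG("diamond")
    dax.add_executable(Executable("example", "preprocess", "1.0", "local"))
    # An equal executable added twice is stored only once.
    dax.add_executable(Executable("example", "preprocess", "1.0", "local"))
    dax.add_job(Job("ID0000001", "preprocess"))
    dax.add_job(Job("ID0000002", "findrange"))
    dax.add_edge("ID0000001", "ID0000002")
    print(len(dax.executables))
    print(dax.get_job("ID0000002").name)
```

Using a frozen dataclass makes executables hashable by value, so the set removes duplicates without any extra bookkeeping; indexing jobs in a dict keyed by ID makes both the ID-based edge and the lookup constant-time.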
S3 support [Mats, Karan]
- pegasus-transfer to support the pegasus-s3 tool that Gideon wrote [Mats]
- Changes to Pegasus to use pegasus-transfer for S3 [Karan]
- Handle transfer of the S3 config file, etc.
- Get S3 to work with the SeqExec launcher.
- Refactoring of clustered jobs (internal to Pegasus)
- Addition of a -conf option
- Java Clients [Prasanth,Rajiv,Karan]
- Python Clients [Prasanth,Fabio]
- pegasus-statistics, pegasus-plots, pegasus-analyzer, monitord
- Perl Clients [Gaurang]
- Improvements to pegasus-tc-client [Prasanth]
- The pegasus-tc-client output is in the old, deprecated format.
- Improvements to pegasus-rc-client [Rajiv]
- Investigation of RLS compatibility issues
- Addition of default categories to allow easier specification of category-based knobs at the DAGMan level
- cleanup jobs
- subdax jobs
- Improve Rescue DAG semantics [Rajiv]
- Pegasus should not require a jobmanager of type compute to be present in the site catalog. [Karan] https://jira.isi.edu/browse/PM-277
Condor Common Log Handling?
- Condor Common Log Handling to be discussed. https://jira.isi.edu/browse/PM-222
Improve the Condor File IO mode in Pegasus: it is not clear how to do this without going down the staging-sites route.
The user experience can be improved, but doing so in Pegasus without the staging-sites option would be a hack.
User Guide Reorganization [Bill]
- Dependent on Bill
Testing Framework and Testing
People Involved [Jens,Gaurang]
Porting the VM to 3.1.0
People Involved [Karan,Rajiv]
- Addition of new exercises
Timeline / Sequence of Changes
For Stampede-related changes
- The Stampede DB schema needs to be redesigned, and support for it then implemented, by Fabio and Monte.
- Fabio's original estimate for the DB changes and the porting of pegasus-analyzer was the end of April.
- Depending on the scale of the schema redesign, this may have to be extended.
- Only after that can pegasus-analyzer, pegasus-statistics and pegasus-plots be ported for 3.1.
Notification Related Changes
- First, monitord needs to be managed by Condor.
- Create an input file for monitord containing the notifications for the jobs.
- Fabio puts in support for notifications in monitord.
Fabio should first work on getting monitord managed, then move on to the Stampede changes.
This tells us up front whether monitord can handle notifications. The assumption is that it is critical not to miss notifications in case of a system crash.
If we are OK with the as-is approach, then management of monitord is not an issue.
While Fabio works on Stampede, we can create the input file for monitord.
If no major Stampede DB changes are decided upon, then the following might be feasible:
- Fabio gets the Stampede-related changes done by the end of April.
- Fabio gets the whole of May to put in notification support.
- By the end of May, we may have a first beta in which everything identified above is done.
- June is spent testing the release.