Developer Meetings for 4.0/3.2:
February 2012
Feb 13th, 2012
- Attendees: Karan, Gaurang, Mats, Fabio and Rajiv
- Testing update
- Mats tested S3 and SRM configurations
- These 2 test cases will have to be run manually.
- Gaurang suggests a cron job that starts a VM every night.
- iRODS will be supported.
- FDT
- not sure as yet.
- Mats has a call with them tomorrow.
- Works only for data staging; does not work for executable staging.
- Gaurang will update the Bio VM and the old RNA-seq workflows to run with 4.0
- Rajiv will do the new Bowtie version
- SSH
- Transferring of the SSH private keys; they are passwordless.
- Right now it works out of the box in the case of a shared fs.
- The credential handling in Pegasus planner for ssh will be implemented this week by Mats
- Will allow only for local transfer jobs.
- Kickstart changes
- If they have to be checked in, they need to be in by Thursday.
Documentation
- Examples need to be updated and moved accordingly.
- Condor 7.6 and up has an upload and download queue that controls how many jobs can use Condor file IO. By default the limit is 10; this can be changed. The Condor IO documentation should reflect this.
Feb 6th, 2012
Attendees: Karan, Mats, Fabio, Jens and Rajiv
- 3.1.1 release
- python path issue with importing the python dax api for LIGO
- Mats copies the pegasus python dax api and net logger python libraries to python-tools/pegasus and python-tools/netlogger to deal with this.
- Waiting on Duncan to confirm
- Rajiv will port the tutorial to 3.1 in tandem. Will send email when the tutorial is ported to 3.1
- python dax library packaging for 4.0
- the externals will be built in lib/pegasus/externals directory
- mats has changed all the tools to pick up the new paths.
- long term we will put in rpm dependencies for externals tools
- add the dependency for graphviz in the RPM
- 4.0 release items remaining
- credentials handling in Pegasus (Mats)
- pegasus-transfer support for FDT (Mats)
- pegasus-statistics changes.
- Fabio is waiting on Monte to update the queries.
- hope to be done this week
- kickstart changes
- jens wants to check in changes to the kickstart code to use threads.
- pegasus examples need to be updated.
- Release Date
- Feb 29th
- Testing
- Rajiv added runtime clustering with executable staging, symlinking and kickstart test case
- Shared fs with symlinking and no kickstart
- in the tests directory there will be a file that lists what each test case covers
- Rajiv will take a first crack at it
- Documentation
- Sections to be assigned to people to go through.
- clustering guide to be added by rajiv
- man pages need to be updated.
- pegasus-statistics and monitord updated.
- talk about pegasus lite and staging site.
- Tutorial VM needs to have pegasus lite exercises added.
- 4.1 release
- new mpi dag that gideon has developed for use on kraken
- need to talk about monitord changes on that
January 2012
Jan 30th, 2012
Attendees: Karan, Mats, Jens, Fabio, Rajiv and Gaurang
- seqexec changes for 3.1 branch and trunk
- -e is the old mode, where if -f is not set, seqexec will return 0 irrespective of failures
- -e and -f are mutually exclusive
- both with -f and if no option is specified, seqexec will return 5 if any one of the clustered jobs fails
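The exit code rules above can be sketched roughly as follows. This is only an illustration of the semantics recorded in these notes, not the seqexec implementation; the function name and the `mode` argument are hypothetical.

```python
def seqexec_exit_code(task_exit_codes, mode=None):
    """Overall exit code of a clustered job, per the rules noted above.

    mode is "-e", "-f" or None (no option given). Using a single mode
    argument keeps -e and -f mutually exclusive. Hypothetical helper.
    """
    if mode not in (None, "-e", "-f"):
        raise ValueError("mode must be None, '-e' or '-f'")
    any_failed = any(code != 0 for code in task_exit_codes)
    if mode == "-e":
        # Old mode: report success irrespective of task failures.
        return 0
    # With -f, or with no option, any failing task yields exit code 5.
    return 5 if any_failed else 0
```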
- 3.1.1 release
- waiting on seqexec changes
- hope to release this week
- pegasus-statistics
- jobs files will have 3 new columns (multiplier factor, kickstart * multiplier factor, remote cpu time)
- remote cpu time will be populated only if present in the kickstart record (it is the sum of utime and stime in the kickstart records)
- monitord changes
- has to be changed to not assign default values for the remote cpu time. pegasus-statistics will list "-" in case it is not present
- upgrade tool changes
- upgrade tool should not populate remote cpu time
- cumulative workflow values / summary file will have the values multiplied.
- Workflow cumulative job wall time
- transformation statistics file per workflow, breakdown.txt, will use the multiplier factor before doing the min, max etc.
- time.txt file will have it changed
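As a rough illustration of the three new columns discussed above, the sketch below builds one row of a jobs file. The function and key names are hypothetical and do not reflect the actual pegasus-statistics code; only the arithmetic follows the notes.

```python
def job_row(kickstart_runtime, multiplier_factor, utime=None, stime=None):
    """Build the three new columns for one job (hypothetical layout)."""
    # Remote cpu time is the sum of utime and stime from the kickstart
    # record; it stays unset when the record does not carry both values.
    if utime is not None and stime is not None:
        remote_cpu_time = utime + stime
    else:
        remote_cpu_time = None
    return {
        "multiplier_factor": multiplier_factor,
        "kickstart_multiplied": kickstart_runtime * multiplier_factor,
        # No default value is invented; a missing value is listed as "-".
        "remote_cpu_time": "-" if remote_cpu_time is None else remote_cpu_time,
    }
```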
- test cases
- Rajiv has checked in blackdiamond case
- Rajiv, mats and karan will meet on tuesday to go through the cases we have so far
- documentation
- running workflows section ( execution environments )
- will break it out, and it will be geared more towards common environments (like XSEDE, OSG etc.).
- Karan will update the properties documentation
- data management needs to be worked on
- new clustering technique needs to be worked on
Jan 25th, 2012
- pegasus lite configuration will be triggered by pegasus.data.configuration property
- sharedfs the default mode
- nonsharedfs: Pegasus Lite in conjunction with pegasus-transfer on a non shared filesystem
- condorio: Pegasus Lite in conjunction with Condor IO on a non shared filesystem
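A minimal sketch of how a tool might resolve this property, assuming the properties have already been loaded into a dictionary. The helper name and the dictionary are hypothetical; the property name, its three values and the sharedfs default are taken from the notes above.

```python
# Values of pegasus.data.configuration as listed in the notes above.
DATA_CONFIGURATIONS = {
    "sharedfs": "the default mode",
    "nonsharedfs": "Pegasus Lite with pegasus-transfer on a non shared filesystem",
    "condorio": "Pegasus Lite with Condor IO on a non shared filesystem",
}


def data_configuration(properties):
    """Return the configured mode, falling back to sharedfs (hypothetical helper)."""
    value = properties.get("pegasus.data.configuration", "sharedfs")
    if value not in DATA_CONFIGURATIONS:
        raise ValueError("unknown pegasus.data.configuration value: " + value)
    return value
```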
- pegasus analyzer done for 3.2. It will only work for a 3.2 database
- monte has worked on a schema upgrade tool, that upgrades the schema to a new schema. it will populate the exitcode in the job instance table accordingly.
- will be checked in share/pegasus/sql directory.
- because of the upgrade tool, dual queries will not be implemented in the retrieval api. only the upgrade tool should be used to upgrade the db.
- there should also be a getSchema function in the netlogger API, which the tools will use to print what schema version is detected.
- test cases organization
- hierarchical directory structure proposed to include similar tests with different configuration files
example: pegasus/test/core/blackdiamond/t1 and pegasus/test/core/blackdiamond/t2, where t1 and t2 are sub directories that contain the configuration files. plan.out should then appear in the t1 and t2 directories. blackdiamond/t1 and blackdiamond/t2 will appear as different tests in Bamboo.
- rajiv will check in an example for the above.
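The proposed layout could be enumerated along these lines. This is a sketch only: the function name is hypothetical, and it assumes every sub directory of a workflow directory (t1, t2, ...) is one test configuration.

```python
from pathlib import Path


def discover_tests(root):
    """List tests such as 'blackdiamond/t1' found under a test root.

    root would be something like pegasus/test/core; each workflow
    directory holds one sub directory per configuration.
    """
    names = []
    for workflow_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for config_dir in sorted(p for p in workflow_dir.iterdir() if p.is_dir()):
            # Each configuration directory is reported as a separate test.
            names.append(workflow_dir.name + "/" + config_dir.name)
    return names
```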
- release to be named 4.0 instead of 3.2
- people feel the release is a major upgrade and we should consider calling it 4.0.
- Karan will followup with Ewa.
Jan 9th, 2012
Agenda
- exitcode for pegasus lite jobs
- stampede schema: placement of the exit code, whether in the job instance or job table
- pegasus dagman changes for checking on multiple instances of monitord
- testing cases layout
- naming configuration for 3.2
• exitcode for pegasus lite jobs
- corner case for pegasus lite jobs where the failures may happen outside of the kickstart invocations.
- mats will modify exitcode accordingly
• pegasus dagman changes for checking on multiple instances of monitord
- consensus is that no changes to pegasus-dagman required.
Fabio summary on the raw status population
- only raw status is populated in the DB
- the exitcode in the job instance table is populated with the status as dagman saw it. if a job fails, and post script succeeds then job instance is true.
- pegasus analyzer has been updated.
- in future the analyzer could perhaps check for disconnects between job instance exit codes and invocation exit codes. This would not be the default option; probably a better analyze mode.
• stampede schema: placement of the multiplier factor, whether in the job instance or job table
- everybody agrees it is at the right place.
• testing cases layout
- test case should have date timestamps like 004-montage-grid
- all test cases should have a README file listing the description, purpose and associated jira items.
- one of the test cases may indicate a bug for pegasus-analyzer in the DB mode
• naming configuration for 3.2: will be discussed at the next meeting
December 2011
Dec 19th, 2011
- Rajiv
- Will setup a test case for shell code generator. Not this week.
- Will also check in an example for bin based clustering.
- Karan
- Fixed the bug for the staged executable paths with PegasusLite that came up during galactic plane workflow
- Worker package staging
- fixed dependencies for stage worker jobs
- stage worker jobs refer to pegasus-transfer
- Mats
- Working on pegasus testing glue
- test cases running as a cron job.
- Data configuration for 3.2 Pegasus
- discussed on whiteboard the options
- Do away with pegasus.data.configuration and replace it with pegasus.data.mode (sharedfs | condor IO | pegasus-transfer)
- Passing of arguments to sub workflows from command line
- it was decided to push this to 3.3 as we cannot come up with a good way to go about it
- https://jira.isi.edu/browse/PM-461
- Fabio update on Stampede
- 3.1 items
- Fabio unable to reproduce the notifications bug
- 3.2 release
- Working on bug for sub workflows retries. will do today
- Wrap up 3.2 analyzer
- Confusion about what goes in the stdout/stderr text fields for clustered kickstart jobs
- Fabio will update the invocation exit code to be populated with exit status.
- added in monitord to die if monitord is already running.
- waiting on Gaurang to fix pegasus dagman JIRA PM-523
- Still need to update the queries, and the migration of DBs as discussed for 505
- Release deadline
- End of January for all stampede related development, including pegasus-statistics taking in multiple workflow ids
- February for testing and documentation
Dec 12th, 2011
- Karan
- updated on the cleanup workflow for 3.2. After discussing with the group, it was decided to disable the separate cleanup workflow generation for 3.2 as directory remove is not supported in pegasus-cleanup. Will be tackled for 3.3.
- modifications to standard universe handling for LIGO
- Fixed the mismatch in net logger events for the executable attribute for dax jobs
- Fabio
- At conference last week
- Will work on multiplying factor this week
- Rajiv
- Checked in his changes to horizontal clustering for the bin based clustering
- Testing
- No update. Gaurang will spend a couple of days this week to get the example up and running.
- Will move the tests to the main pegasus svn instead of the separate testing svn
- Jens has been busy developing the FutureGrid (FG) tutorial
- Mats
- At conference last week.
Dec 5th, 2011
- Update from Karan
- Worked on the cleanup jobs to refer to the staging site and using the new pegasus-cleanup executable
- Changed the shell code generator to use the new FHS layout
- Fixed the handling of long argument strings
- Update from Gaurang
- Had troubles with running an example from Bamboo before he left
- Will be working on it this week
- Update from Rajiv
- Rajiv has been working on the runtime based binning for the horizontal clustering.
- Update from Jens
- Jens will spend some time on bamboo before leaving for vacation
Post 3.2, we talked about extending the site catalog schema to specify which credentials need to be transferred.
Karan mentioned that some work on automatically associating the jobs with the right credentials will be done for 3.2.
November 2011
Nov 7th, 2011
- Update from Karan
- Has the worker package working for Pegasus Lite
- Executable staging still needs to be taken care of.
- Mats ran a montage 8 degree workflow with clustering on.
- Mats has to optimize the SRM staging with pegasus-transfer
- For 3.2 no interleaving for staging and execution in pegasus lite.
- Details for the worker package staging
- did the changes to the transfer format we discussed last week.
- Update from Rajiv on time based clustering
- Will start on it this week.
- Will look into how much work is involved.
- Will try to complete in the coming month.
- Update from Fabio
- Statistics will always output both csv and txt format
- Will start work on pegasus-analyzer
- Will identify the queries this week.
- Will try to check in a version of pegasus analyzer before he leaves for vacation.
- Open question
- how to determine whether a workflow is still running or not.
- pegasus-status may need to be modified.
- running
- finished with success / error
- what happens when pegasus-status is run on a directory where workflow does not exist.
- Update from Mats
- Testing the pegasus lite example. has a new test case.
- Pegasus Discuss mailing list.
- We will setup pegasus-users.
- Remove pegasus-discuss
- Testing Update
- Update from Gaurang
- first stage builds pegasus. tar files need to be untarred in a directory.
- Still needs to check out from the testing SVN.
- Second step is the testing stage
- untar the tar ball
- trying to run pegasus condor blackdiamond example.
- Working on creating common config files
- that the test cases can use.
- also there will be a provision to use own site catalogs.
- Needs to figure out how to pass user arguments to a bamboo plan
- Will give a demo on December 12th.
- Update from Jens
- Testing Update:
- Jens has some plan running with builds and ability to run some unit tests.
- Gave a demo of what he has done.
- Has a plan of checking out and running a unit test case.
- Will spend 2 more days on it until next all Pegasus meeting in December.
- PM-518: Update pegasus-cluster to use PATH environment variable to find relative-path executable.
- Added short FutureGrid blurb to Pegasus web site.
October 2011
Oct 31st, 2011
- Update on testing from gaurang
- nothing new
- still working on harness
- Gaurang will have an example running on it this week. Once the first test case is running, Jens will also start working on the glue.
- update from jens
- talked to Serban. no headway on it.
- will read up on bamboo. no promises.
- Testing SVN setup.
- Mats and Karan are checking in test cases on it.
- Update from Rajiv
- No update. Has not had a chance to work on the clustering technique
- Will let Karan know by end of Tuesday about the timeline.
- Update from Fabio
- pegasus-statistics generates output in csv format.
- spaces should not be in the format
- Special markers for headers
- Or we have multiple files.
- Fabio will always generate the output in both formats.
- Karan gave a demonstration of Pegasus Lite work
- Ran ( Planned ) the OSG SRM test case example.
- Jens and Gaurang are ok with how it has been done. No major comments on missing pieces.
- Jens did notice that relative paths to executables are not specified when executable staging is done. Karan still has to fix executable staging and worker package staging for pegasus lite mode.
- Improvements to transfer input format.
- Changes to submit files format.
- Comment before classad keys
- Change the ordering of keys
- Karan and Fabio should check up on how clustered jobs are affected by the pegasus lite mode.
- Post 3.2 item.
- Remove the index field from the names of the generated files of the executable workflow.
Oct 24th, 2011
Attendees: Karan, Mats, Gaurang, Jens, Fabio and Rajiv.
- Update from Karan
- Common Shell Script with the functions
- Will demo a version next week, showing how the planner plans out pegasus lite wrapped jobs
- Database Schema Update
- https://confluence.pegasus.isi.edu/display/stampede/Stampede+Database+for+3.2
- Addition of exitcode with job instance table
- exitcode population in the job instance table
- everybody likes the idea
- nothing special should happen in the queries, like max of exitcode.
- Update from Fabio
- Working on the csv output
- Jens tripped over a bug regarding notifications.
- Working on pegasus-analyzer with a directory
- Update from Mats
- Change to NMI builds for pegasus
- Pegasus WMS package inclusion of Condor
- Condor is not releasing versions for all the platforms
- Mats had to drop debian 5 from the WMS release.
- Condor no longer does RHEL 6
- Apt file is included now with the packages.
- The FHS package is not relocatable.
- Gaurang would like to see only installs via apt or yum.
- Parallel installs for 3.2 maybe?
- Update from Rajiv
- Talked to Gideon about bin packing algorithm.
- Time based packing.
- On a particular level, jobs will be grouped by transformation name.
- Will work on it this week.
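One way the time based packing described above could work is first-fit decreasing bin packing within each transformation on a level. The sketch below is an assumption for illustration, not the algorithm Rajiv and Gideon settled on; the function name, the job tuples and the `max_runtime` parameter are hypothetical.

```python
from collections import defaultdict


def cluster_by_runtime(jobs, max_runtime):
    """Pack jobs of one level into clusters bounded by max_runtime.

    jobs is an iterable of (job_id, transformation, runtime) tuples.
    Returns a list of (transformation, [job_ids]) clusters.
    """
    by_transformation = defaultdict(list)
    for job_id, transformation, runtime in jobs:
        by_transformation[transformation].append((job_id, runtime))

    clusters = []
    for transformation in sorted(by_transformation):
        bins = []  # each bin is [total_runtime, [job_ids]]
        # First-fit decreasing: place the longest jobs first.
        ordered = sorted(by_transformation[transformation], key=lambda j: -j[1])
        for job_id, runtime in ordered:
            for current in bins:
                if current[0] + runtime <= max_runtime:
                    current[0] += runtime
                    current[1].append(job_id)
                    break
            else:
                # No bin has room; a job longer than max_runtime
                # also ends up here, alone in its own cluster.
                bins.append([runtime, [job_id]])
        clusters.extend((transformation, ids) for _, ids in bins)
    return clusters
```

Grouping by transformation name before packing keeps each cluster homogeneous, which matches the grouping noted above.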
- Testing Update
- Bamboo is installed. Upgraded the version of Bamboo
- Gaurang playing with Mats's scripts.
- Jens wants separation into two things
- when you checkout and build, if build fails, then no tests should be run
- Currently trying to use as much bamboo out of the box as possible.
- For a nightly test
- only one build will be used by all the tests.
- Jens will try to meet with Serban.
- Update from Jens
- Worked on the kickstart tasks
- Has a master task, 516 that includes all the under the hood changes.
- Other Gideon related requests will be pushed to 3.3
- Tripped over notifications issue
- Comments
Oct 17th, 2011
Attendees: Karan, Mats, Fabio, Jens, Rajiv and Gaurang
- Testing Update
- Gaurang has Bamboo setup and the builds working.
- Gaurang still has to work on the glue.
- Red and Green scripts.
- URL for the test cases.
- Will make a new Subversion repository that will have the tests.
- Four checks
- check exitcode.
- check existence of files.
- ???
- We will setup a separate SVN for the test cases, and people will check in some mockup tests that can be used to do the initial integration.
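A mockup test of the kind mentioned above could apply the two checks that were named (exit code and existence of files) roughly as follows. The function name and arguments are hypothetical; the remaining checks were left open in the meeting and are not covered.

```python
import subprocess
from pathlib import Path


def check_test(command, expected_files, cwd="."):
    """Run one test command and return a list of failure descriptions.

    An empty list means the test is green; anything else is red.
    """
    failures = []
    result = subprocess.run(command, cwd=cwd)
    # Check 1: the command must exit with 0.
    if result.returncode != 0:
        failures.append("exitcode %d" % result.returncode)
    # Check 2: every expected output file must exist afterwards.
    for name in expected_files:
        if not (Path(cwd) / name).exists():
            failures.append("missing file %s" % name)
    return failures
```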
- Release Date
- Most of the people will be away and on vacation in the coming months.
- People feel that end of January is a good feature freeze date
- And then target end of March for the final release, with all the testing and documentation that needs to be done
- Stampede Exitcode Tracking
- Not talked about as yet.
Oct 10th, 2011
Attendees: Karan, Mats and Jens
- Karan explained how the staging-site command line option works. Mats and Jens agree to the semantics.
- We had a long discussion on the semantics of exit status and exit code. Jens will update JIRA 505 to reflect the discussion.
- At a high level we decided to stick with only one exitcode column in the DB. monitord will populate that with either raw status, signal, error number or exitcode, whatever is present in the kickstart record.
- Jens and Fabio will also talk about whether monitord is doing the right thing when kickstart is not present.
- Mats did not get a chance to work last week on 3.2
- this week he will give us the first cut of shell functions that we need for pegasus-lite
- Karan thinks he will get to first set of pegasus lite changes next week
- Karan had a talk with Ewa and indicated January end as the release, with coding freeze end of December.
- Testing framework
- Gaurang has done installation of the Bamboo framework.
- Glue still needs to be developed.
Oct 3rd, 2011
Attendees: Karan, Gaurang, Mats and Fabio
- Update from Karan
- Working on putting in the notion of staging sites.
- Updated Pegasus to reflect FHS layout
- Changed Pegasus to refer to new clients ( still ongoing)
- the automatic placement of cleanup and create dir jobs needs to be done.
- Has put in JIRA tasks for the release.
- Update from Gaurang
- Gaurang installed bamboo at bamboo.isi.edu for the testing server.
- has problems with the AJP connector.
- Gaurang has issues with confluence page restrictions.
- Will complete the bamboo setup this week!
- Update from Mats
- has initial versions of pegasus-create-dir and pegasus-cleanup.
- we still need to integrate uberftp
- has setup the credentials for the testing framework.
- Update from Fabio
- refactoring of monitord code - has a library for notifications and event generation.
- will complete it this week.
- added option for disabling stdout and stderr, there is also a boolean property pegasus.monitord.stdout.disable.parsing
- updated the stampede schema with the new columns.
- for signal handling, we feel there should be some special exitcode.
- Some special exitcode, such as the negative of the signal number.
- for no kickstart , convert from exitcode to raw status
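The signal and no-kickstart cases above can be sketched with the POSIX wait-status helpers in Python's `os` module (Unix only). The function names are hypothetical, and the reverse conversion assumes the common Linux encoding in which the exit code sits in the high byte of the raw status.

```python
import os


def exitcode_from_raw_status(raw_status):
    """Derive a single exitcode value from a raw wait status.

    A job killed by a signal is reported as the negative signal number.
    """
    if os.WIFSIGNALED(raw_status):
        return -os.WTERMSIG(raw_status)
    if os.WIFEXITED(raw_status):
        return os.WEXITSTATUS(raw_status)
    raise ValueError("unsupported raw status: %d" % raw_status)


def raw_status_from_exitcode(exitcode):
    """No kickstart record: rebuild a raw status from a plain exit code.

    Assumes the usual Linux layout (exit code in bits 8-15).
    """
    return (exitcode & 0xFF) << 8
```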
September 2011
Sept 26th, 2011
Attendees: Karan, Gaurang, Mats, Fabio, Jens and Rajiv
- Multiplier factor for MPI jobs
- Let's ask Scott about it.
- Have a multiplier factor in the schema. Is always at the Job level.
- multiplied at the query time.
- We will have a Pegasus Profile - cores
- Time based clustering
- Similar to horizontal clustering.
- runtime profile key that can be used with the jobs.
- another profile key that says the clusters.maxruntime.
- Rajiv will talk to Gideon about it.
- Jens waiting on JIRA entries for ctools renaming.
- Testing
- Jens talked to Serban about doing it with Bamboo.
- Gaurang will setup Bamboo this week on cartman.
- Databases will be backed up on stewie.
- Gaurang will have an example running end of next week.
- how are the users set up on cartman? tests should be run as one user.
- Mats will setup global user on our condor pool.
- Bamboo will do basic environment setup
- will give a pegasus build
- Bamboo has conventions on stdout and stderr.
- Each test has to cleanup after itself?
- Monitord
- refactoring of monitord codebase will be done this week
- raw exit status and exitcode will be populated
- for the queries only exit status will be used.
- Fabio and Karan will update the database schema pic to reflect the changes.
- New Data Model
- Data