April 2019

April 12th 2019

  • Pegasus

...

  • SGI cluster
  • we have 3 potential SGI cluster users: Cedars, the Vision group at ISI, and maybe Rutgers (that will be replaced with SLURM)

...

...

  • need to now make changes in various parsers

...

  • 5.0

...

  • rc1 working for hub
  • LIGO is trying it out and wanted to change checkpoint files. Need to hear back on the dashboard changes.
  • SCEC ? waiting to hear from Scott
  • https://jira.isi.edu/issues/?filter=10851
  • pegasus-db-admin sqlalchemy issues? for updating tables?
  • pass through implemented for Glite to PBS
  • verification of update to pegasus version on running workflows
    • Site Catalog Conversion to YAML
      • mukund is mainly done
      • pushed out his changes
      • trying to make the tests green
    • Checkpointing changes to accommodate LIGO use of vanilla universe
      • Karan and Mats will explore and see if it is possible
      • cumulative stdout|stderr
        • what about time and duration values
        • since there is no DAG node retry and the job just goes into HELD state
    • Composite Events
      • Kibana dashboard needs to be updated
      • dropping __ in the event names
      • George wants the AMQP library updated 
        • Will create a JIRA item
    • Office Hours video
      • Karan will work with Jasmine to upload the video
  • Papers
    • RACE Paper submitted last week
    • PEARC Paper this week
  • Proposals
    • Army Research 
      • enabling in-situ supports for ExaScale
      • linked with what Tu is doing
    • SCEC Proposal Submitted
      • have a good chance
    • Exascale one with Michigan
      • the call will come out soon
    • Ewa, Rafael and Deborah
      • NSF GCR Proposal
      • Modelling wild fires
      • Has PRICE school input and also Deborah's postdoc
  • EScience
    • Pegasus Tutorial Proposal
    • May 6, 2019: Tutorial Proposal Deadline
    • Also trying for the workflow comparison paper
    • Dynamo paper by George
  • Pegasus connect discussion
    • tabled it for later when Mats is around
  • HTCondor Week
    • Karan will be doing a Pegasus talk and Pegasus workshop
  • Pegasus OLCF Poster
    • combine the panda poster
    • can also submit to EScience
  • Ryan's work
    • Loic is moving pachyderm setup to AWS
  • Loic Rafael and Tu are working on a paper for Cluster
  • Software X

March  2019

March 29th 2019

  • 4.9.1 Release
    • done and working on 4.9.2
  • Site Catalog Conversion to YAML
    • mukund working on it
    • I still need to look at the bamboo tests
      • bamboo is failing on the condor mount scratch setting
      • we have to fix it in pegasus also, to fail on credentials in /tmp
      • run condor_config_val on the key and check if /tmp is in there
      • mainly affects all the users that use x509
      • LIGO has also tripped over it, both with Pegasus and without Pegasus
  • Condor vanilla checkpointing
    • karan asked him about what he is trying to do
  • composite events 
    • check for keys with same values
    • also do we need to pad extra keys for all events?
  • Extensions to Jupyter Integration
  • Pegasus Connect
    • will discuss on whiteboard on April 12th
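
The /tmp credential check from the bamboo discussion above could look roughly like the sketch below. The notes do not name the condor key; MOUNT_UNDER_SCRATCH is an assumption here, and the helper names are made up for illustration. It only decides whether a credential path would be hidden from the job; how pegasus should fail is left open.

```python
import subprocess


def mounts_tmp_under_scratch(config_value):
    """Return True if /tmp appears in a comma or space separated directory list."""
    tokens = config_value.replace(",", " ").split()
    dirs = [t.strip('"').rstrip("/") for t in tokens]
    return "/tmp" in dirs


def credential_is_affected(credential_path, config_value):
    """A credential under /tmp is not visible to the job when /tmp is remounted."""
    return mounts_tmp_under_scratch(config_value) and credential_path.startswith("/tmp/")


def read_condor_knob(key="MOUNT_UNDER_SCRATCH"):
    """Ask condor_config_val for a key; return "" if it is unset or condor is missing.

    The key name is an assumption, the notes only say "the key".
    """
    try:
        result = subprocess.run(
            ["condor_config_val", key], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""
```

Usage would be something like credential_is_affected("/tmp/x509up_u1000", read_condor_knob()) at plan time, with the x509 path being whatever the user has configured.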


March 1st 2019

  • 4.9.1 Release
  • Office Hours
    • On Friday March 22nd on real time monitoring
  • transformation catalog for 5.0
    • Mukund will work on it next
  • EScience?
    • Paper
  • pegasus-exitcode test
    • success message not parsed correctly  
  • Programmer
    • will interview the 

February  2019

February 22nd 2019

  • 4.9.1 Release
    • Pending Issues
      • https://jira.isi.edu/projects/PM/versions/11891
      • This raises the larger issue of how long we want to support externals packages

        there are some packages we need to ship because of worker packages dependencies.

        Consensus:
        We remove mysql python externals package for 4.9.1 and 5.0.0

        And also remove the dependencies from our deb and RPM builds.

      • Transfers within containers
        • We are only going to transfer from within the container till people complain
        • George Papadimitriou will add to the documentation.
      • non ascii encoding in the stdout
    • Support HPSS storage
  • Office Hours
    • George on real time monitoring.
      • Date?
  • EScience?
    • Paper
    • Tutorial submission

February 1st 2019

  • 4.9.1 Release
    • ascii encoding breaks while parsing for monitoring events. monitord should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
      • Karan will fix this
  • New TC Format
  • Shifter Support in Pegasus
    • is in 4.9 branch
  • Pegasus Annual Report
    • will be working on it in coming weeks
    • will ask for input
    • next year's report will be tricky in terms of effort allocation.
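
For the ascii encoding item above, a minimal sketch of "keep populating, but log a warning" is below. This is not the actual monitord code; the function name, logger name and the choice of UTF-8 with replacement characters as the fallback are all assumptions.

```python
import logging

log = logging.getLogger("stdout-decode-sketch")


def decode_job_stdout(raw, job_id="unknown"):
    """Decode captured stdout bytes without ever raising on non-ASCII content."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as err:
        # Warn once per job, then fall back so the database still gets the stdout.
        log.warning(
            "non-ASCII byte in stdout of job %s at offset %d, decoding leniently",
            job_id,
            err.start,
        )
        return raw.decode("utf-8", errors="replace")
```

Valid UTF-8 output survives unchanged, and undecodable bytes become U+FFFD, so the stdout column is always populated.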

January  2019

January 25th 2019

  • 4.9.1 Release
    • ascii encoding breaks while parsing for monitoring events. monitord should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
  • YAML format for the TC
    • the line numbers should be mentioned in the errors
  • GitHub commits don't trigger bamboo builds right now
    • move to webhooks?
    • slack token in bamboo.yml . 
      • mats will look into it further
  • SCEC for HPC Transfer certificate issue
    • Globus online certificates messed up hpc-transfer issue.
  • Data Storage at NERSC
    • almost full
  • Singularity container with the entry point.
    • docker → singularity container conversion does not add the entry point.

January  18th 2019

  • 4.9.1
    • container execution
      • data transfers happen within the container
      • python3 issue
      • vague rules to discover what python to use
    • Singularity Hub URLs updated
      • Documentation and tutorials need to be updated
      • montage examples
      • python stuff: create JIRA item
    • LIGO pull requests
      • Build pull request
      • PAM module
      • subprocess package thing
      • also related to Python3 movement
  • Transformation Catalog Implementation
  • Astro Py
  • Shifter support at NERSC
  • Panda Integration
  • XENONnT
    • Rucio data pull in 
    • fetching data might be easier
  • Journal Paper
    • need to write something about containers

December  2018

December 13th, 2018

  • Pegasus 4.9.1 release
    • local site catalog entry creation
      • based on the pegasus version on the submit host
    • encoding issue in the stdout.
  • Pegasus 5.0 Release
    • TC yaml implementation
      • mukund will create a yaml schema compatible with the TC
    • backwards compatibility
      • case by case basis
      • definitely for
        • catalogs
        • dax 
        • pegasus-transfer
  • SWIP Paper
    • we are in good shape
  • Titan
    • under the PBS batch gahp.
  • ZTF
    • the pipeline is based on docker-compose
    • peter will visit ISI with postdoc Danny in January
  • Tutorial at TACC
    • karan has updated pegasus-init to work on wrangler
    • will update the tutorial notes accordingly 
  • OLCF accounts
    • make sure they work 
    • make sure karan and mats can log in

November  2018

Nov 29th, 2018

  • Ryan
    • working on comparison paper with george on workflow systems
    • mats, karan shared neon meeting notes with Ryan
  • Pegasus 4.9.1 release
    • Due for december end
    • potential issue in monitord in reference to hierarchical organization of submit directories
    • pegasus-submitdir
  • ADASS Paper
    • due tomorrow
    • need to add information about sample run
  • SWIP paper
    • mats and karan will work on it tomorrow afternoon.
    • cull out sections
    • add information about updated monitoring in 4.9
  • OLCF Kubernetes 
    • Condor is installed and configured as root
    • George tried pointing the condor log directory to lustre, as condor in the container has to run as a user, not as root
    • LOG_DIR should be /tmp
    • volumes can be attached to container to contain workflows etc
  • Dynamo 
    • Do dynamic scheduling
    • George thinking of using flocking
    • similar to what is done in OSG
    • non-sharedfs deployments should work

Nov 1st, 2018

  • Pegasus 4.9.0 and 4.8.5 Released
    • We released it this week.
  • Pegasus Business Card
    • Advocate for job postings. 
      • Postdoc options
      • Programmers
      • pegasus.isi.edu/jobs
    • We should take to conferences with us
  • Pegasus JAVA 8 dependence in RPM
    • there is a disconnect between RPM and common.sh
  • ADASS
    • Karan working on a wlpipe demo example
  • New Student
    • Mukund 
  • Duncan started using 4.9.0 and has updated pyCBC to use singularity
    • changed our container execution model
    • all transfers done within the container now.

October  2018

Oct 12th, 2018

  • Rescheduling meetings
    • New time is Thursdays 2PM starting from last week of October
  • DAX API reporting
    • Perl DAX API - Rajiv
  • Atlas visit
    • Wednesday we have Scientific Computing Seminar
      • Will involve writing a Pegasus code generator
      • Panda is second biggest after Condor on OSG
    • Thursday 
      • Karan and George will be there.
      • Mats might be available remotely
  • 4.9.0 Release
    • Mats preference is to skip the beta tag
    • Aim for the full release
    • Documentation freeze on Oct 26th
    • Try and do the builds over the weekend
  • Duncan container usecase
    •  cvmfs hosted container images
  • Demo repository
    • panorama data and some runs from exogeni / nersc
    • Mats has two new Elastic Search VMs that are part of the Elastic Search cluster
    • the data on these VMs is also backed up

Oct 5th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll

September 2018

September 28th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll
  • Pegasus 4.9.0 Release
    • transformation selection issue
      • karan has not been able to recreate it yet.
      • will look into it more today
    • docker singularity pulls
    • container symlink 
    • deprecate api's
      • modify DAX generators to indicate version/ DAX API used.
      • will look into ways on how to do it
        • one way is workflow metadata attributes
        • second is attribute to ADAG object.
      • rajiv will check how it gets stored in the metrics server
  • ADASS
    • will try and do a poster with Mike at ADASS
    • deadline is Oct 8th

September 21st, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
  • Pegasus 4.9 release
    • integrity error reporting
      • pegasus-statistics reporting information about integrity errors
      • the unicorn dashboard for internal swip purposes
        • errors are appearing in the stream
        • more brainstorming required. the data is there
        • not clear whether to use grafana or kibana
          • does not have drill down functionality
          • mix of production and test workflows
          • create different queues in AMQP exchanges
    • container mount point support
      • karan is close to having that implemented
    • transferring outputs to multiple locations
      • lets say one for portal and the other for 
      • list of output sites
      • good feature to add for 4.9.1
      • update --output-site option to pegasus-plan
    • pull docker images for singularity runs
      • we should do for 4.9.0
      • planner needs to tell pegasus-transfer an extra attribute. 
        • add a type attribute
    • Papers 
      • Github private papers repo
    • Deprecate stuff
      • perl api
      • old catalog formats
      • pegasus-plots
    • Hiring

August 2018

August 24th, 2018

  • Pegasus 4.8.4 Release
    • when are we releasing?
      • next week before mats goes on vacation
  • error tagging
    • update stampede schema to add a table called tags
    • will allow us to capture number of integrity errors

August 17th, 2018

  • Pegasus 4.8.4 Release
    • RPM fix ? 
    • mats will manually verify
    • Karan should follow up with Stuart
  • AMQP filtering
    • we are working on having filtering built into monitord
    • nepomunk already has 33 errors identified
    • we need the db connection, pegasus-db-admin and other tools to pass properties with the pegasus property prefix stripped off
  • SWIP Paper
    • one reject seems to be harsh
    • we can try for HPDC also
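
The prefix stripping mentioned under AMQP filtering above can be sketched as follows. The "pegasus." prefix is taken from the note; the function name and the example keys are hypothetical.

```python
def strip_property_prefix(properties, prefix="pegasus."):
    """Keep only keys that start with the prefix and drop the prefix from them."""
    return {
        key[len(prefix):]: value
        for key, value in properties.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


# Illustrative property names only, not a list of real Pegasus properties.
props = {"pegasus.example.db.url": "sqlite:///workflow.db", "other.setting": "x"}
stripped = strip_property_prefix(props)
```

Here stripped contains only the key example.db.url, which is the shape a downstream tool would receive.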

August 3, 2018

  • Pegasus 4.8.3 Release
  • SLURM
    • Design Safe / TACC on Wrangler headnode
    • Nextflow has integration with SLURM and everything can be installed in user space
  • PMC unit tests are broken
    • let's fix the tests
  • Pegasus 4.9 release
    • more real life runs
    • nepomunk against ceph-s3 from one of uchicago machines
    • we need to get stats reported for integrity errors
      • larger issue of error classification
  • ADASS Tutorial
    • we got into second round
      • add on exercise to run montage in the end.
  • LIGO
    • Bruce group at AEI Hannover has left LSC
  • Infrastructure
    • HipChat mess
      • should we move to ISI Slack
    • Public Chat feature
      • Some clients for Hipchat
    • Get a free channel from Slack
      • for all Hipchat rooms
      • what about ISI slack?? 
    • Github removal of old integrations
  • MINT Meeting
    • went well overall 
    • issue of scoping . 

July 2018

July 27th, 2018

  • Pegasus 4.8.3 Release
    • VM Tutorial
      • will update pegasus-init requirements to get it working
      • main tutorial chapter will be updated for 4.9
        • because then tutorial based container may not work
    • change how docker scripts set environment
    • SCEC database loading error
  • Failing Tests
    • Issue in updates to the dashboard database
  • Panorama Paper
    • agreed on a re-organization

June 2018

June 29th, 2018

  • Pegasus
    • 4.8.3 needs to be released because of singularity launching options
      • will wait till tutorial is updated. 
      • karan will update pegasus-init with population modeling or povray option
    • 4.9
      • pegasus-statistics updated with integrity metrics
      • how to flag job errors because of integrity
        • need to figure out logic
        • value add proposition
        • maybe we should value type in the pegasus lite 
      • need to implement the integrity dial
    • Start creating default local site entries to execute without local site
  • ADASS Tutorial
    • Will submit today 
    • Google doc shared

June 22nd, 2018

  • Pegasus
    • SWIP paper submitted to escience
    • 4.8 montage tests failing
    • changes for integrity metrics in pegasus-transfer
    • updated monitord to parse events from various sources like pegasus lite output
    • mats pointed out to a bug in monitord
  • LIGO
    • pip for python source package
      • update dependencies for latest packages , like pyopen ssl
      • install in the pip repository
    • pegasus-analyzer
    • interested in swip and containers.
  • SCEC CSEP
    • will use containers
    • run on Comet
  • 1000 genome workflow or use chimerica workflow
  • ADASS Tutorial
    • montage ? 
    • probably pycbc is also submitting a proposal

June 8th, 2018

  • Scott Replica Catalog issue
    • Replica Catalog deletes take a long time
  • Bamboo
    • bamboo emails are no longer received, so we don't find out about workflow plan failures
  • SWIP 
    • monitord integrity changes.  population of data from ks records working now.
    • we still need to populate data from pegasus lite records and pegasus-transfer
    • pegasus-statistics needs to be updated
    • 0.1% overhead on production osg gem workflow
  •  Pegasus deployment at ORNL
    • we should be doing it similar to hpc-pegasus
  • Pegasus Office Hours
    • next one in August
    • travels in July

May 2018

May 4th, 2018

  • Pegasus 4.8.2 Release done on May 3rd
  • we should consider moving user data to a separate file on pegasus-wms
  • si2 meeting updates
    • some potential new users
    • ewa slides were a good overview summary
    • integrity data schema changes. 
    • monitord changes need thinking

April 2018

April 6th, 2018

  • Pegasus 4.8.2 Release
    • PMC bugs
    • tutorial for usc hpc
    • no longer allow + or . in the names
  • Pegasus Report
    • Submitted for Ewa's review
  • SWIP test run
    • discovered integrity errors in the wild
    • at colorado and university of nebraska
      • we would not have caught it before
    • e-science paper

March 2018

March 30th, 2018

  • SWIP
    • pegasus-run issue, with wf restarting from scratch
      • because dagman rescue file is not there.
      • so should we update pegasus-run to look at the dagman.out file
        • so far we think it should be kept consistent with normal dagman behavior
        • to be discussed at condor week
    • mats created a Jira item for swip related statistics
    • Things remaining
      • Dials to be implemented
      • stampede changes
      • pegasus-transfer changes???
  • SC Tutorial Submission ( April 16th) 
    • https://sc18.supercomputing.org/submit/tutorials-submissions/
    • We should try and add exercises for containers
    • We will try for half day
      • 45 minute introduction
    • Feedback from Arizona Container Camp
      • There is interest.
    • coming up with an existing application that people understand or can relate to
      • montage - complex dax generator
      • rosetta
        • only works in nonsharedfs stuff 
        • with 
      • machine learning example?
        • with tensor flow?
        • requires container
      • NVIDIA has a lot of examples about machine learning
        • has to be multistep
        • and at least bag of tasks
      • Ashwin is doing some tensor flow stuff
        • on workflow.isi.edu
        • is working out of  jupyter notebook
      • Genome sequencing workflows??
        • use Broad GATK sequencing workflow to use
        • SOYKB and IRRI use GATK
        • and are huge communities
      • http://biocontainers.pro/docs/101/running-example/ 
  • Pegasus Report
    • we should resolve Jira items as we fix them
    • will be also doing cumulative statistics 
  • Pegasus Office Hours
    • Jupyter Notebooks
    • will update the example to use namd example used for Oakridge
  • Panorama Stuff
    • our multiplexing part in monitord done so far
      • however we are relying on amqp queues and routing keys for filtering
    • darshan data population
      • we need a script (pegasus-darshan), invoked from the namd wrapper script, that pulls the data from the darshan logs on the file system and generates an ASCII output
    • Panorama.isi.edu VM
      • AMQP
      • Logstash
      • Kibana
      • Elastic Search
        • Make it do a backup every so often.
        • Warns against doing it as a permanent datastore
        • Rajiv will verify
      • Influx
    • Backups
      • CRASH PLAN backup for the /srv and /opt in the panorama VM
  • LIGO Database locked issues
    • we need to look into the locking issues by tinkering with monitord flush intervals

March 16th, 2018

  • SWIP
    • Most of the SWIP stuff is done as far as planner changes and getting the workflows running
    • we are in a position to share something
    • To do
      • sharedfs
      • Dial implementation
      • Update monitoring
      • Paper submission for EScience
  • Pegasus Reports
    • new applications to attribute to pegasus grants
    • all the mike wangs work will go here
    • SCEC
    • LIGO - need to ping Duncan
  • Panorama/ Pegasus workflow endpoints
    • We seem to be going towards AMQP
      • How is AMQP going to be configured
      • So far we have 
        • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<exchange_name>
      • Online monitoring in kickstart 
        • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<virtualhost>/<exchange_name>
      • Virtual Hosts
        • right now virtual host is hardcoded in monitord code. we set it to pegasus
        • global - across workflows
      • Exchanges
        • should be global across workflows
        • type direct - in panorama
        • we want them to be type -> topic instead
      • Queue
        • in panorama a different queue for each workflow
      • Routing Keys
        • the routing key should be based on stampede event names
      • Events populated
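
To make the AMQP endpoint discussion above concrete, here is a rough sketch that parses the two URL forms listed (exchange only, and virtual host plus exchange) and shows how topic-style routing keys built from stampede event names would match. Falling back to the "pegasus" virtual host follows the note about the hardcoded value; the names AmqpEndpoint, parse_amqp_endpoint and topic_matches, the exchange name and the example event names are assumptions for illustration, not the monitord implementation.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AmqpEndpoint(NamedTuple):
    host: str
    port: Optional[int]
    username: Optional[str]
    password: Optional[str]
    virtual_host: str
    exchange: str


def parse_amqp_endpoint(url, default_vhost="pegasus"):
    """Parse amqp://[user:pass@]host[:port]/[<virtualhost>/]<exchange_name>."""
    parts = urlsplit(url)
    if parts.scheme != "amqp" or not parts.hostname:
        raise ValueError("not an amqp:// endpoint: %r" % url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 1:
        vhost, exchange = default_vhost, segments[0]
    elif len(segments) == 2:
        vhost, exchange = segments
    else:
        raise ValueError("expected /<exchange> or /<virtualhost>/<exchange>: %r" % url)
    return AmqpEndpoint(
        parts.hostname, parts.port, parts.username, parts.password, vhost, exchange
    )


def topic_matches(pattern, routing_key):
    """AMQP topic semantics: '*' matches one word, '#' matches zero or more words."""

    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))
```

With a topic exchange, a consumer bound with a pattern such as stampede.job_inst.# would see every job instance event while ignoring workflow level ones, which is the kind of filtering that a direct exchange cannot express.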

February 2018

February 23rd, 2018

Eliminate support for Py2.6?

Python Dependencies

All - future

pegasus-service - Flask, SQLAlchemy, Flask-SQLAlchemy, Flask-Cache, pam, plex, pyOpenSSL, ordereddict

pegasus-monitord - SQLAlchemy

pegasus-analyzer - SQLAlchemy

pegasus-s3 - boto

pegasus-globus-* - globus-sdk

pegasus-init - jinja2

pegasus-metadata - argparse

pegasus-em - requests

PostgreSQL - psycopg2

MySQL - MySQL-Python OR mysqlclient


Note: Packages in green are available from yum.

February 9th, 2018

  • SWIP 
    • checksum computation will be implemented in pegasus-transfer. 
      • allows us to handle the case where the input files don't have checksums in the RC
    • integrity checks are disabled now for files that don't have checksums in the RC
    • dial knob
  • Tests
    • seem to be slow
    • bamboo could be moved to the new server
    • storage constraint test
  • Lizard FS
    • Mats will give an update next time around
  • Servers
    • Trying to do two servers
    • If we buy one server
      • Buy a storage server. That is Mats preference.
      • SoyKB workflow has
    • Compute 
      • we will get a compute server first. 
    • We should figure out the server and put in the request soon, and done by Feb end
  • LSST
    • Tom Glanzman? 
    • We will touch base on Monday with Tom and Nersc folks
  • Office Hours today
    • have a presentation on containers
    • will upload on the website

January 2018

January 12th, 2018

  • AWS Batch
    • seems to be running in karan's account.
    • update documentation about aws batch
  • Pegasus 4.8.1 Release
    • up to Mats whether we should tag or not.
  • Pegasus Office Hours
    • Rafael will look up a new name
    • Container Presentation
      • Talk about containers
      • Blue Jeans 
    • Advertising avenues
      • XSEDE workflows list
      • OSG List 

December 2017

December 1st, 2017

  • AWS Batch
    • Client done. still have to figure out about stdout and stderr
    • maybe we should have batch push the files and control where the jobs go in
    • also maybe each file should go to its own stdout stderr
  • Metrics for SWIP
    • Stampede
    • Metrics Server
    • Elastic Search
  • Rajiv working on changing the salt configuration
  • Model Integration with Wings

November 2017

November 10th, 2017

  • Pegasus
    • AWS Batch
      • checked in stuff
      • jars checked in aws sub directory in the jars folder.  pegasus-config classpath is updated accordingly
    • Bamboo builds
      • change in how users are handled
      • rajiv and mats worked on changing the salt configuration for the various machines
        • the major part changed was how the users are handled
        • the bamboo user got messed up and uid's were mismatching on the filesystem
        • main group for people's unix accounts should be pegasus for everybody
        • only project users will have access to VM's for a particular project
    • Stewie Rebuild
      • move off stewie. the main OS needs to be updated
      • panorama
        • Rafael and George will create a VM for panorama
          • CentOS 7
            • mats will help George create VM
          • Ashwin consumers from Influx DB
      • mysql server
        • Pegasus metrics server
    • JSON vs YAML
      • initial impressions seem to favor yaml
        • YAML does have benefit of including comments
        • also YAML , JSON will result in additional lines
    • templates for site catalogs
    • LSST
      • mats will update documentation for pyglidein 
      • to work with condor pool passwords thing
      • also will take mike site catalog to update NERSC entries
    • tests
      • rosetta and montage appear working again. not clear what triggered errors in first place
  • SC Next week
    • Rafael and Karan are away
  • AWS workshop for LIGO
  • George Panorama work
    • Dakota ends up launching multiple Pegasus workflows based on its gradient functions
    • using ensemble manager to do multiple runs 
    • George will check in dakota test case and example
      • pick one approach and update documentation
    • SWIP Demo
    • think about merging stuff from panorama back to production branch
  • work with Ian Foster and Raj Kettimuthu on globus online
    • do multi site run
  • Tudo
    • working on insitu
    • data spaces approach to have staging area
    • tudo wrote sample applications
    • evaluating on CORI using shared memory
    • burst buffers cannot be used
  • Ashwin
    • analyzes influx db data
    • using statistical learning
    • python pandas library

November 3rd, 2017

  • Pegasus 4.8.1 release
    • 3 bugs in worker package staging.
    • pegasus-transfer PYTHONHOME unset does not work
    • hierarchical workflow handling. 
      • to be discussed tomorrow
  • AWS Batch
    • need to check in changes.
    • need to add options for the client and do error checking.
    • still need to figure out how to integrate in pegasus

September 2017

September 15th, 2017

  • Pegasus development
    • Dashboard
      • LSST might want it running out of a directory other than $HOME/.pegasus 
      • No plans to tackle it right now. Requirements are vague, and it is a catch-22 situation
    • Python problem with Pegasus install
      • DAX3 problem does not work.
      • Could not be recreated
    • PyPI account should be disabled
      • PyPI has a 4.3 pegasus package
      • we should remove it
    • The jobname with dagman not allowing . is fixed
  • LIGO
    • Heard from Duncan. Tried out metadata stuff
  • Another person at NERSC that is interested in running Condor
  • AWS Batch
    • done initial development.
    • how to retrieve logs etc.

September 8th, 2017

  • Pegasus 4.8.0 Release
    • went out this week
    • documentation
    • pyglidein
      • out of icecube
      • mats added a section in the documentation
        • pretty neat once it is setup
        • and works really well on machines with two factor
        • not tuned for MPI things.
        • on the submit  machine a web based python thing.
        • pegasus resource profiles will work out of the box with pyglidein
  • Releases
    • Post 4.8 Releases 
      • changes in the debian build
        • source package has been renamed. mats removed the source part
        • changed the versioning of RPM and debian. The dev series will have the timestamp in it.
          • pegasus-version -f also has timestamp
      • Will create a separate YUM and DEB developer repositories
        • repositories will not be signed. 
      • Mats is still playing with the setup
      • Worked a lot on Debian packaging.
  • HipChat will be upgraded to Stride
  • Mats updated JIRA today
  • Sim Center Workflows
    • Using Condor IO thing
    • for 4.8.1 we should look at the remap thing
  • SWIP Poster
    • the first review is really good
  • Docker and Singularity
    • have stuff about engineering challenges
    • But not enough usage
    • Practical Aspect
  • Von's Group SWAMP thing.
    • pegasus is part of trustworthy software thing?
  • AWS Batch
    • AWS batch thing works
  • Investigate how Dakota and Pegasus can work together
    1. Run Dakota as a job 
    2. Run Dakota on submission machine
      1. dakota calls a script that does a pegasus workflow
    3. Mix of 1 and 2.

August 2017

August 25th, 2017

  • Pegasus 4.8.0 Release
    • beta3 tagged
    • monitord replay issue for rc tables against mysql server
    • Jupyter thing
      • VM updated with Jupyter
    • Docker example application 
    • R builds with pegasus
      • for time being only brew builds have that disabled.
      • Condor update to the brew installation. 
  • Pegasus 4.9 Roadmap
    • SWIP 
      • lay out the changes
        • prioritize stuff for production readiness
        • the knob for integrity. 
        • get into transfers.
        • signing stuff on the backburner.
      • chaos monkey tests
    • metadata things
    • aws batch support
  • Pegasus Tutorial
    • George felt that Pegasus tutorial was a bit too easy.
    • it should be maybe more interactive. get the user to develop a new workflow
  • Tudo will pick up Decaf work
  • Dataspaces
    • do data management
  • Ashwin will work on deep learning on panorama
    • use tensor flow
  • Dakota
    • ini file . runs simulation and converges simulation points
    • George will be working on it
    • has a checkpointing facility

August 18th, 2017

  • mats found a new hydrology user in boulder
    • based at Boulder
    • there was a magpie presentation there. 
    • mats did a hosted ce tutorial
  • 4.8.0beta2 release
    • tagged and sent it out. 
  • monitord workflow and read permissions creation
    • should only happen when the database is created.
    • ~/.pegasus directory should be 755
  • dashboard errors
    • rajiv should traverse the directory in the dashboard.
  • LSST
    • cleanup issue
      • mats and karan agree on it, that it is bad application
      • we should reply to it. 
      • the wrapper should copy the file and launch the job
  • source a setup script for jobs
    • has to be generically done
  • registration jobs shell expansion
    • we should not do getEnv=True
  • testing repo
    • stuart from LIGO asked for it.
  • BOSCO
    • we have the examples updated
  • Karan will remind Eliu about LIGO and Bluewaters
  • Slick Jupyter Demos
    • Started up VM's
  • Jupyter tutorial
    • should be integrated into the VM

August 11th, 2017

  • Bamboo is finally green
  • we will do a Pegasus RC1. actually a beta since we still want to address some issues.
  • Rajiv fixed the build with python crypto issues
    • pyopen-ssl was updated during 4.7.x series
    • we should package only things whose versions we are not sensitive to
    • so right now pyopenssl is removed from binary builds, and all associated dependencies were removed.
  • New throttling things.
    • number of jobs scale with the size of the workflows.
  • SCEC all hands meeting.
  • Documentation
    • Took a stab at the containers.
    • Rafael has to add a separate jupyter chapter
    • Karan will update the throttling docs
  • LSST
    • Mats and Karan had a call with Tom about designing a workflow for one of the production pipelines
    • Mats and Rafael had a call with the French cluster folks (Frédéric Suter). Frédéric works on SimGrid
  • Paper
    • rvGAHP paper ready for submissions
  • Suraj Poster
    • Mings pass really helped

July 2017

July 21st, 2017

  • VMs are down, so tests are slow, and cannot test the new features yet
    • Mats will email (or call) Derek to check on the VM issue
  • Try to run the Montage container test on OSG
    • TODO: Reconfigure our pool (it is not flocked yet)
  • Pegasus 4.8.0
    • Bugs in the container (transformation catalog) support are fixed
    • Stage in/out nodes based on the number of computing jobs on the workflow
    • TODO: add warning for errors (size of jobs)
    • Warning for category is done
    • TODO: reference implementation of a workflow using docker (1000 Genome workflow - Rafael)
    • Jupyter: add container keyword for API

June 2017

June 23rd, 2017

  • Pegasus 4.8.0
    • Decaf
      • local universe jobs do not honor request_cpus, and jobs remain idle if they ask for multiple CPUs
        • karan will update pegasus to remove the request_ parameters from the local universe jobs
    • Steven Clark
      • Pegasus build issue is related to python 3 compatibility in the DAX API
  • LIGO 
    • Eliu plans to run on Bluewaters
    • we should confirm that he only wants to run on bluewaters.
    • they have sucky performance of getting data to the compute nodes in bluewaters.
    • set the schedd start date

  • NERSC
    • Karan will do a test setup there.

  • Pegasus Builds
    • failed because of detain version upgrades to build tools
    • setuptools in python complains about pegasus 4.8.0-dev

June 9th, 2017

  • Pegasus 4.7.5
    • pegasus-rc-client bug fix is done
    • 4.7.5 and 4.8.0 together
  • Pegasus 4.8 release
    • docker stuff is complete
      • docker tests added are green
    • karan will work on singularity next week.
    • LIGO reports pegasus lite jobs filling up /tmp . karan will check with LIGO on whether there is any environment set? 
    • rafael will update his api to make it consistent with the container format
    • also will add a bamboo example.
  • DECAF  integration
    • karan has an idea about it.

June 2nd, 2017

  • Pegasus 4.7.5
    • pegasus-rc-client bug fix to be done
  • Jupyter
    • rafael will be working on it during June
  • For 4.8.0 
    • container 
      • docker works in nonsharedfs right now. 
      • work on singularity support.
      • clustering . clustered jobs can only refer to one container
      • symlinks -  for 4.8.0 they are disabled. 
    • container sharedfs example
      • we have pegasus-lite with sharedfs. automatic translation of file URL's
    • transfer refiner
    • notification email updates
      • mats updated default notification scripts. will generate svg files
      • at end of workflow generate notifications that have statistics
        • monitord needs to run the remaining notifications after the workflow is done.
  • makeflow integration
    • limitations for pegasus generating makeflow integration
      • makeflow model 
        • all files have to be on the submit host
        • how do we translate auxiliary jobs to a makeflow description
          • tyson at arizona. 
          • add new transfer jobs
          • add new credentials
          • no postscripts there
        • monitoring 
          • won't work with monitoring
          • write a new monitord.
      • maybe do an opposite translation?
      • what will be useful is to integrate with using work queue with our own dagman manager.

May 2017

May 12th, 2017

  • auto scaling of stage out and stage in jobs
    • 4.8 transfer refiner will be Cluster by default.
    • auto-computation of number of stage in, stage out and cleanup jobs
      • defaults should be computed based on number of jobs at a level.
      • use a ratio or step function . 
      • come up with ratio ranges for auto determination
        • 1:5 for numbers of jobs < 10K ( 20%)
        • 1:20 for number of jobs > 20k ( 5%)
      • will create a JIRA item for this
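
A minimal sketch of the step-function idea above, in Python. Only the two ratios (1:5 below 10K jobs, 1:20 above 20K) come from these notes. The function name, the 1:10 ratio used for the 10K-20K range the notes leave undefined, and the floor of one transfer job are assumptions for illustration, not the planner's behavior.

```python
import math


def auto_transfer_jobs(compute_jobs):
    """Suggest a number of stage-in/stage-out jobs for one workflow level."""
    if compute_jobs <= 0:
        return 0
    if compute_jobs < 10_000:
        ratio = 5    # 1:5 (20%), from the notes
    elif compute_jobs > 20_000:
        ratio = 20   # 1:20 (5%), from the notes
    else:
        ratio = 10   # ASSUMPTION: the notes do not cover 10K-20K
    # ASSUMPTION: always keep at least one transfer job per level
    return max(1, math.ceil(compute_jobs / ratio))
```

A pure step function like this is discontinuous at the range boundaries (9,999 jobs yields more transfer jobs than 10,000), which may be one reason to consider a smooth ratio instead.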

  • container stuff
    • close to having one example running
    • have not figured clustering jobs out yet.
    • mats agrees with the approach now. pegasus lite invokes the docker run commands.

  • integrity stuff
    • will make slides
    • be specific about what we have done.
    • we give them an option of running synthetic stuff
    • For 
    • also define best effort part. 
      • strict, off, minimal , best effort
    • how do we handle case where SHA exists.

  • WDL
    • workflow definition language
      • WDL is JSON based
      • has a template approach with variable substitution 

  • AWS Cleanup
    • need to delete snapshots and cleanup VM's

March 2017

March 17th, 2017

  • monitord stdout and stderr missing 
  • the VARS one. just expose the variable. 
  • SCEC issue
    • job managers per resource
    • got fixed by one job manager per job
    • BOSCO works partly. 
  • containers call from yesterday
  • metadata 
    • metadata population in postscripts
    • move metadata population to the postscripts.

March 10th, 2017

March 3rd, 2017

  • Pegasus 4.7.4 Release
    • sent out the release
    • we did a ligo fix yesterday to pegasus transfer
  • mats osg gem
    • workflow did not finish
      • pegasus-exitcode has a shortcut for a regex
        • make it more strict. whether to trigger failure in pegasus-exitcode
        • revisit how metadata population is done
        • trigger failure for missing records. 
  • SCEC RC client issue
    • Rafael will look into it for pegasus-rc-client
  • containers support
    • containers on a pause right now.
  • Webinar
    • let's try and schedule one for the end of April
    • bluejeans will be an option
    • topics covered will be the new features for 4.8.0

February 2017

February 24th, 2017

  • Pegasus 4.7.4 Release
    • we will tag today. 
    • there is a potential monitord bug that happens on sub workflow retries only in the live mode, that Karan is unable to trace
  • containers support
    • pegasus lite launches docker wrap
      • or the other way around. because worker package has to be installed in the container in some cases
        • so double install
    • Clustered jobs 
      • we want at most one container to be used by a clustered job.
  • monitord performance
    • on OSG connect there is a difference between 4.6 and 4.7 performance replay
  • monitord.log has errors indicating unable to read .out .err files. 
    • we think it is a race between DAGMan and the filesystem

February 17th, 2017

  • Pegasus 4.7.4 Release
    • targeted for next week. 
    • LIGO ran into a prescript issue
      • pegasus lite deleted the worker package in the workflow submit directory
        • only triggered when there was a subsequent compute job.
  • new transformation catalog format 
  • containers
    • open issue whether docker wrapper launches pegasus lite 
    • or the other way around

February 10th, 2017

  • Pegasus 4.7.3 Release
    • SCEC has issue with pegasus-db-admin 
      • mysqldump times out when updating their replica catalog
    • Database TC
      • remove support for Database TC
  • Stewie and fisheye upgrades
    • fisheye upgrade
      • Mats agreed to do the upgrade
    • stewie runs debian 7
      • we need to upgrade it sooner or later.
      • runs GridFTP and mysql 
      • RabbitMQ is running there
      • MongoDB is running there
      • Catalog dependencies on stewie
    • 5K limit for a new server
  • OSG All Hands Meeting
    • looks like no tutorial
    • lots of pegasus users coming there
  • Containers Support
    • pegasus lite invokes the docker wrap. 
    • singularity support will be required.
    • container modes 
      • should we support docker definition file
        • do we build on the worker nodes?
      • pull in  an existing docker image from the hub
        • on the staging site
      • whether we should unload an image or not
        • we should try and cleanup
      • credential renaming has to be worked out
    • Transformation Catalog
      • how to represent container dependency in the transformation catalog

February 3rd, 2017

  • Pegasus 4.7.3 Release
    • we tag later today or first thing monday
    • waiting for scott to reply
  • Jupyter Notebook
    • in general, the jupyter interactive interface closes if you close the tab
    • in our case it does not affect us, since we invoke pegasus-plan at the server end
    • Vicky has a workflow out of panorama that she has in jupyter as a set of instructions
  • Containers
    • karan did some exploration of docker containers via HTCondor
    • by default docker in the container runs as root. 
      • means output files are written out as root
    • also the containers need to be shipped around.

January 2017

January 27th, 2017

  • Pegasus 4.7.3 Release
    • 4.7.3 release.
      • condor stable release has been released.
      • we will tag next friday one way or other
      • fix monitord replay mode
      • crosscheck with rajiv on dashboard 
      • centralized mysql server for master workflow dashboard
        • LIGO wants to host a mysql server for master workflow databases
        • Mats would like to see something similar 
        • also look at some publish subscribe options
  • Rafael gave an update on containers
    • docker universe
      • htcondor support i think is mainly geared towards startds
    • preinstall software in user containers
    • another model is to let pegasus figure out data and executables
    • rafael did some work in pegasus lite
      • will have to rewrite proxy and credential environment variables
      • also how the environment is rewritten
    • good to have a generic concept of multi-level wrappers
    • need to have a pegasus-docker-wrapper or pegasus-container-wrapper to launch docker or singularity 
    • let's target pegasus lite mode first
    • little bit of data passing.
  • Rafael will have a student to take forward the docker swarm stuff
    • 8 hours every week 

January 13th, 2017

  • Pegasus 4.7.3 Release
    • sub workflows 
    • better error message for pegasus-transfer when source files don't exist
    • pegasus-kickstart
      • improve error message
    • dashboard to better separate kickstart  and pegasus lite messages
    • Potential SCEC issue with RV-GAHP
  • results of qualtrics user survey
  • Pegasus 4.8 
    • swip stuff for 4.8
    • have sent emails for their use cases

October 2016

October 7th, 2016

  • Pegasus 4.7 Release
    • release notes and documentation are done
    • need to follow up with Action for our build VM's
    • LIGO is not going to test 4.7 release as they are in midst of a cluster upgrade.
    • Rafael will write a blogpost about R API after the 4.7 release
  • Dashboard requests 4.7.1
    • rafael and rajiv will work on getting dashboard to display the database schema version and the pegasus version
    • useful, when a new version of pegasus is deployed and .
    • Unable to read the sqlite database
      • related to users permissions on the database
  • from braindump in replay mode should be able to pick up relative paths.
  • brew error on macos sierra
    • brew releases are built manually 
    • after the release we have to update the formula to reflect latest stable version.
  • ACME workflow on MIRA
    • GitHub page to be updated with list of dependent software
    • ACME team needs to help with installation of one of the software.

September 2016

September 16th, 2016

  • Builds
    • disabling RHEL5, Debian 6, Ubuntu precise. Karan will make sure in the code it works
  • Pegasus 4.7.0 Release
    • reached out to LIGO. hopefully they will start testing
    • rajiv checked in dashboard changes
    • karan to write documentation for directory layout
    • rafael will update pegasus-exitcode next week.
  • Pegasus 4.8.0 release
    • one of the first things will be to update the SUBDAG keyword.
  • LLNL account approved for Karan
  • OLCF account waiting for notarized documents to be received
  • SCEC 
    • concurrency limits for transfer jobs
    • prime candidate for priority stuff that will allow good interleaving of transfer jobs with the compute jobs
    • ask Scott to see if 8.5.6 condor can be released.
  • ACME workflow
    • HSI client for HPSS storage.  
    • Karan will reply to Jamie.
  • Bluewaters HTCondor install
    • Bluewaters renewed till 2019
  • Pegasus HPCC workshop on September 30th
    • karan will be there.

September 9th, 2016

  • Builds
    • disabling RHEL5, Debian 6, Ubuntu precise
  • Pegasus Development
    • 4.6.2 released . LIGO has updated it. 
      • LIGO tripped over changes to planner submit directory behavior
      • held job reasons are recorded in the database
    • 4.7.0 release
      • went through pending items
      • targeting end of the month for the release
  • proposal
    • data aware workflow management
    • no BPEL only a reference for it.

September 2nd, 2016

  • Pegasus Development
    • 4.6.2 released . LIGO has updated it. 
      • pegasus.dir.storage.deep true throws an error right now.
    • 4.7.0 release
      • karan looked into the HELD job
      • rajiv thinks no dashboard change required.
      • pegasus-exitcode changes will be done by rafael
      • LIGO should install 4.7.0 on dev machine.
    • SCEC production run
      • Reverse GAHP OLCF
      • once tokens are reactivated , karan will check up on rhea rvgahp and get it running
    • HTCondor on bluewaters
      • Karan opened a ticket. 
    • LLNL
      • security training to be done by Karan
    • panorama
      • rafael is working on panorama demo
        • two different pegasus workflows running on 2 exogeni slices
        • and data staging server in between. shadow q has to propagate transfer priorities
        • currently it is workflow level priority. will be manually assigned.
        • 1000 genome workflow - 

August 2016

August 12th, 2016

  • Pegasus Development
    • 4.6.2 release
      • release notes are checked
      • tutorial documentation will be updated to include the docker tutorial
      • pegasus service init script
        • we will not include it and enable by default in the builds
        • mats will update the item accordingly
    • 4.7.0 release
      • submit directory structure
        • we need to get the depth thing fixed. Karan needs to check whether the DAGMan knob can be set automatically. 
        • we should have a way to have it set for deeper
      • documentation to be set
      • pegasus-exitcode to have a wait lock thing to set up its logs
        • one option is to log only exceptions initially. 
  • pegasus-keg to mimic IO pattern
    • read files over and over again.
      • this way we can increase IO without increasing file size ( that results in higher data transfer costs)


  • DECAF WMS

August 5th, 2016

  • Pegasus development
    • waiting for LIGO to check the support for changes for OSG, where pegasuslite URLs are converted to file URL if the staging site and compute site are same
      • Karan has a call with Duncan next week planned.
    • staging sites deep directory structure
      • mats has it working for one of the workflow.
    • https://jira.isi.edu/browse/PM-1049
      • automatic delayed job retries 
      • the real fix should be in DAGMan. Karan will follow up with Kent. Will address for 4.8
    • postscript output redirects
      • one file per job is what we had considered earlier
      • maybe we should do it per workflow log file.
  • DIPA workflow development
    • good progress there. 
  • Titan Setup
    • we should consider setting it up the same way as bluewaters
  • Next Pegasus proposal
    • next week meeting we should iterate on items.
  • Samrat issue
    • get pegasus-exitcode to look for final output files
    • checked in workflows to the pegasus repository
      • bioconductor repository
      • would be good to setup PAGE cloud VM with the workflow.
  • Dieter Kranzlmüller
    • director of supercomputing in germany
    • SuperMUC supercomputing cluster
    • will send a student for 3 months to ISI end of the month.
  • Rafael plans to write a practical comparison paper
    • Gui's docker stuff.
    • do a blogpost of montage with above docker stuff.

July 2016

July 15th, 2016

  • Pegasus development
    • waiting for LIGO to check the support for changes for OSG, where pegasuslite URLs are converted to file URL if the staging site and compute site are same
    • staging sites deep directory structure
    • dashboard changes for nested submit directory structure
      • fixed the on demand loading for the dashboard.
    • identify workflows that will benefit
      • LIGO
      • Splinter
      • OSG - Kink
    • put in the test cases for testing it out.
      • use the new montage dax generator
      • pull the montage dax generator via squid cache.
  • Release schedule
    • Get 4.6.2 out first. 
    • 4.7 probably early august.
  • ALCF Mira running.
    • cobalt workflow 
    • ACME workflow compilation. Waiting on Ben for the source code.
  • Panorama use case
    • SNS is not enough in terms of data sizes. 
    • anirban will start working on it next week.
  • R Examples
    • samrat working on a bioconductor example
      • has an example workflow
      • code should be checked into github
    • samrat is working on a more advanced workflow that will be put in the examples directory also
  • Gui docker nodes work on amazon ec2
    • uses docker swarm and docker machine to do setup etc
    • workflows run in condor IO mode.
  • DIPA Workflows
    • waisman folks will start working on it.
  • free surfer workflow
    • mats does not think there is enough uptake.
    • suchandra is working on a second version that will add more capabilities
  • seismology workflow
    • rafael will check in to the repo.

July 8th, 2016

  • Pegasus development
    • waiting for LIGO to check the support for changes for OSG, where pegasuslite URLs are converted to file URL if the staging site and compute site are same
    • pegasuslite signal handling
      • mats updated it. LIGO reported cases, where jobs got killed before the outputs were staged back . But the jobs themselves were not marked as failures.
      • duncan's third issue could also be related to the signal handler
    • modify kickstart to compute md5 checksums.
      • we could potentially get kickstart to validate md5 checksums
      • have an architectural idea about it.
        • gridftp currently does not expose checksumming
        • irods client has checksumming built in.
    • pegasus-init R example
      • R example will not run on OSG because of module load issues
      • all R examples will have a wrapper for the scripts
    • 4.6.2 after changes are verified.
  • DIPA Workflow
    • with Waisman brain imaging pipeline that runs on Waisman cluster
  • Rafael is working on a seismology workflow
  • tophat workflow paper got accepted in a bio journal
  • Pegasus Virtual Summer School
    • would be similar to the XSEDE ones
    • will be 1.5 hours long.

July 1st, 2016

  • Mats has moved bamboo to a new RHEL7 VM
    • migrated all the tests to it.
    • there were issues with CondorC tests because of path issues. they are resolved now.
  • pegasus-init R
    • Rafael will integrate Samrat's R example workflow
    • Samrat is also working on a bioconductor example workflow
  • rajiv made minor dashboard query changes

May 2016

May 13th, 2016

  • Pegasus development
    • kickstart wrappers
      • process explosion.
      • eventually we would want it to be in the workflow.
        • handle these wrappers as credentials in the workflow. 
        • what are class of files that are always required.
      • KICKSTART_WRAPPER in kickstart
        • was done for the PAPI stuff originally.
    • pegasus-init for OSG
      • pegasus-init 
    • R examples?
      • rafael will do it in june.
    • job held scenarios
      • open with htcondor admin .. a job should never go to the held state
      • maybe pegasus should do quick retry for small workflows
        • for large workflows retries should happen at a longer delay
      • for workflows less than 100 nodes held duration should be small, and failures maybe should be triggered earlier
      • not for large workflows
    • revisit whether clustered jobs should be based on size of the cluster or the number of jobs
      • mats no longer likes the idea of having fixed number of transfers
    • deep directory structure for the workflows
      • can splinter move to using them?
        • right now they are condor io
        • on the data side it deep directory structure will only work 
    • BOSCO SSH
      • Mats tried with condor 8.5.4 on comet.

May 6th, 2016

  • Pegasus development
    • moved the submit directory creation stuff to the mapper interface
      • reorganized the code for it.
    • on the execution site for nonsharedfs case we will enable for the dashboard
    • dashboard works mostly
      • only improvement is on the file browser side. will open a JIRA item for it
    • database changes
      • for 4.7 we will add extra columns to workflow state and job state tables.
    • the dashboard needs to show the task metadata better for 4.7
  • pegasus tutorial for virtual summer school.
    • will be based on the XSEDE tutorial
    • bluewaters will setup a VM for the tutorial.
    • Scott will do an introduction and an overview.

April 2016

April 22nd, 2016

  • Pegasus development
    • 4.6.1 released today
      • had to fix bugs for symlinking not being triggered for SCEC
      • dashboard for the home page should work without trailing slash
        • all other pages should work the same way . For 4.7 we should do that
    • Pegasus R example
      • rafael will work on it
    • OSG and XSEDE site catalog examples
    • Submit Directory organization
    • Relative DAGMan paths
  • HTCondor week
    • Lauren said training week
  • Bluewaters training
    • 2 day training might be too long
    • we will work on pegasus training module.

April 15th, 2016

  • Pegasus development
    • 4.6.1 release next week
      • pegasus-status change for new Condor changes
        • cartoon will be upgraded to 8.5.x
      • pegasus-analyzer
        • will report correctly submit failures
      • better errors for mismatch in cores/ppn requirements
      • Tag and build on Thursday.
      • pegasus-s3
        • batched uploads and downloads
      • output directory options fails if local scratch not specified
  • LIGO transfer issue
    • NFS reported write as successful for a transfer job.
      • wget reported data was transferred and wget succeeded
      • good use case for checksumming of data
      • where do checksums come from
        • for data files good placeholder in the transformation catalog.
      • SCEC had similar issues where SGTs had gotten corrupted
        • that is why SCEC put a specific job in the workflow and uses the ABORT-DAG-ON feature
  • Call with Kent for adding nodes to a running DAG
  • group jobs with similar errors
    • might be a python library in there
  • HTCondor Week
    • proposed a hands on tutorial
  • pegasus 4.7
    • ignore integrity constraints in monitord 
      • only for duplicate keys
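
The LIGO transfer issue above (wget and NFS both reported success on a bad write) is the case checksumming is meant to catch. A minimal Python sketch of verifying a file after transfer, assuming the expected SHA-256 digest is available from some catalog; the function names are hypothetical and this is not what pegasus-transfer or kickstart does:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path, expected_sha256):
    """Return True only if the file on disk matches the expected digest.

    A zero exit code from the transfer tool is not treated as proof.
    """
    return sha256_of(path) == expected_sha256.lower()
```

A job that fails when `verify_transfer` returns False gives the workflow a point at which to retry or abort, similar in spirit to the SCEC check job mentioned above.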

April 1st, 2016

  • Pegasus development
  • Submitted tutorial for XSEDE 16
    • will include RADICAL
    • might update tutorial with BOSCO. Mats already has BOSCO running on Comet
  • Derrick Lazaro wants to build a bigger filesystem ( 400 TB )
    • will be backed up 
    • has a commercial storage vendor in mind
    • has backup capabilities built in (block level backup)
    • let Mats know about storage needs
    • Mats estimated our storage needs at 25-50 TB
  • Graduate student coming to the group mid may to july. brazilian student. currently in Florida
  • Ahmad group got an EPSCoR grant
  • CRAFT Meeting update

March 2016

March 25th, 2016

 

  • Pegasus development
    • Gideon has been working on kickstart online monitoring for panorama.
      • the lib interpose monitoring requires app code to be dynamically linked to use LD_PRELOAD
      • now kickstart has a new mode, where monitoring thread will scan the proc filesystem for all processes in resource group.
        • this approach disables the PAPI counters as they need to be retrieved from app itself
      • also is working on aggregation logic
        • complicated accounting information
      • added another process called pegasus-monitor . so it is usually pegasus-kickstart-> pegasus-monitor -> application
      • can deploy without any external dependencies.
    • 4.6.1 release
      • in april when karan comes back from PAGE meeting
    • Condor bug on schedd evicting dagman jobs
      • LIGO noticed on other submit nodes
    • mats worked with Derrick to make sure glideins work with BOSCO on comet
      • CyVerse Talk - Mats will do a hands on thing with them.  Mats may do an existing tutorial.
      • raphael used the new slides.

  • Pegasus workshop
    • erin will get back to us with other feedback.
    • make the intro slides simpler.

March 18th, 2016

 

  • Pegasus development
    • deep submit directory structure working for submit directory on PM-833 branch. however need to move to relative directory paths in the .dag file , before merging back to master
    • gideon is reworking how kickstart online monitoring works
      • working on kickstart monitor that goes through the /proc/ filesystem with the assumption all apps installed via kickstart have the same process group as pegasus-kickstart
    • pegasus workshop on campus on tuesday. it is setup https://pegasus.isi.edu/tutorial/usc/
      • the tutorial is setup using pegasus-init
      • will ask mats to move the XSEDE tutorial to pegasus-init
  • raphael working on energy paper again
  • stephan paper to HPDC got accepted
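
The /proc scan described above can be sketched in Python as follows. This is an illustration of the general technique on Linux, not kickstart's actual implementation (kickstart is not written in Python); the function names and the `proc_root` parameter are hypothetical.

```python
import os


def parse_pgrp(stat_text):
    """Extract the process group id (field 5) from /proc/<pid>/stat text."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenizing.
    after_comm = stat_text[stat_text.rindex(")") + 1:].split()
    # after_comm[0] = state, [1] = ppid, [2] = pgrp
    return int(after_comm[2])


def pids_in_group(pgrp, proc_root="/proc"):
    """Return the sorted pids whose process group id equals pgrp."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                if parse_pgrp(handle.read()) == pgrp:
                    pids.append(int(entry))
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
    return sorted(pids)
```

On a Linux host, `pids_in_group(os.getpgrp())` lists the processes sharing the caller's process group. As the notes point out, this only finds the application if it keeps the same process group as the monitor.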

March 11th, 2016

 

  • Pegasus development
    • R DAX API is done
      • will be proposing for CGSMD 
    • Deep hierarchy structure
  • LIGO meeting
    • do a local file copy against the staging site
      • having a separate staging site bogs down inter site transfers
    • metadata
      • they are interested. want monitord to transfer the stampede database to another location from the scratch submit directories
      • cannot really do it in monitord
      • can also potentially do it in pegasus-dagman
    • argument passing for sub workflows
      • will be done 4.6.1
    • jobs that work on output site directory.
    • credentials issue
    • variable substitution
      • will make use of it
    • submit directory and other directory organizations
      • are interested in using it


  • Rosa
    • wants to do something with pegasus
  • Monitord

March 4th, 2016

 

  • Rosa
    • dispel4py Stream based workflow mapped to MPI, Storm
    •  MPI 3 Failure Recovery from Node Failures
  • Monitord
    •  Triggered by Condor failures. Workflow killed, condor recovery did not spit out all events on recovery.
    •  Need better way to test.
  • DB Admin
    •  Merge issues
    • rafael will confirm with gideon if there is an issue
  • Bamboo 
    •  Rebooted for DROWN Attack
  • R API
    •  Unit tests done.
    •  Packaging - Ship, host?

February 2016

February 19th, 2016

Pegasus development

  • support for GO - mats is working on it
  • dashboard shows multiple workflows with same uuid. fixed in monitord
  • pegasus transfer was prepending path because of globus location
    • mats has changed the logic
  • SCEC wanted to disable the stat of files that was happening automatically because of registration turned on.
    • we now have the property that can explicitly turn it off
  • SCEC tripped over replica catalog insert performance. 
    • rafael working on it. identified the bottleneck
  • Catalog files in submit directories
    • will create a catalogs directory
    • what about file based replica catalogs and cache files etc? some of them can be large.
  • Pegasus Blogs
    • SCEC
    • RVGahp?
  • Website
    • highlight applications better.
  • workq has a catalog server running
    • how do jobs report real time monitoring information back to monitor without rabbitmq
    • have a condor submit wrapper
      • will help us increase memory requirements in case of failures.
  • PegasusLite to have pegasus-transfer invocations as kickstart records
    • kickstart 

February 12th, 2016

Pegasus development

  • support for GO
    • mats found a python REST API - is decent.
    • will only work on a small subset of workflows
      • only third party transfers
      • how to handle file URL's on the submit host
      • and how do we activate the end points. 
      • lifetime of credentials .
      • cannot work on non shared fs mode, as what end point to use when staging to the worker nodes.
      • maybe we should look at how condor does it.
  • held jobs
    • dagman added support in 8.3 where the held job reason appears in dagman.out
    • will need schema change
    • failing workflows
    • held jobs.
    • have  a held job tab.
  • pegasus-submitdir archive
  • PMC job statistics in pegasus-statistics
    • mats and rajiv


Annual Report

February 5th, 2016

Pegasus development

  • 4.6.1 release 
    • pegasus-glite-configure
    • change of how retries are done for transfer jobs, using requirements and dagnode retries
      • https://jira.isi.edu/browse/PM-1049
      • there are just 2 retries implemented for transfer jobs
        • one more option is for pegasus-transfer to do better retries
        • and let the dagman retry set to 1.
      • use DAGMan influence to do in retry. 
      • do more testing at our end.
      • lets change default retries for transfer jobs
        • and do this only for transfer cleanups in condor environments 
    • LIGO runs
      • symlinking
    • R API 
      • will target 4.6.1 and keep it similar to the python API
  • 4.7.0 release
    • filesystem organization
  • Keck workshop on Pegasus on Feb 26th
  • Pegasus Annual Report
  • Pegasus GUI email
    • we will send user a direct link
  • Pegasus Announce SLES email
    • we have done on SLES 11 not on SLES 12

January 2016

January 28th, 2016

Pegasus development

  • 4.6.0 release 
    • Released this week
  • Pegasus Website
    • new website there
    • karan will put in the old release notes.
    • Links for old documentation on the new website
    • Rajiv has updated the docker tutorial
    • Tutorials will be moved to Pegasus website
    • Have a research link to point to Scitech website
  • Gideon confirmed MoabGlite helper scripts work with stock condor
    • will also check in a tool to put in the scripts to the right locations.
  • Pegasus Lite pulls in a worker package
    • should we download even by default from the worker package
    • warnings for worker package not being found.

January 22nd, 2016

 

Pegasus development

  • 4.6.0 release 
    • open items
    • constraints algo implemented and checked in . tests worked . 
    • documentation 
      • karan added chapters on metadata and variable expansion
      • gideon updated execution environments
      • updated the BOSCO section about SSH
    • pegasus-analyzer exits gracefully when nothing in the stampede database
      • check if analyzer and statistics check for the version.
    • pegasus-init
    • pegasus-db-admin 
      • better error message for that case.
    • karan will update tutorial to take account of default options
    • for glite style condor arguments quoting is automatically turned off

  • new website.

January 15th, 2016

Pegasus development

  • 4.6.0 release 
    • open items
      • https://jira.isi.edu/issues/?filter=10952
      • Rafael almost done with Constraints cleanup algo. tests run fine on the branch
      • pegasus-bootstrap
        • gideon was doing it as Jinja templates
        • will set it up as a shell script. will be easier for people to update
      • documentation needs to be updated
      • map the globe 
    • for resource requirements add pegasus.queue keyword. update documentation to have one table. remove the documentation for priorities.
    • MOAB stuff  documentation. Will be considered for next major release.
  • DAGMan wants to remove the functionality of running postscript in case of prescript failure
    • does not affect pegasus
  • DAGMan wants to remove DAG NOOP keyword
    • was introduced for LIGO

January 8th, 2016

Pegasus development

  • 4.6.0 release 
  • Condor DAGMan log messages contain HTCondor in 8.5 series
    • broke monitord
    • fixed both 4.5.4 and 4.6.0. 
  • 8.5.2 has DAGMan logging timestamp from condor job log also.
    • monitord has been updated for that.
  • metrics reported were updated
  • Globus strict checking mode.
    • gridftp + ssh version.
  • Scott is working on getting the reverse GAHP stuff
  • How to configure the batch_gahp

December 2015

December 18th, 2015

Pegasus development

  • 4.6.0 release 
  • Reverse GAHP for Oakridge Titan
    • https://github.com/juve/rvgahp
    • done because cannot do incoming connections on titan
    • and also they don't want to use pilot jobs, as it is not easy to yank a job from an HTCondor queue
  • Harvard Pegasus installation
    • with SLURM support.. Karan will work on this.
  • We should explore remote batch GAHP stuff
    • for remote batch do
      • batch gahp --rgahp-key /give/key user@host
      • look at the remote_gahp script.
    • documentation for the batch gahp thing.

December 11th, 2015

Pegasus development

  • 4.6.0 release 
  • pegasus-s3 cert issue
    • updated boto library to account for cacert change
    • on mac, had to disable the automatic failover
  • Bypass PFN's
    • replica selectors can now order replicas. Default and regex ones updated
  • monitord
    • combination of missing job terminated and exception on casting job duration as int, triggered a bug that LIGO reported.
  • default behavior of planner
    • pick up pegasus.properties from cwd as a replacement for conf option
    • --sites option for * behavior , remove local from candidate sites
  • pegasus-bootstrap commands
    • sets up pegasus with site catalog.  and dax generators
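
The replica ordering mentioned under "Bypass PFN's" can be illustrated with a minimal sketch. It assumes a regex-style selector ranks each PFN by the first pattern it matches; the function name, the patterns and the example URLs are hypothetical and are not taken from the Pegasus code.

```python
import re


def order_replicas(pfns, patterns):
    """Sort PFNs by the index of the first regex that matches; unmatched go last."""
    compiled = [re.compile(p) for p in patterns]

    def rank(pfn):
        for index, regex in enumerate(compiled):
            if regex.match(pfn):
                return index
        return len(compiled)

    # sorted() is stable, so replicas with equal rank keep their catalog order.
    return sorted(pfns, key=rank)


# Hypothetical replicas for one LFN, in catalog order.
pfns = [
    "gsiftp://remote.example.org/data/f.a",
    "file:///nfs/data/f.a",
    "http://mirror.example.org/f.a",
]
ordered = order_replicas(pfns, [r"file://.*", r"http://.*"])
```

With these patterns the file URL comes first, then the http URL, then the unmatched gsiftp URL.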

December 4th, 2015

Pegasus development

  • JDBCRC 
    • should work for 4.5.3 . will work for the release
    • need to make the changes for 4.6.0
      • should consider batch inserts
      • rafael has implemented the batch inserts also
      • the database locked errors are fixed.
  • Rafael is looking into how the timeouts are implemented in SQLAlchemy
  • Mac OSX El Capitan Builds
    • Gideon fixed those. El Capitan does not allow root to modify files in /usr
    • Gideon changed the installer to install to /local 
    • Upgrading the mac mini build host. 
  • LIGO proxy issue
    • change in how proxies are generated. 
    • LIGO en-common proxies were not supported by J-Globus
    • Gideon has the patch for making the updated jar.
  • Gideon has added instructions on building globus for El Capitan
  • Jobmanager-condor for obelix was updated to support both shared fs and non shared fs cases.
  • metadata registration
    • information for output files is tracked. 
  • pegasus-metadata client . Rajiv.
  • Cleanup algorithm - Rafael ?
  • LIGO use case for fallback PFN for PegasusLite cases
    • they want to use existing input data for frame files, on different locations across sites
    • but have a single site catalog entry for the computation, as glideinwms provisions it
    • Karan and Mats are working on it
    • pegasus-transfer changes ?
  • LIGO running workflows across LIGO and OSG .
  • Database locked errors for monitord.
  • Call the 4.6 release as 5.0 release.
  • Gideon working on MOAB Blahp support. 
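
The batch inserts discussed for JDBCRC can be sketched as follows. JDBCRC itself is Java; this is only a Python illustration of the idea using the standard sqlite3 module, and the table name, columns and batch size are assumptions made for the example.

```python
import sqlite3


def insert_replicas(conn, rows, batch_size=1000):
    """Insert (lfn, pfn, site) rows in batches, one transaction per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commit once per batch instead of once per row
            conn.executemany(
                "INSERT INTO replicas (lfn, pfn, site) VALUES (?, ?, ?)", batch
            )


# Hypothetical schema and data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replicas (lfn TEXT, pfn TEXT, site TEXT)")
rows = [(f"f.{i}", f"file:///data/f.{i}", "local") for i in range(2500)]
insert_replicas(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM replicas").fetchone()[0]
```

Committing per batch keeps each write lock short while still avoiding one transaction per row.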

October 2015

October 23rd, 2015

Pegasus development

  • Tutorial VM
    • rajiv will update dashboard screenshots and go through the Virtual machine based tutorial
  • JDBCRC 
    • should work for 4.5.3 . will work for the release
    • need to make the changes for 4.6.0
      • should consider batch inserts
      • sqlite supports unlimited connections
        • 25 jobs running for write locks; after 25 it ignores timeout settings.
        • 67 registration jobs.
        • rafael is implementing a back off
        • category for the registration jobs
        • eventually do the dagman category stuff
    • metadata registration
      • information for output files is tracked. 
      • pegasus-metadata client
  • concurrency limits 
    • in partitionable slots this has an effect on performance
    • for 4.5.3 we will have a knob and set it to false by default.
  • Dashboard and PAM problem.
    • mats will create JIRA item.
  • salon working on data from MYRA
    • trying to find contention of data
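
The back off being implemented for the sqlite write locks could look roughly like the sketch below. It is not the actual implementation: the function name, retry count, delays and the rc_lfn table are all assumptions, and it only retries the "database is locked" error.

```python
import random
import sqlite3
import time


def execute_with_backoff(conn, sql, params=(), max_attempts=5, base_delay=0.1):
    """Run one write statement, retrying with exponential back-off on lock errors."""
    for attempt in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params)
        except sqlite3.OperationalError as err:
            # Only retry the "database is locked" case; re-raise anything else.
            if "locked" not in str(err) or attempt == max_attempts - 1:
                raise
            # Sleep base_delay * 2^attempt, with jitter so that many
            # registration jobs do not all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


# Hypothetical table, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rc_lfn (lfn TEXT PRIMARY KEY, pfn TEXT)")
execute_with_backoff(conn, "INSERT INTO rc_lfn VALUES (?, ?)", ("f.a", "file:///tmp/f.a"))
```

The jitter matters when many registration jobs start together, since otherwise they would all wake up and collide again at the same time.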

October 16th, 2015

Pegasus development

  • does stime include io wait time. does not appear so. the cp of 1GB file indicates that
    • so then is there a way to capture the IO wait time
  • pegasus-db-admin
    • version migration for panorama works
    • metadata schema finalized
  • failing jdbc RC test
  • metadata population
    • metadata population from DAX working
    • metadata attributes from transformation catalog and site catalog are now incorporated, as metadata events are generated at end of site selection
    • output file sizes will be populated for files with register flag set to true.
  • pegasus dashboard
    • metadata display done other than the file information that needs to be populated
  • cleanup algorithm
    • will be done before rafael leaves for vacation
  • website changes
  • panorama changes
    • monitord change to make sure events don't get dropped
    • online monitoring spawns a thread where there is a queue  that is responsible for inserting the online monitoring events into the db
    • the thread checks the database to make sure the job instance is populated.
    • CURRENTLY, it is not done for the anomaly populations. 
  • SNS and Acme workflow
    • maybe we can hire a student to do it
    • maybe scalarm can be used for SNS workflows
    • Ben said there is a meeting about Pegasus on Titan.
  • Mats has installed wordpress on one of the machines.

October 9th, 2015

Pegasus development

  • pegasus-db-admin
    • db version has been moved to string. a new column was added. 
  • metadata population
    • files are populated if a user specifically associates metadata with a file in the DAX or if an output file is marked for registration
    • make sure that for tasks metadata attributes are inherited from the transformation catalog. 
  • pegasus-metadata client
    • output format ? 
    • is the client for end users
    • list files for a workflow
    • list workflow metadata
  • pegasus dashboard
    • workflow level
    • task level
    • file level metadata

October 2nd, 2015

Pegasus development

  • pegasus-db-admin
    • changes discussed last week?
    • also change to string for the database version for allowing merges with panorama
      • panorama db versions should be N.x and not whole integers
  • jdbcrc sqlite test failures
  • pegasus-transfer
    • better job with grouping for ssh transfers.
  • metadata population
    • planner generates the events now for associating metadata with wf, job and files
    • use case should be for a file what workflow and job created that file.
  • Pegasus workshop
    • we will be using workflow.isi.edu
    • mats has created 30 training accounts on workflow.isi.edu 
    • suggestions on workflow example?
      • blender rendering example..
    • pegasus-dashboard should be installed
  • Sipht portal
    • back up and running

September 2015

September 25th, 2015

  • Pegasus development
    • pegasus-kickstart to return record on condor_rm ( SIGINT)
    • changes to data reuse algo for Chris Edlund
      • delete jobs when inplace cleanup is used for intermediate files that are not transferred to the output site.
    • use of DAGMan NOOP keyword
      • workflow test failures
      • change monitord to not complain for noop jobs.
    • comma separated directories for input dir
      • automatically delete the input directory ? we all agree not a general use case.
    • pegasus-transfer grouping should be done for all protocols?
      • problem is some renames for output files
      • avi has been running workflows on OSG with pegasus lite. 
      • 2 million connections over two days on SSH server 
    • pegasus-db-admin error handling. 
      • if it fails with error, it should not report that database has been updated. This is a bug
      • other is what to do , when 4.5 is run against
      • downgrade option
      • warn if db-admin detects database version is higher than what it is currently running, and exit with 0 exitcode.
  • Pegasus IEEE article accepted
  • montage workflows
    • dax generator is not maintained
    • have it as a student project to convert the DAX generator to python API.
      • they also do an overlap check
    • montage jobs have varying memory requirements
    • we should not showcase it.
  • Pegasus Workshop in October
    • fallback from USC HPCC cluster required
    • whole day will be rough.
    • Mats will not be around! Going for the duke workshop.
  • panorama
    • monitoring thread segfaults
    • why was the segfault happening initially
      • happening in fork system calls
      • related to starting and stopping monitoring threads
      • and how PAPI counters were updated.
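
The pegasus-db-admin behavior discussed above (warn and exit with 0 when the database version is higher than the running version) can be sketched as follows. It assumes dotted version strings; the function names and the non-zero exit code for an older database are illustrative assumptions, not the tool's actual behavior.

```python
import sys


def parse_version(text):
    """Turn a version string such as "4.1" into a comparable tuple (4, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_db_version(db_version, tool_version):
    """Return (exit_code, message) for a database/tool version pair."""
    db, tool = parse_version(db_version), parse_version(tool_version)
    if db > tool:
        # Database was created by a newer release: warn, but do not fail.
        return 0, f"warning: database version {db_version} is newer than {tool_version}"
    if db < tool:
        # Assumption for this sketch: an older database is reported as an error.
        return 1, f"database version {db_version} is older than {tool_version}; update required"
    return 0, "database is up to date"


code, message = check_db_version("5.0", "4.1")
print(message, file=sys.stderr)
```

Comparing tuples of integers rather than raw strings keeps "4.10" ordered after "4.9".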

September 18th, 2015

  • Pegasus development
    • pegasus-db-admin updated
    • for SCEC added registration of flat lfn's when deep LFN are used
    • workflow tests now running.
  • pegasus paper
    • will add info about galactic plane and gtfar
    • cloud challenges
      • talk about virtual clusters. precip / wrangler
        • tie more closely to setup stuff and talk about chef/puppet and precip and wrangler.
      • gtfar 
      • add them in acknowledgements
    • not much to add about cloud challenges other than image managements
  • HubBub conference
    • latech user who wants to run on bleaters
    • tom bishop 
    • pegasus submit tutorial.
    • to do with steven... 
  • panorama
    • segfaults happening randomly
      • happen when the monitoring thread is started.
  • craft
    • jarek 
    • hubzero
      • chip design
      • instead of hubzero use open science framework - a non profit funded thing

September 11th, 2015

  • Pegasus development
    • worker package tests in pegasus lite
      • pegasus lite will complain if the system architecture 
    • panorama tests now work
      • maybe some problems might be masked!
    • jdbcrc 
      • updated jdbcrc . for mysql and postgres deletes work differently. 
      • rafael will abstract it out
    • gideon changed the way the papi counters are used in kickstart
      • earlier signals were being used for threads to report counters
      • PAPI now allows to query for counter values
  • Pegasus cloud article
    • ewa is doing the final edits
  • HubBub presentation
  • panorama
    • darek working on getting papi counters to monitord
    • changed the job metrics table in the stampede database.

September 4th, 2015

  • Pegasus development
    • worker package creation on the submit host.
      • should we include python externals directory .
      • we will put that back in. we only need boto. 
      • also need to make sure it works for a RPM or deb install.
      • implement the compatibility check in PegasusLite
    • panorama tests
    • better error for input file replica selection failures
    • Scalr for openstack tests
      • action has a new openstack deployment. 
      • have our two QNAPS setup on the build VM's to run workflow tests.
      • run on vmware pool.
    • SCEC shallow LFN's
      • for registration in the replica catalog.
      • put the test in 4.5 . 
    • Database schema changes
      • pegasus-db-admin changes to database schema.
      • downgrades work
  • The short paper
    • working on the google doc.
    • we are not actively working on ec2.
  • panorama
    • adding papi counters to online monitoring. 
    • pegasus-transfer explodes when signal is sent
    • online monitoring dashboard.

August 2015

August 28th, 2015

  • pegasus 4.5.2 released
  • worker package staging
    • planner will use a worker package from the submit side installation.
  • pegasus s3 tests
    • currently no s3 tests
  • tests are running against 8.3.8
  • cleanup algorithm update ( Rafael)
    • estimate that it will be done in two weeks
    • has to work for multiple sites
  • cloud computing short paper
  • hub bub
  • panorama and dv/dt poster and presentations . in mid september
  • metadata discussion
    • google doc updated
    • leaning towards monitord populating the database
    • remove the estimated size and md5 checksum

August 21st, 2015

  • pegasus 4.5.2 release
    • release notes checked in
    • db-admin changes?
      • update man pages
    • python source package
    • tests are we moving to dev branch?
    • docker problem
      • how to get around it ?
      • an issue inside docker, that is being exposed
      • we will put in a wrapper around it. 
    • panorama branch is disabled
      • but tests should be fixed.
      • darek will be fixing it
      • rajiv pushed out his dashboard changes for darek. for demo at supercomputing.
  • cleanup algorithm
    • Rafael will start next week 
    • how will the limits be passed
  • kickstart changes
  • metadata schema discussion
    • next week.
    • postscript
    • dagman has plugins
    • schema 
    • use case
    • stampede is sqlite
    • pegasus-exitcode write locks.
    • separate sqlite database for metadata. 

August 14th, 2015

  • Pegasus 4.5.1 release
  • Bamboo machine troubles
    • panorama tests hung because of bamboo
    • do experiment for the case where we do condor off and see what happens to pegasus-dagman.
  • Panorama tests
    • look at build #73
  • pegasus-kickstart stuff
    • for interpose stuff
    • gideon investigating how to cover all cases for threads
    • wants to make sure that descriptor table is accessed in a thread safe way, in the worst case
    • also is doing thread tracking, thread counters and thread lists
  • directory structure organization for submit directories.
  • nonsharedfs mode problem for auxiliary jobs
  • sudharshan cleanup algorithm
  • stefan update
    • working on user models on how to submit jobs to HPC
    • what user characteristics are of submission process 
  • to be able to show the IO part for SoyKB
    • metrics of success
      • makespan is reduced.
      • number of service units is reduced
  • what makes an application IO intensive

August 7th, 2015

  • Pegasus 4.5.1 release
  • 4.6 common resource requirements
    • we are now exposing three pegasus profiles: cores, nodes and ppn.
    • added logic to do specific translations for PBS and SGE
  • cleanup bug fixed related to DAX transfer flag for input files
    • larger question and agreement. transfer flags for input files usually don't have any meaning.
    • transfer flag should be renamed or in the API
      • change in schema 
      • at minimum we should change the DAX API's
      • transfer attribute renamed to final output? 
  • spaces in Pegasus URL
    • gideon feels it should be %20 instead
    • somewhere in documentation . 
      • the planner should have more specific error message in case of spaces. 
  • kickstart enhancements - gideon
    • fixing edge cases in kickstart for the extended reporting
    • what can we do with the papi performance counters and see what will be used in panorama.
    • will be updated for counters.
    • gideon and darek will try and merge

July 2015

July 31st, 2015

  • Pegasus 4.5.1 release
    • will release it next week
    • update the mapper documentation
      • have a link to the replica catalog
    • steven clarke cleanup issue
  • resource requirements
    • update the resource requirements section for 4.6
  • acme integration
    • rajiv will work with bibi to integrate it with the REST monitoring api
  • kickstart changes to get papi counters
    • Only triggered if -Z option is passed
    • the paper on xsede mentioned about them reporting per threads
    • also we keep better track of threads launched by the executable
      • some edge cases for the thread case
      • double execve of process does not work currently
        • example: /usr/bin/env date
    • also record command line options for all sub process launched
      • in the proc record , the cmd tag
      • grabs only first 1K of arguments
  • monitord amqp population
    • revert back to use the event name as the routing key for AMQP population.
  • pegasus cleanup with peak storage requirements
  • Panorama
    • Data analysis done..
    • ideas about writing a paper about workflow profiles
  • Anomalies Detection
    • showing anomalies in dashboard and population in stampede schema

July 24th, 2015

  • XSEDE Tutorial
    • 2 Posters and one tutorial
    • news item online
  • Pegasus Development
    • common resource requirements PM-962
      • documentation needs to be updated
      • we have cores , hostcount
      • karan should make sure cores is translated correctly to ncpus for PBS
    • Pegasus REST API for integrating with Pegasus
    • pegasus transfer
      • checkpoint files
    • LIGO developer notion of site attribute
      • maybe we should be clearer in the documentation
    • automatically changing parameters for memory on job retries
      • check point file for the job is a partial solution
    • monitord amqp population
      • works.. we will document it on JIRA
  • Panorama
    • Darek implemented sending messages in batches from kickstart to rabbitmq
    • socket based communication between kickstart and lib interpose. was done to take care of the file interleaving issue.
    • tests on obelix and exogeni indicate socket writes are atomic for panorama message
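
The batching of messages from kickstart to rabbitmq can be illustrated with a small sketch. kickstart is written in C, so this Python class only shows the buffering idea; the class name, the JSON payload format and the `send` callable (standing in for whatever actually publishes to the broker) are assumptions.

```python
import json


class Batcher:
    """Collect messages and hand them to `send` in groups of `batch_size`."""

    def __init__(self, send, batch_size=50):
        self.send = send            # callable taking one bytes payload
        self.batch_size = batch_size
        self.pending = []

    def add(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One payload per batch: a JSON array of the buffered messages.
            self.send(json.dumps(self.pending).encode("utf-8"))
            self.pending = []


sent = []  # stands in for the broker in this example
batcher = Batcher(sent.append, batch_size=3)
for i in range(7):
    batcher.add({"event": "sample", "seq": i})
batcher.flush()  # push out the final partial batch
```

Seven messages with a batch size of three produce three payloads, the last one holding the single leftover message.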

July 17th, 2015

  • PMC Cpu affinity
  • LIGO pegasus analyzer bug
    • has been passed to LIGO. waiting to hear from them
  • Cleanup algo
  • Resource Requirements
    • common pegasus profiles
  • SGE
    • change.dir should be set automatically for shared filesystem stuff
    • documented already.
  • kickstart path variable to prepend.
  • REST interface for monitoring for pegasus is done. Rajiv completed this week.
  • extensions to the cleanup algorithm. rafael will start working .
  • Pegasus 4.5.1 release
    • will be done after XSEDE.
  • Pegasus XSEDE tutorial
  • XSEDE Pegasus Poster
    • show a LIGO workflow for the XSEDE poster.
  • Salt configuration needs to be updated
    • Student machines on salt
  • panorama
    • rabbit mq installed on exogeni site.
    • darek will get message batching working.
    • gideon recommends doing it with the AMQP C API library
    • message interleaving in kickstart.
    • lot of unacknowledged messages in rabbit mq
  • kickstart polling loop
  • all kickstart memory values are in MB

July 10th, 2015

  • PMC jobs automatic summing of maxwalltime. Should be disabled
    • In PMC case we will do a division.
  • PMC CPU affinity for jobs PM-953
    • there might be a fragmentation approach.
  • Pegasus REST interface
    • short cut URL end points. 
    • karan will send email to Lavanya.
  • running on SGE cluster using GLite interface. 
  • harmonized pegasus profiles 
  • Metadata
    • will need the file implementation . 
  • Dashboard Panorama stuff
    • September 16th. Time series and anomaly detection.
    • Application level anomalies
    • Infrastructure level anomalies. 
    • no plans for integration in production Pegasus.
  • monitord profiling of monitord population. 
    • we want to see how long 1000 events take to be populated in case of LIGO . 
  • Panorama
    • anomaly detection
      • implemented a working prototype of threshold based anomaly detection
      • kickstart sends events to rabbit mq, then monitord populates to influx db. 
      • darek tool queries influx db and takes in the metadata file generated by pegasus and determines the anomaly and sends it back to rabbit mq
      • monitord then again picks up anomaly and populates it to stampede db for dashboard to display.
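
The threshold based anomaly detection step in the pipeline above can be sketched as follows. This is not Darek's tool: the sample layout, metric names and the limit are hypothetical, and the influx db query and rabbit mq publishing are left out so that only the comparison itself is shown.

```python
def find_anomalies(samples, thresholds):
    """Return (job, metric, value, limit) for every sample above its threshold.

    samples:    iterable of dicts such as {"job": "j1", "metric": "rss_mb", "value": 900}
    thresholds: dict mapping metric name -> maximum expected value
    """
    anomalies = []
    for s in samples:
        limit = thresholds.get(s["metric"])
        if limit is not None and s["value"] > limit:
            anomalies.append((s["job"], s["metric"], s["value"], limit))
    return anomalies


# Hypothetical monitoring samples, as they might come back from a time series query.
samples = [
    {"job": "j1", "metric": "rss_mb", "value": 900},
    {"job": "j1", "metric": "cpu_pct", "value": 40},
    {"job": "j2", "metric": "rss_mb", "value": 2100},
]
thresholds = {"rss_mb": 2048}  # hypothetical limit taken from the workflow metadata
anomalies = find_anomalies(samples, thresholds)
```

Only job j2 is flagged here; metrics without a configured threshold are ignored rather than treated as anomalous.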

June 2015

June 12th, 2015

  • Pegasus profiles for job/resource requirements
    • postponed till next week when mats is here
    • karan to create a list of relevant profiles
  • pegasus dashboard
    • locking issue?
    • can this be related to new connection stuff or the failing tab?
    • look at connection pooling .. or maybe transactions are not being closed properly?
    • also see if there is an option for dashboard to set a read only lock when opening a connection to the databases
  • panorama workflow tests
    • failing.. but merge from master was done.
    • karan to investigate
  • panorama workflow dashboard
    • updated the job metrics tab for doing the polling
    • for mpi jobs the job name appears as aprun, since that is the process running on rank 0
  • Job Survey paper
    • Darek sent a final version
    • will be submitting next week
  • Pegasus Release timeline
    • maybe we should put on our website somewhere?
  • Rafael Energy paper
    • information about building energy profile.

June 5th, 2015

  • panorama usecase and metadata passing through
    • not done yet for the metadata associated with files with replica catalog
    • DONT rebase commits that have been pushed out
  • job.runtime, cluster.maxruntime, maxwalltime parameters
    • how to associate profiles. have a different namespace
    • how is it exposed in the DAX API
  • python dependency
    • stopped support for 2.5 and 2.6
    • only affects redhat 5 systems.
    • will have to install redhat 2.6 python package on 2.5
    • setup tools for python 2.6 has to be at build time
  • pegasus-dashboard updates for LIGO
  • cleanup bug for intercept runs with InPlace cleanup.
  • S3 storage
    • about 9TB and rising for pegasus system services backup
    • right now no backups are going to go to Glacier
    • we only keep 2 weeks of data
    • glacier is good if we want to keep 6 months of data
    • 3 VMs for pegasus website, CROWD etc
    • database on stewy and obelix
    • qnaps /nfs/ccg3 and /nfs/ccg4
    • Big ticket items of 9TB backup bucket in S3
    • need to keep 2 backups in S3
  • HubBub talk.
    • abstract
  • talk by Jack Dongarra.

May 2015

May 29th, 2015

  • Bamboo test failures
    • condor-c tests working now. changed the site catalog for those
    • rhel5 json module
    • pegasus-transfer will do a proper check and complain for missing json module
    • mats will update documentation accordingly
  • Python Dependencies
    • New python dependency 2.6 from 2.4
    • newer versions of Fedora use Python 3
    • Fedora will keep python 2.x support till 2020.
    • maybe have a dynamic bash wrapper across python code to pick the right python version
    • have a tool called pegasus-python??
  • concurrency limits
    • apply to bamboo machine and our other workflow hosts.
    • throttle number of grid jobs per categories of jobs. that is what SCEC wants and cannot be done.
      • unless negotiation can be employed for grid universe jobs.
      • define own throttles in compute jobs
  • pegasus-dashboard
    • LIGO has an issue with no authentication URL rendering.
  • quoting for environment
    • implemented. changed both for environment and +remote_environment
  • docker universe support
    • should work out of the box with condorio
  • new dagman default values
  • pegasus-statistics
    • show badput?
  • LIGO OSG
  • Documentation
    • 10 minutes using pegasus-docbook
    • using new pipeline it takes 3 minutes
    • the hyperlinks don't work
    • include that into pegasus website template
    • In PHP we tell Google not to index old version
  • panorama

May 8th, 2015

Bamboo test failures

  • montage tests are failing because of the remote service being down
  • documentation title is messed up. gideon will look at it

pegasus-transfer new format

  • mats has come up with a new JSON format.
  • backward compatibility with the old format
  • create dir and cleanup jobs will be different

Metadata

  • google doc shared with people
  • next steps are panorama use case for calling out
  • ssh cleanup . JGlobus library does not implement ftp

LIGO on XSEDE

  • have started using PMC
  • data management

Python builds

  • always check the python version.
  • if we ship our own python modules, then we may have to

Bamboo build machine

  • build and test plan ( running concurrently )
  • also we can run docker stuff
  • automate the salt setup of bamboo agents
  • maintain one OS. Can action give us a beefier VM?
  • we have too many documentation builds running ?
  • VM with bamboo agent and use docker
  • workflow tests are a separate issue
    • they don't load the bamboo machine
    • that is more related to a big condor pool.
    • workflows tests will run always out of bamboo.
  • mats and rajiv will work on it for the VM stuff.

Getting new SSL certificates

  • *.isi.edu is screwed up in firefox

Metrics Server fixes

  • google maps update broke the web UI.
  • somehow all the colors were used in the trends ?

May 1st 2015

 

  • Pegasus 4.5 release
    • not heard back from SCEC and LIGO
    • mats checked in the example
    • will add release slider
  • Variable Expansion
    • pretty much done
      • right now we have $()
      • we will change to ${env-variable}
      • have more helpful error message 
  • pegasus-kickstart
    • file does not exist. now gives a proper error
  • XSEDE poster due next week
  • Monitoring Service API
    • donald is almost done.
  • PMC with PegasusLite
    • PMC job by default runs on the shared filesystem
    • tasks in PMC are pegasus lite tasks
    • if a task does random I/O, then running on the shared fs might be tricky
  • brazilian student contacted about pegasus application for real workflows.
  • mats will be doing the transfer events for panorama next week
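
The ${env-variable} style of variable expansion, with a more helpful error message, can be sketched as below. This is a minimal illustration and not the planner's implementation; the function name, the accepted variable-name pattern and the error wording are assumptions.

```python
import re

# Assumption for this sketch: variable names look like shell identifiers.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(text, env):
    """Replace each ${NAME} in text with env[NAME]; fail loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined variable '{name}' in {text!r}")
        return env[name]
    return _VAR.sub(substitute, text)


expanded = expand("${HOME}/runs/${USER}", {"HOME": "/home/bamboo", "USER": "bamboo"})
```

In practice the mapping would typically be os.environ; passing a plain dict keeps the example deterministic. Naming both the variable and the offending string in the error is what makes the message useful when a catalog entry is wrong.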

April 2015

April 24th 2015

 

  • Pegasus 4.5 release
    • release candidate today rc2
    • updates to pending items
    • job throttling added to optimization guide.
    • release notes are online https://pegasus.isi.edu/news/4.5.0 
    • waiting for db-admin unit tests to be checked in.
    • pegasus-cleanup checking
    • pegasus-lite-local.sh  add some path before starting.
  • rest monitoring API
    • we have not heard back from lavanya yet
    • PNNL acme stuff
  • pegasus 4.6 release
    • common pegasus-transfer , pegasus-cleanup and pegasus-createdir
    • APP_PATH_PREPEND addon
    • pegasus worker package staging
      • planner calls out to common script to determine the worker package
      • if it does not exist , we build a default worker package on the fly 
      • add extra logic to the untar job in the
    • pegasus-gridftp modification for ssh ftp.
    • software eggs
  • panorama
  • metadata for 4.6

April 17th 2015

  • Pegasus 4.5.0 Release
    • rc1 working for hub
    • LIGO trying it out.. wanted to change checkpoint files. need to hear back on the dashboard changes.
    • SCEC ? waiting to hear from Scott
    • https://jira.isi.edu/issues/?filter=10851
    • pegasus-db-admin sqlalchemy issues? for updating tables?
    • pass through implemented for Glite to PBS
    • verification of update to pegasus version on running workflows
      • mats thinks his testing should do the trick.
  • Pegasus Dashboard for bamboo user
    • URL - https://cartman.isi.edu:5000
    • Authentication - Uses PAM Authentication
    • Admin Users - mayani, vahi, rynge, juve, rafsilva, darek, deelman
  • Cedars visit
    • SGE cluster
    • we have 3 potential SGE cluster users: Cedars, Vision group at ISI and maybe Rutgers (that will be replaced with SLURM)
  • Lavanya REST API
  • Pegasus 4.6 release
    • variable expansion thing figured out
      • argument strings in dax, profile values in the dax
      • site catalog. 
      • replica catalog file based one.
      • need to now make changes in various parsers
      • predefined environment variable
    • metadata
      • LIGO Dibbs .. ability to do data reuse based on metadata attributes
      • panorama - pegasus - aspen interface
      • iplant
        • they want in the IRODs
        • S3 tags.
      • mats wants a better idea of what it looks like in the ideal world.
    • file management on scratch directory, submit directory also?
    • implementation of the REST API
    • implementation for held job tracking
    • Panorama requirements
      • influx db monitoring , into pegasus-transfer. 
      • pegasus-transfer sends messages to rabbit mq about file size transferred
      • pegasus aspen interface (modelling tool). aspen is a C++ library. pegasus planner querying the aspen models for each node.
        • command line tool pegasus-aspen
        • planner needs to send application parameters, and all the metadata for the node.
        • gets back a list of attributes , memory and usage, and convert them internally into pegasus profiles
        • this can be a generator of metadata.
        • application model which is a file and a machine model 
      • timeseries data . monitoring data about the dashboard, anomalies 
      • there is a CEP thing that anirban is developing and will determine anomalies.
    • dv/dt requirements
      • prediction service
      • pegasus will query the prediction service

April 10th 2015

pegasus cleanup

  • gideon removed a bunch of stuff
  • will be completing the cleanup
  • pegasus-plots will be deprecated in the release notes for 4.5 release and removed for 4.6

...