April 2019

April 12th 2019

  • Pegasus 5.0
    • Site Catalog Conversion to YAML
      • mukund is mainly done
      • pushed out his changes
      • trying to make the tests green
    • Checkpointing changes to accommodate LIGO use of vanilla universe
      • Karan and Mats will explore and see if it is possible
      • cumulative stdout|stderr
        • what about time and duration values
        • since there is no DAG Node retry and job just goes on HELD state
    • Composite Events
      • Kibana dashboard needs to be updated
      • dropping __ in the event names
      • George wants the AMQP library updated 
        • Will create a JIRA item
    • Office Hours video
      • Karan will work with Jasmine to upload the video
  • Papers
    • RACE Paper submitted last week
    • PEARC Paper this week
  • Proposals
    • Army Research 
      • enabling in-situ supports for ExaScale
      • linked with what Tu is doing
    • SCEC Proposal Submitted
      • have a good chance
    • Exascale one with Michigan
      • the call will come out soon
    • Ewa, Rafael and Deborah
      • NSF GCR Proposal
      • Modelling wildfires
      • Has Price School input and also Deborah's postdoc
  • EScience
    • Pegasus Tutorial Proposal
    • May 6, 2019: Tutorial Proposal Deadline
    • Also trying for the workflow comparison paper
    • Dynamo paper by George
  • Pegasus connect discussion
    • tabled it for later when Mats is around
  • HTCondor Week
    • Karan will be doing a Pegasus talk and Pegasus workshop
  • Pegasus OLCF Poster
    • combine the panda poster
    • can also submit to EScience
  • Ryan's work
    • Loic is moving pachyderm setup to AWS
  • Loic, Rafael and Tu are working on a paper for Cluster
  • Software X

March 2019

March 29th 2019

  • 4.9.1 Release
    • done and working on 4.9.2
  • Site Catalog Conversion to YAML
    • mukund working on it
    • I still need to look at the bamboo tests
      • bamboo is failing on the Condor mount scratch issue
      • we have to fix this in pegasus also, so that it fails on credentials in /tmp
      • run condor_config_val on the key and check whether /tmp is in there
      • mainly affects all the users that use x509
      • LIGO has also tripped over it, both with and without Pegasus
  • Condor vanilla checkpointing
    • karan asked him about what he is trying to do
  • composite events 
    • check for keys with same values
    • also do we need to pad extra keys for all events?
  • Extensions to Jupyter Integration
  • Pegasus Connect
    • will discuss on whiteboard on April 12th
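
The /tmp check discussed under the Site Catalog item above could be sketched roughly as follows. This is only an illustration, not the fix that went into Pegasus. The notes say "the key" without naming it, so the sketch assumes the HTCondor setting in question is MOUNT_UNDER_SCRATCH. All function names are hypothetical.

```python
import re
import subprocess


def scratch_mounts(raw_value):
    """Split a condor_config_val value into a list of directory paths.

    Assumes the value is a comma- and/or whitespace-separated list,
    optionally wrapped in double quotes.
    """
    cleaned = raw_value.strip().strip('"')
    return [p for p in re.split(r"[,\s]+", cleaned) if p]


def tmp_is_remapped(raw_value):
    """Return True if /tmp itself appears in the configured list."""
    return any(p.rstrip("/") == "/tmp" for p in scratch_mounts(raw_value))


def query_condor(key="MOUNT_UNDER_SCRATCH"):
    """Ask HTCondor for a config value; returns '' if the key is undefined.

    The default key name is an assumption; the notes do not name it.
    """
    result = subprocess.run(
        ["condor_config_val", key], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else ""
```

A planner-side check might call tmp_is_remapped(query_condor()) and fail early when an x509 credential path starts with /tmp.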


March 1st 2019

  • 4.9.1 Release
  • Office Hours
    • On Friday March 22nd on real time monitoring
  • transformation catalog for 5.0
    • Mukund will work on it next
  • EScience?
    • Paper
  • pegasus-exitcode test
    • success message not parsed correctly  
  • Programmer
    • will interview the 

February 2019

February 22nd 2019

  • 4.9.1 Release
    • Pending Issues
      • https://jira.isi.edu/projects/PM/versions/11891
      • This raises the larger issue of how long we want to support externals packages

        there are some packages we need to ship because of worker packages dependencies.

        Consensus:
        We remove mysql python externals package for 4.9.1 and 5.0.0

        And also remove the dependencies from our deb and RPM builds.

      • Transfers within containers
        • We are only going to transfer from within the container till people complain
        • George Papadimitriou will add to the documentation.
      • non ascii encoding in the stdout
    • Support HPSS storage
  • Office Hours
    • George on real time monitoring.
      • Date?
  • EScience?
    • Paper
    • Tutorial submission

February 1st 2019

  • 4.9.1 Release
    • ASCII encoding breaks while parsing monitoring events. The monitor should keep population working and log a warning.
      • but we should ensure that stdout in database still gets populated
      • Karan will fix this
  • New TC Format
  • Shifter Support in Pegasus
    • is in 4.9 branch
  • Pegasus Annual Report
    • will be working on it in coming weeks
    • will ask for input
    • next year report will be tricky . in terms of effort allocation.

January 2019

January 25th 2019

  • 4.9.1 Release
    • ASCII encoding breaks while parsing monitoring events. The monitor should keep population working and log a warning.
      • but we should ensure that stdout in database still gets populated
  • YAML format for the TC
    • the line numbers should be mentioned in the errors
  • GitHub commits don't trigger bamboo builds right now
    • move to webhooks?
    • slack token in bamboo.yml . 
      • mats will look into it further
  • SCEC for HPC Transfer certificate issue
    • Globus online certificates messed up hpc-transfer issue.
  • Data Storage at NERSC
    • almost full
  • Singularity container with the entry point.
    • docker → singularity container conversion does not add the entry point.

January 18th 2019

  • 4.9.1
    • container execution
      • data transfers happen within the container
      • python3 issue
      • vague rules to discover what python to use
    • Singularity Hub URLs updated
      • Documentation and tutorials need to be updated
      • montage examples
      • python stuff: create JIRA item
    • LIGO pull requests
      • Build pull request
      • PAM module
      • subprocess package thing
      • also related to Python3 movement
  • Transformation Catalog Implementation
  • Astro Py
  • Shifter support at NERSC
  • Panda Integration
  • XENONnT
    • Rucio data pull-in
    • fetching data might be easier
  • Journal Paper
    • need to write something about containers

December 2018

December 13th, 2018

  • Pegasus 4.9.1 release
    • local site catalog entry creation
      • based on the pegasus version on the submit host
    • encoding issue in the stdout.
  • Pegasus 5.0 Release
    • TC yaml implementation
      • mukund will create a yaml schema compatible with the TC
    • backwards compatibility
      • case by case basis
      • definitely for
        • catalogs
        • dax 
        • pegasus-transfer
  • SWIP Paper
    • we are in good shape
  • Titan
    • under the PBS batch gahp.
  • ZTF
    • the pipeline is based on docker-compose
    • peter will visit ISI with postdoc Danny in January
  • Tutorial at TACC
    • karan has updated pegasus-init to work on wrangler
    • will update the tutorial notes accordingly 
  • OLCF accounts
    • make sure they work 
    • make sure karan and mats can log in

November 2018

Nov 29th, 2018

  • Ryan
    • working on comparison paper with george on workflow systems
    • mats, karan shared neon meeting notes with Ryan
  • Pegasus 4.9.1 release
    • Due for december end
    • potential issue in monitord in reference to hierarchical organization of submit directories
    • pegasus-submitdir
  • ADASS Paper
    • due tomorrow
    • need to add information about sample run
  • SWIP paper
    • mats and karan will work on it tomorrow afternoon.
    • cull out sections
    • add information about updated monitoring in 4.9
  • OLCF Kubernetes 
    • Condor is installed and configured as root
    • George tried condor log directory to lustre as condor in container has to run as user not as root
    • LOG_DIR should be /tmp
    • volumes can be attached to container to contain workflows etc
  • Dynamo 
    • Do dynamic scheduling
    • George thinking of using flocking
    • similar to what is done in OSG
    • non-sharedfs deployments should work

Nov 1st, 2018

  • Pegasus 4.9.0 and 4.8.5 Released
    • We released it this week.
  • Pegasus Business Card
    • Advocate for job postings. 
      • Postdoc options
      • Programmers
      • pegasus.isi.edu/jobs
    • We should take to conferences with us
  • Pegasus JAVA 8 dependence in RPM
    • there is a disconnect between RPM and common.sh
  • ADASS
    • Karan working on a wlpipe demo example
  • New Student
    • Mukund 
  • Duncan started using 4.9.0 and has updated pyCBC to use singularity
    • changed our container execution model
    • all transfers done within the container now.

October 2018

Oct 12th, 2018

  • Rescheduling meetings
    • New time is Thursdays 2PM starting from last week of October
  • DAX API reporting
    • Perl DAX API - Rajiv
  • Atlas visit
    • Wednesday we have Scientific Computing Seminar
      • Will involve writing a Pegasus code generator
      • Panda is second biggest after Condor on OSG
    • Thursday 
      • Karan and George will be there.
      • Mats might be available remotely
  • 4.9.0 Release
    • Mats preference is to skip the beta tag
    • Aim for the full release
    • Documentation freeze on Oct 26th
    • Try and do the builds over the weekend
  • Duncan container usecase
    •  cvmfs hosted container images
  • Demo repository
    • panorama data and some runs from exogeni / nersc
    • Mats has two new Elasticsearch VMs that are part of the Elasticsearch cluster
    • the data on these VMs is also backed up

Oct 5th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll

September 2018

September 28th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll
  • Pegasus 4.9.0 Release
    • transformation selection issue
      • karan has not been able to recreate it yet.
      • will look into it more today
    • docker singularity pulls
    • container symlink 
    • deprecate APIs
      • modify DAX generators to indicate version/ DAX API used.
      • will look into ways on how to do it
        • one way is workflow metadata attributes
        • second is attribute to ADAG object.
      • rajiv will check how it gets stored in the metrics server
  • ADASS
    • will try and do a poster with Mike at ADASS
    • deadline is Oct 8th

September 21st, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
  • Pegasus 4.9 release
    • integrity error reporting
      • pegasus-statistics reporting information about integrity errors
      • the unicorn dashboard for internal swip purposes
        • errors are appearing in the stream
        • more brainstorming required. the data is there
        • not clear whether to use grafana or kibana
          • does not have drill down functionality
          • mix of production and test workflows
          • create different queues in AMQP exchanges
    • container mount point support
      • karan is close to having that implemented
    • transferring outputs to multiple locations
      • lets say one for portal and the other for 
      • list of output sites
      • good feature to add for 4.9.1
      • update --output-site option to pegasus-plan
    • pull docker images for singularity runs
      • we should do for 4.9.0
      • planner needs to tell pegasus-transfer an extra attribute. 
        • add a type attribute
    • Papers 
      • Github private papers repo
    • Deprecate stuff
      • perl api
      • old catalog formats
      • pegasus-plots
    • Hiring

August 2018

August 24th, 2018

  • Pegasus 4.8.4 Release
    • when are we releasing?
      • next week before mats goes on vacation
  • error tagging
    • update stampede schema to add a table called tags
    • will allow us to capture number of integrity errors

August 17th, 2018

  • Pegasus 4.8.4 Release
    • RPM fix ? 
    • mats will manually verify
    • Karan should follow up with Stuart
  • AMQP filtering
    • we are working on having filtering built into monitord
    • nepomunk already has 33 errors identified
    • we need the db connection, pegasus-db-admin and other tools to pass properties with the pegasus property prefix stripped off
  • SWIP Paper
    • one reject seems to be harsh
    • we can try for HPDC also

August 3, 2018

  • Pegasus 4.8.3 Release
  • SLURM
    • Design Safe / TACC on Wrangler headnode
    • Nextflow has integration with SLURM and everything can be installed in user space
  • PMC unit tests are broken
    • let's fix the tests
  • Pegasus 4.9 release
    • more real life runs
    • nepomunk against ceph-s3 from one of uchicago machines
    • we need to get stats reported for integrity errors
      • larger issue of error classification
  • ADASS Tutorial
    • we got into second round
      • add on exercise to run montage in the end.
  • LIGO
    • Bruce group at AEI Hannover has left LSC
  • Infrastructure
    • HipChat mess
      • should we move to ISI Slack
    • Public Chat feature
      • Some clients for Hipchat
    • Get a free channel from Slack
      • for all Hipchat rooms
      • what about ISI slack?? 
    • Github removal of old integrations
  • MINT Meeting
    • went well overall 
    • issue of scoping . 

July 2018

July 27th, 2018

  • Pegasus 4.8.3 Release
    • VM Tutorial
      • will update pegasus-init requirements to get it working
      • main tutorial chapter will be updated for 4.9
        • because then tutorial based container may not work
    • change how docker scripts set environment
    • SCEC database loading error
  • Failing Tests
    • Issue in updates to the dashboard database
  • Panorama Paper
    • agreed on a re-organization

June 2018

June 29th, 2018

  • Pegasus
    • 4.8.3 needs to be released because of singularity launching options
      • will wait till tutorial is updated. 
      • karan will update pegasus-init with population modeling or povray option
    • 4.9
      • pegasus-statistics updated with integrity metrics
      • how to flag job errors because of integrity
        • need to figure out logic
        • value add proposition
        • maybe we should value type in the pegasus lite 
      • need to implement the integrity dial
    • Start creating default local site entries to execute without local site
  • ADASS Tutorial
    • Will submit today 
    • Google doc shared

June 22nd, 2018

  • Pegasus
    • SWIP paper submitted to escience
    • 4.8 montage tests failing
    • changes for integrity metrics in pegasus-transfer
    • updated monitord to parse events from various sources like pegasus lite output
    • mats pointed out to a bug in monitord
  • LIGO
    • pip for python source package
      • update dependencies for latest packages, like pyOpenSSL
      • install in the pip repository
    • pegasus-analyzer
    • interested in swip and containers.
  • SCEC CSEP
    • will use containers
    • run on Comet
  • 1000 genome workflow or use chimerica workflow
  • ADASS Tutorial
    • montage ? 
    • probably pycbc is also submitting a proposal

June 8th, 2018

  • Scott Replica Catalog issue
    • Replica Catalog deletes take a long time
  • Bamboo
    • bamboo emails are no longer received, so we don't find out about workflow plan failures
  • SWIP 
    • monitord integrity changes.  population of data from ks records working now.
    • we still need to populate data from pegasus lite records and pegasus-transfer
    • pegasus-statistics needs to be updated
    • 0.1% overhead on production osg gem workflow
  •  Pegasus deployment at ORNL
    • we should be doing it similar to hpc-pegasus
  • Pegasus Office Hours
    • next one in August
    • travels in July

May 2018

May 4th, 2018

  • Pegasus 4.8.2 Release done on May 3rd
  • we should consider moving user data to a separate file on pegasus-wms
  • si2 meeting updates
    • some potential new users
    • ewa slides were a good overview summary
    • integrity data schema changes. 
    • monitord changes need thinking

April 2018

April 6th, 2018

  • Pegasus 4.8.2 Release
    • PMC bugs
    • tutorial for usc hpc
    • no longer allow + or . in the names
  • Pegasus Report
    • Submitted for Ewa's review
  • SWIP test run
    • discovered integrity errors in the wild
    • at colorado and university of nebraska
      • we would not have caught it before
    • e-science paper

March 2018

March 30th, 2018

  • SWIP
    • pegasus-run issue, with wf restarting from scratch
      • because dagman rescue file is not there.
      • so should we update pegasus-run to look at the dagman.out file
        • so far we think it should be kept consistent with normal dagman behavior
        • to be discussed at condor week
    • mats created a Jira item for swip related statistics
    • Things remaining
      • Dials to be implemented
      • stampede changes
      • pegasus-transfer changes???
  • SC Tutorial Submission ( April 16th) 
    • https://sc18.supercomputing.org/submit/tutorials-submissions/
    • We should try and add exercises for containers
    • We will try for half day
      • 45 minute introduction
    • Feedback from Arizona Container Camp
      • There is interest.
    • coming up with an existing application that people understand or can relate to
      • montage - complex dax generator
      • rosetta
        • only works in nonsharedfs stuff 
        • with 
      • machine learning example?
        • with tensor flow?
        • requires container
      • NVIDIA has a lot of examples about machine learning
        • has to be multistep
        • and at least bag of tasks
      • Ashwin is doing some tensor flow stuff
        • on workflow.isi.edu
        • is working out of  jupyter notebook
      • Genome sequencing workflows??
        • use Broad GATK sequencing workflow to use
        • SOYKB and IRRI use GATK
        • and are huge communities
      • http://biocontainers.pro/docs/101/running-example/ 
  • Pegasus Report
    • we should resolve Jira items as we fix them
    • will be also doing cumulative statistics 
  • Pegasus Office Hours
    • Jupyter Notebooks
    • will update the example to use namd example used for Oakridge
  • Panorama Stuff
    • our multiplexing part in monitord done so far
      • however we are relying on amqp queues and routing keys for filtering
    • darshan data population
      • we need a script (pegasus-darshan), invoked in the namd wrapper script, that pulls the data from darshan logs on the file system and generates an ASCII output
    • Panorama.isi.edu VM
      • AMQP
      • Logstash
      • Kibana
      • Elastic Search
        • Make it do a backup every so often.
        • Warns against doing it as a permanent datastore
        • Rajiv will verify
      • Influx
    • Backups
      • CRASH PLAN backup for the /srv and /opt in the panorama VM
  • LIGO Database locked issues
    • we need to look into the locking issues by tinkering with monitord flush intervals

March 16th, 2018

  • SWIP
    • Most of the SWIP stuff is done as far as planner changes and getting the workflows running
    • we are in a position to share something
    • To do
      • sharedfs
      • Dial implementation
      • Update monitoring
      • Paper submission for EScience
  • Pegasus Reports
    • new applications to attribute to pegasus grants
    • all the mike wangs work will go here
    • SCEC
    • LIGO - need to ping Duncan
  • Panorama/ Pegasus workflow endpoints
    • We seem to be going towards AMQP
      • How is AMQP going to be configured
      • So far we have 
        • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<exchange_name>
          Online monitoring in kickstart 
          • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<virtualhost>/<exchange_name>
      • Virtual Hosts
        • right now virtual host is hardcoded in monitord code. we set it to pegasus
        • global - across workflows
      • Exchanges
        • should be global across workflows
        • type direct - in panorama
        • we want them to be type -> topic instead
      • Queue
        • in panorama different queues for each workflows
      • Routing Keys
        • the routing key should be based on stampede event names
      • Events populated
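
The two endpoint URL forms listed above can be told apart with a small parser. The following is a minimal sketch under stated assumptions, not the monitord implementation: the function name and the exchange name used in the usage note are hypothetical, and the fallback to the "pegasus" virtual host mirrors the hardcoded value mentioned in the notes.

```python
from urllib.parse import urlsplit

DEFAULT_VHOST = "pegasus"  # per the notes, monitord currently hardcodes this


def parse_amqp_url(url):
    """Break an AMQP endpoint URL into its parts.

    Accepts both forms from the notes:
      amqp://[user:pass@]host[:port]/<exchange_name>
      amqp://[user:pass@]host[:port]/<virtualhost>/<exchange_name>
    """
    parts = urlsplit(url)
    if parts.scheme != "amqp":
        raise ValueError("expected an amqp:// URL, got %r" % url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 1:
        vhost, exchange = DEFAULT_VHOST, segments[0]
    elif len(segments) == 2:
        vhost, exchange = segments
    else:
        raise ValueError("expected /<exchange> or /<vhost>/<exchange>")
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the port is omitted
        "virtual_host": vhost,
        "exchange": exchange,
    }
```

For example, parse_amqp_url("amqp://amqp.isi.edu/monitoring") would yield the virtual host "pegasus" and the exchange "monitoring", while a two-segment path sets both explicitly.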

February 2018

February 23rd, 2018

Eliminate support for Py2.6?

Python Dependencies

All - future

pegasus-service - Flask, SQLAlchemy, Flask-SQLAlchemy, Flask-Cache, pam, plex, pyOpenSSL, ordereddict

pegasus-monitord - SQLAlchemy

pegasus-analyzer - SQLAlchemy

pegasus-s3 - boto

pegasus-globus-* - globus-sdk

pegasus-init - jinja2

pegasus-metadata - argparse

pegasus-em - requests

PostgreSQL - psycopg2

MySQL - MySQL-Python OR mysqlclient


Note: Packages in green are available from yum.

February 9th, 2018

  • SWIP 
    • checksum computation will be implemented in pegasus-transfer. 
      • allows us to handle the case where the input files don't have checksums in the RC
    • integrity checks are disabled now for files that dont have checksums in the RC
    • dial knob
  • Tests
    • seem to be slow
    • bamboo could be moved to the new server
    • storage constraint test
  • Lizard FS
    • Mats will give an update next time around
  • Servers
    • Trying to do two servers
    • If we buy one server
      • Buy a storage server. That is Mats preference.
      • SoyKB workflow has
    • Compute 
      • we will get a compute server first. 
    • We should figure out the server and put in the request soon, and done by Feb end
  • LSST
    • Tom Glanzman? 
    • We will touch base on Monday with Tom and Nersc folks
  • Office Hours today
    • have a presentation on containers
    • will upload on the website
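
The SWIP behaviour noted above (compute a checksum at transfer time, and skip the integrity check when the Replica Catalog has no checksum for a file) can be illustrated with a short sketch. It is not the pegasus-transfer code: the function names are hypothetical, and SHA-256 is an assumption since the notes do not name the algorithm.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path, expected=None):
    """Return 'skipped', 'ok', or 'mismatch'.

    expected is the checksum recorded in the catalog, or None when the
    catalog has no checksum for this file (the check is then skipped).
    """
    if expected is None:
        return "skipped"
    return "ok" if sha256_of(path) == expected.lower() else "mismatch"
```

Computing the digest during the transfer, rather than requiring it up front, is what would let files without a catalog checksum gain one for later checks.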

January 2018

January 12th, 2018

  • AWS Batch
    • seems to be running in karan's account.
    • update documentation about aws batch
  • Pegasus 4.8.1 Release
    • up to Mats whether we should tag or not.
  • Pegasus Office Hours
    • Rafael will look up a new name
    • Container Presentation
      • Talk about containers
      • Blue Jeans 
    • Advertising avenues
      • XSEDE workflows list
      • OSG List 

December 2017

December 1st, 2017

  • AWS Batch
    • Client done. still have to figure out about stdout and stderr
    • maybe we should have batch push the files and control where the jobs go in
    • also maybe each file should go to its own stdout stderr
  • Metrics for SWIP
    • Stampede
    • Metrics Server
    • Elastic Search
  • Rajiv working on changing the salt configuration
  • Model Integration with Wings

November 2017

November 10th, 2017

  • Pegasus
    • AWS Batch
      • checked in stuff
      • jars checked in aws sub directory in the jars folder.  pegasus-config classpath is updated accordingly
    • Bamboo builds
      • change in how users are handled
      • rajiv and mats worked on changing the salt configuration for the various machines
        • the major part changed was how the users are handled
        • the bamboo user got messed up and uid's were mismatching on the filesystem
        • main group for people unix accounts should be pegasus for everybody
        • only project users will have access to VM's for a particular project
    • Stewie Rebuild
      • move off stewie. the main OS needs to be updated
      • panorama
        • Rafael and George will create a VM for panorama
          • CENTOS 7
            • mats will help George create VM
          • Ashwin consumers from Influx DB
      • mysql server
        • Pegasus metrics server
    • JSON vs YAML
      • initial impressions seem to favor yaml
        • YAML does have benefit of including comments
        • also YAML , JSON will result in additional lines
    • templates for site catalogs
    • LSST
      • mats will update documentation for pyglidein 
      • to work with condor pool passwords thing
      • also will take mike site catalog to update NERSC entries
    • tests
      • rosetta and montage appear working again. not clear what triggered errors in first place
  • SC Next week
    • Rafael and Karan are away
  • AWS workshop for LIGO
  • George Panorama work
    • Dakota ends up launching multiple Pegasus workflows based on its gradient functions
    • using ensemble manager to do multiple runs 
    • George will check in dakota test case and example
      • pick one approach and update documentation
    • SWIP Demo
    • think about merging stuff from panorama back to production branch
  • work with Ian Foster and Raj Kettimuthu on globus online
    • do multi site run
  • Tudo
    • working on insitu
    • data spaces approach to have staging area
    • tudo wrote sample applications
    • evaluating on CORI using shared memory
    • burst buffers cannot be used
  • Ashwin
    • analyzes influx db data
    • using statistical learning
    • python pandas library

November 3rd, 2017

  • Pegasus 4.8.1 release
    • 3 bugs in worker package staging.
    • pegasus-transfer PYTHONHOME unset does not work
    • hierarchical workflow handling.
      • to be discussed tomorrow
  • AWS Batch
    • need to check in changes.
    • need to add options for the client and do error checking.
    • still need to figure out how to integrate in pegasus

September 2017

September 15th, 2017

  • Pegasus development
    • Dashboard
      • LSST might want it running out of a directory other than $HOME/.pegasus 
      • No plans to tackle it right now. Requirements are vague, and it is a catch-22 situation
    • Python problem with Pegasus install
      • DAX3 problem does not work.
      • Could not be recreated
    • PyPI account should be disabled
      • PyPI has a 4.3 pegasus package
      • we should remove it
    • The jobname with dagman not allowing . is fixed
  • LIGO
    • Heard from Duncan. Tried out metadata stuff
  • Another person at NERSC that is interested in running Condor
  • AWS Batch
    • done initial development.
    • how to retrieve logs etc.

September 8th, 2017

  • Pegasus 4.8.0 Release
    • went out this week
    • documentation
    • pyglidein
      • out of icecube
      • mats added a section in the documentation
        • pretty neat once it is setup
        • and works really well on machines with two factor
        • not tuned for MPI things.
        • on the submit  machine a web based python thing.
        • pegasus resource profiles will work out of the box with pyglidein
  • Releases
    • Post 4.8 Releases 
      • changes in the Debian build
        • source package has been renamed. mats removed the source part
        • changed the versioning of RPM and Debian. The dev series will have the timestamp in it.
          • pegasus-version -f also has timestamp
      • Will create a separate YUM and DEB developer repositories
        • repositories will not be signed. 
      • Mats is still playing with the setup
      • Worked a lot on Debian packaging.
  • HipChat will be upgraded to Stride
  • Mats updated JIRA today
  • Sim Center Workflows
    • Using Condor IO thing
    • for 4.8.1 we should look at the remap thing
  • SWIP Poster
    • the first review is really good
  • Docker and Singularity
    • have stuff about engineering challenges
    • But not enough usage
    • Practical Aspect
  • Von's Group SWAMP thing.
    • pegasus is part of trustworthy software thing?
  • AWS Batch
    • AWS batch thing works
  • Investigate how Dakota and Pegasus can work together
    1. Run Dakota as a job 
    2. Run Dakota on submission machine
      1. dakota calls a script that does a pegasus workflow
    3. Mix of 1 and 2.

August 2017

August 25th, 2017

  • Pegasus 4.8.0 Release
    • beta3 tagged
    • monitord replay issue for rc tables against mysql server
    • Jupyter thing
      • VM updated with Jupyter
    • Docker example application 
    • R builds with pegasus
      • for time being only brew builds have that disabled.
      • Condor update to the brew installation. 
  • Pegasus 4.9 Roadmap
    • SWIP 
      • lay out the changes
        • prioritize stuff for production readiness
        • the knob for integrity. 
        • get into transfers.
        • signing stuff on the backburner.
      • chaos monkey tests
    • metadata things
    • aws batch support
  • Pegasus Tutorial
    • George felt that Pegasus tutorial was a bit too easy.
    • it should be maybe more interactive. get the user to develop a new workflow
  • Tudo will pick up Decaf work
  • Dataspaces
    • do data management
  • Ashwin will work on deep learning on panorama
    • use tensor flow
  • Dakota
    • ini file . runs simulation and converges simulation points
    • George will be working on it
    • has a checkpointing facility

August 18th, 2017

  • mats found a new hydrology user in boulder
    • based at Boulder
    • there was a magpie presentation there. 
    • mats did a hosted ce tutorial
  • 4.8.0beta2 release
    • tagged and sent it out. 
  • monitord workflow and read permissions creation
    • should only happen when the database is created.
    • ~/.pegasus directory should be 755
  • dashboard errors
    • rajiv should traverse the directory in the dashboard.
  • LSST
    • cleanup issue
      • mats and karan agree on it, that it is bad application
      • we should reply to it. 
      • the wrapper should copy the file and launch the job
  • source a setup script for jobs
    • has to be generically done
  • registration jobs shell expansion
    • we should not do getEnv=True
  • testing repo
    • stuart from LIGO asked for it.
  • BOSCO
    • we have the examples updated
  • Karan will remind Eliu about LIGO and Bluewaters
  • Slick Jupyter Demos
    • Started up VM's
  • Jupyter tutorial
    • should be integrated into the VM

August 11th, 2017

  • Bamboo is finally green
  • we will do a Pegasus RC1. actually a beta since we still want to address some issues.
  • Rajiv fixed the build with python crypto issues
    • pyopen-ssl was updated during 4.7.x series
    • we should package only things whose versions we are not sensitive to
    • so right now pyopenssl is removed from binary builds, and all associated dependencies were removed.
  • New throttling things.
    • number of jobs scale with the size of the workflows.
  • SCEC all hands meeting.
  • Documentation
    • Took a stab at the containers.
    • Rafael has to add a separate jupyter chapter
    • Karan will update the throttling docs
  • LSST
    • Mats and Karan had a call with Tom about designing a workflow for one of the production pipelines
    • Mats and Rafael had a call with the French cluster folks (Fredrique Sutter). Fredrique works for simgrid
  • Paper
    • rvGAHP paper ready for submissions
  • Suraj Poster
    • Mings pass really helped

July 2017

July 21st, 2017

  • VMs are down, so tests are slow, and cannot test the new features yet
    • Mats will send an email (or call) Derek to check with the VMs issue
  • Try to run the Montage container test on OSG
    • TODO: Reconfigure our pool (it is not flocked yet)
  • Pegasus 4.8.0
    • Bugs in the container (transformation catalog) support are fixed
    • Stage in/out nodes based on the number of computing jobs on the workflow
    • TODO: add warning for errors (size of jobs)
    • Warning for category is done
    • TODO: reference implementation of a workflow using docker (1000 Genome workflow - Rafael)
    • Jupyter: add container keyword for API

June 2017

June 23rd, 2017

  • Pegasus 4.8.0
    • Decaf
      • local universe jobs do not honor request_cpus, and jobs remain idle if they ask for multiple CPUs
        • karan will update pegasus to remove the request_ parameters from the local universe jobs
    • Steven Clark
      • Pegasus build issue is related to python 3 compatibility in the DAX API
  • LIGO 
    • Eliu plans to run on Bluewaters
    • we should confirm that he only wants to run on bluewaters.
    • they have sucky performance of getting data to the compute nodes in bluewaters.
    • set the schedd start date

  • NERSC
    • Karan will do a test setup there.

  • Pegasus Builds
    • failed because of Debian version upgrades to build tools
    • setuptools in python complains about the pegasus 4.8.0-dev version

June 9th, 2017

  • Pegasus 4.7.5
    • pegasus-rc-client bug fix is done
    • 4.7.5 and 4.8.0 together
  • Pegasus 4.8 release
    • docker stuff is complete
      • docker tests added are green
    • karan will work on singularity next week.
    • LIGO reports pegasus lite jobs filling up /tmp. Karan will check with LIGO whether any environment is set.
    • rafael will update his api to make it consistent with the container format
    • also will add a bamboo example.
  • DECAF  integration
    • karan has an idea about it.

June 2nd, 2017

  • Pegasus 4.7.5
    • pegasus-rc-client bug fix to be done
  • Jupyter
    • rafael will be working on it during June
  • For 4.8.0 
    • container 
      • docker works in nonsharedfs right now. 
      • work on singularity support.
      • clustering: clustered jobs can only refer to one container
      • symlinks -  for 4.8.0 they are disabled. 
    • container sharedfs example
      • we have pegasus-lite with sharedfs. automatic translation of file URLs
    • transfer refiner
    • notification email updates
      • mats updated default notification scripts. will generate svg files
      • at end of workflow generate notifications that have statistics
        • monitord needs to run the remaining notifications after the workflow is done.
  • makeflow integration
    • limitations for Pegasus generating makeflow descriptions
      • makeflow model 
        • all files have to be on the submit host
        • how do we translate auxiliary jobs to a makeflow description
          • tyson at arizona. 
          • add new transfer jobs
          • add new credentials
          • no postscripts there
        • monitoring 
          • won't work with monitoring
          • write a new monitord.
      • maybe do the opposite translation?
      • what would be useful is integrating Work Queue with our own DAGMan manager.

May 2017

May 12th, 2017

  • auto scaling of stage out and stage in jobs
    • 4.8 transfer refiner will be Cluster by default.
    • auto-computation of number of stage in, stage out and cleanup jobs
      • defaults should be computed based on number of jobs at a level.
      • use a ratio or step function.
      • come up with ratio ranges for auto determination
        • 1:5 for number of jobs < 10K (20%)
        • 1:20 for number of jobs > 20K (5%)
      • will create a JIRA item for this
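
The ratio ranges above can be sketched as a small step function. This is only an illustration and not the Pegasus implementation: the function name and the table layout are hypothetical, and because the notes give no ratio for the range between 10K and 20K jobs, the 1:10 step used there is an assumption made for this sketch.

```python
import math

# Each entry: (exclusive upper bound on compute jobs at a level,
#              compute jobs per transfer job).
# The 1:5 ratio comes from the notes. The 1:10 step for the 10K-20K
# range is NOT in the notes; it is an assumption made for this sketch.
RATIO_STEPS = [
    (10_000, 5),
    (20_001, 10),
]

# 1:20 for more than 20K jobs, as stated in the notes.
DEFAULT_RATIO = 20


def transfer_jobs_for_level(num_compute_jobs: int) -> int:
    """Return how many stage-in, stage-out, or cleanup jobs to create
    for a workflow level that holds num_compute_jobs compute jobs."""
    if num_compute_jobs <= 0:
        return 0
    ratio = DEFAULT_RATIO
    for upper_bound, step_ratio in RATIO_STEPS:
        if num_compute_jobs < upper_bound:
            ratio = step_ratio
            break
    # Round up so that even a tiny level gets one transfer job.
    return math.ceil(num_compute_jobs / ratio)
```

A step function like this is discontinuous at the boundaries (a level just under 10K jobs gets more transfer jobs than a level just over it), which is one reason the ranges still need to be worked out.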

  • container stuff
    • close to having one example running
    • have not figured clustering jobs out yet.
    • mats agrees with the approach now. pegasus lite invokes the docker run commands.

  • integrity stuff
    • will make slides
    • be specific about what we have done.
    • we give them an option of running synthetic stuff
    • For 
    • also define the best-effort part.
      • strict, off, minimal, best effort
    • how do we handle case where SHA exists.

  • WDL
    • Workflow Description Language
      • WDL is JSON based
      • has a template approach with variable substitution 

  • AWS Cleanup
    • need to delete snapshots and clean up VMs

March 2017

March 17th, 2017

  • monitord stdout and stderr missing 
  • the VARS one. just expose the variable. 
  • SCEC issue
    • job managers per resource
    • got fixed by one job manager per job
    • BOSCO works partly. 
  • containers call from yesterday
  • metadata 
    • metadata population in postscripts
    • move metadata population to the postscripts.

March 10th, 2017

March 3rd, 2017

  • Pegasus 4.7.4 Release
    • sent out the release
    • we did a ligo fix yesterday to pegasus transfer
  • mats osg gem
    • workflow did not finish
      • pegasus-exitcode has a shortcut for a regex
        • make it more strict. whether to trigger failure in pegasus-exitcode
        • revisit how metadata is populated
        • trigger failure for missing records. 
  • SCEC RC client issue
    • Rafael will look into it for pegasus-rc-client
  • containers support
    • containers on a pause right now.
  • Webinar
    • let's try to schedule one for the end of April
    • BlueJeans will be an option
    • topic will cover new features in 4.8.0

February 2017

February 24th, 2017

  • Pegasus 4.7.4 Release
    • we will tag today. 
    • there is a potential monitord bug that happens on sub workflow retries only in live mode, which Karan is unable to trace
  • containers support
    • pegasus lite launches docker wrap
      • or the other way around. because worker package has to be installed in the container in some cases
        • so double install
    • Clustered jobs 
      • we want at most one container to be used by a clustered job.
  • monitord performance
    • on OSG Connect there is a difference in replay performance between 4.6 and 4.7
  • monitord.log has errors indicating unable to read .out .err files. 
    • we think it is a race between DAGMan and the filesystem

February 17th, 2017

  • Pegasus 4.7.4 Release
    • targeted for next week. 
    • LIGO ran into a prescript issue
      • pegasus lite deleted the worker package in the workflow submit directory
        • only triggered when there was a subsequent compute job.
  • new transformation catalog format 
  • containers
    • open issue whether docker wrapper launches pegasus lite 
    • or the other way around

February 10th, 2017

  • Pegasus 4.7.3 Release
    • SCEC has issue with pegasus-db-admin 
      • mysqldump times out when updating their replica catalog
    • Database TC
      • remove support for Database TC
  • Stewie and fisheye upgrades
    • fisheye upgrade
      • Mats agreed to do the upgrade
    • stewie runs debian 7
      • we need to upgrade it sooner or later.
      • runs GridFTP and mysql 
      • RabbitMQ is running there
      • MongoDB is running there
      • Catalog dependencies on stewie
    • 5K limit for a new server
  • OSG All Hands Meeting
    • looks like there will be no tutorial
    • lots of pegasus users coming there
  • Containers Support
    • pegasus lite invokes the docker wrap. 
    • singularity support will be required.
    • container modes 
      • should we support docker definition file
        • do we build on the worker nodes?
      • pull in  an existing docker image from the hub
        • on the staging site
      • whether we should unload an image or not
        • we should try and cleanup
      • credential renaming has to be worked out
    • Transformation Catalog
      • how to represent container dependency in the transformation catalog

February 3rd, 2017

  • Pegasus 4.7.3 Release
    • we tag later today or first thing monday
    • waiting for scott to reply
  • Jupyter Notebook
    • in general, the Jupyter interactive interface closes if you close the tab
    • in our case it does not affect us, since we invoke pegasus-plan at the server end
    • Vicky has a workflow out of Panorama that she has in Jupyter as a set of instructions
  • Containers
    • karan did some exploration of docker containers via HTCondor
    • by default, processes in a docker container run as root.
      • means output files are written out as root
    • also the containers need to be shipped around.
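
One common way around the root-owned output files noted above is to pass the submitting user's uid and gid to docker run via its --user option. The sketch below only builds such a command line. It is not how Pegasus or HTCondor launch containers; the function name, the /scratch mount point, and the image name in the usage line are hypothetical.

```python
import os


def docker_run_command(image, job_argv, workdir, uid=None, gid=None):
    """Build a `docker run` argument list that runs job_argv as a
    non-root user so output files in workdir are not owned by root.

    The /scratch mount point inside the container is an arbitrary
    choice made for this sketch.
    """
    # Default to the identity of the calling process.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",            # run as this uid:gid, not root
        "--volume", f"{workdir}:/scratch",   # expose the job directory
        "--workdir", "/scratch",             # start the job inside it
        image,
        *job_argv,
    ]


# Hypothetical usage: image name and job command are placeholders.
cmd = docker_run_command("example/image:latest", ["ls", "-l"], "/tmp/job",
                         uid=1000, gid=1000)
```

The list can be handed to subprocess.run as is. Whether the image actually works under an arbitrary uid depends on how it was built, so this is something to test per image.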

January 2017

January 27th, 2017

...