April 2019

April 12th 2019

  • Pegasus 5.0
    • Site Catalog Conversion to YAML
      • Mukund is mostly done
      • pushed out his changes
      • trying to make the tests green
    • Checkpointing changes to accommodate LIGO use of vanilla universe
      • Karan and Mats will explore and see if it is possible
      • cumulative stdout|stderr
        • what about time and duration values
        • since there is no DAG node retry and the job just goes into the HELD state
    • Composite Events
      • Kibana dashboard needs to be updated
      • dropping __ in the event names
      • George wants the AMQP library updated 
        • Will create a JIRA item
    • Office Hours video
      • Karan will work with Jasmine to upload the video
  • Papers
    • RACE Paper submitted last week
    • PEARC Paper this week
  • Proposals
    • Army Research 
      • enabling in-situ supports for ExaScale
      • linked with what Tu is doing
    • SCEC Proposal Submitted
      • have a good chance
    • Exascale one with Michigan
      • the call will come out soon
    • Ewa, Rafael and Deborah
      • NSF GCR Proposal
      • Modelling wild fires
      • Has PRICE school input and also Deborah Post DOC
  • EScience
    • Pegasus Tutorial Proposal
    • May 6, 2019: Tutorial Proposal Deadline
    • Also trying for the workflow comparison paper
    • Dynamo paper by George
  • Pegasus connect discussion
    • tabled it for later when Mats is around
  • HTCondor Week
    • Karan will be doing a Pegasus talk and Pegasus workshop
  • Pegasus OLCF Poster
    • combine the panda poster
    • can also submit to EScience
  • Ryan's work
    • Loic is moving pachyderm setup to AWS
  • Loic Rafael and Tu are working on a paper for Cluster
  • Software X

March 2019

March 29th 2019

  • 4.9.1 Release
    • done and working on 4.9.2
  • Site Catalog Conversion to YAML
    • mukund working on it
    • I still need to look at the Bamboo tests
      • Bamboo is failing on the Condor mount scratch issue
      • we have to fix this in Pegasus as well, so that it fails on credentials in /tmp
      • run condor_config_val on the key and check whether /tmp is in there
      • mainly affects all the users that use x509
      • LIGO has also tripped over it, both with Pegasus and without Pegasus
  • Condor vanilla checkpointing
    • karan asked him about what he is trying to do
  • composite events 
    • check for keys with same values
    • also do we need to pad extra keys for all events?
  • Extensions to Jupyter Integration
  • Pegasus Connect
    • will discuss on whiteboard on April 12th
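
Sketch for the /tmp check from the Site Catalog item above. This is only an illustration: the notes do not name the configuration key, so it is left as a parameter, and the helper names config_value and mentions_tmp are made up here. It also assumes the value is a comma- or space-separated list of paths.

```python
import subprocess


def config_value(key):
    """Return what `condor_config_val <key>` prints, or None if the lookup fails."""
    try:
        result = subprocess.run(
            ["condor_config_val", key], capture_output=True, text=True
        )
    except FileNotFoundError:
        # condor_config_val is not installed on this host
        return None
    if result.returncode != 0:
        # the key is not defined in the HTCondor configuration
        return None
    return result.stdout.strip()


def mentions_tmp(value):
    """True if any entry of the (assumed) path list is /tmp or lives under /tmp."""
    entries = value.replace(",", " ").split()
    return any(e == "/tmp" or e.startswith("/tmp/") for e in entries)
```

A caller would combine the two, for example warning the user when config_value(key) is not None and mentions_tmp reports True while an x509 credential sits in /tmp.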

March 1st 2019

  • 4.9.1 Release
  • Office Hours
    • On Friday March 22nd on real time monitoring
  • transformation catalog for 5.0
    • Mukund will work on it next
  • EScience?
    • Paper
  • pegasus-exitcode test
    • success message not parsed correctly  
  • Programmer
    • will interview the 

February 2019

February 22nd 2019

  • 4.9.1 Release
    • Pending Issues
      • This raises the larger issue of how long we want to support externals packages.

        There are some packages we need to ship because of worker package dependencies.

        We will remove the MySQL Python externals package for 4.9.1 and 5.0.0,

        and also remove the dependencies from our deb and RPM builds.

      • Transfers within containers
        • We are only going to transfer from within the container till people complain
        • George Papadimitriou will add to the documentation.
      • non-ASCII encoding in the stdout
    • Support HPSS storage
  • Office Hours
    • George on real time monitoring.
      • Date?
  • EScience?
    • Paper
    • Tutorial submission

February 1st 2019

  • 4.9.1 Release
    • ASCII encoding breaks while parsing monitoring events. Monitoring should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
      • Karan will fix this
  • New TC Format
  • Shifter Support in Pegasus
    • is in 4.9 branch
  • Pegasus Annual Report
    • will be working on it in coming weeks
    • will ask for input
    • next year's report will be tricky in terms of effort allocation.
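
Sketch of the decoding behavior described under the 4.9.1 Release item above: fall back instead of failing on non-ASCII stdout, and log a warning, so the stdout can still be stored in the database. The function name, the logger name and the UTF-8 fallback are assumptions for illustration, not the actual monitoring code.

```python
import logging

logger = logging.getLogger("stdout-decode-sketch")  # hypothetical logger name


def decode_job_stdout(raw: bytes) -> str:
    """Decode captured job stdout without raising on non-ASCII bytes."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as err:
        logger.warning(
            "non-ASCII byte at offset %d in job stdout; decoding leniently",
            err.start,
        )
        # Assumption: treat the data as UTF-8 and substitute U+FFFD for
        # anything that still cannot be decoded, so population continues.
        return raw.decode("utf-8", errors="replace")
```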

January 2019

January 25th 2019

  • 4.9.1 Release
    • ASCII encoding breaks while parsing monitoring events. Monitoring should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
  • YAML format for the TC
    • the line numbers should be mentioned in the errors
  • GitHub commits don't trigger bamboo builds right now
    • move to webhooks?
    • Slack token in bamboo.yml
      • Mats will look into it further
  • SCEC for HPC Transfer certificate issue
    • Globus online certificates messed up hpc-transfer issue.
  • Data Storage at NERSC
    • almost full
  • Singularity container with the entry point.
    • docker → singularity container conversion does not add the entry point.

January 18th 2019

  • 4.9.1
    • container execution
      • data transfers happen within the container
      • python3 issue
      • vague rules to discover what python to use
    • Singularity Hub URLs updated
      • Documentation and tutorials need to be updated
      • montage examples
      • python stuff: create JIRA item
    • LIGO pull requests
      • Build pull request
      • PAM module
      • subprocess package thing
      • also related to Python3 movement
  • Transformation Catalog Implementation
  • Astro Py
  • Shifter support at NERSC
  • Panda Integration
    • Rucio data pull-in
    • fetching data might be easier
  • Journal Paper
    • need to write something about containers

December 2018

December 13th, 2018

  • Pegasus 4.9.1 release
    • local site catalog entry creation
      • based on the pegasus version on the submit host
    • encoding issue in the stdout.
  • Pegasus 5.0 Release
    • TC yaml implementation
      • mukund will create a yaml schema compatible with the TC
    • backwards compatibility
      • case by case basis
      • definitely for
        • catalogs
        • dax 
        • pegasus-transfer
  • SWIP Paper
    • we are in good shape
  • Titan
    • under the PBS batch gahp.
  • ZTF
    • the pipeline is based on docker-compose
    • peter will visit ISI with postdoc Danny in January
  • Tutorial at TACC
    • karan has updated pegasus-init to work on wrangler
    • will update the tutorial notes accordingly 
  • OLCF accounts
    • make sure they work 
    • make sure Karan and Mats can log in

November 2018

Nov 29th, 2018

  • Ryan
    • working on comparison paper with george on workflow systems
    • mats, karan shared neon meeting notes with Ryan
  • Pegasus 4.9.1 release
    • Due by the end of December
    • potential issue in monitord in reference to hierarchical organization of submit directories
    • pegasus-submitdir
  • ADASS Paper
    • due tomorrow
    • need to add information about sample run
  • SWIP paper
    • mats and karan will work on it tomorrow afternoon.
    • cull out sections
    • add information about updated monitoring in 4.9
  • OLCF Kubernetes 
    • Condor is installed and configured as root
    • George tried pointing the Condor log directory to Lustre, as Condor in the container has to run as a user, not as root
    • LOG_DIR should be /tmp
    • volumes can be attached to container to contain workflows etc
  • Dynamo 
    • Do dynamic scheduling
    • George thinking of using flocking
    • similar to what is done in OSG
    • non-sharedfs deployments should work

Nov 1st, 2018

  • Pegasus 4.9.0 and 4.8.5 Released
    • We released it this week.
  • Pegasus Business Card
    • Advocate for job postings. 
      • Postdoc options
      • Programmers
    • We should take to conferences with us
  • Pegasus JAVA 8 dependence in RPM
    • there is a disconnect between RPM and
    • Karan working on a wlpipe demo example
  • New Student
    • Mukund 
  • Duncan started using 4.9.0 and has updated pyCBC to use singularity
    • changed our container execution model
    • all transfers done within the container now.

October  2018

Oct 12th, 2018

  • Rescheduling meetings
    • New time is Thursdays 2PM starting from last week of October
  • DAX API reporting
    • Perl DAX API - Rajiv
  • Atlas visit
    • Wednesday we have Scientific Computing Seminar
      • Will involve writing a Pegasus code generator
      • Panda is second biggest after Condor on OSG
    • Thursday 
      • Karan and George will be there.
      • Mats might be available remotely
  • 4.9.0 Release
    • Mats' preference is to skip the beta tag
    • Aim for the full release
    • Documentation freeze on Oct 26th
    • Try and do the builds over the weekend
  • Duncan container usecase
    • cvmfs hosted container images
  • Demo repository
    • panorama data and some runs from exogeni / nersc
    • Mats has two new Elasticsearch VMs that are part of the Elasticsearch cluster
    • the data on these VMs is also backed up

Oct 5th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll

September 2018

September 28th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll
  • Pegasus 4.9.0 Release
    • transformation selection issue
      • karan has not been able to recreate it yet.
      • will look into it more today
    • docker singularity pulls
    • container symlink 
    • deprecate APIs
      • modify DAX generators to indicate version/ DAX API used.
      • will look into ways on how to do it
        • one way is workflow metadata attributes
        • second is attribute to ADAG object.
      • rajiv will check how it gets stored in the metrics server
    • will try and do a poster with Mike at ADASS
    • deadline is Oct 8th

September 21st, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
  • Pegasus 4.9 release
    • integrity error reporting
      • pegasus-statistics reporting information about integrity errors
      • the unicorn dashboard for internal swip purposes
        • errors are appearing in the stream
        • more brainstorming required. the data is there
        • not clear whether to use grafana or kibana
          • does not have drill down functionality
          • mix of production and test workflows
          • create different queues in AMQP exchanges
    • container mount point support
      • Karan is close to having that implemented
    • transferring outputs to multiple locations
      • lets say one for portal and the other for 
      • list of output sites
      • good feature to add for 4.9.1
      • update --output-site option to pegasus-plan
    • pull docker images for singularity runs
      • we should do for 4.9.0
      • planner needs to tell pegasus-transfer an extra attribute. 
        • add a type attribute
    • Papers 
      • Github private papers repo
    • Deprecate stuff
      • perl api
      • old catalog formats
      • pegasus-plots
    • Hiring

August 2018

August 24th, 2018

  • Pegasus 4.8.4 Release
    • when are we releasing?
      • next week, before Mats goes on vacation
  • error tagging
    • update stampede schema to add a table called tags
    • will allow us to capture number of integrity errors

August 17th, 2018

  • Pegasus 4.8.4 Release
    • RPM fix ? 
    • mats will manually verify
    • Karan should follow up with Stuart
  • AMQP filtering
    • we are working on having filtering built into monitord
    • nepomunk already has 33 errors identified
    • we need the db connection, pegasus-db-admin and other tools to pass properties with the pegasus property prefix stripped off
  • SWIP Paper
    • one reject seems to be harsh
    • we can try for HPDC also
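
A minimal sketch of the prefix stripping mentioned under AMQP filtering above. The function name, the literal prefix "pegasus." and the example keys are assumptions for illustration. The notes do not say whether properties without the prefix should be kept, so this version drops them.

```python
def strip_property_prefix(props, prefix="pegasus."):
    """Return only the properties that start with prefix, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in props.items()
        if key.startswith(prefix)
    }


# Hypothetical property names, used only to show the shape of the result.
example = {
    "pegasus.example.db.url": "sqlite:///workflow.db",
    "unrelated.key": "ignored",
}
stripped = strip_property_prefix(example)
```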

August 3, 2018

  • Pegasus 4.8.3 Release
    • Design Safe / TACC on Wrangler headnode
    • Nextflow has integration with SLURM and everything can be installed in user space
  • PMC unit tests are broken
    • let's fix the tests
  • Pegasus 4.9 release
    • more real life runs
    • nepomunk against ceph-s3 from one of uchicago machines
    • we need to get stats reported for integrity errors
      • larger issue of error classification
  • ADASS Tutorial
    • we got into second round
      • add on exercise to run montage in the end.
  • LIGO
    • Bruce group at AEI Hannover has left LSC
  • Infrastructure
    • HipChat mess
      • should we move to ISI Slack
    • Public Chat feature
      • Some clients for Hipchat
    • Get a free channel from Slack
      • for all Hipchat rooms
      • what about ISI slack?? 
    • Github removal of old integrations
  • MINT Meeting
    • went well overall 
    • issue of scoping

July 2018

July 27th, 2018

  • Pegasus 4.8.3 Release
    • VM Tutorial
      • will update pegasus-init requirements to get it working
      • main tutorial chapter will be updated for 4.9
        • because then tutorial based container may not work
    • change how docker scripts set environment
    • SCEC database loading error
  • Failing Tests
    • Issue in updates to the dashboard database
  • Panorama Paper
    • agreed on a re-organization

June 2018

June 29th, 2018

  • Pegasus
    • 4.8.3 needs to be released because of singularity launching options
      • will wait till tutorial is updated. 
      • karan will update pegasus-init with population modeling or povray option
    • 4.9
      • pegasus-statistics updated with integrity metrics
      • how to flag job errors because of integrity
        • need to figure out logic
        • value add proposition
        • maybe we should value type in the pegasus lite 
      • need to implement the integrity dial
    • Start creating default local site entries to execute without local site
  • ADASS Tutorial
    • Will submit today 
    • Google doc shared

June 22nd, 2018

  • Pegasus
    • SWIP paper submitted to escience
    • 4.8 montage tests failing
    • changes for integrity metrics in pegasus-transfer
    • updated monitord to parse events from various sources like pegasus lite output
    • Mats pointed out a bug in monitord
  • LIGO
    • pip for python source package
      • update dependencies for latest packages, like pyOpenSSL
      • install in the pip repository
    • pegasus-analyzer
    • interested in swip and containers.
    • will use containers
    • run on Comet
  • 1000 genome workflow or use chimerica workflow
  • ADASS Tutorial
    • montage ? 
    • probably pycbc is also submitting a proposal

June 8th, 2018

  • Scott Replica Catalog issue
    • Replica Catalog deletes take a long time
  • Bamboo
    • Bamboo emails are no longer received, so we don't find out about workflow plan failures
  • SWIP 
    • monitord integrity changes.  population of data from ks records working now.
    • we still need to populate data from pegasus lite records and pegasus-transfer
    • pegasus-statistics needs to be updated
    • 0.1% overhead on production osg gem workflow
  • Pegasus deployment at ORNL
    • we should be doing it similar to hpc-pegasus
  • Pegasus Office Hours
    • next one in August
    • travels in July

May 2018

May 4th, 2018

  • Pegasus 4.8.2 Release done on May 3rd
  • we should consider separating user data into a separate file on pegasus-wms
  • si2 meeting updates
    • some potential new users
    • Ewa's slides were a good overview summary
    • integrity data schema changes. 
    • monitord changes need thinking

April 2018

April 6th, 2018

  • Pegasus 4.8.2 Release
    • PMC bugs
    • tutorial for usc hpc
    • no longer allow + or . in the names
  • Pegasus Report
    • Submitted for Ewa's review
  • SWIP test run
    • discovered integrity errors in the wild
    • at colorado and university of nebraska
      • we would not have caught it before
    • e-science paper

March 2018

March 30th, 2018

  • SWIP
    • pegasus-run issue, with wf restarting from scratch
      • because dagman rescue file is not there.
      • so should we update pegasus-run to look at the dagman.out file
        • so far we think it should be kept consistent with normal dagman behavior
        • to be discussed at condor week
    • mats created a Jira item for swip related statistics
    • Things remaining
      • Dials to be implemented
      • stampede changes
      • pegasus-transfer changes???
  • SC Tutorial Submission ( April 16th) 
    • We should try and add exercises for containers
    • We will try for half day
      • 45 minute introduction
    • Feedback from Arizona Container Camp
      • There is interest.
    • coming up with an existing application that people understand or can relate to
      • montage - complex dax generator
      • rosetta
        • only works in nonsharedfs stuff 
        • with 
      • machine learning example?
        • with tensor flow?
        • requires container
      • NVIDIA has a lot of examples about machine learning
        • has to be multistep
        • and at least bag of tasks
      • Ashwin is doing some tensor flow stuff
        • on
        • is working out of  jupyter notebook
      • Genome sequencing workflows??
        • use Broad GATK sequencing workflow to use
        • SOYKB and IRRI use GATK
        • and are huge communities
  • Pegasus Report
    • we should resolve Jira items as we fix them
    • will be also doing cumulative statistics 
  • Pegasus Office Hours
    • Jupyter Notebooks
    • will update the example to use namd example used for Oakridge
  • Panorama Stuff
    • our multiplexing part in monitord done so far
      • however we are relying on amqp queues and routing keys for filtering
    • darshan data population
      • we need a script (pegasus-darshan), invoked from the namd wrapper script, that pulls the data from the darshan logs on the file system and generates an ASCII output
    • VM
      • AMQP
      • Logstash
      • Kibana
      • Elastic Search
        • Make it do a backup every so often.
        • Warns against doing it as a permanent datastore
        • Rajiv will verify
      • Influx
    • Backups
      • CRASH PLAN backup for the /srv and /opt in the panorama VM
  • LIGO Database locked issues
    • we need to look into the locking issues by tinkering with monitord flush intervals

March 16th, 2018

  • SWIP
    • Most of the SWIP stuff is done as far as planner changes and getting the workflows running
    • we are in a position to share something
    • To do
      • sharedfs
      • Dial implementation
      • Update monitoring
      • Paper submission for EScience
  • Pegasus Reports
    • new applications to attribute to pegasus grants
    • all of Mike Wang's work will go here
    • SCEC
    • LIGO - need to ping Duncan
  • Panorama/ Pegasus workflow endpoints
    • We seem to be going towards AMQP
      • How is AMQP going to be configured
      • So far we have 
        • amqp://[USERNAME:PASSWORD@][:port]/<exchange_name>
        • Online monitoring in kickstart
          • amqp://[USERNAME:PASSWORD@][:port]/<virtualhost>/<exchange_name>
      • Virtual Hosts
        • right now virtual host is hardcoded in monitord code. we set it to pegasus
        • global - across workflows
      • Exchanges
        • should be global across workflows
        • type direct - in panorama
        • we want them to be type -> topic instead
      • Queue
        • in panorama different queues for each workflows
      • Routing Keys
        • the routing key should be based on stampede event names
      • Events populated
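
Sketch of how the two endpoint forms listed above could be split into their parts. Assumptions, none of which come from the notes: the function name, the returned dictionary keys, and a hostname sitting between the credentials and the port as in a standard AMQP URI (the patterns above show no host part). The default virtual host "pegasus" is the value the notes say is hardcoded in monitord, and 5672 is the standard AMQP port.

```python
from urllib.parse import urlsplit


def parse_amqp_endpoint(url, default_vhost="pegasus"):
    """Split amqp://[user:pass@]host[:port]/[<vhost>/]<exchange> into a dict."""
    parts = urlsplit(url)
    if parts.scheme != "amqp":
        raise ValueError("not an amqp:// URL: %r" % url)

    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 1:
        # short form: only the exchange name is given
        vhost, exchange = default_vhost, segments[0]
    elif len(segments) == 2:
        # long form: virtual host followed by exchange name
        vhost, exchange = segments
    else:
        raise ValueError("expected /<exchange> or /<vhost>/<exchange>: %r" % url)

    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 5672,
        "virtual_host": vhost,
        "exchange": exchange,
    }
```

With a topic exchange, a consumer could then bind its queue with a pattern over the stampede event names used as routing keys, instead of relying on one queue per workflow.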

February 2018

February 23rd, 2018

Eliminate support for Py2.6?

Python Dependencies

All - future

pegasus-service - Flask, SQLAlchemy, Flask-SQLAlchemy, Flask-Cache, pam, plex, pyOpenSSL, ordereddict

pegasus-monitord - SQLAlchemy

pegasus-analyzer - SQLAlchemy

pegasus-s3 - boto

pegasus-globus-* - globus-sdk

pegasus-init - jinja2

pegasus-metadata - argparse

pegasus-em - requests

PostgreSQL - psycopg2

MySQL - MySQL-Python OR mysqlclient

Note: Packages in green are available from yum.

February 9th, 2018

  • SWIP 
    • checksum computation will be implemented in pegasus-transfer. 
      • allows us to handle the case where the input files don't have checksums in the RC
    • integrity checks are disabled now for files that don't have checksums in the RC
    • dial knob
  • Tests
    • seem to be slow
    • bamboo could be moved to the new server
    • storage constraint test
  • Lizard FS
    • Mats will give an update next time around
  • Servers
    • Trying to do two servers
    • If we buy one server
      • Buy a storage server. That is Mats' preference.
      • SoyKB workflow has
    • Compute 
      • we will get a compute server first. 
    • We should figure out the server and put in the request soon, and done by Feb end
  • LSST
    • Tom Glanzman? 
    • We will touch base on Monday with Tom and Nersc folks
  • Office Hours today
    • have a presentation on containers
    • will upload on the website
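
Sketch of the SWIP behavior noted above: compute a checksum at transfer time, and only run the integrity check when the Replica Catalog supplied an expected value. The function names, the choice of SHA-256 and the return convention are assumptions for illustration, not the pegasus-transfer implementation.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path, expected=None):
    """Return (actual_checksum, verdict).

    verdict is None when no expected checksum is known (check skipped),
    otherwise True or False depending on whether the digests match.
    """
    actual = sha256_of(path)
    if expected is None:
        # No checksum in the catalog: nothing to compare against, but the
        # computed value can still be recorded for later checks.
        return actual, None
    return actual, actual == expected.lower()
```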

January 2018

January 12th, 2018