April 2019

April 12th 2019

  • Pegasus 5.0
    • Site Catalog Conversion to YAML
      • Mukund is mostly done
      • pushed out his changes
      • trying to make the tests green
    • Checkpointing changes to accommodate LIGO's use of the vanilla universe
      • Karan and Mats will explore and see if it is possible
      • cumulative stdout|stderr
        • what about time and duration values
        • since there is no DAG node retry and the job just goes into HELD state
    • Composite Events
      • Kibana dashboard needs to be updated
      • dropping __ in the event names
      • George wants the AMQP library updated 
        • Will create a JIRA item
    • Office Hours video
      • Karan will work with Jasmine to upload the video
  • Papers
    • RACE Paper submitted last week
    • PEARC Paper this week
  • Proposals
    • Army Research
      • enabling in-situ support for exascale
      • linked with what Tu is doing
    • SCEC Proposal Submitted
      • have a good chance
    • Exascale one with Michigan
      • the call will come out soon
    • Ewa, Rafael and Deborah
      • NSF GCR Proposal
      • Modelling wildfires
      • Has Price School input and also Deborah's postdoc
  • EScience
    • Pegasus Tutorial Proposal
    • May 6, 2019: Tutorial Proposal Deadline
    • Also trying for the workflow comparison paper
    • Dynamo paper by George
  • Pegasus connect discussion
    • tabled it for later when Mats is around
  • HTCondor Week
    • Karan will be doing a Pegasus talk and Pegasus workshop
  • Pegasus OLCF Poster
    • combine the panda poster
    • can also submit to EScience
  • Ryan's work
    • Loic is moving pachyderm setup to AWS
  • Loic, Rafael and Tu are working on a paper for Cluster
  • Software X

March 2019

March 29th 2019

  • 4.9.1 Release
    • done and working on 4.9.2
  • Site Catalog Conversion to YAML
    • Mukund is working on it
    • I still need to look at the Bamboo tests
      • Bamboo is failing because of the Condor mount-scratch setting
      • we also have to fix this in Pegasus, so that it fails on credentials in /tmp
      • run condor_config_val on the key and check whether /tmp is in there
      • mainly affects users that use x509
      • LIGO has also tripped over it, both with and without Pegasus
  • Condor vanilla checkpointing
    • karan asked him about what he is trying to do
  • composite events 
    • check for keys with same values
    • also do we need to pad extra keys for all events?
  • Extensions to Jupyter Integration
  • Pegasus Connect
    • will discuss on whiteboard on April 12th
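
The /tmp credential check from the Site Catalog item above could be sketched roughly as follows. This is only an illustration under stated assumptions: the notes do not name the key, so MOUNT_UNDER_SCRATCH is a guess, the value is assumed to be a comma- or space-separated list of paths, and both helper names are hypothetical.

```python
import subprocess


def lists_tmp(config_value):
    """Return True if /tmp appears among the directories in a config value."""
    # Assumption: the value is a comma- and/or space-separated list of paths,
    # possibly wrapped in double quotes.
    entries = config_value.replace(",", " ").split()
    return any(entry.strip('"').rstrip("/") == "/tmp" for entry in entries)


def credential_is_hidden(credential_path, key="MOUNT_UNDER_SCRATCH"):
    """Return True if a credential sits in /tmp and the key also lists /tmp."""
    # `condor_config_val <key>` prints the configured value of that key.
    # The key name used here is an assumption, not taken from the notes.
    result = subprocess.run(
        ["condor_config_val", key], capture_output=True, text=True
    )
    if result.returncode != 0:
        return False  # key is not defined, so nothing is remapped
    return credential_path.startswith("/tmp/") and lists_tmp(result.stdout)
```

A planner-side check could call credential_is_hidden() on the x509 proxy path and fail early with a clear message instead of letting the job fail on the worker node.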


March 1st 2019

  • 4.9.1 Release
  • Office Hours
    • On Friday March 22nd on real time monitoring
  • transformation catalog for 5.0
    • Mukund will work on it next
  • EScience?
    • Paper
  • pegasus-exitcode test
    • success message not parsed correctly  
  • Programmer
    • will interview the 

February 2019

February 22nd 2019

  • 4.9.1 Release
    • Pending Issues
      • https://jira.isi.edu/projects/PM/versions/11891
      • This raises the larger issue of how long we want to support external packages.
        There are some packages we need to ship because of worker package dependencies.
        Consensus: we remove the MySQL Python externals package for 4.9.1 and 5.0.0, and also remove the dependencies from our deb and RPM builds.

      • Transfers within containers
        • We are only going to transfer from within the container till people complain
        • George Papadimitriou will add to the documentation.
      • non-ASCII encoding in the stdout
    • Support HPSS storage
  • Office Hours
    • George on real time monitoring.
      • Date?
  • EScience?
    • Paper
    • Tutorial submission

February 1st 2019

  • 4.9.1 Release
    • Non-ASCII encoding breaks parsing of monitoring events. Monitoring should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
      • Karan will fix this
  • New TC Format
  • Shifter Support in Pegasus
    • is in 4.9 branch
  • Pegasus Annual Report
    • will be working on it in coming weeks
    • will ask for input
    • next year's report will be tricky in terms of effort allocation
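
A minimal sketch of the behaviour asked for in the 4.9.1 encoding item above: keep populating stdout and log a warning rather than abort the parse. The function name and logger name are hypothetical, and decoding as UTF-8 with replacement characters is an assumption, not necessarily the fix that was made.

```python
import logging

logger = logging.getLogger("monitoring-sketch")  # hypothetical logger name


def decode_job_stdout(raw, job_id):
    """Decode captured stdout without letting bad bytes abort the parse."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Log a warning, then fall back to a lossy decode so the database
        # still gets a stdout value for this job.
        logger.warning(
            "job %s: undecodable byte in stdout at offset %d, replacing it",
            job_id,
            err.start,
        )
        return raw.decode("utf-8", errors="replace")
```

Bytes that cannot be decoded show up as U+FFFD in the stored text, which keeps the rest of the output readable.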

January 2019

January 25th 2019

  • 4.9.1 Release
    • Non-ASCII encoding breaks parsing of monitoring events. Monitoring should keep the population working and log a warning.
      • but we should ensure that stdout in database still gets populated
  • YAML format for the TC
    • the line numbers should be mentioned in the errors
  • GitHub commits don't trigger bamboo builds right now
    • move to webhooks?
    • Slack token in bamboo.yml
      • Mats will look into it further
  • SCEC for HPC Transfer certificate issue
    • messed-up Globus Online certificates caused the hpc-transfer issue
  • Data Storage at NERSC
    • almost full
  • Singularity container with the entry point.
    • docker → singularity container conversion does not add the entry point.

January 18th 2019

  • 4.9.1
    • container execution
      • data transfers happen within the container
      • python3 issue
      • vague rules to discover what python to use
    • Singularity Hub URLs updated
      • Documentation and tutorials need to be updated
      • montage examples
      • python stuff: create JIRA item
    • LIGO pull requests
      • Build pull request
      • PAM module
      • subprocess package thing
      • also related to Python3 movement
  • Transformation Catalog Implementation
  • Astro Py
  • Shifter support at NERSC
  • Panda Integration
  • XENONnT
    • Rucio data pull in
    • fetching data might be easier
  • Journal Paper
    • need to write something about containers

December 2018

December 13th, 2018

  • Pegasus 4.9.1 release
    • local site catalog entry creation
      • based on the pegasus version on the submit host
    • encoding issue in the stdout.
  • Pegasus 5.0 Release
    • TC yaml implementation
      • Mukund will create a YAML schema compatible with the TC
    • backwards compatibility
      • case by case basis
      • definitely for
        • catalogs
        • dax 
        • pegasus-transfer
  • SWIP Paper
    • we are in good shape
  • Titan
    • under the PBS batch gahp.
  • ZTF
    • the pipeline is based on docker-compose
    • Peter will visit ISI with postdoc Danny in January
  • Tutorial at TACC
    • Karan has updated pegasus-init to work on wrangler
    • will update the tutorial notes accordingly
  • OLCF accounts
    • make sure they work
    • make sure Karan and Mats can log in

November 2018

Nov 29th, 2018

  • Ryan
    • working on comparison paper with George on workflow systems
    • Mats and Karan shared neon meeting notes with Ryan
  • Pegasus 4.9.1 release
    • Due at the end of December
    • potential issue in monitord in reference to hierarchical organization of submit directories
    • pegasus-submitdir
  • ADASS Paper
    • due tomorrow
    • need to add information about sample run
  • SWIP paper
    • mats and karan will work on it tomorrow afternoon.
    • cull out sections
    • add information about updated monitoring in 4.9
  • OLCF Kubernetes 
    • Condor is installed and configured as root
    • George tried pointing the Condor log directory to Lustre, as Condor in the container has to run as a user, not as root
    • LOG_DIR should be /tmp
    • volumes can be attached to container to contain workflows etc
  • Dynamo 
    • Do dynamic scheduling
    • George thinking of using flocking
    • similar to what is done in OSG
    • non-sharedfs deployments should work

Nov 1st, 2018

  • Pegasus 4.9.0 and 4.8.5 Released
    • We released it this week.
  • Pegasus Business Card
    • Advocate for job postings. 
      • Postdoc options
      • Programmers
      • pegasus.isi.edu/jobs
    • We should take to conferences with us
  • Pegasus Java 8 dependence in RPM
    • there is a disconnect between RPM and common.sh
  • ADASS
    • Karan working on a wlpipe demo example
  • New Student
    • Mukund 
  • Duncan started using 4.9.0 and has updated PyCBC to use Singularity
    • changed our container execution model
    • all transfers done within the container now.

October 2018

Oct 12th, 2018

  • Rescheduling meetings
    • New time is Thursdays 2PM starting from last week of October
  • DAX API reporting
    • Perl DAX API - Rajiv
  • Atlas visit
    • Wednesday we have Scientific Computing Seminar
      • Will involve writing a Pegasus code generator
      • Panda is second biggest after Condor on OSG
    • Thursday 
      • Karan and George will be there.
      • Mats might be available remotely
  • 4.9.0 Release
    • Mats preference is to skip the beta tag
    • Aim for the full release
    • Documentation freeze on Oct 26th
    • Try and do the builds over the weekend
  • Duncan container usecase
    • CVMFS-hosted container images
  • Demo repository
    • panorama data and some runs from exogeni / nersc
    • Mats has two new Elasticsearch VMs, which are part of the Elasticsearch cluster
    • the data on these VMs is also backed up

Oct 5th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll

September 2018

September 28th, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
      • Karan will circulate a doodle poll
  • Pegasus 4.9.0 Release
    • transformation selection issue
      • Karan has not been able to recreate it yet.
      • will look into it more today
    • docker singularity pulls
    • container symlink 
    • deprecate APIs
      • modify DAX generators to indicate the version / DAX API used.
      • will look into ways on how to do it
        • one way is workflow metadata attributes
        • second is attribute to ADAG object.
      • Rajiv will check how it gets stored in the metrics server
  • ADASS
    • will try and do a poster with Mike at ADASS
    • deadline is Oct 8th

September 21st, 2018

  • Rescheduling meetings
    • Either Tuesday or Thursdays
  • Pegasus 4.9 release
    • integrity error reporting
      • pegasus-statistics reporting information about integrity errors
      • the unicorn dashboard for internal swip purposes
        • errors are appearing in the stream
        • more brainstorming required. the data is there
        • not clear whether to use grafana or kibana
          • does not have drill down functionality
          • mix of production and test workflows
          • create different queues in AMQP exchanges
    • container mount point support
      • Karan is close to having that implemented
    • transferring outputs to multiple locations
      • lets say one for portal and the other for 
      • list of output sites
      • good feature to add for 4.9.1
      • update --output-site option to pegasus-plan
    • pull docker images for singularity runs
      • we should do for 4.9.0
      • planner needs to tell pegasus-transfer an extra attribute. 
        • add a type attribute
    • Papers 
      • Github private papers repo
    • Deprecate stuff
      • perl api
      • old catalog formats
      • pegasus-plots
    • Hiring

August 2018

August 24th, 2018

  • Pegasus 4.8.4 Release
    • when are we releasing?
      • next week, before Mats goes on vacation
  • error tagging
    • update stampede schema to add a table called tags
    • will allow us to capture number of integrity errors
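
As a rough illustration of the error tagging idea above, the sketch below adds a tags table to an in-memory SQLite database and counts integrity errors per workflow. The column names, the tag name and the helper functions are all hypothetical; the notes only say that a table called tags would be added to the stampede schema.

```python
import sqlite3


def create_tags_table(conn):
    """Create a hypothetical tags table; the columns are illustrative only."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tags (
            tag_id      INTEGER PRIMARY KEY,
            wf_id       INTEGER NOT NULL,  -- workflow the tagged job belongs to
            job_inst_id INTEGER NOT NULL,  -- job instance being tagged
            name        TEXT    NOT NULL,  -- tag name, e.g. 'integrity.error'
            occurrences INTEGER NOT NULL DEFAULT 1
        )
        """
    )


def tag_job(conn, wf_id, job_inst_id, name, occurrences=1):
    """Record that a job instance hit a tagged condition some number of times."""
    conn.execute(
        "INSERT INTO tags (wf_id, job_inst_id, name, occurrences) "
        "VALUES (?, ?, ?, ?)",
        (wf_id, job_inst_id, name, occurrences),
    )


def count_tagged(conn, wf_id, name):
    """Total occurrences of a tag across all jobs of one workflow."""
    row = conn.execute(
        "SELECT COALESCE(SUM(occurrences), 0) FROM tags "
        "WHERE wf_id = ? AND name = ?",
        (wf_id, name),
    ).fetchone()
    return row[0]
```

With a table like this, a statistics tool could report the number of integrity errors for a workflow with a single aggregate query.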

August 17th, 2018

  • Pegasus 4.8.4 Release
    • RPM fix ? 
    • mats will manually verify
    • Karan should follow up with Stuart
  • AMQP filtering
    • we are working on having filtering built into monitord
    • nepomunk already has 33 errors identified
    • the DB connection code, pegasus-db-admin and other tools need to pass properties with the pegasus property prefix stripped off
  • SWIP Paper
    • one rejection seems harsh
    • we can try for HPDC also
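
The prefix stripping mentioned under AMQP filtering above might look something like this. It is a sketch only: the function name is hypothetical, the exact prefix is not given in the notes (so it is a parameter), and the property names in the usage note are made up.

```python
def strip_property_prefix(properties, prefix="pegasus."):
    """Return the properties under `prefix`, keyed without the prefix.

    Properties that do not start with the prefix are left out.
    """
    return {
        key[len(prefix):]: value
        for key, value in properties.items()
        if key.startswith(prefix)
    }
```

For example, {"pegasus.example.db.timeout": "30", "user.name": "x"} would be passed on as {"example.db.timeout": "30"}.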

August 3, 2018

  • Pegasus 4.8.3 Release
  • SLURM
    • Design Safe / TACC on Wrangler headnode
    • Nextflow has integration with SLURM and everything can be installed in user space
  • PMC unit tests are broken
    • let's fix the tests
  • Pegasus 4.9 release
    • more real life runs
    • nepomunk against ceph-s3 from one of uchicago machines
    • we need to get stats reported for integrity errors
      • larger issue of error classification
  • ADASS Tutorial
    • we got into second round
      • add on exercise to run montage in the end.
  • LIGO
    • Bruce's group at AEI Hannover has left LSC
  • Infrastructure
    • HipChat mess
      • should we move to ISI Slack?
    • Public Chat feature
      • Some clients for HipChat
    • Get a free channel from Slack
      • for all HipChat rooms
      • what about ISI Slack?
    • Github removal of old integrations
  • MINT Meeting
    • went well overall 
    • issue of scoping

July 2018

July 27th, 2018

  • Pegasus 4.8.3 Release
    • VM Tutorial
      • will update pegasus-init requirements to get it working
      • main tutorial chapter will be updated for 4.9
        • because the tutorial-based container may not work then
    • change how docker scripts set environment
    • SCEC database loading error
  • Failing Tests
    • Issue in updates to the dashboard database
  • Panorama Paper
    • agreed on a re-organization

June 2018

June 29th, 2018

  • Pegasus
    • 4.8.3 needs to be released because of singularity launching options
      • will wait till tutorial is updated. 
      • karan will update pegasus-init with population modeling or povray option
    • 4.9
      • pegasus-statistics updated with integrity metrics
      • how to flag job errors because of integrity
        • need to figure out logic
        • value add proposition
        • maybe we should value type in the pegasus lite 
      • need to implement the integrity dial
    • Start creating default local site entries to execute without local site
  • ADASS Tutorial
    • Will submit today 
    • Google doc shared

June 22nd, 2018

  • Pegasus
    • SWIP paper submitted to escience
    • 4.8 montage tests failing
    • changes for integrity metrics in pegasus-transfer
    • updated monitord to parse events from various sources like pegasus lite output
    • Mats pointed out a bug in monitord
  • LIGO
    • pip for python source package
      • update dependencies for latest packages, like pyOpenSSL
      • install in the pip repository
    • pegasus-analyzer
    • interested in swip and containers.
  • SCEC CSEP
    • will use containers
    • run on Comet
  • 1000 genome workflow or use chimerica workflow
  • ADASS Tutorial
    • montage ? 
    • probably pycbc is also submitting a proposal

June 8th, 2018

  • Scott Replica Catalog issue
    • Replica Catalog deletes take a long time
  • Bamboo
    • Bamboo emails are no longer received, so we don't find out about workflow plan failures
  • SWIP
    • monitord integrity changes: population of data from ks records is working now
    • we still need to populate data from pegasus lite records and pegasus-transfer
    • pegasus-statistics needs to be updated
    • 0.1% overhead on production osg gem workflow
  • Pegasus deployment at ORNL
    • we should be doing it similar to hpc-pegasus
  • Pegasus Office Hours
    • next one in August
    • travels in July

May 2018

May 4th, 2018

  • Pegasus 4.8.2 Release done on May 3rd
  • we should consider moving user data to a separate file on pegasus-wms
  • si2 meeting updates
    • some potential new users
    • Ewa's slides were a good overview summary
    • integrity data schema changes. 
    • monitord changes need thinking

April 2018

April 6th, 2018

  • Pegasus 4.8.2 Release
    • PMC bugs
    • tutorial for usc hpc
    • no longer allow + or . in the names
  • Pegasus Report
    • Submitted for Ewa's review
  • SWIP test run
    • discovered integrity errors in the wild
    • at Colorado and University of Nebraska
      • we would not have caught it before
    • e-science paper

March 2018

March 30th, 2018

  • SWIP
    • pegasus-run issue, with wf restarting from scratch
      • because dagman rescue file is not there.
      • so should we update pegasus-run to look at the dagman.out file
        • so far we think it should be kept consistent with normal dagman behavior
        • to be discussed at Condor Week
    • mats created a Jira item for swip related statistics
    • Things remaining
      • Dials to be implemented
      • stampede changes
      • pegasus-transfer changes???
  • SC Tutorial Submission (April 16th)
    • https://sc18.supercomputing.org/submit/tutorials-submissions/
    • We should try and add exercises for containers
    • We will try for half day
      • 45 minute introduction
    • Feedback from Arizona Container Camp
      • There is interest.
    • coming up with an existing application that people understand or can relate to
      • montage - complex dax generator
      • rosetta
        • only works in nonsharedfs stuff 
        • with 
      • machine learning example?
        • with TensorFlow?
        • requires container
      • NVIDIA has a lot of examples about machine learning
        • has to be multistep
        • and at least bag of tasks
      • Ashwin is doing some TensorFlow stuff
        • on workflow.isi.edu
        • is working out of a Jupyter notebook
      • Genome sequencing workflows??
        • use the Broad GATK sequencing workflow
        • SOYKB and IRRI use GATK
        • and are huge communities
      • http://biocontainers.pro/docs/101/running-example/ 
  • Pegasus Report
    • we should resolve Jira items as we fix them
    • will be also doing cumulative statistics 
  • Pegasus Office Hours
    • Jupyter Notebooks
    • will update the example to use the namd example used for Oak Ridge
  • Panorama Stuff
    • our multiplexing part in monitord is done so far
      • however we are relying on amqp queues and routing keys for filtering
    • darshan data population
      • we need a script (pegasus-darshan), invoked in the namd wrapper script, to pull the data from darshan logs on the file system and generate ASCII output
    • Panorama.isi.edu VM
      • AMQP
      • Logstash
      • Kibana
      • Elastic Search
        • Make it do a backup every so often.
        • Warns against doing it as a permanent datastore
        • Rajiv will verify
      • Influx
    • Backups
      • CrashPlan backup for /srv and /opt in the panorama VM
  • LIGO Database locked issues
    • we need to look into the locking issues by tinkering with monitord flush intervals

March 16th, 2018

  • SWIP
    • Most of the SWIP stuff is done as far as planner changes and getting the workflows running
    • we are in a position to share something
    • To do
      • sharedfs
      • Dial implementation
      • Update monitoring
      • Paper submission for EScience
  • Pegasus Reports
    • new applications to attribute to pegasus grants
    • all of Mike Wang's work will go here
    • SCEC
    • LIGO - need to ping Duncan
  • Panorama/ Pegasus workflow endpoints
    • We seem to be going towards AMQP
      • How is AMQP going to be configured?
      • So far we have
        • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<exchange_name>
        • Online monitoring in kickstart
          • amqp://[USERNAME:PASSWORD@]amqp.isi.edu[:port]/<virtualhost>/<exchange_name>
      • Virtual Hosts
        • right now virtual host is hardcoded in monitord code. we set it to pegasus
        • global - across workflows
      • Exchanges
        • should be global across workflows
        • type direct - in panorama
        • we want them to be of type topic instead
      • Queue
        • in panorama, a different queue for each workflow
      • Routing Keys
        • the routing key should be based on stampede event names
      • Events populated
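
To make the endpoint discussion above concrete, here is a small sketch that parses both URL forms from the notes and shows how topic-style routing keys built from event names would allow filtering. The function names are hypothetical, the fallback virtual host "pegasus" mirrors the value the notes say is hardcoded in monitord, 5672 is the standard AMQP port, and the event names in the usage note are illustrative.

```python
from urllib.parse import unquote, urlsplit


def parse_amqp_url(url, default_vhost="pegasus"):
    """Split an endpoint URL into connection settings and exchange name."""
    parts = urlsplit(url)
    if parts.scheme != "amqp":
        raise ValueError("expected an amqp:// URL")
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) == 1:
        # amqp://host/<exchange_name>: fall back to the default virtual host
        vhost, exchange = default_vhost, segments[0]
    elif len(segments) == 2:
        # amqp://host/<virtualhost>/<exchange_name>
        vhost, exchange = segments
    else:
        raise ValueError("expected /<exchange> or /<virtualhost>/<exchange>")
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 5672,
        "virtual_host": vhost,
        "exchange": exchange,
    }


def topic_matches(pattern, routing_key):
    """Topic-exchange style match: '*' is one word, '#' is zero or more."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # Try letting '#' absorb 0, 1, 2, ... leading words of the key.
        return any(_match(rest, key_words[i:]) for i in range(len(key_words) + 1))
    if not key_words:
        return False
    return (head == "*" or head == key_words[0]) and _match(rest, key_words[1:])
```

With a topic exchange, a queue bound with the pattern "stampede.job_inst.#" would receive "stampede.job_inst.main.end" but not "stampede.xwf.start", which a direct exchange cannot express with a single binding.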

February 2018

February 23rd, 2018

Eliminate support for Py2.6?

Python Dependencies

All - future

...