April 2016

April 1st, 2016

  • Pegasus development
  • Submitted tutorial for XSEDE 16
    • will include RADICAL
    • might update tutorial with BOSCO. Mats already has BOSCO running on Comet
  • Derrick Lazaro wants to build a bigger filesystem
    • will be backed up 
    • has a commercial storage vendor in mind
    • has backup capabilities built in (block-level backup)

March 2016

March 25th, 2016

 

  • Pegasus development
    • Gideon has been working on kickstart online monitoring for panorama.
      • the lib interpose monitoring requires app code to be dynamically linked to use LD_PRELOAD
      • now kickstart has a new mode, where monitoring thread will scan the proc filesystem for all processes in resource group.
        • this approach disables the PAPI counters as they need to be retrieved from app itself
      • also is working on aggregation logic
        • complicated accounting information
      • added another process called pegasus-monitor, so it is usually pegasus-kickstart -> pegasus-monitor -> application
      • can deploy without any external dependencies.
    • 4.6.1 release
      • in april when karan comes back from PAGE meeting
    • Condor bug on schedd evicting dagman jobs
      • LIGO noticed on other submit nodes
    • mats worked with Derrick to make sure glideins work with BOSCO on comet
      • CyVerse Talk - Mats will do a hands on thing with them.  Mats may do an existing tutorial.
      • rafael used the new slides.

  • Pegasus workshop
    • erin will get back to us with other feedback.
    • make the intro slides simpler.

March 18th, 2016

 

  • Pegasus development
    • deep submit directory structure working for submit directory on PM-833 branch. however need to move to relative directory paths in the .dag file , before merging back to master
    • gideon is reworking how kickstart online monitoring works
      • working on kickstart monitor that goes through the /proc/ filesystem with the assumption all apps launched via kickstart have the same process group as pegasus-kickstart
    • pegasus workshop on campus on tuesday. it is set up at https://pegasus.isi.edu/tutorial/usc/
      • the tutorial is set up using pegasus-init
      • will ask mats to move the XSEDE tutorial to pegasus-init
  • rafael working on energy paper again
  • stephan's paper to HPDC got accepted
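
A minimal sketch of the /proc scan described above, assuming a Linux-style /proc layout. The function name and the proc_root parameter are hypothetical and only illustrate the idea; the actual kickstart monitor is not written this way.

```python
import os


def pids_in_process_group(pgid, proc_root="/proc"):
    """Return the sorted PIDs under proc_root whose process group ID equals pgid."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The command name is wrapped in parentheses and may contain spaces,
        # so split on the last ")" before reading the remaining fields.
        fields = stat.rsplit(")", 1)[-1].split()
        # Fields after the command name are: state, ppid, pgrp, ...
        if len(fields) >= 3 and int(fields[2]) == pgid:
            matches.append(int(entry))
    return sorted(matches)
```

On a Linux host, pids_in_process_group(os.getpgrp()) lists the processes that share the caller's process group.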

March 11th, 2016

 

  • Pegasus development
    • R DAX API is done
      • will be proposing for CGSMD 
    • Deep hierarchy structure
  • LIGO meeting
    • do a local file copy against the staging site
      • having a separate staging site bogs down inter site transfers
    • metadata
      • they are interested. want monitord to transfer the stampede database to another location from the scratch submit directories
      • cannot really do it in monitord
      • can also potentially do it in pegasus-dagman
    • argument passing for sub workflows
      • will be done 4.6.1
    • jobs that work on output site directory.
    • credentials issue
    • variable substitution
      • will make use of it
    • submit directory and other directory organizations
      • are interested in using it


  • Rosa
    • wants to do something with pegasus
  • Monitord

March 4th, 2016

 

  • Rosa
    • dispel4py Stream based workflow mapped to MPI, Storm
    •  MPI 3 Failure Recovery from Node Failures
  • Monitord
    • Triggered by Condor failures. Workflow killed, condor recovery did not spit out all events on recovery.
    •  Need better way to test.
  • DB Admin
    •  Merge issues
    • rafael will confirm with gideon if there is an issue
  • Bamboo 
    •  Rebooted for DROWN Attack
  • R API
    •  Unit tests done.
    •  Packaging - Ship, host?

February 2016

February 19th, 2016

Pegasus development

  • support for GO - mats is working on it
  • dashboard shows multiple workflows with same uuid. fixed in monitord
  • pegasus transfer was prepending path because of globus location
    • mats has changed the logic
  • SCEC wanted to disable the stat of files that was happening automatically because of registration turned on.
    • we now have the property that can explicitly turn it off
  • SCEC tripped over replica catalog insert performance. 
    • rafael working on it. identified the bottleneck
  • Catalog files in submit directories
    • will create a catalogs directory
    • what about file based replica catalogs and cache files etc? some of them can be large.
  • Pegasus Blogs
    • SCEC
    • RVGahp?
  • Website
    • highlight applications better.
  • workq has a catalog server running
    • how do jobs report real time monitoring information back to monitor without rabbitmq
    • have a condor submit wrapper
      • will help us increase memory requirements in case of failures.
  • PegasusLite to have pegasus-transfer invocations as kickstart records
    • kickstart 
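
One way the submit-wrapper idea of raising memory requirements after failures could look, as a rough sketch. The function name, growth factor and cap are all assumptions made for illustration; none of them come from Pegasus or HTCondor.

```python
def memory_for_attempt(base_mb, attempt, factor=2.0, cap_mb=65536):
    """Return the memory request in MB for a zero-based attempt number."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Grow the request geometrically with each retry, but never past the cap.
    return min(int(base_mb * factor ** attempt), cap_mb)
```

With these assumed defaults a job that starts at 1024 MB would ask for 2048 MB on its first retry and 4096 MB on its second.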

February 12th, 2016

Pegasus development

  • support for GO
    • mats found a python REST API - is decent.
    • will only work on a small subset of workflows
      • only third party transfers
      • how to handle file URLs on the submit host
      • and how do we activate the end points
      • lifetime of credentials
      • cannot work on non shared fs mode, as what end point to use when staging to the worker nodes.
      • maybe we should look at how condor does it.
  • held jobs
    • dagman added support in 8.3 where the held job reason appears in dagman.out
    • will need schema change
    • failing workflows
    • held jobs.
    • have a held job tab.
  • pegasus-submitdir archive
  • PMC job statistics in pegasus-statistics
    • mats and rajiv


Annual Report

February 5th, 2016

Pegasus development

  • 4.6.1 release 
    • pegasus-glite-configure
    • change of how retries are done for transfer jobs, using requirements and dagnode retries
      • https://jira.isi.edu/browse/PM-1049
      • there are just 2 retries implemented for transfer jobs
        • one more option is for pegasus-transfer to do better retries
        • and let the dagman retry set to 1.
      • use DAGMan influence to do in retry. 
      • do more testing at our end.
      • let's change default retries for transfer jobs
        • and do this only for transfer cleanups in condor environments 
    • LIGO runs
      • symlinking
    • R API 
      • will target 4.6.1 and keep it similar to the python API
  • 4.7.0 release
    • filesystem organization
  • Keck workshop on Pegasus on Feb 26th
  • Pegasus Annual Report
  • Pegasus GUI email
    • we will send user a direct link
  • Pegasus Announce SLES email
    • we have done on SLES 11 not on SLES 12

January 2016

January 28th, 2016

Pegasus development

  • 4.6.0 release 
    • Released this week
  • Pegasus Website
    • new website there
    • karan will put in the old release notes.
    • Links for old documentation on the new website
    • Rajiv has updated the docker tutorial
    • Tutorials will be moved to Pegasus website
    • Have a research link to point to Scitech website
  • Gideon confirmed MoabGlite helper scripts work with stock condor
    • will also check in a tool to put in the scripts to the right locations.
  • Pegasus Lite pulls in a worker package
    • should we download even by default from the worker package
    • warnings for worker package not being found.

January 22nd, 2016

 

Pegasus development

  • 4.6.0 release 
    • open items
    • constraints algo implemented and checked in . tests worked . 
    • documentation 
      • karan added chapters on metadata and variable expansion
      • gideon updated execution environments
      • updated the BOSCO section about SSH
    • pegasus-analyzer exits gracefully when nothing in the stampede database
      • check if analyzer and statistics check for the version.
    • pegasus-init
    • pegasus-db-admin 
      • better error message for that case.
    • karan will update tutorial to take account of default options
    • for glite style condor arguments quoting is automatically turned off

  • new website.

January 15th, 2016

Pegasus development

  • 4.6.0 release 
    • open items
      • https://jira.isi.edu/issues/?filter=10952
      • Rafael almost done with Constraints cleanup algo. tests run fine on the branch
      • pegasus-bootstrap
        • gideon was doing it as Jinja templates
        • will set it up as a shell script. will be easier for people to update
      • documentation needs to be updated
      • map the globe 
    • for resource requirements add pegasus.queue keyword. update documentation to have one table. remove the documentation for priorities.
    • MOAB stuff  documentation. Will be considered for next major release.
  • DAGMan wants to remove the functionality of running postscript in case of prescript failure
    • does not affect pegasus
  • DAGMan wants to remove DAG NOOP keyword
    • was introduced for LIGO

January 8th, 2016

Pegasus development

  • 4.6.0 release 
  • Condor DAGMan log messages contain HTCondor in 8.5 series
    • broke monitord
    • fixed both 4.5.4 and 4.6.0. 
  • 8.5.2 has DAGMan logging timestamp from condor job log also.
    • monitord has been updated for that.
  • metrics reported were updated
  • Globus strict checking mode.
    • gridftp + ssh version.
  • Scott is working on getting the reverse GAHP stuff
  • How to configure the batch_gahp

December 2015

December 18th, 2015

Pegasus development

  • 4.6.0 release 
  • Reverse GAHP for Oakridge Titan
    • https://github.com/juve/rvgahp
    • done because cannot do incoming connections on titan
    • and also they don't want to use pilot jobs, as it is not easy to yank a job from an HTCondor queue
  • Harvard Pegasus installation
    • with SLURM support.. Karan will work on this.
  • We should explore remote batch GAHP stuff
    • for remote batch do
      • batch gahp --rgahp-key /give/key user@host
      • look at the remote_gahp script.
    • documentation for the batch gahp thing.

December 11th, 2015

Pegasus development

  • 4.6.0 release 
  • pegasus-s3 cert issue
    • updated boto library to account for cacert change
    • on mac, had to disable the automatic failover
  • Bypass PFN's
    • replica selectors can now order replicas. Default and regex ones updated
  • monitord
    • combination of missing job terminated and exception on casting job duration as int, triggered a bug that LIGO reported.
  • default behavior of planner
    • pick up pegasus.properties from cwd as a replacement for conf option
    • --sites option for * behavior , remove local from candidate sites
  • pegasus-bootstrap commands
    • sets up pegasus with site catalog.  and dax generators
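
A small sketch of what ordering replicas by regex rank can mean. This is not the Pegasus replica selector code; order_replicas and the example patterns are made up for illustration, with earlier patterns treated as more preferred.

```python
import re


def order_replicas(pfns, ranked_patterns):
    """Sort PFNs by the index of the first pattern they match; unmatched PFNs go last."""
    compiled = [re.compile(pattern) for pattern in ranked_patterns]

    def rank(pfn):
        for index, regex in enumerate(compiled):
            if regex.match(pfn):
                return index
        return len(compiled)

    # sorted() is stable, so PFNs with the same rank keep their input order.
    return sorted(pfns, key=rank)


pfns = [
    "gsiftp://a.example.org/f.txt",
    "file:///data/f.txt",
    "http://b.example.org/f.txt",
]
ordered = order_replicas(pfns, [r"file://.*", r"http://.*"])
```

Here the file URL would be tried first, then the http URL, with the gsiftp URL kept as a last fallback.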

December 4th, 2015

Pegasus development

  • JDBCRC 
    • should work for 4.5.3 . will work for the release
    • need to make the changes for 4.6.0
      • should consider batch inserts
      • rafael has implemented the batch inserts also
      • the database locked errors are fixed.
  • Rafael is looking into how the timeouts are implemented in SQLAlchemy
  • Mac OSX El Capitan Builds
    • Gideon fixed those. El Capitan does not allow root to modify files in /usr
    • Gideon changed the installer to install to /local 
    • Upgrading the mac mini build host. 
  • LIGO proxy issue
    • change in how proxies are generated. 
    • LIGO en-common proxies were not supported by J-Globus
    • Gideon has the patch for making the updated jar.
  • Gideon has added instructions on building globus for El Capitan
  • Jobmanager-condor for obelix was updated to support both shared fs and non shared fs cases.
  • metadata registration
    • information for output files is tracked. 
  • pegasus-metadata client . Rajiv.
  • Cleanup algorithm - Rafael ?
  • LIGO use case for fallback PFN for PegasusLite cases
    • they want to use existing input data for frame files, on different locations across sites
    • but have a single site catalog entry for the computation, as glideinwms provisions it
    • Karan and Mats are working on it
    • pegasus-transfer changes ?
      • sd
  • LIGO running workflows across LIGO and OSG .
  • Database locked errors for monitord.
  • Call the 4.6 release as 5.0 release.
  • Gideon working on MOAB Blahp support. 
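
For the batch insert item under JDBCRC above, a rough illustration of the idea using Python's sqlite3 module. JDBCRC itself is a Java/JDBC component, and the table and column names here are hypothetical. The point is only that many rows go in with one statement and one transaction instead of one commit per row.

```python
import sqlite3


def insert_replicas(conn, rows):
    """Insert (lfn, pfn, site) tuples in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO replicas (lfn, pfn, site) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replicas (lfn TEXT, pfn TEXT, site TEXT)")
insert_replicas(conn, [
    ("f.a", "file:///data/f.a", "local"),
    ("f.b", "file:///data/f.b", "local"),
])
```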

October 2015

October 23rd, 2015

Pegasus development

  • Tutorial VM
    • rajiv will update dashboard screenshots and go through the Virtual machine based tutorial
  • JDBCRC 
    • should work for 4.5.3 . will work for the release
    • need to make the changes for 4.6.0
      • should consider batch inserts
      • sqlite supports unlimited connections
        • for write locks , 25 jobs running for write locks. after 25 and it ignores timeout settings.
        • 67 registration jobs.
        • rafael is implementing a backoff
        • category for the registration jobs
        • eventually do the dagman category stuff
    • metadata registration
      • information for output files is tracked. 
      • pegasus-metadata client
  • concurrency limits 
    • in partitionable slots this has an effect on performance
    • for 4.5.3 we will have a knob and set it to false by default.
  • Dashboard and PAM problem.
    • mats will create JIRA item.
  • salon working on data from MYRA
    • trying to find contention of data
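
A possible shape for the backoff mentioned under JDBCRC above, sketched with Python's sqlite3 module. The helper name, retry count and delays are assumptions; the actual implementation may differ.

```python
import random
import sqlite3
import time


def execute_with_backoff(conn, sql, params=(), retries=5, base_delay=0.1):
    """Run one statement, sleeping with exponential backoff while the database is locked."""
    for attempt in range(retries + 1):
        try:
            with conn:  # commit on success, roll back on error
                return conn.execute(sql, params)
        except sqlite3.OperationalError as error:
            if "locked" not in str(error) or attempt == retries:
                raise
            # Double the wait on each attempt and add jitter so that many
            # registration jobs do not retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

sqlite3.connect() also accepts a timeout argument that controls how long a connection waits on a lock before raising, so the two mechanisms can be combined.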

October 16th, 2015

Pegasus development

  • does stime include io wait time. does not appear so. the cp of 1GB file indicates that
    • so then is there a way to capture the IO wait time
  • pegasus-db-admin
    • version migration for panorama works
    • metadata schema finalized
  • failing jdbc RC test
  • metadata population
    • metadata population from DAX working
    • metadata attributes from transformation catalog and site catalog are now incorporated, as metadata events are generated at end of site selection
    • output file sizes will be populated for files with register flag set to true.
  • pegasus dashboard
    • metadata display done other than the file information that needs to be populated
  • cleanup algorithm
    • will be done before rafael leaves for vacation
  • website changes
  • panorama changes
    • monitord change to make sure events don't get dropped
    • online monitoring spawns a thread where there is a queue  that is responsible for inserting the online monitoring events into the db
    • the thread checks the database to make sure the job instance is populated.
    • CURRENTLY, it is not done for the anomaly populations. 
  • SNS and Acme workflow
    • maybe we can hire a student to do it
    • maybe scalarm can be used for SNS workflows
    • Ben said there is a meeting about Pegasus on Titan.
  • Mats has installed wordpress on one of the machines.
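
On the stime and IO wait question above, one rough way to get an upper bound on waiting time is to subtract CPU time (user + system) from wall-clock time. The sketch below is an assumption about how one might measure this from a wrapper, not how kickstart does it. The remainder also includes sleeps and scheduling delays, so it is not a pure IO wait figure.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run command and return (wall, user, system) seconds for the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(command, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return wall, after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


# A child that only sleeps: almost all of its wall time is spent off the CPU.
wall, user, system = measure([sys.executable, "-c", "import time; time.sleep(0.2)"])
off_cpu = wall - (user + system)
```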

October 9th, 2015

Pegasus development

  • pegasus-db-admin
    • db version has been moved to string. a new column was added. 
  • metadata population
    • files are populated if a user specifically associates metadata with a file in the DAX or if an output file is marked for registration
    • make sure that for tasks metadata attributes are inherited from the transformation catalog. 
  • pegasus-metadata client
    • output format ? 
    • is the client for end users
    • list files for a workflow
    • list workflow metadata
  • pegasus dashboard
    • workflow level
    • task level
    • file level metadata

October 2nd, 2015

Pegasus development

  • pegasus-db-admin
    • changes discussed last week?
    • also change to string for the database version for allowing merges with panorama
      • panorama db versions should be N.x and not whole integers
  • jdbcrc sqlite test failures
  • pegasus-transfer
    • better job with grouping for ssh transfers.
  • metadata population
    • planner generates the events now for associating metadata with wf, job and files
    • use case should be for a file what workflow and job created that file.
  • Pegasus workshop
    • we will be using workflow.isi.edu
    • mats has created 30 training accounts on workflow.isi.edu 
    • suggestions on workflow example?
      • blender rendering example..
    • pegasus-dashboard should be installed
  • Sipht portal
    • back up and running

September 2015

September 25th, 2015

  • Pegasus development
    • pegasus-kickstart to return record on condor_rm ( SIGINT)
    • changes to data reuse algo for Chris Edlund
      • delete jobs when inplace cleanup is used for intermediate files that are not transferred to the output site.
    • use of DAGMan NOOP keyword
      • workflow test failures
      • change monitor to not complain for noop jobs.
    • comma separated directories for input dir
      • automatically delete the input directory ? we all agree not a general use case.
    • pegasus-transfer grouping should be done for all protocols?
      • problem is some renames for output files
      • avi has been running workflows on OSG with pegasus lite. 
      • 2 million connections over two days on SSH server 
    • pegasus-db-admin error handling. 
      • if it fails with error, it should not report that database has been updated. This is a bug
      • other is what to do , when 4.5 is run against
      • downgrade option
      • warn if db-admin detects database version is higher than what it is currently running, and exit with 0 exitcode.
  • Pegasus IEEE article accepted
  • montage workflows
    • dax generator is not maintained
    • have it as a student project to convert the DAX generator to python API.
      • they also do an overlap check
    • montage jobs have varying memory requirements
    • we should not showcase it.
  • Pegasus Workshop in October
    • fallback from USC HPCC cluster required
    • whole day will be rough.
    • Mats will not be around! Going for the duke workshop.
  • panorama
    • monitoring thread segfaults
    • why was the segfault happening initially
      • happening in fork system calls
      • related to starting and stopping monitoring threads
      • and how PAPI counters were updated.
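
For the pegasus-db-admin version handling above, a sketch of comparing dotted version strings. It assumes versions are stored as strings of the form N.x; the function names and return labels are hypothetical. Parsing into integer tuples matters because plain string comparison would order "4.10" before "4.9".

```python
def parse_version(text):
    """Turn a dotted version string such as "4.2" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def compare_versions(db_version, tool_version):
    """Classify the database version relative to the version the tool supports."""
    db, tool = parse_version(db_version), parse_version(tool_version)
    if db > tool:
        return "newer"  # the case where the notes propose a warning and exit code 0
    if db < tool:
        return "older"  # an update is possible
    return "current"
```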

September 18th, 2015

  • Pegasus development
    • pegasus-db-admin updated
    • for SCEC, added registration of flat LFNs when deep LFNs are used
    • workflow tests now running.
  • pegasus paper
    • will add info about galactic plane and gtfar
    • cloud challenges
      • talk about virtual clusters. precip / wrangler
        • tie more closely to setup stuff and talk about chef/puppet and precip and wrangler.
      • gtfar 
      • add them in acknowledgements
    • not much to add about cloud challenges other than image management
  • HUBbub conference
    • latech user who wants to run on Blue Waters
    • tom bishop 
    • pegasus submit tutorial.
    • to do with steven... 
  • panorama
    • segfaults happening randomly
      • happen when the monitoring thread is started.
  • craft
    • jarek 
    • hubzero
      • chip design
      • instead of hubzero use open science framework - a non profit funded thing

September 11th, 2015

  • Pegasus development
    • worker package tests in pegasus lite
      • pegasus lite will complain if the system architecture does not match the worker package
    • panorama tests now work
      • maybe some problems might be masked!
    • jdbcrc 
      • updated jdbcrc . for mysql and postgres deletes work differently. 
      • rafael will abstract it out
    • gideon changed the way the papi counters are used in kickstart
      • earlier signals were being used for threads to report counters
      • PAPI now allows querying for counter values
  • Pegasus cloud article
    • ewa is doing the final edits
  • HubBub presentation
  • panorama
    • darek working on getting papi counters to monitord
    • changed the job metrics table in the stampede database.

September 4th, 2015

  • Pegasus development
    • worker package creation on the submit host.
      • should we include python externals directory .
      • we will put that back in. we only need boto. 
      • also need to make sure it works for a RPM or deb install.
      • implement the compatibility check in PegasusLite
    • panorama tests
    • better error for input file replica selection failures
    • Scalr for openstack tests
      • action has a new openstack deployment. 
      • have our two QNAPS setup on the build VM's to run workflow tests.
      • run on vmware pool.
    • SCEC shallow LFN's
      • for registration in the replica catalog.
      • put the test in 4.5 . 
    • Database schema changes
      • pegasus-db-admin changes to database schema.
      • downgrades work
  • The short paper
    • working on the google doc.
    • we are not actively working on ec2.
  • panorama
    • adding papi counters to online monitoring. 
    • pegasus-transfer explodes when signal is sent
    • online monitoring dashboard.

August 2015

August 28th, 2015

  • pegasus 4.5.2 released
  • worker package staging
    • planner will use a worker package from the submit side installation and use it.
  • pegasus s3 tests
    • currently no s3 tests
  • tests are running against 8.3.8
  • cleanup algorithm update ( Rafael)
    • estimate that it will be done in two weeks
    • has to work for multiple sites
  • cloud computing short paper
  • hub bub
  • panorama and dv/dt poster and presentations . in mid september
  • metadata discussion
    • google doc updated
    • leaning towards monitor populating the database
    • remove the estimated size and md5 checksum

August 21st, 2015

  • pegasus 4.5.2 release
    • release notes checked in
    • db-admin changes?
      • update man pages
    • python source package
    • tests are we moving to dev branch?
    • docker problem
      • how to get around it ?
      • an issue inside docker, that is being exposed
      • we will put in a wrapper around it. 
    • panorama branch is disabled
      • but tests should be fixed.
      • darek will be fixing it
      • rajiv pushed out his dashboard changes for darek. for demo at supercomputing.
  • cleanup algorithm
    • Rafael will start next week 
    • how will the limits be passed
  • kickstart changes
  • metadata schema discussion
    • next week.
    • postscript
    • dagman has plugins
    • schema 
    • use case
    • stampede is sqlite
    • pegasus-exitcode write locks.
    • separate sqlite database for metadata. 

August 14th, 2015

  • Pegasus 4.5.1 release
  • Bamboo machine troubles
    • panorama tests hung because of bamboo
    • do experiment for the case where we do condor off and see what happens to pegasus-dagman.
  • Panorama tests
    • look at build #73
  • pegasus-kickstart stuff
    • for interpose stuff
    • gideon investigating how to cover all cases for threads
    • wants to make sure that the descriptor table is accessed in a thread safe way, even in the worst case
    • also is doing thread tracking, thread counters and thread lists
  • directory structure organization for submit directories.
  • nonsharedfs mode problem for auxiliary jobs
  • sudharshan cleanup algorithm
  • stefan update
    • working on user models on how to submit jobs to HPC
    • what the user characteristics of the submission process are
  • to be able to show the IO part for SoyKB
    • metrics of success
      • makespan is reduced.
      • number of service units is reduced
  • what makes an application IO intensive

August 7th, 2015

  • Pegasus 4.5.1 release
  • 4.6 common resource requirements
    • we are now exposing three pegasus profiles cores, nodes and ppn.
    • added logic to do specific translations for PBS and SGE
  • cleanup bug fixed related to DAX transfer flag for input files
    • larger question and agreement. transfer flags for input files usually don't have any meaning.
    • transfer flag should be renamed or in the API
      • change in schema 
      • at minimum we should change the DAX API's
      • transfer attribute renamed to final output? 
  • spaces in Pegasus URL
    • gideon feels it should be %20 instead
    • somewhere in documentation . 
      • the planner should have more specific error message in case of spaces. 
  • kickstart enhancements - gideon
    • fixing edge cases in kickstart for the extended reporting
    • what can we do with the papi performance counters and see what will be used in panorama.
    • will be updated for counters.
    • gideon and darek will try and merge
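
On the spaces in URLs item above: percent-encoding is the standard way to write a space inside a URL path. A quick illustration with Python's standard library follows; the example path is made up.

```python
from urllib.parse import quote, unquote

path = "/scratch/run 01/f.a"
encoded = quote(path)       # "/" is left untouched by default
decoded = unquote(encoded)  # round-trips back to the original path
```

Note that urllib.parse.quote_plus would turn the space into "+" instead, which is meant for query strings rather than paths.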

July 2015

July 31st, 2015

  • Pegasus 4.5.1 release
    • will release it next week
    • update the mapper documentation
      • have a link to the replica catalog
    • steven clarke cleanup issue
  • resource requirements
    • update the resource requirements section for 4.6
  • acme integration
    • rajiv will work with bibi to integrate it with the REST monitoring api
  • kickstart changes to get papi counters
    • Only triggered if -Z option is passed
    • the paper on xsede mentioned about them reporting per threads
    • also we keep better track of threads launched by the executable
      • some edge cases for the thread case
      • double execve of process does not work currently
        • example: /usr/bin/env date
    • also record command line options for all sub process launched
      • in the proc record , the cmd tag
      • grabs only first 1K of arguments
  • monitord amqp population
    • revert back to use the event name as the routing key for AMQP population.
  • pegasus cleanup with peak storage requirements
  • Panorama
    • Data analysis done..
    • ideas about writing a paper about workflow profiles
  • Anomalies Detection
    • showing anomalies in dashboard and population in stampede schema

July 24th, 2015

  • XSEDE Tutorial
    • 2 Posters and one tutorial
    • news item online
  • Pegasus Development
    • common resource requirements PM-962
      • documentation needs to be updated
      • we have cores , hostcount
      • karan should make sure cores is translated correctly to ncpus for PBS
    • Pegasus REST API for integrating with Pegasus
    • pegasus transfer
      • checkpoint files
    • LIGO developer notion of site attribute
      • maybe we should be clearer in the documentation
    • automatically changing parameters for memory on job retries
      • check point file for the job is a partial solution
    • monitord amqp population
      • works.. we will document it on JIRA
  • Panorama
    • Darek implemented sending messages in batches from kickstart to rabbitmq
    • socket based communication between kickstart and lib interpose. was done to take care of the file interleaving issue.
    • tests on obelix and exogeni indicate socket writes are atomic for panorama message
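
The message batching mentioned under Panorama above can be pictured roughly as follows. This is a language-neutral illustration written in Python, not the kickstart code; MessageBatcher and its send callback are hypothetical, and a real sender would publish each batch to RabbitMQ instead of appending to a list.

```python
class MessageBatcher:
    """Collect messages and hand them to send() in groups of batch_size."""

    def __init__(self, send, batch_size=10):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def add(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is queued, including a final partial batch.
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


sent = []
batcher = MessageBatcher(sent.append, batch_size=3)
for number in range(7):
    batcher.add("event-%d" % number)
batcher.flush()  # push out the final partial batch
```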

July 17th, 2015

  • PMC Cpu affinity
  • LIGO pegasus analyzer bug
    • has been passed to LIGO . awaiting to hear from them
  • Cleanup algo
  • Resource Requirements
    • common pegasus profiles
  • SGE
    • change.dir should be set automatically for shared filesystem stuff
    • documented already.
  • kickstart path variable to prepend.
  • REST interface for monitoring for pegasus is done. Rajiv completed this week.
  • extensions to the cleanup algorithm. rafael will start working .
  • Pegasus 4.5.1 release
    • will be done after XSEDE.
  • Pegasus XSEDE tutorial
  • XSEDE Pegasus Poster
    • show a LIGO workflow for the XSEDE poster.
  • Salt configuration needs to be updated
    • Student machines on salt
  • panorama
    • rabbit mq installed on exogeni site.
    • darek will get message batching working.
    • gideon recommends doing it with the AMQP C API library
    • message interleaving in kickstart.
    • lot of unacknowledged messages in rabbit mq
  • kickstart polling loop
  • all kickstart memory values are in MB

July 10th, 2015

  • PMC jobs automatic summing of maxwalltime. Should be disabled
    • In PMC case we will do a division.
  • PMC CPU affinity for jobs PM-953
    • there might be a fragmentation approach.
  • Pegasus REST interface
    • short cut URL end points. 
    • karan will send email to Lavanya.
  • running on SGE cluster using GLite interface. 
  • harmonized pegasus profiles 
  • Metadata
    • will need the file implementation . 
  • Dashboard Panorama stuff
    • September 16th. Time series and anomaly detection.
    • Application level anomalies
    • Infrastructure level anomalies. 
    • no plans for integration in production Pegasus.
  • monitord profiling of monitord population. 
    • we want to see how long 1000 events take to be populated in case of LIGO . 
  • Panorama
    • anomaly detection
      • implemented a working prototype of threshold based anomaly detection
      • kickstart sends events to rabbit mq, then monitord populates to influx db. 
      • darek tool queries influx db and takes in the metadata file generated by pegasus and determines the anomaly and sends it back to rabbit mq
      • monitord then again picks up anomaly and populates it to stampede db for dashboard to display.
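
A minimal sketch of threshold based anomaly detection in the spirit of the prototype described above. The metric names, threshold values and function name are invented for illustration and do not come from Darek's tool.

```python
def find_anomalies(samples, thresholds):
    """Return (metric, value, limit) for each sample value above its threshold."""
    anomalies = []
    for metric, value in samples:
        limit = thresholds.get(metric)
        # Metrics without a configured threshold are ignored.
        if limit is not None and value > limit:
            anomalies.append((metric, value, limit))
    return anomalies


thresholds = {"cpu_percent": 90.0, "memory_mb": 2048.0}
samples = [("cpu_percent", 42.0), ("memory_mb", 3100.0), ("io_wait", 5.0)]
anomalies = find_anomalies(samples, thresholds)
```

In the pipeline from the notes, the samples would come from the time series store and each detected anomaly would be sent back as a message for monitord to pick up.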

June 2015

June 12th, 2015

  • Pegasus profiles for job/resource requirements
    • postponed till next week when mats is here
    • karan to create a list of relevant profiles
  • pegasus dashboard
    • locking issue?
    • can this be related to new connection stuff or the failing tab?
    • look at connection pooling .. or maybe transactions are not being closed properly?
    • also see if there is an option for dashboard to set a read only lock when opening a connection to the databases
  • panorama workflow tests
    • failing.. but merge from master was done.
    • karan to investigate
  • panorama workflow dashboard
    • updated the job metrics tab for doing the polling
    • for mpi jobs the job name appears as aprun, since that is the process running on rank 0
  • Job Survey paper
    • Darek sent a final version
    • will be submitting next week
  • Pegasus Release timeline
    • maybe we should put on our website somewhere?
  • Rafael Energy paper
    • information about building energy profile.

June 5th, 2015

  • panorama usecase and metadata passing through
    • not done yet for the metadata associated with files with replica catalog
    • DON'T rebase commits that have been pushed out
  • job.runtime, cluster.maxruntime, maxwalltime parameters
    • how to associate profiles. have a different namespace
    • how is it exposed in the DAX API
  • python dependency
    • stopped support for 2.5 and 2.6
    • only affects redhat 5 systems.
    • will have to install redhat 2.6 python package on 2.5
    • setup tools for python 2.6 has to be at build time
  • pegasus-dashboard updates for LIGO
  • cleanup bug for intercept runs with InPlace cleanup.
  • S3 storage
    • about 9TB and rising for pegasus system services backup
    • right now no backups are going to go to Glacier
    • we only keep 2 weeks of data
    • glacier is good if we want to keep 6 months of data
    • 3 VMs for pegasus website, CROWD etc
    • database on stewy and obelix
    • qnaps /nfs/ccg3 and /nfs/ccg4
    • Big ticket items of 9TB backup bucket in S3
    • need to keep 2 backups in S3
  • HubBub talk.
    • abstract
  • talk by Jack Dongarra.

May 2015

May 29th, 2015

  • Bamboo test failures
    • condor-c tests working now. changed the site catalog for those
    • rhel5 json module
    • pegasus-transfer will do a proper check and complain for missing json module
    • mats will update documentation accordingly
  • Python Dependencies
    • New python dependency 2.6 from 2.4
    • newer versions of Fedora use Python 3
    • Fedora will keep python 2.x support till 2020.
    • maybe have a dynamic bash wrapper across python code to pick the right python version
    • have a tool called pegasus-python??
  • concurrency limits
    • apply to bamboo machine and our other workflow hosts.
    • throttle number of grid jobs per categories of jobs. that is what SCEC wants and cannot be done.
      • unless negotiation can be employed for grid universe jobs.
      • define own throttles in compute jobs
  • pegasus-dashboard
    • LIGO has an issue with no authentication URL rendering.
  • quoting for environment
    • implemented. changed both for environment and +remote_environment
  • docker universe support
    • should work out of the box with condorio
  • new dagman default values
  • pegasus-statistics
    • show badput?
  • LIGO OSG
  • Documentation
    • 10 minutes using pegasus-docbook
    • using new pipeline it uses 3 minutes
    • the hyperlinks don't work
    • include that into pegasus website template
    • In PHP we tell Google not to index old version
  • panorama

May 8th, 2015

Bamboo test failures

  • montage tests are failing because of the remote service being down
  • documentation title is messed up. gideon will look at it

pegasus-transfer new format

  • mats has come up with a new JSON format.
  • backward compatibility with the old format
  • create dir and cleanup jobs will be different

Metadata

  • google doc shared with people
  • next steps are panorama use case for calling out
  • ssh cleanup . JGlobus library does not implement ftp

LIGO on XSEDE

  • have started using PMC
  • data management

Python builds

  • always check the python version.
  • if we ship our own python modules, then we may have to
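
A minimal sketch of the "always check the python version" item, assuming the 2.6 minimum and the below-3.0 upper bound discussed elsewhere in these notes. The function names and the exact bounds are illustrative assumptions, not the actual Pegasus code.

```python
import sys

# Assumptions taken from the notes: 2.6 is the new minimum, Python 3 is not supported.
MIN_VERSION = (2, 6)  # inclusive lower bound
MAX_VERSION = (3, 0)  # exclusive upper bound


def version_supported(version=None):
    """Return True if the (major, minor) version lies in [MIN_VERSION, MAX_VERSION)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


def check_python(version=None):
    """Raise a RuntimeError with a readable message if the interpreter is unsupported."""
    if not version_supported(version):
        found = ".".join(str(part) for part in (version or sys.version_info)[:2])
        raise RuntimeError(
            "unsupported python %s: need >= %d.%d and < %d.%d"
            % ((found,) + MIN_VERSION + MAX_VERSION)
        )
```

A tool would call check_python() once at startup, before importing any module that depends on newer language features.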

Bamboo build machine

  • build and test plan ( running concurrently )
  • also we can run docker stuff
  • automate the salt setup of bamboo agents
  • maintain one OS. Can action give us a beefier VM?
  • we have too many documentation builds running ?
  • VM with bamboo agent and use docker
  • workflow tests are a separate issue
    • they don't load the bamboo machine
    • that is more related to a big condor pool.
    • workflow tests will always run out of bamboo.
  • mats and rajiv will work on it for the VM stuff.

Getting new SSL certificates

  • *.isi.edu is screwed up in firefox

Metrics Server fixes

  • google maps update broke the web UI.
  • somehow all the colors were used in the trends ?

May 1st 2015

  • Pegasus 4.5 release
    • not heard back from SCEC and LIGO
    • mats checked in the example
    • will add release slider
  • Variable Expansion
    • pretty much done
      • right now we have $()
      • we will change with ${env-variable}
      • have more helpful error message 
  • pegasus-kickstart
    • file does not exist. now gives a proper error
  • XSEDE poster due next week
  • Monitoring Service API
    • donald is almost done.
  • PMC with PegasusLite
    • PMC job by default runs on the shared filesystem
    • tasks in PMC are pegasus lite tasks
    • if a task does random IO, running it on the shared fs might be tricky
  • brazilian student contacted about pegasus application for real workflows.
  • mats will be doing the transfer events for panorama next week
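
The variable expansion item above (moving to the ${env-variable} form with a more helpful error message) can be sketched roughly as follows. This is only an illustration: the function name, the regular expression and the error wording are assumptions, not the planner's actual implementation.

```python
import re

# Hypothetical pattern: matches ${NAME} references only, not the older $() form.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_variables(text, env):
    """Replace each ${NAME} in text with env[NAME]; fail with a clear message if unset."""

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Name both the missing variable and the string that referenced it.
            raise ValueError(
                "variable ${%s} is not defined; referenced in %r" % (name, text)
            )
        return env[name]

    return _VAR_PATTERN.sub(substitute, text)
```

For example, expand_variables("${SCRATCH}/run1", {"SCRATCH": "/tmp/work"}) yields "/tmp/work/run1", while an undefined name raises an error that names the variable instead of silently leaving it in place.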

April 2015

April 24th 2015

  • Pegasus 4.5 release
    • release candidate today rc2
    • updates to pending items
    • job throttling added to optimization guide.
    • release notes are online https://pegasus.isi.edu/news/4.5.0 
    • waiting for db-admin unit tests to be checked in.
    • pegasus-cleanup checking
    • pegasus-lite-local.sh  add some path before starting.
  • rest monitoring API
    • we have not heard back from lavanya yet
    • PNNL acme stuff
  • pegasus 4.6 release
    • common pegasus-transfer , pegasus-cleanup and pegasus-createdir
    • APP_PATH_PREPEND addon
    • pegasus worker package staging
      • planner calls out to common script to determine the worker package
      • if it does not exist , we build a default worker package on the fly 
      • add extra logic to the untar job in the
    • pegasus-gridftp modification for ssh ftp.
    • software eggs
  • panorama
  • metadata for 4.6

April 17th 2015

  • Pegasus 4.5.0 Release
    • rc1 working for hub
    • LIGO trying it out.. wanted to change checkpoint files. need to hear back on the dashboard changes.
    • SCEC ? waiting to hear from Scott
    • https://jira.isi.edu/issues/?filter=10851
    • pegasus-db-admin sqlalchemy issues? for updating tables?
    • pass through implemented for Glite to PBS
    • verification of update to pegasus version on running workflows
      • mats thinks his testing should do the trick.
  • Pegasus Dashboard for bamboo user
    • URL - https://cartman.isi.edu:5000 
      Authentication - Uses PAM Authentication 
      Admin Users - mayani, vahi, rynge, juve, rafsilva, darek, deelman
  • Cedars visit
    • SGE cluster
    • we have 3 potential SGE cluster users: Cedars, Vision group at ISI and maybe Rutgers ( that will be replaced with SLURM)
  • Lavanya REST API
  • Pegasus 4.6 release
    • variable expansion thing figured out
      • argument strings in dax, profile values in the dax
      • site catalog. 
      • replica catalog file based one.
      • need to now make changes in various parsers
      • predefined environment variable
    • metadata
      • LIGO Dibbs .. ability to do data reuse based on metadata attributes
      • panorama - pegasus - aspen interface
      • iplant
        • they want it in iRODS
        • S3 tags.
      • mats wants a better idea of what it looks like in the ideal world.
    • file management on scratch directory, submit directory also?
    • implementation of the REST API
    • implementation for held job tracking
    • Panorama requirements
      • influx db monitoring , into pegasus-transfer. 
      • pegasus-transfer sends messages to rabbit mq about file size transferred
      • pegasus aspen interface ( modelling tool ) . aspen is a C++ library.. pegasus planner querying the aspen models for each node.
        • command line tool pegasus-aspen
        • planner needs to send application parameters, and all the metadata for the node.
        • gets back a list of attributes , memory and usage, and convert them internally into pegasus profiles
        • this can be a generator of metadata.
        • application model which is a file and a machine model 
      • timeseries data . monitoring data about the dashboard, anomalies 
      • there is a CEP thing that anirban is developing and will determine anomalies.
    • dv/dt requirements
      • prediction service
      • pegasus will query the prediction service

April 10th 2015

pegasus cleanup

  • gideon removed a bunch of stuff
  • will be completing the cleanup
  • pegasus-plots will be deprecated in the release notes for 4.5 release and removed for 4.6

pegasus RC1

  • built now.
  • should have created a 4.5 branch and then done a tag
  • pegasus-halt ( is it prototype )
  • pegasus-run on already running workflow
  • pegasus-db-admin missing import
  • mats will delete the rc1 branch

pegasus 4.5.0 release

  • karan will add options for pass through text for Glite options.

pegasus-db-admin

  • should be done soon

HPCC tutorial

  • send link to Fan fli from CHLA
  • vision group at ISI . former BBN people.

XSEDE paper

  • submitted to xsede
  • for journal paper, expand to pilot workflow systems. panda, swift coasters, big job

REST API

  • rajiv will add to the docbook
  • largely agree
  • uuid for the top level workflow

April 3rd, 2015

  • Pegasus 4.5 release
    • pegasus-db-admin
      • ds
    • planner will set auto update on pegasus-db-admin . and include
    • extra python modules being shipped mysql config and postgres config
      • right now on our build hosts we are building mysql and postgres.
      • RPM packaging adds dependencies automatically
      • openssl dependency
      • best option is database dependencies optional
    • targets 4.5.0 pre release candidate for thursday
    • pegasus-dashboard updates
    • pegasus-monitord failed for 4.4 runs 
    • documentation
      • fix missing references
  • REST API for monitoring workflows and jobs
    • work on it for next week.
  • questionnaire
    • 15 responses in all.
  • xsede paper
    • deadline on monday . 8 pages. 
    • have number of cores
    • no reliable way for specifying cores on OSG
  • web interface for influx db
  • permanent influx db install

March 2015

March 27th, 2015

  • metrics server
    • final change pushed out by donald
  • REST API
    • job monitoring API for workflow and jobs
    • will work with Rajiv
    • next week friday we will have a spec out for the API
  • Pegasus 4.5 release
    • resolving pegasus-db-admin issue
    • work on the documentation
    • should reach may first deadline
    • next week we will do a pre release for SCEC.
  • Job submission paper
    • for xsede some sections you will remove.
    • need some major modifications regarding introduction.
    • new deadline for xsede is april 6th.
  • pegasus transfer issue in google cloud vs amazon cloud
    • gsutil causes a 1 second overhead for a zero byte file. probably an authentication protocol
    • directly with wget works faster.
    • when you are downloading larger files
      • huge overhead compared to 3 times in amazon.

March 20th, 2015

pegasus 4.4.2 release done

  • will be deployed by LIGO

tagged release for SCEC production runs .. we will do a pre-release candidate

metrics server

  • follow up on histogram page?
  • gideon will deploy the changes on the production machine

pegasus-db-admin

  • updates
  • dashboard and stampede expunge functions.
  • sql alchemy init and duplicate code. will enable foreign keys.
  • SQLAlchemy init interface takes a URI.

pegasus-submit-dir

  • till we come up with a better name
  • can archive, move and delete

pegasus-dashboard archive option

  • gideon will make changes to the dashboard schema.

transfer grouping in Pegasus

  • PM-829

PM-851 kickstart invoke option for auxiliary jobs

pegasus dashboard updates

  • LIGO wants apache to use InCommon for single sign on and authentication

job submission survey short paper

  • march 30 deadline

Panorama Updates

  • wants to have a separate panorama branch
  • mpi-exec has been merged back to master.
  • similar to the adamant branch
  • rabbit mq 
    • has a rest interface
    • so easy to post http messages to it
    • uses small amount of memory
  • long term we will have pegasus-service receive the messages instead of rabbit mq. 
  • we are collecting data and share with other people in collaboration
    • http location on obelix ( the way we did for stampede)
  • real time monitoring in kickstart
    • runtime metadata and file descriptor 3 ( did for hubzero)

User Questionnaire

  • still at same place as earlier
  • gideon will send out a reminder

March 13th, 2015

  • Metrics Server
    • deployed on the production server.
    • want to do anything on basis of distribution of files
    • donald will create a new histogram page ,
  • Pegasus NSF Report
    • sent to Ewa
  • Pegasus 4.4.2 release
    • karan will check in release notes today
  • Pegasus Tutorial as part of HPC Workshop Series in April
  • Gideon will be going to the summer school.
  • Pegasus 4.5.0 release
    • Targeting May 1st release
    • local-scratch is picked up.
    • ensemble manager submission
      • will support both modes
      • bundle mode
      • public ensemble manager. there are security issues. user credentials.
      • the person who starts the service will setup the credentials
    • pegasus-analyzer fix for case where jobs eventually succeed after failures
    • pegasus-db-admin update
      • ds
    • transfer grouping of staging jobs
    • Pending items
  • User Questionnaire
    • 12 responses for
    • a lot of people are interested in a workshop
    • better support for loops and branches
    • better provenance support .
  • Workflows on Google and Amazon
    • google takes much longer to do data transfers.
    • non shared fs and shared fs
  • metadata
  • Panorama
    • Demo in September of Panorama functionality
    • getting data transfer metrics out of pegasus-transfer in structured way
    • what data we need to collect
    • for third party transfers we can do timings but not rates
    • darek is working on adding real time monitoring to pegasus-kickstart
    • pegasus transfer will communicate to pegasus-kickstart to report to a central server
      • can be a http server similar to metrics server
      • panorama is considering influx DB for real time monitoring.

March 6th, 2015

  • metrics server update
    • plans to deploy the changes today. fixing last issue
    • still has to make the database schema changes required for planner file counts
      • will be done next week
  • planner reports file breakdowns
  • pegasus 4.4.2 release
    • it has fixes LIGO is interested in.
    • most probably next week.
  • pegasus-db-admin
    • reorganization of the code and the schema.
  • pegasus-archive /pegasus-delete
    • rafael does not have time to work on these because of proposal work
    • will move to either gideon or mats
  • pegasus-dashboard updates
    • has more LIGO requests for pegasus 4.5.0 release
    • wsgi script for root mode
  • LIGO visit
    • post 4.5 we will do better organization of files on the file structure
    • Pegasus poster for LIGO meeting
  • ensemble manager
    • scec folks will try it
    • monitord netlogger bugfix
  • pegasus-transfer enhancements for panorama
  • job submission paper in github
    • pegasus and job management systems.
  • online monitoring for pegasus-kickstart
    • application sends signal to pegasus-kickstart via libinterpose
  • pegasus-keg extensions
    • the pegasus-mpi-keg is a separate executable
    • extensions to the io stuff
    • will incorporate in 4.5.0
  • NSF report
    • still waiting to hear from mats and scott
    • karan is still updating the metrics page.

February 2015

Feb 20th, 2015

  • metrics server update
    • donald still has to deploy the changes.
  • pegasus user questionnaire
    • gideon will send new links and will update
  • SCEC update
    • scott has debugged his memory
  • Pegasus Report
    • soykb and other iplant workflows ... part of ECSS
    • galactic plane
    • ahmeds work
  • pegasus dashboard updates
    • pegasus-dashboard is started whenever bamboo is built up
    • dashboard shows all states for a job now.
  • pegasus-db-admin tool
    • test cases in bamboo
    • documentation
    • migration notes
    • some python errors that need to be fixed.
  • 4.5 release
    • still remaining
      • held jobs tracking in monitord
    • job retry set to 1 and disable retries for DAX jobs
    • decrease the held period from one hour when job is removed.
    • improved documentation for output mappers
    • ensemble manager todo's
      • we won't have ensemble manager in multiuser mode
      • support both modes ( upload a tar file and finer grained control where he specifies the DAX files and the submit directory )
      • only the dashboard will run in multiuser mode
      • how do we start ensemble manager process
        • run as per user .
    • copying of catalog files to submit directory.
  • input directory copies based on recursive transfers as part of directory
    • it won't work in condorio mode because it flattens out
    • add type directory in the DAX schema.
  • pegasus tutorial
  • environment variable file substitution in site catalog, replica catalog and transformation catalog
  • XSEDE Tutorial proposal and Posters

January 2015

Jan 14th, 2015

  • metrics server update
    • no update; Donald is still away on vacation
  • Pegasus development
    • data configuration for different sites
      • working for steven
    • held jobs
    • pegasus-dashboard
      • root mode for dashboard and ensemble manager
        • gideon needs to confirm for ensemble manager
        • done for dashboard
    • pegasus-analyzer bug fix
    • pegasus-db-admin tool update
      • unit tests
      • bamboo pool will break.
    • upgrade to newer version of Pegasus
      • what happens to running workflows
    • pegasus-statistics with PMC - Mats and Rajiv
      • mats and rajiv will work on it.
    • docker based tutorial launcher
      • how to integrate in the build process
      • form 
      • candidate machine 
        • obelix
      • vmware colo vm
      • obelix. 

  • Pegasus Poster for Si2
    • will base on the previous years.
    • any particular thing we want to focus on ? or general?
  • Pegasus Annual Report
    • User questionnaire - need to send out. 
      • list of people to send it out to .  Gideon has one.?

Jan 7th, 2015

  • metrics server update
    • no update; Donald is still away on vacation

  • 4.4.1
    • installed on workflow
    • OSG and XSEDE submit hosts will be upgraded in 3 weeks
    • need to follow up with LIGO

  • database upgrade tool integration
    • documentation and manage left
    • import error for properties
    • python test case

  • support for per site data configuration
    • mostly done/ still need to figure out worker package staging for that.

  • pegasus-dashboard
    • should we show all job instances for a job.

  • held jobs logged by pegasus-monitord

  • user questionnaire

December 2014

Dec 8th, 2014

  • metrics server update
    • minor bugs in the UI... still need to be fixed, especially how the session states are handled
    • things remaining to do
      • database/server side pagination
      • figure out the scroll issue for the trend charts
      • move the trends charts from the home page to under planner and download tabs
      • rename run metrics to dagman metrics, and instead of showing the most number of times a workflow was run, we want to see the top applications for which dagman workflows were run
      • for the time bar on the top, have drop down menu for years and months
      • can the maps pin show the actual number, for example in the top downloads map thing
  • monitord fixes
    • for the race issue with postscript handling PM-798
      • had to change the way stdout and stderr are populated for job_instance. They are now populated when the POST_SCRIPT_TERMINATED event happens
  • pegasus-analyzer fixes
    • show the planner log when prescript for sub dax fails. PM-808
  • we want to release 4.4.1 before the break.
    • has monitord fixes that LIGO requires
  • tracking held jobs
    • decided to add a column in the jobstate table to capture why a job was held
  • changes to pegasus-keg
    • to simulate reading in input and writing out of output files
    • will also simulate cputime and walltime
    • initially pegasus-keg will read in and write out the outputs and then do the sleep for the cpu time duration
    • removing the system information that it prints out
    • in the mpi version, the IO is solely done by the master.

December 3rd, 2014

  • Update from Duncan on LIGO dashboard requirements
    • run a flask module from apache
    • let apache handle authentication
    • read only dashboard view
    • have a separate flask frontend.
    • they are ok with a command line tool to remove workflow entries
    • port collisions .. so they prefer apache to do the handling.
  • failed JDBCRC unit test case
  • glite quoting for the environment
  • pegasus-dashboard delete workflows capability
  • failing workflow reporting in the dashboard
  • monitord to follow condor job log
  • db admin tool updates

November 2014

November 12th, 2014

  • DAGMan metrics reporting
    • working and completed for 4.5.0cvs
    • planned metrics
      • exclude the metrics that never ran.
      • have a drop down menu - planned , planned and run
  • RPM and DEB tracking for downloads
    • mats has a script that goes through the download logs to populate the server.
    • So we are tracking those now.
  • Failed data reuse regex test
    • make it a planning only test case
  • hierarchical workflows options forwarding
    • have a value of null/none
    • --inherit option with a comma separated list of long opts.
  • higher level DAX API for sub workflows ?
    • hack to figure out the command line arguments for the planner
  • Pegasus Distribute Wrapper
    • waiting to hear further from Steven
    • a /bin/bash test case
  • Metrics Server Updates by Donald
    • has the geo location running
  • DB Upgrade tool - Rafael ??

November 5th, 2014

  • DAGMan metrics reporting
    • already in recent DAGMan versions. can be enabled.
    • pegasus-run having the duplicate logic.
  • Pegasus Distribute Wrapper
    • Initial implementation done and there is an example for Steven to try out
  • Metrics Server Updates by Donald
  • DB Upgrade tool - Rafael ??

October 2014

October 29th, 2014

  • Upcoming Proposals
    • NEESGrid call
      • Robert Flashgun with Nirav..ASU stuff. Do some earthquake stuff
      • frank mckenna for nees type stuff
        • SCEC is part of the proposal
      • December 3rd due date

  • Pegasus Development
    • monitord postscript handling
    • dynamic hierarchy stuff
    • Condor C with LIGO
    • Steven Clarke Distribute Stuff
    • pegasus-hpc-cluster ( PHC )
    • DAGMan metrics

  • Kenichi Workflow
    • SNS workflow
    • Training material. 

  • Metrics UI updates
    • Trends over times
    • Geo overlay

  • Darek from Poland - A postdoc 1206
    • panorama project
  • Adaptive Workflows
    • adapting workflows... they are not converging.
    • templating workflows
    • Hopper Site Catalog
    • Sample Site Catalogs

September 2014

September 17th, 2014

  • Checkpointing feature
    • tested and implemented into pegasus
    • communicated with LIGO and John Veitch will test it next week.
    • will be run from a binary install
    • kickstart won't enforce non zero exit code for application exit code . we will require application codes to exit with non zero status.
  • Profile and Properties documentation integration
  • database schema upgrade tool
    • rafael starts working on it
  • support for google storage
    • hassan writes a paper for google storage
    • compare S3 with google storage
    • parallel uploads of chunks not supported with gsutil.. relies on a very specific python module
    • ~/.botoconfig
    • uses OAuth token for authentication
  • works paper revisions due oct 1st.
  • dv/dt paper has been submitted as a CS dept tech report.
  • DOE Oak Ridge meeting
    • interface with ASPEN ( analytical modeling ) - domain specific language for defining code.
    • combine aspen model with machine model and come up with estimates of runtimes.
    • christopher riggers from RPI models parallel storage systems.
  • Explore visualization stuff for pegasus-plots and dashboard?

August 2014

August 25th, 2014

  • Ensemble Manager - User Authentication
    • initially gideon is working on a PAM based approach
  • refactored netlogger dead code
  • Workflow Checkpointing support - ongoing
  • Google Compute Engine
    • related to google genomics
    • put in support for GCE transfer tool to interact with Google Storage ( their S3 equivalent)
    • put in credential handling in the planner.
    • fits well with long term planning for pegasus.
  • Replica Catalog Service

August 18th, 2014

  • Data Reuse Partial Mode
  • Service integration
  • Profiles and Properties Documentation
    • Scope Column in the properties documentation ( transformation, job and global )
    • in profiles documentation corresponding property key
  • pegasus-service integration
    • need to integrate the documentation
  • redhat 5 builds
    • partially... because of 2.4 installed version pegasus-s3 fail
  • authentication mechanism
  • pegasus-service-admin migrate option
  • new tool pegasus-db-admin
  • get a new 32 bit VM with CentOS 6.5
  • also centos 7 VM
  • add a setup task that cleans $HOME/.pegasus in bamboo infrastructure.
  • Docker Kernel Problem
    • if a docker build running and you stop the build, then the whole thing crashes
    • one solution is to upgrade the kernel version.
    • cartman OS can be changed or move the docker builds to a VM.

August 11, 2014

August 4th, 2014

  • how to handle a single job wrapping around PMC
    • will add a property to turn the wrapping off.
  • checkpointing for LIGO . synonym for checkpointing. user level state files.
    • create a JIRA item that explains that.
    • list the various cases that will be handled
      • a lot of times in case of eviction kill -9 is sent.
  • pegasus dashboard changes
    • multi tenancy for users.

June 2014

June 30th, 2014

  • pegasus-remove and pegasus-dagman. pegasus-dagman has a wait of 100 seconds before monitord is killed, when pegasus-remove is called.
  • rafael will add a workflow test case for JDBCRC
  • Still have to make a slider.
  • Karan will work on XSEDE poster for Pegasus
  • IPlant and metadata requirements.
  • pegasus-dagman / monitord /condor-dagman
    • hierarchical
    • PMC
    • GRAM

June 9th, 2014

  • 4.4 release
    • next week
    • documentation items remaining
    • JDBCRC test cases and handover to SCEC

  • Dashboard improvements
    • dashboard improvements
  • Post Release Activities
    • integrate pegasus service back into the main codebase

May 2014

May 12th, 2014

  • PM-747
    • will be used for soykb
    • test case
  • Development releases
    • 4.4
      • plan for June 20th
      • automatic data dependencies
      • wrap up existing stuff
      • documentation
      • JDBCRC change
      • documentation of FAQ's
    • 4.5
      • pegasus-service
        • some form of multi tenancy
        • python dependencies especially for external stuff is tricky
        • rename of dashboard database tables
      • pegasus-dashboard enhancements
      • separate the planning job from the prescript
      • checkpointing
      • software cleanup
      • transfers with hierarchies
      • leverage condor asynch transfers in pegasus lite
      • try for before christmas
      • 5 minute youtube video
    • 4.6
      • metadata
      • dax annotation
      • enhanced notifications
        • monitord
      • PMC data locality
      • globus online support ??
        • get credentials . at least do more research.
      • skipping symbolic links

May 5th, 2014

Condor week

  • Lauren
    • Karan needs to provide more documentation for her
  • Kent Wenger
    • dagman reporting
      • dagman metrics file is created by newer versions of DAGMan in the submit directory.
    • retry immediate parent
      • CMS has a requirement for this also. The most important thing on Kent's plate
  • dynamic workflows
    • node expansion . may not be that worthwhile
  • pegasus lite asynch transfers
    • using condor chirp in the pegasus lite shell script once the main computations are done. that way we can pipeline
    • does not work with partitionable slots
    • does not work with condor file io

Bamboo Test Cases

  •  Job got hung for a long time??

User Survey

  • Developer Meeting will be moved to 1PM for 

April 2014

April 21st, 2014

      • Pegasus Metrics
        • ewa sent out the report for metrics to Dan. we need to get her final version.
        • JIRA metrics
          • work log feature of JIRA - everybody does not find it useful.
          • all developers need to be diligent about putting tasks into JIRA
          • sub tasks in JIRA ???
          • how to track user feature requests
        • performance improvement
          • get the data structures up to speed.
          • timing the cleanup is also important and canceling it if it goes too long
      • SI2 Tasks
        • Support Data as first class objects
          • file movement open JIRA item
          • data flow dependencies
        • Support annotations for runtime and files sizes
        • software review of streamlined
        • tutorial VM's
        • refine and document metrics
          • we have the confluence page that captures
        • metadata registration in catalogs
        • triggers for enhanced notifications for long runtimes
          • we personally feel
        • pegasus service
          • have a release and multi tenancy
          • sort out all the python stuff.
          • reconsider moving pegasus-service back into pegasus git repo
        • documentation for integrating pegasus
        • enhance feature coverage and testing framework.
          • unit test coverage
        • adopt a model on how others can contribute to pegasus
          • document the process how people can contribute.
      • Customer Survey
        • identify questions to ask.

April 14th, 2014

  • JIRA Policy Document or page
  • Pegasus Metrics
  • Pegasus Survey
    • Develop a list of questions .
    • Forward to Duncan CBC Group
  • New Default Transfer Refiner - BalancedCluster

March 2014

March 31st, 2014

  • Gideon changed the tutorial VM.
  • Put in backward support for old credential handling.
  • Mats started on an outline for the optimizations chapter.
  • next week's developer meeting is cancelled.
  • general Pegasus dependencies
    • python > 2.4 and less than 3.0
    • in general, easier to build from source rather than from source RPMs
  • update Pegasus README
  • change the build.xml to say default build without docs. remove the dist-nodoc target. instead we will have ant dist-release as the default target
  • also we should start having documentation per minor release and not per major release as we do now.

March 24th, 2014

  • Pegasus 4.3.2 release done last week
  • storage constraints paper - gideon, rafael and karan worked on it.
  • karan worked on the hpc-pegasus setup.. has workflows running through PMC
  • karan and mats have a XSEDE tutorial proposal that will be submitted today
  • dv/dt paper rejected for HPDC. Will try for a middleware conference due mid may
  • 4.4 release
    • checkpointing solution
    • leaf cleanup for hierarchical workflows
    • md5checksum option for guc transfers
      • we won't follow up on kickstart generating the checksums, but tracking checksums in replica catalog.

March 17th, 2014

Agenda

  • XSEDE poster and tutorial proposal
    • will get it done this week. mats and karan will work on it.
  • idafen will work on a workshop paper for xsede on reproducibility
    • 4 page limit
    • deadline is april 5th.
  • energy simulation for SC 2014
    • measure energy when running workflows
    • try to check if energy usage changes whether data is transferred to a site, or everything is executed at one site.
  • sane defaults for 4.4 for transfer jobs, pre scripts etc
    • transfer jobs
      • how many stage in jobs - 2 jobs and each job with 2 threads.
      • how many threads for each transfer job - pegasus-transfer has a default of 2
      • pegasuslite job
        • change sls name ? property name change
        • control the number of threads
      • add a chapter called tuning workflows
        • mats will add a section on tuning transfers.
        • setting clustering parameters.
      • changing back the default refiner to bundle???
    • cleanup job
    • change hold release time to one hour.
  • new transfer refiner
    • maybe can use k means clustering ?
  • leaf cleanup for hierarchical workflows
    • --cleanup leaf,inplace,none
    • tell the planner to throw a warning when
  • sudharshan's paper
    • emphasize that the goal is not improving the makespan.
  • 4.3.2 release
    • release notes checked in on friday
    • mats will tag after the release.
    • the service should be installed in the tutorial VM image.
  • Condor Categories
    • similar to dagman categories.
    • will condor accounting groups work??

March 10th, 2014

Agenda

  • Should we stage sub-workflow output files to parent workflow scratch? (related to leaf cleanup)
  • Should we enable DAX jobs to have input and output uses, and distinguish between planner inputs and sub-workflow inputs?
  • SUB DAG keyword to make pegasus generated subdag submit files match with dagman version always
  • data reuse edge case
    • have fix for it and have added unit test cases
  • Atlassian licenses expiring?
  • plan for a pegasus workshop / meeting for 2nd week of January 2015


March 3rd, 2014

  • monitord fix for LIGO
    • pegasus plan prescripts were not logged in the database.
  • checkpointing files
    • karan will create a JIRA item and send it to ligo folks for comment.
  • transfer fix
  • held jobs ?
  • separate pegasus plan planning jobs
    • throttle jobs via category.
  • real full ahead planning
    • plan full ahead -
    • will help in debugging workflows
  • hierarchical workflows planner arguments in the prescript wrapper shell scripts.
  • final cleanup job for the workflow
  • fix for iplant workflows cleanup. previously generated files whose locations are determined in the replica catalog should not be cleaned up

Workflow reproducibility ( idafen )

  • here for 3 months - march/april and may
  • document the infrastructure that was used to generate the workflows
  • created ontologies to describe infrastructure.
  • precip API
    • expressed an interest in it.
    • he focuses not on how to deploy, but on how to describe the infrastructure
    • then do experiments that take in his description and deploy it using precip
  • target two conferences
    • one systems
    • other semantic

Pegasus Submit Node on HPCC

  • waiting on glite recommendations from condor-admin

Feb 2014

February 24th, 2014

SCEC Transfer Issues

  • hpc login crashed for scec workflows because of too many stageout jobs
  • there were too many connections open at xinetd level
  • also the stageout jobs were starving all the other local universe jobs in the workflows
  • so the workflows were getting bunched at the stageout level
    • we solved it by moving only the transfers to the vanilla universe on shock
    • ran into the backward compatibility for credential handling that we put in 4.4 after the new credential handling.

Transfer Configuration for 4.4

  • by default the number of threads will be 2
  • we will expose a way via properties to increase the number if users want to have better bandwidth
  • in case of any failures, pegasus-transfer will fall back to a single thread
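
The threading behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the pegasus-transfer implementation: `run_transfers` and the `do_transfer` callable are hypothetical names, and the only details taken from the notes are the default of 2 threads and the retry on a single thread after failures.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREADS = 2  # default discussed in the notes


def run_transfers(transfers, do_transfer, threads=DEFAULT_THREADS):
    """Run transfers in parallel, then retry any failures on a single thread.

    `transfers` is a list of opaque items; `do_transfer` is a hypothetical
    callable that raises an exception when a transfer fails.
    Returns the items that still failed after the single-threaded retry.
    """
    def attempt(item):
        # Return the item if it failed, None if it succeeded.
        try:
            do_transfer(item)
            return None
        except Exception:
            return item

    with ThreadPoolExecutor(max_workers=threads) as pool:
        failed = [item for item in pool.map(attempt, transfers) if item is not None]

    # Fallback: retry the failed transfers one at a time.
    still_failed = []
    for item in failed:
        try:
            do_transfer(item)
        except Exception:
            still_failed.append(item)
    return still_failed
```

A property exposed to users would then only need to change the `threads` argument; the fallback path stays the same.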

February 10th, 2014

Postscript handling

————————————————————————————————

 

- We have implemented a solution in PM-737 to get around condor quoting rules.

 

- MPI codes are not kickstart wrapped

 - Pegasus should indicate whether a clustered job or a kickstart job.

 

- DAGMan exitcode 

 

 

checkpoint jobs

 - 10% of runtimes

 - pegasus-transfer will have to be changed

 - link is set to type checkpoint

 - transaction support for checkpoint

 - timeout  is job runtime - process

 - pegasus-kickstart timeout method

 - also has dv/dt implications for monitoring. 

 

pegasus-exitcode assumes success and checks for failure

 - refactored the script for unit tests as a library

 - pegasus-statistics

 - pegasus-analyzer  ( maybe some commonality)

 - pegasus python library has to be included in worker package

 

 

 

pegasus-transfer 

 - threads are handled similar to pegasus-s3

 - default threading

 

 - expose options end to end

 - initial threads to irods

 - what options to set

 

pegasus-config will now work with a source checkout

December 2013

December 16th, 2013

  • TODO: Talk about ADAMANT design

December 3rd, 2013

  • 4.3.1 release
    • just need to send the announcement.
    • gideon has updated the build infrastructure in bamboo to build the release
    • to do
      • do a drupal snippet, to update the downloads page automatically.
        • dynamically render the page using the shared directory in drupal.
    • pegasus-analyzer will have a recurse option.
  • identity management for pegasus service
    • portal use case
    • user authentications
    • website
      • put a token in a cookie.
    • draw bigger pictures on the identity stuff.
  • Unicore Testing

November 2013

 November 11th, 2013

  • 4.4 Planning
    • according to proposal, we need pegasus as a service, metadata registration, enhanced notifications on long runtimes etc.
    • ligo realtime analysis?
      • scott and kent mentioned that real time analysis is a priority.
      • gstreamer interface.
      • investigate streaming workflows
    • unicore testing support
  • Pegasus Tutorial on (Mats VM on oregon region)
  • Pegasus as a service
  • Ensemble Manager
    • an ensemble has no end state currently.
    • update documentation on the website
    • gideon plans to remove the upload catalog options. instead the clients will read in the properties and automatically upload.
  • NSF Cloud Proposal
    • Experiment management.... maybe does not align itself with NSF Cloud.
  • Adamant Demo
    • workflows are setup and done.

November 4th, 2013

  • Tutorial format finalized for November 14th meeting. similar to software carpentry layout
  • 4.4 release things
    • pegasus metadata support
      • dax schema changes
      • irods - support for metadata attributes
      • s3 objects - they can have tags associated with it.
    • transient replica catalog.
    • unicore support
    • for JIRA items move to the next one.
    • moteur support.
    • dv/dt wrapper support ( probably in a separate dv/dt branch)
  • move to VMWare for hosting websites
    • pegasus.isi.edu will be as a VM in a VMWARE ESX pool.
      • initially 4 VM's for Bamboo BNT
      • retire the machine for PAGE QC
    • long term we are moving to ESX

October 2013

October 1st, 2013

Pegasus 4.3 release

  • dashboard is separate
  • prepare rpm for ligo
  • ssh submission for 4.3
  • tutorial vm almost done
    • the clock issue remains. probably an issue with how virtualbox does the time.
  • need to hear back from scott
  • sepiddeh is working on a Makeflow compatible code generator.

September 2013

September 23rd, 2013

Software Carpentry followup
  • Create a pegasus youtube channel.
  • See if that can be linked from the ISI webcast page.

ISI Pegasus Workshop

  • Submit host setup at HPCC
  • specs are similar to workflow.isi.edu
  • gideon will mail to HPCC admins today about this

Tutorial VM

  • networking issue
    • persistent rules file /etc/udev/rules.d/70-persistent-net.rules
    • instead of deleting it lets just disable it in our VM's
  • X with virtual box guest additions for enabling copy paste
  • turn on ntp
  • larger virtual disk - will increase the size to 8GB
  • X should just add couple of hundred MB's

Pegasus Release

  • JDBC RC
  • Tutorial VM
  • pegasus-statistics
  • pick up a release date
  • tentatively next friday i.e the 4th.

September 9th, 2013

Software Carpentry

  • Karan will prepare introductory slides for Pegasus.
  • Talk to John about providing a Pegasus submit node.
  • Rajiv will be working on the Pegasus RNASeq VM.
  • John Mehringer will go first in the second day.
  • Parking is in Levy structure in southwest corner.
  • Inquire about shuttle from Health Science Campus.
  • Still do - RNASeq module.
  • Put Information about parking and HSC Shuttle.
    • Parking Center.

Pegasus Release

  • waiting for Scott to do release testing.

Pegasus Lite Paper

  • Karan will send the camera ready version today.

Precip

  • using netlogger for logging.
  • replace python logging framework
  • incorporating events from the remote site
  • AMQP ?
    • Getting events into a common file.
  • Run montage using precip

Condo of Condos Workshop

  • Laurent and Gideon have 10 minutes each.
  • Bosco new name is MyHTC.

 

August 26th, 2013

Pegasus 4.3 release

  • dagman metrics not implemented yet by kent. still in design phase.
  • testing stuff
    • unit tests running in bamboo.
  • add missing data dependencies
    • still checks and produces errors

Precip Logging

  • getting the metrics back

Pegasus Hold

  • how to get dagman to stop submitting jobs
  • idle jobs need to go on hold.
  • we can send sigusr1 to dagman.
  • need to handle hierarchical workflows.
  • JDBC RC stuff

JDBC RC

  • we will just update the existing version one.
  • have a python based RC for Replica Catalog.

Ensemble Manager Paper

  • Gideon will be working on it.

DAGMan replacement??

  • Software engg stuff.

August 19th, 2013

  • Pegasus 4.3 release
    • output mapper stuff implemented.
    • pegasus-statistics changes checked in by Rajiv
    • app metrics associated with the metrics report
      • pegasus.metrics.app
      • can be used for RNASeq tracking and other applications
      • the metrics UI will be able to filter on the name.
  • Globus Online Support - move to 4.4 release
    • can only do certain parts of transfers.
    • for transfers from local submit host , we need to use globus connect
      • credentials issue
      • the submit host needs a local endpoint.
  • LIGO testing ?
    • prepare a pre release RPM for LIGO 

August 12th, 2013

  • Pegasus Lite Paper
    • Wait for the Big Data and Science Workshop
  • 4.3 Release
      • Output Mapper Submission
        • error if both an output site and an output mapper replica catalog are specified
      • Globus Online Support in pegasus-transfer
        • OAuth tokens issue.. when to get the token
        • support for multi end point with different credentials
        • probably need to do a pegasus-globus-online
          • the client needs to be blocking .
      • SSH Submission
        • Will use RNASeq for that.
      • Boto downgrade worked.
        • did not build on RHEL 5
      • Test Suite
        • Suite of integration tests
          • checksum the files
  • Ensemble Manager
    • Almost done with the first version
    • Will work on the Galactic Plane version
  • General JUnit Tests for Pegasus
  • Galactic Plane Paper

July 2013

July 29th, 2013

Software Carpentry

  • Workflows Tutorial
    • 1 hour overview of HPCC if HPCC folks are interested.
    • Pegasus Tutorial ( 2 hours )
    • An info part on where to run jobs
      • OSG
      • HPCC
      • XSEDE

  • Pegasus Development
    • Rajiv will complete the pegasus-statistics part
    • error messages ( give more hints on what went wrong on site selection )

  • Monitoring API
    • wants a jar with a simple API to monitor workflows
    • wrap it up in a jar
    • provide interface 
    • portal integration
      • rest interface for the pegasus service

July 8th, 2013

  • gideon has checked in changes to dax2dot based on the closures and reductions
  • karan has checked in the LCA approach. But does not scale for our performance test case.
  • Also changed the way edges are added for the create dir nodes. that will go in for 4.3.
  • Precip Paper
    • deadline extended to the 19th of July.
  • Posters to be made for XSEDE
  • Sudharshan will make a poster on his cleanup work on Monday.
    • Sudharshan will be going on Monday to campus to present the poster around 1-3PM
    • Will give a talk to CCG group Tuesday July 16th at 11:00AM
  • Currently, sudharshan's algo takes 15 seconds on a 1000 node montage workflow.
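
For reference, the "reductions" mentioned for dax2dot above (and the later item on removing extra graph dependencies) can be pictured as a transitive reduction of the workflow DAG: an edge is dropped when a longer path already implies it. The sketch below is a generic Python illustration under that assumption; `transitive_reduction` is a hypothetical helper, not the code checked into dax2dot or the planner.

```python
def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path.

    `edges` is an iterable of (parent, child) pairs describing a DAG.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())

    def reachable(start):
        # All nodes reachable from `start` via one or more edges (iterative DFS).
        seen, stack = set(), list(children[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return seen

    reduced = set()
    for parent, kids in children.items():
        # A direct edge parent -> kid is redundant if kid is also a
        # descendant of one of parent's other children.
        indirect = set()
        for kid in kids:
            indirect |= reachable(kid)
        reduced.update((parent, kid) for kid in kids if kid not in indirect)
    return reduced
```

This naive version runs a search per child, so it would likely need memoisation or a smarter approach for the large performance test cases mentioned above.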


July 1st, 2013

  • monitord bug fix checked in
  • algorithm to remove extra graph dependencies
  • backups
    • we need to update the pegasus machine
      • jira, svn , website ( website and svn need to move at the same time ) , crowd updates
      • confluence was moved to another . also coordinate with action to do the move.
      • mats already updated crowd today
        • there is secret number of conf files... apache on top of tomcat
      • update to debian machines
        • obelix, cartman and stewie, and the ccg worker nodes.
  • mats has updated the bamboo tests to use new filesystem paths
  • ADAS abstract
    • for galactic plane on Amazon. if accepted due in september.
  • 4.3 release
    • fix error messages. see what can be done to improve them .
    • output replica catalog
    • pegasus-transfer tests.
    • updates to cleanup algorithm based on sudharshan's work ??
    • release notes will be updated to indicate the dashboards move to pegasus-services thing.
  • Precip Paper
    • mats will do the zotero work.
    • submitting to cloud com in bristol uk.
    • seppideh has some data on openstack. could not get all instances started up.
    • seppideh will release the token to gideon to do an edit pass
  • Cleanup Algorithm

June 2013

June 24th, 2013

  • Pegasus Development
  • Update on SCEC visit
    • pegasus-archive tool
      • archive everything other than the stampede db and braindump file
    • scott will try to cluster rupture variations for the same rupture in one task based on runtime estimates
    • the SGT will become 16 times bigger and post processing 8 times bigger on move to 1HZ. clustering rupture variations in scec code will help in reducing the number of jobs in the DAX
    • Scott tried to generate a single DAX for the post processing workflow. Was unable to do so. Has generated two DAXes
  • Galactic Plane
    • Cut out service. Slow times on retrieving the image from S3. Small bandwidth between S3 and EC2
    • Will need to have monitoring etc... Not fast enough for a webpage to be responsive.. will need some queuing up
    • Backups
      • Mats working on Kepler data.
      • mats tried backup with S3. does not like symlinks. will change the way backups are managed. the transfer times can be long.
  • Update from Sudharshan
    • Good progress. showed some simulations
  • Adamant Update
    • we are on hook for providing the interfaces in pegasus-transfer that will talk to the exo planner service
    • also provide shadow queue service, that gives estimates on jobs that will be in the queue.
    • supercomputing demo?
  • Precip Paper
    • majeick is doing some experiments

June 17th, 2013

  • Pegasus Development
    • the dax job handling is completed.
    • update on ligo front.
    • condor priorities for local universe jobs
      • not handled right now.
      • gideon has a ticket open for them.
    • gideon observation of s3
      • scalable but not good latency or
  • Pegasus Lite Paper
    • mats is almost done with the runs. to grep through the runs to get the intermediate files in and out of S3
    • not done the S3 caching for rosetta as yet. still not sure. too much work for the time remaining.
    • mats did do the runs with task clustering. he got better numbers and saw a difference in case of rosetta.
    • interleaving of compute jobs and transfers. may help montage.. but won't help rosetta
    • whether we should include the new pegasus 4.2 features.
  • Cleanup Algorithm
  • Glacier Backups for NFS?
    • instead of using two qnaps, just have one and use other for duplicates
    • we need a place for backups
    • currently the QNAPS are 18TB each with raid 6. Raid 10 is a better configuration on the QNAP according to the forums. This means though we will have half the space.
      • have one qnap for scratch
      • have other qnap for storage - the storage will be backed up to glacier. right now QNAP only supports S3. Support for glacier is coming.
    • ewa and richard think glacier backups are a good option.
      • there might be a purge policy required on glacier.
  • Precip Paper
    • change tracking on
    • use dropbox
    • broadcast when you are making a new version.

June 10th, 2013

- Pegasus Development

- change to dax handling

- fix of stdout 

- regex based replica catalog. 

- changes to pegasus-statistics for aggregate statistics
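
One way to picture the regex based replica catalog item above: each entry pairs a pattern on the logical file name (LFN) with a physical file name (PFN) template that may reuse captured groups. The Python sketch below is only an illustration of that idea; the class name, the rule layout and the example paths are assumptions and do not describe the catalog format that was actually implemented.

```python
import re


class RegexReplicaCatalog:
    """Hypothetical in-memory catalog: regex on the LFN -> PFN template."""

    def __init__(self):
        self._rules = []  # list of (compiled pattern, pfn template, site)

    def add(self, lfn_pattern, pfn_template, site):
        self._rules.append((re.compile(lfn_pattern), pfn_template, site))

    def lookup(self, lfn):
        """Return (pfn, site) for the first rule matching the whole LFN, else None."""
        for pattern, template, site in self._rules:
            match = pattern.fullmatch(lfn)
            if match:
                # Substitute captured groups (\1, \g<name>) into the template.
                return match.expand(template), site
        return None


rc = RegexReplicaCatalog()
rc.add(r"(.*)\.fits", r"file:///data/inputs/\1.fits", "local")
```

With a rule like this, a single entry covers every matching file instead of one catalog line per file.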

 

Pegasus  Lite Paper

- compute data between s3 and local disk.

- compute costs for the runs ? 

- have data outside 

- local cache for the S3 client ?? could affect the rosetta cache.

 - change the rosetta workflow.

 - if there are a lot of small files.

 - reading parts of files.

- Ewa will send her version of the changes.

 

Sudharshan Algorithm for Cleanup

  • Greedy approach planned
  • will try implementing a version and show the different executable workflows created


June 3rd, 2013

Pegasus Lite Paper

  • Breakdown of the runtimes , experiments
    • In case of sharedfs, the kickstart runtimes in the breakdown file will be longer
    • for the S3 case we can calculate the S3 transfer time by calculating the difference between the cumulative runtimes
    • doing two experiments rosetta(cpu intensive) and montage( io intensive)
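
The S3 estimate above is a simple subtraction. The notes do not say which two cumulative runtimes are compared, so the Python sketch below assumes they are the cumulative runtime of the whole job (including staging to and from S3) and the cumulative kickstart runtime (the application alone). The function name and the sample numbers are made up for illustration.

```python
def s3_transfer_time(job_runtimes, kickstart_runtimes):
    """Estimate total S3 transfer time, in seconds.

    Assumes `job_runtimes` covers each whole job including S3 staging and
    `kickstart_runtimes` covers only the application run under kickstart.
    """
    return sum(job_runtimes) - sum(kickstart_runtimes)


# Two jobs: 120 s and 95 s in total, of which 100 s and 80 s were compute.
estimate = s3_transfer_time([120, 95], [100, 80])  # 35 seconds attributed to S3
```

The same subtraction would not isolate transfer time in the sharedfs case, since there the I/O happens inside the kickstart-wrapped run.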

Pegasus Development

  • Java DAX API issues
    • might be some bugs in there.

Precip Paper

  • Ewa wants a link to pegasus website in the paper.
  • have more logical thinking in the paper, like reliability and repeatability
  • Sepideh adding some new figures to the paper.
  • Maciek will provide an experiment use-case for the paper.

Stampede and Corral Annual Reports

  • Karan and Mats will be working on these

Sudharshan's Project

  • Going to look into providing a cleanup algorithm that meets a given storage constraint
  • Will look at the static problem of inserting dependencies into the workflow to achieve a solution

PMC Paper

  • on amazon
  • with clustering and pmc

Shirts

  • Should get the logo sample this week, once we approve then we can order shirts

dV/dT

  • Rafael is working on a draft of the data collection and modeling paper
  • We are planning on publishing data, will start drafting a format this week

May 2013

May 20th, 2013

Confluence is going slow. Mats is going to look.

Analytics are set up on Confluence now.

Pegasus Transfer

  • Mats committed a new version that has support for 2-stage transfers

Pegasus S3 Client

  • Gideon changed .s3cfg to .pegasus/s3cfg

Pegasus Lite Paper

  • Mats is working on the experiments
  • We have two weeks to the deadline

PMC Paper

  • Experiments on Amazon comparing Pegasus, Pegasus w/ Clustering, PMC alone

Pegasus Service

  • Finished setting up users and test suite
  • Next is a quick-and-dirty ensemble manager implementation
  • Gideon is going to commit a change to Pegasus that removes the dashboard components. They will live in the pegasus-service repository from now on.

Summer Student

  • Need to think up a project. Needs to be research-oriented and relatively small.
  • Cleanup? Precip? 

Contacting users

  • Find out if they need anything.

Examples

  • Simple examples in Perl, Python and Java
  • Gideon will add them to the examples in the pegasus Git repo

April 2013

April 22nd, 2013

Pegasus 4.2.1 Release
  • monitord prescript handling fixed
    • pegasus-analyzer should detect prescript failures, and the prescript exitstatus should be logged in the database
    • pegasus-statistics was updated for the job instance report
  • pegasus planner
    • need to confirm all checkin's are complete
  • do we want to get LIGO to do a test or just release?

Pegasus statistics across workflows - Rajiv

Pegasus Lite Paper

  • Mats will do the runs on Amazon
  • Karan will work on paper when he comes back

pegasus-hold and pegasus-release

  • any difference between doing a hold on the dagman directly or pegasus-dagman
  • we need to do more investigations on monitord

BOSCO

  • Mats is trying to run on HPCC
  • a single job is running fine.

April 8th, 2013

Pegasus 4.2.1 Release
  • Work on it towards this week
  • monitord prescript issue to fix
Pegasus 4.3

Pegasus Posters

  • One at XSEDE
  • joint one with BOSCO team

Pegasus Lite Paper

  • Submission to IEEE Big Data

New Programmer Hire

  • expanded posting on confluence
  • will send out to HPC Wire , RENCI and USC SC Connect

April 1st, 2013

Pegasus Lite Paper

  • Waiting on Ewa
  • Not much we can do about the IEEE conference. The page limit is 8 , the current size of the paper.

XSEDE Poster

  • Pegasus Poster. Karan will send update
  • Also a joint Pegasus BOSCO poster
  • Also as part of that we will get the MPI workflows up and running through Pegasus and BOSCO

Pegasus Development

  • Bypass of staging input files for Pegasus Lite Case
  • Inplace cleanup bug fixes done.
  • pegasus-s3
    • gideon checked in changes of copy from one file to another
    • mats adds a pegasus transfer
  • workflow cleanup nodes
    • separate cleanup node in the workflow
    • for hierarchical workflows we only delete the outermost workflow
    • what happens if no output-site specified
      • the ligo case!
  • backward compatibility for LIGO
  • Pegasus Dashboard
    • general javascript updates
  • Generic Pegasus Slides
    • 2-3 slides.



 

March 2013

March 25th, 2013

  • Pegasus Lite Paper Submission
  • pegasus-statistics
    • Waiting on Scott to get back with the list of metrics
    • Rajiv will be working on it
  • pegasus-s3 changes
    • we want to be able to copy output files from one s3 bucket to another
    • requires changes to pegasus-transfer and pegasus-s3
  • final node for cleaning up remote directories
    • also related is getting the cleanup algorithm working when we bypass first level staging.

March 18th, 2013

  • Mats has an RPM almost sorted out for LIGO that does not require us to have PYTHONPATH set. Instead the libraries go into standard locations
  • Karan is testing this RPM on spice-dev1 and has set up a page with instructions on how to submit a test workflow to VIRGO
  • Statistics across root workflows
    • earlier gaurang had generated statistics for scec runs by hand... executing queries on the mysql command line
    • he does not have the queries documented anywhere
    • this is something we have talked about in context of 4.3 with Rajiv
    • will follow up with scott on wednesday's call
  • 4.2.1 release
    • backward compatibility for LIGO . still to be done
    • probably next week after the pegasus annual report
    • RPM to handle native python installation
  • Pegasus Annual Report
    • Karan will work on it this week
    • Try to follow the same template as earlier.

March 4th, 2013

  • Sent the link to the DAGMan Metrics Reporting page to Ewa
  • Metrics for Rob Quick's workflow
  • Gideon pushed out kickstart changes
  • Rajiv has pushed changes to the queries for the dashboard.
  • Setup meeting with Jaime and Derrick at OSG AHM to discuss
    • remote_initialdir
    • extra attributes for glite/bosco submissions
    • mpi workflows.
  • OSG Poster to be made this week. And 4.2 Release slides.

February 2013

February 11th, 2013

Direct submission of workflows to PBS

  • Glite submission in Condor. We set up a VM that hosts a PBS scheduler and are using that to test
  • Karan prepared an example for 4.2 that can be used to submit directly to local PBS using the glite interfaces in Condor
    • the remote_initialdir  / +remote_iwd  does not work
      • problem for MPI codes
      • for the time being, the example prepared relies on kickstart to change the directory before launching a job
    • there is also a ssh style that allows us to use BOSCO to do remote submissions using SSH to a PBS cluster
      • that one also has the issue of remote initialdir

 - jobstate.log refactoring. 

 - data transfer ( support for globus online) 

- lightweight tracing

 -  task stats. net link socket pegasus-kickstart . how much memory the task used and io used. 

 - add task stats to kickstart

 - ptrace

 - trace  linux equivalent is SystemTap

 

- dashboard improvements

 - single api for clients

 - last week drop down

 - performance run on large workflows.

 

February 4th, 2013

  • CCGrid / Pegasus Lite Paper
    •  Performance section
    •  remove the experiments section?
    •  OR
    •  extra experiments section 
    •  have the squid proxy cache
    • find a workshop to submit the paper
  • Cloud Paper
    •  Ewa is working on it.

  • Git HUB Migration
    •  - couple of branches like monitord , pmc and dang are branches
    •  - svn will be made read only . 
    •  - update the website with all the development information
    •  - bamboo scripts
    •  - documentation ( long term )
    •  - nightly builds
  • SSH Submission
    •  - gsissh submission for blue waters
    •  - ssh to blue waters is required for OTP
    •  - passing of parameters to PBS
    •  - SSH key
    •  - ssh agent.
    •  - queue keyword
    •  - Batch session
    •  - submit jobs to HPCC
    •  - Gideon will do that. 

  • monitord memory explosion
    •  - long term for monitord 
    •  - pegasus-dagman replacement 

  •   minor release 4.2.1
    •  - potential monitord bug issue
    •  - long term dagman replacement

  • Response time for metrics page
    •  - occasionally it is slow