To facilitate evaluation of workflow algorithms and systems on a range of workflow sizes, we have developed a workflow generator. This generator uses information gathered from actual executions of scientific workflows on the Grid, as well as our understanding of the processes behind these workflows, to generate synthetic workflows resembling those used by real-world scientific applications.
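
As a rough illustration of the idea, the sketch below builds a synthetic workflow of a requested size by sampling task runtimes from per-task-type statistics. It is not the generator described on this page: the task types, the (mean, standard deviation) values, the job id format, and the fan-out/fan-in shape are all hypothetical, chosen only to show how one profile can be scaled to different node counts.

```python
import random

# Hypothetical runtime profile: task type -> (mean seconds, std dev seconds).
# These numbers are illustrative, not measurements from any real workflow.
PROFILE = {"split": (5.0, 1.0), "process": (60.0, 15.0), "merge": (10.0, 2.0)}


def sample_runtime(rng, task_type):
    """Draw a runtime from a normal distribution, clamped to stay positive."""
    mean, std = PROFILE[task_type]
    return max(0.1, rng.gauss(mean, std))


def generate_workflow(num_nodes, seed=0):
    """Return (jobs, edges) for a fan-out/fan-in workflow with num_nodes jobs.

    jobs maps job id -> (task type, runtime in seconds).
    edges is a list of (parent id, child id) pairs.
    """
    if num_nodes < 3:
        raise ValueError("need at least 3 nodes: split, process, merge")
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    types = ["split"] + ["process"] * (num_nodes - 2) + ["merge"]
    jobs = {}
    for i, task_type in enumerate(types):
        jobs[f"ID{i:05d}"] = (task_type, sample_runtime(rng, task_type))
    ids = list(jobs)
    first, last = ids[0], ids[-1]
    edges = []
    for middle in ids[1:-1]:
        edges.append((first, middle))  # split feeds every process job
        edges.append((middle, last))   # every process job feeds merge
    return jobs, edges


if __name__ == "__main__":
    for size in (25, 50, 100):
        jobs, edges = generate_workflow(size)
        print(size, len(jobs), len(edges))
```

A real generator would derive both the structure and the runtime statistics from recorded executions rather than from a fixed table like `PROFILE`.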

Pegasus Workflows

Additional details about this work can be found in our publication at WORKS08 [1].

Workflow Type



The Montage application created
by NASA/IPAC stitches together multiple
input images to create
custom mosaics of the sky.

25 Node DAX
50 Node DAX
100 Node DAX
1000 Node DAX

The CyberShake workflow is used
by the Southern California Earthquake
Center to characterize
earthquake hazards in a region.

30 Node DAX
50 Node DAX
100 Node DAX
1000 Node DAX

The epigenomics workflow created
by the USC Epigenome Center
and the Pegasus Team is used to
automate various operations
in genome sequence processing.

24 Node DAX
46 Node DAX
100 Node DAX
997 Node DAX

LIGO Inspiral Analysis
LIGO's Inspiral Analysis workflow
is used to generate and
analyze gravitational waveforms
from data collected during the
coalescing of compact binary systems.

30 Node DAX
50 Node DAX
100 Node DAX
1000 Node DAX

The SIPHT workflow, from the
bioinformatics project at Harvard,
is used to automate the search for
small untranslated RNAs (sRNAs) for bacterial
replicons in the NCBI database.

30 Node DAX
60 Node DAX
100 Node DAX
1000 Node DAX

A large collection of DAXes similar to the ones listed above is available here. Note that it is about 290 MB.
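
Once downloaded, a DAX can be inspected with a few lines of Python. The sketch below assumes the DAX is an XML document in which jobs appear as `job` elements with `id` and `runtime` attributes, and dependencies appear as `child` elements containing `parent` elements. The embedded sample is a hypothetical, heavily trimmed stand-in, so check the element and attribute names against the actual files before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down DAX used only for illustration.
SAMPLE_DAX = """
<adag xmlns="http://pegasus.isi.edu/schema/DAX" name="sample">
  <job id="ID00000" name="taskA" runtime="12.5"/>
  <job id="ID00001" name="taskA" runtime="11.0"/>
  <job id="ID00002" name="taskB" runtime="3.5"/>
  <child ref="ID00002">
    <parent ref="ID00000"/>
    <parent ref="ID00001"/>
  </child>
</adag>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}job' down to 'job'."""
    return tag.rsplit("}", 1)[-1]


def summarize_dax(xml_text):
    """Return (jobs, edges) parsed from DAX text.

    jobs maps job id -> runtime in seconds (0.0 if the attribute is absent).
    edges is a list of (parent id, child id) pairs.
    """
    root = ET.fromstring(xml_text.strip())
    jobs = {}
    edges = []
    for elem in root:
        kind = local_name(elem.tag)
        if kind == "job":
            jobs[elem.get("id")] = float(elem.get("runtime", "0"))
        elif kind == "child":
            for parent in elem:
                if local_name(parent.tag) == "parent":
                    edges.append((parent.get("ref"), elem.get("ref")))
    return jobs, edges


if __name__ == "__main__":
    jobs, edges = summarize_dax(SAMPLE_DAX)
    print("jobs:", len(jobs))
    print("dependencies:", len(edges))
    print("total runtime:", sum(jobs.values()))
```

To summarize a downloaded file instead of the sample, read its contents into a string and pass that to `summarize_dax`.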

Other Workflows

These workflows come from a report by Ramakrishnan and Gannon [2].

Workflow Type                     Figure       Example DAX
LEAD Mesoscale Meteorology        Figure 1
LEAD ARPS Data Analysis System    Figure 2
LEAD Data Mining Workflow         Figure 3
Storm Surge SCOOP Workflow        Figure 4
Floodplain Mapping                Figure 5
Glimmer                           Figure 6
Gene2Life                         Figure 7
Motif Network                     Figure 8
MEME-MAST                         Figure 9
Molecular Sciences                Figure 10
Avian Flu                         Figure 11
caDSR                             Figure 12
Pan-STARRS Load                   Figure 13
Pan-STARRS Merge                  Figure 14
McStas                            Figure 15

The code used to generate the above DAX files was written in Python and can be downloaded here.


[1] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, and K. Vahi, "Characterization of Scientific Workflows", 3rd Workshop on Workflows in Support of Large-Scale Science (WORKS 08), 2008.

[2] L. Ramakrishnan and D. Gannon, "A Survey of Distributed Workflow Characteristics and Resource Requirements", Indiana University Technical Report TR671, 2008.
