How to represent a Bash script using the Pegasus Python API.
The abstract version is the script to convert. The concrete version is the same script with every relative path written out in full.
Abstract Bash Script (main.sh)
# Current Working Dir: /lfs/sorter
# Contents of alpha.txt
#   2
#   1
#   3
# Input File: alpha.txt
# Output File: beta.txt
# Stderr: gamma.txt
# Executable: custom-sort, configured using CUSTOM_SORT_TYPE env. variable
# Requirements: Min. 128 MB memory

# Input file contains numeric data, so sort it numerically
export CUSTOM_SORT_TYPE="numeric"
bin/custom-sort -i alpha.txt -o beta.txt 2> gamma.txt
Concrete Bash Script (main.sh)
# Current Working Dir: /lfs/sorter
# Contents of alpha.txt
#   2
#   1
#   3
# Input File: alpha.txt
# Output File: beta.txt
# Stderr: gamma.txt
# Executable: custom-sort, configured using CUSTOM_SORT_TYPE env. variable
# Requirements: Min. 128 MB memory

# Input file contains numeric data, so sort it numerically
export CUSTOM_SORT_TYPE="numeric"
/lfs/sorter/bin/custom-sort -i /lfs/sorter/alpha.txt -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
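The two versions differ only in that each relative path is resolved against the working directory. A minimal sketch of that step in plain Python follows. It uses only the standard library and is not part of the Pegasus API; the helper name `make_concrete` is hypothetical.

```python
from pathlib import PurePosixPath

# Working directory stated in the script's header comment
CWD = PurePosixPath("/lfs/sorter")


def make_concrete(relative_path: str, cwd: PurePosixPath = CWD) -> str:
    """Resolve a path from the abstract script against the working directory."""
    return str(cwd / relative_path)


# Every path that appears in the abstract command line
abstract = ["bin/custom-sort", "alpha.txt", "beta.txt", "gamma.txt"]
concrete = [make_concrete(p) for p in abstract]

for a, c in zip(abstract, concrete):
    print(f"{a} -> {c}")
```

In a Pegasus workflow this resolution is not done by hand: the workflow refers to the short names, and the catalogs described later supply the full paths.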
Script
Script in Python
from Pegasus.api import Workflow

wf = Workflow("main-sh", infer_dependencies=True)
Command Invocation
Command Invocation in Bash
# Requirements: Min. 128 MB memory
export CUSTOM_SORT_TYPE="numeric"
bin/custom-sort -i alpha.txt -o beta.txt 2> gamma.txt
Command Invocation in Python
from Pegasus.api import File, Job

# Logical file names; physical locations are supplied by the catalogs
alpha_txt = File("alpha.txt")
beta_txt = File("beta.txt")
gamma_txt = File("gamma.txt")

custom_sort_job = (
    Job("custom-sort-exec")
    .add_args("-i", alpha_txt, "-o", beta_txt)
    .add_inputs(alpha_txt)
    .add_outputs(beta_txt)
    .set_stderr(gamma_txt)
    .add_env(CUSTOM_SORT_TYPE="numeric")
    .add_pegasus_profile(memory="128 MB")
)

wf.add_jobs(custom_sort_job)
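To make the correspondence with the Bash line concrete, here is a rough standard-library sketch of what the environment variable and the stderr redirection amount to when a command actually runs. It does not use Pegasus. Because `custom-sort` is not available here, a small inline Python program stands in for it; its behavior and its stderr message are assumptions made for illustration. The memory requirement has no counterpart in this sketch.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for bin/custom-sort: sorts the lines of the -i file
# into the -o file, numerically when CUSTOM_SORT_TYPE is "numeric".
STAND_IN = r'''
import os, sys
from pathlib import Path
args = sys.argv[1:]
src, dst = args[args.index("-i") + 1], args[args.index("-o") + 1]
lines = Path(src).read_text().split()
key = int if os.environ.get("CUSTOM_SORT_TYPE") == "numeric" else str
Path(dst).write_text("\n".join(sorted(lines, key=key)) + "\n")
print("sorted", len(lines), "lines", file=sys.stderr)
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "alpha.txt").write_text("2\n1\n3\n")  # contents from the script header

# Counterpart of: export CUSTOM_SORT_TYPE="numeric" / .add_env(...)
env = dict(os.environ, CUSTOM_SORT_TYPE="numeric")

# Counterpart of: 2> gamma.txt / .set_stderr(gamma_txt)
with open(workdir / "gamma.txt", "w") as stderr_file:
    subprocess.run(
        [sys.executable, "-c", STAND_IN, "-i", "alpha.txt", "-o", "beta.txt"],
        cwd=workdir,
        env=env,
        stderr=stderr_file,
        check=True,
    )

sorted_lines = (workdir / "beta.txt").read_text().split()
print(sorted_lines)
```

The Pegasus job above declares the same things (arguments, environment, stderr target) without running anything; execution happens only once the workflow is planned and run.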
Site Catalog
Transformation Catalog (Executables)
Handling Executables
# Abstract version
--> bin/custom-sort <-- -i alpha.txt -o beta.txt 2> gamma.txt

# Concrete version
--> /lfs/sorter/bin/custom-sort <-- -i /lfs/sorter/alpha.txt -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
Handling Executables
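from Pegasus.api import TransformationCatalog, Transformation

tc = TransformationCatalog()

custom_sort_exec = Transformation(
    site="condorpool",
    name="custom-sort-exec",            # <-- logical name, matches Job("custom-sort-exec")
    pfn="/lfs/sorter/bin/custom-sort",  # <-- concrete path to the executable
    is_stageable=True,
)

tc.add_transformations(custom_sort_exec)

# Attach the catalog to the workflow (wf is the Workflow created earlier)
wf.add_transformation_catalog(tc)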
Replica Catalog (Input Files)
Handling Input Files
# Abstract version
bin/custom-sort -i --> alpha.txt <-- -o beta.txt 2> gamma.txt

# Concrete version
/lfs/sorter/bin/custom-sort -i --> /lfs/sorter/alpha.txt <-- -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
Handling Input Files
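from Pegasus.api import ReplicaCatalog

rc = ReplicaCatalog()

rc.add_replica(
    site="local",
    lfn="alpha.txt",               # <-- logical name, matches File("alpha.txt")
    pfn="/lfs/sorter/alpha.txt",   # <-- concrete path to the input file
)

# Attach the catalog to the workflow (wf is the Workflow created earlier)
wf.add_replica_catalog(rc)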
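The role of the two catalogs can be pictured as lookup tables keyed by logical name. The sketch below uses plain dictionaries as stand-ins; they are not Pegasus objects, the `resolve` helper is hypothetical, and the real planner does considerably more than this (for example, staging files between sites).

```python
from typing import Dict, List

# Plain-dict stand-ins for the catalogs, filled with the entries shown above
transformation_catalog: Dict[str, Dict[str, str]] = {
    "custom-sort-exec": {"site": "condorpool", "pfn": "/lfs/sorter/bin/custom-sort"},
}
replica_catalog: Dict[str, Dict[str, str]] = {
    "alpha.txt": {"site": "local", "pfn": "/lfs/sorter/alpha.txt"},
}


def resolve(transformation: str, args: List[str]) -> List[str]:
    """Swap logical names for physical ones wherever a catalog entry exists."""
    command = [transformation_catalog[transformation]["pfn"]]
    for arg in args:
        entry = replica_catalog.get(arg)
        command.append(entry["pfn"] if entry else arg)
    return command


command = resolve("custom-sort-exec", ["-i", "alpha.txt", "-o", "beta.txt"])
print(" ".join(command))
```

Only `alpha.txt` needs a replica entry: `beta.txt` and `gamma.txt` are produced by the job itself, so in this sketch they keep their logical names.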
Run the script
Execute the Script
./main.sh
Run the Workflow
wf.plan().run().wait()