How to represent a Bash script using the Pegasus Python API.

We first describe the script to convert in its abstract form, and then show its concrete version, in which every relative path is resolved against the working directory.

Abstract Bash Script (main.sh)
# Current Working Dir: /lfs/sorter

# Contents of alpha.txt
# 2
# 1
# 3

# Input File: alpha.txt
# Output File: beta.txt
# Stderr: gamma.txt
# Executable: custom-sort, configured using CUSTOM_SORT_TYPE env. variable
# Requirements: Min. 128 MB memory

# Input file contains numeric data, so sort it numerically
export CUSTOM_SORT_TYPE="numeric"
bin/custom-sort -i alpha.txt -o beta.txt 2> gamma.txt
Concrete Bash Script (main.sh)
# Current Working Dir: /lfs/sorter

# Contents of alpha.txt
# 2
# 1
# 3

# Input File: alpha.txt 
# Output File: beta.txt
# Stderr: gamma.txt
# Executable: custom-sort, configured using CUSTOM_SORT_TYPE env. variable
# Requirements: Min. 128 MB memory

# Input file contains numeric data, so sort it numerically
export CUSTOM_SORT_TYPE="numeric"
/lfs/sorter/bin/custom-sort -i /lfs/sorter/alpha.txt -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
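The only difference between the two versions is that each relative path has been resolved against the working directory. A minimal sketch of that resolution in plain Python follows; the helper name `make_concrete` is hypothetical and is not part of Pegasus.

```python
import posixpath

CWD = "/lfs/sorter"  # working directory taken from the script header


def make_concrete(path, cwd=CWD):
    """Resolve a relative path against cwd; absolute paths pass through unchanged."""
    return posixpath.normpath(posixpath.join(cwd, path))


# Paths as they appear in the abstract script
abstract = ["bin/custom-sort", "alpha.txt", "beta.txt", "gamma.txt"]
concrete = [make_concrete(p) for p in abstract]
print(concrete)
```

The abstract names stay the same wherever the script runs; only the value of the working directory changes the concrete result.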

Script


Workflow in Python
from Pegasus.api import Workflow

wf = Workflow("main-sh", infer_dependencies=True)

Command Invocation

Command Invocation in Bash
# Requirements: Min. 128 MB memory
export CUSTOM_SORT_TYPE="numeric"
bin/custom-sort -i alpha.txt -o beta.txt 2> gamma.txt
Command Invocation in Python
from Pegasus.api import File, Job

alpha_txt = File("alpha.txt")
beta_txt = File("beta.txt")
gamma_txt = File("gamma.txt")

custom_sort_job = (
    Job("custom-sort-exec")
        .add_args("-i", alpha_txt, "-o", beta_txt)
        .add_inputs(alpha_txt)
        .add_outputs(beta_txt)
        .set_stderr(gamma_txt)
        .add_env(CUSTOM_SORT_TYPE="numeric")
        .add_pegasus_profile(memory="128 MB")
)

wf.add_jobs(custom_sort_job)

Site Catalog



Transformation Catalog (Executables)

Handling Executables in Bash
# Abstract version
--> bin/custom-sort <-- -i alpha.txt -o beta.txt 2> gamma.txt

# Concrete version
--> /lfs/sorter/bin/custom-sort <-- -i /lfs/sorter/alpha.txt -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
Handling Executables in Python
from Pegasus.api import TransformationCatalog, Transformation

tc = TransformationCatalog()

custom_sort_exec = Transformation(
    site="condorpool",
--> name="custom-sort-exec", <--
--> pfn="/lfs/sorter/bin/custom-sort", <--
    is_stageable=True,
)

tc.add_transformations(custom_sort_exec)

wf.add_transformation_catalog(tc)

Replica Catalog (Input Files)

Handling Input Files in Bash
# Abstract version
bin/custom-sort -i --> alpha.txt <-- -o beta.txt 2> gamma.txt

# Concrete version
/lfs/sorter/bin/custom-sort -i --> /lfs/sorter/alpha.txt <-- -o /lfs/sorter/beta.txt 2> /lfs/sorter/gamma.txt
Handling Input Files in Python
from Pegasus.api import ReplicaCatalog

rc = ReplicaCatalog()

rc.add_replica(
    site="local", 
--> lfn="alpha.txt", <--
--> pfn="/lfs/sorter/alpha.txt", <--
)

wf.add_replica_catalog(rc)
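
Taken together, the two catalogs can be pictured as lookup tables: the transformation catalog maps the name used in the job to an executable path, and the replica catalog maps a logical file name to a physical location. The sketch below is only a mental model in plain Python, not how Pegasus is implemented. The dictionaries, the `resolve_file` and `concretize` helpers, and the assumption that output files land in the same working directory are all illustrative.

```python
# Illustrative lookup tables mirroring the catalog entries above.
transformation_catalog = {"custom-sort-exec": "/lfs/sorter/bin/custom-sort"}
replica_catalog = {"alpha.txt": "/lfs/sorter/alpha.txt"}
WORK_DIR = "/lfs/sorter"  # assumed location for files the job creates


def resolve_file(lfn):
    """Use the replica catalog for known inputs, else place the file in WORK_DIR."""
    return replica_catalog.get(lfn, f"{WORK_DIR}/{lfn}")


def concretize(transformation, args, files):
    """Build a concrete argument list from an abstract job description."""
    exe = transformation_catalog[transformation]
    return [exe] + [resolve_file(a) if a in files else a for a in args]


argv = concretize(
    "custom-sort-exec",
    ["-i", "alpha.txt", "-o", "beta.txt"],
    files={"alpha.txt", "beta.txt"},
)
print(" ".join(argv))
# prints: /lfs/sorter/bin/custom-sort -i /lfs/sorter/alpha.txt -o /lfs/sorter/beta.txt
```

Keeping the names in the job abstract and the locations in the catalogs means the same job description can be reused on another site by changing only the catalog entries.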

Run the script

Execute the Script
./main.sh 
Run the Workflow
# Plan the workflow, submit it, and block until it finishes
wf.plan(submit=True).wait()