
...

ALLOW_WRITE = <what it was before>, *.compute-1.amazonaws.com

...

Go to the "AMIs" area in the console.
We are going to launch ami-06dd226f858741ec. Filter by "Public Images" and "CentOS" using the drop-downs, type "ami-06dd226f858741ec" into the text box and hit 'Refresh'. It may take a few seconds to give you a list.
Select the one called "405596411149/centos-5.6-x86_64-pegasus-cloud-tutorial-2" and click "Launch".
A launch wizard will pop up.
Select the number of instances (1), and instance type (m1.large), then "Continue".
On the "Advanced Instance Options" page add the following to "User Data" and hit "Continue" (note: host.example.com should be replaced with your submit host):

...

ALSO IMPORTANT: The "User Data" is how you tell the image what to do. This will be copied directly into the Condor configuration file. You can define any extra configuration values you like, but you must specify at least CONDOR_HOST.
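As a rough illustration of the rule above, the following sketch renders "User Data" text and refuses to produce it without CONDOR_HOST. The function name `build_user_data` and the `extra` argument are hypothetical helpers introduced here; they are not part of the image or of Condor. The only setting taken from the tutorial is CONDOR_HOST, with host.example.com standing in for your submit host.

```python
def build_user_data(condor_host, extra=None):
    """Render Condor configuration lines to paste into the "User Data" box.

    condor_host is mandatory because the image needs at least CONDOR_HOST.
    extra is an optional dict of additional configuration values.
    """
    if not condor_host or not condor_host.strip():
        raise ValueError("CONDOR_HOST is required")
    settings = {"CONDOR_HOST": condor_host.strip()}
    for key, value in (extra or {}).items():
        if key.upper() == "CONDOR_HOST":
            raise ValueError("pass CONDOR_HOST via the condor_host argument")
        settings[key] = value
    # One "NAME = value" line per setting, in insertion order.
    return "\n".join(f"{k} = {v}" for k, v in settings.items()) + "\n"


if __name__ == "__main__":
    # Replace host.example.com with your submit host.
    print(build_user_data("host.example.com"), end="")
```

The printed text can be pasted into the "User Data" field as-is.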

7. Log into the node

This is how you SSH to a node you launched.

...

VERY IMPORTANT: Make sure you log in via SSH from your submit host; otherwise this won't work because the security group will not match.

8. Check your submit host

...

You should see something that looks like this:

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@50.16.16.204 LINUX      X86_64 Unclaimed Idle     0.080  3843  0+00:00:04
slot2@50.16.16.204 LINUX      X86_64 Unclaimed Idle     0.000  3843  0+00:00:05
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     2     0       0         2       0          0        0

               Total     2     0       0         2       0          0        0
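If you would rather check this from a script than by eye, a minimal sketch along the following lines can pick the slot rows out of the listing. It assumes the eight-column layout shown above and slot names of the form slotN@host; the function names are hypothetical, and other Condor versions or options may print a different layout.

```python
SAMPLE = """\
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@50.16.16.204 LINUX      X86_64 Unclaimed Idle     0.080  3843  0+00:00:04
slot2@50.16.16.204 LINUX      X86_64 Unclaimed Idle     0.000  3843  0+00:00:05
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     2     0       0         2       0          0        0

               Total     2     0       0         2       0          0        0
"""


def parse_slots(status_text):
    """Return one dict per slot row found in the listing."""
    slots = []
    for line in status_text.splitlines():
        fields = line.split()
        # Slot rows have eight columns and an "@" in the name; the header
        # and summary rows never contain "@" in their first column.
        if len(fields) == 8 and "@" in fields[0]:
            name, opsys, arch, state, activity, load, mem, _since = fields
            slots.append({
                "name": name, "opsys": opsys, "arch": arch,
                "state": state, "activity": activity,
                "load": float(load), "mem": int(mem),
            })
    return slots


def idle_slots(status_text):
    """Names of slots that are Unclaimed and Idle, i.e. ready for jobs."""
    return [s["name"] for s in parse_slots(status_text)
            if s["state"] == "Unclaimed" and s["activity"] == "Idle"]


if __name__ == "__main__":
    print(idle_slots(SAMPLE))
```

In practice you would pass the captured output of condor_status to `idle_slots` instead of `SAMPLE`.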

...

Make sure the workers are usable
Once the workers show up in condor_status you can test to make sure they will run jobs.
Create a file called "vanilla.sub" on your submit host with this inside:

universe = vanilla
executable = /bin/hostname
transfer_executable = false
output = test_$(cluster).$(process).out
error = test_$(cluster).$(process).err
log = test_$(cluster).$(process).log
requirements = (Arch == Arch) && (OpSys == OpSys) && (Disk != 0) && (Memory != 0)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
copy_to_spool = false
notification = NEVER
queue 1
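Because the executable is /bin/hostname and the output file is named test_$(cluster).$(process).out, each finished job should leave behind a file containing the name of the machine it ran on. The sketch below collects those names. It assumes the job was submitted from the current directory (for example with the standard condor_submit command) and has already completed; `collect_job_hosts` is a hypothetical helper, not a Condor tool.

```python
from pathlib import Path


def collect_job_hosts(directory="."):
    """Map each test_<cluster>.<process>.out file to the hostname it recorded.

    Empty files are skipped: they usually mean the job has not finished
    (or did not run) yet.
    """
    hosts = {}
    for out_file in sorted(Path(directory).glob("test_*.out")):
        hostname = out_file.read_text().strip()
        if hostname:
            hosts[out_file.name] = hostname
    return hosts


if __name__ == "__main__":
    for name, host in collect_job_hosts().items():
        print(f"{name}: ran on {host}")
```

If the reported hostname belongs to your EC2 instance rather than to the submit host, the worker is accepting jobs.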

...

Add an ec2 site to your sites.xml (note: this is the old XML format, modify as needed if your application uses the new format):

<site handle="ec2" sysinfo="INTEL32::LINUX">
    <!-- This is where pegasus is installed in the VM -->
    <profile namespace="env" key="PEGASUS_HOME">/usr/local/pegasus/default</profile>

    <!-- Just in case you need to stage data via GridFTP -->
    <profile namespace="env" key="GLOBUS_LOCATION">/usr/local/globus/default</profile>
    <profile namespace="env" key="LD_LIBRARY_PATH">/usr/local/globus/default/lib</profile>

    <!-- Some misc. pegasus settings -->
    <profile namespace="pegasus" key="stagein.clusters">1</profile>
    <profile namespace="pegasus" key="stageout.clusters">1</profile>
    <profile namespace="pegasus" key="transfer.proxy">true</profile>

    <!-- These cause Pegasus to generate vanilla universe jobs -->
    <profile namespace="pegasus" key="style">glidein</profile>
    <profile namespace="condor" key="universe">vanilla</profile>
    <profile namespace="condor" key="requirements">(Arch==Arch)&amp;&amp;(Disk!=0)&amp;&amp;(Memory!=0)&amp;&amp;(OpSys==OpSys)&amp;&amp;(FileSystemDomain!="")</profile>
    <profile namespace="condor" key="rank">SlotID</profile>

    <!-- These are not actually needed, but they are required by the site catalog format -->
    <lrc url="rls://example.com"/>
    <gridftp url="file://" storage="" major="2" minor="4" patch="0"/>
    <jobmanager universe="vanilla" url="example.com/jobmanager-pbs" major="2" minor="4" patch="3"/>
    <jobmanager universe="transfer" url="example.com/jobmanager-fork" major="2" minor="4" patch="3"/>

    <!-- Where the data will be stored on the worker node -->
    <workdirectory>/mnt</workdirectory>
</site>
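A typo in the site entry is easy to miss, so it can help to parse it before planning a workflow. The following is a minimal sketch using only the Python standard library. It assumes you hand it a single <site> element without an XML namespace (a full sites.xml wraps the entries in a root element and may declare a namespace), and the three checks in `check_ec2_site` are illustrative choices, not a list of what Pegasus itself validates.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the entry above, used only to exercise the functions.
SAMPLE_SITE = """\
<site handle="ec2" sysinfo="INTEL32::LINUX">
    <profile namespace="env" key="PEGASUS_HOME">/usr/local/pegasus/default</profile>
    <profile namespace="pegasus" key="style">glidein</profile>
    <profile namespace="condor" key="universe">vanilla</profile>
    <workdirectory>/mnt</workdirectory>
</site>
"""


def site_profiles(site_xml):
    """Return {(namespace, key): value} for every <profile> in a <site>."""
    site = ET.fromstring(site_xml)  # raises ParseError on malformed XML
    if site.tag != "site":
        raise ValueError(f"expected <site>, found <{site.tag}>")
    return {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in site.iter("profile")
    }


def check_ec2_site(site_xml):
    """Return a list of problems; an empty list means the checks passed."""
    profiles = site_profiles(site_xml)
    problems = []
    if profiles.get(("condor", "universe")) != "vanilla":
        problems.append("condor universe is not 'vanilla'")
    if profiles.get(("pegasus", "style")) != "glidein":
        problems.append("pegasus style is not 'glidein'")
    if ("env", "PEGASUS_HOME") not in profiles:
        problems.append("PEGASUS_HOME is not set")
    return problems


if __name__ == "__main__":
    print(check_ec2_site(SAMPLE_SITE) or "site entry looks consistent")
```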

In your pegasus.properties file, make sure you disable third-party transfer mode:

# Comment-out the next line to run on site "ec2"
#pegasus.transfer.*.thirdparty.sites=*
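To confirm that the setting really is commented out, a small check like the one below may be useful. It is a sketch under simplifying assumptions: it treats the file as plain `key=value` lines with `#` or `!` comments and ignores line continuations and other properties-file features. Both function names are hypothetical.

```python
def active_properties(text):
    """Parse properties text, skipping blank lines and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def thirdparty_enabled(text):
    """True if any active pegasus.transfer.*.thirdparty.sites line exists."""
    return any(
        key.startswith("pegasus.transfer.") and key.endswith(".thirdparty.sites")
        for key in active_properties(text)
    )


if __name__ == "__main__":
    sample = (
        '# Comment-out the next line to run on site "ec2"\n'
        "#pegasus.transfer.*.thirdparty.sites=*\n"
    )
    print("third-party transfers enabled:", thirdparty_enabled(sample))
```

To check a real file, read its contents into a string and pass that to `thirdparty_enabled`.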

...

This tutorial only shows you how to set up a single worker node. To scale up your workflows to multiple workers, you will need to either a) set up a shared file system such as NFS on EC2, or b) configure Pegasus to use Condor file transfers for each job.