This tutorial takes you step-by-step through the process of setting up a worker node to run workflows on Amazon EC2 using Pegasus. We assume you are already familiar with Pegasus and have a workflow that already runs on a local cluster or grid site. This tutorial will show you how to make that workflow run on an Amazon node.
1. Get an AWS account
You need this before you can do anything
Note that those are CIDR addresses, so don't forget the /32. Also, don't forget to replace 192.168.1.1 with your submit host IP.
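If you prefer the command line to the console, the same rule can be added with the EC2 API tools. This is only a sketch: the group name "default" and Condor's default collector port 9618 are assumptions, so adjust both for your setup, and as above replace 192.168.1.1 with your submit host IP.

```
# Allow the submit host to reach the Condor port on the workers;
# the /32 restricts the rule to that single address.
ec2-authorize default -P tcp -p 9618 -s 192.168.1.1/32
```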
6. Launch the Pegasus public image
Go to the "AMIs" area in the console.
We are going to launch ami-06dd226f. Filter by "Public Images" and "CentOS" using the drop-downs, type "ami-06dd226f" into the text box and hit 'Refresh'. It may take a few seconds to give you a list.
Select the one called "405596411149/centos-5.6-x86_64-cloud-tutorial" and click "Launch".
A launch wizard will pop up.
Select the number of instances (1 for now), and instance type (m1.large), then "Continue".
On the "Advanced Instance Options" page add the following to "User Data" and hit "Continue" (note: host.example.com should be replaced with your submit host):
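Based on the worker launch later in this tutorial, the User Data is a Condor configuration fragment along these lines (a sketch; replace host.example.com with the actual DNS name of your submit host):

```
CONDOR_HOST = host.example.com
```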
Add an ec2 site to your sites.xml (note: this is the old XML format, modify as needed if your application uses the new format):
<site handle="ec2" sysinfo="INTEL32::LINUX">
    <!-- This is where pegasus is installed in the VM -->
    <profile namespace="env" key="PEGASUS_HOME">/usr/local/pegasus/default</profile>
    <!-- Just in case you need to stage data via GridFTP -->
    <profile namespace="env" key="GLOBUS_LOCATION">/usr/local/globus/default</profile>
    <profile namespace="env" key="LD_LIBRARY_PATH">/usr/local/globus/default/lib</profile>
    <!-- Some misc. pegasus settings -->
    <profile namespace="pegasus" key="stagein.clusters">1</profile>
    <profile namespace="pegasus" key="stageout.clusters">1</profile>
    <profile namespace="pegasus" key="transfer.proxy">true</profile>
    <!-- These cause Pegasus to generate vanilla universe jobs -->
    <profile namespace="pegasus" key="style">glidein</profile>
    <profile namespace="condor" key="universe">vanilla</profile>
    <profile namespace="condor" key="requirements">(Arch==Arch)&amp;&amp;(Disk!=0)&amp;&amp;(Memory!=0)&amp;&amp;(OpSys==OpSys)&amp;&amp;(FileSystemDomain!="")</profile>
    <profile namespace="condor" key="rank">SlotID</profile>
    <!-- These are not actually needed, but they are required by the site catalog format -->
    <lrc url="rls://example.com"/>
    <gridftp url="file://" storage="" major="2" minor="4" patch="0"/>
    <jobmanager universe="vanilla" url="example.com/jobmanager-pbs" major="2" minor="4" patch="3"/>
    <jobmanager universe="transfer" url="example.com/jobmanager-fork" major="2" minor="4" patch="3"/>
    <!-- Where the data will be stored on the worker node -->
    <workdirectory>/mnt</workdirectory>
</site>
In your pegasus.properties file, make sure you disable thirdparty transfer mode:
# Comment-out the next line to run on site "ec2"
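In the Pegasus 2.x property scheme, the line being commented out is one that lists the sites allowed to use third-party transfers; it resembles the following (a sketch — the exact property name varies between Pegasus versions, so check the properties reference for yours):

```
pegasus.transfer.*.thirdparty.sites = *
```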
If you installed your application code in the image, then modify your Transformation Catalog to include the new entries. (Tip: Make sure the sysinfo of your "ec2" site matches the new transformations you add to the TC)
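For example, in the old text-based Transformation Catalog format an entry for an application installed in the image might look like the following (a sketch — "myns::myapp:1.0" and the path are placeholders; note that the sysinfo column matches the "ec2" site's INTEL32::LINUX):

```
# site  logical transformation  physical path               type       sysinfo         profiles
ec2     myns::myapp:1.0         /usr/local/myapp/bin/myapp  INSTALLED  INTEL32::LINUX  NULL
```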
If you run into any problems during planning, debug them before moving on to the next step.
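Planning for the new site works the same way as for your existing site; a typical invocation might look like this (a sketch — the DAX file name and directory are placeholders, and option names vary slightly between Pegasus versions):

```
pegasus-plan --dax myworkflow.dax --sites ec2 --output local --dir dags --submit
```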
If you have problems contact: firstname.lastname@example.org
14. Launch a virtual cluster
You will do basically the same thing you did to launch the first worker.
This time you will start a virtual cluster with 2 nodes. Instead of using the Pegasus image, use the new image you created earlier.
In the "AMIs" area select your new image and click "Launch".
Select 2 instances, m1.large.
Set "User Data" to:
CONDOR_HOST = host.example.com
VERY IMPORTANT: Don't just copy-paste the above, you need to replace "host.example.com" with the actual DNS name of your submit host.
Choose your keypair and security group as before and launch the cluster nodes.
Wait until you see the workers show up in condor_status before proceeding. You should see twice as many as you did last time. You may want to run your vanilla.sub test job again to make sure they work.
VERY IMPORTANT: You are virtually guaranteed to have problems at this point because of the large number of possible configurations your application may require. Please contact us and we will help.
Hopefully your workflow will run to completion. When you are finished make sure you terminate any running instances in the "Instances" area of the console.
17. Next steps
This tutorial only shows you how to set up a single worker node. In order to scale up your workflows you will need to either a) set up a shared file system such as NFS on EC2, or b) configure Pegasus to use Condor file transfers for each job.
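For option (b), Condor file transfers can be requested per job by adding Condor profiles to the "ec2" entry in your sites.xml; a sketch using the standard Condor submit attributes:

```
<profile namespace="condor" key="should_transfer_files">YES</profile>
<profile namespace="condor" key="when_to_transfer_output">ON_EXIT</profile>
```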