Swift for the Cloud
-------------------

Modes of operation
------------------

1. Static mode: You define your cluster ahead of swift runs.
2. Dynamic mode: Cloud resources provisioned dynamically.

Swift installation
~~~~~~~~~~~~~~~~~~

Prerequisites:
Java 1.7
Ant
Python 2.7
The following steps
[source, bash]
-----
# Install swift-trunk from git
https://github.com/swift-lang/swift-k.git
# Extract package
tar xfz swift-0.95-RC6.tar.gz
# Add swift to the PATH environment variable
export PATH=$PATH:/path/to/swift-0.95-RC6/bin
-----

Get the swift-on-cloud repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clone the repository from github
[source,bash]
----
git clone https://github.com/yadudoc/swift-on-cloud.git
cd swift-on-cloud
----

Or, download the zip file from github and unpack.
[source,bash]
----
# Download
wget https://github.com/yadudoc/swift-on-cloud/archive/master.zip
unzip master.zip
mv swift-on-cloud-master swift-on-cloud
cd swift-on-cloud
----

Run the swift-cloud-tutorial from the cloud
-------------------------------------------

To run the tutorial on Google Compute Engine (GCE), follow the instructions here: +
    https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine +
or, follow instructions for GCE, in the compute-engine folder of the swift-on-cloud
repository. Once your instances are running, connect to the headnode. Everthing
that you require for the swift-cloud-tutorial is already set up for you on the headnode.


Simple "science applications" for the workflow tutorial
-------------------------------------------------------

This tutorial is based on two intentionally trivial example programs,
`simulation.sh` and `stats.sh`, (implemented as bash shell scripts)
that serve as easy-to-understand proxies for real science
applications. These "programs" behave as follows.

simulate.sh
~~~~~~~~~~~

The simulation.sh script serves as a trivial proxy for any more
complex scientific simulation application. It generates and prints a
set of one or more random integers in the range [0-2^62) as controlled
by its command line arguments, which are:

-----
$ ./app/simulate.sh --help
./app/simulate.sh: usage:
    -b|--bias       offset bias: add this integer to all results [0]
    -B|--biasfile   file of integer biases to add to results [none]
    -l|--log        generate a log in stderr if not null [y]
    -n|--nvalues    print this many values per simulation [1]
    -r|--range      range (limit) of generated results [100]
    -s|--seed       use this integer [0..32767] as a seed [none]
    -S|--seedfile   use this file (containing integer seeds [0..32767]) one per line [none]
    -t|--timesteps  number of simulated "timesteps" in seconds (determines runtime) [1]
    -x|--scale      scale the results by this integer [1]
    -h|-?|?|--help  print this help
$
-----

All of thess arguments are optional, with default values indicated above as `[n]`.

////
.simulation.sh arguments
[width="80%",cols="^2,10",options="header"]

|=======================
|Argument|Short|Description
|1    |runtime: sets run time of simulation.sh in seconds
|2    |range: limits generated values to the range [0,range-1]
|3    |biasfile: add the integer contained in this file to each value generated
|4    |scale: multiplies each generated value by this integer
|5    |count: number of values to generate in the simulation
|=======================
////

With no arguments, simulate.sh prints 1 number in the range of
1-100. Otherwise it generates n numbers of the form (R*scale)+bias
where R is a random integer. By default it logs information about its
execution environment to stderr.  Here's some examples of its usage:

-----
$ simulate.sh 2>log
       5
$ head -4 log

Called as: /home/wilde/swift/tut/CIC_2013-08-09/app/simulate.sh:
Start time: Thu Aug 22 12:40:24 CDT 2013
Running on node: login01.osgconnect.net

$ simulate.sh -n 4 -r 1000000 2>log
  239454
  386702
   13849
  873526

$ simulate.sh -n 3 -r 1000000 -x 100 2>log
 6643700
62182300
 5230600

$ simulate.sh -n 2 -r 1000 -x 1000 2>log
  565000
  636000

$ time simulate.sh -n 2 -r 1000 -x 1000 -t 3 2>log
  336000
  320000
real    0m3.012s
user    0m0.005s
sys     0m0.006s
-----

stats.sh
~~~~~~~~

The stats.sh script serves as a trivial model of an "analysis"
program. It reads N files each containing M integers and simply prints
the\ average of all those numbers to stdout. Similarly to simulate.sh
it logs environmental information to the stderr.

-----
$ ls f*
f1  f2  f3  f4

$ cat f*
25
60
40
75

$ stats.sh f* 2>log
50
-----


Basic of the Swift language with local execution
------------------------------------------------

A Summary of Swift in a nutshell
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Swift scripts are text files ending in `.swift` The `swift` command
runs on any host, and executes these scripts. `swift` is a Java
application, which you can install almost anywhere.  On Linux, just
unpack the distribution `tar` file and add its `bin/` directory to
your `PATH`.

* Swift scripts run ordinary applications, just like shell scripts
do. Swift makes it easy to run these applications on parallel and
remote computers (from laptops to supercomputers). If you can `ssh` to
the system, Swift can likely run applications there.

* The details of where to run applications and how to get files back
and forth are described in configuration files separate from your
program. Swift speaks ssh, PBS, Condor, SLURM, LSF, SGE, Cobalt, and
Globus to run applications, and scp, http, ftp, and GridFTP to move
data.

* The Swift language has 5 main data types: `boolean`, `int`,
`string`, `float`, and `file`. Collections of these are dynamic,
sparse arrays of arbitrary dimension and structures of scalars and/or
arrays defined by the `type` declaration.

* Swift file variables are "mapped" to external files. Swift sends
files to and from remote systems for you automatically.

* Swift variables are "single assignment": once you set them you can't
change them (in a given block of code).  This makes Swift a natural,
"parallel data flow" language. This programming model keeps your
workflow scripts simple and easy to write and understand.

* Swift lets you define functions to "wrap" application programs, and
to cleanly structure more complex scripts. Swift `app` functions take
files and parameters as inputs and return files as outputs.

* A compact set of built-in functions for string and file
manipulation, type conversions, high level IO, etc. is provided.
Swift's equivalent of `printf()` is `tracef()`, with limited and
slightly different format codes.

* Swift's `foreach {}` statement is the main parallel workhorse of the
language, and executes all iterations of the loop concurrently. The
actual number of parallel tasks executed is based on available
resources and settable "throttles".

* In fact, Swift conceptually executes *all* the statements,
expressions and function calls in your program in parallel, based on
data flow. These are similarly throttled based on available resources
and settings.

* Swift also has `if` and `switch` statements for conditional
execution. These are seldom needed in simple workflows but they enable
very dynamic workflow patterns to be specified.

We'll see many of these points in action in the examples below. Lets
get started!

Part 1: Run a single application under Swift
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first swift script, p1.swift, runs simulate.sh to generate a
single random number. It writes the number to a file.

image::part01.png["p1 workflow",align="center"]

.p1.swift
-----
sys::[cat ../part01/p1.swift]
-----


To run this script, run the following command:
-----
$ cd part01
$ swift p1.swift
Swift 0.94.1 RC2 swift-r6895 cog-r3765

RunID: 20130827-1413-oa6fdib2
Progress:  time: Tue, 27 Aug 2013 14:13:33 -0500
Final status: Tue, 27 Aug 2013 14:13:33 -0500  Finished successfully:1
$ cat sim.out 
      84
$ swift p1.swift
$ cat sim.out 
      36
-----

To cleanup the directory and remove all outputs (including the log
files and directories that Swift generates), run the cleanup script
which is located in the tutorial PATH:

[source,bash]
-----
$ cleanup
-----

NOTE: You'll also find two Swift configuration files in each `partNN`
directory of this tutorial.  These specify the environment-specific
details of where to find application programs (file `apps`) and where
to run them (file `sites.xml`).  These files will be explained in more
detail in parts 4-6, and can be ignored for now.

////
It defines
things like the work directory, the scheduler to use, and how to
control parallelism. The sites.xml file below will tell Swift to run
on the local machine only, and run just 1 task at a time.

.swift.properties
-----
sys::[cat ../part01/swift.properties]
-----

 In this case, it
indicates that the app "simulate" (the first token in the command line
declaration of the function `simulation`, at line NNN) is located in the file
simulate.sh and (since the path `simulate.sh` is specified with no
directory components) Swift expects that the `simulate.sh` executable
will be available in your $PATH.

.apps
-----
sys::[cat ../part01/apps]
-----

////

Part 2: Running an ensemble of many apps in parallel with a "foreach" loop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `p2.swift` script introduces the `foreach` parallel iteration
construct to run many concurrent simulations.

image::part02.png[align="center"]

.p2.swift
-----
sys::[cat ../part02/p2.swift]
-----

The script also shows an
example of naming the output files of an ensemble run. In this case, the output files will be named
`output/sim_N.out`.

In part 2, we also update the apps file. Instead of using shell script (simulate.sh), we use
the equivalent python version (simulate.py). The new apps file now looks like this:

-----
sys::[cat ../part02/apps]
-----

Swift does not need to know anything about the language an application is written in. The application 
can be written in Perl, Python, Java, Fortran, or any other language.

To run the script and view the output:
-----
$ cd ../part02
$ swift p2.swift
$ ls output
sim_0.out  sim_1.out  sim_2.out  sim_3.out  sim_4.out  sim_5.out  sim_6.out  sim_7.out  sim_8.out  sim_9.out
$ more output/*
::::::::::::::
output/sim_0.out
::::::::::::::
      44
::::::::::::::
output/sim_1.out
::::::::::::::
      55
...
::::::::::::::
output/sim_9.out
::::::::::::::
      82
-----

Part 3: Analyzing results of a parallel ensemble
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After all the parallel simulations in an ensemble run have completed,
its typically necessary to gather and analyze their results with some
kind of post-processing analysis program or script.  p3.swift
introduces such a postprocessing step. In this case, the files created
by all of the parallel runs of `simulation.sh` will be averaged by by
the trivial "analysis application" `stats.sh`:

image::part03.png[align="center"]

.p3.swift
----
sys::[cat ../part03/p3.swift]
----

To run:
----
$ cd part03
$ swift p3.swift
----

Note that in `p3.swift` we expose more of the capabilities of the
`simulate.sh` application to the `simulation()` app function:

-----
app (file o) simulation (int sim_steps, int sim_range, int sim_values)
{
  simulate "--timesteps" sim_steps "--range" sim_range "--nvalues" sim_values stdout=filename(o);
}
-----

`p3.swift` also shows how to fetch application-specific values from
the `swift` command line in a Swift script using `arg()` which
accepts a keyword-style argument and its default value:

-----
int nsim   = toInt(arg("nsim","10"));
int steps  = toInt(arg("steps","1"));
int range  = toInt(arg("range","100"));
int values = toInt(arg("values","5"));
-----

Now we can specify that more runs should be performed and that each should run for more timesteps, and produce more that one value each, within a specified range, using command line arguments placed after the Swift script name in the form `-parameterName=value`:

-----
$ swift p3.swift -nsim=3 -steps=10 -values=4 -range=1000000

Swift 0.94.1 RC2 swift-r6895 cog-r3765

RunID: 20130827-1439-s3vvo809
Progress:  time: Tue, 27 Aug 2013 14:39:42 -0500
Progress:  time: Tue, 27 Aug 2013 14:39:53 -0500  Active:2  Stage out:1
Final status: Tue, 27 Aug 2013 14:39:53 -0500  Finished successfully:4

$ ls output/
average.out  sim_0.out  sim_1.out  sim_2.out
$ more output/*
::::::::::::::
output/average.out
::::::::::::::
651368
::::::::::::::
output/sim_0.out
::::::::::::::
  735700
  886206
  997391
  982970
::::::::::::::
output/sim_1.out
::::::::::::::
  260071
  264195
  869198
  933537
::::::::::::::
output/sim_2.out
::::::::::::::
  201806
  213540
  527576
  944233
-----

Now try running (`-nsim=`) 100 simulations of (`-steps=`) 1 second each:

-----
$ swift p3.swift -nsim=100 -steps=1 
Swift 0.94.1 RC2 swift-r6895 cog-r3765

RunID: 20130827-1444-rq809ts6
Progress:  time: Tue, 27 Aug 2013 14:44:55 -0500
Progress:  time: Tue, 27 Aug 2013 14:44:56 -0500  Selecting site:79  Active:20  Stage out:1
Progress:  time: Tue, 27 Aug 2013 14:44:58 -0500  Selecting site:58  Active:20  Stage out:1  Finished successfully:21
Progress:  time: Tue, 27 Aug 2013 14:44:59 -0500  Selecting site:37  Active:20  Stage out:1  Finished successfully:42
Progress:  time: Tue, 27 Aug 2013 14:45:00 -0500  Selecting site:16  Active:20  Stage out:1  Finished successfully:63
Progress:  time: Tue, 27 Aug 2013 14:45:02 -0500  Active:15  Stage out:1  Finished successfully:84
Progress:  time: Tue, 27 Aug 2013 14:45:03 -0500  Finished successfully:101
Final status: Tue, 27 Aug 2013 14:45:03 -0500  Finished successfully:101
-----

We can see from Swift's "progress" status that the tutorial's default
`swift.properties` parameters for local execution allow Swift to run up to 20
application invocations concurrently on the login node. We'll look at
this in more detail in the next sections where we execute applications
on the site's compute nodes.


Running applications on compute nodes with Swift
------------------------------------------------

Part 4: Running a parallel ensemble on compute nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`p4.swift` will run our mock "simulation"
applications on compute nodes.  The script is similar to as
`p3.swift`, but specifies that each simulation app invocation should
additionally return the log file which the application writes to
`stderr`.

////

FIXME: need to revise this figure: drop prog:

, making the parallel portion of the script behave like this:

image::part04.png[align="center"]

.p4.swift
----
sys::[cat ../part04/p4.swift]
----
////

Now when you run `swift p4.swift` you'll see that two types output
files will placed in the `output/` directory: `sim_N.out` and
`sim_N.log`.  The log files provide data on the runtime environment of
each app invocation. For example:

-----
$ cat output/sim_0.log 

Called as: simulate.sh: --timesteps 1 --range 100 --nvalues 5

Start time: Tue Oct 22 14:54:11 CDT 2013
Running as user: uid=5116(davidk) gid=311(collab) groups=311(collab),104(fuse),1349(swift),45053(swat)
Running on node: stomp
Node IP address: 140.221.9.237 

Simulation parameters:

bias=0
biasfile=none
initseed=none
log=yes
paramfile=none
range=100
scale=1
seedfile=none
timesteps=1
output width=8

Environment:

EDITOR=vim
HOME=/homes/davidk
JAVA_HOME=/nfs/proj-davidk/jdk1.7.0_01
LANG=C
....
-----

/////
To tell Swift to run the apps on compute nodes, we specify in the
`apps` file that the apps should be executed on the `cloud` site
(instead of the `localhost` site).  We can specify the location of
each app in the third field of the `apps` file, with either an
absolute pathname or the name of an executable to be located in
`PATH`). Here we use the latter form:

-----
$ cat apps
cloud simulate simulate.sh
cloud stats    stats.sh
-----

You can experiment, for example, with an alternate version of stats.sh by specfying that app's location explicitly:

-----
$ cat apps
cloud simulate simulate.sh
cloud stats    /home/users/p01532/bin/my-alt-stats.sh
-----

We can see that when we run many apps requesting a larger set of nodes (6), we are indeed running on the compute nodes:
-----
$ swift p4.swift -nsim=1000 -steps=1 
Swift 0.94.1 RC2 swift-r6895 cog-r3765

RunID: 20130827-1638-t23ax37a
Progress:  time: Tue, 27 Aug 2013 16:38:11 -0500
Progress:  time: Tue, 27 Aug 2013 16:38:12 -0500  Initializing:966
Progress:  time: Tue, 27 Aug 2013 16:38:13 -0500  Selecting site:499  Submitting:500  Submitted:1
Progress:  time: Tue, 27 Aug 2013 16:38:14 -0500  Selecting site:499  Stage in:1  Submitted:500
Progress:  time: Tue, 27 Aug 2013 16:38:16 -0500  Selecting site:499  Submitted:405  Active:95  Stage out:1
Progress:  time: Tue, 27 Aug 2013 16:38:17 -0500  Selecting site:430  Submitted:434  Active:66  Stage out:1  Finished successfully:69
Progress:  time: Tue, 27 Aug 2013 16:38:18 -0500  Selecting site:388  Submitted:405  Active:95  Stage out:1  Finished successfully:111
...
Progress:  time: Tue, 27 Aug 2013 16:38:30 -0500  Stage in:1  Submitted:93  Active:94  Finished successfully:812
Progress:  time: Tue, 27 Aug 2013 16:38:31 -0500  Submitted:55  Active:95  Stage out:1  Finished successfully:849
Progress:  time: Tue, 27 Aug 2013 16:38:32 -0500  Active:78  Stage out:1  Finished successfully:921
Progress:  time: Tue, 27 Aug 2013 16:38:34 -0500  Active:70  Stage out:1  Finished successfully:929
Progress:  time: Tue, 27 Aug 2013 16:38:37 -0500  Stage in:1  Finished successfully:1000
Progress:  time: Tue, 27 Aug 2013 16:38:38 -0500  Stage out:1  Finished successfully:1000
Final status: Tue, 27 Aug 2013 16:38:38 -0500  Finished successfully:1001

$ grep "on node:" output/*log | head
output/sim_0.log:Running on node: nid00063
output/sim_100.log:Running on node: nid00060
output/sim_101.log:Running on node: nid00061
output/sim_102.log:Running on node: nid00032
output/sim_103.log:Running on node: nid00060
output/sim_104.log:Running on node: nid00061
output/sim_105.log:Running on node: nid00032
output/sim_106.log:Running on node: nid00060
output/sim_107.log:Running on node: nid00061
output/sim_108.log:Running on node: nid00062

$ grep "on node:" output/*log | awk '{print $4}' | sort | uniq -c
    158 nid00032
    156 nid00033
    171 nid00060
    178 nid00061
    166 nid00062
    171 nid00063
$ hostname
raven
$ hostname -f
nid00008
-----
/////
Performing larger Swift runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To test with larger runs, there are two changes that are required. The first is a 
change to the command line arguments. The example below will run 1000 simulations
with each simulation taking 5 seconds.

-----
$ swift p6.swift -steps=5 -nsim=1000
-----

Part 5: Controlling the compute-node pools where applications run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section is under development.

Part 6: Specifying more complex workflow patterns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

p6.swift expands the workflow pattern of p4.swift to add additional
stages to the workflow. Here, we generate a dynamic seed value that
will be used by all of the simulations, and for each simulation, we
run an pre-processing application to generate a unique "bias
file". This pattern is shown below, followed by the Swift script.

image::part06.png[align="center"]

.p6.swift
----
sys::[cat ../part06/p6.swift]
----

Note that the workflow is based on data flow dependencies: each simulation depends on the seed value, calculated in this statement:
-----
seedfile = genseed(1);
-----
and on the bias file, computed and then consumed in these two dependent statements:
-----
  biasfile = genbias(1000, 20, simulate_script);
  (simout,simlog) = simulation(steps, range, biasfile, 1000000, values, simulate_script, seedfile);
-----

To run:
----
$ cd ../part06
$ swift p6.swift
----

The default parameters result in the following execution log:

-----
$ swift p6.swift
Swift 0.94.1 RC2 swift-r6895 cog-r3765

RunID: 20130827-1917-jvs4gqm5
Progress:  time: Tue, 27 Aug 2013 19:17:56 -0500

*** Script parameters: nsim=10 range=100 num values=10

Progress:  time: Tue, 27 Aug 2013 19:17:57 -0500  Stage in:1  Submitted:10
Generated seed=382537
Progress:  time: Tue, 27 Aug 2013 19:17:59 -0500  Active:9  Stage out:1  Finished successfully:11
Final status: Tue, 27 Aug 2013 19:18:00 -0500  Finished successfully:22
-----
which produces the following output:
-----
$ ls -lrt output
total 264
-rw-r--r-- 1 p01532 61532     9 Aug 27 19:17 seed.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_9.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_8.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_7.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_6.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_5.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_4.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_3.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_2.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_1.dat
-rw-r--r-- 1 p01532 61532   180 Aug 27 19:17 bias_0.dat
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:17 sim_9.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_9.log
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_8.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:17 sim_7.out
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:17 sim_6.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_6.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:17 sim_5.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_5.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:17 sim_4.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_4.log
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:17 sim_1.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:18 sim_8.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:18 sim_7.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:18 sim_3.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:18 sim_3.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:18 sim_2.out
-rw-r--r-- 1 p01532 61532 14898 Aug 27 19:18 sim_2.log
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:18 sim_1.out
-rw-r--r-- 1 p01532 61532    90 Aug 27 19:18 sim_0.out
-rw-r--r-- 1 p01532 61532 14897 Aug 27 19:18 sim_0.log
-rw-r--r-- 1 p01532 61532     9 Aug 27 19:18 average.out
-rw-r--r-- 1 p01532 61532 14675 Aug 27 19:18 average.log
-----

Each sim_N.out file is the sum of its bias file plus newly "simulated" random output scaled by 1,000,000:

-----
$ cat output/bias_0.dat
     302
     489
      81
     582
     664
     290
     839
     258
     506
     310
     293
     508
      88
     261
     453
     187
      26
     198
     402
     555

$ cat output/sim_0.out
64000302
38000489
32000081
12000582
46000664
36000290
35000839
22000258
49000506
75000310
-----

We produce 20 values in each bias file. Simulations of less than that
number of values ignore the unneeded number, while simualtions of more
than 20 will use the last bias number for all remoaining values past
20.  As an exercise, adjust the code to produce the same number of
bias values as is needed for each simulation.  As a further exercise,
modify the script to generate a unique seed value for each simulation,
which is a common practice in ensemble computations.

Tips for Specific Resources
---------------------------

Open Science Data Cloud
~~~~~~~~~~~~~~~~~~~~~~~
1. When you start instances on OSDC, use the standard Ubuntu image.
2. Ensure that your SSH key is added to the instance for password login.
3. Swift should run on the OSDC headnode.
4. You can use the following command within coaster-service.conf to automatically
populate WORKER_HOSTS with the IP addresses of all active instances you have running.

-----
export WORKER_HOSTS=$( nova list | grep ACTIVE | sed -e 's/^.*private=//' -e 's/ .*//' |sed ':a;N;$!ba;s/\n/ /g' )
-----