Swift Cookbook
March 2011
Swift
team
swift-user@ci.uchicago.edu
ST
0.92March 2011ST
Overview
This cookbook covers various recipes involving setting up and running Swift under diverse
configurations based on the application requirements and the underlying
infrastructures. The Swift system comprises of SwiftScript language and the
Swift runtime system. For
introductory material, consult the Swift tutorial found
here.
Please be warned that the cookbook is under construction and most recipes are not tested rigorously. You are advised to use them at your own risk.
Swift Basics
Installation
This section takes you through the installation of the Swift system on your
computer. We will start with the prerequisites as explained in the subsequent
section.
Prerequisites
Check your JavaSwift is a Java application. Make sure you are running Java version 5 or higher. You
can make sure you have Java in your $PATH (or $HOME/.soft file depending upon
your environment)
Following are the possible ways to detect and run Java:
$ grep java $HOME/.soft
#+java-sun # Gives you Java 5
+java-1.6.0_03-sun-r1
$ which java
/soft/java-1.6.0_11-sun-r1/bin/java
$ java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
Setting up to run Swift
This is simple. We will be using a pre-compiled version of Swift that can be
downloaded from here. Download and untar the latest precompiled version as follows:
$ tar xf swift-0.92.1.tar.gz
Environment Setup
The examples were tested with Java version 1.6. Make sure you do not already
have Swift in your PATH. If you do, remove it, or remove any +swift or @swift
lines from your $HOME/.soft or $HOME/.bash_profile file. Then do:
PATH=$PATH:/path/to/swift/bin
Note that the environment will be different when using Swift from prebuilt distribution (as above) and trunk. The PATH setup when using swift from trunk would be as follows:
PATH=$PATH:/path/to/swift/dist/swift-svn/bin
Do NOT set SWIFT_HOME or CLASSPATH in your environment unless you
fully understand how these will affect Swift’s execution.
To execute your Swift script on a login host (or "localhost") use
the following command:
swift -tc.file tc somescript.swift
Setting swift configuration
Many configuration properties could be set using the Swift configuration file.
We will not cover them all in this section. see
here for details. In this section we will cover a simple configuration file with the most basic properties.
# A comment
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=1
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
clustering.enabled=false
clustering.queue.delay=10
clustering.min.time=86400
foreach.max.threads=100
provenance.log=true
Setting sites.xml
sites.xml specifies details of the sites that Swift can run on. Following is
an example of a simple sites.xml file entry for running Swift on local
environment:
<pool handle="localhost">
<filesystem provider="local" />
<execution provider="local" />
<workdirectory >/var/tmp</workdirectory>
<profile namespace="karajan" key="jobThrottle">.07</profile>
<profile namespace="karajan"
key="initialScore">100000</profile>
</pool>
First SwiftScript
Your first SwiftScript
Hello Swift-World!
A good sanity check that Swift is set up and running OK locally is this:
$ which swift
/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift
$ echo 'trace("Hello, Swift world!");' >hello.swift
$ swift hello.swift
Swift svn swift-r3202 cog-r2682
RunID: 20100115-1240-6xhzxuz3
Progress:
SwiftScript trace: Hello, Swift world!
Final status:
$
A good first tutorial in using Swift is at:
http://www.ci.uchicago.edu/swift/guides/tutorial.php. Follow the steps in that
tutorial to learn how to run a few simple scripts on the login host.
second SwiftScript
Following is a more involved Swift script.
type file;
app (file o) cat (file i)
{
cat @i stdout=@o;
}
file out[]<simple_mapper; location="outdir", prefix="f.",suffix=".out">;
foreach j in [1:@toint(@arg("n","1"))] {
file data<"data.txt">;
out[j] = cat(data);
}
Swift Commandline Options
A description of Swift Commandline Options
Also includes a description of Swift inputs and outputs.
What if Swift hangs
Owing to its highly multithreaded architecture it is often the case that the
underlying java virtual machine gets into deadlock situations or Swift hangs
because of other complications in its threaded operations. Under such
situations, Swift hang-checker chips in and resolves the situation.
how to use the information to identify and correct the deadlock.
How close to the Swift source code can we make the hang-checker messages, so that the user can relate it to Swift functions, expressions, and ideally source code lines?
The current Hang Checker output is actually very nice and useful already:
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
Resuming a stopped or crashed Swift Run
I had a .rlog file from a Swift run that ran out of time. I kicked it off
using the -resume flag described in section 16.2 of the Swift User Guide and
it picked up where it left off. Then I killed it because I wanted to make
changes to my sites file.
. . .
Progress: Selecting site:1150 Stage in:55 Active:3 Checking status:1
Stage out:37 Finished in previous run:2462 Finished successfully:96
Progress: Selecting site:1150 Stage in:55 Active:2 Checking status:1
Stage out:38 Finished in previous run:2462 Finished successfully:96
Cleaning up...
Shutting down service at https://192.5.86.6:54813
Got channel MetaChannel: 1293358091 -> null
+ Done
Canceling job 9297.svc.pads.ci.uchicago.edu
No new rlog file was emitted but it did recognize the progress that had been
made, the 96 tasks that finished sucessfully above and resumed from 2558 tasks
finished.
[nbest@login2 files]$ pwd
/home/nbest/bigdata/files
[nbest@login2 files]$
~wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift \
> -tc.file tc -sites.file pbs.xml ~/scripts/mcd12q1.swift -resume
> mcd12q1-20100310-1326-ptxe1x1d.0.rlog
Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
locally)
RunID: 20100311-1027-148caf0a
Progress:
Progress: uninitialized:4
Progress: Selecting site:671 Initializing site shared directory:1 Finished
in previous run:1864
Progress: uninitialized:1 Selecting site:576 Stage in:96 Finished in
previous run:1864
Progress: Selecting site:1150 Stage in:94 Submitting:2 Finished in
previous run:2558
Progress: Selecting site:1150 Stage in:94 Submitted:2 Finished in previous
run:2558
Progress: Selecting site:1150 Stage in:93 Submitting:1 Submitted:2
Finished in previous run:2558
Progress: Selecting site:1150 Stage in:90 Submitting:1 Submitted:5
Finished in previous run:2558
Progress: Selecting site:1150 Stage in:90 Submitted:5 Active:1 Finished
in previous run:2558
From Neil: A comment about that section of the user guide: It says "In order
to restart from a restart log file, the -resume logfile argument can be used
after the SwiftScript? program file name." and then puts the -resume logfile
argument before the script file name. I’m sure the order doesn’t matter but
the contradiction is confusing.
Notes to add (from Mike):
explain what aspects of a Swift script make it restartable, and which
aspects are notrestartable. Eg, if your mappers can return different data at
different times, what happens? What other non-determinsitc behavior would
cause unpredictable, unexpected, or undesired behavior on resumption?
explain what changes you can make in the execution environment (eg
increasing or reducing CPUs to run on or throttles, etc); fixing tc.data
entries, env vars, or apps, etc.
note that resume will again retry failed app() calls. Explain if the retry
count starts over or not.
explain how to resume after multiple failures and resumes - i.e. if a .rlog
is generated on each run, which one should you resume from? Do you have a
choice of resuming from any of them, and what happens if you go backwards to
an older resume file?
whap happens when you kill (eg with C) a running swift script? Is the
signal caught, and the resume file written out at that point? Or written out
all along? (Note case in which script ws running for hours, then hit C, but
resume fie was short (54 bbytes) and swift shows no sign of doing a resume?
(It silently ignored resume file instead of acknowleging that it found one
with not useful resume state in it???) Swift should clearly state that its
resuming and what its resume state is.
swift -resume ftdock-[id].0.rlog \[rest of the exact command line from initial
run\]
Passing an array to swift?
Arrays can be passed to Swift in one of the following ways:
You can write the array to a file and read in in swift using
readData (or readData2).
Direct the array into a file (possibly with a "here document" which expands the array) and then read the file in Swift with readData() or process it with a Swift app() function?
You can use @strsplit on a comma separated command line arg and that works well for me.
Mappers
SimpleMapper
$ cat swiftapply.swift
type RFile;
trace("hi 1");
app (RFile result) RunR (RFile rcall)
{
RunR @rcall @result;
}
trace("hi 2");
RFile rcalls[] ;
RFile results[] ;
trace("start");
foreach c, i in rcalls {
trace("c",i,@c);
trace("r",i,@filename(results[i]));
results[i] = RunR(c);
}
$ ls calldir resdir
calldir:
rcall.1.Rdata rcall.2.Rdata rcall.3.Rdata rcall.4.Rdata
resdir:
result.1.Rdata result.2.Rdata result.3.Rdata result.4.Rdata
$
Notes:
how the .'s match
prefix and suffix dont span dirs
intervening pattern must be digits
these digits become the array indices
explain how padding= arg works & helps (including padding=0)
figure out and explain differences between simple_mapper and
filesys_mapper
FIXME: Use the "filesys_mapper" and its "location=" parameter to map the
input data from /home/wilde/bigdata/*
Abbreviations for SingleFileMapper
Notes:
within <> you can only have a literal string as in <"filename">, not an
expression. Someday we will fix this to make <> accept a general expression.
you can use @filenames( ) (note: plural) to pull off a list of filenames.
writeData()
example here
$ cat writedata.swift
type file;
file f <"filea">;
file nf <"filenames">;
nf = writeData(@f);
$ swift writedata.swift
Swift svn swift-r3264 (swift modified locally) cog-r2730 (cog modified
locally)
RunID: 20100319-2002-s9vpo0pe
Progress:
Final status:
$ cat filenames
filea$
StructuredRegexpMapper
IN PROGRESS This mapper can be used to base the mapped filenames of an output
array on the mapped filenames of an existing array. landuse outputfiles[]
<structured_regexp_mapper; source=inputfiles,
location="./output",match="(.)*tif", transform="\\1histogram">;
Use the undocumented "structured_regexp_mapper" to name the output
filenames based on the input filenames:
For example:
login2$ ls /home/wilde/bigdata/data/sample
h11v04.histogram h11v05.histogram h12v04.histogram h32v08.histogram
h11v04.tif h11v05.tif h12v04.tif h32v08.tif
login2$
login2$ cat regexp2.swift
type tif;
type mytype;
tif images[]<filesys_mapper;
location="/home/wilde/bigdata/data/sample", prefix="h", suffix=".tif">;
mytype of[] <structured_regexp_mapper; source=images, match="(h..v..)",
transform="output/myfile.\\1.mytype">;
foreach image, i in images {
trace(i,@filename(images));
trace(i,@filename(of[i]));
}
login2$
login1$ swift regexp2.swift
Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified
locally)
RunID: 20100310-1105-4okarq08
Progress:
SwiftScript trace: 1, output/myfile.h11v04.mytype
SwiftScript trace: 2, home/wilde/bigdata/data/sample/h11v05.tif
SwiftScript trace: 3, home/wilde/bigdata/data/sample/h12v04.tif
SwiftScript trace: 0, output/myfile.h32v08.mytype
SwiftScript trace: 0, home/wilde/bigdata/data/sample/h32v08.tif
SwiftScript trace: 3, output/myfile.h12v04.mytype
SwiftScript trace: 1, home/wilde/bigdata/data/sample/h11v04.tif
SwiftScript trace: 2, output/myfile.h11v05.mytype
Final status:
login1$
Coasters
Coasters were introduced in Swift v0.6 as an experimental feature. In many
applications, Swift performance can be greatly enhanced by the use of CoG
coasters. CoG coasters provide a low-overhead job submission and file transfer
mechanism suited for the execution of short jobs (on the order of a few
seconds). A detailed information on coasters can be found at http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters.
Following is a coasters setup case-study for a PBS underlying provider where sites.xml coaster settings were:
<execution provider="coaster" jobmanager="local:pbs"/>
<profile namespace="globus" key="project">CI-CCR000013</profile>
<!-- Note that the following is going to be defunct in the new version (0.93+) and replaced by
"ProviderAttributes" key and may not work in the future Swift versions-->
<!--<profile namespace="globus" key="ppn">24:cray:pack</profile>-->
<profile namespace="globus" key="providerAttributes">
pbs.aprun
pbs.mpp=true
</profile>
<profile namespace="globus" key="workersPerNode">24</profile>
<profile namespace="globus" key="maxTime">100000</profile>
<profile namespace="globus" key="lowOverallocation">100</profile>
<profile namespace="globus" key="highOverallocation">100</profile>
<profile namespace="globus" key="slots">20</profile>
<profile namespace="globus" key="nodeGranularity">5</profile>
<profile namespace="globus" key="maxNodes">5</profile>
<profile namespace="karajan" key="jobThrottle">20.00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
The following table briefly describes the elements on the coasters setup:
profile key
brief description
slots
How many maximum LRM jobs/worker blocks are allowed
workersPerNode
How many coaster workers to run per execution node
nodeGranularity
Each worker block uses a number of nodes that is a multiple of this number
lowOverallocation
How many times larger than the job walltime should a block’s walltime be if all jobs are 1s long
highOverallocation
How many times larger than the job walltime should a block’s walltime be if all jobs are infinitely long
workersPerNode
How many coaster workers to run per execution node reserve How many seconds to reserve in a block’s walltime for starting/shutdown operations
maxnodes
The maximum number of nodes allowed in a block
maxtime
The maximum number of walltime allowed for a block coaster service
jobThrottle
the number of concurrent jobs allowed on a site
For Advanced Users
One of the main reason that one would initially deviate from coaster
defaults into more complex pool entries is to force jobs to fit into some
site-imposed constraint. For instance a typical submission to the experimental
queue requires a user to request upto 3 nodes for under 1 hour. This setup
could be achieved with a careful tuning of coaters parameters.
Coaster providers: local, ssh, pbs
Settings and examples for different coaster providers mechanisms.
Swift on Diverse Infrastructures
Beagle
Swift is available on Beagle as a software module.
In this recipe, we will walk you through a very simple example running Swift
on Beagle. The prerequisites required for running this example on Beagle is an access to Beagle and a project allocation. The example is a simplest form of a real application that involves reading from a disc file and writing onto another disc file. It runs the /bin/cat Unix command to read an input file called data.txt and write it to an output file in a folder outdir, called f.nnn.out.
Swift Example
The steps are as follows:
step 1. Load the Swift module on Beagle commandline as follows: $ module load swift
step 2. Create and change to a directory where your Swift related work (source, inputs, outputs) will
stay. (say, $ mkdir swift-lab, followed by, $ cd swift-lab)
step 3. copy the folder at /home/ketan/labs/catsn to the above directory. ($cp -r /home/ketan/catsn . followed by $ cd catsn).
step 4. In the sites file: sites.xml make the following two
changes:
change the path of the <workdirectory> element to your preferred location (this location is required to be on /lustre, say to /lustre/beagle/<yourname>/swift-lab/swift.workdir), and
Change the project name to your project (for instance, CI-CCR000013) . The workdirectory will contain
execution data related to each run, e.g. wrapper scripts, system information, inputs and outputs.
step 5. Run the example using following commandline (also found in run.sh):
$ swift -config cf -tc.file tc -sites.file sites.xml catsn.swift -n=1
You can further change the value of -n to any arbitrary number to run that
many number of concurrent cat
step 6. Check the output in the generated outdir directory (ls outdir)
Note: Running from sandbox node or requesting 1 hour walltime for upto 3 nodes
will get fast prioritized execution. Good for small tests
Swift supports a coasters based computing environment for Beagle. A detailed version of Swift documentation is maintained
here.
PADS
Swift on PADS
To execute your Swift script on the PADS cluster use this command:
swift -tc.file tc -sites.file pbs.xml modis.swift
where the contents of a simple pbs.xml sites file could be:
<config>
<pool handle="pbs">
<execution provider="pbs" url="none"/>
<profile namespace="globus" key="queue">fast</profile>
<profile namespace="globus" key="maxwalltime">00:05:00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<profile namespace="karajan" key="jobThrottle">.10</profile>
<filesystem provider="local"/>
<workdirectory >/home/you/swiftwork</workdirectory>
</pool>
</config>
OSG
This section describes how to get Swift running on the OSG Grid. We will use a
manual coaster setup to get Swift running on OSG.
Coaster setup on OSGThe following figure shows an abstract scheme for the manual coasters setup on
OSG.
Coaster setup
In the following steps, we will go through the process of manually setting
Bionimbus
This section explains a step by step procedure on getting Swift running on the
Bionimbus cloud. We will use the manual coasters configuration on the
Bionimbus cloud.
step1. Connect to the gateway (ssh gatewayx.lac.uic.edu)
step2. Start a virtual machine (euca-run-instances -n 1 -t m1.small
emi-17EB1170)
step3. Start the coaster-service on gateway
coaster-service -port 1984 -localport 35753 -nosec
step4. Start the Swift-script from the gateway using normal Swift commandline
swift -config cf -tc.file tc -sites.file sites.xml yourscript.swift -aparam=999
cf
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=1
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
foreach.max.threads=100
provenance.log=true
tc
localhost modftdock /home/ketan/swift-labs/bionimbus-coaster-modftdock/app/modftdock.sh null null GLOBUS::maxwalltime="1:00:00"
(See below for a sample sites.xml for this run)
step5. Connect back to gateway from virtual machines using reverse ssh tunneling as follows:
From the gateway promptssh -R *:5000:localhost:5000 root@10.101.8.50 sleep 999
WHERE:
*=network interface, should remain the same on all cases
localhost=the gateway host, should remain the same
5000(LEFT OF localhost)=the port number on localhost to listen to **THIS WILL
vary depending upon which port you want to listen to
5000(RIGHT OF localhost)=the port on target host that you want to forward
root@10.101.8.50=the ip of the Virtual Machine on Bionimbus cloud, this will
vary based on what ip you get for your Virtual Machine instance
#On anywhere as long as provide the correct callback uri: here the
"http://140.221.9.110:42195" is the callback uri of previous ones
step6. Start the worker from the virtual machine
worker.pl http://localhost:42195 tmp /tmp # where 42195 is the port where the
coaster service is listening to the workers
sites.xml for the above run
<config>
<pool handle="localhost">
<execution provider="coaster-persistent" url="http://localhost:1984" jobmanager="local:local"/>
<profile namespace="globus" key="workerManager">passive</profile>
<profile namespace="globus" key="workersPerNode">4</profile>
<profile namespace="globus" key="maxTime">10000</profile>
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="slots">100</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
<profile namespace="globus" key="maxNodes">10</profile>
<profile namespace="karajan" key="jobThrottle">25.00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<profile namespace="swift" key="stagingMethod">proxy</profile>
<filesystem provider="local"/>
<workdirectory>/home/ketan/swift-labs/bionimbus-coaster-modftdock/swift.workdir</workdirectory>
</pool>
</config>
Debugging Swift
Swift errors are logged in several places:
All text from standard output and standard error produced by running the
swift command
The .log file from this run. It will be named swiftscript.uniqueID.log
where "swiftscript" is the name of your *.swift script source file, and
uniqueID is a long unique id which starts with the date and time you ran the
swift command.
$HOME/.globus/coasters directory on remote machines on which you are
running coasters
$HOME/.globus/scripts directory on the host on which you run the Swift
command, when swift is submitting to a local scheduler (Condor, PBS, SGE,
Cobalt)
$HOME/.globus/??? on remote systems that you access via Globus
Log Processing
To properly generate log plots, you must enable VDL/Karajan logging. Make sure
log4.properties contains:
log4j.logger.swift=DEBUG
log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
Make a basic load plot from Coasters Cpu log lines
Generate the log (may set log4j.logger.swift=INFO for this one)(assuming the log is titled swift-run.log)
Convert the log times to Unix time
./iso-to-secs < swift-run.log > swift-run.time
Make the start time file (this contains the earliest timestamp)
make LOG=swift-run.log start-time.tmp
or
extract-start-time swift-run.log > start-time.tmp
Normalize the transition times
./normalise-event-start-time < swift-run.time > swift-run.norm
Build up a load data file:
./cpu-job-load.pl < swift-run.norm > load.data
Plot with the JFreeChart-based plotter in usertools/plotter:
lines.zsh load.cfg load.eps load.data
Make a basic job completion plot from Coasters Cpu log lines
Same as above, but, build up a completed data file:
./cpu-job-completed.pl < swift-run.norm > completed.data
Plot with the JFreeChart-based plotter in usertools/plotter:
lines.zsh completed.cfg completed.eps completed.data
Make a basic Block allocation plot from Coasters Block log lines
Same as above, but:
Build up a block allocation data file:
./block-level.pl < swift-run.norm > blocks.data
Plot with the JFreeChart-based plotter in usertools/plotter:
lines.zsh blocks.{cfg,eps,data}
Problem Reporting
When reporting problems to swift-user@ci.uchicago.edu, please attach the
following files and information:
tc.data and sites.xml (or whatever you named these files)
your .swift source file and any .swift files it imports
any external mapper scripts called by your .swift script
all text from standard output and standard error produced by running the
swift command
The .log file from this run. It will be named swiftscript.uniqueID.log
where "swiftscript" is the name of your *.swift script source file, and
uniqueID is a long unique id which starts with the date and time you ran the
swift command.
The swift command line you invoked
Any swift.properties entries you over-rode ($HOME/.swift/swift.properties,
-config.file argument properties file, any changes to etc/swift.proerties from
your swift distribution)
Which swift distribution you are running (release; svn revisions; other
local changes you mave have made or included)