Swift User GuideOverviewSwift is a data-flow oriented coarse grained scripting language that supports
dataset typing and mapping, dataset iteration, conditional branching,
and procedural composition.Swift programs (or workflows) are written in a language called
Swift.Swift scripts are primarily concerned with processing (possibly large)
collections of data files, by invoking programs to do that processing. Swift
handles execution of such programs on remote sites by choosing sites, handling
the staging of input and output files to and from the chosen sites and remote
execution of programs.Getting StartedThis section will provide links and information to new
Swift users about how to get started using Swift.QuickstartThis section provides the basic steps for downloading and installing Swift.
Swift requires that a recent version of Oracle Java is installed. More information about installing Java can be found at http://www.oracle.com/technetwork/java.
Download Swift 0.95 at http://swiftlang.org/packages/swift-0.95.tar.gz.
Extract by running "tar xfz swift-0.95.tar.gz"
Add Swift to $PATH by running "export PATH=$PATH:/path/to/swift-0.95/bin"
Verify swift is working by running "swift -version"
TutorialsThere are a few tutorials available for specific clusters and
supercomputers.Swift on Clouds and Ad Hoc collections of workstationsSwift on OSG ConnectSwift on CraysSwift on RCC Midway Cluster at UChicago / SlurmThe Swift LanguageLanguage BasicsA Swift script describes data, application components, invocations of
applications components, and the inter-relations (data flow) between
those invocations.Data is represented in a script by strongly-typed single-assignment
variables. The syntax superficially resembles C and Java. For example,
{ and } characters are used to enclose blocks of statements.Types in Swift can be atomic or composite. An atomic type can be either a
primitive type or a mapped type. Swift provides a fixed set of primitive types,
such as integer and string. A mapped type indicates that the actual data does
not reside in CPU addressable memory (as it would in conventional programming
languages), but in POSIX-like files. Composite types are further subdivided
into structures and arrays. Structures are similar in most respects to
structure types in other languages. In Swift, structures are defined using the
type keyword (there is no struct keyword). Arrays use numeric indices, but
are sparse. They can contain elements of any type, including other array types,
but all elements in an array must be of the same type. We often refer to
instances of composites of mapped types as datasets.type-hierarchy.pngAtomic types such as string, int, float and double work the same way as in
C-like programming languages. A variable of such atomic types can be defined as
follows:string astring = "hello";A struct variable is defined using the type keyword as discussed above.
Following is an example of a variable holding employee data:type Employee{
string name;
int id;
string loc;
}The members of the structure defined above can be accessed using the dot
notation. An example of a variable of type Employee is as follows:Employee emp;
emp.name="Thomas";
emp.id=2222;
emp.loc="Chicago";Arrays of structures are allowed in Swift. A convenient way of populating
structures and arrays of structures is to use the readData() function.Mapped type and composite type variable declarations can be annotated
with a mapping descriptor indicating the file(s) that make up that
dataset. For example, the following line declares a variable named
photo with type image. It additionally declares that the data for
this variable is stored in a single file named shane.jpg.image photo <"shane.jpg">;Component programs of scripts are declared in an app declaration, with
the description of the command line syntax for that program and a list
of input and output data. An app block describes a functional/dataflow
style interface to imperative components.For example, the following example lists a procedure which makes use of
the ImageMagick http://www.imagemagick.org/ convert command to
rotate a supplied image by a specified angle:app (image output) rotate(image input) {
convert "-rotate" angle @input @output;
}A procedure is invoked using the familiar syntax:rotated = rotate(photo, 180);While this looks like an assignment, the actual unix level execution
consists of invoking the command line specified in the app
declaration, with variables on the left of the assignment bound to the
output parameters, and variables to the right of the procedure
invocation passed as inputs.The examples above have used the type image without any definition of
that type. We can declare it as a marker type which has no structure
exposed to Swift script:type image;This does not indicate that the data is unstructured; but it indicates
that the structure of the data is not exposed to Swift. Instead,
Swift will treat variables of this type as individual opaque files.With mechanisms to declare types, map variables to data files, and
declare and invoke procedures, we can build a complete (albeit simple)
script:type image;
image photo <"shane.jpg">;
image rotated <"rotated.jpg">;
app (image output) rotate(image input, int angle) {
convert "-rotate" angle @input @output;
}
rotated = rotate(photo, 180);This script can be invoked from the command line: $ ls *.jpg
shane.jpg
$ swift example.swift
...
$ ls *.jpg
shane.jpg rotated.jpgThis executes a single convert command, hiding from the user features
such as remote multisite execution and fault tolerance that will be
discussed in a later section.Figure 1. shane.jpguserguide-shane.jpgFigure 2. rotated.jpguserguide-rotated.jpgArrays and Parallel ExecutionArrays of values can be declared using the [] suffix. Following is an example
of an array of strings:string pets[] = ["shane", "noddy", "leo"];An array may be mapped to a collection of files, one element per file, by using a
different form of mapping expression. For example, the filesys_mapper
maps all files matching a particular unix glob pattern into an array:file frames[] <filesys_mapper; pattern="*.jpg">;The foreach construct can be used to apply the same block of code to
each element of an array:foreach f,ix in frames {
output[ix] = rotate(f, 180);Sequential iteration can be expressed using the iterate construct:step[0] = initialCondition();
iterate ix {
step[ix] = simulate(step[ix-1]);
}This fragment will initialise the 0-th element of the step array to
some initial condition, and then repeatedly run the simulate
procedure, using each execution’s outputs as input to the next step.Associative ArraysBy default, array keys are integers. However, other primitive types are also
allowed as array keys. The syntax for declaring an array with a key type different
than the default is:<valueType>[<keyType>] array;For example, the following code declares and assigns items to an array with string
keys and float values:float[string] a;
a["one"] = 0.2;
a["two"] = 0.4;In addition to primitive types, a special type named auto can be used to
declare an array for which an additional append operation is available:int[auto] array;
foreach i in [1:100] {
array << (i*2) ;
}
foreach v in array {
trace(v);
}Items in an array with auto keys cannot be accessed directly using a primitive
type. The following example results in a compile-time error:int[auto] array;
array[0] = 1;However, it is possible to use auto key values from one array to access another:int[auto] a;
int[auto] b;
a << 1;
a << 2;
foreach v, k in a {
b[k] = a[k] * 2;
}Ordering of executionNon-array variables are single-assignment, which means that they must
be assigned to exactly one value during execution. A procedure or
expression will be executed when all of its input parameters have been
assigned values. As a result of such execution, more variables may
become assigned, possibly allowing further parts of the script to execute.In this way, scripts are implicitly parallel. Aside from serialisation
implied by these dataflow dependencies, execution of component programs
can proceed in parallel.In this fragment, execution of procedures p and q can happen in
parallel:y=p(x);
z=q(x);while in this fragment, execution is serialised by the variable y,
with procedure p executing before q.y=p(x);
z=q(y);Arrays in Swift are more monotonic - a generalisation of being
assignment. Knowledge about the content of an array increases during
execution, but cannot otherwise change. Each element of the array is
itself single assignment or monotonic (depending on its type). During a
run all values for an array are eventually known, and that array is
regarded as closed.Statements which deal with the array as a whole will often wait for the
array to be closed before executing (thus, a closed array is the
equivalent of a non-array type being assigned). However, a foreach
statement will apply its body to elements of an array as they become
known. It will not wait until the array is closed.Consider this script:file a[];
file b[];
foreach v,i in a {
b[i] = p(v);
}
a[0] = r();
a[1] = s();Initially, the foreach statement will have nothing to execute, as the
array a has not been assigned any values. The procedures r and s
will execute. As soon as either of them is finished, the corresponding
invocation of procedure p will occur. After both r and s have
completed, the array a will be closed since no other statements in the
script make an assignment to a.Compound proceduresAs with many other programming languages, procedures consisting of
Swift script can be defined. These differ from the previously
mentioned procedures declared with the app keyword, as they invoke
other Swift procedures rather than a component program.(file output) process (file input) {
file intermediate;
intermediate = first(input);
output = second(intermediate);
}
file x <"x.txt">;
file y <"y.txt">;
y = process(x);This will invoke two procedures, with an intermediate data file named
anonymously connecting the first and second procedures.Ordering of execution is generally determined by execution of app
procedures, not by any containing compound procedures. In this code block:(file a, file b) A() {
a = A1();
b = A2();
}
file x, y, s, t;
(x,y) = A();
s = S(x);
t = S(y);then a valid execution order is: A1 S(x) A2 S(y). The compound
procedure A does not have to have fully completed for its return
values to be used by subsequent statements.More about typesEach variable and procedure parameter in Swift script is strongly typed.
Types are used to structure data, to aid in debugging and checking
program correctness and to influence how Swift interacts with data.The image type declared in previous examples is a marker type.
Marker types indicate that data for a variable is stored in a single
file with no further structure exposed at the Swift script level.Arrays have been mentioned above, in the arrays section. A code block
may be applied to each element of an array using foreach; or
individual elements may be references using [] notation.There are a number of primitive types:type containsintintegersstringstrings of textfloatfloating point numbers, that behave the same as Java doublesbooleantrue/falseComplex types may be defined using the type keyword:type headerfile;
type voxelfile;
type volume {
headerfile h;
voxelfile v;
}Members of a complex type can be accessed using the . operator:volume brain;
o = p(brain.h);Sometimes data may be stored in a form that does not fit with Swift’s
file-and-site model; for example, data might be stored in an RDBMS on
some database server. In that case, a variable can be declared to have
external type. This indicates that Swift should use the variable to
determine execution dependency, but should not attempt other data
management; for example, it will not perform any form of data stage-in
or stage-out it will not manage local data caches on sites; and it will
not enforce component program atomicity on data output. This can add
substantial responsibility to component programs, in exchange for
allowing arbitrary data storage and access methods to be plugged in to
scripts.type file;
app (external o) populateDatabase() {
populationProgram;
}
app (file o) analyseDatabase(external i) {
analysisProgram @o;
}
external database;
file result <"results.txt">;
database = populateDatabase();
result = analyseDatabase(database);Some external database is represented by the database variable. The
populateDatabase procedure populates the database with some data, and
the analyseDatabase procedure performs some subsequent analysis on
that database. The declaration of database contains no mapping; and
the procedures which use database do not reference them in any way;
the description of database is entirely outside of the script. The
single assignment and execution ordering rules will still apply though;
populateDatabase will always be run before analyseDatabase.Data modelData processed by Swift is strongly typed. It may be take the form of
values in memory or as out-of-core files on disk. Language constructs
called mappers specify how each piece of data is stored.More technical details about Swift scriptThe syntax of Swift script has a superficial resemblance to C and Java.
For example, { and } characters are used to enclose blocks of statements.A Swift script consists of a number of statements. Statements may
declare types, procedures and variables, assign values to variables, and
express operations over arrays.VariablesVariables in Swift scripts are declared to be of a specific type.
Assignments to those variables must be data of that type. Swift script
variables are single-assignment - a value may be assigned to a variable
at most once. This assignment can happen at declaration time or later on
in execution. When an attempt to read from a variable that has not yet
been assigned is made, the code performing the read is suspended until
that variable has been written to. This forms the basis for Swift’s
ability to parallelise execution - all code will execute in parallel
unless there are variables shared between the code that cause sequencing.Variable DeclarationsVariable declaration statements declare new variables. They can
optionally assign a value to them or map those variables to on-disk files.Declaration statements have the general form:typename variablename (<mapping> | = initialValue ) ;The format of the mapping expression is defined in the Mappers section.
initialValue may be either an expression or a procedure call that
returns a single value.Variables can also be declared in a multivalued-procedure statement,
described in another section.Assignment StatementsAssignment statements assign values to previously declared variables.
Assignments may only be made to variables that have not already been
assigned. Assignment statements have the general form:variable = value;where value can be either an expression or a procedure call that returns
a single value.Variables can also be assigned in a multivalued-procedure statement,
described in another section.ProceduresThere are two kinds of procedure: An atomic procedure, which describes
how an external program can be executed; and compound procedures which
consist of a sequence of Swift script statements.A procedure declaration defines the name of a procedure and its input
and output parameters. Swift script procedures can take multiple inputs
and produce multiple outputs. Inputs are specified to the right of the
function name, and outputs are specified to the left. For example:(type3 out1, type4 out2) myproc (type1 in1, type2 in2)The above example declares a procedure called myproc, which has two
inputs in1 (of type type1) and in2 (of type type2) and two
outputs out1 (of type type3) and out2 (of type type4).A procedure input parameter can be an optional parameter in which case
it must be declared with a default value. When calling a procedure, both
positional parameter and named parameter passings can be passed,
provided that all optional parameters are declared after the required
parameters and any optional parameter is bound using keyword parameter
passing. For example, if myproc1 is defined as:(binaryfile bf) myproc1 (int i, string s="foo")Then that procedure can be called like this, omitting the optionalparameter s:
binaryfile mybf = myproc1(1);or like this supplying a value for the optional parameter s:binaryfile mybf = myproc1 (1, s="bar");Atomic proceduresAn atomic procedure specifies how to invoke an external executable
program, and how logical data types are mapped to command line arguments.Atomic procedures are defined with the app keyword:app (binaryfile bf) myproc (int i, string s="foo") {
myapp i s @filename(bf);
}which specifies that myproc invokes an executable called myapp,
passing the values of i, s and the filename of bf as command line
arguments.Compound proceduresA compound procedure contains a set of Swift script statements:(type2 b) foo_bar (type1 a) {
type3 c;
c = foo(a); // c holds the result of foo
b = bar(c); // c is an input to bar
}Control ConstructsSwift script provides if, switch, foreach, and iterate
constructs, with syntax and semantics similar to comparable constructs
in other high-level languages.foreachThe foreach construct is used to apply a block of statements to each
element in an array. For example:check_order (file a[]) {
foreach f in a {
compute(f);
}
}foreach statements have the general form:foreach controlvariable (,index) in expression {
statements
}The block of statements is evaluated once for each element in
expression which must be an array, with controlvariable set to the
corresponding element and index (if specified) set to the integer
position in the array that is being iterated over.ifThe if statement allows one of two blocks of statements to be
executed, based on a boolean predicate. if statements generally have
the form:if(predicate) {
statements
} else {
statements
}where predicate is a boolean expression.switchswitch expressions allow one of a selection of blocks to be chosen
based on the value of a numerical control expression. switch
statements take the general form:switch(controlExpression) {
case n1:
statements2
case n2:
statements2
[...]
default:
statements
}The control expression is evaluated, the resulting numerical value used
to select a corresponding case, and the statements belonging to that
case block are evaluated. If no case corresponds, then the statements
belonging to the default block are evaluated.Unlike C or Java switch statements, execution does not fall through to
subsequent case blocks, and no break statement is necessary at the
end of each block.Following is an example of a switch expression in Swift:int score=60;
switch (score){
case 100:
tracef("%s\n", "Bravo!");
case 90:
tracef("%s\n", "very good");
case 80:
tracef("%s\n", "good");
case 70:
tracef("%s\n", "fair");
default:
tracef("%s\n", "unknown grade");
}iterateiterate expressions allow a block of code to be evaluated repeatedly,
with an iteration variable being incremented after each iteration.The general form is:iterate var {
statements;
} until (terminationExpression);Here var is the iteration variable. Its initial value is 0. After each iteration,
but before terminationExpression is evaluated, the iteration variable is incremented.
This means that if the termination expression is a function of only the iteration variable,
the body will never be executed while the termination expression is true.Example:iterate i {
trace(i); // will print 0, 1, and 2
} until (i == 3);Variables declared inside the body of iterate can be used in the termination expression.
However, their values will reflect the values calculated as part of the last invocation
of the body, and may not reflect the incremented value of the iteration variable:iterate i {
trace(i);
int j = i; // will print 0, 1, 2, and 3
} until (j == 3);OperatorsThe following infix operators are available for use in Swift script
expressions.operatorpurpose+numeric addition; string concatenation-numeric subtraction*numeric multiplication/floating point division%/integer division%%integer remainder of division== !=comparison and not-equal-to< > ⇐ >=numerical ordering&& ||boolean and, or!boolean notGlobal constantsAt the top level of a Swift script program, the global modified may be
added to a declaration so that it is visible throughout the program,
rather than only at the top level of the program. This allows global
constants (of any type) to be defined.ImportsThe import directive can be used to import definitions from another
Swift file.For example, a Swift script might contain this:import "defs";
file f;which would import the content of defs.swift:type file;Imported files are read from two places. They are either read from
the path that is specified from the import command, such as:import "definitions/file/defs";or they are read from the environment variable SWIFT_LIB. This
environment variable is used just like the PATH environment
variable. For example, if the command below was issued to the bash
shell:export SWIFT_LIB=${HOME}/Swift/defs:${HOME}/Swift/functionsthen the import command will check for the file defs.swift in both
"${HOME}/Swift/defs" and "${HOME}/Swift/functions" first before trying
the path that was specified in the import command.Other valid imports:import "../functions/func"
import "/home/user/Swift/definitions/defs"There is no requirement that a module is imported only once. If a module
is imported multiple times, for example in different files, then Swift
will only process the imports once.Imports may contain anything that is valid in a Swift script,
including the code that causes remote execution.MappersMappers provide a mechanism to specify the layout of mapped datasets on
disk. This is needed when Swift must access files to transfer them to
remote sites for execution or to pass to applications.Swift provides a number of mappers that are useful in common cases. This
section details those mappers. For more complex cases, it is
possible to write application-specific mappers in Java and use them
within a Swift script.The Single File MapperThe single_file_mapper maps a single physical file to a dataset.Swift variableFilenamefmyfilef [0]INVALIDf.barINVALIDparametermeaningfileThe location of the physical file including path and file name.Example:file f <single_file_mapper;file="plot_outfile_param">;There is a simplified syntax for this mapper:file f <"plot_outfile_param">;The Simple MapperThe simple_mapper maps a file or a list of files into an array by
prefix, suffix, and pattern. If more than one file is matched, each of
the file names will be mapped as a subelement of the dataset.ParameterMeaninglocationA directory that the files are located.prefixThe prefix of the filessuffixThe suffix of the files, for instance: ".txt"paddingThe number of digits used to uniquely identify the mapped file. This is an optional parameter which defaults to 4.patternA UNIX glob style pattern, for instance: "*foo*" would match
all file names that contain foo. When this mapper is used to specify
output filenames, pattern is ignored.type file;
file f <simple_mapper;prefix="foo", suffix=".txt">;The above maps all filenames that start with foo and have an extension
.txt into file f.Swift variableFilenameffoo.txttype messagefile;
(messagefile t) greeting(string m) {.
app {
echo m stdout=@filename(t);
}
}
messagefile outfile <simple_mapper;prefix="foo",suffix=".txt">;
outfile = greeting("hi");This will output the string hi to the file foo.txt.The simple_mapper can be used to map arrays. It will map the array
index into the filename between the prefix and suffix.type messagefile;
(messagefile t) greeting(string m) {
app {
echo m stdout=@filename(t);
}
}
messagefile outfile[] <simple_mapper;prefix="baz",suffix=".txt", padding=2>;
outfile[0] = greeting("hello");
outfile[1] = greeting("middle");
outfile[2] = greeting("goodbye");Swift variableFilenameoutfile[0]baz00.txtoutfile[1]baz01.txtoutfile[2]baz02.txtsimple_mapper can be used to map structures. It will map the name of
the structure member into the filename, between the prefix and the suffix.type messagefile;
type mystruct {
messagefile left;
messagefile right;
};
(messagefile t) greeting(string m) {
app {
echo m stdout=@filename(t);
}
}
mystruct out <simple_mapper;prefix="qux",suffix=".txt">;
out.left = greeting("hello");
out.right = greeting("goodbye");This will output the string "hello" into the file qux.left.txt and the
string "goodbye" into the file qux.right.txt.Swift variableFilenameout.leftquxleft.txtout.rightquxright.txtConcurrent MapperThe concurrent_mapper is almost the same as the simple mapper, except that
it is used to map an output file, and the filename generated will
contain an extract sequence that is unique. This mapper is the default
mapper for variables when no mapper is specified.ParameterMeaninglocationA directory that the files are located.prefixThe prefix of the filessuffixThe suffix of the files, for instance: ".txt"
pattern A UNIX glob style pattern, for instance: "*foo*" would match
all file names that contain foo. When this mapper is used to specify
output filenames, pattern is ignored.Example:file f1;
file f2 <concurrent_mapper;prefix="foo", suffix=".txt">;The above example would use concurrent mapper for f1 and f2, and
generate f2 filename with prefix "foo" and extension ".txt"Filesystem MapperThe filesys_mapper is similar to the simple mapper, but maps a file or a
list of files to an array. Each of the filename is mapped as an element
in the array. The order of files in the resulting array is not defined.TODO: note on difference between location as a relative vs absolute path
w.r.t. staging to remote location - as mihael said: It’s because you
specify that location in the mapper. Try location="." instead of
location="/sandbox/…"parametermeaninglocationThe directory where the files are located.prefixThe prefix of the filessuffixThe suffix of the files, for instance: ".txt"patternA UNIX glob style pattern, for instance: "*foo*" would match
all file names that contain foo.Example:file texts[] <filesys_mapper;prefix="foo", suffix=".txt">;The above example would map all filenames that start with "foo" and
have an extension ".txt" into the array texts. For example, if the
specified directory contains files: foo1.txt, footest.txt,
foo__1.txt, then the mapping might be:Swift variableFilenametexts[0]footest.txttexts[1]foo1.txttexts[2]foo__1.txtFixed Array MapperThe fixed_array_mapper maps from a string that contains a list of
filenames into a file array.parameterMeaningfilesA string that contains a list of filenames, separated by space,
comma or colonExample:file texts[] <fixed_array_mapper;files="file1.txt, fileB.txt, file3.txt">;would cause a mapping like this:Swift variableFilenametexts[0]file1.txttexts[1]fileB.txttexts[2]file3.txtArray MapperThe array_mapper maps from an array of strings into a fileparametermeaningfilesAn array of strings containing one filename per elementExample:string s[] = [ "a.txt", "b.txt", "c.txt" ];
file f[] <array_mapper;files=s>;This will establish the mapping:Swift variableFilenamef[0]a.txtf[1]b.txtf[2]c.txtRegular Expression MapperThe regexp_mapper transforms one file name to another using regular
expression matching.parametermeaningsourceThe source file namematchRegular expression pattern to match, use()to match whatever
regular expression is inside the parentheses, and indicate the start and
end of a group; the contents of a group can be retrieved with the\\numberspecial sequence (two backslashes are needed because the
backslash is an escape sequence introducer)transformThe pattern of the file name to transform to, use \number to
reference the group matched.Example:file s <"picture.gif">;
file f <regexp_mapper; source=s,
match="(.*)gif", transform="\\1jpg">;This example transforms a file ending gif into one ending jpg and
maps that to a file.Swift variableFilenamefpicture.jpgStructured Regular Expression MapperThe structured_regexp_mapper is similar to the regexp_mapper with the only
difference that it can be applied to arrays while the regexp_mapper cannot.parametermeaningsourceThe source file namematchRegular expression pattern to match, use()to match whatever
regular expression is inside the parentheses, and indicate the start and
end of a group; the contents of a group can be retrieved with the\\numberspecial sequence (two backslashes are needed because the
backslash is an escape sequence introducer)transformThe pattern of the file name to transform to, use \number to
reference the group matched.Example:file s[] <filesys_mapper; pattern="*.gif">;
file f[] <structured_regexp_mapper; source=s,
match="(.*)gif", transform="\\1jpg">;This example transforms all files in a list that end in gif to end in jpg and maps
the list to those files.CSV MapperThe csv_mapper maps the content of a CSV (comma-separated value) file
into an array of structures. The dataset type needs to be correctly
defined to conform to the column names in the file. For instance, if the
file contains columns: name age GPA then the type needs to have member
elements like this:type student {
file name;
file age;
file GPA;
}If the file does not contain a header with column info, then the column
names are assumed as column1, column2, etc.ParameterMeaningfileThe name of the CSV file to read mappings from.headerWhether the file has a line describing header info; default istrueskipThe number of lines to skip at the beginning (after header line);
default is 0.hdelimHeader field delimiter; default is the value of thedelimparameterdelimContent field delimiters; defaults are space, tab and commaExample:student stus[] <csv_mapper;file="stu_list.txt">;The above example would read a list of student info from file
"stu_list.txt" and map them into a student array. By default, the file
should contain a header line specifying the names of the columns. If
stu_list.txt contains the following:name,age,gpa
101-name.txt, 101-age.txt, 101-gpa.txt
name55.txt, age55.txt, age55.txt
q, r, sthen some of the mappings produced by this example would be:stus[0].name101-name.txtstus[0].age101-age.txtstus[0].gpa101-gpa.txtstus[1].namename55.txtstus[1].ageage55.txtstus[1].gpagpa55.txtstus[2].nameqstus[2].agerstus[2].gpasExternal MapperThe external mapper, ext maps based on the output of a supplied Unix
executable.parametermeaningexecThe name of the executable (relative to the current directory, if
an absolute path is not specified)*Other parameters are passed to the executable prefixed with a - symbolThe output (stdout) of the executable should consist of two columns of data,
separated by a space. The first column should be the path of the mapped
variable, in Swift script syntax (for example [2] means the 2nd element of an
array) or the symbol $ to represent the root of the mapped variable. The
following table shows the symbols that should appear in the first column
corresponding to the mapping of different types of swift constructs such as
scalars, arrays and structs.Swift constructfirst columnsecond columnscalar$file_nameanarray[][]file_name2dimarray[][][][]file_nameastruct.fldfldfile_nameastructarray[].fldname[].fldnamefile_nameExample: With the following in mapper.sh,#!/bin/bash
echo "[2] qux"
echo "[0] foo"
echo "[1] bar"then a mapping statement:student stus[] <ext;exec="mapper.sh">;would mapSwift variableFilenamestus[0]foostus[1]barstus[2]quxAdvanced Example: The following mapper.sh is an advanced example of an external
mapper that maps a two-dimensional array to a directory of files. The files in
the said directory are identified by their names appended by a number between
000 and 099. The first index of the array maps to the first part of the
filename while the second index of the array maps to the second part of the
filename.#!/bin/sh
#take care of the mapper args
while [ $# -gt 0 ]; do
case $1 in
-location) location=$2;;
-padding) padding=$2;;
-prefix) prefix=$2;;
-suffix) suffix=$2;;
-mod_index) mod_index=$2;;
-outer_index) outer_index=$2;;
*) echo "$0: bad mapper args" 1>&2
exit 1;;
esac
shift 2
done
for i in `seq 0 ${outer_index}`
do
for j in `seq -w 000 ${mod_index}`
do
fj=`echo ${j} | awk '{print $1 +0}'` #format j by removing leading zeros
echo "["${i}"]["${fj}"]" ${location}"/"${prefix}${j}${suffix}
done
doneThe mapper definition is as follows:file_dat dat_files[][] < ext;
exec="mapper.sh",
padding=3,
location="output",
prefix=@strcat( str_root, "_" ),
suffix=".dat",
outer_index=pid,
mod_index=n >;Assuming there are 4 files with name aaa, bbb, ccc, ddd and a mod_index of 10,
we will have 4x10=40 files mapped to a two-dimensional array in the following
pattern:Swift variableFilenamestus[0][0]output/aaa_000.datstus[0][1]output/aaa_001.datstus[0][2]output/aaa_002.datstus[0][3]output/aaa_003.dat……stus[0][9]output/aaa_009.datstus[1][0]output/bbb_000.datstus[1][1]output/bbb_001.dat……stus[3][9]output/ddd_009.datExecuting app proceduresThis section describes how Swift executes app procedures, and
requirements on the behaviour of application programs used in app
procedures. These requirements are primarily to ensure that the Swift
can run your application in different places and with the various fault
tolerance mechanisms in place.Mapping of app semantics into unix process execution semanticsThis section describes how an app procedure invocation is translated
into a (remote) unix process execution. It does not describe the
mechanisms by which Swift performs that translation; that is described
in the next section.In this section, this example Swift script is used for reference:type file;
app (file o) count(file i) {
wc @i stdout=@o;
}
file q <"input.txt">;
file r <"output.txt">;The executable for wc will be looked up in tc.data.This unix executable will then be executed in some application
procedure workspace. This means:Each application procedure workspace will have an application workspace
directory. (TODO: can collapse terms application procedure workspace
and application workspace directory ?This application workspace directory will not be shared with any other
application procedure execution attempt; all application procedure
execution attempts will run with distinct application procedure
workspaces. (for the avoidance of doubt: If a Swift script procedure
invocation is subject to multiple application procedure execution
attempts (due to Swift-level restarts, retries or replication) then each
of those application procedure execution attempts will be made in a
different application procedure workspace. )The application workspace directory will be a directory on a POSIX
filesystem accessible throughout the application execution by the
application executable.Before the application executable is executed:
The application workspace directory will exist.
The input files will exist inside the application workspace
directory (but not necessarily as direct children; there may be
subdirectories within the application workspace directory).
The input files will be those files mapped to input parameters
of the application procedure invocation. (In the example, this
means that the file input.txt will exist in the application
workspace directory)
For each input file dataset, it will be the case that @filename
or @filenames invoked with that dataset as a parameter will
return the path relative to the application workspace directory
for the file(s) that are associated with that dataset. (In the
example, that means that @i will evaluate to the path input.txt)
For each file-bound parameter of the Swift procedure invocation,
the associated files (determined by data type?) will always exist.
The input files must be treated as read only files. This may or
may not be enforced by unix file system permissions. They may or
may not be copies of the source file (conversely, they may be
links to the actual source file).
During/after the application executable execution, the following must
be true:
If the application executable execution was successful (in the
opinion of the application executable), then the application
executable should exit with unix return code 0; if the
application executable execution was unsuccessful (in the opinion
of the application executable), then the application executable
should exit with unix return code not equal to 0.
Each file mapped from an output parameter of the Swift script
procedure call must exist. Files will be mapped in the same way as
for input files.
The output subdirectories will be precreated
before execution by Swift if defined within a Swift script such as the
location attribute of a mapper. App executables expect to make them if
they are referred to in the wrapper scripts.
Output produced by running the application executable on some
inputs should be the same no matter how many times, when or where
that application executable is run. The same can vary depending
on application (for example, in an application it might be
acceptable for a PNG→JPEG conversion to produce different,
similar looking, output jpegs depending on the environment)
Things to not assume:
Anything about the path of the application workspace directory
That either the application workspace directory will be deleted or
will continue to exist or will remain unmodified after execution
has finished
That files can be passed between application procedure
invocations through any mechanism except through files known to
Swift through the mapping mechanism (there is some exception here
for external datasets - there are a separate set of assertions
that hold for external datasets)
That application executables will run on any particular site of
those available, or than any combination of applications will run
on the same or different sites.
How Swift implements the site execution modelThis section describes the implementation of the semantics described in
the previous section.Swift executes application procedures on one or more sites.Each site consists of:
worker nodes. There is some execution mechanism through which
the Swift client side executable can execute its wrapper script
on those worker nodes. This is commonly GRAM or Falkon or coasters.
a site-shared file system. This site shared filesystem is
accessible through some file transfer mechanism from the Swift
client side executable. This is commonly GridFTP or coasters. This
site shared filesystem is also accessible through the posix file
system on all worker nodes, mounted at the same location as seen
through the file transfer mechanism. Swift is configured with the
location of some site working directory on that site-shared file
system.
There is no assumption that the site shared file system for one site is
accessible from another site.For each workflow run, on each site that is used by that run, a run
directory is created in the site working directory, by the Swift client
side.In that run directory are placed several subdirectories:
shared/ - site shared files cache
kickstart/ - when kickstart is used, kickstart record files for
each job that has generated a kickstart record.
info/ - wrapper script log files
status/ - job status files
jobs/ - application workspace directories (optionally placed
here - see below)
Application execution looks like this:For each application procedure call:The Swift client side selects a site; copies the input files for that
procedure call to the site shared file cache if they are not already in
the cache, using the file transfer mechanism; and then invokes the
wrapper script on that site using the execution mechanism.The wrapper script creates the application workspace directory; places
the input files for that job into the application workspace directory
using either cp or ln -s (depending on a configuration option);
executes the application unix executable; copies output files from the
application workspace directory to the site shared directory using cp;
creates a status file under the status/ directory; and exits,
returning control to the Swift client side. Logs created during the
execution of the wrapper script are stored under the info/ directory.The Swift client side then checks for the presence of and deletes a
status file indicating success; and copies files from the site shared
directory to the appropriate client side location.The job directory is created (in the default mode) under the jobs/
directory. However, it can be created under an arbitrary other path,
which allows it to be created on a different file system (such as a
worker node local file system in the case that the worker node has a
local file system).swift-site-model.pngTechnical overview of the Swift architectureThis section attempts to provide a technical overview of the Swift
architecture.Execution layerThe execution layer causes an application program (in the form of a unix
executable) to be executed either locally or remotely.The two main choices are local unix execution and execution through
GRAM. Other options are available, and user provided code can also be
plugged in.The kickstart utility can be used to capture environmental
information at execution time to aid in debugging and provenance capture.Swift script language compilation layerStep i: text to XML intermediate form parser/processor. parser written
in ANTLR - see resources/VDL.g. The XML Schema Definition (XSD) for the
intermediate language is in resources/XDTM.xsd.Step ii: XML intermediate form to Karajan workflow. Karajan.java - reads
the XML intermediate form. compiles to karajan workflow language - for
example, expressions are converted from Swift script syntax into Karajan
syntax, and function invocations become karajan function invocations
with various modifications to parameters to accomodate return parameters
and dataset handling.Swift/karajan library layerSome Swift functionality is provided in the form of Karajan libraries
that are used at runtime by the Karajan workflows that the Swift
compiler generates.Function referenceThis section details functions that are available for use in the
Swift language.argTakes a command line parameter name as a string parameter and an
optional default value and returns the value of that string parameter
from the command line. If no default value is specified and the command
line parameter is missing, an error is generated. If a default value is
specified and the command line parameter is missing, @arg will return
the default value.Command line parameters recognized by @arg begin with exactly one
hyphen and need to be positioned after the script name.For example:trace(arg("myparam"));
trace(arg("optionalparam", "defaultvalue"));$ swift arg.swift -myparam=hello
Swift v0.3-dev r1674 (modified locally)
RunID: 20080220-1548-ylc4pmda
Swift trace: defaultvalue
Swift trace: helloextractIntextractInt(file) will read the specified file, parse an integer from
the file contents and return that integer.extractFloatSimilar to extractInt, extractFloat(file) will read the specified file, parse a float from
the file contents and return that float.filenamefilename(v) will return a string containing the filename(s) for the
file(s) mapped to the variable v. When more than one filename is
returned, the filenames will be space separated inside a single string
return value.filenamesfilenames(v) will return multiple values containing the
filename(s) for the file(s) mapped to the variable v.lengthlength(array) will return the length of an array in Swift. This function will wait for all
elements in the array to be written before returning the length.readDatareadData will read data from a specified file and assign it to Swift variable. The format of the input file is
controlled by the type of the return value. For scalar return types, such as
int, the specified file should contain a single value of that type. For arrays
of scalars, the specified file should contain one value per line. For complex types
of scalars, the file should contain two rows. The first row should be structure
member names separated by whitespace. The second row should be the
corresponding values for each structure member, separated by whitespace, in the
same order as the header row. For arrays of structs, the file should contain a
heading row listing structure member names separated by whitespace. There
should be one row for each element of the array, with structure member elements
listed in the same order as the header row and separated by whitespace. The following example shows how readData() can be used to populate an array of Swift struct-like complex type:type Employee{
string name;
int id;
string loc;
}
Employee emps[] = readData("emps.txt");Where the contents of the "emps.txt" file are:name id loc
Thomas 2222 Chicago
Gina 3333 Boston
Anne 4444 HoustonThis will result in the array "emps" with 3 members. This can be processed within a Swift script using the foreach construct as follows:foreach emp in emps{
tracef("Employee %s lives in %s and has id %d", emp.name, emp.loc, emp.id);
}readStructuredreadStructured will read data from a specified file, like readdata, but
using a different file format more closely related to that used by the
ext mapper.Input files should list, one per line, a path into a Swift structure,
and the value for that position in the structure:rows[0].columns[0] = 0
rows[0].columns[1] = 2
rows[0].columns[2] = 4
rows[1].columns[0] = 1
rows[1].columns[1] = 3
rows[1].columns[2] = 5which can be read into a structure defined like this:type vector {
int columns[];
}
type matrix {
vector rows[];
}
matrix m;
m = readStructured("readStructured.in");(since Swift 0.7, was readData2(deprecated))regexpregexp(input,pattern,replacement) will apply regular expression
substitution using the Java java.util.regexp API
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html.
For example:string v = regexp("abcdefghi", "c(def)g","monkey");will assign the value "abmonkeyhi" to the variable v.sprintfsprintf(spec, variable list) will generate a string based on the specified format.Example: string s = sprintf("\t%s\n", "hello");Format specifiers%%% sign%MFilename output (waits for close)%pFormat variable according to an internal format%bBoolean output%fFloat output%iint output%sString output%kVariable sKipped, no output%qArray outputstrcatstrcat(a,b,c,d,…) will return a string containing all of the
strings passed as parameters joined into a single string. There may be
any number of parameters.The + operator concatenates two strings: strcat(a,b) is the same as
a + bstrcutstrcut(input,pattern) will match the regular expression in the
pattern parameter against the supplied input string and return the
section that matches the first matching parenthesised group.For example:string t = "my name is John and i like puppies.";
string name = strcut(t, "my name is ([^ ]*) ");
string out = strcat("Your name is ",name);
trace(out);This will output the message: Your name is John.strjoinstrjoin(array, delimiter) will combine the elements of an array
into a single string separated by a given delimiter. The array
passed to strjoin must be of a primitive type (string, int, float,
or boolean). It will not join the contents of an array of files.Example:string test[] = ["this", "is", "a", "test" ];
string mystring = strjoin(test, " ");
tracef("%s\n", mystring);This will print the string "this is a test".strsplitstrsplit(input,pattern) will split the input string based on
separators that match the given pattern and return a string array.Example:string t = "my name is John and i like puppies.";
string words[] = strsplit(t, "\\s");
foreach word in words {
trace(word);
}This will output one word of the sentence on each line (though not
necessarily in order, due to the fact that foreach iterations execute in
parallel).toInttoInt(input) will parse its input string into an integer. This can be
used with arg() to pass input parameters to a Swift script as
integers.toFloattoFloat(input) will parse its input string into a floating point number. This can be
used with arg() to pass input parameters to a Swift script as
floating point numbers.toStringtoString(input) will parse its input into a string. Input can be an int, float, string,
or boolean.tracetrace will log its parameters. By default these will appear on both
stdout and in the run log file. Some formatting occurs to produce the
log message. The particular output format should not be relied upon.traceftracef(spec, variable list) will log its parameters as formatted
by the formatter spec. spec must be a string. Checks the type of
the specifiers arguments against the variable list and allows for
certain escape characters.Example:int i = 3;
tracef("%s: %i\n", "the value is", i);Specifiers:%s
Format a string.
%b
Format a boolean.
%i
Format a number as an integer.
%f
Format a number as a floating point number.
%q
Format an array.
%M
Format a mapped variable’s filename.
%k
Wait for the given variable but do not format it.
%p
Format variable according to an internal format.
Escape sequences:\n
Produce a newline.
\t
Produce a tab.
Known issues:
Swift does not correctly scan certain backslash
sequences such as \\.
javajava(class_name, static_method, method_arg) will call a java static method of the class class_name.writeDatawriteData will write out data structures in the format described for
readData. The following example demonstrates how one can write a string "foo" into a file "writeDataPrimitive.out":type file;
string s = "foo";
file f <"writeDataPrimitive.out">;
f=writeData(s);ConfigurationSwift is mainly configured using a configuration file, typically called swift.conf.
This file contains configuration properties and site descriptions. A simple
configuration file may look like this:site.mysite {
execution {
type: "coaster"
URL: "my.site.org"
jobManager: "ssh:local"
}
staging: "local"
app.ALL {executable: "*"}
}
# select sites to run on
sites: [mysite]
# other settings
lazy.errors: falseConfiguration SyntaxThe Swift configuration files are expressed in a modified version of JSON. The main
additions to JSON are:
Quotes around string values, in particular keys, are optional, unless the strings
contain special characters (single/double quotes, square and curly braces, white space,
$, :, =, ,, `, ^, ?, !, @, *, \), or if they
represent other values: true, false, null, and numbers.
= and : can be used interchangeably to separate keys from values
= (or :) is optional before an open bracket
Commas are optional as separators if there is a new line
${…} expansion can be used to substitute environment variable values or Java
system properties. If the value of an environment variable is needed, it must be
prefixed with env.. For example ${env.PATH}. Except for include directives, the
${…} must not be inside double quotes for the substitution to work. The same
outcome can be achieved using implicit string concatenation: "/home/"${env.USER}"/bin"Comments can be introduced by starting a line with a hash symbol (#) or using
a double slash (//):# This is a comment
// This is also a comment
keepSitesDir: true # This is a comment following a valid propertyInclude DirectivesInclude directives can be used to include the contents of a Swift configuration file
from another Swift configuration file. This is done using the literal include followed
by a quoted string containing the path to the target file. The path may contain
references to environment variables or system properties using the substitution
syntax explained above. For example:# an absolute path name
include "/home/joedoe/swift-config/site1.conf"
# include a file from the Swift distribution package
include "${swift.home}/etc/sites/beagle.conf"
# include a file using an environment variable
include "${env.SWIFT_CONFIG_DIR}/glow.conf"Property MergingIf two properties with the same name are present in a configuration file, they are either
merged or the latter one overrides the earlier one. This depends on the type of property.
Simple values are always overridden, while objects are merged. For example:key: 1
key: 2
# key is now 2
object {
key1: 1
}
object {
key2: 2
}
# object is now { key1: 1, key2: 2}This can be used to define certain template files that contain most of the definitions for
sites, and then include them in other files and override or add only certain aspects of
those sites. For example, assume swift-local.conf includes a definition for a site named
local that can be used to run applications on the Swift client side. If you wanted to override
only the work directory, the following swift.conf could be used:include "swift-local.conf"
site.local {
# use existing definition for site.local, but override workDirectory
workDirectory: "/tmp"
}If, on the other hand, you want to fully override the definition of site.local, you could
set it to null first and then provide your own definition:include "swift-local.conf"
# forget previous definition of site.local
site.local: null
# define your own site.local
site.local {
...
}Configuration Search PathBy default, Swift attempts to load multiple configuration files, merging them sequentially
as described in the Property Merging Section. The files are:
Distribution Configuration ([D]): ${swift.home}/etc/swift.conf
Site Configuration ([S]): ${env.SWIFT_SITE_CONF} (if SWIFT_SITE_CONF is defined)
User Configuration ([U]): ${env.HOME}/.swift/swift.conf (if present)
Run Configiuration ([R]): ${env.PWD}/swift.conf (if present)
In addition, a number of configuration properties can be overridden individually on the
Swift command line. For a list of such configuration properties, please use swift -help
or refer to the <<??, Running Swift>> Section in this document.The run configuration can be overridden on the Swift command line using the -config
argument. If -config is specified, Swift will not attempt to load swift.conf from
the current directory.The entire configuration search path can be replaced with a custom search path using
the -configpath command line argument. The value passed to -configpath must be a
list of paths pointing to various configuration files, separated by the standard
operating system path separator (: on Linux and ; on Windows). For example:swift -configpath /etc/swift/swift.conf:~/swift-configs/s1.conf:swift.confIf in doubt about what configuration files are being loaded or to troubleshoot
configuration issues, Swift can be started with the -listconfig command line argument.
-listconfig accepts to possible values:files: will print a list of configuration files loaded by Swift
full: will print a list of configuration files loaded by Swift, as well as
the final merged configuration.
Configuration File StructureThe contents of a Swift configuration file can be divided into a number of relevant
sections:
site declarations
global application declarations
Swift configuration properties
Site DeclarationsSwift site declarations are specified using the site.<name> property, where text
inside angle brackets is to be interpreted as a generic label for user-specified
content, whereas content between square brackets is optional:site.<name> {
execution {...}
[staging: "swift" | "local" | "service-local" | "shared-fs" | "wrapper"]
[filesystem {...}]
workDirectory: <path>
[<site options>]
[<application declarations>]
}A site name can be any string. If the string contains special characters, it must be
quoted:site."My-$pecial-$ite" {...}Site SelectionOnce sites are declared, they must be explicitly enabled for Swift to use them. This
can be achieved with the sites option, which accepts either an array or a comma-separated
list of site names:sites: ["site1", "site2"]
# alternatively:
sites: "site1, site2"The sites option can also be specified on the Swift command line:swift -sites site1,site2 script.swiftExecution MechanismThe execution property tells Swift how applications should be executed on a site: execution {
type: <string>
[URL: <string>]
[jobManager: <string>]
[<execution provider options>]
}The type property is used to select one of the mechanisms for application execution
that is known by Swift. A comprehensinve list of execution mechanisms can be found
in Execution Mechanisms Section. A summary is shown below:
Swift Execution MechanismsType URL requiredUses jobManagerDefault jobManagerStaging methods supported Descriptionlocalnono-swift, local, wrapperRuns applications locally using a simple fork()-based mechanismcoasteryesyesnoneswift, wrapper, local, service-local, shared-fs, directSubmits applications through an automatically-deployed Swift Coasters servicecoaster-persistentyesyesnoneswift, wrapper, local, service-local, shared-fs, directUses a manually deployed Swift Coasters serviceGRAM5yesyes"fork"swift, wrapperUses the GRAM: User’s Guide component of
the Globus Toolkit.GT2An alias for GRAM5SSHyesno-swift, wrapperRuns applications using a Java implementation of the SSH protocolSSH-CLyesno-swift, wrapperLike SSH except it uses the command-line ssh tool.PBSnono-swift, wrapperSubmits applications to a PBS or Torque resource managerCondornono-swift, wrapperSubmits applications using CondorSGEnono-swift, wrapperUses the Sun Grid EngineSLURMnono-swift, wrapperUses the SLURM local schedulerLSFnono-swift, wrapperSubmits applications to Platform’s Load Sharing Facility
The execution provider options are options that specify finer details on how
on application should be executed. They depend on the chosen mechanism and are detailed in
Execution Mechanisms Section. This is where
Coasters options, such as nodeGranularity or softImage, would
be specified. Example:execution {
type: "coaster"
jobManager: "local:local"
options {
maxJobs: 1
tasksPerNode: 2
workerLoggingLevel: TRACE
}
}A complete list of Swift Coasters options can be found in Coaster OptionsStagingThe staging property instructs Swift how to handle application input and output files.
The swift and wrapper staging methods are supported universally, but the swift method
requires the filesystem property to be specified. If not specified, this option defaults to
swift. Support for the other choices is dependent on the execution mechanism. This is
detailed in the Execution Mechanisms Table above. A
description of each staging method is provided in the table below:
Swift Staging Methods Staging Method DescriptionswiftThis method instructs Swift to use a filesystem provider to direct all
necessary staging operations from the Swift client-side to the cluster
head node. If this method is used, the workDirectory must point to
a head node path that is on a shared file system accessible by the
compute nodes.wrapperFile staging is done by the Swift application wrapperlocalUsed to indicate that files should be staged in/out from/to the site
on which Swift is running. In the case of Swift Coasters, the system
proxies the tranfers between client side and compute nodes through the
Coaster Service.service-localThis method instructs the execution mechanism provider to stage input and
output files from the remote site where the execution service is located.
For example, if a Coaster Service is started on the login node of a
cluster, the Coaster Service will perform the staging from a file system
on the login node to the compute node and back.shared-fsThis method is used by Coasters to implement a simple staging mechanism in
which files are accessed using a shared filesystem that is accessible by
compute nodesdirectTries to avoid moving files around as much as possible and passes absolute
file names to the application instead. The node on which the application
is running must have access to the filesystem on which swift data is
located.
File SystemThe file system properties are used with staging: "swift" to tell Swift how to access remote
file systems. Valid types are described below:
Swift File System Providers Type URL required DescriptionlocalnoCopies files locally on the Swift client sideGSIFTPyesAccesses a remote file system using GridFTPGridFTPyesAn alias for GSIFTPSSHyesUses the SCP protocol
Site OptionsSite options control various aspects of how Swift handles application execution on a site.
All options except workDirectory are optional. They are listed in the following table:
Site Options Option Valid values Default value DescriptionOSmany"INTEL32::LINUX"Can be used to tell Swift the type of the operating system
running on the remote site. By default, Swift assumes a
UNIX/Linux type OS. There is some limited support for
running under Windows, in which case this property must be
set to one of "INTEL32::WINDOWS" or "INTEL64::WINDOWS"workDirectorypath-Points to a directory in which Swift can maintain a set of
files relevant to the execution of an application on the
site. By default, applications will be executed on the
compute nodes in a sub-directory of workDirectory, which
implies that workDirectory must be accessible from the
compute nodes.scratchpath-If specified, it instructs swift to run applications in
a directory different than workDirectory. Contrary to the
requirement for workDirectory, scratch can point to
a file system local to compute nodes. This option is useful
if applications do intensive I/O on temporary files created
in their work directory, or if they access their input/output
files in a non-linear fashion.keepSiteDirtrue, falsefalseIf set to true, site application directories (i.e. workDirectory)
will not be cleaned up when Swift completes a run. This
can be useful for debugging.statusMode"files", "provider""files"Controls whether application exit codes are handled by the
execution mechanism or passed back to Swift by the Swift
wrapper script through files. Traditionally, Globus GRAM
did not use to return application exit codes. This has
changed in Globus Toolkit 5.x. However, some local scheduler
execution mechanisms, such as PBS, are still unable to
return application exit codes. In such cases, it is necessary
to pass the application exit codes back to Swift in files.
This comes at a slight price in performance, since a file
needs to be created, written to, and transferred back to
Swift for each application invocation. It is however also
the default, since it works in all cases.maxParallelTasksinteger2The maximum number of concurrent application invocations
allowed on this site.initialParallelTasksinteger2The limit on the number of concurrent application invocations
on this site when a Swift run is started. As invocations
complete successfully, the number of concurrent invocations
on the site is increased up to maxParallelTasks.
Additional, less frequently used options, are as follows:
Obscure options that you are unlikely to need to worry about Option Valid values Default value DescriptionwrapperParameterMode"args", "files""args"If set to "files", Swift will, as much as possible, pass
application arguments through files. The applications will
be invoked normally, with their arguments in the **argv
parameter to the main() function. This can be useful if the
execution mechanism has limitations on the size of command
line arguments that can be passed through. An example of
execution mechanism exhibiting this problem is Condor.wrapperInterpreterpath"/bin/bash" or "cscript.exe" on WindowsPoints to the interpreter used to run the Swift application
invocation wrapperwrapperScriptstring"_swiftwrap" or "_swiftwrap.vbs" on WindowsPoints to the Swift application invocation wrapper. The file
must exist in the libexec directory in the Swift distributionwrapperInterpreterOptionslist of strings[] on UNIX/Linux or ["//Nologo"] on WindowsCommand line options to be passed to the wrapper interpretercleanupCommandstring"/bin/rm" or "cmd.exe" on WindowsA command to use for the cleaning of site directories (unless
keepSiteDir is set to true) at the end of a run.cleanupCommandOptionslist of strings["-rf"] or ["/C", "del", "/Q"] on WindowsArguments to pass to the cleanup command when cleaning up site
work directoriesdelayBasenumber2.0Swift keeps a quality indicator for each site it runs applications
on. This is a number that gets increased for every successful
application invocation, and decreased for every failure. It then
uses this number in deciding which sites to run applications on
(when multiple sites are defined). If this number becomes very
low (a sign of repeated failures on a site), Swift implements
an exponential back-off that prevents jobs from being sent to a
site that continously fails them. delayBase is the base for
that exponential back-off:
delay = delayBase ^ (-score * 100)maxSubmitRatepositive number-Some combinations of site and execution mechanisms may become
error prone if jobs are submitted too fast. This option can
be used to limit the submission rate. If set to some number N,
Swift will submit applications at a rate of at most N
per second.
Application DeclarationsApplications can either be declared globally, outside of a site declaration,
or specific to a site, inside a site declaration:app.(<appName>|ALL) {
# global application
...
}
site.<siteName> {
app.(<appName>|ALL) {
# site application
...
}
}A special application name, ALL, can be used to declare options for all
applications. When Swift attempts to run an application named X, it will
first look at site application declarations for app.X. If not found, it will
check if a site application declaration exists for app.ALL. The search will
continue with the global app.X and then the global all.ALL until a match
is found. It is possible that a specific application will only be declared
on a sub-set of all the sites and not globally. Swift will then only select
a site where the application is declared and will not attempt to run the
application on other sites.An application declaration takes the following form:app.<appName> {
executable: (<string>|"*")
[jobQueue: <string>]
[jobProject: <string>]
[maxWallTime: <time>]
[options: {...}]
<environment variables>
}The executable is mandatory, and it points to the actual location of the
executable that implements the application. The special string "*" can
be used to indicate that the executable has the same name as the application
name. This is useful in conjunction with app.ALL to essentially declare
that a site can be used to execute any application from a Swift script. If the
executable is not an absolute path, it will be searched using the PATH
envirnoment variable on the remote site.Environment variables can be defined as follows: env.<name>: <value>For example: env.LD_LIBRARY_PATH: "/home/joedoe/lib"The remaining options are:
Application Options Name Valid values DescriptionjobQueueanyIf the application is executed using a mechanism that submits to
a queuing system, this option can be used to select a specific
queue for the applicationjobProjectanyA queuing system project to associate the job with.maxWallTime"mm" or "hh:mm" or "hh:mm:ss"The maximum amount of time that the application will take to execute
on the site. Most application execution mechanisms will both require
and enforce this value by terminating the application if it exceeds
the specified time. The default value is 10 minutes.
General Swift OptionsThere are a number of configuration options that modify the way that
the Swift run-time behaves. They are listed below:
General Swift Options Name Valid values Default value Descriptionsitesarray of strings (e.g. ["site1", "site2"]) or CSV string (e.g. "site1, site2")noneSelects, out of the set of all declared sites, a sub-set of sites to
run applications on.hostNamestringautodetectedCan be used to specify a publicly reacheable DNS name or IP address for this
machine which is generally used for Globus or Coaster callbacks. Normally this should be
auto-detected, but if you do not have a public DNS name, you may want to set this.TCPPortRange"lowPort, highPort"noneA TCP port range can be specified to restrict the ports on which certain callback
services are started. This is likely needed if your submit host is behind a firewall,
in which case the firewall should be configured to allow incoming connections on
ports in this range.lazyErrorstrue, falsefalseUse a lazy mode to deal with errors. When set to true Swift will proceed with the
execution until no more data can be derived because of errors in dependent steps. If
set to false, an error will cause the execution to immediately stopexecutionRetriesnon-negative integer0The number of time an application invocation will be retries if it fails until Swift
finally gives up and declares it failed. The total number of attempts will be 1
executionRetries.logProvenancetrue, falsefalseIf set to true, Swift will record provenance information in the log file.alwaysTransferWrapperLogtrue, falsefalseControls when wrapper logs are transfered back to the submit host. If set to
false, Swift will only transfer a wrapper log for a given job when that job fails.
If set to true, Swift will transfer wrapper logs whether a job fails or not.fileGCEnabledtrue, falsetrueControls the file garbage collector. If set to false, files mapped by
collectable mappers (such as the concurrent mapper) will not be deleted when their
Swift variables go out of scope.mappingCheckerEnabledtrue, falsetrueControls the run-time duplicate mapping checker (which indetifies mapping
conflicts). When enabled, a record of all mapped data is kept, so this comes at the
expense of a slight memory leak. If set false, the mapping checker is disabled.tracingEnabledtrue, falsefalseEnables execution tracing. If set to true, operations within Swift such as
iterations, invocations, assignments, and declarations, as well as data dependencies
will be logged. This comes at a cost in performance. It is therefore disabled by
default.maxForeachThreadspositive integer16384Limits the number of concurrent iterations that each foreach statement
can have at one time. This conserves memory for swift programs that
have large numbers of iterations (which would otherwise all be executed
in parallel).TickertickerEnabledtrue, falsetrueControls the output ticker, which regularly prints information about the counts
of application states on the Swift’s process standard outputtickerPrefixstring"Progress: "Specifies a string to prefix to each ticker line outputtickerDateFormatstring"E, dd MMM yyyy HH:mm:ssZ"Specifies the date/time format to use for the time stamp of each ticker
line. It must conform to Java’s
<http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html,SimpleDateFormat>
syntax.CDMCDMBroadcastModestring"file"-CDMLogFilestring"cdm.log"-ReplicationreplicationEnabledtrue, falsefalseIf enabled, jobs that are queued longer than a certain amount of time will
have a duplicate version re-submitted. This process will continue until a
maximum pre-set number of such replicas is queued. When one of the replicas
becomes active, all other replicas are canceled. This mechanism can potentially
prevent a single overloaded site from completely blocking a run.replicationMinQueueTimeseconds60When replication is enabled, this is the amount of time that a job needs to
be queued until a new replica is created.replicationLimitinteger > 03The maximum number of replicas allowed for a given application instance.Wrapper StagingwrapperStagingLocalServerstring"file://"When file staging is set to "wrapper", this indicates the default URL
scheme that is prefixed to local files.ThrottlingjobSubmitThrottleinteger > 0 or "off"4Limits the number of jobs that can concurrently be in the process of being
submitted, that is in the "Submitting" state. This is the state where the job
information is being communicated to a remote service. Certain execution
mechanisms may become inefficient if too many jobs are being submitted
concurrently and there are no benefits to parallelizing submission beyond a
certain point. Please not that this does not apply to the number of jobs that
can be active concurrently.hostJobSubmitThrottleinteger > 0 or "off"2Like jobSubmitThrottle, except it applies to each individual site.fileTransfersThrottleinteger > 0 or "off"4Limits the number of concurrent file transfers when file staging is set to
"swift". Arbitrarily increasing file transfer parallelism leads to little
benefits as the throughput approaches the maximum avaiable network bandwidth.
Instead it can lead to an increase in latencies which may increase the chances
of triggering timeouts.fileOperationsThrottleinteger > 0 or "off"8Limits the number of concurrent file operations that can be active at a given
time when file staging is set to "swift". File operations are defined to be all
remote operations on a filesystem that exclude file transfers. Examples are:
listing the contents of a directory, creating a directory, removing a file, etc.Global versions of site optionsstaging"swift", "local", "service-local", "shared-fs", "wrapper""swift"See Staging Methods.keepSiteDirtrue, falsefalseSee Site Options.statusMode"files", "provider""files"See Site Options.wrapperParameterMode"args", "files""args"See Other Site Options.
Run directoriesWhen you run Swift, you will see a run directory get created. The run
directory has the name of runNNN, where NNN starts at 000 and increments for
every run.The run directories can be useful for debugging. They contain:
.Run directory contentsappsAn apps generated from swift.propertiescfA configuration file generated from swift.propertiesrunNNN.logThe log file generated during the Swift runscriptname-runNNN.dDebug directory containing wrapper logsscriptsDirectory that contains scheduler scripts used for that runsites.xmlA sites.xml generated from swift.propertiesswift.outThe standard out and standard error generated by SwiftExecution MechanismsSwift allows application execution through a number of mechanisms (or execution
providers). The choice of each mechanism is dependent on the software installed
on a certain compute cluster. The following sub-sections list the available
choices together with their supported options.LocalThe local execution mechanism can be used to run applications locally through
simple fork() calls.General ConfigurationURL requirednoJob Managernot usedStaging methodsswift, wrapper, local, service-local, shared-fs, directOptionsN/AJob Options Name Type Default Value DescriptioncountInteger1Launch this many copies of the application for each invocationExample site.local {
execution {
type: "local"
}
staging: direct
app.ALL {
executable: "*"
count: 1
}
}GT5Uses the <http://toolkit.globus.org/toolkit/docs/latest-stable/gram5/#gram5,GRAM> component
of the Globus Toolkit to launch jobs on remote resources.General ConfigurationURL requiredyesJob ManagerIn GRAM, job managers instruct the GRAM service to submit
jobs to specific resource managers on the server side. The
exact available job managers depend on the particular GRAM
installation. However, "fork", which instructs GRAM to
run jobs directly on the service node, should always be
available. In addition, the available job managers would
typically match the queuing systems installed on the server
side. For example, if a cluster uses Torque/PBS, then the
"PBS"Swift automatically converts job manager
names to lower case strings and adds the "jobmanager-"
prefix to match the format required by Globus GRAM job
manager should be available. The following is
a list of known possible job manager values: "fork", "PBS",
"LSF", "Condor", "SGE", "Slurm"Staging methodsswift, wrapperOptionsN/AJob OptionsFor a complete list and description of these options, please see the
Globus GRAM documentation Name Type Default Value DescriptioncountInteger1Launch this many copies of the application for each invocationmax_timeInteger (minutes)-max_wall_timeInteger (minutes)-max_cpu_timeInteger (minutes)-max_memoryInteger (MB)-min_memoryInteger (MB)-projectString-A LRM project to associate the job withqueueString-LRM queue to submit toExample site.example {
execution {
type: "gt5"
url: "login.example.org"
jobManager: "PBS"
}
staging: swift
app.sim {
executable: /usr/bin/sim
queue: "fast"
min_memory: 120
}
}SSHRuns jobs through a Java implementation of the SSH protocol. This mechanism
generally results in a higher throughput than using the command-line SSH tool
since it can reduce the number of authentication operations by re-using
connections.General ConfigurationURL requiredyesJob Managernot usedStaging methodsswift, wrapperOptionsN/AJob OptionsN/AExample site.example {
execution {
type: "ssh"
url: "login.example.org"
}
}SSH-CLUses the ssh command-line tool to run jobs.General ConfigurationURL requiredyesJob Managernot usedStaging methodsswift, wrapperOptionsN/AJob OptionsN/AExample site.example {
execution {
type: "ssh-cl"
url: "login.example.org"
}
}PBSSubmits jobs directly to a Torque/PBS queue.General ConfigurationURL requirednoJob Managernot usedStaging methodsswift, wrapperOptionsN/AJob Options Name Type Default Value DescriptioncountInteger1Request this number of nodes for the jobppnInteger1Sets the number of Processes Per NodedepthInteger1Only used if mpp is set to true. Sets the
depth (number of OpenMP threads/cores to allocate
for each process)pbs.mppBooleanfalseIf set to true, use the mpp versions of
count, ppn, and depth: mppwidth, mppnppn,
mppdepth respectively.pbs.propertiesString-If specified, this string will be passed verbatim
to PBS inside the "#PBS -l" line.projectString-A PBS project to associate the job withqueueString-PBS queue to submit topbs.resource_listString-WRITEME!pbs.aprunBooleanfalseIf specified, use the aprun tool instead of ssh
to start jobs on the compute nodes. aprun is a tool
typically found on Cray systems.Example site.pbs {
execution {
type: "PBS"
}
app.sim {
executable: "/usr/bin/sim"
count: 2
ppn: 2
depth: 2
pbs.mpp: true
queue: "fast"
}
}CondorSubmits jobs using the HTCondor system.General ConfigurationURL requirednoJob Managernot usedStaging methodsswift, wrapperOptionsN/AJob Options Name Type Default Value DescriptionjobType"MPI", "grid", "nonshared", nonenoneSpecifies the job type (Condor universe).
"nonshared" translates to the "vanilla" universe.holdIsFailureBooleanfalseTreat jobs in the held state as failed.countInteger1Number of machines to request for the jobcondor.*Any-Can be used to pass arbitrary properties to Condor.Example site.condor {
execution {
type: "Condor"
}
app.sim {
executable: "/usr/bin/sim"
condor.leave_in_queue: "TRUE"
}
}CoastersCoasters are a mechanism that packages multiple swift application invocations into larger
LRM jobs resulting in increased efficiency when running multiple small applications.
To distinguish between the application invocations and the jobs in which Coasters
package them, the terms task and job are used, respectively.General ConfigurationURL requiredmaybeJob Manager"em1:em2", where em1 is an execution mechanism used to
start the Coaster Service and em2 is an execution mechanism
used by the Coaster Service to start jobs. If em1 requires
an URL, then the URL is required.Staging methodsswift, wrapper, local, service-local, shared-fs, directOptions Name Type Default Value DescriptionmaxJobsInteger20The maximum number of jobs that can be running at a time.nodeGranularityInteger1If specified, the number of nodes requested for each job
will be a multiple of this numbertasksPerNodeInteger1The maximum number of concurrent tasks allowed to run
on a nodeallocationStepSize[0.0, 1.0]0.1The Coaster service allocates jobs periodically depending
on the number of tasks queued. This number can be used to
limit the percentage of jobs out of maxJobs that will
be used in each allocation step.lowOverallocationPositive float10Indicates how much bigger the job wall time should be in
comparison to the task wall time for tasks that have a
small wall time (around 1 second)highOverallocationPositive float1Indicates how much bigger the job wall time should in
comparison to the task wall time for tasks that have a
very large wall timeoverallocationDecayFactorPositive float1e-3Used to interpolate the "overallocation" for task wall times
that are neither very large or very small. The formula used
is jobWalltime = taskWalltime * ((L - H) * exp(-taskWalltime * D) + H),
where L is lowOverallocation, H is highOverallocation, and
D is overallocationDecayFactor.spread[0.0, 1.0]0.9When allocating jobs, the total number of nodes to allocate
can be fixed based on, for example, maximizing parallelism
for all the tasks. However, the way the nodes are distributed
to individual jobs can be arbitrary. This parameter controls
whether nodes should be uniformly distributed among jobs
(spread = 0) or if the node distribution should be as diverse
as possible (spread = 1). A high spread could be useful
in fitting jobs better into a cluster’s schedule.reserveInteger (seconds)60The amount of time to add to each job’s wall time in order to
prevent premature termination of tasks due to various
overheadsmaxNodesPerJobInteger1The maximum number of nodes that a job is allowed to have.maxJobTime"HH:MM:SS"-The maximum wall time that a job is allowed to haveuserHomeOverrideString-A path that can be used to override the default user home
directory. This may be necessary on systems on which
compute nodes do not have access to the default user home
directory.internalHostNameString-A host name or address that can be used to initiate connections
from compute nodes to the login node. Specifying this is
seldom necessary.jobQueueString-The LRM queue to submit the jobs tojobProjectString-A LRM project to associate the job withworkerLoggingLevel"ERROR", "WARN", "INFO", "DEBUG", "TRACE", or nonenoneIf specified, the Coaster Workers produce a log fileworkerLoggingDirectoryString"~/.globus/coasters"The directory where the worker logs will be created.
This directory needs to be accessible from compute
nodes.softImageString-WRITEME!Job OptionsN/AExample site.condor {
execution {
type: "Condor"
}
app.sim {
executable: "/usr/bin/sim"
condor.leave_in_queue: "TRUE"
}
}DebuggingRetriesIf an application procedure execution fails, Swift will attempt that
execution again repeatedly until it succeeds, up until the limit defined
in the execution.retries configuration property.Site selection will occur for retried jobs in the same way that it
happens for new jobs. Retried jobs may run on the same site or may run
on a different site.If the retry limit execution.retries is reached for an application
procedure, then that application procedure will fail. This will cause
the entire run to fail - either immediately (if the lazy.errors
property is false) or after all other possible work has been attempted
(if the lazy.errors property is true).With or without lazy errors, each app is re-tried <execution.retries>
times before it is considered failed for good. An app that has failed
but still has retries left will appear as "Failed but can retry".Without lazy errors, once the first (time-wise) app has run out of
retries, the whole run is stopped and the error reported.With lazy errors, if an app fails after all retries, its outputs are
marked as failed. All apps that depend on failed outputs will also fail
and their outputs marked as failed. All apps that have non-failed
outputs will continue to run normally until everything that can proceed
completes.For example, if you have:foreach x in [1:1024] {
app(x);
}If the first started app fails, all the other ones can still
continue, and if they don’t otherwise fail, the run will only terminate
when all 1023 of them will complete.So basically the idea behind lazy errors is to run EVERYTHING that can
safely be run before stopping.Some types of errors (such as internal swift errors happening in an app
thread) will still stop the run immediately even in lazy errors mode.
But we all know there are no such things as internal swift errors :)RestartsIf a run fails, Swift can resume the program from the point of failure.
When a run fails, a restart log file will be left behind in the run directory
called restart.log. This restart log can then be passed to a subsequent Swift
invocation using the -resume parameter. Swift will resume execution, avoiding
execution of invocations that have previously completed successfully. The Swift
source file and input data files should not be modified between runs.Normally, if the run completes successfully, the restart log file is deleted.
If however the workflow fails, swift can use the restart log file to continue
execution from a point before the failure occurred. In order to restart from a
restart log file, the -resume logfile argument can be used after the
Swift script file name. Example:$ swift -resume runNNN/restart.log example.swift.Monitoring SwiftSwift runs can be monitored for progress and resource usage. There are three basic monitors available: Swing, TUI, and http.HTTP MonitorThe HTTP monitor will allow for the monitoring of Swift via a web browser. To start the HTTP monitor, run Swift with the -ui http:<port>
command line option. For example:swift -ui http:8000 modis.swiftHTTP-monitor.jpgThis will create a server running on port 8000 on the machine where Swift is running. Point your web browser to http://<ip_address>:8000 to view
progress.Swing MonitorThe Swing monitor displays information via a Java gui/X window. To start the Swing monitor, run Swift with the
-ui Swing command line option. For example:swift -ui Swing modis.swiftSwing-monitor.jpgThis will produce a gui/X window consisting of the following tabs:
Summary
Graphs
Applications
Tasks
Gantt Chart
TUI MonitorThe TUI (textual user interface) monitor is one option for monitoring Swift on the console using a curses-like
library.The progress of a Swift run can be monitored using the -ui TUI option. For example:swift -ui TUI modis.swiftTUI-monitor.jpgThis will produce a textual user interface with multiple tabs, each showing the following features of the current Swift run:
A summary view showing task status
An apps tab
A jobs tab
A transfer tab
A scheduler tab
A Task statistics tab
A customized tab called Ben’s ViewNavigation between these tabs can be done using the function keys f2 through f8.Log analysisSwift logs can contain a lot of information. Swift includes a utility called "swiftlog" that
analyzes the log and prints a nicely formatted summary of all tasks of a given run.swiftlog usage$ swiftlog run027
Task 1
App name = cat
Command line arguments = data.txt data2.txt
Host = westmere
Start time = 17:09:59,607+0000
Stop time = 17:10:22,962+0000
Work directory = catsn-run027/jobs/r/cat-r6pxt6kl
Staged in files = file://localhost/data.txt file://localhost/data2.txt
Staged out files = catsn.0004.outcatsn.0004.err
Task 2
App name = cat
Command line arguments = data.txt data2.txt
Host = westmere
Start time = 17:09:59,607+0000
Stop time = 17:10:22,965+0000
Work directory = catsn-run027/jobs/q/cat-q6pxt6kl
Staged in files = file://localhost/data.txt file://localhost/data2.txt
Staged out files = catsn.0010.outcatsn.0010.errhome