1. Overview

Swift is a data-flow oriented, coarse-grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and procedural composition.

Swift programs (or workflows) are written in a language called Swift.

Swift scripts are primarily concerned with processing (possibly large) collections of data files, by invoking programs to do that processing. Swift handles execution of such programs on remote sites by choosing sites, handling the staging of input and output files to and from the chosen sites and remote execution of programs.

2. Getting Started

This section will provide links and information to new Swift users about how to get started using Swift.

2.1. Quickstart

This section provides the basic steps for downloading and installing Swift.

2.2. Tutorials

There are a few tutorials available for specific clusters and supercomputers.

3. The Swift Language

3.1. Language Basics

A Swift script describes data, application components, invocations of application components, and the inter-relations (data flow) between those invocations.

Data is represented in a script by strongly-typed single-assignment variables. The syntax superficially resembles C and Java. For example, { and } characters are used to enclose blocks of statements.

Types in Swift can be atomic or composite. An atomic type can be either a primitive type or a mapped type. Swift provides a fixed set of primitive types, such as integer and string. A mapped type indicates that the actual data does not reside in CPU addressable memory (as it would in conventional programming languages), but in POSIX-like files. Composite types are further subdivided into structures and arrays. Structures are similar in most respects to structure types in other languages. In Swift, structures are defined using the type keyword (there is no struct keyword). Arrays use numeric indices, but are sparse. They can contain elements of any type, including other array types, but all elements in an array must be of the same type. We often refer to instances of composites of mapped types as datasets.

[Figure: type hierarchy]

Atomic types such as string, int, float and double work the same way as in C-like programming languages. A variable of such atomic types can be defined as follows:

string astring = "hello";

A structure type is defined using the type keyword as discussed above. The following example defines a type for holding employee data:

type Employee{
    string name;
    int id;
    string loc;
}

The members of the structure defined above can be accessed using the dot notation. An example of a variable of type Employee is as follows:

Employee emp;
emp.name="Thomas";
emp.id=2222;
emp.loc="Chicago";

Arrays of structures are allowed in Swift. A convenient way of populating structures and arrays of structures is to use the readData() function.

Mapped type and composite type variable declarations can be annotated with a mapping descriptor indicating the file(s) that make up that dataset. For example, the following line declares a variable named photo with type image. It additionally declares that the data for this variable is stored in a single file named shane.jpg.

image photo <"shane.jpg">;

Component programs of scripts are declared in an app declaration, with the description of the command line syntax for that program and a list of input and output data. An app block describes a functional/dataflow style interface to imperative components.

For example, the following procedure uses the ImageMagick (http://www.imagemagick.org/) convert command to rotate a supplied image by a specified angle:

app (image output) rotate(image input, int angle) {
  convert "-rotate" angle @input @output;
}

A procedure is invoked using the familiar syntax:

rotated = rotate(photo, 180);

While this looks like an assignment, the actual Unix-level execution consists of invoking the command line specified in the app declaration, with the variables on the left of the assignment bound to the output parameters, and the arguments of the procedure invocation passed as inputs.

The examples above have used the type image without any definition of that type. We can declare it as a marker type which has no structure exposed to Swift script:

type image;

This does not indicate that the data is unstructured, only that the structure of the data is not exposed to Swift. Instead, Swift will treat variables of this type as individual opaque files.

With mechanisms to declare types, map variables to data files, and declare and invoke procedures, we can build a complete (albeit simple) script:

type image;
image photo <"shane.jpg">;
image rotated <"rotated.jpg">;

app (image output) rotate(image input, int angle) {
   convert "-rotate" angle @input @output;
}

rotated = rotate(photo, 180);

This script can be invoked from the command line:

  $ ls *.jpg
  shane.jpg
  $ swift example.swift
  ...
  $ ls *.jpg
  shane.jpg rotated.jpg

This executes a single convert command, hiding from the user features such as remote multisite execution and fault tolerance that will be discussed in a later section.

Figure 1. shane.jpg

Figure 2. rotated.jpg

3.2. Arrays and Parallel Execution

Arrays of values can be declared using the [] suffix. Following is an example of an array of strings:

string pets[] = ["shane", "noddy", "leo"];

An array may be mapped to a collection of files, one element per file, by using a different form of mapping expression. For example, the filesys_mapper maps all files matching a particular unix glob pattern into an array:

file frames[] <filesys_mapper; pattern="*.jpg">;

The foreach construct can be used to apply the same block of code to each element of an array:

foreach f,ix in frames {
  output[ix] = rotate(f, 180);
}

Sequential iteration can be expressed using the iterate construct:

step[0] = initialCondition();
iterate ix {
  step[ix] = simulate(step[ix-1]);
}

This fragment will initialise the 0-th element of the step array to some initial condition, and then repeatedly run the simulate procedure, using each execution’s outputs as input to the next step.

3.3. Associative Arrays

By default, array keys are integers. However, other primitive types are also allowed as array keys. The syntax for declaring an array with a key type different than the default is:

<valueType>[<keyType>] array;

For example, the following code declares and assigns items to an array with string keys and float values:

float[string] a;
a["one"] = 0.2;
a["two"] = 0.4;

In addition to primitive types, a special type named auto can be used to declare an array for which an additional append operation is available:

int[auto] array;

foreach i in [1:100] {
  array << (i*2) ;
}

foreach v in array {
  trace(v);
}

Items in an array with auto keys cannot be accessed directly using a primitive type. The following example results in a compile-time error:

int[auto] array;
array[0] = 1;

However, it is possible to use auto key values from one array to access another:

int[auto] a;
int[auto] b;

a << 1;
a << 2;

foreach v, k in a {
  b[k] = a[k] * 2;
}

3.4. Ordering of execution

Non-array variables are single-assignment, which means that they must be assigned to exactly one value during execution. A procedure or expression will be executed when all of its input parameters have been assigned values. As a result of such execution, more variables may become assigned, possibly allowing further parts of the script to execute.

In this way, scripts are implicitly parallel. Aside from serialisation implied by these dataflow dependencies, execution of component programs can proceed in parallel.

In this fragment, execution of procedures p and q can happen in parallel:

y=p(x);
z=q(x);

while in this fragment, execution is serialised by the variable y, with procedure p executing before q.

y=p(x);
z=q(y);

Arrays in Swift are monotonic - a generalisation of being single-assignment. Knowledge about the content of an array increases during execution, but cannot otherwise change. Each element of the array is itself single-assignment or monotonic (depending on its type). During a run, all values for an array are eventually known, and that array is then regarded as closed.

Statements which deal with the array as a whole will often wait for the array to be closed before executing (thus, a closed array is the equivalent of a non-array type being assigned). However, a foreach statement will apply its body to elements of an array as they become known. It will not wait until the array is closed.

Consider this script:

file a[];
file b[];
foreach v,i in a {
  b[i] = p(v);
}
a[0] = r();
a[1] = s();

Initially, the foreach statement will have nothing to execute, as the array a has not been assigned any values. The procedures r and s will execute. As soon as either of them is finished, the corresponding invocation of procedure p will occur. After both r and s have completed, the array a will be closed since no other statements in the script make an assignment to a.

3.5. Compound procedures

As with many other programming languages, procedures consisting of Swift script can be defined. These differ from the previously mentioned procedures declared with the app keyword, as they invoke other Swift procedures rather than a component program.

(file output) process (file input) {
  file intermediate;
  intermediate = first(input);
  output = second(intermediate);
}

file x <"x.txt">;
file y <"y.txt">;
y = process(x);

This will invoke two procedures, with an intermediate data file named anonymously connecting the first and second procedures.

Ordering of execution is generally determined by execution of app procedures, not by any containing compound procedures. In this code block:

(file a, file b) A() {
  a = A1();
  b = A2();
}
file x, y, s, t;
(x,y) = A();
s = S(x);
t = S(y);

then a valid execution order is: A1 S(x) A2 S(y). The compound procedure A does not have to have fully completed for its return values to be used by subsequent statements.

3.6. More about types

Each variable and procedure parameter in Swift script is strongly typed. Types are used to structure data, to aid in debugging and checking program correctness and to influence how Swift interacts with data.

The image type declared in previous examples is a marker type. Marker types indicate that data for a variable is stored in a single file with no further structure exposed at the Swift script level.

Arrays have been mentioned above, in the arrays section. A code block may be applied to each element of an array using foreach, or individual elements may be referenced using [] notation.

There are a number of primitive types:

type     contains
int      integers
string   strings of text
float    floating point numbers, that behave the same as Java doubles
boolean  true/false

Complex types may be defined using the type keyword:

type headerfile;
type voxelfile;
type volume {
  headerfile h;
  voxelfile v;
}

Members of a complex type can be accessed using the . operator:

volume brain;
o = p(brain.h);

Sometimes data may be stored in a form that does not fit with Swift’s file-and-site model; for example, data might be stored in an RDBMS on some database server. In that case, a variable can be declared to have external type. This indicates that Swift should use the variable to determine execution dependency, but should not attempt other data management; for example, it will not perform any form of data stage-in or stage-out; it will not manage local data caches on sites; and it will not enforce component program atomicity on data output. This can add substantial responsibility to component programs, in exchange for allowing arbitrary data storage and access methods to be plugged in to scripts.

type file;

app (external o) populateDatabase() {
  populationProgram;
}

app (file o) analyseDatabase(external i) {
  analysisProgram @o;
}

external database;
file result <"results.txt">;

database = populateDatabase();
result = analyseDatabase(database);

Some external database is represented by the database variable. The populateDatabase procedure populates the database with some data, and the analyseDatabase procedure performs some subsequent analysis on that database. The declaration of database contains no mapping, and the procedures which use database do not reference it in any way; the description of database is entirely outside of the script. The single-assignment and execution ordering rules will still apply though; populateDatabase will always be run before analyseDatabase.

3.7. Data model

Data processed by Swift is strongly typed. It may take the form of values in memory or of out-of-core files on disk. Language constructs called mappers specify how each piece of data is stored.

3.8. More technical details about Swift script

The syntax of Swift script has a superficial resemblance to C and Java. For example, { and } characters are used to enclose blocks of statements.

A Swift script consists of a number of statements. Statements may declare types, procedures and variables, assign values to variables, and express operations over arrays.

3.9. Variables

Variables in Swift scripts are declared to be of a specific type. Assignments to those variables must be data of that type. Swift script variables are single-assignment - a value may be assigned to a variable at most once. This assignment can happen at declaration time or later on in execution. When code attempts to read a variable that has not yet been assigned, that code is suspended until the variable has been written to. This forms the basis for Swift’s ability to parallelise execution - all code will execute in parallel unless there are variables shared between the code that cause sequencing.
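The suspend-until-assigned behaviour described above can be imitated outside Swift. The following Python sketch is only an analogy and says nothing about how Swift is implemented; the class SingleAssignment and the procedures p and q are hypothetical names invented for this illustration.

```python
import threading


class SingleAssignment:
    """Hypothetical stand-in for a Swift variable: written once, reads wait."""

    def __init__(self):
        self._assigned = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            if self._assigned.is_set():
                raise RuntimeError("variable has already been assigned")
            self._value = value
            self._assigned.set()

    def get(self):
        self._assigned.wait()  # suspend the reader until a value exists
        return self._value


def p(x):  # hypothetical procedure
    return x + 1


def q(y):  # hypothetical procedure
    return y * 2


x, y, z = SingleAssignment(), SingleAssignment(), SingleAssignment()

# Analogue of "z = q(y);", started first: it waits until y is assigned.
t_q = threading.Thread(target=lambda: z.set(q(y.get())))
# Analogue of "y = p(x);": it waits until x is assigned.
t_p = threading.Thread(target=lambda: y.set(p(x.get())))
t_q.start()
t_p.start()

x.set(10)  # assigning x releases p, whose result in turn releases q
t_p.join()
t_q.join()
print(z.get())
```

The two statements are started in the "wrong" textual order, yet q still runs after p because it reads y. A second call to set on the same object raises an error, mirroring the at-most-once rule.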

3.10. Variable Declarations

Variable declaration statements declare new variables. They can optionally assign a value to them or map those variables to on-disk files.

Declaration statements have the general form:

typename variablename (<mapping> | = initialValue ) ;

The format of the mapping expression is defined in the Mappers section. initialValue may be either an expression or a procedure call that returns a single value.

Variables can also be declared in a multivalued-procedure statement, described in another section.

3.11. Assignment Statements

Assignment statements assign values to previously declared variables. Assignments may only be made to variables that have not already been assigned. Assignment statements have the general form:

variable = value;

where value can be either an expression or a procedure call that returns a single value.

Variables can also be assigned in a multivalued-procedure statement, described in another section.

3.12. Procedures

There are two kinds of procedure: atomic procedures, which describe how an external program can be executed, and compound procedures, which consist of a sequence of Swift script statements.

A procedure declaration defines the name of a procedure and its input and output parameters. Swift script procedures can take multiple inputs and produce multiple outputs. Inputs are specified to the right of the function name, and outputs are specified to the left. For example:

(type3 out1, type4 out2) myproc (type1 in1, type2 in2)

The above example declares a procedure called myproc, which has two inputs in1 (of type type1) and in2 (of type type2) and two outputs out1 (of type type3) and out2 (of type type4).

A procedure input parameter can be optional, in which case it must be declared with a default value. A procedure call may combine positional and named (keyword) parameter passing, provided that all optional parameters are declared after the required parameters and that any optional parameter is bound using keyword parameter passing. For example, if myproc1 is defined as:

(binaryfile bf) myproc1 (int i, string s="foo")

Then that procedure can be called like this, omitting the optional parameter s:

binaryfile mybf = myproc1(1);

or like this supplying a value for the optional parameter s:

binaryfile mybf = myproc1 (1, s="bar");

3.12.1. Atomic procedures

An atomic procedure specifies how to invoke an external executable program, and how logical data types are mapped to command line arguments.

Atomic procedures are defined with the app keyword:

app (binaryfile bf) myproc (int i, string s="foo") {
    myapp i s @filename(bf);
}

which specifies that myproc invokes an executable called myapp, passing the values of i, s and the filename of bf as command line arguments.

3.12.2. Compound procedures

A compound procedure contains a set of Swift script statements:

(type2 b) foo_bar (type1 a) {
    type3 c;
    c = foo(a);    // c holds the result of foo
    b = bar(c);    // c is an input to bar
}

3.13. Control Constructs

Swift script provides if, switch, foreach, and iterate constructs, with syntax and semantics similar to comparable constructs in other high-level languages.

3.13.1. foreach

The foreach construct is used to apply a block of statements to each element in an array. For example:

check_order (file a[]) {
    foreach f in a {
        compute(f);
    }
}

foreach statements have the general form:

foreach controlvariable (,index) in expression {
    statements
}

The block of statements is evaluated once for each element in expression which must be an array, with controlvariable set to the corresponding element and index (if specified) set to the integer position in the array that is being iterated over.

3.13.2. if

The if statement allows one of two blocks of statements to be executed, based on a boolean predicate. if statements generally have the form:

if(predicate) {
    statements
} else {
    statements
}

where predicate is a boolean expression.

3.13.3. switch

switch expressions allow one of a selection of blocks to be chosen based on the value of a numerical control expression. switch statements take the general form:

switch(controlExpression) {
    case n1:
        statements1
    case n2:
        statements2
    [...]
    default:
        statements
}

The control expression is evaluated, the resulting numerical value used to select a corresponding case, and the statements belonging to that case block are evaluated. If no case corresponds, then the statements belonging to the default block are evaluated.

Unlike C or Java switch statements, execution does not fall through to subsequent case blocks, and no break statement is necessary at the end of each block.

Following is an example of a switch expression in Swift:

int score=60;
switch (score) {
    case 100:
        tracef("%s\n", "Bravo!");
    case 90:
        tracef("%s\n", "very good");
    case 80:
        tracef("%s\n", "good");
    case 70:
        tracef("%s\n", "fair");
    default:
        tracef("%s\n", "unknown grade");
}

3.13.4. iterate

iterate expressions allow a block of code to be evaluated repeatedly, with an iteration variable being incremented after each iteration.

The general form is:

iterate var {
    statements;
} until (terminationExpression);

Here var is the iteration variable. Its initial value is 0. After each iteration, but before terminationExpression is evaluated, the iteration variable is incremented. This means that if the termination expression is a function of only the iteration variable, the body will never be executed while the termination expression is true.

Example:

iterate i {
    trace(i); // will print 0, 1, and 2
} until (i == 3);

Variables declared inside the body of iterate can be used in the termination expression. However, their values will reflect the values calculated as part of the last invocation of the body, and may not reflect the incremented value of the iteration variable:

iterate i {
    trace(i); // will print 0, 1, 2, and 3
    int j = i;
} until (j == 3);
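The ordering described above (run the body, increment the iteration variable, then test the termination expression) can be traced with a small Python model. This is a sketch of the stated semantics only; the helper iterate and its body/until callbacks are hypothetical names, not part of Swift.

```python
def iterate(body, until):
    """Model of iterate: body(i) runs, i is incremented, then until is tested.

    body(i) returns a dict of the variables declared inside the body.
    until(i, local_vars) sees the already incremented i.
    """
    i = 0
    while True:
        local_vars = body(i)
        i += 1  # the increment happens before the termination test
        if until(i, local_vars):
            return


# First example: the termination expression uses the iteration variable.
traced_i = []
iterate(lambda i: traced_i.append(i), lambda i, _: i == 3)
print(traced_i)  # [0, 1, 2]


# Second example: the termination expression uses j, declared in the body.
traced_j = []


def body(i):
    traced_j.append(i)
    return {"j": i}  # j keeps the value from this invocation of the body


iterate(body, lambda i, local_vars: local_vars["j"] == 3)
print(traced_j)  # [0, 1, 2, 3]
```

The second loop runs one extra time because j still holds the value from the last execution of the body, while i has already been incremented when the test is made.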

3.14. Operators

The following infix operators are available for use in Swift script expressions.

operator   purpose
+          numeric addition; string concatenation
-          numeric subtraction
*          numeric multiplication
/          floating point division
%/         integer division
%%         integer remainder of division
== !=      equal-to and not-equal-to comparison
< > <= >=  numerical ordering
&& ||      boolean and, or
!          boolean not

3.15. Global constants

At the top level of a Swift script program, the global modifier may be added to a declaration so that it is visible throughout the program, rather than only at the top level of the program. This allows global constants (of any type) to be defined.

3.16. Imports

The import directive can be used to import definitions from another Swift file.

For example, a Swift script might contain this:

import "defs";
file f;

which would import the content of defs.swift:

type file;

Imported files are read from two places. They are either read from the path that is specified in the import command, such as:

import "definitions/file/defs";

or they are read from the environment variable SWIFT_LIB. This environment variable is used just like the PATH environment variable. For example, if the command below was issued to the bash shell:

export SWIFT_LIB=${HOME}/Swift/defs:${HOME}/Swift/functions

then the import command will check for the file defs.swift in both "${HOME}/Swift/defs" and "${HOME}/Swift/functions" first before trying the path that was specified in the import command.
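The lookup order described above can be written out as a short Python sketch. It is an illustration of the text only, covering the simple case of import "defs"; the function name import_candidates is hypothetical, and the sketch does not reproduce Swift's actual resolution code.

```python
def import_candidates(import_name, swift_lib):
    """List the files an import might resolve to, in the order described.

    import_name: the string given to import, for example "defs"
    swift_lib:   the value of SWIFT_LIB, a colon-separated list like PATH
    """
    filename = import_name + ".swift"
    candidates = []
    for directory in swift_lib.split(":"):
        if directory:  # ignore empty entries
            candidates.append(directory + "/" + filename)
    candidates.append(filename)  # finally, the path given in the import
    return candidates


swift_lib = "/home/user/Swift/defs:/home/user/Swift/functions"
for path in import_candidates("defs", swift_lib):
    print(path)
```

With the SWIFT_LIB value shown, the two library directories are listed before the plain defs.swift path.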

Other valid imports:

import "../functions/func";
import "/home/user/Swift/definitions/defs";

There is no requirement that a module is imported only once. If a module is imported multiple times, for example in different files, then Swift will only process the imports once.

Imports may contain anything that is valid in a Swift script, including the code that causes remote execution.

3.17. Mappers

Mappers provide a mechanism to specify the layout of mapped datasets on disk. This is needed when Swift must access files to transfer them to remote sites for execution or to pass to applications.

Swift provides a number of mappers that are useful in common cases. This section details those mappers. For more complex cases, it is possible to write application-specific mappers in Java and use them within a Swift script.

3.17.1. The Single File Mapper

The single_file_mapper maps a single physical file to a dataset.

Swift variable  Filename
f               myfile
f[0]            INVALID
f.bar           INVALID

parameter  meaning
file       The location of the physical file including path and file name.

Example:

file f <single_file_mapper;file="plot_outfile_param">;

There is a simplified syntax for this mapper:

file f <"plot_outfile_param">;

3.17.2. The Simple Mapper

The simple_mapper maps a file or a list of files into an array by prefix, suffix, and pattern. If more than one file is matched, each of the file names will be mapped as a subelement of the dataset.

Parameter  Meaning
location   The directory in which the files are located.
prefix     The prefix of the files
suffix     The suffix of the files, for instance: ".txt"
padding    The number of digits used to uniquely identify the mapped file. This is an optional parameter which defaults to 4.
pattern    A UNIX glob style pattern, for instance: "*foo*" would match all file names that contain foo. When this mapper is used to specify output filenames, pattern is ignored.

type file;
file f <simple_mapper;prefix="foo", suffix=".txt">;

The above maps all filenames that start with foo and have an extension .txt into file f.

Swift variable  Filename
f               foo.txt

type messagefile;

(messagefile t) greeting(string m) {
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile <simple_mapper;prefix="foo",suffix=".txt">;

outfile = greeting("hi");

This will output the string hi to the file foo.txt.

The simple_mapper can be used to map arrays. It will map the array index into the filename between the prefix and suffix.

type messagefile;

(messagefile t) greeting(string m) {
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile[] <simple_mapper;prefix="baz",suffix=".txt", padding=2>;

outfile[0] = greeting("hello");
outfile[1] = greeting("middle");
outfile[2] = greeting("goodbye");

Swift variable  Filename
outfile[0]      baz00.txt
outfile[1]      baz01.txt
outfile[2]      baz02.txt
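The naming rule for arrays (prefix, then the index padded with zeros to the padding width, then suffix) can be checked with a one-line Python model. This is a sketch of the rule as described here, not the mapper's real implementation; the function name simple_mapper_name is hypothetical.

```python
def simple_mapper_name(prefix, suffix, index, padding=4):
    """Build a file name as prefix + zero-padded index + suffix."""
    return f"{prefix}{index:0{padding}d}{suffix}"


for i in range(3):
    name = simple_mapper_name("baz", ".txt", i, padding=2)
    print(f"outfile[{i}] -> {name}")
```

The default of 4 for padding follows the parameter table for this mapper.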

simple_mapper can be used to map structures. It will map the name of the structure member into the filename, between the prefix and the suffix.

type messagefile;

type mystruct {
  messagefile left;
  messagefile right;
}

(messagefile t) greeting(string m) {
    app {
        echo m stdout=@filename(t);
    }
}

mystruct out <simple_mapper;prefix="qux",suffix=".txt">;

out.left = greeting("hello");
out.right = greeting("goodbye");

This will output the string "hello" into the file qux.left.txt and the string "goodbye" into the file qux.right.txt.

Swift variable  Filename
out.left        quxleft.txt
out.right       quxright.txt

3.17.3. Concurrent Mapper

The concurrent_mapper is almost the same as the simple mapper, except that it is used to map an output file, and the filename generated will contain an extra sequence that is unique. This mapper is the default mapper for variables when no mapper is specified.

Parameter  Meaning
location   The directory in which the files are located.
prefix     The prefix of the files
suffix     The suffix of the files, for instance: ".txt"
pattern    A UNIX glob style pattern, for instance: "*foo*" would match all file names that contain foo. When this mapper is used to specify output filenames, pattern is ignored.

Example:

file f1;
file f2 <concurrent_mapper;prefix="foo", suffix=".txt">;

The above example would use the concurrent mapper for both f1 and f2, and would generate a filename for f2 with prefix "foo" and extension ".txt".

3.17.4. Filesystem Mapper

The filesys_mapper is similar to the simple mapper, but maps a file or a list of files to an array. Each filename is mapped as an element in the array. The order of files in the resulting array is not defined.

TODO: note on difference between location as a relative vs absolute path w.r.t. staging to remote location - as mihael said: It’s because you specify that location in the mapper. Try location="." instead of location="/sandbox/…"

parameter  meaning
location   The directory where the files are located.
prefix     The prefix of the files
suffix     The suffix of the files, for instance: ".txt"
pattern    A UNIX glob style pattern, for instance: "*foo*" would match all file names that contain foo.

Example:

file texts[] <filesys_mapper;prefix="foo", suffix=".txt">;

The above example would map all filenames that start with "foo" and have an extension ".txt" into the array texts. For example, if the specified directory contains files: foo1.txt, footest.txt, foo__1.txt, then the mapping might be:

Swift variable  Filename
texts[0]        footest.txt
texts[1]        foo1.txt
texts[2]        foo__1.txt
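The prefix and suffix selection in this example can be imitated in Python. This is only a model of the matching rule described above; the function name filesys_match is hypothetical, and bar.txt and foo.dat are made-up extra files added to show what is excluded.

```python
def filesys_match(names, prefix="", suffix=""):
    """Keep the names that start with prefix and end with suffix."""
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


# The three files from the example plus two hypothetical non-matching ones.
directory = ["foo1.txt", "footest.txt", "foo__1.txt", "bar.txt", "foo.dat"]
matched = filesys_match(directory, prefix="foo", suffix=".txt")

# The order of elements in the mapped array is not defined, so compare as a set.
print(sorted(matched))
```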

3.17.5. Fixed Array Mapper

The fixed_array_mapper maps from a string that contains a list of filenames into a file array.

parameter  meaning
files      A string that contains a list of filenames, separated by space, comma or colon

Example:

file texts[] <fixed_array_mapper;files="file1.txt, fileB.txt, file3.txt">;

would cause a mapping like this:

Swift variable  Filename
texts[0]        file1.txt
texts[1]        fileB.txt
texts[2]        file3.txt
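The splitting of the files string on spaces, commas or colons can be illustrated in Python. This is a sketch under the assumption that runs of separators count as one; the function name split_files is hypothetical and the code is not taken from the mapper itself.

```python
import re


def split_files(files):
    """Split a files string on any run of spaces, commas or colons."""
    return [name for name in re.split(r"[ ,:]+", files) if name]


names = split_files("file1.txt, fileB.txt, file3.txt")
for index, name in enumerate(names):
    print(f"texts[{index}] -> {name}")
```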

3.17.6. Array Mapper

The array_mapper maps from an array of strings into a file array.

parameter  meaning
files      An array of strings containing one filename per element

Example:

string s[] = [ "a.txt", "b.txt", "c.txt" ];

file f[] <array_mapper;files=s>;

This will establish the mapping:

Swift variable  Filename
f[0]            a.txt
f[1]            b.txt
f[2]            c.txt

3.17.7. Regular Expression Mapper

The regexp_mapper transforms one file name to another using regular expression matching.

parameter  meaning
source     The source file name
match      Regular expression pattern to match. Use () to match whatever regular expression is inside the parentheses, and indicate the start and end of a group; the contents of a group can be retrieved with the \\number special sequence (two backslashes are needed because the backslash is an escape sequence introducer)
transform  The pattern of the file name to transform to. Use \number to reference the group matched.

Example:

file s <"picture.gif">;
file f <regexp_mapper; source=s,
  match="(.*)gif", transform="\\1jpg">;

This example transforms a file ending gif into one ending jpg and maps that to a file.

Swift variable  Filename
f               picture.jpg
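The effect of the match and transform parameters can be reproduced with Python's re module, whose group syntax is similar. This is an analogy only: the function name regexp_transform is hypothetical, and Python raw strings need a single backslash where the Swift string above needs two.

```python
import re


def regexp_transform(source, match, transform):
    """Rewrite source by replacing what match captures according to transform."""
    return re.sub(match, transform, source)


# (.*) captures "picture." and \1 re-inserts it in front of "jpg".
print(regexp_transform("picture.gif", r"(.*)gif", r"\1jpg"))  # picture.jpg
```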

3.17.8. Structured Regular Expression Mapper

The structured_regexp_mapper is similar to the regexp_mapper with the only difference that it can be applied to arrays while the regexp_mapper cannot.

parameter  meaning
source     The source file name
match      Regular expression pattern to match. Use () to match whatever regular expression is inside the parentheses, and indicate the start and end of a group; the contents of a group can be retrieved with the \\number special sequence (two backslashes are needed because the backslash is an escape sequence introducer)
transform  The pattern of the file name to transform to. Use \number to reference the group matched.

Example:

file s[] <filesys_mapper; pattern="*.gif">;

file f[] <structured_regexp_mapper; source=s,
          match="(.*)gif", transform="\\1jpg">;

This example transforms all files in a list that end in gif to end in jpg and maps the list to those files.

3.17.9. CSV Mapper

The csv_mapper maps the content of a CSV (comma-separated value) file into an array of structures. The dataset type needs to be correctly defined to conform to the column names in the file. For instance, if the file contains the columns name, age and gpa, then the type needs to have member elements like this:

type student {
  file name;
  file age;
  file gpa;
}

If the file does not contain a header with column info, then the column names are assumed as column1, column2, etc.

Parameter  Meaning
file       The name of the CSV file to read mappings from.
header     Whether the file has a line describing header info; default is true
skip       The number of lines to skip at the beginning (after header line); default is 0.
hdelim     Header field delimiter; default is the value of the delim parameter
delim      Content field delimiters; defaults are space, tab and comma

Example:

student stus[] <csv_mapper;file="stu_list.txt">;

The above example would read a list of student info from file "stu_list.txt" and map them into a student array. By default, the file should contain a header line specifying the names of the columns. If stu_list.txt contains the following:

name,age,gpa
101-name.txt, 101-age.txt, 101-gpa.txt
name55.txt, age55.txt, gpa55.txt
q, r, s

then some of the mappings produced by this example would be:

Swift variable  Filename
stus[0].name    101-name.txt
stus[0].age     101-age.txt
stus[0].gpa     101-gpa.txt
stus[1].name    name55.txt
stus[1].age     age55.txt
stus[1].gpa     gpa55.txt
stus[2].name    q
stus[2].age     r
stus[2].gpa     s
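As a rough analogy, Python's standard csv module can read the same sample into records keyed by column name. The sample is inlined here using the values from the mapping table. The sketch mirrors only the header line and comma handling described for csv_mapper; it does not model the skip, hdelim or delim parameters.

```python
import csv
import io

# Sample content, inlined so that the sketch is self-contained.
stu_list = """name,age,gpa
101-name.txt, 101-age.txt, 101-gpa.txt
name55.txt, age55.txt, gpa55.txt
q, r, s
"""

# The header line supplies the member names; skipinitialspace drops the
# blank that follows each comma in the sample.
stus = list(csv.DictReader(io.StringIO(stu_list), skipinitialspace=True))

for i, row in enumerate(stus):
    for member, filename in row.items():
        print(f"stus[{i}].{member} -> {filename}")
```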

3.17.10. External Mapper

The external mapper, ext, maps variables based on the output of a supplied Unix executable.

parameter  meaning
exec       The name of the executable (relative to the current directory, if an absolute path is not specified)
*          Other parameters are passed to the executable prefixed with a - symbol

The output (stdout) of the executable should consist of two columns of data, separated by a space. The first column should be the path of the mapped variable, in Swift script syntax (for example, [2] means the element at index 2 of an array), or the symbol $ to represent the root of the mapped variable. The second column should be the name of the file mapped to that path. The following table shows the symbols that should appear in the first column corresponding to the mapping of different types of Swift constructs such as scalars, arrays and structs.

Swift construct         first column  second column
scalar                  $             file_name
anarray[]               []            file_name
2dimarray[][]           [][]          file_name
astruct.fld             fld           file_name
astructarray[].fldname  [].fldname    file_name

Example: With the following in mapper.sh,

#!/bin/bash
echo "[2] qux"
echo "[0] foo"
echo "[1] bar"

then a mapping statement:

student stus[] <ext;exec="mapper.sh">;

would map

Swift variable  Filename
stus[0]         foo
stus[1]         bar
stus[2]         qux
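Because the mapper is just an executable that prints two columns, it can be written in any language. The following Python sketch shows the two pieces such a script would need: turning the "-name value" arguments into a dictionary, and producing one "path filename" line per array element. The function names parse_mapper_args and mapping_lines are hypothetical and are not part of Swift.

```python
def parse_mapper_args(argv):
    """Turn ['-prefix', 'run', '-suffix', '.dat'] into {'prefix': 'run', 'suffix': '.dat'}."""
    if len(argv) % 2 != 0:
        raise ValueError("mapper arguments must come in '-name value' pairs")
    params = {}
    for name, value in zip(argv[0::2], argv[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"bad mapper argument: {name}")
        params[name[1:]] = value                 # drop the leading '-'
    return params

def mapping_lines(filenames):
    """Build one '<path> <file name>' line per array element, in index order."""
    return [f"[{index}] {name}" for index, name in enumerate(filenames)]
```

A real mapper script would call parse_mapper_args(sys.argv[1:]) and print the lines returned by mapping_lines to stdout; the order of the lines does not matter, as the mapper.sh example shows.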

Advanced Example: The following mapper.sh is an advanced example of an external mapper that maps a two-dimensional array to a directory of files. The files in the said directory are identified by their names appended by a number between 000 and 099. The first index of the array maps to the first part of the filename while the second index of the array maps to the second part of the filename.

#!/bin/sh

#take care of the mapper args
while [ $# -gt 0 ]; do
  case $1 in
    -location)          location=$2;;
    -padding)           padding=$2;;
    -prefix)            prefix=$2;;
    -suffix)            suffix=$2;;
    -mod_index)         mod_index=$2;;
    -outer_index)       outer_index=$2;;
    *) echo "$0: bad mapper args" 1>&2
       exit 1;;
  esac
  shift 2
done

for i in `seq 0 ${outer_index}`
do
 for j in `seq -w 000 ${mod_index}`
 do
  fj=`echo ${j} | awk '{print $1 +0}'` #format j by removing leading zeros
  echo "["${i}"]["${fj}"]" ${location}"/"${prefix}${j}${suffix}
 done
done

The mapper definition is as follows:

file_dat dat_files[][] < ext;
                              exec="mapper.sh",
                              padding=3,
                              location="output",
                              prefix=@strcat( str_root, "_" ),
                              suffix=".dat",
                              outer_index=pid,
                              mod_index=n >;

Assuming there are 4 files named aaa, bbb, ccc and ddd, and a mod_index of 10, we will have 4x10=40 files mapped to a two-dimensional array in the following pattern:

Swift variable   Filename
dat_files[0][0]  output/aaa_000.dat
dat_files[0][1]  output/aaa_001.dat
dat_files[0][2]  output/aaa_002.dat
dat_files[0][3]  output/aaa_003.dat
...
dat_files[0][9]  output/aaa_009.dat
dat_files[1][0]  output/bbb_000.dat
dat_files[1][1]  output/bbb_001.dat
...
dat_files[3][9]  output/ddd_009.dat

3.18. Executing app procedures

This section describes how Swift executes app procedures, and the requirements on the behaviour of application programs used in app procedures. These requirements are primarily to ensure that Swift can run your application in different places and with the various fault tolerance mechanisms in place.

3.18.1. Mapping of app semantics into unix process execution semantics

This section describes how an app procedure invocation is translated into a (remote) unix process execution. It does not describe the mechanisms by which Swift performs that translation; that is described in the next section.

In this section, this example Swift script is used for reference:

type file;

app (file o) count(file i) {
  wc @i stdout=@o;
}

file q <"input.txt">;
file r <"output.txt">;

r = count(q);

The executable for wc will be looked up in tc.data.

This unix executable will then be executed in some application procedure workspace. This means:

Each application procedure workspace will have an application workspace directory.

This application workspace directory will not be shared with any other application procedure execution attempt; all application procedure execution attempts will run with distinct application procedure workspaces. (For the avoidance of doubt: if a Swift script procedure invocation is subject to multiple application procedure execution attempts, due to Swift-level restarts, retries or replication, then each of those attempts will be made in a different application procedure workspace.)

The application workspace directory will be a directory on a POSIX filesystem accessible throughout the application execution by the application executable.

Before the application executable is executed:

  • The application workspace directory will exist.

  • The input files will exist inside the application workspace directory (but not necessarily as direct children; there may be subdirectories within the application workspace directory).

  • The input files will be those files mapped to input parameters of the application procedure invocation. (In the example, this means that the file input.txt will exist in the application workspace directory)

  • For each input file dataset, it will be the case that @filename or @filenames invoked with that dataset as a parameter will return the path relative to the application workspace directory for the file(s) that are associated with that dataset. (In the example, that means that @i will evaluate to the path input.txt)

  • For each file-bound parameter of the Swift procedure invocation, the associated files (determined by data type?) will always exist.

  • The input files must be treated as read only files. This may or may not be enforced by unix file system permissions. They may or may not be copies of the source file (conversely, they may be links to the actual source file).

During/after the application executable execution, the following must be true:

  • If the application executable execution was successful (in the opinion of the application executable), then the application executable should exit with unix return code 0; if the application executable execution was unsuccessful (in the opinion of the application executable), then the application executable should exit with unix return code not equal to 0.

  • Each file mapped from an output parameter of the Swift script procedure call must exist. Files will be mapped in the same way as for input files.

  • Output subdirectories that are defined within the Swift script (for example, through the location attribute of a mapper) will be pre-created by Swift before execution. Application executables are expected to create any output subdirectories that are referred to only in wrapper scripts.

  • Output produced by running the application executable on some inputs should be the same no matter how many times, when or where that application executable is run. What counts as "the same" can vary depending on the application (for example, it might be acceptable for a PNG→JPEG conversion to produce different but similar-looking output JPEGs depending on the environment).

Things to not assume:

  • Anything about the path of the application workspace directory

  • That either the application workspace directory will be deleted or will continue to exist or will remain unmodified after execution has finished

  • That files can be passed between application procedure invocations through any mechanism except through files known to Swift through the mapping mechanism (there is some exception here for external datasets - there are a separate set of assertions that hold for external datasets)

  • That application executables will run on any particular site of those available, or that any combination of applications will run on the same or different sites.
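As an illustration of these requirements, here is a minimal Python sketch of a well-behaved application: it treats its input as read-only, writes only the output file it was given, and reports success or failure through its exit code. It is a stand-in for the wc example, not something shipped with Swift, and the names count_words and demo are assumptions.

```python
import os
import sys
import tempfile

def count_words(input_path, output_path):
    """Count words in input_path, write the count to output_path, return an exit code."""
    try:
        with open(input_path, "r") as source:    # input is only ever read
            words = len(source.read().split())
        with open(output_path, "w") as target:   # output must exist on success
            target.write(f"{words}\n")
    except OSError as error:
        print(f"count_words: {error}", file=sys.stderr)
        return 1                                 # non-zero exit code signals failure
    return 0                                     # zero exit code signals success

def demo():
    """Run count_words inside a throwaway directory standing in for a workspace."""
    with tempfile.TemporaryDirectory() as workspace:
        input_path = os.path.join(workspace, "input.txt")
        output_path = os.path.join(workspace, "output.txt")
        with open(input_path, "w") as handle:
            handle.write("one two three\n")
        code = count_words(input_path, output_path)
        with open(output_path) as handle:
            return code, handle.read()
```

When used as an actual executable, the script would end with sys.exit(count_words(sys.argv[1], sys.argv[2])) so that the return value becomes the Unix exit code. The function makes no assumption about where the workspace is located; it only uses the paths it is given.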

3.19. How Swift implements the site execution model

This section describes the implementation of the semantics described in the previous section.

Swift executes application procedures on one or more sites.

Each site consists of:

  • worker nodes. There is some execution mechanism through which the Swift client side executable can execute its wrapper script on those worker nodes. This is commonly GRAM, Falkon or Coasters.

  • a site-shared file system. This site-shared file system is accessible through some file transfer mechanism from the Swift client side executable. This is commonly GridFTP or Coasters. This site-shared file system is also accessible through the POSIX file system on all worker nodes, mounted at the same location as seen through the file transfer mechanism. Swift is configured with the location of some site working directory on that site-shared file system.

There is no assumption that the site shared file system for one site is accessible from another site.

For each workflow run, on each site that is used by that run, a run directory is created in the site working directory, by the Swift client side.

In that run directory are placed several subdirectories:

  • shared/ - site shared files cache

  • kickstart/ - when kickstart is used, kickstart record files for each job that has generated a kickstart record.

  • info/ - wrapper script log files

  • status/ - job status files

  • jobs/ - application workspace directories (optionally placed here - see below)

Application execution looks like this:

For each application procedure call:

The Swift client side selects a site; copies the input files for that procedure call to the site shared file cache if they are not already in the cache, using the file transfer mechanism; and then invokes the wrapper script on that site using the execution mechanism.

The wrapper script creates the application workspace directory; places the input files for that job into the application workspace directory using either cp or ln -s (depending on a configuration option); executes the application unix executable; copies output files from the application workspace directory to the site shared directory using cp; creates a status file under the status/ directory; and exits, returning control to the Swift client side. Logs created during the execution of the wrapper script are stored under the info/ directory.

The Swift client side then checks for the presence of and deletes a status file indicating success; and copies files from the site shared directory to the appropriate client side location.

The job directory is created (in the default mode) under the jobs/ directory. However, it can be created under an arbitrary other path, which allows it to be created on a different file system (such as a worker node local file system in the case that the worker node has a local file system).
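The wrapper steps described above can be sketched in Python. This is a simplified model under stated assumptions, not the actual Swift wrapper (which is a shell script): the function name run_job, the job identifier and the "-success" status file name are all hypothetical, and logging to info/ is omitted.

```python
import os
import shutil
import subprocess
import sys
import tempfile

def run_job(run_dir, job_id, command, inputs, outputs, link_inputs=False):
    """Mimic the wrapper steps for one job; return the application's exit code."""
    shared = os.path.join(run_dir, "shared")
    workspace = os.path.join(run_dir, "jobs", job_id)
    os.makedirs(workspace)                       # fails if the workspace already exists
    os.makedirs(os.path.join(run_dir, "status"), exist_ok=True)

    for name in inputs:                          # stage inputs with ln -s or cp
        source = os.path.abspath(os.path.join(shared, name))
        target = os.path.join(workspace, name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if link_inputs:
            os.symlink(source, target)
        else:
            shutil.copy(source, target)

    # Execute the application with the workspace as its current directory.
    exit_code = subprocess.run(command, cwd=workspace).returncode

    if exit_code == 0:
        for name in outputs:                     # copy outputs back to shared/
            target = os.path.join(shared, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy(os.path.join(workspace, name), target)
        # Hypothetical status file name; the real naming scheme is not given here.
        open(os.path.join(run_dir, "status", job_id + "-success"), "w").close()
    return exit_code

def demo():
    """Run a tiny upper-casing 'application' through run_job in a temporary run directory."""
    with tempfile.TemporaryDirectory() as run_dir:
        os.makedirs(os.path.join(run_dir, "shared"))
        with open(os.path.join(run_dir, "shared", "input.txt"), "w") as handle:
            handle.write("hello\n")
        script = "open('output.txt', 'w').write(open('input.txt').read().upper())"
        code = run_job(run_dir, "job-0001", [sys.executable, "-c", script],
                       ["input.txt"], ["output.txt"])
        with open(os.path.join(run_dir, "shared", "output.txt")) as handle:
            result = handle.read()
        marker = os.path.exists(os.path.join(run_dir, "status", "job-0001-success"))
        return code, result, marker
```

The client side would then look for the status file and copy the outputs from shared/ to their client-side locations. Passing a different workspace root instead of jobs/ corresponds to placing the job directory on a worker-node local file system.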

[Figure: swift-site-model.png]

3.20. Technical overview of the Swift architecture

This section attempts to provide a technical overview of the Swift architecture.

3.20.1. Execution layer

The execution layer causes an application program (in the form of a unix executable) to be executed either locally or remotely.

The two main choices are local unix execution and execution through GRAM. Other options are available, and user provided code can also be plugged in.

The kickstart utility can be used to capture environmental information at execution time to aid in debugging and provenance capture.

3.20.2. Swift script language compilation layer

Step i: text to XML intermediate form parser/processor. The parser is written in ANTLR (see resources/VDL.g). The XML Schema Definition (XSD) for the intermediate language is in resources/XDTM.xsd.

Step ii: XML intermediate form to Karajan workflow. Karajan.java reads the XML intermediate form and compiles it to the Karajan workflow language. For example, expressions are converted from Swift script syntax into Karajan syntax, and function invocations become Karajan function invocations with various modifications to parameters to accommodate return parameters and dataset handling.

3.20.3. Swift/karajan library layer

Some Swift functionality is provided in the form of Karajan libraries that are used at runtime by the Karajan workflows that the Swift compiler generates.

3.21. Function reference

This section details functions that are available for use in the Swift language.

3.21.1. arg

Takes a command line parameter name as a string parameter and an optional default value and returns the value of that string parameter from the command line. If no default value is specified and the command line parameter is missing, an error is generated. If a default value is specified and the command line parameter is missing, @arg will return the default value.

Command line parameters recognized by @arg begin with exactly one hyphen and need to be positioned after the script name.

For example:

trace(arg("myparam"));
trace(arg("optionalparam", "defaultvalue"));

$ swift arg.swift -myparam=hello
Swift v0.3-dev r1674 (modified locally)

RunID: 20080220-1548-ylc4pmda
Swift trace: defaultvalue
Swift trace: hello

3.21.2. extractInt

extractInt(file) will read the specified file, parse an integer from the file contents and return that integer.

3.21.3. extractFloat

Similar to extractInt, extractFloat(file) will read the specified file, parse a float from the file contents and return that float.

3.21.4. filename

filename(v) will return a string containing the filename(s) for the file(s) mapped to the variable v. When more than one filename is returned, the filenames will be space separated inside a single string return value.

3.21.5. filenames

filenames(v) will return multiple values containing the filename(s) for the file(s) mapped to the variable v.

3.21.6. length

length(array) will return the length of an array in Swift. This function will wait for all elements in the array to be written before returning the length.

3.21.7. readData

readData will read data from a specified file and assign it to a Swift variable. The format of the input file is controlled by the type of the return value. For scalar return types, such as int, the specified file should contain a single value of that type. For arrays of scalars, the specified file should contain one value per line. For complex types (structures) of scalars, the file should contain two rows: the first row should list the structure member names separated by whitespace, and the second row should list the corresponding values for each structure member, separated by whitespace, in the same order as the header row. For arrays of structs, the file should contain a heading row listing structure member names separated by whitespace, followed by one row for each element of the array, with structure member values listed in the same order as the header row and separated by whitespace. The following example shows how readData() can be used to populate an array of a Swift struct-like complex type:

type Employee{
    string name;
    int id;
    string loc;
}

Employee emps[] = readData("emps.txt");

Where the contents of the "emps.txt" file are:

name id loc
Thomas 2222 Chicago
Gina 3333 Boston
Anne 4444 Houston

This will result in the array "emps" with 3 members. This can be processed within a Swift script using the foreach construct as follows:

foreach emp in emps{
    tracef("Employee %s lives in %s and has id %d", emp.name, emp.loc, emp.id);
}
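To make the array-of-structs file format concrete, the following Python sketch parses the same emps.txt contents into a list of records. It only models the file layout described above and is not how readData is implemented; read_struct_array and the field_types dictionary are assumptions standing in for the Swift type declaration.

```python
EMPS = """name id loc
Thomas 2222 Chicago
Gina 3333 Boston
Anne 4444 Houston
"""

def read_struct_array(text, field_types):
    """Parse a heading row plus one whitespace-separated row per array element."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    return [
        # Convert each value with the type declared for its structure member.
        {name: field_types[name](value) for name, value in zip(header, record)}
        for record in records
    ]

emps = read_struct_array(EMPS, {"name": str, "id": int, "loc": str})
for emp in emps:
    print("Employee %s lives in %s and has id %d" % (emp["name"], emp["loc"], emp["id"]))
```

The heading row decides which column feeds which structure member, so the column order in the file does not have to match the order of the members in the type declaration.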

3.21.8. readStructured

readStructured will read data from a specified file, like readData, but using a different file format more closely related to that used by the ext mapper.

Input files should list, one per line, a path into a Swift structure, and the value for that position in the structure:

rows[0].columns[0] = 0
rows[0].columns[1] = 2
rows[0].columns[2] = 4
rows[1].columns[0] = 1
rows[1].columns[1] = 3
rows[1].columns[2] = 5

which can be read into a structure defined like this:

type vector {
        int columns[];
}

type matrix {
        vector rows[];
}

matrix m;

m = readStructured("readStructured.in");

(Available since Swift 0.7; formerly readData2, which is now deprecated.)
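The "path = value" format can be modelled with a short Python sketch. It is an illustration of the file layout only, under the assumption that all leaf values are integers as in the example; the names parse_path and read_structured are hypothetical, and sparse arrays are represented as dictionaries keyed by index.

```python
import re

# A path is a sequence of member names and [index] subscripts.
TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")

SAMPLE = """rows[0].columns[0] = 0
rows[0].columns[1] = 2
rows[0].columns[2] = 4
rows[1].columns[0] = 1
rows[1].columns[1] = 3
rows[1].columns[2] = 5
"""

def parse_path(path):
    """Split 'rows[0].columns[1]' into ['rows', 0, 'columns', 1]."""
    return [name if name else int(index) for name, index in TOKEN.findall(path)]

def read_structured(text):
    """Build nested dictionaries from 'path = value' lines (integer values assumed)."""
    root = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, value = (part.strip() for part in line.split("=", 1))
        keys = parse_path(path)
        node = root
        for key in keys[:-1]:                    # walk or create intermediate levels
            node = node.setdefault(key, {})
        node[keys[-1]] = int(value)
    return root

m = read_structured(SAMPLE)
```

Here m["rows"][1]["columns"][2] corresponds to the Swift expression m.rows[1].columns[2].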

3.21.9. regexp

regexp(input,pattern,replacement) will apply regular expression substitution using the Java java.util.regex API (http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html). For example:

string v =  regexp("abcdefghi", "c(def)g","monkey");

will assign the value "abmonkeyhi" to the variable v.

3.21.10. sprintf

sprintf(spec, variable list) will generate a string based on the specified format.

Example: string s = sprintf("\t%s\n", "hello");

Format specifiers

%%  % sign
%M  Filename output (waits for close)
%p  Format variable according to an internal format
%b  Boolean output
%f  Float output
%i  int output
%s  String output
%k  Variable sKipped, no output
%q  Array output

3.21.11. strcat

strcat(a,b,c,d,…) will return a string containing all of the strings passed as parameters joined into a single string. There may be any number of parameters.

The + operator concatenates two strings: strcat(a,b) is the same as a + b

3.21.12. strcut

strcut(input,pattern) will match the regular expression in the pattern parameter against the supplied input string and return the section that matches the first matching parenthesised group.

For example:

string t = "my name is John and i like puppies.";
string name = strcut(t, "my name is ([^ ]*) ");
string out = strcat("Your name is ",name);
trace(out);

This will output the message: Your name is John.

3.21.13. strjoin

strjoin(array, delimiter) will combine the elements of an array into a single string separated by a given delimiter. The array passed to strjoin must be of a primitive type (string, int, float, or boolean). It will not join the contents of an array of files.

Example:

string test[] = ["this", "is", "a", "test" ];
string mystring = strjoin(test, " ");
tracef("%s\n", mystring);

This will print the string "this is a test".

3.21.14. strsplit

strsplit(input,pattern) will split the input string based on separators that match the given pattern and return a string array.

Example:

string t = "my name is John and i like puppies.";
string words[] = strsplit(t, "\\s");
foreach word in words {
    trace(word);
}

This will output one word of the sentence on each line (though not necessarily in order, due to the fact that foreach iterations execute in parallel).

3.21.15. toInt

toInt(input) will parse its input string into an integer. This can be used with arg() to pass input parameters to a Swift script as integers.

3.21.16. toFloat

toFloat(input) will parse its input string into a floating point number. This can be used with arg() to pass input parameters to a Swift script as floating point numbers.

3.21.17. toString

toString(input) will convert its input to a string. Input can be an int, float, string, or boolean.

3.21.18. trace

trace will log its parameters. By default these will appear on both stdout and in the run log file. Some formatting occurs to produce the log message. The particular output format should not be relied upon.

3.21.19. tracef

tracef(spec, variable list) will log its parameters as formatted by the format specification spec, which must be a string. tracef checks the types of the arguments in the variable list against the specifiers and supports certain escape sequences.

Example:

int i = 3;
tracef("%s: %i\n", "the value is", i);

Specifiers:

%s  Format a string.
%b  Format a boolean.
%i  Format a number as an integer.
%f  Format a number as a floating point number.
%q  Format an array.
%M  Format a mapped variable’s filename.
%k  Wait for the given variable but do not format it.
%p  Format variable according to an internal format.

Escape sequences:

\n  Produce a newline.
\t  Produce a tab.

Known issues:

Swift does not correctly scan certain backslash sequences such as \\.

3.21.20. java

java(class_name, static_method, method_arg) will call a static Java method of the class class_name.

3.21.21. writeData

writeData will write out data structures in the format described for readData. The following example demonstrates how one can write a string "foo" into a file "writeDataPrimitive.out":

type file;

string s = "foo";

file f <"writeDataPrimitive.out">;

f=writeData(s);

4. Configuration

Swift is mainly configured using a configuration file, typically called swift.conf. This file contains configuration properties and site descriptions. A simple configuration file may look like this:

site.mysite {
    execution {
        type: "coaster"
        URL: "my.site.org"
        jobManager: "ssh:local"
    }
    staging: "local"

    app.ALL {executable: "*"}
}

# select sites to run on
sites: [mysite]

# other settings
lazy.errors: false

4.1. Configuration Syntax

The Swift configuration files are expressed in a modified version of JSON. The main additions to JSON are:

  • Quotes around string values, in particular keys, are optional, unless the strings contain special characters (single/double quotes, square and curly braces, white space, $, :, =, ,, `, ^, ?, !, @, *, \), or if they represent other values: true, false, null, and numbers.

  • = and : can be used interchangeably to separate keys from values

  • = (or :) is optional before an open bracket

  • Commas are optional as separators if there is a new line

  • ${…} expansion can be used to substitute environment variable values or Java system properties. If the value of an environment variable is needed, it must be prefixed with env.. For example ${env.PATH}. Except for include directives, the ${…} must not be inside double quotes for the substitution to work. The same outcome can be achieved using implicit string concatenation: "/home/"${env.USER}"/bin"

Comments can be introduced by starting a line with a hash symbol (#) or using a double slash (//):

# This is a comment
// This is also a comment

keepSiteDir: true # This is a comment following a valid property

4.2. Include Directives

Include directives can be used to include the contents of a Swift configuration file from another Swift configuration file. This is done using the literal include followed by a quoted string containing the path to the target file. The path may contain references to environment variables or system properties using the substitution syntax explained above. For example:

# an absolute path name
include "/home/joedoe/swift-config/site1.conf"

# include a file from the Swift distribution package
include "${swift.home}/etc/sites/beagle.conf"

# include a file using an environment variable
include "${env.SWIFT_CONFIG_DIR}/glow.conf"
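The ${…} references used in these paths follow the substitution rules from the Configuration Syntax section: names prefixed with env. refer to environment variables, and other names refer to Java system properties. A minimal Python sketch of that lookup rule follows; it is not Swift's parser, the function name expand is an assumption, and the values used with it are made up for illustration.

```python
import re

# Matches ${name} and captures the name between the braces.
REFERENCE = re.compile(r"\$\{([^}]+)\}")

def expand(value, env, properties):
    """Replace each ${name} with an environment variable (env. prefix) or a property."""
    def lookup(match):
        name = match.group(1)
        if name.startswith("env."):
            return env[name[len("env."):]]       # ${env.X} reads environment variable X
        return properties[name]                  # anything else is a system property
    return REFERENCE.sub(lookup, value)
```

For example, with a hypothetical swift.home property of /opt/swift, expand("${swift.home}/etc/sites/beagle.conf", {}, {"swift.home": "/opt/swift"}) yields /opt/swift/etc/sites/beagle.conf. An undefined name raises KeyError in this sketch; how Swift itself reports that case is not covered here.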

4.3. Property Merging

If two properties with the same name are present in a configuration file, they are either merged or the latter one overrides the earlier one. This depends on the type of property. Simple values are always overridden, while objects are merged. For example:

key: 1
key: 2
# key is now 2

object {
    key1: 1
}

object {
    key2: 2
}

# object is now { key1: 1, key2: 2}

This can be used to define certain template files that contain most of the definitions for sites, and then include them in other files and override or add only certain aspects of those sites. For example, assume swift-local.conf includes a definition for a site named local that can be used to run applications on the Swift client side. If you wanted to override only the work directory, the following swift.conf could be used:

include "swift-local.conf"

site.local {
    # use existing definition for site.local, but override workDirectory
    workDirectory: "/tmp"
}

If, on the other hand, you want to fully override the definition of site.local, you could set it to null first and then provide your own definition:

include "swift-local.conf"

# forget previous definition of site.local
site.local: null

# define your own site.local
site.local {
    ...
}
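The merging rules can be summarised with a small Python model in which configuration objects are dictionaries and null is None. This is a sketch of the behaviour described in this section, not Swift's implementation; the function name merge and the sample values are assumptions.

```python
def merge(base, update):
    """Merge two configuration dictionaries: objects merge, simple values override."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)   # objects are merged key by key
        else:
            merged[key] = value                       # simple values (and null) override
    return merged

# Overriding a single property keeps the rest of the earlier definition.
base = {"site": {"local": {"workDirectory": "/work", "staging": "local"}}}
patched = merge(base, {"site": {"local": {"workDirectory": "/tmp"}}})

# Setting the object to null first discards the earlier definition entirely.
reset = merge(base, {"site": {"local": None}})
replaced = merge(reset, {"site": {"local": {"workDirectory": "/tmp"}}})
```

In this model patched still contains the staging property from the earlier definition, while replaced contains only workDirectory.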

4.4. Configuration Search Path

By default, Swift attempts to load multiple configuration files, merging them sequentially as described in the Property Merging Section. The files are:

  1. Distribution Configuration ([D]): ${swift.home}/etc/swift.conf

  2. Site Configuration ([S]): ${env.SWIFT_SITE_CONF} (if SWIFT_SITE_CONF is defined)

  3. User Configuration ([U]): ${env.HOME}/.swift/swift.conf (if present)

  4. Run Configuration ([R]): ${env.PWD}/swift.conf (if present)

In addition, a number of configuration properties can be overridden individually on the Swift command line. For a list of such configuration properties, please use swift -help or refer to the Running Swift section of this document.

The run configuration can be overridden on the Swift command line using the -config argument. If -config is specified, Swift will not attempt to load swift.conf from the current directory.
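The default load order, including the effect of -config, can be sketched as follows. This Python function is an illustrative model only: config_files is a hypothetical name, the exists parameter is injected so the logic can be exercised without touching the file system, and the sketch assumes HOME and PWD are set in the environment.

```python
import os

def config_files(swift_home, env, run_config=None, exists=os.path.exists):
    """Return configuration files in load order: [D], [S], [U], then [R]."""
    files = [os.path.join(swift_home, "etc", "swift.conf")]          # [D]
    if env.get("SWIFT_SITE_CONF"):                                   # [S], only if defined
        files.append(env["SWIFT_SITE_CONF"])
    user = os.path.join(env["HOME"], ".swift", "swift.conf")         # [U], only if present
    if exists(user):
        files.append(user)
    if run_config is not None:                                       # -config replaces [R]
        files.append(run_config)
    else:
        run = os.path.join(env["PWD"], "swift.conf")                 # [R], only if present
        if exists(run):
            files.append(run)
    return files
```

Since later files are merged over earlier ones, the run configuration has the final say among the four files.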

The entire configuration search path can be replaced with a custom search path using the -configpath command line argument. The value passed to -configpath must be a list of paths pointing to various configuration files, separated by the standard operating system path separator (: on Linux and ; on Windows). For example:

swift -configpath /etc/swift/swift.conf:~/swift-configs/s1.conf:swift.conf

If in doubt about what configuration files are being loaded or to troubleshoot configuration issues, Swift can be started with the -listconfig command line argument. -listconfig accepts two possible values:

  • files: will print a list of configuration files loaded by Swift

  • full: will print a list of configuration files loaded by Swift, as well as the final merged configuration.

4.5. Configuration File Structure

The contents of a Swift configuration file can be divided into a number of relevant sections:

  • site declarations

  • global application declarations

  • Swift configuration properties

4.5.1. Site Declarations

Swift site declarations are specified using the site.<name> property, where text inside angle brackets is to be interpreted as a generic label for user-specified content, whereas content between square brackets is optional:

site.<name> {
    execution {...}
    [staging: "swift" | "local" | "service-local" | "shared-fs" | "wrapper"]
    [filesystem {...}]
    workDirectory: <path>

    [<site options>]
    [<application declarations>]
}

A site name can be any string. If the string contains special characters, it must be quoted:

site."My-$pecial-$ite" {...}

4.5.2. Site Selection

Once sites are declared, they must be explicitly enabled for Swift to use them. This can be achieved with the sites option, which accepts either an array or a comma-separated list of site names:

sites: ["site1", "site2"]

# alternatively:

sites: "site1, site2"

The sites option can also be specified on the Swift command line:

swift -sites site1,site2 script.swift

Execution Mechanism

The execution property tells Swift how applications should be executed on a site:

    execution {
        type: <string>
        [URL: <string>]
        [jobManager: <string>]

        [<execution provider options>]
    }

The type property is used to select one of the mechanisms for application execution that is known by Swift. A comprehensive list of execution mechanisms can be found in the Execution Mechanisms section. A summary is shown below:

Table 1. Swift Execution Mechanisms

Type                URL required  Uses jobManager  Default jobManager  Staging methods supported                                Description
local               no            no               -                   swift, local, wrapper                                    Runs applications locally using a simple fork()-based mechanism
coaster             yes           yes              none                swift, wrapper, local, service-local, shared-fs, direct  Submits applications through an automatically-deployed Swift Coasters service
coaster-persistent  yes           yes              none                swift, wrapper, local, service-local, shared-fs, direct  Uses a manually deployed Swift Coasters service
GRAM5               yes           yes              "fork"              swift, wrapper                                           Uses the GRAM component of the Globus Toolkit
GT2                 An alias for GRAM5
SSH                 yes           no               -                   swift, wrapper                                           Runs applications using a Java implementation of the SSH protocol
SSH-CL              yes           no               -                   swift, wrapper                                           Like SSH, except it uses the command-line ssh tool
PBS                 no            no               -                   swift, wrapper                                           Submits applications to a PBS or Torque resource manager
Condor              no            no               -                   swift, wrapper                                           Submits applications using Condor
SGE                 no            no               -                   swift, wrapper                                           Uses the Sun Grid Engine
SLURM               no            no               -                   swift, wrapper                                           Uses the SLURM local scheduler
LSF                 no            no               -                   swift, wrapper                                           Submits applications to Platform’s Load Sharing Facility

The execution provider options specify finer details of how an application should be executed. They depend on the chosen mechanism and are detailed in the Execution Mechanisms section. This is where Coasters options, such as nodeGranularity or softImage, would be specified. Example:

execution {
    type: "coaster"
    jobManager: "local:local"
    options {
        maxJobs: 1
        tasksPerNode: 2
        workerLoggingLevel: TRACE
    }
}

A complete list of Swift Coasters options can be found in the Coaster Options section.

Staging

The staging property instructs Swift how to handle application input and output files. The swift and wrapper staging methods are supported universally, but the swift method requires the filesystem property to be specified. If not specified, this option defaults to swift. Support for the other choices is dependent on the execution mechanism. This is detailed in the Execution Mechanisms Table above. A description of each staging method is provided in the table below:

Table 2. Swift Staging Methods

Staging Method  Description
swift           This method instructs Swift to use a filesystem provider to direct all necessary staging operations from the Swift client side to the cluster head node. If this method is used, the workDirectory must point to a head node path that is on a shared file system accessible by the compute nodes.
wrapper         File staging is done by the Swift application wrapper.
local           Used to indicate that files should be staged in/out from/to the site on which Swift is running. In the case of Swift Coasters, the system proxies the transfers between client side and compute nodes through the Coaster Service.
service-local   This method instructs the execution mechanism provider to stage input and output files from the remote site where the execution service is located. For example, if a Coaster Service is started on the login node of a cluster, the Coaster Service will perform the staging from a file system on the login node to the compute node and back.
shared-fs       This method is used by Coasters to implement a simple staging mechanism in which files are accessed using a shared filesystem that is accessible by compute nodes.
direct          Tries to avoid moving files around as much as possible and passes absolute file names to the application instead. The node on which the application is running must have access to the filesystem on which Swift data is located.

File System

The file system properties are used with staging: "swift" to tell Swift how to access remote file systems. Valid types are described below:

Table 3. Swift File System Providers

Type     URL required  Description
local    no            Copies files locally on the Swift client side
GSIFTP   yes           Accesses a remote file system using GridFTP
GridFTP  yes           An alias for GSIFTP
SSH      yes           Uses the SCP protocol

Site Options

Site options control various aspects of how Swift handles application execution on a site. All options except workDirectory are optional. They are listed in the following table:

Table 4. Site Options

Option                Valid values         Default value     Description
OS                    many                 "INTEL32::LINUX"  Can be used to tell Swift the type of the operating system running on the remote site. By default, Swift assumes a UNIX/Linux type OS. There is some limited support for running under Windows, in which case this property must be set to one of "INTEL32::WINDOWS" or "INTEL64::WINDOWS".
workDirectory         path                 -                 Points to a directory in which Swift can maintain a set of files relevant to the execution of an application on the site. By default, applications will be executed on the compute nodes in a sub-directory of workDirectory, which implies that workDirectory must be accessible from the compute nodes.
scratch               path                 -                 If specified, it instructs Swift to run applications in a directory different from workDirectory. Contrary to the requirement for workDirectory, scratch can point to a file system local to compute nodes. This option is useful if applications do intensive I/O on temporary files created in their work directory, or if they access their input/output files in a non-linear fashion.
keepSiteDir           true, false          false             If set to true, site application directories (i.e. workDirectory) will not be cleaned up when Swift completes a run. This can be useful for debugging.
statusMode            "files", "provider"  "files"           Controls whether application exit codes are handled by the execution mechanism or passed back to Swift by the Swift wrapper script through files. Traditionally, Globus GRAM did not return application exit codes. This has changed in Globus Toolkit 5.x. However, some local scheduler execution mechanisms, such as PBS, are still unable to return application exit codes. In such cases, it is necessary to pass the application exit codes back to Swift in files. This comes at a slight price in performance, since a file needs to be created, written to, and transferred back to Swift for each application invocation. It is, however, also the default, since it works in all cases.
maxParallelTasks      integer              2                 The maximum number of concurrent application invocations allowed on this site.
initialParallelTasks  integer              2                 The limit on the number of concurrent application invocations on this site when a Swift run is started. As invocations complete successfully, the number of concurrent invocations on the site is increased up to maxParallelTasks.

Additional, less frequently used options are as follows:

Table 5. Obscure options that you are unlikely to need to worry about
Option Valid values Default value Description

wrapperParameterMode

"args", "files"

"args"

If set to "files", Swift will, as much as possible, pass application arguments through files. The applications will be invoked normally, with their arguments in the **argv parameter to the main() function. This can be useful if the execution mechanism has limitations on the size of command line arguments that can be passed through. An example of execution mechanism exhibiting this problem is Condor.

wrapperInterpreter

path

"/bin/bash" or "cscript.exe" on Windows

Points to the interpreter used to run the Swift application invocation wrapper

wrapperScript

string

"_swiftwrap" or "_swiftwrap.vbs" on Windows

Points to the Swift application invocation wrapper. The file must exist in the libexec directory in the Swift distribution

wrapperInterpreterOptions

list of strings

[] on UNIX/Linux or ["//Nologo"] on Windows

Command line options to be passed to the wrapper interpreter

cleanupCommand

string

"/bin/rm" or "cmd.exe" on Windows

A command to use for the cleaning of site directories (unless keepSiteDir is set to true) at the end of a run.

cleanupCommandOptions

list of strings

["-rf"] or ["/C", "del", "/Q"] on Windows

Arguments to pass to the cleanup command when cleaning up site work directories

delayBase

number

2.0

Swift keeps a quality indicator for each site it runs applications on. This is a number that gets increased for every successful application invocation, and decreased for every failure. It then uses this number in deciding which sites to run applications on (when multiple sites are defined). If this number becomes very low (a sign of repeated failures on a site), Swift implements an exponential back-off that prevents jobs from being sent to a site that continuously fails them. delayBase is the base for that exponential back-off: delay = delayBase ^ (-score * 100)

maxSubmitRate

positive number

-

Some combinations of site and execution mechanisms may become error prone if jobs are submitted too fast. This option can be used to limit the submission rate. If set to some number N, Swift will submit applications at a rate of at most N per second.
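
The back-off formula given for delayBase in the table above can be evaluated directly. The following is a minimal sketch, assuming the site score is a plain number that goes negative after repeated failures; the function name and the sample scores are hypothetical, and the unit of the resulting delay is not stated in this guide.

```python
def backoff_delay(score: float, delay_base: float = 2.0) -> float:
    """Evaluate delay = delayBase ^ (-score * 100) for a given site score."""
    return delay_base ** (-score * 100)


# Lower (more negative) scores produce exponentially longer delays.
for score in (0.0, -0.01, -0.05, -0.1):
    print(f"score={score:+.2f} -> delay={backoff_delay(score):.1f}")
```

With the default base of 2.0, each additional 0.01 drop in the score doubles the delay, which is why a site that keeps failing jobs quickly stops receiving new ones.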

Application Declarations

Applications can either be declared globally, outside of a site declaration, or specific to a site, inside a site declaration:

app.(<appName>|ALL) {
    # global application
    ...
}

site.<siteName> {
    app.(<appName>|ALL) {
        # site application
        ...
    }
}

A special application name, ALL, can be used to declare options for all applications. When Swift attempts to run an application named X, it will first look at site application declarations for app.X. If not found, it will check if a site application declaration exists for app.ALL. The search will continue with the global app.X and then the global app.ALL until a match is found. It is possible that a specific application will only be declared on a subset of all the sites and not globally. Swift will then only select a site where the application is declared and will not attempt to run the application on other sites.
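
The lookup order described above (site-specific declaration, site ALL, global declaration, global ALL) can be sketched as follows. This is an illustration only, not Swift's implementation: the dictionaries stand in for parsed declarations, and resolve_app is a hypothetical helper name.

```python
from typing import Optional


def resolve_app(name: str, site_apps: dict, global_apps: dict) -> Optional[dict]:
    """Return the first matching declaration in the documented search order."""
    search_order = (
        (site_apps, name),     # site app.X
        (site_apps, "ALL"),    # site app.ALL
        (global_apps, name),   # global app.X
        (global_apps, "ALL"),  # global app.ALL
    )
    for declarations, key in search_order:
        if key in declarations:
            return declarations[key]
    return None  # the application cannot run on this site


site = {"ALL": {"executable": "*"}}
globals_ = {"sim": {"executable": "/usr/bin/sim"}}
print(resolve_app("sim", site, globals_))
```

In this usage the site-level ALL declaration wins over the global declaration for sim, because site declarations are consulted first.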

An application declaration takes the following form:

app.<appName> {
    executable: (<string>|"*")
    [jobQueue: <string>]
    [jobProject: <string>]
    [maxWallTime: <time>]
    [options: {...}]
    <environment variables>
}

The executable is mandatory, and it points to the actual location of the executable that implements the application. The special string "*" can be used to indicate that the executable has the same name as the application name. This is useful in conjunction with app.ALL to essentially declare that a site can be used to execute any application from a Swift script. If the executable is not an absolute path, it will be searched for using the PATH environment variable on the remote site.

Environment variables can be defined as follows:

    env.<name>: <value>

For example:

    env.LD_LIBRARY_PATH: "/home/joedoe/lib"

The remaining options are:

Table 6. Application Options
Name Valid values Description

jobQueue

any

If the application is executed using a mechanism that submits to a queuing system, this option can be used to select a specific queue for the application

jobProject

any

A queuing system project to associate the job with.

maxWallTime

"mm" or "hh:mm" or "hh:mm:ss"

The maximum amount of time that the application will take to execute on the site. Most application execution mechanisms will both require and enforce this value by terminating the application if it exceeds the specified time. The default value is 10 minutes.
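
The three maxWallTime formats listed above ("mm", "hh:mm", and "hh:mm:ss") can be normalized to seconds as in the sketch below. The helper name is hypothetical and the parsing rules are an assumption based only on the formats named in the table; Swift's own parser may accept or reject other inputs.

```python
def wall_time_seconds(value: str) -> int:
    """Convert "mm", "hh:mm" or "hh:mm:ss" into a number of seconds."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 1:        # "mm"
        hours, minutes, seconds = 0, parts[0], 0
    elif len(parts) == 2:      # "hh:mm"
        hours, minutes, seconds = parts[0], parts[1], 0
    elif len(parts) == 3:      # "hh:mm:ss"
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unsupported wall time format: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


# The documented default of 10 minutes, written in each of the three forms.
print(wall_time_seconds("10"), wall_time_seconds("00:10"), wall_time_seconds("00:10:00"))
```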

General Swift Options

There are a number of configuration options that modify the way that the Swift run-time behaves. They are listed below:

Table 7. General Swift Options
Name Valid values Default value Description

sites

array of strings (e.g. ["site1", "site2"]) or CSV string (e.g. "site1, site2")

none

Selects, out of the set of all declared sites, a sub-set of sites to run applications on.

hostName

string

autodetected

Can be used to specify a publicly reachable DNS name or IP address for this machine, which is generally used for Globus or Coaster callbacks. Normally this should be auto-detected, but if you do not have a public DNS name, you may want to set this.

TCPPortRange

"lowPort, highPort"

none

A TCP port range can be specified to restrict the ports on which certain callback services are started. This is likely needed if your submit host is behind a firewall, in which case the firewall should be configured to allow incoming connections on ports in this range.

lazyErrors

true, false

false

Use a lazy mode to deal with errors. When set to true, Swift will proceed with the execution until no more data can be derived because of errors in dependent steps. If set to false, an error will cause the execution to stop immediately.

executionRetries

non-negative integer

0

The number of times an application invocation will be retried if it fails before Swift finally gives up and declares it failed. The total number of attempts will be 1 + executionRetries.

logProvenance

true, false

false

If set to true, Swift will record provenance information in the log file.

alwaysTransferWrapperLog

true, false

false

Controls when wrapper logs are transferred back to the submit host. If set to false, Swift will only transfer a wrapper log for a given job when that job fails. If set to true, Swift will transfer wrapper logs whether a job fails or not.

fileGCEnabled

true, false

true

Controls the file garbage collector. If set to false, files mapped by collectable mappers (such as the concurrent mapper) will not be deleted when their Swift variables go out of scope.

mappingCheckerEnabled

true, false

true

Controls the run-time duplicate mapping checker (which identifies mapping conflicts). When enabled, a record of all mapped data is kept, so this comes at the expense of a slight memory leak. If set to false, the mapping checker is disabled.

tracingEnabled

true, false

false

Enables execution tracing. If set to true, operations within Swift such as iterations, invocations, assignments, and declarations, as well as data dependencies will be logged. This comes at a cost in performance. It is therefore disabled by default.

maxForeachThreads

positive integer

16384

Limits the number of concurrent iterations that each foreach statement can have at one time. This conserves memory for Swift programs that have large numbers of iterations (which would otherwise all be executed in parallel).

Ticker

tickerEnabled

true, false

true

Controls the output ticker, which regularly prints information about the counts of application states on the standard output of the Swift process.

tickerPrefix

string

"Progress: "

Specifies a string to prefix to each ticker line output

tickerDateFormat

string

"E, dd MMM yyyy HH:mm:ssZ"

Specifies the date/time format to use for the time stamp of each ticker line. It must conform to Java’s SimpleDateFormat syntax (http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).

CDM

CDMBroadcastMode

string

"file"

-

CDMLogFile

string

"cdm.log"

-

Replication

replicationEnabled

true, false

false

If enabled, jobs that are queued longer than a certain amount of time will have a duplicate version re-submitted. This process will continue until a maximum pre-set number of such replicas is queued. When one of the replicas becomes active, all other replicas are canceled. This mechanism can potentially prevent a single overloaded site from completely blocking a run.

replicationMinQueueTime

seconds

60

When replication is enabled, this is the amount of time that a job needs to be queued until a new replica is created.

replicationLimit

integer > 0

3

The maximum number of replicas allowed for a given application instance.

Wrapper Staging

wrapperStagingLocalServer

string

"file://"

When file staging is set to "wrapper", this indicates the default URL scheme that is prefixed to local files.

Throttling

jobSubmitThrottle

integer > 0 or "off"

4

Limits the number of jobs that can concurrently be in the process of being submitted, that is, in the "Submitting" state. This is the state where the job information is being communicated to a remote service. Certain execution mechanisms may become inefficient if too many jobs are being submitted concurrently, and there are no benefits to parallelizing submission beyond a certain point. Please note that this does not apply to the number of jobs that can be active concurrently.

hostJobSubmitThrottle

integer > 0 or "off"

2

Like jobSubmitThrottle, except it applies to each individual site.

fileTransfersThrottle

integer > 0 or "off"

4

Limits the number of concurrent file transfers when file staging is set to "swift". Arbitrarily increasing file transfer parallelism brings little benefit as the throughput approaches the maximum available network bandwidth. Instead, it can lead to an increase in latencies, which may increase the chances of triggering timeouts.

fileOperationsThrottle

integer > 0 or "off"

8

Limits the number of concurrent file operations that can be active at a given time when file staging is set to "swift". File operations are defined to be all remote operations on a filesystem that exclude file transfers. Examples are: listing the contents of a directory, creating a directory, removing a file, etc.

Global versions of site options

staging

"swift", "local", "service-local", "shared-fs", "wrapper"

"swift"

See Staging Methods.

keepSiteDir

true, false

false

See Site Options.

statusMode

"files", "provider"

"files"

See Site Options.

wrapperParameterMode

"args", "files"

"args"

See Other Site Options.
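
The idea behind the throttling options in the table above (for example jobSubmitThrottle) is to cap how many operations are in a given state at once. The sketch below illustrates that idea with a counting semaphore; it is not how Swift implements throttling, and run_throttled and the dummy job are hypothetical.

```python
import threading
import time


def run_throttled(jobs, limit):
    """Run every job in its own thread, with at most `limit` running at once.

    Returns the highest number of jobs observed running concurrently.
    """
    gate = threading.Semaphore(limit)   # plays the role of the throttle
    lock = threading.Lock()
    active = 0
    peak = 0

    def submit(job):
        nonlocal active, peak
        with gate:                      # blocks while `limit` jobs are in flight
            with lock:
                active += 1
                peak = max(peak, active)
            job()
            with lock:
                active -= 1

    threads = [threading.Thread(target=submit, args=(job,)) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


# Ten dummy "submissions" with a throttle of 4: the peak never exceeds 4.
print(run_throttled([lambda: time.sleep(0.01)] * 10, limit=4))
```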

4.6. Run directories

When you run Swift, a run directory is created. The run directory is named runNNN, where NNN starts at 000 and is incremented for every run.
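
The naming scheme can be illustrated with a short sketch. It assumes that the next number is one past the highest existing runNNN entry, which this guide does not state explicitly; next_run_name is a hypothetical helper, not part of Swift.

```python
import re


def next_run_name(existing):
    """Return the next runNNN name given the names already present."""
    numbers = []
    for name in existing:
        match = re.fullmatch(r"run(\d{3,})", name)
        if match:
            numbers.append(int(match.group(1)))
    # With no previous runs, numbering starts at 000.
    return f"run{max(numbers, default=-1) + 1:03d}"


print(next_run_name([]))
print(next_run_name(["run000", "run001", "data"]))
```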

The run directories can be useful for debugging. They contain:

Run directory contents

apps

An apps file generated from swift.properties

cf

A configuration file generated from swift.properties

runNNN.log

The log file generated during the Swift run

scriptname-runNNN.d

Debug directory containing wrapper logs

scripts

Directory that contains scheduler scripts used for that run

sites.xml

A sites.xml generated from swift.properties

swift.out

The standard out and standard error generated by Swift

4.7. Execution Mechanisms

Swift allows application execution through a number of mechanisms (or execution providers). The choice of each mechanism is dependent on the software installed on a certain compute cluster. The following sub-sections list the available choices together with their supported options.

4.7.1. Local

The local execution mechanism can be used to run applications locally through simple fork() calls.

General Configuration

URL required

no

Job Manager

not used

Staging methods

swift, wrapper, local, service-local, shared-fs, direct

Options

N/A

Job Options
Name Type Default Value Description

count

Integer

1

Launch this many copies of the application for each invocation

Example
    site.local {
        execution {
            type: "local"
        }

        staging: direct

        app.ALL {
            executable: "*"
            count: 1
        }
    }

4.7.2. GT5

Uses the GRAM component of the Globus Toolkit (http://toolkit.globus.org/toolkit/docs/latest-stable/gram5/#gram5) to launch jobs on remote resources.

General Configuration

URL required

yes

Job Manager

In GRAM, job managers instruct the GRAM service to submit jobs to specific resource managers on the server side. The exact available job managers depend on the particular GRAM installation. However, "fork", which instructs GRAM to run jobs directly on the service node, should always be available. In addition, the available job managers would typically match the queuing systems installed on the server side. For example, if a cluster uses Torque/PBS, then the "PBS" job manager should be available. (Swift automatically converts job manager names to lower case strings and adds the "jobmanager-" prefix to match the format required by Globus GRAM.) The following is a list of known possible job manager values: "fork", "PBS", "LSF", "Condor", "SGE", "Slurm"

Staging methods

swift, wrapper

Options

N/A

Job Options

For a complete list and description of these options, please see the Globus GRAM documentation

Name Type Default Value Description

count

Integer

1

Launch this many copies of the application for each invocation

max_time

Integer (minutes)

-

max_wall_time

Integer (minutes)

-

max_cpu_time

Integer (minutes)

-

max_memory

Integer (MB)

-

min_memory

Integer (MB)

-

project

String

-

An LRM project to associate the job with

queue

String

-

LRM queue to submit to

Example
    site.example {
        execution {
            type: "gt5"
            url: "login.example.org"
            jobManager: "PBS"
        }

        staging: swift

        app.sim {
            executable: /usr/bin/sim
            queue: "fast"
            min_memory: 120
        }
    }
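
The job manager name conversion mentioned in the Job Manager description for GT5 (lower-casing the name and adding the "jobmanager-" prefix) amounts to the following. This is a sketch of the stated rule only; gram_job_manager is a hypothetical name and not a Swift function.

```python
def gram_job_manager(name: str) -> str:
    """Apply the documented rule: lower-case the name, add "jobmanager-"."""
    return "jobmanager-" + name.lower()


for name in ("fork", "PBS", "LSF", "Condor", "SGE", "Slurm"):
    print(name, "->", gram_job_manager(name))
```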

4.7.3. SSH

Runs jobs through a Java implementation of the SSH protocol. This mechanism generally results in a higher throughput than using the command-line SSH tool since it can reduce the number of authentication operations by re-using connections.

General Configuration

URL required

yes

Job Manager

not used

Staging methods

swift, wrapper

Options

N/A

Job Options

N/A

Example
    site.example {
        execution {
            type: "ssh"
            url: "login.example.org"
        }
    }

4.7.4. SSH-CL

Uses the ssh command-line tool to run jobs.

General Configuration

URL required

yes

Job Manager

not used

Staging methods

swift, wrapper

Options

N/A

Job Options

N/A

Example
    site.example {
        execution {
            type: "ssh-cl"
            url: "login.example.org"
        }
    }

4.7.5. PBS

Submits jobs directly to a Torque/PBS queue.

General Configuration

URL required

no

Job Manager

not used

Staging methods

swift, wrapper

Options

N/A

Job Options
Name Type Default Value Description

count

Integer

1

Request this number of nodes for the job

ppn

Integer

1

Sets the number of Processes Per Node

depth

Integer

1

Only used if pbs.mpp is set to true. Sets the depth (number of OpenMP threads/cores to allocate for each process)

pbs.mpp

Boolean

false

If set to true, use the mpp versions of count, ppn, and depth: mppwidth, mppnppn, mppdepth respectively.

pbs.properties

String

-

If specified, this string will be passed verbatim to PBS inside the "#PBS -l" line.

project

String

-

A PBS project to associate the job with

queue

String

-

PBS queue to submit to

pbs.resource_list

String

-

WRITEME!

pbs.aprun

Boolean

false

If set to true, use the aprun tool instead of ssh to start jobs on the compute nodes. aprun is a tool typically found on Cray systems.

Example
    site.pbs {
        execution {
            type: "PBS"
        }

        app.sim {
            executable: "/usr/bin/sim"
            count: 2
            ppn: 2
            depth: 2
            pbs.mpp: true
            queue: "fast"
        }
    }

4.7.6. Condor

Submits jobs using the HTCondor system.

General Configuration

URL required

no

Job Manager

not used

Staging methods

swift, wrapper

Options

N/A

Job Options
Name Type Default Value Description

jobType

"MPI", "grid", "nonshared", none

none

Specifies the job type (Condor universe). "nonshared" translates to the "vanilla" universe.

holdIsFailure

Boolean

false

Treat jobs in the held state as failed.

count

Integer

1

Number of machines to request for the job

condor.*

Any

-

Can be used to pass arbitrary properties to Condor.

Example
    site.condor {
        execution {
            type: "Condor"
        }

        app.sim {
            executable: "/usr/bin/sim"
            condor.leave_in_queue: "TRUE"
        }
    }

4.7.7. Coasters

Coasters are a mechanism that packages multiple Swift application invocations into larger LRM jobs, resulting in increased efficiency when running multiple small applications. To distinguish between the application invocations and the jobs in which Coasters package them, the terms task and job are used, respectively.

General Configuration

URL required

maybe

Job Manager

"em1:em2", where em1 is an execution mechanism used to start the Coaster Service and em2 is an execution mechanism used by the Coaster Service to start jobs. If em1 requires an URL, then the URL is required.

Staging methods

swift, wrapper, local, service-local, shared-fs, direct

Options
Name Type Default Value Description

maxJobs

Integer

20

The maximum number of jobs that can be running at a time.

nodeGranularity

Integer

1

If specified, the number of nodes requested for each job will be a multiple of this number

tasksPerNode

Integer

1

The maximum number of concurrent tasks allowed to run on a node

allocationStepSize

[0.0, 1.0]

0.1

The Coaster service allocates jobs periodically depending on the number of tasks queued. This number can be used to limit the percentage of jobs out of maxJobs that will be used in each allocation step.

lowOverallocation

Positive float

10

Indicates how much bigger the job wall time should be in comparison to the task wall time for tasks that have a small wall time (around 1 second)

highOverallocation

Positive float

1

Indicates how much bigger the job wall time should be in comparison to the task wall time for tasks that have a very large wall time

overallocationDecayFactor

Positive float

1e-3

Used to interpolate the "overallocation" for task wall times that are neither very large nor very small. The formula used is jobWalltime = taskWalltime * ((L - H) * exp(-taskWalltime * D) + H), where L is lowOverallocation, H is highOverallocation, and D is overallocationDecayFactor.

spread

[0.0, 1.0]

0.9

When allocating jobs, the total number of nodes to allocate can be fixed based on, for example, maximizing parallelism for all the tasks. However, the way the nodes are distributed to individual jobs can be arbitrary. This parameter controls whether nodes should be uniformly distributed among jobs (spread = 0) or if the node distribution should be as diverse as possible (spread = 1). A high spread could be useful in fitting jobs better into a cluster’s schedule.

reserve

Integer (seconds)

60

The amount of time to add to each job’s wall time in order to prevent premature termination of tasks due to various overheads

maxNodesPerJob

Integer

1

The maximum number of nodes that a job is allowed to have.

maxJobTime

"HH:MM:SS"

-

The maximum wall time that a job is allowed to have

userHomeOverride

String

-

A path that can be used to override the default user home directory. This may be necessary on systems on which compute nodes do not have access to the default user home directory.

internalHostName

String

-

A host name or address that can be used to initiate connections from compute nodes to the login node. Specifying this is seldom necessary.

jobQueue

String

-

The LRM queue to submit the jobs to

jobProject

String

-

An LRM project to associate the job with

workerLoggingLevel

"ERROR", "WARN", "INFO", "DEBUG", "TRACE", or none

none

If specified, the Coaster Workers produce a log file

workerLoggingDirectory

String

"~/.globus/coasters"

The directory where the worker logs will be created. This directory needs to be accessible from compute nodes.

softImage

String

-

WRITEME!

Job Options

N/A

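Example
    site.coasters {
        execution {
            # The type value "coaster" and the "local:pbs" job manager are
            # illustrative assumptions; adjust them to your installation.
            type: "coaster"
            jobManager: "local:pbs"
            options {
                maxJobs: 20
                tasksPerNode: 1
                maxJobTime: "00:30:00"
                jobQueue: "fast"
            }
        }

        app.sim {
            executable: "/usr/bin/sim"
        }
    }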
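
The overallocation formula given for overallocationDecayFactor in the Coasters options can be evaluated as in the sketch below. The defaults mirror the documented defaults for lowOverallocation, highOverallocation, and overallocationDecayFactor; the function name is hypothetical, and wall times are assumed to be in seconds.

```python
import math


def job_walltime(task_walltime: float, low: float = 10.0,
                 high: float = 1.0, decay: float = 1e-3) -> float:
    """jobWalltime = taskWalltime * ((L - H) * exp(-taskWalltime * D) + H)."""
    factor = (low - high) * math.exp(-task_walltime * decay) + high
    return task_walltime * factor


# Short tasks are over-allocated by nearly lowOverallocation,
# long tasks by a factor approaching highOverallocation.
for seconds in (1, 60, 3600, 86400):
    print(f"task={seconds:>6}s -> job={job_walltime(seconds):.1f}s")
```

The reserve option is described separately as time added to each job's wall time, so it is deliberately left out of this calculation.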

5. Debugging

5.1. Retries

If an application procedure execution fails, Swift will retry it until it succeeds or until the limit defined by the executionRetries configuration property is reached.

Site selection will occur for retried jobs in the same way that it happens for new jobs. Retried jobs may run on the same site or may run on a different site.

If the retry limit executionRetries is reached for an application procedure, then that application procedure will fail. This will cause the entire run to fail, either immediately (if the lazyErrors property is false) or after all other possible work has been attempted (if the lazyErrors property is true).

With or without lazy errors, each app is retried executionRetries times before it is considered failed for good. An app that has failed but still has retries left will appear as "Failed but can retry".
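
The retry accounting can be sketched as follows, assuming that the first attempt does not count as a retry, so a retry limit of N allows up to 1 + N attempts. The function and parameter names are hypothetical and the invoke callable stands in for an application invocation.

```python
from typing import Callable, Tuple


def run_with_retries(invoke: Callable[[], bool],
                     execution_retries: int = 0) -> Tuple[bool, int]:
    """Call invoke() until it succeeds or 1 + execution_retries attempts are used.

    Returns (succeeded, attempts_made).
    """
    attempts = 0
    for _ in range(1 + execution_retries):
        attempts += 1
        if invoke():
            return True, attempts
    return False, attempts


# An invocation that fails once and then succeeds, with two retries allowed.
outcomes = iter([False, True])
print(run_with_retries(lambda: next(outcomes), execution_retries=2))
```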

Without lazy errors, once the first (time-wise) app has run out of retries, the whole run is stopped and the error reported.

With lazy errors, if an app fails after all retries, its outputs are marked as failed. All apps that depend on failed outputs will also fail and their outputs marked as failed. All apps that have non-failed outputs will continue to run normally until everything that can proceed completes.

For example, if you have:

foreach x in [1:1024] {
   app(x);
}

If the first started app fails, all the other ones can still continue, and if they don’t otherwise fail, the run will only terminate once all of the remaining 1023 have completed.

So basically the idea behind lazy errors is to run EVERYTHING that can safely be run before stopping.

Some types of errors (such as internal swift errors happening in an app thread) will still stop the run immediately even in lazy errors mode. But we all know there are no such things as internal swift errors :)
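
The way failures propagate in lazy errors mode can be sketched as a walk over a dependency graph: an app is blocked if it failed or if it consumes the output of a blocked app, and everything else may still run. The data layout and function name below are hypothetical and only illustrate the rule described in this section.

```python
def runnable_after_failures(deps, failed):
    """Return the apps that can still run.

    deps maps each app to the set of apps whose outputs it consumes;
    failed is the set of apps that failed after exhausting their retries.
    """
    blocked = set(failed)
    changed = True
    while changed:                      # repeat until no new app gets blocked
        changed = False
        for app, inputs in deps.items():
            if app not in blocked and inputs & blocked:
                blocked.add(app)
                changed = True
    return set(deps) - blocked


deps = {"a": set(), "b": {"a"}, "c": {"b"}, "d": set(), "e": {"d"}}
print(sorted(runnable_after_failures(deps, failed={"a"})))
```

Here the failure of a blocks b and c, which depend on it directly or indirectly, while d and e are unaffected and continue to run.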

5.2. Restarts

If a run fails, Swift can resume the program from the point of failure. When a run fails, a restart log file called restart.log will be left behind in the run directory. This restart log can then be passed to a subsequent Swift invocation using the -resume parameter. Swift will resume execution, avoiding execution of invocations that have previously completed successfully. The Swift source file and input data files should not be modified between runs.

Normally, if the run completes successfully, the restart log file is deleted. If, however, the workflow fails, Swift can use the restart log file to continue execution from a point before the failure occurred. In order to restart from a restart log file, the -resume logfile argument can be used before the Swift script file name. Example:

$ swift -resume runNNN/restart.log example.swift

5.3. Monitoring Swift

Swift runs can be monitored for progress and resource usage. There are three basic monitors available: Swing, TUI, and HTTP.

5.3.1. HTTP Monitor

The HTTP monitor will allow for the monitoring of Swift via a web browser. To start the HTTP monitor, run Swift with the -ui http:<port> command line option. For example:

swift -ui http:8000 modis.swift

This will create a server running on port 8000 on the machine where Swift is running. Point your web browser to http://<ip_address>:8000 to view progress.

5.3.2. Swing Monitor

The Swing monitor displays information via a Java GUI/X window. To start the Swing monitor, run Swift with the -ui Swing command line option. For example:

swift -ui Swing modis.swift

This will produce a GUI/X window consisting of the following tabs:

  • Summary

  • Graphs

  • Applications

  • Tasks

  • Gantt Chart

5.3.3. TUI Monitor

The TUI (textual user interface) monitor is one option for monitoring Swift on the console using a curses-like library.

The progress of a Swift run can be monitored using the -ui TUI option. For example:

swift -ui TUI modis.swift

This will produce a textual user interface with multiple tabs, each showing the following features of the current Swift run:

  • A summary view showing task status

  • An apps tab

  • A jobs tab

  • A transfer tab

  • A scheduler tab

  • A Task statistics tab

  • A customized tab called Ben’s View

Navigation between these tabs can be done using the function keys F2 through F8.

5.4. Log analysis

Swift logs can contain a lot of information. Swift includes a utility called "swiftlog" that analyzes the log and prints a nicely formatted summary of all tasks of a given run.

swiftlog usage
$ swiftlog run027
Task 1
        App name = cat
        Command line arguments = data.txt data2.txt
        Host = westmere
        Start time = 17:09:59,607+0000
        Stop time = 17:10:22,962+0000
        Work directory = catsn-run027/jobs/r/cat-r6pxt6kl
        Staged in files = file://localhost/data.txt file://localhost/data2.txt
        Staged out files = catsn.0004.outcatsn.0004.err

Task 2
        App name = cat
        Command line arguments = data.txt data2.txt
        Host = westmere
        Start time = 17:09:59,607+0000
        Stop time = 17:10:22,965+0000
        Work directory = catsn-run027/jobs/q/cat-q6pxt6kl
        Staged in files = file://localhost/data.txt file://localhost/data2.txt
        Staged out files = catsn.0010.outcatsn.0010.err
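
Because the swiftlog summary uses a regular "Key = value" layout, it is easy to post-process. The sketch below parses output shaped like the listing above and computes each task's run time; it assumes the layout shown here, is not part of Swift, and does not handle tasks that run past midnight. The function names and the shortened sample text are hypothetical.

```python
from datetime import datetime

SAMPLE = """\
Task 1
        App name = cat
        Host = westmere
        Start time = 17:09:59,607+0000
        Stop time = 17:10:22,962+0000
"""


def parse_swiftlog(text):
    """Turn a swiftlog summary into a list of dicts, one per task."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Task "):
            tasks.append({"Task": line.split()[1]})
        elif " = " in line and tasks:
            key, value = line.split(" = ", 1)
            tasks[-1][key] = value
    return tasks


def duration_seconds(task):
    """Elapsed time between the task's start and stop time stamps."""
    fmt = "%H:%M:%S,%f%z"
    start = datetime.strptime(task["Start time"], fmt)
    stop = datetime.strptime(task["Stop time"], fmt)
    return (stop - start).total_seconds()


for task in parse_swiftlog(SAMPLE):
    print(task["Task"], task["App name"], duration_seconds(task))
```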