:stem: asciimath

The Swift Runtime
-----------------

Swift is a deterministic dataflow language. The lexical ordering of statements 
is generally irrelevant. What is relevant are the *dependencies* between data.

Values as Futures
~~~~~~~~~~~~~~~~~

Each <<language:values, value>> in Swift is a *future*. A future *wraps* a 
*concrete value* and has two possible states:

Open:: (or unbound). This is the default initial state of a value. The concrete
value is absent and cannot yet be used in a concrete operation. It will be 
available at a later time.
Closed:: (or bound). This is a state in which a concrete value is available and 
can be used in an operation.

If a value is open at some time, it can be closed only at a later time. It is
not possible for a variable to become open after it was closed.

Value literals are represented with futures that are closed when a program starts execution.

[[runtime:order-of-operations]]
Order of Operations
~~~~~~~~~~~~~~~~~~~

Independent operations in Swift are all executed in parallel. However, 
operations can depend on values which must be closed before the respective
operations can be executed. Consider the following example:

[listing, swift]
----
int a = 1;
int b = 2;

int c = f(a) + g(b);
----

The following operations can be identified:

 * three assignments, for variables +a+, +b+ and +c+
 * two function invocations: +f(a)+ and +g(b)+
 * an addition operation
 
All these operations are started in parallel as soon as Swift starts running the
program. The assignments to variables +a+ and +b+ can continue immediately since
they depend only on integer literals, which are closed by default. The 
invocations of +f+ and +g+ can then continue. The addition has to wait for the
results from the invocations of both +f+ and +g+. When those results are 
available, the addition can be peformed and the resulting value can be finally
assigned to +c+.

Types of Operations
~~~~~~~~~~~~~~~~~~~

Technically speaking, many things can be considered ``operations''. However, it
is worth emphasizing some of them due to the particular way in which they are
executed.

Assignments:: The <<language:assignment-statement, assignment statement>> waits 
for the right hand side to be closed, copies the concrete value from the right 
hand side to the left hand side, and finally closes the left hand side.

TIP: See also: 
<<runtime:assignment-of-mapped-values, assignment of mapped values>>

Application Functions:: An application instance will only run after all of its 
actual parameters are closed. After an application invocation completes, all 
the actual return parameters are closed by Swift.


Function Invocations:: When a non-application function is invoked, it does not
necessarily wait for all of its actual parameters to be closed. The details
depend on the actual definition of the function. Consider the following example:
+
[listing, swift]
----
(int result) product(int f1, int f2) {
    if (f1 == 0) {
        result = 0;
    }
    else {
        result = f1 * f2;
    }
}

int r1 = product(0, x);
int r2 = product(2, x);
----
+
Two cases exist:
+
. In the first invocation of the +product+ function, the parameter +f1+ is zero. 
The first branch of the +if+ condition is taken, which does not include an 
operation that waits for the value of +f2+ to be closed.
. In the second invocation, the value of +f1+ is non-zero. The second branch
of the +if+ is taken. This branch contains an operator that has the values of
both +f1+ and +f2+ as parameters. In this case the +product+ function needs to
wait for +f2+ to be closed before producing a result.

+
For functions defined in the Swift <<runtime:library, standard library>>, it is
implied that the implementation will always wait for the value of the parameters
to be closed. Any exceptions to this rule are explicitly documented.
+
Operators:: There is no semantic difference between operator invocations and 
library function invocations, so the above rule applies. In particular, the
current implementation does not implement shortcut evaluation for boolean 
operators.

[[runtime:arrays-and-iterations]]
Arrays and Iterations
~~~~~~~~~~~~~~~~~~~~~

Arrays in Swift are sparse. This means that array sizes, in general, can only be
fully determined at run-time. Because of this, the array structure itself has 
the properties of a future. Consider the following example:

[listing, swift]
----
int[] a, b;

a[0] = 1;

iterate i {
    int value = someComplexFunction(i);
    a[i + 1] = value;
} until (value > 100);

foreach x, i in a {
    b[i] = f(x);
}
----

Since +iterate+ and +foreach+ run in parallel, the +foreach+ will start before
+a+ is fully constructed. We assume that the +iterate+ condition 
eventually becomes +true+. In order for the program to terminate and 
function deterministically, the +foreach+ must eventually terminate and it must
not do so before all the items have been added to +a+. Arrays must therefore
have two states:

Open array:: In this state it is unknown whether more items will be added to an 
array or not.
Closed array:: Both the size of the array and the indices corresponding to all 
the items in the array are known.

A +foreach+ statement will start, in parallel, an iteration for each item 
available in an array. As more items are added to the array, +foreach+ will 
start the corresponding iterations. When the array becomes closed and all 
iterations complete, +foreach+ will complete.

When are Arrays Closed
^^^^^^^^^^^^^^^^^^^^^^

The determination of when an array can be closed is made based on analyzing all
source code locations in which that array is written to. In the last example 
above, all writing operations to +a+ are done within the +iterate+ statement. 
The Swift compiler will generate code to close +a+ as soon as the +iterate+
statement completes.

Writing to an Array that is Being Iterated on
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Swift allows writing to an array that is being iterated on using the +foreach+
statement:

[listing, swift]
----
int[] a;

a[0] = 1;

foreach x, i in a {
    int value = someComplexFunction(i);
    if (value <= 100) {
        a[i + 1] = value;
    }
}
----

Swift handles array closing slightly differently in the case of 
``self referencing'' +foreach+ statements. The +a+ array will be closed when 
both the following conditions are true:

. All writing operations to +a+ outside of the +foreach+ have completed
. Iterations for all available items in +a+ have completed

This can be used to implement everything that can be implemented using the
+iterate+ statement, possibly in a more clear fashion. Users are encouraged
to use +foreach+ instead of +iterate+.

Circular Dependencies
~~~~~~~~~~~~~~~~~~~~~

Circular dependencies are situations in which two or more values form a 
dependency cycle through operations applied on them. The simplest case is that
of two mutually dependent values:

[listing, swift]
----
int a, b;

a = f(b);
b = f(a);
----

The variable +a+ cannot be closed until +b+ is closed which in turn cannot be
closed until +a+ is closed. Situations like this will cause a Swift program
to stop progressing. This is detected by the Swift run-time and results in a
run-time error.

While the above situation can theoretically be detected at compile-time, this 
cannot be done in all cases. Consider the following example:

[listing, swift]
----
int[] a;

a[0] = 1;

a[1] = a[getIndex(1)];
a[2] = a[getIndex(2)];
----

It is possible, but not necessary, that at run-time the above code will
result in:

[listing, swift]
----
int[] a;

a[0] = 1;

a[1] = a[2];
a[2] = a[1];
----

This can only be determined by knowing the values returned by the 
+getIndex()+ function, which in general can only be done at run-time.

Error Handling
~~~~~~~~~~~~~~

TODO

[[runtime:file-mapping]]
File Mapping
~~~~~~~~~~~~

File mapping is the process through which values stored in Swift variables of
<<language:mapped-types, mapped types>> are associated with physical files. This 
process is implemented by *mappers* which are specified using 
<<language:mapping-declarations, mapping declarations>>. A non-composite mapped
value can be associated to a file using the 
<<mappers:SingleFileMapper, SingleFileMapper>> or its short form:

[listing, swift]
----
file f <"file.dat">;
----

The above code associates the variable +f+ to a local file named +file.dat+ 
which is assumed to be in the directory from which Swift is invoked. It is 
possible to more complex path names or even *URLs* to associate a Swift variable
with files that do not necessarily reside in the current directory:

[listing, swift]
----
file f1 <"/tmp/file.dat">;

file f2 <"http://example.org/index.html">;
----

For a list of all supported remote file access methods, please see 
<<table-filesystem-providers, Filesystem Providers>>.

Values of composite types containing file-valued data can be mapped in bulk 
using one of the additional <<library:mappers, mappers>> provided by Swift. 
For example, the <<mappers:FilesysMapper, FilesysMapper>> can be used
to glob files in a directory and map them to an entire array:

[listing, swift]
----
file[] a <FilesysMapper; location = ".", pattern = "*.dat">;
----

[[runtime:file-mapping:implicit-mapping]]
Implicit Mapping
^^^^^^^^^^^^^^^^

A mapped type value that is not mapped explicitly is *implicitly mapped*. 
Specifically Swift will map it to a deterministic but opaque temporary file.

Input or Output
^^^^^^^^^^^^^^^

Swift distinguishes between input and output mapped data based on whether 
explicit assignments are present in the program. If a variable is assigned to 
in a Swift program, Swift considers that variable to be an output. Otherwise, 
Swift marks it as an input. When a variable is marked as an input, Swift 
requires that the corresponding files be present, unless the variable is an 
array that can be empty. An input mapped variable is considered to be 
implicitly assigned. Example:

[listing, swift]
----
file f <"input.dat">; <1>
file g <"output.dat">; <2>

g = cat(f);
----
<1> Variable +f+ is not explicitly assigned to. It is therefore an input 
variable, and Swift implicitly assigns a file value representing the +input.dat+
file to it. It is an error for +input.dat+ not to exist as a file.
<2> Variable +g+ is explicitly assigned to. It is therefore an output and it 
will be created by Swift during the program's execution.

It does not make sense for certain mappers to be used for output (such as the 
<<mappers:FilesysMapper, FilesysMapper>>), since their operation depends
on having a set of physical files present.

[[runtime:assignment-of-mapped-values]]
Assignment of Mapped Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a mapped value is assigned to a mapped type variable, and the destination
variable is not <<runtime:file-mapping:implicit-mapping, implicitly mapped>>,
Swift guarantees that the file that the destination variable is mapped to will
exist. This is generally done by copying the file. However, Swift may optimize
this when possible, in particular by creating symbolic links on systems that
support it.

[[runtime:application-functions-execution]]
Application Functions Execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Applications in Swift are generally executed on remote computing resources. In 
the Swift language, applications functions are specified in a way that is 
independent of where and how applications are run, which allows application 
instances to be scheduled efficiently based on resource availability. When
an application function is invoked, the following steps are taken:

. Swift waits for all the actual parameters to the application function to be
closed
. *Site selection*: a remote resource is selected from all available resources 
based on whether they contain the given application, load, and other run-time information (see Section on <<runtime:site-selection, Site Selection>>).
. A file sandbox is created, where the application input and output files will
be stored
. All files corresponding to mapped values in the actual parameters are copied
to the application sandbox ("stage-in")
. The application is executed remotely using one of the available 
<<configuration:execution-mechanisms, execution mechanisms>>
. All files corresponding to mapped values in the actual return parameters are
copied back from the application sandbox, and the actual return parameters are
closed ("stage-out")
. The application sandbox is deleted

The application execution is performed indirectly. A small script, called 
"wrapper" (or +_swiftwrap+), is used to implement the following functions:

- set up the details of the application sandbox, such as directory structures 
where input and output files go
- perform basic tests of the environment and try to generate user-friendly 
error messages if something is wrong
- copy or link files from a resource-local swift cache to the application 
sandbox and back
- optionally clean up the application sandbox
- record the status of the application execution
- log various application timing information

There are three ways in which the above operations can be performed and they
will be explained in detail in the following sections: *swift staging*, 
*provider staging*, and *wrapper staging*.

Swift Staging
^^^^^^^^^^^^^

Swift staging is historically the first mechanism used by Swift to deal with
remote application files. In swift staging mode (selected through the 
`staging: "swift"` <<configuration, configuration>> option), for each 
job, the Swift runtime selects a site as described in 
<<runtime:site-selection, Site Selection>>, and, using that site, it performs
the following operations in order:

. Using the <<table-filesystem-providers, filesystem provider>> specified in 
the configuration file, it creates a *shared directory* in the location 
specified by the `workDirectory` site property that will mirror the
local directory structure with respect to all files used by applications running
on that site. This directory is only created once per site per swift program
execution, and it is initially empty. It is guaranteed that two invocations of
swift programs will have different shared directories on a given site.
. In the shared directory, Swift creates the directory structure needed by the 
job's input and output files. For example, if an application uses the following
input:
+
[listing, swift]
----
file f <"data/f.dat">;
----
+
then Swift will create a directory named `data` inside the site shared 
directory.
+
. Using the filesystem provider, swift copies all the application's input files
into their respective directories inside the shared directory. Files that are
already in the shared directory are skipped.
. The application is invoked through `_swiftwrap`. The essential steps taken by 
`_swiftwrap` in Swift staging mode are as follows:
.. Create a sandbox directory either inside the shared directory, or in a
temporary location if the `scratch` property is specified for the site
.. For each of the application's input files, either copy the file from the
shared directory into the sandbox directory or create a symbolic link inside the
sandbox directory to the file in the shared directory. The choice of whether to
copy or link is determined by existence of the `scratch` property. If the 
property is defined for the site, then the files will be copied. The copying
process preserves the directory structure.
.. Run the application
.. If the application returns a zero exit code indicating success, then:
... Check that all the output files were created by the application and fail if
not
... Move the output files from the sandbox directory back to the shared 
directory preserving the directory structure
... Remove the job directory
.. Exit signalling success or failure; the exact method of signalling depends on
the value of the `statusMode` property. If set to `provider`, then `_swiftwrap` 
exits with an exit code equal to the exit code of the application. If set to
`files`, then swift creates either an empty success file or a failure file 
containing the application exit code
. Transfer and check the status files for the job status if `statusMode` is 
`files`
. Copy all application output files back to the machine that Swift is running
on

For example, consider the following Swift program:

[listing, swift]
----
app (file outf) cat(file inf) {
    "/bin/cat" filename(inf) stdout = filename(outf);
}

file inf <"inputs/a.dat">;
file outf <"outputs/b.dat">;

outf = cat(inf);
----

and the following configuration:

[listing, swiftconf]
----
site.cluster {
    execution {
        type: "GRAM"
        url: "login.cluster.example.org"
        jobManager: "PBS"
    }
    
    filesystem {
        type: "GSIFTP"
        url: "login.cluster.example.org"
    }
    
    statusMode: "provider"
    staging: "swift"
    workDirectory: "/homes/johndoe/swiftwork"
}

sites: [cluster]
----

Swift would execute the `cat` application as follows:

image:swift-staging.svg[]

A few observations are in order:

- `statusMode: "files"` was historically used to deal with execution providers
that did not have a mechanism of reporting the application exit code. The most
notable example is the early implementation of the GRAM protocol from the 
Globus Toolkit version 2. It is unlikely to be needed.
- when running on compute clusters, the shared directory and therefore the
work directory needs to reside on a shared filesystem that is accessible from 
the compute nodes.
- transfers and other file operations are all controlled directly by Swift and
are governed by <<configuration:throttling, throttling>> settings, such as 
`fileTransfersThrottle` and `fileOperationsThrottle`. Throttling limits the 
number of concurrent operations and is useful to increase stability without
affecting performance.
- use of a `scratch` option pointing to a compute-node local filesystem can 
yield better performance if the work directory resides on a shared filesystem 
that has high latencies and high throughput. The performance improvement also 
depends on how the application accesses its input files and is more noticeable 
if the application uses a random-access pattern on its input files rather than 
sequential reads.
- while Swift staging is mostly superseded by provider staging through Coasters,
there are still legitimate reasons to use Swift staging, such as running MPI
applications through GRAM and a local resource manager (e.g. PBS).

Provider Staging
^^^^^^^^^^^^^^^^

With provider staging, Swift delegates the task of transferring files to the 
compute nodes to the execution provider. The full functionality needed to manage
all the file operations needed by a job are currently only supported by 
the <<configuration:coasters, Coasters>> execution provider, so provider 
staging requires the use of Coasters.

Since the provider takes care of shipping files to the compute nodes, there is
no strict need for a shared filesystem on the cluster except as needed by 
<<configuration:coasters, Coasters>>. Files on the Swift side, as well
as the job sandboxes, can reside on directly-attached disks. This can improve 
performance with respect to shared filesystems that would have non-trivial
latencies.

Wrapper staging is enabled by specifying one of `local`, `service-local`, 
`shared-fs` or `direct` choices to the <<configuration:staging, staging>> 
site configuration property. The meaning of the various choices is as follows:

`local`:: files are assumed to reside on the machine where Swift is running. 
Coasters will copy the files using the Coaster Service as a proxy. This allows
files to be copied even if the compute nodes cannot reach networks outside of 
the cluster they belong to.
`service-local`:: files are assumed to reside on the machine where Swift is
running and the Coaster Service is assumed to be running on the same machine.
`shared-fs`:: files are assumed to reside on a shared filesystem that is 
accessible by the compute nodes and are copied using standard POSIX copying
routines.
`direct`:: this is an experimental mode similar to `shared-fs` in which no
copying to a sandbox is actually done. Instead, applications are passed the 
absolute paths to the files involved.


Using the same example application as in the case of Swift staging, an example
set of operations performed by Swift with `staging: local` is shown below:

image:provider-staging.svg[]

Wrapper Staging
^^^^^^^^^^^^^^^

Wrapper staging is an experimental feature that allows `_swiftwrap` to perform
the necessary file staging operations. It generally requires a shared 
filesystem accessible from the compute nodes.

[[runtime:data-flow-and-staging]]
Data flow and staging
~~~~~~~~~~~~~~~~~~~~~

Overview
^^^^^^^^

The execution components involved in a Swift workflow are the client, the swift
service and the workers. The client is the program that executes the workflow
described in a swift script and is invoked by the swift command. The service
may be started separately or automatically by the swift client and is responsible
for provisioning resources from clouds, clusters or HPC systems. The workers are
launched by the swift-service and are responsible for controlling the execution
of the user's application on the compute nodes.

Different clusters, HPC systems, and cloud vendors may have shared file-systems,
varying network characteristics and local-disks available which can be utilized
differently to marshal data efficiently within a workflow. Data flow refers to
this movement of data within a workflow. On distributed systems with varying
levels of shared resources, the Swift client and service coordinates the flow of
data among the worker-nodes such that the data required for computation is available
to the worker prior to the execution of the users's application as well as ensuring
that the computed results are captured once tasks run to completion.

There are 6 different staging methods that are supported by Swift. They are:

 * Local
 * Direct
 * Wrapper
 * Swift
 * Shared-fs
 * Service-local

These staging methods are explained in detail in the following sections.

Legend for the following sections:
image:figs/legend.png["Legend"]

Staging method : local
^^^^^^^^^^^^^^^^^^^^^^

Data flow in local staging:
image:figs/local.png["Local staging"]

Summary
+++++++

The local staging method is designed for shared-nothing architectures such as
clusters and clouds with no shared file-systems. The data originates on the
node where the client runs and all data transfers are done explicity over the network.
This method avoids using a shared-filesystem to transfer the files over the network,
as in many cases the shared-filesystem is a shared among multiple users which adds
congestion and it also could just be unsuitable for certain file access patterns.
The client and service need not be on the same machine, which allows a client running on a local workstation to channel
data through a service on the headnode of a Cluster1 to compute nodes provisioned
from Cluster1. The is the default file staging method as this works on all computational
resources. Since all the data is transferred via the swift-service the network
bandwidth of the service could bottleneck the data flow. Similarly if the swift
client is running remotely, the network links between the client and the service
could potentially become a bottleneck for large volumes of data.

When to use this mode
+++++++++++++++++++++

The data volumes that need to be transferred to and from the workers to the client per worker
are not more that hundreds of MB. As data sizes approach GBs of data per task, other transport
mechanisms such as Globus transfers are worth considering.

When each task either consumes or generates a large number of small files, shared-filesystem
based copies can be very slow. The local staging method is an ideal candidate for this scenario.
However, when there are large number of files involved the filesystem of the system on which
the client is executing could become a bottleneck. Using a faster non-disk filesystem when available
generally improves performance significantly.


Example configs
+++++++++++++++

[listing,swiftconf]
-----
sites: midway
site.midway {
    execution {
        type: "coaster"
        URL: "swift.rcc.uchicago.edu"
        jobManager: "ssh-cl:slurm"  # Client connects remotely to the login node.
        options {
            nodeGranularity: 1
            maxNodesPerJob: 1
            jobQueue: "sandyb"
            maxJobs: 1
            tasksPerNode: 1
            maxJobTime: "00:08:20"
        }
    }
    staging: "local"
    workDirectory: "/tmp/"${env.USER}
    app.date {
        executable: "/bin/date"
        maxWallTime: "00:05:00"
    }
}
-----


Performance
+++++++++++

All data-flow is over the network links from the client node and service node in this staging method
and as a result, the network capacity of the client node is a potential limiting factor for large data
volumes.

When several small files are involved, or with sufficiently large files, the
filesystem on the client node can become a bottleneck.

There are performance limitations to the the staging/transport mechanism that swift uses, which could
limit transfer throughputs. [TODO: Data to support this would be very nice]

Notes:
++++++

When running using local coasters (local instead of ssh-cl), the client and service run on the same node.
    In this case, the network links are between the service and workers.


Staging method : Direct
^^^^^^^^^^^^^^^^^^^^^^^

Data flow with Direct staging:
image:figs/direct.png["Direct staging"]

Data flow with Direct staging and a scratch directory:
image:figs/direct_with_scratch.png["Direct staging with scratch directory"]

Summary
+++++++

The direct staging mode is designed for computational resources with High-Performance shared-filesystems.
This mode requires that a shared filesystem such as NFS, Lustre, or even FUSE-mounted-S3
is mounted across the nodes where the client, service, and the workers are executing.
Instead of Swift managing network transfers, the network transfers are implicitly
managed by the shared-filesystem.
The apps run in sandbox directories created under the workDirectory, but the tasks
themselves will receive absolute paths for the input and output files. For applications
that are IO bound, writing directly to the shared-filesystem can adversely affect the
shared filesystem performance. To avoid this there is an option to specify a “scratch”
folder on a local disk on the compute nodes.

When to use this mode
+++++++++++++++++++++

Large volumes of data are either consumed or generated by the application and a High Performance
shared-filesystem is available across the nodes. On systems which have shared-filesystems, with
I/O bandwidth that exceeds the network links between the headnode and the worker nodes, using
the network to transfer data to the compute nodes could be sub-optimal.

When a high-performance shared filesystem is available, such as the case on many supercomputing
systems, there is sufficient I/O bandwidth to support several applications reading and writing
to the filesystem in parallel.

Another scenario is when the shared-filesystem is sensitive to creation and deletion
of small files and directories. The swift workers create a sandbox directory for each
task, which is (3 : TODO:confirm this with Mihael) levels deep. Using the direct mode
with the workDirectory on a local disk (say /tmp) could avoid the overhead from swift's
mechanisms for sandboxing tasks.

Example configs
+++++++++++++++

The following is an example for the direct staging mode.
* Staging method is set to “direct”.
* workDirectory may be set to the shared filesystem or a local filesystem.

In this case, Swift assumes that file variables point at files on the shared filesystem.
The apps which are executed on the workers resolve the file variables to absolute paths
to the input and output files on the shared-filesystem.

[listing,swiftconf]
-----
sites: midway

site.midway {
    execution {
        type: "coaster"
        URL: "swift.rcc.uchicago.edu"
        jobManager: "local:slurm"
        options {
            nodeGranularity: 1
            maxNodesPerJob: 1
            jobQueue: "sandyb"
            maxJobs: 1
            tasksPerNode: 1
            maxJobTime: "00:08:20"
        }
    }
    staging: direct
    workDirectory: "/tmp/"${env.USER}"/swiftwork"
    app.bash {
        executable: "/bin/bash"
        maxWallTime: "00:05:00"
    }
}
-----

The following is an example for the direct staging mode.

* Staging method is set to “direct”
* workDirectory may be set to the shared filesystem or a local filesystem.
* Scratch is set to a directory on the local disks of the workers.

Since the staging method is set to “direct”, swift will assume that file are on a shared file-system.
In the context of user-application the file variables will resolve to absolute paths of the input/output files on the scratch directory.
Before the workers start the execution of user tasks, the workers will copy the input files from the shared-filesystem to the
scratch directory, and after execution will copy out the output files from the scratch directory to the shared-filesystem.

[listing,swiftconf]
-----
sites: midway

site.midway {
    execution {
        type: "coaster"
        URL: "swift.rcc.uchicago.edu"
        jobManager: "local:slurm"
        options {
            nodeGranularity: 1
            maxNodesPerJob: 1
            jobQueue: "sandyb"
            maxJobs: 1
            tasksPerNode: 1
            maxJobTime: "00:08:20"
        }
    }
    staging: direct
    workDirectory: "/tmp/"${env.USER}"/swiftwork"
    scratch: "/scratch/local/"${env.USER}"/work/"
    app.bash {
        executable: "/bin/bash"
        maxWallTime: "00:05:00"
    }
}

TCPPortRange: "50000,51000"
lazyErrors: false
executionRetries: 0
keepSiteDir: true
providerStagingPinSwiftFiles: false
alwaysTransferWrapperLog: true
-----

Notes:
++++++

TODO : Details of the filename behavior in apps and within swiftscript body.

When this configuration is used, the worker copies the input files from the shared-filesystem to the scratch directory, and the user application will get the path to the file on scratch when the filename(<file_variable>) and it's shorthand @<file_variable> primitives are used in the app definition. The filename and @ primitives when used outside of the app definitions will point at the files on the shared-filesystem.


Performance
+++++++++++

"Direct" is theoretically the optimal way to use the shared-filesystem. There are no unnecessary copies, and the application
that requires the file alone access the data.

If the data access pattern of the application involves random seeks or creation of several intermediate small files, the "scratch"
option allows you to offload sub-optimal file access patterns to a local disk/memory. This avoids costly accesses on the
shared-filesystem and indirectly the network.

Staging method : Swift
^^^^^^^^^^^^^^^^^^^^^^

Data flow with staging method Swift:
image:figs/swift.png["Swift staging"]

Summary
+++++++

Swift staging, involves the client accessing file over a supported method like
ssh or a local-filesystem access, and making the inputs available to the workers
over a work-directory on a shared filesystem. This staging method uses
an intermediate staging location that is on a shared-FS so each files is,
in addition to being read from the initial location, written to and read from a
shared FS, both of each are overhead. The only advantage to this is that you don't
need coasters to use it and it's supported on a large number of computational resources.

This is the default staging mechanism used if no staging method is defined in the
swift config file.

When to use this mode
+++++++++++++++++++++

1. You can access data using one of the supported methods like:
    local filesystem access
    ssh - Use scp to access files
    GSIFTP
    GridFTP
2. A shared-FS is present, that works well for your data access patterns.
3. You want to use a non-coaster execution provider.


The GSIFTP and GridFTP are not actively tested, and are not guaranteed to work.

Example configs
+++++++++++++++

[listing,swiftconf]
-----
sites: midway

site.midway {
    execution {
        type: "coaster"
        URL: "swift.rcc.uchicago.edu"
        jobManager: "local:slurm"
        options {
            nodeGranularity: 1
            maxNodesPerJob: 1
            jobQueue: "sandyb"
            maxJobs: 1
            tasksPerNode: 1
            maxJobTime: "00:08:20"
        }
    }
    filesystem {
        type: "local"
        URL:  "localhost"
    }
    staging: direct
    workDirectory: "/scratch/midway/"${env.USER}"/swiftwork"
    app.bash {
        executable: "/bin/bash"
        maxWallTime: "00:05:00"
    }
}

TCPPortRange: "50000,51000"
lazyErrors: false
executionRetries: 0
keepSiteDir: true
providerStagingPinSwiftFiles: false
alwaysTransferWrapperLog: true
-----


Performance
+++++++++++

The Swift staging method uses an intermediate staging location that is on a shared FS,
each files is, in addition to being read from the initial location, written to and read
from a shared FS, both of each are overhead. The only advantage to this is that you
don't need coasters to use it and it's supported in a lot of cases.

Staging method : Wrapper
^^^^^^^^^^^^^^^^^^^^^^^^

image:figs/wrapper.png["Wrapper staging"]

Summary
+++++++

The wrapper staging method relies on a wrapper script used to stage files to and from the swift workers.
Currently the wrapper staging method supports fetching files over HTTP and between the client filesystem.
The wrapper staging method provides a flexible interface to add support for third party transfer mechanisms
to the swift worker.

TODO: Is *guc* supported ?

When to use this mode
+++++++++++++++++++++

The repository for the input or output data can be accessed over a supported tranfer mechanism.

The data can be accessed only by an exotic tranfer mechanism, which could be incorporated into
the supported methods for wrapper staging.

Example configs
+++++++++++++++

[listing,swiftconf]
-----
sites: midway

site.midway {
        execution {
                type: "coaster"
                jobManager: "local:local"
                URL: "localhost"
        }
        filesystem {
                type: local
        }
        staging: "wrapper"
        scratch: "/tmp/"${env.USER}"/swift-scratch"
        workDirectory: "swiftwork"

        app.ALL {
                executable: "*"
        }
}

wrapperStagingLocalServer: "file://"
-----


[[runtime:site-selection]]
Site Selection
~~~~~~~~~~~~~~

TODO