:stem: asciimath

The Swift Runtime
-----------------

Swift is a deterministic dataflow language. The lexical ordering of statements is generally irrelevant. What is relevant are the *dependencies* between data.

Values as Futures
~~~~~~~~~~~~~~~~~

Each <> in Swift is a *future*. A future *wraps* a *concrete value* and has two possible states:

Open:: (or unbound). This is the default initial state of a value. The concrete value is absent and cannot yet be used in a concrete operation. It will be available at a later time.

Closed:: (or bound). This is a state in which a concrete value is available and can be used in an operation.

If a value is open at some time, it can be closed only at a later time. It is not possible for a variable to become open after it was closed. Value literals are represented with futures that are closed when a program starts execution.

[[runtime:order-of-operations]]
Order of Operations
~~~~~~~~~~~~~~~~~~~

Independent operations in Swift are all executed in parallel. However, operations can depend on values which must be closed before the respective operations can be executed. Consider the following example:

[listing, swift]
----
int a = 1;
int b = 2;
int c = f(a) + g(b);
----

The following operations can be identified:

* three assignments, for variables +a+, +b+ and +c+
* two function invocations: +f(a)+ and +g(b)+
* an addition operation

All these operations are started in parallel as soon as Swift starts running the program. The assignments to variables +a+ and +b+ can continue immediately since they depend only on integer literals, which are closed by default. The invocations of +f+ and +g+ can then continue. The addition has to wait for the results from the invocations of both +f+ and +g+. When those results are available, the addition can be performed and the resulting value can finally be assigned to +c+.

Types of Operations
~~~~~~~~~~~~~~~~~~~

Technically speaking, many things can be considered ``operations''.
However, it is worth emphasizing some of them due to the particular way in which they are executed.

Assignments:: The <> waits for the right-hand side to be closed, copies the concrete value from the right-hand side to the left-hand side, and finally closes the left-hand side.

TIP: See also: <>

Application Functions:: An application instance will only run after all of its actual parameters are closed. After an application invocation completes, all the actual return parameters are closed by Swift.

Function Invocations:: When a non-application function is invoked, it does not necessarily wait for all of its actual parameters to be closed. The details depend on the actual definition of the function. Consider the following example:
+
[listing, swift]
----
(int result) product(int f1, int f2) {
    if (f1 == 0) {
        result = 0;
    }
    else {
        result = f1 * f2;
    }
}

int r1 = product(0, x);
int r2 = product(2, x);
----
+
Two cases exist:
+
. In the first invocation of the +product+ function, the parameter +f1+ is zero. The first branch of the +if+ condition is taken, which does not include an operation that waits for the value of +f2+ to be closed.
. In the second invocation, the value of +f1+ is non-zero. The second branch of the +if+ is taken. This branch contains an operator that has the values of both +f1+ and +f2+ as parameters. In this case the +product+ function needs to wait for +f2+ to be closed before producing a result.
+
For functions defined in the Swift <>, it is implied that the implementation will always wait for the value of the parameters to be closed. Any exceptions to this rule are explicitly documented.
+

Operators:: There is no semantic difference between operator invocations and library function invocations, so the above rule applies. In particular, the current implementation does not implement short-circuit evaluation for boolean operators.

[[runtime:arrays-and-iterations]]
Arrays and Iterations
~~~~~~~~~~~~~~~~~~~~~

Arrays in Swift are sparse.
This means that array sizes, in general, can only be fully determined at run-time. Because of this, the array structure itself has the properties of a future. Consider the following example:

[listing, swift]
----
int[] a, b;

a[0] = 1;

iterate i {
    int value = someComplexFunction(i);
    a[i + 1] = value;
} until (value > 100);

foreach x, i in a {
    b[i] = f(x);
}
----

Since +iterate+ and +foreach+ run in parallel, the +foreach+ will start before +a+ is fully constructed. We assume that the +iterate+ condition eventually becomes +true+. In order for the program to terminate and function deterministically, the +foreach+ must eventually terminate and it must not do so before all the items have been added to +a+. Arrays must therefore have two states:

Open array:: In this state it is unknown whether more items will be added to an array or not.

Closed array:: Both the size of the array and the indices corresponding to all the items in the array are known.

A +foreach+ statement will start, in parallel, an iteration for each item available in an array. As more items are added to the array, +foreach+ will start the corresponding iterations. When the array becomes closed and all iterations complete, +foreach+ will complete.

When are Arrays Closed
^^^^^^^^^^^^^^^^^^^^^^

The determination of when an array can be closed is made based on analyzing all source code locations in which that array is written to. In the last example above, all writing operations to +a+ are done within the +iterate+ statement. The Swift compiler will generate code to close +a+ as soon as the +iterate+ statement completes.
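
The open and closed array states can be illustrated outside of Swift with a small model. The following Python sketch is not part of Swift and does not reflect how the Swift run-time is implemented. The names +SwiftArray+, +put+, +subscribe+ and the stand-in for +f+ are hypothetical, and iterations run synchronously rather than in parallel. It shows only the ordering guarantee described above: +foreach+ starts an iteration for each item as it arrives and completes only after the array has been closed.

```python
class SwiftArray:
    """Toy model of a Swift array: sparse, and open until explicitly closed."""

    def __init__(self):
        self.items = {}        # index -> concrete value
        self.closed = False
        self._on_item = []     # listeners notified of every new item
        self._on_close = []    # listeners notified once, when the array closes

    def put(self, index, value):
        if self.closed:
            raise RuntimeError("array is closed")
        if index in self.items:
            raise RuntimeError(f"index {index} is already assigned")
        self.items[index] = value
        for listener in list(self._on_item):
            listener(index, value)

    def close(self):
        if not self.closed:
            self.closed = True
            for listener in self._on_close:
                listener()

    def subscribe(self, on_item, on_close):
        # Replay the items that are already available, then wait for more.
        for index, value in list(self.items.items()):
            on_item(index, value)
        if self.closed:
            on_close()
        else:
            self._on_item.append(on_item)
            self._on_close.append(on_close)


def foreach(source, body, on_complete):
    """Call body(value, index) per item; call on_complete after source closes."""
    source.subscribe(lambda index, value: body(value, index), on_complete)


a, b = SwiftArray(), SwiftArray()

# Models: foreach x, i in a { b[i] = f(x); }  with a stand-in f(x) = 10 * x.
# b is written only by this foreach, so it is closed when the foreach completes.
foreach(a, lambda x, i: b.put(i, 10 * x), b.close)

a.put(0, 1)                      # a[0] = 1
for i, value in enumerate([7, 150]):
    a.put(i + 1, value)          # writes made by the iterate statement
still_open = not b.closed        # foreach cannot complete while a is open
a.close()                        # all writers of a have completed

print(b.items, still_open, b.closed)
```

In this model +b+ stays open for as long as +a+ is open, even though every item written so far has already been processed. Only the final +a.close()+ call, which stands in for the code the compiler generates after the +iterate+ statement, lets the +foreach+ complete and close +b+.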

Writing to an Array that is Being Iterated on
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Swift allows writing to an array that is being iterated on using the +foreach+ statement:

[listing, swift]
----
int[] a;

a[0] = 1;

foreach x, i in a {
    int value = someComplexFunction(i);
    if (value <= 100) {
        a[i + 1] = value;
    }
}
----

Swift handles array closing slightly differently in the case of ``self-referencing'' +foreach+ statements. The +a+ array will be closed when both of the following conditions are true:

. All writing operations to +a+ outside of the +foreach+ have completed
. Iterations for all available items in +a+ have completed

This can be used to implement everything that can be implemented using the +iterate+ statement, possibly more clearly. Users are encouraged to use +foreach+ instead of +iterate+.

Circular Dependencies
~~~~~~~~~~~~~~~~~~~~~

Circular dependencies are situations in which two or more values form a dependency cycle through operations applied on them. The simplest case is that of two mutually dependent values:

[listing, swift]
----
int a, b;

a = f(b);
b = f(a);
----

The variable +a+ cannot be closed until +b+ is closed, which in turn cannot be closed until +a+ is closed. Situations like this will cause a Swift program to stop progressing. This is detected by the Swift run-time and results in a run-time error.

While the above situation can theoretically be detected at compile-time, this cannot be done in all cases. Consider the following example:

[listing, swift]
----
int[] a;

a[0] = 1;
a[1] = a[getIndex(1)];
a[2] = a[getIndex(2)];
----

It is possible, but not necessary, that at run-time the above code will result in:

[listing, swift]
----
int[] a;

a[0] = 1;
a[1] = a[2];
a[2] = a[1];
----

This can only be determined by knowing the values returned by the +getIndex()+ function, which in general can only be done at run-time.
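
The effect of a run-time dependency cycle can be reproduced with a small scheduler model. The Python sketch below is an illustration only: the +run+ and +build+ helpers are hypothetical, this document does not describe how the Swift run-time actually detects a lack of progress, and +get_index+ is evaluated eagerly here, whereas in Swift its result would itself be a future. Each operation closes its target once all of its dependencies are closed. If operations remain but none of them is ready, execution cannot progress and an error is raised.

```python
def run(operations, initial=None):
    """Run operations until all targets are closed or no progress is possible.

    operations is a list of (target, dependencies, function) tuples. The
    function receives the concrete values of the dependencies, in order, and
    returns the value that closes the target.
    """
    values = dict(initial or {})   # name -> concrete value of a closed future
    pending = list(operations)
    while pending:
        # An operation is ready when every value it depends on is closed.
        ready = [op for op in pending if all(dep in values for dep in op[1])]
        if not ready:
            stuck = sorted(op[0] for op in pending)
            raise RuntimeError(
                "no progress possible; still open: " + ", ".join(stuck))
        for target, deps, function in ready:
            values[target] = function(*(values[dep] for dep in deps))
        pending = [op for op in pending if op[0] not in values]
    return values


def build(get_index):
    """Model a[0] = 1; a[1] = a[getIndex(1)]; a[2] = a[getIndex(2)]."""
    return [
        ("a[0]", [], lambda: 1),
        ("a[1]", [f"a[{get_index(1)}]"], lambda value: value),
        ("a[2]", [f"a[{get_index(2)}]"], lambda value: value),
    ]


# getIndex(i) = i - 1 gives a[1] = a[0] and a[2] = a[1]: no cycle.
ok = run(build(lambda i: i - 1))

# getIndex(i) = 3 - i gives a[1] = a[2] and a[2] = a[1]: a cycle.
message = None
try:
    run(build(lambda i: 3 - i))
except RuntimeError as error:
    message = str(error)

print(ok)
print(message)
```

With the first mapping the three assignments complete in dependency order. With the second, +a[1]+ and +a[2]+ wait on each other, and the model reports both as still open. The program text is identical in both cases, which is why the cycle cannot be ruled out at compile-time.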

Error Handling
~~~~~~~~~~~~~~

TODO

[[runtime:file-mapping]]
File Mapping
~~~~~~~~~~~~

File mapping is the process through which values stored in Swift variables of <> are associated with physical files. This process is implemented by *mappers* which are specified using <>. A non-composite mapped value can be associated to a file using the <> or its short form:

[listing, swift]
----
file f <"file.dat">;
----

The above code associates the variable +f+ to a local file named +file.dat+ which is assumed to be in the directory from which Swift is invoked. It is possible to use more complex path names or even *URLs* to associate a Swift variable with files that do not necessarily reside in the current directory:

[listing, swift]
----
file f1 <"/tmp/file.dat">;
file f2 <"http://example.org/index.html">;
----

For a list of all supported remote file access methods, please see <>.

Values of composite types containing file-valued data can be mapped in bulk using one of the additional <> provided by Swift. For example, the <> can be used to glob files in a directory and map them to an entire array:

[listing, swift]
----
file[] a ;
----

[[runtime:file-mapping:implicit-mapping]]
Implicit Mapping
^^^^^^^^^^^^^^^^

A mapped type value that is not mapped explicitly is *implicitly mapped*. Specifically, Swift will map it to a deterministic but opaque temporary file.

Input or Output
^^^^^^^^^^^^^^^

Swift distinguishes between input and output mapped data based on whether explicit assignments are present in the program. If a variable is assigned to in a Swift program, Swift considers that variable to be an output. Otherwise, Swift marks it as an input. When a variable is marked as an input, Swift requires that the corresponding files be present, unless the variable is an array that can be empty. An input mapped variable is considered to be implicitly assigned.

Example:

[listing, swift]
----
file f <"input.dat">; <1>
file g <"output.dat">; <2>

g = cat(f);
----
<1> Variable +f+ is not explicitly assigned to. It is therefore an input variable, and Swift implicitly assigns a file value representing the +input.dat+ file to it. It is an error for +input.dat+ not to exist as a file.
<2> Variable +g+ is explicitly assigned to. It is therefore an output and it will be created by Swift during the program's execution.

It does not make sense for certain mappers to be used for output (such as the <>), since their operation depends on having a set of physical files present.

[[runtime:assignment-of-mapped-values]]
Assignment of Mapped Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a mapped value is assigned to a mapped type variable, and the destination variable is not <>, Swift guarantees that the file that the destination variable is mapped to will exist. This is generally done by copying the file. However, Swift may optimize this when possible, in particular by creating symbolic links on systems that support it.

[[runtime:application-functions-execution]]
Application Functions Execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Applications in Swift are generally executed on remote computing resources. In the Swift language, application functions are specified in a way that is independent of where and how applications are run, which allows application instances to be scheduled efficiently based on resource availability. When an application function is invoked, the following steps are taken:

. Swift waits for all the actual parameters to the application function to be closed
. *Site selection*: a remote resource is selected from all available resources based on whether they contain the given application, load, and other run-time information (see Section on <>).
. A file sandbox is created, where the application input and output files will be stored
. All files corresponding to mapped values in the actual parameters are copied to the application sandbox ("stage-in")
. The application is executed remotely using one of the available <>
. All files corresponding to mapped values in the actual return parameters are copied back from the application sandbox, and the actual return parameters are closed ("stage-out")
. The application sandbox is deleted

The application execution is performed indirectly. A small script, called "wrapper" (or +_swiftwrap+), is used to implement the following functions:

- set up the details of the application sandbox, such as directory structures where input and output files go
- perform basic tests of the environment and try to generate user-friendly error messages if something is wrong
- copy or link files from a resource-local swift cache to the application sandbox and back
- optionally clean up the application sandbox
- record the status of the application execution
- log various application timing information

There are three ways in which the above operations can be performed and they will be explained in detail in the following sections: *swift staging*, *provider staging*, and *wrapper staging*.

Swift Staging
^^^^^^^^^^^^^

Swift staging is historically the first mechanism used by Swift to deal with remote application files. In swift staging mode (selected through the `staging: "swift"` <> option), for each job, the Swift runtime selects a site as described in <>, and, using that site, it performs the following operations in order:

. Using the <> specified in the configuration file, it creates a *shared directory* in the location specified by the `workDirectory` site property that will mirror the local directory structure with respect to all files used by applications running on that site. This directory is only created once per site per swift program execution, and it is initially empty.
It is guaranteed that two invocations of swift programs will have different shared directories on a given site.
. In the shared directory, Swift creates the directory structure needed by the job's input and output files. For example, if an application uses the following input:
+
[listing, swift]
----
file f <"data/f.dat">;
----
+
then Swift will create a directory named `data` inside the site shared directory.
+
. Using the filesystem provider, Swift copies all the application's input files into their respective directories inside the shared directory. Files that are already in the shared directory are skipped.
. The application is invoked through `_swiftwrap`. The essential steps taken by `_swiftwrap` in Swift staging mode are as follows:
.. Create a sandbox directory either inside the shared directory, or in a temporary location if the `scratch` property is specified for the site
.. For each of the application's input files, either copy the file from the shared directory into the sandbox directory or create a symbolic link inside the sandbox directory to the file in the shared directory. The choice of whether to copy or link is determined by the existence of the `scratch` property. If the property is defined for the site, then the files will be copied. The copying process preserves the directory structure.
.. Run the application
.. If the application returns a zero exit code indicating success, then:
... Check that all the output files were created by the application and fail if not
... Move the output files from the sandbox directory back to the shared directory, preserving the directory structure
... Remove the job directory
.. Exit signalling success or failure; the exact method of signalling depends on the value of the `statusMode` property. If set to `provider`, then `_swiftwrap` exits with an exit code equal to the exit code of the application.
If set to `files`, then swift creates either an empty success file or a failure file containing the application exit code
. Transfer and check the status files for the job status if `statusMode` is `files`
. Copy all application output files back to the machine that Swift is running on

For example, consider the following Swift program:

[listing, swift]
----
app (file outf) cat(file inf) {
    "/bin/cat" filename(inf) stdout = filename(outf);
}

file inf <"inputs/a.dat">;
file outf <"outputs/b.dat">;

outf = cat(inf);
----

and the following configuration:

[listing, swiftconf]
----
site.cluster {
    execution {
        type: "GRAM"
        url: "login.cluster.example.org"
        jobManager: "PBS"
    }
    filesystem {
        type: "GSIFTP"
        url: "login.cluster.example.org"
    }
    statusMode: "provider"
    staging: "swift"
    workDirectory: "/homes/johndoe/swiftwork"
}

sites: [cluster]
----

Swift would execute the `cat` application as follows:

image:swift-staging.svg[]

A few observations are in order:

- `statusMode: "files"` was historically used to deal with execution providers that did not have a mechanism of reporting the application exit code. The most notable example is the early implementation of the GRAM protocol from the Globus Toolkit version 2. It is unlikely to be needed.
- when running on compute clusters, the shared directory, and therefore the work directory, needs to reside on a shared filesystem that is accessible from the compute nodes.
- transfers and other file operations are all controlled directly by Swift and are governed by <> settings, such as `fileTransfersThrottle` and `fileOperationsThrottle`. Throttling limits the number of concurrent operations and is useful for increasing stability without affecting performance.
- use of a `scratch` option pointing to a compute-node local filesystem can yield better performance if the work directory resides on a shared filesystem that has high latencies and high throughput.
The performance improvement also depends on how the application accesses its input files and is more noticeable if the application uses a random-access pattern on its input files rather than sequential reads.
- while Swift staging is mostly superseded by provider staging through Coasters, there are still legitimate reasons to use Swift staging, such as running MPI applications through GRAM and a local resource manager (e.g. PBS).

Provider Staging
^^^^^^^^^^^^^^^^

With provider staging, Swift delegates the task of transferring files to the compute nodes to the execution provider. The full functionality needed to manage all the file operations needed by a job is currently only supported by the <> execution provider, so provider staging requires the use of Coasters.

Since the provider takes care of shipping files to the compute nodes, there is no strict need for a shared filesystem on the cluster except as needed by <>. Files on the Swift side, as well as the job sandboxes, can reside on directly-attached disks. This can improve performance with respect to shared filesystems that would have non-trivial latencies.

Provider staging is enabled by specifying one of the `local`, `service-local`, `shared-fs` or `direct` choices to the <> site configuration property. The meaning of the various choices is as follows:

`local`:: files are assumed to reside on the machine where Swift is running. Coasters will copy the files using the Coaster Service as a proxy. This allows files to be copied even if the compute nodes cannot reach networks outside of the cluster they belong to.

`service-local`:: files are assumed to reside on the machine where Swift is running and the Coaster Service is assumed to be running on the same machine.

`shared-fs`:: files are assumed to reside on a shared filesystem that is accessible by the compute nodes and are copied using standard POSIX copying routines.
`direct`:: this is an experimental mode similar to `shared-fs` in which no copying to a sandbox is actually done. Instead, applications are passed the absolute paths to the files involved.

Using the same example application as in the case of Swift staging, an example set of operations performed by Swift with `staging: "local"` is shown below:

image:provider-staging.svg[]

Wrapper Staging
^^^^^^^^^^^^^^^

Wrapper staging is an experimental feature that allows `_swiftwrap` to perform the necessary file staging operations. It generally requires a shared filesystem accessible from the compute nodes.

[[runtime:site-selection]]
Site Selection
~~~~~~~~~~~~~~

TODO