Configuration
-------------
Swift uses a single configuration file called swift.properties. The
swift.properties file is responsible for:

1. Defining how to interface with schedulers
2. Defining app names and locations
3. Defining various other Swift settings and behavior

Here is an example swift.properties file.

-----
# Define a site named sandyb
site.sandyb {
   tasksPerWorker=16
   taskWalltime=00:05:00
   jobManager=slurm
   jobQueue=sandyb
   maxJobs=1
   workdir=/scratch/midway/$USER/work
   filesystem=local
}

# Define sandyb apps
app.sandyb.echo=/bin/echo

# Define other swift properties
sitedir.keep=true
wrapperlog.always.transfer=true

# Select which site to run on
site=sandyb
-----

The details of this file are explained later in this section. First, let's
look at an example of running Swift. With this swift.properties in place,
the command to run a Swift script is:

-----
$ swift script.swift
-----

That is all that is needed. Everything else Swift needs to know is defined
in swift.properties.

Location of swift.properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Swift searches for swift.properties files in multiple locations:

1. The etc/swift.properties file included with the Swift distribution.
2. $SWIFT_SITE_CONF/swift.properties, used for defining site templates.
3. $HOME/.swift/swift.properties
4. swift.properties in your current directory.
5. Any property file you point to with the "-properties" command-line argument.

Settings are read in this order, and definitions in later files override
earlier ones. For example, if $HOME/.swift/swift.properties sets
execution.retries=10 and the swift.properties in your current directory sets
execution.retries=0, execution.retries will be set to 0.

To verify which files are being read and which values will be set, run:

-----
$ swift -listconfig
-----

Selecting a site
~~~~~~~~~~~~~~~~
There are two ways to tell Swift where to run. The first is through
swift.properties: the site property specifies which site entries should be
used for a particular run.

-----
site=sandyb
-----

Sites can also be selected on the command line with the -site option:

-----
$ swift -site westmere script.swift
-----

The -site command-line argument overrides any sites selected in
swift.properties.

Selecting multiple sites
~~~~~~~~~~~~~~~~~~~~~~~~
To use multiple sites, give a comma-separated list of site names. In
swift.properties:

-----
site=westmere,sandyb
-----

The same format can be used on the command line:

-----
$ swift -site westmere,sandyb script.swift
-----

NOTE: You can also use "sites=" in swift.properties, and "-sites x,y,z" on
the command line.

Run directories
~~~~~~~~~~~~~~~
Each time you run Swift, a run directory is created. Run directories are
named runNNN, where NNN starts at 000 and is incremented for every run.
Run directories can be useful for debugging. They contain:

.Run directory contents
|======================
|apps |An apps file generated from swift.properties
|cf |A configuration file generated from swift.properties
|runNNN.log |The log file generated during the Swift run
|scriptname-runNNN.d |Debug directory containing wrapper logs
|scripts |Directory that contains scheduler scripts used for that run
|sites.xml |A sites.xml generated from swift.properties
|swift.out |The standard output and standard error generated by Swift
|======================

Using site templates
~~~~~~~~~~~~~~~~~~~~
Swift recognizes an environment variable called $SWIFT_SITE_CONF, which
points to a directory containing a swift.properties file. This
swift.properties can contain multiple site definitions for the various
queues available on the cluster you are using. Your local swift.properties
then does not need to define each site in full: it may contain only the
differences specific to your application, such as walltime.

Backward compatibility
~~~~~~~~~~~~~~~~~~~~~~
New users are encouraged to use the configuration mechanisms described in
this documentation.
However, if you are migrating from an older Swift release to 0.95, the
older-style configurations using sites.xml and tc.data should still work.
If you notice an instance where this is not true, please send an email to
swift-support@ci.uchicago.edu.

The swift.properties file format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Site definitions
^^^^^^^^^^^^^^^^
Site definitions in swift.properties begin with "site". The second
dot-separated word is the name of the site you are defining; in these
examples we define a site called westmere. The third word is the property.
For example:

-----
site.westmere.jobQueue=fast
-----

Before listing the site properties, it's important to understand the
terminology used.

A *task*, or *app task*, is an instance of a program as defined in a Swift
app() function.

A *worker* is the program that launches app tasks.

A *job* is related to schedulers. It is the mechanism by which workers are
launched.

Below is the list of valid site properties, with a brief explanation of
what each does and an example swift.properties entry.

.swift.properties site properties
[options="header"]
|================================
|Property|Description|Example
|condor
|Pass parameters directly through to the submit script generated for the
condor scheduler. For example, the setting
"site.osgconnect.condor.+projectname=Swift" will generate the line
"+projectname = Swift".
|site.osgconnect.condor.+projectname=Swift
|filesystem
|Defines how files should be accessed
|site.westmere.filesystem=local
|jobGranularity
|Specifies the granularity of a job, in nodes
|site.westmere.jobGranularity=2
|jobManager
|Specifies how jobs will be launched. The supported job managers are
"cobalt", "slurm", "condor", "pbs", "lsf", "local", and "sge".
|site.westmere.jobManager=slurm
|jobProject
|Sets the project name for the job scheduler
|site.westmere.jobProject=myproject
|jobQueue
|Sets the name of the scheduler queue to use.
|site.westmere.jobQueue=westmere
|jobWalltime
|The maximum amount of time allocated in a scheduler job, in hh:mm:ss format.
|site.westmere.jobWalltime=01:00:00
|maxJobs
|Maximum number of scheduler jobs to submit
|site.westmere.maxJobs=20
|maxNodesPerJob
|The maximum number of nodes to request per scheduler job.
|site.westmere.maxNodesPerJob=2
|pe
|The parallel environment to use for SGE schedulers
|site.sunhpc.pe=mpi
|providerAttributes
|Allows the user to pass attributes directly through to the scheduler
submit script. Currently only implemented for sites that use PBS.
|site.beagle.providerAttributes=pbs.aprun;pbs.mpp;depth=24
|slurm
|Pass parameters directly through to the submit script generated for the
slurm scheduler. For example, the setting
"site.midway.slurm.mail-user=username" generates the line
"#SBATCH --mail-user=username".
|site.midway.slurm.mail-user=username
|stagingMethod
|When provider staging is enabled, this option specifies the staging
mechanism to use for each site. If set to 'file', staging is done from a
filesystem accessible to the coaster service (typically running on the head
node). If set to 'proxy', staging is done from a filesystem accessible to
the client machine that Swift is running on, and is proxied through the
coaster service. If set to 'sfs' (short for "shared filesystem"), staging
is done by copying files to and from a filesystem accessible by the compute
node (such as an NFS or GPFS mount).
|site.osg.stagingMethod=file
|taskDir
|Tasks will be run from this directory. In the absence of a taskDir
definition, Swift will run the task from workdir.
|site.westmere.taskDir=/scratch/local/$USER/work
|tasksPerWorker
|The number of tasks that each worker can run simultaneously.
|site.westmere.tasksPerWorker=12
|taskThrottle
|The maximum number of active tasks across all workers.
|site.westmere.taskThrottle=100
|taskWalltime
|The maximum amount of time a task may run, in hh:mm:ss format.
|site.westmere.taskWalltime=01:00:00
|site
|Name of the site or sites to run on. This is the same as using the -site
command-line option.
|site=westmere
|userHomeOverride
|Sets the Swift user home. This must be a shared filesystem. This defaults
to $HOME. For clusters where $HOME is not accessible to the worker nodes,
you may override the value to point to a shared directory that you own.
|site.beagle.userHomeOverride=/lustre/beagle/username
|workdir
|Specifies where on the site files can be stored. This directory must be
available on all worker nodes that will be used for execution; a shared
cluster filesystem is appropriate for this. Note that you must specify an
absolute pathname for this field.
|site.westmere.workdir=/scratch/midway/$USER/work
|================================

Grouping site properties
~~~~~~~~~~~~~~~~~~~~~~~~
Site properties can be listed one per line, each with the full site prefix:

-----
site.westmere.jobManager=slurm
site.westmere.tasksPerWorker=12
site.westmere.taskWalltime=00:05:00
site.westmere.jobQueue=westmere
site.westmere.filesystem=local
site.westmere.workdir=/scratch/midway/$USER/work
-----

However, you can simplify this by grouping site properties together with
curly brackets:

-----
site.westmere {
   jobManager=slurm
   tasksPerWorker=12
   taskWalltime=00:05:00
   jobQueue=westmere
   filesystem=local
   workdir=/scratch/midway/$USER/work
}
-----

App definitions
~~~~~~~~~~~~~~~
In 0.95, application wildcards are used by default. This means that Swift
searches $PATH for applications, so you do not have to define their
pathnames.

In the case where you have multiple sites defined and you want control over
where things run, you will need to define the location of apps.
In this scenario, you can define apps in swift.properties with something
like this:

-----
app.westmere.cat=/bin/cat
-----

When an app is defined in swift.properties for any site you are running on,
wildcards are disabled, and all apps you want to use must be defined.

General Swift properties
~~~~~~~~~~~~~~~~~~~~~~~~
Swift behavior can be configured through general Swift properties. Below is
a list of properties:

[options="header"]
|================
|Name|Valid Values|Default Value|Description
|config.rundirs
|true, false
|true
|By default, Swift will generate a run directory that contains logs,
scheduler submit scripts, debug directories, and other files associated with
a particular Swift run. Setting this value to false disables the creation of
run directories and causes all logs and directories to be created in the
current working directory.
|execution.retries
|Non-negative integer
|2
|The number of times a job will be retried if it fails (giving a maximum of
1 + execution.retries attempts at execution)
|file.gc.enabled
|true, false
|true
|Files mapped by the concurrent mapper (i.e. when you don't explicitly
specify a mapper) are deleted when they are no longer in use. This property
can be used to prevent files mapped by the concurrent mapper from being
deleted.
|foreach.max.threads
|Positive integer
|1024
|Limits the number of concurrent iterations that each foreach statement can
have at one time. This conserves memory for Swift programs that have large
numbers of iterations (which would otherwise all be executed in parallel).
|lazy.errors
|true, false
|false
|Swift can report application errors in two modes, depending on the value of
this property. If set to false, Swift will report the first error
encountered and immediately stop execution. If set to true, Swift will
attempt to run as much as possible from a Swift script before stopping
execution and reporting all errors encountered.
When developing Swift scripts, using the default value of false can make the
program easier to debug. However, in production runs, using true will allow
more of a Swift script to be run before Swift aborts execution.
|swift.home
|String
|
|Points to the Swift installation directory ($SWIFT_HOME). In general, this
should not be set, as Swift can find its own installation directory, and
setting it incorrectly may impair the correct functionality of Swift.
|pgraph
|true, false, file name
|false
|Swift can generate a Graphviz file representing the structure of the Swift
script it has run. If this property is set to true, Swift will save the
provenance graph in a file named by concatenating the program name and the
instance ID (e.g. helloworld-ht0adgi315l61.dot). If set to false, no
provenance graph will be generated. If a file name is used, the provenance
graph will be saved in the specified file. The generated dot file can be
rendered into a graphical form using Graphviz, for example with the
commands "swift -pgraph graph1.dot q1.swift" followed by
"dot -ograph.png -Tpng graph1.dot".
|pgraph.graph.options
|String
|splines="compound", rankdir="TB"
|This property specifies a Graphviz-specific set of parameters for the graph.
|pgraph.node.options
|String
|color="seagreen", style="filled"
|Used to specify a set of Graphviz-specific properties for the nodes in the
graph.
|provenance.log
|true, false
|false
|This property controls whether the log file will contain provenance
information. Enabling this will increase the size of log files, sometimes
significantly.
|provider.staging.pin.swiftfiles
|true, false
|false
|When provider staging is enabled and provider.staging.pin.swiftfiles is set,
Swift caches some small files it needs, to avoid the cost of staging them
more than once.
|sitedir.keep
|true, false
|false
|Indicates whether the working directory on the remote site should be left
intact even when a run completes successfully.
This can be used to inspect the site working directory for debugging
purposes.
|status.mode
|files, provider
|files
|Controls how Swift will communicate the result code of running user
programs from workers to the submit side. In files mode, a file indicating
success or failure will be created on the site shared filesystem. In
provider mode, the execution provider job status will be used. Provider mode
requires the underlying job execution system to correctly return exit codes.
|tcp.port.range
|A range from start to end, where start and end are integers
|none
|A TCP port range can be specified to restrict the ports on which GRAM
callback services are started. This is likely needed if your submit host is
behind a firewall, in which case the firewall should be configured to allow
incoming connections on ports in the range.
|throttle.file.operations
|Positive integer, off
|8
|Limits the total number of concurrent file operations that can happen at any
given time. File operations (like transfers) require an exclusive connection
to a site. These connections can be expensive to establish. A large number
of concurrent file operations may cause Swift to attempt to establish many
such expensive connections to various sites. Limiting the number of
concurrent file operations causes Swift to use a small number of cached
connections and achieve better overall performance.
|throttle.host.submit
|Positive integer, off
|2
|Limits the number of concurrent submissions for any of the sites Swift will
try to send jobs to. In other words, it guarantees that no more than this
many jobs per site will be in the process of being submitted at any one time.
|throttle.score.job.factor
|Positive integer, off
|4
|The Swift scheduler has the ability to limit the number of concurrent jobs
allowed on a site based on the performance history of that site. Each site
is assigned a score (initially 1), which can increase or decrease based on
whether the site yields successful or faulty job runs.
The score for a site can take values in the (0.1, 100) interval. The number
of allowed jobs is calculated using the following formula:
2 + score * throttle.score.job.factor. This means a site will always be
allowed at least two concurrent jobs and at most
2 + 100 * throttle.score.job.factor. With the default of 4, this means at
least 2 jobs and at most 402. This parameter can also be set per site using
the jobThrottle profile key in a site catalog entry.
|throttle.submit
|Positive integer, off
|4
|Limits the number of concurrent submissions for a run. This throttle only
limits the number of concurrent tasks (jobs) that are being sent to sites,
not the total number of concurrent jobs that can be run. The submission
stage in GRAM is one of the most CPU-expensive stages (due mostly to the
mutual authentication and delegation). Having too many concurrent
submissions can overload the submit host CPU, the remote host/head node, or
both, causing degraded performance.
|throttle.transfers
|Positive integer, off
|4
|Limits the total number of concurrent file transfers that can happen at any
given time. File transfers consume bandwidth. Too many concurrent transfers
can overload the network, preventing other signaling traffic from flowing
properly.
|ticker.date.format
|String
|
|Describes how to format the ticker date output. The format of this string
is documented in the Java SimpleDateFormat class, at
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
|ticker.disable
|true, false
|false
|When set to true, suppresses the output progress ticker that Swift sends to
the console every few seconds during a run
|ticker.prefix
|String
|Progress:
|String to prepend to ticker output
|tracing.enabled
|true, false
|true
|Enables tracing of procedure invocations, assignments, iteration constructs,
as well as certain dataflow events such as data initialization and waiting.
This is done at a slight decrease in performance. Traces will be available
in the log file.
|use.wrapper.staging
|true, false
|false
|Determines if the Swift wrapper should do file staging.
|use.provider.staging
|true, false
|false
|If true, files will be staged by Swift over the network.
|wrapper.invocation.mode
|absolute, relative
|absolute
|Determines if Swift remote wrappers will be executed by specifying an
absolute path, or a path relative to the job initial working directory. In
most cases, execution will be successful with either option. However, some
execution sites ignore the specified initial working directory, and so
absolute must be used. Conversely, on some sites, job directories appear in
a different place on the worker node file system than on the filesystem
access node, with the execution system handling translation of the job
initial working directory. In such cases, relative mode must be used.
|wrapper.parameter.mode
|args, files
|args
|Controls how Swift will supply parameters to the remote wrapper script. args
mode will pass parameters on the command line. Some execution systems do not
pass command-line parameters sufficiently cleanly for Swift to operate
correctly. files mode will pass parameters through an additional input file.
This provides a cleaner communication channel for parameters, at the expense
of transferring an additional file for each job invocation.
|wrapperlog.always.transfer
|true, false
|false
|This property controls when output from the Swift remote wrapper is
transferred back to the submit site. When set to false, wrapper logs are only
transferred for jobs that fail. If set to true, wrapper logs are transferred
after every job is completed or failed.
|================

Using shell variables
~~~~~~~~~~~~~~~~~~~~~
Any value in swift.properties may contain environment variables. For
example:

-----
workdir=/scratch/midway/$USER/work
-----

Environment variables are expanded locally on the machine where you are
running Swift.
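
Because expansion happens on the submit host, a variable such as $USER takes
its value from your account on the machine running Swift, not from your
account on the remote site. The fragment below is a sketch of what that means
for a site definition. The remote user name "jsmith" is hypothetical, and the
commented-out line shows only one possible workaround when the two account
names differ.

-----
# $USER is expanded on the machine running Swift, so both paths
# below contain your local user name, whatever it is on the cluster.
site.westmere.workdir=/scratch/midway/$USER/work
site.westmere.taskDir=/scratch/local/$USER/work

# If your cluster account has a different name, one option is to write
# the path out literally ("jsmith" is a hypothetical user name):
# site.westmere.workdir=/scratch/midway/jsmith/work
-----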

Swift also defines a variable called $RUNDIRECTORY, which is the path to the
run directory Swift creates. If you'd like your work directory to be inside
the runNNN directory, you can do something like this:

-----
workdir=$RUNDIRECTORY
-----
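
Keep in mind that workdir must be an absolute path that is available on all
worker nodes. Setting it to $RUNDIRECTORY is therefore only appropriate when
Swift is launched from a directory on a shared filesystem that the worker
nodes can see, and assuming $RUNDIRECTORY expands to an absolute path. The
following sketch puts this into a grouped site definition; the queue and task
settings are illustrative values borrowed from earlier examples, not
recommendations.

-----
# Sketch: keep each run's site working files inside its runNNN directory.
# Assumes Swift is started from a shared filesystem visible to the
# worker nodes, and that $RUNDIRECTORY expands to an absolute path.
site.westmere {
   jobManager=slurm
   jobQueue=westmere
   tasksPerWorker=12
   taskWalltime=00:05:00
   filesystem=local
   workdir=$RUNDIRECTORY
}

# Keep the site working directory even after a successful run,
# so it can be inspected alongside the logs in runNNN
sitedir.keep=true

# Select the site defined above
site=westmere
-----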