Coasters
--------
Introduction
~~~~~~~~~~~~
Coasters are Swift's implementation of the _pilot job abstraction_.

In many applications, Swift performance can be greatly enhanced by the use of coasters. Coasters provide a low-overhead job submission and file transfer mechanism suited for the execution of jobs and the transfer of files for which other grid protocols such as GRAM and GridFTP are poorly suited.

Benefits
~~~~~~~~
Much of the overhead associated with other grid protocols (such as authentication and authorization, and allocation of worker nodes by the site's local resource manager) is reduced, because that overhead is associated with the allocation of a coaster pilot or coaster worker, rather than with every Swift-level procedure invocation; potentially hundreds or thousands of Swift-level procedure invocations can be run through a single worker.

Coasters can be configured for two purposes: job execution and file staging. In practice, the Swift script remains the same while working with coasters. The coaster mechanism is described in the next section.

Mechanism
~~~~~~~~~
Coasters run at the task management layer, logically beneath the Swift script. The jobs and data movement requirements that result from interpreting a Swift script are handled by coasters. The coaster mechanism submits a pilot job using some other execution mechanism, such as GRAM or the SGE or PBS scheduler, and for each worker node that will be used in a remote cluster, it submits a worker job, again using some other execution mechanism such as GRAM. Details on the design of the coaster mechanism can be found here: .

The pilot job manages file transfers and the dispatch of execution jobs to workers.

Coasters How-to
~~~~~~~~~~~~~~~
To use coasters for job execution, specify a sites.xml execution element like this:

----
----

The jobmanager string contains more detail than with other providers.
It contains either two or three colon-separated fields:

1. The provider to be used to execute the coaster pilot job. This provider will submit from the Swift client-side environment. Commonly this will be one of the GRAM providers.
2. The provider to be used to execute coaster worker jobs. This provider will be used to submit from the coaster pilot job environment, so a local scheduler provider can sometimes be used instead of GRAM.
3. Optionally, the jobmanager to be used when submitting worker jobs using the provider specified in field 2.

To use coasters for file transfer, specify a sites.xml filesystem element like this:

----
----

The url parameter should be a pseudo-URI in which the URI scheme is the name of the provider used to submit the coaster pilot job, and the hostname portion is the hostname on which the coaster pilot job will execute. Note that this provider and hostname are used for execution of a coaster pilot job, not for file transfer; so, for example, a GRAM endpoint should be specified here rather than a GridFTP endpoint.
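To make the two string formats described above concrete, the following Python sketch splits a jobmanager string into its two or three colon-separated fields and a filesystem url pseudo-URI into its provider and hostname. It is only an illustration and not part of Swift: the names `CoasterJobManager`, `parse_coaster_jobmanager` and `parse_coaster_url`, and the sample values (`gt2:gt2:pbs`, `gt2:local`, `gt2://host.example.org`), are assumptions chosen for the example.

```python
from typing import NamedTuple, Optional, Tuple
from urllib.parse import urlsplit


class CoasterJobManager(NamedTuple):
    """Hypothetical container for the fields of a coaster jobmanager string."""
    pilot_provider: str                # field 1: submits the coaster pilot job
    worker_provider: str               # field 2: submits coaster worker jobs
    worker_jobmanager: Optional[str]   # field 3: optional jobmanager for field 2


def parse_coaster_jobmanager(value: str) -> CoasterJobManager:
    """Split a jobmanager string into two or three colon-separated fields."""
    fields = value.split(":")
    if len(fields) not in (2, 3) or not all(fields):
        raise ValueError(
            "expected two or three non-empty colon-separated fields: %r" % value
        )
    jobmanager = fields[2] if len(fields) == 3 else None
    return CoasterJobManager(fields[0], fields[1], jobmanager)


def parse_coaster_url(url: str) -> Tuple[str, str]:
    """Return (pilot provider, pilot hostname) from a filesystem pseudo-URI."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("expected provider://hostname, got %r" % url)
    return parts.scheme, parts.hostname


if __name__ == "__main__":
    # Sample values are illustrative assumptions, not taken from a real site.
    print(parse_coaster_jobmanager("gt2:gt2:pbs"))
    print(parse_coaster_jobmanager("gt2:local"))
    print(parse_coaster_url("gt2://host.example.org"))
```

In this sketch a two-field string simply leaves the third field as `None`, mirroring the optional third field of the jobmanager string.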
Coasters are affected by the following profile settings, which are documented in the Globus namespace profile section:

[options="header, autowidth"]
|=================
|Profile key|Brief description
|slots|Maximum number of LRM jobs/worker blocks allowed
|jobsPerNode|How many coaster workers to run per execution node
|nodeGranularity|Each worker block uses a number of nodes that is a multiple of this number
|lowOverallocation|How many times larger than the job walltime should a block's walltime be if all jobs are 1s long
|highOverallocation|How many times larger than the job walltime should a block's walltime be if all jobs are infinitely long
|overallocationDecayFactor|How quickly should the overallocation curve tend towards the highOverallocation as job walltimes get larger
|spread|By how much should worker blocks vary in worker size
|reserve|How many seconds to reserve in a block's walltime for starting/shutdown operations
|maxnodes|The maximum number of nodes allowed in a block
|maxtime|The maximum walltime allowed for a block, in integer seconds
|remoteMonitorEnabled|If true, show a graphical display of the status of the coaster service
|=================

Coaster config parameters and Job Quantities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section presents information on coaster configuration parameters and their effect on application and scheduler jobs submitted by Swift. In order to achieve optimal performance, the number of application tasks must correspond to the total number of coaster workers.

Coaster configuration parameters influence both application tasks and the LRM (Local Resource Manager) jobs that are submitted by Swift. Specifically, the following quantities are influenced by coaster configuration parameters:

* The number of application tasks Swift will pack into one "wave" to be executed in parallel depends on the +foreach.max.threads+ and +throttle+ parameters.
Furthermore, the number of +foreach+ loops in a Swift script influences the aggregate +foreach.max.threads+. The relation between application tasks and the above-mentioned quantities can be expressed as follows:

----
app_tasks = min{(nforeach X foreach.max.threads), (throttle X 100 + 1)}
----

where +nforeach+ is the number of independent foreach loops appearing in a Swift script.

* The number of jobs Swift will submit via the LRM interface is determined by the +slots+ configuration parameter in the sites file.

----
LRM jobs = slots
----

* The size of each LRM job, in compute nodes per job, is determined by the +maxnodes+ and +nodegranularity+ parameters. LRM jobs submitted by Swift will have sizes spread between the +nodegranularity+ and +maxnodes+ values.

----
nodegranularity <= LRM job size <= maxnodes
----

* The number of coaster workers to run on each node of an LRM job on a target cluster is determined by the +jobspernode+ parameter.

Considering the above factors, the following parameter expressions must match in order for a Swift run to be optimal:

----
min{(nforeach X foreach.max.threads), (throttle X 100 + 1)} ~= slots X avg(nodegranularity, maxnodes) X jobspernode
----
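As a rough check of the matching condition above, the two sides of the expression can be evaluated for a candidate configuration. The Python sketch below is an illustration only and not part of Swift: the function names and sample values are hypothetical, `avg` is taken to be the arithmetic mean of +nodegranularity+ and +maxnodes+, and `throttle X 100` is rounded to the nearest integer to avoid floating-point artifacts.

```python
def app_tasks_per_wave(nforeach, foreach_max_threads, throttle):
    """Left-hand side: min{(nforeach X foreach.max.threads), (throttle X 100 + 1)}."""
    # round() is an assumption of this sketch, used to absorb float error
    # in products such as 0.29 * 100.
    return min(nforeach * foreach_max_threads, round(throttle * 100) + 1)


def expected_workers(slots, nodegranularity, maxnodes, jobspernode):
    """Right-hand side: slots X avg(nodegranularity, maxnodes) X jobspernode."""
    # avg() is assumed here to be the arithmetic mean of the two bounds.
    average_nodes = (nodegranularity + maxnodes) / 2
    return slots * average_nodes * jobspernode


if __name__ == "__main__":
    # Hypothetical configuration values, for illustration only.
    tasks = app_tasks_per_wave(nforeach=1, foreach_max_threads=1024, throttle=0.2)
    workers = expected_workers(slots=2, nodegranularity=1, maxnodes=4, jobspernode=4)
    print("application tasks per wave:", tasks)
    print("expected coaster workers:  ", workers)
```

With these sample values the two quantities come out close to each other (21 tasks against 20.0 workers), which is the kind of rough match the expression asks for.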