
Swift UC3 mini-tutorial

Installing UC3 tutorial

Check out scripts from SVN

To check out the most recent UC3 tutorial scripts from SVN, run the following command:

$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/uc3

This will create a directory called uc3 which contains all of the scripts mentioned in this document.

Run setup

Once the scripts are checked out, run the following commands to perform the initial setup.

$ cd uc3            # change to the newly created uc3 directory
$ source setup.sh   # sets swift config files in $HOME/.swift
$ swift -version    # verify that Swift 0.94 is in your $PATH and functional

Note	If you disconnect from the machine, you will need to rerun source setup.sh.

Overview of the applications

Two shell scripts, simulate.sh and stats.sh, are included. Together they act as a mock science application.

simulate.sh

The simulate.sh script generates and prints a random number. It optionally takes the following positional arguments:

Table 1. simulate.sh arguments
Argument number	Description
1	runtime. How long simulate.sh should run, in seconds.
2	range. Limit the random numbers to a given range.
3	biasfile. Bias the output using the value(s) contained in this file.
4	scale. Scale the random number by this factor.
5	n. Generate n random numbers.

With no arguments, simulate.sh prints a single number in the range 1-100.

$ ./simulate.sh
96
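To make the argument order in Table 1 concrete, here is a minimal shell stand-in called sim_sketch. It is hypothetical and is not the tutorial's simulate.sh. It models only runtime, range, and n, ignores the bias file and scale factor, and assumes a default runtime of 0 seconds.

```shell
#!/bin/sh
# Hypothetical stand-in for simulate.sh (illustration only).
# $1 = runtime in seconds (assumed default 0), $2 = range (default 100),
# $3 = biasfile and $4 = scale are ignored here, $5 = n (default 1).
sim_sketch() {
  sleep "${1:-0}"
  # Print n random integers in 1..range.
  awk -v range="${2:-100}" -v n="${5:-1}" 'BEGIN {
    srand()
    for (i = 0; i < n; i++) print int(rand() * range) + 1
  }'
}
```

For example, `sim_sketch 0 10 "" "" 5` prints five numbers between 1 and 10, and `sim_sketch` with no arguments prints one number between 1 and 100.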

stats.sh

The stats.sh script reads the numbers in the files given as its arguments and prints their average.
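A stats.sh-like averaging step can be sketched in a few lines of shell. The function name stats_sketch is hypothetical, and the sketch assumes each input file holds one number per line.

```shell
#!/bin/sh
# Hypothetical stand-in for stats.sh: average every number
# (one per line) found in the files passed as arguments.
stats_sketch() {
  awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }' "$@"
}
```

For example, if a.out holds 1 and 2 and b.out holds 3 and 4, `stats_sketch a.out b.out` prints 2.5.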

Overview of the Swift scripts

Parts 1-6 run locally and serve as examples of the Swift language. Parts 7-10 submit jobs via Condor to UC3 resources.

part01

The first Swift script, p1.swift, runs simulate.sh to generate a single random number and writes that number to a file.

p1.swift

type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f = mysim();

To run this script, use the following commands:

$ cd part01
$ swift p1.swift

The app name simulate is mapped to the script simulate.sh in the apps file.

Note	Because the output file is not mapped to a name, Swift generates a name for it in a directory called _concurrent. To view the output, run "cat _concurrent/*".

To clean up the directory and remove all outputs, run:

$ ./cleanup.sh

part02

The second Swift script shows how to name the output file. The output now goes to a file called sim.out.

p2.swift

type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f <"sim.out">;
f = mysim();

To run the script:

$ cd part02
$ swift p2.swift

part03

The p3.swift script introduces the foreach loop, which it uses to run many simulations. Here the output files are again named by Swift and are created in the _concurrent directory.

p3.swift

type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f = mysim();
}

To run:

$ cd part03
$ swift p3.swift
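Swift runs the iterations of a foreach loop concurrently. A rough shell analogue of p3.swift, using a hypothetical foreach_sketch function, starts each run in the background and then waits for all of them. Here mktemp stands in for the unique names Swift generates under _concurrent. This only illustrates the semantics. It is not how Swift actually schedules work.

```shell
#!/bin/sh
# Rough analogue of p3.swift's foreach loop (illustration only).
# Usage: foreach_sketch OUTDIR COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times in parallel; each run's stdout goes to a
# uniquely named file in OUTDIR.
foreach_sketch() (
  outdir=$1
  count=$2
  shift 2
  mkdir -p "$outdir"
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" > "$(mktemp "$outdir/sim.XXXXXX")" &   # one background job per iteration
    i=$((i + 1))
  done
  wait   # return only after every iteration has finished
)
```

For example, `foreach_sketch _concurrent 10 ./simulate.sh` would approximate the ten runs in p3.swift.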

part04

Part 4 gives an example of naming multiple files within a foreach loop.

p4.swift

type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  f = mysim();
}

To run:

$ cd part04
$ swift p4.swift

Output files will be named output/sim_N.out.
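The @strcat call builds one file name per iteration. Because the range [0:9] includes both endpoints, ten files are produced. The shell loop below prints the same names. The sim_name helper is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Mirrors @strcat("output/sim_", i, ".out") from p4.swift.
sim_name() {
  printf 'output/sim_%d.out\n' "$1"
}

i=0
while [ "$i" -le 9 ]; do   # [0:9] includes both 0 and 9: ten names
  sim_name "$i"
  i=$((i + 1))
done
```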

part05

Part 5 introduces a postprocessing step. After the simulations have run, the files created by simulate.sh are passed to stats.sh for averaging.

p5.swift

type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];

int nsim = @toInt(@arg("nsim","10"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim();
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);

To run:

$ cd part05
$ swift p5.swift
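Because analyze takes the whole sims array as input, it cannot start until every simulation has written its output. The shell sketch below mimics that ordering, with wait acting as the barrier. It assumes that simulate.sh and stats.sh are in the current directory and that stats.sh accepts several file names, as the analyze call in p5.swift implies. The name p5_sketch is hypothetical.

```shell
#!/bin/sh
# Sketch of p5.swift's dataflow (illustration, not how Swift runs it).
# simulate/stats wrap the tutorial scripts; redefine them to test offline.
simulate() { ./simulate.sh "$@"; }
stats() { ./stats.sh "$@"; }

p5_sketch() (
  nsim=${1:-10}
  mkdir -p output
  set --                                  # will hold the sims[] file names
  i=0
  while [ "$i" -lt "$nsim" ]; do
    simulate > "output/sim_$i.out" &      # like: simout = mysim();
    set -- "$@" "output/sim_$i.out"       # like: sims[i] = simout;
    i=$((i + 1))
  done
  wait                                    # analyze needs every element of sims
  stats "$@" > output/average.out         # like: stats = analyze(sims);
)
```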

part06

Part 6 introduces command-line arguments. The script reads an argument called steps, which determines how long (in seconds) each run of simulate.sh lasts. It also reads an argument called nsim, which determines the number of simulations to run.

p6.swift

type file;

app (file o) mysim (int timesteps)
{
  simulate timesteps stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];
int  nsim = @toInt(@arg("nsim","10"));
int steps = @toInt(@arg("steps","1"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim(steps);
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);

Use the command below to specify the time for each simulation.

$ cd part06
$ swift p6.swift -steps=3  # each simulation takes 3 seconds
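@arg returns the value passed on the command line as -name=value, or the given default when that option is absent. @toInt then converts the result to an integer. The hypothetical get_arg function below imitates that lookup in shell. It is an illustration only and is not part of Swift.

```shell
#!/bin/sh
# Hypothetical helper mimicking @arg(name, default) for -name=value options.
# Usage: get_arg NAME DEFAULT ARGS...
get_arg() (
  name=$1
  value=$2                             # start with the default
  shift 2
  for a in "$@"; do
    case $a in
      "-$name="*) value=${a#*=} ;;     # keep the text after the first '='
    esac
  done
  printf '%s\n' "$value"
)

steps=$(get_arg steps 1 -steps=3)      # 3
nsim=$(get_arg nsim 10 -steps=3)       # 10 (default)
```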

part07

Part 7 is the first script that submits jobs to UC3 via Condor. It is similar to the earlier scripts, with a few minor differences. Since there is no shared filesystem when using OSG, Swift transfers the application script simulate.sh to the worker node.

p7.swift

type file;

# Application to be called by this script

file simulation_script <"simulate.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

# Command line params to this script:

int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers

# Main script and data

int steps=3;

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range);
}

To run:

$ cd part07
$ swift p7.swift

part08

Part 8 will also stage in and run stats.sh to calculate averages. It adds a trace statement so you can see the order in which things execute.

p8.swift

type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script   <"stats.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @script @filenames(s) stdout=@filename(out);
}

# Command line params to this script:

int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  steps = @toInt(@arg("steps", "1"));   # number of "steps" each simulation (==seconds of runtime)
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);

file sims[];                               # Array of files to hold each simulation output
file bias<"bias.dat">;                     # Input data file to "bias" the numbers:
                                           # 1 line: scale offset ( N = n*scale + offset)
foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">;         # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);

To run:

$ cd part08
$ swift p8.swift
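The comment next to bias.dat in p8.swift describes the file as one line holding a scale and an offset, with N = n*scale + offset. Under that reading, and assuming integer values, the transformation looks like this. The real simulate.sh may apply the values differently. The name apply_bias is hypothetical, and defaulting the offset to 0 when the line holds a single number is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch of the bias rule noted in p8.swift: N = n*scale + offset.
# Usage: apply_bias BIASFILE N
# Assumes BIASFILE holds "scale offset" on one line; offset defaults to 0.
apply_bias() (
  read -r scale offset < "$1"
  echo $(( $2 * $scale + ${offset:-0} ))
)
```

For example, with "2 5" in bias.dat, `apply_bias bias.dat 10` prints 25.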

part09

Part 9 adds another app function, genrand, which runs simulate.sh to produce a random number. The script uses genrand to generate the bias file dynamic_bias.dat and to determine how long each simulation runs.

p9.swift

type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script   <"stats.sh">;

# app() functions for application programs to be called:

app (file out) genrand (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @script @filenames(s) stdout=@filename(out);
}

# Command line params to this script:
int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);

file bias<"dynamic_bias.dat">;        # Dynamically generated bias for simulation ensemble

bias = genrand(simulation_script, 1, 1000);

file sims[];                               # Array of files to hold each simulation output

foreach i in [0:nsim-1] {

  int steps = readData(genrand(simulation_script, 1, 5));
  tracef("  for simulation[%i] steps=%i\n", i, steps+1);

  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">;            # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);

To run:

$ cd part09
$ swift p9.swift
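Each iteration of the foreach loop in p9.swift calls genrand with a range of 5. It reads the single number genrand prints with readData and passes steps+1 to the simulation. Assuming simulate.sh draws numbers from 1 to the given range, each simulation therefore runs for 2 to 6 steps. The step can be sketched in shell with a hypothetical pick_steps function, using awk as a stand-in for genrand:

```shell
#!/bin/sh
# Sketch of p9.swift's per-iteration step: readData(genrand(...)) + 1.
# Reads the single integer stored in the file (like readData) and adds 1.
pick_steps() (
  read -r steps < "$1"
  echo $(( steps + 1 ))
)

f=$(mktemp)
awk 'BEGIN { srand(); print int(rand() * 5) + 1 }' > "$f"   # stand-in for genrand(simulation_script, 1, 5)
pick_steps "$f"                                              # a value from 2 to 6
rm -f "$f"
```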

part10

p10.swift is identical to p9.swift. The difference in this part is the sites.xml configuration file, which determines where Swift runs its jobs. In this sites.xml, the line with the Condor requirement that selects nodes from the UC3 seeder cluster is uncommented, so jobs run on that cluster:

<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
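The requirement uses the ClassAd regexp() function to test the Machine name against a pattern. In regular-expression syntax, c* means "zero or more c characters." If regexp() searches anywhere in the string, as assumed here, then uc3-c* matches any name that contains uc3-. The check below uses grep -E, which also performs an unanchored search. The host names are made up.

```shell
#!/bin/sh
# Which of these made-up machine names does the pattern uc3-c* match?
# grep -E searches anywhere in the line; ClassAd regexp() is assumed
# to behave the same way for this pattern.
matches_uc3() {
  grep -E 'uc3-c*'
}

printf '%s\n' uc3-c001.example.edu uc3-sub.example.edu itb-c005.example.edu \
  | matches_uc3
# prints uc3-c001.example.edu and uc3-sub.example.edu
```

If only names beginning with uc3-c are intended, an anchored pattern such as ^uc3-c would express that more precisely.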

Condor requirements for selecting nodes from the UC3 seeder cluster, the ITS Virtualization lab, the Open Science Grid, and ATLAS Midwest Tier 2 (at UC, IU, and UIUC) are all present in the sites.xml file. To choose one of these sites, uncomment the requirement line for the target system, then run the Swift script.

To run:

$ cd part10
$ swift p10.swift

Once the script completes, run find_host.sh to find where the jobs ran:

$ ./find_host.sh