
Swift UC3 mini-tutorial

Installing UC3 tutorial

Check out scripts from SVN

To check out the most recent UC3 tutorial scripts from SVN, run the following command:

$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/uc3

This will create a directory called uc3 which contains all of the scripts mentioned in this document.

Run setup

Once the scripts are checked out, run the following commands to perform the initial setup.

$ cd uc3            # change to the newly created uc3 directory
$ source setup.sh   # sets swift config files in $HOME/.swift
$ swift -version    # verify that Swift 0.94 is in your $PATH and functional
Note
If you disconnect from the machine, you will need to rerun source setup.sh.

Overview of the applications

Two shell scripts are included that act as mock science applications: simulate.sh and stats.sh.

simulate.sh

The simulate.sh script generates and prints a random number. It accepts the following optional arguments:

Table 1. simulate.sh arguments

Argument  Name      Description
1         runtime   How long simulate.sh should run, in seconds.
2         range     Limit random numbers to the given range.
3         biasfile  Read the bias from a number contained in this file.
4         scale     Scale the random number by this factor.
5         n         Number of random numbers to generate.

With no arguments, simulate.sh prints 1 number in the range of 1-100.

$ ./simulate.sh
96
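The argument handling in Table 1 can be pictured with the following sketch. It is a hypothetical bash function, not the tutorial's simulate.sh: the default runtime of 0 seconds and the "1..range" interpretation of range are assumptions, and the sketch accepts but ignores the biasfile argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of simulate.sh-style argument handling (not the tutorial's script).
# Usage: simulate [runtime] [range] [biasfile] [scale] [n]
simulate() {
  local runtime=${1:-0}    # seconds to wait before printing (assumed default: 0)
  local range=${2:-100}    # numbers are drawn from 1..range
  local biasfile=${3:-}    # accepted but ignored in this sketch
  local scale=${4:-1}      # multiply each number by this factor
  local n=${5:-1}          # how many numbers to print
  sleep "$runtime"
  local i
  for (( i = 0; i < n; i++ )); do
    echo $(( (RANDOM % range + 1) * scale ))
  done
}

simulate                 # one number in 1..100
simulate 0 10 "" 1 3     # three numbers in 1..10
```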

stats.sh

The stats.sh script reads the numbers in the files passed to it as arguments and prints their average.
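A minimal sketch of the averaging step, assuming the input files contain whitespace-separated numbers. The function name stats and the sample files are illustrative only; this is not the tutorial's stats.sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for stats.sh: print the mean of every number in the given files.
stats() {
  awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
       END { if (n == 0) exit 1; print sum / n }' "$@"
}

tmp=$(mktemp -d)
printf '10\n20\n' > "$tmp/a.out"
printf '30\n' > "$tmp/b.out"
stats "$tmp/a.out" "$tmp/b.out"    # prints 20
```

Exiting with a non-zero status when no numbers are found keeps an empty input from silently producing a bogus average.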

Overview of the Swift scripts

Parts 1-6 run locally and serve as examples of the Swift language. Parts 7-10 submit jobs to UC3 resources via Condor.

part01

The first Swift script, p1.swift, runs simulate.sh to generate a single random number and writes the number to a file.


p1.swift
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f = mysim();

To run this script, run the following command:

$ cd part01
$ swift p1.swift

The app name simulate is mapped to the simulate.sh script in the apps file.

Note
Since the output file is not given a name, Swift generates a random name for it in a directory called _concurrent. To view the output, run "cat _concurrent/*".

To cleanup the directory and remove all outputs, run:

$ ./cleanup.sh

part02

The second Swift script shows how to name the output file. The output is now written to a file called sim.out.


p2.swift
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f <"sim.out">;
f = mysim();

To run the script:

$ cd part02
$ swift p2.swift

part03

The p3.swift script introduces the foreach loop and runs ten simulations (i = 0 through 9). The output files are again named by Swift and are created in the _concurrent directory.


p3.swift
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f = mysim();
}

To run:

$ cd part03
$ swift p3.swift

part04

Part 4 gives an example of naming multiple files within a foreach loop.


p4.swift
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  f = mysim();
}

To run:

$ cd part04
$ swift p4.swift

Output files will be named output/sim_N.out.
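Swift evaluates the iterations of a foreach loop in parallel. As a rough single-machine analogy only, the bash sketch below does the same work as p4.swift: it launches ten background jobs and writes each result to output/sim_N.out. The mysim function is a hypothetical stand-in for simulate.sh.

```shell
#!/usr/bin/env bash
# Rough analogue of p4.swift: ten independent "simulations" run concurrently,
# each writing to output/sim_N.out. Swift does this scheduling itself.
mysim() { echo $(( RANDOM % 100 + 1 )); }   # hypothetical stand-in for simulate.sh

cd "$(mktemp -d)" && mkdir -p output
for i in $(seq 0 9); do
  mysim > "output/sim_${i}.out" &   # each iteration runs in the background
done
wait                                # block until every background job has finished
ls output
```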

part05

Part 5 introduces a postprocessing step. After all the simulations have run, the files created by simulate.sh are passed to stats.sh for averaging.


p5.swift
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];

int nsim = @toInt(@arg("nsim","10"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim();
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);

To run:

$ cd part05
$ swift p5.swift

part06

Part 6 adds a second command-line argument. The script reads a variable called steps, which determines how many seconds each run of simulate.sh lasts. As in part 5, the nsim variable determines the number of simulations to run.


p6.swift
type file;

app (file o) mysim (int timesteps)
{
  simulate timesteps stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];
int  nsim = @toInt(@arg("nsim","10"));
int steps = @toInt(@arg("steps","1"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim(steps);
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);

Use the -steps argument to set the runtime of each simulation:

$ cd part06
$ swift p6.swift -steps=3  # each simulation takes 3 seconds
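For comparison, the parameter handling in p6.swift can be mimicked in bash. Each value has a default, as with @arg("nsim","10") and @arg("steps","1"), and the averaging step runs only after every simulation has finished. The NSIM and STEPS environment variables and the mysim stand-in are hypothetical names used only in this sketch.

```shell
#!/usr/bin/env bash
# Rough analogue of p6.swift: defaults apply when a value is not supplied.
nsim=${NSIM:-10}     # like @arg("nsim","10")
steps=${STEPS:-1}    # like @arg("steps","1")

mysim() { sleep "$1"; echo $(( RANDOM % 100 + 1 )); }  # stand-in: wait, then print one number

cd "$(mktemp -d)" && mkdir -p output
for (( i = 0; i < nsim; i++ )); do
  mysim "$steps" > "output/sim_${i}.out" &
done
wait   # the analyze step needs every simulation's output
awk '{ sum += $1; n++ } END { print sum / n }' output/sim_*.out > output/average.out
cat output/average.out
```

If the sketch were saved as p6_sketch.sh (a hypothetical file name), NSIM=5 STEPS=3 bash p6_sketch.sh would correspond roughly to swift p6.swift -nsim=5 -steps=3.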

part07

Part 7 is the first script that submits jobs to UC3 via Condor. It is similar to the earlier scripts, with a few minor differences. Since there is no shared filesystem when using OSG, Swift transfers the application script simulate.sh to the worker node.


p7.swift
type file;

# Application to be called by this script

file simulation_script <"simulate.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

# Command line params to this script:

int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers

# Main script and data

int steps=3;

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range);
}

To run:

$ cd part07
$ swift p7.swift
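In p7.swift the script is not called by name. Instead, it is passed to sh as an ordinary input file (sh @filename(script) ...), so the worker node needs only sh, not a pre-installed copy of simulate.sh, and the staged copy does not need its execute bit. The sketch below reproduces that command line locally. The contents of the stand-in simulate.sh are hypothetical, and a runtime of 0 is used instead of p7's 3 seconds to keep it quick.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical stand-in for simulate.sh: wait "runtime" seconds, then print one number in 1..range.
cat > simulate.sh <<'EOF'
sleep "${1:-0}"
awk -v r="${2:-100}" 'BEGIN { srand(); print int(rand() * r) + 1 }'
EOF
chmod a-x simulate.sh                     # like a staged data file without execute permission
mkdir -p output
sh simulate.sh 0 100 > output/sim_0.out   # roughly: sh @filename(script) timesteps sim_range stdout=...
cat output/sim_0.out
```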

part08

Part 8 also stages in stats.sh and runs it to calculate averages. It adds a trace statement so you can see the order in which things execute.


p8.swift
type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script   <"stats.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @filename(script) @filenames(s) stdout=@filename(out);
}

# Command line params to this script:

int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  steps = @toInt(@arg("steps", "1"));   # number of "steps" each simulation (==seconds of runtime)
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);

file sims[];                               # Array of files to hold each simulation output
file bias<"bias.dat">;                     # Input data file to "bias" the numbers:
                                           # 1 line: scale offset ( N = n*scale + offset)
foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">;         # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);

To run:

$ cd part08
$ swift p8.swift
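The comment on bias.dat in p8.swift describes a one-line file holding "scale offset", with each raw number n becoming N = n*scale + offset. The sketch below applies only that formula, using integer arithmetic. The apply_bias helper is hypothetical, and how the real simulate.sh combines the bias file with its own scale argument is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply_bias <biasfile> <n>
# Reads "scale offset" from the first line of biasfile and prints n*scale + offset.
apply_bias() {
  local scale offset
  read -r scale offset < "$1"
  echo $(( $2 * scale + offset ))
}

tmp=$(mktemp -d)
echo "2 5" > "$tmp/bias.dat"
apply_bias "$tmp/bias.dat" 3    # prints 11 (3*2 + 5)
```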

part09

Part 9 adds another app function called genrand, which runs simulate.sh to produce a random number. genrand is used twice: once to generate the bias file dynamic_bias.dat, and once per simulation to determine how long that simulation runs.


p9.swift
type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script   <"stats.sh">;

# app() functions for application programs to be called:

app (file out) genrand (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @filename(script) @filenames(s) stdout=@filename(out);
}

# Command line params to this script:
int  nsim  = @toInt(@arg("nsim",  "10"));  # number of simulation programs to run
int  range = @toInt(@arg("range", "100")); # range of the generated random numbers
int  count = @toInt(@arg("count", "10"));  # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);

file bias<"dynamic_bias.dat">;        # Dynamically generated bias for simulation ensemble

bias = genrand(simulation_script, 1, 1000);

file sims[];                               # Array of files to hold each simulation output

foreach i in [0:nsim-1] {

  int steps = readData(genrand(simulation_script, 1, 5));
  tracef("  for simulation[%i] steps=%i\n", i, steps+1);

  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">;            # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);

To run:

$ cd part09
$ swift p9.swift
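The way p9.swift chooses a runtime for each simulation can be traced in bash. genrand writes a number to a file, readData turns that file's contents into an int, and the simulation receives steps+1. In this sketch genrand is a hypothetical stand-in that prints a number in 1..range. The real genrand calls simulate.sh, whose exact range convention may differ.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical stand-in for genrand(simulation_script, 1, range): print one number in 1..range.
genrand() { echo $(( RANDOM % $1 + 1 )); }

for i in 0 1 2; do
  genrand 5 > "steps_${i}.txt"     # like genrand(simulation_script, 1, 5)
  steps=$(< "steps_${i}.txt")      # like readData(...): file contents -> int
  echo "  for simulation[$i] steps=$(( steps + 1 ))"
done
```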

part10

p10.swift is identical to p9.swift. Instead of the Swift script, look at the sites.xml configuration file, which determines where Swift runs its jobs. In this file, the line with the Condor requirement that selects nodes from the UC3 seeder cluster is left uncommented, so jobs are sent to that site.

<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
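In the requirement above, "c*" means zero or more c characters. Assuming HTCondor's regexp() performs an unanchored search the way grep -E does, "uc3-c*" matches any machine name that merely contains "uc3-". The host names below are invented for illustration, and the anchored pattern is shown only for comparison, not as a change to sites.xml.

```shell
#!/usr/bin/env bash
# Invented host names, one per line.
names='uc3-c001.example.org
uc3-sub.example.org
itb-c1.example.org'

echo "$names" | grep -E 'uc3-c*'    # matches both uc3- names: "c*" also matches zero c's
echo "$names" | grep -E '^uc3-c'    # anchored comparison: only uc3-c001.example.org
```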

The sites.xml file also contains Condor requirements for selecting nodes from the UC3 seeder, ITS Virtualization lab, Open Science Grid and Atlas Midwest Tier 2 (at UC, IU, UIUC). To choose one of these sites, uncomment the requirement line for the target system before running the script.

To run:

$ cd part10
$ swift p10.swift

Once the script completes, run find_host.sh to see where the jobs ran:

$ ./find_host.sh