Installing UC3 tutorial
Check out scripts from SVN
To check out the most recent UC3 tutorial scripts from SVN, run the following command:
$ svn co https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/uc3
This will create a directory called uc3 which contains all of the scripts mentioned in this document.
Run setup
Once the scripts are checked out, run the following commands to perform the initial setup.
$ cd uc3            # change to the newly created uc3 directory
$ source setup.sh   # sets swift config files in $HOME/.swift
$ swift -version    # verify that Swift 0.94 is in your $PATH and functional
Note: If you disconnect from the machine, you will need to rerun source setup.sh.
Overview of the applications
Two shell scripts are included that act as a mock science application: simulate.sh and stats.sh.
simulate.sh
The simulate.sh script generates and prints a random number. It optionally takes the following arguments:
Argument number | Description |
---|---|
1 | runtime. Set how long simulate.sh should run, in seconds. |
2 | range. Limit random numbers to a given range. |
3 | biasfile. Read a number from this file to set the bias. |
4 | scale. Scale the random number by this factor. |
5 | n. Generate n random numbers. |
With no arguments, simulate.sh prints 1 number in the range of 1-100.
$ ./simulate.sh
96
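To make the argument list concrete, here is a minimal sketch of a script in the same spirit. It is not the tutorial's simulate.sh: the function name mock_simulate, the default runtime of 0 seconds, and the way the bias is applied (added as an offset after scaling) are all assumptions made for illustration.

```shell
# Hypothetical stand-in for simulate.sh (not the tutorial's script).
# Arguments follow the table above: runtime range biasfile scale n
mock_simulate() (
    runtime=${1:-0}     # seconds to sleep; the default of 0 is an assumption
    range=${2:-100}     # numbers are drawn from 1..range
    biasfile=${3:-}     # optional file whose first line holds a bias value
    scale=${4:-1}       # multiplier applied to each number
    n=${5:-1}           # how many numbers to print

    bias=0
    if [ -n "$biasfile" ] && [ -r "$biasfile" ]; then
        read -r bias < "$biasfile" || true
    fi

    sleep "$runtime"

    # Seed awk from /dev/urandom so repeated calls give different numbers.
    seed=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    awk -v range="$range" -v scale="$scale" -v bias="$bias" \
        -v n="$n" -v seed="$seed" 'BEGIN {
            srand(seed)
            for (i = 0; i < n; i++) {
                # Assumed formula: scale the raw number, then add the bias.
                value = (int(rand() * range) + 1) * scale + bias
                print value
            }
        }'
)
```

Called with no arguments, mock_simulate prints one number between 1 and 100. Called as mock_simulate 0 10 "" 1 5, it prints five numbers between 1 and 10 without sleeping.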
stats.sh
The stats.sh script reads a file containing n numbers and prints the average of those numbers.
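The averaging step can be sketched in a few lines of shell. This is a hypothetical stand-in, not the tutorial's stats.sh: the name mock_stats is invented here, and the sketch assumes one number per line and accepts several input files at once.

```shell
# Hypothetical stand-in for stats.sh: average every number in the given files.
mock_stats() {
    # awk reads each file named on the command line; blank lines are skipped.
    awk 'NF { sum += $1; count++ }
         END { if (count > 0) print sum / count }' "$@"
}
```

For example, mock_stats output/sim_*.out would print a single average across every matching file.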
Overview of the Swift scripts
Parts 1-6 run locally and serve as examples of the Swift language. Parts 7-9 submit jobs via Condor to UC3 resources.
part01
The first swift script, p1.swift, runs simulate.sh to generate a single random number. It writes the number to a file.
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f = mysim();
To run this script, run the following command:
$ cd part01
$ swift p1.swift
The simulate application gets translated to simulate.sh within the apps file.
Note: Since the file you created is not named, Swift will generate a random name for the file in a directory called _concurrent. To view the created output, run "cat _concurrent/*".
To clean up the directory and remove all outputs, run:
$ ./cleanup.sh
part02
The second swift script shows an example of naming the file. The output is now in a file called sim.out.
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

file f <"sim.out">;
f = mysim();
To run the script:
$ cd part02
$ swift p2.swift
part03
The p3.swift script introduces the foreach loop. This script runs many simulations. Output files are named here by Swift and will get created in the _concurrent directory.
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f = mysim();
}
To run:
$ cd part03
$ swift p3.swift
part04
Part 4 gives an example of naming multiple files within a foreach loop.
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

foreach i in [0:9] {
  file f <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  f = mysim();
}
To run:
$ cd part04
$ swift p4.swift
Output files will be named output/sim_N.out.
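Because the loop index runs from 0 to 9 inclusive, the mapper yields ten files. The following sketch lists the expected names and reports any that are missing after a run. The helper name expected_outputs is hypothetical and is not part of the tutorial scripts.

```shell
# Print the file names the mapper in p4.swift would produce for nsim runs.
expected_outputs() (
    nsim=${1:-10}
    i=0
    while [ "$i" -lt "$nsim" ]; do
        echo "output/sim_${i}.out"
        i=$((i + 1))
    done
)

# Report any expected file that is missing from the current directory.
for f in $(expected_outputs 10); do
    [ -e "$f" ] || echo "missing: $f"
done
```

Run from inside the part directory after the Swift script finishes, the loop prints nothing when all ten outputs exist.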
part05
Part 5 introduces a postprocessing step. After many simulations have run, the files created by simulate.sh are passed to stats.sh for averaging.
type file;

app (file o) mysim ()
{
  simulate stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];
int nsim = @toInt(@arg("nsim","10"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim();
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);
To run:
$ cd part05
$ swift p5.swift
part06
Part 6 introduces command line arguments. The script sets a variable called "steps", which determines how long, in seconds, simulate.sh will run. It also defines a variable called nsim, which determines the number of simulations to run.
type file;

app (file o) mysim (int timesteps)
{
  simulate timesteps stdout=@filename(o);
}

app (file o) analyze (file s[])
{
  stats @filenames(s) stdout=@filename(o);
}

file sims[];
int nsim = @toInt(@arg("nsim","10"));
int steps = @toInt(@arg("steps","1"));

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = mysim(steps);
  sims[i] = simout;
}

file stats<"output/average.out">;
stats = analyze(sims);
Use the command below to specify the time for each simulation.
$ cd part06
$ swift p6.swift -steps=3   # each simulation takes 3 seconds
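Each @arg call in the script names an option and a default, so -steps=3 overrides the default of "1" while nsim keeps its default of "10". The shell function below imitates that lookup for illustration only. get_arg is a hypothetical helper, not part of Swift, and it assumes options are written as -name=value.

```shell
# Hypothetical helper mimicking @arg(name, default) for "-name=value" options.
get_arg() (
    name=$1
    default=$2
    shift 2
    value=$default
    for opt in "$@"; do
        case $opt in
            "-$name="*) value=${opt#"-$name="} ;;   # strip the "-name=" prefix
        esac
    done
    printf '%s\n' "$value"
)

get_arg steps 1 -steps=3    # option given, so its value is used
get_arg nsim 10 -steps=3    # option absent, so the default is used
```

The two calls print 3 and 10, matching the values the script would see for the command line shown above.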
part07
Part 7 is the first script that submits jobs to UC3 via Condor. It is similar to the earlier scripts, with a few minor exceptions. Because there is no shared filesystem when using OSG, Swift transfers the application simulate.sh to the worker node.
type file;

# Application to be called by this script

file simulation_script <"simulate.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

# Command line params to this script:

int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
int range = @toInt(@arg("range", "100")); # range of the generated random numbers

# Main script and data

int steps=3;

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i \n\n", nsim, steps, range);

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range);
}
To run:
$ cd part07
$ swift p7.swift
part08
Part 8 will also stage in and run stats.sh to calculate averages. It adds a trace statement so you can see the order in which things execute.
type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script <"stats.sh">;

# app() functions for application programs to be called:

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @script @filenames(s) stdout=@filename(out);
}

# Command line params to this script:

int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
int steps = @toInt(@arg("steps", "1")); # number of "steps" each simulation (==seconds of runtime)
int range = @toInt(@arg("range", "100")); # range of the generated random numbers
int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i steps=%i range=%i count=%i\n\n", nsim, steps, range, count);

file sims[]; # Array of files to hold each simulation output
file bias<"bias.dat">; # Input data file to "bias" the numbers:
                       # 1 line: scale offset ( N = n*scale + offset)

foreach i in [0:nsim-1] {
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">; # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);
To run:
$ cd part08
$ swift p8.swift
part09
Part 9 adds another app function called genrand. Genrand will produce a random number that will be used to determine how long each simulation app will run.
type file;

# Applications to be called by this script

file simulation_script <"simulate.sh">;
file analysis_script <"stats.sh">;

# app() functions for application programs to be called:

app (file out) genrand (file script, int timesteps, int sim_range)
{
  sh @filename(script) timesteps sim_range stdout=@filename(out);
}

app (file out) simulation (file script, int timesteps, int sim_range, file bias_file, int scale, int sim_count)
{
  sh @filename(script) timesteps sim_range @filename(bias_file) scale sim_count stdout=@filename(out);
}

app (file out) analyze (file script, file s[])
{
  sh @script @filenames(s) stdout=@filename(out);
}

# Command line params to this script:

int nsim = @toInt(@arg("nsim", "10")); # number of simulation programs to run
int range = @toInt(@arg("range", "100")); # range of the generated random numbers
int count = @toInt(@arg("count", "10")); # number of random numbers generated per simulation

# Main script and data

tracef("\n*** Script parameters: nsim=%i range=%i count=%i\n\n", nsim, range, count);

file bias<"dynamic_bias.dat">; # Dynamically generated bias for simulation ensemble
bias = genrand(simulation_script, 1, 1000);

file sims[]; # Array of files to hold each simulation output

foreach i in [0:nsim-1] {
  int steps = readData(genrand(simulation_script, 1, 5));
  tracef(" for simulation[%i] steps=%i\n", i, steps+1);
  file simout <single_file_mapper; file=@strcat("output/sim_",i,".out")>;
  simout = simulation(simulation_script, steps+1, range, bias, 100000, count);
  sims[i] = simout;
}

file stats<"output/stats.out">; # Final output file: average of all "simulations"
stats = analyze(analysis_script,sims);
To run:
$ cd part09
$ swift p9.swift
part10
p10.swift is exactly the same as p9.swift. Instead of the Swift script, look at the sites.xml configuration file, which determines where Swift runs its jobs. Here, the line with the Condor requirement that selects nodes from the UC3 seeder cluster is left uncommented, so jobs go to that site.
<profile namespace="globus" key="condor.Requirements">regexp("uc3-c*", Machine)</profile>
The sites.xml file contains Condor requirements for selecting nodes from the UC3 seeder, the ITS Virtualization lab, Open Science Grid, and Atlas Midwest Tier 2 (at UC, IU, UIUC). To choose one of these sites, uncomment the requirement line for the target system and run the Swift script:
$ cd part10
$ swift p10.swift
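After editing sites.xml, it can help to confirm which requirement line is active. The sketch below rests on an assumption about the file's layout: each condor.Requirements profile sits on its own line, and a disabled one carries its opening <!-- on that same line. The function name active_requirements is hypothetical.

```shell
# List condor.Requirements lines that are not commented out.
# Assumes each profile is on one line and that a disabled profile
# has "<!--" on that same line.
active_requirements() {
    grep 'condor.Requirements' "${1:-sites.xml}" | grep -v '<!--'
}
```

Running active_requirements sites.xml inside the part10 directory should print only the line for the site that jobs will be sent to, provided the file follows the layout assumed above.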
Once the script completes, run the script find_host.sh to find where the jobs were run.
$ ./find_host.sh