Swift Basics
------------

Installation
~~~~~~~~~~~~

This section takes you through the installation of the Swift system on your computer. We start with the prerequisites, described in the next section.

Prerequisites
^^^^^^^^^^^^^^

.Check your Java
Swift is a Java application. Make sure you are running Java version 5 or higher, and that Java is in your $PATH (or in your $HOME/.soft file, depending on your environment).

The following are possible ways to detect and run Java:

----
$ grep java $HOME/.soft
#+java-sun # Gives you Java 5
+java-1.6.0_03-sun-r1
$ which java
/soft/java-1.6.0_11-sun-r1/bin/java
$ java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
----

Setting up to run Swift
~~~~~~~~~~~~~~~~~~~~~~~~

This is simple. We will use a precompiled version of Swift, which can be downloaded from link:http://www.ci.uchicago.edu/swift/downloads/index.php[here]. Download and untar the latest precompiled version as follows:

----
$ tar xf swift-0.92.1.tar.gz
----

Environment Setup
^^^^^^^^^^^^^^^^^^

The examples were tested with Java version 1.6. Make sure you do not already have Swift in your PATH. If you do, remove it, or remove any +swift or @swift lines from your $HOME/.soft or $HOME/.bash_profile file. Then do:

----
PATH=$PATH:/path/to/swift/bin
----

Note that the environment differs between the prebuilt distribution (as above) and Swift built from trunk. When using Swift from trunk, set up PATH as follows:

----
PATH=$PATH:/path/to/swift/dist/swift-svn/bin
----

WARNING: Do NOT set SWIFT_HOME or CLASSPATH in your environment unless you fully understand how these will affect Swift's execution.
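The two environment checks above (Java 5 or newer, and no second copy of Swift already on PATH) can be scripted. The following is a minimal Python sketch, not part of the Swift distribution; the helper names are hypothetical, and it assumes `java -version` prints a line such as `java version "1.6.0_11"` as in the listing above, where `1.6` means Java 6.

```python
import os
import re


def parse_java_version(version_output):
    """Return (major, minor) from `java -version` output, or None.

    Old JVMs report e.g. 'java version "1.6.0_11"' (1.6 means Java 6);
    newer ones report e.g. 'openjdk version "17" 2021-09-14'.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    minor = int(match.group(2)) if match.group(2) else 0
    return int(match.group(1)), minor


def java_is_new_enough(version_output, minimum_release=5):
    """True if the reported JVM is at least the given Java release (default 5)."""
    version = parse_java_version(version_output)
    if version is None:
        return False
    major, minor = version
    release = minor if major == 1 else major  # "1.x" means Java x
    return release >= minimum_release


def swift_dirs_on_path(path_value):
    """Return the PATH directories that contain an executable named 'swift'."""
    hits = []
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, "swift")
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    import shutil
    import subprocess

    java = shutil.which("java")
    if java is None:
        print("java not found on PATH")
    else:
        # `java -version` writes its report to stderr, not stdout.
        report = subprocess.run([java, "-version"], capture_output=True, text=True).stderr
        print("Java OK" if java_is_new_enough(report) else "Java too old:\n" + report)
    swift_dirs = swift_dirs_on_path(os.environ.get("PATH", ""))
    if len(swift_dirs) > 1:
        print("More than one swift on PATH:", swift_dirs)
```

Run it from the same shell environment you will run Swift from; if it lists more than one directory, remove the extra Swift entries as described above.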
To execute your Swift script on a login host (or "localhost"), use the following command:

----
swift -tc.file tc somescript.swift
----

Setting transformation catalog
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The transformation catalog lists where application executables are located on remote sites.

By default, the transformation catalog is stored in etc/tc.data. This path can be overridden with the tc.file configuration property, either in the Swift configuration file or on the command line.

The format is one line per executable per site, with fields separated by tabs or spaces.

Some example entries:

----
localhost  echo   /bin/echo       INSTALLED  INTEL32::LINUX  null
TGUC       touch  /usr/bin/touch  INSTALLED  INTEL32::LINUX  GLOBUS::maxwalltime="00:00:10"
----

The fields are: _site_, _transformation-name_, _executable-path_, _installation-status_, _platform_, and _profiles_.

The _site_ field should correspond to a site name listed in the sites catalog.

The _transformation-name_ should correspond to the transformation name used in a SwiftScript app procedure.

The _executable-path_ should specify where the particular executable is located on that site.

The _installation-status_ and _platform_ fields are not used. Set them to **INSTALLED** and **INTEL32::LINUX** respectively.

The _profiles_ field should be set to null if no profile entries are to be specified, or should contain the profile entries separated by semicolons.

Setting Swift configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Many configuration properties can be set in the Swift configuration file. We will not cover them all in this section; see link:http://www.ci.uchicago.edu/swift/guides/userguide.php#engineconfiguration[here] for details. In this section we cover a simple configuration file with the most basic properties.
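The configuration file is a list of `key=value` lines with `#` comment lines, as in the sample that follows. When tuning it, or changing it before resuming a run, a small reader makes it easy to compare two versions. This Python sketch handles only that simple subset, not the full Java properties syntax, and the function names are illustrative.

```python
def read_swift_properties(text):
    """Parse the simple key=value subset of a Swift properties file.

    Blank lines and lines starting with '#' are skipped. Java properties
    features such as ':' separators, '\\' continuations and escapes are
    NOT handled; this is only enough for files like the sample here.
    """
    props = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("line %d is not key=value: %r" % (number, raw))
        props[key.strip()] = value.strip()
    return props


def diff_properties(old, new):
    """Return {key: (old_value, new_value)} for every key that differs.

    A key missing from one side shows up as None on that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

For example, reading the file used by a failed run and the edited file with `read_swift_properties(open(path).read())` and passing both to `diff_properties` lists exactly which settings changed.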
----
# A comment
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=1
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
clustering.enabled=false
clustering.queue.delay=10
clustering.min.time=86400
foreach.max.threads=100
provenance.log=true
----

Setting sites.xml
^^^^^^^^^^^^^^^^^^

sites.xml specifies details of the sites that Swift can run on. The following is an example of a simple sites.xml entry for running Swift in a local environment:

[xml]
source~~~~~~
/var/tmp .07 100000
source~~~~~~

First SwiftScript
~~~~~~~~~~~~~~~~~

.Your first SwiftScript: Hello Swift-World!
A good sanity check that Swift is set up and running OK locally is this:

----
$ which swift
/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift
$ echo 'trace("Hello, Swift world!");' >hello.swift
$ swift hello.swift
Swift svn swift-r3202 cog-r2682
RunID: 20100115-1240-6xhzxuz3
Progress:
SwiftScript trace: Hello, Swift world!
Final status:
$
----

A good first tutorial in using Swift is at: http://www.ci.uchicago.edu/swift/guides/tutorial.php. Follow the steps in that tutorial to learn how to run a few simple scripts on the login host.

Second SwiftScript
~~~~~~~~~~~~~~~~~~~

The following is a more involved Swift script.

[java]
source~~~~~~~
type file;

// Copy the input file to the output file with cat.
app (file o) cat (file i)
{
  cat @i stdout=@o;
}

file out[];

// Run cat n times; n comes from the -n=... argument and defaults to 1.
foreach j in [1:@toint(@arg("n","1"))] {
  file data<"data.txt">;
  out[j] = cat(data);
}
source~~~~~~~

Swift Commandline Options
~~~~~~~~~~~~~~~~~~~~~~~~~

A description of Swift command-line options, including a description of Swift inputs and outputs.

What if Swift hangs
~~~~~~~~~~~~~~~~~~~

Because Swift is highly multithreaded, the underlying Java virtual machine can get into deadlock situations, or Swift can hang because of other complications in its threaded operations. In such situations, the Swift _hang-checker_ steps in and resolves the situation.

. How to use the information to identify and correct the deadlock.
. How close to the Swift source code can we make the hang-checker messages, so that the user can relate them to Swift functions, expressions, and ideally source code lines?
. The current hang-checker output is actually *very* nice and useful already:

----
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
----

Resuming a stopped or crashed Swift Run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I had a .rlog file from a Swift run that ran out of time. I restarted it using the -resume flag described in section 16.2 of the Swift User Guide, and it picked up where it left off. Then I killed it because I wanted to make changes to my sites file.

----
. . .
Progress: Selecting site:1150 Stage in:55 Active:3 Checking status:1 Stage out:37 Finished in previous run:2462 Finished successfully:96
Progress: Selecting site:1150 Stage in:55 Active:2 Checking status:1 Stage out:38 Finished in previous run:2462 Finished successfully:96
Cleaning up...
Shutting down service at https://192.5.86.6:54813
Got channel MetaChannel: 1293358091 -> null
+ Done
Canceling job 9297.svc.pads.ci.uchicago.edu
----

No new rlog file was emitted, but on the next resume Swift did recognize the progress that had been made, including the 96 tasks that finished successfully above, and resumed with 2558 tasks already finished (2462 + 96).
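The arithmetic above, 2462 tasks finished in the previous run plus 96 finished successfully giving 2558, can be checked mechanically from the `Progress:` lines. A minimal Python sketch, assuming each status entry is a label, a colon and a count, as in the listings in this section; it is not a Swift tool, and the helper names are hypothetical.

```python
import re

# One status entry: a label made of letters and spaces, a colon, a count.
# Examples: "Stage in:55", "Finished in previous run:2462".
_ENTRY = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)")


def parse_progress(line):
    """Turn one 'Progress:' line of Swift console output into {label: count}."""
    prefix = "Progress:"
    if not line.startswith(prefix):
        raise ValueError("not a Progress line: %r" % line)
    body = line[len(prefix):]
    return {label.strip(): int(count) for label, count in _ENTRY.findall(body)}


def finished_total(status):
    """Tasks completed so far: those from the previous run plus this run's."""
    return status.get("Finished in previous run", 0) + status.get("Finished successfully", 0)


def last_progress(lines):
    """Return the parsed status of the last 'Progress:' line, or {} if none."""
    status = {}
    for line in lines:
        if line.startswith("Progress:"):
            status = parse_progress(line)
    return status
```

Applied to a saved console log of the killed run, `finished_total(last_progress(lines))` gives the count you should expect to see as "Finished in previous run" after resuming.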
.Resuming from the saved .rlog file
----
[nbest@login2 files]$ pwd
/home/nbest/bigdata/files
[nbest@login2 files]$ ~wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/swift \
> -tc.file tc -sites.file pbs.xml ~/scripts/mcd12q1.swift -resume \
> mcd12q1-20100310-1326-ptxe1x1d.0.rlog
Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified locally)
RunID: 20100311-1027-148caf0a
Progress:
Progress: uninitialized:4
Progress: Selecting site:671 Initializing site shared directory:1 Finished in previous run:1864
Progress: uninitialized:1 Selecting site:576 Stage in:96 Finished in previous run:1864
Progress: Selecting site:1150 Stage in:94 Submitting:2 Finished in previous run:2558
Progress: Selecting site:1150 Stage in:94 Submitted:2 Finished in previous run:2558
Progress: Selecting site:1150 Stage in:93 Submitting:1 Submitted:2 Finished in previous run:2558
Progress: Selecting site:1150 Stage in:90 Submitting:1 Submitted:5 Finished in previous run:2558
Progress: Selecting site:1150 Stage in:90 Submitted:5 Active:1 Finished in previous run:2558
----

From Neil, a comment about that section of the user guide: it says "In order to restart from a restart log file, the -resume logfile argument can be used after the SwiftScript program file name," and then puts the -resume logfile argument before the script file name. I'm sure the order doesn't matter, but the contradiction is confusing.

Notes to add (from Mike):

- Explain which aspects of a Swift script make it restartable, and which aspects are not restartable. For example, if your mappers can return different data at different times, what happens? What other non-deterministic behavior would cause unpredictable, unexpected, or undesired behavior on resumption?
- Explain what changes you can make in the execution environment: for example, increasing or reducing the CPUs to run on or the throttles; fixing tc.data entries, environment variables, or apps; etc.
- Note that resume will again retry failed app() calls. Explain whether the retry count starts over or not.
- Explain how to resume after multiple failures and resumes: if a .rlog is generated on each run, which one should you resume from? Can you resume from any of them, and what happens if you go backwards to an older resume file?
- What happens when you kill (for example with ^C) a running Swift script? Is the signal caught and the resume file written out at that point, or is it written out all along? (Note the case in which a script was running for hours and was then stopped with ^C, but the resume file was short (54 bytes) and Swift showed no sign of doing a resume. Did it silently ignore the resume file instead of acknowledging that it had found one with no useful resume state in it?) Swift should clearly state that it is resuming and what its resume state is.

To resume, rerun Swift with the .rlog file and the original arguments:

+swift -resume ftdock-[id].0.rlog \[rest of the exact command line from initial run\]+

Passing an array to Swift?
~~~~~~~~~~~~~~~~~~~~~~~~~~

Arrays can be passed to Swift in one of the following ways:

. Write the array to a file and read it in Swift using readData (or readData2).
. Direct the array into a file (possibly with a "here document" that expands the array), and then read the file in Swift with readData() or process it with a Swift app() function.
. Use @strsplit on a comma-separated command-line argument; that works well for me.
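For the file-based and comma-separated options above, the calling program has to produce the right text. The following is a minimal Python sketch of both, under two assumptions worth checking against the user guide for your Swift version: that readData() reads an array of a primitive type one element per line, and that script arguments are passed as `-name=value` (the form @arg() reads, as in `@arg("n","1")` in the second SwiftScript). On the Swift side, the comma-separated form would then be split with something like `@strsplit(@arg("list"), ",")`, where `list` is a hypothetical argument name.

```python
def array_file_text(values):
    """Render values one per line, the layout assumed for readData().

    Elements must not contain newlines, since each line is one element.
    """
    lines = []
    for value in values:
        text = str(value)
        if "\n" in text:
            raise ValueError("element contains a newline: %r" % text)
        lines.append(text + "\n")
    return "".join(lines)


def comma_arg(name, values):
    """Build a '-name=a,b,c' script argument to be split on ',' in Swift."""
    items = [str(value) for value in values]
    for item in items:
        if "," in item:
            raise ValueError("element contains a comma: %r" % item)
    return "-%s=%s" % (name, ",".join(items))
```

For example, write `array_file_text(names)` to the file the script maps for readData(), or append `comma_arg("list", names)` to the `swift myscript.swift` command line (`myscript.swift` is a placeholder). The checks reject elements that would silently split into extra array entries.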
Mappers
^^^^^^^^

SimpleMapper
++++++++++++

----
$ cat swiftapply.swift
----

[java]
source~~~~
type RFile;

trace("hi 1");

// Run R on one call file, writing one result file.
app (RFile result) RunR (RFile rcall)
{
  RunR @rcall @result;
}

trace("hi 2");

RFile rcalls[] ;
RFile results[] ;

trace("start");

// One RunR invocation per mapped input; results[i] pairs with rcalls[i].
foreach c, i in rcalls {
  trace("c",i,@c);
  trace("r",i,@filename(results[i]));
  results[i] = RunR(c);
}
source~~~~

----
$ ls calldir resdir
calldir:
rcall.1.Rdata  rcall.2.Rdata  rcall.3.Rdata  rcall.4.Rdata

resdir:
result.1.Rdata  result.2.Rdata  result.3.Rdata  result.4.Rdata
$
----

Notes:

- how the .'s match the prefix and suffix
- the prefix and suffix don't span directories
- the intervening pattern must be digits
- these digits become the array indices
- explain how the padding= argument works and helps (including padding=0)
- figure out and explain the differences between simple_mapper and filesys_mapper

FIXME: Use the "filesys_mapper" and its "location=" parameter to map the input data from /home/wilde/bigdata/*

Abbreviations for SingleFileMapper
++++++++++++++++++++++++++++++++++

Notes:

- Within <> you can only have a literal string, as in <"filename">, not an expression. Someday we will fix this to make <> accept a general expression.
- You can use @filenames( ) (note: plural) to pull off a list of filenames.

writeData() example:

----
$ cat writedata.swift
----

[java]
source~~~~
type file;

file f <"filea">;
file nf <"filenames">;
nf = writeData(@f);
source~~~~

----
$ swift writedata.swift
Swift svn swift-r3264 (swift modified locally) cog-r2730 (cog modified locally)
RunID: 20100319-2002-s9vpo0pe
Progress:
Final status:
$ cat filenames
filea$
----

StructuredRegexpMapper
++++++++++++++++++++++

IN PROGRESS

This mapper can be used to base the mapped filenames of an output array on the mapped filenames of an existing array.
----
landuse outputfiles[] ;
----

Use the undocumented "structured_regexp_mapper" to name the output filenames based on the input filenames. For example:

----
login2$ ls /home/wilde/bigdata/data/sample
h11v04.histogram  h11v05.histogram  h12v04.histogram  h32v08.histogram
h11v04.tif        h11v05.tif        h12v04.tif        h32v08.tif
login2$
login2$ cat regexp2.swift
type tif;
type mytype;

tif images[];
mytype of[] ;

foreach image, i in images {
  trace(i,@filename(images));
  trace(i,@filename(of[i]));
}
login2$
login1$ swift regexp2.swift
Swift svn swift-r3255 (swift modified locally) cog-r2723 (cog modified locally)
RunID: 20100310-1105-4okarq08
Progress:
SwiftScript trace: 1, output/myfile.h11v04.mytype
SwiftScript trace: 2, home/wilde/bigdata/data/sample/h11v05.tif
SwiftScript trace: 3, home/wilde/bigdata/data/sample/h12v04.tif
SwiftScript trace: 0, output/myfile.h32v08.mytype
SwiftScript trace: 0, home/wilde/bigdata/data/sample/h32v08.tif
SwiftScript trace: 3, output/myfile.h12v04.mytype
SwiftScript trace: 1, home/wilde/bigdata/data/sample/h11v04.tif
SwiftScript trace: 2, output/myfile.h11v05.mytype
Final status:
login1$
----
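The mapper parameters used by regexp2.swift are not preserved above, but the trace output shows the effect: each `images[i]` name such as `home/wilde/bigdata/data/sample/h11v04.tif` yields an `of[i]` named `output/myfile.h11v04.mytype`, at the same index. The Python sketch below reproduces that mapping with an assumed match regex and transform template. It illustrates the idea of deriving output names from input names and is not the mapper's actual parameter syntax or implementation.

```python
import re

# Assumed pattern: capture the tile id (e.g. "h11v04") from a .tif path.
MATCH = r"^.*/(h\d+v\d+)\.tif$"
# Assumed template: build the output name from the captured tile id.
TRANSFORM = r"output/myfile.\1.mytype"


def map_outputs(input_names, match=MATCH, transform=TRANSFORM):
    """Return one output filename per input filename, in the same order.

    Keeping the order means output[i] corresponds to input[i], which is
    what the trace shows for images[i] and of[i].
    """
    pattern = re.compile(match)
    outputs = []
    for name in input_names:
        m = pattern.match(name)
        if m is None:
            raise ValueError("%r does not match %r" % (name, match))
        outputs.append(m.expand(transform))
    return outputs
```

Raising on a non-matching input, rather than skipping it, keeps the two arrays the same length so indices cannot drift apart; the `.histogram` files in the sample directory would be rejected by this pattern.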