How to use this guide

This manual provides a reference on how to run Swift/T Turbine programs on a variety of systems. It also contains an index of sites maintained by the Swift/T team for use by Turbine.

For each machine, a public installation and/or a build procedure will be provided. The user need only follow one set of directions.

A login node installation may be available on certain systems. This will run Swift/T on the login node of that system. This is only acceptable for short debugging runs of 1 minute or less. It will affect other users, so please be cautious when using this mode for debugging.

Public installations

These are maintained by the Swift/T team. Because they may become out of date after a release, the release version and a timestamp are recorded below.

To request maintenance on a public installation, simply email exm-user@mcs.anl.gov.

Build procedures

The build procedure is based on the installation process described in the Swift/T Guide. You should follow that build procedure, and use this guide for information on specific configuration settings for your system.

The settings are generally implemented by modifying the exm-settings.sh configuration script. In some cases, where the setting is not configurable through exm-settings.sh, it may be necessary to directly modify the configure or make command lines by following the manual build process, or by modifying build scripts under the build subdirectory (e.g. turbine-build.sh).

Version numbers

The version numbers of the components that together make up a Swift/T release may be found on the Downloads page.

Freshness

These instructions may become stale for various reasons. For example, system administrators may update directory locations, breaking these instructions. Thus, we mark As of: dates on the instructions for each system.

To report a problem, simply email exm-user@mcs.anl.gov.

For more information

Turbine as MPI program

Turbine is a moderately complex MPI program. It is essentially a Tcl library that glues together multiple C-based systems, including MPI, ADLB, and the Turbine dataflow library.

Running Turbine on an MPI-enabled system works as follows:

  • Compilation and installation: This builds the Turbine libraries and links with the system-specific MPI library. STC must also be informed of the Turbine installation to access correct built-in function information.

  • Run-time configuration: The startup job submission script locates the Turbine installation and reads configuration information.

  • Process launch: The Tcl shell, tclsh, is launched in parallel and configuration information is passed to it so it can find the libraries. The Tcl program script is the STC-generated user program file. The MPI library enables communication among the tclsh processes.

Each of the systems below follows this basic outline.

On simpler systems, use the turbine program. This is a small shell script wrapper that configures Turbine and essentially runs:

mpiexec tclsh program.tcl

On more complex, scheduled systems, users do not invoke mpiexec directly. Thus, sample scripts are provided below.

Submitting Turbine jobs on scheduled systems

On scheduled systems (PBS, SLURM, Cobalt, etc.), Turbine is launched with a run script customized for that scheduler (turbine-<name>-run). The run script produces a batch script if necessary and submits it with the job submission program (e.g., qsub).

Turbine run scripts

PBS

turbine-pbs-run.zsh

Cobalt

turbine-cobalt-run.zsh

Cray/APRUN

turbine-aprun-run.zsh (PBS with Cray’s aprun)

SLURM

turbine-slurm-run.zsh

Each script accepts input via environment variables and command-line options.

A typical invocation is:

stc program.swift
turbine-pbs-run.zsh -n 96 -s settings.sh program.tcl

where program.tcl is the output of STC and settings.sh contains:

export QUEUE=bigqueue
export PPN=8

which would run program.tcl in 96 MPI processes on 12 nodes (8 processes per node), submitted by PBS to queue bigqueue.

Turbine scheduler variables

For scheduled systems, Turbine accepts a common set of environment variables.

PROCS

Number of processes to use

PPN

Number of processes per node

PROJECT

The project name to use with the system scheduler

QUEUE

Name of queue in which to run

TURBINE_OUTPUT

Directory in which to place Turbine output (if unset, a default value is automatically created)

TURBINE_OUTPUT_ROOT

Directory under which Turbine will automatically create TURBINE_OUTPUT if necessary

Turbine scheduler script options

For scheduled systems, Turbine accepts a common set of command line options.

-d <directory>

Set the Turbine output directory. (Overrides TURBINE_OUTPUT).

-e <key>=<value>

Set an environment variable in the job environment. This option may be used multiple times.

-i <script>

Set a script to run before launching Turbine. This script will have TURBINE_OUTPUT in the environment, so you may perform additional configuration just before job launch.

-n <procs>

Number of processes. (Overrides PROCS.)

-o <directory>

Set the Turbine output directory root, in which default Turbine output directories are automatically created based on the date. (Overrides TURBINE_OUTPUT_ROOT.)

-s <script>

Source this file for environment variables. These variables override any other Turbine scheduler variables. You may place arbitrary shell code in this script.

-t <time>

Set scheduler walltime. The argument is passed through to the scheduler unchanged, so use the time format that the scheduler expects.

-V

Make script verbose. This typically just applies set -x, allowing you to inspect variables and arguments as passed to the system scheduler.

-x

Use turbine_sh launcher with compiled-in libraries instead of tclsh (reduces number of files that must be read from file system).

-X

Run standalone Turbine executable (created by mkstatic.tcl) instead of program.tcl.
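
The override rules above (for example, -n over PROCS and -d over TURBINE_OUTPUT) can be illustrated with a small sketch. This is not the code of the actual run scripts, which are zsh programs and may work differently; the function name resolve_settings is hypothetical and only two of the options are modeled.

```shell
# Simplified illustration of "command-line option overrides environment
# variable". Hypothetical helper, not part of Turbine.
resolve_settings() {
  procs=${PROCS:-}              # start from the environment
  output=${TURBINE_OUTPUT:-}
  OPTIND=1                      # reset so the function can be called again
  while getopts "n:d:" opt; do
    case $opt in
      n) procs=$OPTARG ;;       # -n overrides PROCS
      d) output=$OPTARG ;;      # -d overrides TURBINE_OUTPUT
      *) return 1 ;;
    esac
  done
  echo "procs=$procs output=$output"
}

# PROCS is set in the environment, but -n wins:
( PROCS=4; TURBINE_OUTPUT=/tmp/env-out; resolve_settings -n 96 )
# prints: procs=96 output=/tmp/env-out
```

The subshell keeps the example's variable settings from leaking into the calling shell.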

Turbine output directory

The working directory (PWD) for the job is called TURBINE_OUTPUT.

If the user does not set this variable, Turbine will select one based on the date and report it. The automatically selected directory will be placed under TURBINE_OUTPUT_ROOT, which defaults to $HOME/turbine-output. The user program will be copied to TURBINE_OUTPUT before submission. Standard output and error go to TURBINE_OUTPUT/output.txt.
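
The selection logic described above might look roughly like the following sketch. The function name default_turbine_output is hypothetical, and the year/month/day/hour/minute/second layout is an assumption made for illustration; this guide only says the directory is based on the date.

```shell
# Hypothetical sketch of choosing a default output directory.
# The date layout below is an assumption, not taken from Turbine itself.
default_turbine_output() {
  root=${TURBINE_OUTPUT_ROOT:-$HOME/turbine-output}
  echo "$root/$(date +%Y/%m/%d/%H/%M/%S)"
}

# Honor a user-provided TURBINE_OUTPUT, otherwise pick a dated default
TURBINE_OUTPUT=${TURBINE_OUTPUT:-$(default_turbine_output)}
echo "TURBINE_OUTPUT=$TURBINE_OUTPUT"
```

Setting TURBINE_OUTPUT yourself (or using -d) is the way to get a predictable location, for example when a later script needs to find output.txt.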

Tip
When running on a big HPC machine, it may be difficult to get STC (a Java-based program) running. STC output (program.tcl) is platform-independent. You may run STC to develop and debug your script on your local workstation, then simply copy program.tcl to the big machine for execution. Just make sure that the STC and Turbine versions are compatible (the same release number).

x86 clusters

Generic clusters

This is the simplest method to run Turbine.

Build procedure

The exm-setup.zsh script should work without any special configuration.

To run, simply build an MPI hosts file and pass that to Turbine, which will pass it to mpiexec.

turbine -l -n 3 -f hosts.txt program.tcl
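
A hosts file of this kind lists one host name per line. As a minimal sketch, the file could be generated as follows; the helper name make_hosts_file, the HOSTS_FILE variable, and the example.org host names are placeholders, not part of Turbine.

```shell
# Hypothetical helper: write one host name per line to a hosts file.
make_hosts_file() {
  file=$1
  shift
  : > "$file"                   # create or truncate the file
  for host in "$@"; do
    echo "$host" >> "$file"
  done
}

# Placeholder host names; substitute machines you are able to log in to
HOSTS_FILE=${TMPDIR:-/tmp}/hosts.txt
make_hosts_file "$HOSTS_FILE" node1.example.org node2.example.org
cat "$HOSTS_FILE"
```

The resulting file can then be passed with -f, as in the turbine command above.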

MCS compute servers

Compute servers at MCS Division, ANL. Operates as a generic cluster (see above).

echo crush.mcs.anl.gov >  hosts.txt
echo crank.mcs.anl.gov >> hosts.txt
turbine -l -n 3 -f hosts.txt program.tcl

Public installation

As of: trunk, 8/13/2013

MCS users are welcome to use this installation.

  • STC: ~wozniak/Public/stc/bin/stc

  • Turbine: ~wozniak/Public/turbine/bin/turbine

Breadboard

Breadboard is a cloud-ish cluster for software development in MCS. This is a fragile resource used by many MCS developers. Do not overuse.

Operates as a generic cluster (see above). No scheduler. Once you have the nodes, you can use them until you release them or time expires (12 hours by default).

  1. Allocate nodes with heckle. See Breadboard wiki

  2. Wait for nodes to boot

  3. Use heckle allocate -w for better interaction

  4. Create MPICH hosts file:

    heckle stat | grep $USER | cut -f 1 -d ' ' > hosts.txt
  5. Run:

    export TURBINE_LAUNCH_OPTS='-f hosts.txt'
    turbine -l -n 4 program.tcl
  6. Run as many jobs as desired on the allocation

  7. When done, release the allocation:

    for h in $( cat hosts.txt )
    do
      heckle free $h
    done

Midway

Midway is a mid-sized SLURM cluster at the University of Chicago.

Public installation

As of: 0.2.1 - 02/11/2013

  • STC: ~wozniak/Public/stc-0.0.3/bin/stc

To run:

srun ~wozniak/Public/turbine-0.1.1/scripts/submit/slurm/turbine-slurm.sh -n 3 ~/program.tcl

Build procedure

  • Midway uses OpenMPI. We have tested with /software/openmpi-1.6-el6-x86_64

  • Put mpicc in your PATH

  • Use these settings in exm-settings.sh:

    export LDFLAGS="-Wl,-rpath -Wl,/software/openmpi-1.6-el6-x86_64/lib"
    MPI_VERSION=2
    MPI_LIB_NAME=mpi
  • Or if doing a manual build with configure and make:

    • Configure ADLB with:

      LDFLAGS="-Wl,-rpath -Wl,/software/openmpi-1.6-el6-x86_64/lib" --enable-mpi-2
    • Configure Turbine with:

       --with-mpi-lib-name=mpi
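
Because the build depends on mpicc being found in PATH, it can help to check for it before starting. The following is a minimal sketch; require_in_path is a hypothetical helper, not part of the Swift/T build scripts.

```shell
# Hypothetical pre-build check: report any required tool missing from PATH.
require_in_path() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "not found in PATH: $tool" >&2
      missing=1
    fi
  done
  return $missing
}

# Before building on Midway one might run:
require_in_path mpicc || echo "add the OpenMPI bin directory to PATH first"
```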

Eureka

Eureka is a 100-node x86 cluster at the Argonne Leadership Computing Facility (ALCF). It uses the Cobalt scheduler.

Build procedure

  • Configure Tcl and c-utils with gcc

  • Configure ADLB with mpicc

To run:

export MODE=cluster
submit/cobalt/turbine-cobalt-run.zsh -n 3 ~/program.tcl

The normal Turbine environment variables are honored, plus the Turbine scheduler variables.

Tukey

Tukey is a 96-node x86 cluster at the Argonne Leadership Computing Facility (ALCF). It uses the Cobalt scheduler.

As of: Trunk, 4/9/2014

Public installation

Add to PATH:

  • STC: ~wozniak/Public/sfw/x86/stc/bin

  • Turbine submit script: ~wozniak/Public/sfw/x86/turbine/scripts/submit/cobalt

To run:

export MODE=cluster
export QUEUE=pubnet
export PROJECT=...
turbine-cobalt-run.zsh -n 3 program.tcl
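
Since the run depends on several environment variables, a quick check before submitting can catch an unset one. This is a minimal sketch; require_vars is a hypothetical helper and not part of Turbine.

```shell
# Hypothetical pre-submit check: ensure the named variables are non-empty.
require_vars() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"    # look up the variable by name
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      status=1
    fi
  done
  return $status
}

export MODE=cluster
export QUEUE=pubnet
# PROJECT is user-specific, so this example does not set it
require_vars MODE QUEUE PROJECT || echo "set the missing variables before submitting"
```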

Build procedure

  • Check that the system-provided MVAPICH mpicc is in your PATH

  • Configure c-utils with gcc

  • Configure ADLB with CC=mpicc --enable-mpi-2

  • Configure Turbine with --with-launcher=/soft/libraries/mpi/mvapich2/gcc/bin/mpiexec

Blues

Blues is a 310-node x86 cluster at ANL. It uses PBS.

Public installation

  • STC: ~wozniak/Public/stc/bin/stc

To run:

export QUEUE=batch
~wozniak/Public/turbine/scripts/submit/pbs/turbine-pbs-run.zsh -n 3 program.tcl

See the Turbine scheduler variables and Turbine scheduler script options for additional settings.

Build procedure

You apparently need to use GCC 4.7.2 and append its library directory to LD_LIBRARY_PATH:

$ which gcc
/software/gcc-4.7.2/bin/gcc
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/software/gcc-4.7.2/lib64
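
Note that if LD_LIBRARY_PATH is empty beforehand, the export above leaves a leading colon, and the dynamic loader treats an empty entry as the current directory. A minimal sketch of a more careful append follows; append_ld_path is a hypothetical helper, not something provided by Swift/T.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH without
# creating an empty entry or a duplicate.
append_ld_path() {
  dir=$1
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;                                # already present
    "::") LD_LIBRARY_PATH=$dir ;;                 # variable was empty
    *) LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$dir ;;   # normal append
  esac
  export LD_LIBRARY_PATH
}

append_ld_path /software/gcc-4.7.2/lib64
echo "$LD_LIBRARY_PATH"
```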

Blue Gene

The Blue Gene systems at ANL are scheduled systems that use Cobalt.

  • The job ID is placed in TURBINE_OUTPUT/jobid.txt

  • Job metadata is placed in TURBINE_OUTPUT/turbine-cobalt.log

  • The Cobalt log is placed in TURBINE_OUTPUT
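
The recorded job ID can be picked up by a follow-on script. The sketch below assumes that jobid.txt holds the ID on its first line, which this guide does not state explicitly; turbine_jobid is a hypothetical helper name.

```shell
# Hypothetical helper: print the job ID recorded for a run.
# Assumes jobid.txt holds the ID on its first line.
turbine_jobid() {
  dir=$1
  if [ ! -r "$dir/jobid.txt" ]; then
    echo "no jobid.txt in $dir" >&2
    return 1
  fi
  head -n 1 "$dir/jobid.txt"
}

# Example, if TURBINE_OUTPUT is set for a submitted run:
if [ -n "${TURBINE_OUTPUT:-}" ]; then
  turbine_jobid "$TURBINE_OUTPUT" || true
fi
```

The ID could then be given to the scheduler's own status or cancel commands.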

Blue Gene/P

Surveyor/Intrepid/Challenger

These machines were at the Argonne Leadership Computing Facility (ALCF). Other existing Blue Gene/P systems may be configured in a similar way.

Public installation

  • Based on trunk

  • STC: ~wozniak/Public/stc-trunk/bin/stc

To run:

~wozniak/Public/turbine/scripts/submit/cobalt/turbine-cobalt-run.zsh -n 3 ~/program.tcl

Build procedure

To run on the login node:

  • Install MPICH for the login nodes

  • Configure Tcl and c-utils with gcc

  • Configure ADLB with your MPICH

  • Configure Turbine with

    --enable-bgp LDFLAGS=-shared-libgcc

    This makes adjustments for some Blue Gene quirks.

  • Then, simply use the bin/turbine program to run. Be cautious in your use of the login nodes to avoid affecting other users.

To run on the compute nodes under IBM CNK:

In this mode, you cannot use app functions to launch external programs because CNK does not support this. See ZeptoOS below.

  • Configure Tcl with mpixlc

  • Configure c-utils with gcc

  • Configure ADLB with:

    --enable-xlc
    CC=/bgsys/drivers/ppcfloor/comm/bin/mpixlc
  • Configure Turbine with:

    CC=/soft/apps/gcc-4.3.2/gnu-linux/bin/powerpc-bgp-linux-gcc
    --enable-custom
    --with-mpi-include=/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/include

To run, use scripts/submit/bgp/turbine-cobalt.zsh. See the script header for usage.

To run on the compute nodes under ZeptoOS:

  • Configure Tcl with zmpicc

  • Configure c-utils with gcc

  • Configure ADLB with

    CC=zmpicc --enable-mpi-2
  • Configure Turbine with

    CC=/soft/apps/gcc-4.3.2/gnu-linux/bin/powerpc-bgp-linux-gcc
    --enable-custom
    --with-mpi-include=/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/include

To run, use scripts/submit/bgp/turbine-cobalt.zsh. See the script header for usage.

Blue Gene/Q

Vesta

Public installation

As of: 0.4.0 - 07/29/2013

  • STC: ~wozniak/Public/stc-bgq/bin/stc

  • Turbine: ~wozniak/Public/turbine-bgq/scripts/submit/cobalt/turbine-cobalt-run.zsh

  • Run as:

    export MODE=BGQ
    export QUEUE=<queue_name>
    turbine-cobalt-run.zsh -n 3 program.tcl

The normal Turbine environment variables are honored, plus the Turbine scheduler variables.

Cetus

As of: 0.5.0 - 4/4/2014

Add to PATH:

  • STC: ~wozniak/Public/ppc64/stc/bin

  • Turbine: ~wozniak/Public/ppc64/turbine/scripts/submit/cobalt

  • Run as:

    export MODE=BGQ
    export QUEUE=...
    export PROJECT=...
    turbine-cobalt-run.zsh -n 3 program.tcl

Build procedure

As of: 0.5.0 - 4/4/2014

Tcl:

The GCC installation does not support shared libraries. Thus, you must compile Tcl with bgxlc. You must modify the Makefile to use bgxlc arguments: -qpic, -qmkshrobj. You must link with -qnostaticlink.

You may get errors that say wrong digit. This is apparently a bgxlc bug when applied to Tcl’s StrToD.c. Compiling this file with -O3 fixes the problem.

Put /bgsys/drivers/ppcfloor/comm/gcc/bin in your PATH.

  • Compile c-utils with /usr/bin/gcc

  • Configure ADLB with mpixlc --enable-mpi-2 --enable-xlc

  • Configure Turbine with /usr/bin/gcc --disable-static --with-tcl=/home/wozniak/Public/sfw/ppc64/tcl-8.5.12

External scripting:

  • Python

    • Configure Python with BGXLC

  • R

    • Configure R with GCC as usual

    • Run with:

      turbine-cobalt-run.zsh -e R_HOME=/path/to/R/lib64/R -e LD_LIBRARY_PATH=/path/to/R/lib64/R/lib

Cray

Blue Waters

Blue Waters is a Cray XE6/XK7 at the University of Illinois at Urbana-Champaign.

Build procedure

As of: 11/05/2013

Cray systems do not use mpicc. We set CC=gcc and use compiler flags to configure the MPI library.

  • Use the following settings in exm-settings.sh

    export CC=cc
    
    MPI_VERSION=2
    EXM_CUSTOM_MPI=1
    
    EXM_CRAY=1
    
    # Optionally, if you want to exclusively build static executables
    EXM_BUILD_STATIC=1

Or, if doing a manual build with configure/make:

  • Configure ADLB with:

    ./configure --prefix=/path/to/lb --with-c-utils=/path/to/c-utils
    CC=gcc
    CFLAGS=-I/opt/cray/mpt/default/gni/mpich2-gnu/47/include
    LDFLAGS="-L/opt/cray/mpt/default/gni/mpich2-gnu/47/lib -lmpich"
    --enable-mpi-2
  • In the Turbine configure step, replace the --with-mpi option with:

    --enable-custom-mpi --with-mpi=/opt/cray/mpt/default/gni/mpich2-gnu/47

Submitting jobs

Submitting jobs on Blue Waters is largely the same as on other Cray systems. One difference is that the submit script must specify the job size with different directives. Blue Waters does not support the mpp directives: trying to use an mpp directive may cause your job to be rejected or stuck in the queue. The correct directive is:

#PBS -l nodes=1:ppn=32

The turbine-aprun-run.zsh script supports Blue Waters. You can invoke it as follows (for a single node with 32 processes per node):

QUEUE=normal BLUE_WATERS=true PPN=32 turbine-aprun-run.zsh -n 32 helloworld.tcl
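
To show how the directive relates to the process count and PPN, here is a minimal sketch that prints the nodes/ppn line for a given job size. It assumes whole nodes are requested and rounds up; pbs_nodes_directive is a hypothetical helper, not the code used by turbine-aprun-run.zsh.

```shell
# Hypothetical helper: print the node-request directive for a job of
# PROCS processes at PPN processes per node (whole nodes, rounded up).
pbs_nodes_directive() {
  procs=$1
  ppn=$2
  nodes=$(( (procs + ppn - 1) / ppn ))   # integer ceiling division
  echo "#PBS -l nodes=${nodes}:ppn=${ppn}"
}

pbs_nodes_directive 32 32    # prints: #PBS -l nodes=1:ppn=32
pbs_nodes_directive 64 32    # prints: #PBS -l nodes=2:ppn=32
```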

Installing from source

Prerequisites

  • Tcl 8.5 is installed in: /usr/bin/tclsh

  • Swig 1.3.36 is installed in: /usr/bin/swig

  • The following steps ensure that the right compiler modules are loaded.

  • Switch your programming environment to use gcc.

module unload PrgEnv-cray
module load PrgEnv-gnu
  • Load module with latest Oracle Java JDK 7+

module load java
  • Download and install the Apache Ant build tool (required to build STC)

wget http://www.apache.org/dist/ant/binaries/apache-ant-1.9.2-bin.tar.bz2
# Check archive is valid
ant_xsum=$(shasum apache-ant-1.9.2-bin.tar.bz2 | awk '{ print $1 }')
if [ ! "$ant_xsum" = "50cfaaeecee4f88a3ff9de5068fc98e4e9268daf" ]
then
  echo "Bad ant download checksum"
fi

# Extract ant and install it somewhere permanent
tar xvjf apache-ant-1.9.2-bin.tar.bz2
mkdir -p ~/soft
mv apache-ant-1.9.2/ ~/soft/

# Add ant to path (put this in .bashrc)
export PATH="$PATH:$HOME/soft/apache-ant-1.9.2/bin"

# Check ant version
ant -version

Installation

  • Need to install to a Lustre file system:

    • /scratch (not backed up, best performance)

    • /u home directory (backed up, good performance)

    • /projects (backed up, good performance)

  • I used a prepackaged distribution built from trunk using distro/construct.zsh -t. The following instructions install from this distribution on Blue Waters.

  • First extract the tarball

tar xvzf exm-trunk.tar.gz
cd exm-trunk
  • exm-settings.sh needs some customization. The changed settings were:

EXM_PREFIX=/u/sciteam/tarmstro/soft/exm-trunk-r8770

# Use the latest GNU-compatible version of mpich
EXM_MPI=/opt/cray/mpt/default/gni/mpich2-gnu/48
# Need to use gcc (mpicc doesn't exist on Cray)
EXM_MPICC=`which gcc`

# Custom MPI
EXM_CUSTOM_MPI=1

# Since we're not using mpicc wrapper, add CC options for MPI libraries
export CFLAGS="-I/opt/cray/mpt/default/gni/mpich2-gnu/48/include/"
export LDFLAGS="-L/opt/cray/mpt/default/gni/mpich2-gnu/48/lib/ -lmpich"

# Currently MPI 3 not supported
MPI_VERSION=2

Beagle

Beagle is a Cray XE6 at the University of Chicago.

Remember that at run time, Beagle jobs can access only /lustre, not NFS (including home directories). Thus, you must install Turbine and its libraries in /lustre. Also, your data must be in /lustre.
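
A simple way to catch a misplaced installation or data directory is to check paths before submitting. The sketch below only compares path strings against the /lustre prefix and does not resolve symbolic links; all_under_lustre is a hypothetical helper, not part of Turbine.

```shell
# Hypothetical check: succeed only if every given path is under /lustre.
all_under_lustre() {
  for path in "$@"; do
    case "$path" in
      /lustre/*) ;;                                  # acceptable
      *) echo "not under /lustre: $path" >&2; return 1 ;;
    esac
  done
}

# Paths of the Beagle compute-node public installation
all_under_lustre /lustre/beagle/wozniak/Public/turbine /lustre/beagle/wozniak/Public/stc &&
  echo "installation paths are visible to compute nodes"
```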

Public installation

Login nodes

This installation is for use on the login node.

  • Swift/T trunk - 6/11/2013

  • Turbine: ~wozniak/Public/turbine-trunk-beagle-login

  • STC: ~wozniak/Public/stc-trunk-beagle-login

Compute nodes

  • Swift/T trunk - 6/11/2013

  • Turbine: /lustre/beagle/wozniak/Public/turbine

  • STC: /lustre/beagle/wozniak/Public/stc

To run:

  1. Set environment variables. The normal Turbine environment variables are honored, plus the Turbine scheduler variables and Turbine scheduler script options.

  2. Run submit script (in turbine/scripts/submit/cray):

    turbine-aprun-run.zsh -n <numprocs> script.tcl --arg1=value1 ...

Build procedure

Cray systems do not use mpicc. We set CC=gcc and use compiler flags to configure the MPI library.

  • Configure ADLB with:

    ./configure --prefix=/path/to/lb --with-c-utils=/path/to/c-utils
    CC=gcc
    CFLAGS=-I/opt/cray/mpt/default/gni/mpich2-gnu/47/include
    LDFLAGS="-L/opt/cray/mpt/default/gni/mpich2-gnu/47/lib -lmpich"
    --enable-mpi-2
  • In the Turbine configure step, replace the --with-mpi option with:

    --enable-custom-mpi --with-mpi=/opt/cray/mpt/default/gni/mpich2-gnu/47

Build procedure with MPE

Configure MPE 1.3.0 with:

export CFLAGS=-fPIC
export MPI_CFLAGS="-I/opt/cray/mpt/default/gni/mpich2-gnu/47/include -fPIC"
export LDFLAGS="-L/opt/cray/mpt/default/gni/mpich2-gnu/47/lib -lmpich"
export F77=gfortran
export MPI_F77=$F77
export MPI_FFLAGS=$MPI_CFLAGS
CC="gcc -fPIC" ./configure --prefix=... --disable-graphics

Configure ADLB with:

export CFLAGS=-mpilog
export LDFLAGS="-L/path/to/mpe/lib -lmpe -Wl,-rpath -Wl,/path/to/mpe/lib"
./configure --prefix=... CC=mpecc --with-c-utils=/path/to/c-utils --with-mpe=/path/to/mpe --enable-mpi-2

Configure Turbine with:

./configure --enable-custom-mpi --with-mpi=/opt/cray/mpt/default/gni/mpich2-gnu/47 --with-mpe=/path/to/mpe

Raven

Raven is a Cray XE6/XK7 at Cray.

Build procedure

  • Configure ADLB with:

    ./configure --prefix=/path/to/lb --with-c-utils=/path/to/c-utils
    CC=gcc
    CFLAGS=-I/opt/cray/mpt/default/gni/mpich2-gnu/46/include
    LDFLAGS="-L/opt/cray/mpt/default/gni/mpich2-gnu/46/lib -lmpich"
    --enable-mpi-2
  • In the Turbine configure step, use:

    --with-mpi=/opt/cray/mpt/default/gni/mpich2-gnu/46
  • Use this Java when compiling/running STC: /opt/java/jdk1.7.0_07/bin/java

To run:

  1. Set environment variables. The normal Turbine environment variables are honored, plus the Turbine scheduler variables.

  2. Run submit script (in turbine/scripts/submit/cray):

    turbine-aprun-run.zsh script.tcl --arg1=value1 ...

Advanced usage:

Turbine uses a PBS template file called turbine/scripts/submit/cray/turbine-aprun.sh.m4. This file is simply filtered and submitted via qsub. You can edit this file to add additional settings as necessary.

Module:

You may load Swift/T with:

module use /home/users/p01577/Public/modules
module load swift-t

Cloud

EC2

Setup

  • Install ec2-host on your local system

  • Launch EC2 instances.

    • Enable SSH among instances.

    • Firewall settings must allow all TCP/IP traffic for MPICH to run.

    • If necessary, install Swift/T

    • An AMI with Swift/T installed is available

  • Use the provided script turbine/scripts/submit/ec2/turbine-setup-ec2.zsh.

    • See the script header for usage notes

    • This will configure SSH settings and create a hosts file for MPICH and install them on the EC2 instance

To run:

export TURBINE_LAUNCH_OPTS="-f $HOME/hosts.txt"
turbine program.tcl