Broadband Platform on HPCC

From SCECpedia
Revision as of 00:01, 26 September 2012 by Fsilva

Current Broadband studies may require more than 200,000 seismograms. Producing this many seismograms requires high-performance computing.

Overview

In order to run the Broadband Platform on HPCC, users need to follow these steps:

  1. Installing and Building Broadband on HPCC
  2. Installing Desired Green's Functions and Validation Packages
  3. Configuring Required Environment Variables
  4. Creating Validation Runs
  5. Running Simulations

Broadband File System Issues

When running on HPCC, the validation scripts set Broadband's BBP_DATA_DIR to a directory in the /tmp filesystem. Broadband simulations are very I/O intensive, reading and writing thousands of small and medium-sized files in an average simulation. Using a local filesystem on each compute node therefore minimizes remote reads and writes and improves execution time. This approach also avoids creating a bottleneck at the file server and eliminates unnecessary network traffic, so multiple users can run their simulations on HPCC without significant interference.

Because the /tmp filesystem on each node is automatically cleaned at the end of each simulation, all files that need to be kept must be copied to a permanent location. The HPCC validation scripts do this automatically after the simulations finish (but before the PBS job ends).
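The stage-in/stage-out pattern described above can be sketched in bash as follows. This is only an illustration of the idea, assuming a generic simulation command; it is not the code used by the HPCC validation scripts, and the function name run_in_local_scratch is hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of the node-local scratch pattern (NOT the actual
# HPCC validation script).
#   $1     permanent destination directory (e.g. under rcf-104)
#   $2...  command that runs the simulation
run_in_local_scratch() {
    local dest="$1"
    shift
    local scratch status
    # Create node-local scratch space under /tmp (or $TMPDIR if set)
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/bbp_data.XXXXXX") || return 1
    # Run the command with BBP_DATA_DIR pointing at local scratch, so its
    # many small reads and writes stay off the shared file server
    ( cd "$scratch" && export BBP_DATA_DIR="$scratch" && "$@" )
    status=$?
    # Copy results back even if the run failed, so logs are not lost
    mkdir -p "$dest" && cp -r "$scratch"/. "$dest"/
    # Remove the scratch directory once its contents are saved
    rm -rf "$scratch"
    return "$status"
}
```

Copying back before returning mirrors the requirement above: anything left only in /tmp is lost when the node is cleaned.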

Installing and Building BBP on HPCC

Users should log into HPCC's hpc-login2.usc.edu head node and use the rcf-104 filesystem for their simulations. The rcf-104 filesystem is visible from the head node and from all worker nodes in the cluster. The first step in setting up the Broadband Platform on HPCC is to download and build the platform. Users should make sure they have Broadband version 12.x.x or later, as Broadband releases 11.2.3 and earlier cannot be used on the USC HPCC cluster with these instructions. It is also possible to use the svn version of Broadband (as described in the User Guide), but users should be aware that unreleased Broadband code from svn can change daily and is not recommended for official/paper simulations. After downloading the Broadband package from the website, users need to untar it using the following command:

$ tar -xzvf bbp_dist_<version>.tgz
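Because releases 11.2.3 and earlier will not work with these instructions, users may want to check the version before untarring. The sketch below assumes the tarball follows the bbp_dist_<version>.tgz naming shown above with a MAJOR.MINOR.PATCH version string. Both function names are hypothetical helpers, not part of Broadband.

```bash
#!/bin/bash
# Hypothetical helper: extract <version> from a bbp_dist_<version>.tgz name
bbp_version_from_tarball() {
    local name v
    name=$(basename "$1")
    v=${name#bbp_dist_}
    echo "${v%.tgz}"
}

# Hypothetical helper: succeed only if the major version is 12 or greater
bbp_version_ok() {
    local major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # not a plain integer, reject
    esac
    [ "$major" -ge 12 ]
}
```

For example, bbp_version_ok "$(bbp_version_from_tarball bbp_dist_12.2.0.tgz)" succeeds, while the same check for 11.2.3 fails. The version numbers here are only illustrations.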

Before compiling Broadband, users will need to set up their environments. Depending on the shell employed, users will need to add the following lines:

csh -- add the following lines to the .cshrc file

# Setup for running broadband
source /usr/usc/gnu/gcc/default/setup.csh
source /usr/usc/intel/10.0/setup.csh
source /home/scec-00/opt/Python-2.6.2/setup.csh
source /usr/usc/matlab/default/setup.csh

bash -- add the following lines to the .bashrc file

# Setup for running broadband
source /usr/usc/gnu/gcc/default/setup.sh
source /usr/usc/intel/10.0/setup.sh
source /home/scec-00/opt/Python-2.6.2/setup.sh
source /usr/usc/matlab/default/setup.sh

It may be necessary to log out and log back in for these changes to take effect in the user environment (alternatively, users can source the changed file to apply the changes immediately). To make sure the correct compilers are set up, users can type the following commands:

$ ifort --version
ifort (IFORT) 10.0 20070426
Copyright (C) 1985-2007 Intel Corporation.  All rights reserved.

$ gcc --version
gcc (GCC) 4.3.3
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
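Instead of checking each tool by hand, users can confirm that everything the build needs is on their PATH. This is a minimal sketch: the function name check_tools is hypothetical, and the tool list passed to it (for example ifort, gcc, python) is an assumption based on the setup files sourced above. It relies only on the POSIX `command -v` builtin.

```bash
#!/bin/bash
# Hypothetical helper: report each tool's location and return non-zero
# if any of the named tools is missing from PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "MISSING: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools ifort gcc python lists where each tool was found. If any is missing, the setup lines above have probably not been sourced yet.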

Once the environment is configured with the proper compilers, users can build the Broadband Platform. Assuming the platform is installed in the /home/rcf-104/earthquake/bbp directory, users should do the following to build the Broadband Platform:

$ cd /home/rcf-104/earthquake/bbp/src
$ make

If all compilers were properly added to the user's path, the code will start compiling. This process can take a while, and users may encounter some "build warnings", which are fine. If compilation errors are found, the problem needs to be investigated further.
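Build warnings are harmless, so the most reliable sign of a failed build is make's exit status. Keeping the full output in a log also helps with later investigation. The sketch below assumes make is run from the src directory as shown above; build_bbp and the log file name are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: build Broadband and keep the output in a log.
#   $1  Broadband src directory, e.g. /home/rcf-104/earthquake/bbp/src
#   $2  log file for the compiler output
build_bbp() {
    local src_dir="$1" log="$2"
    # make's exit status (not the presence of warnings) decides success
    if make -C "$src_dir" > "$log" 2>&1; then
        echo "build OK ($(grep -ci 'warning' "$log") warning lines in $log)"
    else
        echo "build FAILED, see $log" >&2
        return 1
    fi
}
```

For example, build_bbp /home/rcf-104/earthquake/bbp/src ~/bbp_build.log. If it reports a failure, the log contains the first compilation error to investigate.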

Installing Desired Green's Functions and Validation Packages

In order to run simulations, users will need to download and install one or more velocity models/Green's Functions. Validation packages are only required for historical simulations/validation runs. The first step is to create a top-level directory where all Green's Functions packages will reside.

$ cd /home/rcf-104/earthquake
$ mkdir bbp_gf
$ cd bbp_gf

Then, users need to untar each Green's Functions package downloaded from the Broadband website inside the Green's Functions top-level directory. For example:

$ tar -xzvf bbp_northridge_gf_<version>.tgz
$ tar -xzvf bbp_lomaprieta_gf_<version>.tgz
...

The same procedure should be followed for needed validation packages. Users need to create a top-level directory for all validation packages:

$ cd /home/rcf-104/earthquake
$ mkdir bbp_val
$ cd bbp_val

And then download each validation package from the Broadband website and untar it inside the validation top-level directory:

$ tar -xzvf bbp_northridge_val_<version>.tgz
$ tar -xzvf bbp_lomaprieta_val_<version>.tgz
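When several Green's Functions or validation packages are needed, the untar steps above can be wrapped in a loop. This is a minimal sketch assuming the packages are .tgz files already downloaded from the Broadband website. The function name install_packages is hypothetical, and the target directory is one of the top-level directories created above.

```bash
#!/bin/bash
# Hypothetical helper: untar each package into a top-level directory
# (e.g. bbp_gf or bbp_val), creating that directory if needed.
#   $1     top-level directory
#   $2...  package tarballs
install_packages() {
    local dest="$1"
    shift
    mkdir -p "$dest" || return 1
    local pkg
    for pkg in "$@"; do
        echo "extracting $pkg into $dest"
        # -C makes tar extract inside the top-level directory
        tar -xzf "$pkg" -C "$dest" || return 1
    done
}
```

For example, install_packages /home/rcf-104/earthquake/bbp_gf bbp_northridge_gf_<version>.tgz bbp_lomaprieta_gf_<version>.tgz has the same effect as the individual commands shown earlier.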

Configuring Required Environment Variables

Before users can run the Broadband Platform, they need to set up a few environment variables that tell the Platform how to find its components. This step is also shell dependent, and users may want to add these lines to their .cshrc (csh) or .bashrc (bash) in order to avoid having to type them every time they log into the head node to run simulations:

For csh:

setenv BBP_DIR /home/rcf-104/earthquake/bbp
setenv BBP_GF_DIR /home/rcf-104/earthquake/bbp_gf
setenv BBP_VAL_DIR /home/rcf-104/earthquake/bbp_val
setenv PYTHONPATH /home/rcf-104/earthquake/bbp/comps

For bash:

export BBP_DIR=/home/rcf-104/earthquake/bbp
export BBP_GF_DIR=/home/rcf-104/earthquake/bbp_gf
export BBP_VAL_DIR=/home/rcf-104/earthquake/bbp_val
export PYTHONPATH=/home/rcf-104/earthquake/bbp/comps

Please note that this example features the path names used in the steps above. Users need to customize these with their actual installation locations.

The step above is needed so users can run Broadband scripts on the head node (the steps in the next section will fail if these variables are not properly set!). Additionally, users need to edit the setup_bbp_env.template file (located in the utils/batch directory of the Broadband installation) and change the values of BBP_DIR, BBP_GF_DIR, and BBP_VAL_DIR as described in that file. Once edited, the file should be renamed to setup_bbp_env.sh. Worker nodes use this file when running the actual simulations.
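Because the scripts in the next section fail when these variables are wrong, a quick check on the head node can save a failed setup. The sketch below is for bash. It assumes each of the three BBP variables should point to an existing directory, and the function name check_bbp_env is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: verify that the Broadband variables are set and
# point to existing directories. Returns non-zero on any problem.
check_bbp_env() {
    local status=0 var val
    for var in BBP_DIR BBP_GF_DIR BBP_VAL_DIR; do
        val=${!var:-}            # bash indirect expansion: value of $var
        if [ -z "$val" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$val" ]; then
            echo "$var=$val is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_bbp_env after editing .bashrc (and sourcing it) reports any variable that is missing or points to the wrong place.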

Creating Validation Runs

After completing all the steps above, creating validation runs and running the simulations on HPCC is easy! Users should first create a top-level directory for their simulations:

$ cd /home/rcf-104/earthquake
$ mkdir sims
$ cd sims

The next step is to create the validation runs using the provided bbp_hpcc_validation.py script. This script needs a few parameters: the codebase to use, the event to use for validation, the number of realizations to run, a simulation directory where the results will go, and an e-mail address for job/status notifications. For example, to run 8 realizations of the Loma Prieta (lomap) validation using the Graves & Pitarka method, users should type:

$ /home/rcf-104/earthquake/bbp/utils/batch/bbp_hpcc_validation.py --codebase gp --event lomap --dir lomap-gp-8 -n 8 --email fsilva@usc.edu

The bbp_hpcc_validation.py script will prepare each realization and, at the end, tell the user how to submit the job to the cluster. For example, when the command above finishes, it will print:

Validation run is set up on: /auto/rcf-104/earthquake/sims/lomap-gp-8

To start the validation run, just type: 
$ qsub /auto/rcf-104/earthquake/sims/lomap-gp-8/lomap-gp.pbs

Users should copy and paste the qsub line into their shell to start the validation run on HPCC.

A few things to note:

  • If the simulation directory already exists, the script will ask the user if it should be deleted.
  • The script allocates 8-core nodes, so running 1-8 simulations will use 1 node, 9-16 simulations will use 2 nodes, and so on.
  • Each realization will use a different random seed in the rupture generator. Every other simulation parameter remains the same among all realizations.
  • When users re-run simulations, the same random seeds are used in order to allow for reproducible results.
  • Users will receive an e-mail at the address provided when their job begins, and another one when the job finishes.
  • Users should only run one simulation at a time in order to be considerate of other users of the cluster.
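The node count in the list above is a ceiling division of the number of realizations by the cores per node. The bash sketch below reproduces that arithmetic so users can estimate how many nodes a run will request. The function name nodes_needed is hypothetical, and the default of 8 cores per node comes from the note above.

```bash
#!/bin/bash
# Hypothetical helper: nodes needed for N realizations when each node
# runs up to CORES realizations (default 8), i.e. ceil(N / CORES).
nodes_needed() {
    local realizations="$1" cores_per_node="${2:-8}"
    echo $(( (realizations + cores_per_node - 1) / cores_per_node ))
}
```

For example, nodes_needed 8 prints 1 and nodes_needed 9 prints 2, matching the 1-8 and 9-16 ranges described above.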