Broadband Platform on HPCC

From SCECpedia

Latest revision as of 16:33, 27 September 2012

Current broadband studies may exceed 200K seismograms. To produce this number of seismograms, we need to use high performance computing.

This entry contains information that may be useful when running a recent version of the Broadband Platform on the USC HPCC system. These instructions work for Broadband Platform v12.10 and later.

Overview

In order to set up Broadband validation runs on HPCC, users need to follow these steps:

  1. Install and Build Broadband on HPCC
  2. Install Desired Green's Functions and Validation Packages
  3. Configure Required Environment Variables
  4. Create Validation Runs and Start Simulations

Information about the USC HPCC system is available on the USC HPCC web site.

Broadband File System Issues

Broadband simulations are very I/O intensive: an average simulation reads and writes thousands of small and medium-sized files. To manage this I/O load, we use a local filesystem on each compute node, which minimizes remote reads and writes and thus improves execution time. This approach also avoids creating a bottleneck at a file server and eliminates unnecessary network traffic. It is therefore possible for multiple users to run their simulations on HPCC without significant interference.

If users run broadband platform (bbp) validation processing on HPCC, the bbp validation scripts set up Broadband's BBP_DATA_DIR to use a directory in the /tmp filesystem.

Because the /tmp filesystem on each node is automatically cleaned at the end of each simulation, it is necessary to copy all wanted files to a permanent location. The HPCC validation scripts do that automatically after the simulations are finished (but before the PBS job ends).
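The stage-locally-then-copy-back pattern described above can be sketched in shell as follows. This is only an illustration of the idea, not the code used by the HPCC validation scripts; the function name run_in_scratch and its arguments are hypothetical.

```shell
# Sketch only: run a command in node-local scratch space, then copy the
# results to permanent storage before the scratch area is cleaned.
# run_in_scratch is a hypothetical helper, not part of the Broadband Platform.
run_in_scratch() {
    local results_dir="$1"
    shift
    local scratch status
    # Create a private working directory under /tmp (or $TMPDIR if set).
    scratch="$(mktemp -d "${TMPDIR:-/tmp}/bbp_scratch.XXXXXX")" || return 1
    # Run the given command with the scratch directory as working directory.
    ( cd "$scratch" && "$@" )
    status=$?
    # Copy everything that was produced to the permanent location,
    # then remove the scratch directory.
    mkdir -p "$results_dir" && cp -R "$scratch"/. "$results_dir"/
    rm -rf "$scratch"
    return $status
}
```

A call such as `run_in_scratch /home/rcf-104/earthquake/sims/demo some_command` would run some_command in local scratch and leave its output files in the given results directory (the directory name and command here are placeholders).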

Install and Build BBP on HPCC

The SCEC Broadband Platform should make use of 64-bit compute nodes. These are accessible through the hpc-login2.usc.edu head node.

Users should log into HPCC's hpc-login2.usc.edu head node and then confirm that they can use the rcf-104 filesystem for their simulations. Their home directories may not be on this filesystem.

The rcf-104 filesystem is visible from the head node and from all worker nodes in the cluster. However, it may not be the default filesystem. To confirm your account is configured to use rcf-104, log into hpc-login2.usc.edu and cd over to rcf-104, like this:

Please replace the username shown as "maechlin" with your HPCC username in the commands given below.

-bash-3.2$ pwd
/home/rcf-01/maechlin

-bash-3.2$ cd /home/rcf-104/maechlin
-bash-3.2$ pwd
/home/rcf-104/maechlin

If these commands work, and you have access to a directory on /home/rcf-104, please proceed to the next steps. Otherwise, please contact the Broadband developers at SCEC, who will help you set up your HPCC account as needed.

The next step in setting up the Broadband Platform on HPCC is to download and build the platform. Users should make sure they have a version of Broadband 12.x.x or greater, as Broadband releases 11.2.3 and earlier cannot be used on the USC HPCC cluster according to these instructions.

It is also possible to use the svn version of Broadband (as described in the User Guide), but users should be aware that unreleased Broadband code from svn can change daily and is not recommended for official/paper simulations.

In this case, let us assume a user has downloaded the tgz file. After downloading the Broadband package from the website, users need to untar it using the following command:

$ tar -xzvf bbp_dist_<version>.tgz

Setup HPCC environment

Before compiling Broadband, users will need to set up their HPCC computing environments. This is done by setting environment variables in their .bashrc or .cshrc files. These files are typically in the user's home directory (and probably not in the rcf-104 directory).

Depending on the shell employed, users will need to add the following lines:

bash -- add the following lines to the .bashrc file

# Setup for running broadband
source /usr/usc/gnu/gcc/default/setup.sh
source /usr/usc/intel/10.0/setup.sh
source /home/scec-00/opt/Python-2.6.2/setup.sh
source /usr/usc/matlab/default/setup.sh

csh -- add the following lines to the .cshrc file

# Setup for running broadband
source /usr/usc/gnu/gcc/default/setup.csh
source /usr/usc/intel/10.0/setup.csh
source /home/scec-00/opt/Python-2.6.2/setup.csh
source /usr/usc/matlab/default/setup.csh

It may be necessary to log out and log back in for these changes to be incorporated in the user environment (alternatively, users can source the changed file to force the changes to take effect immediately).

Compiler Settings

Some programs in the broadband platform must be compiled before they can be used. To make sure the correct compilers are set up, users can type the following commands:

$ ifort --version
ifort (IFORT) 10.0 20070426
Copyright (C) 1985-2007 Intel Corporation.  All rights reserved.

$ gcc --version
gcc (GCC) 4.3.3
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
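
The same check can be scripted so that a missing compiler is reported before the build starts. The following is a minimal sketch; check_tools is a hypothetical helper name, and the list of tools passed to it is up to the user.

```shell
# Sketch only: report whether each named tool is on the PATH.
# check_tools is a hypothetical helper, not part of the Broadband Platform.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return $missing
}
```

For example, `check_tools ifort gcc python` returns a non-zero status if any of the three is not available in the current environment.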

Building the Broadband Platform

Once the environment is configured with the proper compilers, users can build the Broadband Platform. Assuming the platform is installed in the /home/rcf-104/earthquake/bbp directory, users should do the following to build the Broadband Platform:

$ cd /home/rcf-104/earthquake/bbp/src
$ make

If all compilers were properly added to the user's path, the code will start compiling. This process can take a while, and users may encounter some "build warnings", which are fine. If compilation errors are found, the problem needs to be investigated further.

Install Desired Green's Functions and Validation Packages

In order to run simulations, users will need to download and install one or more velocity models/Green's Functions. Validation packages are only required for historical simulations/validation runs. The first step is to create a top-level directory where all Green's Functions packages will reside.

$ cd /home/rcf-104/earthquake
$ mkdir bbp_gf
$ cd bbp_gf

Then, users need to untar inside the Green's Functions top-level directory each Green's Functions package downloaded from the Broadband website. For example:

# If your tar does not support the -z option, an equivalent form is: gunzip < file.tar.gz | tar xvf -
$ tar -xzvf bbp_northridge_gf_<version>.gz
$ tar -xzvf bbp_lomaprieta_gf_<version>.gz
...

The same procedure should be followed for needed validation packages. Users need to create a top-level directory for all validation packages:

$ cd /home/rcf-104/earthquake
$ mkdir bbp_val
$ cd bbp_val

And then download each validation package from the Broadband website and untar it inside the validation top-level directory:

$ tar -xzvf bbp_northridge_val_<version>.tgz
$ tar -xzvf bbp_lomaprieta_val_<version>.tgz
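
When several packages have been downloaded, the untar step can be repeated in a loop. The sketch below assumes the downloaded packages sit together in one directory and end in .tgz (adjust the pattern if your files are named differently); extract_packages is a hypothetical helper name.

```shell
# Sketch only: extract every .tgz package found in src_dir into dest_dir.
# extract_packages is a hypothetical helper, not part of the Broadband Platform.
extract_packages() {
    local src_dir="$1" dest_dir="$2" pkg count=0
    mkdir -p "$dest_dir" || return 1
    for pkg in "$src_dir"/*.tgz; do
        # Skip the unexpanded pattern when no package matches.
        [ -e "$pkg" ] || continue
        echo "extracting $pkg"
        tar -xzf "$pkg" -C "$dest_dir" || return 1
        count=$((count + 1))
    done
    echo "extracted $count package(s) into $dest_dir"
}
```

For example, `extract_packages ~/downloads /home/rcf-104/earthquake/bbp_val` would unpack all downloaded validation packages into the validation top-level directory (the download location is a placeholder).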

Configure Required Environment Variables

Before users can run the Broadband Platform, they need to set up a few environment variables that tell the Platform how to find its components. This step is also shell dependent, and users may want to add these lines to their .cshrc (csh) or .bashrc (bash) in order to avoid having to type them every time they log into the head node to run simulations:

For csh:

setenv BBP_DIR /home/rcf-104/earthquake/bbp
setenv BBP_GF_DIR /home/rcf-104/earthquake/bbp_gf
setenv BBP_VAL_DIR /home/rcf-104/earthquake/bbp_val
setenv PYTHONPATH /home/rcf-104/earthquake/bbp/comps

For bash:

export BBP_DIR=/home/rcf-104/earthquake/bbp
export BBP_GF_DIR=/home/rcf-104/earthquake/bbp_gf
export BBP_VAL_DIR=/home/rcf-104/earthquake/bbp_val
export PYTHONPATH=/home/rcf-104/earthquake/bbp/comps

Please note that this example features the path names used in the steps above. Users need to customize these with their actual installation locations.

The step above is needed so users can run Broadband scripts on the head node (the steps in the next section will fail if these variables are not properly set!).
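
Before moving on, it can help to confirm that the three directory variables are set and point at existing directories. The following is a minimal sketch of such a check; check_bbp_env is a hypothetical helper name, and PYTHONPATH is left out because it may legitimately list several directories.

```shell
# Sketch only: verify that the Broadband directory variables are set and
# that each one names an existing directory.
# check_bbp_env is a hypothetical helper, not part of the Broadband Platform.
check_bbp_env() {
    local var value bad=0
    for var in BBP_DIR BBP_GF_DIR BBP_VAL_DIR; do
        # Look up the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            bad=1
        elif [ ! -d "$value" ]; then
            echo "$var points to a missing directory: $value" >&2
            bad=1
        fi
    done
    return $bad
}
```

Running `check_bbp_env` on the head node prints one message per problem and returns a non-zero status if anything is wrong.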

Run Unit Tests

Confirm that the system is configured properly (so far) by moving to the bbp/tests directory and running the unit tests like this:

./UnitTests.py

If the UnitTests run without errors, the software was built correctly and the environment variables are set up correctly. It is important for your installation to pass UnitTests before going on to the next stages.

Edit setup_bbp_env.template

Additionally, users need to edit the setup_bbp_env.template file (located inside the utils/batch directory), and change the values for BBP_DIR, BBP_GF_DIR, and BBP_VAL_DIR as described in that file. Once edited, the file should be renamed to setup_bbp_env.sh. This file will be used by worker nodes when running the actual simulations.

Create Simulations Directory

After completing all the steps above, there are a few additional steps that involve creating validation runs, and starting the simulations on HPCC.

Users should first create a top-level directory for their simulations. This directory should be on a filesystem that has enough space to contain both input and output results. When results are calculated on the cluster, they are returned to this directory for review and analysis.

$ cd /home/rcf-104/earthquake
$ mkdir sims
$ cd sims

Create Validation Runs and Start the Simulations

The next step is to create the validation runs using the provided bbp_hpcc_validation.py script, which is found in the bbp/utils/batch directory.

This script needs a few parameters, such as codebase to use, event to use for validation, number of realizations to run, a simulation directory where the results will go, and an e-mail address for job/status notifications. For example, to run 8 realizations of the lomap validation using the Graves & Pitarka method, users should type:

$ /home/rcf-104/earthquake/bbp/utils/batch/bbp_hpcc_validation.py --codebase gp --event lomap --dir lomap-gp-8 -n 8 --email fsilva@usc.edu

The bbp_hpcc_validation.py script will prepare each realization and at the end will tell the user how to submit the job to the cluster. For example, when the bbp_hpcc_validation.py above finishes, it will print:

Validation run is set up on: /auto/rcf-104/earthquake/sims/lomap-gp-8

To start the validation run, just type: 
$ qsub /auto/rcf-104/earthquake/sims/lomap-gp-8/lomap-gp.pbs

Users should copy and paste the qsub line into their shell to start the validation run on HPCC.

Important Notes

  • The simulation directory provided to the bbp_hpcc_validation.py script should not exist. If it does, the script will ask the user if it should be deleted.
  • The script will allocate 8-core nodes on HPCC. Running 1-8 simulations will use 1 node, 9-16 simulations will use 2 nodes, and so on.
  • Each realization will use a different random seed for the rupture generator. All other simulation parameters remain the same among all realizations.
  • When users re-run simulations, the same random seeds are used in order to allow for reproducible results.
  • Some validation packages include a SRF file instead of a source description (SRC file). In these cases the script cannot generate multiple realizations as the rupture is already defined. Users should invoke the bbp_hpcc_validation.py script with the --skip-rupgen option, which implies that only a single realization will run.
  • Users will receive an e-mail at the e-mail address provided when their job begins, and another one when the job finishes.
  • Users should only run 1 simulation at a time in order to be nice to other users of the cluster.
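
The node allocation rule in the notes above (8 simulations per node, rounded up) can be written as a small calculation. This is a sketch for estimating job size only; nodes_needed is a hypothetical helper name, and the script's own allocation logic may differ in detail.

```shell
# Sketch only: number of nodes needed for a given number of simulations,
# assuming one simulation per core and 8 cores per node by default.
# nodes_needed is a hypothetical helper, not part of the Broadband Platform.
nodes_needed() {
    local sims="$1" cores_per_node="${2:-8}"
    # Integer ceiling of sims / cores_per_node.
    echo $(( ($sims + $cores_per_node - 1) / $cores_per_node ))
}
```

For example, `nodes_needed 8` prints 1, while `nodes_needed 9` and `nodes_needed 16` both print 2.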