Difference between revisions of "CyberShake Code Base"

From SCECpedia

Revision as of 17:12, 20 October 2017

This page details all the pieces of code which make up the CyberShake code base, as of November 2017. Note that this does not include the workflow middleware, or the workflow generators; that code is detailed at CyberShake Workflow Framework.

Conceptually, we can divide up the CyberShake codes into three categories:

  1. Strain Green Tensor-related codes: These codes produce the input files needed to generate SGTs, actually calculate the SGTs, and do some reformatting and sanity checks on the results.
  2. Synthesis-related codes: These codes take the SGTs and perform seismogram synthesis and intensity measure calculations.
  3. Data product codes: These codes insert the results into the database, and use the database to generate a variety of output data products.

Below is a description of each piece of software we use, organized by these categories. For each piece of software, we include a description of where it is located, how to compile and use it, and what its inputs and outputs are. At the end, we provide a description of input and output files and formats.

Code Installation

Historically, we have selected a root directory for CyberShake, then created the subdirectories 'software' for all the code, 'ruptures' for the rupture files, and 'utils' for workflow tools. Each code listed below, along with the configuration file, should be checked out into the 'software' subdirectory.

Configuration file

Many CyberShake codes use a configuration file, which specifies the root directory for the CyberShake installation, the command used to start an MPI executable, paths to tmp and scratch space (which can be the same), and the path to the CyberShake rupture directory. We use a configuration file instead of environment variables because it's more transparent and easier for multiple users. Both the configuration file and the Python script that reads it (described below) should be stored in the 'software' subdirectory.

The configuration file is available at:

http://source.usc.edu/svn/cybershake/import/trunk/cybershake.cfg

This file must be edited so that its paths and commands are correct for your installation.

Additionally, you must check out a Python script which is used to read in the configuration file and deliver it as key-value pairs, located here:

http://source.usc.edu/svn/cybershake/import/trunk/config.py

Several CyberShake codes import config, then use it to read out the cybershake.cfg file.
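
As an illustration of this pattern, here is a minimal sketch of reading a key-value configuration file into a dictionary. The 'KEY = value' line format, the key names (CS_PATH, MPI_CMD), and the function name read_config are assumptions made for this example; the real cybershake.cfg and config.py may differ.

```python
def read_config(lines):
    """Parse 'KEY = value' lines into a dict, skipping blanks and '#' comments."""
    config = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("Malformed config line: %r" % raw)
        config[key.strip()] = value.strip()
    return config


# Hypothetical file contents, for illustration only.
sample = [
    "# example configuration",
    "CS_PATH = /path/to/cybershake",
    "MPI_CMD = mpirun",
    "",
]
cfg = read_config(sample)
print(cfg["MPI_CMD"])
```

Returning a plain dictionary keeps the calling code independent of how the file is stored on disk.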

SGT-related codes

Figure: Overview of the codes involved in the SGT part of CyberShake (source file: ODG)

PreCVM

PreCVM stands for "Pre-Community-Velocity-Model". It has to be run before the UCVM codes, since it generates input files required by UCVM.

Purpose: To determine the simulation volume for a particular CyberShake site.

Detailed description: PreCVM queries the CyberShake database to determine all of the ruptures which fall within a given cutoff for a certain site. From that information, padding is added around the edges to construct the CyberShake simulation volume for this site. Additional padding so the X and Y dimensions are multiples of 10, 20, or 40 might also be applied, depending on the input parameters. Using this volume, both the X/Y offset of each grid point, and then the latitude and longitude using a great circle projection, are determined and written to output files.
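
The extra padding step can be pictured as rounding each horizontal dimension up to the next multiple. The sketch below is illustrative only: the function name pad_to_multiple and the sample values are hypothetical, and the real logic lives in Modelbox/get_modelbox.py.

```python
def pad_to_multiple(num_points, multiple):
    """Round num_points up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((num_points + multiple - 1) // multiple) * multiple


# Hypothetical dimensions, padded so they are divisible by 40.
nx = pad_to_multiple(1234, 40)
ny = pad_to_multiple(1200, 40)
print(nx, ny)
```

A dimension that is already a multiple is left unchanged, so the padding never grows the volume more than necessary.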

Needs to be changed if:

  1. The CyberShake volume depth needs to be changed, so as to have the right number of grid points. That is set in the genGrid() function in GenGrid_py/gen_grid.py.
  2. X and Y padding needs to be altered. That is set using 'bound_pad' in Modelbox/get_modelbox.py, around line 70.
  3. The rotation of the simulation volume needs to be changed. That is set using 'model_rot' in Modelbox/get_modelbox.py, around line 70.
  4. The database access parameters have changed. That's in Modelbox/get_modelbox.py, around line 80.
  5. The divisibility needs for GPU simulations change (currently, we need the dimensions to be evenly divisible by the number of GPUs used in that dimension). That is in Modelbox/get_modelbox.py, around line 250.

Source code location: http://source.usc.edu/svn/cybershake/import/trunk/PreCVM/

Author: Rob Graves, wrapped by Scott Callaghan

Dependencies: Getpar, MySQLdb for Python

Executable chain:

 pre_cvm.py
   Modelbox/get_modelbox.py
     Modelbox/bin/gcproj
   GenGrid_py/gen_grid.py
     GenGrid_py/bin/gen_model_cords

Compile instructions: Run 'make' in the Modelbox/src and the Getpar_py/src directories.

Usage:

Usage: pre_cvm.py [options]
  Options:
  -h, --help            show this help message and exit
  --site=SITE           Site name
  --erf_id=ERF_ID       ERF ID
  --modelbox=MODELBOX   Path to modelbox file (output)
  --gridfile=GRIDFILE   Path to gridfile (output)
  --gridout=GRIDOUT     Path to gridout (output)
  --coordfile=COORDSFILE
                        Path to coorfile (output)
  --paramsfile=PARAMSFILE
                        Path to paramsfile (output)
  --boundsfile=BOUNDSFILE
                        Path to boundsfile (output)
  --frequency=FREQUENCY
                        Frequency
  --gpu                 Use GPU box settings.
  --spacing=SPACING     Override default spacing with this value.
  --server=SERVER       Address of server to query in creating modelbox,
                        default is focal.usc.edu.

Typical run configuration: Serial; requires 6 minutes for 100m spacing, 10 billion point volume

Input files: None; inputs are retrieved from the database

Output files: modelbox, gridfile, gridout, params, coord, bounds

UCVM

Purpose: To generate a populated velocity mesh for a CyberShake simulation volume.

Detailed description: UCVM takes the volume defined by PreCVM and queries the UCVM software to populate the volume. The resulting mesh is then checked for Vp/Vs ratio, for minimum Vp, Vs, and rho, and for the absence of Infs or NaNs. The data is output in either Graves (RWG) format or AWP format.
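
The kinds of checks described above can be sketched for a single mesh point as follows. This is not the CyberShake implementation: the function name check_point and all of the threshold values are hypothetical, chosen only to make the example runnable.

```python
import math


def check_point(vp, vs, rho, min_vp, min_vs, min_rho, min_ratio):
    """Return a list of problems found for one mesh point (empty if none)."""
    problems = []
    for name, value in (("vp", vp), ("vs", vs), ("rho", rho)):
        if math.isnan(value) or math.isinf(value):
            problems.append("%s is not finite" % name)
    if problems:
        # Comparisons against NaN/Inf are meaningless, so stop here.
        return problems
    if vp < min_vp:
        problems.append("vp below minimum")
    if vs < min_vs:
        problems.append("vs below minimum")
    if rho < min_rho:
        problems.append("rho below minimum")
    if vs > 0 and vp / vs < min_ratio:
        problems.append("vp/vs ratio below minimum")
    return problems


# Hypothetical thresholds, for illustration only.
limits = dict(min_vp=1000.0, min_vs=500.0, min_rho=1000.0, min_ratio=1.45)
good = check_point(1700.0, 500.0, 1700.0, **limits)
bad = check_point(600.0, 500.0, 1700.0, **limits)
nan = check_point(float("nan"), 500.0, 1700.0, **limits)
print(good, bad, nan)
```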

Needs to be changed if:

  1. New velocity models are added. Velocity models are specified in the DAX and passed through the wrapper scripts into the C code and then ultimately to UCVM, so an if statement must be added around line 250 (and around line 450 if it's applicable for no GTL).
  2. The backend UCVM substantially changes, for example if we move to the Python implementation.

Source code location: http://source.usc.edu/svn/cybershake/import/trunk/UCVM

Author: Scott Callaghan

Dependencies: Getpar, UCVM

Executable chain:

 single_exe.py
   single_csh.py
     bin/ucvm-single-mpi

Compile instructions: Run 'make' in the UCVM/src directory.

Usage:

All of site, gridout, modelcords, models, and format must be specified.
Usage: single_exe.py [options]

Options:
  -h, --help            show this help message and exit
  --site=SITE           Site name
  --gridout=GRIDOUT     Path to gridout (output)
  --coordfile=COORDSFILE
                        Path to coordfile (output)
  --models=MODELS       Comma-separated string on velocity models to use.
  --format=FORMAT       Specify awp or rwg format for output.
  --frequency=FREQUENCY
                        Frequency
  --spacing=SPACING     Override default spacing with this value (km)
  --min_vs=MIN_VS       Override minimum Vs value.  Minimum Vp and minimum
                        density will be 3.4 times this value.

Typical run configuration: Parallel on ~4000 cores; for 10 billion points and the C version of UCVM, takes about 20 minutes. Typically only half the cores per node are used to get more memory per process.

Input files: gridout, coords

Output files: either RWG format or AWP format, depending on the option selected.

Smoothing

Purpose: To smooth a velocity file along model interfaces.

Detailed description: The smoothing code takes in a velocity mesh, determines the surface coordinates of the interfaces between velocity models, gets a list of all the points which need to be smoothed, and then performs the smoothing by averaging in both the X and Y directions over a user-specified number of grid points (about 10 km of grid points in each direction is a good starting place).
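
To make the averaging idea concrete, here is a small sketch that averages a point with its neighbors along its X row and Y column on one horizontal slice. It is an assumption about the general shape of the operation, not a copy of average_point() in smooth_mpi.c; the name cross_average and the [y][x] list layout are hypothetical.

```python
def cross_average(mesh, x, y, dist):
    """Average mesh[y][x] with neighbors within `dist` points along X and Y."""
    ny = len(mesh)
    nx = len(mesh[0])
    values = []
    # Neighbors along X (includes the center point once).
    for xi in range(max(0, x - dist), min(nx, x + dist + 1)):
        values.append(mesh[y][xi])
    # Neighbors along Y (skip the center, already counted).
    for yi in range(max(0, y - dist), min(ny, y + dist + 1)):
        if yi != y:
            values.append(mesh[yi][x])
    return sum(values) / float(len(values))


slice_2d = [
    [1.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 1.0],
]
smoothed = cross_average(slice_2d, 1, 1, 1)
print(smoothed)
```

The window is clipped at the mesh edges, so points near the boundary are averaged over fewer neighbors.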

Needs to be changed if:

  1. We change our version of UCVM. The LD_LIBRARY_PATH needs to be modified, in run_smoothing.py around line 98.
  2. The smoothing algorithm is modified. Currently that is specified in the average_point() function in smooth_mpi.c.
  3. We start using velocity models whose boundaries aren't perpendicular to the earth's surface.

Source code location: http://source.usc.edu/svn/cybershake/import/trunk/UCVM/smoothing

Author: Scott Callaghan

Dependencies: UCVM

Executable chain:

 smoothing/run_smoothing.py
   bin/determine_surface_model
   smoothing/determine_smoothing_points.py
   smoothing/smooth_mpi

Compile instructions: Run 'make' in the smoothing directory, and make sure that determine_surface_model has been compiled in the UCVM/src directory.

Usage:

Usage: run_smoothing.py [options]

Options:
  -h, --help            show this help message and exit
  --gridout=GRIDOUT     gridout file
  --coords=COORDS       coords file
  --models=MODELSTRING  comma-separated list of velocity models
  --smoothing-dist=SMOOTHING_DIST
                        Number of grid points to smooth over.  About 10km of
                        grid points is a good starting place.
  --mesh=MESH           AWP-format velocity mesh to smooth
  --mesh-out=MESH_OUT   Output smoothed mesh

Typical run configuration: Parallel on ~1500 cores; for 5 billion points and the C version of UCVM, takes about 16 minutes.

Input files: AWP format velocity file, gridout, coord

Output files: AWP format smoothed velocity file.

PreSGT

Purpose: To generate a series of input files which are used by the wave propagation codes.

Detailed description: PreSGT determines the X and Y coordinates of the site location (where the impulse will go for the wave propagation simulation) and determines which mesh point (X and Y) maps most closely to every point on a fault surface which is within the cutoff. That information is combined with an adaptive mesh approach to create a list of all the points for which SGTs should be saved.

Needs to be changed if:

  1. We change our approach for saving adaptive mesh points.
  2. We switch to RSQSim ruptures, or other ruptures in which the geometry isn't planar. Modifications would be required to gen_sgtgrid.c.

Source code location: http://source.usc.edu/svn/cybershake/import/trunk/PreSGT

Author: Rob Graves, heavily modified by Scott Callaghan

Dependencies: Getpar, libcfu, MySQLdb for Python

Executable chain:

 presgt.py
   faultlist_py/CreateFaultList.py
   bin/gen_sgtgrid

Compile instructions: Run 'make' in the src directory.

Usage:

Usage: ./presgt.py <site> <erf_id> <modelbox> <gridout> <model_coords> <fdloc> <faultlist> <radiusfile> <sgtcords> <spacing> [frequency]
Example: ./presgt.py USC 33 USC.modelbox gridout_USC model_coords_GC_USC USC.fdloc USC.faultlist USC.radiusfile USC.cordfile 200.0 0.1

Typical run configuration: Parallel on 8 nodes, 32 cores (gen_sgtgrid is a parallel code); for 200m spacing UCERF2, takes about 8 minutes.

Input files: modelbox, gridout, coord

Output files: fdloc, faultlist, radiusfile, sgtcoords.

PreAWP

Purpose: To generate input files in a format that AWP-ODC expects.

Detailed description: PreAWP uses the input files to produce an IN3D parameter file, a file with the SGT coordinates to save, and a velocity file in the right format (if it isn't already). Striping for the output file is also set up here. Note that slightly different versions of this exist for the CPU and GPU implementations of AWP-ODC-SGT.

Needs to be changed if:

  1. The AWP code changes its input format.

Source code location: http://source.usc.edu/svn/cybershake/import/trunk/AWP-GPU-SGT/utils/ (GPU) or http://source.usc.edu/svn/cybershake/import/trunk/AWP-ODC-SGT/utils/ (CPU)

Author: Scott Callaghan

Dependencies: SgtHead

Executable chain:

 build_awp_inputs.py
   build_IN3D.py
   build_src.py
   build_cordfile.py
     SgtHead/gen_awp_cordfile.py
   build_media.py
     SgtHead/bin/reformat_velocity

Compile instructions: Run 'make' in the SgtHead/src directory.

Usage:

Usage: build_awp_inputs.py [options]

Options:
  -h, --help            show this help message and exit
  --site=SITE           Site name
  --gridout=GRIDOUT     Path to gridout input file
  --fdloc=FDLOC         Path to fdloc input file
  --cordfile=CORDFILE   Path to cordfile input file
  --velocity-prefix=VEL_PREFIX
                        RWG velocity prefix.  If omitted, will not reformat
                        velocity file, just symlink.
  --frequency=FREQUENCY
                        Frequency of SGT run, 0.5 Hz by default.
  --px=PX               Number of processors in X-direction.
  --py=PY               Number of processors in Y-direction.
  --pz=PZ               Number of processors in Z-direction.
  --source-frequency=SOURCE_FREQ
                        Low-pass filter frequency to use on the source,
                        default is same frequency as the frequency of the run.
  --spacing=SPACING     Override default spacing, derived from frequency.
  --velocity-mesh=VEL_MESH
                        Provide path to velocity mesh.  If omitted, will
                        assume mesh is named awp.<site>.media.

Typical run configuration: Serial; for 1 Hz run, takes about 11 minutes.

Input files: gridout, fdloc, cordfile, velocity mesh (if in RWG format, will be converted to AWP), RWG source

Output files: IN3D, AWP source, AWP velocity mesh, AWP cordfile.

File types

Modelbox

Purpose: Contains a description of the simulation box, at the surface.

Filename convention: <site>.modelbox

Format:

<site name>
APPROXIMATE CENTROID:
  clon= <centroid lon> clat =<centroid lat>
MODEL PARAMETERS:
  mlon= <model lon> mlat =<model lat> mrot=<model rot, default -55> xlen= <x-length in km> ylen= <y-length in km>
MODEL CORNERS:
  <lon 1> <lat 1> (x= 0.000 y= 0.000)
  <lon 2> <lat 2> (x= <max x> y= 0.000)
  <lon 3> <lat 3> (x= <max x> y= <max y>)
  <lon 4> <lat 4> (x= 0.000 y= <max y>)

Generated by: PreCVM

Used by: PreSGT

Gridfile

Purpose: Specify the three dimensions, and the grid spacing in each dimension, of the volume.

Filename convention: gridfile_<site>

Format:

xlen=<x-length in km>
   0.0  <x-length>  <grid spacing in km>
ylen=<y-length in km>
   0.0  <y-length>  <grid spacing in km>
zlen=<z-length in km>
   0.0  <z-length>  <grid spacing in km>

Gridout

Purpose: Specify the km offsets for each grid index, in X, Y, and Z, from the upper southwest corner.

Filename convention: gridout_<site>

Format:

xlen=<x-length in km>
nx=<number of gridpoints in X direction>
  0   0   <grid spacing>
  1   <grid spacing>  <grid spacing>
  2   <2*grid spacing> <grid spacing>
  3   <3*grid spacing> <grid spacing>
...
  nx-1 <(nx-1)*grid spacing> <grid spacing>
ylen=<y-length in km>
ny=<number of gridpoints in Y direction>
  0   0   <grid spacing>
  1   <grid spacing>  <grid spacing>
...
  ny-1 <(ny-1)*grid spacing> <grid spacing>
zlen=<z-length in km>
nz=<number of gridpoints in Z direction>
  0   0   <grid spacing>
  1   <grid spacing>  <grid spacing>
...
  nz-1 <(nz-1)*grid spacing> <grid spacing>

Generated by: PreCVM

Used by: UCVM, smoothing, PreSGT, PreAWP
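
For quick inspection of a gridout file, a reader along the following lines could pull out the number of grid points and the spacing on each axis. This is a sketch based only on the format shown above; the function name parse_gridout and the sample values are hypothetical.

```python
def parse_gridout(lines):
    """Return ({axis: n_points}, {axis: spacing_km}) from gridout lines."""
    dims = {}
    spacing = {}
    axis = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            key = key.strip()
            if key in ("nx", "ny", "nz"):
                axis = key[1]
                dims[axis] = int(value)
            # xlen/ylen/zlen lines are not needed here.
        elif axis is not None and axis not in spacing:
            # First data row of this axis: <index> <offset> <spacing>
            spacing[axis] = float(line.split()[2])
    return dims, spacing


# Tiny made-up example in the layout described above.
sample = """xlen=0.4
nx=3
  0 0.0 0.2
  1 0.2 0.2
  2 0.4 0.2
ylen=0.2
ny=2
  0 0.0 0.2
  1 0.2 0.2
zlen=0.2
nz=2
  0 0.0 0.2
  1 0.2 0.2
""".splitlines()
dims, spacing = parse_gridout(sample)
print(dims, spacing)
```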

Params

Purpose: Succinctly specify the parameters for the CyberShake volume. Similar information to the modelbox file, but in a different format.

Filename convention: model_params_GC_<site> (GC stands for 'great circle', the projection we use).

Format:

Model origin coordinates:
 lon= <model lon> lat=   <model lat> rotate=  <model rotation, default -55>

Model origin shift (cartesian vs. geographic):
 xshift(km)=   <x shift, usually half the x-length minus 1 grid spacing> yshift(km)=   <y-shift, usually half the y-length minus 1 grid spacing>

Model corners:
 c1= <nw lon>   <nw lat>
 c2= <ne lon>   <ne lat>
 c3= <se lon>   <se lat>
 c4= <sw lon>   <sw lat>

Model Dimensions:
 xlen=   <x-length> km
 ylen=   <y-length> km
 zlen=   <z-length> km

Generated by: PreCVM

Used by:

Coord

Purpose: Specify the mapping of latitude and longitude to X and Y offsets, for each point on the surface.

Filename convention: model_coords_GC_<site> (GC stands for 'great circle', the projection we use).

Format:

<lon> <lat> 0 0
<lon> <lat> 1 0
<lon> <lat> 2 0
...
<lon> <lat> <nx-1> 0
<lon> <lat> 0 1
...
<lon> <lat> <nx-1> 1
...
<lon> <lat> <nx-1> <ny-1>

Generated by: PreCVM

Used by: UCVM, smoothing, PreSGT

Bounds

Purpose: Specify the mapping of latitude and longitude to X and Y offsets, but only for the points along the boundary. A subset of the coord file.

Filename convention: model_bounds_GC_<site> (GC stands for 'great circle', the projection we use).

Format:

<lon> <lat> 0 0
<lon> <lat> 1 0
<lon> <lat> 2 0
...
<lon> <lat> <nx-1> 0
<lon> <lat> 0 1
<lon> <lat> <nx-1> 1
<lon> <lat> 0 2
<lon> <lat> <nx-1> 2
...
<lon> <lat> 0 <ny-1>
<lon> <lat> 1 <ny-1>
...
<lon> <lat> <nx-1> <ny-1>

Generated by: PreCVM

Used by:

Velocity files

RWG format

Purpose: Input velocity files for the RWG wave propagation code, emod3d.

Filename convention: v_sgt-<site>.<p, s, or d>

Format: 3 files, one each for Vp (*.p), Vs (*.s), and rho (*.d). Each is binary, with 4-byte floats, in fast X, Z (surface down), slow Y order.

Generated by: UCVM

Used by: PreAWP

AWP format

Purpose: Input velocity file for the AWP-ODC wave propagation code.

Filename convention: awp.<site>.media

Format: Binary, with 4-byte floats, in fast Y, X, slow Z (surface down) order.

Generated by: UCVM

Used by: Smoothing, PreAWP
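
The two orderings translate into different linear indices for the same grid point. The sketch below computes the point index implied by each ordering, using 0-based indices and each format's own axis names. The function names are hypothetical, and the sketch does not apply the X/Y flip between the two systems. The RWG byte offset assumes one 4-byte float per point per file, as described above; for AWP only the point index is computed, since the number of values stored per point is not stated here.

```python
FLOAT_BYTES = 4


def rwg_point_index(x, y, z, nx, nz):
    """Fast X, then Z, slow Y: index of a point in an RWG .p/.s/.d file."""
    return (y * nz + z) * nx + x


def rwg_byte_offset(x, y, z, nx, nz):
    """Byte offset of the point's value, one 4-byte float per point."""
    return FLOAT_BYTES * rwg_point_index(x, y, z, nx, nz)


def awp_point_index(x, y, z, nx, ny):
    """Fast Y, then X, slow Z: index of a point in an AWP media file."""
    return (z * nx + x) * ny + y


# Hypothetical grid dimensions and point.
print(rwg_point_index(1, 2, 3, nx=10, nz=5))
print(rwg_byte_offset(1, 2, 3, nx=10, nz=5))
print(awp_point_index(1, 2, 3, nx=10, ny=20))
```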

Fdloc

Purpose: Coordinates of the site, in X Y grid indices, and therefore the coordinates where the SGT impulse should be placed.

Filename convention: <site>.fdloc

Format:

<X grid index of site> <Y grid index of site>

Generated by: PreSGT

Used by: PreAWP

Faultlist

Purpose: List of paths to all the rupture geometry files for all ruptures which are within the cutoff for this site. Used to produce a list of points to save SGTs for.

Filename convention: <site>.faultlist

Format:

<path to rupture file> nheader=<number of header lines, usually 6> latfirst=<1, to signify that latitude comes first in the rupture files>
...

Generated by: PreSGT

Used by: PreSGT
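
Each faultlist line is a path followed by key=value options, so it can be split with a few lines of code. The function name parse_faultlist_line and the sample path below are hypothetical.

```python
def parse_faultlist_line(line):
    """Split '<path> nheader=<n> latfirst=<n>' into (path, options dict)."""
    fields = line.split()
    path = fields[0]
    options = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        options[key] = int(value)
    return path, options


entry = parse_faultlist_line("/path/to/rupture.txt nheader=6 latfirst=1")
print(entry)
```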

Radiusfile

Purpose: Describe the adaptive mesh on which SGTs will be saved.

Filename convention: <site>.radiusfile

Format:

<number of gradations in X and Y>
<radius 1> <radius 2> <radius 3> <radius 4>
<decimation less than radius 1> <decimation between radius 1 and 2> <between 2 and 3> <between 3 and 4>
<number of gradations in Z>
<depth 1> <depth 2> <depth 3> <depth 4>
<decimation less than depth 1> <decimation between depth 1 and 2> <between 2 and 3> <between 3 and 4>

Generated by: PreSGT

Used by: PreSGT
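
One way to read the radiusfile is as two lookup tables, one horizontal and one vertical, mapping a distance or depth to a decimation factor. The sketch below parses the six lines and performs that lookup. The function names and all numeric values are hypothetical, and the behavior beyond the last radius (returning None) is an assumption, since the format above does not describe it.

```python
def parse_radiusfile(lines):
    """Return ((radii, xy_decimations), (depths, z_decimations))."""
    rows = [line.split() for line in lines if line.strip()]
    n_xy = int(rows[0][0])
    radii = [float(v) for v in rows[1][:n_xy]]
    xy_dec = [int(v) for v in rows[2][:n_xy]]
    n_z = int(rows[3][0])
    depths = [float(v) for v in rows[4][:n_z]]
    z_dec = [int(v) for v in rows[5][:n_z]]
    return (radii, xy_dec), (depths, z_dec)


def decimation_for(distance, limits, decimations):
    """Pick the decimation of the first band whose limit exceeds distance."""
    for limit, dec in zip(limits, decimations):
        if distance < limit:
            return dec
    return None  # beyond the last band; handling here is an assumption


# Made-up values, for illustration only.
sample = [
    "4",
    "10.0 50.0 100.0 1000.0",
    "10 15 25 50",
    "4",
    "0.2 5.0 24.0 60.0",
    "1 5 10 25",
]
(radii, xy_dec), (depths, z_dec) = parse_radiusfile(sample)
print(decimation_for(30.0, radii, xy_dec))
print(decimation_for(0.1, depths, z_dec))
```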

SGT Coordinate files

There are two formats for the list of points to save SGTs for, one for Rob Graves's (RWG) codes and one for AWP-ODC. As with other coordinate transformations between the two systems, to convert X and Y offsets from RWG to AWP you have to flip the X and Y and add 1 to each, since RWG is 0-indexed and AWP is 1-indexed.
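
The conversion rule can be written as a pair of small helper functions. The names rwg_to_awp and awp_to_rwg are hypothetical; only the swap-and-add-1 rule comes from the description above.

```python
def rwg_to_awp(x, y):
    """Swap X and Y and add 1 to each (0-indexed RWG to 1-indexed AWP)."""
    return y + 1, x + 1


def awp_to_rwg(x, y):
    """Inverse conversion: swap X and Y and subtract 1 from each."""
    return y - 1, x - 1


awp_xy = rwg_to_awp(0, 5)
print(awp_xy)
print(awp_to_rwg(*awp_xy))
```

Applying one conversion and then the other returns the original pair, which is a useful sanity check when debugging coordinate files.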

SgtCoords

Purpose: List of all the points to save SGTs for.

Filename convention: <site>.cordfile

Format:

# geoproj= <projection; we usually use 1 for great circle>
# modellon= <model lon> modellat= <model lat> modelrot= <model rot, usually -55>
# xlen= <x-length> ylen= <y-length>
#
<total number of points>
<X index> <Y index> <Z index> <Single long to capture the index, in the form XXXXYYYYZZZZ> <lon> <lat> <depth in km>
...

Generated by: PreSGT

Used by: PreSGT, PreAWP
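
A reader for this format might look like the sketch below. It skips the '#' header lines, reads the point count, and then parses each point. The function names, the dictionary keys, and the sample values are hypothetical. The combined index is reconstructed on the assumption that XXXXYYYYZZZZ means four decimal digits per axis; whether the file zero-pads that field is not stated above.

```python
def combined_index(x, y, z):
    """Pack three indices as XXXXYYYYZZZZ, assuming 4 decimal digits each."""
    return x * 10**8 + y * 10**4 + z


def parse_sgtcoords(lines):
    """Return a list of point dicts from SgtCoords-format lines."""
    body = [l for l in lines if l.strip() and not l.lstrip().startswith("#")]
    count = int(body[0])
    points = []
    for line in body[1:1 + count]:
        f = line.split()
        points.append({
            "x": int(f[0]), "y": int(f[1]), "z": int(f[2]),
            "index": int(f[3]),
            "lon": float(f[4]), "lat": float(f[5]),
            "depth_km": float(f[6]),
        })
    return points


# Made-up file contents, for illustration only.
sample = [
    "# geoproj= 1",
    "# modellon= -118.0 modellat= 34.0 modelrot= -55",
    "# xlen= 100.0 ylen= 50.0",
    "#",
    "2",
    "12 345 6 001203450006 -118.1 34.2 1.2",
    "13 345 6 001303450006 -118.0 34.2 1.2",
]
points = parse_sgtcoords(sample)
print(points[0]["index"] == combined_index(12, 345, 6))
```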

AWP cordfile

Purpose: List of SGT points to save in a format usable by AWP-ODC-SGT.

Filename convention: awp.<site>.cordfile

Format: Remember that X and Y are flipped and have 1 added from RWG. The points are sorted by Y, then X, then Z.

<number of points>
<X coordinate> <Y coordinate> <Z coordinate>
...

Generated by: PreAWP

Used by:
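
As a rough illustration of how such a file could be built from RWG-style points, the sketch below converts each point, sorts, and emits the lines. It is not gen_awp_cordfile.py: the function name is hypothetical, the sort is assumed to use the AWP coordinates after the flip, and Z is passed through unchanged because the text only describes the X and Y conversion. The real code may treat Z differently.

```python
def build_awp_cordfile_lines(rwg_points):
    """Convert (x, y, z) RWG points into sorted AWP cordfile lines."""
    awp_points = []
    for x, y, z in rwg_points:
        # Flip X and Y and add 1 to each; Z left as-is (assumption).
        awp_points.append((y + 1, x + 1, z))
    # Sort by Y, then X, then Z (in AWP coordinates; assumption).
    awp_points.sort(key=lambda p: (p[1], p[0], p[2]))
    lines = [str(len(awp_points))]
    for x, y, z in awp_points:
        lines.append("%d %d %d" % (x, y, z))
    return lines


# Made-up points, for illustration only.
cord_lines = build_awp_cordfile_lines(
    [(0, 1, 0), (1, 0, 0), (0, 0, 5), (0, 0, 2)]
)
print("\n".join(cord_lines))
```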

Impulse source descriptions

We generate the initial source description for CyberShake, with the required dt, nt, and filtering, using gen_source, in http://source.usc.edu/svn/cybershake/import/trunk/SimSgt_V3.0.3/src/ (run 'make get_source'). gen_source hard-codes its parameters, but you should only change 'nt', 'dt', and 'flo'. We have been setting flo to twice the CyberShake maximum frequency, to reduce filtering effects at the frequency of interest. gen_source wraps Rob Graves's source generator, which we use for consistency.

Once this RWG source is generated, we then use AWP-GPU-SGT/utils/data/format_source.py to reprocess the RWG source into an AWP-friendly format. This involves reformatting the file and multiplying all values by 1e15 for unit conversion. Different files must be produced for X and Y coordinates, since in the AWP format different columns are used for different components.

Finally, AWP-GPU-SGT/utils/build_src.py takes the correct AWP-friendly source (nt and dt) for a run and adds the impulse location coordinates, producing a complete AWP format source description.

RWG source

Purpose: Source description for the SGT impulse.

Filename convention: source_cos0.10_<frequency>hz

Format:

source cos
<nt> <dt> 0 0 0.0 0.0 0.0 0.0
<value at ts0> <value at ts1> <value at ts2> <value at ts3> <value at ts4> <value at ts5>
<value at ts6> <value at ts7> <value at ts8> <value at ts9> <value at ts10> <value at ts11>
...

Generated by: gen_source (see above)

Used by: PreAWP
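
The sketch below reads a source file in this layout and applies the 1e15 unit conversion mentioned in the text. It is illustrative only and is not format_source.py: the function name parse_rwg_source and the sample values are hypothetical, and it does not attempt to place values into AWP component columns.

```python
def parse_rwg_source(lines):
    """Return (nt, dt, values) from RWG source lines."""
    header = lines[1].split()  # line 0 is the 'source cos' label
    nt = int(header[0])
    dt = float(header[1])
    values = []
    for line in lines[2:]:
        values.extend(float(v) for v in line.split())
    return nt, dt, values


# Made-up source with nt=6, for illustration only.
sample = [
    "source cos",
    "6 0.1 0 0 0.0 0.0 0.0 0.0",
    "0.0 0.5 1.0 0.5 0.0 0.0",
]
nt, dt, values = parse_rwg_source(sample)
scaled = [v * 1e15 for v in values]  # unit conversion described above
print(nt, dt, scaled[2])
```

Comparing len(values) with nt after parsing is a cheap way to catch a truncated file, assuming the file holds exactly nt samples.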

AWP source

Purpose: Source description which can be used by AWP-ODC.

Filename convention: <site>_f<x or y>_src

Format: Note that X and Y coordinates are swapped between RWG and AWP format, because of how the box is defined. Additionally, RWG is 0-indexed, and AWP is 1-indexed, and the RWG values must be multiplied by 1e15 for unit conversion.

<X index of source, same as site X index> <Y index of source, same as site Y index>
<XX impulse at ts0> <YY at ts0> <ZZ at ts0> <XY at ts0> <XZ at ts0> <YZ at ts0>
...

Generated by: PreAWP

Used by:

IN3D

Purpose: Input file for AWP-ODC.

Filename convention: IN3D.<site>.<x or y>

Format: Specified here (login required).

Generated by: PreAWP

Used by:

Dependencies

Getpar

MySQLdb

UCVM

libcfu