UCVM Install Stampede3

We installed and tested UCVM_25_7 on Stampede3.

Modules used on Stampede3

login4.stampede3(1151)$ module list

Currently Loaded Modules:
  1) intel/24.0   2) impi/21.11   3) autotools/1.4   4) cmake/3.31.5   5) xalt/3.1.1   6) python/3.9.18   7) TACC

Define Install Parameters

# These are defined only during installation. They will be re-defined at run-time when
# ucvm_env.sh is sourced.

export UCVM_SRC_PATH=$WORK/ucvm_src/ucvm
export UCVM_INSTALL_PATH=$WORK/ucvm_257

Install commands

# Unbuffer python log files so results are visible during build
export PYTHONUNBUFFERED=TRUE
#
# Automatically sets up UCVMC and alerts the user to potential complications.
#

Install issue

Running test test_ssh_generate
Traceback (most recent call last):
  File "/work2/00329/tg456034/stampede3/ucvm_257/tests/./accept_test.py", line 84, in <module>
    if eval("%s('%s')" % (func, sys.argv[1])) == 0:
  File "<string>", line 1, in <module>
  File "/work2/00329/tg456034/stampede3/ucvm_257/tests/./accept_test.py", line 55, in test_ssh_generate
    generatedfloats.fromfile(f, 100 * 100 * 100)
EOFError: read() didn't return enough bytes
make[1]: *** [Makefile:723: check] Error 1
make[1]: Leaving directory '/scratch/00329/tg456034/ucvm_src/ucvm/test'
make: *** [Makefile:373: check-recursive] Error 1
# -s --static   Use static linking.
# -d --dynamic  Use dynamic linking.
# -a --all      Use all available models.
# -r --restart  This is a restart of a ucvm_setup.py call.
# -p --path     Use the supplied installation path.
# -h --help     Usage.
# UCVMC 25.7.0

./ucvm_setup.py -a -d -p your-ucvm-install-path >& ucvm_setup_install.log &
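While the setup script runs in the background, its progress can be watched by following the log (a minimal sketch; the log name matches the redirect in the command above):

# Follow the install log as ucvm_setup.py writes to it.
tail -f ucvm_setup_install.log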

Determine if MPI executables were built

After ucvm_env.sh is sourced, you can determine whether the MPI executables were built by running an MPI command such as % basin_query_mpi
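A minimal sketch of that check, assuming UCVM_INSTALL_PATH is set as it was during installation:

# Source the run-time environment, then confirm an MPI executable is on the PATH.
source ${UCVM_INSTALL_PATH}/conf/ucvm_env.sh
which basin_query_mpi && basin_query_mpi -h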

GitHub UCVM Info Page

Update the link on the GitHub page from this URL to a replacement: https://www.scec.org/research/ucvm

Unit Test and Accept Test

% make check

MPI Tests

  • Simple basin_query_mpi Tests

The first two tests require the basin_query_mpi executable and the cvms5 model. The tests extract some basin depth values from a model, then compare the extracted data against expected data that is included in an input file. The command used is:


login1.stampede3(1761)$ basin_query_mpi -h
Usage: basin_query_mpi [-h] [-b outfile] [-m models<:ifunc>] [-f config] [-d max_depth] [-i inter] [-v vs_thresh] [-l lon,lat] [-s spacing] [-x num lon pts] [-y num lat pts]

where:
	-b Binary output to file.
	-h This help message
	-f Configuration file. Default is ./ucvm.conf.
	-i Interval between query points along z-axis (m, default is 20.0)
	-m Comma delimited list of crustal/GTL models to query in order
	-v Vs threshold (m/s, default is 1000.0).
	-l Bottom-left lat,lon separated by comma.
	-s Grid spacing.
	-x Number of longitude points.
	-y Number of latitude points.
Notes:
	- If running interactively, type Cntl-D to end input coord list.

Version: 25.7.

ibrun ${UCVM_INSTALL_PATH}/bin/basin_query_mpi -b ./${TEST}.simple \
  -f ${UCVM_INSTALL_PATH}/conf/ucvm.conf -m cvms5 -i 20 -v 2500 -l 35.0,-122.5 -s 0.1 -x 16 -y 11

These parameters mean:
-b output binary file
-m use model cvms5
-i interval between query points (20 m)
-v find depth to Vs = 2500 m/s
-l bottom-left corner of the region to search
-s grid spacing in degrees
-x number of longitude points
-y number of latitude points
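As a quick spot-check of the run, the binary output should hold nx * ny = 16 * 11 = 176 values per surface written. A sketch, assuming the file contains 4-byte floats and TEST is set as in the job script:

# Dump the binary basin-depth output as 4-byte floats for a quick look.
od -A d -t f4 ./${TEST}.simple | head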
  • twotasks_onenode.slurm - Test with two tasks on one node
  • twotasks_twonodes.slurm - Test on two nodes with one task per node, for a total of two tasks

Simple ucvm2mesh_mpi tests

These tests call ucvm2mesh_mpi and ucvm2mesh_mpi_layer to generate a small mesh file. It is not clear whether the Stampede3 system will do variable substitution, so we put absolute path names into this file instead of environment variables defined on Stampede3, including $WORK and $SCRATCH. The 20 x 20 x 50 mesh is 20,000 points total. The model used is cvmsi. The input configuration file is:

# List of CVMs to query
ucvmlist=cvmsi

# UCVM conf file
ucvmconf=/work2/00329/tg456034/stampede3/ucvm_257/conf/ucvm.conf

# Gridding cell centered or vertex
gridtype=CENTER

# Spacing of cells
spacing=20.0

# Projection
proj=+proj=utm +datum=WGS84 +zone=11
rot=-39.9
x0=-118.20819
y0=33.85173
z0=0.0

# Number of cells along each dim
nx=20
ny=20
nz=50

# Partitioning of grid among cores
px=2
py=2
pz=10

# Vs/Vp minimum
vp_min=0
vs_min=0


# Mesh and grid files, format
meshfile=la_habra_cvmsi.media
gridfile=la_habra_cvmsi.grid
meshtype=IJK-12

# Location of scratch dir
scratch=/scratch/00329/tg456034
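As a size check on this configuration (a minimal sketch, assuming the IJK-12 format stores three 4-byte floats, vp/vs/rho, per point):

# Expected output size for the 20 x 20 x 50 test mesh.
NX=20; NY=20; NZ=50
POINTS=$((NX * NY * NZ))    # 20,000 points
BYTES=$((POINTS * 12))      # 240,000 bytes in la_habra_cvmsi.media
echo "$POINTS points, $BYTES bytes"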
  • ucvm2mesh_mpi

This slurm script is called ucvm_mpi.slurm. It appears that the ibrun pre-processing has problems parsing the *.conf file, apparently because of the space-separated +datum tokens in the proj line. We currently resolve this issue by copying the executable to the local directory, then running the script. The slurm script looks like this:

Based on the Stampede3 documentation, they recommend defining -N (number of nodes) and -n (total number of cores, aka tasks), then letting ibrun figure out how to distribute them. As a result, we remove additional command line params from the ibrun command.

In this case, we are saying run on one node, and use twenty cores.

There is a restriction that the number of cores computed from the .conf file (px * py * pz = 2 * 2 * 10 = 40) must be evenly divisible by the number of cores requested (which is 20).
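A minimal sketch of that divisibility check, assuming the key=value layout of la_habra_cvmsi.conf shown above:

# Verify px*py*pz from the .conf is evenly divisible by the requested task count.
CONF=la_habra_cvmsi.conf
NTASKS=20
PX=$(grep '^px=' "$CONF" | cut -d= -f2)
PY=$(grep '^py=' "$CONF" | cut -d= -f2)
PZ=$(grep '^pz=' "$CONF" | cut -d= -f2)
CORES=$((PX * PY * PZ))
if [ $((CORES % NTASKS)) -ne 0 ]; then
  echo "ERROR: $CORES conf cores are not evenly divisible by $NTASKS tasks"
fi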

ucvm_mpi.slurm:
#!/bin/bash

#SBATCH -t 00:30:00
#SBATCH -N 1
#SBATCH -n 20
#SBATCH --partition=skx
#SBATCH --account=DS-Cybershake
#SBATCH --output=ucvm_mpi-%x.%j.out
#SBATCH --error=ucvm_mpi-%x.%j.err
#SBATCH --mail-user=maechlin@usc.edu
#SBATCH --mail-type=ALL
#SBATCH --export=ALL

####################
## Configuration ##
####################

source /work2/00329/tg456034/stampede3/ucvm_257/conf/ucvm_env.sh
cp ${UCVM_INSTALL_PATH}/bin/ucvm2mesh_mpi .

ibrun ./ucvm2mesh_mpi -f ./la_habra_cvmsi.conf
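After the job completes, a quick sanity check on the outputs (a sketch; the 240,000-byte figure assumes the IJK-12 size computed earlier):

# The media file should be nx*ny*nz * 12 = 240,000 bytes for this test mesh.
ls -l la_habra_cvmsi.media la_habra_cvmsi.grid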

The manual page for the ucvm2mesh_mpi command is:

login1.stampede3(1777)$ ucvm2mesh_mpi -h
[0] ucvm2mesh_mpi Version: 25.7.0
[0] Running on 1 cores
Usage: ucvm2mesh_mpi [-h] [-o dir] -f configfile

where:
	-h: help message
	-o: final stage out directory for mesh files
	-f: config file containing mesh params

Config file format:
	ucvmlist: comma-delimited list of CVMs to query (as supported by UCVM)
	ucvmconf: UCVM API config file
	gridtype: location of x-y gridded points: VERTEX, or CENTER
	querymode: query mode, DEPTH, or ELEVATION
	spacing: grid spacing (units appropriate for proj)
	proj: Proj.4 projection specification, or 'cmu' for TeraShake
	rot: proj rotation angle in degrees, (+ is counter-clockwise)
	x0: longitude of origin (deg), or x offset in cmu proj (m)
	y0: latitude of origin (deg), or y offset in cmu proj (m)
	z0: depth of origin (m, typically 0.0)
	nx: number of points along x-axis
	ny: number of points along y-axis
	nz: number of points along z-axis (depth positive)
	px: number of procs along x-axis
	py: number of procs along y-axis
	pz: number of procs along z-axis
	vp_min: vp minimum (m/s), enforced on vs_min conditions
	vs_min: vs minimum (m/s)
	meshfile: path and basename to output mesh files
	gridfile: path and filename to output grid files
	meshtype: mesh format: IJK-12, IJK-20, IJK-32, or SORD
	scratch: path to scratch space

Version: 25.7.0

The second test is also a small mesh, but it is extracted in layers. This calls three concurrent instances of ucvm2mesh_mpi_layer, each processing a range of rank layers:

#!/bin/bash

#SBATCH -t 00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --partition=skx
#SBATCH --account=DS-Cybershake
#SBATCH --output=skx_etas-%x.%j.out
#SBATCH --error=skx_etas-%x.%j.err
#SBATCH --mail-user=maechlin@usc.edu
#SBATCH --mail-type=ALL
#SBATCH --export=ALL

####################
## Configuration ##
####################

source /work2/00329/tg456034/stampede3/ucvm_257/conf/ucvm_env.sh
cp ${UCVM_INSTALL_PATH}/bin/ucvm2mesh_mpi_layer .

ibrun -n 4 ./ucvm2mesh_mpi_layer -f la_habra_cvmsi.conf -l 1 -c 3 &
ibrun -n 4 ./ucvm2mesh_mpi_layer -f la_habra_cvmsi.conf -l 4 -c 3 &
ibrun -n 4 ./ucvm2mesh_mpi_layer -f la_habra_cvmsi.conf -l 7 -c 4 &

wait
RET=$?   # capture exit status; $RET was previously unset (plain 'wait' normally returns 0)
echo "Simulation complete. Exit code: $RET"
date
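The three -l/-c ranges above tile the pz=10 rank layers as 1-3, 4-6, and 7-10. A sketch of generating such a tiling (the chunk sizes are chosen to match the script above):

# Emit one backgrounded ibrun line per chunk of rank layers; the chunks must
# cover layers 1..pz from the .conf file with no gaps or overlap.
START=1
for COUNT in 3 3 4; do
  echo "ibrun -n 4 ./ucvm2mesh_mpi_layer -f la_habra_cvmsi.conf -l $START -c $COUNT &"
  START=$((START + COUNT))
done
echo "wait"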

The man page for this command is:

login1.stampede3(1780)$ ucvm2mesh_mpi_layer -h
[0] ucvm2mesh_mpi_layer Version: 25.7.0
[0] Running on 1 cores
Usage: ucvm2mesh_mpi_layer [-h] [-o dir] -f configfile [-l layer] [-c count]

where:
	-h: help message
	-f: config file containing mesh params

	-l: which rank layer to start process

	-c: how many rank layer to process

Config file format:
	ucvmlist: comma-delimited list of CVMs to query (as supported by UCVM)
	ucvmconf: UCVM API config file
	gridtype: location of x-y gridded points: VERTEX, or CENTER
	querymode: query mode, DEPTH, or ELEVATION
	spacing: grid spacing (units appropriate for proj)
	proj: Proj.4 projection specification, or 'cmu' for TeraShake
	rot: proj rotation angle in degrees, (+ is counter-clockwise)
	x0: longitude of origin (deg), or x offset in cmu proj (m)
	y0: latitude of origin (deg), or y offset in cmu proj (m)
	z0: depth of origin (m, typically 0.0)
	nx: number of points along x-axis
	ny: number of points along y-axis
	nz: number of points along z-axis (depth positive)
	px: number of procs along x-axis
	py: number of procs along y-axis
	pz: number of procs along z-axis
	vp_min: vp minimum (m/s), enforced on vs_min conditions
	vs_min: vs minimum (m/s)
	meshfile: path and basename to output mesh files
	gridfile: path and filename to output grid files
	meshtype: mesh format: IJK-12, IJK-20, IJK-32, or SORD
	scratch: path to scratch space

Version: 25.7.0

Large mesh generation

This creates three meshes using ucvm2mesh_mpi. The first two are small examples to show that the configuration is usable and the executable runs okay. The third is a large mesh at 20 m spacing, used for the La Habra simulation, with over 1B points. This is a fairly large stress test for the software and the system.
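For scale, a back-of-the-envelope size for the large mesh (a sketch, assuming roughly 1e9 points and the 12-byte IJK-12 format used above):

# ~1 billion points * 12 bytes/point, expressed in GiB.
echo $(( 1000000000 * 12 / 1024 / 1024 / 1024 )) GiB   # about 11 GiB for the media file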