CSEP - ETAS Simulation Plan

From SCECpedia
Revision as of 17:27, 9 August 2024 by Fsilva (talk | contribs)

This page summarizes the performance study run on Stampede 3 to calculate the simulation requirements for computing 1-day ETAS forecasts from 2007 to 2018.

Stampede 3 Installation and Configuration

  • Java - jdk-21.0.1+12
  • FastMPJ

The Slurm file header used to run the simulations is shown below:

#SBATCH -t 6:00:00
#SBATCH -N 7
#SBATCH -n 336
#SBATCH -p spr
#SBATCH -A DS-Cybershake

where the "-n" parameter (the total number of tasks) is calculated as 48 * number of nodes (N), so 7 nodes give 336 tasks. Additionally, the following parameters were used in the Slurm script:

MEM_GIGS=110
THREADS=20
FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
#CLEAN_OPTION="--clean"
export FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
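
The task count in the header above can be derived from the node count rather than typed by hand. The following is a minimal sketch, assuming the 48-tasks-per-node rule stated above; the names `TASKS_PER_NODE`, `slurm_tasks`, and `slurm_header` are illustrative and are not part of the original Slurm script:

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).
TASKS_PER_NODE=48   # from the "-n = 48 * N" rule stated above

slurm_tasks() {
    # $1 = number of nodes; prints the value for "#SBATCH -n"
    echo $(( $1 * TASKS_PER_NODE ))
}

slurm_header() {
    # $1 = nodes, $2 = wall time (H:MM:SS), $3 = partition
    echo "#SBATCH -t $2"
    echo "#SBATCH -N $1"
    echo "#SBATCH -n $(slurm_tasks "$1")"
    echo "#SBATCH -p $3"
}

# Reproduce the header used for the 7-node performance run.
slurm_header 7 6:00:00 spr
```

With 40 nodes the same rule gives 1920 tasks, which matches the production header later on this page.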

Performance Results

To measure Stampede 3 performance for the 1-day ETAS forecasts, we ran the same simulation scenario on different numbers of nodes, using the same random seed and date in every run. After each simulation, we also deleted the cached results stored in the scratch filesystem to force recalculation of the entire run. The command line used to generate the run was:

u3etas_comcat_config_builder.sh --end-time 1717484400000 --num-simulations 100000 --duration-years 0.002737851 --include-spontaneous --historical-catalog --start-after-historical --etas-k-cov 1.5 --random-seed 123456789 --hpc-site TACC_FRONTERA --nodes 35 --hours 24 --queue normal --output-dir $ETAS_SIM_DIR/2024_06_04-ComcatPlusHistorical-Start20240604_1day_100000Simulations_Statewide_PointSources_kCOV1p5_Spontaneous_HistCatalog --binary-output
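
As a sanity check on the time arguments in the command above, the sketch below converts them to more readable values. It assumes that --end-time is a Unix epoch in milliseconds and that --duration-years is based on a 365.25-day year; neither is stated on this page. It also assumes GNU date, as found on Linux systems. The function names `ms_to_utc` and `years_to_days` are illustrative:

```shell
#!/bin/bash
# Assumption: --end-time is milliseconds since the Unix epoch.
ms_to_utc() {
    # GNU date syntax; BSD/macOS date uses "-r <seconds>" instead.
    date -u -d "@$(( $1 / 1000 ))" +"%Y-%m-%d %H:%M:%S UTC"
}

# Assumption: one year is 365.25 days.
years_to_days() {
    awk -v y="$1" 'BEGIN { printf "%.4f\n", y * 365.25 }'
}

ms_to_utc 1717484400000      # value passed to --end-time
years_to_days 0.002737851    # value passed to --duration-years
```

Under these assumptions the duration works out to one day and the end time falls on 4 June 2024, consistent with the 2024_06_04 prefix and the 1day label in the output directory name.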

The results are as follows:

Number of Nodes    Runtime (min)    Service Units (SUs) Used
7                  169              19.7
14                 84               19.6
28                 43               20.1
56                 24               22.4

In the table above, service units were computed by dividing the runtime in minutes by 60 and multiplying by the number of nodes used (that is, node-hours). We observe that as the number of nodes increased, runtime decreased almost in inverse proportion to the node count, while the allocation usage remained mostly flat, rising only from about 20 SUs to 22.4 SUs at 56 nodes.
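
The service-unit column can be reproduced from the node counts and runtimes. The following is a minimal sketch using the formula described above (nodes * runtime in minutes / 60); the function name `su_used` is illustrative:

```shell
#!/bin/bash
su_used() {
    # $1 = number of nodes, $2 = runtime in minutes
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.1f\n", n * m / 60 }'
}

# Node count and runtime pairs taken from the table above.
for pair in "7 169" "14 84" "28 43" "56 24"; do
    set -- $pair   # split the pair into $1 (nodes) and $2 (minutes)
    echo "$1 nodes, $2 min: $(su_used "$1" "$2") SUs"
done
```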

Requirements for the Complete ETAS Simulation Runs

Using the performance results obtained above, we can calculate the requirements for the full simulation run by multiplying the required SUs for a single run by the total number of runs:

  • Start Date = 1 August 2007
  • End Date = 30 August 2018
  • Total Number of Days = 4045

For the total required service units, we multiply the total number of 1-day forecasts (4045) by the number of service units used for each run (approximately 20):

  • SUs needed = 20 * 4045 = 80900 SUs

The storage requirements per 1-day forecast are as follows:

  • Binary results (results_*.bin files) ~ 100M
  • Complete output folder (with logs) ~ 272M

Multiplying the numbers above by the total number of 1-day forecasts gives:

  • Total storage for data = 100M * 4045 = 405G
  • Total storage (including logs) = 272M * 4045 = 1.1T
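
The totals in this section are straightforward multiplications and can be re-derived as follows. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB); the function names `total_su` and `total_storage_gb` are illustrative:

```shell
#!/bin/bash
DAYS=4045   # number of 1-day forecasts, as stated above

total_su() {
    # $1 = SUs per run, $2 = number of runs
    echo $(( $1 * $2 ))
}

total_storage_gb() {
    # $1 = MB per forecast, $2 = number of forecasts; prints GB
    awk -v mb="$1" -v n="$2" 'BEGIN { printf "%.1f\n", mb * n / 1000 }'
}

echo "SUs needed:          $(total_su 20 "$DAYS")"
echo "Binary results (GB): $(total_storage_gb 100 "$DAYS")"
echo "With logs (GB):      $(total_storage_gb 272 "$DAYS")"
```

The binary total comes out at 404.5 GB, which this page rounds to 405G, and the total with logs is about 1100 GB, or 1.1T.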

ETAS Production Run Setup

For the UCERF3-ETAS production run setup on Stampede 3, we split the simulation period into 134 30-day bundles using UTC times. We also used the following parameters for each bundle:

  • Number of nodes: 40
  • Node type: Skylake
  • Wall time: 24 hours
  • Memory: 110 GB
  • Threads: 20
  • Options: TEMP_OPTION, SCRATCH_OPTION

Slurm header:

#SBATCH -t 24:00:00
#SBATCH -N 40
#SBATCH -n 1920
#SBATCH -p skx

ETAS Production Run Status