CSEP - ETAS Simulation Plan
This page summarizes the performance study run on Stampede 3 to estimate the computing requirements for producing 1-day ETAS forecasts from 2007 to 2018.
Stampede 3 Installation and Configuration
- Java - jdk-21.0.1+12
- FastMPJ
The Slurm file header used to run the simulations is shown below:
    #SBATCH -t 6:00:00
    #SBATCH -N 7
    #SBATCH -n 336
    #SBATCH -p spr
    #SBATCH -A DS-Cybershake
where the "-n" parameter is calculated as 48 * number of nodes (N), for example 48 * 7 = 336 for the header above. Additionally, the following parameters were used in the Slurm script:
    MEM_GIGS=110
    THREADS=20
    FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
    #CLEAN_OPTION="--clean"
    export FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
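The task-count rule described above can be sketched in Python. This is only an illustration: the helper name `slurm_header` and its default arguments are hypothetical, and the 48 tasks per node, wall time, queue, and account are simply copied from the header shown on this page.

```python
TASKS_PER_NODE = 48  # from the text: "-n" = 48 * number of nodes


def slurm_header(nodes, walltime="6:00:00", queue="spr", account="DS-Cybershake"):
    """Return the #SBATCH lines for a run on `nodes` nodes (hypothetical helper)."""
    tasks = TASKS_PER_NODE * nodes
    return [
        f"#SBATCH -t {walltime}",
        f"#SBATCH -N {nodes}",
        f"#SBATCH -n {tasks}",
        f"#SBATCH -p {queue}",
        f"#SBATCH -A {account}",
    ]


if __name__ == "__main__":
    # Reproduces the 7-node header used in this study.
    print("\n".join(slurm_header(7)))
```

Generating the header from the node count keeps "-N" and "-n" consistent when the same scenario is rerun at a different scale.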
Performance Results
To measure Stampede 3 performance when running the 1-day ETAS forecasts, we ran the same simulation scenario on different numbers of nodes, using the same random seed and date for every run. After each simulation, we deleted the cached results stored in the scratch filesystem to force the entire run to be recalculated. The command line used to generate the run was:
u3etas_comcat_config_builder.sh --end-time 1717484400000 --num-simulations 100000 --duration-years 0.002737851 --include-spontaneous --historical-catalog --start-after-historical --etas-k-cov 1.5 --random-seed 123456789 --hpc-site TACC_FRONTERA --nodes 35 --hours 24 --queue normal --output-dir $ETAS_SIM_DIR/2024_06_04-ComcatPlusHistorical-Start20240604_1day_100000Simulations_Statewide_PointSources_kCOV1p5_Spontaneous_HistCatalog --binary-output
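The values passed to `--end-time` and `--duration-years` in the command above look like a Unix timestamp in milliseconds and one day expressed in years. That reading is inferred from the numbers themselves, not from the tool's documentation, so the sketch below should be treated as an assumption. The helper names, the fixed UTC-7 (Pacific Daylight Time) offset, and the 365.25-day year are all illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the timestamp marks local midnight in Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7))


def epoch_millis(year, month, day, tz=PDT):
    """Milliseconds since the Unix epoch for local midnight on the given date."""
    local_midnight = datetime(year, month, day, tzinfo=tz)
    return int(local_midnight.timestamp() * 1000)


def days_to_years(days, days_per_year=365.25):
    """Convert a duration in days to years (assumes a 365.25-day year)."""
    return days / days_per_year


end_time_ms = epoch_millis(2024, 6, 4)
duration_years = days_to_years(1)

print(end_time_ms)               # 1717484400000
print(round(duration_years, 9))  # 0.002737851
```

Under these assumptions, both numbers match the ones used in the command, which suggests the same recipe could produce the arguments for other forecast dates. Dates outside daylight saving time would need a UTC-8 offset instead.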
The results are as follows:
Number of Nodes | Runtime (min) | Service Units (SUs) Used |
---|---|---|
7 | 169 | 19.7 |
14 | 84 | 19.6 |
28 | 43 | 20.1 |
56 | 24 | 22.4 |
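In the table above, service units were computed by dividing the runtime in minutes by 60 and multiplying by the number of nodes used. We observe that runtime decreased roughly in inverse proportion to the number of nodes (each doubling of the node count approximately halved the runtime), while the allocation usage remained mostly flat, rising from about 19.7 SUs at 7 nodes to 22.4 SUs at 56 nodes.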
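The service unit calculation and the scaling behavior can be checked with a short Python sketch. The node counts and runtimes are taken from the table. The speedup and efficiency columns are an added illustration, measured against the 7-node run, and the function names are hypothetical.

```python
# (nodes, runtime in minutes) taken from the table above
RUNS = [(7, 169), (14, 84), (28, 43), (56, 24)]


def service_units(nodes, runtime_min):
    """Node-hours consumed: runtime in hours times the number of nodes."""
    return runtime_min / 60 * nodes


def scaling_report(runs):
    """Return (nodes, SUs, speedup, efficiency) relative to the first run."""
    base_nodes, base_runtime = runs[0]
    report = []
    for nodes, runtime in runs:
        speedup = base_runtime / runtime             # how much faster than the baseline
        efficiency = speedup / (nodes / base_nodes)  # 1.0 means perfect scaling
        report.append((nodes, service_units(nodes, runtime), speedup, efficiency))
    return report


for nodes, sus, speedup, eff in scaling_report(RUNS):
    print(f"{nodes:>3} nodes: {sus:5.1f} SUs, speedup {speedup:4.2f}x, efficiency {eff:4.0%}")
```

An efficiency close to 1.0 corresponds to the flat SU usage seen up to 28 nodes. The drop at 56 nodes is the same effect as the higher SU figure in the last table row.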
Requirements for the Complete ETAS Simulation Runs
Using the performance results obtained above, we can calculate the requirements for the full simulation run by multiplying the required SUs for a single run by the total number of runs:
- Start Date = 1 August 2007
- End Date = 30 August 2018
- Total Number of Days = 4045
For the total required service units, we multiply the total number of 1-day forecasts (4045) by the approximate number of service units used for each run (20):
- SUs needed = 20 * 4045 = 80900 SUs
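As a rough sensitivity check, the same multiplication can be repeated with the per-run SU values measured at each node count. The sketch below uses the forecast count and the table values from this page. The function name is hypothetical, and the per-node-count totals are an illustration, not part of the original estimate.

```python
TOTAL_FORECASTS = 4045  # number of 1-day forecasts stated above

# SUs per run measured at each node count (from the performance table)
SUS_PER_RUN = {7: 19.7, 14: 19.6, 28: 20.1, 56: 22.4}


def total_sus(sus_per_run, forecasts=TOTAL_FORECASTS):
    """Total allocation needed for all forecasts, in SUs."""
    return sus_per_run * forecasts


print(f"planning value (20 SUs/run): {total_sus(20):,.0f} SUs")
for nodes, sus in SUS_PER_RUN.items():
    print(f"{nodes:>3} nodes/run: {total_sus(sus):,.0f} SUs")
```

The totals differ by roughly 14% between the cheapest and most expensive configurations, so the choice of node count mainly trades wall-clock time against a modest change in allocation usage.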
The storage requirements per 1-day forecast are as follows:
- Binary results (results_*.bin files) ~ 100M
- Complete output folder (with logs) ~ 272M
If we multiply the numbers above by the total number of 1-day forecasts, we have:
- Total storage for data = 100M * 4045 = 405G
- Total storage (including logs) = 272M * 4045 = 1.1T
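The storage totals above can be reproduced with the following Python sketch. It assumes decimal units (1 G = 1000 M, 1 T = 1000 G), which is the convention that matches the rounded figures on this page. The constant and function names are hypothetical.

```python
TOTAL_FORECASTS = 4045  # number of 1-day forecasts stated above
BINARY_MB = 100         # results_*.bin files per forecast, approximate
FULL_MB = 272           # complete output folder with logs, approximate


def total_storage_mb(per_forecast_mb, forecasts=TOTAL_FORECASTS):
    """Storage for all forecasts, in megabytes."""
    return per_forecast_mb * forecasts


def to_gb(mb):
    """Megabytes to gigabytes (assumption: decimal units)."""
    return mb / 1000


def to_tb(mb):
    """Megabytes to terabytes (assumption: decimal units)."""
    return mb / 1000 ** 2


binary_gb = to_gb(total_storage_mb(BINARY_MB))
full_tb = to_tb(total_storage_mb(FULL_MB))
print(f"binary results only: {binary_gb:.1f} GB")
print(f"complete output:     {full_tb:.2f} TB")
```

If the filesystem reports sizes in binary units instead, the same totals would read slightly lower, so some headroom in the storage request is reasonable.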