CSEP - ETAS Simulation Plan
Latest revision as of 17:34, 28 August 2024
This page summarizes the performance study run on Stampede 3 to calculate the simulation requirements for computing 1-day ETAS forecasts from 2007 to 2018.
Stampede 3 Installation and Configuration
- Java - jdk-21.0.1+12
- FastMPJ
The Slurm file header used to run the simulations is shown below:
#SBATCH -t 6:00:00
#SBATCH -N 7
#SBATCH -n 336
#SBATCH -p spr
#SBATCH -A DS-Cybershake
where the "-n" parameter is calculated as 48 * number of nodes (N). Additionally, the following parameters were used in the Slurm script:
MEM_GIGS=110
THREADS=20
FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
#CLEAN_OPTION="--clean"
export FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
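The "-n = 48 * N" rule can be sketched in a few lines of Python. The helper name `slurm_header` is hypothetical and only illustrates the arithmetic; it is not part of the actual run scripts, and it omits the account line.

```python
# Sketch: derive the Slurm task count (-n) from the node count (-N).
TASKS_PER_NODE = 48  # from the text: -n = 48 * number of nodes


def slurm_header(nodes: int, hours: int, partition: str) -> str:
    """Return #SBATCH lines mirroring the header above (hypothetical helper)."""
    tasks = TASKS_PER_NODE * nodes
    return "\n".join([
        f"#SBATCH -t {hours}:00:00",
        f"#SBATCH -N {nodes}",
        f"#SBATCH -n {tasks}",
        f"#SBATCH -p {partition}",
    ])


print(slurm_header(7, 6, "spr"))
```

For 7 nodes this gives 336 tasks, matching the header used in the performance study.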
Performance Results
To measure Stampede 3 performance for the 1-day ETAS forecasts, we ran the same simulation scenario on different numbers of nodes, using the same random seed and date for every run. After each simulation, we also deleted the cached results stored in the scratch filesystem to force recalculation of the entire run. The command line used to generate the run was:
u3etas_comcat_config_builder.sh --end-time 1717484400000 --num-simulations 100000 --duration-years 0.002737851 --include-spontaneous --historical-catalog --start-after-historical --etas-k-cov 1.5 --random-seed 123456789 --hpc-site TACC_FRONTERA --nodes 35 --hours 24 --queue normal --output-dir $ETAS_SIM_DIR/2024_06_04-ComcatPlusHistorical-Start20240604_1day_100000Simulations_Statewide_PointSources_kCOV1p5_Spontaneous_HistCatalog --binary-output
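Two of the numeric arguments in the command above are easier to read once decoded. The sketch below assumes that `--end-time` is a Unix timestamp in milliseconds and that `--duration-years` counts a year as 365.25 days; neither assumption is stated on this page.

```python
from datetime import datetime, timedelta, timezone

END_TIME_MS = 1717484400000    # --end-time value from the command above
DURATION_YEARS = 0.002737851   # --duration-years value from the command above

# Assumption: --end-time is a Unix timestamp in milliseconds.
end_utc = datetime.fromtimestamp(END_TIME_MS / 1000, tz=timezone.utc)

# Fixed UTC-7 offset, used here to display Pacific Daylight Time.
PDT = timezone(timedelta(hours=-7), "PDT")
end_pdt = end_utc.astimezone(PDT)

# Assumption: one year is counted as 365.25 days.
duration_days = DURATION_YEARS * 365.25

print("End time (UTC):", end_utc.isoformat())
print("End time (PDT):", end_pdt.isoformat())
print("Duration (days):", round(duration_days, 6))
```

Under these assumptions the end time is 07:00 UTC on 4 June 2024, which is midnight PDT and agrees with the "Start20240604" portion of the output directory name, and the duration is one day.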
The results are as follows:
Number of Nodes | Runtime (min) | Service Units (SUs) Used |
---|---|---|
7 | 169 | 19.7 |
14 | 84 | 19.6 |
28 | 43 | 20.1 |
56 | 24 | 22.4 |
In the table above, service units were computed by dividing the runtime in minutes by 60 and multiplying by the number of nodes used. We observe that as the number of nodes increased, runtime decreased roughly in inverse proportion (doubling the nodes approximately halved the runtime), while the allocation usage remained mostly flat, rising by about 14% at 56 nodes.
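The service-unit column and the scaling behavior can be reproduced from the node counts and runtimes in the table. This is a minimal sketch; the speedup and efficiency figures are relative to the 7-node run and are an added illustration, not values reported by the study.

```python
# (nodes, runtime in minutes) taken from the table above
runs = [(7, 169), (14, 84), (28, 43), (56, 24)]


def service_units(nodes: int, runtime_min: float) -> float:
    """SUs as defined in the text: runtime in hours times node count."""
    return runtime_min / 60 * nodes


base_nodes, base_runtime = runs[0]
for nodes, runtime in runs:
    sus = service_units(nodes, runtime)
    speedup = base_runtime / runtime             # relative to the 7-node run
    efficiency = speedup / (nodes / base_nodes)  # 1.0 means ideal scaling
    print(f"{nodes:>3} nodes: {sus:5.1f} SUs, "
          f"speedup {speedup:4.2f}x, efficiency {efficiency:4.2f}")
```

An efficiency close to 1.0 corresponds to a flat SU cost; the drop at 56 nodes is the same effect as the higher SU figure in the last table row.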
Requirements for the Complete ETAS Simulation Runs
Using the performance results obtained above, we can calculate the requirements for the full simulation run by multiplying the required SUs for a single run by the total number of runs:
- Start Date = 1 August 2007
- End Date = 30 August 2018
- Total Number of Days = 4045
For the total required service units, we multiply the total number of 1-day forecasts (4045) by the number of service units used for each run (approximately 20):
- SUs needed = 20 * 4045 = 80900 SUs
The storage requirements per 1-day forecast are as follows:
- Binary results (results_*.bin files) ~ 100M
- Complete output folder (with logs) ~ 272M
If we multiply the numbers above by the total number of 1-day forecasts, we have:
- Total storage for data = 100M * 4045 = 405G
- Total storage (including logs) = 272M * 4045 = 1.1T
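The SU and storage totals above can be reproduced with the short sketch below. It assumes decimal units (1 G = 1000 M, 1 T = 1000 G), which is an interpretation of the "M", "G", and "T" suffixes rather than something stated on this page.

```python
NUM_FORECASTS = 4045     # total number of 1-day forecasts (from the text)
SUS_PER_FORECAST = 20    # rounded SUs per run (from the performance table)
BINARY_MB = 100          # results_*.bin files per forecast
FULL_OUTPUT_MB = 272     # complete output folder (with logs) per forecast

total_sus = SUS_PER_FORECAST * NUM_FORECASTS

# Assumption: decimal units (1 G = 1000 M, 1 T = 1000 G).
binary_gb = BINARY_MB * NUM_FORECASTS / 1000
full_tb = FULL_OUTPUT_MB * NUM_FORECASTS / 1000 / 1000

print(f"SUs needed: {total_sus}")
print(f"Total storage for data: {binary_gb:.1f} G")
print(f"Total storage (including logs): {full_tb:.2f} T")
```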
ETAS Production Run Setup
For the UCERF3-ETAS production run setup on Stampede 3, we split the simulation period into 134 30-day bundles using UTC times. We also used the following parameters for each bundle:
- Number of nodes: 40
- Node type: Skylake
- Wall time: 24 hours
- Memory: 110 GB
- Threads: 20
- Options: TEMP_OPTION, SCRATCH_OPTION
Slurm header:
#SBATCH -t 24:00:00
#SBATCH -N 40
#SBATCH -n 1920
#SBATCH -p skx
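As a rough sanity check of these settings, the sketch below compares the estimated cost of one 30-day bundle with the node-hours available in a single job. It assumes that SUs are node-hours, as in the performance table, and that a forecast costs about the same on a Skylake node as on the nodes used in the performance study; the second assumption is not verified on this page.

```python
DAYS_PER_BUNDLE = 30     # from the text: 30-day bundles
SUS_PER_FORECAST = 20    # estimate from the performance study
NODES = 40               # production node count
WALL_HOURS = 24          # production wall time
TASKS_PER_NODE = 48      # same -n rule as in the performance study

# Assumptions: SUs are node-hours, and the per-forecast cost on Skylake
# nodes is similar to the cost measured in the performance study.
bundle_sus = DAYS_PER_BUNDLE * SUS_PER_FORECAST
estimated_hours = bundle_sus / NODES
budget_sus = NODES * WALL_HOURS

print(f"-n = {TASKS_PER_NODE * NODES}")
print(f"Estimated cost per bundle: {bundle_sus} SUs")
print(f"Estimated runtime on {NODES} nodes: {estimated_hours:.1f} hours")
print(f"Node-hours available within the wall time: {budget_sus}")
```

Under these assumptions a bundle needs fewer node-hours than the 24-hour wall time allows, which leaves headroom for slower nodes or I/O overhead.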
ETAS Production Run Status
- Total batches: 134
- Completed batches: 134
- Completed percent: 100.0%
- Updated: 27-August-2024 5:00PM PDT
The table below shows the status for each individual batch:
Period | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
12/2006-12/2007 | 000 | 001 | 002 | 003 | 004 | |||||||
12/2007-12/2008 | 005 | 006 | 007 | 008 | 009 | 010 | 011 | 012 | 013 | 014 | 015 | 016 |
12/2008-12/2009 | 017 | 018 | 019 | 020 | 021 | 022 | 023 | 024 | 025 | 026 | 027 | 028 |
12/2009-12/2010 | 029 | 030 | 031 | 032 | 033 | 034 | 035 | 036 | 037 | 038 | 039 | 040 |
12/2010-12/2011 | 041 | 042 | 043 | 044 | 045 | 046 | 047 | 048 | 049 | 050 | 051 | 052 |
12/2011-12/2012 | 053 | 054 | 055 | 056 | 057 | 058 | 059 | 060 | 061 | 062 | 063 | 064 |
12/2012-12/2013 | 065 | 066 | 067 | 068 | 069 | 070 | 071 | 072 | 073 | 074 | 075 | 076 |
12/2013-12/2014 | 077 | 078 | 079 | 080 | 081 | 082 | 083 | 084 | 085 | 086 | 087 | 088 |
12/2014-12/2015 | 089 | 090 | 091 | 092 | 093 | 094 | 095 | 096 | 097 | 098 | 099 | 100 |
12/2015-12/2016 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 |
12/2016-12/2017 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 |
12/2017-12/2018 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 |
Status legend: Completed, Running, Queued, Failed