CSEP - ETAS Simulation Plan

From SCECpedia

This page summarizes the performance study run on Stampede 3 to estimate the simulation requirements for computing 1-day ETAS forecasts from 2007 to 2018.

Stampede 3 Installation and Configuration

  • Java - jdk-21.0.1+12
  • FastMPJ

The Slurm file header used to run the simulations is shown below:

#SBATCH -t 6:00:00
#SBATCH -N 7
#SBATCH -n 336
#SBATCH -p spr
#SBATCH -A DS-Cybershake

Here the "-n" value (total number of tasks) is calculated as 48 times the number of nodes given by "-N". The Slurm script also set the following parameters:

MEM_GIGS=110
THREADS=20
FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
#CLEAN_OPTION="--clean"
export FMPJ_HOME=/work2/02404/fsilva/stampede3/FastMPJ
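
The task-count rule stated above (48 tasks per node) can be checked with a short calculation. This is a minimal sketch: the function name `slurm_task_count` and the constant `TASKS_PER_NODE` are illustrative, and the value 48 is taken from the text on this page rather than queried from Slurm.

```python
# Tasks per node as stated on this page (an input, not queried from Slurm).
TASKS_PER_NODE = 48


def slurm_task_count(nodes: int, tasks_per_node: int = TASKS_PER_NODE) -> int:
    """Return the '#SBATCH -n' value for a given '#SBATCH -N' node count."""
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    return nodes * tasks_per_node


if __name__ == "__main__":
    # Node counts used in the performance study on this page.
    for n in (7, 14, 28, 56):
        print(f"-N {n} -> -n {slurm_task_count(n)}")
```

For the header shown above, 7 nodes gives 336 tasks.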

Performance Results

To measure Stampede 3 performance for the 1-day ETAS forecasts, we ran the same simulation scenario on different numbers of nodes, using the same random seed and date for every run. After each simulation we deleted the cached results stored in the scratch filesystem to force the entire run to be recalculated. The command line used to generate the run was:

u3etas_comcat_config_builder.sh --end-time 1717484400000 --num-simulations 100000 --duration-years 0.002737851 --include-spontaneous --historical-catalog --start-after-historical --etas-k-cov 1.5 --random-seed 123456789 --hpc-site TACC_FRONTERA --nodes 35 --hours 24 --queue normal --output-dir $ETAS_SIM_DIR/2024_06_04-ComcatPlusHistorical-Start20240604_1day_100000Simulations_Statewide_PointSources_kCOV1p5_Spontaneous_HistCatalog --binary-output
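
Two of the numeric arguments in the command above are easier to read once decoded. The sketch below assumes that `--end-time` is a Unix epoch timestamp in milliseconds and that `--duration-years` uses a 365.25-day year; both are inferences from the values themselves, not documented behavior of the script. The helper names are illustrative.

```python
from datetime import datetime, timezone

END_TIME_MS = 1717484400000    # value passed to --end-time
DURATION_YEARS = 0.002737851   # value passed to --duration-years


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime (assumed units)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def years_to_days(years: float, days_per_year: float = 365.25) -> float:
    """Convert a duration in years to days, assuming a 365.25-day year."""
    return years * days_per_year


if __name__ == "__main__":
    print(epoch_ms_to_utc(END_TIME_MS).isoformat())
    print(round(years_to_days(DURATION_YEARS), 6))
```

Under these assumptions the end time decodes to 2024-06-04 07:00 UTC, consistent with the Start20240604 tag in the output directory name, and the duration works out to one day.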

The results are as follows:

Number of Nodes   Runtime (min)   Service Units (SUs) Used
7                 169             19.7
14                84              19.6
28                43              20.1
56                24              22.4

In the table above, service units were computed by dividing the runtime in minutes by 60 and multiplying by the number of nodes used. As the number of nodes increased, runtime decreased roughly in inverse proportion, so allocation usage stayed close to 20 SUs and rose to 22.4 SUs only at 56 nodes.
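
The service-unit figures in the table can be reproduced from the node counts and runtimes. The sketch below also derives a scaling efficiency relative to the 7-node run; that metric is not part of the original study and is added only to quantify how flat the allocation usage stays. Function names are illustrative.

```python
# (nodes, runtime in minutes) taken from the performance table on this page.
RUNS = [(7, 169), (14, 84), (28, 43), (56, 24)]


def service_units(nodes: int, runtime_min: float) -> float:
    """SUs = runtime in hours multiplied by the number of nodes."""
    return nodes * runtime_min / 60.0


def parallel_efficiency(base: tuple, run: tuple) -> float:
    """Baseline SUs divided by this run's SUs (1.0 means perfect scaling)."""
    return service_units(*base) / service_units(*run)


if __name__ == "__main__":
    for nodes, minutes in RUNS:
        su = service_units(nodes, minutes)
        eff = parallel_efficiency(RUNS[0], (nodes, minutes))
        print(f"{nodes:3d} nodes  {minutes:4d} min  {su:5.1f} SUs  efficiency {eff:.2f}")
```

By this measure the 56-node run retains about 88% of the 7-node efficiency.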

Requirements for the Complete ETAS Simulation Runs

Using the performance results obtained above, we can calculate the requirements for the full simulation run by multiplying the required SUs for a single run by the total number of runs:

  • Start Date = 1 August 2007
  • End Date = 30 August 2018
  • Total Number of Days = 4045

For the total required service units, we multiply the total number of 1-day forecasts (4045) by the number of service units used for each run (approximately 20):

  • SUs needed = 20 * 4045 = 80900 SUs
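
The allocation estimate above is a single multiplication. The sketch below reproduces it and also shows how the total shifts if the lowest or highest per-run cost measured in the performance study is used instead of the rounded value of 20. The names are illustrative.

```python
N_FORECASTS = 4045   # total number of 1-day forecasts, from the list above
SUS_PER_RUN = 20     # rounded per-run cost used in the text


def total_service_units(n_forecasts: int, sus_per_run: float) -> float:
    """Total SUs for the full set of 1-day forecasts."""
    return n_forecasts * sus_per_run


if __name__ == "__main__":
    print("Estimate used on this page:", total_service_units(N_FORECASTS, SUS_PER_RUN))
    # Sensitivity to the measured per-run costs (14-node and 56-node runs).
    for sus in (19.6, 22.4):
        print(f"{sus} SUs/run ->", round(total_service_units(N_FORECASTS, sus)))
```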

The storage requirements per 1-day forecast are as follows:

  • Binary results (results_*.bin files) ~ 100M
  • Complete output folder (with logs) ~ 272M

Multiplying these figures by the total number of 1-day forecasts gives:

  • Total storage for data = 100M * 4045 = 405G
  • Total storage (including logs) = 272M * 4045 = 1.1T
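
The storage totals can be reproduced in the same way. This sketch assumes that "M", "G" and "T" denote decimal megabytes, gigabytes and terabytes (factors of 1000); the page does not say which convention was used. The function name is illustrative.

```python
N_FORECASTS = 4045       # total number of 1-day forecasts
BINARY_MB = 100          # results_*.bin files per forecast (approximate)
FULL_OUTPUT_MB = 272     # complete output folder per forecast, with logs


def total_storage_gb(per_forecast_mb: float, n_forecasts: int = N_FORECASTS) -> float:
    """Total storage in GB, assuming decimal units (1 GB = 1000 MB)."""
    return per_forecast_mb * n_forecasts / 1000.0


if __name__ == "__main__":
    print(f"Binary results only: {total_storage_gb(BINARY_MB):.1f} GB")
    print(f"Including logs:      {total_storage_gb(FULL_OUTPUT_MB) / 1000.0:.2f} TB")
```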

ETAS Production Run Setup

For the UCERF3-ETAS production run setup on Stampede 3, we split the simulation period into 134 30-day bundles using UTC times. We also used the following parameters for each bundle:

  • Number of nodes: 40
  • Node type: Skylake
  • Wall time: 24 hours
  • Memory: 110 GB
  • Threads: 20
  • Options: TEMP_OPTION, SCRATCH_OPTION

Slurm header:

#SBATCH -t 24:00:00
#SBATCH -N 40
#SBATCH -n 1920
#SBATCH -p skx
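
The split into 134 bundles of 30 days described at the start of this section can be sketched as follows. The start instant (1 August 2007 at 00:00 UTC, taken from the start date listed earlier) and the tuple layout are assumptions made for illustration; the actual bundle boundaries used in production are not given on this page.

```python
from datetime import datetime, timedelta, timezone


def make_bundles(start: datetime, n_bundles: int = 134, days_per_bundle: int = 30):
    """Return (bundle_id, start, end) tuples for consecutive fixed-length bundles."""
    bundles = []
    for i in range(n_bundles):
        b_start = start + timedelta(days=i * days_per_bundle)
        b_end = b_start + timedelta(days=days_per_bundle)  # exclusive end
        bundles.append((f"{i:03d}", b_start, b_end))
    return bundles


# Assumed start of the first bundle (hypothetical; see lead-in).
START = datetime(2007, 8, 1, tzinfo=timezone.utc)
bundles = make_bundles(START)

if __name__ == "__main__":
    for bundle_id, b_start, b_end in (bundles[0], bundles[-1]):
        print(bundle_id, b_start.date(), "to", b_end.date())
```

The zero-padded identifiers match the 000 to 133 numbering used in the status table.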

ETAS Production Run Status

Period          Bundle numbers
12/2006-12/2007 000 001 002 003 004
12/2007-12/2008 005 006 007 008 009 010 011 012 013 014 015 016
12/2008-12/2009 017 018 019 020 021 022 023 024 025 026 027 028
12/2009-12/2010 029 030 031 032 033 034 035 036 037 038 039 040
12/2010-12/2011 041 042 043 044 045 046 047 048 049 050 051 052
12/2011-12/2012 053 054 055 056 057 058 059 060 061 062 063 064
12/2012-12/2013 065 066 067 068 069 070 071 072 073 074 075 076
12/2013-12/2014 077 078 079 080 081 082 083 084 085 086 087 088
12/2014-12/2015 089 090 091 092 093 094 095 096 097 098 099 100
12/2015-12/2016 101 102 103 104 105 106 107 108 109 110 111 112
12/2016-12/2017 113 114 115 116 117 118 119 120 121 122 123 124
12/2017-12/2018 125 126 127 128 129 130 131 132 133


Status legend: Completed, Running, Queued