Difference between revisions of "Rupture Variation Generator v5.4.2"

From SCECpedia
Jump to navigationJump to search
 
(2 intermediate revisions by the same user not shown)
Line 157: Line 157:
 
         amp = fac*1.0/k2;
 
         amp = fac*1.0/k2;
 
</pre>
 
</pre>
These optimizations depend on beta=-2 and ord=4, so before the for loops I added checks that this is in fact true:
+
These optimizations depend on beta=2 and ord=4, so before the for loops I added checks that this is in fact true:
 
<pre>
 
<pre>
 
if (ord!=4) {
 
if (ord!=4) {
Line 171: Line 171:
  
 
== target_hypo_spacing ==
 
== target_hypo_spacing ==
 +
 +
'''Conclusion: ''' we will use 4 km spacing for CyberShake runs using genslip-v5.4.2.  Due to sampling effects (more details below), this value may slightly underestimate the hazard at long return periods (>1e-5)
  
 
One of the parameters to genslip is 'target_hypo_spacing', which determines the spacing between hypocenters when rupture variations are generated.  Decreasing the spacing improves sampling, but also increases the number of rupture variations and increases runtime.  Past CyberShake runs used the value 4.5 km.
 
One of the parameters to genslip is 'target_hypo_spacing', which determines the spacing between hypocenters when rupture variations are generated.  Decreasing the spacing improves sampling, but also increases the number of rupture variations and increases runtime.  Past CyberShake runs used the value 4.5 km.

Latest revision as of 18:25, 31 October 2022

This page details the work to migrate the Graves & Pitarka (2019) rupture generator, v5.4.2, to CyberShake.

The specific code changes required to create an API are detailed here: Rupture Variation Generator v5.4.2 code changes

Status

The work below was performed on Summit.

  • Replicate reference SRFs using stand-alone code: Complete
  • Create RupGen-api-5.4.2: Complete
  • Replicate SRFs from stand-alone code using RupGen library: Complete
  • Compile DirectSynth against RupGen library: Complete
  • Replicate SRFs from stand-alone code using DirectSynth: Complete
  • In the database, create new Rupture Variation Scenario ID and populate the Rupture Variations table: In progress
  • Perform CyberShake run for USC using RupGen-api-5.4.2: Complete

Verification

The verification sequence is:

  1. Reference results from Rob
  2. (1) reproduced using Rob's supplied stand-alone code, compiled and run on a Summit head node.
  3. (2) is used to produce reference SRFs from ERF 36 geometry files for
    1. Source 76, rupture 0 (M6.35)
    2. Source 128, rupture 858 (M7.35)
    3. Source 68, rupture 7 (M8.45)
  4. Results from (3) are reproduced using test code which is compiled against the RupGen-api-5.4.2 library.
  5. Results from (3) are reproduced using DirectSynth and writing out the SRFs.

RupGen-api-5.4.2 against stand-alone code

Source 76, rupture 0

We generated all 77 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.

Only a few non-slip fields differed more than the permitted tolerance, less than 1 per variation.

The average largest percent difference (which is mostly the difference between slips) was 0.0012%, and the average largest difference was ~1e-5 (on values which range up to ~100).

Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 10th variation using the test code. These yielded the same md5sums as when they were generated in order.

Source 128, rupture 858

We generated all 256 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.

Each rupture variation has approximately 6 differences outside of the tolerance values (out of approximately 2.5 million values).

The average largest percent difference (which is mostly the difference between slips) was 0.0072%, and the average largest difference was ~7e-5 (on values which range up to ~1000).

Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 20th variation using the test code. These yielded the same md5sums as when they were generated in order, as did rupture variations 250 and 251 generated consecutively.

Source 68, rupture 7

All these SRFs were generated on the compute nodes. If you generate them on a login node, you will get something slightly different.

This source/rupture combo has 1190 rupture variations, but since each one takes about 90 seconds to generate, it would take 30 hours to create them all. Instead, we used the stand-alone code to generate the first 423 rupture variations, then generated the first 41 using test code compiled against the library.

Using the same tolerances, many rupture variations had more than 10% of points which at least 1 difference, which causes an abort. We doubled the tolerance (from 0.00011 to 0.00021) and ran the comparisons again. Each rupture variation has approximately 498 differences outside of the tolerance values (out of approximately 35 million values).

The average largest percent difference (which is mostly the difference between slips) was 0.052%, and the average largest difference was ~4e-4 (on values which range up to ~10000).

The spot-check test was done on rupture variations 0, 10, 20, 30, and 40. We also spot-checked 100, 150, 200, 250, 300, 350, and 400 against the reference, with similar differences.

DirectSynth compiled with librupgen compared with stand-alone code

For this comparison, we recalculated the stand-alone code results using a compute node, since we saw that this introduces some slight differences, and DirectSynth has to be run on a login node.

Source 76, rupture 0

We compared all 77 rupture variations.

The average largest difference was 4e-5, and average largest percent difference was 0.0032%.

Using the slightly higher difference tolerance (0.000021), we average 1 difference higher than the tolerance per variation.

Source 128, rupture 858

We compared all 256 rupture variations.

The average largest difference was 6e-5, and average largest percent difference was 0.0044%.

Using the slightly higher difference tolerance (0.000021), we average 12 differences higher than the tolerance per variation.

Source 68, rupture 7

We compared only the first 40 rupture variations.

The average largest difference was 8e-4, and average largest percent difference was 0.1293%.

Using the slightly higher difference tolerance (0.000021), we average 19,700 differences higher than the tolerance per variation.

Since this is quite a bit worse than the previous comparison, we also did a comparison with the results generated directly from the library, and got about the same results.

Optimization

v5.4.2 is approximately 3x slower than v3.3.1, so we investigated optimization.

We are running source 68, rupture 7, rupture variation 0 as our benchmark. We run it 5 times and take the average.

Reference runtime: 69.400300 sec

Random number generator

For source 128, rupture 858 (source 68, rupture 7 produced 42GB trace results), Score-P suggests that gaus_rand() and sfrand() are both called an extraordinary number of times and are responsible for about 75% of the runtime (total runtime was 18.9 sec):

cube::Region          NumberOfCalls ExclusiveTime InclusiveTime
gaus_rand                   5531760      8.104156     15.400368
sfrand                     66381120      7.296212      7.296212
fft2d_fftw                       14      1.584906      1.584949
write_srf2                        1      1.272117      1.272618
kfilt_beta2                       2      1.182854     15.778140
...

Various optimizations on gaus_rand() and sfrand(), and the -mcpu=power9 compiler flag got us about 1% speedup on source 68, rupture 7.

Running Score-P on a larger rupture (source 63, rupture 3) gives a different breakdown, with more time spent in kfilt_beta2 and fft2d_fftw:

cube::Region          NumberOfCalls ExclusiveTime InclusiveTime
fft2d_fftw                       14      7.450708      7.450762
kfilt_beta2                       2      5.281393      8.775227
write_srf2                        1      3.780037      3.780435
gaus_rand                  24231024      3.583294      3.583294
kfilter                           1      0.822850      0.822850
gen_Mliu_stf                 110892      0.365619      0.398298
_mc_read_ruppars                  1      0.197096      0.271803
mc_genslip                        1      0.152591     18.446691

Only about 0.7 sec of the time in fft2d_fftw that's used isn't due to fftw calls, so there's not much potential for optimization.

kfilt_beta2

In kfilt_beta2, about 25% of the time is spent in the if statement which calculates amp. Some refactoring to pull out constants and eliminate as many exp() and log() calls as possible reduced the runtime of this if statement by 75%, yielding an overall improvement of 19%.

Specifically, we took (slip.c, line 1457)

k2 = kx*kx + ky*ky;
fac = 1.0/((1.0 + exp(ord*log(k2*lmin2)))*(1.0 + exp(-ord*log(k2*lmax2))));
amp = fac*exp(-0.5*beta2*log(k2));

and modified it via algebraic manipulation and pulling constants out of the loops:

//New constants
float lmin2_ord = pow(lmin2, ord);
float lmax2_neg_ord = pow(lmax2, -1.0*ord);
float k2_ord;

for(j=0;j<=ny0/2;j++)  /* only do positive half, then use symmetry */
<snip>
        k2 = kx*kx + ky*ky;
        k2_ord = k2*k2*k2*k2;
        fac = 1.0/((1.0+k2_ord*lmin2_ord)*(1.0+1.0/k2_ord*lmax2_neg_ord));
        amp = fac*1.0/k2;

These optimizations depend on beta=2 and ord=4, so before the for loops I added checks that this is in fact true:

if (ord!=4) {
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on ord=4.  Ord is actually %d, so we are aborting and you should change this calculation back to the unoptimized version.\n", ord);
        exit(5);
}

if (fabs(beta2-2.0)>0.000001) {
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on beta2 = 2.  beta2 is actually %d, so we are aborting and you should change this calculation.\n", beta2);
        exit(6);
}

target_hypo_spacing

Conclusion: we will use 4 km spacing for CyberShake runs using genslip-v5.4.2. Due to sampling effects (more details below), this value may slightly underestimate the hazard at long return periods (>1e-5)

One of the parameters to genslip is 'target_hypo_spacing', which determines the spacing between hypocenters when rupture variations are generated. Decreasing the spacing improves sampling, but also increases the number of rupture variations and increases runtime. Past CyberShake runs used the value 4.5 km.

We performed an experiment in which, for site USC using the Study 15.4 parameters, varied the target_hypo_spacing value from 2.5 km to 5 km in 0.5 km increments, and compared the resulting hazard curves.

The table below shows how many rupture variations were considered for each spacing, and what increase/decrease in post-processing computational cost this represents, compared to the 4.5 km used in the past.

Spacing Number of rupture variations Change in computational cost
5 km 371,513 -22%
4.5 km 476,429 0%
4 km 623,290 31%
3.5 km 793,184 66%
3 km 1,135,139 138%
2.5 km 1,609,701 238%

Curves

Below are plots of the RotD50 hazard curves for each spacing, using data from this spreadsheet.

2 sec RotD50 3 sec RotD50 5 sec RotD50 10 sec RotD50
Target hypo spacing 2sRotD50 curve plots.png
Target hypo spacing 3sRotD50 curve plots.png
Target hypo spacing 5sRotD50 curve plots.png
Target hypo spacing 10sRotD50 curve plots.png


The highest hazard is in the sparsest spacing at high probabilities and the densest spacing at low probabilities.

Disaggregation

We performed disaggregation at 1e-6 for 3 sec RotD50 at the 6 hypocenter spacings.

5 km 4.5 km 4 km 3.5 km 3 km 2.5 km
Top contributor source 219 (Newport Inglewood)

30.21%

source 244 (Puente Hills (LA))

49.66%

source 244 (Puente Hills (LA))

60.88%

source 244 (Puente Hills (LA))

40.20%

source 244 (Puente Hills (LA))

36.90%

source 244 (Puente Hills (LA))

59.47%

2nd source 244 (Puente Hills (LA))

23.59%

source 219 (Newport Inglewood)

20.04%

source 219 (Newport Inglewood)

15.81%

source 219 (Newport Inglewood)

18.27%

source 219 (Newport Inglewood)

17.26%

source 265 (Santa Monica)

11.62%

3rd source 265 (Santa Monica)

17.64%

source 218 (Newport Inglewood)

13.67%

source 218 (Newport Inglewood)

14.25%

source 218 (Newport Inglewood)

14.96%

source 218 (Newport Inglewood)

16.80%

source 219 (Newport Inglewood)

10.95%

Source 244, rupture 12

We pulled out source 244, rupture 12, since rupture 12 was the rupture with the largest ground motions for source 244, and looked at the individual IMs for each spacing. This ranged from 28 rupture variations for the 5 km spacing to 12 variations for the 2.5 km spacing.

Their means and medians look quite similar, but we see a higher max for 3.5 and 2.5 km.

Spacing Mean Median Min Max
5 km 763.73 770.37 266.85 1470.71
4.5 km 791.63 758.99 223.47 1504.48
4 km 833.84 788.58 406.21 1473.57
3.5 km 801.43 765.02 346.44 1654.31
3 km 793.99 742.54 426.42 1455.13
2.5 km 795.67 738.93 242.04 1856.50

I created a histogram of the fraction of rupture variations to fall in each IM bin (for 3 sec RotD50). It does look like we have more representation at the high end with the denser spacings.

Target hypo spacing s244 r12 3sRotD50 histogram.png

5 km with different seeds

To examine whether the effects seen with different spacings are due to sampling, we ran the post-processing at 5 km 3 more times with different initial seeds, producing 4 different sets of rupture realizations at 5 km spacing. This gives us approximately the same size sample as the 2.5 km spacing.

Below is a histogram comparing the distribution of IMs for 3 sec RotD50 between 1) 5 km spacing; 2) the 4 sets of 5 km spacing with different seeds, taken together as a group; 3) the 2.5 km spacing. The 4 sets of 5 km spacing have a very similar distribution to the 2.5 km spacing, supporting the hypothesis that the larger IMs we are seeing with denser spacing are just due to increased sampling.

Target hypo spacing s244 r12 3sRotD50 multiseed histogram.png

Comparison with v3.3.1

At site USC, there is little difference between the v3.3.1 (in green) and the v5.4.2 hazard curves.

2 sec RotD50 3 sec RotD50 5 sec RotD50 10 sec RotD50
USC v5.4.2 compare with 7052 2sec RotD50.png
USC v5.4.2 compare with 7052 3sec RotD50.png
USC v5.4.2 compare with 7052 5sec RotD50.png
USC v5.4.2 compare with 7052 10sec RotD50.png