Rupture Variation Generator v5.4.2

This page details the work to migrate the Graves & Pitarka (2019) rupture generator, v5.4.2, to CyberShake.

The specific code changes required to create an API are detailed here: Rupture Variation Generator v5.4.2 code changes

Status

  • Replicate reference SRFs using stand-alone code: Complete
  • Create RupGen-api-5.4.2: Complete
  • Replicate SRFs from stand-alone code using RupGen library: Complete
  • Compile DirectSynth against RupGen library: Complete
  • Replicate SRFs from stand-alone code using DirectSynth: In progress
  • In the database, create new Rupture Variation Scenario ID and populate the Rupture Variations table: Not yet started
  • Perform CyberShake run for USC using RupGen-api-5.4.2: Not yet started

Verification

The verification sequence is:

  1. Reference results from Rob
  2. (1) is reproduced using the stand-alone code supplied by Rob, compiled and run on a Summit head node.
  3. (2) is used to produce reference SRFs from ERF 36 geometry files for:
    1. Source 76, rupture 0 (M6.35)
    2. Source 128, rupture 858 (M7.35)
    3. Source 68, rupture 7 (M8.45)
  4. Results from (3) are reproduced using test code which is compiled against the RupGen-api-5.4.2 library.
  5. Results from (3) are reproduced using DirectSynth and writing out the SRFs.

RupGen-api-5.4.2 against stand-alone code

Source 76, rupture 0

We generated all 77 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.

Only a few non-slip fields differed by more than the permitted tolerance, fewer than 1 per variation on average.

The average difference (which is mostly the difference between slips) was 0.0012%, and the largest difference was ~1e-5 (on values which range up to ~100).
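
A minimal sketch of the kind of value-by-value comparison summarized above. It assumes the tolerance is an absolute difference of 0.00011 (the default value quoted further down this page); whether the actual comparison tool uses an absolute or a relative tolerance is not stated here, and compare_values is a hypothetical helper, not the script that was used.

```python
def compare_values(reference, candidate, tolerance=0.00011):
    """Compare two equal-length sequences of floats.

    Returns a tuple: (number of values whose absolute difference exceeds
    the tolerance, average percent difference over non-zero reference
    values, largest absolute difference).
    """
    if len(reference) != len(candidate):
        raise ValueError("value counts differ")
    outside = 0
    largest = 0.0
    percent_sum = 0.0
    percent_count = 0
    for ref, cand in zip(reference, candidate):
        diff = abs(ref - cand)
        largest = max(largest, diff)
        if diff > tolerance:  # assumption: absolute tolerance
            outside += 1
        if ref != 0.0:  # skip zeros to avoid dividing by zero
            percent_sum += 100.0 * diff / abs(ref)
            percent_count += 1
    average_percent = percent_sum / percent_count if percent_count else 0.0
    return outside, average_percent, largest


# Toy data, not taken from any SRF.
ref = [100.0, 50.0, 0.0, 2.0]
cand = [100.00001, 50.0, 0.0, 2.001]
outside, average_percent, largest = compare_values(ref, cand)
print(outside, average_percent, largest)
```

In the toy data only the last value exceeds the tolerance, so one difference is reported.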

Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 10th variation using the test code. These yielded the same md5sums as when they were generated in order.
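
The order-dependence spot check can be sketched as an md5 comparison between two directories of SRF files. The directory layout and the function names below are assumptions for illustration; the page does not say how the files were organized or which tool computed the checksums.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_variations(in_order_dir, spot_check_dir):
    """Return names of spot-check files that differ from the in-order run.

    Assumes (hypothetically) that both directories use the same file name
    for the same rupture variation.
    """
    mismatches = []
    for spot_file in sorted(Path(spot_check_dir).iterdir()):
        if not spot_file.is_file():
            continue
        reference = Path(in_order_dir) / spot_file.name
        if not reference.is_file() or md5sum(reference) != md5sum(spot_file):
            mismatches.append(spot_file.name)
    return mismatches
```

An empty list from mismatched_variations corresponds to the result reported above: the spot-checked variations match the ones generated in order.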

Source 128, rupture 858

We generated all 256 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.

Each rupture variation has approximately 6 differences outside of the tolerance values (out of approximately 2.5 million values).

The average difference (which is mostly the difference between slips) was 0.0072%, and the largest difference was ~7e-5 (on values which range up to ~1000).

Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 20th variation using the test code. These yielded the same md5sums as when they were generated in order, as did rupture variations 250 and 251 generated consecutively.

Source 68, rupture 7

All these SRFs were generated on the compute nodes. If you generate them on a login node, you will get something slightly different.

This source/rupture combination has 1190 rupture variations; at about 90 seconds each, generating them all would take about 30 hours. Instead, we used the stand-alone code to generate the first 423 rupture variations, then generated the first 41 using test code compiled against the library.
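
The runtime estimate is simple arithmetic on the figures above; the 90 seconds per variation is the approximate value quoted, so the results are estimates only.

```python
seconds_per_variation = 90  # approximate figure from the text


def hours_for(variations):
    """Estimated generation time in hours for a number of variations."""
    return variations * seconds_per_variation / 3600


total_hours = hours_for(1190)     # all rupture variations
reference_hours = hours_for(423)  # subset generated with the stand-alone code
library_hours = hours_for(41)     # subset generated with the library test code
print(f"{total_hours:.2f} {reference_hours:.2f} {library_hours:.2f}")
```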

Using the same tolerances, many rupture variations had more than 10% of points with at least 1 difference, which causes an abort. We doubled the tolerance (from 0.00011 to 0.00021) and ran the comparisons again. With the doubled tolerance, each rupture variation has approximately 498 differences outside of the tolerance values (out of approximately 35 million values).

The average difference (which is mostly the difference between slips) was 0.052%, and the largest difference was ~4e-4 (on values which range up to ~10000).
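
The abort rule and the effect of doubling the tolerance can be sketched as follows. This assumes each point is a tuple of numeric fields and that the tolerance is an absolute difference; check_variation is a hypothetical name, and the real comparison code may be structured differently.

```python
def check_variation(reference_points, candidate_points,
                    tolerance=0.00011, abort_fraction=0.10):
    """Count points with at least one field outside the tolerance.

    Raises RuntimeError when more than abort_fraction of the points are
    flagged, mirroring the abort described in the text.
    """
    if len(reference_points) != len(candidate_points):
        raise ValueError("point counts differ")
    flagged = 0
    for ref_fields, cand_fields in zip(reference_points, candidate_points):
        if any(abs(r - c) > tolerance for r, c in zip(ref_fields, cand_fields)):
            flagged += 1
    if flagged > abort_fraction * len(reference_points):
        raise RuntimeError(f"{flagged} of {len(reference_points)} points differ")
    return flagged


# Toy data: one field differs by 0.00015.
ref_points = [(1.0, 2.0), (3.0, 4.0)]
cand_points = [(1.0, 2.00015), (3.0, 4.0)]

try:
    check_variation(ref_points, cand_points)  # default tolerance 0.00011
    aborted = False
except RuntimeError:
    aborted = True

flagged_relaxed = check_variation(ref_points, cand_points, tolerance=0.00021)
print(aborted, flagged_relaxed)
```

In the toy data the comparison aborts at the default tolerance and passes once the tolerance is doubled.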

The spot-check test was done on rupture variations 0, 10, 20, 30, and 40. We also spot-checked 100, 150, 200, 250, 300, 350, and 400 against the reference, with similar differences.

Optimization

v5.4.2 is approximately 3x slower than v3.3.1, so we investigated optimization.

We use source 68, rupture 7, rupture variation 0 as our benchmark, running it 5 times and taking the average.

Reference runtime: 69.400300 sec
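
A minimal sketch of a benchmark harness that runs a command 5 times and averages the wall-clock time. The command to run is left to the caller, since this page does not give the actual invocation; average_runtime is a hypothetical helper.

```python
import statistics
import subprocess
import time


def average_runtime(command, repeats=5):
    """Run command `repeats` times and return (mean seconds, all timings)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the run fails
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), timings
```

The command argument is a list such as the rupture generator executable followed by its arguments. Wall-clock timing on a shared node can vary between runs, which is one reason to average several.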

Because source 68, rupture 7 produced 42 GB of trace results, we profiled source 128, rupture 858 instead. Score-P suggests that gaus_rand() and sfrand() are each called an extraordinary number of times and together account for about 75% of the runtime (total runtime was 18.9 sec):

cube::Region          NumberOfCalls ExclusiveTime InclusiveTime
gaus_rand                   5531760      8.104156     15.400368
sfrand                     66381120      7.296212      7.296212
fft2d_fftw                       14      1.584906      1.584949
write_srf2                        1      1.272117      1.272618
kfilt_beta2                       2      1.182854     15.778140
...
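
The top two rows of the profile can be cross-checked against each other. The numbers are consistent with every sfrand() call being made from inside gaus_rand(), 12 per call. That is an inference from the table, not something this page confirms about the code.

```python
# Values copied from the Score-P table above.
profile = {
    "gaus_rand": {"calls": 5531760, "exclusive": 8.104156, "inclusive": 15.400368},
    "sfrand": {"calls": 66381120, "exclusive": 7.296212, "inclusive": 7.296212},
}

# Ratio of call counts: how many sfrand calls per gaus_rand call.
sfrand_per_gaus_rand = profile["sfrand"]["calls"] / profile["gaus_rand"]["calls"]

# Time gaus_rand spends in functions it calls (inclusive minus exclusive).
child_time = profile["gaus_rand"]["inclusive"] - profile["gaus_rand"]["exclusive"]

print(sfrand_per_gaus_rand)  # 12.0
print(round(child_time, 6))  # 7.296212, equal to sfrand's total time
```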

Various optimizations to gaus_rand() and sfrand(), together with the -mcpu=power9 compiler flag, gave only about a 1% speedup on source 68, rupture 7.

Running Score-P on a larger rupture (source 63, rupture 3) gives a different breakdown, with more time spent in kfilt_beta2 and fft2d_fftw:

cube::Region          NumberOfCalls ExclusiveTime InclusiveTime
fft2d_fftw                       14      7.450708      7.450762
kfilt_beta2                       2      5.281393      8.775227
write_srf2                        1      3.780037      3.780435
gaus_rand                  24231024      3.583294      3.583294
kfilter                           1      0.822850      0.822850
gen_Mliu_stf                 110892      0.365619      0.398298
_mc_read_ruppars                  1      0.197096      0.271803
mc_genslip                        1      0.152591     18.446691

Only about 0.7 sec of the time spent in fft2d_fftw is outside the FFTW calls themselves, so there is little potential for optimization there.

In kfilt_beta2, about 25% of the time is spent in the if statement which calculates amp. Some refactoring to pull out constants and eliminate as many exp() and log() calls as possible reduced the runtime of this if statement by 75%, yielding an overall improvement of 19%.
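
The two percentages can be combined as a quick check. If the if statement takes about 25% of kfilt_beta2's time and is made 75% faster, the saving is about 19% of kfilt_beta2's time. Whether the 19% quoted above refers to kfilt_beta2 or to the whole run is not stated, so treat this as one consistent reading rather than a confirmed one.

```python
fraction_in_if = 0.25  # share of kfilt_beta2 time spent in the if statement
reduction = 0.75       # fraction of that time removed by the refactoring

saving = fraction_in_if * reduction
print(f"{saving:.2%}")  # 18.75%
```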