Rupture Variation Generator v5.4.2
This page details the work to migrate the Graves & Pitarka (2019) rupture generator, v5.4.2, to CyberShake.
The specific code changes required to create an API are detailed here: Rupture Variation Generator v5.4.2 code changes
Status
The work below was performed on Summit.
- Replicate reference SRFs using stand-alone code: Complete
- Create RupGen-api-5.4.2: Complete
- Replicate SRFs from stand-alone code using RupGen library: Complete
- Compile DirectSynth against RupGen library: Complete
- Replicate SRFs from stand-alone code using DirectSynth: Complete
- In the database, create new Rupture Variation Scenario ID and populate the Rupture Variations table: In progress
- Perform CyberShake run for USC using RupGen-api-5.4.2: Not yet started
Verification
The verification sequence is:
- (1) Reference results from Rob.
- (2) Results from (1) are reproduced using Rob's supplied stand-alone code, compiled and run on a Summit head node.
- (3) The stand-alone code from (2) is used to produce reference SRFs from ERF 36 geometry files for:
  - Source 76, rupture 0 (M6.35)
  - Source 128, rupture 858 (M7.35)
  - Source 68, rupture 7 (M8.45)
- (4) Results from (3) are reproduced using test code which is compiled against the RupGen-api-5.4.2 library.
- (5) Results from (3) are reproduced using DirectSynth, writing out the SRFs.
RupGen-api-5.4.2 against stand-alone code
Source 76, rupture 0
We generated all 77 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.
Only a few non-slip fields differed by more than the permitted tolerance, fewer than 1 per variation on average.
The average largest percent difference (which is mostly the difference between slips) was 0.0012%, and the average largest difference was ~1e-5 (on values which range up to ~100).
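The per-variation comparison described above can be sketched as follows. This is illustrative only, not the actual comparison tool: `ref` and `test` stand in for the flattened numeric fields of a reference SRF and a library-generated SRF, and the tolerance value is taken from the source 68 discussion below.

```python
# Illustrative sketch of the per-variation tolerance comparison.
# 'ref' and 'test' stand in for the flattened numeric fields of a
# reference SRF and a test SRF; names and layout are hypothetical.

def compare_variation(ref, test, tol=0.00011):
    """Count values outside tolerance, and track the largest absolute
    and percent differences across the variation."""
    n_outside = 0
    max_diff = 0.0
    max_pct = 0.0
    for r, t in zip(ref, test):
        diff = abs(r - t)
        if diff > tol:
            n_outside += 1
        max_diff = max(max_diff, diff)
        if r != 0.0:
            max_pct = max(max_pct, 100.0 * diff / abs(r))
    return n_outside, max_diff, max_pct

# Tiny worked example: one value out of three exceeds the tolerance.
outside, max_diff, max_pct = compare_variation(
    [100.0, 50.0, 0.5], [100.00005, 50.0004, 0.5])
```

The per-variation results (count outside tolerance, largest difference, largest percent difference) are then averaged over all variations to produce the summary numbers quoted in this section.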
Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 10th variation using the test code. These yielded the same md5sums as when they were generated in order.
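The order-independence spot check amounts to hashing each SRF and comparing digests across the two runs. A minimal sketch (the file paths are hypothetical):

```python
import hashlib

def md5_of(path):
    """Return the md5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare SRFs generated in order against the same variations generated
# individually (every 10th for source 76); paths are hypothetical.
# for v in range(0, 77, 10):
#     assert md5_of(f"in_order/v{v}.srf") == md5_of(f"spot/v{v}.srf")
```

Identical digests confirm the files are byte-for-byte identical, which is a stronger check than the tolerance-based comparison.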
Source 128, rupture 858
We generated all 256 rupture variations using the stand-alone code, then generated them using test code compiled against the library. These were all done on a login node.
Each rupture variation has approximately 6 differences outside of the tolerance values (out of approximately 2.5 million values).
The average largest percent difference (which is mostly the difference between slips) was 0.0072%, and the average largest difference was ~7e-5 (on values which range up to ~1000).
Since in the past we have had issues with an order dependence in the rupture generator, we also spot-checked by generating every 20th variation using the test code. These yielded the same md5sums as when they were generated in order, as did rupture variations 250 and 251 generated consecutively.
Source 68, rupture 7
All of these SRFs were generated on compute nodes; generating them on a login node yields slightly different results.
This source/rupture combination has 1190 rupture variations, but at about 90 seconds each, generating them all would take roughly 30 hours. Instead, we used the stand-alone code to generate the first 423 rupture variations, then generated the first 41 using test code compiled against the library.
Using the same tolerances, many rupture variations had more than 10% of points with at least 1 difference, which causes the comparison to abort. We doubled the tolerance (from 0.00011 to 0.00021) and ran the comparisons again. Each rupture variation has approximately 498 differences outside the tolerance values (out of approximately 35 million values).
The average largest percent difference (which is mostly the difference between slips) was 0.052%, and the average largest difference was ~4e-4 (on values which range up to ~10000).
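The abort rule mentioned above can be sketched as follows: if too large a fraction of points differ, the comparison gives up rather than reporting statistics. The threshold shown matches the 10% figure in the text; the function name is illustrative.

```python
def check_fraction(n_outside, n_total, max_fraction=0.10):
    """Abort the comparison if more than max_fraction of the compared
    points exceed the tolerance; otherwise return the fraction."""
    frac = n_outside / n_total
    if frac > max_fraction:
        raise RuntimeError(
            f"{frac:.1%} of points exceed tolerance; aborting comparison")
    return frac

# After doubling the tolerance, source 68 variations pass easily:
# ~498 differences out of ~35 million values.
frac = check_fraction(498, 35_000_000)
```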
The spot-check test was done on rupture variations 0, 10, 20, 30, and 40. We also spot-checked 100, 150, 200, 250, 300, 350, and 400 against the reference, with similar differences.
DirectSynth compiled with librupgen compared with stand-alone code
For this comparison, we recalculated the stand-alone code results on a compute node, since we saw that the node type introduces slight differences, and DirectSynth has to be run on a compute node.
Source 76, rupture 0
We compared all 77 rupture variations.
The average largest difference was 4e-5, and average largest percent difference was 0.0032%.
Using the slightly higher difference tolerance (0.000021), we average 1 difference higher than the tolerance per variation.
Source 128, rupture 858
We compared all 256 rupture variations.
The average largest difference was 6e-5, and average largest percent difference was 0.0044%.
Using the slightly higher difference tolerance (0.000021), we average 12 differences higher than the tolerance per variation.
Source 68, rupture 7
We compared only the first 40 rupture variations.
The average largest difference was 8e-4, and average largest percent difference was 0.1293%.
Using the slightly higher difference tolerance (0.000021), we average 19,700 differences higher than the tolerance per variation.
Since this is quite a bit worse than the previous comparison, we also did a comparison with the results generated directly from the library, and got about the same results.
Optimization
v5.4.2 is approximately 3x slower than v3.3.1, so we investigated optimization.
We are running source 68, rupture 7, rupture variation 0 as our benchmark. We run it 5 times and take the average.
Reference runtime: 69.400300 sec
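The benchmarking procedure (5 runs, averaged) can be scripted along these lines. The genslip invocation shown in the comment is hypothetical; only the timing harness is sketched here.

```python
import subprocess
import time

def average_runtime(cmd, runs=5):
    """Run a command several times and return the mean wall-clock time."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Hypothetical benchmark invocation for source 68, rupture 7, variation 0:
# avg = average_runtime(["./genslip", "src=68", "rup=7", "rvar=0"])
# print(f"average runtime: {avg:.6f} sec")
```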
For profiling we used source 128, rupture 858 (source 68, rupture 7 produced 42 GB of trace results). Score-P shows that gaus_rand() and sfrand() are called an extraordinary number of times and are responsible for about 75% of the runtime (total runtime was 18.9 sec):
  Region       NumberOfCalls  ExclusiveTime  InclusiveTime
  gaus_rand    5531760        8.104156       15.400368
  sfrand       66381120       7.296212       7.296212
  fft2d_fftw   14             1.584906       1.584949
  write_srf2   1              1.272117       1.272618
  kfilt_beta2  2              1.182854       15.778140
  ...
Various optimizations to gaus_rand() and sfrand(), plus the -mcpu=power9 compiler flag, got us about a 1% speedup on source 68, rupture 7.
Running Score-P on a larger rupture (source 63, rupture 3) gives a different breakdown, with more time spent in kfilt_beta2 and fft2d_fftw:
  Region            NumberOfCalls  ExclusiveTime  InclusiveTime
  fft2d_fftw        14             7.450708       7.450762
  kfilt_beta2       2              5.281393       8.775227
  write_srf2        1              3.780037       3.780435
  gaus_rand         24231024       3.583294       3.583294
  kfilter           1              0.822850       0.822850
  gen_Mliu_stf      110892         0.365619       0.398298
  _mc_read_ruppars  1              0.197096       0.271803
  mc_genslip        1              0.152591       18.446691
Only about 0.7 sec of the time in fft2d_fftw is spent outside the FFTW library calls themselves, so there is not much potential for optimization there.
In kfilt_beta2, about 25% of the time is spent in the if statement that calculates amp. Refactoring to pull out constants and eliminate as many exp() and log() calls as possible reduced the runtime of this if statement by 75%, yielding an overall improvement of 19%.
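The kfilt_beta2 source is not reproduced here, but the refactoring pattern is standard loop-invariant hoisting: terms of the exponent that do not change per point are folded into a single precomputed constant, reducing the number of transcendental calls per iteration. An illustrative sketch, not the actual filter code (the amplitude formula and variable names are made up for the example):

```python
import math

def amp_naive(kvals, a0, k0, beta):
    """Three transcendental calls per point, as in an unoptimized loop."""
    return [math.exp(math.log(a0) - beta * math.log(k) + beta * math.log(k0))
            for k in kvals]

def amp_refactored(kvals, a0, k0, beta):
    """Loop-invariant terms folded into one constant: two calls per point."""
    c = math.log(a0) + beta * math.log(k0)   # computed once, outside the loop
    return [math.exp(c - beta * math.log(k)) for k in kvals]

# Both versions agree to floating-point precision.
kvals = [0.25, 0.5, 1.0, 2.0]
assert all(abs(a - b) < 1e-9
           for a, b in zip(amp_naive(kvals, 2.0, 0.5, 0.8),
                           amp_refactored(kvals, 2.0, 0.5, 0.8)))
```

In C the same transformation applies directly: hoist the constant subexpression above the loop and verify the outputs still agree within tolerance.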
target_hypo_spacing
One of the parameters to genslip is 'target_hypo_spacing', which sets the spacing between hypocenters when rupture variations are generated. Decreasing the spacing improves sampling of the hypocenter distribution, but also increases the number of rupture variations and therefore the runtime.
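As a rough sketch of the trade-off (the grid layout is an assumption for illustration, not taken from the genslip source): if hypocenters are laid out on a uniform grid along strike and down dip at the target spacing, the hypocenter count, and with it the variation count and runtime, grows rapidly as the spacing shrinks.

```python
import math

def n_hypocenters(fault_len_km, fault_wid_km, spacing_km):
    """Approximate hypocenter count for a given target spacing, assuming
    a uniform grid along strike and down dip (illustrative only)."""
    n_strike = max(1, math.floor(fault_len_km / spacing_km))
    n_dip = max(1, math.floor(fault_wid_km / spacing_km))
    return n_strike * n_dip

# For a hypothetical 100 km x 15 km fault, halving the spacing from
# 10 km to 5 km takes the count from 10 to 60.
```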