CyberShake BBP Integration

From SCECpedia

Revision as of 03:34, 6 May 2020

This page details the process of integrating CyberShake with the Broadband Platform (BBP), so that CyberShake can produce stochastic high-frequency seismograms to complement its deterministic low-frequency seismograms.

We performed a similar integration process for CyberShake 1.4 and CyberShake Study 15.12. This time, we would like to avoid maintaining a separate CyberShake version of the high-frequency stochastic codes. Instead, we would like to invoke the BBP codes from CyberShake, so that as BBP is modified, CyberShake can use the new codes without repeating the entire integration process.

Approach

We have identified the following BBP executables, or elements, which are needed in CyberShake:

  • srf2stoch (C)
  • hb_high (Fortran)
  • wcc_getpeak (C)
  • wcc_siteamp14 (C)
  • wcc_tfilter (C)
  • wcc_resamp_arbdt (C)
  • wcc_add (C)
  • integ_diff (C)

As part of the BBP, each of these pieces of code contains a main() function, which typically works as follows:

main() {
  // Parse command-line parameters
  // Open and read input files into input data structure
  // Execute science kernel, populating output data structure
  // Open and write output files from output data structure
}

For CyberShake, we would like to use the science kernels of the BBP elements while providing CyberShake-specific parameters, passing data structures in memory between multiple elements, and reading from and writing to different data formats. To accomplish this, we propose extracting the science kernels from the main() functions into subroutines, separating the I/O from the scientific calculations. Following this approach, a revised main method would contain:

main() {
  // Open and read input files into input data structure
  science_kernel_subroutine(command-line arguments, input_data_structure, output_data_structure)
  // Open and write output files from output data structure
}

science_kernel_subroutine(command-line arguments, input_data_structure, output_data_structure) {
  // Use input data structure for processing
  // Place results in output data structure
}
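As a minimal, self-contained sketch of this split (not BBP code), the example below uses hypothetical seis_in/seis_out structures and a placeholder kernel, scale_kernel, that only multiplies each sample by a scale factor. To keep it compilable without getpar, the parameter is passed as an explicit argument rather than as a command-line-style parameter list; the hypothetical scale_stream function plays the role of the BBP main(), doing all of the file I/O around the kernel call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory data structures for one element. */
struct seis_in  { int nt; const float* data; };
struct seis_out { int nt; float* data; };  /* caller allocates data; nt = capacity */

/* Science kernel: operates only on in-memory structures, no file I/O.
   Placeholder science: multiply every sample by scale. */
int scale_kernel(float scale, const struct seis_in* in, struct seis_out* out) {
    assert(in != NULL && out != NULL);
    if (out->nt < in->nt)
        return -1;                      /* output buffer too small */
    for (int i = 0; i < in->nt; i++)
        out->data[i] = scale * in->data[i];
    out->nt = in->nt;
    return 0;
}

/* BBP-style driver (the role main() plays): read nt binary floats from fin,
   run the kernel, and write the result to fout. Returns 0 on success. */
int scale_stream(FILE* fin, FILE* fout, int nt, float scale) {
    assert(fin != NULL && fout != NULL && nt > 0);
    float* buf_in = malloc((size_t)nt * sizeof(float));
    float* buf_out = malloc((size_t)nt * sizeof(float));
    int status = -1;
    if (buf_in != NULL && buf_out != NULL &&
        fread(buf_in, sizeof(float), (size_t)nt, fin) == (size_t)nt) {
        struct seis_in in = { nt, buf_in };
        struct seis_out out = { nt, buf_out };
        if (scale_kernel(scale, &in, &out) == 0 &&
            fwrite(out.data, sizeof(float), (size_t)out.nt, fout) == (size_t)out.nt)
            status = 0;
    }
    free(buf_in);
    free(buf_out);
    return status;
}
```

Because the kernel never touches files and the caller owns both buffers, the same function could be driven from files, as in BBP, or from arrays already in memory, as in CyberShake.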

Then, in the CyberShake codebase, we would use this function as follows:

// Open and read input files into input data structure
// Allocate memory for output data structure
// Create parameter string with CyberShake-specific parameters for getpar to parse
science_kernel_subroutine(parameter_string, input_data_structure, output_data_structure)
// Do additional processing with output data structure
// Open and write output data structure to file
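The parameter list for getpar could be assembled in memory roughly as follows. This is a sketch under stated assumptions: the helper names (param_list, params_addf, params_init, params_get) are hypothetical, and it assumes getpar parses argv-style "name=value" strings and, as with a real command line, treats element 0 as the program name. params_get only mimics a lookup so the list can be checked without linking getpar.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 16
#define PARAM_LEN 64

/* Hypothetical argv-style list of "name=value" strings built in memory.
   argv[i] points into storage, so do not copy this struct by value. */
struct param_list {
    int count;
    char storage[MAX_PARAMS][PARAM_LEN];
    char* argv[MAX_PARAMS];
};

/* Append one printf-formatted entry, e.g. params_addf(p, "scale=%g", 2.5).
   Returns 0 on success, -1 if the list is full or the entry was truncated. */
int params_addf(struct param_list* p, const char* fmt, ...) {
    assert(p != NULL && fmt != NULL);
    if (p->count >= MAX_PARAMS)
        return -1;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(p->storage[p->count], PARAM_LEN, fmt, ap);
    va_end(ap);
    if (n < 0 || n >= PARAM_LEN)
        return -1;
    p->argv[p->count] = p->storage[p->count];
    p->count++;
    return 0;
}

/* Start the list with a program-name placeholder in element 0, like argv[0]
   (assumption: the parser skips this element). */
void params_init(struct param_list* p, const char* progname) {
    assert(p != NULL);
    p->count = 0;
    (void)params_addf(p, "%s", progname);
}

/* Return the text after "name=" for the first matching entry, or NULL.
   This only mimics a getpar-style lookup so the list can be checked. */
const char* params_get(const struct param_list* p, const char* name) {
    size_t len = strlen(name);
    for (int i = 1; i < p->count; i++)
        if (strncmp(p->argv[i], name, len) == 0 && p->argv[i][len] == '=')
            return p->argv[i] + len + 1;
    return NULL;
}
```

With the wcc_getpeak subroutine proposed on this page, such a list would be passed in the prototype's argument order, e.g. `wcc_getpeak(p.count, p.argv, seis, &head1)`.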

The intent is that these changes would be pushed to the BBP codebase and also shared with Rob Graves, so that future revisions start from this refactored version and are straightforward to integrate with CyberShake.

Filename convention

Because the CyberShake code is linked against the subroutine object files, the BBP main() functions cannot be in those object files (otherwise the link would contain multiple main() definitions) and must be placed in separate files. The convention we will follow is to put all subroutines in <module>_sub.c, and the main() function along with the subroutine prototypes in <module>_main.c. The compiled executable used by BBP will be named <module>_sub, to distinguish it from the non-refactored executables during testing.

Example

As an example, below we show the proposed modifications for wcc_getpeak. wcc_getpeak doesn't have an output data structure; in the BBP the result is printed, whereas in the subroutine it is returned.

Each change below is shown with the original code followed by the modified code.

Subroutine prototype

Original code: N/A

Modified code:

float wcc_getpeak(int param_string_len, char** param_string, float* seis, struct statdata* head1);

Extract science kernel into subroutine

Original code:

int main(ac,av) {
...
float *s1, amax;
int i;
...
float max = -1.0e+20;
float min = 1.0e+20;
int inbin = 0;
int outbin = 0;
int keepsign = 0;
float scale = 1.0;
...
s1 = NULL;
s1 = read_wccseis(infile,&head1,s1,inbin);
for(i=0;i<head1.nt;i++)
 {
 if(s1[i] > max)
  max = s1[i];
 if(s1[i] < min)
  min = s1[i];
 }
...

Modified code:

...
s1 = NULL;
s1 = read_wccseis(infile,&head1,s1,inbin);

float peak = wcc_getpeak(ac, av, s1, &head1);

printf("%10.2f %13.5e %s\n",head1.edist,peak,head1.stat);
}

float wcc_getpeak(int param_string_len, char** param_string, float* s1, struct statdata* head1) {
 float amax;
 int i;

 float max = -1.0e+20;
 float min = 1.0e+20;
 int keepsign = 0;
 float scale = 1.0;
...
 for(i=0;i<head1->nt;i++)
  {
  if(s1[i] > max)
   max = s1[i];
  if(s1[i] < min)
   min = s1[i];
  }
 ...

Relocate parsing of non-I/O parameters to subroutine

Original code:

int main(ac,av) {
...
sprintf(infile,"stdin");

setpar(ac,av);
getpar("infile","s",infile);
getpar("inbin","d",&inbin);
getpar("keepsign","d",&keepsign);
getpar("scale","f",&scale);
endpar();

s1 = NULL;
s1 = read_wccseis(infile,&head1,s1,inbin);
...

Modified code:

float wcc_getpeak(int param_string_len, char** param_string, float* s1, struct statdata* head1) {
 float max = -1.0e+20;
 float min = 1.0e+20;
 int keepsign = 0;
 float scale = 1.0;

 setpar(param_string_len, param_string);
 getpar("keepsign","d",&keepsign);
 getpar("scale","f",&scale);
 endpar();

 for(i=0;i<head1->nt;i++)
...

Retain I/O in main function

Original code:

int main(ac,av)
...
char infile[128];

float max = -1.0e+20;
float min = 1.0e+20;
int inbin = 0;
int outbin = 0;
int keepsign = 0;
float scale = 1.0;

sprintf(infile,"stdin");

setpar(ac,av);
getpar("infile","s",infile);
getpar("inbin","d",&inbin);
getpar("keepsign","d",&keepsign);
getpar("scale","f",&scale);
endpar();

s1 = NULL;
s1 = read_wccseis(infile,&head1,s1,inbin);

for(i=0;i<head1.nt;i++)
...

Modified code:

int main(ac,av)
...
char infile[128];

int inbin = 0;
int outbin = 0;

sprintf(infile,"stdin");

setpar(ac,av);
getpar("infile","s",infile);
getpar("inbin","d",&inbin);
endpar();

s1 = NULL;
s1 = read_wccseis(infile,&head1,s1,inbin);

float peak = wcc_getpeak(ac, av, s1, &head1);

printf("%10.2f %13.5e %s\n",head1.edist,peak,head1.stat);

Here is a way CyberShake code could use this modified code:

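...
// Read nt samples from the already-open binary file fp_in
float* seis = malloc(nt*sizeof(float));
fread(seis, sizeof(float), nt, fp_in);
struct statdata head1;
head1.nt = nt;
// No CyberShake-specific parameters in this example
char** param_string = NULL;
float peak = wcc_getpeak(0, param_string, seis, &head1);
...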
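A slightly more defensive version of the read step might look like the sketch below. The helper name read_seis_floats is hypothetical, and it assumes the seismogram is stored as nt consecutive native-format floats in an already-open binary stream. It checks both the allocation and the number of elements fread actually returned, since fread reports a short read only through its return value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read nt floats from an already-open binary stream.
   Returns a malloc'd buffer (caller frees), or NULL on allocation
   failure or if fewer than nt values could be read. */
float* read_seis_floats(FILE* fp_in, int nt) {
    assert(fp_in != NULL && nt > 0);
    float* seis = malloc((size_t)nt * sizeof(float));
    if (seis == NULL)
        return NULL;
    /* fread(buffer, element size, element count, stream) */
    size_t got = fread(seis, sizeof(float), (size_t)nt, fp_in);
    if (got != (size_t)nt) {
        free(seis);
        return NULL;
    }
    return seis;
}
```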

Migration Status

Element           Refactored   Passes BBP test   Called from CyberShake
wcc_getpeak       yes          yes
wcc_add           yes          yes
wcc_tfilter       yes          yes
wcc_resamp_arbdt  yes          yes
integ_diff        yes          yes
wcc_siteamp14     yes          yes
hb_high           yes          yes               yes
srf2stoch         yes          yes               yes

Verification

hb_high

hb_high is the most difficult code to verify, as it's the most complex.

To assist in verification, we constructed scatter plots comparing, for each point in the acceleration time series, the value produced when CyberShake calls the BBP code against the value produced by running BBP directly.

Our initial results are:

CS v BBP scatter s280 r7 rv0 initial.png

Upon further investigation, we uncovered two issues.

  1. The default value of kappa in the BBP is 0.04, which is what we are using in CyberShake. However, when the LA Basin velocity model is used, the BBP overrides this default and uses 0.045 instead. Once we updated CyberShake to use the correct value of kappa, the scatter improved somewhat:
    CS v BBP scatter s280 r7 rv0 kappa.png
  2. In the CyberShake code, the geographic coordinates are rounded to 4 decimal places to agree with the output file that srf2stoch produces. However, this rounding was implemented by multiplying by 10000, adding 0.5, casting to an int in place of floor(), and then dividing by 10000. This works for positive numbers, but for negative numbers (such as longitudes in Southern California) an int cast and floor() are not equivalent: casting to an int truncates toward 0, while floor() rounds toward negative infinity. The rounding used when C writes the file matches the floor() version. Once we also made this change, the seismograms agree very closely:
    CS v BBP scatter s280 r7 rv0 floor.png

A visual comparison looks good:

CS v BBP acc s280 r7 rv0.png

A numerical comparison is also good:

Average absolute difference: 0.000000
Max absolute difference: 0.000005
Average absolute percent difference: 0.024559
Max absolute percent difference: 150.758352
Bins:
0.000000 <= reference amplitude < 0.010000:
	avg diff=0.000000, avg percent diff=0.028306%
0.010000 <= reference amplitude < 0.100000:
	avg diff=0.000000, avg percent diff=0.000988%
0.100000 <= reference amplitude < 1.000000:
	avg diff=0.000002, avg percent diff=0.001870%
1.000000 <= reference amplitude < 10.000000:
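Statistics of this kind can be computed with a short routine. The page does not show the comparison tool itself, so the following is a hypothetical sketch: compare_series and cmp_stats are illustrative names, and it assumes that points whose reference amplitude |ref| is at or below the threshold are skipped, that the percent difference is |test - ref| / |ref| * 100, and that the bins are decades of |ref| from 0 to 10.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define NBINS 4
/* Decade bins on |reference amplitude|: [0,0.01), [0.01,0.1), [0.1,1), [1,10). */
static const double bin_edges[NBINS + 1] = {0.0, 0.01, 0.1, 1.0, 10.0};

struct cmp_stats {
    size_t n_above;                       /* points with |ref| above threshold */
    double avg_abs_diff, max_abs_diff;
    double avg_pct_diff, max_pct_diff;    /* percent of |ref| */
    size_t bin_count[NBINS];
    double bin_avg_diff[NBINS], bin_avg_pct[NBINS];
};

/* Point-by-point comparison of test against ref over n samples. */
void compare_series(const float* test, const float* ref, size_t n,
                    double threshold, struct cmp_stats* s) {
    assert(test != NULL && ref != NULL && s != NULL);
    double sum_diff = 0.0, sum_pct = 0.0;
    double bin_diff[NBINS] = {0}, bin_pct[NBINS] = {0};
    *s = (struct cmp_stats){0};
    for (size_t i = 0; i < n; i++) {
        double r = fabs((double)ref[i]);
        if (r <= threshold)
            continue;                     /* skip near-zero reference values */
        double d = fabs((double)test[i] - (double)ref[i]);
        double pct = 100.0 * d / r;
        s->n_above++;
        sum_diff += d;
        sum_pct += pct;
        if (d > s->max_abs_diff) s->max_abs_diff = d;
        if (pct > s->max_pct_diff) s->max_pct_diff = pct;
        for (int b = 0; b < NBINS; b++) {  /* values >= 10 fall in no bin */
            if (r >= bin_edges[b] && r < bin_edges[b + 1]) {
                s->bin_count[b]++;
                bin_diff[b] += d;
                bin_pct[b] += pct;
                break;
            }
        }
    }
    if (s->n_above > 0) {
        s->avg_abs_diff = sum_diff / s->n_above;
        s->avg_pct_diff = sum_pct / s->n_above;
    }
    for (int b = 0; b < NBINS; b++) {
        if (s->bin_count[b] > 0) {        /* empty bins keep averages of 0 */
            s->bin_avg_diff[b] = bin_diff[b] / s->bin_count[b];
            s->bin_avg_pct[b] = bin_pct[b] / s->bin_count[b];
        }
    }
}
```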

Site Response

Visual comparisons for source 280, rupture 7, rv 0:

CS v BBP amp vel s280 r7 rv0.png
CS v BBP amp vel s280 r7 rv0 zoom.png
CS v BBP amp vel s280 r7 rv0 zoom2.png

The scatterplot:

CS v BBP amp vel scatter s280 r7 rv0.png

A numerical comparison looks good:

Average absolute difference: 0.000004
Max absolute difference: 0.000069
Average absolute percent difference: 0.265180
Max absolute percent difference: 265.281916
Bins:
0.000000 <= reference amplitude < 0.010000:
	avg diff=0.000001, avg percent diff=0.297034%
0.010000 <= reference amplitude < 0.100000:
	avg diff=0.000022, avg percent diff=0.076535%
0.100000 <= reference amplitude < 1.000000:
1.000000 <= reference amplitude < 10.000000:
10.000000 <= reference amplitude < 100.000000:
100.000000 <= reference amplitude < 1000.000000:
1000.000000 <= reference amplitude < 10000.000000:

Other comparisons

Results for 9 other ruptures are shown below. The ruptures are ordered by size: each has approximately 60% more rupture surface points than the previous one.
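Compounding this growth over the eight steps from the first to the ninth rupture gives a rough sense of the range covered (assuming the 60% figure holds at every step):

$$1.6^{8} = \left(1.6^{2}\right)^{4} = 2.56^{4} \approx 42.9$$

so the largest rupture in this set has roughly 40 times as many rupture surface points as the smallest.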

For each event, the seismogram, zoomed seismogram, and scatterplot images are listed, followed by the numerical comparison where available.

Source 262, Rupture 8, Rup var 1 (Santa Monica, M6.85)
Seismogram: CS v BBP amp vel 262 8 1.png
Zoomed seismogram: CS v BBP amp vel 262 8 1 zoom.png
Scatterplot: CS v BBP scatter 262 8 1.png
Numerical comparison:
97365 of 100000 values above threshold 1e-10
Average absolute difference: 0.000015
Max absolute difference: 0.002969
Average absolute percent difference: 0.083599
Max absolute percent difference: 6179.308700
Bins:
0.0 <= reference amplitude < 0.01:
	avg diff=0.000024, avg percent diff=10.9%
0.01 <= reference amplitude < 0.1:
	avg diff=0.000009, avg percent diff=0.0216%
0.1 <= reference amplitude < 1.0:
	avg diff=0.000018, avg percent diff=0.00697%
1.0 <= reference amplitude < 10.0:
	avg diff=0.000865, avg percent diff=0.0659%
10.0 <= reference amplitude < 100.0:
  None

Source 220, Rupture 10, Rup var 2 (North Channel, M6.85)
Seismogram: CS v BBP amp vel 220 10 2.png
Zoomed seismogram: CS v BBP amp vel 220 10 2 zoom.png
Scatterplot: CS v BBP scatter 220 10 2.png
Numerical comparison:
90932 of 100000 values above threshold 1e-10
Average absolute difference: 0.000005
Max absolute difference: 0.000263
Average absolute percent difference: 0.035927
Max absolute percent difference: 141.763417
Bins:
0.000000 <= reference amplitude < 0.01:
	avg diff=0.000004, avg percent diff=0.377%
0.010000 <= reference amplitude < 0.1:
	avg diff=0.000003, avg percent diff=0.0134%
0.100000 <= reference amplitude < 1.000000:
	avg diff=0.000085, avg percent diff=0.0502%
1.000000 <= reference amplitude < 10.000000:
	None

Source 269, Rupture 24, Rup var 3 (Santa Ynez, M6.95)
Seismogram: CS v BBP amp vel 269 24 3.png
Zoomed seismogram: CS v BBP amp vel 269 24 3 zoom.png
Scatterplot: CS v BBP scatter 269 24 3.png
Numerical comparison:
10620 of 100000 values above threshold 1.0e-10
Average absolute difference: 0.000021
Max absolute difference: 0.000203
Average absolute percent difference: 0.435710
Max absolute percent difference: 1194.548858
Bins:
0.0 <= reference amplitude < 0.01:
	avg diff=0.000003, avg percent diff=0.597%
0.01 <= reference amplitude < 0.1:
	avg diff=0.000038, avg percent diff=0.103%
0.1 <= reference amplitude < 1.0:
	avg diff=0.000117, avg percent diff=0.0702%
1.0 <= reference amplitude < 10.0:
	None

Source 105, Rupture 3, Rup var 4 (San Jacinto, M6.95)
Seismogram: CS v BBP amp vel 105 3 4.png
Zoomed seismogram: CS v BBP amp vel 105 3 4 zoom.png
Scatterplot: CS v BBP scatter 105 3 4.png
Numerical comparison:
39565 of 100000 values above threshold 1e-10
Average absolute difference: 0.000003
Max absolute difference: 0.000082
Average absolute percent difference: 0.095660
Max absolute percent difference: 323.116766
Bins:
0.0 <= reference amplitude < 0.01:
	avg diff=0.000001, avg percent diff=0.100%
0.01 <= reference amplitude < 0.1:
	avg diff=0.000020, avg percent diff=0.0602%
0.1 <= reference amplitude < 1.0:
	avg diff=0.000070, avg percent diff=0.0619%
1.0 <= reference amplitude < 10.0:
	None

Source 21, Rupture 2, Rup var 5 (Garlock, M7.15)
Seismogram: CS v BBP amp vel 21 2 5.png
Zoomed seismogram: CS v BBP amp vel 21 2 5 zoom.png
Scatterplot: CS v BBP scatter 21 2 5.png
Numerical comparison:
12992 of 100000 values above threshold 1e-10
Average absolute difference: 0.000152
Max absolute difference: 0.003678
Average absolute percent difference: 0.551089
Max absolute percent difference: 516.549509
Bins:
0.0 <= reference amplitude < 0.01:
	avg diff=0.000009, avg percent diff=0.758%
0.01 <= reference amplitude < 0.1:
	avg diff=0.000163, avg percent diff=0.429%
0.1 <= reference amplitude < 1.0:
	avg diff=0.000459, avg percent diff=0.225%
1.0 <= reference amplitude < 10.0:
	None
Source 100, Rupture 1, Rup var 6 (San Jacinto, M7.25)
Seismogram: CS v BBP amp vel 100 1 6.png
Zoomed seismogram: CS v BBP amp vel 100 1 6 zoom.png
Scatterplot: CS v BBP scatter 100 1 6.png

Source 10, Rupture 1, Rup var 7 (Elsinore, M7.55)
Seismogram: CS v BBP amp vel 10 1 7.png
Zoomed seismogram: CS v BBP amp vel 10 1 7 zoom.png
Scatterplot: CS v BBP scatter 10 1 7.png

Source 86, Rupture 0, Rup var 8 (San Andreas, M7.65)
Seismogram: CS v BBP amp vel 86 0 8.png
Zoomed seismogram: CS v BBP amp vel 86 0 8 zoom.png
Scatterplot: CS v BBP scatter 86 0 8.png

Source 128, Rupture 1296, Rup var 9 (San Andreas, M8.45)
Seismogram: CS v BBP amp vel 128 1296 9.png
Zoomed seismogram: CS v BBP amp vel 128 1296 9 zoom.png
Scatterplot: CS v BBP scatter 128 1296 9.png