Difference between revisions of "Rupture Variation Generator v5.5.2 code changes"

From SCECpedia
Jump to navigationJump to search
 
(9 intermediate revisions by the same user not shown)
Line 44: Line 44:
 
<li>sliprate_subs.c</li>
 
<li>sliprate_subs.c</li>
 
<li>srf_subs.c (from StandRupFormat)</li>
 
<li>srf_subs.c (from StandRupFormat)</li>
 +
<li>gslip_sliprate_subs.c</li>
 +
<li>geoproj_subs.c</li>
 
<li>wafront2d-rwg.f</li>
 
<li>wafront2d-rwg.f</li>
 
<li>makefile</li>
 
<li>makefile</li>
Line 167: Line 169:
 
<li>Add an entry for zapit:
 
<li>Add an entry for zapit:
 
<pre>void zapit(float* s,int n);</pre></li>
 
<pre>void zapit(float* s,int n);</pre></li>
 +
<li>Add entries for load_command_srf() and load_seed_srf():
 +
<pre>void load_command_srf(struct standrupformat *srf,int ac,char **av);
 +
void load_seed_srf(struct standrupformat *srf,int starting_seed,int ending_seed);</pre></li>
 
<li>At the end of the file, add:
 
<li>At the end of the file, add:
 
<pre>
 
<pre>
Line 509: Line 514:
 
#endif
 
#endif
 
</pre></li>
 
</pre></li>
 +
<li>In line 415, after
 +
<pre>char seedfile[1024];</pre>
 +
add
 +
<pre>//Used to bypass iterating over slips to get to the 'right' seed
 +
int use_unmodified_seed = 0;</pre>
 
<li>In line 444, after
 
<li>In line 444, after
 
<pre>struct velmodel rvmod;</pre>
 
<pre>struct velmodel rvmod;</pre>
Line 543: Line 553:
 
getpar("dohypo","d",&dohypo);
 
getpar("dohypo","d",&dohypo);
 
</pre></li>
 
</pre></li>
<li>In line 707-9, change all the 'srf.version' to 'srf->version'.
+
<li>In line 750-52, change all the 'srf.version' to 'srf->version'.
<li>In line 959, change
+
<li>In line 915, after
 +
<pre>getpar("seed","d",&seed);</pre>
 +
add
 +
<pre>getpar("use_unmodified_seed","d",&use_unmodified_seed);</pre>
 +
<li>In line 1025, change
 
<pre> psrc_orig = read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat);</pre>
 
<pre> psrc_orig = read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat);</pre>
 
to
 
to
Line 554: Line 568:
 
#endif
 
#endif
 
</pre></li>
 
</pre></li>
<li>In line 1012, change '&srf' to 'srf'.</li>
+
<li>In line 1077, in the init_plane_srf() call, change '&srf' to 'srf'.</li>
<li>In line 1016, change '&srf' to 'srf'.</li>
+
<li>In line 1081, in the get_seg_bounds() call, change '&srf' to 'srf'.</li>
<li>In line 1315, after 'dhypo = dhypo_frac*fwid;', add:
+
<li>In line 1393, after 'dhypo = dhypo_frac*fwid;', add:
 
<pre>
 
<pre>
 
stats->numslip = ns;
 
stats->numslip = ns;
Line 584: Line 598:
 
   free(dip_r);
 
   free(dip_r);
 
   free(psrc_rake);
 
   free(psrc_rake);
 +
  free(vmod.vp);
 +
  free(vmod.vs);
 +
  free(vmod.den);
 +
  free(vmod.th);
 +
  free(vmod.dep);
 +
  free(vmod.mu);
 +
  free(vmod.invb2);
 +
  free_srf_ptrs(srf);
 
   return 0;
 
   return 0;
 
}
 
}
 +
  
 
if (dohypo>=nh) {
 
if (dohypo>=nh) {
Line 602: Line 625:
 
}
 
}
 
</pre></li>
 
</pre></li>
<li>In line 1363, after:
+
<li>In line 1470, after:
 
<pre>
 
<pre>
 
for(js=0;js<ns;js++)    /* loop over slip/rupture realizations */
 
for(js=0;js<ns;js++)    /* loop over slip/rupture realizations */
Line 619: Line 642:
 
   }
 
   }
 
</pre></li>
 
</pre></li>
<li>In lines 1394-1403, change iterative calculation of shypo and dhypo to explicit.  Delete
+
<li>In lines 1500-1509, change iterative calculation of shypo and dhypo to explicit.  Delete
 
<pre>
 
<pre>
 
if(ih_scnt == nhypo_s)
 
if(ih_scnt == nhypo_s)
Line 637: Line 660:
 
       dhypo = hypo_d0 + (js/nhypo_s)*hypo_dd;
 
       dhypo = hypo_d0 + (js/nhypo_s)*hypo_dd;
 
</pre></li>
 
</pre></li>
<li>In line 2394, change '&srf' to 'srf'.</li>
+
<li>In line 1496, put the for loop that calls _sfrand() into an if statement.  It should look like:
<li>In the for loop starting in line 2452, we replace some of these calculations with refactored ones to pull out constants and take advantage of FMA.  Specifically, make the following changes:
+
<pre>
 +
  if (use_unmodified_seed==0) {
 +
      for(k=0;k<10*js;k++)
 +
          sval =  _sfrand(&seed);
 +
  }
 +
</pre>
 +
<li>In line 2653, in the load_slip_srf_dd4() call, change '&srf' to 'srf'.</li>
 +
<li>In the for loop starting in line 2742, starting with for(j=0;j<ndip;j++), we replace some of these calculations with refactored ones to pull out constants and take advantage of FMA.  Specifically, make the following changes:
 
<ol>
 
<ol>
 
<li>Before the for loop, define the following variables:
 
<li>Before the for loop, define the following variables:
Line 646: Line 676:
 
float half_flen_plus_shypo = 0.5*flen + shypo;
 
float half_flen_plus_shypo = 0.5*flen + shypo;
 
y0 = (deep_vrdep1-dtop)/sin(avgdip*rperd);
 
y0 = (deep_vrdep1-dtop)/sin(avgdip*rperd);
float deep_alpha = 1.0 + (deep_vrdep1-deep_rup*deep_vrdep1)/(deep_vrdep2-deep_vrdep1);
+
float deep_alpha = 1.0 + (deep_vrdep1-deep_vrup*deep_vrdep1)/(deep_vrdep2-deep_vrdep1);
float deep_const = (deep_rup - 1.0)/(deep_vrdep2-deep_vrdep1);
+
float deep_const = (deep_vrup - 1.0)/(deep_vrdep2-deep_vrdep1);
 
</pre></li>
 
</pre></li>
 
<li>Replace
 
<li>Replace
Line 671: Line 701:
 
</ol>
 
</ol>
 
</li>
 
</li>
<li>In line 2573, after 'psrc[j].rupt = psrc[j].rupt - tsmin + rupture_delay;', add:
+
<li>In line 2862, after 'psrc[j].rupt = psrc[j].rupt - tsmin + rupture_delay;', add:
 
<pre>
 
<pre>
 
if (dohypo!=ih || doslip!=js) {
 
if (dohypo!=ih || doslip!=js) {
Line 677: Line 707:
 
}
 
}
 
</pre></li>
 
</pre></li>
<li>In line 2601, change '&srf' to 'srf'.</li>
+
<li>In line 2869, in the load_rupt_srf() call, change '&srf' to 'srf'.</li>
<li>In line 2611, remove
+
<li>In line 2892, remove:
<pre>free_srf_stf(&srf);</pre></li>
 
<li>In line 2618, change '&srf' to 'srf'.</li>
 
<li>In line 2631, remove:
 
 
<pre> write2srf(srf,str,outbin);</pre>
 
<pre> write2srf(srf,str,outbin);</pre>
 
Writing out SRFs is done by calling write_srf() on the srf object.</li>
 
Writing out SRFs is done by calling write_srf() on the srf object.</li>
 +
<li>In line 2906, remove
 +
<pre>free_srf_stf(&srf);</pre></li>
 +
<li>In lines 2887 and 2890, change '&srf' to 'srf'.
 
<li>At the very end of the function, we want to free everything not needed by the SRF object.  Add:
 
<li>At the very end of the function, we want to free everything not needed by the SRF object.  Add:
 
<pre>
 
<pre>
Line 719: Line 749:
 
free(rvmod.th);
 
free(rvmod.th);
 
free(rvmod.invb2);
 
free(rvmod.invb2);
 +
free(rslw);
 +
free(fdrt);
 +
free(fspace);
 +
free(ispace);
 +
fftwf_cleanup();
  
 
return 0;
 
return 0;
Line 753: Line 788:
 
*gen_brune_stf
 
*gen_brune_stf
  
This might accidentally convert a 'write' to 'w_rite'; fix that with
+
This might accidentally convert a 'write' to 'w_rite', and you might end up with _f_opfile and __opfile_ro; fix those with
<pre> sed -i 's/w_rite/write/g' *.c *.h */*.c */*.h</pre>
+
<pre>sed -i 's/w_rite/write/g' *.c *.h */*.c */*.h
 +
sed -i 's/__opfile_ro/_opfile_ro/g' *.c *.h */*.c */*.h
 +
sed -i 's/_f_opfile/_fopfile/g' *.c *.h */*.c */*.h
 +
</pre>
  
 
and also run (since Fortran function names can't start with an underscore, but are case insensitive):
 
and also run (since Fortran function names can't start with an underscore, but are case insensitive):
Line 760: Line 798:
 
sed -i 's/SORT/SORT1/g' */*.f
 
sed -i 's/SORT/SORT1/g' */*.f
 
</pre>
 
</pre>
 +
 +
=== Optimizations ===
 +
 +
Based on work done with v5.4.2, we found that we can reduce the runtime of kfilt_beta2() by 75%, speeding up the entire code for large ruptures by 15-20%.
 +
 +
Specifically, we took (slip.c, line 1469)
 +
<pre>
 +
k2 = kx*kx + ky*ky;
 +
fac = 1.0/((1.0 + exp(ord*log(k2*lmin2)))*(1.0 + exp(-ord*log(k2*lmax2))));
 +
amp = fac*exp(-0.5*beta2*log(k2));
 +
</pre>
 +
 +
and modified it via algebraic manipulation and pulling constants out of the loops:
 +
<pre>
 +
//New constants
 +
float lmin2_ord = pow(lmin2, ord);
 +
float lmax2_neg_ord = pow(lmax2, -1.0*ord);
 +
float k2_ord;
 +
 +
for(j=0;j<=ny0/2;j++)  /* only do positive half, then use symmetry */
 +
<snip>
 +
        k2 = kx*kx + ky*ky;
 +
        k2_ord = k2*k2*k2*k2;
 +
        fac = 1.0/((1.0+k2_ord*lmin2_ord)*(1.0+1.0/k2_ord*lmax2_neg_ord));
 +
        amp = fac*1.0/k2;
 +
</pre>
 +
These optimizations depend on beta=2 and ord=4, so before the for loops I added checks that this is in fact true:
 +
<pre>
 +
if (ord!=4) {
 +
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on ord=4.  Ord is actually %d, so we are aborting and you should change this calculation back to the unoptimized version.\n", ord);
 +
        exit(5);
 +
}
 +
 +
if (fabs(beta2-2.0)>0.000001) {
 +
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on beta2 = 2.  beta2 is actually %d, so we are aborting and you should change this calculation.\n", beta2);
 +
        exit(6);
 +
}
 +
</pre>
 +
  
 
== Compiling ==
 
== Compiling ==
  
 
Go back into the src directory and try to compile it with 'make'.
 
Go back into the src directory and try to compile it with 'make'.
 
With v5.4.2, we are not including the -fno-expensive-optimizations flag, since backwards compatibility with Blue Waters and Titan isn't an issue.
 

Latest revision as of 20:18, 2 November 2022

This page details the changes that were made to the Graves & Pitarka rupture generator v5.5.2, as shipped in the BBP, to make it CyberShake-ready. These changes are required to:

  • Create an API
  • Support using the code in two different modes:
    • Determine the number of rupture variations for a given rupture
    • Generate a rupture variation from a given geometry
  • Support the use of memcached
  • Pass in the rvfrac and seed value, if desired

In this document, $ROOT_DIR is taken to be the <CyberShake home>/software/RuptureCodes/RupGen-api-5.5.2 directory. GenRandV5.0 is the code as obtained from the BBP v22.4.

Installing files

  1. Create src, include, lib, and bin directories inside $ROOT_DIR .
  2. Inside src, create GenRandV5.0.
  3. Copy rupgen_api.c from a previous API version into src.
  4. Copy rupgen_defs.h from a previous API version into src.
  5. Create a symlink to rupgen_defs.h in src/GenRand5.0.
  6. Copy rupgen_api.h from a previous API version into src.
  7. Create a symlink to rupgen_api.h in src/GenRand5.0.
  8. Confusingly, there are two files named structure.h. One is in StandRupFormat and contains the structures needed for an SRF. The other is in GenRandV5.0 and contains the definition for complex, along with a few other supporting structures.
    1. Copy the BBP StandRupFormat/structure.h into src/srf_structure.h .
    2. Copy the BBP GenRandV5.0/structure.h into src/structure.h .
    3. Create a symlink to both files in src/GenRandV5.0.
  9. There are also two files named defs.h.
    1. Copy StandRupFormat/defs.h into src/GenRandV5.0/srf_defs.h .
    2. Copy GenRandV5.0/defs.h into src/GenRandV5.0/defs.h .
  10. From the code provided by Rob, copy the following files into src/GenRandV5.0. Some of them are only lightly used, but it's easier to copy in the entire file rather than only taking the needed functions:
    • function.h
    • include.h
    • fourg.f
    • genslip-v5.5.2.c
    • gslip_srf_subs.c
    • iofunc.c
    • misc.c
    • ruptime.c
    • slip.c
    • sliprate_subs.c
    • srf_subs.c (from StandRupFormat)
    • gslip_sliprate_subs.c
    • geoproj_subs.c
    • wafront2d-rwg.f
    • makefile
  11. Copy Makefile from a previous API version into src.

Editing files

In $ROOT_DIR/src

structure.h

  1. At the top of the file, change '#include "StandRupFormat/structure.h"' to '#include srf_structure.h'
  2. Above that line, add:
    #ifndef STRUCT_H
    #define STRUCT_H
    
  3. At the end of the file, add:
    #endif
    

srf_structure.h

  1. At the top of the file, add:
    #ifndef SRF_STRUCT_H
    #define SRF_STRUCT_H
    
  2. At the end of the file, add:
    #endif
    

Makefile

  1. Edit the GETPAR values to point to the getpar install.
  2. Edit GENSLIP to have the same name as the GenRandV5.0 directory.
  3. Edit the library creation line (under the librupgen.a target, starts with $(AR)) to point to the correct version of genslip.

In GenRandV5.0

defs.h

  1. At the top of the file, add:
    #ifndef DEFS_H
    #define DEFS_H
    
    #include "srf_defs.h"
    
  2. Delete DEFAULT_SHAL_VRUP_FRAC, as it is not used anymore.
  3. At the end of the file, add:
    #endif
    

srf_defs.h

  1. At the top of the file, add:
    #ifndef SRF_DEFS_H
    #define SRF_DEFS_H
    
  2. Delete DEFAULT_SHAL_VRUP_FRAC, as it is not used anymore.
  3. Delete the lines corresponding to RDONLY_FLAGS, RDWR_FLAGS, CROPTR_FLAGS, and the _FILE_OFFSET_BITS if statement, as this already exists in defs.h.
  4. At the end of the file, add:
    #endif
    

include.h

  1. At the top of the file, add:
    #ifndef RUPGEN_INCLUDE_H
    #define RUPGEN_INCLUDE_H
    
  2. At the end of the file, add:
    #include <string.h>
    #include "fftw3.h"
    
    #include "rupgen_api.h"
    #include "rupgen_defs.h"
    
    #ifdef _USE_MEMCACHED
    #include <libmemcached/memcached.h>
    #endif
    
    #endif
    

function.h

  1. At the top of the file, add:
    #ifndef RUPGEN_FUNC_H
    #define RUPGEN_FUNC_H
    
    #include "structure.h"
    
  2. Change the entry for read_ruppars to _read_ruppars, like:
    struct pointsource *_read_ruppars(char *,struct pointsource *,float *,int *,int *,float *,float *,float *,float *,float *,float *,float *);
  3. Add an entry for get_rupt:
    void get_rupt(struct velmodel* vm, float* h, float* srcd, float* recd, float* srcr, float* recr, double* p, double* rad, float* tt);
  4. Add an entry for zapit:
    void zapit(float* s,int n);
  5. Add entries for load_command_srf() and load_seed_srf():
    void load_command_srf(struct standrupformat *srf,int ac,char **av);
    void load_seed_srf(struct standrupformat *srf,int starting_seed,int ending_seed);
  6. At the end of the file, add:
    void write_srf1(struct standrupformat *srf,char *file,int bflag);
    void write_srf2(struct standrupformat *srf,char *file,int bflag);
    
    #ifdef _USE_MEMCACHED
    int mc_genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state, char* mc_server);
    struct pointsource* _mc_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat, char* mc_server);
    #else
    int genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state);
    #endif
    
    #endif
    

slip.c

  1. In line 850, just before
    arr = check_realloc(arr, n2*sizeof(fftwf_complex));

    insert

    fftwf_destroy_plan(plan);
  2. In line 881, change the 'fftwf_free(arr)' to 'free(arr)'; since arr was allocated using check_malloc/check_realloc it should be freed with just free().

makefile

  1. Edit the 2nd line that begins with OBJS so it contains
    OBJS = iofunc.o misc.o slip.o ruptime.o srf_subs.o gslip_srf_subs.o geoproj_subs.o gslip_sliprate_subs.o wafront2d-rwg.o fourg.o
    
  2. Delete the SRF_OBJS and FDRT_OBJS lines, 3-4.
  3. Edit GETPAR and INCPAR so they correctly point to the Getpar install in <CyberShake home>/software/Getpar .
  4. Modify FFTW_INCDIR and FFTW_LIBDIR to point to the FFTW directories on the system.
  5. Add:
    # Uncomment for memcached support
    MEMC = ../../../../../utils/libmemcached_1.0.18
    CFLAGS := $(CFLAGS) -I${MEMC}/include -D_USE_MEMCACHED
    LDLIBS := ${LDLIBS} -L${MEMC}/lib -lmemcached
    
  6. Edit the 'all' target to only be genslip-v5.5.2.
  7. Modify the genslip-v5.5.2 target. We need to remove unnecessary objects, and compile it to an object rather than an executable for inclusion in the genslip library. It should look like:
  8. genslip-v5.4.2.o : genslip-v5.4.2.c ${OBJS}
            ${CC} ${CFLAGS} -c -o genslip-v5.4.2.o genslip-v5.4.2.c ${LDLIBS}
    Note that if you copy/paste the line above, make will probably give you a whitespace error. Remember to replace the whitespace in front of ${CC} with a tab.
  9. If you choose to keep the other targets, remove ${SRF_OBJS} and ${FDRT_OBJS} and replace with just ${OBJS}.

gslip_srf_subs.c

  1. In line 188, at the end of init_plane_srf(), add initialization for srf.np_seg, so we can free it without worry:
    if (strcmp(srf[0].version, "2.0")==0) {
            srf[0].np_seg = NULL;
            srf[0].srf_hcmnt.cbuf = NULL;
    }
    
  2. Enclose lines 1188-1198 in an if statement:
    if (strcmp(srf[0].version, "2.0")==0) {
            srf[0].srf_hcmnt.nline = 0;
    
            /* 2018-04-02 updated for multiple segs */
            srf[0].nseg = srf[0].srf_prect.nseg;
            srf[0].np_seg = (int *)check_malloc((srf[0].nseg)*sizeof(int));
    
            for(iseg=0;iseg<nseg;iseg++)
               srf[0].np_seg[iseg] = prseg_ptr[iseg].nstk*prseg_ptr[0].ndip;
    
            /* end version 2.0 stuff, just in case */
    }
    

iofunc.c

  1. In line 179, change the function name read_ruppars to _read_ruppars:
    struct pointsource *_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat)
  2. Add the function _mc_add_file to the end of the file:
    #ifdef _USE_MEMCACHED
    void _mc_add_file(char* filename, char** file_buffer, memcached_st* mst) {
            memcached_return ret;
            uint32_t flags = 0;
    
            FILE* fpr = fopfile(filename, "rb");
            //Get file size so we know how big to make the file buffer
            fseek(fpr, 0, SEEK_END);
            long file_length = ftell(fpr);
            fseek(fpr, 0, SEEK_SET);
            *file_buffer = check_malloc(sizeof(char) * (file_length+1));
            memset(*file_buffer, '\0', file_length+1);
            int part_id = 0;
            long read = fread(*file_buffer, 1, file_length, fpr);
            if (read!=file_length) {
                    fprintf(stderr, "READ ERROR - %ld attempted %ld read.\n", file_length, read);
                    exit(1);
            }
            char* file_key = check_malloc(sizeof(char) * (strlen(filename) + 4));
            char buffer[1000*1000];
            //Memcached thinks this is 1 MB
            int ONE_MB = 1000*1000;
            int i=0;
            while (i*ONE_MB < file_length) {
                    int to_copy = ONE_MB;
                    if (file_length - i*ONE_MB < ONE_MB) {
                            to_copy = file_length - i*ONE_MB;
                    }
                    sprintf(file_key, "%s.%d", filename, i);
                    memcpy(buffer, (*file_buffer)+i*ONE_MB, to_copy);
    
                    ret = memcached_set(mst, file_key, strlen(file_key), buffer, strlen(buffer), 0, flags);
                    if (ret!=MEMCACHED_SUCCESS) {
                            printf("Caching of %s failed.\n", file_key);
                            printf("%s\n", memcached_strerror(mst, ret));
                            printf("value length: %d bytes\n", strlen(buffer));
                    }
                    i++;
                    memset(buffer, '\0', ONE_MB);
            }
            //Commit number of parts
            int num_parts = i;
            char* parts_key = check_malloc(sizeof(char) * (strlen(filename) + 1));
            strcpy(parts_key, filename);
            ret = memcached_set(mst, parts_key, strlen(parts_key), (char*)(&num_parts), sizeof(int), 0, flags);
            if (ret!=MEMCACHED_SUCCESS) {
                    printf("Caching of %s failed.\n", parts_key);
                    printf("%s\n", memcached_strerror(mst, ret));
            }
            free(file_key);
            free(parts_key);
            fclose(fpr);
    }
  3. Add the definition of _mc_read_ruppars:
    struct pointsource* _mc_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat, char* mc_server)
    {
           FILE *fpr, *fopfile();
           float area;
           int i, nn;
           char str[1024];
    
           double rperd = 0.017453293;
           int ONE_MB = 1000*1000; //This is what memcached thinks 1 MB is
           *dtop = 1.0e+15;
           *stk = 0.0;
           *dip = 0.0;
           *elon = 0.0;
           *elat = 0.0;
    
           char* buffer;
           char* file_data;
           int* num_parts;
    
            printf("Using %s as memcached server.\n", mc_server);
    
            memcached_return ret;
            memcached_st* mst;
    
            mst = memcached_create(NULL);
            memcached_server_st *server = memcached_server_list_append(NULL, mc_server, 11211, &ret);
    
            ret = memcached_server_push(mst, server);
            memcached_server_free(server);
    
            uint32_t flags = 0;
    
            //See if file parts is in cache
            //key is filename + "_" + part
            char* parts_key = check_malloc(sizeof(char) * (strlen(file) + 1));
            strcpy(parts_key, file);
    
            size_t incoming_size;
            num_parts = (int*) memcached_get(mst, parts_key, strlen(parts_key), &incoming_size, &flags, &ret);
            if (num_parts==NULL) {
                    printf("Key %s hasn't been cached, adding.\n", parts_key);
                    _mc_add_file(file, &file_data, mst);
            } else {
                    printf("Found key %s, num parts is %d.\n", parts_key, *num_parts);
                    printf("Looking for parts.\n");
                    //See if all the parts are actually around
                    file_data = check_malloc(sizeof(char) * (ONE_MB * *num_parts + 1));
                    memset(file_data, '\0', ONE_MB* *num_parts + 1);
                    char* file_key = check_malloc(sizeof(char) * (strlen(file) + 4));
                    size_t incoming;
                    for (i=0; i<*num_parts; i++) {
                            sprintf(file_key, "%s.%d", file, i);
                            buffer = (char*) memcached_get(mst, file_key, strlen(file_key), &incoming, &flags, &ret);
                            if (buffer==NULL) {
                                    printf("Piece %s was missing, adding file %s.\n", file_key, file);
                                    //This piece is missing
                                    //Easiest to add the whole file
                                    free(file_data);
                                    _mc_add_file(file, &file_data, mst);
                                    break;
                            } else {
                                    memcpy(file_data+i*ONE_MB, buffer, incoming);
                            }
                    }
                    free(file_key);
            }
    
            free(parts_key);
            memcached_free(mst);
    
            char* tok;
            tok = strtok(file_data, "\n"); /* Probability = <float> */
    
            tok = strtok(NULL, "\n"); /* Magnitude = <float> */
            sscanf(tok,"%*s %*s %f",mag);
    
            tok = strtok(NULL, "\n");   /* GridSpacing = <float> */
            sscanf(tok,"%*s %*s %f",dx);
    
            if(*dx == (float)(0.0))
               {
               fprintf(stderr,"***** input error\n");
               fprintf(stderr,"      GridSpacing = 0.0, exiting...\n");
               exit(-1);
               }
    
            *dy = *dx;
    
            tok = strtok(NULL, "\n");   /* NumRows = <int> */
            sscanf(tok,"%*s %*s %d",ny);
    
            tok = strtok(NULL, "\n");   /* NumCols = <int> */
            sscanf(tok,"%*s %*s %d",nx);
    
            tok = strtok(NULL, "\n");/* header comment */
            psrc = (struct pointsource *)check_realloc(psrc,(*nx)*(*ny)*sizeof(struct pointsource));
    
            area = (*dx)*(*dy)*1.0e+10;  /* km -> cm */
    
            for(i=0;i<(*nx)*(*ny);i++)
               {
               tok = strtok(NULL, "\n");   /* Lat , Lon , Depth , Rake , Dip , Strike */
               sscanf(tok,"%f %f %f %f %f %f",&psrc[i].lat,
                                              &psrc[i].lon,
                                              &psrc[i].dep,
                                              &psrc[i].rak,
                                              &psrc[i].dip,
                                              &psrc[i].stk);
    
               psrc[i].area = area;
    
               if(psrc[i].dep < *dtop)
                  *dtop = psrc[i].dep;
    
               *stk = *stk + psrc[i].stk;
               *dip = *dip + psrc[i].dip;
               }
            *stk = *stk/((*nx)*(*ny));
            *dip = *dip/((*nx)*(*ny));
    
            nn = 0;
            for(i=0;i<(*nx)*(*ny);i++)
               {
               if(psrc[i].dep < (*dtop + 0.01))
                  {
                  *elon = *elon + psrc[i].lon;
                  *elat = *elat + psrc[i].lat;
                  nn++;
                  }
               }
    
            /* adjust for half subfault width */
            *dtop = (*dtop) - 0.5*(*dx)*sin((*dip)*rperd);
    
            if(nn == 0)
               nn++;
    
            *elon = *elon/nn;
            *elat = *elat/nn;
    
            free(file_data);
    
            return(psrc);
    }
    #endif
    

misc.c

  1. In line 68, modify zapit to use a for-loop and specify a return type. The function should become:
    void zapit(float* s,int n)
    {
    int i;
    for (i=0; i<n; i++) {
            s[i] = 0.0;
    }
    }
    
  2. Add the free_srf_ptrs function to clean up everything:
    void free_srf_ptrs(struct standrupformat* srf) {
            struct srf_planerectangle srf_prect = srf->srf_prect;
            struct srf_allpoints srf_apnts = srf->srf_apnts;
    
            free(srf_prect.prectseg);
            int i;
            for (i=0; i<srf_apnts.np; i++) {
                    free(srf_apnts.apntvals[i].stf1);
                    free(srf_apnts.apntvals[i].stf2);
                    free(srf_apnts.apntvals[i].stf3);
            }
            free(srf_apnts.apntvals);
            if (strcmp(srf->version,"2.0")==0) {
                    free(srf->np_seg);
                    free(srf->srf_hcmnt.cbuf);
            }
    }
    

genslip-v5.5.2.c

This is where the majority of changes are made to turn this into an API. I recommend having an original copy for reference.

Note that line numbers are approximate, and reflect the line number after having made the previous edits.

  1. In line 352, change from
    int main(int ac,char **av)
    

    to

    #ifdef _USE_MEMCACHED
    int mc_genslip(int ac, char** av, rg_stats_t *stats, struct standrupformat* srf, int state, char* memcached_server)
    #else
    int genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state)
    #endif
    
  2. In line 415, after
    char seedfile[1024];

    add

    //Used to bypass iterating over slips to get to the 'right' seed
    int use_unmodified_seed = 0;
  3. In line 444, after
    struct velmodel rvmod;

    add initializations:

    rvmod.vs = rvmod.invb2 = NULL;
    rvmod.th = NULL;
  4. In line 451, after
    float delh, hx, hg, gwid2, *gbnd, *gwid, shypo_mseg, *rvfac_seg;

    add initializations:

    rvfac_seg = gbnd = gwid = NULL;
  5. Delete SRF definition in line 602:
    struct standrupformat srf;
  6. After line 621:
    int outbin = 0;

    Add

    int doslip = -1;
    int dohypo = -1;
    
  7. In line 638, after
    double dh, *fdrt, *rslw, *fspace;

    add initializations:

    fdrt = rslw = fspace = NULL;
  8. In line 642, after
    int ixs, iys, nsring, ntot, *ispace;

    add initialization:

    ispace = NULL;
  9. Change 'srf.version' to 'srf->version' in line 653.
  10. In line 653, change the default version to 2.0:
    sprintf(srf->version,"2.0");
  11. In line 741, after 'getpar("outbin","d",&outbin);', add:
    getpar("doslip","d",&doslip);
    getpar("dohypo","d",&dohypo);
    
  12. In line 750-52, change all the 'srf.version' to 'srf->version'.
  13. In line 915, after
    getpar("seed","d",&seed);

    add

    getpar("use_unmodified_seed","d",&use_unmodified_seed);
  14. In line 1025, change
     psrc_orig = read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat);

    to

    #ifdef _USE_MEMCACHED
       psrc_orig = _mc_read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat,memcached_server);
    #else
       psrc_orig = _read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat);
    #endif
    
  15. In line 1077, in the init_plane_srf() call, change '&srf' to 'srf'.
  16. In line 1081, in the get_seg_bounds() call, change '&srf' to 'srf'.
  17. In line 1393, after 'dhypo = dhypo_frac*fwid;', add:
    stats->numslip = ns;
    stats->numhypo = nh;
    
    if (state==GET_STATS) {
       free(rvfac_seg);
       free(gbnd);
       free(gwid);
       free(psrc);
       free(psrc_orig);
       free(slip_c);
       free(rake_c);
       free(tsfac1_c);
       free(rtime1_c);
       free(rough_c);
       free(tsfac2_c);
       free(rtime2_c);
       free(slip_r);
       free(rake_r);
       free(tsfac1_r);
       free(rtime1_r);
       free(rough_r);
       free(tsfac2_r);
       free(rtime2_r);
       free(stk_r);
       free(dip_r);
       free(psrc_rake);
       free(vmod.vp);
       free(vmod.vs);
       free(vmod.den);
       free(vmod.th);
       free(vmod.dep);
       free(vmod.mu);
       free(vmod.invb2);
       free_srf_ptrs(srf);
       return 0;
    }
    
    
    if (dohypo>=nh) {
            fprintf(stderr,"Error:  You requested hypocenter ID %d but from infile %s genslip calculates %d hypocenters.\n", dohypo, infile, nh);
            exit(1);
    }
    
    if (doslip>=ns) {
            fprintf(stderr,"Error:  You requested slip ID %d but from infile %s genslip calculates %d slips.\n", doslip, infile, ns);
            exit(2);
    }
    
    if (ns<-1 || nh<-1) {
            fprintf(stderr, "Error: hypocenter and slip ID need to be >= -1.");
            exit(3);
    }
    
  18. In line 1470, after:
    for(js=0;js<ns;js++)    /* loop over slip/rupture realizations */
       {
    

    add

       if (doslip!=-1 && js>doslip) {
            //We've already done the slip we need, break out of the loop
            break;
       }
    
       if (doslip!=-1 && js<doslip) {
            //Haven't gotten to the right slip yet, continue
            continue;
       }
    
  19. In lines 1500-1509, change iterative calculation of shypo and dhypo to explicit. Delete
    if(ih_scnt == nhypo_s)
             {
             ih_scnt = 0;
             ih_dcnt++;
             }
    
          shypo = hypo_s0 + ih_scnt*hypo_ds;
          dhypo = hypo_d0 + ih_dcnt*hypo_dd;
    
          ih_scnt++;
    

    and replace with

          shypo = hypo_s0 + (js%nhypo_s)*hypo_ds;
          dhypo = hypo_d0 + (js/nhypo_s)*hypo_dd;
    
  20. In line 1496, put the for loop that calls _sfrand() into an if statement. It should look like:
       if (use_unmodified_seed==0) {
           for(k=0;k<10*js;k++)
              sval =  _sfrand(&seed);
       }
    
  21. In line 2653, in the load_slip_srf_dd4() call, change '&srf' to 'srf'.
  22. In the for loop starting in line 2742, starting with for(j=0;j<ndip;j++), we replace some of these calculations with refactored ones to pull out constants and take advantage of FMA. Specifically, make the following changes:
    1. Before the for loop, define the following variables:
      float half_ddip = 0.5*ddip;
      float half_dstk_minus_half_flen = 0.5*dstk - 0.5*flen;
      float half_flen_plus_shypo = 0.5*flen + shypo;
      y0 = (deep_vrdep1-dtop)/sin(avgdip*rperd);
      float deep_alpha = 1.0 + (deep_vrdep1-deep_vrup*deep_vrdep1)/(deep_vrdep2-deep_vrdep1);
      float deep_const = (deep_vrup - 1.0)/(deep_vrdep2-deep_vrdep1);
      
    2. Replace
      yy = (j + 0.5)*ddip;

      with

      yy = j*ddip + half_ddip;
    3. Replace
      xx = (i+0.5)*dstk - 0.5*flen;

      with

      xx = i*dstk + half_dstk_minus_half_flen;
    4. Replace
      hg = (gbnd[ig]-0.5*flen)-shypo;

      with

      hg = gbnd[ig] - half_flen_plus_shypo;
    5. Replace
      vrfac = 1.0 + (deep_vrup - 1.0)*((psrc[ip].dep)-deep_vrdep1)/(deep_vrdep2-deep_vrdep1);

      with

      vrfac = deep_alpha + deep_const*psrc[ip].dep;
  23. In line 2862, after 'psrc[j].rupt = psrc[j].rupt - tsmin + rupture_delay;', add:
    if (dohypo!=ih || doslip!=js) {
        continue;
    }
    
  24. In line 2869, in the load_rupt_srf() call, change '&srf' to 'srf'.
  25. In line 2892, remove:
     write2srf(srf,str,outbin);
    Writing out SRFs is done by calling write_srf() on the srf object.
  26. In line 2906, remove
    free_srf_stf(&srf);
  27. In lines 2887 and 2890, change '&srf' to 'srf'.
  28. At the very end of the function, we want to free everything not needed by the SRF object. Add:
    free(rvfac_seg);
    free(gbnd);
    free(gwid);
    free(psrc);
    free(psrc_orig);
    free(slip_c);
    free(rake_c);
    free(tsfac1_c);
    free(rtime1_c);
    free(rough_c);
    free(tsfac2_c);
    free(rtime2_c);
    free(slip_r);
    free(rake_r);
    free(tsfac1_r);
    free(rtime1_r);
    free(rough_r);
    free(tsfac2_r);
    free(rtime2_r);
    free(stk_r);
    free(dip_r);
    free(psrc_rake);
    free(rspd);
    free(vmod.vp);
    free(vmod.vs);
    free(vmod.den);
    free(vmod.th);
    free(vmod.dep);
    free(vmod.mu);
    free(vmod.invb2);
    free(rvmod.vs);
    free(rvmod.th);
    free(rvmod.invb2);
    free(rslw);
    free(fdrt);
    free(fspace);
    free(ispace);
    fftwf_cleanup();
    
    return 0;
    
  29. Once verification is complete, you may wish to comment out some of the print statements.

Function name changes

Since Rob uses some of the same helper functions in his genslip code as in his jbsim3d code, we prepend an underscore to some of the function names in the genslip library to avoid namespace conflicts when compiling DirectSynth.

From the src directory, run the following sed command

sed -i 's/<function name>/_<function name>/g' *.c *.h */*.c */*.h 

for each of these function names:

  • fopfile
  • opfile_ro
  • opfile
  • croptrfile
  • reed
  • rite
  • check_malloc
  • check_realloc
  • sfrand
  • gaus_rand
  • zapit
  • set_ne
  • set_ll
  • init_plane_srf
  • load_slip_srf
  • load_rupt_srf
  • gen_2tri_stf
  • gen_brune_stf

This might accidentally convert a 'write' to 'w_rite', and you might end up with _f_opfile and __opfile_ro; fix those with

sed -i 's/w_rite/write/g' *.c *.h */*.c */*.h
sed -i 's/__opfile_ro/_opfile_ro/g' *.c *.h */*.c */*.h
sed -i 's/_f_opfile/_fopfile/g' *.c *.h */*.c */*.h

and also run (since Fortran function names can't start with an underscore, but are case insensitive):

sed -i 's/sort/sort1/g' */*.f
sed -i 's/SORT/SORT1/g' */*.f

Optimizations

Based on work done with v5.4.2, we found that we can reduce the runtime of kfilt_beta2() by 75%, speeding up the entire code for large ruptures by 15-20%.

Specifically, we took (slip.c, line 1469)

k2 = kx*kx + ky*ky;
fac = 1.0/((1.0 + exp(ord*log(k2*lmin2)))*(1.0 + exp(-ord*log(k2*lmax2))));
amp = fac*exp(-0.5*beta2*log(k2));

and modified it via algebraic manipulation and pulling constants out of the loops:

//New constants
float lmin2_ord = pow(lmin2, ord);
float lmax2_neg_ord = pow(lmax2, -1.0*ord);
float k2_ord;

for(j=0;j<=ny0/2;j++)  /* only do positive half, then use symmetry */
<snip>
        k2 = kx*kx + ky*ky;
        k2_ord = k2*k2*k2*k2;
        fac = 1.0/((1.0+k2_ord*lmin2_ord)*(1.0+1.0/k2_ord*lmax2_neg_ord));
        amp = fac*1.0/k2;

These optimizations depend on beta=2 and ord=4, so before the for loops I added checks that this is in fact true:

if (ord!=4) {
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on ord=4.  Ord is actually %d, so we are aborting and you should change this calculation back to the unoptimized version.\n", ord);
        exit(5);
}

if (fabs(beta2-2.0)>0.000001) {
        fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on beta2 = 2.  beta2 is actually %d, so we are aborting and you should change this calculation.\n", beta2);
        exit(6);
}


Compiling

Go back into the src directory and try to compile it with 'make'.