Rupture Variation Generator v5.5.2 code changes
This page details the changes that were made to the Graves & Pitarka rupture generator v5.5.2, as shipped in the BBP, to make it CyberShake-ready. These changes are required to:
- Create an API
- Support using the code in two different modes:
- Determine the number of rupture variations for a given rupture
- Generate a rupture variation from a given geometry
- Support the use of memcached
- Pass in the rvfrac and seed value, if desired
In this document, $ROOT_DIR is taken to be the <CyberShake home>/software/RuptureCodes/RupGen-api-5.5.2 directory. GenRandV5.0 is the code as obtained from the BBP v22.4.
Contents
Installing files
- Create src, include, lib, and bin directories inside $ROOT_DIR .
- Inside src, create GenRandV5.0.
- Copy rupgen_api.c from a previous API version into src.
- Copy rupgen_defs.h from a previous API version into src.
- Create a symlink to rupgen_defs.h in src/GenRand5.0.
- Copy rupgen_api.h from a previous API version into src.
- Create a symlink to rupgen_api.h in src/GenRand5.0.
- Confusingly, there are two files named structure.h. One is in StandRupFormat and contains the structures needed for an SRF. The other is in GenRandV5.0 and contains the definition for complex, along with a few other supporting structures.
- Copy the BBP StandRupFormat/structure.h into src/srf_structure.h .
- Copy the BBP GenRandV5.0/structure.h into src/structure.h .
- Create a symlink to both files in src/GenRandV5.0.
- There are also two files named defs.h.
- Copy StandRupFormat/defs.h into src/GenRandV5.0/srf_defs.h .
- Copy GenRandV5.0/defs.h into src/GenRandV5.0/defs.h .
- From the code provided by Rob, copy the following files into src/GenRandV5.0. Some of them are only lightly used, but it's easier to copy in the entire file rather than only taking the needed functions:
- function.h
- include.h
- fourg.f
- genslip-v5.5.2.c
- gslip_srf_subs.c
- iofunc.c
- misc.c
- ruptime.c
- slip.c
- sliprate_subs.c
- srf_subs.c (from StandRupFormat)
- gslip_sliprate_subs.c
- geoproj_subs.c
- wafront2d-rwg.f
- makefile
- Copy Makefile from a previous API version into src.
Editing files
In $ROOT_DIR/src
structure.h
- At the top of the file, change '#include "StandRupFormat/structure.h"' to '#include srf_structure.h'
- Above that line, add:
#ifndef STRUCT_H #define STRUCT_H
- At the end of the file, add:
#endif
srf_structure.h
- At the top of the file, add:
#ifndef SRF_STRUCT_H #define SRF_STRUCT_H
- At the end of the file, add:
#endif
Makefile
- Edit the GETPAR values to point to the getpar install.
- Edit GENSLIP to have the same name as the GenRandV5.0 directory.
- Edit the library creation line (under the librupgen.a target, starts with $(AR)) to point to the correct version of genslip.
In GenRandV5.0
defs.h
- At the top of the file, add:
#ifndef DEFS_H #define DEFS_H #include "srf_defs.h"
- Delete DEFAULT_SHAL_VRUP_FRAC, as it is not used anymore.
- At the end of the file, add:
#endif
srf_defs.h
- At the top of the file, add:
#ifndef SRF_DEFS_H #define SRF_DEFS_H
- Delete DEFAULT_SHAL_VRUP_FRAC, as it is not used anymore.
- Delete the lines corresponding to RDONLY_FLAGS, RDWR_FLAGS, CROPTR_FLAGS, and the _FILE_OFFSET_BITS if statement, as this already exists in defs.h.
- At the end of the file, add:
#endif
include.h
- At the top of the file, add:
#ifndef RUPGEN_INCLUDE_H #define RUPGEN_INCLUDE_H
- At the end of the file, add:
#include <string.h> #include "fftw3.h" #include "rupgen_api.h" #include "rupgen_defs.h" #ifdef _USE_MEMCACHED #include <libmemcached/memcached.h> #endif #endif
function.h
- At the top of the file, add:
#ifndef RUPGEN_FUNC_H #define RUPGEN_FUNC_H #include "structure.h"
- Change the entry for read_ruppars to _read_ruppars, like:
struct pointsource *_read_ruppars(char *,struct pointsource *,float *,int *,int *,float *,float *,float *,float *,float *,float *,float *);
- Add an entry for get_rupt:
void get_rupt(struct velmodel* vm, float* h, float* srcd, float* recd, float* srcr, float* recr, double* p, double* rad, float* tt);
- Add an entry for zapit:
void zapit(float* s,int n);
- Add entries for load_command_srf() and load_seed_srf():
void load_command_srf(struct standrupformat *srf,int ac,char **av); void load_seed_srf(struct standrupformat *srf,int starting_seed,int ending_seed);
- At the end of the file, add:
void write_srf1(struct standrupformat *srf,char *file,int bflag); void write_srf2(struct standrupformat *srf,char *file,int bflag); #ifdef _USE_MEMCACHED int mc_genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state, char* mc_server); struct pointsource* _mc_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat, char* mc_server); #else int genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state); #endif #endif
slip.c
- In line 850, just before
arr = check_realloc(arr, n2*sizeof(fftwf_complex));
insert
fftwf_destroy_plan(plan);
- In line 881, change the 'fftwf_free(arr)' to 'free(arr)'; since arr was allocated using check_malloc/check_realloc it should be freed with just free().
makefile
- Edit the 2nd line that begins with OBJS so it contains
OBJS = iofunc.o misc.o slip.o ruptime.o srf_subs.o gslip_srf_subs.o geoproj_subs.o gslip_sliprate_subs.o wafront2d-rwg.o fourg.o
- Delete the SRF_OBJS and FDRT_OBJS lines, 3-4.
- Edit GETPAR and INCPAR so they correctly point to the Getpar install in <CyberShake home>/software/Getpar .
- Modify FFTW_INCDIR and FFTW_LIBDIR to point to the FFTW directories on the system.
- Add:
# Uncomment for memcached support MEMC = ../../../../../utils/libmemcached_1.0.18 CFLAGS := $(CFLAGS) -I${MEMC}/include -D_USE_MEMCACHED LDLIBS := ${LDLIBS} -L${MEMC}/lib -lmemcached
- Edit the 'all' target to only be genslip-v5.5.2.
- Modify the genslip-v5.5.2 target. We need to remove unnecessary objects, and compile it to an object rather than an executable for inclusion in the genslip library. It should look like:
- If you choose to keep the other targets, remove ${SRF_OBJS} and ${FDRT_OBJS} and replace with just ${OBJS}.
genslip-v5.4.2.o : genslip-v5.4.2.c ${OBJS} ${CC} ${CFLAGS} -c -o genslip-v5.4.2.o genslip-v5.4.2.c ${LDLIBS}Note that if you copy/paste the line above, make will probably give you a whitespace error. Remember to replace the whitespace in front of ${CC} with a tab.
gslip_srf_subs.c
- In line 188, at the end of init_plane_srf(), add initialization for srf.np_seg, so we can free it without worry:
if (strcmp(srf[0].version, "2.0")==0) { srf[0].np_seg = NULL; srf[0].srf_hcmnt.cbuf = NULL; }
- Enclose lines 1188-1198 in an if statement:
if (strcmp(srf[0].version, "2.0")==0) { srf[0].srf_hcmnt.nline = 0; /* 2018-04-02 updated for multiple segs */ srf[0].nseg = srf[0].srf_prect.nseg; srf[0].np_seg = (int *)check_malloc((srf[0].nseg)*sizeof(int)); for(iseg=0;iseg<nseg;iseg++) srf[0].np_seg[iseg] = prseg_ptr[iseg].nstk*prseg_ptr[0].ndip; /* end version 2.0 stuff, just in case */ }
iofunc.c
- In line 179, change the function name read_ruppars to _read_ruppars:
struct pointsource *_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat)
- Add the function _mc_add_file to the end of the file:
#ifdef _USE_MEMCACHED void _mc_add_file(char* filename, char** file_buffer, memcached_st* mst) { memcached_return ret; uint32_t flags = 0; FILE* fpr = fopfile(filename, "rb"); //Get file size so we know how big to make the file buffer fseek(fpr, 0, SEEK_END); long file_length = ftell(fpr); fseek(fpr, 0, SEEK_SET); *file_buffer = check_malloc(sizeof(char) * (file_length+1)); memset(*file_buffer, '\0', file_length+1); int part_id = 0; long read = fread(*file_buffer, 1, file_length, fpr); if (read!=file_length) { fprintf(stderr, "READ ERROR - %ld attempted %ld read.\n", file_length, read); exit(1); } char* file_key = check_malloc(sizeof(char) * (strlen(filename) + 4)); char buffer[1000*1000]; //Memcached thinks this is 1 MB int ONE_MB = 1000*1000; int i=0; while (i*ONE_MB < file_length) { int to_copy = ONE_MB; if (file_length - i*ONE_MB < ONE_MB) { to_copy = file_length - i*ONE_MB; } sprintf(file_key, "%s.%d", filename, i); memcpy(buffer, (*file_buffer)+i*ONE_MB, to_copy); ret = memcached_set(mst, file_key, strlen(file_key), buffer, strlen(buffer), 0, flags); if (ret!=MEMCACHED_SUCCESS) { printf("Caching of %s failed.\n", file_key); printf("%s\n", memcached_strerror(mst, ret)); printf("value length: %d bytes\n", strlen(buffer)); } i++; memset(buffer, '\0', ONE_MB); } //Commit number of parts int num_parts = i; char* parts_key = check_malloc(sizeof(char) * (strlen(filename) + 1)); strcpy(parts_key, filename); ret = memcached_set(mst, parts_key, strlen(parts_key), (char*)(&num_parts), sizeof(int), 0, flags); if (ret!=MEMCACHED_SUCCESS) { printf("Caching of %s failed.\n", parts_key); printf("%s\n", memcached_strerror(mst, ret)); } free(file_key); free(parts_key); fclose(fpr); }
- Add the definition of _mc_read_ruppars:
struct pointsource* _mc_read_ruppars(char *file,struct pointsource *psrc,float *mag,int *nx,int *ny,float *dx,float *dy,float *dtop,float *stk,float *dip,float *elon,float *elat, char* mc_server) { FILE *fpr, *fopfile(); float area; int i, nn; char str[1024]; double rperd = 0.017453293; int ONE_MB = 1000*1000; //This is what memcached thinks 1 MB is *dtop = 1.0e+15; *stk = 0.0; *dip = 0.0; *elon = 0.0; *elat = 0.0; char* buffer; char* file_data; int* num_parts; printf("Using %s as memcached server.\n", mc_server); memcached_return ret; memcached_st* mst; mst = memcached_create(NULL); memcached_server_st *server = memcached_server_list_append(NULL, mc_server, 11211, &ret); ret = memcached_server_push(mst, server); memcached_server_free(server); uint32_t flags = 0; //See if file parts is in cache //key is filename + "_" + part char* parts_key = check_malloc(sizeof(char) * (strlen(file) + 1)); strcpy(parts_key, file); size_t incoming_size; num_parts = (int*) memcached_get(mst, parts_key, strlen(parts_key), &incoming_size, &flags, &ret); if (num_parts==NULL) { printf("Key %s hasn't been cached, adding.\n", parts_key); _mc_add_file(file, &file_data, mst); } else { printf("Found key %s, num parts is %d.\n", parts_key, *num_parts); printf("Looking for parts.\n"); //See if all the parts are actually around file_data = check_malloc(sizeof(char) * (ONE_MB * *num_parts + 1)); memset(file_data, '\0', ONE_MB* *num_parts + 1); char* file_key = check_malloc(sizeof(char) * (strlen(file) + 4)); size_t incoming; for (i=0; i<*num_parts; i++) { sprintf(file_key, "%s.%d", file, i); buffer = (char*) memcached_get(mst, file_key, strlen(file_key), &incoming, &flags, &ret); if (buffer==NULL) { printf("Piece %s was missing, adding file %s.\n", file_key, file); //This piece is missing //Easiest to add the whole file free(file_data); _mc_add_file(file, &file_data, mst); break; } else { memcpy(file_data+i*ONE_MB, buffer, incoming); } } free(file_key); } free(parts_key); memcached_free(mst); char* tok; tok = strtok(file_data, "\n"); /* Probability = <float> */ tok = strtok(NULL, "\n"); /* Magnitude = <float> */ sscanf(tok,"%*s %*s %f",mag); tok = strtok(NULL, "\n"); /* GridSpacing = <float> */ sscanf(tok,"%*s %*s %f",dx); if(*dx == (float)(0.0)) { fprintf(stderr,"***** input error\n"); fprintf(stderr," GridSpacing = 0.0, exiting...\n"); exit(-1); } *dy = *dx; tok = strtok(NULL, "\n"); /* NumRows = <int> */ sscanf(tok,"%*s %*s %d",ny); tok = strtok(NULL, "\n"); /* NumCols = <int> */ sscanf(tok,"%*s %*s %d",nx); tok = strtok(NULL, "\n");/* header comment */ psrc = (struct pointsource *)check_realloc(psrc,(*nx)*(*ny)*sizeof(struct pointsource)); area = (*dx)*(*dy)*1.0e+10; /* km -> cm */ for(i=0;i<(*nx)*(*ny);i++) { tok = strtok(NULL, "\n"); /* Lat , Lon , Depth , Rake , Dip , Strike */ sscanf(tok,"%f %f %f %f %f %f",&psrc[i].lat, &psrc[i].lon, &psrc[i].dep, &psrc[i].rak, &psrc[i].dip, &psrc[i].stk); psrc[i].area = area; if(psrc[i].dep < *dtop) *dtop = psrc[i].dep; *stk = *stk + psrc[i].stk; *dip = *dip + psrc[i].dip; } *stk = *stk/((*nx)*(*ny)); *dip = *dip/((*nx)*(*ny)); nn = 0; for(i=0;i<(*nx)*(*ny);i++) { if(psrc[i].dep < (*dtop + 0.01)) { *elon = *elon + psrc[i].lon; *elat = *elat + psrc[i].lat; nn++; } } /* adjust for half subfault width */ *dtop = (*dtop) - 0.5*(*dx)*sin((*dip)*rperd); if(nn == 0) nn++; *elon = *elon/nn; *elat = *elat/nn; free(file_data); return(psrc); } #endif
misc.c
- In line 68, modify zapit to use a for-loop and specify a return type. The function should become:
void zapit(float* s,int n) { int i; for (i=0; i<n; i++) { s[i] = 0.0; } }
- Add the free_srf_ptrs function to clean up everything:
void free_srf_ptrs(struct standrupformat* srf) { struct srf_planerectangle srf_prect = srf->srf_prect; struct srf_allpoints srf_apnts = srf->srf_apnts; free(srf_prect.prectseg); int i; for (i=0; i<srf_apnts.np; i++) { free(srf_apnts.apntvals[i].stf1); free(srf_apnts.apntvals[i].stf2); free(srf_apnts.apntvals[i].stf3); } free(srf_apnts.apntvals); if (strcmp(srf->version,"2.0")==0) { free(srf->np_seg); free(srf->srf_hcmnt.cbuf); } }
genslip-v5.5.2.c
This is where the majority of changes are made to turn this into an API. I recommend having an original copy for reference.
Note that line numbers are approximate, and reflect the line number after having made the previous edits.
- In line 352, change from
int main(int ac,char **av)
to
#ifdef _USE_MEMCACHED int mc_genslip(int ac, char** av, rg_stats_t *stats, struct standrupformat* srf, int state, char* memcached_server) #else int genslip(int ac,char **av, rg_stats_t *stats, struct standrupformat* srf, int state) #endif
- In line 415, after
char seedfile[1024];
add
//Used to bypass iterating over slips to get to the 'right' seed int use_unmodified_seed = 0;
- In line 444, after
struct velmodel rvmod;
add initializations:
rvmod.vs = rvmod.invb2 = NULL; rvmod.th = NULL;
- In line 451, after
float delh, hx, hg, gwid2, *gbnd, *gwid, shypo_mseg, *rvfac_seg;
add initializations:
rvfac_seg = gbnd = gwid = NULL;
- Delete SRF definition in line 602:
struct standrupformat srf;
- After line 621:
int outbin = 0;
Add
int doslip = -1; int dohypo = -1;
- In line 638, after
double dh, *fdrt, *rslw, *fspace;
add initializations:
fdrt = rslw = fspace = NULL;
- In line 642, after
int ixs, iys, nsring, ntot, *ispace;
add initialization:
ispace = NULL;
- Change 'srf.version' to 'srf->version' in line 653.
- In line 653, change the default version to 2.0:
sprintf(srf->version,"2.0");
- In line 741, after 'getpar("outbin","d",&outbin);', add:
getpar("doslip","d",&doslip); getpar("dohypo","d",&dohypo);
- In line 750-52, change all the 'srf.version' to 'srf->version'.
- In line 915, after
getpar("seed","d",&seed);
add
getpar("use_unmodified_seed","d",&use_unmodified_seed);
- In line 1025, change
psrc_orig = read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat);
to
#ifdef _USE_MEMCACHED psrc_orig = _mc_read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat,memcached_server); #else psrc_orig = _read_ruppars(infile,psrc_orig,&mag,&nstk,&ndip,&dstk,&ddip,&dtop,&avgstk,&avgdip,&elon,&elat); #endif
- In line 1077, in the init_plane_srf() call, change '&srf' to 'srf'.
- In line 1081, in the get_seg_bounds() call, change '&srf' to 'srf'.
- In line 1393, after 'dhypo = dhypo_frac*fwid;', add:
stats->numslip = ns; stats->numhypo = nh; if (state==GET_STATS) { free(rvfac_seg); free(gbnd); free(gwid); free(psrc); free(psrc_orig); free(slip_c); free(rake_c); free(tsfac1_c); free(rtime1_c); free(rough_c); free(tsfac2_c); free(rtime2_c); free(slip_r); free(rake_r); free(tsfac1_r); free(rtime1_r); free(rough_r); free(tsfac2_r); free(rtime2_r); free(stk_r); free(dip_r); free(psrc_rake); free(vmod.vp); free(vmod.vs); free(vmod.den); free(vmod.th); free(vmod.dep); free(vmod.mu); free(vmod.invb2); free_srf_ptrs(srf); return 0; } if (dohypo>=nh) { fprintf(stderr,"Error: You requested hypocenter ID %d but from infile %s genslip calculates %d hypocenters.\n", dohypo, infile, nh); exit(1); } if (doslip>=ns) { fprintf(stderr,"Error: You requested slip ID %d but from infile %s genslip calculates %d slips.\n", doslip, infile, ns); exit(2); } if (ns<-1 || nh<-1) { fprintf(stderr, "Error: hypocenter and slip ID need to be >= -1."); exit(3); }
- In line 1470, after:
for(js=0;js<ns;js++) /* loop over slip/rupture realizations */ {
add
if (doslip!=-1 && js>doslip) { //We've already done the slip we need, break out of the loop break; } if (doslip!=-1 && js<doslip) { //Haven't gotten to the right slip yet, continue continue; }
- In lines 1500-1509, change iterative calculation of shypo and dhypo to explicit. Delete
if(ih_scnt == nhypo_s) { ih_scnt = 0; ih_dcnt++; } shypo = hypo_s0 + ih_scnt*hypo_ds; dhypo = hypo_d0 + ih_dcnt*hypo_dd; ih_scnt++;
and replace with
shypo = hypo_s0 + (js%nhypo_s)*hypo_ds; dhypo = hypo_d0 + (js/nhypo_s)*hypo_dd;
- In line 1496, put the for loop that calls _sfrand() into an if statement. It should look like:
if (use_unmodified_seed==0) { for(k=0;k<10*js;k++) sval = _sfrand(&seed); }
- In line 2653, in the load_slip_srf_dd4() call, change '&srf' to 'srf'.
- In the for loop starting in line 2742, starting with for(j=0;j<ndip;j++), we replace some of these calculations with refactored ones to pull out constants and take advantage of FMA. Specifically, make the following changes:
- Before the for loop, define the following variables:
float half_ddip = 0.5*ddip; float half_dstk_minus_half_flen = 0.5*dstk - 0.5*flen; float half_flen_plus_shypo = 0.5*flen + shypo; y0 = (deep_vrdep1-dtop)/sin(avgdip*rperd); float deep_alpha = 1.0 + (deep_vrdep1-deep_vrup*deep_vrdep1)/(deep_vrdep2-deep_vrdep1); float deep_const = (deep_vrup - 1.0)/(deep_vrdep2-deep_vrdep1);
- Replace
yy = (j + 0.5)*ddip;
with
yy = j*ddip + half_ddip;
- Replace
xx = (i+0.5)*dstk - 0.5*flen;
with
xx = i*dstk + half_dstk_minus_half_flen;
- Replace
hg = (gbnd[ig]-0.5*flen)-shypo;
with
hg = gbnd[ig] - half_flen_plus_shypo;
- Replace
vrfac = 1.0 + (deep_vrup - 1.0)*((psrc[ip].dep)-deep_vrdep1)/(deep_vrdep2-deep_vrdep1);
with
vrfac = deep_alpha + deep_const*psrc[ip].dep;
- Before the for loop, define the following variables:
- In line 2862, after 'psrc[j].rupt = psrc[j].rupt - tsmin + rupture_delay;', add:
if (dohypo!=ih || doslip!=js) { continue; }
- In line 2869, in the load_rupt_srf() call, change '&srf' to 'srf'.
- In line 2892, remove:
write2srf(srf,str,outbin);
Writing out SRFs is done by calling write_srf() on the srf object. - In line 2906, remove
free_srf_stf(&srf);
- In lines 2887 and 2890, change '&srf' to 'srf'.
- At the very end of the function, we want to free everything not needed by the SRF object. Add:
free(rvfac_seg); free(gbnd); free(gwid); free(psrc); free(psrc_orig); free(slip_c); free(rake_c); free(tsfac1_c); free(rtime1_c); free(rough_c); free(tsfac2_c); free(rtime2_c); free(slip_r); free(rake_r); free(tsfac1_r); free(rtime1_r); free(rough_r); free(tsfac2_r); free(rtime2_r); free(stk_r); free(dip_r); free(psrc_rake); free(rspd); free(vmod.vp); free(vmod.vs); free(vmod.den); free(vmod.th); free(vmod.dep); free(vmod.mu); free(vmod.invb2); free(rvmod.vs); free(rvmod.th); free(rvmod.invb2); free(rslw); free(fdrt); free(fspace); free(ispace); fftwf_cleanup(); return 0;
- Once verification is complete, you may wish to comment out some of the print statements.
Function name changes
Since Rob uses some of the same helper functions in his genslip code as in his jbsim3d code, we prepend an underscore to some of the function names in the genslip library to avoid namespace conflicts when compiling DirectSynth.
From the src directory, run the following sed command
sed -i 's/<function name>/_<function name>/g' *.c *.h */*.c */*.h
for each of these function names:
- fopfile
- opfile_ro
- opfile
- croptrfile
- reed
- rite
- check_malloc
- check_realloc
- sfrand
- gaus_rand
- zapit
- set_ne
- set_ll
- init_plane_srf
- load_slip_srf
- load_rupt_srf
- gen_2tri_stf
- gen_brune_stf
This might accidentally convert a 'write' to 'w_rite', and you might end up with _f_opfile and __opfile_ro; fix those with
sed -i 's/w_rite/write/g' *.c *.h */*.c */*.h sed -i 's/__opfile_ro/_opfile_ro/g' *.c *.h */*.c */*.h sed -i 's/_f_opfile/_fopfile/g' *.c *.h */*.c */*.h
and also run (since Fortran function names can't start with an underscore, but are case insensitive):
sed -i 's/sort/sort1/g' */*.f sed -i 's/SORT/SORT1/g' */*.f
Optimizations
Based on work done with v5.4.2, we found that we can reduce the runtime of kfilt_beta2() by 75%, speeding up the entire code for large ruptures by 15-20%.
Specifically, we took (slip.c, line 1469)
k2 = kx*kx + ky*ky; fac = 1.0/((1.0 + exp(ord*log(k2*lmin2)))*(1.0 + exp(-ord*log(k2*lmax2)))); amp = fac*exp(-0.5*beta2*log(k2));
and modified it via algebraic manipulation and pulling constants out of the loops:
//New constants float lmin2_ord = pow(lmin2, ord); float lmax2_neg_ord = pow(lmax2, -1.0*ord); float k2_ord; for(j=0;j<=ny0/2;j++) /* only do positive half, then use symmetry */ <snip> k2 = kx*kx + ky*ky; k2_ord = k2*k2*k2*k2; fac = 1.0/((1.0+k2_ord*lmin2_ord)*(1.0+1.0/k2_ord*lmax2_neg_ord)); amp = fac*1.0/k2;
These optimizations depend on beta=2 and ord=4, so before the for loops I added checks that this is in fact true:
if (ord!=4) { fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on ord=4. Ord is actually %d, so we are aborting and you should change this calculation back to the unoptimized version.\n", ord); exit(5); } if (fabs(beta2-2.0)>0.000001) { fprintf(stderr, "The optimized calculation of amp in kfilt_beta2 depends on beta2 = 2. beta2 is actually %d, so we are aborting and you should change this calculation.\n", beta2); exit(6); }
Compiling
Go back into the src directory and try to compile it with 'make'.