Difference between revisions of "AWP-ODC-FDQ"
See Also
- AWP-ODC
- High-F Project
- Main Page
Latest revision as of 22:42, 2 July 2015
AWP-ODC-FDQ is a version of the wave propagation code AWP-ODC that contains frequency-dependent Q physics modules. Currently we have a GPU version of this code.
PBS Script
maechlin@h2ologin3:~/fdq_awpodc> more fdq_bw.pbs
#!/bin/bash
###
### PBS script for submitting FDQ on Blue Waters
###
### Set the number of nodes
### Set the number of PEs per node
#PBS -l nodes=4:ppn=1:xk
###
### Set the wallclock time
###
#PBS -l walltime=01:30:00
###
### Set the job name
###
#PBS -N chino_hills_gpu
###
### Set the job stdout and stderr
###
#PBS -e $PBS_JOBID.err
#PBS -o $PBS_JOBID.out
###
### Set the Queue
###
#PBS -q normal
###
### Set the Allocation
###
#PBS -A jmz
###
### Set the Email (Beginning, End, Abort)
###
#PBS -m bea
#PBS -M maechlin@usc.edu
###
### Load specific modules
###
module swap PrgEnv-cray PrgEnv-gnu
module load cudatoolkit
module unload darshan

cd $PBS_O_WORKDIR
now=`date`
fname="O.$now.tmp"
echo "STARTING $now" >> "$fname"
aprun -n 4 -N 1 ./pmcl3d --NX 224 --NY 224 -Z 1024 -x 2 -y 2 \
    --TMAX 20.0 --DH 200.0 --DT 0.01 \
    --MEDIASTART 0 --READ_STEP 91 \
    --NTISKP 10 --WRITE_STEP 10 \
    --FAC 1.0 --Q0 150.0 --EX 0.6 --FP 2.5 \
    --INSRC FAULTPOW \
    --NSKPX 2 --NSKPY 2 \
    >> "$fname"
echo "ENDING `date`" >> "$fname"
Current Result
maechlin@h2ologin1:~/fdq_awpodc> more *.err
_pmiu_daemon(SIGCHLD): [NID 23046] [c7-0c0s3n0] [Thu Jul 2 14:10:29 2015] PE RANK 0 exit signal Segmentation fault
[NID 23046] 2015-07-02 14:10:29 Apid 25377581: initiated application termination
maechlin@h2ologin1:~/fdq_awpodc> more *.out
Begin Torque Prologue on nid27634 at Thu Jul 2 14:10:24 CDT 2015
Job Id: 1950564.nid11293
Username: maechlin
Group: PRAC_jmz
Job name: chino_hills_gpu
Requested resources: neednodes=4:ppn=1:xk,nodes=4:ppn=1:xk,walltime=01:30:00
Queue: normal
Account: jmz
End Torque Prologue: 0.024 elapsed
maechlin@h2ologin1:~/fdq_awpodc> more *.tmp
STARTING Thu Jul 2 14:10:26 CDT 2015
rank=0) RS=91, RSG=91, NST=91, IF=1
0 = (0,0)) NX,NY,NZ=224,224,1024 nxt,nyt,nzt=112,112,1024 rec_N=(112,112,1) rec_nxt,=56,56,1 NBGX,SKP,END=(1:2:223),(1:2:223),(1:1:1) rec_nbg,ed=(0,110),(0,110),(0,0) disp=0
rank=1) RS=91, RSG=91, NST=91, IF=1
1 = (0,1)) NX,NY,NZ=224,224,1024 nxt,nyt,nzt=112,112,1024 rec_N=(112,112,1) rec_nxt,=56,56,1 NBGX,SKP,END=(1:2:223),(1:2:223),(1:1:1) rec_nbg,ed=(0,110),(0,110),(0,0) disp=25088
rank=3) RS=91, RSG=91, NST=91, IF=1
3 = (1,1)) NX,NY,NZ=224,224,1024 nxt,nyt,nzt=112,112,1024 rec_N=(112,112,1) rec_nxt,=56,56,1 NBGX,SKP,END=(1:2:223),(1:2:223),(1:1:1) rec_nbg,ed=(0,110),(0,110),(0,0) disp=25312
rank=2) RS=91, RSG=91, NST=91, IF=1
2 = (1,0)) NX,NY,NZ=224,224,1024 nxt,nyt,nzt=112,112,1024 rec_N=(112,112,1) rec_nxt,=56,56,1 NBGX,SKP,END=(1:2:223),(1:2:223),(1:1:1) rec_nbg,ed=(0,110),(0,110),(0,0) disp=224
filetype size (supposedly=rec_nxt*nyt*nzt*WS*4=125440) =125440
rank=0, x_rank_L=-1, x_rank_R=2, y_rank_F=-1, y_rank_B=1
Before inisource
After inisource. Time elapsed (seconds): 0.003165
rank=0, source rank, npsrc=1
rank=1, x_rank_L=-1, x_rank_R=3, y_rank_F=0, y_rank_B=-1
rank=2, x_rank_L=0, x_rank_R=-1, y_rank_F=-1, y_rank_B=3
rank=3, x_rank_L=1, x_rank_R=-1, y_rank_F=2, y_rank_B=-1
Before inimesh
tau: 5.420455e-03,1.111455e+00; 3.223022e+00,1.321751e-01; 9.346188e+00,4.558041e-02; 1.571835e-02,3.832840e-01
After inimesh. Time elapsed (seconds): 0.045935
Application 25377581 exit codes: 139
Application 25377581 exit signals: Killed
Application 25377581 resources: utime ~0s, stime ~1s, Rss ~414872, inblocks ~789, outblocks ~1277
ENDING Thu Jul 2 14:10:30 CDT 2015
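The decomposition numbers printed in this log can be cross-checked with a short calculation. The Python sketch below is not part of AWP-ODC: the function name `decompose`, the rank-to-coordinate mapping, and the reading of WS as WRITE_STEP are all assumptions inferred from this one 2 x 2 run, not taken from the source code. It reproduces nxt, nyt, the rank coordinates, the neighbor ranks (-1 meaning no neighbor on that side), and the 125440-byte filetype size.

```python
def decompose(nx, ny, px, py, nskpx, nskpy, write_step):
    """Reproduce per-rank values printed in the log for a px-by-py layout.

    Assumption (inferred from the 2 x 2 log, not from the source code):
    rank r sits at x = r // py, y = r % py.
    """
    nxt, nyt = nx // px, ny // py                  # grid points per rank in X and Y
    rec_nxt, rec_nyt = nxt // nskpx, nyt // nskpy  # saved points per rank after skipping
    # Mirrors the log's rec_nxt*nyt*nzt*WS*4 with one saved Z layer,
    # WS taken to be WRITE_STEP (assumption) and 4-byte values.
    filetype_size = rec_nxt * rec_nyt * 1 * write_step * 4
    ranks = {}
    for r in range(px * py):
        x, y = divmod(r, py)
        ranks[r] = {
            "coord": (x, y),
            "x_rank_L": r - py if x > 0 else -1,
            "x_rank_R": r + py if x < px - 1 else -1,
            "y_rank_F": r - 1 if y > 0 else -1,
            "y_rank_B": r + 1 if y < py - 1 else -1,
        }
    return nxt, nyt, rec_nxt, rec_nyt, filetype_size, ranks
```

Calling `decompose(224, 224, 2, 2, 2, 2, 10)` with the values from the PBS script gives the same subdomain sizes and neighbor ranks as the log. The mapping has only been checked against this 2 x 2 case.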
Changing Default Blue Waters Environment
The default programming environment on Blue Waters is Cray. Switch to the GNU environment before building and running:
module unload PrgEnv-cray
module load PrgEnv-gnu
module load cudatoolkit
module unload darshan
Makefile
CC = cc
CFLAGS = -O3 -Wall
GFLAGS = nvcc -O4 -Xptxas -dlcm=ca -maxrregcount=255 -use_fast_math --ptxas-options=-v -arch=sm_35
INCDIR = -I/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/include
OBJECTS = command.o pmcl3d.o grid.o source.o mesh.o cerjan.o swap.o kernel.o io.o
LIB = -lm -ldl -L/opt/nvidia/cudatoolkit/5.5.20-1.0402.7700.8.1/lib64 -lcudart -lmpich

pmcl3d: $(OBJECTS)
	$(CC) $(CFLAGS) $(INCDIR) -o pmcl3d $(OBJECTS) $(LIB)

pmcl3d.o: pmcl3d.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o pmcl3d.o pmcl3d.c

command.o: command.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o command.o command.c

io.o: io.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o io.o io.c

grid.o: grid.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o grid.o grid.c

source.o: source.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o source.o source.c

mesh.o: mesh.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o mesh.o mesh.c

cerjan.o: cerjan.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o cerjan.o cerjan.c

swap.o: swap.c
	$(CC) $(CFLAGS) $(INCDIR) -c -o swap.o swap.c

kernel.o: kernel.cu
	$(GFLAGS) $(INCDIR) -c -o kernel.o kernel.cu

clean:
	rm *.o
Location of Code
Kyle has a GPU version of the code on Titan at: /lustre/atlas1/geo112/proj-shared/withers/chino_hills_gpu
The main added run-time parameter is the exponent of the power-law Q(f) relation; Q(f) is constant below 1 Hz.
Notes about FDQ Version
Note that there are some structural changes from the CPU code. The GPU code no longer uses the input parameter file; all parameters that differ from the default values defined in the code are specified in the run script. Also see the instructions below.
The following are required:
- NX, NY, NZ > 2 and even integers
- PX, PY >= 1, PZ = 1, and PX, PY, PZ divide NX, NY, NZ respectively
- BLOCK_SIZE_Y = 2 in pmcl3d_cons.h
- BLOCK_SIZE_Z divides NZ
- BLOCK_SIZE_Y * BLOCK_SIZE_Z <= 1024 and a power of 2
The following are suggestions:
- NX and NY are of the same order, i.e. 1/2*NY <= NX <= 2*NY
- NZ >= 256 and a power of 2
- BLOCK_SIZE_Z divides NZ
- BLOCK_SIZE_Y * BLOCK_SIZE_Z = 512
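The required and suggested constraints above can be checked before submitting a job. The following Python sketch is only an illustration and is not part of the code base: `check_grid` and `is_power_of_two` are hypothetical helper names, and the BLOCK_SIZE_Z value of 256 used in the usage note is an assumed example, since this page does not state the value set in pmcl3d_cons.h.

```python
def is_power_of_two(n):
    """True if n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_grid(nx, ny, nz, px, py, pz, block_y, block_z):
    """Return (errors, warnings) for the required and suggested constraints."""
    errors, warnings = [], []

    # Required: grid dimensions are even and greater than 2
    for name, n in (("NX", nx), ("NY", ny), ("NZ", nz)):
        if n <= 2 or n % 2:
            errors.append(f"{name} must be an even integer > 2")

    # Required: processor layout is valid and divides the grid
    if px < 1 or py < 1 or pz != 1:
        errors.append("need PX, PY >= 1 and PZ = 1")
    if px >= 1 and nx % px:
        errors.append("PX must divide NX")
    if py >= 1 and ny % py:
        errors.append("PY must divide NY")

    # Required: CUDA block shape
    threads = block_y * block_z
    if block_y != 2:
        errors.append("BLOCK_SIZE_Y must be 2")
    if block_z < 1 or nz % block_z:
        errors.append("BLOCK_SIZE_Z must divide NZ")
    if threads > 1024 or not is_power_of_two(threads):
        errors.append("BLOCK_SIZE_Y * BLOCK_SIZE_Z must be a power of 2 and <= 1024")

    # Suggested: balanced horizontal extent, deep power-of-two NZ, 512 threads
    if not (ny <= 2 * nx and nx <= 2 * ny):
        warnings.append("NX and NY should satisfy 1/2*NY <= NX <= 2*NY")
    if nz < 256 or not is_power_of_two(nz):
        warnings.append("NZ should be >= 256 and a power of 2")
    if threads != 512:
        warnings.append("BLOCK_SIZE_Y * BLOCK_SIZE_Z = 512 is suggested")

    return errors, warnings
```

For the run shown in the PBS script (NX = NY = 224, NZ = 1024, PX = PY = 2), `check_grid(224, 224, 1024, 2, 2, 1, 2, 256)` returns no errors and no warnings under the assumed BLOCK_SIZE_Z.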
The critical parameters for Q(f) are:
--FAC 1.0 --Q0 150.0 --EX 0.8 --FP 0.5 \
I believe FAC and Q0 should not be changed.
EX is the exponent in the power law: Q(f) = Q0 * f^EX. Tom and his student have found EX ~ 0.6-0.8 for Southern California.
Define Q0 in mesh.c (tmpsq, tmppq), usually as a ratio of Vs.
FP is a reference frequency; we usually use FP = 0.5-1.0 for the LA area.
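To get a feel for how EX changes attenuation at higher frequencies, here is a minimal Python sketch of the power law as described on this page. It is an illustration only, not the code in pmcl3d. The function name `q_of_f` and the transition frequency `f_c` are assumptions: `f_c` defaults to 1 Hz because this page states that Q is constant below 1 Hz, in which case the expression above 1 Hz reduces to Q0 * f^EX. How the code itself uses FAC and FP is not modeled here.

```python
def q_of_f(f, q0=150.0, ex=0.8, f_c=1.0):
    """Power-law Q: constant q0 at or below f_c (Hz), q0 * (f / f_c)**ex above.

    f_c is an assumed transition frequency (1 Hz, from the statement that
    Q is constant below 1 Hz); it is not a pmcl3d command-line parameter.
    """
    if f <= 0:
        raise ValueError("frequency must be positive")
    if f <= f_c:
        return q0
    return q0 * (f / f_c) ** ex


# Compare the two ends of the EX range quoted for Southern California
for ex in (0.6, 0.8):
    print(ex, [round(q_of_f(f, ex=ex), 1) for f in (0.5, 1.0, 2.0, 4.0)])
```

With Q0 = 150, Q stays at 150 up to 1 Hz and then grows with frequency, faster for the larger exponent.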