Accessing CyberShake Seismograms
CyberShake seismograms are available on SCEC servers. There are three steps involved in accessing the seismograms:
- Determine which seismograms are of interest
- Find the seismogram file
- Read the seismogram data
Contents
Identifying Seismograms of Interest
The first step is to figure out which seismograms are of interest. To do this, you'll need to know a) the site and run ID of the CyberShake run, and b) the source id, rupture id, and rupture variation id of the seismogram. You may have done this already. If not, you can determine the run ID by using MySQL to query the CyberShake database on focal.usc.edu. To determine a run ID, you will need to know values for a few parameters, such as the ERF, the velocity model, and the rupture variation generator version. Here's a sample query:
select R.Run_ID from CyberShake_Runs R, CyberShake_Sites S where S.CS_Short_Name="LADT" //Replace this with the site you want and R.Site_ID=S.CS_Site_ID and R.ERF_ID=35 //35 corresponds to UCERF 2. You'll always want to use this. and R.SGT_Variation_ID=5 //This is the code used to generate the SGTs. 5 = Rob Graves' code. 6 = AWP-ODC. 7 = a newer, revised version of the Graves code. and R.Rup_Var_Scenario_ID=4 //This is the code used to generated the rupture variations. 3 = Graves & Pitarka 2007. 4 = Graves & Pitarka 2010. You'll probably want to use 4. and R.Velocity_Model_ID=1 //This is the velocity model. 1 = CVM-S4. 2 = CVM-H v11.2. 4 = CVM-H v11.9. and (R.Max_Frequency=0.5 or R.Max_Frequency is null) //This selects only deterministic (0.5 Hz) runs. If you wanted only runs that include the stochastic high-frequency components, you would change this to "R.Max_Frequency=10.0" and R.Status="Verified"; //This means the run completed successfully
From this you should be able to get a run ID. You could use this run ID to access all the seismograms (usually about 500,000) for the run, or you could continue to use the database to select seismograms for only certain events.
From an SRF file, you already have the source ID and run ID. The difference between rupture variations is the hypocenter locations and the slip distributions. The hypocenter locations are in the database, so you can look at them with a query like:
select Rup_Var_ID, Hypocenter_Lat, Hypocenter_Lon, Hypocenter_Depth from Rupture_Variations where ERF_ID=35 and Source_ID=0 and Rupture_ID=0;
To understand the different slips, the easiest thing to do is to plot them.
Finding the seismogram
Using the Seismogram Retrieval Script
Once you know the run ID and the event of interest, you can use a Python script, /home/scec-02/cybershk/utils/retrieve_seismogram.py (and in SVN at https://source.usc.edu/svn/cybershake/import/trunk/Utils/retrieve_seismogram.py), to retrieve the seismogram for you. It takes as arguments the run ID, source ID, rupture ID, rupture variation ID, output file, and if you want the seismogram header information included or not. This script automatically searches the directories below, retrieves the specific two-component seismogram of interest, and writes an output file. It is highly recommended to use this script rather than looking for the file by hand.
Locating Seismogram Files by Hand
If you prefer, you can try to find the files on disk yourself. There are a couple of places to check.
There is a directory on the SCEC servers which corresponds to the specific run ID. It will be in one of:
/home/scec-04/tera3d/CyberShake/data/PPFiles/<site>/<runID> /home/scec-04/tera3d/CyberShake/data/from_scec-02/PPFiles/<site>/<runID> /home/scec-02/tera3d/CyberShake2007/data/PPFiles/<site>/<runID>
Inside that directory you will see 1 of 2 possibilities -- we changed our file format about a year ago, so we have data in both formats.
a) In the older format, the directory will contain a number (10-80) of Seismogram*.zip files. All broadband runs fall into this category. One of these zip files has the seismogram you want, but there's no straightforward way to tell. I recommend running the following from the command line (in bash):
for i in `ls *_seismograms.zip`; do echo $i; unzip -l $i | grep Seismogram_<site>_<sourceID>_<ruptureID>_<ruptureVariationID>.grm; done
When you run it, you'll get output like:
CyberShake_LADT_749_3_seismograms.zip CyberShake_LADT_749_40_seismograms.zip CyberShake_LADT_749_41_seismograms.zip CyberShake_LADT_749_42_seismograms.zip 24000 02-11-2011 19:34 Seismogram_LADT_128_1100_0.grm CyberShake_LADT_749_43_seismograms.zip CyberShake_LADT_749_44_seismograms.zip CyberShake_LADT_749_45_seismograms.zip
So that would tell you that file Seismogram_LADT_128_1100_0.grm is in CyberShake_LADT_749_42_seismograms.zip.
Once you've IDed the zip file, you can extract the seismogram you want by:
unzip <zip file> <seismogram file> .
Note that if you are retrieving seismograms from a Broadband run, the filenames are slightly different. For low-frequency seismograms, they will be Seismogram_<site>_<sourceID>_<ruptureID>_<ruptureVariationID>_lf.grm , and the broadband seismogram filenames are Seismogram_<site>_<sourceID>_<ruptureID>_<ruptureVariationID>.grm .
b) If the seismograms are in the newer format, then there's a different approach. You'll know because the directory will contain several thousand Seismogram_*.grm files. The one you want is Seismogram_<site>_<sourceID>_<ruptureID>.grm, with whatever source ID and rupture ID you are interested in seismograms for.
All the rupture variations for a single rupture are in this file. The format is (binary):
<Rupture Variation 1 header> <Rupture Variation 1, X component - NT 4-byte floats> <Rupture Variation 1, Y component - NT 4-byte floats> <Rupture Variation 2 header> <Rupture Variation 2, X component - NT 4-byte floats> <Rupture Variation 2, Y component - NT 4-byte floats> ... <Rupture Variation n header> <Rupture Variation n, X component - NT 4-byte floats> <Rupture Variation n, Y component - NT 4-byte floats>
Where this gets tricky is that the rupture variations are not necessarily in order in the file. So you will have to read the header data to find the rupture variations you are interested in. A description of the header structure is given in CyberShake_output_data_headers#Seismogram_header. C and Python examples to read the header information are provided there.
Retrieving All Seismograms for a Run
If you wish to access all the seismograms for a run, you will need to determine the directory which contains the seismograms. In general, results for run IDs greater than or equal to 1328 will be in /home/scec-04/tera3d/CyberShake/data/PPFiles/<site name>/<run id> . However, you can definitively determine the directory by using the seismogram retrieval script on a single seismogram for the run. As part of the output, the script will print the path to that seismogram. All other seismograms for the same run id will be in the same directory. For example:
$>/home/scec-02/cybershk/utils/retrieve_seismogram.py -r 4749 -s 7 -p 1 -v 0 -o tmp -i Opening /home/scec-04/tera3d/CyberShake/data/PPFiles/USC/4749/Seismogram_USC_7_1.grm. Seismogram found.
Therefore, all the seismograms for run ID 4749 are in /home/scec-04/tera3d/CyberShake/data/PPFiles/USC/4749 .
Reading Seismogram Files
Once you have your seismogram file, the last step is to read the data.
The seismograms are velocity seismograms, with units of cm/s. The number of timesteps and dt depends on what frequency the run was performed at, as well as whether the run included stochastic high-frequency results.
We changed the format of the seismogram files after Study 2.2. So for any Run ID > 1309, the seismograms are in the 'new' format, described below.
New (Run ID > 1309) Seismogram Format
If you have used the seismogram extraction script, you will have your seismogram of interest. By parsing the header (specified in CyberShake_output_data_headers#Seismogram_header, examples provided), you can determine the timestep size and number of timesteps for this seismogram.
Following the header, all seismograms have the X component, followed by the Y component. Each component has NT 4-byte floats. So the overall format is:
<56-byte header> <NT x 4 byte-floats - X component seismogram> <NT x 4 byte-floats - Y component seismogram>
Sample C code for reading out the seismogram contents is below.
Old (Run ID < 1310) Seismogram Format
If the file is in the older format, the data is stored in binary format, without header information, as:
<NT 4-byte floats - X component seismogram> <NT 4-byte floats - Y component seismogram>
If you run "od -f <seismogram file> | more" you can see the values. Depending on the destination for the seismogram data, you may wish to write a small C or Python code to read the data out of the file so that you can process it.
Plotting Seismograms
If you have new-format seismogram files, you can use the script /home/scec-00/cybershk/scripts/plot_N_seis.py to plot the X and Y components of multiple seismograms on the same plot. The script takes the seismogram files, the legends, a plot title, and an output file as arguments. If you prefer to plot the spectral content of the seismograms, you can use the script /home/scec-00/cybershk/scripts/plot_N_seis_spectral.py .
Note that the seismograms MUST have header information in order to use these scripts.
Comparing Seismograms
If you want to compare two seismograms, you can use the script /home/scec-00/cybershk/scripts/compare_seis.py, The script compares two 2-component seismograms point-by-point to see if they exceed a specified tolerance.