Difference between revisions of "Accessing CyberShake Seismograms"

From SCECpedia
Jump to navigationJump to search
 
(36 intermediate revisions by the same user not shown)
Line 1: Line 1:
The seismograms are available on SCEC servers. There are three steps involved in accessing the seismograms:
+
This page details the process of identifying and accessing CyberShake seismograms.
  
1) Determine which seismograms are of interest
+
== Identifying Seismograms of Interest ==
2) Find the seismogram file
 
3) Read the seismogram data
 
  
1) The first step is to figure out which seismograms are of interest.  To do this, you'll need to know a) the site and run ID of the CyberShake run, and b) the source id, rupture id, and rupture variation id of the seismogram.  You may have done this already.  If not, you can determine the run ID by using MySQL to query the CyberShake database on focal.usc.edu.  To determine a run ID, you will need to know values for a few parameters, such as the ERF, the velocity model, and the rupture variation generator version.  Here's a sample query:
+
The first step is to figure out which seismograms are of interest.  You'll need to access the database to determine this; instructions are available at [[Accessing CyberShake Database Data]]. To do this, you'll need to know a) the site and run ID of the CyberShake run, and b) the source id, rupture id, and rupture variation id of the seismogram.  You may have done this already.  If not, you can determine the run ID by using MySQL to query the CyberShake database on focal.usc.edu.  To determine a run ID, you will need to know values for a few parameters, such as the ERF, the velocity model, and the rupture variation generator version.  Here's a sample query:
  
 
  select R.Run_ID from CyberShake_Runs R, CyberShake_Sites S
 
  select R.Run_ID from CyberShake_Runs R, CyberShake_Sites S
Line 17: Line 15:
 
  and R.Status="Verified";        //This means the run completed successfully
 
  and R.Status="Verified";        //This means the run completed successfully
  
From this you should be able to get a run ID.
+
From this you should be able to get a run ID.  You could use this run ID to access all the seismograms (usually about 500,000) for the run, or you could continue to use the database to select seismograms for only certain events.
  
 
From an SRF file, you already have the source ID and run ID.  The difference between rupture variations is the hypocenter locations and the slip distributions. The hypocenter locations are in the database, so you can look at them with a query like:
 
From an SRF file, you already have the source ID and run ID.  The difference between rupture variations is the hypocenter locations and the slip distributions. The hypocenter locations are in the database, so you can look at them with a query like:
Line 28: Line 26:
 
To understand the different slips, the easiest thing to do is to plot them.
 
To understand the different slips, the easiest thing to do is to plot them.
  
2) Once you have a run ID and a specific seismogram, the next step is to find the file on disk.  There are a couple of places to check.
+
== Retrieving Seismograms ==
  
There is a directory on the SCEC servers which corresponds to the specific run IDIt will be in one of:
+
We are in the process of migrating seismograms from past CyberShake studies to Globus collections, to simplify access for usersCurrently, only CyberShake Study 15.12 is available through Globus, but if you are interested in accessing seismograms from a different CyberShake study, please email software@scec.org to make a request.
  
/home/scec-04/tera3d/CyberShake/data/PPFiles/<site>/<runID>
+
=== Globus Account ===
/home/scec-04/tera3d/CyberShake/data/from_scec-02/PPFiles/<site>/<runID>
 
/home/scec-02/tera3d/CyberShake2007/data/PPFiles/<site>/<runID>
 
  
Inside that directory you will see 1 of 2 possibilities -- we changed our file format about a year ago, so we have data in both formats.
+
To access the seismograms, you'll need to acquire a Globus account.  If you don't already have one, you can create one using your email address by following the instructions [https://docs.globus.org/how-to/get-started/ here].
  
# In the older format, the directory will contain a number (10-80) of Seismogram*.zip files.  One of these zip files has the seismogram you want, but there's no straightforward way to tell.  I recommend running the following from the command line (in bash):
+
=== Connecting to the collection ===
  
for i in `ls *_seismograms.zip`; do echo $i; unzip -l $i | grep Seismogram_<site>_<sourceID>_<ruptureID>_<ruptureVariationID>.grm; done
+
To access the seismogram data, or intensity measure data beyond what is stored in the database, you'll need to transfer data from the Study 15.12 Globus collection to your destination system, either by using the Globus web-based interface or the Globus command-line interface.
  
When you run it, you'll get output like:
+
Once you've connected to the collection, you can transfer the data anywhere with a Globus endpoint - [https://docs.globus.org/how-to/get-started/#request_a_file_transfer details here].  Many clusters, universities, and HPC centers have Globus endpoints set up already.  If you can't find an endpoint where you need one, or you'd prefer to transfer the data to your local laptop or desktop, you can create your own endpoint using [https://www.globus.org/globus-connect-personal Globus Connect Personal].
  
CyberShake_LADT_749_3_seismograms.zip
+
==== Globus web-based interface ====
CyberShake_LADT_749_40_seismograms.zip
 
CyberShake_LADT_749_41_seismograms.zip
 
CyberShake_LADT_749_42_seismograms.zip
 
24000  02-11-2011 19:34  Seismogram_LADT_128_1100_0.grm
 
CyberShake_LADT_749_43_seismograms.zip
 
CyberShake_LADT_749_44_seismograms.zip
 
CyberShake_LADT_749_45_seismograms.zip
 
  
So that would tell you that file Seismogram_LADT_128_1100_0.grm is in CyberShake_LADT_749_42_seismograms.zip.
+
The simplest way to access CyberShake data is through the Globus web-based interface.  You can access this collection by following [https://app.globus.org/file-manager?origin_id=2c328eb5-1c89-470d-ac1d-9fa693d28d86&origin_path=%2F this link].
  
Once you've IDed the zip file, you can extract the seismogram you want by:
+
==== Globus command-line interface ====
  
unzip <zip file> <seismogram file> .
+
If you have a complex request (for example, you only want data for certain events) which is difficult to handle in the web GUI, or you would like to automate your transfers, Globus provides the [https://docs.globus.org/cli/ Globus Command Line Interface(CLI)].  You can issue the same commands as in the web client, and additionally you can create a batch file which contains a list of all the files you'd like to download.  With some versions of the CLI I have noticed a limit in terms of the number of files the CLI can handle at a time, so I would recommend no more than 100,000 files per batch file.  If you use this approach, you'll need either the name of the collection, which is 'SCEC CyberShake Study 15.12', or the UUID for the collection, which is 2c328eb5-1c89-470d-ac1d-9fa693d28d86.  Note that transfers made through the CLI will still be associated with your account, and you can monitor their progress through the web GUI.
  
# If the seismograms are in the newer format, then there's a different approach.  You'll know because the directory will contain several thousand Seismogram_*.grm files.  The one you want is Seismogram_<site>_<sourceID>_<ruptureID>.grm, with whatever source ID and rupture ID you are interested in seismograms for.
+
=== Data organization ===
  
3) Once you have your seismogram file, the last step is to read the dataIt's velocity data, in cm/s, with dt = 0.1 seconds.
+
The data is organized by site name and run IDSo, for example, there's a directory for site ACTN and then run ID 4296.  Inside this directory are all the seismograms, PSA, RotD, and duration files for site ACTN, run ID 4296.  The filenames are <Seismogram | PeakVals | RotD | Duration>_<site>_<source ID>_<rupture ID>.<grm | bsa | rotd | dur> . The data for each run of Study 15.12 is approximately 90 GB.
  
If the file is in the older format (you can tell because the filename is Seismogram_<site>_<sourceID>_<ruptureID>_<ruptureVariationID>.grm), the data is stored in binary format as:
+
== Seismogram Format ==
  
<3000 4-byte floats - X component seismogram>
+
The seismogram file Seismogram_<site>_<sourceID>_<ruptureID>.grm contains the seismograms for all the rupture variations for the given source ID.
<3000 4-byte floats - Y component seismogram>
 
  
If you run "od -f <seismogram file> | more" you can see the valuesI'm not sure what your plans are ultimately for the seismogram data, but you may wish to write a small C or Python code to read the data out of the file so that you can process it.
+
The seismograms are velocity seismograms, with <b>units of cm/s</b>.  The number of timesteps and dt depends on what frequency the run was performed at, as well as whether the run included stochastic high-frequency results, and can be determined by reading the header information.
  
If the file is in the newer format (filename Seismogram_<site>_<sourceID>_<ruptureID>.grm), all the rupture variations for a single rupture are in the same file.  The format is (binary, again):
+
The format is (binary):
  
  <Rupture Variation 1 header>
+
  <Rupture Variation 1 56-byte header>
  <Rupture Variation 1, X component - 3000 4-byte floats>
+
  <Rupture Variation 1, X component - NT 4-byte floats>
  <Rupture Variation 1, Y component - 3000 4-byte floats>
+
  <Rupture Variation 1, Y component - NT 4-byte floats>
  <Rupture Variation 2 header>
+
  <Rupture Variation 2 56-byte header>
  <Rupture Variation 2, X component - 3000 4-byte floats>
+
  <Rupture Variation 2, X component - NT 4-byte floats>
  <Rupture Variation 2, Y component - 3000 4-byte floats>
+
  <Rupture Variation 2, Y component - NT 4-byte floats>
 
  ...
 
  ...
  <Rupture Variation n header>
+
  <Rupture Variation n 56-byte header>
  <Rupture Variation n, X component - 3000 4-byte floats>
+
  <Rupture Variation n, X component - NT 4-byte floats>
  <Rupture Variation n, Y component - 3000 4-byte floats>
+
  <Rupture Variation n, Y component - NT 4-byte floats>
  
Where this gets tricky is that the rupture variations are not necessarily in order in the file.  So you will have to read the header data to find the rupture variations you are interested in.
+
However, the rupture variations are not necessarily in order in the file.  So if you are only interested in a subset of the rupture variations, you will have to read the header data to find the rupture variations you are interested in.  A description of the header structure is given in [[CyberShake_output_data_headers#Seismogram_header]].  C and Python examples to read the header information are provided there.
  
The format of the headers is outlined [http://scec.usc.edu/it/Seismogram_and_PSA_headers_for_I/O_forwarding here] (access with your SCEC computing login). I recommend writing a simple C code to read and parse the headers, and I can point you to some examples if that would be helpfulOnce you've found the rupture variation you're interested in, you can then read in the data and do some processing.
+
Sample Python and C code for reading out the seismogram contents for a seismogram with just 1 rupture variation is below.  This code prints the 1000th timesteps for X and Y components, but obviously you could do whatever is needed with the arrays.
 +
 
 +
=== Python code ===
 +
 
 +
<pre>
 +
!/usr/bin/env python3
 +
 
 +
import sys
 +
import struct
 +
 
 +
class Seismogram:
 +
 
 +
    def __init__(self, filename):
 +
        self.filename = filename
 +
        self.nt = 0
 +
        self.dt = 0.0
 +
 
 +
    def parseHeader(self, header_str):
 +
        #Find null terminator
 +
        split_point = header_str[0:8].find(b'\x00')
 +
        self.version = header_str[0:split_point].decode()
 +
        if self.version!="12.10":
 +
          print("Error: version does not match expected string '12.10', aborting.")
 +
          sys.exit(2)
 +
        #Find null terminator
 +
        split_point = header_str[8:16].find(b'\x00')
 +
        self.site = header_str[8:8+split_point].decode()
 +
        self.source_id = struct.unpack('i', header_str[24:28])[0]
 +
        self.rupture_id = struct.unpack('i', header_str[28:32])[0]
 +
        self.rup_var_id = struct.unpack('i', header_str[32:36])[0]
 +
        self.dt = struct.unpack('f', header_str[36:40])[0]
 +
        self.nt = struct.unpack('i', header_str[40:44])[0]
 +
        self.comps = struct.unpack('i', header_str[44:48])[0]
 +
        self.x_flag = self.y_flag = self.z_flag = False
 +
        if (self.comps & 1)==1:
 +
            self.x_flag = True
 +
        if (self.comps & 2)==2:
 +
            self.y_flag = True
 +
        if (self.comps & 4)==4:
 +
            self.z_flag = True
 +
        self.det_max_freq = struct.unpack('f', header_str[48:52])[0]
 +
        self.stoch_max_freq = struct.unpack('f', header_str[52:56])[0]
 +
 
 +
    def readData(self):
 +
        fp_in = open(self.filename, "rb")
 +
        header_str = fp_in.read(56)
 +
        self.parseHeader(header_str)
 +
        data_str = fp_in.read(4*self.nt)
 +
        self.x_data = struct.unpack("%df" % self.nt, data_str)
 +
        data_str = fp_in.read(4*self.nt)
 +
        self.y_data = struct.unpack("%df" % self.nt, data_str)
 +
        fp_in.close()
 +
 
 +
if len(sys.argv)<2:
 +
    print("Usage: %s <seismogram file>" % sys.argv[0])
 +
    sys.exit(1)
 +
 
 +
seis_file = sys.argv[1]
 +
s = Seismogram(seis_file)
 +
s.readData()
 +
print("seis_data[X][1000] = %f" % s.x_data[1000])
 +
print("seis_data[Y][1000] = %f" % s.y_data[1000])
 +
</pre>
 +
 
 +
=== C code ===
 +
 
 +
<pre>
 +
#include <stdio.h>
 +
#include <stdlib.h>
 +
#define X 0
 +
#define Y 1
 +
 
 +
struct seisheader {
 +
char version[8];
 +
char site_name[8];
 +
char padding[8];
 +
int source_id;
 +
int rupture_id;
 +
int rup_var_id;
 +
float dt;
 +
int nt;
 +
int comps;
 +
float det_max_freq;
 +
float stoch_max_freq;
 +
};
 +
 
 +
int main(int argc, char** argv) {
 +
        if (argc<2) {
 +
                printf("Usage: %s <input seismogram>\n", argv[0]);
 +
        }
 +
        char* input_file = argv[1];
 +
        FILE* fp_in = fopen(input_file, "rb");
 +
        struct seisheader shead;
 +
        fread(&shead, sizeof(struct seisheader), 1, fp_in);
 +
        //Array with X and Y seismograms
 +
        //seis_data[0] has X, seis_data[1] has Y
 +
        float* seis_data[2];
 +
        int i;
 +
        for (i=0; i<2; i++) {
 +
                seis_data[i] = malloc(sizeof(float)*shead.nt);
 +
                fread(seis_data[i], sizeof(float), shead.nt, fp_in);
 +
        }
 +
        fclose(fp_in);
 +
        printf("seis_data[X][1000] = %f\n", seis_data[X][1000]);
 +
        printf("seis_data[Y][1000] = %f\n", seis_data[Y][1000]);
 +
        for (i=0; i<2; i++) {
 +
                free(seis_data[i]);
 +
        }
 +
}
 +
</pre>
 +
 
 +
<!-- === Plotting Seismograms ===
 +
 
 +
If you have new-format seismogram files, you can use the script /home/scec-00/cybershk/scripts/plot_N_seis.py to plot the X and Y components of multiple seismograms on the same plot.  The script takes the seismogram files, the legends, a plot title, and an output file as argumentsIf you prefer to plot the spectral content of the seismograms, you can use the script /home/scec-00/cybershk/scripts/plot_N_seis_spectral.py .
 +
 
 +
Note that the seismograms MUST have header information in order to use these scripts.
 +
 
 +
=== Comparing Seismograms ===
 +
 
 +
If you want to compare two seismograms, you can use the script /home/scec-00/cybershk/scripts/compare_seis.py, The script compares two 2-component seismograms point-by-point to see if they exceed a specified tolerance.//-->

Latest revision as of 18:48, 5 May 2023

This page details the process of identifying and accessing CyberShake seismograms.

Identifying Seismograms of Interest

The first step is to figure out which seismograms are of interest. You'll need to access the database to determine this; instructions are available at Accessing CyberShake Database Data. To do this, you'll need to know a) the site and run ID of the CyberShake run, and b) the source id, rupture id, and rupture variation id of the seismogram. You may have done this already. If not, you can determine the run ID by using MySQL to query the CyberShake database on focal.usc.edu. To determine a run ID, you will need to know values for a few parameters, such as the ERF, the velocity model, and the rupture variation generator version. Here's a sample query:

select R.Run_ID from CyberShake_Runs R, CyberShake_Sites S
where S.CS_Short_Name="LADT"    //Replace this with the site you want
and R.Site_ID=S.CS_Site_ID
and R.ERF_ID=35                 //35 corresponds to UCERF 2.  You'll always want to use this.
and R.SGT_Variation_ID=5        //This is the code used to generate the SGTs.  5 = Rob Graves' code.  6 = AWP-ODC.  7 = a newer, revised version of the Graves code.
and R.Rup_Var_Scenario_ID=4     //This is the code used to generated the rupture variations.  3 = Graves & Pitarka 2007.  4 = Graves & Pitarka 2010.  You'll probably want to  use 4.
and R.Velocity_Model_ID=1       //This is the velocity model.  1 = CVM-S4.  2 = CVM-H v11.2.  4 = CVM-H v11.9.
and (R.Max_Frequency=0.5 or R.Max_Frequency is null)    //This selects only deterministic (0.5 Hz) runs.  If you wanted only runs that include the stochastic high-frequency components, you would change this to "R.Max_Frequency=10.0"
and R.Status="Verified";        //This means the run completed successfully

From this you should be able to get a run ID. You could use this run ID to access all the seismograms (usually about 500,000) for the run, or you could continue to use the database to select seismograms for only certain events.

From an SRF file, you already have the source ID and run ID. The difference between rupture variations is the hypocenter locations and the slip distributions. The hypocenter locations are in the database, so you can look at them with a query like:

select Rup_Var_ID, Hypocenter_Lat, Hypocenter_Lon, Hypocenter_Depth from Rupture_Variations
where ERF_ID=35
and Source_ID=0
and Rupture_ID=0;

To understand the different slips, the easiest thing to do is to plot them.

Retrieving Seismograms

We are in the process of migrating seismograms from past CyberShake studies to Globus collections, to simplify access for users. Currently, only CyberShake Study 15.12 is available through Globus, but if you are interested in accessing seismograms from a different CyberShake study, please email software@scec.org to make a request.

Globus Account

To access the seismograms, you'll need to acquire a Globus account. If you don't already have one, you can create one using your email address by following the instructions here.

Connecting to the collection

To access the seismogram data, or intensity measure data beyond what is stored in the database, you'll need to transfer data from the Study 15.12 Globus collection to your destination system, either by using the Globus web-based interface or the Globus command-line interface.

Once you've connected to the collection, you can transfer the data anywhere with a Globus endpoint - details here. Many clusters, universities, and HPC centers have Globus endpoints set up already. If you can't find an endpoint where you need one, or you'd prefer to transfer the data to your local laptop or desktop, you can create your own endpoint using Globus Connect Personal.

Globus web-based interface

The simplest way to access CyberShake data is through the Globus web-based interface. You can access this collection by following this link.

Globus command-line interface

If you have a complex request (for example, you only want data for certain events) which is difficult to handle in the web GUI, or you would like to automate your transfers, Globus provides the Globus Command Line Interface(CLI). You can issue the same commands as in the web client, and additionally you can create a batch file which contains a list of all the files you'd like to download. With some versions of the CLI I have noticed a limit in terms of the number of files the CLI can handle at a time, so I would recommend no more than 100,000 files per batch file. If you use this approach, you'll need either the name of the collection, which is 'SCEC CyberShake Study 15.12', or the UUID for the collection, which is 2c328eb5-1c89-470d-ac1d-9fa693d28d86. Note that transfers made through the CLI will still be associated with your account, and you can monitor their progress through the web GUI.

Data organization

The data is organized by site name and run ID. So, for example, there's a directory for site ACTN and then run ID 4296. Inside this directory are all the seismograms, PSA, RotD, and duration files for site ACTN, run ID 4296. The filenames are <Seismogram | PeakVals | RotD | Duration>_<site>_<source ID>_<rupture ID>.<grm | bsa | rotd | dur> . The data for each run of Study 15.12 is approximately 90 GB.

Seismogram Format

The seismogram file Seismogram_<site>_<sourceID>_<ruptureID>.grm contains the seismograms for all the rupture variations for the given source ID.

The seismograms are velocity seismograms, with units of cm/s. The number of timesteps and dt depends on what frequency the run was performed at, as well as whether the run included stochastic high-frequency results, and can be determined by reading the header information.

The format is (binary):

<Rupture Variation 1 56-byte header>
<Rupture Variation 1, X component - NT 4-byte floats>
<Rupture Variation 1, Y component - NT 4-byte floats>
<Rupture Variation 2 56-byte header>
<Rupture Variation 2, X component - NT 4-byte floats>
<Rupture Variation 2, Y component - NT 4-byte floats>
...
<Rupture Variation n 56-byte header>
<Rupture Variation n, X component - NT 4-byte floats>
<Rupture Variation n, Y component - NT 4-byte floats>

However, the rupture variations are not necessarily in order in the file. So if you are only interested in a subset of the rupture variations, you will have to read the header data to find the rupture variations you are interested in. A description of the header structure is given in CyberShake_output_data_headers#Seismogram_header. C and Python examples to read the header information are provided there.

Sample Python and C code for reading out the seismogram contents for a seismogram with just 1 rupture variation is below. This code prints the 1000th timesteps for X and Y components, but obviously you could do whatever is needed with the arrays.

Python code

!/usr/bin/env python3

import sys
import struct

class Seismogram:

    def __init__(self, filename):
        self.filename = filename
        self.nt = 0
        self.dt = 0.0

    def parseHeader(self, header_str):
        #Find null terminator
        split_point = header_str[0:8].find(b'\x00')
        self.version = header_str[0:split_point].decode()
        if self.version!="12.10":
           print("Error: version does not match expected string '12.10', aborting.")
           sys.exit(2)
        #Find null terminator
        split_point = header_str[8:16].find(b'\x00')
        self.site = header_str[8:8+split_point].decode()
        self.source_id = struct.unpack('i', header_str[24:28])[0]
        self.rupture_id = struct.unpack('i', header_str[28:32])[0]
        self.rup_var_id = struct.unpack('i', header_str[32:36])[0]
        self.dt = struct.unpack('f', header_str[36:40])[0]
        self.nt = struct.unpack('i', header_str[40:44])[0]
        self.comps = struct.unpack('i', header_str[44:48])[0]
        self.x_flag = self.y_flag = self.z_flag = False
        if (self.comps & 1)==1:
            self.x_flag = True
        if (self.comps & 2)==2:
            self.y_flag = True
        if (self.comps & 4)==4:
            self.z_flag = True
        self.det_max_freq = struct.unpack('f', header_str[48:52])[0]
        self.stoch_max_freq = struct.unpack('f', header_str[52:56])[0]

    def readData(self):
        fp_in = open(self.filename, "rb")
        header_str = fp_in.read(56)
        self.parseHeader(header_str)
        data_str = fp_in.read(4*self.nt)
        self.x_data = struct.unpack("%df" % self.nt, data_str)
        data_str = fp_in.read(4*self.nt)
        self.y_data = struct.unpack("%df" % self.nt, data_str)
        fp_in.close()

if len(sys.argv)<2:
    print("Usage: %s <seismogram file>" % sys.argv[0])
    sys.exit(1)

seis_file = sys.argv[1]
s = Seismogram(seis_file)
s.readData()
print("seis_data[X][1000] = %f" % s.x_data[1000])
print("seis_data[Y][1000] = %f" % s.y_data[1000])

C code

#include <stdio.h>
#include <stdlib.h>
#define X 0
#define Y 1

struct seisheader {
 char version[8];
 char site_name[8];
 char padding[8];
 int source_id;
 int rupture_id;
 int rup_var_id;
 float dt;
 int nt;
 int comps;
 float det_max_freq;
 float stoch_max_freq;
};

int main(int argc, char** argv) {
        if (argc<2) {
                printf("Usage: %s <input seismogram>\n", argv[0]);
        }
        char* input_file = argv[1];
        FILE* fp_in = fopen(input_file, "rb");
        struct seisheader shead;
        fread(&shead, sizeof(struct seisheader), 1, fp_in);
        //Array with X and Y seismograms
        //seis_data[0] has X, seis_data[1] has Y
        float* seis_data[2];
        int i;
        for (i=0; i<2; i++) {
                seis_data[i] = malloc(sizeof(float)*shead.nt);
                fread(seis_data[i], sizeof(float), shead.nt, fp_in);
        }
        fclose(fp_in);
        printf("seis_data[X][1000] = %f\n", seis_data[X][1000]);
        printf("seis_data[Y][1000] = %f\n", seis_data[Y][1000]);
        for (i=0; i<2; i++) {
                free(seis_data[i]);
        }
}