CyberShake Data
Latest revision as of 21:06, 3 June 2022
This page provides an overview of CyberShake Data, and how to access it.
CyberShake data can be broken down into the following elements based on when it is used in simulations:
- Input data needed for CyberShake runs, such as which ruptures go with which site. This information is stored in the CyberShake database.
- Temporary data generated during CyberShake production runs. This data remains on the cluster and is purged.
- Output data products generated by CyberShake runs. This data is transferred from the cluster to SCEC disks, and some of it is inserted into the CyberShake database for quick access.
We will focus on (1) and (3).
CyberShake database overview
CyberShake data is served through two on-disk relational database servers running MySQL/MariaDB, plus an SQLite file for each past study.
MySQL/MariaDB Databases
The two databases used to store CyberShake data are focal.usc.edu ('focal') and moment.usc.edu ('moment').
Examples of accessing data stored in these databases can be found at Accessing_CyberShake_Database_Data.
Moment DB
Moment is the production database server. Currently, it maintains all the necessary inputs, metadata on all CyberShake runs, and results for Study 15.12 and Study 17.3.
Read-only access to moment is:
host: moment.usc.edu
user: cybershk_ro
password: CyberShake2007
database: CyberShake
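For scripted access, these credentials work with any MySQL client. A minimal Python sketch, assuming the third-party pymysql package is installed (the connection itself is only opened when the function is called):

```python
# Read-only connection parameters for moment, as listed above.
MOMENT = {
    "host": "moment.usc.edu",
    "user": "cybershk_ro",
    "password": "CyberShake2007",
    "database": "CyberShake",
}

def connect_moment():
    """Open a read-only connection to moment (requires the pymysql package)."""
    import pymysql  # deferred import so the parameters above are usable standalone
    return pymysql.connect(**MOMENT)
```

A connection opened this way can then run SELECT queries against the tables described on this page.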
Focal DB
Focal is the database server for external user queries. We plan to remove all but the most recent few studies from focal, but this is still in progress, so for now focal has all inputs, metadata, and results up through Study 15.12.
Read-only access to focal is:
host: focal.usc.edu
user: cybershk_ro
password: CyberShake2007
database: CyberShake
SQLite files
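Each per-study SQLite file can be opened with Python's built-in sqlite3 module. A minimal sketch (the study filename in the comment is hypothetical):

```python
import sqlite3

def open_study_db(path):
    """Open a per-study SQLite file and return a connection."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # access columns by name
    return conn

# Hypothetical usage against a study's SQLite file:
# conn = open_study_db("study_15_12.sqlite")
# tables = [r["name"] for r in conn.execute(
#     "SELECT name FROM sqlite_master WHERE type='table'")]
```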
CyberShake input data
At the beginning of a CyberShake run, the database is queried to determine site information (name, latitude, longitude). This can be found in the CyberShake_Sites table.
The database is also used to determine which ruptures fall within the 200 km cutoff. This information is used to construct the necessary volume and to select the correct rupture files for processing. It can be found in the CyberShake_Site_Ruptures table, which contains, for each site, the list of ruptures that fall within a given cutoff.
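A sketch of the kind of query involved, run here against an in-memory SQLite stand-in. The column names are assumptions for illustration only; check the actual schema before relying on them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the real tables; column names are assumed.
conn.executescript("""
CREATE TABLE CyberShake_Sites (CS_Site_ID INTEGER, CS_Short_Name TEXT,
                               CS_Site_Lat REAL, CS_Site_Lon REAL);
CREATE TABLE CyberShake_Site_Ruptures (CS_Site_ID INTEGER, Source_ID INTEGER,
                                       Rupture_ID INTEGER, Cutoff_Dist REAL);
""")
conn.execute("INSERT INTO CyberShake_Sites VALUES (1, 'USC', 34.0192, -118.286)")
conn.execute("INSERT INTO CyberShake_Site_Ruptures VALUES (1, 10, 0, 200.0)")

# Ruptures within the 200 km cutoff for a given site:
rows = conn.execute("""
    SELECT r.Source_ID, r.Rupture_ID
    FROM CyberShake_Site_Ruptures r
    JOIN CyberShake_Sites s ON s.CS_Site_ID = r.CS_Site_ID
    WHERE s.CS_Short_Name = 'USC' AND r.Cutoff_Dist <= 200.0
""").fetchall()
```

The same SELECT shape applies when querying the MySQL servers directly.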
Both of these tables are populated by Kevin when we select new sites for CyberShake processing.
CyberShake output data
CyberShake runs produce the following output data, divided into data staged back from the cluster, and local data products:
Data staged from cluster:
- Seismograms
- Peak spectral acceleration, X and Y component and geometric mean
- RotD results (for some studies)
- Duration results (for some studies)
Local data products:
- Hazard curves
- Disaggregation results
- Hazard maps
CyberShake data staged from cluster
The data products below are all generated on the remote system, then staged back to SCEC storage as part of the workflow. Some of these data products are inserted into the database.
Seismograms
Seismogram access is detailed at Accessing CyberShake Seismograms.
Acceleration Data
In CyberShake, we have two kinds of acceleration intensity measure data:
- X and Y component and geometric mean data
- RotD50 and RotD100 data (since Study 15.4).
How you access this data depends on which periods you want: some of it is in the database, and the rest is in files. Details are at Accessing CyberShake Peak Acceleration Data.
Duration
Duration metric data was populated in the database for Study 15.12 (a study with stochastic components) but not for Study 17.3. As with acceleration data, access depends on whether the data you want is in the database. Details are available in Accessing CyberShake Duration Data.
CyberShake data products generated locally
These data products are generated locally, on shock.usc.edu, in the final stages of the workflow.
Hazard Curves
Hazard curves are produced by combining the intensity measure data in the database, at a certain period, with the probability of each event. The code for performing this is part of OpenSHA.
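The combination step can be sketched as follows: for each IM level x, the exceedance probability is one minus the product over events of (1 − P(event)), taken over the events whose IM value exceeds x. A simplified pure-Python sketch (one IM value per event; the real OpenSHA calculation is more complete):

```python
def hazard_curve(events, im_levels):
    """events: list of (event_probability, im_value) pairs.
    Returns the exceedance probability at each IM level."""
    curve = []
    for x in im_levels:
        p_no_exceed = 1.0
        for p_event, im in events:
            if im > x:  # this event would exceed level x
                p_no_exceed *= (1.0 - p_event)
        curve.append(1.0 - p_no_exceed)
    return curve
```

For example, two events with probabilities 0.01 and 0.02 and IM values 0.5 and 0.3 give an exceedance probability of 1 − (0.99 × 0.98) ≈ 0.0298 at level 0.1, and 0.01 at level 0.4.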
Hazard curves are located in the directory /home/scec-00/cybershk/opensha/curves/<site short name>. The convention for the hazard curve name for a particular run, component, and period is:
<site short name>_ERF<erf ID>_Run<Run ID>_SA_<period>sec_<component>_<yyyy>_<mm>_<dd>.<png or pdf>
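The convention can be expressed as a small helper. This is a sketch; the component name 'GEOM' and the ID values in the example are placeholders:

```python
def curve_filename(site, erf_id, run_id, period, component,
                   year, month, day, ext="png"):
    """Build a hazard curve filename following the convention above."""
    return (f"{site}_ERF{erf_id}_Run{run_id}_SA_{period}sec_{component}"
            f"_{year:04d}_{month:02d}_{day:02d}.{ext}")

# curve_filename("USC", 36, 1234, 3, "GEOM", 2022, 6, 3)
# -> "USC_ERF36_Run1234_SA_3sec_GEOM_2022_06_03.png"
```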
Note that the year, month, and day are when the run was completed, not when the hazard curve is produced.
In general, hazard curves are automatically generated for the same periods which are inserted into the database.
Disaggregations
Disaggregations calculate how much each CyberShake source (by 'source' we mean UCERF source) contributes to the overall hazard at a certain point on the hazard curve.
Disaggregations are automatically performed at an exceedance probability of 4e-4 (2% in 50 years). These disaggregation files are available in /home/scec-00/cybershk/opensha/disagg. The convention for the disaggregation file name is:
<site short name>_ERF<erf ID>_Run<Run ID>_DisaggPOE_<probability level>_SA_<period>sec_<yyyy>_<mm>_<dd>.<pdf, png, or txt>
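Going the other way, the pieces of a disaggregation filename can be recovered with a regular expression. This is a sketch based only on the convention above; the probability-level format in the example ("4.0E-4") is an assumption:

```python
import re

# Pattern mirroring the disaggregation filename convention above.
DISAGG_RE = re.compile(
    r"(?P<site>[^_]+)_ERF(?P<erf>\d+)_Run(?P<run>\d+)"
    r"_DisaggPOE_(?P<poe>[^_]+)_SA_(?P<period>[^_]+)sec"
    r"_(?P<yyyy>\d{4})_(?P<mm>\d{2})_(?P<dd>\d{2})\.(?P<ext>pdf|png|txt)")

def parse_disagg_name(name):
    """Return the filename's fields as a dict, or None if it doesn't match."""
    m = DISAGG_RE.fullmatch(name)
    return m.groupdict() if m else None
```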
Note that the year, month, and day are when the run was completed, not when the disaggregation is produced.
The PDF and PNG files are images, showing a breakdown of what magnitude events at what distance contributed to the hazard. The PDF and text files also have a numerical breakdown, by source, of the percent contribution.
Hazard Maps
Hazard maps calculate the hazard for a region by sampling many hazard curves at a certain probability or IM level, calculating the difference between each curve and a GMPE basemap, and interpolating these differences on top of the basemap.
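The interpolation of site-level differences can be sketched with simple inverse-distance weighting. This pure-Python version only illustrates the difference-interpolation idea, not the actual mapping pipeline:

```python
import math

def idw_interpolate(sites, query, power=2.0):
    """sites: list of ((lat, lon), diff) pairs of curve-minus-basemap differences.
    Returns the inverse-distance-weighted difference at the query point."""
    num = den = 0.0
    for (lat, lon), diff in sites:
        d = math.hypot(lat - query[0], lon - query[1])
        if d == 0.0:
            return diff  # query coincides with a site
        w = 1.0 / d ** power
        num += w * diff
        den += w
    return num / den
```

The interpolated difference at each map point is then added back onto the GMPE basemap value to produce the final map.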
Hazard maps are generated at the conclusion of each study by Kevin. Maps are posted on the wiki page for each study, under 'Data Products'.