Difference between revisions of "CSEP2 Storing Stochastic Event Sets"

From SCECpedia
Latest revision as of 22:27, 24 April 2019

ZMAP Format

This page reflects the work on dealing with stochastic event sets within CSEP2.

In order to maintain inter-operability between models and the international testing centers we must adopt a standard format for stochastic event sets that contains the necessary and sufficient information needed to perform CSEP evaluations of the forecasts.

The most straightforward approach would be to continue with a catalog format supported by CSEP1. The likely candidate for the CSEP2 synthetic catalog format is the so-called ZMAP format, which consists of the following fields:

  1. Longitude [deg]
  2. Latitude [deg]
  3. Year
  4. Month
  5. Day
  6. Magnitude
  7. Depth [km]
  8. Hour
  9. Minute
  10. Second
  11. Catalog_ID

The fields above can be represented on the computer using a number of different file formats. However, we will focus on a binary representation that aims to reduce the total file size.

Internally, a stochastic event set will be represented as a collection of pandas DataFrames, one per synthetic catalog. Each column in a data frame will represent one of the ZMAP fields listed above.

If the machines have sufficient memory, the data frames could be merged into a single large data structure that will support SQL-like queries. Note: Pandas data frames can also interface directly with an SQL database to allow for the possibility of storing simulation results in databases in the future.
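As a minimal sketch of the in-memory representation described above, the following builds two small catalogs as pandas DataFrames, merges them, and runs an SQL-like query. The column names and all event values are made up for illustration; the page does not prescribe them.

```python
import pandas as pd

# Hypothetical column names, one per ZMAP field listed above.
ZMAP_COLUMNS = [
    "longitude", "latitude", "year", "month", "day",
    "magnitude", "depth", "hour", "minute", "second", "catalog_id",
]

# Two tiny synthetic catalogs with invented values (not real events).
catalog_0 = pd.DataFrame(
    [
        [-117.5, 34.1, 2019, 4, 24, 5.2, 8.0, 22, 27, 0.0, 0],
        [-116.9, 33.8, 2019, 4, 25, 3.1, 5.5, 1, 12, 30.5, 0],
    ],
    columns=ZMAP_COLUMNS,
)
catalog_1 = pd.DataFrame(
    [[-118.2, 34.5, 2019, 4, 24, 4.7, 10.2, 23, 5, 12.0, 1]],
    columns=ZMAP_COLUMNS,
)

# Merge the per-catalog frames into one structure for the whole event set.
event_set = pd.concat([catalog_0, catalog_1], ignore_index=True)

# SQL-like query: events with magnitude >= 4.5, counted per catalog.
large = event_set.query("magnitude >= 4.5")
counts = large.groupby("catalog_id").size()
print(counts)
```

Keeping catalog_id as a column is what lets the merged frame still be split back into individual catalogs with a groupby.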

File formats for Simulations

This section contains notes on how different forecast models store their catalogs. These notes will be used to implement the file readers.

UCERF3-ETAS

Note: java.io.DataOutputStream writes multi-byte values in big-endian byte order, so file readers must decode them as big-endian.

Binary File Format (Single Catalog)

UCERF3-ETAS uses a binary format with the following configuration:

  1. Header [6 bytes]
    • File Version: [2 byte short]
    • Catalog Size: [4 byte Integer]
  2. Rupture Start [70 bytes]
    • Rupture Id: [4 byte Integer]
    • Parent Id: [4 byte Integer]
    • Generation: [2 byte Short]
    • Origin Time: [8 byte Long]
    • Latitude: [8 byte Double]
    • Longitude: [8 byte Double]
    • Depth: [8 byte Double]
    • Magnitude: [8 byte Double]
    • Distance to Parent: [8 byte Double]
    • nth ERF Index: [4 byte Integer]
    • FSS Index: [4 byte Integer]
    • Grid Node Index: [4 byte Integer]
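The layout above could be read with Python's standard struct module, as in the sketch below. It assumes that "Catalog Size" is the number of rupture records that follow the header, which is consistent with the file size formula on this page but is not stated explicitly. The names Rupture and read_catalog are hypothetical, and the origin time is left as a raw integer because its units are not given here.

```python
import io
import struct
from typing import BinaryIO, List, NamedTuple, Tuple

# ">" selects big-endian with no padding, matching java.io.DataOutputStream.
HEADER = struct.Struct(">hi")         # file version (short), catalog size (int)
RUPTURE = struct.Struct(">iihq5d3i")  # 12 fields, 70 bytes in total


class Rupture(NamedTuple):
    rupture_id: int
    parent_id: int
    generation: int
    origin_time: int  # raw 8-byte long; units not specified on this page
    latitude: float
    longitude: float
    depth: float
    magnitude: float
    distance_to_parent: float
    nth_erf_index: int
    fss_index: int
    grid_node_index: int


def read_catalog(stream: BinaryIO) -> Tuple[int, List[Rupture]]:
    """Read one binary catalog; return (file_version, ruptures)."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise EOFError("truncated catalog header")
    # Assumption: catalog size counts the rupture records that follow.
    version, n_ruptures = HEADER.unpack(raw)
    ruptures = []
    for _ in range(n_ruptures):
        raw = stream.read(RUPTURE.size)
        if len(raw) != RUPTURE.size:
            raise EOFError("truncated rupture record")
        ruptures.append(Rupture(*RUPTURE.unpack(raw)))
    return version, ruptures


# Round-trip check with one invented rupture (values are arbitrary).
sample = Rupture(1, 0, 0, 1556144820000, 34.1, -117.5, 8.0, 5.2, 0.0, -1, -1, 42)
payload = HEADER.pack(1, 1) + RUPTURE.pack(*sample)
version, ruptures = read_catalog(io.BytesIO(payload))
print(version, ruptures[0].magnitude)
```

Using precompiled struct.Struct objects keeps the record sizes (6 and 70 bytes) checkable against the layout listed above.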

Binary File Format (Stochastic Event Set)

  1. Header [4 bytes]
    • Number of Catalogs in SES: [4 byte Integer]
  2. M Binary Single Catalogs [6 + N * 70 bytes each, in the single-catalog format above]
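Because each catalog carries its own header, a reader can index a stochastic event set without decoding any rupture records. The sketch below assumes each catalog occupies 6 + N * 70 bytes, as in the file size formula on this page, and that the catalog size field holds N. The function name index_event_set and the helper fake_catalog are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

SES_HEADER = struct.Struct(">i")       # number of catalogs in the SES
CATALOG_HEADER = struct.Struct(">hi")  # file version, catalog size
RUPTURE_NBYTES = 70                    # size of one rupture record


def index_event_set(stream: BinaryIO) -> List[Tuple[int, int]]:
    """Return (byte_offset, n_events) for each catalog in the event set."""
    raw = stream.read(SES_HEADER.size)
    if len(raw) != SES_HEADER.size:
        raise EOFError("truncated event set header")
    (n_catalogs,) = SES_HEADER.unpack(raw)
    index = []
    for _ in range(n_catalogs):
        offset = stream.tell()
        raw = stream.read(CATALOG_HEADER.size)
        if len(raw) != CATALOG_HEADER.size:
            raise EOFError("truncated catalog header")
        _version, n_events = CATALOG_HEADER.unpack(raw)
        index.append((offset, n_events))
        # Skip the rupture records instead of decoding them.
        stream.seek(n_events * RUPTURE_NBYTES, io.SEEK_CUR)
    return index


def fake_catalog(n_events: int) -> bytes:
    """Build a catalog header followed by zero-filled rupture records."""
    return CATALOG_HEADER.pack(1, n_events) + bytes(n_events * RUPTURE_NBYTES)


payload = SES_HEADER.pack(2) + fake_catalog(2) + fake_catalog(1)
print(index_event_set(io.BytesIO(payload)))  # [(4, 2), (150, 1)]
```

The recorded offsets would let a reader seek straight to a chosen catalog and load only that one into memory.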

File Size Calculations

We can calculate the total file size of a single catalog using the following formula:

NBYTES_CATALOG: 6B + N * 70B

where N is the number of events in the catalog, 6B is the catalog header, and 70B is the size of one rupture record.

We can compute the size of the stochastic event set using:

NBYTES_SES: 4B + M * NBYTES_CATALOG

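where M is the number of synthetic catalogs in the stochastic event set and 4B is the event set header. This form assumes that every catalog contains the same number of events; otherwise, NBYTES_CATALOG must be summed over the individual catalogs.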
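As a worked example of the two formulas, the sketch below computes the sizes for a hypothetical event set of 10,000 catalogs with 1,000 events each. The function names and the example counts are illustrative only. The event set function sums over per-catalog event counts, which reduces to 4B + M * NBYTES_CATALOG when all catalogs have the same N.

```python
from typing import Iterable

CATALOG_HEADER_NBYTES = 6  # file version (2) + catalog size (4)
RUPTURE_NBYTES = 70        # one rupture record
SES_HEADER_NBYTES = 4      # number of catalogs in the event set


def nbytes_catalog(n_events: int) -> int:
    """Size in bytes of a single binary catalog with n_events ruptures."""
    return CATALOG_HEADER_NBYTES + n_events * RUPTURE_NBYTES


def nbytes_ses(events_per_catalog: Iterable[int]) -> int:
    """Size in bytes of an event set, given the event count of each catalog."""
    return SES_HEADER_NBYTES + sum(nbytes_catalog(n) for n in events_per_catalog)


print(nbytes_catalog(1000))         # 70006
print(nbytes_ses([1000] * 10_000))  # 700060004
```

At these illustrative counts a single event set is about 700 MB, which is the kind of estimate the formulas are meant to support.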

File Formats for Observed Catalogs

We would extend the ZMAP format to include metadata fields usable for reproducibility. These might include:

  1. event_id
  2. retrieval_time