CSEP 2 CATALOG FORMAT

From SCECpedia
Latest revision as of 23:35, 11 June 2020

Summary

The general philosophy is that the file format should be human-readable and easy to use for both researchers and developers. (The actual storage of these catalogs in CSEP testing centers might differ from this proposed format.)

The catalog will be defined as an ASCII/UTF-8 text file in CSV format. Each row corresponds to a single event. The file will have the following headers:

longitude, latitude, M, time_string format="%Y-%m-%dT%H:%M:%S.%f", depth, catalog_id, [event_id]

longitude: longitude in decimal degrees
latitude: latitude in decimal degrees
M: magnitude
time_string: event time in UTC, written as year-month-dayThour:minute:second.fraction_second; strptime format="%Y-%m-%dT%H:%M:%S.%f"; example: 1985-01-01T00:00:00.0
depth: hypocenter depth in km
catalog_id: indicates the type of catalog

  • observed: -1
  • simulated: [0, n_cat-1]

event_id: [optional] column to indicate specific flags for a given event
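
As a minimal sketch of how one row in this format could be read, the snippet below splits a line on commas and parses the time string with the strptime format given above. The `Event` class and the `parse_row` helper are hypothetical names introduced for illustration; they are not part of the format definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # strptime format quoted in the field list


@dataclass
class Event:  # hypothetical container, not defined by the format itself
    longitude: float
    latitude: float
    magnitude: float
    time: datetime
    depth_km: float
    catalog_id: int
    event_id: Optional[str]


def parse_row(line: str) -> Event:
    """Parse one comma-separated catalog row into an Event."""
    fields = [field.strip() for field in line.split(",")]
    lon, lat, mag, time_string, depth, catalog_id = fields[:6]
    # event_id is optional: an empty or missing final column becomes None
    event_id = fields[6] if len(fields) > 6 and fields[6] else None
    # strptime returns a naive datetime; the time string is defined as UTC
    time = datetime.strptime(time_string, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return Event(float(lon), float(lat), float(mag), time,
                 float(depth), int(catalog_id), event_id)


event = parse_row("-117.43017,35.616665,4.73,2019-07-06T03:22:35.630000,9.35,-1,")
print(event.time.isoformat(), event.catalog_id, event.event_id)
```

Attaching the UTC timezone at parse time is a design choice of this sketch; it keeps later conversions (for example to epoch milliseconds) independent of the machine's local timezone.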

This format can be easily extended to additional file types, including more advanced storage such as binary, HDF5, or database representations. Modelers can choose an arbitrary number of catalogs to store within a file. Events will be mapped to catalogs through the catalog_id field, which avoids the build-up of many, potentially empty, catalog files.
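
To illustrate how events in a single file map back to catalogs through catalog_id, here is a small sketch using Python's standard csv module. The rows in `SAMPLE` are made-up values for illustration only, and `group_by_catalog` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration: one observed event (catalog_id -1)
# and events from two simulated catalogs (catalog_id 0 and 1).
SAMPLE = """lon,lat,M,time_string,depth,catalog_id,event_id
-117.5,35.7,4.1,2019-07-06T03:00:00.0,5.0,-1,
-117.6,35.8,3.2,2019-07-06T04:00:00.0,6.0,0,
-117.4,35.6,3.9,2019-07-06T05:00:00.0,7.0,1,
-117.7,35.9,4.4,2019-07-06T06:00:00.0,8.0,0,
"""


def group_by_catalog(text_file):
    """Return a dict mapping each catalog_id to the list of its rows."""
    catalogs = defaultdict(list)
    for row in csv.DictReader(text_file):
        catalogs[int(row["catalog_id"])].append(row)
    return catalogs


catalogs = group_by_catalog(io.StringIO(SAMPLE))
for catalog_id in sorted(catalogs):
    print(catalog_id, len(catalogs[catalog_id]))
```

Under this scheme a simulated catalog with no events simply contributes no rows, so no empty file is created for it.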

Example Catalog

Downloadable Example

The catalog at the above link contains 829 events and requires 52 kB of storage. The table below shows the expected size of an individual catalog (middle column) and of a forecast (right column) for a given event count, assuming 100000 catalogs per forecast. Important: if the modeler does not provide event_id, each row must still end with a ',' so that the final column is present but empty, as in the sample rows at the end of this section.

event count | catalog size (MB) | forecast size (GB)
1           | 0.00006           | 0.00568
10          | 0.00058           | 0.05681
100         | 0.00582           | 0.56811
500         | 0.02909           | 2.84053
1000        | 0.05817           | 5.68107
5000        | 0.29087           | 28.40534
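
The size figures in the table can be approximated with simple arithmetic. The sketch below assumes about 61 bytes per event and 1024-based units; both are inferred from the table values and are not stated in the source. The function names are hypothetical.

```python
BYTES_PER_EVENT = 61             # assumption inferred from the table, not from the source
CATALOGS_PER_FORECAST = 100_000  # stated in the text above the table


def catalog_size_mb(event_count: int) -> float:
    """Approximate size of one catalog in megabytes (1024-based)."""
    return event_count * BYTES_PER_EVENT / 1024**2


def forecast_size_gb(event_count: int) -> float:
    """Approximate size of a full forecast in gigabytes (1024-based)."""
    return event_count * BYTES_PER_EVENT * CATALOGS_PER_FORECAST / 1024**3


for n in (1, 10, 100, 500, 1000, 5000):
    print(f"{n:>5} {catalog_size_mb(n):.5f} {forecast_size_gb(n):.5f}")
```

Sizes scale linearly with the event count, so other event counts or a different number of catalogs per forecast can be estimated the same way.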

lon,lat,M,time_string,depth,catalog_id,event_id
-117.43017,35.616665,4.73,2019-07-06T03:22:35.630000,9.35,-1,
-117.7365,35.891,4.64,2019-07-06T03:22:48.300000,9.1,-1,
-117.617836,35.803165,4.84,2019-07-06T03:23:50.720000,11.44,-1,
-117.67083,35.86067,4.61,2019-07-06T03:25:27.970000,10.32,-1,
-117.72583,35.913834,4.5,2019-07-06T03:27:07.010000,8.0,-1,
-117.431335,35.530334,4.57,2019-07-06T03:27:11.370000,3.83,-1,
-117.7115,35.902832,4.51,2019-07-06T03:29:32.080000,3.18,-1,
-117.35817,35.556667,4.49,2019-07-06T03:30:25.050000,8.71,-1,
-117.73033,35.890335,4.17,2019-07-06T03:32:46.660000,6.15,-1,
-117.505165,35.714832,4.13,2019-07-06T03:33:09.850000,1.74,-1,
-117.466835,35.65217,4.09,2019-07-06T03:35:05.420000,1.97,-1,
-117.73383,35.902832,4.35,2019-07-06T03:36:16.460000,7.27,-1,
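
Rows like the ones above, with an empty event_id column, could be produced as in the following sketch, which uses Python's standard csv module. The `write_catalog` helper and the tuple layout of `events` are assumptions made for illustration; passing an empty string as the last field is what yields the required trailing ','.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
HEADER = ["lon", "lat", "M", "time_string", "depth", "catalog_id", "event_id"]


def write_catalog(events, text_file):
    """Write (lon, lat, M, time, depth, catalog_id, event_id) tuples as CSV rows."""
    writer = csv.writer(text_file, lineterminator="\n")
    writer.writerow(HEADER)
    for lon, lat, mag, time, depth, catalog_id, event_id in events:
        # A missing event_id is written as an empty final column (trailing comma)
        writer.writerow([lon, lat, mag, time.strftime(TIME_FORMAT),
                         depth, catalog_id, event_id or ""])


# First event of the example catalog, with no event_id provided
events = [(-117.43017, 35.616665, 4.73,
           datetime(2019, 7, 6, 3, 22, 35, 630000), 9.35, -1, None)]
buffer = io.StringIO()
write_catalog(events, buffer)
lines = buffer.getvalue().splitlines()
print(lines[1])
```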