BBP Flat File Format

From SCECpedia

This page contains information regarding the BBP Flat File Format, used to collect data from BBP simulations.

The sample flat file above contains data for 50 realizations of the 1994 Northridge earthquake. It uses the GP method and includes data for all stations.

Export Script

The current script only works for simulations performed on the cluster; it requires a small change to be able to generate a flat file for a single BBP simulation. Here's the current usage:

$ export_bbp_cluster_simulation.py -i <cluster_top_level_input_directory> -o <output_directory> [-c]

The top level cluster directory is the same directory provided to the scripts that generated the cluster simulation (bbp_hpcc_validation.py). The output directory will be created if needed and will contain the output flat file (currently named bbl-summary-file.csv). The optional -c parameter causes the script to copy the time series from the various realizations into subdirectories inside the output directory. This can be used to generate a package containing the simulation parameters and the time series that can be distributed.
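
As a rough illustration of the interface described above, the sketch below parses the same three options with Python's argparse and creates the output directory if needed. It is not the code of export_bbp_cluster_simulation.py; the function names (parse_export_args, prepare_output_dir) and the attribute names (input_dir, output_dir, copy_timeseries) are assumptions made for this example.

```python
import argparse
import os


def parse_export_args(argv):
    """Parse options mirroring the usage line: -i <input> -o <output> [-c].

    Hypothetical helper; the real script may parse its arguments differently.
    """
    parser = argparse.ArgumentParser(
        description="Export a BBP cluster simulation to a flat file (sketch)")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="top-level cluster simulation directory")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory that will receive the flat file")
    parser.add_argument("-c", dest="copy_timeseries", action="store_true",
                        help="also copy the time series of each realization")
    return parser.parse_args(argv)


def prepare_output_dir(output_dir):
    """Create the output directory if it does not exist yet."""
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    # Example invocation with placeholder paths
    args = parse_export_args(["-i", "cluster_sim", "-o", "flatfile_out", "-c"])
    print(args.input_dir, args.output_dir, args.copy_timeseries)
```

Making -c a boolean flag keeps the default behavior (flat file only) lightweight, with the time series copy as an explicit opt-in.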

Assumptions

In the produced flat file, linked above, the following abbreviations were used:

  • NA - Not Available
  • TBC - To Be Calculated
  • TBD - To Be Determined

The following assumptions were made on these fields:

  • Simulation Workflow Description: BBP modules used in the workflow, separated by a '/'
  • Site Effects Model: Currently "GP2014" or "None"
  • Realization: Added a realization field to identify which realization the data comes from
  • Number of SRC files: This field indicates how many times the following 18 fields are repeated (once for each SRC file)
  • The TBD fields 1-5 were not included in the design
  • Velocity Structure Model: Region for the simulation (e.g. LABasin)
  • Greens Function File Name: Comes directly from Rob's naming of the GFs (methods not using GFs have a "NA")
  • Vs30: Need information on how to calculate it. Is this just the value from the velocity profile corresponding to 30m?
  • Recording Station Name, Station Elevation: Not available, need to be provided as indicated below
  • Number of Unique Components: 1 for ExSIM, 3 for the other methods
  • File Name: BBP uses 3-component BBP files so only 1 file needed for all components
  • LUF/HUF/LUP/HUP: Need to be checked (data coming from the station file, but not sure if correct)
  • Arias Durations: For which component? Should we have 3 sets (5-75, 5-95, total) one for each component?
  • Arias Total: How to calculate this? Is it 0-100?
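
To make the Arias questions above more concrete, here is a minimal sketch of one common way to compute these quantities for a single component. It assumes the usual definition of Arias intensity, Ia = pi/(2g) times the integral of a(t)^2 dt, and takes the 5-75 and 5-95 durations as the time between the points where the cumulative intensity reaches the lower and upper fractions of its total. The function names, the input units (m/s^2) and the trapezoidal integration are choices made for this example, not something defined by the BBP or by the flat file design. The sketch does not settle which component should be used or how "total" should be defined in the flat file.

```python
import math

G = 9.80665  # standard gravity in m/s^2


def cumulative_arias(acc, dt):
    """Running Arias intensity (m/s) for acceleration samples in m/s^2.

    Uses trapezoidal integration of a(t)^2 with a constant time step dt.
    """
    factor = math.pi / (2.0 * G)
    cumulative = [0.0]
    for a0, a1 in zip(acc, acc[1:]):
        cumulative.append(cumulative[-1] + factor * 0.5 * (a0 ** 2 + a1 ** 2) * dt)
    return cumulative


def significant_duration(acc, dt, lower=0.05, upper=0.75):
    """Time (s) between the lower and upper fractions of the total intensity."""
    cumulative = cumulative_arias(acc, dt)
    total = cumulative[-1]
    if total <= 0.0:
        return 0.0
    # First sample index at which each threshold is reached
    i_lower = next(i for i, v in enumerate(cumulative) if v >= lower * total)
    i_upper = next(i for i, v in enumerate(cumulative) if v >= upper * total)
    return (i_upper - i_lower) * dt


if __name__ == "__main__":
    # Synthetic one-component record: 1 m/s^2 held for 1 second
    acc = [1.0] * 101
    dt = 0.01
    print("Arias intensity (m/s):", cumulative_arias(acc, dt)[-1])
    print("D5-75 (s):", significant_duration(acc, dt, 0.05, 0.75))
    print("D5-95 (s):", significant_duration(acc, dt, 0.05, 0.95))
```

Under this definition, storing one set of values per component would simply mean calling these functions once for each of the three components in the BBP file.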

Additionally:

  • The missing information needed in the General category (Fault ID, Fault Name, etc.) and the Recording Station Name could be provided either as a separate file that would serve as input to the flat file generation script, or as additional keys in the SRC file. In the second case, the BBP would ignore them, and the script would check for their presence before writing "TBD" or "NA" to the file.
  • For the record identification, one proposal would be to combine event/method/sim_start_time/realization
  • Should we add a DT field for the GFs?
  • For the LUF/HUF/LUP/HUP, there's a question of observations (where the recording equipment dictates some of these values) versus simulations (where modelers tell us what their methods can do). How do we incorporate this concept? Also, this may be relevant when we consider scenario simulations.
  • Should we add a "Method" field to include the simulation method used to generate the data?
  • We should probably add a "Flat File Version" field in case we change the format later.
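
Two of the proposals above can be sketched in a few lines of Python: building a record identifier from event/method/sim_start_time/realization, and looking up optional metadata keys with a "TBD" or "NA" fallback. Everything here is hypothetical: the function names, the underscore separator, the zero-padded realization number and the key names (fault_id, fault_name) are assumptions for illustration, not part of the current script or of the SRC file format.

```python
def build_record_id(event, method, sim_start_time, realization):
    """Combine the four proposed parts into a single identifier string.

    Separator and zero padding are assumptions made for this sketch.
    """
    parts = [event, method, sim_start_time, f"{int(realization):04d}"]
    # Replace spaces so the identifier stays a single token in the CSV
    return "_".join(str(p).strip().replace(" ", "-") for p in parts)


def metadata_value(src_keys, key, default="TBD"):
    """Return an optional metadata value, or the default if it is missing.

    src_keys is a dict of key/value pairs already read from the SRC file
    (or from a separate metadata file); parsing is outside this sketch.
    """
    value = src_keys.get(key)
    if value is None or str(value).strip() == "":
        return default
    return str(value).strip()


if __name__ == "__main__":
    # Hypothetical extra keys and a placeholder simulation start time
    src_keys = {"fault_id": "123"}
    print(build_record_id("Northridge", "GP", "20150101-120000", 7))
    print(metadata_value(src_keys, "fault_id"))
    print(metadata_value(src_keys, "fault_name"))
    print(metadata_value(src_keys, "station_elevation", default="NA"))
```

Keeping the fallback in one helper means the choice between "TBD" and "NA" is made per field at the call site, which matches the abbreviations used in the flat file.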