CyberShake Study Database Archiving

SQLite Archiving

The following procedure should be used when we want to archive a study to SQLite which is either in the production or data access database to disk, to free up room in the database.

First, dump the old database contents into a new directory by running the following commands.

Note that if you don't want to lock the tables, you can replace '--lock-all-tables' with '--single-transaction=TRUE' in the following commands.

Determine the Study ID you want to archive, from looking at the Studies table in the DB:
Dump the PeakAmplitudes, using:
Dump Hazard_Datasets, using:
Dump Hazard_Curves, using:
Dump Hazard_Curve_Points, using:
Dump CyberShake_Runs, using:
You now need the rest of the input tables. They used to be pretty small, but now they're up to ~70GB with indices. As of the completion of Study 22.12, here are their approximate sizes (in the DB, not as dump files):

Table Size (GB)

AR_Hazard_Curve_Points 44.6

Points 6.3

CyberShake_Site_Ruptures 10.3

Ruptures 2.3

Rupture_Variations 2.6

AR_Hazard_Curves 1.4

Other tables 0.5

You can either:
- Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study.
- Dump only the sections of the large input tables that you need. Note that the Points table isn't used anymore, so don't bother to dump it.
  1. mysqldump --lock-all-tables -u cybershk -p --where 'AR_Hazard_Curve_ID in (select distinct C.AR_Hazard_Curve_ID from AR_Hazard_Curves C where C.AR_Hazard_Curve_Dataset_ID in (select distinct D.AR_Hazard_Dataset_ID from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat>=(select distinct D.Min_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat<=(select distinct D.Max_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon>=(select distinct D.Min_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon<=(select distinct D.Max_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) )' CyberShake AR_Hazard_Curve_Points > ar_hazard_curve_points.sql
  2. mysqldump --lock-all-tables -u cybershk -p --where 'CS_Site_ID in (select R.Site_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake CyberShake_Site_Ruptures > cs_site_ruptures.sql
  3. mysqldump --lock-all-tables -u cybershk -p --where 'ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Ruptures > ruptures.sql
  4. mysqldump --lock-all-tables -u cybershk -p --where '(Rup_Var_Scenario_ID, ERF_ID) in (select R.Rup_Var_Scenario_ID, R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Rupture_Variations > rupture_variations.sql
  5. mysqldump --lock-all-tables -u cybershk -p --where 'AR_Hazard_Dataset_ID in (select distinct D.AR_Hazard_Dataset_ID from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat>=(select distinct D.Min_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat<=(select distinct D.Max_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon>=(select distinct D.Min_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon<=(select distinct D.Max_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID)' CyberShake AR_Hazard_Curves > ar_hazard_curves.sql
  6. mysqldump --lock-all-tables -u cybershk -p CyberShake AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models

Next, convert the SQL dumps into SQLite format using mysql2sqlite:

For each of the 6 dump files, run:

 mysql2sqlite <SQL dump file> > <SQLite dump file>
 Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite

For large tables this may take an hour or two.

Create a SQLite database and import the tables. Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.

Create a database for this study:
For each dump file, run the following command:

Run a few queries on the original table on the database, and on the sqlite files to check that the count is the same.

Check the number of PeakAmplitudes:
Check the number of rupture variations for each site:

Create index on Run_ID and IM_Type_ID in PeakAmplitudes table to speed up access.

Copy over the original version to a new file.
Set the SQLITE_TMPDIR variable to point to a filesystem with lots of free space.
Using a current version of sqlite3, open the new database file and run:

Move the files to the archive location on project.

Tar up the sqlite files:
SFTP the tgz file to the study sqlite archive location on project at CARC (/project/scec_608/cybershake/results/sqlite_studies/<study name>), using hpc-transfer1.
Extract the database to /project/scec_608/cybershake/results/sqlite_studies/<study_name> .

CyberShake Study Database Archiving

SQLite Archiving

Move study to from production DB to data access DB

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools

Table	Size (GB)
AR_Hazard_Curve_Points	44.6
Points	6.3
CyberShake_Site_Ruptures	10.3
Ruptures	2.3
Rupture_Variations	2.6
AR_Hazard_Curves	1.4
Other tables	0.5