CyberShake Study Database Archiving

The following procedure should be used to archive a study from either the production or the data access database to disk, in order to free up room in the database.

First, dump the old database contents:

  1. Determine the Study ID you want to archive by looking at the Studies table in the DB:
 select * from Studies;
  2. Dump the PeakAmplitudes, using:
 mysqldump --lock-all-tables --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
  3. Dump Hazard_Datasets, using:
 mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
  4. Dump Hazard_Curves, using:
 mysqldump --lock-all-tables --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
  5. Dump Hazard_Curve_Points, using:
 mysqldump --lock-all-tables --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where D.Study_ID=<study id to archive> and C.Hazard_Dataset_ID=D.Hazard_Dataset_ID)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
  6. Dump CyberShake_Runs, using:
 mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
  7. Dump all of the input tables -- they are small enough that we don't mind capturing data that wasn't directly used in this study.
 mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
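
If it is more convenient, the six dump commands above can be collected into one small shell script so the whole export runs unattended. This is only a sketch, not part of the original procedure: the script name and the way the study ID is passed in are hypothetical, and any mysqldump connection options (-u, -p, -h) your installation requires still need to be added.
 #!/bin/bash
 # Sketch (dump_study.sh): dump all study-specific tables for one study.
 # Usage: ./dump_study.sh <study id to archive>
 STUDY_ID=$1
 mysqldump --lock-all-tables --where "Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=${STUDY_ID})" CyberShake PeakAmplitudes > peak_amps.sql
 mysqldump --lock-all-tables --where "Study_ID=${STUDY_ID}" CyberShake Hazard_Datasets > hazard_datasets.sql
 mysqldump --lock-all-tables --where "Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=${STUDY_ID})" CyberShake Hazard_Curves > hazard_curves.sql
 mysqldump --lock-all-tables --where "Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where D.Study_ID=${STUDY_ID} and C.Hazard_Dataset_ID=D.Hazard_Dataset_ID)" CyberShake Hazard_Curve_Points > hazard_curve_points.sql
 mysqldump --lock-all-tables --where "Study_ID=${STUDY_ID}" CyberShake CyberShake_Runs > runs.sql
 mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql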


Next, convert the SQL dumps into SQLite format using mysql2sqlite:

  1. For each of the 6 dump files, run:
 mysql2sqlite <SQL dump file> > <SQLite dump file>
 Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite
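
To convert all six files in one pass, a simple shell loop works. This is just a sketch; it assumes the dump files were written with the names used above and that mysql2sqlite is on the PATH (use ./mysql2sqlite otherwise).
 # Sketch: convert each MySQL dump in the current directory to SQLite format.
 for f in peak_amps hazard_datasets hazard_curves hazard_curve_points runs input_tables; do
     mysql2sqlite ${f}.sql > ${f}.sqlite
 done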

For large tables this may take an hour or two.
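
Because the conversion can run for hours on the PeakAmplitudes dump, it may be worth running it detached from the terminal so a dropped login session does not kill it. This is only a suggestion, not part of the original procedure; the log file name is arbitrary.
 # Optional: run the slow PeakAmplitudes conversion in the background, immune to hangups.
 nohup ./mysql2sqlite peak_amps.sql > peak_amps.sqlite 2> peak_amps_convert.log &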

Create a SQLite database and import the tables. Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.

  1. Create a database for this study:
 sqlite3 <study name>
  2. For each dump file, run the following command at the sqlite3 prompt:
 .read <path/to/dump/file.sqlite>
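
Put together, the create-and-import step could be scripted as below. This is a sketch only: the study_archive.sqlite file name is an example, and the .read commands can equally be typed one at a time at the sqlite3 prompt.
 # Check the sqlite3 version first; 3.7.11 or later is required to read the dumps.
 sqlite3 --version
 # Create the archive database (the file name is an example) and import each converted dump.
 for f in peak_amps hazard_datasets hazard_curves hazard_curve_points runs input_tables; do
     sqlite3 study_archive.sqlite ".read ${f}.sqlite"
 done
 # Quick sanity check: list the tables that were created.
 sqlite3 study_archive.sqlite ".tables"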