Difference between revisions of "CyberShake Study Database Archiving"

From SCECpedia
Jump to navigationJump to search
Line 3: Line 3:
 
First, dump the old database contents:
 
First, dump the old database contents:
  
#Determine the Study ID you want to archive, from looking at the Studies table in the DB:
+
<ol>
#:  select * from Studies;
+
<li>Determine the Study ID you want to archive, from looking at the Studies table in the DB:</li>
#Dump the PeakAmplitudes, using:
+
  select * from Studies;
 +
<li>Dump the PeakAmplitudes, using:</li>
 
   mysqldump --lock-all-tables --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
 
   mysqldump --lock-all-tables --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
#Dump Hazard_Datasets, using:
+
<li>Dump Hazard_Datasets, using:</li>
 
   mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
 
   mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
#Dump Hazard_Curves, using:
+
<li>Dump Hazard_Curves, using:</li>
 
   mysqldump --lock-all-tables --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
 
   mysqldump --lock-all-tables --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
#Dump Hazard_Curve_Points, using:
+
<li>Dump Hazard_Curve_Points, using:</li>
   mysqldump --lock-all-tables --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where D.Study_ID=
+
   mysqldump --lock-all-tables --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
+
<li>Dump CyberShake_Runs, using:</li>
#Dump CyberShake_Runs, using:
 
 
   mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
 
   mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
#Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study.
+
<li>Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study.</li>
 
   mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
 
   mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
 
+
</ol>
  
 
Next, convert the SQL dumps into SQLite format using mysql2sqlite:
 
Next, convert the SQL dumps into SQLite format using mysql2sqlite:
Line 28: Line 28:
  
 
Create a SQLite database and import the tables.  Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.
 
Create a SQLite database and import the tables.  Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.
#Create a database for this study:
+
<ol>
 +
<li>Create a database for this study:</li>
 
   sqlite3 <study name>
 
   sqlite3 <study name>
#For each dump file, run the following command:
+
<li>For each dump file, run the following command:</li>
 
   .read <path/to/dump/file.sqlite>
 
   .read <path/to/dump/file.sqlite>
 +
</ol>
 +
 +
Run a few queries on the original table on the database, and on the sqlite files to check that the count is the same.
 +
<ol>
 +
<li>Check the number of PeakAmplitudes:</li>
 +
  select count(*) from PeakAmplitudes
 +
<li>Check the number of rupture variations for each site:</li>
 +
  select count(*) from CyberShake_Runs R, CyberShake_Site_Ruptures SR, Rupture_Variations V where R.Site_ID=SR.CS_Site_ID and SR.Source_ID=V.Source_ID and SR.Rupture_ID=V.Rupture_ID and R.ERF_ID=V.ERF_ID and R.ERF_ID=SR.ERF_ID and R.Study_ID=5 and V.Rup_Var_Scenario_ID=R.Rup_Var_Scenario_ID;
 +
</ol>

Revision as of 02:29, 6 October 2017

The following procedure should be used when we want to archive a study which is either in the production or data access database to disk, to free up room in the database.

First, dump the old database contents:

  1. Determine the Study ID you want to archive, from looking at the Studies table in the DB:
  2. select * from Studies;
  3. Dump the PeakAmplitudes, using:
  4. mysqldump --lock-all-tables --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
  5. Dump Hazard_Datasets, using:
  6. mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
  7. Dump Hazard_Curves, using:
  8. mysqldump --lock-all-tables --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
  9. Dump Hazard_Curve_Points, using:
  10. mysqldump --lock-all-tables --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
  11. Dump CyberShake_Runs, using:
  12. mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
  13. Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study.
  14. mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql

Next, convert the SQL dumps into SQLite format using mysql2sqlite:

  1. For each of the 6 dump files, run:
 mysql2sqlite <SQL dump file> > <SQLite dump file>
 Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite

For large tables this may take an hour or two.

Create a SQLite database and import the tables. Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.

  1. Create a database for this study:
  2. sqlite3 <study name>
  3. For each dump file, run the following command:
  4. .read <path/to/dump/file.sqlite>

Run a few queries on the original table on the database, and on the sqlite files to check that the count is the same.

  1. Check the number of PeakAmplitudes:
  2. select count(*) from PeakAmplitudes
  3. Check the number of rupture variations for each site:
  4. select count(*) from CyberShake_Runs R, CyberShake_Site_Ruptures SR, Rupture_Variations V where R.Site_ID=SR.CS_Site_ID and SR.Source_ID=V.Source_ID and SR.Rupture_ID=V.Rupture_ID and R.ERF_ID=V.ERF_ID and R.ERF_ID=SR.ERF_ID and R.Study_ID=5 and V.Rup_Var_Scenario_ID=R.Rup_Var_Scenario_ID;