CyberShake Study Database Archiving

The following procedure should be used to archive a study from either the production or the data access database to disk, to free up room in the database.

First, dump the old database contents into a new directory:
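For example, you might first create and enter a scratch directory to hold the dumps (the directory name below is a placeholder, not part of this procedure):

 # Working directory for the dump files
 mkdir <study_name>_dumps
 cd <study_name>_dumps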

  1. Determine the Study ID you want to archive by looking at the Studies table in the DB:
 select * from Studies;
  2. Dump the PeakAmplitudes, using:
 mysqldump --lock-all-tables --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
  3. Dump Hazard_Datasets, using:
 mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
  4. Dump Hazard_Curves, using:
 mysqldump --lock-all-tables --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
  5. Dump Hazard_Curve_Points, using:
 mysqldump --lock-all-tables --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where C.Hazard_Dataset_ID=D.Hazard_Dataset_ID and D.Study_ID=<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
  6. Dump CyberShake_Runs, using:
 mysqldump --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
  7. Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study:
 mysqldump CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
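Depending on how the MySQL client on your machine is configured, each mysqldump command may also need explicit connection options. A hypothetical form (the host and user below are placeholders, not values from this procedure):

 mysqldump -h <db host> -u <user> -p --lock-all-tables --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql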

Next, convert the SQL dumps into SQLite format using mysql2sqlite:

  1. For each of the 6 dump files, run:
 mysql2sqlite <SQL dump file> > <SQLite dump file>
 Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite
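If you would rather convert all six dumps in one pass, a short shell loop works. This is just a sketch, assuming mysql2sqlite is in the current directory and the dumps use the file names from the step above:

 # Convert each MySQL dump into SQLite-compatible SQL
 for f in peak_amps hazard_datasets hazard_curves hazard_curve_points runs input_tables; do
     ./mysql2sqlite ${f}.sql > ${f}.sqlite
 done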

For large tables this may take an hour or two.

Create a SQLite database and import the tables. Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.
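If you are not sure which version is installed, check before starting the import:

 sqlite3 --version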

  1. Create a database for this study:
 sqlite3 <study_name>.sqlite
  2. For each dump file, run the following command:
 .read <path/to/dump/file.sqlite>
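Put together, an import session might look like the following sketch (the database name is a placeholder, and the dump files are assumed to be in the current directory):

 sqlite3 <study_name>.sqlite
 sqlite> .read peak_amps.sqlite
 sqlite> .read hazard_datasets.sqlite
 sqlite> .read hazard_curves.sqlite
 sqlite> .read hazard_curve_points.sqlite
 sqlite> .read runs.sqlite
 sqlite> .read input_tables.sqlite
 sqlite> .quit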

Run a few queries on the original tables in the database, and on the sqlite database, to check that the counts are the same.

  1. Check the number of PeakAmplitudes:
 select count(*) from PeakAmplitudes;
  2. Check the number of rupture variations for each site:
 select count(*) from CyberShake_Runs R, CyberShake_Site_Ruptures SR, Rupture_Variations V where R.Site_ID=SR.CS_Site_ID and SR.Source_ID=V.Source_ID and SR.Rupture_ID=V.Rupture_ID and R.ERF_ID=V.ERF_ID and R.ERF_ID=SR.ERF_ID and R.Study_ID=<study id to archive> and V.Rup_Var_Scenario_ID=R.Rup_Var_Scenario_ID;
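The same counts can be run against both sides from the shell. In this sketch the mysql connection options are assumed to come from your local configuration; note that on the MySQL side the PeakAmplitudes count must be restricted to the study being archived, matching the where clause used in the dump step:

 # Count in the original MySQL database, restricted to the archived study
 mysql -e "select count(*) from PeakAmplitudes where Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>);" CyberShake
 # Count in the SQLite archive, which only holds rows for this study
 sqlite3 <study_name>.sqlite "select count(*) from PeakAmplitudes;"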

Move the files to the archive location on scec-04.

  1. Tar up the sqlite files:
 tar czvf <study_name>.tgz *.sqlite
  2. Copy the tgz file to the study sqlite archive location on scec-04:
 cp <study_name>.tgz /home/scec-04/tera3d/CyberShake/sqlite_studies/sqlite_dumps
  3. Extract the database to /home/scec-04/tera3d/CyberShake/sqlite_studies:
 tar xzvf sqlite_dumps/<study_name>.tgz <study_name>.sqlite
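Before deleting anything from the database, it may be worth listing the archived tarball to confirm it copied intact; for example:

 # List the contents of the archive on scec-04
 tar tzvf /home/scec-04/tera3d/CyberShake/sqlite_studies/sqlite_dumps/<study_name>.tgz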

Extract