CyberShake Study Database Archiving

== SQLite Archiving ==

The following procedure should be used when we want to archive a study that is in either the production or the data access database to a SQLite file on disk, to free up room in the database.

<b>First, dump the old database contents into a new directory by running the following commands.</b>  A shell sketch that runs all of these dumps in one pass is given after the list.

Note that if you don't want to lock the tables, you can replace '--lock-all-tables' with '--single-transaction=TRUE' in the following commands.

<ol>
<li>Determine the Study ID you want to archive by looking at the Studies table in the DB:</li>
  select * from Studies;
<li>Dump the PeakAmplitudes, using:</li>
  mysqldump --lock-all-tables -u cybershk -p --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
<li>Dump Hazard_Datasets, using:</li>
  mysqldump --lock-all-tables -u cybershk -p --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
<li>Dump Hazard_Curves, using:</li>
  mysqldump --lock-all-tables -u cybershk -p --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
<li>Dump Hazard_Curve_Points, using:</li>
  mysqldump --lock-all-tables -u cybershk -p --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where C.Hazard_Dataset_ID=D.Hazard_Dataset_ID and D.Study_ID=<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
<li>Dump CyberShake_Runs, using:</li>
  mysqldump --lock-all-tables -u cybershk -p --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
<li>You now need the rest of the input tables.  They used to be fairly small, but they're now up to ~70 GB with indices.  As of the completion of Study 22.12, here are their approximate sizes (in the DB, not as dump files):
{| border="1"
! Table !! Size (GB)
|-
| AR_Hazard_Curve_Points || 44.6
|-
| Points || 6.3
|-
| CyberShake_Site_Ruptures || 10.3
|-
| Ruptures || 2.3
|-
| Rupture_Variations || 2.6
|-
| AR_Hazard_Curves || 1.4
|-
| Other tables || 0.5
|}
You can either:
<ul>
<li>Dump all of the input tables -- this is simpler, but captures data which wasn't directly used in this study:</li>
  mysqldump -u cybershk -p CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
<li>Dump only the sections of the large input tables that you need.  Note that the Points table isn't used anymore, so don't bother to dump it.
<ol>
<li>mysqldump --lock-all-tables -u cybershk -p --where 'AR_Hazard_Curve_ID in (select distinct C.AR_Hazard_Curve_ID from AR_Hazard_Curves C where C.AR_Hazard_Dataset_ID in (select distinct D.AR_Hazard_Dataset_ID from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat>=(select distinct D.Min_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat<=(select distinct D.Max_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon>=(select distinct D.Min_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon<=(select distinct D.Max_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) )' CyberShake AR_Hazard_Curve_Points > ar_hazard_curve_points.sql</li>
<li>mysqldump --lock-all-tables -u cybershk -p --where 'CS_Site_ID in (select R.Site_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake CyberShake_Site_Ruptures > cs_site_ruptures.sql</li>
<li>mysqldump --lock-all-tables -u cybershk -p --where 'ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Ruptures > ruptures.sql</li>
<li>mysqldump --lock-all-tables -u cybershk -p --where '(Rup_Var_Scenario_ID, ERF_ID) in (select R.Rup_Var_Scenario_ID, R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Rupture_Variations > rupture_variations.sql</li>
<li>mysqldump --lock-all-tables -u cybershk -p --where 'AR_Hazard_Dataset_ID in (select distinct D.AR_Hazard_Dataset_ID from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat>=(select distinct D.Min_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lat<=(select distinct D.Max_Lat from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon>=(select distinct D.Min_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID) and Lon<=(select distinct D.Max_Lon from AR_Hazard_Datasets D, CyberShake_Runs R where R.Study_ID=<study id to archive> and D.Time_Span_ID=1 and D.Prob_Model_ID=1 and D.ERF_ID=R.ERF_ID and D.Velocity_Model_ID=R.Velocity_Model_ID)' CyberShake AR_Hazard_Curves > ar_hazard_curves.sql</li>
<li>mysqldump --lock-all-tables -u cybershk -p CyberShake AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql</li>
</ol>
</li>
</ul>
</li>
  
</ol>
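
If you would rather not hand-edit the study ID into each command, the dumps above can be wrapped in a small shell script.  This is only a sketch: the study ID and output directory are placeholders, and with -p mysqldump will prompt for the cybershk password on every command (a ~/.my.cnf entry avoids the repeated prompts).  The input-table dumps from the last step can be appended in the same style.

  #!/bin/bash
  # Sketch only: run the per-study dumps listed above in one pass.
  # STUDY_ID and OUTDIR are placeholders -- set them for the study being archived,
  # and add -h <host> to the mysqldump options if you aren't on the DB server.
  set -euo pipefail
  STUDY_ID=99                              # hypothetical study ID
  OUTDIR=study_${STUDY_ID}_dumps
  mkdir -p "$OUTDIR" && cd "$OUTDIR"
  DUMP="mysqldump --lock-all-tables -u cybershk -p"
  $DUMP --where "Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=$STUDY_ID)" CyberShake PeakAmplitudes > peak_amps.sql
  $DUMP --where "Study_ID=$STUDY_ID" CyberShake Hazard_Datasets > hazard_datasets.sql
  $DUMP --where "Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=$STUDY_ID)" CyberShake Hazard_Curves > hazard_curves.sql
  $DUMP --where "Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where C.Hazard_Dataset_ID=D.Hazard_Dataset_ID and D.Study_ID=$STUDY_ID)" CyberShake Hazard_Curve_Points > hazard_curve_points.sql
  $DUMP --where "Study_ID=$STUDY_ID" CyberShake CyberShake_Runs > runs.sql
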
<b>Next, convert the SQL dumps into SQLite format using mysql2sqlite:</b>

#For each of the dump files, run:

   mysql2sqlite <SQL dump file> > <SQLite dump file>

   Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite

For large tables this may take an hour or two.
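
With half a dozen dump files, a small loop saves retyping.  This is a sketch that assumes the mysql2sqlite script and all of the .sql dumps sit in the current directory.

  # Sketch: convert every MySQL dump in the current directory to SQLite format.
  for f in *.sql; do
      ./mysql2sqlite "$f" > "${f%.sql}.sqlite"
  done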
  
<b>Create a SQLite database and import the tables.  Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.  If you get errors about too many entries, upgrade to a more recent version and try again.</b>
<ol>
<li>Create a database for this study:</li>
   sqlite3 <study name>
<li>For each dump file, run the following command:</li>
   .read <path/to/dump/file.sqlite>
</ol>
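
The .read commands can also be driven from the shell.  A sketch, assuming the converted dumps are in the current directory; the database file name used here is hypothetical.

  # Sketch: create the study database and read in each converted dump.
  DB=study_22_12.sqlite                    # hypothetical name -- use your study's name
  for f in *.sqlite; do
      [ "$f" = "$DB" ] && continue         # skip the database file itself
      sqlite3 "$DB" ".read $f"
  done
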
<b>Run a few queries against the original tables in the database and against the SQLite file, and check that the counts match.</b>
<ol>
<li>Check the number of PeakAmplitudes:</li>
  select count(*) from PeakAmplitudes P, CyberShake_Runs R where R.Study_ID=<study_id> and P.Run_ID=R.Run_ID;
<li>Check the number of rupture variations for each site:</li>
  select count(*) from CyberShake_Runs R, CyberShake_Site_Ruptures SR, Rupture_Variations V where R.Site_ID=SR.CS_Site_ID and SR.Source_ID=V.Source_ID and SR.Rupture_ID=V.Rupture_ID and R.ERF_ID=V.ERF_ID and R.ERF_ID=SR.ERF_ID and V.Rup_Var_Scenario_ID=R.Rup_Var_Scenario_ID and R.Study_ID=<study id>;
</ol>
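
One way to avoid comparing counts by eye is to run the same query on both sides from the shell.  A sketch reusing the PeakAmplitudes query above; the study ID and SQLite file name are placeholders.

  # Sketch: run the same count against MySQL and against the SQLite archive.
  STUDY_ID=99                              # hypothetical study ID
  DB=study_22_12.sqlite                    # hypothetical SQLite file name
  QUERY="select count(*) from PeakAmplitudes P, CyberShake_Runs R where R.Study_ID=$STUDY_ID and P.Run_ID=R.Run_ID"
  MYSQL_COUNT=$(mysql -u cybershk -p -N -e "$QUERY" CyberShake)
  SQLITE_COUNT=$(sqlite3 "$DB" "$QUERY")
  echo "MySQL: $MYSQL_COUNT  SQLite: $SQLITE_COUNT"
  [ "$MYSQL_COUNT" = "$SQLITE_COUNT" ] && echo "Counts match" || echo "MISMATCH"
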
<b>Create an index on Run_ID and IM_Type_ID in the PeakAmplitudes table to speed up access.</b>
<ol>
<li>Copy the original database file to a new file.</li>
<li>Set the SQLITE_TMPDIR environment variable to point to a filesystem with lots of free space.</li>
<li>Using a current version of sqlite3, open the new database file and run:</li>
  CREATE INDEX "idx_PeakAmplitudes_Run_ID_IM_Type_ID" ON "PeakAmplitudes" (`Run_ID`, `IM_Type_ID`);
</ol>
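
Put together, the indexing steps above look roughly like the sketch below.  The file names are hypothetical, and SQLITE_TMPDIR just needs to point at a filesystem with enough scratch space for the index build.

  # Sketch: build the PeakAmplitudes index in a copy of the database.
  cp study_22_12.sqlite study_22_12_indexed.sqlite     # hypothetical file names
  export SQLITE_TMPDIR=/path/to/big/scratch            # filesystem with lots of free space
  sqlite3 study_22_12_indexed.sqlite \
    'CREATE INDEX "idx_PeakAmplitudes_Run_ID_IM_Type_ID" ON "PeakAmplitudes" (`Run_ID`, `IM_Type_ID`);'
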
<b>Move the files to the archive location on project.</b>
<ol>
<li>Tar up the sqlite files:</li>
  tar czvf <study_name>.tgz *.sqlite
<li>SFTP the tgz file to the study sqlite archive location on project at CARC (/project/scec_608/cybershake/results/sqlite_studies/<study name>), using hpc-transfer1.</li>
<li>Extract the database to /project/scec_608/cybershake/results/sqlite_studies/<study_name>:</li>
  tar xzvf sqlite_dumps/<study_name>.tgz <study_name>.sqlite
</ol>
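
For reference, a hedged sketch of the packaging and transfer steps.  The study name and CARC username are placeholders, and hpc-transfer1.usc.edu is assumed to be the full hostname of the transfer node.

  # Sketch: package the SQLite files, copy them to CARC, and unpack the database.
  tar czvf study_22_12.tgz *.sqlite                    # hypothetical study name
  sftp <your CARC username>@hpc-transfer1.usc.edu      # assumed transfer-node hostname
  #   sftp> cd /project/scec_608/cybershake/results/sqlite_studies/study_22_12
  #   sftp> put study_22_12.tgz
  # Then, in a shell on the CARC side:
  #   cd /project/scec_608/cybershake/results/sqlite_studies/study_22_12
  #   tar xzvf study_22_12.tgz study_22_12.sqlite
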
== Move study from production DB to data access DB ==
