CyberShake Study Database Archiving

SQLite Archiving

The following procedure should be used when we want to archive a study that is in either the production or the data access database to SQLite files on disk, to free up room in the database.

First, dump the old database contents into a new directory by running the following commands.

Note that if you don't want to lock the tables, you can replace '--lock-all-tables' with '--single-transaction=TRUE' in the following commands.
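
For example, the CyberShake_Runs dump from the list below, run without table locks, would look like this (a minimal illustration -- only the flag changes):

  mysqldump --single-transaction=TRUE -u cybershk -p --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql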

  1. Determine the Study ID you want to archive, from looking at the Studies table in the DB:
     select * from Studies;
  2. Dump the PeakAmplitudes, using:
     mysqldump --lock-all-tables -u cybershk -p --where 'Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake PeakAmplitudes > peak_amps.sql
  3. Dump Hazard_Datasets, using:
     mysqldump --lock-all-tables -u cybershk -p --where 'Study_ID=<study id to archive>' CyberShake Hazard_Datasets > hazard_datasets.sql
  4. Dump Hazard_Curves, using:
     mysqldump --lock-all-tables -u cybershk -p --where 'Hazard_Dataset_ID in (select D.Hazard_Dataset_ID from Hazard_Datasets D where D.Study_ID=<study id to archive>)' CyberShake Hazard_Curves > hazard_curves.sql
  5. Dump Hazard_Curve_Points, using:
     mysqldump --lock-all-tables -u cybershk -p --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where C.Hazard_Dataset_ID=D.Hazard_Dataset_ID and D.Study_ID=<study id to archive>)' CyberShake Hazard_Curve_Points > hazard_curve_points.sql
  6. Dump CyberShake_Runs, using:
     mysqldump --lock-all-tables -u cybershk -p --where 'Study_ID=<study id to archive>' CyberShake CyberShake_Runs > runs.sql
  7. You now need the rest of the input tables. They used to be quite small, but now they total roughly 70 GB with indices. As of the completion of Study 22.12, here are their approximate sizes (in the DB, not as dump files):
     Table                       Size (GB)
     AR_Hazard_Curve_Points      44.6
     Points                       6.3
     CyberShake_Site_Ruptures    10.3
     Ruptures                     2.3
     Rupture_Variations           2.6
     AR_Hazard_Curves             1.4
     Other tables                 0.5

    You can either:

    • Dump all of the input tables -- they are small enough that we don't mind capturing data which wasn't directly used in this study:
      mysqldump -u cybershk -p CyberShake AR_Hazard_Curve_Points AR_Hazard_Curves AR_Hazard_Datasets Atten_Rel_Metadata Atten_Rels CyberShake_Site_Regions CyberShake_Site_Ruptures CyberShake_Site_Types CyberShake_Sites ERF_IDs ERF_Metadata ERF_Probability_Models IM_Types Points Rupture_Variation_Probability_Modifier Rupture_Variation_Scenario_IDs Rupture_Variation_Scenario_Metadata Rupture_Variations Ruptures SGT_Variation_IDs SGT_Variation_Metadata Studies Time_Spans Velocity_Models > input_tables.sql
    • Dump only the sections of the large input tables that you need. Note that the Points table isn't used anymore, so don't bother to dump it.
      1. mysqldump --lock-all-tables -u cybershk -p --where 'Hazard_Curve_ID in (select C.Hazard_Curve_ID from Hazard_Curves C, Hazard_Datasets D where C.Hazard_Dataset_ID=D.Hazard_Dataset_ID and D.Study_ID=<study id to archive>)' CyberShake AR_Hazard_Curve_Points > ar_hazard_curve_points.sql
      2. mysqldump --lock-all-tables -u cybershk -p --where 'CS_Site_ID in (select R.Site_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>)' CyberShake CyberShake_Site_Ruptures > cs_site_ruptures.sql
      3. mysqldump --lock-all-tables -u cybershk -p --where 'ERF_ID in (select R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Ruptures > ruptures.sql
      4. mysqldump --lock-all-tables -u cybershk -p --where '(Rup_Var_Scenario_ID, ERF_ID) in (select R.Rup_Var_Scenario_ID, R.ERF_ID from CyberShake_Runs R where R.Study_ID=<study id to archive>) and (Source_ID, Rupture_ID) in (select distinct SR.Source_ID, SR.Rupture_ID from CyberShake_Site_Ruptures SR, CyberShake_Runs R where SR.CS_Site_ID=R.Site_ID and SR.ERF_ID=R.ERF_ID and R.Study_ID=<study id to archive>)' CyberShake Rupture_Variations > rupture_variations.sql
      5. AR_Hazard_Curves
      6. Other tables

    A scripted sketch of running these per-study dumps follows this list.
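
The sketch assumes the study ID is passed as a command-line argument; the script name and the STUDY_ID variable are illustrative, and only a few of the dumps are shown -- the rest follow the same pattern.

  #!/bin/bash
  # dump_study.sh (hypothetical name): run the per-study mysqldump commands.
  # Usage: ./dump_study.sh <study id to archive>
  STUDY_ID="$1"

  # -p prompts for the cybershk password on each dump.
  mysqldump --lock-all-tables -u cybershk -p \
      --where "Run_ID in (select R.Run_ID from CyberShake_Runs R where R.Study_ID=${STUDY_ID})" \
      CyberShake PeakAmplitudes > peak_amps.sql

  mysqldump --lock-all-tables -u cybershk -p \
      --where "Study_ID=${STUDY_ID}" \
      CyberShake Hazard_Datasets > hazard_datasets.sql

  mysqldump --lock-all-tables -u cybershk -p \
      --where "Study_ID=${STUDY_ID}" \
      CyberShake CyberShake_Runs > runs.sql
  # ...the Hazard_Curves, Hazard_Curve_Points, and input-table dumps follow the same pattern.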

Next, convert the SQL dumps into SQLite format using mysql2sqlite:

  1. For each of the dump files (6 of them if you dumped all of the input tables together), run:
     mysql2sqlite <SQL dump file> > <SQLite dump file>
     Example:  ./mysql2sqlite peak_amps.sql > peak_amps.sqlite

For large tables this may take an hour or two.
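
One way to convert every dump in a single pass is a small shell loop like the sketch below, which assumes the mysql2sqlite script is in the current directory and that all of the dump files end in .sql:

  # Convert each MySQL dump in this directory to SQLite syntax.
  for f in *.sql; do
      ./mysql2sqlite "$f" > "${f%.sql}.sqlite"
  done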

Create a SQLite database and import the tables. Use sqlite3 3.7.11 or later, or there will be an error reading the dump files.

  1. Create a database for this study:
     sqlite3 <study name>
  2. For each dump file, run the following command (an example session is shown below):
     .read <path/to/dump/file.sqlite>
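
This example session is purely illustrative -- the database file name and dump file names are assumptions based on the earlier steps; the final .tables command just lists the imported tables as a quick sanity check.

  sqlite3 study_22_12.sqlite
  sqlite> .read peak_amps.sqlite
  sqlite> .read hazard_datasets.sqlite
  sqlite> .read hazard_curves.sqlite
  sqlite> .read hazard_curve_points.sqlite
  sqlite> .read runs.sqlite
  sqlite> .read input_tables.sqlite
  sqlite> .tables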

Run a few queries against the original tables in the database and against the SQLite file, and check that the counts are the same (one way to compare is sketched after the queries below).

  1. Check the number of PeakAmplitudes:
     select count(*) from PeakAmplitudes P, CyberShake_Runs R where R.Study_ID=<study_id> and P.Run_ID=R.Run_ID;
  2. Check the number of rupture variations for each site:
     select count(*) from CyberShake_Runs R, CyberShake_Site_Ruptures SR, Rupture_Variations V where R.Site_ID=SR.CS_Site_ID and SR.Source_ID=V.Source_ID and SR.Rupture_ID=V.Rupture_ID and R.ERF_ID=V.ERF_ID and R.ERF_ID=SR.ERF_ID and V.Rup_Var_Scenario_ID=R.Rup_Var_Scenario_ID and R.Study_ID=<study id>;
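
For instance, the PeakAmplitudes count can be run from the command line against both databases and compared by eye; the SQLite file name here is an assumption:

  # Count in the original MySQL database:
  mysql -u cybershk -p CyberShake -e 'select count(*) from PeakAmplitudes P, CyberShake_Runs R where R.Study_ID=<study_id> and P.Run_ID=R.Run_ID;'
  # Count in the SQLite archive -- the two numbers should match:
  sqlite3 study_22_12.sqlite 'select count(*) from PeakAmplitudes P, CyberShake_Runs R where R.Study_ID=<study_id> and P.Run_ID=R.Run_ID;'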

Move the files to the archive location on project.

  1. Tar up the sqlite files:
     tar czvf <study_name>.tgz *.sqlite
  2. SFTP the tgz file to the study sqlite archive location on project at CARC (/project/scec_608/cybershake/results/sqlite_studies/<study name>), using hpc-transfer1 (an example transfer is sketched below).
  3. Extract the database to /project/scec_608/cybershake/results/sqlite_studies/<study_name>:
     tar xzvf sqlite_dumps/<study_name>.tgz <study_name>.sqlite
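
This sketch assumes hpc-transfer1 resolves from your machine; substitute your own username and the actual study name:

  sftp <username>@hpc-transfer1
  sftp> cd /project/scec_608/cybershake/results/sqlite_studies/<study name>
  sftp> put <study_name>.tgz
  sftp> exit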

Move study from production DB to data access DB