Difference between revisions of "2016 CyberShake database migration"

From SCECpedia
Revision as of 22:09, 10 June 2016

To clarify terminology:

"Input data": Rupture data, ERF-related data, sites data. This data is shared between studies.

"Runs data": The parameters used with each run, timestamps, systems, and study membership. A run belongs to at most one study; some runs are not part of any study.

"Output data": Peak amplitudes data

Goals of DB Migration

  • Provide improved read performance for users of CyberShake data
  • Separate production data from the data of completed studies
  • Permit easy extension to support UGMS web site

Status of DB resources following migration

  • Swapped hardware between moment and focal
  • On read-only server, 2 databases: 1 with Study 15.4, and 1 with Study 15.12 data.
  • On production server, 1 database with all CyberShake data, including Study 15.12 and 15.4
  • After the above is complete, migrate older studies to alternative format and delete from production server.

Detailed Procedure for CyberShake DB Migration

  1. Run mysqldump on entire DB on focal. Generate dumpfiles for all the input data, each study's output and runs data, and the runs and output data which is not part of any study.
  2. Delete database on moment.
  3. Reconfigure DB on moment (single file per table, etc.)
  4. Load all files into DB on moment using the InnoDB engine.
  5. Confirm the reload into moment was successful.
  6. Delete database on focal.
  7. Load input data, Study 15.12 runs+output data, and Study 15.4 runs+output data onto focal for read-only access, using the MyISAM engine. Each study is in a separate database.
  8. Swap names of focal and moment so we don't have to change all our scripts.
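
Step 1 could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used: the database name `CyberShake`, the table names `CyberShake_Runs` and `PeakAmplitudes`, the `Run_ID` column, the output file names, and the run IDs in the usage example are all assumptions made for illustration. The code only builds `mysqldump` argument lists (using the `--host`, `--where`, and `--result-file` options); it does not run them, and it leaves open how the list of run IDs for each study is obtained.

```python
import shlex

# Hypothetical names: the real CyberShake schema may differ.
DB_NAME = "CyberShake"
RUNS_TABLE = "CyberShake_Runs"
OUTPUT_TABLE = "PeakAmplitudes"


def dump_cmd(table, where, outfile, host="focal"):
    """Build (but do not run) a mysqldump command for a subset of one table."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--where={where}",
        f"--result-file={outfile}",
        DB_NAME,
        table,
    ]


def study_dump_cmds(label, run_ids):
    """Build the runs and output dump commands for one group of runs.

    run_ids holds the run IDs in the group: either one study's runs or
    the runs that are not part of any study.
    """
    if not run_ids:
        raise ValueError("run_ids must not be empty")
    id_list = ",".join(str(int(r)) for r in sorted(run_ids))
    where = f"Run_ID IN ({id_list})"
    return [
        dump_cmd(RUNS_TABLE, where, f"{label}_runs.sql"),
        dump_cmd(OUTPUT_TABLE, where, f"{label}_output.sql"),
    ]


if __name__ == "__main__":
    # Hypothetical run IDs, for illustration only.
    for cmd in study_dump_cmds("study_15_4", [101, 102]):
        print(shlex.join(cmd))
```

Generating one pair of dumpfiles per study keeps steps 4 and 7 simple: the production server loads every file, while the read-only server loads only the files for Study 15.4 and Study 15.12.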

Once this is done, we can finalize the alternative format we want. Then we will migrate older studies from the production DB to archive storage, and delete them from the production DB. We may want to test SQLite to see if it can handle a large study.
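
One way to start the SQLite test is sketched below, using Python's standard `sqlite3` module. The table layout (`peak_amplitudes` with `run_id`, `rupture_id`, `im_value`) is a simplified, hypothetical stand-in for the real output schema, and the rows are synthetic. A meaningful test would scale `n_runs` and `rows_per_run` up to the size of a real study and write to a file on disk rather than to memory.

```python
import sqlite3
import time


def build_test_db(path, n_runs, rows_per_run):
    """Create a SQLite DB filled with synthetic peak-amplitude rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE peak_amplitudes ("
        "run_id INTEGER, rupture_id INTEGER, im_value REAL)"
    )
    # Generator of synthetic rows; the values carry no physical meaning.
    rows = (
        (run, rup, 0.001 * rup)
        for run in range(n_runs)
        for rup in range(rows_per_run)
    )
    conn.executemany("INSERT INTO peak_amplitudes VALUES (?, ?, ?)", rows)
    # Index on run_id, since lookups by run are the query timed below.
    conn.execute("CREATE INDEX idx_run ON peak_amplitudes (run_id)")
    conn.commit()
    return conn


def time_run_query(conn, run_id):
    """Return (row count, elapsed seconds) for a lookup of one run."""
    start = time.perf_counter()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM peak_amplitudes WHERE run_id = ?", (run_id,)
    ).fetchone()
    return count, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_test_db(":memory:", n_runs=10, rows_per_run=1000)
    count, elapsed = time_run_query(conn, run_id=3)
    print(f"{count} rows in {elapsed:.6f} s")
```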

Since the input data is much smaller (~100x) than the output data, we will keep a full copy of it with each study. Identifying which subset of input data applies to a given study would be much more time-intensive, and the extra space needed to keep it all is trivial. For each study, we will keep only the runs data for runs associated with that study.
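
The selection rule above can be illustrated with a small sketch. The record layout (dictionaries with `run_id` and `study_id` keys, where `study_id` is `None` for runs outside any study) and the example values are hypothetical. Input data is deliberately not filtered here, since a full copy travels with each study.

```python
def select_study_subset(runs, outputs, study_id):
    """Return the runs and output rows that belong to one study."""
    study_runs = [r for r in runs if r["study_id"] == study_id]
    run_ids = {r["run_id"] for r in study_runs}
    study_outputs = [o for o in outputs if o["run_id"] in run_ids]
    return study_runs, study_outputs


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    runs = [
        {"run_id": 1, "study_id": "15.4"},
        {"run_id": 2, "study_id": "15.12"},
        {"run_id": 3, "study_id": None},
    ]
    outputs = [
        {"run_id": 1, "im_value": 0.12},
        {"run_id": 2, "im_value": 0.34},
        {"run_id": 3, "im_value": 0.56},
    ]
    print(select_study_subset(runs, outputs, "15.4"))
```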