UCVM v25.7 with external model data directory CVM_LARGEDATA_DIR
Goal
As the number of CVMs with large datasets in UCVM increases, building and installing the UCVM core code takes longer and becomes less reliable. UCVM v25.7 introduces an optional setup that moves most model data into a centralized location, which is accessed at build time and at run time via symbolic links.
As an added benefit, when a UCVM instance is built as part of a Docker container, such as within CVM explorer, this data directory can be mounted from outside into the running container instead of being packed into the image itself. This makes it possible to create a UCVM Docker image that includes multiple CVMs.
Before building UCVM
Create an accessible directory, CVM_DATA_DIRECTORY, with the expected directory structure (431G):
/somepath/CVM_DATA_DIRECTORY
./model:
canvas cencal cvmh cvmhlabn cvmhsbbn cvmhsgbn cvmhstbn cvms5_s5 cvmsi_i26 uwlinca
cca_i06 cs248 cvmhibbn cvmhrbn cvmhsbcbn cvmhsmbn cvmhvbn cvmsi_cvms sfcvm uwsfbcvm

./model/canvas:
vp.dat vs.dat

./model/cca_i06:
density.dat vp.dat vs.dat

./model/cencal:
USGSBayAreaVM-08.3.0.etree USGSBayAreaVMExt-08.3.0.etree

./model/cs248:
density.dat vp.dat vs.dat

./model/cvmh:
base@@ CVM_CM.vo CVM_HR.vo CVMSM_flags@@ cvm_vs30_wills.hdr model_top@@ tsurf
BASE.gts CVM_CM_VP@@ CVM_HR_VP@@ CVMSM_tag66@@ cvm_vs30_wills.mdl moho@@
BATO.gts CVM_CM_VS@@ CVM_HR_VS@@ CVMSM_vp66@@ fromOutside MOHO.gts
CVM_CM_TAG@@ CVM_HR_TAG@@ CVM_LR.vo CVMSM_vs66@@ interfaces.vo topo_dem@@

./model/cvmh/tsurf:
CMxVM_Model3D_CalMex_BATO.ts CMxVM_Model3D_CM_BASE_Folded.ts CVMH_CalMex_BATO.ts CVMH_Moho.ts
CMxVM_Model3D_CM_BASE_Folded.dxf CVMH_Basement64.ts CVMH_Moho64.ts

./model/cvmhibbn:
base@@ CVMHB-Inner-Borderland-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-Inner-Borderland-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-Inner-Borderland-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-Inner-Borderland-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-Inner-Borderland-Basin_vs63_basin@@ interfaces.vo

./model/cvmhlabn:
base@@ CVMHB-Los-Angeles-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-Los-Angeles-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-Los-Angeles-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-Los-Angeles-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-Los-Angeles-Basin_vs63_basin@@ interfaces.vo

./model/cvmhrbn:
base@@ CVM_CM_VS@@ CVMHB-Ridge-Basin_vp63_basin@@ CVMSM_vp66@@ moho@@
CVM_CM_TAG@@ CVMHB-Ridge-Basin.dat CVMHB-Ridge-Basin_vs63_basin@@ CVMSM_vs66@@ topo_dem@@
CVM_CM.vo CVMHB-Ridge-Basin_tag61_basin@@ CVMSM_flags@@ interfaces.vo
CVM_CM_VP@@ CVMHB-Ridge-Basin.vo CVMSM_tag66@@ model_top@@

./model/cvmhsbbn:
base@@ CVMHB-San-Bernardino-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-San-Bernardino-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-San-Bernardino-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-San-Bernardino-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-San-Bernardino-Basin_vs63_basin@@ interfaces.vo

./model/cvmhsbcbn:
base@@ CVMHB-Santa-Barbara-Channel-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-Santa-Barbara-Channel-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-Santa-Barbara-Channel-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-Santa-Barbara-Channel-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-Santa-Barbara-Channel-Basin_vs63_basin@@ interfaces.vo

./model/cvmhsgbn:
base@@ CVMHB-San-Gabriel-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-San-Gabriel-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-San-Gabriel-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-San-Gabriel-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-San-Gabriel-Basin_vs63_basin@@ interfaces.vo

./model/cvmhsmbn:
base@@ CVMHB-Santa-Maria-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-Santa-Maria-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-Santa-Maria-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-Santa-Maria-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-Santa-Maria-Basin_vs63_basin@@ interfaces.vo

./model/cvmhstbn:
base@@ CVMHB-Salton-Trough-Basin.dat CVMSM_flags@@ model_top@@
CVM_CM_TAG@@ CVMHB-Salton-Trough-Basin_tag61_basin@@ CVMSM_tag66@@ moho@@
CVM_CM.vo CVMHB-Salton-Trough-Basin.vo CVMSM_vp66@@ topo_dem@@
CVM_CM_VP@@ CVMHB-Salton-Trough-Basin_vp63_basin@@ CVMSM_vs66@@
CVM_CM_VS@@ CVMHB-Salton-Trough-Basin_vs63_basin@@ interfaces.vo

./model/cvmhvbn:
base@@ CVM_CM_VS@@ CVMHB-Ventura-Basin_vp63_basin@@ CVMSM_vp66@@ moho@@
CVM_CM_TAG@@ CVMHB-Ventura-Basin.dat CVMHB-Ventura-Basin_vs63_basin@@ CVMSM_vs66@@ topo_dem@@
CVM_CM.vo CVMHB-Ventura-Basin_tag61_basin@@ CVMSM_flags@@ interfaces.vo
CVM_CM_VP@@ CVMHB-Ventura-Basin.vo CVMSM_tag66@@ model_top@@

./model/cvms5_s5:
vp.dat vs.dat

./model/cvmsi_cvms:
3D.out ivmod.edge Makefile q12y_edge sgeod.h soil.pgm te6A_edge tsq4_sur2
b1___edge ivsurfaced.h Makefile.am q12y_sur2 sgeo.h sp9b_edge te6A_sur2 tsq5_edge
b1___sur2 ivsurface.h Makefile.in q12z_edge sgmo_edge sp9b_sur2 te6B_edge tsq5_sur2
b2___edge ku1__edge mantled.h q12z_sur2 sgmo_sur2 spu1_edge te6B_sur2 tsq7_edge
b2___sur2 ku1__sur2 mantle.h qps1_edge sgre_edge spu1_sur2 te7__edge tsq7_sur2
b3___edge ku2__edge moho1.h qps1_sur2 sgre_sur2 spu9_edge te7__sur2 tsq9_edge
b3___sur2 ku2__sur2 moho_sur qps2_edge sku2_edge spu9_sur2 te8__edge tsq9_sur2
b4___edge ku3__edge names.h qps2_sur2 sku2_sur2 st4b_edge te8__sur2 tv1__edge
b4___sur2 ku3__sur2 nsbb_edge qps5_edge smb1_edge st4b_sur2 tj1__edge tv1__sur2
b5___edge ku4__edge nsbb_sur2 qps5_sur2 smb2_edge st4s_edge tj1__sur2 tv2__edge
b5___sur2 ku4__sur2 params.h qps6_edge smb2_sur2 st4s_sur2 tj2__edge tv2__sur2
bmod_edge ku5__edge pu1__edge qps6_sur2 smb3_edge ste2_edge tj2__sur2 tv3__edge
borehole.h ku5__sur2 pu1__sur2 regionald.h smb3_sur2 ste2_sur2 tj3__edge tv3__sur2
boreholes ku8__edge pu2A_edge regional.h smb9_edge surfaced.h tj3__sur2 tv5__edge
cvms.h ku8__sur2 pu2A_sur2 salton_base.sur smb9_sur2 surface.h tj4__edge tv5__sur2
cvms_sub.f laba_edge pu2B_edge sbb2_edge smm1_edge te1__edge tj4__sur2 tv9__edge
cvms_sub.o laba_sur2 pu2B_sur2 sbb2_sur2 smm1_sur2 te1__sur2 tj5__edge tv9__sur2
dim2.h lab_geo2_geology pu3__edge sbb__edge smm2_edge te2__edge tj5__sur2 version.h
dim8.h labup.h pu3__sur2 sbb__sur2 smm2_sur2 te2__sur2 tsq1_edge wtbh1d.h
eh.modPS lamo_edge pu9__edge sbmi_edge smr1_edge te3__edge tsq1_sur2 wtbh1.h
genprod.h lamo_sur2 pu9__sur2 sbmi_sur2 smr1_sur2 te3__sur2 tsq2_edge wtbh2.h
genpro.h lare_edge q12b_edge sbmo_edge smr2_edge te4__edge tsq2_sur2 wtbh3.h
impva.edge lare_sur2 q12b_sur2 sbmo_sur2 smr2_sur2 te4__sur2 tsq3_edge
in.h laup_edge q12x_edge sgba_edge soil1.h te5__edge tsq3_sur2
innum.h laup_sur2 q12x_sur2 sgba_sur2 soil_generic te5__sur2 tsq4_edge

./model/cvmsi_i26:
box.dat cvmsi.bin cvmsi.ver region_spec.in XYZGRD

./model/sfcvm:
USGS_SFCVM_v21-0_regional.h5 USGS_SFCVM_v21-1_detailed.h5

./model/uwlinca:
vp.dat

./model/uwsfbcvm:
easting.dat northing.dat vp.dat vs.dat
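To catch a misplaced or incomplete data directory before a long build, the top-level layout can be checked with a few lines of shell. This is a minimal sketch and not part of UCVM: the function name missing_models is hypothetical, and the model list is simply copied from the ./model listing above. It checks only that each model subdirectory exists, not the files inside it.

```shell
#!/bin/sh
# Model subdirectories expected under <data dir>/model (copied from the listing above).
EXPECTED_MODELS="canvas cencal cvmh cvmhlabn cvmhsbbn cvmhsgbn cvmhstbn cvms5_s5 cvmsi_i26 uwlinca cca_i06 cs248 cvmhibbn cvmhrbn cvmhsbcbn cvmhsmbn cvmhvbn cvmsi_cvms sfcvm uwsfbcvm"

# Hypothetical helper: print the name of every expected model directory
# that is absent. Prints nothing when the layout is complete.
missing_models() {
    # $1: path to the data directory (the one that contains ./model)
    for m in $EXPECTED_MODELS; do
        [ -d "$1/model/$m" ] || printf '%s\n' "$m"
    done
}
```

For example, after sourcing the snippet, `missing_models /somepath/CVM_DATA_DIRECTORY` prints one line per missing model directory.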
Building UCVM
Set the data location environment variable, in addition to UCVM_SRC_PATH and UCVM_INSTALL_PATH:
export CVM_LARGEDATA_DIR=/somepath/CVM_DATA_DIRECTORY
This environment variable is handled in each CVM's top-level configure.ac and in the Makefile.am of its data directory.
The basic data source logic in mycvm/configure.ac is:
If CVM_LARGEDATA_DIR is defined:
    If CVM_IN_DOCKER is defined:
        set WITH_mycvm_CVM_LARGEDATA_DIR
    If CVM_IN_DOCKER is not defined:
        check for a specific model data file before enabling WITH_mycvm_CVM_LARGEDATA_DIR
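The decision above can be sketched as a small shell function. This is an illustration under stated assumptions, not the actual configure.ac code: the function name use_largedata_dir and the probe file passed to it are hypothetical, and "defined" is treated here as "set to a non-empty value".

```shell
#!/bin/sh
# Sketch of the configure-time decision for a hypothetical model "mycvm".
# Returns 0 when the shared data directory should be used, 1 otherwise.
use_largedata_dir() {
    # $1: file to probe, relative to CVM_LARGEDATA_DIR
    #     (hypothetical example: model/mycvm/vp.dat)
    if [ -z "${CVM_LARGEDATA_DIR:-}" ]; then
        return 1    # variable not set: the shared directory is not used
    fi
    if [ -n "${CVM_IN_DOCKER:-}" ]; then
        return 0    # data is mounted at run time, so skip the file check
    fi
    # Outside Docker: only enable when the model data file is really there.
    [ -f "$CVM_LARGEDATA_DIR/$1" ]
}
```

With CVM_LARGEDATA_DIR set and CVM_IN_DOCKER unset, `use_largedata_dir model/mycvm/vp.dat` succeeds only if that file exists under the data directory.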
The basic data source logic in mycvm/data/Makefile.am is:
If WITH_mycvm_CVM_LARGEDATA_DIR is defined:
    link with the right dataset locally
If WITH_mycvm_CVM_LARGEDATA_DIR is not defined:
    download the dataset from remote source (CARC/globus endpoint)
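The link-or-download choice can be sketched in shell as follows. This is a hedged illustration rather than the real Makefile.am rule: the function name link_or_download is hypothetical, linking the whole model directory (rather than individual files) is an assumption, and the download branch is only a placeholder because the remote endpoint details are not given here.

```shell
#!/bin/sh
# Sketch of the data step for a hypothetical model "mycvm".
link_or_download() {
    # $1: model name (hypothetical, e.g. mycvm)
    # $2: directory in which the model data should appear
    # $3: "yes" when the large-data conditional was enabled at configure time
    if [ "$3" = "yes" ]; then
        # Point at the shared copy instead of duplicating it.
        ln -sfn "$CVM_LARGEDATA_DIR/model/$1" "$2/$1"
        echo "linked"
    else
        # Placeholder: the real build fetches the dataset from the remote source.
        echo "download"
    fi
}
```

Because the shared data is linked rather than copied, removing or rebuilding the UCVM install tree does not touch the dataset under CVM_LARGEDATA_DIR.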
Not every CVM needs this data source logic: its data may be sparse or small, or the model may be rule-based.
The CVM_IN_DOCKER environment variable is needed because the data directory is mounted when the container is run, not when the UCVM image is being built, so the model data files cannot be checked at build time.
Example: CVM explorer
Inside the container, the large data directory has its own local path, so CVM_LARGEDATA_DIR needs to be set to that path during 'docker compose --build' for the CVMs in UCVM to pick it up.
So, in cvm_web/setup_cvm_web.sh, add:
export CVM_LARGEDATA_DIR=/usr/local/share/cvm-largedata-dir
export CVM_IN_DOCKER='#'
and in Dockerfile-web, add:
ENV CVM_LARGEDATA_DIR=/usr/local/share/cvm-largedata-dir
ENV CVM_IN_DOCKER='#'
and in development.yml, add the volume mount so that a running container can find it:
web:
  volumes:
    - ./:/app
    - ./custom-php.ini:/etc/php.d/custom-php.ini
    - ./cvm-result:/usr/local/share/ucvm/cvm-result
    - /var/www/html/CVM_DATASET_DIRECTORY:/usr/local/share/cvm-largedata-dir:ro
  restart: unless-stopped