UCVMC MPI Testing
Confirm MPI executables are built correctly
First, I checked the ucvmc/bin directory to confirm that the MPI executables were built.
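A quick way to do that check is to list the MPI-enabled binaries by their -mpi suffix (a sketch; the path is the shared-filesystem install location used in this test):

```
# List the MPI-enabled executables installed under the UCVMC bin directory
ls -l /home/scec-00/maechlin/ucvmc/bin/*-mpi
```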
When I try to run ./ucvm2mesh-mpi without parameters, I get the following:
```
-bash-4.2$ ./ucvm2mesh-mpi
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PMI2_Job_GetId failed failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[hpc-login2:16596] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
```
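These PMI2/orte_init errors are consistent with running an Open MPI executable directly on the login node, outside of a scheduler allocation, so they do not by themselves indicate a build problem. A minimal Slurm sketch for launching it on compute nodes instead (the job name, node/task counts, and the -f config argument are illustrative assumptions, not a tested script):

```
#!/bin/bash
#SBATCH --job-name=ucvm2mesh-mpi-test
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --time=01:00:00

# Use the same Open MPI environment that the build used
source /usr/usc/openmpi/default/setup.sh

cd /home/scec-00/maechlin/ucvmc/bin

# Launch through the scheduler so MPI_Init can reach the resource manager;
# the -f option and config file name are hypothetical placeholders
srun --mpi=pmi2 ./ucvm2mesh-mpi -f ucvm2mesh_config.conf
```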
Building UCVMC on HPC with MPI
Install on a shared file system, because it is a large installation (25 GB):
- cd /home/scec-00/maechlin/
- git clone https://github.com/SCECcode/ucvmc.git
Then follow the installation instructions on the GitHub wiki.
I installed all models. Both "make check" and the example ucvm_query run worked. I also confirmed that the MPI executables were built, including ucvm2mesh-mpi, which is the one we want to test.
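For reference, those checks look roughly like the following (a sketch only: the model name, test coordinates, and configuration file path are illustrative assumptions, not taken from the wiki):

```
cd /home/scec-00/maechlin/ucvmc

# Run the packaged tests
make check

# Query one point (lon lat depth) against an installed model; the model name
# (-m cvms), the coordinates, and the conf path are illustrative assumptions
echo "-118.0 34.0 0.0" | ./bin/ucvm_query -f ./conf/ucvm.conf -m cvms
```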
UCVMC installation
The UCVMC installation should detect whether MPI is available and, if so, build the MPI codes, including ucvm2mesh-mpi.
The test environment is the USC HPC cluster. First, configure .bash_profile to source the openmpi setup script:
```
#
# Setup MPI
#
if [ -e /usr/usc/openmpi/default/setup.sh ]; then
    source /usr/usc/openmpi/default/setup.sh
fi
#
```
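After sourcing that script in a new login shell, a quick sanity check is to confirm that the Open MPI compiler wrappers and launcher resolve to the expected install (a sketch; mpicc -show is the Open MPI wrapper option that prints the underlying compiler command):

```
which mpicc mpirun    # should point into /usr/usc/openmpi/...
mpicc -show           # prints the compiler/link line the wrapper will use
```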
Then the env command shows that the paths to the openmpi installation have been added:
```
PATH=/home/scec-00/maechlin/ucvmc/lib/proj4/bin:/home/scec-00/maechlin/anaconda2/bin:/usr/usc/openmpi/1.8.8/slurm/bin:/usr/lib64/qt-3.3/bin:/opt/mam/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
PWD=/home/rcf-01/maechlin
LANG=en_US.UTF-8
BBP_GF_DIR=/home/scec-00/maechlin/bbp/default/bbp_gf
HISTCONTROL=ignoredups
KRB5CCNAME=FILE:/tmp/krb5cc_14364_IcUSJQ
OMPI_MCA_oob_tcp_if_exclude=lo,docker0,usb0,myri0
SHLVL=1
HOME=/home/rcf-01/maechlin
OMPI_CC=gcc
OMPI_MCA_btl_openib_if_exclude=mlx4_0:2
PYTHONPATH=/home/scec-00/maechlin/bbp/default/bbp/bbp/comps:/home/scec-00/maechlin/ucvmc/utilities:/home/scec-00/maechlin/ucvmc/utilities/pycvm
OMPI_MCA_btl_openib_warn_nonexistent_if=0
LOGNAME=maechlin
BBP_VAL_DIR=/home/scec-00/maechlin/bbp/default/bbp_val
QTLIB=/usr/lib64/qt-3.3/lib
CVS_RSH=ssh
SSH_CONNECTION=47.39.67.178 50365 68.181.205.206 22
OMPI_CXX=g++
LESSOPEN=||/usr/bin/lesspipe.sh %s
XDG_RUNTIME_DIR=/run/user/14364
DISPLAY=localhost:16.0
BBP_DATA_DIR=/home/scec-00/maechlin/bbp/default/bbp_data
OMPI_MCA_btl=^scif
_=/usr/bin/env
```
I then removed the lines in my .bash_profile that set up openmpi and confirmed that, in that case, the OMPI_* variables are not set and the openmpi directory is no longer in my PATH.
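A quick way to check either state is to filter the env output for the Open MPI entries shown above (a sketch):

```
# Lists the OMPI_* variables and any openmpi entries in PATH; with the
# .bash_profile setup lines removed, a new login shell should print nothing
env | grep -i -e '^OMPI_' -e openmpi
```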