UCVMC MPI Testing


Building UCVMC on USC HPC with MPI

Install UCVMC on a shared file system, because it is a large installation (25 GB).

Then follow the installation instructions on the GitHub wiki:
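
For reference, the first steps look roughly like the following. This is only a sketch: the repository URL is an assumption, the install path is the shared file system location used later on this page, and the authoritative build steps are the ones on the GitHub wiki.

# Sketch only: clone UCVMC onto the shared file system before building
cd /home/scec-00/maechlin
df -h .                                          # confirm roughly 25GB of free space
git clone https://github.com/SCECcode/UCVMC.git ucvmc
cd ucvmc
# ...then continue with the install steps documented on the GitHub wiki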

I installed all models. Both the "make check" tests and the example ucvm_query run worked. I also confirmed that the MPI executables were built, including ucvm2mesh-mpi, which we want to test.

Confirm MPI executables are built correctly

First, I checked the ucvmc/bin directory to confirm that the MPI executables are built. I see the following:


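A directory listing is enough for this check; a sketch, with the install location taken from the PATH shown later on this page:

cd /home/scec-00/maechlin/ucvmc/bin      # assumed UCVMC install location
ls -l *mpi*                              # ucvm2mesh-mpi and the other MPI tools should appear
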
When I try to run ./ucvm2mesh-mpi without parameters, I get this:


-bash-4.2$ ./ucvm2mesh-mpi
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PMI2_Job_GetId failed failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[hpc-login2:16596] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
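
This looks like a launch problem rather than a build problem: the Open MPI under /usr/usc/openmpi/1.8.8/slurm is built against SLURM's PMI2, so running the binary directly on the login node, outside a SLURM allocation and without an MPI launcher, causes orte_init to fail. A proper test would submit the run through SLURM. The batch script below is only a sketch; the resource values, the configuration file name, and the -f argument are assumptions to be checked against the ucvm2mesh-mpi usage message.

#!/bin/bash
#SBATCH --nodes=2                        # assumed resources for a small test mesh
#SBATCH --ntasks=32
#SBATCH --time=01:00:00
source /usr/usc/openmpi/default/setup.sh
cd /home/scec-00/maechlin/ucvmc/bin
srun ./ucvm2mesh-mpi -f ucvm2mesh_test.conf    # srun (or mpirun) launches the MPI ranks inside the allocation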

UCVMC installation

The UCVMC installation should detect whether MPI is available and, if so, build the MPI codes, including ucvm2mesh-mpi.

The test environment is the USC HPC cluster. First, configure .bash_profile to source the OpenMPI setup script:

#
# Setup MPI
#
if [ -e /usr/usc/openmpi/default/setup.sh ]; then 
  source /usr/usc/openmpi/default/setup.sh
fi
#
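
After logging in again (or re-sourcing .bash_profile), a quick sanity check that the MPI toolchain is on the PATH can look like this (a sketch; the exact output depends on which Open MPI the setup script selects):

source ~/.bash_profile        # or start a fresh login shell
which mpicc mpirun            # both should resolve under /usr/usc/openmpi/.../bin
mpirun --version              # reports the Open MPI version (1.8.8 in this environment)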

Then the env command shows that the OpenMPI bin directory has been added to the PATH, along with the OMPI_* environment variables:

PATH=/home/scec-00/maechlin/ucvmc/lib/proj4/bin:/home/scec-00/maechlin/anaconda2/bin:/usr/usc/openmpi/1.8.8/slurm/bin:/usr/lib64/qt-3.3/bin:/opt/mam/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
PWD=/home/rcf-01/maechlin
LANG=en_US.UTF-8
BBP_GF_DIR=/home/scec-00/maechlin/bbp/default/bbp_gf
HISTCONTROL=ignoredups
KRB5CCNAME=FILE:/tmp/krb5cc_14364_IcUSJQ
OMPI_MCA_oob_tcp_if_exclude=lo,docker0,usb0,myri0
SHLVL=1
HOME=/home/rcf-01/maechlin
OMPI_CC=gcc
OMPI_MCA_btl_openib_if_exclude=mlx4_0:2
PYTHONPATH=/home/scec-00/maechlin/bbp/default/bbp/bbp/comps:/home/scec-00/maechlin/ucvmc/utilities:/home/scec-00/maechlin/ucvmc/utilities/pycvm
OMPI_MCA_btl_openib_warn_nonexistent_if=0
LOGNAME=maechlin
BBP_VAL_DIR=/home/scec-00/maechlin/bbp/default/bbp_val
QTLIB=/usr/lib64/qt-3.3/lib
CVS_RSH=ssh
SSH_CONNECTION=47.39.67.178 50365 68.181.205.206 22
OMPI_CXX=g++
LESSOPEN=||/usr/bin/lesspipe.sh %s
XDG_RUNTIME_DIR=/run/user/14364
DISPLAY=localhost:16.0
BBP_DATA_DIR=/home/scec-00/maechlin/bbp/default/bbp_data
OMPI_MCA_btl=^scif
_=/usr/bin/env

I removed the lines in my .bash_profile that set up OpenMPI, and I confirmed that in that case the OMPI_* variables are not set and the OpenMPI bin directory is not on my PATH.
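
That check can be as simple as the following; with the setup lines removed, both commands should print nothing:

env | grep OMPI               # no OMPI_* variables should be set
echo $PATH | grep openmpi     # the OpenMPI bin directory should not be on the PATH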

Related Entries