UCVM on Compute Nodes

Examples of running ucvm_query on compute nodes on Discovery.

Running on dedicated compute nodes

Reasons for using compute nodes

  • The login node (headnode) is shared by all users.
  • Compute nodes are dedicated to your job (while in use) and not shared.
  • HPC centers discourage running compute-intensive programs on the headnode.

Method 1 - Allocating Interactive Compute Nodes

Information To Prepare:

  • salloc - command that reserves a dedicated compute node for your program.
    • Using a dedicated compute node should let your program run faster than on the shared headnode.
  • Number of tasks - typically 1 unless running MPI codes
  • Expected max duration of program: format HH:MM:SS
    • HPC systems typically have a max runtime (e.g. 24:00:00 or 48:00:00).
    • Runtimes longer than the maximum require arrangements with the HPC system operators.
  • Allocation account - whose allocation will be charged for the computing time
    • CARC offers "no-cost" allocations to university researchers who request them.
    • An allocation also includes dedicated disk storage on the CARC /project filesystem (e.g. /project/maechlin_162, /project/scec_608).
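
If you are not sure which allocation accounts your username can charge, Slurm's sacctmgr can list them. A minimal sketch (the format fields are illustrative):

%sacctmgr show associations user=$USER format=Account,Cluster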

Example command for running a program on Discovery:

  • This reserves a single compute node, for 1 hour, using the SCEC allocation:
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
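
If your program needs more resources than the defaults, they can be requested explicitly. A sketch reusing the CPU and memory settings from the batch script later on this page (the values are illustrative, not requirements):

%salloc --ntasks=1 --cpus-per-task=2 --mem-per-cpu=4GB --time=1:00:00 --account=scec_608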

Wait until the system assigns you the requested nodes:

  • Your command line prompt will change when the nodes are assigned.
  • Run your program as you would on the command line:
    • Example profile queries (a full session sketch follows this list):
  • %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
  • %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
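
Putting the steps together, a minimal interactive session might look like the sketch below. It assumes the UCVM environment script shown in the Method 2 batch example, and that rpv.in holds one query point per line as "longitude latitude depth_in_meters" (the coordinates here are illustrative):

%salloc --ntasks=1 --time=1:00:00 --account=scec_608
(wait for the prompt to change to the assigned compute node)
%source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
%echo "-118.0 34.0 0.0" > rpv.in
%ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
%exit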

Method 2 - Submitting a Job Through the Queue Using a Slurm Batch Script

This method is best for running large jobs that require many nodes.

  • This will put your job into a system queue.
  • HPC systems have their own rules for prioritizing jobs in their queues:
    • Jobs do not necessarily run in the order they were submitted.
    • Short-running jobs may have priority.
    • Jobs requiring few nodes may have priority.
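
To see where a submitted job stands, Slurm can report an estimated start time and the job's priority (the details of sprio's output depend on the site's scheduler configuration; the job ID here is illustrative):

%squeue --start -u <username>
%sprio -j 4219152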

Submitting and monitoring your jobs requires Slurm commands.

  • CARC Slurm Examples
  • Commonly used Slurm commands:
    • sbatch <job file name> - submit a job to the queue
    • squeue -u <username> - check the state of a job in the queue (waiting, running, completed)
  • Note that Slurm writes a log file (and possibly an error file) when a job completes:
    • These files often provide useful information about what happened with your job.
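
Two other commands are often useful once a job is queued: scancel <jobid> cancels a queued or running job, and sacct -j <jobid> reports its state and elapsed time afterwards. For example, with the job ID from the session below:

%scancel 4219152
%sacct -j 4219152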

Example:

(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
           4219152      main ucvm_que maechlin  R       0:04      1 d05-10 

(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out

Information to Prepare:

  • Number of tasks - typically 1 unless running MPI codes
  • Expected max duration of program : Format HH:MM:SS
  • Allocation to charge for computing time
  • Create a Slurm job file (see the example below)

Example Slurm Script:

#!/bin/bash
# Request one task with 2 CPUs and 4 GB of memory per CPU, for up to
# 30 minutes, charged to the scec_608 allocation
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

# Set up the UCVM environment, then run the query
source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
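
Optional directives can name the job and control where Slurm writes its log file. A sketch of lines that could be added to the script above (%j expands to the job ID; the names are illustrative):

#SBATCH --job-name=ucvm_query
#SBATCH --output=ucvm_query.%j.out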

Submit the job using Slurm commands

  • Locate the example scripts in /home1/<username>/test_ucvm
    • Confirm the information in the Slurm job file:
    • %cat ucvm_query.job
  • %sbatch ucvm_query.job
  • %squeue -u maechlin
  • %cat rpv_cvmsi.out
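
When squeue no longer lists the job, it has finished; the Slurm log file mentioned above can then be checked for errors (the job ID is illustrative):

%cat slurm-4219152.out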
