UCVM on Compute Nodes

Examples of running UCVM Query on compute nodes on Discovery

Running on dedicated compute nodes

Reasons for using compute nodes

  • The login node (headnode) is shared by all users.
  • Compute nodes are dedicated to your job (while in use) and not shared.
  • HPC centers discourage running compute-intensive programs on the headnode.

Method 1 - Allocating Interactive Compute Nodes

Information To Prepare:

  • salloc - command that reserves a dedicated compute node for your program.
    • Using dedicated worker nodes should let your program run faster than on the shared headnode.
  • Number of tasks - typically 1 unless running MPI codes
  • Expected max duration of program: Format HH:MM:SS
    • HPC systems typically enforce a maximum runtime (e.g. 24:00:00 or 48:00:00).
    • You must make arrangements with the HPC system operators for longer runtimes.
  • Allocation account - whose allocation will be charged for computing time
    • CARC offers "no-cost" allocations to University researchers who request them.
    • The allocation will also include dedicated disk storage on the CARC /project filesystem (e.g. /project/maechlin_162, /project/scec_608).

Example command for running a program on Discovery:

  • This reserves a single compute node, for 1 hour, using the SCEC allocation:
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
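
When the allocation is granted, salloc prints status messages and opens a shell on the assigned compute node. The output should look roughly like the following (the job ID 12345 and the node name are illustrative placeholders):

  salloc: Pending job allocation 12345
  salloc: Granted job allocation 12345
  salloc: Nodes <nodename> are ready for job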

Wait until the system assigns you the requested nodes:

  • Your command line prompt will show you when the nodes are assigned.
  • Run your program as you would on the command line:
    • Example profile query
  • %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
  • %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
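
The contents of the input file rpv.in are not shown here. Assuming the standard ucvm_query input format of one query point per line (longitude, latitude, depth in meters), a small hypothetical input could look like:

  %cat rpv.in    # the points below are hypothetical examples
  -118.0 34.0 0.0
  -118.0 34.0 1000.0
  -118.0 34.0 5000.0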

Method 2 - Submit job through queue using slurm batch script

This method is best for running large jobs that require many nodes

  • This will put your job into a system queue.
  • HPC systems have their own rules for prioritizing jobs in their queues:
    • Jobs are not necessarily run in the order they were submitted.
    • Short-running jobs may have priority.
    • Jobs requiring few nodes may have priority.
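
Slurm includes an sprio command for inspecting the computed priority of pending jobs; whether it is enabled depends on the site's scheduler configuration, so treat this as something to try rather than a guaranteed Discovery feature:

  %sprio -u maechlin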

Submitting and monitoring your jobs requires slurm commands.

  • CARC Slurm Examples: https://carc.usc.edu/user-information/user-guides/high-performance-computing/slurm-templates

Information to Prepare:

  • Number of tasks - typically 1 unless running MPI codes
  • Expected max duration of program: Format HH:MM:SS
  • Allocation to charge for computing time
  • Create a slurm "job" file (a sketch of one is shown below)
  • Submit and monitor the job using slurm commands:
    • %cat ucvm_query.job
    • %sbatch ucvm_query.job
    • %squeue -u maechlin
    • %cat rpv_cvmsi.out
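
The contents of ucvm_query.job are not shown on this page. A minimal sketch of what it might contain, reusing the task count, runtime, allocation account, and ucvm_query command from the examples above (the output file name and the <username> placeholder are illustrative):

  #!/bin/bash
  #SBATCH --ntasks=1                  # number of tasks, as prepared above
  #SBATCH --time=1:00:00              # expected max duration (HH:MM:SS)
  #SBATCH --account=scec_608          # allocation to charge
  #SBATCH --output=ucvm_query.%j.log  # illustrative log name; %j is the job ID

  # Same query as in Method 1; <username> is a placeholder for your account
  ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out

After sbatch accepts the script, squeue shows the job's place in the queue, and once the job completes, rpv_cvmsi.out holds the query results.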

Related Entries