Difference between revisions of "UCVM on Compute Nodes"
From SCECpedia
Revision as of 04:06, 26 April 2021
Examples of running UCVM Query on compute nodes on Discovery
Running on dedicated compute nodes
Reasons for using compute nodes
- The login node (head node) is shared by all users.
- Compute nodes are dedicated to your job while it runs and are not shared.
- HPC centers discourage running programs on the head node.
Method 1 - Allocate Interactive Compute Nodes
Information To Prepare:
- salloc - command that reserves a dedicated compute node for your program.
- Dedicated worker nodes should let your program run faster than it would on the shared head node
- Number of tasks - typically 1 unless running MPI codes
- Expected maximum duration of the program, in HH:MM:SS format
- Longer runtimes can
- HPC systems typically enforce a maximum runtime (e.g. 24:00:00 or 48:00:00).
- You must make arrangements with the HPC system operators for longer runtimes
- Allocation account - whose allocation will be charged for computing time
- CARC offers "no-cost" allocations to University researchers that request them
- Allocation will also include dedicated disk storage on CARC /project filesystem (e.g. /project/maechlin_162 /project/scec_608)
Example command for running a program on Discovery:
- This reserves a single compute node, for 1 hour, using SCEC allocation
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
Wait until the system assigns you the requested nodes:
- Your command line prompt will show you when the nodes are assigned.
- Run your program as you would on the command line:
- Example profile query
- %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
- %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
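The profile queries above read their points from rpv.in, which is not shown on this page. A minimal sketch of how such an input file could be generated, assuming ucvm_query reads one "longitude latitude depth" triple per line from standard input. The coordinates and depth range below are hypothetical placeholders, not the values used in the original rpv.in.

```shell
#!/bin/bash
# Hypothetical site coordinates in decimal degrees; replace with your own point.
LON=-118.40
LAT=33.75

# Write one "lon lat depth" line per depth, from 0 m to 1000 m in 100 m steps.
for depth in $(seq 0 100 1000); do
    printf '%s %s %s\n' "$LON" "$LAT" "$depth"
done > rpv.in

# On a compute node with UCVM installed, the file can then feed the query, e.g.:
# ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
```

The query line is left commented out so the sketch can be run anywhere to inspect rpv.in before requesting a compute node.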
Method 2 - Submit job through queue using slurm batch script
This method is best for running large jobs that require many nodes
- This will put your job into a system queue
- HPC systems have their own rules for prioritizing jobs in their queues
- Your queue priority is not necessarily the order submitted
- short running jobs may have priority
- jobs requiring few nodes may have priority
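Submitting and monitoring your jobs requires Slurm commands:
- CARC Slurm Examples: https://carc.usc.edu/user-information/user-guides/high-performance-computing/slurm-templates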
Information to Prepare:
- Number of tasks - typically 1 unless running MPI codes
- Expected maximum duration of the program, in HH:MM:SS format
- Allocation to charge for computing time
- Create a Slurm "job" file
- Submit the job using Slurm commands
- %cat ucvm_query.job
- %sbatch ucvm_query.job
- %squeue -u maechlin
- %cat rpv_cvmsi.out
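The contents of ucvm_query.job are not reproduced on this page. The following is a minimal sketch of what such a job file might look like, reusing the task count, time limit, account, and query command from the Method 1 example. The log file name is an assumption, and the script as a whole is illustrative rather than the file actually used on Discovery.

```shell
#!/bin/bash
# Write a hypothetical ucvm_query.job; the real job file is not shown in the source page.
cat > ucvm_query.job <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --account=scec_608
#SBATCH --output=ucvm_query_%j.log

# Same profile query as in Method 1; adjust the conf path to your own installation.
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
EOF

# Submit and check the queue only where Slurm is available (e.g. a Discovery login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch ucvm_query.job
    squeue -u "$USER"
fi
```

The #SBATCH lines carry the same information listed under "Information to Prepare", so the job file is the batch equivalent of the salloc command line. Once squeue no longer lists the job, the results should be in rpv_cvmsi.out.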