UCVM on Compute Nodes
From SCECpedia
Latest revision as of 18:24, 26 April 2021
Examples of running UCVM Query on compute nodes on Discovery
Running on dedicated compute nodes
Reasons for using compute nodes
- The login node (headnode) is shared by all users.
- Compute nodes are dedicated to your job while it runs and are not shared.
- HPC centers discourage running programs on the headnode.
Method 1 - Allocate Interactive Compute Nodes
Information To Prepare:
- salloc - command that will reserve a dedicated compute node for your program.
- Using dedicated worker nodes should let your program run faster than on the shared headnode
- Number of tasks - typically 1 unless running MPI codes
- Expected max duration of program : Format HH:MM:SS
- Longer runtimes can
- HPC systems typically have max runtime (e.g. 24:00:00 or 48:00:00).
- You must make arrangements with the HPC system operators for longer runtimes
- allocation account - whose allocation will be charged for computing time
- CARC offers "no-cost" allocations to University researchers that request them
- Allocation will also include dedicated disk storage on CARC /project filesystem (e.g. /project/maechlin_162 /project/scec_608)
- CARC Website: https://carc.usc.edu/user-information/user-guides/high-performance-computing/running-jobs
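The items in this list map directly onto salloc options. As a minimal sketch, the helper below checks the runtime format and prints the resulting command for review instead of running it. The function name build_salloc_cmd is hypothetical and not part of Slurm or UCVM, and the format check is deliberately limited to the HH:MM:SS form used on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or UCVM): assemble an salloc command
# from the number of tasks, the max runtime, and the allocation account,
# then print it so it can be reviewed before use.
build_salloc_cmd() {
    local ntasks="$1" runtime="$2" account="$3"

    # Accept H:MM:SS or HH:MM:SS only, matching the examples on this page.
    if ! printf '%s\n' "$runtime" | grep -Eq '^[0-9]{1,2}:[0-5][0-9]:[0-5][0-9]$'; then
        echo "error: runtime must look like HH:MM:SS, got '$runtime'" >&2
        return 1
    fi

    # The number of tasks must be a positive integer.
    if ! printf '%s\n' "$ntasks" | grep -Eq '^[1-9][0-9]*$'; then
        echo "error: ntasks must be a positive integer, got '$ntasks'" >&2
        return 1
    fi

    echo "salloc --ntasks=${ntasks} --time=${runtime} --account=${account}"
}

# Example: one task, one hour, charged to the SCEC allocation.
build_salloc_cmd 1 1:00:00 scec_608
```

Printing the command rather than executing it keeps the sketch safe to try on a login node.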
Example command for running a program on Discovery:
- This reserves a single compute node, for 1 hour, using SCEC allocation
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
Wait until the system assigns you the requested nodes:
- Your command line prompt will show you when the nodes are assigned.
- Run your program as you would on the command line:
- Example profile query
- %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
- %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
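An easy mistake is to run the query from the login node before the allocation is granted or after it has expired. The guard below is a minimal sketch that relies on Slurm setting the SLURM_JOB_ID environment variable inside an allocation; the function name in_slurm_allocation is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeed only when the current shell is inside a Slurm job.
# Slurm exports SLURM_JOB_ID to shells started by salloc and to batch jobs.
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_allocation; then
    echo "Inside Slurm job ${SLURM_JOB_ID}; ucvm_query will run on the allocated node."
else
    echo "No Slurm allocation detected; run salloc first." >&2
fi
```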
Method 2 - Submit job through queue using slurm batch script
This method is best for running large jobs that require many nodes
- This will put your job into a system queue
- HPC systems have their own rules for prioritizing jobs in their queues
- Your queue priority is not necessarily the order submitted
- short running jobs may have priority
- jobs requiring few nodes may have priority
Submitting and monitoring your jobs requires Slurm commands
- CARC Slurm Examples
- Commonly used Slurm commands:
- sbatch <job file name> - Submit job to queue
- squeue -u <username> - check state of job in queue (waiting, running, completed)
- Note that Slurm writes a log file (and possibly an error file) when the job completes
- These files often provide useful information about what happened with your job
Example:
(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4219152 main ucvm_que maechlin R 0:04 1 d05-10
(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out
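The session above shows three things a script may need: the job ID printed by sbatch, the state code in the ST column of squeue, and the name of the Slurm log file. The sketch below handles each one. All three function names are hypothetical, and the state table covers only a few common codes.

```shell
#!/bin/bash
# Hypothetical helpers for working with sbatch and squeue output.

# Extract the numeric job ID from sbatch's "Submitted batch job <id>" line.
# Prints nothing if the line does not match (for example, an error message).
job_id_from_sbatch() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Default Slurm log file name for a job, as seen in the listing above.
slurm_log_name() {
    echo "slurm-$1.out"
}

# Describe a few common squeue state codes (the ST column).
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        *)  echo "other state: $1" ;;
    esac
}

# Example using the sbatch output line from the session above.
line="Submitted batch job 4219152"
id=$(job_id_from_sbatch "$line")
echo "job ${id} writes $(slurm_log_name "$id"); ST=R means $(describe_state R)"
```

On a real system the line variable would be filled with line=$(sbatch ucvm_query.job) instead of a fixed string.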
Information to Prepare:
- Number of tasks - typically 1 unless running MPI codes
- Expected max duration of program : Format HH:MM:SS
- Allocation to charge for computing time
- Put this information into a Slurm "job" file
- Example in /home1/<username>/test_ucvm
Example Slurm Script:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
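When the same query is run with different runtimes, accounts, or models, it can help to generate the job file instead of editing it by hand. The sketch below writes a job file with the same directives as the example script. The function name write_ucvm_job and its argument order are hypothetical, and it assumes the UCVM install directory contains conf/ucvm_env.sh and conf/ucvm.conf as in the paths used on this page.

```shell
#!/bin/bash
# Hypothetical generator: write a Slurm job file modeled on the example script.
# Arguments: <job file> <HH:MM:SS> <account> <UCVM install dir> <model>
write_ucvm_job() {
    local jobfile="$1" runtime="$2" account="$3" ucvm_dir="$4" model="$5"
    # Unquoted heredoc delimiter, so the ${...} values are filled in now.
    cat > "$jobfile" <<EOF
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=${runtime}
#SBATCH --account=${account}

source ${ucvm_dir}/conf/ucvm_env.sh
ucvm_query -f ${ucvm_dir}/conf/ucvm.conf -m ${model} < rpv.in > rpv_${model}.out
EOF
}

# Example: write the job file to a temporary location and display it.
jobfile=$(mktemp)
write_ucvm_job "$jobfile" 0:30:00 scec_608 /project/maechlin_162/ucvm_bin cvmsi
cat "$jobfile"
rm -f "$jobfile"
```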
Submit job using Slurm commands
- Locate example scripts in /home1/<username>/test_ucvm
- Confirm the information in the slurm job file
- %cat ucvm_query.job
- %sbatch ucvm_query.job
- %squeue -u <username>
Check the output results:
- %cat rpv_cvmsi.out
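Beyond reading the output by eye, a quick sanity check is to compare the number of input points with the number of output lines. The sketch below assumes ucvm_query writes one output line per input point, which should be confirmed against your UCVM version. The function name check_query_output is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: compare the count of non-empty input lines with the
# count of non-empty output lines.
# ASSUMPTION: ucvm_query writes one output line per input point.
check_query_output() {
    local infile="$1" outfile="$2" n_in n_out

    # The output file must exist and be non-empty.
    if [ ! -s "$outfile" ]; then
        echo "missing or empty: $outfile" >&2
        return 1
    fi

    # grep -c . counts lines containing at least one character.
    n_in=$(grep -c . "$infile" || true)
    n_out=$(grep -c . "$outfile" || true)

    if [ "$n_in" -ne "$n_out" ]; then
        echo "expected $n_in lines, found $n_out in $outfile" >&2
        return 1
    fi
    echo "ok: $n_out points in $outfile"
}

# Usage, from the directory holding the query files:
#   check_query_output rpv.in rpv_cvmsi.out
```

If the counts differ, the Slurm log file for the job is the first place to look.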