Difference between revisions of "UCVM on Compute Nodes"
From SCECpedia
Revision as of 04:19, 26 April 2021
Examples of running UCVM Query on compute nodes on Discovery
Contents
Running on dedicated compute nodes
Reasons for using compute nodes
- The login node (head node) is shared by all users.
- Compute nodes are dedicated to your job while it runs and are not shared.
- HPC centers discourage running programs on the head node.
Method 1 - Allocate Interactive Compute Nodes
Information To Prepare:
- salloc - command that reserves a dedicated compute node for your program.
- Using dedicated compute nodes should let your program run faster than on the shared head node
- Number of tasks - typically 1 unless running MPI codes
- Expected max duration of program : Format HH:MM:SS
- Longer runtimes can
- HPC systems typically have a maximum runtime (e.g. 24:00:00 or 48:00:00).
- You must make arrangements with the HPC system operators for longer runtimes
- allocation account - whose allocation will be charged for computing time
- CARC offers "no-cost" allocations to University researchers that request them
- Allocation will also include dedicated disk storage on CARC /project filesystem (e.g. /project/maechlin_162 /project/scec_608)
Example command for running a program on Discovery:
- This reserves a single compute node, for 1 hour, using SCEC allocation
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
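The items listed under "Information To Prepare" map directly onto the salloc options shown here. As a minimal sketch, the helper below (build_salloc is a hypothetical name, not part of Slurm or UCVM) checks that the time limit is written as H:MM:SS or HH:MM:SS and then prints the command rather than running it. Slurm itself accepts other time formats; the check only mirrors the format this page recommends.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not a Slurm or UCVM tool):
# prints an salloc command after a basic check of the time limit format.
# $1 = number of tasks, $2 = time limit (H:MM:SS or HH:MM:SS), $3 = account
build_salloc() {
    local ntasks="$1" walltime="$2" account="$3"
    case "$walltime" in
        [0-9]:[0-5][0-9]:[0-5][0-9] | [0-9][0-9]:[0-5][0-9]:[0-5][0-9]) ;;
        *)
            echo "error: time limit must be HH:MM:SS, got '$walltime'" >&2
            return 1
            ;;
    esac
    echo "salloc --ntasks=${ntasks} --time=${walltime} --account=${account}"
}

# Print the command for a single task, 1 hour, charged to scec_608.
build_salloc 1 1:00:00 scec_608
```

Printing the command first makes it easy to review the account and time limit before actually requesting the node.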
Wait until the system assigns you the requested nodes:
- Your command line prompt will show you when the nodes are assigned.
- Run your program as you would on the command line:
- Example profile query
- %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
- %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
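Before starting a long query it can help to confirm that the shell really is inside an allocation and not still on the head node. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets inside an allocation; the function name in_allocation is a hypothetical choice for this example.

```shell
#!/bin/bash
# Hypothetical guard: succeeds only when SLURM_JOB_ID is set and non-empty,
# which is the case inside a shell started by salloc or a batch job.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_allocation; then
    echo "Running in Slurm job ${SLURM_JOB_ID} on $(uname -n)"
else
    echo "No Slurm allocation detected; run salloc first" >&2
fi
```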
Method 2 - Submit job through queue using slurm batch script
This method is best for running large jobs that require many nodes
- This will put your job into a system queue
- HPC systems have their own rules for prioritizing jobs in their queues
- Your queue priority is not necessarily the order submitted
- short running jobs may have priority
- jobs requiring few nodes may have priority
Submitting and monitoring your jobs requires slurm commands
- CARC Slurm Examples
- Commonly used Slurm commands:
- sbatch <job file name> - Submit job to queue
- squeue -u <username> - check state of job in queue (waiting, running, completed)
- Note that slurm writes a log file (and possibly an error file) when the job completes.
- These files often provide useful information about what happened with your job.
Example:
(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4219152 main ucvm_que maechlin R 0:04 1 d05-10
(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out
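The submit-then-check cycle in this example can also be scripted. The following is a sketch only: wait_for_job is a hypothetical helper name, and it assumes that a job which no longer appears in squeue output has finished (it does not tell success from failure). It uses the squeue options -h (no header), -j (job ID), and -o "%T" (job state).

```shell
#!/bin/bash
# Hypothetical helper: poll squeue until the job no longer appears.
# $1 = job ID, $2 = polling interval in seconds (default 30)
wait_for_job() {
    local jobid="$1" interval="${2:-30}" state
    # Loop while squeue succeeds and still reports a state for this job.
    while state=$(squeue -h -j "$jobid" -o "%T" 2>/dev/null) && [ -n "$state" ]; do
        echo "job $jobid: $state"
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}

# Usage on a Slurm system (left commented so the sketch has no side effects):
# jobid=$(sbatch --parsable ucvm_query.job)   # --parsable prints just the job ID
# wait_for_job "$jobid"
# cat "slurm-${jobid}.out"                    # default slurm log file name
```

On multi-cluster setups sbatch --parsable may append a cluster name after a semicolon, so the job ID may need trimming there.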
Information to Prepare:
- Number of tasks - typically 1 unless running MPI codes
- Expected max duration of program : Format HH:MM:SS
- Allocation to charge for computing time
- Create slurm "job" file
Example Slurm Script:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
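If you run the same query with different accounts, installations, or velocity models, the job file can be generated rather than edited by hand. The sketch below is one possible way to do that. The function name write_ucvm_job and its argument order are assumptions made for this example, and the input file rpv.in and the rpv_<model>.out naming simply follow the example on this page.

```shell
#!/bin/bash
# Hypothetical generator: writes a slurm job file like the example script.
# $1 = job file to write, $2 = allocation account,
# $3 = UCVM install directory, $4 = velocity model name
write_ucvm_job() {
    local jobfile="$1" account="$2" ucvm_dir="$3" model="$4"
    cat > "$jobfile" <<EOF
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=${account}

source ${ucvm_dir}/conf/ucvm_env.sh
ucvm_query -f ${ucvm_dir}/conf/ucvm.conf -m ${model} < rpv.in > rpv_${model}.out
EOF
}

# Usage (left commented so the sketch writes nothing by default):
# write_ucvm_job ucvm_query.job scec_608 /project/maechlin_162/ucvm_bin cvmsi
```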
Submit job using slurm commands
- Locate example scripts in /home1/<username>/test_ucvm
- Confirm the information in the slurm job file
- %cat ucvm_query.job
- %sbatch ucvm_query.job
- %squeue -u maechlin
Check the output results:
- %cat rpv_cvmsi.out
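Beyond reading the output by eye, a quick sanity check is to compare line counts. This sketch assumes that ucvm_query writes one output line per input point, which should be confirmed for your UCVM version; check_line_counts is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: compare the number of lines in the input and output files.
# Assumes one output line per input point (an assumption, not verified here).
# $1 = input file, $2 = output file
check_line_counts() {
    local infile="$1" outfile="$2" n_in n_out
    # tr strips the padding some wc implementations add around the count.
    n_in=$(wc -l < "$infile" | tr -d ' ')
    n_out=$(wc -l < "$outfile" | tr -d ' ')
    if [ "$n_in" -eq "$n_out" ]; then
        echo "OK: $n_out lines"
    else
        echo "MISMATCH: $n_in input lines, $n_out output lines" >&2
        return 1
    fi
}

# Usage after the job completes (left commented so the sketch reads no files):
# check_line_counts rpv.in rpv_cvmsi.out
```

A mismatch is a hint to look at the slurm log file for errors before trusting the results.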