Examples of running UCVM Query on compute nodes on Discovery
== Running on dedicated compute nodes ==
Reasons for using compute nodes:
* The login node (headnode) is shared by all users.
* Compute nodes are dedicated to your job (while in use) and not shared.
* HPC centers discourage running programs on the headnode.
== Method 1 - Allocate Interactive Compute Nodes ==
Information To Prepare:
* salloc - the command that reserves a dedicated compute node for your program.
** Using dedicated worker nodes should let your program run faster than on the shared headnode.
* Number of tasks - typically 1 unless running MPI codes.
* Expected maximum duration of the program, in HH:MM:SS format.
** HPC systems typically have a maximum runtime (e.g. 24:00:00 or 48:00:00).
** You must make arrangements with the HPC system operators for longer runtimes.
* Allocation account - whose allocation will be charged for the computing time.
** CARC offers "no-cost" allocations to University researchers who request them.
** An allocation also includes dedicated disk storage on the CARC /project filesystem (e.g. /project/maechlin_162, /project/scec_608).
* [https://carc.usc.edu/user-information/user-guides/high-performance-computing/running-jobs CARC Website]
Example command for running a program on Discovery:
* This reserves a single compute node, for 1 hour, using the SCEC allocation.
<pre>
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
</pre>
Wait until the system assigns you the requested nodes:
* Your command line prompt will show you when the nodes are assigned.
* Run your program as you would on the command line.
** Example profile query (an input-file sketch follows this list):
* %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
* %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
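For reference, ucvm_query reads query points from standard input, one point per line, given as longitude, latitude, and depth in meters. A minimal sketch of what an input file like rpv.in could contain; these coordinates are illustrative, not taken from the original page:
<pre>
-118.0 34.0 0.0
-118.0 34.0 1000.0
-118.0 34.0 2000.0
</pre>
When you are finished with the interactive session, exiting the shell (%exit) releases the node back to the system.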
== Method 2 - Submit Job Through Queue Using Slurm Batch Script ==
This method is best for running large jobs that require many nodes.
* This will put your job into a system queue.
* HPC systems have their own rules for prioritizing jobs in their queues:
** Your queue priority is not necessarily the order submitted.
** Short-running jobs may have priority.
** Jobs requiring few nodes may have priority.
Submitting and monitoring your jobs requires Slurm commands.
* [https://carc.usc.edu/user-information/user-guides/high-performance-computing/slurm-templates CARC Slurm Examples]
* Commonly used Slurm commands:
** sbatch <job file name> - submit a job to the queue.
** squeue -u <username> - check the state of a job in the queue (waiting, running, completed).
** Note: Slurm writes a log file (and possibly an error file) when the job completes; these often provide useful information about what happened with your job.
Example:
<pre>
(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4219152      main ucvm_que maechlin  R       0:04      1 d05-10

(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out
</pre>
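The slurm-<jobid>.out log file captures what the job printed to the terminal and is the first place to look when a job misbehaves. Using the job ID from the example above:
<pre>
%cat slurm-4219152.out
</pre>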
Information to Prepare:
* Number of tasks - typically 1 unless running MPI codes.
* Expected maximum duration of the program, in HH:MM:SS format.
* Allocation to charge for the computing time.
* Put this information into a Slurm "job" file.
** Example in /home1/<username>/test_ucvm
Example Slurm Script:
<pre>
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
</pre>
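For MPI codes that need more than one task (the "Number of tasks" item above), the job file changes mainly in the task count and the launch command. A minimal sketch, where my_mpi_program and the task count of 32 are hypothetical placeholders, not part of the UCVM example:
<pre>
#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --time=1:00:00
#SBATCH --account=scec_608

# srun launches one process per Slurm task across the allocated nodes
srun ./my_mpi_program
</pre>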
Submit the job using Slurm commands:
* Locate example scripts in /home1/<username>/test_ucvm
** Confirm the information in the Slurm job file:
** %cat ucvm_query.job
* %sbatch ucvm_query.job
* %squeue -u maechlin
Check the output results:
* %cat rpv_cvmsi.out
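As a quick sanity check (not part of the original walkthrough), ucvm_query should write one output line per input query point, so the line counts of the two files can be compared:
<pre>
%wc -l rpv.in rpv_cvmsi.out
</pre>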