Examples of running UCVM Query on compute nodes on Discovery
 
== Running on dedicated compute nodes ==
Reasons for using compute nodes:
* The login node (headnode) is shared by all users.
* Compute nodes are dedicated to your job (while in use) and are not shared.
* HPC centers discourage running programs on the headnode.

== Method 1 - Allocate Interactive Compute Nodes ==
Information to prepare:
* salloc - the command that reserves a dedicated compute node for your program.
** Using a dedicated worker node should let your program run faster than on the shared headnode.
* Number of tasks - typically 1 unless running MPI codes (an MPI sketch follows the example below).
* Expected maximum duration of the program, in HH:MM:SS format.
** HPC systems typically have a maximum runtime (e.g. 24:00:00 or 48:00:00).
** You must make arrangements with the HPC system operators for longer runtimes.
* Allocation account - whose allocation will be charged for the computing time.
** CARC offers "no-cost" allocations to university researchers who request them.
** An allocation also includes dedicated disk storage on the CARC /project filesystem (e.g. /project/maechlin_162, /project/scec_608).
*[https://carc.usc.edu/user-information/user-guides/high-performance-computing/running-jobs CARC Website]

Example command for running a program on Discovery:
* This reserves a single compute node, for 1 hour, using the SCEC allocation:
<pre>
%salloc --ntasks=1 --time=1:00:00 --account=scec_608
</pre>
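
If you are running an MPI code, the same salloc call can request more tasks; the sketch below is a generic illustration (the task count and the program name my_mpi_program are placeholders, not part of the UCVM example):
<pre>
%salloc --ntasks=16 --time=1:00:00 --account=scec_608
# once the allocation is granted, srun launches the tasks across the reserved nodes
%srun ./my_mpi_program
</pre>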

Wait until the system assigns you the requested nodes:
* Your command-line prompt changes when the nodes are assigned.
* Run your program as you would on the command line. Example profile queries (the input-file format is sketched below):
* %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
* %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
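
For reference, ucvm_query reads query points from standard input, one point per line, as longitude, latitude, and depth in meters; rpv.in follows this layout. The points below are illustrative only, not the actual contents of rpv.in:
<pre>
-118.0000 34.0000 0.0
-118.0000 34.0000 1000.0
-118.0000 34.0000 5000.0
</pre>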

== Method 2 - Submit a job through the queue using a Slurm batch script ==
This method is best for large jobs that require many nodes.
* This puts your job into a system queue.
* HPC systems have their own rules for prioritizing jobs in their queues:
** Your queue priority is not necessarily the order of submission.
** Short-running jobs may have priority.
** Jobs requiring few nodes may have priority.

Submitting and monitoring your jobs requires Slurm commands:
*[https://carc.usc.edu/user-information/user-guides/high-performance-computing/slurm-templates CARC Slurm Examples]
* Commonly used Slurm commands (two more are sketched after this list):
** sbatch <job file name> - submit a job to the queue.
** squeue -u <username> - check the state of a job in the queue (waiting, running, completed).
** Note that Slurm writes a log file (and possibly an error file) when the job completes.
** These files often provide useful information about what happened with your job.
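
Two other standard Slurm commands are often useful alongside these; they are general Slurm features rather than part of the CARC example:
<pre>
%scancel <jobid>     # cancel a queued or running job
%sacct -j <jobid>    # show accounting information for a finished job
</pre>
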
Example:
<pre>
(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
            JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
          4219152      main ucvm_que maechlin  R      0:04      1 d05-10

(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out
</pre>
  
Information to prepare for the batch job:
* Number of tasks - typically 1 unless running MPI codes.
* Expected maximum duration of the program, in HH:MM:SS format.
* Allocation to charge for the computing time.
* Put this information into a Slurm "job" file.
** Example in /home1/<username>/test_ucvm
  
Example Slurm script (ucvm_query.job):
<pre>
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

# set up the UCVM environment, then run the query
source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
</pre>
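
If you want the Slurm log file to have a predictable name, you can add job-name and output directives to the script; these are standard sbatch options rather than something the UCVM example requires (%j expands to the job ID):
<pre>
#SBATCH --job-name=ucvm_query
#SBATCH --output=ucvm_query.%j.out
</pre>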
  
Submit the job using Slurm commands:
* Locate the example scripts in /home1/<username>/test_ucvm
** Confirm the information in the Slurm job file:
** %cat ucvm_query.job
* %sbatch ucvm_query.job
* %squeue -u maechlin
  
Check the output results:
* %cat rpv_cvmsi.out
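
As a quick sanity check (a general suggestion, not part of the original walkthrough), ucvm_query writes one line of material properties per input point, so the input and output line counts should match:
<pre>
%wc -l rpv.in rpv_cvmsi.out
</pre>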
  
 
== Related Entries ==
*[[UCVM Basin Query]]
*[[UCVM Training]]
*[[UCVM Plotting on Discovery]]
*[[Export XWindows to Client]]
