UCVM on Compute Nodes

From SCECpedia

Latest revision as of 18:24, 26 April 2021

Examples of running UCVM Query on compute nodes on Discovery

Running on dedicated compute nodes

Reasons for using compute nodes

  • The login node (headnode) is shared by all users.
  • Compute nodes are dedicated to your job while it runs and are not shared.
  • HPC centers discourage running programs on the headnode.

Method 1 - Allocate Interactive Compute Nodes

Information To Prepare:

  • salloc - command that reserves a dedicated compute node for your program.
    • Running on a dedicated worker node should let your program run faster than on the shared headnode.
  • Number of tasks - typically 1 unless running MPI codes
  • Expected maximum duration of the program, in HH:MM:SS format
    • Longer runtimes can
    • HPC systems typically have a maximum runtime (e.g. 24:00:00 or 48:00:00).
    • You must make arrangements with the HPC system operators for longer runtimes.
  • Allocation account - whose allocation will be charged for computing time
    • CARC offers "no-cost" allocations to University researchers that request them
    • Allocation will also include dedicated disk storage on CARC /project filesystem (e.g. /project/maechlin_162 /project/scec_608)
  • CARC Website: https://carc.usc.edu/user-information/user-guides/high-performance-computing/running-jobs
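
The HH:MM:SS time limit can be derived from a runtime estimate in minutes. The following is a minimal sketch, assuming a hypothetical helper named minutes_to_timelimit; it is not part of Slurm or UCVM.

```shell
#!/bin/bash
# Hypothetical helper: convert a runtime estimate in whole minutes
# into the H:MM:SS form used by the --time option in this page's examples.
minutes_to_timelimit() {
    minutes=$1
    printf '%d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}
```

For example, minutes_to_timelimit 90 prints 1:30:00, which could be passed to salloc as --time=1:30:00.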

Example command for running a program on Discovery:

  • This reserves a single compute node for 1 hour, charged to the SCEC allocation (scec_608)
%salloc --ntasks=1 --time=1:00:00 --account=scec_608

Wait until the system assigns you the requested nodes:

  • Your command line prompt will show you when the nodes are assigned.
  • Run your program as you would on the command line:
    • Example profile query
  • %ucvm_query -f /project/scec_608/<username>/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
  • %ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
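
Before running the query, it can help to confirm that the shell really is inside a Slurm allocation rather than on the headnode. The following is a minimal sketch, assuming Slurm sets the SLURM_JOB_ID environment variable in the shell started by salloc; the function names are hypothetical.

```shell
#!/bin/bash
# Succeeds only when the shell is inside a Slurm job or allocation.
# Assumption: Slurm exports SLURM_JOB_ID there; it is unset on the headnode.
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical wrapper: run a command only inside an allocation.
run_on_compute_node() {
    if in_slurm_allocation; then
        "$@"
    else
        echo "Not inside a Slurm allocation; run salloc first." >&2
        return 1
    fi
}
```

For example, run_on_compute_node hostname prints the node name inside an allocation and an error message otherwise.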

Method 2 - Submit job through queue using slurm batch script

This method is best for running large jobs that require many nodes.

  • This will put your job into a system queue.
  • HPC systems have their own rules for prioritizing jobs in their queues.
    • Your queue priority is not necessarily the order in which jobs were submitted.
    • Short-running jobs may have priority.
    • Jobs requiring few nodes may have priority.

Submitting and monitoring your jobs requires Slurm commands

  • CARC Slurm Examples: https://carc.usc.edu/user-information/user-guides/high-performance-computing/slurm-templates
  • Commonly used Slurm commands:
    • sbatch <job file name> - submit a job to the queue
    • squeue -u <username> - check the state of your jobs in the queue (waiting, running, completed)
    • Note that Slurm writes a log file (and possibly an error file) for each job, e.g. slurm-4219152.out in the example below.
    • These files often provide useful information about what happened with your job.

Example:

(base) [maechlin@discovery1 test_ucvm]$ sbatch ucvm_query.job
Submitted batch job 4219152
(base) [maechlin@discovery1 test_ucvm]$ squeue -u maechlin
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
           4219152      main ucvm_que maechlin  R       0:04      1 d05-10 

(base) [maechlin@discovery1 test_ucvm]$ ls -c1 *.out
rpv_cvmsi.out
slurm-4219152.out
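
The transcript above follows a pattern: sbatch reports a job ID, and the log file is named after that ID. The following is a minimal sketch of extracting the ID, assuming the "Submitted batch job <id>" message and the slurm-<id>.out naming shown above; the function names are hypothetical.

```shell
#!/bin/bash
# Print the job ID from an sbatch message such as
# "Submitted batch job 4219152"; fail on any other text.
job_id_from_sbatch() {
    case "$1" in
        "Submitted batch job "[0-9]*)
            id=${1#Submitted batch job }
            printf '%s\n' "${id%% *}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Print the log file name used in the transcript above for a job ID.
slurm_log_name() {
    printf 'slurm-%s.out\n' "$1"
}
```

A submission could then be written as id=$(job_id_from_sbatch "$(sbatch ucvm_query.job)"), followed by cat "$(slurm_log_name "$id")" once the job has finished.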

Information to Prepare:

  • Number of tasks - typically 1 unless running MPI codes
  • Expected maximum duration of the program, in HH:MM:SS format
  • Allocation to charge for computing time
  • Put this information into a Slurm "job" file
    • Example in /home1/<username>/test_ucvm

Example Slurm Script:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=0:30:00
#SBATCH --account=scec_608

source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
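
The account and time limit are the values most likely to change between users. The following is a minimal sketch that writes the same job file from parameters, assuming a hypothetical function name write_ucvm_job and the paths from the example above.

```shell
#!/bin/bash
# Hypothetical helper: write a Slurm job file like the example above.
# Arguments: <account> <time limit H:MM:SS> <job file to create>
write_ucvm_job() {
    account=$1
    timelimit=$2
    jobfile=$3
    cat > "$jobfile" <<EOF
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=$timelimit
#SBATCH --account=$account

source /project/maechlin_162/ucvm_bin/conf/ucvm_env.sh
ucvm_query -f /project/maechlin_162/ucvm_bin/conf/ucvm.conf -m cvmsi < rpv.in > rpv_cvmsi.out
EOF
}
```

Calling write_ucvm_job scec_608 0:30:00 ucvm_query.job reproduces the example script, which can then be submitted with sbatch as usual.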

Submit job using Slurm commands

  • Locate example scripts in /home1/<username>/test_ucvm
    • Confirm the information in the slurm job file
    • %cat ucvm_query.job
  • %sbatch ucvm_query.job
  • %squeue -u maechlin

Check the output results:

  • %cat rpv_cvmsi.out
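
Beyond viewing the file, a quick sanity check is to compare the number of input points with the number of output lines. The following is a minimal sketch, assuming ucvm_query writes one output line per input point (an assumption not stated on this page) and using a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical check: succeed when the output file has as many
# non-empty lines as the input file.
# Assumption: ucvm_query writes one output line per input point.
check_query_output() {
    n_in=$(grep -c . "$1")
    n_out=$(grep -c . "$2")
    if [ "$n_in" -eq "$n_out" ] && [ "$n_out" -gt 0 ]; then
        echo "OK: $n_out lines"
    else
        echo "Mismatch: $n_in input lines, $n_out output lines" >&2
        return 1
    fi
}
```

For the example on this page, the call would be check_query_output rpv.in rpv_cvmsi.out.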

Related Entries