Comparison of Compute Nodes

From SCECpedia

Option 1

Option 1: Nvidia GPU – its strength is in processing power (FLOPS) and memory bandwidth.

  • Pros:
    • Ideal for multithreaded/parallel and floating-point code
    • CUDA programming allows for control over memory hierarchy, data movement and synchronization
    • Third-party applications such as MATLAB and R have CUDA hooks available, so users only have to activate them
    • Standard compilers and parallel programming models can take advantage of GPUs
    • Still requires an x86 chipset, so nodes with GPUs can still be used by jobs running on x86
    • Well documented; tutorials and courses are available on the web
  • Cons:
    • Not efficient for graph algorithms, sparse linear algebra, searches and sorts
    • Programmers need to learn CUDA to take full advantage of the GPU; otherwise code could actually run slower than on the x86 chipset
    • Programmers must think about the best use of on-device memory and about data transfers
    • Requires increased power and cooling
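
The data-transfer concern in the cons, and the "standard compilers and parallel programming models" route in the pros, can be sketched without CUDA. The following is a minimal sketch, assuming a compiler with OpenMP 4.5 or later and accelerator offload support; the function name `saxpy` and its arguments are hypothetical and not taken from any SCEC code. The `map` clauses spell out which arrays are copied to the device and which are copied back. If OpenMP is disabled or no device is present, the loop simply runs on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: y = a*x + y, offloaded to an accelerator when one is available.
// Assumes x and y have the same length.
// The map clauses make the host<->device transfers explicit:
//   map(to: ...)     copies x to the device only
//   map(tofrom: ...) copies y to the device and back to the host
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Every call pays for the transfers named in the `map` clauses, so a kernel this small would likely run slower on the device than on the host; the offload only pays off when enough work is done per byte moved.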

Option 2

Option 2: Intel Xeon PHI

  • Pros:
    • A simple recompile can potentially enable PHI
    • Standard compilers and parallel programming models can take advantage of PHI
    • Coding for PHI is similar to standard x86 coding – porting relatively straightforward (according to TACC)
    • Still requires an x86 chipset, so nodes with accelerators can still be used by jobs running on x86
    • Wider vector units and a higher hardware thread count
    • TACC (Texas Advanced Computing Center) has a large PHI center, provides documentation, and is collaborative
    • Code optimized for PHI should also result in optimized CPU utilization
  • Cons:
    • Programmers typically need to learn PHI programming to take proper advantage of PHI; otherwise code could actually run slower than on the x86 chipset
    • Programmers must think about the best use of on-device memory and about data transfers – performance requires effort
    • Has three runtime modes (PHI only; MPI on host and PHI; MPI on host with offload to PHI), which is challenging both administratively and for end users
    • The coprocessor runs a stripped-down Linux operating system, which adds administrative overhead to give compute jobs seamless access to data, applications and user code
    • Intel is redesigning PHI and will be releasing a newly architected version sometime in Q# 201#
    • Requires increased power and cooling
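
The claim that code optimized for PHI also makes better use of the CPU can be illustrated with a loop that exposes both thread-level and vector-level parallelism. This is a minimal sketch, assuming an OpenMP 4.0 or later compiler; the function name `dot` is hypothetical. The same source is meant to compile for the host or the coprocessor, and without OpenMP enabled it runs as a plain serial loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel: dot product written so that threads split the
// iterations and each thread's chunk can be vectorized.
// Assumes a and b have the same length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    const double* ap = a.data();
    const double* bp = b.data();
    double sum = 0.0;
    // reduction(+:sum) gives each thread a private partial sum and
    // combines the partial sums when the loop finishes.
    #pragma omp parallel for simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}
```

Because the directive only describes the parallelism, wider vector units and higher thread counts can be used where they exist without changing the loop body.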

Option 3

Option 3: No accelerator – use x86 chipset

  • Pros:
    • General purpose and already supported by existing applications and general code
    • No additional cost, so more x86 nodes can be obtained with the same budget
    • No additional power or cooling requirements, so potentially more nodes fit in the existing infrastructure footprint
  • Cons:
    • Cutting-edge technology is not available for researchers to take advantage of
    • No testbed available for scaling up to national centers, which use accelerators for larger multiprocessor jobs

Related Links