Comparison of Compute Nodes
Option 1
Option 1: Nvidia GPU – strength is in processing power (flops) and memory bandwidth.
- Pros:
- Ideal for multi-threaded/parallel and floating-point code
- CUDA programming gives explicit control over the memory hierarchy, data movement, and synchronization (a minimal sketch follows this list)
- Third-party software applications (e.g., Matlab, R) already have CUDA hooks available, so users just have to activate them
- Standard compilers and parallel programming models can take advantage of GPUs
- GPU nodes still include an x86 host, so they can also be utilized by jobs running plain x86 code
- Well documented; tutorials and courses are available on the web
- Cons:
- Not efficient for graph algorithms, sparse linear algebra, searches and sorts
- Programmers need to learn the CUDA programming language to take full advantage of the GPU, or code could actually run slower than on the x86 host
- Programmers must think about how best to use device memory and manage host-device data transfers
- Requires increased power and cooling
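The following is a minimal CUDA sketch of the explicit control mentioned above; it is not taken from any SCEC code, and the kernel name scale_tile and the sizes are illustrative. The programmer allocates device memory, copies data across the PCIe bus, stages values in fast on-chip shared memory, and synchronizes both within a thread block and between host and device. It assumes the CUDA toolkit (nvcc) is installed.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Illustrative kernel: each block stages a tile of the input in fast
// on-chip shared memory, synchronizes, then writes a scaled copy out.
__global__ void scale_tile(const float *in, float *out, float alpha, int n)
{
    __shared__ float tile[256];               // programmer-managed shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];     // global -> shared
    __syncthreads();                          // explicit block-level synchronization
    if (i < n) out[i] = alpha * tile[threadIdx.x];
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);                 // explicit device allocations
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);    // explicit host -> device copy

    scale_tile<<<(n + 255) / 256, 256>>>(d_in, d_out, 2.0f, n);
    cudaDeviceSynchronize();                  // host waits for the GPU to finish

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);  // explicit device -> host copy
    printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}

Compiled with nvcc, this illustrates both the strength and the burden listed above: the data movement and synchronization that third-party libraries with CUDA hooks handle automatically must be managed by hand in custom code.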
Option 2
Option 2: Intel Xeon Phi
- Pros:
- Potentially, a simple recompile is all that is needed to enable the Phi
- Standard compilers and parallel programming models can take advantage of the Phi
- Coding for the Phi is similar to standard x86 coding, so porting is relatively straightforward (according to TACC)
- Phi nodes still include an x86 host, so they can also be utilized by jobs running plain x86 code
- Wider vector units and a higher hardware thread count than standard x86 cores
- TACC (Texas Advanced Computing Center) has a large Phi deployment, provides documentation, and is collaborative
- Code optimized for the Phi should also result in better-optimized CPU code
- Cons:
- Programmers typically need to learn Phi-specific programming to properly take advantage of the Phi, or code could actually run slower than on the x86 host
- Programmers must think about how best to use device memory and manage data transfers; good performance requires effort
- Has three runtime modes – native (everything runs on the Phi), symmetric (MPI on both host and Phi), and offload (MPI on the host offloading work to the Phi) – which is administratively challenging and confusing for end users (a sketch of the offload mode follows this list)
- The coprocessor runs a stripped-down Linux operating system, which requires administrative overhead to seamlessly provide compute jobs with access to data, applications, and user code
- Intel is redesigning the Phi and will release a newly architected version sometime in Q# 201#
- Requires increased power and cooling
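As a rough illustration of the offload mode noted above, the sketch below uses standard OpenMP 4.0 target directives, which Intel's compilers accept for the Phi (Intel's own offload pragmas are an alternative); the arrays, sizes, and loop are purely illustrative and not taken from any SCEC code. If no coprocessor is present, the target region simply falls back to the host, which is what makes the "simple recompile" path plausible, and the same source can also be built to run natively on the Phi.

#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
    float *a = (float *)malloc(N * sizeof(float));
    float *b = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) a[i] = 1.0f;

    /* Offload mode: the host maps array a to the coprocessor, runs the
       threaded, vectorized loop there, and maps the result b back.
       If no device is available, the region executes on the host x86 cores. */
    #pragma omp target map(to: a[0:N]) map(from: b[0:N])
    #pragma omp parallel for simd
    for (int i = 0; i < N; ++i)
        b[i] = 2.0f * a[i];

    printf("b[0] = %f\n", b[0]);
    free(a);
    free(b);
    return 0;
}

The same threading and vectorization pragmas that feed the Phi's wide vector units also benefit the host CPU, which is the basis for the claim above that Phi-optimized code yields better-optimized CPU code.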
Option 3
Option 3: No accelerator – use the x86 chipset
- Pros:
- General purpose and already supported by existing applications and general code
- No additional cost, so more x86 nodes can be obtained with the same budget
- No additional power or cooling requirements, so potentially more nodes fit within the existing infrastructure footprint
- Cons:
- Cutting-edge technology is not available for researchers to take advantage of
- No local testbed for scaling up to national centers that use accelerators for larger multi-processor jobs