Maechlin: Created page with ' == Option 1 == Option 1: Nvidia GPU – strength is in processing power (flops) and memory bandwidth. *Pros: Ideal for multi threaded/parallel and floating point code C…'

2015-06-25T19:48:05Z

== Option 1 ==
Option 1: NVIDIA GPU – its strengths are processing power (FLOPS) and memory bandwidth.

*Pros:
** Ideal for multithreaded/parallel and floating-point code
** CUDA programming allows for control over memory hierarchy, data movement and synchronization
** Third-party applications such as MATLAB and R have CUDA hooks available, so users only have to activate them
** Standard compilers and parallel programming models can take advantage of GPUs
** Still requires an x86 host, so nodes with GPUs can still be used by x86-only jobs
** Well documented, with tutorials and courses available on the web

*Cons:
** Not efficient for graph algorithms, sparse linear algebra, searches and sorts
** Need to learn CUDA programming to take full advantage of the GPU; otherwise code could actually run slower than on the x86 CPU
** Programmers must think about best use of memory on device and data transfers
** Requires increased power and cooling
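
The points above about standard parallel programming models and about data transfers can be illustrated with a small sketch. It assumes OpenMP target offload as the programming model (CUDA itself would look different), and the function names <code>saxpy_offload</code> and <code>saxpy_bytes_transferred</code> are hypothetical. The <code>map</code> clauses spell out which arrays cross the host-device link; on a compiler without offload support the pragma is ignored or the loop falls back to the host, so the code still runs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: y[i] = a * x[i] + y[i] using OpenMP target offload.
 * map(to:)     copies x from host to device before the loop runs.
 * map(tofrom:) copies y to the device and back to the host afterwards.
 * Without offload support the loop simply executes on the host. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Bytes moved across the host-device link by one call:
 * x travels once (to the device), y travels twice (there and back). */
size_t saxpy_bytes_transferred(size_t n)
{
    return 3 * n * sizeof(float);
}
```

Each element here costs twelve bytes of transfer for only two floating-point operations, which is the kind of ratio where moving the data can outweigh the speedup from the accelerator unless the arrays stay on the device across several kernels.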

== Option 2 ==
Option 2: Intel Xeon Phi

*Pros:
** Potentially a simple recompile will enable the Phi
** Standard compilers and parallel programming models can take advantage of the Phi
** Coding for the Phi is similar to standard x86 coding – porting is relatively straightforward (according to TACC, the Texas Advanced Computing Center)
** Still requires an x86 host, so nodes with accelerators can still be used by x86-only jobs
** Wider vector units and a higher hardware thread count
** TACC has a large Phi installation, provides documentation, and is collaborative
** Code optimized for the Phi should also make better use of the CPU

*Cons:
** Typically need to learn Phi programming to take proper advantage of the Phi; otherwise code could actually run slower than on the x86 CPU
** Programmers must think about best use of memory on device and data transfers – performance requires effort
** Has three runtime modes (on the Phi only, MPI on host and Phi, and MPI on host with offload to the Phi), which is challenging both administratively and for end users
** The coprocessor runs a stripped-down Linux operating system, which adds administrative overhead to give compute jobs seamless access to data, applications and user code
** Intel is redesigning the Phi and will release a newly architected version sometime in Q# 201#
** Requires increased power and cooling
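
As a rough illustration of the "simple recompile" and "standard parallel programming models" points above, the sketch below is a plain OpenMP loop with nothing Phi-specific in the source. The function name <code>dot</code> is hypothetical, and how the same file is targeted at the coprocessor is an assumption to check against the Intel compiler documentation (on first-generation Phi this was a compiler switch for native builds). The <code>simd</code> clause asks the compiler to vectorize, which matters on wide vector units and helps on the host CPU as well.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: dot product parallelized across threads and
 * vectorized within each thread. The reduction clause gives every
 * thread a private partial sum and combines them at the end.
 * Built without OpenMP enabled, the pragma is ignored and the loop
 * runs serially with the same result. */
double dot(size_t n, const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the same source builds for the host, any threading or vectorization tuning done for the coprocessor carries over to ordinary x86 nodes.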

== Option 3 ==
Option 3: No accelerator – use x86 chipset only

*Pros:
** General purpose and already implemented by applications and general code
** No additional costs so can obtain more x86 nodes with same budget
** No additional power or cooling requirements, so potentially more nodes within the existing infrastructure footprint

*Cons:
** Cutting-edge technology is not available for researchers to take advantage of
** No testbed available for scaling up to national centers, which use accelerators for larger multiprocessor jobs

== Related Links ==
*[[CME Project]]
*[[Main Page]]

Comparison of Compute Nodes - Revision history
