<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://strike.scec.org/scecwiki/index.php?action=history&amp;feed=atom&amp;title=Comparison_of_Compute_Nodes</id>
	<title>Comparison of Compute Nodes - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://strike.scec.org/scecwiki/index.php?action=history&amp;feed=atom&amp;title=Comparison_of_Compute_Nodes"/>
	<link rel="alternate" type="text/html" href="https://strike.scec.org/scecwiki/index.php?title=Comparison_of_Compute_Nodes&amp;action=history"/>
	<updated>2026-04-29T00:44:51Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.2</generator>
	<entry>
		<id>https://strike.scec.org/scecwiki/index.php?title=Comparison_of_Compute_Nodes&amp;diff=13553&amp;oldid=prev</id>
		<title>Maechlin: Created page with ' == Option 1 == Option 1:  Nvidia GPU – strength is in processing power (flops) and memory bandwidth.   *Pros: ** Ideal for multi threaded/parallel and floating point code ** C…'</title>
		<link rel="alternate" type="text/html" href="https://strike.scec.org/scecwiki/index.php?title=Comparison_of_Compute_Nodes&amp;diff=13553&amp;oldid=prev"/>
		<updated>2015-06-25T19:48:05Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;#039; == Option 1 == Option 1:  Nvidia GPU – strength is in processing power (flops) and memory bandwidth.   *Pros: ** Ideal for multi threaded/parallel and floating point code ** C…&amp;#039;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
== Option 1 ==&lt;br /&gt;
Option 1: Nvidia GPU. Its strengths are floating-point throughput (FLOPS) and memory bandwidth.&lt;br /&gt;
&lt;br /&gt;
*Pros:&lt;br /&gt;
** Well suited to massively parallel, multithreaded, floating-point-intensive code&lt;br /&gt;
** CUDA gives programmers explicit control over the memory hierarchy, data movement, and synchronization&lt;br /&gt;
** Third-party applications such as MATLAB and R offer CUDA support that users only need to enable&lt;br /&gt;
** Standard compilers and parallel programming models can take advantage of GPUs&lt;br /&gt;
** GPU nodes still have an x86 host CPU, so they can also run ordinary x86 jobs&lt;br /&gt;
** Well documented, with tutorials and courses available on the web&lt;br /&gt;
&lt;br /&gt;
*Cons:&lt;br /&gt;
** Not well suited to irregular workloads such as graph algorithms, sparse linear algebra, searches, and sorts&lt;br /&gt;
** Users must learn CUDA programming to take full advantage of the GPU; otherwise code can actually run slower than on the x86 CPU&lt;br /&gt;
** Programmers must plan on-device memory use and host-to-device data transfers carefully&lt;br /&gt;
** Requires additional power and cooling&lt;br /&gt;
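&lt;br /&gt;
A simplified cost model shows why offloaded code can run slower than on the x86 host. The model is an illustrative assumption: it ignores any overlap of data transfers with computation. Under it, the GPU run pays for host-to-device (H2D) and device-to-host (D2H) copies on top of the kernel time, and S is the resulting speedup over the CPU-only run.&lt;br /&gt;
&lt;br /&gt;
&lt;math&gt;T_{\mathrm{GPU}} = T_{\mathrm{H2D}} + T_{\mathrm{kernel}} + T_{\mathrm{D2H}}, \qquad S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}&lt;/math&gt;&lt;br /&gt;
&lt;br /&gt;
Offloading pays off only when S exceeds 1. A fast kernel can still lose to the host if the copies dominate the total time.&lt;br /&gt;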
&lt;br /&gt;
== Option 2 ==&lt;br /&gt;
Option 2: Intel Xeon Phi coprocessor&lt;br /&gt;
&lt;br /&gt;
*Pros:&lt;br /&gt;
** In some cases a simple recompile is enough to run code on the Phi&lt;br /&gt;
** Standard compilers and parallel programming models (e.g., OpenMP and MPI) can take advantage of the Phi&lt;br /&gt;
** Coding for the Phi is similar to standard x86 coding, so porting is relatively straightforward (according to TACC)&lt;br /&gt;
** The Phi is attached to an x86 host, so accelerator nodes can still run ordinary x86 jobs&lt;br /&gt;
** Wider (512-bit) vector units and a much higher hardware thread count than a standard x86 CPU&lt;br /&gt;
** TACC (Texas Advanced Computing Center) operates a large Phi installation, provides documentation, and is open to collaboration&lt;br /&gt;
** Code optimized for the Phi (e.g., better vectorization and threading) generally runs better on the host CPU as well&lt;br /&gt;
&lt;br /&gt;
*Cons:&lt;br /&gt;
** Users typically need to learn Phi-specific programming techniques to take full advantage of the Phi; otherwise code can actually run slower than on the x86 CPU&lt;br /&gt;
** Programmers must plan on-device memory use and host-to-coprocessor data transfers carefully; good performance requires effort&lt;br /&gt;
** Has three execution modes: native (entirely on the Phi), symmetric (MPI ranks on both host and Phi), and offload (MPI on the host, offloading work to the Phi); supporting all three is challenging both administratively and for end users&lt;br /&gt;
** The coprocessor runs a stripped-down Linux operating system, which adds administrative overhead to give compute jobs seamless access to data, applications, and user code&lt;br /&gt;
** Intel is redesigning the Phi and will release a re-architected version sometime in Q# 201#&lt;br /&gt;
** Requires additional power and cooling&lt;br /&gt;
&lt;br /&gt;
== Option 3 ==&lt;br /&gt;
Option 3: No accelerator; use standard x86 CPUs only&lt;br /&gt;
&lt;br /&gt;
*Pros:&lt;br /&gt;
** General purpose; existing applications and user code already run on it&lt;br /&gt;
** No accelerator costs, so more x86 nodes can be purchased with the same budget&lt;br /&gt;
** No additional power or cooling requirements, so potentially more nodes fit within the existing infrastructure footprint&lt;br /&gt;
&lt;br /&gt;
*Cons:&lt;br /&gt;
** Researchers cannot take advantage of cutting-edge accelerator technology&lt;br /&gt;
** No local testbed for scaling up codes before running them at national centers, which use accelerators for larger multiprocessor jobs&lt;br /&gt;
&lt;br /&gt;
== Related Links ==&lt;br /&gt;
*[[CME Project]]&lt;br /&gt;
*[[Main Page]]&lt;/div&gt;</summary>
		<author><name>Maechlin</name></author>
		
	</entry>
</feed>