Difference between revisions of "CyberShake migration to Summit"

From SCECpedia

Revision as of 15:58, 4 April 2019

This page is being used to gather information on the effort to migrate CyberShake from Titan to Summit.

Status

  • PreCVM: OK
  • UCVM:
  • Smoothing:
  • PreSGT: OK
  • PreAWP: OK
  • AWP-GPU-SGT:

AWP-GPU-SGT

We arbitrarily selected site s2257 and ran the X-component (the AWP X-component, which corresponds to the RWG Y-component) on Summit, to compare against the results from Titan.

Numerical comparison

When comparing all SGT points (differences computed as test minus reference), we get:

Average diff = 1.317764e-12, average percent diff = -3.324251%
Largest diff of 0.000150 at float index 1718660606.
Largest percent diff of 5017404928.000000% at float index 8636382051.

When we consider only points with values greater than 1e-10 (peak SGT values are usually 1e-4 to 1e-3), we get:

Average diff = 1.317764e-12, average percent diff = 0.133713%, average absolute percent diff = 12.979661%
Largest diff of 0.000150 at float index 1718660606.
Largest percent diff of 7148245.500000% at float index 1626932526.

The average percent diff is much smaller than before. However, this time we also computed the average absolute percent diff, which reveals that many points differ by considerable amounts: positive and negative differences were canceling each other out in the signed average percent diff.
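The canceling effect can be reproduced on a toy data set. The following is a minimal Python sketch, not the comparison tool used to produce the numbers above. The function name `compare`, the sample values, and the choice to apply the cutoff to the absolute value of the reference point are all assumptions made for illustration.

```python
def compare(test, reference, threshold=0.0):
    """Summarize point-by-point differences (test minus reference).

    Assumption: a point is included only if abs(reference) > threshold.
    The source does not say exactly how its 1e-10 cutoff was applied.
    With threshold=0.0 this also skips exact zeros in the reference,
    which would otherwise cause a division by zero.
    """
    diffs = []  # (diff, index) pairs
    pcts = []   # (percent diff, index) pairs
    for i, (t, r) in enumerate(zip(test, reference)):
        if abs(r) <= threshold:
            continue
        d = t - r
        diffs.append((d, i))
        pcts.append((100.0 * d / r, i))
    n = len(diffs)
    if n == 0:
        raise ValueError("no points above threshold")
    return {
        "avg_diff": sum(d for d, _ in diffs) / n,
        "avg_pct": sum(p for p, _ in pcts) / n,
        "avg_abs_pct": sum(abs(p) for p, _ in pcts) / n,
        # Largest entries by magnitude, returned as (value, index).
        "largest_diff": max(diffs, key=lambda x: abs(x[0])),
        "largest_pct": max(pcts, key=lambda x: abs(x[0])),
    }


# Made-up sample values: two points differ by roughly +10% and -10%,
# one matches, and one is a near-zero point with a large relative error.
reference = [1.0e-4, 1.0e-4, 2.0e-4, 1.0e-12]
test = [1.1e-4, 0.9e-4, 2.0e-4, 3.0e-12]

all_points = compare(test, reference)
above_cutoff = compare(test, reference, threshold=1e-10)
print(all_points)
print(above_cutoff)
```

In this toy case the near-zero point dominates the percent statistics when all points are included. With the cutoff applied, the signed average percent diff is close to zero while the average absolute percent diff is not, which is the same pattern seen in the results above.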

As a result, we decided to investigate more closely what is causing the differences between the two systems.

Initial plots

We identified point ID 71610 as the point that contains the largest difference between the test and reference SGTs.
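One way to go from a float index to a point is integer division by the number of floats stored per point. The sketch below is illustrative only: the helper name `locate` and the layout are assumptions, not details taken from the source. It assumes a point-major file with a fixed 24000 floats per point. That hypothetical value happens to map float index 1718660606 to point 71610, but the actual SGT layout should be checked before relying on it.

```python
# Hypothetical layout: each SGT point stores a fixed number of
# consecutive floats. The value 24000 is an assumption for illustration.
FLOATS_PER_POINT = 24000


def locate(float_index, floats_per_point=FLOATS_PER_POINT):
    """Return (point index, offset within that point) for a float index."""
    return divmod(float_index, floats_per_point)


point, offset = locate(1718660606)
print(point, offset)
```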

Below are plots of


CUDA versions

The default version of CUDA on Titan is 9.1.85, and on Summit it is 9.2.148. We tried rebuilding the AWP code on Summit with CUDA 9.1.85 and GCC, then rerunning.