Distributed Simulated Annealing
Serial SA Algorithm
- s = s0; e = E(s)
- sbest = s; ebest = e
- k = 0
- while k < max_iterations:
  - snew = neighbour(s)
  - enew = E(snew)
  - if P(e, enew, temperature) > random() then
    - s = snew; e = enew
  - if enew < ebest
    - sbest = snew; ebest = enew
  - k++
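To make the pseudocode concrete, here is a minimal serial sketch in Java. The energy function, the neighbour() move, and the 1/(1+k) cooling schedule are placeholder assumptions for illustration only, not the actual OpenSHA/UCERF3 implementations; the acceptance rule is the standard Metropolis criterion, P(e, enew, T) = 1 if enew < e and exp((e - enew)/T) otherwise.

import java.util.Random;

public class SerialAnnealer {

    private final Random rand = new Random();

    // Placeholder energy: sum of squares. In UCERF3 the energy is a sparse
    // matrix-vector misfit calculation; this stand-in is hypothetical.
    double energy(double[] s) {
        double e = 0.0;
        for (double x : s)
            e += x * x;
        return e;
    }

    // Placeholder neighbour move: perturb one randomly chosen element.
    double[] neighbour(double[] s) {
        double[] snew = s.clone();
        snew[rand.nextInt(s.length)] += rand.nextGaussian();
        return snew;
    }

    // Standard Metropolis acceptance probability P(e, enew, T).
    double acceptanceProbability(double e, double enew, double temperature) {
        return (enew < e) ? 1.0 : Math.exp((e - enew) / temperature);
    }

    // Simple 1/(1+k) cooling schedule (an assumption; many schedules work).
    double temperature(long k) {
        return 1.0 / (1.0 + k);
    }

    // Serial SA loop, mirroring the pseudocode above; returns sbest.
    double[] anneal(double[] s0, long maxIterations) {
        double[] s = s0;
        double e = energy(s);
        double[] sbest = s;
        double ebest = e;
        for (long k = 0; k < maxIterations; k++) {
            double[] snew = neighbour(s);
            double enew = energy(snew);
            if (acceptanceProbability(e, enew, temperature(k)) > rand.nextDouble()) {
                s = snew;
                e = enew;
            }
            if (enew < ebest) {
                sbest = snew;
                ebest = enew;
            }
        }
        return sbest;
    }
}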
Parallel SA Algorithm
- s = s0; e = E(s)
- sbest = s; ebest = e
- k = 0
- while k < max_iterations:
  - on n processors, do nSubIterations iterations of serial SA
  - find processor with best overall (lowest energy) solution, sbest
  - redistribute sbest, ebest to all processors
  - k += nSubIterations
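The cluster-level loop can be sketched with MPJ Express as follows. Each rank runs nSubIterations of serial SA on its own copy of the solution, the per-rank energies are gathered so every rank can identify the processor holding the lowest-energy solution, and that solution is then broadcast to all ranks. The SerialAnnealer class, the problem size, and the iteration counts are the hypothetical placeholders from the previous sketch, not the actual OpenSHA classes.

import mpi.MPI;

public class DistributedAnnealer {

    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int size = MPI.COMM_WORLD.Size();

        long maxIterations = 1000000;  // illustrative values, not benchmarks
        long nSubIterations = 10000;
        int dim = 100;

        SerialAnnealer annealer = new SerialAnnealer();
        double[] s = new double[dim];  // s0: initial solution (zeros here)
        double e = annealer.energy(s);

        for (long k = 0; k < maxIterations; k += nSubIterations) {
            // On each processor, do nSubIterations iterations of serial SA.
            s = annealer.anneal(s, nSubIterations);
            e = annealer.energy(s);

            // Gather every rank's energy so each rank can find the best one.
            double[] energies = new double[size];
            MPI.COMM_WORLD.Allgather(new double[] {e}, 0, 1, MPI.DOUBLE,
                                     energies, 0, 1, MPI.DOUBLE);
            int bestRank = 0;
            for (int i = 1; i < size; i++)
                if (energies[i] < energies[bestRank])
                    bestRank = i;

            // Redistribute sbest (and its energy, ebest) to all processors.
            MPI.COMM_WORLD.Bcast(s, 0, dim, MPI.DOUBLE, bestRank);
            e = energies[bestRank];
        }

        if (MPI.COMM_WORLD.Rank() == 0)
            System.out.println("best energy: " + e);
        MPI.Finalize();
    }
}

Gathering all energies and then broadcasting from the arg-min rank keeps the selection explicit and deterministic in case of ties, at the cost of one extra collective per round compared with a single MINLOC-style reduction.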
Implementation
We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis that is being used to develop UCERF3. All benchmarking calculations presented here were performed on the USC HPCC cluster (http://www.usc.edu/hpcc/). Two levels of parallelization are used: cluster level and node level. Each HPCC node has 8 processors, so threading is used to take advantage of all of a node's processors. We determined that 4 threads/node was optimal, possibly because the parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becomes overloaded at 8 threads/node. For cluster-level parallelization we used MPJ Express (http://mpj-express.org/; Baker 2007), a Java-based MPI implementation.
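As a sketch of the node-level threading, the fragment below runs one serial SA chain per thread in a fixed-size pool and keeps the lowest-energy result. The 4 threads/node figure comes from the benchmarking described above; the pooling strategy and the SerialAnnealer placeholder are illustrative assumptions, not OpenSHA's actual threading code.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NodeLevelAnnealer {

    // 4 threads/node was optimal in our benchmarks (8 overloaded the
    // parallel sparse matrix multiplication used for the energy calculation).
    private static final int THREADS_PER_NODE = 4;

    // Runs one independent serial SA chain per thread from the same starting
    // solution and returns the lowest-energy result found by any thread.
    static double[] annealThreaded(final double[] s0, final long nSubIterations)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS_PER_NODE);
        List<Future<double[]>> futures = new ArrayList<Future<double[]>>();
        for (int t = 0; t < THREADS_PER_NODE; t++) {
            Callable<double[]> chain = new Callable<double[]>() {
                public double[] call() {
                    return new SerialAnnealer().anneal(s0.clone(), nSubIterations);
                }
            };
            futures.add(pool.submit(chain));
        }
        double[] best = null;
        double ebest = Double.POSITIVE_INFINITY;
        SerialAnnealer scorer = new SerialAnnealer();
        for (Future<double[]> f : futures) {
            double[] candidate = f.get();  // blocks until the chain finishes
            double e = scorer.energy(candidate);
            if (e < ebest) {
                ebest = e;
                best = candidate;
            }
        }
        pool.shutdown();
        return best;
    }
}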
Conclusions
The parallel simulated annealing algorithm clearly presents