Distributed Simulated Annealing
Serial SA Algorithm
- s = s0; e = E(s)
- sbest = s; ebest = e
- k = 0
- while k < max_iterations:
  - snew = neighbour(s)
  - enew = E(snew)
  - if P(e, enew, temperature) > random() then
    - s = snew; e = enew
  - if enew < ebest
    - sbest = snew; ebest = enew
  - k++
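To make the pseudocode concrete, here is a minimal serial sketch in Java. The energy function, the neighbour() move, and the 1/(1+k) cooling schedule are placeholder assumptions for illustration only, not the actual OpenSHA/UCERF3 implementations; the acceptance rule is the standard Metropolis criterion, P(e, enew, T) = 1 if enew < e and exp((e - enew)/T) otherwise.

import java.util.Random;

public class SerialAnnealer {

    private final Random rand = new Random();

    // Placeholder energy: sum of squares. In UCERF3 the energy is a sparse
    // matrix-vector misfit calculation; this stand-in is hypothetical.
    double energy(double[] s) {
        double e = 0.0;
        for (double x : s)
            e += x * x;
        return e;
    }

    // Placeholder neighbour move: perturb one randomly chosen element.
    double[] neighbour(double[] s) {
        double[] snew = s.clone();
        snew[rand.nextInt(s.length)] += rand.nextGaussian();
        return snew;
    }

    // Standard Metropolis acceptance probability P(e, enew, T).
    double acceptanceProbability(double e, double enew, double temperature) {
        return (enew < e) ? 1.0 : Math.exp((e - enew) / temperature);
    }

    // Simple 1/(1+k) cooling schedule (an assumption; many schedules work).
    double temperature(long k) {
        return 1.0 / (1.0 + k);
    }

    // Serial SA loop, mirroring the pseudocode above; returns sbest.
    double[] anneal(double[] s0, long maxIterations) {
        double[] s = s0;
        double e = energy(s);
        double[] sbest = s;
        double ebest = e;
        for (long k = 0; k < maxIterations; k++) {
            double[] snew = neighbour(s);
            double enew = energy(snew);
            if (acceptanceProbability(e, enew, temperature(k)) > rand.nextDouble()) {
                s = snew;
                e = enew;
            }
            if (enew < ebest) {
                sbest = snew;
                ebest = enew;
            }
        }
        return sbest;
    }
}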
Parallel SA Algorithm
- s = s0; e = E(s)
- sbest = s; ebest = e
- k = 0
- while k < max_iterations:
  - on n processors, do nSubIterations iterations of serial SA
  - find processor with best overall (lowest energy) solution, sbest
  - redistribute sbest, ebest to all processors
  - k += nSubIterations
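The cluster-level loop can be sketched with MPJ Express as follows. Each rank runs nSubIterations of serial SA on its own copy of the solution, the per-rank energies are gathered so every rank can identify the processor holding the lowest-energy solution, and that solution is then broadcast to all ranks. The SerialAnnealer class, the problem size, and the iteration counts are the hypothetical placeholders from the previous sketch, not the actual OpenSHA classes.

import mpi.MPI;

public class DistributedAnnealer {

    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int size = MPI.COMM_WORLD.Size();

        long maxIterations = 1000000;  // illustrative values, not benchmarks
        long nSubIterations = 10000;
        int dim = 100;

        SerialAnnealer annealer = new SerialAnnealer();
        double[] s = new double[dim];  // s0: initial solution (zeros here)
        double e = annealer.energy(s);

        for (long k = 0; k < maxIterations; k += nSubIterations) {
            // On each processor, do nSubIterations iterations of serial SA.
            s = annealer.anneal(s, nSubIterations);
            e = annealer.energy(s);

            // Gather every rank's energy so each rank can find the best one.
            double[] energies = new double[size];
            MPI.COMM_WORLD.Allgather(new double[] {e}, 0, 1, MPI.DOUBLE,
                                     energies, 0, 1, MPI.DOUBLE);
            int bestRank = 0;
            for (int i = 1; i < size; i++)
                if (energies[i] < energies[bestRank])
                    bestRank = i;

            // Redistribute sbest (and its energy, ebest) to all processors.
            MPI.COMM_WORLD.Bcast(s, 0, dim, MPI.DOUBLE, bestRank);
            e = energies[bestRank];
        }

        if (MPI.COMM_WORLD.Rank() == 0)
            System.out.println("best energy: " + e);
        MPI.Finalize();
    }
}

Gathering all energies and then broadcasting from the arg-min rank keeps the selection explicit and deterministic in case of ties, at the cost of one extra collective per round compared with a single MINLOC-style reduction.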
Implementation
We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis that is being used to develop UCERF3. All benchmarking calculations presented here were performed on the USC HPCC cluster (http://www.usc.edu/hpcc/). Two levels of parallelization are used: cluster level and node level. Each HPCC node has 8 processors, so threading is used to take advantage of all of a node's processors. We determined that 4 threads/node was optimal, possibly because the parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becomes overloaded at 8 threads/node. For cluster-level parallelization we used MPJ Express (http://mpj-express.org/; Baker 2007), a Java-based MPI implementation.
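As a sketch of the node-level threading, the fragment below runs one serial SA chain per thread in a fixed-size pool and keeps the lowest-energy result. The 4 threads/node figure comes from the benchmarking described above; the pooling strategy and the SerialAnnealer placeholder are illustrative assumptions, not OpenSHA's actual threading code.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NodeLevelAnnealer {

    // 4 threads/node was optimal in our benchmarks (8 overloaded the
    // parallel sparse matrix multiplication used for the energy calculation).
    private static final int THREADS_PER_NODE = 4;

    // Runs one independent serial SA chain per thread from the same starting
    // solution and returns the lowest-energy result found by any thread.
    static double[] annealThreaded(final double[] s0, final long nSubIterations)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS_PER_NODE);
        List<Future<double[]>> futures = new ArrayList<Future<double[]>>();
        for (int t = 0; t < THREADS_PER_NODE; t++) {
            Callable<double[]> chain = new Callable<double[]>() {
                public double[] call() {
                    return new SerialAnnealer().anneal(s0.clone(), nSubIterations);
                }
            };
            futures.add(pool.submit(chain));
        }
        double[] best = null;
        double ebest = Double.POSITIVE_INFINITY;
        SerialAnnealer scorer = new SerialAnnealer();
        for (Future<double[]> f : futures) {
            double[] candidate = f.get();  // blocks until the chain finishes
            double e = scorer.energy(candidate);
            if (e < ebest) {
                ebest = e;
                best = candidate;
            }
        }
        pool.shutdown();
        return best;
    }
}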
Conclusions
The parallel simulated annealing algorithm clearly presents