Difference between revisions of "Distributed Simulated Annealing"

From SCECpedia
Jump to navigationJump to search
Line 21: Line 21:
 
** redistribute sbest, ebest to all processors
 
** redistribute sbest, ebest to all processors
 
** k += nSubIterations
 
** k += nSubIterations
 +
 +
== Implementation ==
 +
We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis which is being used to develop UCERF3. All benchmarking calculations presented here were calculated on the USC HPCC cluster (http://www.usc.edu/hpcc/). There are two levels of parallelization used: cluster lever, and node level. Each HPCC node has 8 processors, so threading is used to make use of all available processors. We determined that 4 threads/node was optimal, possibly due to the use of a parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becoming overloaded when used with 8 threads/node. For cluster level parallelization, we used MPJ Express (http://mpj-express.org/, Baker 2007), a Java-based MPI implementation.
 +
 +
== Conclusions ==
 +
The parallel simulated annealing algorithm clearly presents
  
 
== Performance Graphs ==
 
== Performance Graphs ==

Revision as of 22:46, 6 September 2011

Serial SA Algorithm

  • s = s0; e = E(s)
  • sbest = s; ebest = e
  • k = 0
  • while k < max_iterations:
    • snew = neighbour(s)
    • enew = E(snew)
    • if P(e, enew, temperature) > random(); then
      • s = snew; e = enew
    • if enew < ebest
      • sbest = snew; ebest = enew
    • k++'

Parallel SA Algorithm

  • s = s0; e = E(s)
  • sbest = s; ebest = e
  • k = 0
  • while k < max_iterations
    • on n processors, do nSubIterations iterations of serial SA
    • find processor with best overall (lowest energy) solution, sbest
    • redistribute sbest, ebest to all processors
    • k += nSubIterations

Implementation

We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis which is being used to develop UCERF3. All benchmarking calculations presented here were calculated on the USC HPCC cluster (http://www.usc.edu/hpcc/). There are two levels of parallelization used: cluster lever, and node level. Each HPCC node has 8 processors, so threading is used to make use of all available processors. We determined that 4 threads/node was optimal, possibly due to the use of a parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becoming overloaded when used with 8 threads/node. For cluster level parallelization, we used MPJ Express (http://mpj-express.org/, Baker 2007), a Java-based MPI implementation.

Conclusions

The parallel simulated annealing algorithm clearly presents

Performance Graphs

Dataset ncal_constrained ncal_unconstrained state_constrained state_unconstrained
Energy vs Time ncal_constrained_e_vs_t.small.png ncal_unconstrained_e_vs_t.small.png state_constrained_e_vs_t.small.png state_unconstrained_e_vs_t.small.png
Avg Energy vs Time ncal_constrained_avg_e_vs_t.small.png ncal_unconstrained_avg_e_vs_t.small.png state_constrained_avg_e_vs_t.small.png state_unconstrained_avg_e_vs_t.small.png
Serial Time vs Parallel Time ncal_constrained_st_vs_pt.small.png ncal_unconstrained_st_vs_pt.small.png state_constrained_st_vs_pt.small.png state_unconstrained_st_vs_pt.small.png
Time Speedup vs Time ncal_constrained_spd_vs_t.small.png ncal_unconstrained_spd_vs_t.small.png state_constrained_spd_vs_t.small.png state_unconstrained_spd_vs_t.small.png
Std. Dev. vs Time ncal_constrained_std_dev_vs_t.small.png ncal_unconstrained_std_dev_vs_t.small.png state_constrained_std_dev_vs_t.small.png state_unconstrained_std_dev_vs_t.small.png
Improvement vs Energy ncal_constrained_imp_vs_t.small.png ncal_unconstrained_imp_vs_t.small.png state_constrained_imp_vs_t.small.png state_unconstrained_imp_vs_t.small.png