Distributed Simulated Annealing

Introduction

An inversion-based methodology is being developed for the 3rd Uniform California Earthquake Rupture Forecast (UCERF3) that simultaneously satisfies available slip-rate, paleoseismic event-rate, and magnitude-distribution constraints. Simulated Annealing (Kirkpatrick 1983) is a well-defined method for solving optimization problems, but it can be slow for problems with a large solution space, such as the UCERF3 "Grand Inversion." We present a parallel simulated annealing approach to solve for the rates of all ruptures that extend through the seismogenic thickness on major mapped faults in California.

Serial SA Algorithm

  • s = s0; e = E(s)
  • sbest = s; ebest = e
  • k = 0
  • while k < max_iterations:
    • snew = neighbour(s)
    • enew = E(snew)
    • if P(e, enew, temperature) > random() then
      • s = snew; e = enew
    • if enew < ebest then
      • sbest = snew; ebest = enew
    • k++
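
For concreteness, the following is a minimal, self-contained Java sketch of the serial loop above. The state representation (a double[] of rates), the energy() misfit, the perturb() neighbour function, and the cooling schedule are all illustrative assumptions, not the OpenSHA implementation.

  import java.util.Random;

  // Minimal serial simulated annealing sketch. energy(), perturb(), and the
  // cooling schedule are placeholders; only the loop structure follows the
  // pseudocode above.
  public class SerialSA {

      static final Random RAND = new Random();

      // Placeholder misfit: squared distance of the state from an arbitrary target.
      static double energy(double[] s) {
          double e = 0;
          for (double x : s) e += (x - 1.0) * (x - 1.0);
          return e;
      }

      // neighbour(s): perturb one randomly chosen element, keeping it non-negative.
      static double[] perturb(double[] s) {
          double[] snew = s.clone();
          int i = RAND.nextInt(s.length);
          snew[i] = Math.max(0.0, snew[i] + 0.1 * RAND.nextGaussian());
          return snew;
      }

      // P(e, enew, T): always accept downhill moves; accept uphill moves with
      // Boltzmann probability exp((e - enew) / T).
      static double acceptanceProbability(double e, double enew, double temperature) {
          return enew < e ? 1.0 : Math.exp((e - enew) / temperature);
      }

      public static double[] anneal(double[] s0, long maxIterations) {
          double[] s = s0.clone(), sbest = s0.clone();
          double e = energy(s), ebest = e;
          for (long k = 0; k < maxIterations; k++) {
              double temperature = 1.0 / (1.0 + k);   // simple illustrative cooling schedule
              double[] snew = perturb(s);
              double enew = energy(snew);
              if (acceptanceProbability(e, enew, temperature) > RAND.nextDouble()) {
                  s = snew; e = enew;                 // accept the candidate state
              }
              if (enew < ebest) {
                  sbest = snew; ebest = enew;         // track the best state found so far
              }
          }
          return sbest;
      }
  }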

Parallel SA Algorithm

  • s = s0; e = E(s)
  • sbest = s; ebest = e
  • k = 0
  • while k < max_iterations
    • on n processors, do nSubIterations iterations of serial SA
    • find processor with best overall (lowest energy) solution, sbest
    • redistribute sbest, ebest to all processors
    • k += nSubIterations
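
As a concrete illustration, the sketch below expresses the outer loop in Java, assuming an mpiJava-style binding such as MPJ Express and reusing the SerialSA helpers from the sketch above. The communication pattern shown (an Allgather of energies followed by a Bcast from the lowest-energy rank) is one straightforward way to implement the "find best and redistribute" step, not necessarily the one used in OpenSHA.

  import mpi.MPI;

  // Sketch of the cluster-level outer loop using an mpiJava-style binding
  // (e.g. MPJ Express). Reuses the illustrative SerialSA helpers above.
  public class ParallelSA {

      public static double[] anneal(String[] args, double[] s0,
                                    long maxIterations, long nSubIterations) throws Exception {
          MPI.Init(args);
          int size = MPI.COMM_WORLD.Size();

          double[] s = s0.clone();
          double[] e = { SerialSA.energy(s) };   // single-element buffer for Allgather

          for (long k = 0; k < maxIterations; k += nSubIterations) {
              // Each processor runs nSubIterations of serial SA independently.
              s = SerialSA.anneal(s, nSubIterations);
              e[0] = SerialSA.energy(s);

              // Gather every processor's energy and find the rank with the lowest one.
              double[] allEnergies = new double[size];
              MPI.COMM_WORLD.Allgather(e, 0, 1, MPI.DOUBLE, allEnergies, 0, 1, MPI.DOUBLE);
              int bestRank = 0;
              for (int i = 1; i < size; i++)
                  if (allEnergies[i] < allEnergies[bestRank]) bestRank = i;

              // Redistribute the best solution and its energy to all processors.
              MPI.COMM_WORLD.Bcast(s, 0, s.length, MPI.DOUBLE, bestRank);
              e[0] = allEnergies[bestRank];
          }

          MPI.Finalize();
          return s;
      }
  }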

Implementation

We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis that is being used to develop UCERF3. All benchmarking results presented here were computed on the USC HPCC cluster (http://www.usc.edu/hpcc/). Two levels of parallelization are used: cluster level and node level. Each HPCC node has 8 processors, so threading is used to make use of all available processors; we determined that 4 threads/node is optimal, possibly because the parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becomes overloaded at 8 threads/node. For cluster-level parallelization, we used MPJ Express (http://mpj-express.org/, Baker 2007), a Java-based MPI implementation.
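
The node-level threading can be sketched with standard java.util.concurrent, as below. The division of work shown (each thread running its own batch of serial SA sub-iterations from the same starting state, with the lowest-energy result kept) mirrors the cluster-level scheme and is an assumption made for illustration, not the OpenSHA implementation.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  // Sketch of node-level threading using a fixed thread pool (e.g. 4 threads/node).
  // Reuses the illustrative SerialSA helpers above.
  public class ThreadedSA {

      public static double[] anneal(double[] s, long nSubIterations, int nThreads)
              throws Exception {
          ExecutorService pool = Executors.newFixedThreadPool(nThreads);
          List<Future<double[]>> futures = new ArrayList<>();

          // Each thread runs its own batch of serial SA iterations from the same state.
          for (int t = 0; t < nThreads; t++) {
              futures.add(pool.submit(
                  (Callable<double[]>) () -> SerialSA.anneal(s, nSubIterations)));
          }

          // Keep the lowest-energy solution found by any thread.
          double[] best = s;
          double ebest = SerialSA.energy(s);
          for (Future<double[]> f : futures) {
              double[] candidate = f.get();
              double eCand = SerialSA.energy(candidate);
              if (eCand < ebest) { best = candidate; ebest = eCand; }
          }
          pool.shutdown();
          return best;
      }
  }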

Performance Graphs

For the purposes of benchmarking, we present results for 4 different problems: Northern California (Well Constrained), Northern California (Poorly Constrained), All California (Well Constrained), and All California (Poorly Constrained). These 4 problems demonstrate the effect of problem size and degree of constraint on the speedup achieved by the parallel SA algorithm.

Plots are provided for each of the four benchmark datasets: ncal_constrained, ncal_unconstrained, state_constrained, and state_unconstrained. For each dataset the following plots are shown (image files follow the pattern <dataset>_<suffix>.small.png):

  • Energy vs Time (_e_vs_t)
  • Avg Energy vs Time (_avg_e_vs_t)
  • Serial Time vs Parallel Time (_st_vs_pt)
  • Time Speedup vs Time (_spd_vs_t)
  • Std. Dev. vs Time (_std_dev_vs_t)
  • Improvement vs Energy (_imp_vs_t)

Conclusions

The benchmark results above show that the parallel simulated annealing algorithm improves substantially upon the classical serial approach.