Distributed Simulated Annealing

Introduction

An inversion-based methodology is being developed for the 3rd Uniform California Earthquake Rupture Forecast (UCERF3) that simultaneously satisfies available slip-rate, paleoseismic event-rate, and magnitude-distribution constraints. Simulated Annealing (Kirkpatrick 1983) is a well defined method for solving optomization problems, but can be slow for problems with a large solution space, such as the UCERF3 "Grand Inversion." We present a parallel simulated annealing approach to solve for the rates of all ruptures that extend through the seismogenic thickness on major mapped faults in California.

Serial SA Algorithm

s = s0; e = E(s)
sbest = s; ebest = e
k = 0
while k < max_iterations:
- snew = neighbour(s)
- enew = E(snew)
- if P(e, enew, temperature) > random(); then
  - s = snew; e = enew
- if enew < ebest
  - sbest = snew; ebest = enew
- k++'

Parallel SA Algorithm

s = s0; e = E(s)
sbest = s; ebest = e
k = 0
while k < max_iterations
- on n processors, do nSubIterations iterations of serial SA
- find processor with best overall (lowest energy) solution, sbest
- redistribute sbest, ebest to all processors
- k += nSubIterations

Implementation

We implemented the parallel simulated annealing algorithm in OpenSHA (http://www.opensha.org), a Java-based framework for Seismic Hazard Analysis which is being used to develop UCERF3. All benchmarking calculations presented here were calculated on the USC HPCC cluster (http://www.usc.edu/hpcc/). There are two levels of parallelization used: cluster lever, and node level. Each HPCC node has 8 processors, so threading is used to make use of all available processors. We determined that 4 threads/node was optimal, possibly due to the use of a parallel sparse matrix multiplication package (used to calculate misfit, and thus energy) becoming overloaded when used with 8 threads/node. For cluster level parallelization, we used MPJ Express (http://mpj-express.org/, Baker 2007), a Java-based MPI implementation.

Performance Graphs

For the purposes of benchmarking, we present results for 4 different problems: Northern California (Well Constrained), Northern California (Poorly Constrained), All California (Well Constrained), and All California (Poorly Constrained). These 4 problems help demonstrate the affect of problem size and degree of constraints on the parallel speedup of the parallel SA algorithm.