CSEP Powell Center 2018

CSEP2 Challenges

- Testing fault and simulation-based models
- Care about low prob. events
- Should we be testing something other than nucleation?
- Is UCERF3-ETAS more valuable given the alternatives?
- Epistemic uncertainties

Day 1

Reasenberg & Jones for USGS OAF

Rate of ≥M aftershocks at time t after mainshock with given magnitude
Improvements made to Reasenberg & Jones model to update generic parameters for California
Aftershock forecast for Mw ≥ 5 using improved R&J model
Automating aftershock forecasts for the US (in progress w/ code development challenges)
Moving past R&J in favor of ETAS, but could be useful for UCERF3-ETAS testing
Testability challenges:
- Overlapping, non-independent forecasts
- EQ prob. dist. not necessarily Poissonian
- Temporal forecasts with poorly defined spatial area
- R&J not great with substantial triggering (e.g., swarms)
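For reference, the R&J rate of magnitude ≥M aftershocks at time t after a mainshock can be sketched as below. This is a minimal sketch, not the USGS OAF code. The function names are made up for illustration, and the default parameter values are the commonly quoted generic California values from the original R&J work, used here only as placeholders (they are not the updated parameters mentioned above). The probability helper assumes Poisson counts, which is one of the assumptions questioned in these notes.

```python
import math


def rj_rate(t_days, m_min, m_main, a=-1.67, b=0.91, c=0.05, p=1.08):
    """Rate (events/day) of aftershocks with magnitude >= m_min at t_days after the mainshock.

    Placeholder defaults: assumed generic values, not fitted to any sequence.
    """
    return 10.0 ** (a + b * (m_main - m_min)) * (t_days + c) ** (-p)


def rj_expected_count(t_start, t_end, m_min, m_main, a=-1.67, b=0.91, c=0.05, p=1.08):
    """Expected number of aftershocks in [t_start, t_end], from the analytic time integral."""
    productivity = 10.0 ** (a + b * (m_main - m_min))
    if abs(p - 1.0) < 1e-12:
        # Special case p == 1: the integral of 1/(t + c) is a logarithm.
        return productivity * (math.log(t_end + c) - math.log(t_start + c))
    return productivity * ((t_end + c) ** (1.0 - p) - (t_start + c) ** (1.0 - p)) / (1.0 - p)


def prob_one_or_more(expected_count):
    """Probability of at least one event, ASSUMING Poisson-distributed counts."""
    return 1.0 - math.exp(-expected_count)


if __name__ == "__main__":
    n = rj_expected_count(0.0, 7.0, m_min=5.0, m_main=7.0)
    print("expected M>=5 aftershocks in first week:", n)
    print("probability of one or more:", prob_one_or_more(n))
```

Swapping in sequence-specific parameters only requires passing different `a`, `b`, `c`, `p` values.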

Update on ETAS Forecasting

GUI to compute manual forecasts for external uses.
AIC prefers ETAS; however, more complicated models are not favored over a simple 3-parameter model
Performs better than R&J
Issues with "supercriticality"
- Could solve by fitting mainshock separately
For global problem (and local): estimating magnitude of completeness and b-value
Need to limit supercriticality before OEF can be given to non-experts
"Similarity forecast" can be implemented as mask/failsafe to reduce surprises
- Defined as having "similar number of earthquakes in binned magnitudes"
- ETAS has ½ the surprise rate of R&J
- Or could be included in ensemble

Spatial ETAS

ETAS type models can zero in on aftershock hot-spots
- Using spatial Omori type
- Need some spatial kernel
Moving from spatial rates to hazards
- Couple forecasts with GMPE to produce ground motions
- MMI regression-based models
Testability
- Challenges associated with incorporating hazard, because it eliminates some granularity in the forecast model
- Worried about Type II error

Time-Dependent Background seismicity

Particularly useful for earthquake swarms where background seismicity differs from 'normal' rate
- Could determine rate from previous swarms
- Potential issues:
  - Swarm duration
  - Considerable variability in swarm durations
  - Solving using "life expectancy" table, but limited data in southern California
  - Likely need some physical constraints on distribution functions
Using STETAS to use standard catalog without needing declustering
- Hydromechanical models for stressing rate can be used for induced seismicity
- Rate-and-state framework
Testing strategy:
- Given a swarm, how long should we provide forecasts?

Day 2

UCERF3

Three models
- Time-independent
  - Fault-based approach that splits faults into subsections
  - Rate of rupture computed from Grand Inversion (see pub for details)
  - Add gridded off-fault seismicity
  - Logic tree used to capture the epistemic uncertainty
  - Fault participation most important (i.e., what is the prob. of a particular fault hosting an eq ≥ a given Mw)
- Time-dependent
  - Based on Reid renewal statistics
  - Additional logic tree branches added
- ETAS
  - Ignoring faults gives rise to discrepancy between ETAS and elastic rebound type models
  - Combines UCERF3-TD with an ETAS model and produces synthetic catalog
  - Issues:
    - Variability of MFD throughout CA
    - GR not consistent with data
    - Main question: what is the conditional prob of observing large eq given an observed small eq?
    - Determined that elastic rebound necessary
    - Rate of small events not always consistent with rate of expected aftershocks
  - Operationalizable, but needs significant resources
    - Major question: Does it have value?
    - HayWired scenario recently published in SRL
    - Shows value if interested in severe shaking.
    - Faults important for low prob high ground motions.
  - Testing UCERF3-ETAS
    - Fault participation, not nucleation
    - Logic-tree branches
    - Elastic rebound/aperiodicity
    - Characteristic behavior near faults
    - Retrospective testing
    - Aleatory variability and sequence-specific ETAS parameters

RSQSim Rate-State earthquake Simulator

Physics-based forecasting model based on R&S statistics
Using RSQSim ruptures in hazard assessments
- Need to create UCERF3-style ruptures
- Do RSQSim ruptures pass UCERF3 plausibility criteria?
  - Surprisingly, most RSQSim ruptures didn't pass the Coulomb criterion. ~17.5% did not pass.
  - Multi-fault ruptures tend to agree between UCERF3 and RSQSim
RSQSim agrees well with UCERF3 without specific tuning of recurrence intervals
- Also: repeat times and short period ground motions
- but starts to disagree at longer spectral periods
Interesting conditional probs:
- What is the prob. of having two Mw 7 events on the Mojave within 1 week?
  - 4.5% in UCERF3 and 5.6% in RSQSim
Could look at two-point statistics pairwise difference between centroids
CSEP2 needs to be able to compare synthetic catalogs coming from RSQSim, i.e., the ability to handle non-standard catalog sources.

Objectives & Challenges of Model Testing

Issue 1: not enough data
Expand data sources in space and time
- i.e., incorp. South America
- retrospective testing experiments
- time-dependence
- extend authoritative data sets
Issue 2: primitive models based on point-based models (i.e., hypocentral/nucleation)
- need: simulation- , fault-, and physics-based models
- 3d models
- Need to account for non-poissonian & correlation structures
- [DJ] expand UCERF3 approach to new locations to understand principal components
Issue 3: need to build complete prob. models accounting for ontological error (unknown unknowns)
- How to go from logic tree to continuous pdf?
- Need to consider correlations within tree
Issue 4: testing fault-based models
- Lots of work to be done
- 'Turing' style testing can help… Page (2018)

Turing Tests of UCERF3

Properly accounting for spatial diffusivity
Inter-sequence aftershock productivity
Foreshock and aftershock productivity as function of differential magnitude
Nearest neighbor separations
Analysis of clusters
Paleo hiatus
[NF] what are the possible explanations of hiatus?
Super cycle: extreme clustering over extreme period
CSEP testing should be more visual, and this should be included in CSEP2

Comparing R&J with ETAS

R&J -> ETAS
- Secondary sequences
- Faster adaptation
- Spatial forecasts
- Better estimates of the range of outcomes
CSEP 1-day forecasts begin at start of day; lose some power
Challenges for USGS testing:
- Overlapping windows
- Update forecasts within window
- New RJ89 method no longer Poissonian
- ETAS forecasts are not Poissonian
- All violate CSEP testing methods
Likelihood based on Poisson distribution using standard statistical test
CSEP strategy:
- Poisson numbers based on RJ forecasts
Strategy:

Want dist. Of events in window that starts at t with duration d
Using R&J and ETAS to simulate "real" observations
Fit R&J to ETAS model: fit is the mode of the individual ETAS runs

Using overlapping windows: CSEP assumptions on independence fail; solved by incorporating R&J simulations, forecast, and observations
ETAS observations fail because they are not Poisson distributed
R&J fails when considering large magnitude main shock
Solution: all ETAS all the time
Takeaway: forecasts and observations must be consistent
Conclusion
- Non-Poissonian behavior
- Simulation based forecasts could address some issues
- Will also handle overlapping time windows
- RJ will fail assuming that the world is like ETAS
CSEP must take the forecasts in as simulations in order to test

Moving past Poisson

Poisson likelihood does not allow for clustering
Three ways to eliminate:
- Adjusted likelihood simulations (in other words, remove likelihood)
- Normal approximation
- K-S test (could work with Turing style tests too)
N-test could be fixed by using negative binomial if dispersion is supplied
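A negative binomial N-test could look roughly like the sketch below. It assumes the forecast supplies both the expected number of events and a variance larger than that mean (the "dispersion"); the function names and the mean/variance parameterization are choices made for this illustration, not an existing CSEP interface. The two quantile scores follow the usual N-test convention: the probability of at least the observed count, and the probability of at most the observed count.

```python
import math


def nb_pmf(k, mean, var):
    """Negative binomial probability of exactly k events, given mean and variance (var > mean)."""
    if var <= mean:
        raise ValueError("negative binomial requires variance > mean")
    r = mean * mean / (var - mean)  # shape parameter
    p = mean / var                  # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_cdf(k, mean, var):
    """Probability of k or fewer events (0 for negative k)."""
    if k < 0:
        return 0.0
    return min(1.0, sum(nb_pmf(i, mean, var) for i in range(k + 1)))


def n_test_negative_binomial(n_obs, mean, var):
    """Return (delta1, delta2) = (P(N >= n_obs), P(N <= n_obs)) under the forecast."""
    delta1 = 1.0 - nb_cdf(n_obs - 1, mean, var)
    delta2 = nb_cdf(n_obs, mean, var)
    return delta1, delta2


if __name__ == "__main__":
    # Hypothetical forecast: 10 expected events with variance 40 (overdispersed).
    print(n_test_negative_binomial(25, mean=10.0, var=40.0))
```

A very small `delta1` suggests the forecast underpredicts the observed number; a very small `delta2` suggests it overpredicts.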
Accommodating simulation-based models better solution
Simulations can preserve space-time clustering
(CSEP needs to separate forecasting from modelling)
Consistency tests of simulation-based forecasts
- General approach is to compare statistic computed from simulated catalog with same statistics from observed catalog
- For example, inter-event time distribution
- P-values should be uniform on [0,1]
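The general approach above can be sketched as follows, using the median inter-event time as the example statistic. A catalog is represented here simply as a list of event times; that representation and all function names are assumptions for illustration only. The returned score is the fraction of simulated catalogs whose statistic is at or below the observed one. If the model is correct, such scores collected over many independent tests should be roughly uniform on [0,1].

```python
import statistics


def inter_event_times(event_times):
    """Time gaps between consecutive events, after sorting."""
    ordered = sorted(event_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def median_inter_event_time(event_times):
    """Example catalog statistic: the median gap between consecutive events."""
    gaps = inter_event_times(event_times)
    if not gaps:
        raise ValueError("need at least two events")
    return statistics.median(gaps)


def quantile_score(simulated_stats, observed_stat):
    """Fraction of simulated statistics that are <= the observed statistic."""
    return sum(1 for s in simulated_stats if s <= observed_stat) / len(simulated_stats)


def consistency_test(simulated_catalogs, observed_catalog, statistic):
    """Compare a statistic of the observed catalog with its simulated distribution."""
    simulated_stats = [statistic(catalog) for catalog in simulated_catalogs]
    return quantile_score(simulated_stats, statistic(observed_catalog))


if __name__ == "__main__":
    sims = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0], [0.0, 3.0, 6.0], [0.0, 4.0, 8.0]]
    obs = [0.0, 2.5, 5.0]
    print(consistency_test(sims, obs, median_inter_event_time))
```

Any other statistic (event count, inter-event distance, magnitude quantile) can be passed in place of `median_inter_event_time`.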
Interested in improving models: looking at information gained
- No obvious way to transparently estimate without gridding
- Standard CSEP information not restricted to Poisson
CSEP needs to be able to retroactively evaluate new tests, in other words become a testing center.
Moving past parametric distribution functions in favor of non-parametric simulation-based models
CSEP could support individual testing, will be more straightforward with agreed upon simulated catalog formats

Current CSEP Testing Approaches

Two approaches
- Establishing discrepancies/agreement with observations
  - E.g., number of earthquakes
  - Likelihood
- Comparing against other models
  - How much better or worse does one model do
Installed methods
- Number test: compares number of epicenter forecasts in bin
- Likelihood test: based on RELM model setup using mainshock and mainshock+aftershock classes; mainshocks declustered using Reasenberg. Assumes that the model is the data-generating process.
- Conditional likelihood test: set simulated = observed, and place sim eqs in bins according to relative rates
- Space test: collapses forecast into spatial domain. Integrate over magnitude and set simulated = observed. Use relative rates. Calculate simulated LL scores
  - One of the more interesting tests
- Magnitude test:
  - Same as S test but integrating over space.
  - Not particularly powerful, could be using a more powerful KS test.
- Information gain per earthquake: is the "rate-corrected" information gain significantly greater than 0?
  - Paired t-test (T-test)
    - Differences must be approximately independent
    - If differences are not iid normal, rely on the CLT
  - Wilcoxon signed rank test (W-test)
    - Less powerful
    - Requires symmetric data
  - Differences are proportional to error bounds, i.e., large difference -> large error bounds
  - Error bounds only apply to forecast pairs
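The rate-corrected information gain per earthquake and its paired t statistic can be sketched as below. Inputs are the forecast rates of models A and B in the bins where each observed earthquake fell, plus each model's total expected number of events. The function name and argument layout are assumptions for illustration. The sketch returns the t statistic and degrees of freedom and leaves the lookup of the critical value or p-value to the reader.

```python
import math
import statistics


def information_gain_t_statistic(rates_a, rates_b, total_a, total_b):
    """Information gain per earthquake of model A over model B, with its paired t statistic.

    rates_a, rates_b: forecast rates in the bin containing each observed event (same order).
    total_a, total_b: total expected number of events under each forecast.
    Returns (gain, t_statistic, degrees_of_freedom).
    """
    n = len(rates_a)
    if n != len(rates_b) or n < 2:
        raise ValueError("need the same number (>= 2) of rates for both models")
    # Per-event log-rate differences.
    diffs = [math.log(a) - math.log(b) for a, b in zip(rates_a, rates_b)]
    # The rate correction removes the advantage of simply forecasting more events.
    gain = sum(diffs) / n - (total_a - total_b) / n
    spread = statistics.stdev(diffs)
    if spread == 0.0:
        raise ValueError("all differences are identical; t statistic is undefined")
    t_statistic = gain / (spread / math.sqrt(n))
    return gain, t_statistic, n - 1


if __name__ == "__main__":
    print(information_gain_t_statistic([0.2, 0.4, 0.1], [0.1, 0.1, 0.1], 5.0, 5.0))
```

A positive gain with a large t statistic favors model A; the W-test would instead rank the same per-event differences.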
Residuals based:
- Residual: difference between local forecast and observation
- Raw residual: bin-wise difference between observed # and forecast
- Pearson residuals: normalized cell-wise difference between rate and observed number
- Deviance residuals: difference between (point-process) log-likelihood Scores.
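Raw and Pearson residuals for a gridded forecast can be computed as in this minimal sketch. The forecast is a list of expected counts per cell and the observation is a list of observed counts per cell. Normalizing by the square root of the forecast count (the Poisson standard deviation) is an assumption of this sketch, and the function names are illustrative.

```python
import math


def raw_residuals(forecast_counts, observed_counts):
    """Bin-wise difference: observed number minus forecast number."""
    return [obs - fc for fc, obs in zip(forecast_counts, observed_counts)]


def pearson_residuals(forecast_counts, observed_counts):
    """Raw residuals scaled by sqrt(forecast), assuming a Poisson-like variance per cell."""
    residuals = []
    for fc, obs in zip(forecast_counts, observed_counts):
        if fc <= 0.0:
            raise ValueError("forecast counts must be positive")
        residuals.append((obs - fc) / math.sqrt(fc))
    return residuals


if __name__ == "__main__":
    forecast = [4.0, 1.0, 0.25]
    observed = [6, 1, 0]
    print(raw_residuals(forecast, observed))
    print(pearson_residuals(forecast, observed))
```

Mapping the Pearson residuals back onto the grid shows where a forecast over- or underpredicts relative to its own expected scatter.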
Hit&miss tests
- Receiver-operating characteristic
- Molchan error diagram
- Area-skill-score
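As an example of the hit & miss family, a Molchan trajectory can be traced as in the sketch below. Cells are switched to "alarm" in order of decreasing forecast rate, and at each step the fraction of cells under alarm is paired with the fraction of observed events missed. Measuring the alarm fraction as a plain fraction of cells (a uniform reference model) is an assumption of this sketch, ties in forecast rate are broken by cell order, and the function name is illustrative.

```python
def molchan_trajectory(forecast_rates, observed_counts):
    """Return a list of (alarm_fraction, miss_rate) points, starting at (0, 1)."""
    n_cells = len(forecast_rates)
    n_events = sum(observed_counts)
    if n_cells != len(observed_counts):
        raise ValueError("forecast and observation must cover the same cells")
    if n_events == 0:
        raise ValueError("need at least one observed event")
    # Highest-rate cells are alarmed first.
    order = sorted(range(n_cells), key=lambda i: forecast_rates[i], reverse=True)
    points = [(0.0, 1.0)]
    hits = 0
    for rank, cell in enumerate(order, start=1):
        hits += observed_counts[cell]
        points.append((rank / n_cells, 1.0 - hits / n_events))
    return points


if __name__ == "__main__":
    print(molchan_trajectory([0.5, 0.1, 0.3, 0.1], [2, 0, 1, 1]))
```

A trajectory that drops quickly toward a zero miss rate indicates a skillful forecast; the diagonal from (0, 1) to (1, 0) corresponds to random guessing under the reference model.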
Goal is to evaluate different aspects of the forecasting model
Interpreting results requires going back to the models. A shortcoming of CSEP results is that there is not enough scientific discussion of the evaluations within the context of the models.
10 years of data collected by testing centers at SCEC and GNS Science. Need more results.
- 1 day forecasting for California.
- Over 200 eqs for New Zealand, lots of science to be done here. Kaikoura and Christchurch…
- Curated dataset valuable resource for the scientific community.
Next steps:
- Ensemble modeling
  - Marzocchi et al., 2012
  - BMA: averaged based on previously best performing model, which makes it better for selecting models
  - Using additive or multiplicative models for combining models
- Simulation-based forecasts
  - See previous lecture from Morgan Page and David Rhoades
  - NSIM: number of target eqs
  - Earthquake rate distribution
  - Inter-event time distribution
  - Inter-event distance distribution
- External forecasts and predictions
  - Quakefinder type predictions.
  - No implemented evaluation method.
  - Critical for real-time forecast and predictions that are generated externally to CSEP platform.

Testing Fault Based Models

Association problem: mapping an eq to the UCERF3 fault model
Need to understand the probabilities of ruptures stopping between fault segments
Proposed Procedure:

Separate linear fault into sections
For each section: estimate nucleation rate for eqs of interest
Estimate conditional probabilities of earthquake stopping
Evaluate frequency of eqs for each pair of sections

Could rely on aftershocks to determine the extent of the rupture plane or maybe a finite-fault inversion
Fault participation is the most important thing to test for fault-based models.
Null-hypothesis can be established using the following assumptions
- Known magnitude distribution
- Known scaling between Mw and length
- Uniform distribution of rupture locations on fault
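A null hypothesis built from the three assumptions above can be simulated to get section participation fractions. The sketch below is hypothetical throughout. It assumes a truncated Gutenberg-Richter magnitude distribution, a log-linear magnitude-length scaling with placeholder coefficients, and a rupture start point drawn uniformly so that the rupture fits on a straight fault. None of these choices come from the notes, and all function names are illustrative.

```python
import math
import random


def sample_truncated_gr(rng, m_min, m_max, b=1.0):
    """Inverse-CDF sample from a truncated Gutenberg-Richter distribution (assumed b-value)."""
    beta = b * math.log(10.0)
    span = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - rng.random() * span) / beta


def rupture_length_km(mw, a=-3.22, b=0.69):
    """Placeholder log-linear scaling: log10(length in km) = a + b * Mw."""
    return 10.0 ** (a + b * mw)


def participation_fractions(n_sim, fault_length_km, n_sections, m_min, m_max, seed=0):
    """Fraction of simulated ruptures that touch each fault section under the null hypothesis."""
    rng = random.Random(seed)
    section_length = fault_length_km / n_sections
    counts = [0] * n_sections
    for _ in range(n_sim):
        mw = sample_truncated_gr(rng, m_min, m_max)
        length = min(rupture_length_km(mw), fault_length_km)
        # Uniform rupture location, constrained so the rupture stays on the fault.
        start = rng.uniform(0.0, fault_length_km - length)
        end = start + length
        first = min(int(start // section_length), n_sections - 1)
        last = min(int(end // section_length), n_sections - 1)
        for section in range(first, last + 1):
            counts[section] += 1
    return [count / n_sim for count in counts]


if __name__ == "__main__":
    print(participation_fractions(20000, 200.0, 10, 6.5, 7.5))
```

Constraining ruptures to fit on the fault lowers participation for the end sections; that is a side effect of this particular placement rule and would need to be revisited for a real test.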

Considering Epistemic Uncertainty

Aleatory variability: inherent complexity or randomness in some physical process
Epistemic uncertainty: comes from our lack of knowledge about the process
Exchangeable events allow testing of Bayesian models in a frequentist framework
Modifying experimental concept allows for ontological testing of exchangeable sequences
Hierarchy of uncertainty necessary for testing
- Aleatory variability -> frequentist
- Epistemic uncertainty -> Bayesian methods
- Ontological error -> rejection of 'ontological' null hypothesis
  - States that the true hazard is a realization of the extended experts' distribution (EED)
  - Rejection of this null hypothesis implies ontological error
- Ontological tests require an 'experimental concept' that conditions the aleatory variability of the natural system.

Ensemble Modeling and Hybrid model

Definition: inferring the extended experts' distribution from the sample provided by any set of models that sample the epistemic uncertainty
Combining models allows the ensemble to perform only slightly worse than the best performing model. Useful when not sure what is 'correct' model.
Two main features:
- Describe epistemic uncertainty
- Significantly increases the skill of forecast
Hybrid models to increase information gain
- Additive hybrid
  - Best fitting linear combination of models
- Maximum hybrid
- Multiplicative hybrids
  - Exploit independent information, e.g., GPS and smoothed seismicity
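The three hybrid types can be illustrated on two gridded rate forecasts as below. This is a simplified sketch. The multiplicative hybrid here is a plain cell-wise product rescaled to a chosen total, and the additive weight is picked by a grid search over a Poisson log-likelihood. Both are assumptions made for illustration rather than the method used in any published hybrid, and all function names are hypothetical.

```python
import math


def additive_hybrid(rates_a, rates_b, weight):
    """Linear combination: weight * A + (1 - weight) * B, cell by cell."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(rates_a, rates_b)]


def maximum_hybrid(rates_a, rates_b):
    """Cell-wise maximum of the two forecasts."""
    return [max(a, b) for a, b in zip(rates_a, rates_b)]


def multiplicative_hybrid(rates_a, rates_b, total):
    """Cell-wise product, rescaled so the rates sum to the requested total."""
    product = [a * b for a, b in zip(rates_a, rates_b)]
    scale = total / sum(product)
    return [scale * value for value in product]


def poisson_log_likelihood(rates, counts):
    """Joint log-likelihood of observed counts, ASSUMING independent Poisson cells."""
    return sum(-r + n * math.log(r) - math.lgamma(n + 1) for r, n in zip(rates, counts))


def fit_additive_weight(rates_a, rates_b, counts, steps=100):
    """Grid search for the additive weight that maximizes the Poisson log-likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda w: poisson_log_likelihood(additive_hybrid(rates_a, rates_b, w), counts))


if __name__ == "__main__":
    model_a, model_b, observed = [1.0, 3.0], [3.0, 1.0], [1, 3]
    print(fit_additive_weight(model_a, model_b, observed))
    print(maximum_hybrid(model_a, model_b))
    print(multiplicative_hybrid(model_a, model_b, total=4.0))
```

The fitted weight should be chosen on data that are not later used to score the hybrid; otherwise the information gain will be overstated.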
Form hybrids for better information gain
Does not require a choice of best model but leverages all models
Could be a target for CSEP to help gain hybrid models -> improve collaborations

Event based testing

Could move to solution where you can make updates to forecast during the forecast period. Likely more important for long-term models
Might want to update forecast when event happens and when event doesn't happen.

Day 3

Milestones for UCERF3 Testing Program (U3-TP)

Goals:

- Verify
- Validate
- Valuate

Milestones:

- Develop infrastructure
- Retrospective testing of UCERF3
- Prospective testing of UCERF3
- Comparatively evaluate U3 against empirical models and physics-based models

5 types of testing:

- Exploratory testing: Turing
- Comparative testing: T and W tests
- Mean-hazard testing: null hypothesis significance testing
- Ontological testing: including epistemic uncertainty
- Sequence-specific testing: testing U3-ETAS against observed aftershock sequences

- Guiding principle: all OEF models should be under continual prospective testing; put ETAS under operational testing. Need to find the value for UCERF3-ETAS testing.
- Could possibly find out what aspects of U3 are superfluous and could simplify the model for use in other locales.
- Slip-rate data are special for California data sets
- Important to compare against physics-based earthquake simulations such as RSQSim to evaluate certain assumptions in the model.
- U3 drivers of CSEP2
  - Datasets used in prospective testing must be versioned and archived. Should include analyzed datasets and raw catalogs
  - Testing on simulated event catalogs
- Benefits of U3-TP to the USGS
  - Scientific value
  - Software infrastructure

List of possible milestones:

- CSEP1.0. What products would be useful?
  - Do we need to keep operationalizing CSEP1.0?
    - [Matt] Mistake to shut down CSEP1.0
  - New models necessary?
    - [Max] CSEP1 big achievement/success was incorporating new models.
    - [Ned] Should support new models but need to be more selective.
  - [Max] Wants to publish the CA 1-day forecast results.
  - [Morgan] Value in providing CSEP1.0 data set publicly.
  - [Mike B.] Need some clear, provocative scientific findings about this. Need to find models that can be rejected and are not worth pursuing. Needs to be curated in a way that allows scientists to access and work with it. Would build community.
  - [Peter] Need products that are digestible by the public.
- Simulation-based testing
  - Methods:
    - ETAS, U3-ETAS
  - Need: process for defining timelines, and which models we would be evaluating.
- Event-triggered/sequence-specific testing
- Comparative valuations
  - Turing tests
  - Verification
  - Inter-comparison of models
- Modeling catalog completeness
- Modeling epistemic uncertainties: important, but IT challenges.
- Fault/cell participation
- [Ned] Testing usefulness!
- Rupture association problem
- Fault characteristics
- U3-TD elastic-rebound testing
- [Peter] Declustering work could help with the USGS. Lots of ambiguity on different declustering models.
- [Phil] CSEP should expose its methods so users could leverage the algorithm
- [Mike B.] Valuation question most important. Should physics-based simulations be used for hazard?
- [Kevin] Errors need to be propagated moving forward!
- [Phil] Need to figure out how the data will be provided to the scientist.
- [All] Web-based interface would be valuable.

Notes:

- Targeting publication for July/August issue of SRL. Have some time to develop the webpage.
- Expect press material surrounding the SRL release, so need to be prepared with digestible figures.
- Schedule IT call with Peter, Ned, Phil, Bill, Max, Fabio
