DirectSynth

From SCECpedia

DirectSynth is a CyberShake post-processing code authored by Scott Callaghan in early 2015. It provides an alternative to separate SGT extraction and seismogram synthesis/PSA/RotD jobs, and requires fewer SUs and less I/O, making it a more efficient choice for CyberShake runs at frequencies at or above 1 Hz.

Overview

At a high level, DirectSynth works by reading the SGTs into memory across a large number of processors. Another set of processors works through a list of seismogram synthesis/PSA/RotD tasks, requesting the SGTs needed for synthesis from the processor(s) which hold them in memory. Since the total quantity of SGTs needed to synthesize a seismogram may be larger than what fits into a single processor's memory, the requests are divided up so that no more than 1 GB is requested at a time. Since multiple rupture variations use the same SGTs, as many rupture variations as fit into a processor's memory are synthesized at the same time. Output data (seismograms, PSA, RotD) are sent to a single process for writing. Once all the rupture variations for a rupture are completed, the corresponding output file is fsync()ed and the source and rupture ID are recorded in a checkpoint file.
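The 1 GB request ceiling can be illustrated with a minimal sketch. This is not the DirectSynth implementation; the function name chunk_requests, the per-point byte sizes, and the greedy grouping of consecutive points are assumptions made purely for illustration.

```python
ONE_GB = 1024 ** 3  # request ceiling taken from the description above


def chunk_requests(point_sizes, limit=ONE_GB):
    """Group consecutive SGT points into requests of at most `limit` bytes.

    `point_sizes` is a hypothetical list giving the size in bytes of the SGT
    data for each point a task needs. Returns a list of index lists, one per
    request.
    """
    requests = []
    current = []
    current_bytes = 0
    for index, size in enumerate(point_sizes):
        if size > limit:
            raise ValueError(f"point {index} alone exceeds the request limit")
        # Close the current request if adding this point would exceed the limit.
        if current and current_bytes + size > limit:
            requests.append(current)
            current = []
            current_bytes = 0
        current.append(index)
        current_bytes += size
    if current:
        requests.append(current)
    return requests
```

For example, chunk_requests([4, 4, 4, 4, 4], limit=10) groups the five points into three requests, none of which exceeds 10 bytes.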

Details

The processes in DirectSynth have 1 of 4 roles:

  • Master (process rank 0).
    1. Reads in header information, broadcasts to everyone.
    2. Determines which SGT points go to which SGT handlers, broadcasts to everyone.
    3. While there is still work to do:
      1. Gathers data from workers and writes to files
      2. Updates checkpoint file as ruptures finish
  • SGT Handlers (processes 1 - <num SGT handlers-1>)
    1. Receive header information from master.
    2. Receive mapping of SGT points to handlers.
    3. Together, MPI-read in SGT files, with each handler reading its assigned section.
    4. While there is still work to do:
      1. Handle worker requests for SGTs by sending relevant header info followed by SGT data
  • Task Manager (process <num SGT handlers>)
    1. Read in the list of ruptures for which seismograms will be synthesized.
    2. For each rupture in the list, determine how many tasks are needed to calculate all the rupture variations for that rupture. The limiting factor is memory per worker process, since (size of rupture variation)*num_rupture_variations + 1 GB SGT buffer < 1.8 GB. If the rupture is in the checkpoint file, it has already been calculated and is excluded from the task list.
    3. While there are still tasks in the task list:
      1. Handle worker requests for more work by sending a task description
  • Workers (process <num SGT handlers + 1> - end)
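The rank layout and the task manager's memory-based task splitting described above can be sketched as follows. This is a hedged illustration only, not code from DirectSynth: the function names, the tuple layout of a rupture (source ID, rupture ID, number of rupture variations, bytes per rupture variation), and the representation of the checkpoint file as a set of (source ID, rupture ID) pairs are all assumptions. The 1.8 GB and 1 GB figures come from the inequality in the list.

```python
GB = 1024 ** 3


def role_for_rank(rank, num_sgt_handlers):
    """Map a process rank to one of the four roles listed above."""
    if rank == 0:
        return "master"
    if rank < num_sgt_handlers:
        return "sgt_handler"
    if rank == num_sgt_handlers:
        return "task_manager"
    return "worker"


def variations_per_task(rv_bytes, worker_bytes=int(1.8 * GB), sgt_buffer_bytes=GB):
    """Largest n satisfying rv_bytes * n + sgt_buffer_bytes < worker_bytes."""
    available = worker_bytes - sgt_buffer_bytes
    # Subtract one byte so the inequality stays strict.
    n = (available - 1) // rv_bytes
    if n < 1:
        raise ValueError("a single rupture variation does not fit in worker memory")
    return n


def build_task_list(ruptures, checkpointed, **limits):
    """Split each rupture into tasks, skipping ruptures already checkpointed.

    Each task is (source_id, rupture_id, first_rv, end_rv), with end_rv exclusive.
    """
    tasks = []
    for source_id, rupture_id, num_rvs, rv_bytes in ruptures:
        if (source_id, rupture_id) in checkpointed:
            continue  # already calculated in an earlier run
        per_task = variations_per_task(rv_bytes, **limits)
        for start in range(0, num_rvs, per_task):
            end = min(start + per_task, num_rvs)
            tasks.append((source_id, rupture_id, start, end))
    return tasks
```

With the default limits, roughly 0.8 GB per worker is left for rupture variations, so a rupture whose variations do not all fit in that space is split across several tasks, each covering a contiguous range of rupture variations.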

Communication patterns

Improvement over previous post-processing

Limitations