Difference between revisions of "Cvmbycols"
(25 intermediate revisions by 2 users not shown) | |||
Line 10: | Line 10: | ||
− | This SCEC version of cvmbycols is based on an older version originally developed by Tiankai Tu | + | This SCEC version of cvmbycols is based on an older version originally developed by Tiankai Tu and Ricardo Taborda from the Quake Group at CMU. |
+ | |||
+ | |||
+ | = Etree Summary = | ||
+ | |||
+ | In order to understand how cvmbycols works, some knowledge of the Etree data structure is required. The fundamental unit of measure in an Etree is the tick. This is a distance expressed in meters, and is derived from the length of the longest side of the geographic region: | ||
+ | <pre> | ||
+ | tick size = (max(len x, len y, len z) / 2^31) m | ||
+ | </pre> | ||
+ | In essence, the geographic region is divided up into a set of 2^31 ticks. | ||
+ | |||
+ | The funamental unit of data in an Etree is the octant. An octant contains an address (location code) and a payload (material properties). Octants are organized into a tree structure called an octree, where each node has up to eight children. Each level in this octree represents an octant size (alternatively: edge size, or resolution). Moving from level N to level N+1 in the octree increases resolution by 2x. Moving from level N to level N-1 reduces resolution by 2x. The resolution for a level can be determined with: | ||
+ | |||
+ | <pre> | ||
+ | res from level N = (max_len / 2^N) m | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | The number of ticks in an octant of level N is: | ||
+ | |||
+ | <pre> | ||
+ | ticks for level N = 2^N | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | And thus the octant size in level N is: | ||
+ | |||
+ | <pre> | ||
+ | octant size = (2^N * (max(len x, len y, len z) / 2^31)) m | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | Given a particular resolution, the appropriate level to represent it in the etree can be determined with: | ||
+ | |||
+ | <pre> | ||
+ | level N from res = (int)(log(max_len/res)/log(2)) + 1 (rounded up to next integer) | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | These relations are important in understanding the algorithm described in the following sections. | ||
Line 40: | Line 79: | ||
− | Using the above relation, the 3D space is decomposed into small columns, and the resolution in each column is tailored to the vs values found at each depth. | + | Using the above relation, the 3D space is decomposed into small columns, and the resolution in each column is tailored to the vs values found at each depth. The algorithm can be summarized as follows: |
* Divide region in set of x,y columns, where the length of each column equals its width. | * Divide region in set of x,y columns, where the length of each column equals its width. | ||
* For each column: | * For each column: | ||
− | ** Starting at depth = 0m, extract a 2D grid for the column at the resolution needed to support the max frequency, pounts/wavelength, and minimum Vs, with a depth that is one-half the current octant edge size. | + | ** Starting at depth = 0m, extract a 2D grid for the column at the resolution needed to support the max frequency, pounts/wavelength, and minimum Vs, with a depth that is one-half the current octant edge size (that is, Z axis is cell center as well). |
*** If the local minimum Vs is in this 2D grid allows for a lower resolution, and the current z value is divisible by the lower resolution, requery the CVM at the lower resolution. Otherwise, if the local min Vs in this grid needs a higher resolution, increase the resolution to what is required (up to the maximum resolution dictated by the configured global minimum Vs) and requery the CVM. | *** If the local minimum Vs is in this 2D grid allows for a lower resolution, and the current z value is divisible by the lower resolution, requery the CVM at the lower resolution. Otherwise, if the local min Vs in this grid needs a higher resolution, increase the resolution to what is required (up to the maximum resolution dictated by the configured global minimum Vs) and requery the CVM. | ||
− | *** Write this layer to | + | *** Write this layer to a flat file (a single file when run serially, or a file local to this core when run in parallel) |
− | *** Step the depth down by the resolution of this 2D grid and loop to the next layer. | + | *** Step the depth down by the resolution of this 2D grid and loop to the next layer in the column. |
** Loop to the next column | ** Loop to the next column | ||
+ | |||
+ | |||
+ | The cvmbycols software enforces a minimum and maximum resolution for octants. Given a certain set of input parameters, these resolutions can be derived as follows: | ||
+ | <pre> | ||
+ | If vs_min = 200 m/s, max_freq = 4.0 Hz, and ppwl = 8, max_dim_length = 180000.0 m, max_cellsize = 500.0 m, then | ||
+ | |||
+ | Maximum Resolution: | ||
+ | res = 200.0 / (8.0 * 4.0) = 6.25 m (minimum resolution needed to support Vs=200 m/s) | ||
+ | N = (log(180000.0/6.25)/log(2)) + 1 = 15 | ||
+ | max_res = 180000.0 / 2^15 ticks ~= 5.49 m | ||
+ | |||
+ | Therefore, maximum resolution (minimum edge size) for these parameters is 5.49 m. | ||
+ | |||
+ | Minimum Resolution: | ||
+ | res = 500.0 m (the maximum cell size to support) | ||
+ | N = (log(180000.0/500.0)/log(2)) + 1 = 9 | ||
+ | min_res = 180000.0 / 2^9 ticks ~= 351.56 m | ||
+ | |||
+ | Therefore, minimum resolution (maximum edge size) for these parameters is 351.56 m. | ||
+ | </pre> | ||
+ | * Minimum resolution represents a floor on the possible resolution of an etree. This is protection against very large octants from being formed by high Vs values. | ||
Line 54: | Line 114: | ||
Generation of an Etree in a parallel environment is challenging since the Euclid etree interface only allows serial write access to the etree file. Points may only be inserted, or they may be appended in key order, by a single writer. This makes etee construction and compaction time consuming. To circumvent this issue, a high performance etree merge/compaction algorithm has been developed. | Generation of an Etree in a parallel environment is challenging since the Euclid etree interface only allows serial write access to the etree file. Points may only be inserted, or they may be appended in key order, by a single writer. This makes etee construction and compaction time consuming. To circumvent this issue, a high performance etree merge/compaction algorithm has been developed. | ||
− | Simply put, it is essentially a merge sort performed on 2^N small | + | Simply put, it is essentially a merge sort performed on 2^N small flat files that contain all of the points needed for the final etree. The flat files contain material properties previously extracted from UCVM. This extraction may be done serially or in parallel as described in the sections below. Once these small, local flat files exist on disk, an MPI job of 2^N cores is then able to sort these octants by their keys in O(n*log n) time. Once all the octants are sorted, a single core may then open the final etree/flat file for writing in append mode, and simply stream the sorted octants to disk efficiently. |
− | Conceptually, the cores in this 2^N MPI job are organized into a tree-like structure with each node containing two children. The bottom layer of this tree has nodes that each do a pre-order traversal on two | + | Conceptually, the cores in this 2^N MPI job are organized into a tree-like structure with each node containing two children. The bottom layer of this tree has nodes that each do a pre-order traversal on two flat files. They sort the combined octants read from each etree in pre-order, and then pass them up to a parent node. The nodes from the top layer to the second to bottom layer each receive locally sorted lists of octants from their two children and once again sort the combined list of octants in pre-order. This newly sorted list is passed upward in this manner until the root node of the tree receives two locally sorted halves of the entire set of etree points. It then sorts them, and appends the fully sorted stream of octants to the final file. |
Line 66: | Line 126: | ||
== Cvmbycols Serial == | == Cvmbycols Serial == | ||
− | This program runs as a single-core command-line application. It works by dividing the etree region into columns, then extracts each column from the underlying CVM and inserts the octants into the final Etree | + | This program runs as a single-core command-line application. It works by dividing the etree region into columns, then extracts each column from the underlying CVM and inserts the octants into the final Etree. |
== Cvmbycols MPI == | == Cvmbycols MPI == | ||
− | This program is actually | + | This program is actually three separate MPI programs. One program performs a parallel extraction from UCVM and saves the octants in set of files, the second locally sorts the octants in each file, and the last performs a merge to form the final etree. These are very high performance applications, capable of generating large etrees. |
=== Cvmbycols-extract-MPI === | === Cvmbycols-extract-MPI === | ||
− | Divides the etree region into | + | Divides the etree region into C columns for extraction. This is an embarrassingly parallel operation. A dispatcher (rank 0) farms out each column to a worker in a pool of N cores for extraction. Each worker queries UCVM for the points in its column and writes a flat-file formatted etree. After program execution, there are N sub-etree files, each locally unsorted. The extractor must be run on 2^Y + 1 cores where Y>0 and (2^Y) < C. The output flat file format is a list of octants(24 byte addr, 16 byte key, 12 byte payload) in arbitrary Z-order. |
+ | |||
+ | Since the number of points in a column depends on the minimum Vs values within that column, some columns will have high octant counts and others will have very low octant counts. Having sub-etrees that vary greatly in size is not optimal for the sorting operations that follow, so cvmbycols-extract-MPI implements a simple octant balancing mechanism. When a worker has extracted more than X octants (the default 16M octants), it reports to the dispatcher that it cannot extract any more columns and terminates. This strategy approximately balances the sub-etrees so that they may be loaded into memory by cvmbycols-sort-MPI. In the case of very large extractions where the dispatcher reports that all workers have reported they are full yet columns remain to be extracted, increase the job size by a factor of 2 until there is room for all the columns. | ||
− | + | ||
+ | === Cvmbycols-sort-MPI === | ||
+ | |||
+ | Sorts the sub-etrees produced by cvmbycols-extract-MPI so that each file is in local pre-order (Z-order). Again, the is an embarrassingly parallel operation. Each rank in the job reads in one of the sub-etrees produced by the previous program, sorts the octants in Z-order, and writes the sorted octants to a new sub-etree. The sorter must be run on 2^Y cores where Y>0. The worker pool must be large enough to allow each worker to load all the octants from its assigned file into memory. By default, this octant limit is 20M octants. If a rank reports that the size of the sub-etree exceeds memory capacity, the 20M buffer size constant may be increased if memory allows, or alternatively, cvmbycols-extract-MPI may be rerun with a larger job size to reduce the number of octants per file. | ||
=== Cvmbycols-merge-MPI === | === Cvmbycols-merge-MPI === | ||
− | Merges N | + | Merges N locally sorted etrees in flat file format into a final, compacted etree. This is essentially a merge sort on the keys from the addresses read from the local files. The cores at the lowest level of the merge tree each read in octants from two flat files in pre-order, merge sort the two sets of addresses, then pass the locally sorted list of addresses to a parent node for additional merging. This proceeds until the points rise to rank 1 which has a completely sorted list of etree addresses. Rank 0 takes this sorted list and performs a transactional append on the final Etree. The merger must be run on 2^N cores. |
− | The program | + | The program reads in input files that are in flat file format. In can output a merged Etree in either Etree format or flat file format. Although, due to space considerations, it strips the output flat file format to a pre-order list ot octants(16 byte key, 12 byte payload). The missing addr field is redundant and can be regenerated from the key field. |
=== ff2etree === | === ff2etree === | ||
− | This utility is used to convert the flat file produced by | + | This utility is used to convert the flat file produced by Cvmbycols in flat file mode into Etree format. |
+ | |||
+ | |||
+ | == ecoalesce == | ||
+ | |||
+ | This utility optimizes an Etree be coalescing sequences of adjacent octants into single octants of a higher level when a set of comparison criteria is met. This serves to reduce the size of the Etree at the cost of resolution. Currently, ecoalesce is setup to combine eight adjacent octants when: | ||
+ | |||
+ | <pre> | ||
+ | Vp is equal | ||
+ | Vs is equal | ||
+ | Density is equal | ||
+ | </pre> | ||
+ | |||
+ | Execute the utility with: | ||
+ | |||
+ | <pre> | ||
+ | % ecoalesce etree.in etree.out | ||
+ | </pre> | ||
Line 109: | Line 191: | ||
= User Guide = | = User Guide = | ||
+ | |||
+ | |||
+ | == Memory Considerations == | ||
+ | |||
+ | * cvmbycols_extract_MPI: Requires 400MB per core for CVM-S extractions, 1500MB per core for CVM-H | ||
+ | * cvmbycols_sort_MPI: Requires 1200MB per core (FLATFILE_MAX_OCTANTS constant in cvmbycols_sort_MPI.c) | ||
+ | * cvmbycols_merge_MPI: Requires 500MB per core (CBC_OCTBUF_WRITE_SIZE, CBC_OCTBUF_READ_SIZE in cvmbycols_merge_MPI.c) | ||
+ | |||
== Configuring Cvmbycols == | == Configuring Cvmbycols == | ||
Line 146: | Line 236: | ||
# Etree parameters and info | # Etree parameters and info | ||
− | title= | + | title=ChinoHills4Hz8ppwl200ms |
author=P_Small_and_R_Taborda | author=P_Small_and_R_Taborda | ||
date=03/2011 | date=03/2011 | ||
Line 186: | Line 276: | ||
aprun -n 1025 -S 4 ./cvmbycols-extract-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf | aprun -n 1025 -S 4 ./cvmbycols-extract-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf | ||
+ | |||
+ | aprun -n 1024 -S 4 ./cvmbycols-sort-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf | ||
aprun -n 1024 -S 4 ./cvmbycols-merge-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf | aprun -n 1024 -S 4 ./cvmbycols-merge-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf | ||
Line 200: | Line 292: | ||
Special consideration must be made when running these codes on a high-performance filesystem such as Lustre. | Special consideration must be made when running these codes on a high-performance filesystem such as Lustre. | ||
− | * | + | * Update all Etree *.h/*.c files to include (#include "iobuf.h") following all stdio.h includes. Or apply the patch given below. |
+ | * Execute: module add iobuf | ||
+ | * Set compiler flag USE_IOBUF_MACROS in Etree Makefile | ||
+ | * Recompile Euclide Etree | ||
+ | * Recompile cvmbycols | ||
+ | * Set IOBUF_PARAMS environment variable: | ||
<pre> | <pre> | ||
IOBUF_PARAMS='*:count=2:size=64M:verbose';export IOBUF_PARAMS; | IOBUF_PARAMS='*:count=2:size=64M:verbose';export IOBUF_PARAMS; | ||
</pre> | </pre> | ||
− | * | + | |
− | * | + | * Problems persists with writing many (1-2K cores) Etrees at once on a Lustre filesystem. |
+ | |||
+ | === iobuf.h C header file === | ||
+ | |||
+ | <pre> | ||
+ | #ifndef IOBUF_H | ||
+ | #define IOBUF_H | ||
+ | |||
+ | /* | ||
+ | * iobuf - I/O buffering layer. | ||
+ | * Copyright 2006 Cray Inc. All rights reserved. | ||
+ | */ | ||
+ | |||
+ | #if defined(USE_IOBUF_MACROS) | ||
+ | #define lseek(f,o,w) iobuf_lseek(f,o,w) | ||
+ | #define read(f,b,c) iobuf_read(f,b,c) | ||
+ | #define write(f,b,c) iobuf_write(f,b,c) | ||
+ | #define pread(f,b,c,o) iobuf_pread(f,b,c,o) | ||
+ | #define pwrite(f,b,c,o) iobuf_pwrite(f,b,c,o) | ||
+ | #define readv(f,v,c) iobuf_readv(f,v,c) | ||
+ | #define writev(f,v,c) iobuf_writev(f,v,c) | ||
+ | #define open iobuf_open | ||
+ | #define creat(p,m) iobuf_creat(p,m) | ||
+ | #define close(f) iobuf_close(f) | ||
+ | #define ftruncate(f,l) iobuf_ftruncate(f,l) | ||
+ | #define dup(f) iobuf_dup(f) | ||
+ | #define dup2(o,n) iobuf_dup2(o,n) | ||
+ | #define fsync(f) iobuf_fsync(f) | ||
+ | #define fdatasync(f) iobuf_fdatasync(f) | ||
+ | #define lseek64(f,o,w) iobuf_lseek(f,o,w) | ||
+ | #define ftruncate64(f,l) iobuf_ftruncate(f,l) | ||
+ | #define pread64(f,b,c,o) iobuf_pread(f,b,c,o) | ||
+ | #define pwrite64(f,b,c,o) iobuf_pwrite(f,b,c,o) | ||
+ | #endif | ||
+ | |||
+ | #ifdef __cplusplus | ||
+ | extern "C" { | ||
+ | #endif | ||
+ | |||
+ | #include <sys/types.h> | ||
+ | struct iovec; | ||
+ | |||
+ | /* Replacements for system call namesakes. */ | ||
+ | extern off_t iobuf_lseek(int fd, off_t offset, int whence); | ||
+ | extern ssize_t iobuf_read(int fd, void *buf, size_t count); | ||
+ | extern ssize_t iobuf_write(int fd, const void *buf, size_t count); | ||
+ | extern ssize_t iobuf_readv(int fd, const struct iovec *vector, int count); | ||
+ | extern ssize_t iobuf_writev(int fd, const struct iovec *vector, int count); | ||
+ | extern ssize_t iobuf_pread(int fd, void *buf, size_t count, off_t offset); | ||
+ | extern ssize_t iobuf_pwrite(int fd, const void *buf, size_t count, off_t offset); | ||
+ | extern int iobuf_open(const char *pathname, int flags, ... ); | ||
+ | extern int iobuf_creat(const char *pathname, mode_t mode); | ||
+ | extern int iobuf_close(int fd); | ||
+ | extern int iobuf_ftruncate(int fd, off_t length); | ||
+ | extern int iobuf_dup(int oldfd); | ||
+ | extern int iobuf_dup2(int oldfd, int newfd); | ||
+ | extern int iobuf_fsync(int fd); | ||
+ | extern int iobuf_fdatasync(int fd); | ||
+ | |||
+ | /* Flush dirty buffers to disk. */ | ||
+ | extern int iobuf_flush(int fd); | ||
+ | |||
+ | /* Augment IOBUF_PARAMS with a file name pattern and parameters string. */ | ||
+ | extern int iobuf_file_params(const char *fname, const char *params); | ||
+ | |||
+ | #ifdef __cplusplus | ||
+ | } | ||
+ | #endif | ||
+ | |||
+ | #endif | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | === IOBUF Etree Patch === | ||
+ | |||
+ | <pre> | ||
+ | Index: euclid3/libsrc/buffer.c | ||
+ | =================================================================== | ||
+ | RCS file: /afs/cs.cmu.edu/project/euclid/CVSROOT/euclid3/libsrc/buffer.c,v | ||
+ | retrieving revision 1.12 | ||
+ | diff -u -r1.12 buffer.c | ||
+ | --- euclid3/libsrc/buffer.c 2 Dec 2003 22:24:32 -0000 1.12 | ||
+ | +++ euclid3/libsrc/buffer.c 18 Mar 2011 07:12:49 -0000 | ||
+ | @@ -24,6 +24,51 @@ | ||
+ | |||
+ | #include "buffer.h" | ||
+ | |||
+ | +/* use IOBUF library on Cray XT3/5 platforms */ | ||
+ | +#ifdef USE_IOBUF | ||
+ | +#undef lseek | ||
+ | +#undef read | ||
+ | +#undef write | ||
+ | +#undef readv | ||
+ | +#undef writev | ||
+ | +#undef pread | ||
+ | +#undef pwrite | ||
+ | +#undef open | ||
+ | +#undef creat | ||
+ | +#undef close | ||
+ | +#undef ftruncate | ||
+ | +#undef dup | ||
+ | +#undef dup2 | ||
+ | +#undef fsync | ||
+ | +#undef fdatasync | ||
+ | + | ||
+ | +#if !defined USE_IOBUF_LOCAL_MACROS | ||
+ | + | ||
+ | +#define USE_IOBUF_MACROS | ||
+ | +#include <iobuf.h> | ||
+ | + | ||
+ | +#else /* !defined USE_IOBUF_LOCAL_MACROS */ | ||
+ | + | ||
+ | +#define lseek iobuf_lseek | ||
+ | +#define read iobuf_read | ||
+ | +#define write iobuf_write | ||
+ | +#define readv iobuf_readv | ||
+ | +#define writev iobuf_writev | ||
+ | +#define pread iobuf_pread | ||
+ | +#define pwrite iobuf_pwrite | ||
+ | +#define open iobuf_open | ||
+ | +#define creat iobuf_creat | ||
+ | +#define close iobuf_close | ||
+ | +#define ftruncate iobuf_ftruncate | ||
+ | +#define dup iobuf_dup | ||
+ | +#define dup2 iobuf_dup2 | ||
+ | +#define fsync iobuf_fsync | ||
+ | +#define fdatasync iobuf_fdatasync | ||
+ | +#endif /* !defined USE_IOBUF_LOCAL_MACROS */ | ||
+ | + | ||
+ | +#endif /* USE_IOBUF */ | ||
+ | + | ||
+ | + | ||
+ | /* various offsets for quick pointer manipulation */ | ||
+ | static int lruln_offset, hashln_offset; | ||
+ | </pre> | ||
Line 224: | Line 455: | ||
= References = | = References = | ||
+ | #'''Ricardo Taborda, Julio Lopez, David O'Hallaron, Tiankai Tu and Jacobo Bielak (2007)''': A review of the current approach to CVM-Etrees", SCEC Annual Meeting, Palm Springs, CA, USA, September 8–12 [http://www.ce.cmu.edu/~rtaborda/publications/html/abstract_2007b.html (Abstract and poster)]. | ||
#'''Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2003)''': The Etree Library: A System for Manipulating Large Octrees on Disk. Technical Report CMU-CS-03-174, School of Computer Science, Carnegie Mellon University, July, 2003. ( ps, pdf, ppt, bib ). | #'''Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2003)''': The Etree Library: A System for Manipulating Large Octrees on Disk. Technical Report CMU-CS-03-174, School of Computer Science, Carnegie Mellon University, July, 2003. ( ps, pdf, ppt, bib ). | ||
#'''Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2002)''': Etree: A Database-Oriented Method for Generating Large Octree Meshes. In Proceedings of the Eleventh International Meshing Roundtable (Ithaca, NY, Sept. 2002), pp. 127-138. ( ps, pdf, ppt, bib ) | #'''Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2002)''': Etree: A Database-Oriented Method for Generating Large Octree Meshes. In Proceedings of the Eleventh International Meshing Roundtable (Ithaca, NY, Sept. 2002), pp. 127-138. ( ps, pdf, ppt, bib ) | ||
#'''David R. O'Hallaron (2003)''': Database Methods for Scientific Computing, Northwestern University, Feb, 2003. This is an overview of the Euclid project. ( ppt, MPEG animation on title slide ) | #'''David R. O'Hallaron (2003)''': Database Methods for Scientific Computing, Northwestern University, Feb, 2003. This is an overview of the Euclid project. ( ppt, MPEG animation on title slide ) |
Latest revision as of 17:35, 29 March 2011
Contents
Overview
Cvmbycols is a set of codes for producing compacted Etrees containing 3D geographic regions with material properties drawn from a community velocity model. Serial and MPI versions of the program are provided. The tool makes use of several existing software packages, including:
- CMU Euclid Etree Library, available from http://www.cs.cmu.edu/~euclid/
- Unified California Velocity Model, available from UCVM
- Community Velocity Models, including one or more of these:
- CVM-H, available from CVM-H
- CVM-S, available from CVM-S
- USGS CenCal, available from http://earthquake.usgs.gov/regional/nca/3Dgeologic/
This SCEC version of cvmbycols is based on an older version originally developed by Tiankai Tu and Ricardo Taborda from the Quake Group at CMU.
Etree Summary
In order to understand how cvmbycols works, some knowledge of the Etree data structure is required. The fundamental unit of measure in an Etree is the tick. This is a distance expressed in meters, and is derived from the length of the longest side of the geographic region:
tick size = (max(len x, len y, len z) / 2^31) m
In essence, the geographic region is divided up into a set of 2^31 ticks.
The funamental unit of data in an Etree is the octant. An octant contains an address (location code) and a payload (material properties). Octants are organized into a tree structure called an octree, where each node has up to eight children. Each level in this octree represents an octant size (alternatively: edge size, or resolution). Moving from level N to level N+1 in the octree increases resolution by 2x. Moving from level N to level N-1 reduces resolution by 2x. The resolution for a level can be determined with:
res from level N = (max_len / 2^N) m
The number of ticks in an octant of level N is:
ticks for level N = 2^N
And thus the octant size in level N is:
octant size = (2^N * (max(len x, len y, len z) / 2^31)) m
Given a particular resolution, the appropriate level to represent it in the etree can be determined with:
level N from res = (int)(log(max_len/res)/log(2)) + 1 (rounded up to next integer)
These relations are important in understanding the algorithm described in the following sections.
Algorithm Description
This program uses several algorithmic approaches to increase extraction speed and minimize the amount of disk space required to store the Etree. These are described in the following sections.
Gridding of Geographic Coordinates
Gridding of the geographic bounding box is performed using bilinear interpolation of the four corners specified in the configuration file. No map projection is used. Grid points are located at cell center.
Optimized Etree Resolution Based on Frequency, PPWL, and Minimum Vs
One of the strengths of the Etree (octree) representation is that resolution can be adjusted so that a particular region is sampled at a lower/higher interval than surrounding areas. You are able to achieve high resolution in just those places where it is needed, and the rest of the region can be sampled at a much lower rez thereby saving considerable disk space.
With this flexibility comes the difficulty of determining what resolution is needed at all points within a meshing region to support Computational Seismology simulations. Cvmbycols uses an algorithm to deduce this resolution based off of three input parameters: maximum frequency to support, the number of points per wavelength that is desired, and the minimum Vs to support. The following relation is used to derive the resolution for a region:
res = vs_min / (ppwl * max_freq) where: res: resolution for the region with vs_min, in meters vs_min: Minimum Vs found in region, in meters/sec ppwl: desired points per wavelength max_freq: maximum frequency to support, in Hz
Using the above relation, the 3D space is decomposed into small columns, and the resolution in each column is tailored to the vs values found at each depth. The algorithm can be summarized as follows:
- Divide region in set of x,y columns, where the length of each column equals its width.
- For each column:
- Starting at depth = 0m, extract a 2D grid for the column at the resolution needed to support the max frequency, pounts/wavelength, and minimum Vs, with a depth that is one-half the current octant edge size (that is, Z axis is cell center as well).
- If the local minimum Vs is in this 2D grid allows for a lower resolution, and the current z value is divisible by the lower resolution, requery the CVM at the lower resolution. Otherwise, if the local min Vs in this grid needs a higher resolution, increase the resolution to what is required (up to the maximum resolution dictated by the configured global minimum Vs) and requery the CVM.
- Write this layer to a flat file (a single file when run serially, or a file local to this core when run in parallel)
- Step the depth down by the resolution of this 2D grid and loop to the next layer in the column.
- Loop to the next column
- Starting at depth = 0m, extract a 2D grid for the column at the resolution needed to support the max frequency, pounts/wavelength, and minimum Vs, with a depth that is one-half the current octant edge size (that is, Z axis is cell center as well).
The cvmbycols software enforces a minimum and maximum resolution for octants. Given a certain set of input parameters, these resolutions can be derived as follows:
If vs_min = 200 m/s, max_freq = 4.0 Hz, and ppwl = 8, max_dim_length = 180000.0 m, max_cellsize = 500.0 m, then Maximum Resolution: res = 200.0 / (8.0 * 4.0) = 6.25 m (minimum resolution needed to support Vs=200 m/s) N = (log(180000.0/6.25)/log(2)) + 1 = 15 max_res = 180000.0 / 2^15 ticks ~= 5.49 m Therefore, maximum resolution (minimum edge size) for these parameters is 5.49 m. Minimum Resolution: res = 500.0 m (the maximum cell size to support) N = (log(180000.0/500.0)/log(2)) + 1 = 9 min_res = 180000.0 / 2^9 ticks ~= 351.56 m Therefore, minimum resolution (maximum edge size) for these parameters is 351.56 m.
- Minimum resolution represents a floor on the possible resolution of an etree. This is protection against very large octants from being formed by high Vs values.
High-performance Etree merging and compaction
Generation of an Etree in a parallel environment is challenging since the Euclid etree interface only allows serial write access to the etree file. Points may only be inserted, or they may be appended in key order, by a single writer. This makes etee construction and compaction time consuming. To circumvent this issue, a high performance etree merge/compaction algorithm has been developed.
Simply put, it is essentially a merge sort performed on 2^N small flat files that contain all of the points needed for the final etree. The flat files contain material properties previously extracted from UCVM. This extraction may be done serially or in parallel as described in the sections below. Once these small, local flat files exist on disk, an MPI job of 2^N cores is then able to sort these octants by their keys in O(n*log n) time. Once all the octants are sorted, a single core may then open the final etree/flat file for writing in append mode, and simply stream the sorted octants to disk efficiently.
Conceptually, the cores in this 2^N MPI job are organized into a tree-like structure with each node containing two children. The bottom layer of this tree has nodes that each do a pre-order traversal on two flat files. They sort the combined octants read from each etree in pre-order, and then pass them up to a parent node. The nodes from the top layer to the second to bottom layer each receive locally sorted lists of octants from their two children and once again sort the combined list of octants in pre-order. This newly sorted list is passed upward in this manner until the root node of the tree receives two locally sorted halves of the entire set of etree points. It then sorts them, and appends the fully sorted stream of octants to the final file.
Programs
The package provides a number of programs and utilities. The Cvmbycols etree creator is described below:
Cvmbycols Serial
This program runs as a single-core command-line application. It works by dividing the etree region into columns, then extracts each column from the underlying CVM and inserts the octants into the final Etree.
Cvmbycols MPI
This program is actually three separate MPI programs. One program performs a parallel extraction from UCVM and saves the octants in set of files, the second locally sorts the octants in each file, and the last performs a merge to form the final etree. These are very high performance applications, capable of generating large etrees.
Cvmbycols-extract-MPI
Divides the etree region into C columns for extraction. This is an embarrassingly parallel operation. A dispatcher (rank 0) farms out each column to a worker in a pool of N cores for extraction. Each worker queries UCVM for the points in its column and writes a flat-file formatted etree. After program execution, there are N sub-etree files, each locally unsorted. The extractor must be run on 2^Y + 1 cores where Y>0 and (2^Y) < C. The output flat file format is a list of octants(24 byte addr, 16 byte key, 12 byte payload) in arbitrary Z-order.
Since the number of points in a column depends on the minimum Vs values within that column, some columns will have high octant counts and others will have very low octant counts. Having sub-etrees that vary greatly in size is not optimal for the sorting operations that follow, so cvmbycols-extract-MPI implements a simple octant balancing mechanism. When a worker has extracted more than X octants (the default 16M octants), it reports to the dispatcher that it cannot extract any more columns and terminates. This strategy approximately balances the sub-etrees so that they may be loaded into memory by cvmbycols-sort-MPI. In the case of very large extractions where the dispatcher reports that all workers have reported they are full yet columns remain to be extracted, increase the job size by a factor of 2 until there is room for all the columns.
Cvmbycols-sort-MPI
Sorts the sub-etrees produced by cvmbycols-extract-MPI so that each file is in local pre-order (Z-order). Again, the is an embarrassingly parallel operation. Each rank in the job reads in one of the sub-etrees produced by the previous program, sorts the octants in Z-order, and writes the sorted octants to a new sub-etree. The sorter must be run on 2^Y cores where Y>0. The worker pool must be large enough to allow each worker to load all the octants from its assigned file into memory. By default, this octant limit is 20M octants. If a rank reports that the size of the sub-etree exceeds memory capacity, the 20M buffer size constant may be increased if memory allows, or alternatively, cvmbycols-extract-MPI may be rerun with a larger job size to reduce the number of octants per file.
Cvmbycols-merge-MPI
Merges N locally sorted etrees in flat file format into a final, compacted etree. This is essentially a merge sort on the keys from the addresses read from the local files. The cores at the lowest level of the merge tree each read in octants from two flat files in pre-order, merge sort the two sets of addresses, then pass the locally sorted list of addresses to a parent node for additional merging. This proceeds until the points rise to rank 1 which has a completely sorted list of etree addresses. Rank 0 takes this sorted list and performs a transactional append on the final Etree. The merger must be run on 2^N cores.
The program reads in input files that are in flat file format. In can output a merged Etree in either Etree format or flat file format. Although, due to space considerations, it strips the output flat file format to a pre-order list ot octants(16 byte key, 12 byte payload). The missing addr field is redundant and can be regenerated from the key field.
ff2etree
This utility is used to convert the flat file produced by Cvmbycols in flat file mode into Etree format.
ecoalesce
This utility optimizes an Etree be coalescing sequences of adjacent octants into single octants of a higher level when a set of comparison criteria is met. This serves to reduce the size of the Etree at the cost of resolution. Currently, ecoalesce is setup to combine eight adjacent octants when:
Vp is equal Vs is equal Density is equal
Execute the utility with:
% ecoalesce etree.in etree.out
Defining properties of a CVM Etree
The following properties are needed in order to produce a CVM Etree:
- Bounding box of geographic region in lon,lat coordinates
- Extents of Etree volume in 3D, in km
- Number of columns in x,y axis to partition 3D space. The resulting columns must be square along the x-y axis.
- Frequency limit to support in Hz
- Number of points per wavelength to support
- Minimum Vs to support
- Maximum octant size to allow
- Name of the output etree file and its format (Etree or flat files are supported)
- CVM to use for material properties (from UCVM)
User Guide
Memory Considerations
- cvmbycols_extract_MPI: Requires 400MB per core for CVM-S extractions, 1500MB per core for CVM-H
- cvmbycols_sort_MPI: Requires 1200MB per core (FLATFILE_MAX_OCTANTS constant in cvmbycols_sort_MPI.c)
- cvmbycols_merge_MPI: Requires 500MB per core (CBC_OCTBUF_WRITE_SIZE, CBC_OCTBUF_READ_SIZE in cvmbycols_merge_MPI.c)
Configuring Cvmbycols
An example configuration file follows:
# Domain corners coordinates (degrees): proj=geo-bilinear lon_0=-119.288842 lat_0=34.120549 lon_1=-118.354016 lat_1=35.061096 lon_2=-117.780976 lat_2=33.096503 lon_3=-116.846030 lat_3=34.025873 # Domain dimensions (meters): x-size=180000.0000 y-size=135000.0000 z-size=61875.0000 # Blocks partition parameters: nx=256 ny=192 # Max freq, points per wavelength max_freq=4.0 ppwl=8.0 # Etree min/max resolution as power of 2 max_cellsize=500.0 # Etree parameters and info title=ChinoHills4Hz8ppwl200ms author=P_Small_and_R_Taborda date=03/2011 outputfile=/lustre/scratch/patricks/cvmbycols/trunk/src/cvmbycols/etree_cvmh_chino_4hz_200ms.e format=etree # CVM to use #cvm=cvms cvm=cvmh ucvmconf=../../conf/kraken/ucvm.conf # Min Vs, Vp vs_min=200.0 # Scratch dir scratch=/lustre/scratch/patricks/tmp
Running Cvmbycols MPI
The following PBS script for NICS Kraken illustrates how cvmbycols may be submitted to the batch scheduler:
#!/bin/sh #PBS -q medium #PBS -l size=1548 #PBS -l walltime=08:00:00 #PBS -o /lustre/scratch/${USER}/cvmbycols/trunk/pbs/kraken/chino_cvmh_4hz_200ms.out #PBS -e /lustre/scratch/${USER}/cvmbycols/trunk/pbs/kraken/chino_cvmh_4hz_200ms.err #PBS -V HOME_DIR=/lustre/scratch/${USER}/cvmbycols/trunk/src/cvmbycols cd ${HOME_DIR} echo "Starting jobs" date aprun -n 1025 -S 4 ./cvmbycols-extract-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf aprun -n 1024 -S 4 ./cvmbycols-sort-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf aprun -n 1024 -S 4 ./cvmbycols-merge-MPI -f ../../conf/kraken/chino_cvmh_4hz_200ms.conf echo "Jobs done" date exit 0
Note that one additional core is needed as a dispatcher for the extractor.
Technical Note for Lustre Filesystems
Special consideration must be made when running these codes on a high-performance filesystem such as Lustre.
- Update all Etree *.h/*.c files to include (#include "iobuf.h") following all stdio.h includes. Or apply the patch given below.
- Execute: module add iobuf
- Set compiler flag USE_IOBUF_MACROS in Etree Makefile
- Recompile Euclide Etree
- Recompile cvmbycols
- Set IOBUF_PARAMS environment variable:
IOBUF_PARAMS='*:count=2:size=64M:verbose';export IOBUF_PARAMS;
- Problems persists with writing many (1-2K cores) Etrees at once on a Lustre filesystem.
iobuf.h C header file
#ifndef IOBUF_H #define IOBUF_H /* * iobuf - I/O buffering layer. * Copyright 2006 Cray Inc. All rights reserved. */ #if defined(USE_IOBUF_MACROS) #define lseek(f,o,w) iobuf_lseek(f,o,w) #define read(f,b,c) iobuf_read(f,b,c) #define write(f,b,c) iobuf_write(f,b,c) #define pread(f,b,c,o) iobuf_pread(f,b,c,o) #define pwrite(f,b,c,o) iobuf_pwrite(f,b,c,o) #define readv(f,v,c) iobuf_readv(f,v,c) #define writev(f,v,c) iobuf_writev(f,v,c) #define open iobuf_open #define creat(p,m) iobuf_creat(p,m) #define close(f) iobuf_close(f) #define ftruncate(f,l) iobuf_ftruncate(f,l) #define dup(f) iobuf_dup(f) #define dup2(o,n) iobuf_dup2(o,n) #define fsync(f) iobuf_fsync(f) #define fdatasync(f) iobuf_fdatasync(f) #define lseek64(f,o,w) iobuf_lseek(f,o,w) #define ftruncate64(f,l) iobuf_ftruncate(f,l) #define pread64(f,b,c,o) iobuf_pread(f,b,c,o) #define pwrite64(f,b,c,o) iobuf_pwrite(f,b,c,o) #endif #ifdef __cplusplus extern "C" { #endif #include <sys/types.h> struct iovec; /* Replacements for system call namesakes. */ extern off_t iobuf_lseek(int fd, off_t offset, int whence); extern ssize_t iobuf_read(int fd, void *buf, size_t count); extern ssize_t iobuf_write(int fd, const void *buf, size_t count); extern ssize_t iobuf_readv(int fd, const struct iovec *vector, int count); extern ssize_t iobuf_writev(int fd, const struct iovec *vector, int count); extern ssize_t iobuf_pread(int fd, void *buf, size_t count, off_t offset); extern ssize_t iobuf_pwrite(int fd, const void *buf, size_t count, off_t offset); extern int iobuf_open(const char *pathname, int flags, ... ); extern int iobuf_creat(const char *pathname, mode_t mode); extern int iobuf_close(int fd); extern int iobuf_ftruncate(int fd, off_t length); extern int iobuf_dup(int oldfd); extern int iobuf_dup2(int oldfd, int newfd); extern int iobuf_fsync(int fd); extern int iobuf_fdatasync(int fd); /* Flush dirty buffers to disk. */ extern int iobuf_flush(int fd); /* Augment IOBUF_PARAMS with a file name pattern and parameters string. */ extern int iobuf_file_params(const char *fname, const char *params); #ifdef __cplusplus } #endif #endif
IOBUF Etree Patch
Index: euclid3/libsrc/buffer.c =================================================================== RCS file: /afs/cs.cmu.edu/project/euclid/CVSROOT/euclid3/libsrc/buffer.c,v retrieving revision 1.12 diff -u -r1.12 buffer.c --- euclid3/libsrc/buffer.c 2 Dec 2003 22:24:32 -0000 1.12 +++ euclid3/libsrc/buffer.c 18 Mar 2011 07:12:49 -0000 @@ -24,6 +24,51 @@ #include "buffer.h" +/* use IOBUF library on Cray XT3/5 platforms */ +#ifdef USE_IOBUF +#undef lseek +#undef read +#undef write +#undef readv +#undef writev +#undef pread +#undef pwrite +#undef open +#undef creat +#undef close +#undef ftruncate +#undef dup +#undef dup2 +#undef fsync +#undef fdatasync + +#if !defined USE_IOBUF_LOCAL_MACROS + +#define USE_IOBUF_MACROS +#include <iobuf.h> + +#else /* !defined USE_IOBUF_LOCAL_MACROS */ + +#define lseek iobuf_lseek +#define read iobuf_read +#define write iobuf_write +#define readv iobuf_readv +#define writev iobuf_writev +#define pread iobuf_pread +#define pwrite iobuf_pwrite +#define open iobuf_open +#define creat iobuf_creat +#define close iobuf_close +#define ftruncate iobuf_ftruncate +#define dup iobuf_dup +#define dup2 iobuf_dup2 +#define fsync iobuf_fsync +#define fdatasync iobuf_fdatasync +#endif /* !defined USE_IOBUF_LOCAL_MACROS */ + +#endif /* USE_IOBUF */ + + /* various offsets for quick pointer manipulation */ static int lruln_offset, hashln_offset;
Source Code
SVN: https://source.usc.edu/svn/cvmbycols
Supported CVMs
The cvmbycols utility is built upon UCVM. As a result, it can extract etrees from the following CMV sources:
- CVM-H
- CVM-S
- USGS Cencal (which itself is an etree)
- 1D
References
- Ricardo Taborda, Julio Lopez, David O'Hallaron, Tiankai Tu and Jacobo Bielak (2007): A review of the current approach to CVM-Etrees", SCEC Annual Meeting, Palm Springs, CA, USA, September 8–12 (Abstract and poster).
- Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2003): The Etree Library: A System for Manipulating Large Octrees on Disk. Technical Report CMU-CS-03-174, School of Computer Science, Carnegie Mellon University, July, 2003. ( ps, pdf, ppt, bib ).
- Tiankai Tu, David R. O'Hallaron, and Julio Lopez (2002): Etree: A Database-Oriented Method for Generating Large Octree Meshes. In Proceedings of the Eleventh International Meshing Roundtable (Ithaca, NY, Sept. 2002), pp. 127-138. ( ps, pdf, ppt, bib )
- David R. O'Hallaron (2003): Database Methods for Scientific Computing, Northwestern University, Feb, 2003. This is an overview of the Euclid project. ( ppt, MPEG animation on title slide )