The simulation of a complete three-dimensional seismic wavefield with realistic assumptions of the material properties of the medium the waves are propagating through requires the solution of a system of partial differential equations. Depending on the desired frequency range of the simulated waves and the material properties their spatial wavelengths vary. In particular, high-frequency wave propagation in low-velocity material can reduce the wavelengths dramatically and the necessary resolution of the numerical mesh to discretize the physical model can lead to an enormous amount of mesh elements.

Optimizations for commodity CPUs and Intel Xeon Phi

Picture of SuperMUC. Source: Leibniz Rechenzentrum

The numerical scheme of SeisSol boils down to sparse and dense small matrix multiplications, where small means that the matrices have roughly less than a hundred entries per dimension. Furthermore, all sparsity patterns are known a priori. Before running SeisSol we specify the convergence order, which leads to different matrix sizes, and the architecture that we want to optimize for. Then, we obtain specialized matrix multiplication routines through our code generator for each matrix multiplication. Each one may be either a sparse x dense, dense x sparse, or dense x dense multiplication. When the matrix multiplication is sparse, we hardwire the sparsity pattern into the generated code and order the operations such that the compiler auto-vectorizes them. In the case a dense x dense multiplication, we generate assembler code which makes use of SSE3, AVX, AVX2, or AVX512 instructions, depending on what is available on the architecture. In order to decide if a matrix should be stored sparse or dense, we use auto-tuning.
In the meantime, our code generator is part of the BSD-licensed libxsmm library and is developed further by the Intel Corporation.

In our ADER-DG scheme, each element may in principle have its own time step, which is also called local time stepping. This leads, however, to a highly irregular scheme with respect to communication and computation. In order to circumvent the irregularity and reduce the overhead from local time stepping we partition the whole interval of time steps in disjoint intervals, where the boundaries of those intervals are multiples of the minimum timestep. We associate a time cluster with each interval and assign each element to a time cluster depending on the element's timestep. As such we have a regular structure in each cluster and may communicate data in large chunks. Note, that the assignment is solely based on the timestep and has no geometric constraints. As such, the elements of a time cluster may be at arbitrary locations in the mesh.

SeisSol provides highly tuned I/O routines to handle unstructured meshes with several hundred millions of cells and billions of unknowns. The I/O routines are adapted for the complete workflow from a CAD model to the final result. SeisSol is able to read the large meshes from a specialized mesh format based on parallel netCDF. The tool PUMGen creates the customized files directly from CAD models or other mesh formats. The I/O routines writing the complete wave field use the parallel HDF5 library to combine the data into a single file. An optional aggregation step can reduce the number of I/O ranks significantly and aligns the data to file system blocks. The output is written in XDMF, a format that can be visualized with common tools such as ParaView or VisIt. SeisSol also provides multiple checkpoint back-ends based on POSIX I/O, MPI-IO and HDF5 to restart the simulation in case of failures.

Performance results on SuperMUC

Coming soon.