[cig-commits] commit: Cleanup of discussion of parallel performance.
Mercurial
hg at geodynamics.org
Mon Aug 13 14:56:20 PDT 2012
changeset: 131:56ed4e9edde6
tag: tip
user: Brad Aagaard <baagaard at usgs.gov>
date: Mon Aug 13 14:56:14 2012 -0700
files: faultRup.tex references.bib
description:
Cleanup of discussion of parallel performance.
diff -r 9268656f1d00 -r 56ed4e9edde6 faultRup.tex
--- a/faultRup.tex Mon Aug 13 09:33:09 2012 -0700
+++ b/faultRup.tex Mon Aug 13 14:56:14 2012 -0700
@@ -24,9 +24,6 @@
\newcommand\brad[1]{{\color{red}\bf [BRAD: #1]}}
\newcommand\matt[1]{{\color{blue}\bf [MATT: #1]}}
\newcommand\charles[1]{{\color{green}\bf [CHARLES: #1]}}
-\newcommand\PetscFunction[1]{\texttt{\bf #1}}
-\newcommand\PetscClass[1]{\texttt{\bf #1}}
-\newcommand\PetscEvent[1]{\texttt{\bf #1}}
% ======================================================================
% PREAMBLE
@@ -80,7 +77,7 @@
equations that provides excellent scalability with problem size
compared to conventional Additive Schwarz methods. We demonstrate
application of this approach using benchmarks for both quasi-static
- viscoelastic deformation and spontaneous dynamic rupture propagation
+ viscoelastic deformation and dynamic spontaneous rupture propagation
that verify the numerical implementation. Future work will focus on
linking the quasi-static and dynamic simulations together to capture
both the slow strain accumulation and post-seismic relaxation at
@@ -223,7 +220,7 @@ fault slip evolve according to a fault c
fault slip evolve according to a fault constitutive model that
specifies the friction on the fault surface. Here, we describe a
robust, yet flexible method for implementing fault slip with a domain
-decomposition approach, its affect on the overall design of PyLith,
+decomposition approach, its effect on the overall design of PyLith,
and verification of its implementation using benchmarks.
% ------------------------------------------------------------------
@@ -233,7 +230,7 @@ equations using the finite-element metho
equations using the finite-element method. We augment the conventional
finite-element formulation for elasticity with a domain decomposition
approach \citep{Smith:etal:1996,Zienkiewicz:Taylor:2005} to implement
-the fault slip. The PyLith manual \citep{PyLith:manual:1.6.2} provides
+the fault slip. The PyLith manual \citep{PyLith:manual:1.7.1} provides
a step-by-step description of the formulation.
We solve the elasticity equation including inertial terms,
@@ -298,8 +295,8 @@ the Lagrange multipliers (fault traction
\begin{split}
- \int_{V} \nabla\bm{\phi} : \bm{\sigma} \, dV
+ \int_{S_T} \bm{\phi} \cdot \bm{T} \, dS
- - \int_{S_{f^+}} \bm{\phi} \cdot \bm{l} \, dS
- + \int_{S_{f^-}} \bm{\phi} \cdot \bm{l} \, dS \\
+ - \int_{S_{f^+}} \bm{\phi} \cdot \bm{l} \, dS \\
+ + \int_{S_{f^-}} \bm{\phi} \cdot \bm{l} \, dS
+ \int_{V} \bm{\phi} \cdot \bm{f} \, dV
- \int_{V} \bm{\phi} \cdot \rho \frac{\partial^{2}\bm{u}}{\partial t^{2}} \, dV
=0.
@@ -363,7 +360,7 @@ for all $\bm{a}_m$, which leads to
\end{gather}
We want to solve these equations for the coefficients $\bm{u}_n$
and $\bm{l}_p$ subject to $\bm{u} = \bm{u}_0 \text{ on
-}S_u$. When we prescribed the slip, we specify $\bm{d}$ on $S_f$,
+}S_u$. When we prescribe the slip, we specify $\bm{d}$ on $S_f$,
and when we use a fault constitutive model, we specify how the
Lagrange multipliers $\bm{l}$ depend on the fault slip, slip rate,
and state variables.
@@ -414,7 +411,7 @@ Extensible Toolkit for Scientific Comput
Extensible Toolkit for Scientific Computation (PETSc), which provides
a suite of tools for solving linear systems of algebraic equations
with parallel processing
-\citep{PETSc:manual,PETSC:efficient}. In solving the
+\citep{PETSC:efficient,PETSc:manual}. In solving the
system, we compute the residual (i.e., $\bm{r} = \bm{b} -
\bm{A} \cdot \bm{u}$) and the Jacobian of the system
($\bm{A}$). In our case the solution is $\bm{u} =
@@ -447,8 +444,10 @@ some algebra we find this portion of the
(\nabla^T + \nabla) \bm{N}_m^T \cdot
\bm{C} \cdot (\nabla + \nabla^T) \bm{N}_n \, dV.
\end{equation}
-Following a similar procedure, we find the portion of the Jacobian
-associated with the constraints, equation~(\ref{eqn:quasi-static:residual:fault}), is
+This matches the stiffness matrix in conventional solid mechanics
+finite-element formulations. Following a similar procedure, we find
+the portion of the Jacobian associated with the constraints,
+equation~(\ref{eqn:quasi-static:residual:fault}), is
\begin{equation}
\label{eqn:jacobian:constraint}
\bm{L} = \int_{S_f} \bm{N}_p^T \cdot (\bm{N}_{n^+} - \bm{N}_{n^-}) \, dS.
@@ -494,7 +493,7 @@ the fault slip constraint remains unchan
the fault slip constraint remains unchanged, so the corresponding
portions of the Jacobian ($\bm{L}$) and residual ($\bm{r}_p$)
are exactly the same as in the quasi-static simulations. Including the
-inertial term in equation~\ref{eqn:quasi-static:residual:elasticity} yields
+inertial term in equation~(\ref{eqn:quasi-static:residual:elasticity}) yields
\begin{equation}
\label{eqn:dynamic:residual:elasticity}
\begin{split}
@@ -541,10 +540,12 @@ so that the upper portion of the Jacobia
\bm{K} =
\frac{1}{\Delta t^2} \int_{V} \rho \bm{N}_m^T\ \cdot \bm{N}_n \, dV.
\end{equation}
+This matches the mass matrix in conventional solid mechanics
+finite-element formulations.
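As a concrete illustration of the mass-matrix form above (a 1-D sketch under our own assumptions, not code from PyLith; the function names are ours), the consistent mass matrix for a single linear element of length h and density rho is (rho*h/6)*[[2,1],[1,2]], and the corresponding upper Jacobian block is that matrix scaled by 1/dt^2:

```python
import numpy as np

# Sketch (our illustration, not PyLith source): consistent mass matrix for a
# single 1-D linear element with shape functions N1 = 1 - x/h, N2 = x/h.
def mass_matrix_1d(rho, h):
    # integral of rho * N^T N over [0, h]
    return rho * h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])

# Upper portion of the dynamic Jacobian: M / dt^2, as in the equation above.
def jacobian_upper(rho, h, dt):
    return mass_matrix_1d(rho, h) / dt**2

M = mass_matrix_1d(rho=3000.0, h=6.0)  # rho*h/6 = 3000 -> [[6000, 3000], [3000, 6000]]
```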
-Earthquake ruptures in which the slip has a short rise time tends to
+Earthquake ruptures in which the slip has a short rise time tend to
introduce deformation at short length scales (high frequencies) that
-the numerical models can resolve accurately. This is especially true
+the numerical models cannot resolve accurately. This is especially true
in spontaneous rupture simulations, because the rise time is sensitive
to the evolution of the fault rupture. In order to reduce the
introduction of deformation at such short length scales we add
@@ -558,7 +559,7 @@ artificial damping via Kelvin-Voigt visc
\bm{u_d} = \bm{u} + \eta^* \Delta t \frac{\partial
\bm{u}}{\partial t},
\end{gather}
-where $\eta^*$ is a nomdimensional viscosity on the order of
+where $\eta^*$ is a nondimensional viscosity on the order of
0.1--1.0.
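The damped displacement equation above translates directly into code. A minimal sketch (our own illustration; the function name and array layout are assumptions, not PyLith's implementation):

```python
import numpy as np

def damped_displacement(u, v, dt, eta_star=0.5):
    """Kelvin-Voigt damped displacement u_d = u + eta* dt du/dt.

    u        : displacement field (array)
    v        : velocity field du/dt (array)
    dt       : time step
    eta_star : nondimensional viscosity, typically 0.1--1.0
    """
    return u + eta_star * dt * v

u = np.array([1.0, 2.0])
v = np.array([10.0, -5.0])
ud = damped_displacement(u, v, dt=0.01, eta_star=0.5)  # [1.05, 1.975]
```

The damped field u_d, rather than u, would then be used when computing strains, suppressing the shortest-wavelength oscillations.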
% ------------------------------------------------------------------
@@ -593,7 +594,7 @@ magnitude. This results in a rather ill-
magnitude. This results in a rather ill-conditioned system. We avoid
this ill-conditioning by nondimensionalizing all of the quantities
involved in the problem based upon user-specified scales
-\citep{PyLith:manual:1.6.2}, facilitating forming well-conditioned
+\citep{PyLith:manual:1.7.1}, facilitating forming well-conditioned
systems of equations for problems across a wide range of spatial and
temporal scales.
@@ -614,7 +615,7 @@ rate), the integral of Brune's far-field
\citep{Brune:1970}, a sine-cosine function developed by
\citet{Liu:etal:2006}, and a user-defined time history. These are
discussed in detail in the PyLith manual
-\citep{PyLith:manual:1.6.2}. PyLith allows specification of the slip
+\citep{PyLith:manual:1.7.1}. PyLith allows specification of the slip
initiation time independently at each location as well as
superposition of multiple earthquake ruptures with different origin
times, thereby permitting complex slip behavior.
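One of the slip time functions mentioned above, the integral of Brune's far-field time function, has a simple closed form. A sketch under our own assumptions (the normalization and parameter names here are ours; PyLith's exact parameterization may differ, see the manual):

```python
import math

def brune_slip(t, final_slip, tau):
    """Slip time history from the integral of Brune's far-field time function.

    Illustrative form: slip ramps smoothly from 0 toward final_slip,
    with tau controlling the rise time. Assumed normalization, not
    necessarily PyLith's.
    """
    if t <= 0.0:
        return 0.0
    return final_slip * (1.0 - math.exp(-t / tau) * (1.0 + t / tau))

# slip approaches final_slip for t >> tau
s = brune_slip(10.0, 1.0, 0.5)
```

Superposing several such functions with different origin times and final slips, as the text describes, yields arbitrarily complex slip histories.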
@@ -623,11 +624,11 @@ times, thereby permitting complex slip b
% ------------------------------------------------------------------
\subsection{Spontaneous Fault Rupture}
-In a spontaneous (dynamic) fault rupture a constitutive model controls
-the tractions on the fault surface. The fault slip evolves based on
-the fault tractions as driven by the initial conditions, boundary
+In a spontaneous fault rupture a constitutive model controls the
+tractions on the fault surface. The fault slip evolves based on the
+fault tractions as driven by the initial conditions, boundary
conditions and deformation. In our formulation of fault slip, we
-specify the fault slip and the fault tractions (Lagrange multipliers)
+specify the fault slip, and the fault tractions (Lagrange multipliers)
are part of the solution (unknowns). The fault constitutive model
places bounds on the Lagrange multipliers and the system of equations
is nonlinear: when a location on the fault is slipping, we must solve
@@ -700,7 +701,7 @@ significantly with slip, then the estima
significantly with slip, then the estimate of how much slip is
required to match the fault constitutive model will be
poor. Similarly, in rare cases in which the fault slip extends across
-the entire domain, deformation extends much farther from the fault and
+the entire domain, deformation extends far from the fault and
the estimate derived using only the fault DOF will be
poor. In order to make this iterative procedure more robust so that it
works well across a wide variety of fault constitutive models, we add
@@ -736,7 +737,7 @@ zero. The fault constitutive models incl
zero. The fault constitutive models include static friction, linear
slip-weakening, linear time-weakening, and Dieterich-Ruina rate-state
friction with an aging law (see the PyLith manual
-\citep{PyLith:manual:1.6.2} for details).
+\citep{PyLith:manual:1.7.1} for details).
% ------------------------------------------------------------------
\section{Finite-Element Mesh Processing}
@@ -754,7 +755,7 @@ representation and manipulation is based
representation and manipulation is based upon a directed acyclic graph
representation of the \textit{covering} relation in a mesh. For
example, faces cover cells, edges cover faces, and points cover
-edges. By focusing on the key topological relation, the interface can
+edges. By focusing on the key topological relations, the interface can
be both concise and quite general. Using this generic API, PyLith is
able to support one, two, and three dimensional meshes, with
simplicial, hex, and even prismatic cell shapes, while using very
@@ -881,7 +882,7 @@ accommodate an arbitrary number of field
accommodate an arbitrary number of fields, mixed discretizations,
fields defined over a subset of the mesh, etc. Once these fields are
defined, a substantial range of preconditioners can be assembled using
-only PyLith options for PETSc. Table~\ref{tab:solver:options} shows
+only PyLith options for PETSc. Table~\ref{tab:preconditioner:options} shows
example preconditioners and the options necessary to construct them.
Another option involves using the field split preconditioner in PETSc
@@ -911,26 +912,26 @@ which leads to a simple block diagonal p
\end{array} \right).
\end{equation}
-The elastic submatrix $K$, in the absence of boundary conditions,
+The elastic submatrix $\bm{K}$, in the absence of boundary conditions,
has three translational and three rotational null modes. These are
provided to the algebraic multigrid (AMG) preconditioner, such as the
-ML library \citep{ML:users:guide} or the PETSc GAMG preconditioner,
-in order to assure an accurate coarse grid solution. AMG mimics the
-action of traditional geometric multgrid, but it generates coarse
+ML library \citep{ML:users:guide} or the PETSc GAMG preconditioner, in
+order to assure an accurate coarse grid solution. AMG mimics the
+action of traditional geometric multigrid, but it generates coarse
level operators and interpolation matrices using only the system
matrix, treated as a weighted graph, rather than a separate
-description of the problem geometry, such as a mesh. We use
-PCFIELDSPLIT to split the elastic block from the fault block, and
-also to manage Schur complements. In this way, all block preconditioners,
-including those nested with multigrid, can be controlled from the
-options file without recompilation or special code.
+description of the problem geometry, such as a mesh. We split the
+elastic block from the fault block and use the same splitting to
+manage Schur complements. In this way, all block preconditioners, including those
+nested with multigrid, can be controlled from the options file without
+recompilation or special code.
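For example, selecting a multiplicative field-split composition with an AMG preconditioner on the elasticity block might look like the following (the option names come from the PETSc options database; this particular combination is an illustration, not necessarily a row of the paper's options table):

```
# Illustrative PETSc options (assumed combination):
-pc_type fieldsplit
-pc_fieldsplit_type multiplicative
-fieldsplit_0_pc_type ml        # AMG (ML) for the elasticity submatrix
-fieldsplit_1_pc_type jacobi    # simple preconditioner for the Lagrange multipliers
```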
We now turn our attention to evaluating the fault portion of the
preconditioning matrix associated with the Lagrange multipliers, since
stock PETSc preconditioners can handle the elastic portion as
discussed in the previous paragraph. In computing
$\bm{P_\mathit{fault}}$ we approximate $\bm{K}^{-1}$ with
-the inverse of the diagonal portion of $\bm{K}$. $\bm{L}$, which
+the inverse of the diagonal portion of $\bm{K}$. $\bm{L}$
consists of integrating the products of basis functions over the fault
faces. Its structure depends on the quadrature scheme and the choice
of basis functions. For conventional low order finite-elements and
@@ -956,12 +957,12 @@ equation~(\ref{eqn:saddle:point:code}).
% $\bm{K}$.}
Our preferred setup uses the field splitting options in PETSc to
-combine an AMG preconditioner for the elasticity submatrix with out
+combine an AMG preconditioner for the elasticity submatrix with our
custom fault preconditioner for the Lagrange multiplier submatrix. See
-Section~\ref{sec:solvertest} for a comparison of preconditioner
-performance for an application involved a static simulation with
-multiple faults. It shows the clear superiority of this setup over
-several other possible preconditioning strategies.
+Section~\ref{sec:performance:benchmark} for a comparison of
+preconditioner performance for an application involving a static
+simulation with multiple faults. It shows the clear superiority of
+this setup over several other possible preconditioning strategies.
\subsection{Dynamic Simulations}
@@ -1048,7 +1049,7 @@ associated with the fault slip constrain
associated with the fault slip constraint in solving the system of
equations via a Schur's complement algorithm. We compute an initial
residual assuming the increment in the solution is zero (i.e.,
-$\bm{du}_n = \bm{0}$ and $\bm{dl}_p = \bm{0}$,
+$\bm{du}_n = \bm{0}$ and $\bm{dl}_p = \bm{0}$),
\begin{equation}
\bm{r}^* = \begin{pmatrix} \bm{r}_n^* \\ \bm{r}_p^* \end{pmatrix} =
\begin{pmatrix} \bm{b}_n \\ \bm{b}_p \end{pmatrix}
@@ -1169,10 +1170,10 @@ the problem size (number of DOF) for the
the problem size (number of DOF) for the two different cell types
(hexahedra and tetrahedra) are nearly the same. The suite of
simulations examine problem sizes increasing by about a factor of two
-from $1.78\times 10^5$ DOF to $2.14\times 10^7$ DOF. The corresponding
-discretization sizes are 1520 m to 392 m for the hexahedral meshes and
-1744 m to 456 m for the tetrahedral meshes.
-Figure~\ref{fig:solvertest:mesh} shows the 1744 m resolution
+from $1.78\times 10^5$ DOF (1 process) to $1.10\times 10^7$ DOF (64
+processes). The corresponding discretization sizes are 2033 m to 500 m
+for the hexahedral meshes and 2326 m to 581 m for the tetrahedral
+meshes. Figure~\ref{fig:solvertest:mesh} shows the 1846 m resolution
tetrahedral mesh. As we will see in
Section~\ref{sec:verification:quasi-static}, the hexahedral mesh for a
given resolution in a quasi-static problem is slightly more accurate,
@@ -1212,87 +1213,103 @@ family of field split preconditioners, t
family of field split preconditioners, the one with multiplicative
composition minimizes the number of iterations. The custom
preconditioner for the Lagrange multiplier submatrix greatly
-accelerates the convergence with an 80\% further reduction in the
+accelerates the convergence with an 80\% reduction in the
number of iterations required for convergence.
\subsection{Parallel Scaling Performance}
-The underlying PETSc solver infrastructure has demonstrated optimal scalability
-on the largest machines available today. However, very often computer science
-scalability results are based upon unrealistically simple problems which do not
-advance the scientific state-of-the-art. We will concentrate on explaining the
-sources of reduced scalability, and propose possible algorithmic mitigation.
+The underlying PETSc solver infrastructure has demonstrated optimal
+scalability on the largest machines available today \ref{??} \matt{Add
+ reference for this statement}. However, computer science scalability
+results are often based upon unrealistically simple problems which do
+not advance the scientific state-of-the-art. In evaluating the
+parallel scalability of PyLith, we consider the sources responsible
+for reducing the scalability and propose possible steps for
+mitigation.
-The main impediment to scalability in PyLith is load imbalance in the solver stage.
-This imbalance is the combination of three effects: the inherent imbalance in the
-partition of an unstructured mesh, the use of a cell partition, and lack of
-incorporation of cohesive cells in the partition. In our full test case, matrix-vector
-multiplication (the PETSc \PetscFunction{MatMult} function) could have a load imbalance
-of up to 30\% on 128 processors. The cell partition, calculated with ParMetis, which is
-necessary in order to achieve good balance for the finite element integration, does not
-take into account Dirichlet boundary conditions or unknowns on the fault, which can exacerbate
-the imbalance. However, elimination of constrained unknowns preserves the symmetry
-of the overall systems, and can result in better conditioned linear systems.
+The main impediment to scalability in PyLith is load imbalance during
+the solve. This imbalance is the combination of three effects: the
+inherent imbalance in partitioning an unstructured mesh, partitioning
+based on cells rather than DOF, and weighting the cohesive cells the
+same as conventional bulk cells while partitioning. In this
+performance benchmark, matrix-vector multiplication (the PETSc
+\texttt{MatMult} function) has a load imbalance of up to 20\%
+on 64 processors. The cell partition balances the number of cells
+across the processes using ParMetis \cite{Karypis:etal:1999} in order
+to achieve good balance for the finite element integration. This does
+not take into account a reduction in the number of DOF associated with
+constraints from Dirichlet boundary conditions or the additional DOF
+associated with the Lagrange multiplier constraints, which can
+exacerbate any imbalance. Nevertheless, eliminating DOF associated
+with Dirichlet boundary conditions preserves the symmetry of the
+overall systems and, in many cases, results in better conditioned
+linear systems.
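The imbalance figure quoted above can be quantified with a simple metric (our own sketch; PETSc's log summary reports per-event max/min ratios, which differ slightly from this max/mean form):

```python
import numpy as np

def load_imbalance(times):
    """Relative load imbalance across processes: max/mean - 1.

    0.0 means perfectly balanced; 0.20 corresponds to the ~20%
    MatMult imbalance cited for the 64-process benchmark.
    """
    t = np.asarray(times, dtype=float)
    return t.max() / t.mean() - 1.0

# four processes; one carries extra work -> ~33% imbalance
imbalance = load_imbalance([1.0, 1.0, 1.0, 1.5])
```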
We evaluate the parallel performance via a weak scaling
criterion. That is, we run simulations on various numbers of
processors/cores with an increase in the problem size as the number of
-processors increases to maintain the same workload (number of DOF) for
-each core. Ideally, the time for the various stages of the simulation
-should be independent of the number of processors/cores. For this
-performance benchmark we use the entire suite of hexahedral and
-tetrahedral meshes described earlier that range in size from
-$1.78\times 10^5$ DOF to $2.14\times 10^7$ DOF. In each of these
-simulations, we compare the standard Additive Schwarz parallel
-preconditioner~\cite{Smith:etal:1996} with an approximate block factorization
-preconditioner~\cite{elman2008tcp} built using the PETSc \PetscClass{PC FieldSplit}
-object. We test additive, multiplicative, and Schur complement block compositions,
-and employ the ML algebraic multigrid preconditioner for the elasticity
+processes increases to maintain the same workload (e.g., number of
+cells and number of DOF) for each core. In ideal weak scaling the time
+for the various stages of the simulation is independent of the number
+of processes. For this performance benchmark we use the entire suite
+of hexahedral and tetrahedral meshes described earlier that range in
+size from $1.78\times 10^5$ DOF (1 process) to $1.10\times 10^7$ DOF
+(64 processes). We employ the AMG preconditioner for the elasticity
submatrix and our custom preconditioner for the Lagrange multipliers
submatrix. We ran the simulations on a Beowulf cluster comprised of 24
compute nodes connected by QDR Infiniband, where each compute node
-consisted of two quad-core Intel Xeon E5620 processors with 24 GB
-of RAM. Simulations run on eight or fewer cores were run on a single
-compute node. In the USGS configuration, four dual-quadcore chips share a single backplane.
-Thus, in addition to algorithm bottlenecks, runtime performance is potentially impeded
-by core/memory affinity, memory bandwidth, and communication among compute nodes.
+consisted of two quad-core Intel Xeon E5620 processors with 24 GB of
+RAM. Simulations run on eight or fewer cores were run on a single
+compute node with processes distributed across processors and then
+cores. For example, the two process simulation used one core on each
+of two processors. In addition to algorithm bottlenecks, runtime
+performance is potentially impeded by core/memory affinity, memory
+bandwidth, and communication among compute nodes.
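Under the weak scaling criterion described above, the efficiency of a stage is simply its single-process time divided by its p-process time at fixed work per process (a sketch with hypothetical timings, not measured benchmark data):

```python
def weak_scaling_efficiency(t1, tp):
    """Weak-scaling efficiency at constant work per process.

    t1 : stage time on 1 process
    tp : stage time on p processes
    Returns 1.0 for ideal weak scaling; < 1.0 indicates degradation.
    """
    return t1 / tp

# hypothetical stage times (seconds) at fixed DOF per process
eff = weak_scaling_efficiency(10.0, 12.5)  # 0.8
```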
-The single node scaling for PyLith is almost completely controlled by the
-available memory bandwidth. Good illustrations of the memory system
-performance are given by the \PetscEvent{VecAXPY}, \PetscEvent{VecMAXPY} and
-\PetscEvent{VecMDot} operations reported in the log summary~\cite{PETSc:manual},
-since these operations are limited by available memory bandwidth rather than
-processor flop rate. From Table~\ref{tab:memBandwidth}, we see that we saturate
-the memory system using two or three processes per processor, since scaling
-plateaus from 2 to 4 processes, but shows good scaling from 8 to 16. This lack of
-bandwidth will depress overall performance, but should not affect the inter-node
-scaling of the application.
+The single node scaling for PyLith (eight processes or less in this
+case) is almost completely controlled by the available memory
+bandwidth. Good illustrations of the memory system performance are
+given by the \texttt{VecAXPY}, \texttt{VecMAXPY} and \texttt{VecMDot}
+operations reported in the log summary \cite{PETSc:manual}. These
+operations are limited by available memory bandwidth rather than the
+rate at which a processor can perform floating-point operations. From
+Table~\ref{tab:solvertest:memory:events}, we see that we saturate the memory
+system using two processes per processor, since scaling plateaus from
+2 to 4 processes, but shows good scaling from 8 to 16 processes. This
+lack of memory bandwidth will depress overall performance, but should
+not affect the inter-node scaling of the application.
-Machine network performance can be elucidated by the \PetscEvent{VecMDot} operation
-for reductions, and \PetscEvent{MatMult} for point-to-point communication. In
-Table~\ref{tab:memBandwidth} we see that the vector reduction shows good scaling up
-to 64 processes \matt{Recheck with new 128 results}. Similarly in Table~\ref{tab:solver},
-we see that \PetscEvent{MatMult} has good scalability, but that it is the smaller component
-of overall solver time. The AMG setup time increases roughly linearly with the number of
-processes. It is often not included in weak scaling studies since it is amortized over
-the iteration, but it is responsible for most of the deviation from perfect weak scaling.
-The scalability of the AMG apply decays more slowly, but there is still serious deterioation
-by 64 processes. Here we could trade preconditioner strength for scalability by turning down
-the work done on coarse AMG grids, so that the solver uses more iterations which scale very well.
-However, that would increase overall solver time and thus would not be the choice to maximize
-scientific output.
+Machine network performance can be elucidated by the \texttt{VecMDot}
+operation for vector reductions, and \texttt{MatMult} for
+point-to-point communication. In
+Table~\ref{tab:solvertest:memory:events} we see that the vector
+reduction shows good scaling up to 64 processes. Similarly in
+Table~\ref{tab:solvertest:solver:events}, we see that \texttt{MatMult}
+has good scalability, but that it is a small fraction of the overall
+solver time. The AMG preconditioner setup (\texttt{PCSetUp}) and
+application (\texttt{PCApply}) dominate the overall solver time. The
+AMG preconditioner setup time increases roughly linearly with the
+number of processes. Note that many weak scaling studies do not
+include this event, because it is amortized over the
+iteration. Nevertheless, in our benchmark it is responsible for most
+of the deviation from perfect weak scaling. The scalability of the
+application of the AMG preconditioner decays more slowly, but there is
+still serious deterioration by 64 processes. We could trade
+preconditioner strength for scalability by reducing the work done on
+the coarse AMG grids, so that the solver uses more iterations which
+scale very well. However, that would increase overall solver time and
+thus would not be the choice to maximize scientific output.
Figure~\ref{fig:solvertest:scaling} illustrates the excellent parallel
performance for the finite-element assembly routines (reforming the
-Jacobian sparse matrix and computing the residual). From the ASM performance,
-we see that the basic solver building-blocks, like parallel sparse matrix-vector
-multiplication, scale well. However, the preconditioner itself is not scalable, and
-the number of iterates increases strongly with process count. The introduction of
-Schur complement methods and AMG slows the growth considerably, but future work will
-pursue the ultimate goal of iteration counts independent of process number.
-
-%Separately plot VecMDot and MatMult. This shows that increasing iterates have a penalty from sync VecMDot, and MatMult
-%scales better. Could tradeoff with iterative method like BCGS.
+Jacobian sparse matrix and computing the residual). From the ASM
+performance, we see that the basic solver building-blocks, like
+parallel sparse matrix-vector multiplication, scale well. However, the
+ASM preconditioner itself is not scalable, and the number of iterations
+increases strongly with the number of processes. The introduction of Schur
+complement methods and an AMG preconditioner slows the growth
+considerably, but future work will pursue the ultimate goal of
+iteration counts independent of the number of processes.
% ------------------------------------------------------------------
\section{Code Verification Benchmarks}
@@ -1307,19 +1324,19 @@ verify that the code properly solves the
verify that the code properly solves the numerical problem. These
benchmark problems include quasi-static strike-slip and reverse
viscoelastic simulations and various problems in the suite of
-spontaneous dynamic rupture benchmarks developed by the Southern
+dynamic spontaneous rupture benchmarks developed by the Southern
California Earthquake Center (SCEC) and the United States Geological
Survey \citep{Harris:etal:SRL:2009}. In this section we focus on two
benchmarks that test different scientific applications: quasi-static
relaxation of a Maxwell viscoelastic material subjected to multiple
earthquake cycles involving slip and steady creep on a vertical
strike-slip fault \citep{Savage:Prescott:1978} and supershear
-spontaneous dynamic rupture of a 60 degree dipping normal fault in a
+dynamic spontaneous rupture of a 60 degree dipping normal fault in a
Drucker-Prager elastoplastic medium. This second benchmark corresponds
-to benchmark TPV13 in the SCEC suite of spontaneous dynamic rupture
+to benchmark TPV13 in the SCEC suite of dynamic spontaneous rupture
benchmarks.
-\subsection{Quasi-static}
+\subsection{Quasi-static Benchmark}
\label{sec:verification:quasi-static}
As a test of our quasi-static solution, we compare our numerical
@@ -1361,13 +1378,13 @@ domain. Finally, we fix the x-displaceme
domain. Finally, we fix the x-displacements on the -y and +y faces to
enforce symmetry consistent with an infinitely long strike-slip fault.
-We examine four difference numerical solutions considering the effects
+We examine four different numerical solutions considering the effects
of cell type (hexahedral versus tetrahedral) and discretization
size. In our coarse hexahedral mesh we use a uniform resolution of 20
km. In our higher resolution hexahedral mesh we refine an inner region
(480 km by 240 km by 100 km) by a factor of three, yielding a
resolution near the center of the fault of 6.7 km. For the
-tetrahedral meshes, we match the discretization size of the hexehedral
+tetrahedral meshes, we match the discretization size of the hexahedral
mesh near the center of the fault (20 km or 6.7 km) while increasing
the discretization size in a geometric progression at a rate of
1.02. This results in a maximum discretization size of approximately
@@ -1404,8 +1421,8 @@ second earthquake cycle, the far-field n
second earthquake cycle, the far-field numerical solution does not yet
accurately represent steady plate motion and the numerical simulations
underpredict the displacement. By the tenth earthquake cycle, steady
-plate motion is accurately simulated and the numerical results thus
-match the analytical solution.
+plate motion is accurately simulated and the numerical results match
+the analytical solution.
Within about one elastic thickness of the fault the effect of the
resolution of the numerical models becomes apparent. We find large
@@ -1415,21 +1432,21 @@ solution. The 6.7 km hexahedral solution
solution. The 6.7 km hexahedral solution is indistinguishable from the
analytical solution in Figure~\ref{fig:savage:prescott:profiles}(b);
the 6.7 km tetrahedral solution slightly underpredicts the analytical
-solution for times later in the earthquake cycle. The greater
+solution for times late in the earthquake cycle. The greater
accuracy of the hexahedral cells relative to the tetrahedral cells
with the same nominal discretization size for quasi-static solutions
is consistent with our findings for other benchmarks. The greater
number of polynomial terms in the basis functions of the hexahedra
allows the model to capture a more complex deformation field.
-\subsection{Dynamic}
+\subsection{Dynamic Benchmark}
\label{sec:verification:dynamic}
SCEC Spontaneous Rupture Benchmark TPV13 focuses on modeling an
earthquake that produces extreme (very large) ground motions
associated with supershear rupture towards the ground surface on a
-dipping fault a large stress drop to generate large slip and fast slip
-rates \citep{Harris:etal:SRL:2011}. It uses a Drucker-Prager
+dipping fault with a large stress drop to generate large slip and fast
+slip rates \citep{Harris:etal:SRL:2011}. It uses a Drucker-Prager
elastoplastic bulk rheology and a slip-weakening friction model in a
depth-dependent initial stress field. Figure~\ref{fig:tpv13:geometry}
shows the geometry of the benchmark problem and the size of the domain
@@ -1441,12 +1458,12 @@ both triangular and quadrilateral discre
both triangular and quadrilateral discretizations with resolutions on
the fault of 50 m, 100 m, and 200 m for TPV13-2D and 100 m and 200 m
for TPV13. We gradually coarsen the mesh with distance from the fault
-by increasing the discretization size at a geometric rate of
-2\%. This provides high resolution at the fault surface to resolve the
-small scale features of the rupture process with less resolution at
-the edges of the boundary where the solution is much
-smoother. Figure~\ref{fig:tpv13-2d:mesh} shows the triangular mesh for a
-discretization size of 100 m on the fault.
+by increasing the discretization size at a geometric rate of 2\%. This
+provides high resolution at the fault surface to resolve the small
+scale features of the rupture process with less resolution at the
+edges of the boundary where the solution is much
+smoother. Figure~\ref{fig:tpv13-2d:mesh} shows the triangular mesh for
+a discretization size of 100 m on the fault.
Rupture initiates due to a low static coefficient of friction in the
nucleation region. Figure~\ref{fig:tpv13-2d:stress:slip} illustrates
@@ -1480,11 +1497,11 @@ cohesive zone \citep{Rice:1993}. The fin
cohesive zone \citep{Rice:1993}. The finer meshes provide sufficient
resolution of the cohesive zone so there is very little high-frequency
oscillation in the slip rate time histories. The triangular cells
-result in less oscillation compared with quadrilateral cells.
+generate less oscillation compared with quadrilateral cells.
In this problem without an analytical solution, as in all of the
benchmarks in the SCEC spontaneous rupture benchmark suite, we rely on
-comparison with other spontaneous dynamic rupture modeling codes to
+comparison with other dynamic spontaneous rupture modeling codes to
verify the numerical implementation in
PyLith. Figure~\ref{fig:tpv13-2d:slip:rate}(b) compares the slip rate
time histories from PyLith with four other codes (see
@@ -1529,8 +1546,8 @@ produces a small time shift in the time
From the 2-D and 3-D versions of the SCEC spontaneous rupture
benchmark TPV13, we conclude that PyLith performs very similarly
-to other finite-element and finite-difference spontaneous dynamic
-rupture modeling codes. In particular it is well-suited to problems
+to other finite-element and finite-difference dynamic
+spontaneous rupture modeling codes. In particular, it is well-suited to problems
with complex geometry as we are able to vary the discretization size
while simulating a dipping normal fault. The code accurately captures
supershear rupture and properly implements a Drucker-Prager
@@ -1554,7 +1571,7 @@ in PyLith with excellent agreement with
in PyLith with excellent agreement with the analytical solution for
viscoelastic relaxation and strike-slip faulting over multiple
earthquake cycles and excellent agreement with other codes for
-supershear spontaneous dynamic rupture on a dipping normal fault
+supershear dynamic spontaneous rupture on a dipping normal fault
embedded in an elastoplastic domain. Consequently, we believe this
methodology will provide a promising avenue for modeling the
earthquake cycle through coupling of quasi-static simulations of the
@@ -1676,7 +1693,7 @@ simulations of earthquake rupture propag
\noindent\includegraphics{figs/savageprescott_profiles}
\caption{Comparison of displacement profiles perpendicular to the
fault in the Savage and Prescott benchmark during earthquake
- cycles (a) 2 and (b) 10. The displacement values shown are
+ cycles (a) two and (b) ten. The displacement values shown are
relative to the values at the beginning of the earthquake cycle to
facilitate comparison between the analytical solution and the
numerical models. Both the analytical and numerical simulations
@@ -1689,8 +1706,8 @@ simulations of earthquake rupture propag
coarser (20 km) resolutions are unable to match the details of the
displacement field at distances less than about one elastic
thickness, but all of the numerical models provide a good fit to
- the analytical solution at distances greater than 2-3 times the
- elastic thickness.}
+ the analytical solution in the tenth earthquake cycle at distances
+ greater than 2--3 times the elastic thickness.}
\label{fig:savage:prescott:profiles}
\end{figure}
@@ -1701,7 +1718,7 @@ simulations of earthquake rupture propag
friction, a depth-dependent stress field, and a normal fault with a
60 degree dip angle. The 2-D version corresponds to the vertical
slice shown by the dashed line. The red dots denote locations on
- the fault used in the comparison of the vertical slip dates
+ the fault used in the comparison of the vertical slip rates
(Figures~\ref{fig:tpv13-2d:slip:rate}
and~\ref{fig:tpv13:slip:rate}). The blue dots indicate locations
on the ground surface used in the comparison of fault normal and
@@ -1711,7 +1728,7 @@ simulations of earthquake rupture propag
\begin{figure}
\noindent\includegraphics[width=84mm]{figs/tpv13-2d_mesh}
- \caption{Finite-element mesh comprised of triangular cells for SCEC
+ \caption{Finite-element mesh comprised of quadrilateral cells for SCEC
spontaneous rupture benchmark TPV13-2D. The discretization size is 100
m on the fault surface and increases at a geometric rate of 2\%
with distance from the fault. We employ this same spatial
@@ -1813,12 +1830,13 @@ simulations of earthquake rupture propag
% ------------------------------------------------------------------
% TABLES
% ------------------------------------------------------------------
-\begin{table*}
+\begin{table}
\caption{Example Preconditioners for the Saddle Point Problem in
Equation~(\ref{eqn:saddle:point})\tablenotemark{a}}
-\label{tab:solver:options}
+ \label{tab:preconditioner:options}
\centering
\begin{tabular}{ll}
+ AMG with additive relaxation & AMG with multiplicative relaxation \\
$\begin{pmatrix}\bm{K} & \bm{0} \\ \bm{0} & \bm{I}\end{pmatrix}$~\label{prec:add} & $\begin{pmatrix}\bm{K} & \bm{L}^T \\ \bm{0} & \bm{I}\end{pmatrix}$ \label{prec:mult}\\
\texttt{[pylithapp.problem.formulation]} & \texttt{[pylithapp.problem.formulation]} \\
\texttt{split\_fields = True} & \texttt{split\_fields = True} \\
@@ -1832,6 +1850,7 @@ simulations of earthquake rupture propag
\texttt{fs\_fieldsplit\_1\_pc\_type = jacobi} & \texttt{fs\_fieldsplit\_1\_pc\_type = jacobi} \\
\texttt{fs\_fieldsplit\_1\_ksp\_type = gmres} & \texttt{fs\_fieldsplit\_1\_ksp\_type = gmres} \\
\smallskip \\
+ Schur complement, upper factorization & Schur complement, full factorization \\
$\begin{pmatrix}\bm{K} & \bm{L}^T \\ \bm{0} & \bm{S}\end{pmatrix}$ \label{prec:schurUpper} & $\begin{pmatrix}\bm{I} & \bm{0} \\ \bm{B}^T \bm{A}^{-1} & \bm{I}\end{pmatrix}\begin{pmatrix}\bm{A} & \bm{0} \\ \bm{0} & \bm{S}\end{pmatrix}\begin{pmatrix}\bm{I} & \bm{A}^{-1} \bm{B} \\ \bm{0} & \bm{I}\end{pmatrix}$ \label{prec:schurFull}\\
\texttt{[pylithapp.problem.formulation]} & \texttt{[pylithapp.problem.formulation]} \\
\texttt{split\_fields = True} & \texttt{split\_fields = True} \\
@@ -1847,8 +1866,16 @@ simulations of earthquake rupture propag
\texttt{fieldsplit\_1\_pc\_type = jacobi} & \texttt{fieldsplit\_1\_pc\_type = jacobi} \\
\texttt{fieldsplit\_1\_ksp\_type = gmres} & \texttt{fieldsplit\_1\_ksp\_type = gmres} \\
\end{tabular}
-\tablenotetext{a}{ADD STUFF HERE}
-\end{table*}
+\tablenotetext{a}{Four examples of preconditioners often used to
+ accelerate convergence in saddle point problems. Below the
+ mathematical expression for the preconditioner, we show the PyLith
+ parameters used to construct the preconditioner. In the performance
+ benchmark we consider the AMG preconditioner with multiplicative
+ relaxation, the Schur complement preconditioner with upper
+ factorization, and the Schur complement preconditioner with full
+ factorization. The AMG preconditioner with additive relaxation is
+ shown for completeness.}
+\end{table}
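The Schur-complement preconditioners in the table above eliminate the Lagrange-multiplier block using the complement of the saddle-point system. A minimal numerical sketch of that block elimination, with a diagonal K and a single-row L so the algebra stays trivial (all values are made-up toy data, not PyLith output):

```python
# Toy illustration of Schur-complement block elimination for the
# saddle-point system
#   [ K  L^T ] [u  ]   [f]
#   [ L   0  ] [lam] = [g]
# with diagonal K (so K^{-1} is trivial) and a single constraint row L.
def solve_saddle_point_diagK(Kdiag, L, f, g):
    """Solve via the Schur complement S = -L K^{-1} L^T."""
    Kinv_f = [fi / k for fi, k in zip(f, Kdiag)]
    Kinv_Lt = [li / k for li, k in zip(L, Kdiag)]
    S = -sum(li * kl for li, kl in zip(L, Kinv_Lt))
    # Eliminate u, solve the (scalar) Schur system for the multiplier.
    lam = (g - sum(li * kf for li, kf in zip(L, Kinv_f))) / S
    # Back-substitute for the displacement-like unknowns.
    u = [kf - kl * lam for kf, kl in zip(Kinv_f, Kinv_Lt)]
    return u, lam

u, lam = solve_saddle_point_diagK([2.0, 2.0], [1.0, 1.0], [2.0, 2.0], 0.0)
```

Substituting back confirms K u + L^T lam reproduces f and L u reproduces g, which is the consistency the full-factorization preconditioner exploits.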
\clearpage
\begin{table}
@@ -1887,12 +1914,12 @@ simulations of earthquake rupture propag
\begin{table}
\caption{Performance Benchmark Memory System Evaluation\tablenotemark{a}}
-\label{tab:memBandwidth}
+\label{tab:solvertest:memory:events}
\centering
\begin{tabular}{rcc}
\# Cores & Load Imbalance & MFlops/s \\
\hline
- \multicolumn{3}{c}{\PetscEvent{VecMDot}} \\
+ \multicolumn{3}{c}{\texttt{VecMDot}} \\
1 & 1.0 & 2188 \\
2 & 1.0 & 3969 \\
4 & 1.0 & 5511 \\
@@ -1900,9 +1927,8 @@ simulations of earthquake rupture propag
16 & 1.3 & 10249 \\
32 & 1.2 & 4270 \\
64 & 1.2 & 12299 \\
- 128 & 1.3 & 2019 \\
\hline
- \multicolumn{3}{c}{\PetscEvent{VecAXPY}} \\
+ \multicolumn{3}{c}{\texttt{VecAXPY}} \\
1 & 1.0 & 1453 \\
2 & 1.1 & 2708 \\
4 & 1.1 & 5001 \\
@@ -1910,9 +1936,8 @@ simulations of earthquake rupture propag
16 & 1.3 & 8157 \\
32 & 1.2 & 13876 \\
64 & 1.2 & 25807 \\
- 128 & 1.3 & 58759 \\
\hline
- \multicolumn{3}{c}{\PetscEvent{VecMAXPY}} \\
+ \multicolumn{3}{c}{\texttt{VecMAXPY}} \\
1 & 1.0 & 1733 \\
2 & 1.1 & 3283 \\
4 & 1.1 & 4991 \\
@@ -1920,16 +1945,19 @@ simulations of earthquake rupture propag
16 & 1.3 & 11050 \\
32 & 1.2 & 21680 \\
64 & 1.2 & 42697 \\
- 128 & 1.3 & 84691 \\
\hline
\end{tabular}
-\tablenotetext{a}{Examination of memory system performance using three PETSc vector operations.}
+\tablenotetext{a}{Examination of memory system performance using three
+  PETSc vector operations for simulations with the hexahedral
+  meshes. \texttt{VecMDot} corresponds to vector reductions (multiple
+  dot products), \texttt{VecAXPY} corresponds to vector scaling and
+  addition, and \texttt{VecMAXPY} corresponds to multiple vector
+  scaling and addition.}
\end{table}
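For readers unfamiliar with the PETSc event names in the table above, plain-Python sketches of the three kernels convey what each measures (illustrative only; PETSc's implementations are optimized compiled code):

```python
# Plain-Python sketches of the three PETSc vector kernels profiled above.

def vec_axpy(alpha, x, y):
    """VecAXPY: y <- alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def vec_mdot(x, ys):
    """VecMDot: multiple dot products [x.y1, x.y2, ...] in one sweep."""
    return [sum(xi * yi for xi, yi in zip(x, y)) for y in ys]

def vec_maxpy(y, alphas, xs):
    """VecMAXPY: y <- y + sum_i alphas[i]*xs[i]."""
    out = list(y)
    for a, x in zip(alphas, xs):
        out = [a * xi + oi for xi, oi in zip(x, out)]
    return out
```

All three are memory-bandwidth bound, which is why the table reports them as a probe of the memory system rather than of floating-point throughput.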
\begin{table}
\caption{Performance Benchmark Solver Evaluation\tablenotemark{a}}
-\label{tab:solver}
+\label{tab:solvertest:solver:events}
\centering
\begin{tabular}{lrrr}
Event & Calls & Time (s) & MFlops/s \\
@@ -1958,14 +1986,14 @@ simulations of earthquake rupture propag
PCApply & 70 & 14.8 & 10229 \\
KSPSolve & 1 & 47.5 & 7067 \\
\hline
- \multicolumn{4}{c}{p = 128} \\
- MatMult & 222 & 22.1 & 9880 \\
- PCSetUp & 1 & 314.1 & 169 \\
- PCApply & 71 & 6865.0 & 46 \\
- KSPSolve & 1 & 7198.9 & 99 \\
- \hline
\end{tabular}
-\tablenotetext{a}{Examination of memory system performance using three PETSc vector operations.}
+\tablenotetext{a}{Examination of solver performance using three of the
+ main events comprising the linear solve for simulations with the
+ hexahedral meshes and 8, 16, 32, and 64 processes. The
+ \texttt{KSPSolve} event encompasses the entire linear
+ solve. \texttt{MatMult} corresponds to matrix-vector
+ multiplications. \texttt{PCSetUp} and \texttt{PCApply} correspond to
+ the setup and application of the AMG preconditioner.}
\end{table}
@@ -2030,7 +2058,7 @@ simulations of earthquake rupture propag
preconditioners for tetrahedral and hexahedral discretizations and
three problem sizes (S1 with $1.8\times 10^5$ DOF, S2 with
$3.5\times 10^5$ DOF, and S3 with $6.9\times 10^5$ DOF). The field
- split preconditioner with multiplicative composittion and the custom
+ split preconditioner with multiplicative composition and the custom
fault block preconditioner yields good performance, requiring only a
fraction of the iterations needed by the other preconditioners and
showing only a small increase with problem size.}
diff -r 9268656f1d00 -r 56ed4e9edde6 references.bib
--- a/references.bib Mon Aug 13 09:33:09 2012 -0700
+++ b/references.bib Mon Aug 13 14:56:14 2012 -0700
@@ -982,7 +982,7 @@
title = {Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations},
publisher = {Cambridge University Press},
year = {1996},
- url = {http://www.mcs.anl.gov/\~{ }bsmith/ddbook.html}
+ note = {http://www.mcs.anl.gov/\~{ }bsmith/ddbook.html}
}
@article{elman2008tcp,
@@ -1093,6 +1093,50 @@
doi = {10.3233/SPR-2009-0249}
}
+@article{Karypis:etal:1999,
+ author = {Karypis, G. and Aggarwal, R. and Kumar, V. and
+ Shekhar, S.},
+ title = {Multilevel hypergraph partitioning: Applications in
+ {VLSI} domain},
+ journal = {IEEE Transactions on Very Large Scale Integration
+ (VLSI) Systems},
+ year = 1999,
+ month = mar,
+ volume = 7,
+ number = 1,
+ pages = {69--79},
+ abstract = {In this paper, we present a new
+ hypergraph-partitioning algorithm that is based on
+ the multilevel paradigm. In the multilevel paradigm,
+ a sequence of successively coarser hypergraphs is
+ constructed. A bisection of the smallest hypergraph
+ is computed and it is used to obtain a bisection of
+ the original hypergraph by successively projecting
+ and refining the bisection to the next level finer
+ hypergraph. We have developed new hypergraph
+ coarsening strategies within the multilevel
+ framework. We evaluate their performance both in
+ terms of the size of the hyperedge cut on the
+ bisection and the run time for a number of very
+ large scale integration circuits. Our experiments
+ show that our multilevel hypergraph-partitioning
+ algorithm produces high-quality partitionings in a
+ relatively small amount of time. The quality of the
+ partitionings produced by our scheme is on average
+ 6\%--23\% better than those produced by other
+ state-of-the-art schemes. Furthermore, our
+ partitioning algorithm is significantly faster,
+ often requiring 4 to 10 times less time than that
+ required by the other schemes. Our multilevel
+ hypergraph-partitioning algorithm scales very well
+ for large hypergraphs. Hypergraphs with over
+ 100,000 vertices can be bisected in a few minutes
+ on today's workstations. Also, on the large
+ hypergraphs, our scheme outperforms other schemes
+ (in hyperedge cut) quite consistently with larger
+ margins (9\%--30\%).},
+}
+
@Book{Zienkiewicz:Taylor:2005,
author = {Zienkiewicz, O.~C. and Taylor, R.~L. and Zhu, J.~Z.},
title = {The Finite-Element Method: Its Basis and Fundamentals},
@@ -1118,7 +1162,7 @@
Number = "ANL-95/11 - Revision 3.1",
Institution = "Argonne National Laboratory",
Year = 2010,
- url = {http://www.mcs.anl.gov/petsc}
+ note = "http://www.mcs.anl.gov/petsc"
}
@InProceedings{PETSc:efficient,
@@ -1133,14 +1177,14 @@
Year = 1997,
}
-@Manual{PyLith:manual:1.6.2,
- title = {PyLith User Manual, Version 1.6.2},
+@Manual{PyLith:manual:1.7.1,
+ title = {PyLith User Manual, Version 1.7.1},
author = {Aagaard, B. and Kientz, S. and Knepley, M. and
Somala, S. and Strand, L. and Williams, C.},
organization = {Computational Infrastructure for Geodynamics (CIG)},
address = {University of California, Davis},
- year = 2011,
- note = {http://www.geodynamics.org/cig/software/pylith/pylith\_manual-1.6.2.pdf}
+ year = 2012,
+ note = {http://www.geodynamics.org/cig/software/pylith/pylith\_manual-1.7.1.pdf}
}
@book{Saad03,
More information about the CIG-COMMITS
mailing list