[cig-commits] commit: Updated some text on the preconditioning performance. Added notes on what still needs to be done.

Mercurial hg at geodynamics.org
Tue May 8 11:46:20 PDT 2012


changeset:   122:1eceaf4a15a3
tag:         tip
user:        Brad Aagaard <baagaard at usgs.gov>
date:        Tue May 08 11:46:07 2012 -0700
files:       faultRup.tex
description:
Updated some text on the preconditioning performance. Added notes on what still needs to be done.


diff -r fe2ed4131587 -r 1eceaf4a15a3 faultRup.tex
--- a/faultRup.tex	Fri May 04 11:29:13 2012 -0700
+++ b/faultRup.tex	Tue May 08 11:46:07 2012 -0700
@@ -77,12 +77,11 @@
   compared to conventional Additive Schwarz methods. We demonstrate
   application of this approach using benchmarks for both quasi-static
   viscoelastic deformation and spontaneous dynamic rupture propagation
-  that verify the numerical implementation of these features in
-  PyLith.  Future work will focus on linking the quasi-static and
-  dynamic simulations together to capture both the slow strain
-  accumulation and post-seismic relaxation at long time scales and the
-  dynamic rupture propagation and radiation of seismic waves at short
-  time scales.
+  that verify the numerical implementation.  Future work will focus on
+  linking the quasi-static and dynamic simulations together to capture
+  both the slow strain accumulation and post-seismic relaxation at
+  long time scales and the dynamic rupture propagation and radiation
+  of seismic waves at short time scales.
 \end{abstract}
   
 % ------------------------------------------------------------------
@@ -872,7 +871,7 @@ construct many variations of effective p
 construct many variations of effective parallel preconditioners for
 saddle point problems.
 
-The PCFIELDSPLIT \citep{PETSc:manual} preconditioner in PETSc allows
+The field split preconditioner in PETSc \citep{PETSc:manual} allows
 the user to define sets of unknowns which correspond to different
 fields in the physical problem. This scheme is flexible enough to
 accommodate an arbitrary number of fields, mixed discretizations,
@@ -881,10 +880,10 @@ only PyLith options for PETSc. Table~\re
 only PyLith options for PETSc. Table~\ref{tab:solver:options} shows
 example preconditioners and the options necessary to construct them.
 
-Another option involves using using the field split preconditioner in
-PETSc in combination with a custom preconditioner matrix for the
-block associated with the Lagrange multipliers. In formulating the
-custom preconditioner, we exploit the structure of the sparse Jacobian
+Another option involves using the field split preconditioner in PETSc
+in combination with a custom preconditioner for the submatrix
+associated with the Lagrange multipliers. In formulating the custom
+preconditioner, we exploit the structure of the sparse Jacobian
 matrix. Our system Jacobian has the form
 \begin{equation}
   \mathbf{A} = \left( \begin{array}{cc}
@@ -892,7 +891,7 @@ matrix. Our system Jacobian has the form
       \mathbf{L} & \mathbf{0}
     \end{array} \right).
 \end{equation}
-We use the Schur complement of block $\mathbf{K}$ to examine the form
+We use the Schur complement of the submatrix $\mathbf{K}$ to examine the form
 of $\mathbf{A}^{-1}$,
 \begin{gather}
   \mathbf{A}^{-1} = \left( \begin{array}{cc}
@@ -905,7 +904,7 @@ of $\mathbf{A}^{-1}$,
     \mathbf{A}^{-1}_{pn} = -(-\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^{T})^{-1} \mathbf{L} \mathbf{K}^{-1}, \\
     \mathbf{A}^{-1}_{pp} = -(\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^T)^{-1}.
 \end{gather}
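
In the notation of the preconditioner variants in
Table~\ref{tab:solver:options}, the block $\mathbf{A}^{-1}_{pp}$ above
is the inverse of the Schur complement $\mathbf{S}$ of $\mathbf{K}$; as
a minimal sketch, assuming the standard sign convention for this saddle
point system,
\[
  \mathbf{S} = -\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^{T},
  \qquad
  \mathbf{A}^{-1}_{pp} = \mathbf{S}^{-1}.
\]
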
-A suitable block diagonal approximation of $\mathbf{A}^{-1}$ is
+A suitable block diagonal approximation of $\mathbf{A}^{-1}$ is
 \begin{equation}
   \mathbf{P}^{-1} = \left( \begin{array}{cc}
       \mathbf{K}^{-1} & 0 \\
@@ -924,24 +923,26 @@ which leads to
   \end{array} \right).
 \end{equation}
 
-The elastic block $K$ can be further split into blocks associated with
-displacements along the $x$, $y$, and $z$ axes. It is known that the
-vector Laplacian is spectrally equivalent to this
-operator~\citep{AskMarkAdams}\matt{Ask Mark Adams}, and each component
-is efficiently preconditioned by algebraic multigrid (AMG) methods,
-such as the ML library \citep{ML:users:guide}. AMG mimics the action of
-traditional geometric multgrid, but it generates coarse level
-operators and interpolation matrices using only the system matrix,
-treated as a weighted graph, rather than a separate description of the
-problem geometry, such as a mesh. We use PCFIELDSPLIT to split the
-elastic block and separately apply AMG to each component.
+\matt{Rewrite this paragraph based on new formulation with only two
+  split fields.}  The elastic submatrix $\mathbf{K}$ can be further split into
+blocks associated with displacements along the $x$, $y$, and $z$
+axes. It is known that the vector Laplacian is spectrally equivalent
+to this operator~\citep{AskMarkAdams}\matt{Ask Mark Adams}, and each
+component is efficiently preconditioned by algebraic multigrid (AMG)
+methods, such as the ML library \citep{ML:users:guide}. AMG mimics the
+action of traditional geometric multigrid, but it generates coarse
+level operators and interpolation matrices using only the system
+matrix, treated as a weighted graph, rather than a separate
+description of the problem geometry, such as a mesh. We use
+PCFIELDSPLIT to split the elastic block and separately apply AMG to
+each component.
 
 We now turn our attention to evaluating the fault portion of the
-preconditioning matrix associated with the Lagrange multipliers since
+preconditioning matrix associated with the Lagrange multipliers, since
 stock PETSc preconditioners can handle the elastic portion as
 discussed in the previous paragraph. In computing
-$\mathbf{P_\mathit{fault}}$ we we approximate $\mathbf{K}^{-1}$ with
-the inverse of the diagonal portion of $\mathbf{K}$. $\mathbf{L}$
+$\mathbf{P}_\mathit{fault}$ we approximate $\mathbf{K}^{-1}$ with
+the inverse of the diagonal portion of $\mathbf{K}$. $\mathbf{L}$
 consists of integrating the products of basis functions over the fault
 faces. Its structure depends on the quadrature scheme and the choice
 of basis functions. For conventional low order finite-elements and
@@ -957,23 +958,22 @@ portion of the conditioning matrix,
 \begin{equation}
   \mathbf{P}_\mathit{fault} = -\mathbf{L}_p (\mathbf{K}_{n+n+} + \mathbf{K}_{n-n-}) \mathbf{L}_p^{T},
 \end{equation}
-where $\mathbf{L}_p$ is given in equation~(\ref{eqn:jacobian:constraint:code}) and
-$\mathbf{K}_{n+n+}$ and $\mathbf{K}_{n-n-}$ are the diagonal terms
-from equation~(\ref{eqn:saddle:point:code}).
+where $\mathbf{L}_p$ is given in
+equation~(\ref{eqn:jacobian:constraint:code}) and $\mathbf{K}_{n+n+}$
+and $\mathbf{K}_{n-n-}$ are the diagonal terms from
+equation~(\ref{eqn:saddle:point:code}).
 
 % Matt conjectures that collocation, because it makes $\mathbf{L}$
 %  block diagonal, is more tolerant of the diagonal approximation for
 %  $\mathbf{K}$.}
 
-For the upper portion of the preconditioning matrix associated with
-elasticity, we have found AMG preconditioners provide substantially
-faster convergence that the Additive Schwarz method. We combine the
-field split preconditioner with the AMG preconditioner, such that we
-precondition the DOF for each global coordinate axis
-independently. See Section~\ref{sec:solvertest} for a comparison of
-preconditioner performance for an application involved a static
-simulation with multiple faults. It shows the clear superiority of our
-custom fault preconditioner.
+Our preferred setup uses the field splitting options in PETSc to
+combine an AMG preconditioner for the elasticity submatrix with our
+custom fault preconditioner for the Lagrange multiplier submatrix. See
+Section~\ref{sec:solvertest} for a comparison of preconditioner
+performance for an application involving a static simulation with
+multiple faults. It shows the clear superiority of this setup over
+several other possible preconditioning strategies.
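
As a minimal sketch of this preferred setup, the options below follow
the naming shown in Table~\ref{tab:solver:options}; the
\texttt{[pylithapp.petsc]} section name, the
\texttt{fs\_fieldsplit\_0\_*} entries, and the use of \texttt{ml} for
algebraic multigrid are assumptions for illustration, and the setting
that enables the custom Lagrange multiplier preconditioner is not shown
here.

  [pylithapp.problem.formulation]
  split_fields = True
  matrix_type = aij

  [pylithapp.petsc]
  # Split the unknowns into displacement (0) and Lagrange multiplier (1) fields.
  fs_pc_type = fieldsplit
  fs_pc_fieldsplit_type = multiplicative
  # Algebraic multigrid (ML) on the displacement sub-block (assumed available).
  fs_fieldsplit_0_pc_type = ml
  fs_fieldsplit_0_ksp_type = preonly
  # Jacobi on the Lagrange multiplier sub-block, matching the table options.
  fs_fieldsplit_1_pc_type = jacobi
  fs_fieldsplit_1_ksp_type = preonly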
 
 \subsection{Dynamic Simulations}
 
@@ -1164,7 +1164,7 @@ We compare the relative performance of t
 We compare the relative performance of the various preconditioners
 discussed in section~\ref{sec:solver:quasi-static} for quasi-static
 problems using a static simulation with three vertical, strike-slip
-faults. Using multiple, intersecting faults involves multiple saddle
+faults. Using multiple, intersecting faults introduces multiple saddle
 points, so it provides a more thorough test of the preconditioner
 compared to a single fault with a single saddle point.
 Figure~\ref{fig:solvertest:geometry} shows the geometry of the faults
@@ -1187,22 +1187,23 @@ Figure~\ref{fig:solvertest:mesh} shows t
 Figure~\ref{fig:solvertest:mesh} shows the 1744 m resolution
 tetrahedral mesh. As we will see in
 Section~\ref{sec:verification:quasi-static}, the hexahedral mesh for a
-given resolution is more accurate, so the errors in solution for each
-pair of meshes are significantly larger for the tetrahedral mesh.
+given resolution in a quasi-static problem is slightly more accurate,
+so for each pair of meshes the errors in the solution are larger for
+the tetrahedral mesh.
 
 \subsection{Preconditioner Performance}
 
 We characterize preconditioner performance in terms of the number of
-iterations required for the residual to reach a convergence tolerance
-and the sensitivity of the number of iterations to the problem
-size. An ideal preconditioner would yield a small, constant number of
-iterations independent of problem size. However, for complex problems
-such as elasticity with fault slip and potentially nonuniform physical
-properties, ideal preconditioners may not exist. Hence, we seek a
-preconditioner that provides a minimal increase in the number of
-iterations as the problem size increases, so that we can efficiently
-simulate quasi-static crustal deformation related to faulting and
-post-seismic and interseismic deformation.
+iterations required for the residual to reach a given convergence
+tolerance and the sensitivity of the number of iterations to the
+problem size. An ideal preconditioner would yield a small, constant
+number of iterations independent of problem size. However, for complex
+problems such as elasticity with fault slip and potentially nonuniform
+physical properties, ideal preconditioners may not exist. Hence, we
+seek a preconditioner that provides a minimal increase in the number
+of iterations as the problem size increases, so that we can
+efficiently simulate quasi-static crustal deformation related to
+faulting and post-seismic and interseismic deformation.
 
 For this benchmark of preconditioner performance, we examine the
 number of iterations required for convergence using the PETSc Additive
@@ -1222,7 +1223,7 @@ freedom, compared to 60\% for the ASM pr
 freedom, compared to 60\% for the ASM preconditioner. Within the
 family of field split preconditioners, the one with multiplicative
 composition minimizes the number of iterations. The custom
-preconditioner for the fault block (Lagrange multipliers), it greatly
+preconditioner for the Lagrange multiplier submatrix greatly
 accelerates the convergence with an 80\% further reduction in the
 number of iterations required for convergence.
 
@@ -1240,16 +1241,17 @@ tetrahedral meshes described earlier tha
 tetrahedral meshes described earlier that range in size from
 $1.78\times 10^5$ DOF to $2.14\times 10^7$ DOF. In each of these
 simulations, we employ the field split algebraic multigrid
-preconditioner with multiplicative composition and the custom fault
-block preconditioner. We ran the simulations on a Beowulf cluster
-comprised of 24 compute nodes connected by QDR Infiniband, where each
-compute node consists of two quad-core Intel Xeon E5620 processors
-with 24 GB RAM. Simulations run on eight or fewer cores were run on a
-single compute node. Thus, in addition to algorithm bottlenecks,
-runtime performance is potentially impeded by core/memory affinity,
-memory bandwidth, and communication among compute nodes.
+preconditioner with multiplicative composition for the elasticity
+submatrix and the custom preconditioner for the Lagrange multiplier
+submatrix. We ran the simulations on a Beowulf cluster composed of 24
+compute nodes connected by QDR InfiniBand, where each compute node
+consisted of two quad-core Intel Xeon E5620 processors with 24 GB
+of RAM. Simulations using eight or fewer cores ran on a single
+compute node. Thus, in addition to algorithm bottlenecks, runtime
+performance is potentially impeded by core/memory affinity, memory
+bandwidth, and communication among compute nodes.
 
-\brad{Update this after tuning solver}%
+
 Figure~\ref{fig:solvertest:scaling} illustrates the excellent parallel
 performance for the finite-element assembly routines (reforming the
 Jacobian sparse matrix and computing the residual) with somewhat poor
@@ -1257,7 +1259,7 @@ the solve.  The finite-element assembly 
 the solve.  The finite-element assembly routines achieve weak scaling
 with negligible effects from the cluster architecture. The solver, on
 the other hand, shows a significant increase in runtime \ldots
-\matt{Can we get better solver scaling?}%
+\matt{Update this}%
 
 % ------------------------------------------------------------------
 \section{Code Verification Benchmarks}
@@ -1771,12 +1773,13 @@ simulations of earthquake rupture propag
 % TABLES
 % ------------------------------------------------------------------
 \begin{table*}
+\matt{Add labels. Need consistency with Table~\ref{tab:solvertest:preconditioner:iterates}.}
   \caption{Example Preconditioners for the Saddle Point Problem in
     Equation~(\ref{eqn:saddle:point})\tablenotemark{a}}
 \label{tab:solver:options}
 \centering
 \begin{tabular}{ll}
-  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ & $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \\
+  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE} & $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
   \texttt{[pylithapp.problem.formulation]}             & \texttt{[pylithapp.problem.formulation]} \\
   \texttt{split\_fields = True}                        & \texttt{split\_fields = True} \\
   \texttt{matrix\_type = aij}                          & \texttt{matrix\_type = aij} \\
@@ -1789,7 +1792,7 @@ simulations of earthquake rupture propag
   \texttt{fs\_fieldsplit\_1\_pc\_type = jacobi}        & \texttt{fs\_fieldsplit\_1\_pc\_type = jacobi} \\
   \texttt{fs\_fieldsplit\_1\_ksp\_type = preonly}      & \texttt{fs\_fieldsplit\_1\_ksp\_type = preonly} \\
   \smallskip \\
-  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & -\mathbf{S}\end{pmatrix}$ & $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{L} & \mathbf{S}\end{pmatrix}$ \\
+  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & -\mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE}& $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{L} & \mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
   \texttt{[pylithapp.problem.formulation]}                   & \texttt{[pylithapp.problem.formulation]} \\
   \texttt{split\_fields = False}                             & \texttt{split\_fields = False} \\
   \texttt{matrix\_type = aij}                                & \texttt{matrix\_type = aij} \\
@@ -1804,7 +1807,7 @@ simulations of earthquake rupture propag
   \texttt{fieldsplit\_1\_pc\_type = none}                    & \texttt{fieldsplit\_1\_pc\_type = none} \\
   \texttt{fieldsplit\_1\_ksp\_type = minres}                 & \texttt{fieldsplit\_1\_ksp\_type = minres} \\
   \smallskip \\
-  $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{S}\end{pmatrix}$ & $\begin{pmatrix}\mathbf{I} & \mathbf{0} \\ \mathbf{B}^T \mathbf{A}^{-1} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}\end{pmatrix}\begin{pmatrix}\mathbf{I} & \mathbf{A}^{-1} \mathbf{B} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \\
+  $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE} & $\begin{pmatrix}\mathbf{I} & \mathbf{0} \\ \mathbf{B}^T \mathbf{A}^{-1} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}\end{pmatrix}\begin{pmatrix}\mathbf{I} & \mathbf{A}^{-1} \mathbf{B} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
   \texttt{[pylithapp.problem.formulation]}                    & \texttt{[pylithapp.problem.formulation]} \\
   \texttt{split\_fields = False}                              & \texttt{split\_fields = False} \\
   \texttt{matrix\_type = aij}                                 & \texttt{matrix\_type = aij} \\
@@ -1819,10 +1822,7 @@ simulations of earthquake rupture propag
   \texttt{fieldsplit\_1\_pc\_type = none}                     & \texttt{fieldsplit\_1\_pc\_type = none} \\
   \texttt{fieldsplit\_1\_ksp\_type = minres}                  & \texttt{fieldsplit\_1\_ksp\_type = minres} \\
 \end{tabular}
-\tablenotetext{a}{All of these field split
-  preconditioners require the use of the parameters
-  \texttt{split\_fields = True} and \texttt{matrix\_type = aij} for
-  \texttt{pylithapp.problem.formulation}.}
+\tablenotetext{a}{ADD STUFF HERE}
 \end{table*}
 
 \clearpage
@@ -1891,34 +1891,42 @@ simulations of earthquake rupture propag
 \label{tab:solvertest:preconditioner:iterates}
 \centering
 \begin{tabular}{lcrrr}
-  \hline
   Preconditioner & Cell & \multicolumn{3}{c}{Problem Size} \\
      &      & S1 & S2 & S4 \\
   \hline
   ASM
     & Tet4 & 239 & 287 & 434 \\
     & Hex8 & 184 & 236 & 298 \\
+  Schur (full)
+    & Tet4 & 131 & 173 & 205 \\
+    & Hex8 & 101 & 131 & 155 \\
+  Schur (lower)
+    & Tet4 & 222 & 269 & 358 \\
+    & Hex8 & 175 & 215 & 274 \\
+  Schur (upper)
+    & Tet4 & 222 & 269 & 356 \\
+    & Hex8 & 175 & 215 & 274 \\
   FieldSplit (add)
-    & Tet4 & 416 & 482 & 499 \\
-    & Hex8 & 304 & 329 & 370 \\
+    & Tet4 & 301 & 330 & 333 \\
+    & Hex8 & 205 & 203 & 232 \\
   FieldSplit (mult)
-    & Tet4 & 341 & 390 & 396 \\
-    & Hex8 & 245 & 261 & 293 \\
+    & Tet4 & 451 & 503 & 517 \\
+    & Hex8 & 258 & 264 & 331 \\
   FieldSplit (mult,custom)
-    & Tet4 & 62 & 69 & 77 \\
-    & Hex8 & 51 & 57 & 62 \\
+    & Tet4 & 60 & 63 & 70 \\
+    & Hex8 & 48 & 51 & 59 \\
   \hline
 \end{tabular}
 \tablenotetext{a}{Number of iterations for Additive Schwarz (ASM),
-  field split (additive, multiplicative, and multiplicative with
-  custom fault block preconditioner), and Schur complement preconditioners for
-  tetrahedral and hexahedral discretizations and three problem sizes
-  (S1 with $1.8\times 10^5$ DOF, S2 with $3.5\times
-  10^5$ DOF, and S3 with $6.9\times 10^5$ DOF). The field split
-  preconditioner with multiplicative composittion and the custom fault
-  block preconditioner yields good performance with only a fraction of
-  the iterates as the other preconditioners and a small increase with
-  problem size.}
+  Schur complement (Schur), and field split (additive, multiplicative,
+  and multiplicative with custom fault block preconditioner)
+  preconditioners for tetrahedral and hexahedral discretizations and
+  three problem sizes (S1 with $1.8\times 10^5$ DOF, S2 with
+  $3.5\times 10^5$ DOF, and S4 with $6.9\times 10^5$ DOF). The field
+  split preconditioner with multiplicative composition and the custom
+  fault block preconditioner yields good performance with only a
+  fraction of the iterations required by the other preconditioners and
+  a small increase with problem size.
 \end{table}
 
 


