[cig-commits] commit: More writing

Thu May 10 14:18:01 PDT 2012

changeset:   123:9a339f379222
tag:         tip
user:        Matthew G. Knepley <knepley at gmail.com>
date:        Thu May 10 17:17:47 2012 -0400
files:       faultRup.tex
description:
More writing
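
Note on the null modes mentioned in the faultRup.tex changes below: the three translational and three rotational modes of the unconstrained elastic operator can be written down directly from the vertex coordinates. The following is a minimal NumPy sketch, not the code used in the paper; the function name `rigid_body_modes` and the interleaved (x, y, z per vertex) degree-of-freedom ordering are assumptions made for illustration. PETSc offers a comparable facility through `MatNullSpaceCreateRigidBody` and `MatSetNearNullSpace`, but whether this changeset relies on those calls is not stated in the diff.

```python
import numpy as np


def rigid_body_modes(coords):
    """Return a (3N, 6) array whose columns are rigid-body modes.

    coords is an (N, 3) array of vertex coordinates. Columns 0-2 are unit
    translations along x, y, z. Columns 3-5 are infinitesimal rotations
    about the x, y, z axes through the centroid of the vertices.
    Rows are ordered vertex by vertex: (u_x, u_y, u_z) of vertex 0, then
    vertex 1, and so on (an assumed ordering for this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    r = coords - coords.mean(axis=0)  # positions relative to the centroid
    n = r.shape[0]
    modes = np.zeros((n, 3, 6))
    for axis in range(3):
        e = np.zeros(3)
        e[axis] = 1.0
        modes[:, :, axis] = e                    # translation along e
        modes[:, :, 3 + axis] = np.cross(e, r)   # rotation u = e x r
    return modes.reshape(3 * n, 6)


# Hypothetical vertex set: five non-coplanar points.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])
modes = rigid_body_modes(points)
print(modes.shape, np.linalg.matrix_rank(modes))
```

Each column leaves all inter-vertex distances unchanged to first order, which is why an elastic stiffness matrix without boundary conditions annihilates it. Supplying these vectors to an AMG setup lets the coarse spaces represent them accurately.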


diff -r 1eceaf4a15a3 -r 9a339f379222 faultRup.tex

--- a/faultRup.tex	Tue May 08 11:46:07 2012 -0700
+++ b/faultRup.tex	Thu May 10 17:17:47 2012 -0400
@@ -23,6 +23,7 @@
 \newcommand\brad[1]{{\color{red}\bf [BRAD: #1]}}
 \newcommand\matt[1]{{\color{blue}\bf [MATT: #1]}}
 \newcommand\charles[1]{{\color{green}\bf [CHARLES: #1]}}
+\newcommand\event[1]{\texttt{\bf #1}}
 
 % ======================================================================
 % PREAMBLE
@@ -891,27 +892,11 @@ matrix. Our system Jacobian has the form
       \mathbf{L} & \mathbf{0}
     \end{array} \right).
 \end{equation}
-We use the Schur complement of the submatrix $\mathbf{K}$ to examine the form
-of $\mathbf{A}^{-1}$,
-\begin{gather}
-  \mathbf{A}^{-1} = \left( \begin{array}{cc}
-      \mathbf{A}^{-1}_{nn} & \mathbf{A}^{-1}_{np} \\
-      \mathbf{A}^{-1}_{pn} & \mathbf{A}^{-1}_{pp}
-    \end{array} \right), \\
-  \mathbf{A}^{-1}_{nn} = \mathbf{K}^{-1} + \mathbf{K}^{-1} \mathbf{L}^{T} 
-    (-\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^{T})^{-1} \mathbf{L} \mathbf{K}^{-1}, \\
-    \mathbf{A}^{-1}_{np} = -\mathbf{K}^{-1} \mathbf{L}^{T }(-\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^{T})^{-1}, \\
-    \mathbf{A}^{-1}_{pn} = -(-\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^{T})^{-1} \mathbf{L} \mathbf{K}^{-1}, \\
-    \mathbf{A}^{-1}_{pp} = -(\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^T)^{-1}.
-\end{gather}
-A suitable diagonal approximation of submatrix $\mathbf{A}^{-1}$ is
+The Schur complement $\mathbf{S}$ of the submatrix $\mathbf{K}$ is given by
 \begin{equation}
-  \mathbf{P}^{-1} = \left( \begin{array}{cc}
-      \mathbf{K}^{-1} & 0 \\
-      0 & -(\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^T)^{-1}
-    \end{array} \right),
+  \mathbf{S} = -\mathbf{L} \mathbf{K}^{-1} \mathbf{L}^T,
 \end{equation}
-which leads to
+which leads to a simple block-diagonal preconditioner for $\mathbf{A}$
 \begin{equation}
   \mathbf{P} = \left( \begin{array}{cc}
     \mathbf{P}_\mathit{elasticity} & 0 \\
@@ -923,19 +908,19 @@ which leads to
   \end{array} \right).
 \end{equation}
 
-\matt{Rewrite this paragraph based on new formulation with only two
-  split fields.}  The elastic submatrix $K$ can be further split into
-blocks associated with displacements along the $x$, $y$, and $z$
-axes. It is known that the vector Laplacian is spectrally equivalent
-to this operator~\citep{AskMarkAdams}\matt{Ask Mark Adams}, and each
-component is efficiently preconditioned by algebraic multigrid (AMG)
-methods, such as the ML library \citep{ML:users:guide}. AMG mimics the
+The elastic submatrix $\mathbf{K}$, in the absence of boundary conditions,
+has three translational and three rotational null modes. These are
+provided to the algebraic multigrid (AMG) preconditioner, such as the
+ML library \citep{ML:users:guide} or the PETSc GAMG preconditioner,
+in order to ensure an accurate coarse grid solution. AMG mimics the
 action of traditional geometric multgrid, but it generates coarse
 level operators and interpolation matrices using only the system
 matrix, treated as a weighted graph, rather than a separate
 description of the problem geometry, such as a mesh. We use
-PCFIELDSPLIT to split the elastic block and separately apply AMG to
-each component.
+PCFIELDSPLIT to split the elastic block from the fault block, and
+also to manage Schur complements. In this way, all block preconditioners,
+including those nested with multigrid, can be controlled from the
+options file without recompilation or special code.
 
 We now turn our attention to evaluating the fault portion of the
 preconditioning matrix associated with the Lagrange multipliers, since
@@ -1251,15 +1236,27 @@ performance is potentially impeded by co
 performance is potentially impeded by core/memory affinity, memory
 bandwidth, and communication among compute nodes.
 
+The scaling of \event{VecMAXPY} and \event{MatMult} gives a good idea of the
+memory system performance, since both operations are limited by memory bandwidth.
+From Fig.~\ref{fig:memBandwidth}, we see that we saturate the memory system
+using two processes per processor, since scaling plateaus from 2 to 4 processes
+but is good from 4 to 8.
+
+Machine network performance can be assessed using the \event{VecMDot} operation
+for reductions and \event{MatMult} for point-to-point communication. In the USGS
+configuration, four dual quad-core chips shared a single backplane. In
+Fig.~\ref{fig:comm} we see that these operations show good scaling up to 32 processes,
+which corresponds to a single 2U box, but very poor scaling thereafter, indicating
+a very poor internode network.
 
 Figure~\ref{fig:solvertest:scaling} illustrates excellent the parallel
 performance for the finite-element assembly routines (reforming the
-Jacobian sparse matrix and computing the residual) with somewhat poor
-parallel performance for setting up the preconditioner and performing
-the solve.  The finite-element assembly routines achieve weak scaling
-with negligible effects from the cluster architecture. The solver, on
-the other hand, shows a significant increase in runtime \ldots
-\matt{Update this}%
+Jacobian sparse matrix and computing the residual). From the additive Schwarz (ASM) performance,
+we see that the basic solver building blocks, like parallel sparse matrix-vector
+multiplication, scale well. However, the preconditioner itself is not scalable, and
+the number of iterations increases strongly with process count. The introduction of
+Schur complement methods and AMG slows the growth considerably, but future work will
+pursue the ultimate goal of iteration counts independent of process count.
 
 % ------------------------------------------------------------------
 \section{Code Verification Benchmarks}
@@ -1773,54 +1770,38 @@ simulations of earthquake rupture propag
 % TABLES
 % ------------------------------------------------------------------
 \begin{table*}
-\matt{Add labels. Need consistency with Table~\ref{tab:solvertest:preconditioner:iterates}.}
   \caption{Example Preconditioners for the Saddle Point Problem in
     Equation~(\ref{eqn:saddle:point})\tablenotemark{a}}
 \label{tab:solver:options}
 \centering
 \begin{tabular}{ll}
-  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE} & $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
+  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$~\label{prec:add} & $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \label{prec:mult}\\
   \texttt{[pylithapp.problem.formulation]}             & \texttt{[pylithapp.problem.formulation]} \\
   \texttt{split\_fields = True}                        & \texttt{split\_fields = True} \\
   \texttt{matrix\_type = aij}                          & \texttt{matrix\_type = aij} \\
   \texttt{[pylithapp.petsc]}                           & \texttt{[pylithapp.petsc]} \\
   \texttt{fs\_pc\_type = fieldsplit}                   & \texttt{fs\_pc\_type = fieldsplit} \\
-  \texttt{fs\_pc\_field\_split\_real\_diagonal = True} & \texttt{fs\_pc\_field\_split\_real\_diagonal = True} \\
+  \texttt{fs\_pc\_field\_split\_real\_diagonal = true} & \texttt{fs\_pc\_field\_split\_real\_diagonal = true} \\
   \texttt{fs\_pc\_field\_split\_type = additive}       & \texttt{fs\_pc\_field\_split\_type multiplicative} \\
   \texttt{fs\_fieldsplit\_0\_pc\_type = ml}            & \texttt{fs\_fieldsplit\_0\_pc\_type = ml} \\
   \texttt{fs\_fieldsplit\_0\_ksp\_type = preonly}      & \texttt{fs\_fieldsplit\_0\_ksp\_type = preonly} \\
   \texttt{fs\_fieldsplit\_1\_pc\_type = jacobi}        & \texttt{fs\_fieldsplit\_1\_pc\_type = jacobi} \\
-  \texttt{fs\_fieldsplit\_1\_ksp\_type = preonly}      & \texttt{fs\_fieldsplit\_1\_ksp\_type = preonly} \\
+  \texttt{fs\_fieldsplit\_1\_ksp\_type = gmres}        & \texttt{fs\_fieldsplit\_1\_ksp\_type = gmres} \\
   \smallskip \\
-  $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{0} & -\mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE}& $\begin{pmatrix}\mathbf{K} & \mathbf{0} \\ \mathbf{L} & \mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
-  \texttt{[pylithapp.problem.formulation]}                   & \texttt{[pylithapp.problem.formulation]} \\
-  \texttt{split\_fields = False}                             & \texttt{split\_fields = False} \\
-  \texttt{matrix\_type = aij}                                & \texttt{matrix\_type = aij} \\
-  \texttt{[pylithapp.petsc]}                                 & \texttt{[pylithapp.petsc]} \\
-  \texttt{pc\_type = fieldsplit}                             & \texttt{pc\_type = fieldsplit} \\
-  \texttt{pc\_field\_split\_type = schur}                    & \texttt{pc\_field\_split\_type schur} \\
-  \texttt{pc\_field\_split\_detect\_saddle\_point = True}    & \texttt{pc\_field\_split\_detect\_saddle\_point = True} \\
-  \texttt{pc\_fieldsplit\_schur\_factorization\_type = diag} & \texttt{pc\_fieldsplit\_schur\_factorization\_type lower} \\
-  \texttt{pc\_fieldsplit\_schur\_precondition = diag}??      & \texttt{pc\_fieldsplit\_schur\_precondition = diag}?? \\
-  \texttt{fieldsplit\_0\_pc\_type = ml}                      & \texttt{fieldsplit\_0\_pc\_type = ml} \\
-  \texttt{fieldsplit\_0\_ksp\_type = preonly}                & \texttt{fieldsplit\_0\_ksp\_type = preonly} \\
-  \texttt{fieldsplit\_1\_pc\_type = none}                    & \texttt{fieldsplit\_1\_pc\_type = none} \\
-  \texttt{fieldsplit\_1\_ksp\_type = minres}                 & \texttt{fieldsplit\_1\_ksp\_type = minres} \\
-  \smallskip \\
-  $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{S}\end{pmatrix}$ \matt{ADD LABEL HERE} & $\begin{pmatrix}\mathbf{I} & \mathbf{0} \\ \mathbf{B}^T \mathbf{A}^{-1} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}\end{pmatrix}\begin{pmatrix}\mathbf{I} & \mathbf{A}^{-1} \mathbf{B} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \matt{ADD LABEL HERE}\\
+  $\begin{pmatrix}\mathbf{K} & \mathbf{L}^T \\ \mathbf{0} & \mathbf{S}\end{pmatrix}$ \label{prec:schurUpper} & $\begin{pmatrix}\mathbf{I} & \mathbf{0} \\ \mathbf{B}^T \mathbf{A}^{-1} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}\end{pmatrix}\begin{pmatrix}\mathbf{I} & \mathbf{A}^{-1} \mathbf{B} \\ \mathbf{0} & \mathbf{I}\end{pmatrix}$ \label{prec:schurFull}\\
   \texttt{[pylithapp.problem.formulation]}                    & \texttt{[pylithapp.problem.formulation]} \\
-  \texttt{split\_fields = False}                              & \texttt{split\_fields = False} \\
+  \texttt{split\_fields = True}                               & \texttt{split\_fields = True} \\
   \texttt{matrix\_type = aij}                                 & \texttt{matrix\_type = aij} \\
   \texttt{[pylithapp.petsc]}                                  & \texttt{[pylithapp.petsc]} \\
   \texttt{pc\_type = fieldsplit}                              & \texttt{pc\_type = fieldsplit} \\
   \texttt{pc\_field\_split\_type = schur}                     & \texttt{pc\_field\_split\_type = schur} \\
-  \texttt{pc\_field\_split\_detect\_saddle\_point = True}     & \texttt{pc\_field\_split\_detect\_saddle\_point = True} \\
+  \texttt{fs\_pc\_field\_split\_real\_diagonal = true}        & \texttt{fs\_pc\_field\_split\_real\_diagonal = true} \\
   \texttt{pc\_fieldsplit\_schur\_factorization\_type = upper} & \texttt{pc\_fieldsplit\_schur\_factorization\_type = full} \\
-  \texttt{pc\_fieldsplit\_schur\_precondition = diag}??       & \texttt{pc\_fieldsplit\_schur\_precondition = diag}?? \\
+  \texttt{pc\_fieldsplit\_schur\_precondition = user}         & \texttt{pc\_fieldsplit\_schur\_precondition = user} \\
   \texttt{fieldsplit\_0\_pc\_type = ml}                       & \texttt{fieldsplit\_0\_pc\_type = ml} \\
   \texttt{fieldsplit\_0\_ksp\_type = preonly}                 & \texttt{fieldsplit\_0\_ksp\_type = preonly} \\
-  \texttt{fieldsplit\_1\_pc\_type = none}                     & \texttt{fieldsplit\_1\_pc\_type = none} \\
-  \texttt{fieldsplit\_1\_ksp\_type = minres}                  & \texttt{fieldsplit\_1\_ksp\_type = minres} \\
+  \texttt{fieldsplit\_1\_pc\_type = jacobi}                   & \texttt{fieldsplit\_1\_pc\_type = jacobi} \\
+  \texttt{fieldsplit\_1\_ksp\_type = gmres}                   & \texttt{fieldsplit\_1\_ksp\_type = gmres} \\
 \end{tabular}
 \tablenotetext{a}{ADD STUFF HERE}
 \end{table*}
@@ -1900,9 +1881,6 @@ simulations of earthquake rupture propag
   Schur (full)
     & Tet4 & 131 & 173 & 205 \\
     & Hex8 & 101 & 131 & 155 \\
-  Schur (lower)
-    & Tet4 & 222 & 269 & 358 \\
-    & Hex8 & 175 & 215 & 274 \\
   Schur (upper)
     & Tet4 & 222 & 269 & 356 \\
     & Hex8 & 175 & 215 & 274 \\
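
Note on the Schur complement preconditioner in this changeset: the definition S = -L K^{-1} L^T and the block-diagonal preconditioner can be checked on a small dense example. The sketch below is illustrative only and is not PyLith or PETSc code. The random matrices, the sizes `n` and `m`, and the helper names are assumptions, and it uses the exact positive-definite blocks diag(K, -S) rather than the AMG and diagonal approximations discussed in the manuscript. With exact blocks, the preconditioned saddle-point matrix has only three distinct eigenvalues, so a Krylov method would converge in at most three iterations in exact arithmetic.

```python
import numpy as np


def schur_complement(K, L):
    """Return S = -L K^{-1} L^T for a nonsingular K."""
    return -L @ np.linalg.solve(K, L.T)


def saddle_point_matrix(K, L):
    """Assemble A = [[K, L^T], [L, 0]]."""
    m = L.shape[0]
    return np.block([[K, L.T], [L, np.zeros((m, m))]])


def block_diagonal_preconditioner(K, L):
    """Assemble P = diag(K, -S), the positive-definite block-diagonal choice."""
    n, m = K.shape[0], L.shape[0]
    P = np.zeros((n + m, n + m))
    P[:n, :n] = K
    P[n:, n:] = -schur_complement(K, L)
    return P


# Hypothetical small problem: n displacement unknowns, m Lagrange multipliers.
rng = np.random.default_rng(0)
n, m = 8, 3
M = rng.standard_normal((n, n))
K = M @ M.T + n * np.eye(n)        # symmetric positive definite stand-in
L = rng.standard_normal((m, n))    # random constraint rows, full rank here

A = saddle_point_matrix(K, L)
P = block_diagonal_preconditioner(K, L)

# Eigenvalues of the preconditioned operator P^{-1} A.
eigs = np.linalg.eigvals(np.linalg.solve(P, A))
print(np.unique(np.round(eigs.real, 6)))
```

In practice K^{-1} is replaced by an AMG cycle and S by an approximation, so the eigenvalues spread out into clusters and the iteration counts grow, which is consistent with the Schur rows in the iteration table above.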