[aspect-devel] Assemble temperature system time

Wed Nov 6 17:26:36 PST 2013

On 11/06/2013 06:48 PM, Eric Heien wrote:
> Hello all,
>
> I’m doing some medium-sized 3D box runs on TACC Stampede now (1e6 DOF, 32 cores) and I’ve noticed that the Assemble temperature system timing is very large.
>
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |  3.15e+03s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |       201 |       315s |        10% |
> | Assemble temperature system     |       201 |  2.16e+03s |        69% |
> | Build Stokes preconditioner     |         1 |      26.6s |      0.85% |
> | Build temperature preconditioner|       201 |      21.7s |      0.69% |
> | Solve Stokes system             |       201 |        90s |       2.9% |
> | Solve temperature system        |       201 |      13.9s |      0.44% |
> | Initialization                  |         2 |      1.51s |     0.048% |
> | Postprocessing                  |       201 |       310s |       9.8% |
> | Setup dof systems               |         1 |      15.7s |       0.5% |
> +---------------------------------+-----------+------------+------------+
>
> Does anyone have a suggestion for why this might be?  I know some of the developers recently worked on improving the performance of this, which is why it seems odd it would take so much time.  Before I dig into the reasons with a profiler, I was hoping someone might know what’s wrong.  This was compiled with Intel compiler version 13.0.079, and uses deal.II r31565 and Aspect r2009.

To me, this looks a lot like you're in debug mode. For reference, the 
simulation from which I created this movie
   http://www.youtube.com/watch?v=_bKqU_P4j48
used the attached version of the 3d box convection input file. There, 
timing looked like this:

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |  5.88e+05s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |     12634 |  1.44e+05s |        24% |
| Assemble temperature system     |     12634 |  1.09e+05s |        19% |
| Build Stokes preconditioner     |      1098 |  5.96e+04s |        10% |
| Build temperature preconditioner|     12634 |  1.85e+04s |       3.2% |
| Solve Stokes system             |     12634 |  1.72e+05s |        29% |
| Solve temperature system        |     12634 |  5.18e+04s |       8.8% |
| Create snapshot                 |       252 |       203s |     0.035% |
| Initialization                  |         5 |     0.258s |   4.4e-05% |
| Postprocessing                  |     12631 |  6.15e+03s |         1% |
| Refine mesh structure, part 1   |       846 |  5.28e+03s |       0.9% |
| Refine mesh structure, part 2   |       846 |       475s |     0.081% |
| Setup dof systems               |       847 |  1.03e+04s |       1.7% |
+---------------------------------+-----------+------------+------------+

I think these percentages are more realistic: assembling the temperature 
system takes 19% of the run time here, not 69%. This run used 64 processors.
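
One rough way to put the two timing tables side by side is to compare the cost per call of the temperature assembly with that of the Stokes assembly within each run. The snippet below is only a sketch: the numbers are copied from the two tables in this thread, and the dictionary and function names are made up for illustration. The two runs use different meshes and core counts, so only the ratio within each run is meaningful, and a large ratio hints at a problem without proving what the cause is.

```python
# Call counts and wall times copied from the two timing tables in this thread.
# Each entry maps a section name to (number of calls, wall time in seconds).
# The variable and function names are illustrative, not part of ASPECT.
reported_run = {
    "Assemble Stokes system": (201, 315.0),
    "Assemble temperature system": (201, 2.16e3),
}
reference_run = {
    "Assemble Stokes system": (12634, 1.44e5),
    "Assemble temperature system": (12634, 1.09e5),
}


def seconds_per_call(run, section):
    """Average wall time of one call to the given section."""
    calls, wall_time = run[section]
    return wall_time / calls


def temperature_to_stokes_ratio(run):
    """Per-call temperature assembly time divided by per-call Stokes assembly time."""
    return (seconds_per_call(run, "Assemble temperature system")
            / seconds_per_call(run, "Assemble Stokes system"))


for label, run in (("reported", reported_run), ("reference", reference_run)):
    print(f"{label}: temperature assembly costs "
          f"{temperature_to_stokes_ratio(run):.2f}x the Stokes assembly per call")
```

With these numbers, the temperature assembly costs roughly 6.9 times as much as the Stokes assembly per call in the reported run, and roughly 0.76 times as much in the reference run.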

Best
  Wolfgang

-- 
------------------------------------------------------------------------
Wolfgang Bangerth               email:            bangerth at math.tamu.edu
                                 www: http://www.math.tamu.edu/~bangerth/

-------------- next part --------------
set Resume computation                     = false

set Timing output frequency = 10

# At the top, we define the number of space dimensions we would like to
# work in:
set Dimension                              = 3

# There are several global variables that have to do with what
# time system we want to work in and what the end time is. We
# also designate an output directory.
set Use years in output instead of seconds = false
set End time                               = 1.0
set Output directory                       = output-x

# Then there are variables that describe the tolerance of
# the linear solver as well as how the pressure should
# be normalized. Here, we choose a zero average pressure
# at the surface of the domain (for the current geometry, the
# surface is defined as the top boundary).
set Linear solver tolerance                = 1e-15
set Temperature solver tolerance           = 1e-15

set Pressure normalization                 = surface
set Surface pressure                       = 0

# Then come a number of sections that deal with the setup
# of the problem to solve. The first one deals with the
# geometry of the domain within which we want to solve.
# The sections that follow all have the same basic setup
# where we select the name of a particular model (here,
# the box geometry) and then, in a further subsection,
# set the parameters that are specific to this particular
# model.
subsection Geometry model
  set Model name = box

  subsection Box
    set X extent = 1
    set Y extent = 1
    set Z extent = 1
  end
end

# The next section deals with the initial conditions for the
# temperature (there are no initial conditions for the
# velocity variable since the velocity is assumed to always
# be in a static equilibrium with the temperature field).
# There are a number of models, with the 'function' model being
# a generic one that allows us to enter the actual initial
# conditions in the form of a formula that can contain
# constants. We choose a linear temperature profile that
# matches the boundary conditions defined below plus
# a small perturbation:
subsection Initial conditions
  set Model name = function

  subsection Function
    set Variable names      = x,y,z
    set Function constants  = p=0.01, L=1, pi=3.1415926536, k=1
    set Function expression = (1.0-z) - p*cos(k*pi*x/L)*sin(pi*z)*y^3
  end
end

# Then follows a section that describes the boundary conditions
# for the temperature. The model we choose is called 'box' and
# allows us to set a constant temperature on each of the sides
# of the box geometry. In our case, we choose something that is
# heated from below and cooled from above. (As will be seen
# in the next section, the actual temperature prescribed here
# at the left and right does not matter.)
subsection Boundary temperature model
  set Model name = box

  subsection Box
    set Bottom temperature = 1
    set Left temperature   = 0
    set Right temperature  = 0
    set Top temperature    = 0
  end
end

# We then also have to prescribe several other parts of the model
# such as which boundaries actually carry a prescribed boundary
# temperature (as described in the documentation of the `box'
# geometry, in 3d boundaries 4 and 5 are the bottom and top boundaries)
# whereas all other parts of the boundary are insulated (i.e.,
# no heat flux through these boundaries; this is also often used
# to specify symmetry boundaries).
subsection Model settings
  set Fixed temperature boundary indicators   = 4,5

  # The next parameters then describe on which parts of the
  # boundary we prescribe a zero or nonzero velocity and
  # on which parts the flow is allowed to be tangential.
  # Here, all six sides of the box allow tangential
  # unrestricted flow but with a zero normal component:
  set Zero velocity boundary indicators       =
  set Prescribed velocity boundary indicators =
  set Tangential velocity boundary indicators = 0,1,2,3,4,5

  # The final part of this section describes whether we
  # want to include adiabatic heating (from a small
  # compressibility of the medium) or from shear friction,
  # as well as the rate of internal heating. We do not
  # want to use any of these options here:
  set Include adiabatic heating               = false
  set Include shear heating                   = false
  set Radiogenic heating rate                 = 0
end

# The following two sections describe first the
# direction (vertical) and magnitude of gravity and the
# material model (i.e., density, viscosity, etc). We have
# discussed the settings used here in the introduction to
# this cookbook in the manual already.
subsection Gravity model
  set Model name = vertical

  subsection Vertical
    set Magnitude = 1e16   # = Ra / Thermal expansion coefficient
  end
end

subsection Material model
  set Model name = simple # default:

  subsection Simple model
    set Reference density             = 1
    set Reference specific heat       = 1
    set Reference temperature         = 0
    set Thermal conductivity          = 1
    set Thermal expansion coefficient = 1e-10
    set Viscosity                     = 1
  end
end

# The settings above all pertain to the description of the
# continuous partial differential equations we want to solve.
# The following section deals with the discretization of
# this problem, namely the kind of mesh we want to compute
# on. We here start from a mesh that is globally refined 3 times,
# refine it adaptively 3 more times at the initial time, and then
# adapt it every 15 time steps.
subsection Mesh refinement
  set Initial global refinement                = 3
  set Initial adaptive refinement              = 3
  set Time steps between mesh refinement       = 15

  set Additional refinement times              = 0.003

end

# The final part is to specify what ASPECT should do with the
# solution once computed at the end of every time step. The
# process of evaluating the solution is called `postprocessing'
# and we choose to compute velocity and temperature statistics,
# statistics about the heat flux through the boundaries of the
# domain, and to generate graphical output files for later
# visualization. These output files are created every time
# a time step crosses time points separated by 0.0001. Given
# our start time (zero) and final time (1.0) this means that
# we will obtain up to about 10,000 output files.
subsection Postprocess
  set List of postprocessors = velocity statistics, temperature statistics, heat flux statistics, visualization

  subsection Visualization
    set Time between graphical output = 0.0001
  end
end

subsection Checkpointing
  # The number of timesteps between performing checkpoints. If 0 and time
  # between checkpoint is not specified, checkpointing will not be performed.
  # Units: None.
  set Steps between checkpoint = 50
end