[aspect-devel] performance

Thieulot, C. (Cedric) c.thieulot at uu.nl
Thu Mar 7 08:17:37 PST 2013


Hi all,

I have been timing various parts of the Simulator::setup_dofs function, in source/simulator/core.cc, lines 560 and following, in the case of a 64^3 regular grid.
(In order to speed things up a bit, I have used 8 cores this time).
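
For readers who want to reproduce this kind of per-section timing, here is a minimal sketch of a scoped wall-clock timer. It is only an illustration: SectionTimer, its Scope guard, busy_work and time_two_sections are hypothetical names, not part of ASPECT or deal.II, and the instrumentation actually used for the table below may work differently. The section labels are borrowed from the table purely as examples.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <ostream>
#include <string>

// Hypothetical helper (not ASPECT/deal.II code): accumulates wall time and
// call counts per named section.
class SectionTimer
{
public:
  // RAII guard: starts the clock on construction and adds the elapsed
  // wall time to the owning timer when it goes out of scope.
  class Scope
  {
  public:
    Scope(SectionTimer &timer, const std::string &name)
      : timer_(timer), name_(name), start_(std::chrono::steady_clock::now())
    {}

    ~Scope()
    {
      const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
      timer_.seconds_[name_] += elapsed.count();
      timer_.calls_[name_] += 1;
    }

  private:
    SectionTimer                         &timer_;
    std::string                           name_;
    std::chrono::steady_clock::time_point start_;
  };

  // Accumulated wall time of a section in seconds (0 if never timed).
  double seconds(const std::string &name) const
  {
    const auto it = seconds_.find(name);
    return it == seconds_.end() ? 0.0 : it->second;
  }

  // Number of times a section was entered (0 if never timed).
  unsigned int calls(const std::string &name) const
  {
    const auto it = calls_.find(name);
    return it == calls_.end() ? 0u : it->second;
  }

  // Print one line per section: name, number of calls, wall time.
  void print(std::ostream &out) const
  {
    for (const auto &entry : seconds_)
      out << entry.first << " | " << calls(entry.first) << " | "
          << entry.second << "s\n";
  }

private:
  std::map<std::string, double>       seconds_;
  std::map<std::string, unsigned int> calls_;
};

// Stand-in workload so that the timed sections do something measurable.
inline double busy_work(unsigned int n)
{
  volatile double sum = 0.0;
  for (unsigned int i = 0; i < n; ++i)
    sum = sum + 1.0 / (1.0 + i);
  return sum;
}

// Example use: wrap each group of statements in its own labelled scope.
inline SectionTimer time_two_sections()
{
  SectionTimer timer;
  {
    SectionTimer::Scope scope(timer, "setup reinit");
    busy_work(100000);
  }
  {
    SectionTimer::Scope scope(timer, "setup system matrix");
    busy_work(200000);
  }
  return timer;
}
```

Calling time_two_sections().print(std::cout) (with <iostream> included) lists both sections with one call each. The guard-object design means a section is closed automatically at the end of its block, even on an early return.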

Here is the standard output:

Number of active cells: 262144 (on 7 levels)
Number of degrees of freedom: 11008070 (6440067+274625+2146689+2146689)

*** Timestep 0:  t=0 seconds
   Solving temperature system... 0 iterations.
   Solving composition system 1... 0 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 27 iterations.

   Postprocessing:

     Reference density (kg/m^3):                    1010
     Reference gravity (m/s^2):                     10
     Reference thermal expansion (1/K):             0
     Temperature contrast accross model domain (K): 1
     Model domain depth (m):                        1
     Reference thermal diffusivity (m^2/s):         3.72277e-06
     Reference viscosity (Pas):                     100
     Ra number:                                     0
     k_value:                                       4.7
     reference_cp:                                  1250
     reference_thermal_diffusivity:                 3.72277e-06

     Writing graphical output:  output_vkk3D/solution-00000
     RMS, max velocity:         0.000135 m/s, 0.000322 m/s
     Compositions min/max/mass: 0/1/0.1979

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |  1.98e+03s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         1 |       314s |        16% |
| Assemble composition system     |         1 |       521s |        26% |
| Assemble temperature system     |         1 |       528s |        27% |
| Build Stokes preconditioner     |         1 |       138s |       6.9% |
| Build composition preconditioner|         1 |      7.13s |      0.36% |
| Build temperature preconditioner|         1 |      7.86s |       0.4% |
| Solve Stokes system             |         1 |      81.4s |       4.1% |
| Solve composition system        |         1 |      1.87s |     0.094% |
| Solve temperature system        |         1 |      1.99s |       0.1% |
| Initialization                  |         2 |      9.01s |      0.45% |
| Postprocessing                  |         1 |      49.9s |       2.5% |
| Setup dof systems               |         1 |      1.05s |     0.053% |
| aft.Renumbering                 |         1 |     0.207s |      0.01% |
| aft.constraints                 |         1 |   0.00666s |   0.00034% |
| aft.distributedofs              |         1 |      2.61s |      0.13% |
| aft.introspection               |         1 |  0.000636s |   3.2e-05% |
| aft.pcout                       |         1 |     0.725s |     0.037% |
| setup ifcompr                   |         1 |  2.86e-06s |   1.4e-07% |
| setup reinit                    |         1 |      3.55s |      0.18% |
| setup system matrix             |         1 |       214s |        11% |
| setup system preconditioner     |         1 |      95.2s |       4.8% |
+---------------------------------+-----------+------------+------------+
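
As a quick reading aid for the table, the sketch below adds up groups of sections and expresses them as a share of the total wall time. The numbers are copied from the table (the total is the rounded 1.98e+03 s); share_percent and the variable names are illustrative only. The three assembly sections come to roughly 69% of the run, and "setup system matrix" plus "setup system preconditioner" to roughly 16%.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum the wall times of a group of sections and return their share of the
// total wall time, in percent.
inline double share_percent(const std::vector<double> &section_seconds,
                            const double               total_seconds)
{
  double sum = 0.0;
  for (const double s : section_seconds)
    sum += s;
  return 100.0 * sum / total_seconds;
}

// Values copied from the timing table; the total is the rounded 1.98e+03 s.
const double total_wall = 1.98e3;

// Assemble Stokes, composition and temperature systems.
const std::vector<double> assembly = {314.0, 521.0, 528.0};

// "setup system matrix" and "setup system preconditioner".
const std::vector<double> matrix_setup = {214.0, 95.2};
```

With these inputs, share_percent(assembly, total_wall) is about 68.8 and share_percent(matrix_setup, total_wall) is about 15.6.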

"setup reinit" corresponds to this part of the code:

system_rhs.reinit(introspection.index_sets.system_partitioning, mpi_communicator);
solution.reinit(introspection.index_sets.system_relevant_partitioning, mpi_communicator);
old_solution.reinit(introspection.index_sets.system_relevant_partitioning, mpi_communicator);
old_old_solution.reinit(introspection.index_sets.system_relevant_partitioning, mpi_communicator);
current_linearization_point.reinit (introspection.index_sets.system_relevant_partitioning, MPI_COMM_WORLD);

"setup system matrix" corresponds to this part of the code:

setup_system_matrix (introspection.index_sets.system_partitioning);

"setup system preconditioner" corresponds to:

setup_system_preconditioner (introspection.index_sets.system_partitioning);

For clarity, let me repeat my comments from last time:
I am surprised by how long all these operations take (assembly included, by the way).
My own code is also FE-based, but it relies on a regular grid and uses first-order elements (Q1P0 + penalised formulation),
so its assembly and 'dof setup' steps for a 64^3 grid are much faster and therefore not
directly comparable to those obtained with Aspect.
Lacking a proper point of reference, my question is simple: are these measured times normal?
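
To make the difference in problem size concrete, here is a small sketch that counts unknowns on a 64^3 grid. It assumes continuous Q2 elements for velocity, temperature and composition and Q1 for pressure; this is inferred from the printed DoF counts, which it reproduces exactly, and is not stated anywhere in this mail. The function and struct names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Nodes of a continuous Qk element on an n x n x n regular hexahedral grid:
// (k*n + 1) nodes per direction.
inline std::uint64_t nodes_qk(const std::uint64_t n, const std::uint64_t k)
{
  const std::uint64_t per_direction = k * n + 1;
  return per_direction * per_direction * per_direction;
}

struct DofCount
{
  std::uint64_t velocity;
  std::uint64_t pressure;
  std::uint64_t temperature;
  std::uint64_t composition;

  std::uint64_t total() const
  {
    return velocity + pressure + temperature + composition;
  }
};

// Assumed discretisation: Q2 velocity (3 components), Q1 pressure,
// Q2 temperature and one Q2 compositional field.
inline DofCount q2q1_counts(const std::uint64_t n)
{
  return DofCount{3 * nodes_qk(n, 2), nodes_qk(n, 1),
                  nodes_qk(n, 2), nodes_qk(n, 2)};
}

// Q1P0: Q1 velocity (3 components) and one pressure value per cell.
// With a penalty formulation the pressure is not a solver unknown,
// so only the velocity count enters the linear system.
inline std::uint64_t q1_velocity_dofs(const std::uint64_t n)
{
  return 3 * nodes_qk(n, 1);
}

inline std::uint64_t p0_pressure_values(const std::uint64_t n)
{
  return n * n * n;
}
```

For n = 64 this gives 6440067 + 274625 + 2146689 + 2146689 = 11008070 unknowns under the Q2/Q1 assumption, against 823875 velocity unknowns for Q1P0, i.e. about 7.8 times fewer velocity unknowns, before accounting for the denser coupling of second-order elements.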

Another question: why does the Stokes assembly take more time than the temperature assembly?

I will appreciate any comments or partial answers any of you may have.

Cedric.



