[aspect-devel] performance

Wolfgang Bangerth bangerth at math.tamu.edu
Sat Mar 9 03:58:19 PST 2013

> | setup reinit                    |         1 |      3.55s |      0.18% |
> | setup system matrix             |         1 |       214s |        11% |
> | setup system preconditioner     |         1 |      95.2s |       4.8% |
> +---------------------------------+-----------+------------+------------+
> "setup reinit" corresponds to this part of the code:
>   system_rhs.reinit(introspection.index_sets.system_partitioning,
>                     mpi_communicator);
>   solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                   mpi_communicator);
>   old_solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                       mpi_communicator);
>   old_old_solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                           mpi_communicator);
>   current_linearization_point.reinit(
>     introspection.index_sets.system_relevant_partitioning, MPI_COMM_WORLD);
> "setup system matrix" corresponds to this part of the code:
> setup_system_matrix (introspection.index_sets.system_partitioning);
> "setup system preconditioner" corresponds to:
> setup_system_preconditioner (introspection.index_sets.system_partitioning);
> I will hereby reiterate my comments from last time for clarity:
> I am surprised by the time all these actions take (assembly included, by the way).
> My own code is FE, but it relies on a regular grid and uses first-order elements
> (Q1P0 + penalised formulation), so the assembly and 'dof setup' steps for a
> 64^3 grid are much faster and therefore not comparable to those obtained
> with Aspect.
> Lacking a proper reference point, my question is simple: are those measured
> times normal?

Timo and I talked about this yesterday over lunch and I think there are a 
number of things one could note:
1/ It *shouldn't* take this long. What version of Trilinos are you using? We
    reported a bug against an earlier Trilinos version: their function that
    creates the sparsity pattern is unreasonably slow. I do not recall whether
    this was ever fixed, but we could point right at the code that was the
    problem. Can you try Trilinos 11?
2/ One of the problems with that piece of code in Trilinos'
    Epetra_FECrsGraph (their "SparsityPattern" class) was that the effort to
    create it was quadratic or cubic in the number of elements per row. In 3d,
    when you use a Q2/Q1 element like we do in Aspect, you get rows that
    have 384 elements each, whereas with the Q1/P0 element you have
    85. That difference alone puts a significant amount of stress on the

So, the short version is: it's going to take longer for Q2/Q1 elements, but it
shouldn't take *this* long.
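
To put a rough number on the row-length argument in item 2: if the cost per
row really grows quadratically or cubically with the number of entries, the
step from 85 to 384 entries per row makes each row about 20 or about 92 times
more expensive. The sketch below only does that arithmetic. The pure power-law
cost model is an assumption, and the helper name cost_ratio is made up for
illustration; it is not a Trilinos or deal.II function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of per-row setup cost between two
// discretizations, ASSUMING cost per row ~ (entries per row)^exponent.
// This is a back-of-the-envelope model, not a measurement.
double cost_ratio(double entries_a, double entries_b, double exponent)
{
  assert(entries_a > 0 && entries_b > 0);
  return std::pow(entries_a / entries_b, exponent);
}
```

With the row lengths quoted above, cost_ratio(384, 85, 2) is roughly 20.4 and
cost_ratio(384, 85, 3) is roughly 92.2.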

> Another question: why does the Stokes assembly take more time than the
> assembly of temperature ?

Because you couple so many more degrees of freedom in the (vector-valued) 
Stokes equation than in the (scalar) temperature equation.
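
A minimal sketch of the counting behind this, assuming the Q2/Q1 Stokes
element mentioned above and, as an assumption not stated in this thread, a
scalar Q2 element for the temperature. The function names are made up for
illustration, and the local matrix is treated as dense, which slightly
overcounts the Stokes case.

```cpp
#include <cassert>

// Number of nodes of a tensor-product Lagrange element Q_p on one cell:
// (p + 1)^dim.
unsigned int nodes_per_cell(unsigned int degree, unsigned int dim)
{
  unsigned int n = 1;
  for (unsigned int d = 0; d < dim; ++d)
    n *= degree + 1;
  return n;
}

// Q2 velocity with dim components plus Q1 pressure.
unsigned int stokes_dofs_per_cell(unsigned int dim)
{
  assert(dim >= 1 && dim <= 3);
  return dim * nodes_per_cell(2, dim) + nodes_per_cell(1, dim);
}

// ASSUMPTION: scalar Q2 temperature element.
unsigned int temperature_dofs_per_cell(unsigned int dim)
{
  assert(dim >= 1 && dim <= 3);
  return nodes_per_cell(2, dim);
}

// Entries of a dense local matrix with n dofs per cell.
unsigned int local_matrix_entries(unsigned int n)
{
  return n * n;
}
```

In 3d this gives 89 Stokes degrees of freedom per cell against 27 for the
temperature, so the local Stokes matrix has 7921 entries to compute where the
temperature matrix has 729, roughly a factor of 11 per cell.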


Wolfgang Bangerth               email:            bangerth at math.tamu.edu
                                 www: http://www.math.tamu.edu/~bangerth/
