[aspect-devel] performance
Wolfgang Bangerth
bangerth at math.tamu.edu
Sat Mar 9 03:58:19 PST 2013
> | setup reinit                | 1 | 3.55s | 0.18% |
> | setup system matrix         | 1 |  214s |   11% |
> | setup system preconditioner | 1 | 95.2s |  4.8% |
> +-----------------------------+---+-------+-------+
>
> "setup reinit" corresponds to this part of the code:
>
> system_rhs.reinit(introspection.index_sets.system_partitioning, mpi_communicator);
> solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                 mpi_communicator);
> old_solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                     mpi_communicator);
> old_old_solution.reinit(introspection.index_sets.system_relevant_partitioning,
>                         mpi_communicator);
> current_linearization_point.reinit(introspection.index_sets.system_relevant_partitioning,
>                                    MPI_COMM_WORLD);
>
> "setup system matrix" corresponds to this part of the code:
>
> setup_system_matrix (introspection.index_sets.system_partitioning);
>
> "setup system preconditioner" corresponds to:
>
> setup_system_preconditioner (introspection.index_sets.system_partitioning);
>
> I will reiterate my comments from last time for clarity:
> I am surprised by the time all these actions take (assembly included, by the way).
> My own code is FE but it relies on a regular grid and uses first-order elements
> (Q1P0 + penalised formulation), so the assembly and 'dof setup' steps for a
> 64^3 grid are much faster and therefore not comparable to those obtained
> with Aspect.
> Not having a proper point of reference, my question is simple: are those
> measured times normal?
Timo and I talked about this yesterday over lunch and I think there are a
number of things one could note:
1/ It *shouldn't* take this long. What version of Trilinos are you using? We
reported a bug in an earlier Trilinos version: their function that creates
the sparsity pattern is unreasonably slow. I do not recall whether this was
ever fixed, but we could point right at the code that was the problem. Can
you try Trilinos 11?
2/ One of the problems with that piece of code in the Trilinos
Epetra_FECrsGraph (their "SparsityPattern" class) was that the effort to
create it was quadratic or cubic in the number of elements per row. In 3d,
when you use a Q2/Q1 element like we do in Aspect, you get rows with
384 elements, whereas with the Q1/P0 element you have 85. That difference
alone puts a significant amount of stress on the system.
So, the short version is: it's going to take longer for Q2/Q1 elements, but
it shouldn't take *this* long.
> Another question: why does the Stokes assembly take more time than the
> temperature assembly?
Because you couple so many more degrees of freedom in the (vector-valued)
Stokes equation than in the (scalar) temperature equation.
Best
W.
--
------------------------------------------------------------------------
Wolfgang Bangerth email: bangerth at math.tamu.edu
www: http://www.math.tamu.edu/~bangerth/