[aspect-devel] Assembly speedup

Wolfgang Bangerth bangerth at math.tamu.edu
Tue Oct 8 15:42:32 PDT 2013


> revision 1932 (move is_compressible() out of the inner loop of Stokes
> assembly):
> +---------------------------------------------+------------+------------
> | Total wallclock time elapsed since start    |      27.7s |
> |                                             |            |
> | Section                         | no. calls |  wall time | % of total
> +---------------------------------+-----------+------------+------------
> | Assemble Stokes system          |        23 |      5.31s |        19%
> | Assemble temperature system     |        23 |      6.97s |        25%
> | Build Stokes preconditioner     |         4 |      2.79s |        10%
> | Build temperature preconditioner|        23 |     0.719s |       2.6%
> | Solve Stokes system             |        23 |       7.5s |        27%
> | Solve temperature system        |        23 |      1.09s |       3.9%
> | Initialization                  |         4 |     0.124s |      0.45%
> | Postprocessing                  |        21 |     0.739s |       2.7%
> | Refine mesh structure, part 1   |         3 |     0.399s |       1.4%
> | Refine mesh structure, part 2   |         3 |     0.104s |      0.37%
> | Setup dof systems               |         4 |      1.53s |       5.5%
> +---------------------------------+-----------+------------+------------

And this is after revision 1948 where I filter out all degrees of 
freedom in the temperature assembly that I don't care about:

+---------------------------------------------+------------+------------
| Total wallclock time elapsed since start    |      26.1s |
|                                             |            |
| Section                         | no. calls |  wall time | % of total
+---------------------------------+-----------+------------+------------
| Assemble Stokes system          |        23 |      5.37s |        21%
| Assemble temperature system     |        23 |      6.13s |        23%
| Build Stokes preconditioner     |         4 |      2.81s |        11%
| Build temperature preconditioner|        23 |     0.726s |       2.8%
| Solve Stokes system             |        23 |      6.64s |        25%
| Solve temperature system        |        23 |      1.14s |       4.3%
| Initialization                  |         4 |     0.125s |      0.48%
| Postprocessing                  |        21 |     0.742s |       2.8%
| Refine mesh structure, part 1   |         3 |     0.399s |       1.5%
| Refine mesh structure, part 2   |         3 |     0.104s |       0.4%
| Setup dof systems               |         4 |      1.52s |       5.8%
+---------------------------------+-----------+------------+------------
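For reference, the change between the two tables can be expressed as a fractional reduction in wall time. The helper below is only an illustrative sketch (the name `relative_reduction` is made up here and is not part of ASPECT); it takes the "before" and "after" wall times in seconds.

```cpp
#include <cassert>

// Hypothetical helper, not part of ASPECT: fractional reduction in wall
// time when going from `before` seconds to `after` seconds. A negative
// result means the section became slower.
double relative_reduction(const double before, const double after)
{
  assert(before > 0.0); // a zero or negative baseline makes no sense
  return (before - after) / before;
}
```

With the numbers above, `relative_reduction(6.97, 6.13)` gives roughly 0.12 for "Assemble temperature system", and `relative_reduction(27.7, 26.1)` gives roughly 0.06 for the total wallclock time.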

The difference is probably almost in the noise, but the change should help 
significantly with the problem Thomas sees on many processors. In any 
case, we're now at less than 1/3 of the time for temperature assembly :-)
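To illustrate the idea behind filtering degrees of freedom in the temperature assembly, here is a minimal, self-contained sketch. It is not the code from revision 1948: the types `LocalDofLayout` and `Table` and both function names are hypothetical, and the local matrix is a plain mass-matrix-like term chosen only to keep the example short. The point is that the list of relevant local dofs is computed once, and the inner loops then run over that shorter list instead of over all dofs of the cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the finite element description: for every
// local dof on a cell, the solution component it belongs to (for example
// velocity, pressure, or temperature).
struct LocalDofLayout
{
  std::vector<unsigned int> component_of_dof;
};

// phi[q][i] = value of local shape function i at quadrature point q.
using Table = std::vector<std::vector<double>>;

// Collect the local indices of all dofs belonging to `component`.
// This only depends on the element, so it can be done once, outside
// the loop over cells.
std::vector<std::size_t>
dofs_of_component(const LocalDofLayout &layout, const unsigned int component)
{
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < layout.component_of_dof.size(); ++i)
    if (layout.component_of_dof[i] == component)
      selected.push_back(i);
  return selected;
}

// Assemble sum_q phi_i(q) * phi_j(q) * w(q), but only for the selected
// dofs. The result is an n x n matrix with n = selected.size(), so the
// cost scales with n^2 rather than with the square of all local dofs.
Table assemble_filtered(const Table                    &phi,
                        const std::vector<double>      &weights,
                        const std::vector<std::size_t> &selected)
{
  assert(phi.size() == weights.size()); // one row of phi per quadrature point
  const std::size_t n = selected.size();
  Table local(n, std::vector<double>(n, 0.0));
  for (std::size_t q = 0; q < weights.size(); ++q)
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        local[i][j] +=
          phi[q][selected[i]] * phi[q][selected[j]] * weights[q];
  return local;
}
```

In this sketch a cell with six local dofs of which two belong to the temperature produces a 2 x 2 local matrix, and the inner loops never touch the four velocity and pressure dofs.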


@Thomas: Can you see whether that makes a difference?

@Timo: Want to re-run your 3d simulation with the same setup and compare 
results on your end?

Best
  Wolfgang


-- 
------------------------------------------------------------------------
Wolfgang Bangerth               email:            bangerth at math.tamu.edu
                                 www: http://www.math.tamu.edu/~bangerth/


