[aspect-devel] Assembly speedup
Thomas Geenen
geenen at gmail.com
Fri Oct 11 01:40:12 PDT 2013
With respect to local_assemble_advection system, I see a speedup of almost
20x using linear elements for temperature.
However, copy_local_to_global on 512 cores still takes too much time.
With the new patches it runs 10% faster, but a lot of time is still spent
inserting matrix values for off-process entries.
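
To illustrate why off-process entries matter here, the following is a minimal Python sketch, not ASPECT or deal.II code. It assumes a toy cell with 18 velocity, 4 pressure and 4 (linear) temperature degrees of freedom and a made-up ownership set; all names (filter_local_contribution, count_off_process_entries, owned) are hypothetical. It counts how many local-matrix entries would be written to rows owned by another process when the full local matrix is scattered, compared with scattering only the temperature block, which is the kind of filtering described for revision 1948 in the quoted message below.

```python
def filter_local_contribution(local_matrix, local_dofs, keep):
    """Restrict a dense local matrix and its dof indices to entries with keep[i] True."""
    idx = [i for i, flag in enumerate(keep) if flag]
    dofs = [local_dofs[i] for i in idx]
    matrix = [[local_matrix[i][j] for j in idx] for i in idx]
    return dofs, matrix


def count_off_process_entries(dofs, matrix, owned):
    """Count matrix entries whose row dof is not owned by this process."""
    return sum(len(row) for dof, row in zip(dofs, matrix) if dof not in owned)


# Toy cell layout (assumption): 18 velocity + 4 pressure + 4 temperature dofs.
N_VELOCITY, N_PRESSURE, N_TEMPERATURE = 18, 4, 4
n_dofs = N_VELOCITY + N_PRESSURE + N_TEMPERATURE
local_dofs = list(range(100, 100 + n_dofs))  # made-up global indices
is_temperature = [i >= N_VELOCITY + N_PRESSURE for i in range(n_dofs)]

# Temperature assembly only fills the temperature-temperature block;
# every other entry of the local matrix stays zero.
local_matrix = [[1.0 if is_temperature[i] and is_temperature[j] else 0.0
                 for j in range(n_dofs)] for i in range(n_dofs)]

# Made-up ownership: this process owns dofs 100..112 and two temperature dofs.
owned = set(range(100, 113)) | {122, 123}

unfiltered = count_off_process_entries(local_dofs, local_matrix, owned)
t_dofs, t_matrix = filter_local_contribution(local_matrix, local_dofs, is_temperature)
filtered = count_off_process_entries(t_dofs, t_matrix, owned)
print(f"off-process entries: unfiltered={unfiltered}, filtered={filtered}")
```

In this toy setup the filtered scatter touches far fewer off-process rows. The real ratio depends on the element choice and on how the mesh is partitioned.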
I will do some timing using MPI_Wtime to make sure we are not looking at
profiling overhead.
If that gives the same results, I will post this on the Trilinos forum.
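
A manual timing of this kind could be sketched as follows. This is a minimal Python illustration with time.perf_counter standing in for MPI_Wtime; the SectionTimer class and the section name used in the demo are hypothetical. In an MPI run one would read the clock with MPI_Wtime on every rank and would typically compare the maximum over all ranks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulate wall time and call counts per named section."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable; time.perf_counter is a stand-in for MPI_Wtime.
        self.clock = clock
        self.wall_time = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def section(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed code raises.
            self.wall_time[name] += self.clock() - start
            self.calls[name] += 1


timer = SectionTimer()
for _ in range(3):
    with timer.section("copy_local_to_global"):
        sum(range(10000))  # placeholder for the real work

for name in timer.calls:
    print(f"{name}: {timer.calls[name]} calls, {timer.wall_time[name]:.6f}s")
```

Because the timer only brackets the section with two clock reads, its own overhead is small compared with an instrumenting profiler, which is the point of the cross-check.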
cheers
Thomas
On Wed, Oct 9, 2013 at 12:42 AM, Wolfgang Bangerth
<bangerth at math.tamu.edu> wrote:
>
>> revision 1932 (move is_compressible() out of the inner loop of Stokes
>> assembly):
>> +---------------------------------------------+------------+------------+
>> | Total wallclock time elapsed since start    |      27.7s |            |
>> |                                             |            |            |
>> | Section                         | no. calls |  wall time | % of total |
>> +---------------------------------+-----------+------------+------------+
>> | Assemble Stokes system          |        23 |      5.31s |        19% |
>> | Assemble temperature system     |        23 |      6.97s |        25% |
>> | Build Stokes preconditioner     |         4 |      2.79s |        10% |
>> | Build temperature preconditioner|        23 |     0.719s |       2.6% |
>> | Solve Stokes system             |        23 |       7.5s |        27% |
>> | Solve temperature system        |        23 |      1.09s |       3.9% |
>> | Initialization                  |         4 |     0.124s |      0.45% |
>> | Postprocessing                  |        21 |     0.739s |       2.7% |
>> | Refine mesh structure, part 1   |         3 |     0.399s |       1.4% |
>> | Refine mesh structure, part 2   |         3 |     0.104s |      0.37% |
>> | Setup dof systems               |         4 |      1.53s |       5.5% |
>> +---------------------------------+-----------+------------+------------+
>>
>
> And this is after revision 1948 where I filter out all degrees of freedom
> in the temperature assembly that I don't care about:
>
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |      26.1s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |        23 |      5.37s |        21% |
> | Assemble temperature system     |        23 |      6.13s |        23% |
> | Build Stokes preconditioner     |         4 |      2.81s |        11% |
> | Build temperature preconditioner|        23 |     0.726s |       2.8% |
> | Solve Stokes system             |        23 |      6.64s |        25% |
> | Solve temperature system        |        23 |      1.14s |       4.3% |
> | Initialization                  |         4 |     0.125s |      0.48% |
> | Postprocessing                  |        21 |     0.742s |       2.8% |
> | Refine mesh structure, part 1   |         3 |     0.399s |       1.5% |
> | Refine mesh structure, part 2   |         3 |     0.104s |       0.4% |
> | Setup dof systems               |         4 |      1.52s |       5.8% |
> +---------------------------------+-----------+------------+------------+
>
> This is probably almost in the noise, but should help significantly with
> the problem Thomas sees on many processors. In any case, we're now at less
> than 1/3 of the time for temperature assembly :-)
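
As a quick check of the two tables above, this small Python sketch computes the relative reduction in wall time between revision 1932 and revision 1948 for temperature assembly and for the total run. The numbers are copied from the tables; the function name relative_reduction is only illustrative.

```python
# Wall times in seconds, copied from the two timing tables.
r1932 = {"Assemble temperature system": 6.97, "Total wallclock time": 27.7}
r1948 = {"Assemble temperature system": 6.13, "Total wallclock time": 26.1}


def relative_reduction(before, after):
    """Fraction of wall time saved going from `before` to `after`."""
    return (before - after) / before


reductions = {name: relative_reduction(r1932[name], r1948[name]) for name in r1932}
for name, value in reductions.items():
    print(f"{name}: {100 * value:.1f}% less wall time")
```

These are single-run numbers from one machine, so small differences may well be within run-to-run variation.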
>
>
> @Thomas: Can you see whether that makes a difference?
>
> @Timo: Want to re-run your 3d simulation with the same setup and compare
> results on your end?
>
>
> Best
> Wolfgang
>
>
> --
> ------------------------------------------------------------------------
> Wolfgang Bangerth email: bangerth at math.tamu.edu
> www: http://www.math.tamu.edu/~bangerth/
>
>