<div dir="ltr">it seems to have been a glitch in the system.<div>i reran the experiments and the walltime spend in the pressure_normalization has reduces a lot</div><div>i now see timings for all cores similar to the lowest timings in the previous run.</div>
<div><br></div><div>for that run i observed large outliers in the walltime for several cores<br><div>up to 30 times higher with a lot of time spend in the global mpi routines MPI::sum etc</div><div><br></div><div>next on the list is the call to compress</div>
<div>dealii::BlockMatrixBase<dealii::TrilinosWrappers::SparseMatrix>::compress</div><div><br><div>cheers</div><div>Thomas</div><div>ps i was running between 100 and 200K dfd per core for the timing runs</div><div class="gmail_extra">
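For context: compress() is the collective call that, after assembly, ships matrix entries written into rows owned by other processors to their owners, so any imbalance in the assembly loop tends to show up as wait time inside it. A minimal sketch of where the call sits (not ASPECT's actual assembly code; the names are placeholders):

  #include <deal.II/lac/trilinos_block_sparse_matrix.h>

  using namespace dealii;

  // Called once all locally owned cells have been assembled and their
  // local contributions written into the Trilinos block matrix.
  void finish_assembly (TrilinosWrappers::BlockSparseMatrix &system_matrix)
  {
    // compress() exchanges the entries destined for rows owned by other
    // processors. It is collective, so a processor that finished its cells
    // early simply waits here for the slower ones.
    system_matrix.compress (VectorOperation::add);
  }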
<br><br><div class="gmail_quote">On Fri, Aug 30, 2013 at 8:24 PM, Timo Heister <span dir="ltr"><<a href="mailto:heister@clemson.edu" target="_blank">heister@clemson.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>> Interesting. The function contains one explicit MPI call, one creation<br>
> of a completely distributed vector (should not be very expensive) plus<br>
> one copy from a completely distributed to a ghosted vector (should also<br>
> not be very expensive). Can you break down which part of the function is<br>
> expensive?<br>
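For reference, the pattern described above is roughly the following (only a sketch with made-up names; not ASPECT's actual normalize_pressure()):

  #include <deal.II/base/index_set.h>
  #include <deal.II/lac/trilinos_vector.h>

  using namespace dealii;

  void shift_pressure (TrilinosWrappers::MPI::Vector &ghosted_pressure,
                       const IndexSet                &locally_owned_dofs,
                       const MPI_Comm                 mpi_communicator)
  {
    // the completely distributed (non-ghosted) vector
    TrilinosWrappers::MPI::Vector distributed (locally_owned_dofs,
                                               mpi_communicator);
    distributed = ghosted_pressure;

    // one global reduction to get the mean pressure
    const double mean = distributed.mean_value ();

    // shift so that the mean becomes zero
    distributed.add (-mean);

    // the copy from the completely distributed to the ghosted vector;
    // this is the step that communicates the updated ghost entries
    ghosted_pressure = distributed;
  }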

It might be one of 3 things:
1. Thomas found a work imbalance in this function (as he said, some
processors might not have anything to do). This could show up in his
instrumentation as processors being idle (but it does not mean the function
takes a significant amount of the total runtime).
2. It is instead a work imbalance/issue in the computation that happens
before the normalization, and the timers are not synchronized correctly
(see the timing sketch below).
3. He has only very few unknowns per processor, which skews the timings.
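One way to check point 2 is to synchronize the processors before starting the timer, so that waiting caused by earlier imbalance is not booked to the normalization itself. A rough sketch (the section name and variables are placeholders, not what ASPECT uses by default):

  #include <deal.II/base/timer.h>

  using namespace dealii;

  void timed_normalization (TimerOutput    &computing_timer,
                            const MPI_Comm  mpi_communicator)
  {
    // barrier first, so imbalance from earlier work does not show up here
    MPI_Barrier (mpi_communicator);

    computing_timer.enter_subsection ("pressure normalization");
    // ... call the normalization here ...
    computing_timer.leave_subsection ("pressure normalization");
  }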

> That
> said, I always tried to follow the principle of least surprise, which
> would mean to make sure that the pressure is normalized or that the
> linear systems are indeed solved to sufficient accuracy.

I agree.

> Instead of globally relaxing tolerances or switching off pressure
> normalization, how about having a section in the manual in which we list
> ways to make the code faster if you know what you are doing? I'll be
> happy to write this section.

Sounds good. Some more ideas (see the parameter file sketch below for the
last two):
- use optimized mode :-)
- lower the order of the temperature/compositional discretization
- disable postprocessing if it is not needed
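For the last two, the input file changes would look roughly like this (only a sketch; the exact parameter names and defaults should be checked against the manual, and optimized vs. debug mode is chosen at build time, not here):

  subsection Discretization
    # a lower polynomial degree means fewer temperature/composition unknowns
    set Temperature polynomial degree = 1
    set Composition polynomial degree = 1
  end

  subsection Postprocess
    # leave the list empty to switch postprocessing off
    set List of postprocessors =
  end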

--
Timo Heister
http://www.math.clemson.edu/~heister/