<div dir="ltr">It seems to have been a glitch in the system.<div>I reran the experiments, and the walltime spent in pressure_normalization has dropped a lot.</div><div>I now see timings for all cores similar to the lowest timings in the previous run.</div>


<div><br></div><div>In that previous run I observed large outliers in the walltime for several cores,<br><div>up to 30 times higher, with a lot of time spent in the global MPI routines (MPI::sum etc.).</div><div><br></div><div>Next on the list is the call to compress:</div>


<div>dealii::BlockMatrixBase<dealii::TrilinosWrappers::SparseMatrix>::compress</div><div><br><div>Cheers,</div><div>Thomas</div><div>PS: I was running between 100K and 200K DoFs per core for the timing runs.</div><div class="gmail_extra">


<br><br><div class="gmail_quote">On Fri, Aug 30, 2013 at 8:24 PM, Timo Heister <span dir="ltr"><<a href="mailto:heister@clemson.edu" target="_blank">heister@clemson.edu</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>> Interesting. The function contains one explicit MPI call, one creation<br>


> of a completely distributed vector (should not be very expensive) plus<br>

> one copy from a completely distributed to a ghosted vector (should also<br>

> not be very expensive). Can you break down which part of the function is<br>

> expensive?<br>

<br>

</div>It might be one of 3 things:<br>

1. Thomas found a work imbalance in this function (as he said, some<br>

processors might not have anything to do). This could show up in his<br>

instrumentation of processors being idle (but does not mean it takes a<br>

significant amount of total runtime).<br>

2. It is instead a work imbalance/issue in the computation that<br>

happens before the normalization and the timers are not synchronized<br>

correctly.<br>

3. He has only very few unknowns per processor which skews the timings.<br>
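To tell these cases apart, it can help to look at the spread of the per-rank timings rather than at a single number. A minimal sketch, assuming the wall times of one section have already been collected into a vector with one entry per MPI rank; TimingSummary and summarize_timings are hypothetical names, not part of ASPECT or deal.II:<br>

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not an ASPECT or deal.II API): summarizes the wall
// time, in seconds, that each MPI rank spent in one section of the code.
struct TimingSummary
{
  double min;          // fastest rank
  double max;          // slowest rank
  double mean;         // average over all ranks
  double max_over_min; // outlier factor; 1.0 means all ranks took equally long
};

TimingSummary summarize_timings(const std::vector<double> &seconds_per_rank)
{
  assert(!seconds_per_rank.empty());

  TimingSummary s;
  s.min = *std::min_element(seconds_per_rank.begin(), seconds_per_rank.end());
  s.max = *std::max_element(seconds_per_rank.begin(), seconds_per_rank.end());
  s.mean = std::accumulate(seconds_per_rank.begin(), seconds_per_rank.end(), 0.0)
           / static_cast<double>(seconds_per_rank.size());

  // Guard against a section that took no measurable time on some rank.
  s.max_over_min = (s.min > 0.0) ? s.max / s.min : 0.0;
  return s;
}
```

For four ranks with times {1, 1, 1, 30} this gives a mean of 8.25 and a max/min factor of 30, i.e. a single straggler. The summary alone cannot separate cases 1 and 2; for that, the ranks would need to be synchronized (e.g. with a barrier) before the timer starts, so that waiting caused by earlier imbalance is not charged to this section.<br>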

<div><br>

> That<br>

> said, I always tried to follow the principle of least surprise, which<br>

> would mean to make sure that the pressure is normalized or that the<br>

> linear systems are indeed solved to sufficient accuracy.<br>

<br>

</div>I agree.<br>

<div><br>

> Instead of globally relaxing tolerances or switching off pressure<br>

> normalization, how about having a section in the manual in which we list<br>

> ways to make the code faster if you know what you are doing? I'll be happy to<br>

> write this section.<br>

<br>

</div>Sounds good. Some more ideas:<br>

- use optimized mode :-)<br>

- lower order of temperature/compositional discretization<br>

- disable postprocessing if not needed<br>

<div><br>

--<br>

Timo Heister<br>

<a href="http://www.math.clemson.edu/~heister/" target="_blank">http://www.math.clemson.edu/~heister/</a><br>

</div><div><div>_______________________________________________<br>

Aspect-devel mailing list<br>

<a href="mailto:Aspect-devel@geodynamics.org" target="_blank">Aspect-devel@geodynamics.org</a><br>

<a href="http://geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel" target="_blank">http://geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel</a><br>

</div></div></blockquote></div><br></div></div></div></div>