[aspect-devel] ASPECT scaling on a Cray XC30

Thu Feb 6 09:33:59 PST 2014

On 02/04/2014 11:53 AM, Thomas Geenen wrote:
> Hi Rene,
> 
> How did you pin your MPI processes on the node?
> For the implicit solver this makes a big difference.
> These routines are memory-bandwidth limited. You will usually saturate
> the bandwidth when running on half the number of cores per socket.
> So for a dual-socket node you want to use half the number of cores per
> socket, but use both sockets.

Oh, thanks for the hint; I had not considered this. I can rerun that
small model with pinned processes.
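
To make the suggested placement concrete, here is a minimal Python
sketch of "half the cores per socket, but both sockets". It only
computes which core IDs the ranks would go to; it does not set any
affinity itself. The function names are made up for illustration, and
it assumes that core IDs are numbered contiguously socket by socket,
which is not guaranteed on a real node.

```python
def half_populated_layout(sockets, cores_per_socket):
    """Return, per socket, the core IDs to use when only half of each
    socket's cores are populated.

    Assumption: socket s owns the contiguous core IDs
    s*cores_per_socket .. (s+1)*cores_per_socket - 1.
    """
    used_per_socket = cores_per_socket // 2
    layout = []
    for s in range(sockets):
        first = s * cores_per_socket
        layout.append(list(range(first, first + used_per_socket)))
    return layout


def rank_to_core(layout):
    """Map MPI ranks to core IDs, alternating between sockets so that
    consecutive ranks use different memory controllers."""
    if not layout:
        return []
    cores = []
    for slot in range(len(layout[0])):
        for socket_cores in layout:
            cores.append(socket_cores[slot])
    return cores


if __name__ == "__main__":
    # Hypothetical dual-socket node with 4 cores per socket.
    layout = half_populated_layout(sockets=2, cores_per_socket=4)
    for rank, core in enumerate(rank_to_core(layout)):
        print(f"rank {rank} -> core {core}")
```

The resulting list would still have to be passed to whatever binding
mechanism the launcher on the machine provides.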

On 02/06/2014 12:44 AM, Timo Heister wrote:
> Very nice results. A couple of comments:
> - the Stokes solver is not scaling well weakly but looks okay in
> strong scaling (I wonder why that is happening because the setup is
> not included)

Actually, I used the same setup as in my earlier mail on the topic. It
is somewhat resolution dependent because of a spherical anomaly, and it
uses a loose solver tolerance to keep the computing time low. These two
factors may be reasons for the poor weak scaling of the Stokes solver.
I am currently setting up a more resolution-independent model to test
this.
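
For reference, the two scaling measures discussed here can be written
down as a small Python sketch. These are the usual textbook
definitions, not anything ASPECT-specific, and the function names are
my own.

```python
def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Strong scaling: the total problem size is fixed while the core
    count grows from p_ref to p. Ideal behaviour is t = t_ref * p_ref / p,
    which gives an efficiency of 1.0."""
    return (t_ref * p_ref) / (t * p)


def weak_scaling_efficiency(t_ref, t):
    """Weak scaling: the work per core (e.g. DoFs/core) is fixed while
    the core count grows. Ideal behaviour is a constant runtime,
    which gives an efficiency of 1.0."""
    return t_ref / t


if __name__ == "__main__":
    # Placeholder timings chosen only to exercise the functions;
    # they are not measurements from the runs discussed in this thread.
    print(strong_scaling_efficiency(t_ref=10.0, p_ref=1, t=5.0, p=2))
    print(weak_scaling_efficiency(t_ref=10.0, t=12.5))
```

A solver whose iteration count grows with resolution will lose weak
scaling efficiency even if each iteration scales perfectly, which is
why a resolution-dependent setup can blur the picture.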

> - the scaling on a single node is surprisingly good. I wonder if this
> is due to the architecture or due to cpu binding (what MPI library are
> you using and what options do you specify for mpirun?)

Actually, the usual way on this cluster is to use a custom wrapper
script instead of mpirun, which hides all the optimized parameters from
the typical (not so experienced) user. I have not dug through all the
options yet. They are using a modified version of the Argonne MPICH
implementation of MPI-3.0 (Cray MPT 6.0), but I am not aware of the
special optimizations in there.

> - the Stokes solver always needs something like 30+6 iterations, which
> is not ideal. Can you try switching the cheap iterations off and
> compare the runtime (a single data point would be enough)?

A single data point (global resolution 5, 96 cores, 50 kDoFs/core):
"30+2" iterations: 4.39 s Stokes solve
"0+9" iterations: 7.42 s Stokes solve
The rest of the timings are identical (within measurement error). I will
redo some of the other computations with 50 kDoFs/core after my vacation
next week to see whether the number of iterations is more constant for a
fixed number of DoFs per core.
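
The arithmetic behind this comparison, as a short Python sketch. It
uses only the two timings above and reads "a+b" as a cheap plus b
expensive iterations. The per-iteration figure assumes that the whole
solve time is spent in the iterations, which is a simplification.

```python
# Measured Stokes solve times from the single data point above.
t_with_cheap = 4.39      # seconds, "30+2" iterations
t_without_cheap = 7.42   # seconds, "0+9" iterations

cores = 96
dofs_per_core = 50_000   # approximate, "50 kDoFs/core"

# Approximate global problem size.
total_dofs = cores * dofs_per_core

# How much slower the solve gets when the cheap iterations are off.
slowdown = t_without_cheap / t_with_cheap

# Rough cost of one expensive iteration, assuming the solve time is
# dominated by the 9 expensive iterations of the "0+9" run.
t_per_expensive = t_without_cheap / 9

if __name__ == "__main__":
    print(f"total DoFs (approx.): {total_dofs}")
    print(f"slowdown without cheap iterations: {slowdown:.2f}x")
    print(f"approx. time per expensive iteration: {t_per_expensive:.2f} s")
```

So for this data point, keeping the cheap iterations on is clearly the
faster option.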

> - can you please post the .prm you used?

See earlier mail.

> - I wonder if it would be worthwhile to look at the second timestep to
> ignore effects like MPI buffer creation.

Yes, I thought about that as well. It is also on the to-do list. Maybe
we can combine all of this into an example section on ASPECT scaling in
the manual? Essentially it is just an extension of the part in the
paper, but it might be useful for others to estimate whether ASPECT is
suitable for their model sizes. In that case, I would probably strip all
options specific to our model from the parameter file (getting rid of
this resolution dependency) and rerun the models once more.