[aspect-devel] ASPECT scaling on a Cray XC30
Rene Gassmoeller
rengas at gfz-potsdam.de
Tue Feb 4 02:22:37 PST 2014
OK, so here finally is the promised update on the scaling results. I
attached the new Calc spreadsheet, in which I subtracted the
initialization and I/O timings and averaged the runtimes over 3
consecutive runs (not much change there, except for the very small models).
In fact, removing the initialization and I/O times from the runtime
resolved the issue with the apparent slowdown for a high number of
DoFs/core. Apparently the I/O speed is somewhat limiting, but this will
not be a problem for the final models.
Using half the available cores per node did not change much in terms of
efficiency (at least on a single node).
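
For reference, the correction applied to the runtimes (subtract
initialization and I/O, then average over the repeated runs) can be
sketched as below. This is only a minimal illustration: the function
names and the timing values are made up for the example and are not
taken from the spreadsheet.

```python
from statistics import mean

def net_runtime(total, init, io):
    """Wall time left after removing initialization and I/O (seconds)."""
    return total - init - io

def mean_net_runtime(runs):
    """Average the net runtime over repeated runs.

    Each entry of `runs` is a tuple (total, init, io) in seconds.
    """
    return mean(net_runtime(total, init, io) for total, init, io in runs)

# Hypothetical timings for three consecutive runs (NOT measured values).
runs = [(120.0, 10.0, 5.0), (118.0, 10.0, 6.0), (122.0, 11.0, 5.0)]
print(mean_net_runtime(runs))
```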
>> - weak scaling is normally number of cores vs runtime with a fixed
>> number of DoFs/core. Optimal scaling are horizontal lines (or you plot
>> it as cores vs. efficiency)
Done in the new spreadsheet. The lines are not horizontal, but see
the next point for this.
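
As a small sketch of the "cores vs. efficiency" variant mentioned in
the quoted text: with a fixed number of DoFs/core, the weak scaling
efficiency is the runtime at the smallest core count divided by the
runtime at each larger core count, so ideal scaling gives 1.0
everywhere. The core counts and runtimes below are placeholders I
chose for illustration, not results from the spreadsheet.

```python
def weak_scaling_efficiency(runtimes):
    """Return {cores: efficiency} for runs at a fixed DoFs/core.

    `runtimes` maps core count -> runtime in seconds. Efficiency is
    measured relative to the run with the smallest core count.
    """
    base = runtimes[min(runtimes)]
    return {cores: base / t for cores, t in sorted(runtimes.items())}

# Hypothetical runtimes in seconds (NOT measured values).
example = {24: 100.0, 192: 110.0, 1536: 125.0}
for cores, eff in weak_scaling_efficiency(example).items():
    print(f"{cores:5d} cores: efficiency {eff:.3f}")
```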
>> - assembly might not scale perfectly due to communication to the ghost
>> neighbors. I have not seen that much in the past, but it depends on
>> the speed of the interconnect (and if you are bandwidth-limited it
>> will get slower with number of DoFs/core). You can try to add another
>> timing section for the matrix.compress() call.
Thanks for the hint. In the new setup (maximum number of cores per node)
the assembly currently scales almost perfectly for 50 kDoFs/core. At
400 kDoF/node, at least the Stokes assembly increases its computing time
by 10 % when the number of DoFs increases by a factor of 8 (one additional
global refinement level). This does support your point. However, the
increase is so small that it does not bother me at the moment.
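
To put the 10 % in perspective, here is a rough back-of-the-envelope
sketch. It assumes that one global refinement step in 3D multiplies
the DoF count by about 2^3 = 8 (so the core count must grow by the
same factor to keep the DoFs/core fixed) and that a relative runtime
increase r corresponds to a weak scaling efficiency of 1/(1+r). The
function names and the base core count of 24 are hypothetical.

```python
def cores_for_fixed_load(base_cores, refinements, dim=3):
    """Cores needed to keep the DoFs/core constant.

    Assumes each global refinement multiplies the DoFs by 2**dim.
    """
    return base_cores * (2 ** dim) ** refinements

def implied_efficiency(relative_increase):
    """Weak scaling efficiency implied by a relative runtime increase."""
    return 1.0 / (1.0 + relative_increase)

# One extra refinement level in 3D, starting from a hypothetical 24 cores.
print(cores_for_fixed_load(24, 1))
# A 10 % longer assembly time after that step.
print(round(implied_efficiency(0.10), 3))
```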
The increase in computing time for the Stokes solve, however, is by far the
stronger effect. On the other hand, I think this might be specific to
this setup. Since we wanted to use a setup that is not too far from
the models we will run in production, we decided to include a spherical
temperature/composition anomaly in the setup, which of course is
resolved differently at the different resolutions. This may be the
reason for the increased number of Stokes iterations at increased
resolution. For an actual assessment of the code scaling (instead of the
scaling for our model setup), one would need
to repeat the models with a resolution-independent setup (i.e. a harmonic
perturbation of low degree), I guess. I have finished a model setup for this,
and if I find some free time I will update the results with the
resolution-independent setup.
Cheers and thanks for the help,
Rene
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speedup_3d_nodes.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 71364 bytes
Desc: not available
URL: <http://geodynamics.org/pipermail/aspect-devel/attachments/20140204/298f5c95/attachment-0001.ods>