[aspect-devel] Progress in writing the mantle convection code Aspect

Eric Heien emheien at ucdavis.edu
Tue Oct 15 09:27:45 PDT 2013


On Oct 14, 2013, at 1:29 PM, Wolfgang Bangerth wrote:

>> If you have N particles and processor i covers a fraction V_i of the
>> total domain volume, then processor i creates N*V_i particles.  This
>> creates a reasonably random looking distribution, but statistically
>> it's highly non-random because each subdomain has a set number of
>> particles.
> 
> That's true, but I think it's not too much of a problem. After all, we don't use these particles for random integration or anything else that requires truly statistically distributed particles. Basically, we want to trace where particles come from and where they go, and so the trajectory is more important than the details of their initial distribution. At least, that's my view. Do I miss something?

I think we'd need the scientists to weigh in on this.  For tracking trajectories I agree that a truly random distribution isn't very important.  However, many users want to represent material properties with particles, and in that case a truly random distribution may matter more.
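
Roughly speaking, the per-processor scheme we're discussing amounts to the sketch below.  This is only an illustration, not the actual Generator class code; the function name, the bounding-box sampling, and the per-process seed are my assumptions.

// Sketch only: each process creates a share of the N global particles
// proportional to the volume fraction V_i it owns, using its own RNG.
// The result looks random, but the per-subdomain counts are fixed, so the
// global distribution is not truly random and depends on the partitioning.
#include <array>
#include <random>
#include <vector>

std::vector<std::array<double,3>>
generate_local_particles(const unsigned long         n_global_particles,
                         const double                local_volume_fraction, // V_i
                         const std::array<double,3> &bbox_min,              // local bounding box
                         const std::array<double,3> &bbox_max,
                         const unsigned int          seed)                   // per-process seed
{
  const unsigned long n_local =
    static_cast<unsigned long>(n_global_particles * local_volume_fraction);

  std::mt19937_64 rng(seed);
  std::vector<std::array<double,3>> particles(n_local);
  for (auto &p : particles)
    for (unsigned int d = 0; d < 3; ++d)
      {
        // In practice one would sample within the owned cells (or reject points
        // outside the subdomain); the bounding box keeps the sketch short.
        std::uniform_real_distribution<double> coordinate(bbox_min[d], bbox_max[d]);
        p[d] = coordinate(rng);
      }
  return particles;
}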

>> That sounds fine, as long as we make sure the user knows.  Actually
>> the original idea of generating particles on a single processor and
>> sending them to other processors will fix all these problems,
>> including the random seed issue, as well as ensuring results are
>> identical regardless of the number of processors used.  I might try
>> coding this up soon to see how complex it is, and with the Generator
>> class it will be easy to keep this separate from other
>> implementations.
> 
> Would this scale? You'd either need to send particles in lots of small packages, or store a lot of particles on a single processor...

Good question.  I did a few tests on Stampede and it looks like it will scale reasonably well.  If we assume 1e9 particles, each carrying an ID plus XYZ coordinates (4 doubles = 32 bytes), we need to broadcast ~30 GB from the root process in many smaller messages.

Number of tasks= 256 My rank= 0
Bcast 131072 doubles (1.000000 MB) takes 0.001406 seconds. 711.060478 MB/sec
Bcast 262144 doubles (2.000000 MB) takes 0.003084 seconds. 648.465996 MB/sec
Bcast 524288 doubles (4.000000 MB) takes 0.007491 seconds. 533.962737 MB/sec

Number of tasks= 1024 My rank= 0
Bcast 131072 doubles (1.000000 MB) takes 0.001876 seconds. 533.071689 MB/sec
Bcast 262144 doubles (2.000000 MB) takes 0.003831 seconds. 522.028270 MB/sec
Bcast 524288 doubles (4.000000 MB) takes 0.008793 seconds. 454.887641 MB/sec

Number of tasks= 4096 My rank= 0
Bcast 131072 doubles (1.000000 MB) takes 0.002373 seconds. 421.451682 MB/sec
Bcast 262144 doubles (2.000000 MB) takes 0.004993 seconds. 400.559061 MB/sec
Bcast 524288 doubles (4.000000 MB) takes 0.011756 seconds. 340.251219 MB/sec

Each of these numbers is the average of 100 broadcasts.  At the transfer rates above, sending 30 GB of particle data would take a minute or two, and it would only happen once during initialization.  This could be improved further by sending particles only to the processes that might actually contain them, e.g. by tracking each process's bounding box at the root.  In any case, transmission is probably not the slowest part; determining which cell a particle lies in is, and that has to happen no matter which method we use.
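
For reference, the numbers above came from a simple broadcast timing loop; a minimal sketch of that kind of test (not the exact code I ran) looks like this:

// Broadcast a buffer of doubles from rank 0 and average the time over
// 100 repetitions; rerun with 2 MB and 4 MB buffers for the other rows.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int n_doubles = 131072;   // 1 MB worth of doubles
  const int n_repeats = 100;
  std::vector<double> buffer(n_doubles);

  MPI_Barrier(MPI_COMM_WORLD);
  const double start = MPI_Wtime();
  for (int i = 0; i < n_repeats; ++i)
    MPI_Bcast(buffer.data(), n_doubles, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  const double elapsed = (MPI_Wtime() - start) / n_repeats;

  if (rank == 0)
    {
      const double megabytes = n_doubles * sizeof(double) / (1024.0 * 1024.0);
      std::printf("Bcast %d doubles (%f MB) takes %f seconds. %f MB/sec\n",
                  n_doubles, megabytes, elapsed, megabytes / elapsed);
    }

  MPI_Finalize();
  return 0;
}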

I'll look in the literature to see if there's a better way of doing this sort of thing but this seems reasonable from what I can tell.

-Eric


