[aspect-devel] Writing output in parallel

Timo Heister heister at math.tamu.edu
Tue Feb 28 06:47:47 PST 2012


First: using MPI I/O to merge output files is the only way to scale to
bigger machines. You cannot run on 10,000 cores and write out 10,000
files per timestep.
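
To make the idea concrete, here is a minimal sketch (my illustration,
not ASPECT's actual implementation; the file name and the per-rank data
are placeholders) of many ranks writing their pieces collectively into
a single shared file with MPI I/O:

  #include <mpi.h>
  #include <string>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Placeholder: this rank's piece of the output, e.g. its part of
    // a VTU file.
    const std::string piece =
      "data from rank " + std::to_string(rank) + "\n";

    // An exclusive prefix sum over the piece sizes gives every rank
    // its byte offset in the shared file.
    MPI_Offset my_size = piece.size(), offset = 0;
    MPI_Exscan(&my_size, &offset, 1, MPI_OFFSET, MPI_SUM,
               MPI_COMM_WORLD);
    if (rank == 0)
      offset = 0;  // MPI_Exscan leaves rank 0's result undefined

    // All ranks write collectively into one file per timestep.
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "solution-0000.vtu",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
                  &fh);
    MPI_File_write_at_all(fh, offset, piece.data(),
                          static_cast<int>(piece.size()),
                          MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
  }
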
Second: merging files is optional and controlled by a runtime
parameter. You might want to generate one file per node (instead of
one file per core, as now), or you can leave the behavior as it is
today.
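
How such a grouping parameter could look (again only a sketch; the
parameter name group_size is made up): split MPI_COMM_WORLD into
groups and let each group write one file with the collective scheme
above. Choosing group_size equal to the number of ranks per node gives
one file per node; group_size = 20 corresponds to the reduction factor
of 20 that Wolfgang mentions below.

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Hypothetical runtime parameter: how many ranks share one file.
    const int group_size = 20;
    const int color = rank / group_size;  // file this rank feeds

    // Ranks with the same color form one sub-communicator.
    MPI_Comm file_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &file_comm);

    char name[64];
    std::snprintf(name, sizeof(name), "solution-0000.%04d.vtu", color);
    // ...open 'name' with MPI_File_open on file_comm and write with
    // MPI_File_write_at_all exactly as in the previous sketch...

    MPI_Comm_free(&file_comm);
    MPI_Finalize();
  }
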
Third: in your setup I would simply symlink the output/ directory to
the local scratch space and copy the files to the central NFS at the
end of the computation (you can do this in your jobfile). If you want
to do both at the same time, you can execute a shell script after each
visualization step that does the mv in the background.
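
A stand-alone mover along these lines (C++17; the paths are made up)
could then be launched in the background from the jobfile after each
visualization step. Note that a plain rename does not work across
filesystems, so it copies to the NFS target first and deletes the
local file afterwards; in practice you would also have to make sure a
file is complete before touching it.

  #include <filesystem>

  namespace fs = std::filesystem;

  int main()
  {
    // Made-up paths: fast node-local scratch and the central NFS
    // share.
    const fs::path src = "/scratch/aspect/output";
    const fs::path dst = "/nfs/home/user/aspect/output";

    fs::create_directories(dst);
    for (const auto &entry : fs::directory_iterator(src))
      if (entry.is_regular_file())
        {
          // Copy over the Ethernet link to the network drive, then
          // drop the local copy.
          fs::copy_file(entry.path(), dst / entry.path().filename(),
                        fs::copy_options::overwrite_existing);
          fs::remove(entry.path());
        }
  }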

On Tue, Feb 28, 2012 at 8:32 AM, Thomas Geenen <geenen at gmail.com> wrote:
> I am not sure this will be very efficient on the type of cluster we have.
>
> We have a cluster with a number of nodes that have fast local I/O, are
> interconnected with InfiniBand, and use an Ethernet connection to the
> master node for I/O and the like. On the master node we have our
> network drive (a large, slow beast: NFS).
>
> In the proposed solution we will be using the InfiniBand for the I/O
> during the computation (assuming the I/O happens in the background).
> How will that affect the speed of the solver? How large are the MPI
> buffers needed for this type of I/O, and is that pinned memory? Do we
> have enough memory left for the application?
>
> If the I/O does not happen in the background, this will be a
> bottleneck for us, since we have a very basic disk setup on the master
> node.
>
> Locally (on the compute nodes) we have something like 500 MB/s
> throughput, so for a typical run on 10-20 nodes we have an effective
> aggregate bandwidth of 5-10 GB/s.
>
> I would be in favor of implementing a few I/O strategies and leaving
> it to the user to pick the one that is most efficient for his/her
> hardware setup.
>
> The low-tech option I proposed before (write to fast local storage and
> mv the files in the background over the Ethernet connection to the
> slow network drive) will probably work best for me.
>
> cheers
> Thomas
>
>
>
> On Mon, Feb 27, 2012 at 6:40 PM, Wolfgang Bangerth <bangerth at math.tamu.edu>
> wrote:
>>
>>
>> > Oh, one thing I forgot to mention is that I am not sure we want one
>> > big file per time step. It might happen that ParaView in parallel is
>> > less efficient at reading one big file. One solution would be to
>> > write something like n * 0.05 files, where n is the number of
>> > compute processes.
>>
>> Yes, go with that. Make the reduction factor (here, 20) a run-time
>> parameter so that people can choose whatever is convenient for them when
>> visualizing. For single-processor visualization, one could set it equal
>> to the number of MPI jobs, for example.
>>
>> Cheers
>>  W.
>>
>> ------------------------------------------------------------------------
>> Wolfgang Bangerth               email:            bangerth at math.tamu.edu
>>                                 www: http://www.math.tamu.edu/~bangerth/



-- 
Timo Heister
http://www.math.tamu.edu/~heister/

