[aspect-devel] Writing output in parallel

Timo Heister heister at math.tamu.edu
Tue Feb 28 08:40:13 PST 2012


> OK, then I'd say go with what you have right now. Out of curiosity, how
> does MPI I/O work? If you have a parallel file system, is the target of
> the write striped across multiple disks and servers? Or does everything
> converge on a single node?

The interface is very simple and comparable to POSIX
(open/seek/write), but you additionally get collective writes (for
example: proc 0 writes n_0 bytes, followed by proc 1 writing n_1 bytes
after that, and so on). The MPI I/O middle layer then decides how to do
this in the most efficient way. The fallback is to just use standard
POSIX routines (with some synchronization on top done by MPI I/O). It
can group writes onto fewer nodes (e.g. every 16 MPI processes send
their data to one node that does the writing). When you are using a
real parallel filesystem (not just simple NFS), it can also use
specific interfaces or tuning parameters (buffer sizes, striping, etc.)
supplied by that filesystem. This is all transparent to the user code,
but you can give hints to the implementation (I had to set the number
of writing nodes on hurr to a smaller number to get better performance;
I am in contact with the admins about this). I am pretty sure that on
hurr it stripes the big file across the four I/O backend machines.
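
To make this concrete, here is a minimal, hypothetical sketch (not the
code we actually use) of how such a collective write looks with the
MPI C API: each process computes the offset of its block via a prefix
sum over the block sizes, and then everybody writes into one shared
file with a single collective call. The file name "solution.vtu" and
the "cb_nodes" hint are just illustrations; which hints are honored
depends on the MPI implementation and the filesystem.

  // Hypothetical sketch: every process writes its own block of bytes
  // into one shared file through a collective MPI I/O call.
  #include <mpi.h>
  #include <string>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pretend this is the output this process has assembled locally.
    std::string local_data =
      "data from rank " + std::to_string(rank) + "\n";
    long long my_size = local_data.size();

    // Where does my block start? Sum of the sizes of all lower ranks.
    long long prefix = 0;
    MPI_Scan(&my_size, &prefix, 1, MPI_LONG_LONG, MPI_SUM,
             MPI_COMM_WORLD);
    const MPI_Offset my_offset = prefix - my_size;

    // Optional hint: limit the number of aggregator ("writing") nodes.
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(&info, "cb_nodes", "4");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "solution.vtu",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    // Collective write; the MPI I/O layer decides how to aggregate,
    // buffer and stripe the data across the I/O servers.
    MPI_File_write_at_all(fh, my_offset, &local_data[0],
                          local_data.size(), MPI_CHAR,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
  }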

This is all part of the MPI-2 standard, btw. I hope that is not a problem.

> I'll write a paragraph on how to implement the scheme of writing locally
> and copying later from the job script.

something like:
1. ln -s /a/local/tmp/filesystem output
2. run normally: mpirun ...
3. mv output/* /some/shared/filesystem/

Thomas was thinking of doing step 3 concurrently with the code running
(for example, spawning a shell script after every visualization step).
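
For the concurrent variant, a rough, hypothetical sketch (the helper
name is made up) of what the code could call right after it finishes
writing a visualization file, so the copy to the shared filesystem runs
in the background while the solver keeps going:

  // Hypothetical helper: hand the finished file to a background shell
  // process so the solver does not block on the shared filesystem.
  #include <cstdlib>
  #include <string>

  void move_to_shared_fs_in_background(const std::string &filename)
  {
    // Only one process (e.g. rank 0) should trigger this per file.
    const std::string command =
      "mv output/" + filename + " /some/shared/filesystem/ &";
    std::system(command.c_str());
  }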

-- 
Timo Heister
http://www.math.tamu.edu/~heister/
