[aspect-devel] [cse.ucdavis.edu #13562] Fwd: Fwd: error when writing checkpoint files

Max Rudolph maxrudolph at ucdavis.edu
Sat Jul 7 20:02:02 PDT 2018


The cluster that I use at Portland State does not have a problem with
checkpointing despite not having a parallel filesystem. This cluster is
~2000 cores/100 nodes, so typical of a smaller university compute resource.
For 3D models, we are usually running on 400-1000 cores. The storage server
is NFS on XFS. This is the output from 'mount' for the scratch filesystem:
scratch2-ib:/mnt on /scratch2 type nfs
(rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.3.1.236,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.3.1.236)
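
If you want to sanity-check MPI-IO on a mount like this before trusting it
with real checkpoints, a small collective-write probe along the following
lines should be enough. This is only a sketch, not ASPECT's checkpoint code;
the file path and record size are made up.

/* mpiio_probe.c -- each rank writes its own fixed-size record into one
 * shared file with a collective MPI-IO call; rank 0 reports success.
 * Compile: mpicc mpiio_probe.c -o mpiio_probe
 * Run:     mpirun -np 8 ./mpiio_probe /scratch2/mpiio_probe.dat
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const char *path = (argc > 1) ? argv[1] : "mpiio_probe.dat";
    char buf[64];
    snprintf(buf, sizeof(buf), "rank %06d\n", rank);   /* 12 bytes per rank */

    MPI_File fh;
    int err = MPI_File_open(MPI_COMM_WORLD, path,
                            MPI_MODE_CREATE | MPI_MODE_WRONLY,
                            MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) {
        if (rank == 0) fprintf(stderr, "MPI_File_open failed on %s\n", path);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Every rank writes its record at its own offset, all at once
       (collective write). */
    MPI_Offset offset = (MPI_Offset)rank * (MPI_Offset)strlen(buf);
    MPI_File_write_at_all(fh, offset, buf, (int)strlen(buf),
                          MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    if (rank == 0)
        printf("collective write of %d records appears to have worked\n", size);
    MPI_Finalize();
    return 0;
}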

Max

On Sat, Jul 7, 2018 at 5:53 AM Magali Billen <mibillen at ucdavis.edu> wrote:

> Hi All,
>
> Thanks for all the information.  It seems to me that I won’t be using
> checkpointing in ASPECT in the near term.
>
> I am surprised that no other users have come across this issue… it suggests
> that very few people are using their own small clusters to run models with
> ASPECT, or if they do, they don’t use checkpointing, which seems strange.
> Is this true? Or do other people use smaller clusters and somehow make
> this work?
>
> I’ll be at CIDER the last two weeks of July and I’ll try to talk to Rene
> in person about this issue to understand more about what options might
> exist.  Since this is all handled by other libraries (p4est), there may be
> no real option.  I don’t feel like I have the expertise or experience with
> ASPECT to wade into this on my own. Maybe after talking with Rene, we can
> look into compiling p4est with MPI-IO and see what happens.
>
> In the meantime I’ll move ahead without checkpointing. However, I’m
> already looking at running 2D models with Newtonian rheology that will
> need 3-5 days to run on my cluster using 128 processors, so I already see
> a need for this to avoid starting over if, for example, a node crashes.
> With non-Newtonian rheology, it will be an even bigger problem for me.
>
> Magali
>
>
>
> > On Jul 7, 2018, at 9:08 AM, Bill Broadley <bill at cse.ucdavis.edu> wrote:
> >
> > On 07/06/2018 08:23 PM, Wolfgang Bangerth wrote:
> >>
> >> Magali & Bill,
> >>
> >>> Is there a way to write checkpointing files without using MPI-IO?
> >>>
> >>> Is the trickery involved in writing the checkpointing files such that
> >>> I should ask Bill (cc’d on this email) to enable MPI-IO?
> >>> That is, even though Bill says it is generally incompatible with the
> >>> NFS file system, should it work?
> >>
> >> I don't know whether we really want to support systems that don't have
> >> MPI-IO.
> >
> > The vast majority of clusters don't support MPI-IO, which is why its
> > support is so limited and off by default in many libraries, such as HDF.
> >
> > Sure, national-lab-scale clusters that use Lustre, Ceph, or BeeGFS
> > support MPI-IO.
> >
> >> You're the first person to report a cluster where this doesn't work. I
> >> have no idea how MPI-IO is internally implemented (e.g., whether really
> >> every processor opens the same file at the same time, using file system
> >> support; or whether all MPI processes send their data to one process
> >> that then does the write), but the only way to achieve scalability is to
> >> use MPI-IO.
> >
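
(For reference, the second strategy Wolfgang mentions, where every rank
funnels its data to one process that then writes with ordinary POSIX I/O,
looks roughly like the sketch below. It works on plain NFS, but the root
rank's memory and the single write stream become the bottleneck as the job
grows. This is only an illustration, not how ASPECT/deal.II actually write
their checkpoints; names and sizes are made up.)

/* gather_write.c -- every rank sends its checkpoint block to rank 0,
 * which writes one file sequentially with ordinary POSIX I/O.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend each rank owns 1 MB of checkpoint data. */
    const int chunk = 1 << 20;
    char *mine = malloc(chunk);
    for (int i = 0; i < chunk; ++i) mine[i] = (char)rank;

    char *all = NULL;
    if (rank == 0) all = malloc((size_t)chunk * size);

    /* Funnel everything to rank 0 ... */
    MPI_Gather(mine, chunk, MPI_CHAR,
               all, chunk, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* ... which writes a single file sequentially. */
    if (rank == 0) {
        FILE *f = fopen("checkpoint.dat", "wb");   /* example name */
        if (!f) { perror("checkpoint.dat"); MPI_Abort(MPI_COMM_WORLD, 1); }
        fwrite(all, 1, (size_t)chunk * size, f);
        fclose(f);
        free(all);
    }
    free(mine);
    MPI_Finalize();
    return 0;
}
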
> > MPI-IO allows multiple nodes to arrange access to stripes of a file so
> > they can read and write in parallel.  But such storage systems start
> > becoming reasonable at around $100k just for the storage.  Typically they
> > include things like 8 storage arrays, each doubly connected to two
> > servers.  That array of 16 machines is the block store; then you buy a
> > few other machines to be the metadata servers, typically with tons of
> > RAM and SSDs.  If you're building a multi-million-dollar cluster, it's
> > definitely the way to go.
> >
> > The minimum reasonable config is somewhere around one metadata server,
> > two storage arrays, and four object stores.  Using our standard building
> > blocks, that would be somewhere around 1.2 PB of storage.
> >
> > The most popular clusters are of course much smaller and have one to a
> > few file servers, most often running NFS.  That's the typical default
> > install for any cluster software I've seen, like Rocks, Warewulf, etc.
> >
> > Is ASPECT really going to target clusters of $0.5M and up?  Lustre
> > manages this; it used to require a license to stay current, but now the
> > licensing has changed.  At the last HPC meeting I went to, I talked to a
> > group of six or so faculty who had used Lustre, and the related horror
> > stories consumed the first part of the meeting.
> >
> > Seems kind of strange to write checkpoints to the slower, very expensive
> > central storage while ignoring the local dedicated disk that's not
> > shared.
> >
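
(As a rough illustration of that alternative: each rank writes its own piece
to node-local disk with plain POSIX I/O, so neither MPI-IO nor the shared
filesystem is in the path. This is a sketch only, assuming a node-local /tmp
and made-up sizes; the pieces would still have to be staged somewhere safe to
survive losing a node.)

/* local_checkpoint.c -- each rank writes its own checkpoint piece to
 * node-local scratch; no MPI-IO, no shared filesystem involved.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int chunk = 1 << 20;              /* pretend 1 MB per rank */
    char *data = malloc(chunk);
    for (int i = 0; i < chunk; ++i) data[i] = (char)rank;

    char path[256];                          /* example node-local path */
    snprintf(path, sizeof(path), "/tmp/checkpoint.%05d.dat", rank);

    FILE *f = fopen(path, "wb");
    if (!f) { perror(path); MPI_Abort(MPI_COMM_WORLD, 1); }
    fwrite(data, 1, chunk, f);
    fclose(f);

    /* Make sure every rank finished before declaring the checkpoint done. */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) printf("per-rank local checkpoint written\n");

    free(data);
    MPI_Finalize();
    return 0;
}
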
> > I believe just about every ASPECT run ever done at Davis has been
> > without MPI-IO.  It's not a learning issue; it's just that parallel
> > filesystems are very expensive and haven't so far been justified.  I
> > wouldn't rule them out, we just haven't had a big enough chunk of funding
> > to spend at a single time with an I/O-heavy workload in mind.
> >
> >> So what I'm trying to say is that our preference would be for your
> >> clusters to learn how to use MPI-IO :-)
> >
> > Seems pretty silly to talk about scaling when ultimately many clusters
> > only have a single file server.
>
> ____________________________________________________________
> Professor of Geophysics
> Earth & Planetary Sciences Dept., UC Davis
> Davis, CA 95616
> 2129 Earth & Physical Sciences Bldg.
> Office Phone: (530) 752-4169
> http://magalibillen.faculty.ucdavis.edu
>
> Currently on Sabbatical at Munich University (LMU)
> Department of Geophysics (PST + 9 hr)
>
> Avoid implicit bias - check before you submit:
> http://www.tomforth.co.uk/genderbias/
> ___________________________________________________________
>
> _______________________________________________
> Aspect-devel mailing list
> Aspect-devel at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel