[aspect-devel] error when writing checkpoint file

Juliane Dannberg judannberg at gmail.com
Sun Jul 8 09:28:00 PDT 2018


Magali,

one thing I can really recommend is making use of the High-Performance 
Computing Centers in Germany that are built to support research.
I only have experience with the HLRN (which is for universities in
Northern Germany), but you would have to get an account at the HLRS (for
Southern Germany). You can get a free test account
<https://www.hlrs.de/solutions-services/academic-users/test-access/>
with 50,000 core hours right away if you plan to apply for computing
time, and from my experience with the HLRN, they are usually pretty 
quick with reviewing applications (for us, it took about a month to get 
the project, but they had quarterly application deadlines).

If you are affiliated with a university in Southern Germany, I think you 
will get the computing time for free, and it is usually not much effort 
to write a proposal. I also found the support staff very
professional and helpful, and we had a really good experience
running ASPECT models on their machines (on up to 1500 cores for
time-dependent models).

Cheers,
Juliane


On 07/08/2018 07:44 AM, Timo Heister wrote:
> Magali,
>
> can you try to set "number of grouped files" to 1 and see if this
> works when you run with 2 or more cores?
>
> Note that it was just a guess on my end that MPI IO is the problem. It
> might be something else we are triggering inside p4est.
>
> On Sun, Jul 8, 2018 at 4:19 AM Magali Billen <mibillen at ucdavis.edu> wrote:
>> Hi everyone,
>>
>> My cluster is really old (GigE…) and is perhaps one of a dying breed of “individual PI” clusters.
>> So that problem is not fixable until I write a grant to add nodes to a larger, newer cluster with a more modern setup.
>>
>> Bill - is MPI-IO enabled on Peloton? This cluster, or one like it, is what I would be buying nodes to add to.
>>
>> Wolfgang, your responses help solve another mystery: why VTU output works but checkpointing does not.
>> I started my PRM files from a file that John Naliboff gave me (he was using my cluster with a visiting student),
>> and in it the parameter “Number of grouped files” is set to zero (see below). I had not dug into what that meant,
>> but now it's clear.
>>
>> Maybe the only related question is whether it would be possible to create a similar parameter for checkpointing?
>> If not, I guess that's just really strong motivation for me to write an IFR proposal quickly ;-) (and a proposal for time
>> on a national lab machine).
>>
>> -Magali
>>
>> # INFORMATION ON OUTPUT TO BE CREATED
>> subsection Postprocess
>>    set List of postprocessors = visualization, velocity statistics, temperature statistics
>>
>>    subsection Visualization
>>      set List of output variables      = density, viscosity, strain rate
>>      set Output format                 = vtu
>>      set Time between graphical output = 0.10e6
>>      set Number of grouped files       = 0
>>    end
>> end
>>
>>
>>> On Jul 8, 2018, at 5:22 AM, Wolfgang Bangerth <bangerth at colostate.edu> wrote:
>>>
>>> On 07/07/2018 06:45 AM, Magali Billen wrote:
>>>> I’ll be at CIDER the last two weeks of July and I’ll try to talk to Rene in person about this issue to understand
>>>> more about what options might exist. Since this is all handled by other libraries (p4est), there may be no real option. I
>>>> don’t feel like I have the expertise or experience with Aspect to wade into this on my own. Maybe after talking with Rene,
>>>> we can try compiling p4est with MPI-IO and see what happens.
>>> p4est can be configured to disable MPI-IO. So that's a problem that can be solved. But deal.II also uses MPI-IO, here:
>>>
>>> https://github.com/dealii/dealii/blob/master/source/base/data_out_base.cc#L7286
>>>
>>> This deal.II function is called from essentially all ASPECT runs:
>>>
>>> https://github.com/geodynamics/aspect/blob/master/source/postprocess/visualization.cc#L594
>>>
>>> The default for the number of grouped files is 16, and I suspect that most people leave it as is -- so basically everyone ends up in the `else` branch in line 598.
>>>
>>> In other words, while I don't know whether people use checkpoint/restart frequently, pretty much everyone I know uses VTU output, and that uses MPI-IO. I can't really reconcile this, but it seems to suggest that MPI-IO must work for most of our users.
>>>
>>> Best
>>> Wolfgang
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Wolfgang Bangerth          email:                 bangerth at colostate.edu
>>>                            www: http://www.math.colostate.edu/~bangerth/
>>>
>> ____________________________________________________________
>> Professor of Geophysics
>> Earth & Planetary Sciences Dept., UC Davis
>> Davis, CA 95616
>> 2129 Earth & Physical Sciences Bldg.
>> Office Phone: (530) 752-4169
>> http://magalibillen.faculty.ucdavis.edu
>>
>> Currently on Sabbatical at Munich University (LMU)
>> Department of Geophysics (PST + 9 hr)
>>
>> Avoid implicit bias - check before you submit:
>> http://www.tomforth.co.uk/genderbias/
>> ___________________________________________________________
>>
>
