[aspect-devel] [cse.ucdavis.edu #13562] Fwd: Fwd: error when writing checkpoint files

Magali Billen mibillen at ucdavis.edu
Mon Jul 9 00:52:58 PDT 2018


Hello Timo,
I ran the test again with "Number of grouped files = 1" for the VTU output, and this did lead to an error in the visualization output:

----------------------------------------------------
Exception on MPI process <1> while running postprocessor <N6aspect11Postprocess13VisualizationILi2EEE>: 

--------------------------------------------------------
An error occurred in line <6632> of file </share/apps/cig/dealii/dealii-9.0.0/install/tmp/unpack/deal.II-v9.0.0/source/base/data_out_base.cc> in function
    void dealii::DataOutInterface<dim, spacedim>::write_vtu_in_parallel(const char*, MPI_Comm) const [with int dim = 2; int spacedim = 2; MPI_Comm = ompi_communicator_t*]
The violated condition was: 
    ierr == MPI_SUCCESS
Additional information: 
deal.II encountered an error while calling an MPI function.
The description of the error provided by MPI is "MPI_ERR_OTHER: known error not in list".
The numerical value of the original error code is 16.
--------------------------------------------------------


So, it seems that there is a “solution” for dealing with a system without MPI-IO for visualization (“Number of grouped files = 0”),
but there is no similar solution implemented for checkpointing. Is that correct?
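
For what it's worth, a tiny standalone MPI-IO test along the following lines (just a sketch; it assumes an MPI compiler wrapper like mpicxx is available on the cluster, and it mimics the access pattern write_vtu_in_parallel uses, i.e. every rank writing into one shared file) should show whether plain MPI-IO works here at all, outside of deal.II:

// mpiio_test.cc -- minimal MPI-IO smoke test (illustrative sketch only)
// build:  mpicxx mpiio_test.cc -o mpiio_test
// run:    mpirun -np 2 ./mpiio_test
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // every rank writes a small block into one shared file -- the same
  // access pattern that collective .vtu output relies on
  MPI_File fh;
  int ierr = MPI_File_open(MPI_COMM_WORLD, "mpiio_test.bin",
                           MPI_MODE_CREATE | MPI_MODE_WRONLY,
                           MPI_INFO_NULL, &fh);
  if (ierr == MPI_SUCCESS)
    {
      const int value = rank;
      ierr = MPI_File_write_at(fh, rank * sizeof(int), &value, 1,
                               MPI_INT, MPI_STATUS_IGNORE);
      MPI_File_close(&fh);
    }

  if (ierr != MPI_SUCCESS)
    {
      char msg[MPI_MAX_ERROR_STRING];
      int len;
      MPI_Error_string(ierr, msg, &len);
      std::printf("rank %d: MPI-IO failed: %s\n", rank, msg);
    }
  else
    std::printf("rank %d: MPI-IO ok\n", rank);

  MPI_Finalize();
  return 0;
}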

-Magali


> On Jul 8, 2018, at 4:44 PM, Timo Heister <heister at clemson.edu> wrote:
> 
> Magali,
> 
> can you try to set "number of grouped files" to 1 and see if this
> works when you run with 2 or more cores?
> 
> Note that it was just a guess on my end that MPI IO is the problem. It
> might be something else we are triggering inside p4est.
> 
> On Sun, Jul 8, 2018 at 4:19 AM Magali Billen <mibillen at ucdavis.edu> wrote:
>> 
>> Hi everyone,
>> 
>> My cluster is really old (GigE…), and is perhaps one of a dying breed of “individual PI” clusters.
>> So, that problem is not fixable until I write a grant to add nodes to a larger, newer cluster with a more modern set-up.
>> 
>> Bill - is MPI-IO enabled on Peleton? This cluster, or one like it, is what I would be buying nodes to add to.
>> 
>> Wolfgang - your responses help solve another mystery: why VTU output works but checkpointing does not.
>> I started my PRM files from a file that John Naliboff gave me (he was using my cluster with a visiting student),
>> and in it the parameter “Number of grouped files” is set to zero (see below). I had not dug into what that meant,
>> but now it's clear.
>> 
>> Maybe the one related question is whether it would be possible to create a similar parameter for checkpointing?
>> If not, I guess that's just really strong motivation for me to write an IFR proposal quickly ;-) (and a proposal for time
>> on a national lab machine).
>> 
>> -Magali
>> 
>> # INFORMATION ON OUTPUT TO BE CREATED
>> subsection Postprocess
>>  set List of postprocessors = visualization, velocity statistics, temperature statistics
>> 
>>  subsection Visualization
>>    set List of output variables      = density, viscosity, strain rate
>>    set Output format                 = vtu
>>    set Time between graphical output = 0.10e6
>>    set Number of grouped files       = 0
>>  end
>> end
>> 
>> 
>>> On Jul 8, 2018, at 5:22 AM, Wolfgang Bangerth <bangerth at colostate.edu> wrote:
>>> 
>>> On 07/07/2018 06:45 AM, Magali Billen wrote:
>>>> I’ll be at CIDER the last two weeks of July and I’ll try to talk to Rene in person about this issue and try to understand
>>>> more about what options might exist. Since this is all handled by other libraries (p4est), there may be no real option. I
>>>> don’t feel like I have the expertise or experience with ASPECT to wade into this on my own. Maybe after talking with Rene,
>>>> we can see about trying to compile p4est with MPI-IO and see what happens.
>>> 
>>> p4est can be configured to disable MPI-IO. So that's a problem that can be solved. But deal.II also uses MPI-IO, here:
>>> 
>>> https://github.com/dealii/dealii/blob/master/source/base/data_out_base.cc#L7286
>>> 
>>> This deal.II function is called from essentially all ASPECT runs:
>>> 
>>> https://github.com/geodynamics/aspect/blob/master/source/postprocess/visualization.cc#L594
>>> 
>>> The default for the number of grouped files is 16, and I suspect that most people leave it as is -- so basically everyone ends up in the `else` branch in line 598.
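>>>
>>> Roughly, and paraphrasing with placeholder names rather than quoting the literal source, the branch there looks like this:
>>>
>>>   if (n_grouped_files == 0)
>>>     {
>>>       // "Number of grouped files" == 0: every MPI rank writes its own
>>>       // .vtu file through a plain stream, so no MPI-IO is involved
>>>       std::ofstream out (filename_for_this_rank);
>>>       data_out.write_vtu (out);
>>>     }
>>>   else
>>>     {
>>>       // otherwise a single shared file is written collectively via
>>>       // deal.II's write_vtu_in_parallel(), which relies on MPI-IO
>>>       data_out.write_vtu_in_parallel (filename.c_str(), mpi_communicator);
>>>     }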
>>> 
>>> In other words, while I don't know whether people use checkpoint/restart frequently, pretty much everyone I know uses VTU output, and that uses MPI-IO. I can't really reconcile this, but it seems to suggest that MPI-IO must work for most of our users.
>>> 
>>> Best
>>> Wolfgang
>>> 
>>> 
>>> --
>>> ------------------------------------------------------------------------
>>> Wolfgang Bangerth          email:                 bangerth at colostate.edu
>>>                          www: http://www.math.colostate.edu/~bangerth/
>>> 
>> 
> 
> 
> -- 
> Timo Heister
> http://www.math.clemson.edu/~heister/
____________________________________________________________
Professor of Geophysics 
Earth & Planetary Sciences Dept., UC Davis
Davis, CA 95616
2129 Earth & Physical Sciences Bldg.
Office Phone: (530) 752-4169
http://magalibillen.faculty.ucdavis.edu

Currently on Sabbatical at Munich University (LMU)
Department of Geophysics (PST + 9 hr)

Avoid implicit bias - check before you submit: 
http://www.tomforth.co.uk/genderbias/
___________________________________________________________
