[aspect-devel] A weird error with MPI postprocessing

shangxin Liu shangxin at vt.edu
Mon Feb 29 12:10:29 PST 2016


Hi Timo,

I set “Number of grouped files” to 1 and used 160 MPI processes. Judging from the error, the failure happens when writing the vtu files. I don’t think the run could be out of disk space (160 MPI processes for 4 global refinements should be enough). I will try resetting “Number of grouped files” to 0 to see whether parallel output is still the problem. (I recall that last year a value of 1 didn’t work, but in my recent short test cases 1 worked fine.)
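
For reference, this setting belongs to the visualization postprocessor in the parameter file. A minimal fragment, assuming no other visualization options need to change, might look like the following. As far as I understand the documentation, 0 disables grouping so that each MPI process writes its own file without MPI I/O, while 1 collects everything into a single file through MPI I/O.

```
subsection Postprocess
  subsection Visualization
    # 0: one file per MPI process, no MPI I/O
    # 1: one file for all processes, written with MPI I/O
    set Number of grouped files = 0
  end
end
```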

Yes, I’ll also submit a simple case with a long wall time to see what happens.
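
To check for the file-handle leak suspected in the quoted message, one option is to watch the number of open file descriptors of a running process over time. The following is only a rough sketch, assuming a Linux system where `/proc/<pid>/fd` is readable; the function names `count_open_fds` and `leaked_fds` are made up for illustration and are not part of ASPECT or deal.II.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of open file descriptors of a process.

    Assumes Linux: every open descriptor appears as an entry in /proc/<pid>/fd.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def leaked_fds(action, repeats=100):
    """Call `action` repeatedly and return the change in this process's
    open-descriptor count. A value that grows with `repeats` suggests a leak."""
    before = count_open_fds()
    for _ in range(repeats):
        action()
    return count_open_fds() - before
```

Calling `count_open_fds(pid)` with the process ID of one rank of a running job every few hundred time steps should show whether the count keeps growing toward the per-process limit reported by `ulimit -n`.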

By the way, we’re now using gcc-4.7.2 and openmpi-1.6.4. Are these versions suitable for ASPECT?

I created two issues with the details on GitHub if you want to take a look:

https://github.com/geodynamics/aspect/issues/773
https://github.com/geodynamics/aspect/issues/774

The other issue concerns a different error that occurs after ~3000 time steps when building the Stokes preconditioner. I’m not sure whether the two are related.

Best,
Shangxin

On Feb 29, 2016, at 9:24 AM, Timo Heister <timo.heister at gmail.com> wrote:

> Shangxin,
> 
> I have no clue why it could fail. What did you set "Number of grouped
> files" to? How many MPI ranks? What kind of filesystem are you writing
> to? Could you be running out of disk space?
> 
> Maybe we, or the implementation, are leaking file handles. Can you check
> whether the same thing happens if you run a simple 2d convection-box for
> ~3000 postprocessing steps?
> 
> On Mon, Feb 29, 2016 at 12:06 AM, Shangxin Liu <sxliu at vt.edu> wrote:
>> Hi,
>> 
>> Recently, when running time-dependent cases, my jobs sometimes fail in the
>> postprocessing step after running for more than ~20 hours. I paste the
>> error here (from one of my cases):
>> 
>> ----------------------------------------------------
>> 
>> Exception on MPI process <76> while running postprocessor
>> <N6aspect11Postprocess13VisualizationILi3EEE>:
>> 
>> 
>> --------------------------------------------------------
>> 
>> An error occurred in line <6156> of file
>> </home/shangxin/sources/dealii/source/base/data_out_base.cc> in function
>> 
>>    void dealii::DataOutInterface<dim,
>> spacedim>::write_vtu_in_parallel(const char*, MPI_Comm) const [with int dim
>> = 3; int spacedim = 3; MPI_Comm = ompi_communicator_t*]
>> 
>> The violated condition was:
>> 
>>    err==0
>> 
>> The name and call sequence of the exception was:
>> 
>>    ExcMessage("Unable to open file with MPI_File_open!")
>> 
>> Additional Information:
>> 
>> Unable to open file with MPI_File_open!
>> 
>> --------------------------------------------------------
>> 
>> 
>> Aborting!
>> 
>> ----------------------------------------------------
>> 
>> 
>> This error usually appears only after dozens of hours of run time, so it is
>> hard to reproduce in a short test. It seems to be related to writing the
>> visualization postprocessor output to files. If so, the output works for
>> many time steps and then stops working at some later step (in my case, the
>> code crashed during postprocessing at time step ~3000). I'm using ASPECT and
>> deal.II from GitHub and didn't modify anything in the postprocessing code.
>> 
>> Any idea why this problem appears and how to solve it?
>> 
>> Best,
>> Shangxin
>> 
>> 
>> _______________________________________________
>> Aspect-devel mailing list
>> Aspect-devel at geodynamics.org
>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel


