[aspect-devel] A weird error with MPI postprocessing
Shangxin Liu
shangxin at vt.edu
Mon Feb 29 12:10:29 PST 2016
Hi Timo,
I set “Number of grouped files” to 1 and used 160 MPI processes. Judging from the error message, the failure happens when writing the vtu files. I don’t think the job could have run out of disk space (160 MPI processes for 4 global refinements should be enough). I will try resetting “Number of grouped files” to 0 to see whether the problem still lies in the parallel output. (I recall that last year a value of 1 didn’t work, but in my recent short test cases 1 worked fine.)
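For reference, a minimal sketch of how this setting would look in the parameter file. It assumes the parameter sits in the usual Postprocess / Visualization subsections of an ASPECT input file; as far as I understand the manual, 0 disables grouping (one vtu file per MPI process, no MPI I/O) and 1 writes one combined file per output step through MPI I/O. Please check both points against the manual for your version.

```
subsection Postprocess
  subsection Visualization
    # 0: no grouping, each MPI process writes its own vtu file
    # 1: all processes write one shared file through MPI I/O
    set Number of grouped files = 0
  end
end
```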
Yeah, I’ll also submit a simple case with a long wall time to see what happens.
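One way to test the file-handle leak idea during such a long run is to watch how many descriptors a process holds open. The following is only a rough sketch under some assumptions: it requires Linux with /proc mounted, the function names are made up for illustration, and MPI I/O file handles need not map one-to-one onto OS file descriptors, so a flat count would not rule out a leak inside the MPI library.

```python
import os
import tempfile


def count_open_fds(pid="self"):
    """Count the open file descriptors of a Linux process via /proc.

    Listing the directory briefly opens one descriptor itself when pid is
    "self", so compare counts taken the same way rather than reading the
    absolute number.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def demo():
    """Show that unclosed files raise the count and closing restores it."""
    before = count_open_fds()
    handles = [tempfile.TemporaryFile() for _ in range(5)]  # simulated leak
    during = count_open_fds()
    for handle in handles:
        handle.close()
    after = count_open_fds()
    return before, during, after


if __name__ == "__main__":
    before, during, after = demo()
    print(f"open descriptors: before={before} during={during} after={after}")
```

To watch a running job instead, one could call count_open_fds with the PID of one of the MPI processes (owned by the same user) every few output steps. A count that keeps growing with the number of visualization outputs would support the leak hypothesis.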
By the way, we’re now using gcc-4.7.2 and openmpi-1.6.4. Are these versions good for ASPECT?
I created two issues with the details on GitHub, if you want to take a look.
https://github.com/geodynamics/aspect/issues/773
https://github.com/geodynamics/aspect/issues/774
The other one is a different error that appears after ~3000 time steps when building the Stokes preconditioner. I’m not sure whether the two are related.
Best,
Shangxin
On Feb 29, 2016, at 9:24 AM, Timo Heister <timo.heister at gmail.com> wrote:
> Shangxin,
>
> I have no clue why it could fail. What did you set "Number of grouped
> files" to? How many MPI ranks? What kind of filesystem are you writing
> to? Could you be running out of disk space?
>
> Maybe we or the implementation are leaking file handles. Can you check
> if the same thing happens if you run a simple 2d convection-box after
> ~3000 postprocessing steps?
>
> On Mon, Feb 29, 2016 at 12:06 AM, Shangxin Liu <sxliu at vt.edu> wrote:
>> Hi,
>>
>> Recently, when running time-dependent cases, my jobs sometimes fail in
>> the postprocessing part after running for more than ~20 hours. I paste
>> the error here (from one of my cases):
>>
>> ----------------------------------------------------
>>
>> Exception on MPI process <76> while running postprocessor
>> <N6aspect11Postprocess13VisualizationILi3EEE>:
>>
>>
>> --------------------------------------------------------
>>
>> An error occurred in line <6156> of file
>> </home/shangxin/sources/dealii/source/base/data_out_base.cc> in function
>>
>> void dealii::DataOutInterface<dim,
>> spacedim>::write_vtu_in_parallel(const char*, MPI_Comm) const [with int dim
>> = 3; int spacedim = 3; MPI_Comm = ompi_communicator_t*]
>>
>> The violated condition was:
>>
>> err==0
>>
>> The name and call sequence of the exception was:
>>
>> ExcMessage("Unable to open file with MPI_File_open!")
>>
>> Additional Information:
>>
>> Unable to open file with MPI_File_open!
>>
>> --------------------------------------------------------
>>
>>
>> Aborting!
>>
>> ----------------------------------------------------
>>
>>
>> This error often appears only after dozens of hours of running, so it's
>> hard to debug with short test runs. It seems to be related to writing the
>> visualization postprocessor output to files. But if so, it means that the
>> postprocessing output works up to some time step and then stops working
>> at a later one (in my case, the code crashed during postprocessing at
>> time step ~3000). I'm using ASPECT and deal.II from GitHub and didn't
>> modify anything in the postprocessing code.
>>
>> Any idea why this problem appears and how to solve it?
>>
>> Best,
>> Shangxin