[CIG-SHORT] Pylith dies after thousands of time steps (convergence issue)

Tabrez Ali stali at purdue.edu
Thu Apr 30 06:24:30 PDT 2009


Matt

I have tried this on two different machines. One is a cluster with  
mpich2-1.0.5p4/gcc-4.2.1 and the other is an SMP machine with  
mpich2-1.0.8/gcc-4.1.2.

What I haven't tried yet is running it on a single processor (there are
0.25 million elements, so it might take a while to run).

Tabrez


On Apr 30, 2009, at 9:15 AM, Matthew Knepley wrote:

> On Thu, Apr 30, 2009 at 8:11 AM, Tabrez Ali <stali at purdue.edu> wrote:
> Brad
>
> The solution at the last working step does converge and looks okay, but
> then nothing happens and it dies. I am, however, experimenting with
> time_step and will also try to use the debugger.
>
> Btw, do you know if I can use --petsc.on_error_attach_debugger when the
> job is submitted via PBS, or should I just run it interactively?
>
> I do not understand why this is labeled a convergence issue, unless I
> misunderstand what you mean by "die". Non-convergence will result in a bad
> ConvergenceReason from the solver, but nothing else; the code will continue
> to run.
>
> This looks like death from a signal. With the very little information in
> front of me, this looks like a bug in the MPI on this machine. If it were
> doing Sieve stuff, I would put the blame on me. But with PETSc stuff (10+
> years old and used by thousands of people), I put the blame on MPI or the
> hardware of this computer.
>
>   Matt
>
>
> ...
> ...
> 87 KSP Residual norm 3.579491816101e-07
> 88 KSP Residual norm 3.241876854223e-07
> 89 KSP Residual norm 2.836307394788e-07
>
> [cli_0]: aborting job:
> Fatal error in MPI_Wait: Error message texts are not available
> [cli_1]: aborting job:
> Fatal error in MPI_Wait: Error message texts are not available
> [cli_3]: aborting job:
> Fatal error in MPI_Wait: Error message texts are not available
> [cli_2]: aborting job:
> Fatal error in MPI_Wait: Error message texts are not available
> mpiexec: Warning: tasks 0-3 exited with status 1.
> --pyre-start: mpiexec: exit 1
> /usr/rmt_share/scratch96/s/stali/pylith/bin/pylith: /usr/rmt_share/scratch96/s/stali/pylith/bin/nemesis: exit 1
>
> Tabrez
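Matt's distinction can be checked mechanically: a solve that fails to converge still prints its monitor lines and returns, whereas the run above ends with `Fatal error in MPI_Wait` on every rank. The following is a minimal Python sketch, not part of PyLith or PETSc, that scans such a log (with or without `> ` quote markers), collects the `KSP Residual norm` values printed by the KSP monitor, and reports whether the run ended in an MPI abort or showed a growing residual. The helper names `scan_log` and `diagnose` and the default `growth_factor` of 10 are hypothetical choices for illustration.

```python
import re
from typing import List, Tuple

# Matches monitor lines such as "87 KSP Residual norm 3.579491816101e-07".
_RESIDUAL = re.compile(r"^\s*(\d+)\s+KSP Residual norm\s+([0-9.eE+-]+)")
# Matches abort lines such as "Fatal error in MPI_Wait: ...".
_MPI_FATAL = re.compile(r"Fatal error in (MPI_\w+)")


def scan_log(lines: List[str]) -> Tuple[List[float], List[str]]:
    """Return (residual norms in order, MPI calls that reported a fatal error)."""
    norms: List[float] = []
    mpi_calls: List[str] = []
    for line in lines:
        # Drop leading quote markers ("> > ") added by mail clients.
        text = line.lstrip("> ")
        m = _RESIDUAL.match(text)
        if m:
            norms.append(float(m.group(2)))
            continue
        f = _MPI_FATAL.search(text)
        if f:
            mpi_calls.append(f.group(1))
    return norms, mpi_calls


def diagnose(lines: List[str], growth_factor: float = 10.0) -> str:
    """Classify a log tail (hypothetical heuristic, not a PETSc feature)."""
    norms, mpi_calls = scan_log(lines)
    if mpi_calls:
        # The processes were killed by MPI, regardless of solver state.
        return "mpi_abort"
    # Flag loss of convergence if a residual jumps well above the lowest seen so far.
    lowest = float("inf")
    for r in norms:
        if r > growth_factor * lowest:
            return "residual_growing"
        lowest = min(lowest, r)
    return "no_failure_detected"
```

Applied to the excerpt above, it reports `mpi_abort` while the last three residuals are still decreasing, which is consistent with Matt's reading that the solver itself was not what stopped the run.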
>
> On Apr 29, 2009, at 4:26 PM, Brad Aagaard wrote:
>
> > Tabrez-
> >
> > You may want to set ksp_monitor=true so that you can see the residual.
> > If the residual increases significantly, the solution is losing
> > convergence. This can be alleviated a bit by using an absolute
> > convergence tolerance (ksp_atol). You probably need a slightly smaller
> > time step or a slightly higher-quality mesh (improve the aspect ratio of
> > the most distorted cells).
> >
> > Brad
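Brad's suggestion of an absolute tolerance follows from how PETSc's default test decides convergence. The PETSc users manual describes it as converged when ||r_k|| < max(rtol*||b||, atol) and diverged when ||r_k|| > dtol*||b||. When the right-hand side b becomes very small, rtol*||b|| can fall below what round-off allows, and the solve may then run to its iteration limit; a nonzero atol gives a floor. The sketch below is a simplified Python re-implementation of that logic, assuming the PETSc defaults rtol=1e-5, atol=1e-50, dtol=1e5 and max_it=10000. It returns reason strings rather than PETSc's KSPConvergedReason enum values, and folds the iteration limit (which PETSc checks separately) into the same function for brevity. Note that a non-converged result is simply a returned status, which is Matt's point that non-convergence alone does not kill the run.

```python
import math


def default_ksp_test(rnorm: float, bnorm: float, its: int,
                     rtol: float = 1e-5, atol: float = 1e-50,
                     dtol: float = 1e5, max_it: int = 10000) -> str:
    """Simplified residual test modeled on the PETSc manual's description.

    rnorm: current residual 2-norm ||r_k||; bnorm: right-hand-side 2-norm ||b||;
    its: iterations performed so far.
    """
    if math.isnan(rnorm) or math.isinf(rnorm):
        return "diverged_nanorinf"
    if rnorm <= atol:
        # Absolute floor reached, independent of the size of b.
        return "converged_atol"
    if rnorm <= rtol * bnorm:
        return "converged_rtol"
    if rnorm > dtol * bnorm:
        return "diverged_dtol"
    if its >= max_it:
        return "diverged_its"
    return "iterating"
```

With a tiny right-hand side, for example `default_ksp_test(1e-12, 1e-10, 10000)`, the relative target of 1e-15 is never met and the result is `diverged_its`; passing `atol=1e-11` turns the same state into `converged_atol`. In PETSc the corresponding command-line options are -ksp_rtol, -ksp_atol, -ksp_divtol and -ksp_max_it.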
> >
> >
> > On Wednesday 29 April 2009 1:13:21 pm Tabrez Ali wrote:
> >> Brad
> >>
> >> I think you were right. The elastic problem worked out fine. I will
> >> now try to play with the time step (for the viscous runs).
> >>
> >> Tabrez
> >>
> >> On Apr 29, 2009, at 1:19 PM, Brad Aagaard wrote:
> >>> On Wednesday 29 April 2009 10:09:26 am Tabrez Ali wrote:
> >>>> Also, I don't see the error until ~9000 time steps with one set of
> >>>> material properties, but I get the error at around the 4000th time
> >>>> step with a different set of material properties (on the same mesh).
> >>>
> >>> This seems to indicate a time-integration stability issue. Does the
> >>> one that has an error after 4000 time steps have a smaller Maxwell
> >>> time? You might try running with purely elastic properties. If that
> >>> works, then you may need to reduce your time step.
> >
> >
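Brad's question about the Maxwell time can be made concrete. For a Maxwell viscoelastic material the relaxation time is τ = η/μ, with shear modulus μ = ρVs². The Python sketch below computes τ for each material and flags any whose Maxwell time is short relative to the time step. The helper names and the safety `fraction` of 0.1 are hypothetical; the appropriate ratio depends on the time-integration scheme, so treat it as a tuning knob rather than a PyLith rule.

```python
from typing import Iterable, List, Tuple


def maxwell_time(viscosity: float, density: float, vs: float) -> float:
    """Maxwell relaxation time tau = eta / mu, with mu = rho * Vs**2 (SI units)."""
    mu = density * vs ** 2
    return viscosity / mu


def check_time_step(dt: float,
                    materials: Iterable[Tuple[str, float, float, float]],
                    fraction: float = 0.1) -> List[str]:
    """Return names of materials whose Maxwell time is too short for dt.

    Each material is (name, viscosity [Pa*s], density [kg/m^3], Vs [m/s]).
    `fraction` is a hypothetical safety factor: require dt <= fraction * tau.
    """
    too_short: List[str] = []
    for name, eta, rho, vs in materials:
        tau = maxwell_time(eta, rho, vs)
        if dt > fraction * tau:
            too_short.append(name)
    return too_short
```

For example, with ρ = 3000 kg/m³ and Vs = 3464 m/s (μ ≈ 3.6e10 Pa), η = 1e18 Pa·s gives τ ≈ 2.8e7 s (about 0.9 yr) while η = 1e19 Pa·s gives about 9 yr, so a 0.5-yr step is flagged for the first material only. This matches the pattern in the thread: the material set with the shorter Maxwell time would be expected to fail earlier.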
>
> _______________________________________________
> CIG-SHORT mailing list
> CIG-SHORT at geodynamics.org
> http://geodynamics.org/cgi-bin/mailman/listinfo/cig-short
>
>
>
> -- 
> What most experimenters take for granted before they begin their  
> experiments is infinitely more interesting than any results to which  
> their experiments lead.
> -- Norbert Wiener
