[CIG-SHORT] Nonlinear solve did not converge

Brad Aagaard baagaard at usgs.gov
Wed Oct 29 13:20:32 PDT 2014


Marcelo,

I get the same error you do. This led me to look at the mesh quality.
You have lots of very distorted cells. We use the quality tools included
in mesh generation software (such as CUBIT/Trelis) to try to keep the
maximum aspect ratio under 3.0 (ideally under 2.0). The maximum aspect
ratio in your mesh is >40, and there are lots of cells with aspect ratios
>3. This indicates you have many nearly flat cells in your mesh, which is
probably what triggers the error when creating the fault. It will also
greatly reduce the convergence rate.

In order to keep the solution error to a reasonable level, I think you 
will want your snes_atol to be 1.0e-6 to 1.0e-8.

My recommendations are to improve the quality of your mesh and tighten 
your tolerances back down so that your snes_atol is 1.0e-7.
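As a minimal sketch, the relevant .cfg settings might look like the
following (the zero_tolerance value here is illustrative; the requirement
is only that it sit between ksp_atol and snes_atol):

[pylithapp.petsc]
ksp_atol = 1.0e-09
snes_atol = 1.0e-07

[pylithapp.timedependent.interfaces.fault]
zero_tolerance = 1.0e-08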

Regards,
Brad

On 10/28/2014 12:07 PM, Marcelo Contreras wrote:
> Dear Brad,
>
> Following up on my subduction problem: I did several runs with different
> tolerance parameters, and only in a few steps do I get linear convergence,
> but never nonlinear convergence. For zero_tolerance < 1.0e-04 the problem
> runs, but there is no convergence (PyLith 1.9). On the other hand, running
> the identical problem in PyLith 2.0.3, I get an error after "Initializing
> integrators". Could you look at the config files and suggest something?
> Maybe I am missing something and, in my frustration, not seeing it (the
> attached file, or the complete problem, is here:
> https://www.dropbox.com/sh/xbxoumop8qxgy0w/AAAOFF-WxcDB-WOS78s7m7kca?dl=0 )
>
> Thank you in advance,
> best regards,
> Marcelo.
>
> ERROR PYLITH-2.0.3
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Faces should separate only two cells, not 3
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.5-272-gc754102  GIT Date:
> 2014-07-16 16:18:38 -0500
> [0]PETSC ERROR: /Users/marcont/pylith203/bin/mpinemesis on a arch-pylith-gcc-opt
> named OSXmate-5.local by marcont Tue Oct 28 15:30:38 2014
> [0]PETSC ERROR: Configure options --useThreads=0
> --prefix=/Users/buildbot/install/pylith_darwin_10.6_binbot
> --download-fblaslapack=1 --with-c2html=0 --CXXFLAGS=-DMPICH_IGNORE_CXX_SEEK
> --with-clanguage=C --with-mpicompilers=1 --with-debugging=0 --download-chaco=1
> --download-ml=1 --with-hdf5=1
> --with-hdf5-dir=/Users/buildbot/install/pylith_darwin_10.6_binbot --with-x=0
> --with-ssl=0 --with-shared-libraries=1
> [0]PETSC ERROR: #1 DMPlexOrient() line 2402 in
> /Users/buildbot/slave/build/pylith_darwin_10.6_binbot/PETSc/petsc-pylith/binaries/src/dm/impls/plex/plex.c
> [0]PETSC ERROR: #2 static void
> pylith::faults::CohesiveTopology::createFaultParallel(pylith::topology::Mesh*,
> const pylith::topology::Mesh&, int, const char*, bool)() line 197 in
> faults/CohesiveTopology.cc
> Fatal error. Calling MPI_Abort() to abort PyLith application.
> Traceback (most recent call last):
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/apps/PetscApplication.py",
> line 64, in onComputeNodes
>       self.main(*args, **kwds)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/apps/PyLithApp.py",
> line 125, in main
>       self.problem.initialize()
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/problems/TimeDependent.py",
> line 119, in initialize
>       self.formulation.initialize(self.dimension, self.normalizer)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/problems/Implicit.py",
> line 122, in initialize
>       self._initialize(dimension, normalizer)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/problems/Formulation.py",
> line 470, in _initialize
>       integrator.initialize(totalTime, numTimeSteps, normalizer)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/faults/FaultCohesiveDyn.py",
> line 167, in initialize
>       FaultCohesive.initialize(self, totalTime, numTimeSteps, normalizer)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/faults/Fault.py",
> line 170, in initialize
>       ModuleFault.initialize(self, self.mesh(), self.upDir)
>     File
> "/Users/marcont/pylith203/lib/python2.7/site-packages/pylith/faults/faults.py",
> line 321, in initialize
>       def initialize(self, *args): return _faults.Fault_initialize(self, *args)
> RuntimeError: Error detected while in PETSc function.
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
> /Users/marcont/pylith203/bin/nemesis: mpirun: exit 255
> /Users/marcont/pylith203/bin/pylith: /Users/marcont/pylith203/bin/nemesis: exit
>
>> On 23-10-2014, at 21:37, Brad Aagaard <baagaard at usgs.gov> wrote:
>>
>> Marcelo,
>>
>> I just discovered yesterday that the custom fault preconditioner in v2.0.0
>> and later is not working for multiple faults. The values aren't inserted
>> correctly into the matrix, so we end up with a lot of zeros in the matrix,
>> which results in very slow convergence. My recommendation is to try the
>> ASM preconditioner as a temporary workaround.
>>
>> Remove the fieldsplit settings and the aij matrix type; just set the
>> tolerances and use:
>>
>> pc_type = asm
>> sub_pc_factor_shift_type = nonzero
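>>
>> Put together, a minimal sketch of the PETSc section might be (section
>> name is the usual one; tolerance values here are illustrative, not
>> prescriptive):
>>
>> [pylithapp.petsc]
>> pc_type = asm
>> sub_pc_factor_shift_type = nonzero
>> ksp_rtol = 1.0e-20
>> ksp_atol = 1.0e-09
>> snes_atol = 1.0e-07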
>>
>> Matt and I are working on fixing this bug.
>>
>> Brad
>>
>>
>>
>> On 10/23/2014 05:21 PM, Marcelo Andres Contreras Kohl wrote:
>>> Dear Brad, thanks. Following your suggestions:
>>>
>>> What happens with the parameters from step13.cfg? Does the linear solve
>>> residual blow up, decrease towards some constant value, or decrease but
>>> hit the maximum number of iterations?
>>>
>>>
>>> OK, using the same parameters from step13.cfg with ksp_max_it = 1000, the
>>> output shows the linear solve residual leveling off around ~2e-06:
>>>  -- Solving equations.
>>>   0 KSP Residual norm 3.508157467362e-04
>>>   1 KSP Residual norm 3.444174192324e-04
>>>   2 KSP Residual norm 1.721538417703e-04
>>> .
>>> .
>>> 999 KSP Residual norm 1.659138337020e-06
>>> 1000 KSP Residual norm 1.657936306497e-06
>>> Linear solve did not converge due to DIVERGED_ITS iterations 1000
>>
>> From this information, I can't tell if the residual is decreasing towards a
>> constant value or is just decreasing very slowly.
>>
>> What is pylithapp.timedependent.interfaces.fault.zero_tolerance in this case?
>> Make sure it is between the ksp_atol and snes_atol.
>>
>>
>>> Then, changing to ksp_atol = 1.0e-05, snes_atol = 1.0e-03, and
>>> zero_tolerance = 1.0e-04, we get linear convergence at iteration 78. The
>>> attached figure shows the displacement magnitude (total time 1 year,
>>> dt = 1 yr).
>>
>> These tolerances are probably too big.
>>
>>>
>>>  76 KSP Residual norm 1.042663776533e-05
>>>
>>>  77 KSP Residual norm 1.013069863722e-05
>>>
>>>  78 KSP Residual norm 9.988824674598e-06
>>>
>>> Linear solve converged due to CONVERGED_ATOL iterations 78
>>>
>>> Things improved up to this point, but after increasing total_time to 300
>>> years, the solve stops converging after a few years of simulation time if
>>> the number of iterations is reduced.
>>>
>>>   299 KSP Residual norm 1.225148064637e-04
>>>   300 KSP Residual norm 1.223876620855e-04
>>>   Linear solve did not converge due to DIVERGED_ITS iterations 300
>>>   .
>>>   .
>>>   (Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations
>>> 0)
>>>
>>>
>>>
>>> What is the best way to get the solution out to thousands of years?
>>>
>>>
>>>
>>> Another issue is when I add gravity with the same parameters as above:
>>>
>>> Linear solve converged due to CONVERGED_ATOL iterations 0
>>>   0 SNES Function norm 4.327384975917e+03
>>>     0 KSP Residual norm 1.205460536621e+00
>>>     0 KSP preconditioned resid norm 1.205460536621e+00 true resid norm
>>> 4.327384975917e+03 ||r(i)||/||b|| 1.000000000000e+00
>>>     1 KSP Residual norm 7.199230795177e-01
>>>     .
>>>     .
>>>
>>>   300 KSP Residual norm 2.032016852945e-01
>>>
>>>   300 KSP preconditioned resid norm 2.032016852945e-01 true resid norm
>>> 5.557174056870e+02 ||r(i)||/||b|| 1.284187583910e-01
>>>
>>>   Linear solve did not converge due to DIVERGED_ITS iterations 300
>>>
>>>   … what can I do? Reduce ksp_atol even further?
>>>
>>>
>>>
>>> Regards,
>>> Marcelo.
>>>
>>> 2014-10-22 15:27 GMT-03:00 Brad Aagaard <baagaard at usgs.gov>:
>>>
>>>> On 10/22/2014 10:12 AM, Marcelo Contreras wrote:
>>>>
>>>>> Dear Brad,
>>>>>
>>>>> I'm following some examples that include friction laws for a 3D
>>>>> subduction (interseismic) model. Using the PETSc parameters from the
>>>>> example step13.cfg (slip weakening), there is no (linear) convergence.
>>>>>
>>>>
>>>> What happens with the parameters from step13.cfg? Does the linear solve
>>>> residual blow up, decrease towards some constant value, or decrease but
>>>> hit the maximum number of iterations?
>>>>
>>>>> Now, with the following parameters, I have a solution (see figura1.png
>>>>> attached):
>>>>> # example
>>>>> snes_view = true
>>>>> ksp_monitor_true_residual = true
>>>>> fs_pc_type = fieldsplit
>>>>> fs_pc_fieldsplit_schur_precondition = user
>>>>> fs_pc_fieldsplit_real_diagonal = true
>>>>> fs_pc_fieldsplit_type = schur
>>>>> fs_pc_fieldsplit_schur_factorization_type = full
>>>>> fs_fieldsplit_0_ksp_type = preonly
>>>>> fs_fieldsplit_0_pc_type = ml
>>>>> fs_fieldsplit_1_ksp_type = gmres
>>>>> fs_fieldsplit_1_ksp_rtol = 1.0e-05
>>>>> fs_fieldsplit_1_pc_type = jacobi
>>>>>
>>>>
>>>> The field split in PyLith v2.0 uses names for the fields, not numbers, so
>>>> you would need:
>>>>
>>>>> fs_fieldsplit_displacement_ksp_type = preonly
>>>>> fs_fieldsplit_displacement_pc_type = ml
>>>>> fs_fieldsplit_lagrange_multiplier_ksp_type = gmres
>>>>> fs_fieldsplit_lagrange_multiplier_ksp_rtol = 1.0e-05
>>>>> fs_fieldsplit_lagrange_multiplier_pc_type = jacobi
>>>>
>>>>> # Convergence parameters.
>>>>> ksp_rtol = 1.0e-20
>>>>> ksp_atol = 1.0e-09
>>>>> ksp_max_it = 300
>>>>> ksp_gmres_restart = 50
>>>>>
>>>>
>>>> See examples/3d/hex8/step14.cfg for tolerances. You need zero_tolerance to
>>>> be greater than the KSP atol and less than the SNES atol. You can increase
>>>> the values (multiply by 10 or even 100) but the relative sizes should be
>>>> the same as in step14.
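>>>>
>>>> For instance (illustrative values only; check step14.cfg for the actual
>>>> ones): if the base set were ksp_atol = 1.0e-09, zero_tolerance = 1.0e-08,
>>>> and snes_atol = 1.0e-07, then multiplying everything by 100 gives
>>>> ksp_atol = 1.0e-07, zero_tolerance = 1.0e-06, and snes_atol = 1.0e-05,
>>>> which preserves the ordering ksp_atol < zero_tolerance < snes_atol.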
>>>>
>>>>
>>>>> The problem starts when I activate "solver =
>>>>> pylith.problems.SolverNonlinear"; the error is "Nonlinear solve did not
>>>>> converge". The PETSc parameters are: (using the parameters from
>>>>> step13.cfg didn't work)
>>>>>
>>>>
>>>> If you see this happening, you should turn on the monitoring of the inner
>>>> solves to diagnose things further. Matt can help you with this.
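>>>>
>>>> A sketch of what that monitoring might look like, reusing the field
>>>> names from above (these compose the standard PETSc ksp_monitor and
>>>> ksp_converged_reason options with the fs_fieldsplit prefixes; verify
>>>> against your version):
>>>>
>>>> fs_fieldsplit_displacement_ksp_monitor = true
>>>> fs_fieldsplit_displacement_ksp_converged_reason = true
>>>> fs_fieldsplit_lagrange_multiplier_ksp_monitor = true
>>>> fs_fieldsplit_lagrange_multiplier_ksp_converged_reason = true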
>>>>
>>>> Regards,
>>>> Brad
>


