[aspect-devel] convection-box-3d example hangs when more than one node is used

Rene Gassmoeller rengas at gfz-potsdam.de
Mon Aug 3 05:20:20 PDT 2015


Hi Rob,
I tried to reproduce your problem with the same Trilinos, p4est, deal.II
and ASPECT versions (older MPI and newer gcc, though), but unfortunately
without success. It seems we need to run some tests on your machine to
find the problem. As a first step, could you disable the visualization
output plugin by removing 'visualization' from the line 'set List of
postprocessors'? This is just to be sure the hang does not come from the
previous timestep: the output is written in a background thread and
might cause a delayed crash.
Next, we need to figure out what is happening between the last output
and the next expected output. From what I see, the last output you get is
the start of the new timestep. Between there and the next message
("Solving temperature system") only the boundary conditions and user
plugins get updated, and the temperature system and temperature
preconditioner are assembled. I attached a patch with additional debug
output for core.cc. When you apply the patch ('patch -p1 < patch' in
your ASPECT folder) and rebuild ASPECT, you should see additional output
after the last line. Could you check which lines get printed in Timestep
1? After that we can think about what is causing the issue.
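
To illustrate the idea behind the debug output, here is a minimal
standalone sketch (not ASPECT code). It assumes a hypothetical
RankZeroStream class that stands in for a rank-0-only stream such as
'pcout', and a hypothetical run_stages() helper; both names are made up
for this example. Each marker is flushed before the stage runs, so the
last line printed points at the stage that did not return.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rank-0-only output stream such as 'pcout'.
// In a real MPI run, 'active' would be true only on rank 0.
class RankZeroStream
{
public:
  RankZeroStream(std::ostream &out_, const bool active_)
    : out(out_), active(active_)
  {}

  // Write one marker line. std::endl flushes the stream, so the line is
  // visible even if the program hangs in the very next call.
  void marker(const std::string &text)
  {
    if (active)
      out << "   " << text << std::endl;
  }

private:
  std::ostream &out;
  const bool    active;
};

// A stage is a label plus the work to do (both hypothetical here).
using Stage = std::pair<std::string, std::function<void()>>;

// Print the marker for each stage *before* running it, and return the
// number of stages that completed.
std::size_t run_stages(RankZeroStream &log, const std::vector<Stage> &stages)
{
  std::size_t completed = 0;
  for (const Stage &stage : stages)
    {
      assert(static_cast<bool>(stage.second)); // stage must be callable
      log.marker(stage.first);
      stage.second();
      ++completed;
    }
  return completed;
}
```

With stages labelled "Assemble temperature system." and "Build
temperature preconditioner.", a run that stops after printing only the
first label would suggest the hang is inside the assembly.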

Best
Rene

On 07/31/2015 10:51 PM, Robert Moucha wrote:
> Finally got a chance to get back to this after some travel:
> 
> I tried the step-32 example of deal.II, and it runs without a problem
> on more than one compute node.
> 
> ASPECT is the default 1.3 version; the problem occurs with both the
> debug and release builds.
> 
> To compile ASPECT I'm using:
> 
> gcc 4.4.7
> BLAS and LAPACK 3.2.1-4
> OpenMPI 1.8.4
> Trilinos 12.0.1 with CXX11=OFF
> p4est 1.1
> phdf5 1.8.15
> deal.II 8.2.1
> 
> No problems compiling these, as far as I could tell.
> 
> When ASPECT hangs, right after the initial time step, the
> solution-00000 files for all processors have been written without
> issue, as have the other files; then it just hangs.
> 
> Thanks,
> Rob
> 
>> Message: 1
>> Date: Sun, 19 Jul 2015 12:25:14 +0200
>> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
>> To: aspect-devel at geodynamics.org
>> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>>         than one node is used
>> Message-ID: <55AB7B0A.8040503 at gfz-potsdam.de>
>> Content-Type: text/plain; charset=utf-8
>>
>> Hi Rob,
>> I just checked the convection-box-3d on 2 nodes of our cluster and it
>> runs fine. So it seems there is something special about your
>> installation or there is a bug that is only showing in this
>> configuration. Could you test the following things for us to be able to
>> give you some more help:
>>
>> 1. Try running convection-box with an ASPECT compiled in debug mode.
>> Maybe there is some error message suppressed by the release mode.
>> 2. Could you try to compile the deal.II example step-32 and run that one
>> on more than one node of your cluster? It should be in your deal.II
>> folder /examples/step-32, and compiling should be as simple as 'cmake .
>> && make'. This will give us some insight into whether something in
>> ASPECT is causing the problem or whether it is an issue with the
>> deal.II code or configuration.
>> 3. We need some more information on your deal.II configuration (your
>> ASPECT is an unchanged 1.3, right?). Which version of deal.II are you
>> using? Which Trilinos, p4est and compiler? Were there any problems
>> while compiling those?
>>
>> Best,
>> Rene
>>
>> On 07/18/2015 12:23 AM, Robert Moucha wrote:
>>> Hi Timo,
>>>
>>> Yes, I still have the same problem. It occurs with the following
>>> cookbooks (I have not tried all of them, but it looks like anything to
>>> do with time stepping is causing the hang):
>>>
>>> convection-box
>>> convection-box-3d
>>> shell_simple_2d
>>> van-keken-discontinuous
>>>
>>> Thanks
>>> Rob
>>>
>>>> Hey Robert,
>>>>
>>>> sorry for only getting back to this now. Any update on your problem?
>>>> Does this happen with every .prm file (like a simple 2d problem)?
>>>>
>>>> On Sun, Jul 5, 2015 at 6:10 PM, Robert Moucha <rmoucha at gmail.com> wrote:
>>>>> OK, it appears that I solved last week's issue with the files; it
>>>>> turns out one of the nodes did not have the correct paths (thanks).
>>>>>
>>>>> However, I am still having problems when using more than one node.
>>>>> This time ASPECT just hangs on time step 1 with no error: the
>>>>> solution-00000 files are created on each of the nodes, then nothing.
>>>>>
>>>>> It runs fine on a single node. I should point out that the ASPECT
>>>>> example stokes.prm as well as CitcomS run on the cluster without
>>>>> issues.
>>>>>
>>>>> Here is the log.txt for the convection-box-3d.prm -- thanks in advance Rob
>>>>>
>>>>> -----------------------------------------------------------------------------
>>>>> -- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
>>>>> --     . version 1.3
>>>>> --     . running in OPTIMIZED mode
>>>>> --     . running with 12 MPI processes
>>>>> --     . using Trilinos
>>>>> -----------------------------------------------------------------------------
>>>>>
>>>>> Number of active cells: 512 (on 4 levels)
>>>>> Number of degrees of freedom: 20381 (14739+729+4913)
>>>>>
>>>>> *** Timestep 0:  t=0 seconds
>>>>>    Solving temperature system... 0 iterations.
>>>>>    Rebuilding Stokes preconditioner...
>>>>>    Solving Stokes system... 29 iterations.
>>>>>
>>>>> Number of active cells: 1583 (on 5 levels)
>>>>> Number of degrees of freedom: 63622 (46077+2186+15359)
>>>>>
>>>>> *** Timestep 0:  t=0 seconds
>>>>>    Solving temperature system... 0 iterations.
>>>>>    Rebuilding Stokes preconditioner...
>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>
>>>>> Number of active cells: 3256 (on 5 levels)
>>>>> Number of degrees of freedom: 122269 (88647+4073+29549)
>>>>>
>>>>> *** Timestep 0:  t=0 seconds
>>>>>    Solving temperature system... 0 iterations.
>>>>>    Rebuilding Stokes preconditioner...
>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>
>>>>> Number of active cells: 9010 (on 6 levels)
>>>>> Number of degrees of freedom: 333145 (241677+10909+80559)
>>>>>
>>>>> *** Timestep 0:  t=0 seconds
>>>>>    Solving temperature system... 0 iterations.
>>>>>    Rebuilding Stokes preconditioner...
>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>
>>>>>    Postprocessing:
>>>>>      RMS, max velocity:                  57.6 m/s, 176 m/s
>>>>>      Temperature min/avg/max:            0 K, 0.5 K, 1 K
>>>>>      Heat fluxes through boundary parts: 7.682e-07 W, -7.682e-07 W, 1.685e-15 W, 2.362e-15 W, -1 W, 1 W
>>>>>      Writing graphical output:           /state/partition1/RMOUCHA/output/solution-00000
>>>>>
>>>>> *** Timestep 1:  t=8.87115e-05 seconds
>>>>>
>>>>>
>>>>> ------------------------------------------------------------
>>>>> Robert Moucha
>>>>> Assistant Professor of Geophysics
>>>>> Department of Earth Sciences
>>>>> 204 Heroy Geology Lab
>>>>> Syracuse University
>>>>> Syracuse, NY, 13244-1070
>>>>> _______________________________________________
>>>>> Aspect-devel mailing list
>>>>> Aspect-devel at geodynamics.org
>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>>>
>>>>
>>>
>>>
>>>
>>
>>
> 
> 
> 
-------------- next part --------------
diff --git a/source/simulator/core.cc b/source/simulator/core.cc
index 6516ac2..ecf881c 100644
--- a/source/simulator/core.cc
+++ b/source/simulator/core.cc
@@ -636,6 +636,8 @@ namespace aspect
     gravity_model->update();
     heating_model->update();
     adiabatic_conditions->update();
+
+    pcout << "   Updated constraints and plugins." << std::endl;
   }
 
 
@@ -1532,7 +1534,12 @@ namespace aspect
           if (parameters.free_surface_enabled)
             free_surface->execute ();
 
+          pcout << "   Assemble temperature system." << std::endl;
+
           assemble_advection_system (AdvectionField::temperature());
+
+          pcout << "   Build temperature preconditioner." << std::endl;
+
           build_advection_preconditioner(AdvectionField::temperature(),
                                          T_preconditioner);
           solve_advection(AdvectionField::temperature());

