[aspect-devel] convection-box-3d example hangs when more than one node is used (Rene Gassmoeller)

Robert Moucha rmoucha at gmail.com
Fri Aug 7 08:17:29 PDT 2015


Rene, the cause of the hang is the visualization plugin. Once I
removed it, I could run on more than one node to completion.

Rob
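
For anyone who wants to try the same workaround, here is a minimal sketch of
how 'visualization' could be dropped from the 'set List of postprocessors'
line of a .prm file. The function name and the example parameter text are
made up for illustration (the postprocessor list in the actual cookbook may
differ), and the sketch assumes the list sits on a single line without
backslash continuations.

```python
import re

# Matches a single-line "set List of postprocessors = ..." entry.
# Assumption: the list is not split over several lines with backslashes.
_LIST_LINE = re.compile(
    r"^([ \t]*set[ \t]+List of postprocessors[ \t]*=[ \t]*)(.*)$",
    re.MULTILINE,
)


def remove_postprocessor(prm_text, name="visualization"):
    """Return prm_text with `name` removed from the postprocessor list."""

    def rewrite(match):
        entries = [entry.strip() for entry in match.group(2).split(",")]
        kept = [entry for entry in entries if entry and entry != name]
        return match.group(1) + ", ".join(kept)

    return _LIST_LINE.sub(rewrite, prm_text)


# Hypothetical parameter file excerpt, for illustration only.
EXAMPLE_PRM = """\
subsection Postprocess
  set List of postprocessors = velocity statistics, temperature statistics, visualization
end
"""

if __name__ == "__main__":
    print(remove_postprocessor(EXAMPLE_PRM))
```

The same change can of course be made by hand in a text editor; the script
is only convenient when several parameter files need the same edit.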

On Mon, Aug 3, 2015 at 3:00 PM,  <aspect-devel-request at geodynamics.org> wrote:
> Message: 1
> Date: Mon, 03 Aug 2015 14:20:20 +0200
> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
> To: aspect-devel at geodynamics.org
> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>         than one node is used
> Message-ID: <55BF5C84.6010908 at gfz-potsdam.de>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Rob,
> I tried to reproduce your problem with the same trilinos, p4est, deal.II
> and aspect versions (older mpi and newer gcc though), but unfortunately
> without success. It seems we need to do some tests on your machine to
> find the problem. Could you as a first step disable the visualization
> output plugin by removing 'visualization' from the line 'set List of
> postprocessors' (just to be sure it is nothing from the last timestep
> ... it is written in a background thread and might cause a delayed crash).
> Next we need to figure out what is happening between the last output
> and the next expected output. From what I see, the last output you get is
> the start of the new timestep. Between there and the next message
> ("Solving temperature system") only the boundary conditions and user
> plugins get updated, and the temperature system and temperature
> preconditioner are assembled. I attached a patch with additional debug
> output for core.cc. When you apply the patch ('patch -p1 < patch' in
> your aspect folder) and rebuild aspect you should see additional output
> after the last line. Could you check which lines get printed in Timestep
> 1? After that we can think about what is causing the issue.
>
> Best
> Rene
>
> On 07/31/2015 10:51 PM, Robert Moucha wrote:
>> Finally got a chance to get back to this after some travel:
>>
>> I tried step-32 example of deal.II and it runs without a problem on
>> more than one compute node.
>>
>> ASPECT is the default 1.3 version, the problem occurs with both debug
>> and release versions.
>>
>> To compile ASPECT I'm using:
>>
>> gcc 4.4.7
>> BLAS and LAPACK 3.2.1-4
>> OpenMPI 1.8.4
>> Trilinos 12.0.1 with CXX11=OFF
>> p4est 1.1
>> phdf5 1.8.15
>> deal.II 8.2.1
>>
>> no problems with compiling these as far as I could tell
>>
>> When ASPECT hangs, right after the initial time step, the
>> solution-00000 files for all processors, as well as the other files,
>> have been written without issue; then it just hangs.
>>
>> Thanks,
>> Rob
>>
>>> Message: 1
>>> Date: Sun, 19 Jul 2015 12:25:14 +0200
>>> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
>>> To: aspect-devel at geodynamics.org
>>> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>>>         than one node is used
>>> Message-ID: <55AB7B0A.8040503 at gfz-potsdam.de>
>>> Content-Type: text/plain; charset=utf-8
>>>
>>> Hi Rob,
>>> I just checked the convection-box-3d on 2 nodes of our cluster and it
>>> runs fine. So it seems there is something special about your
>>> installation or there is a bug that is only showing in this
>>> configuration. Could you test the following things for us to be able to
>>> give you some more help:
>>>
>>> 1. Try running convection-box with an ASPECT compiled in debug mode.
>>> Maybe there is some error message suppressed by the release mode.
>>> 2. Could you try to compile the deal.II example step-32 and run that one
>>> on more than one node of your cluster? It should be in your deal.II
>>> folder /examples/step-32, and compiling should be as simple as 'cmake .
>>> && make'. This will give us some insight if something in aspect is
>>> causing the problem or if it is an issue with the deal.II code or
>>> configuration.
>>> 3. We need some more information on your deal.II configuration (your
>>> ASPECT is an unchanged 1.3, right?). Which version of deal.II are you
>>> using? Which trilinos, p4est and compiler? Were there any problems
>>> during compiling those?
>>>
>>> Best,
>>> Rene
>>>
>>> On 07/18/2015 12:23 AM, Robert Moucha wrote:
>>>> Hi Timo,
>>>>
>>>> Yes, I still have the same problem. It occurs with the following
>>>> cookbooks (I have not tried all of them, but it looks like anything to
>>>> do with time stepping is causing the hang):
>>>>
>>>> convection-box
>>>> convection-box-3d
>>>> shell_simple_2d
>>>> van-keken-discontinuous
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>>> Hey Robert,
>>>>>
>>>>> sorry for only getting back to this now. Any update on your problem?
>>>>> Does this happen with every .prm file (like a simple 2d problem)?
>>>>>
>>>>> On Sun, Jul 5, 2015 at 6:10 PM, Robert Moucha <rmoucha at gmail.com> wrote:
>>>>>> OK, it appears that I solved last week's issue with the files; it turns
>>>>>> out one of the nodes did not have the correct paths (thanks).
>>>>>>
>>>>>> However, I am still having problems when using more than one node:
>>>>>> this time ASPECT just hangs on time step 1 with no error. The
>>>>>> solution-00000 files are created on each of the nodes, then nothing.
>>>>>>
>>>>>> It runs fine on a single node. I should point out that the ASPECT
>>>>>> example stokes.prm as well as CitcomS run on the cluster without
>>>>>> issues.
>>>>>>
>>>>>> Here is the log.txt for the convection-box-3d.prm -- thanks in advance Rob
>>>>>>
>>>>>> -----------------------------------------------------------------------------
>>>>>> -- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
>>>>>> --     . version 1.3
>>>>>> --     . running in OPTIMIZED mode
>>>>>> --     . running with 12 MPI processes
>>>>>> --     . using Trilinos
>>>>>> -----------------------------------------------------------------------------
>>>>>>
>>>>>> Number of active cells: 512 (on 4 levels)
>>>>>> Number of degrees of freedom: 20381 (14739+729+4913)
>>>>>>
>>>>>> *** Timestep 0:  t=0 seconds
>>>>>>    Solving temperature system... 0 iterations.
>>>>>>    Rebuilding Stokes preconditioner...
>>>>>>    Solving Stokes system... 29 iterations.
>>>>>>
>>>>>> Number of active cells: 1583 (on 5 levels)
>>>>>> Number of degrees of freedom: 63622 (46077+2186+15359)
>>>>>>
>>>>>> *** Timestep 0:  t=0 seconds
>>>>>>    Solving temperature system... 0 iterations.
>>>>>>    Rebuilding Stokes preconditioner...
>>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>>
>>>>>> Number of active cells: 3256 (on 5 levels)
>>>>>> Number of degrees of freedom: 122269 (88647+4073+29549)
>>>>>>
>>>>>> *** Timestep 0:  t=0 seconds
>>>>>>    Solving temperature system... 0 iterations.
>>>>>>    Rebuilding Stokes preconditioner...
>>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>>
>>>>>> Number of active cells: 9010 (on 6 levels)
>>>>>> Number of degrees of freedom: 333145 (241677+10909+80559)
>>>>>>
>>>>>> *** Timestep 0:  t=0 seconds
>>>>>>    Solving temperature system... 0 iterations.
>>>>>>    Rebuilding Stokes preconditioner...
>>>>>>    Solving Stokes system... 30+4 iterations.
>>>>>>
>>>>>>    Postprocessing:
>>>>>>      RMS, max velocity:                  57.6 m/s, 176 m/s
>>>>>>      Temperature min/avg/max:            0 K, 0.5 K, 1 K
>>>>>>      Heat fluxes through boundary parts: 7.682e-07 W, -7.682e-07 W, 1.685e-15 W, 2.362e-15 W, -1 W, 1 W
>>>>>>      Writing graphical output:           /state/partition1/RMOUCHA/output/solution-00000
>>>>>>
>>>>>> *** Timestep 1:  t=8.87115e-05 seconds
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> Robert Moucha
>>>>>> Assistant Professor of Geophysics
>>>>>> Department of Earth Sciences
>>>>>> 204 Heroy Geology Lab
>>>>>> Syracuse University
>>>>>> Syracuse, NY, 13244-1070
>>>>>> _______________________________________________
>>>>>> Aspect-devel mailing list
>>>>>> Aspect-devel at geodynamics.org
>>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Subject: Digest Footer
>>>>>
>>>>> _______________________________________________
>>>>> Aspect-devel mailing list
>>>>> Aspect-devel at geodynamics.org
>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> End of Aspect-devel Digest, Vol 44, Issue 9
>>>>> *******************************************
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> -------------- next part --------------
> diff --git a/source/simulator/core.cc b/source/simulator/core.cc
> index 6516ac2..ecf881c 100644
> --- a/source/simulator/core.cc
> +++ b/source/simulator/core.cc
> @@ -636,6 +636,8 @@ namespace aspect
>      gravity_model->update();
>      heating_model->update();
>      adiabatic_conditions->update();
> +
> +    pcout << "   Updated constraints and plugins." << std::endl;
>    }
>
>
> @@ -1532,7 +1534,12 @@ namespace aspect
>            if (parameters.free_surface_enabled)
>              free_surface->execute ();
>
> +          pcout << "   Assemble temperature system." << std::endl;
> +
>            assemble_advection_system (AdvectionField::temperature());
> +
> +          pcout << "   Build temperature preconditioner." << std::endl;
> +
>            build_advection_preconditioner(AdvectionField::temperature(),
>                                           T_preconditioner);
>            solve_advection(AdvectionField::temperature());
>

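The debugging patch quoted above adds three progress messages to core.cc.
As a rough sketch of how to check which of them show up after the last
timestep header in log.txt, something like the following could be used. The
function name and the sample log are made up for illustration; only the
three message strings are taken from the patch.

```python
# Progress messages added by the debugging patch quoted above.
EXPECTED_MESSAGES = [
    "Updated constraints and plugins.",
    "Assemble temperature system.",
    "Build temperature preconditioner.",
]


def debug_progress(log_text):
    """Report which patch messages appear after the last timestep header."""
    lines = log_text.splitlines()
    # Index of the last "*** Timestep N: ..." line, or -1 if there is none.
    last_header = max(
        (i for i, line in enumerate(lines)
         if line.lstrip().startswith("*** Timestep")),
        default=-1,
    )
    tail = lines[last_header + 1:]
    return {msg: any(msg in line for line in tail)
            for msg in EXPECTED_MESSAGES}


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
*** Timestep 0:  t=0 seconds
   Solving temperature system... 0 iterations.
*** Timestep 1:  t=8.87115e-05 seconds
   Updated constraints and plugins.
   Assemble temperature system.
"""

if __name__ == "__main__":
    for message, seen in debug_progress(SAMPLE_LOG).items():
        print(("seen    " if seen else "missing ") + message)
```

The first message reported as missing points at the step where the run
stopped making progress.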


-- 
------------------------------------------------------------
Robert Moucha
Assistant Professor of Geophysics
Department of Earth Sciences
204 Heroy Geology Lab
Syracuse University
Syracuse, NY, 13244-1070

