[aspect-devel] Aspect-devel Digest, Vol 45, Issue 8
Rene Gassmöller
r.gassmoeller at mailbox.org
Tue Feb 9 14:18:10 PST 2016
Hi Sam,
so far we have no news on this problem. However, it seems that several users
have run into issues with this part of the code. Although there is a
workaround, as you mentioned, we should try to provide a better default
behavior. I will take a look. Could you open an issue on the GitHub
page so the problem is not forgotten?
Best,
Rene
On 02/09/2016 04:40 AM, Cox, Samuel P. wrote:
> Hi all,
>
> Sorry to dredge up a fairly old conversation, but I ran into the same problem when running ASPECT on our cluster: it runs fine on a single node, but hangs on visualization when multiple nodes are used. I tried the fixes suggested below. Nothing I did to TMP or TMPDIR seemed to make a difference, but adding 'set Number of grouped files = 1' does fix it. I am not too distressed by having everything written to a single file, but was wondering whether anybody has come up with a better understanding or a better fix?
>
> Sam
>
>> On 7 Aug 2015, at 20:00, aspect-devel-request at geodynamics.org wrote:
>>
>> Send Aspect-devel mailing list submissions to
>> aspect-devel at geodynamics.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>> or, via email, send a message with subject or body 'help' to
>> aspect-devel-request at geodynamics.org
>>
>> You can reach the person managing the list at
>> aspect-devel-owner at geodynamics.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Aspect-devel digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: convection-box-3d example hangs when more than one node
>> is used (Rene Gassmoeller) (Rene Gassmoeller)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 07 Aug 2015 20:52:43 +0200
>> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
>> To: aspect-devel at geodynamics.org
>> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>> than one node is used (Rene Gassmoeller)
>> Message-ID: <55C4FE7B.4070604 at gfz-potsdam.de>
>> Content-Type: text/plain; charset=utf-8
>>
>> Actually, currently the code looks for $TMPDIR not $TMP, but setting the
>> export is still worth a try. In particular we could find out if the
>> temporary directory is causing the issue or the background writing thread.
>>
>> Maybe we could make the temporary directory an input parameter, and only
>> try to use a temporary directory when either $TMP or $TMPDIR is set, or
>> the input parameter is set, or /tmp is available (in the order: parameter,
>> then shell variable, then /tmp). On the other hand, we already check whether
>> the folder is available before writing, and if it is not available the code
>> should simply write the file directly to the final location. But somehow
>> this seems to crash in Rob's case.
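
The fallback order proposed above could be sketched roughly as follows. This is a minimal illustration in C++17, not the code ASPECT actually uses: the helper names `is_writable_directory` and `choose_temporary_directory` and the probe file name are hypothetical, and the order among the two shell variables ($TMPDIR before $TMP) is an assumption, since the message only gives the order parameter, shell variable, /tmp.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns true only if `dir` exists, is a directory, and a file can
// actually be created in it (existence alone does not prove writability).
bool is_writable_directory(const fs::path &dir)
{
  std::error_code ec;
  if (dir.empty() || !fs::is_directory(dir, ec))
    return false;

  // Hypothetical probe file name; a real implementation would want a
  // name that is unique per process to avoid collisions between ranks.
  const fs::path probe = dir / "tmpdir_probe_file";
  {
    std::ofstream out(probe);
    if (!out)
      return false;
  }
  fs::remove(probe, ec);
  return true;
}

// Hypothetical helper: tries the candidates in the order
// input parameter > $TMPDIR > $TMP > /tmp and returns the first usable one.
// An empty result means "do not use a temporary directory; write the
// file directly to its final location".
std::optional<fs::path>
choose_temporary_directory(const std::string &parameter_value)
{
  std::vector<std::string> candidates;
  if (!parameter_value.empty())
    candidates.push_back(parameter_value);

  for (const char *name : {"TMPDIR", "TMP"})
    if (const char *value = std::getenv(name))
      if (*value != '\0')
        candidates.push_back(value);

  candidates.push_back("/tmp");

  for (const std::string &candidate : candidates)
    if (is_writable_directory(candidate))
      return fs::path(candidate);

  return std::nullopt;
}
```

Calling `choose_temporary_directory("")` mimics the case where no input parameter is given. Probing by creating a file is a deliberate choice here: on a cluster, a directory can exist on every node and still be full or read-only on one of them, which is the situation suspected in this thread.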
>>
>>
>> On 08/07/2015 08:44 PM, Timo Heister wrote:
>>> I was just about to write the same suggestions, Rene. :-)
>>>
>>> It could be that $TMP is not set up correctly on one of the nodes or
>>> the disk is full. Not sure how we can make this more robust from
>>> inside ASPECT.
>>>
>>> Rob, another thing to try would be to "export TMP=~/mytmp" with some
>>> directory that you can write into. Do this in the launch script before
>>> mpirun.
>>>
>>>
>>>
>>>
>>> On Fri, Aug 7, 2015 at 1:39 PM, Rene Gassmoeller <rengas at gfz-potsdam.de> wrote:
>>>> Hi Rob,
>>>> great, then we know where to look. Could you add
>>>>
>>>> 'set Number of grouped files = 1'
>>>>
>>>> in the Visualization subsection and re-enable the plugin? With this
>>>> setting all output will be written as MPI-IO into one file per timestep.
>>>> If this works, something with writing one file per process in a
>>>> background thread in a temporary folder is not working. If it does not
>>>> work we will have to look deeper into deal.II.
>>>>
>>>> If this works it is also a great (semi-)permanent workaround for your
>>>> problem, since you usually do not need one output file per process (we
>>>> just can not rely on the fact that MPI-IO is available on all systems
>>>> aspect is running on, therefore it is not the default option).
>>>>
>>>> Best,
>>>> Rene
>>>>
>>>> PS: Alternatively you could try the hdf5 output format instead of vtu.
>>>> ('set Output format = hdf5' in the Visualization subsection).
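
To illustrate what grouping output files means, here is a small C++ sketch of how ranks could be assigned to files for one timestep. The modulo mapping and the convention that 0 means "one file per process" are assumptions made for illustration; the function names are hypothetical and nothing here is taken from the ASPECT or deal.II sources.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical mapping from an MPI rank to the index of the output file
// it contributes to. n_grouped_files == 0 is assumed to mean
// "no grouping", i.e. every process writes its own file.
unsigned int file_index_for_rank(const unsigned int rank,
                                 const unsigned int n_grouped_files)
{
  return (n_grouped_files == 0) ? rank : rank % n_grouped_files;
}

// Lists, for each output file of a timestep, the ranks that would
// write into it under the mapping above.
std::map<unsigned int, std::vector<unsigned int>>
ranks_per_file(const unsigned int n_ranks,
               const unsigned int n_grouped_files)
{
  assert(n_ranks > 0);

  std::map<unsigned int, std::vector<unsigned int>> result;
  for (unsigned int rank = 0; rank < n_ranks; ++rank)
    result[file_index_for_rank(rank, n_grouped_files)].push_back(rank);
  return result;
}
```

With the 12 processes from the log in this thread, `ranks_per_file(12, 1)` yields a single file shared by all ranks, which corresponds to the 'set Number of grouped files = 1' setting suggested above, while `ranks_per_file(12, 0)` yields twelve separate files.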
>>>>
>>>>
>>>> On 08/07/2015 05:17 PM, Robert Moucha wrote:
>>>>> Rene, the cause of the hang is the visualization plugin. When I
>>>>> removed this, I can run on more than one node to completion.
>>>>>
>>>>> Rob
>>>>>
>>>>> On Mon, Aug 3, 2015 at 3:00 PM, <aspect-devel-request at geodynamics.org> wrote:
>>>>>>
>>>>>> Today's Topics:
>>>>>>
>>>>>> 1. Re: convection-box-3d example hangs when more than one node
>>>>>> is used (Rene Gassmoeller)
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>> Message: 1
>>>>>> Date: Mon, 03 Aug 2015 14:20:20 +0200
>>>>>> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
>>>>>> To: aspect-devel at geodynamics.org
>>>>>> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>>>>>> than one node is used
>>>>>> Message-ID: <55BF5C84.6010908 at gfz-potsdam.de>
>>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>>
>>>>>> Hi Rob,
>>>>>> I tried to reproduce your problem with the same trilinos, p4est, deal.II
>>>>>> and aspect versions (older mpi and newer gcc though), but unfortunately
>>>>>> without success. It seems we need to do some tests on your machine to
>>>>>> find the problem. Could you as a first step disable the visualization
>>>>>> output plugin by removing 'visualization' from the line 'set List of
>>>>>> postprocessors' (just to be sure it is nothing from the last timestep
>>>>>> ... it is written in a background thread and might cause a delayed crash).
>>>>>> Next we need to figure out what is happening between the last output
>>>>>> and the next expected output. From what I see, the last output you get is
>>>>>> the start of the new timestep. Between there and the next message
>>>>>> ("Solving temperature system") only the boundary conditions and user
>>>>>> plugins get updated and the temperature system and temperature
>>>>>> preconditioner are assembled. I attached a patch with additional debug
>>>>>> output for core.cc. When you apply the patch ('patch -p1 < patch' in
>>>>>> your aspect folder) and rebuild aspect you should see additional output
>>>>>> after the last line. Could you check which lines get printed in Timestep
>>>>>> 1? After that we can think about what is causing the issue.
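
The debugging approach described above, printing a flushed marker around each stage so that the last visible line localizes a hang, can be sketched in isolation. This is a minimal C++ illustration under stated assumptions: `ConditionalStream` is a hypothetical stand-in for a stream that prints on one process only (similar in spirit to the `pcout` object used in the patch), and `run_stage` is a made-up helper, not an ASPECT function.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a rank-aware stream: forwards output to `out`
// only when `active` is true (for example, only on MPI rank 0).
class ConditionalStream
{
public:
  ConditionalStream(std::ostream &out, const bool active)
    : out_(out), active_(active)
  {}

  template <typename T>
  const ConditionalStream &operator<<(const T &value) const
  {
    if (active_)
      out_ << value;
    return *this;
  }

  // Handles manipulators such as std::endl, which also flushes the stream.
  const ConditionalStream &
  operator<<(std::ostream &(*manipulator)(std::ostream &)) const
  {
    if (active_)
      out_ << manipulator;
    return *this;
  }

private:
  std::ostream &out_;
  bool          active_;
};

// Announces a stage before running it. Because std::endl flushes, the
// marker reaches the log even if the stage itself never returns.
template <typename Stage>
void run_stage(const ConditionalStream &log,
               const std::string       &name,
               Stage                  &&stage)
{
  log << "   " << name << std::endl;
  stage();
}
```

A caller might wrap each step of a timestep, for instance `run_stage(log, "Assemble temperature system.", [&] { /* assembly */ });`. If the run then hangs, the last marker in the log names the stage that was entered but did not finish.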
>>>>>>
>>>>>> Best
>>>>>> Rene
>>>>>>
>>>>>> On 07/31/2015 10:51 PM, Robert Moucha wrote:
>>>>>>> Finally got a chance to get back to this after some travel:
>>>>>>>
>>>>>>> I tried step-32 example of deal.II and it runs without a problem on
>>>>>>> more than one compute node.
>>>>>>>
>>>>>>> ASPECT is the default 1.3 version, the problem occurs with both debug
>>>>>>> and release versions.
>>>>>>>
>>>>>>> To compile ASPECT I'm using:
>>>>>>>
>>>>>>> gcc 4.4.7
>>>>>>> BLAS and LAPACK 3.2.1-4
>>>>>>> OpenMPI 1.8.4
>>>>>>> Trilinos 12.0.1 with CXX11=OFF
>>>>>>> p4est 1.1
>>>>>>> phdf5 1.8.15
>>>>>>> deal.II 8.2.1
>>>>>>>
>>>>>>> no problems with compiling these as far as I could tell
>>>>>>>
>>>>>>> When ASPECT hangs, right after initial time step, all the files
>>>>>>> solution-00000 for all processors are written without issue as well as
>>>>>>> other files, then it just hangs.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rob
>>>>>>>
>>>>>>>> Message: 1
>>>>>>>> Date: Sun, 19 Jul 2015 12:25:14 +0200
>>>>>>>> From: Rene Gassmoeller <rengas at gfz-potsdam.de>
>>>>>>>> To: aspect-devel at geodynamics.org
>>>>>>>> Subject: Re: [aspect-devel] convection-box-3d example hangs when more
>>>>>>>> than one node is used
>>>>>>>> Message-ID: <55AB7B0A.8040503 at gfz-potsdam.de>
>>>>>>>> Content-Type: text/plain; charset=utf-8
>>>>>>>>
>>>>>>>> Hi Rob,
>>>>>>>> I just checked the convection-box-3d on 2 nodes of our cluster and it
>>>>>>>> runs fine. So it seems there is something special about your
>>>>>>>> installation or there is a bug that is only showing in this
>>>>>>>> configuration. Could you test the following things for us to be able to
>>>>>>>> give you some more help:
>>>>>>>>
>>>>>>>> 1. Try running convection-box with an ASPECT compiled in debug mode.
>>>>>>>> Maybe there is some error message suppressed by the release mode.
>>>>>>>> 2. Could you try to compile the deal.II example step-32 and run that one
>>>>>>>> on more than one node of your cluster? It should be in your deal.II
>>>>>>>> folder /examples/step-32, and compiling should be as simple as 'cmake .
>>>>>>>> && make'. This will give us some insight if something in aspect is
>>>>>>>> causing the problem or if it is an issue with the deal.II code or
>>>>>>>> configuration.
>>>>>>>> 3. We need some more information on your deal.II configuration (your
>>>>>>>> ASPECT is an unchanged 1.3, right?). Which version of deal.II are you
>>>>>>>> using? Which trilinos, p4est and compiler? Were there any problems
>>>>>>>> during compiling those?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Rene
>>>>>>>>
>>>>>>>> On 07/18/2015 12:23 AM, Robert Moucha wrote:
>>>>>>>>> Hi Timo,
>>>>>>>>>
>>>>>>>>> Yes, I still have the same problem. It occurs with the following
>>>>>>>>> cookbooks (I have not tried all of them, but it looks like anything
>>>>>>>>> to do with time stepping is causing the hang):
>>>>>>>>>
>>>>>>>>> convection-box
>>>>>>>>> convection-box-3d
>>>>>>>>> shell_simple_2d
>>>>>>>>> van-keken-discontinuous
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>>> Hey Robert,
>>>>>>>>>>
>>>>>>>>>> sorry for only getting back to this now. Any update on your problem?
>>>>>>>>>> Does this happen with every .prm file (like a simple 2d problem)?
>>>>>>>>>>
>>>>>>>>>> On Sun, Jul 5, 2015 at 6:10 PM, Robert Moucha <rmoucha at gmail.com> wrote:
>>>>>>>>>>> OK, it appears that I solved last week's issue with the files; it turns
>>>>>>>>>>> out one of the nodes did not have the correct paths (thanks).
>>>>>>>>>>>
>>>>>>>>>>> However, I am still having problems when using more than one node.
>>>>>>>>>>> This time ASPECT just hangs on time step 1 with no error: the
>>>>>>>>>>> solution-00000 files are created on each of the nodes, then nothing.
>>>>>>>>>>>
>>>>>>>>>>> It runs fine on a single node. I should point out that the ASPECT
>>>>>>>>>>> example stokes.prm as well as CitcomS run on the cluster without
>>>>>>>>>>> issues.
>>>>>>>>>>>
>>>>>>>>>>> Here is the log.txt for the convection-box-3d.prm -- thanks in advance Rob
>>>>>>>>>>>
>>>>>>>>>>> -----------------------------------------------------------------------------
>>>>>>>>>>> -- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
>>>>>>>>>>> -- . version 1.3
>>>>>>>>>>> -- . running in OPTIMIZED mode
>>>>>>>>>>> -- . running with 12 MPI processes
>>>>>>>>>>> -- . using Trilinos
>>>>>>>>>>> -----------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> Number of active cells: 512 (on 4 levels)
>>>>>>>>>>> Number of degrees of freedom: 20381 (14739+729+4913)
>>>>>>>>>>>
>>>>>>>>>>> *** Timestep 0: t=0 seconds
>>>>>>>>>>> Solving temperature system... 0 iterations.
>>>>>>>>>>> Rebuilding Stokes preconditioner...
>>>>>>>>>>> Solving Stokes system... 29 iterations.
>>>>>>>>>>>
>>>>>>>>>>> Number of active cells: 1583 (on 5 levels)
>>>>>>>>>>> Number of degrees of freedom: 63622 (46077+2186+15359)
>>>>>>>>>>>
>>>>>>>>>>> *** Timestep 0: t=0 seconds
>>>>>>>>>>> Solving temperature system... 0 iterations.
>>>>>>>>>>> Rebuilding Stokes preconditioner...
>>>>>>>>>>> Solving Stokes system... 30+4 iterations.
>>>>>>>>>>>
>>>>>>>>>>> Number of active cells: 3256 (on 5 levels)
>>>>>>>>>>> Number of degrees of freedom: 122269 (88647+4073+29549)
>>>>>>>>>>>
>>>>>>>>>>> *** Timestep 0: t=0 seconds
>>>>>>>>>>> Solving temperature system... 0 iterations.
>>>>>>>>>>> Rebuilding Stokes preconditioner...
>>>>>>>>>>> Solving Stokes system... 30+4 iterations.
>>>>>>>>>>>
>>>>>>>>>>> Number of active cells: 9010 (on 6 levels)
>>>>>>>>>>> Number of degrees of freedom: 333145 (241677+10909+80559)
>>>>>>>>>>>
>>>>>>>>>>> *** Timestep 0: t=0 seconds
>>>>>>>>>>> Solving temperature system... 0 iterations.
>>>>>>>>>>> Rebuilding Stokes preconditioner...
>>>>>>>>>>> Solving Stokes system... 30+4 iterations.
>>>>>>>>>>>
>>>>>>>>>>> Postprocessing:
>>>>>>>>>>> RMS, max velocity: 57.6 m/s, 176 m/s
>>>>>>>>>>> Temperature min/avg/max: 0 K, 0.5 K, 1 K
>>>>>>>>>>> Heat fluxes through boundary parts: 7.682e-07 W, -7.682e-07 W,
>>>>>>>>>>> 1.685e-15 W, 2.362e-15 W, -1 W, 1 W
>>>>>>>>>>> Writing graphical output:
>>>>>>>>>>> /state/partition1/RMOUCHA/output/solution-00000
>>>>>>>>>>>
>>>>>>>>>>> *** Timestep 1: t=8.87115e-05 seconds
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Robert Moucha
>>>>>>>>>>> Assistant Professor of Geophysics
>>>>>>>>>>> Department of Earth Sciences
>>>>>>>>>>> 204 Heroy Geology Lab
>>>>>>>>>>> Syracuse University
>>>>>>>>>>> Syracuse, NY, 13244-1070
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Aspect-devel mailing list
>>>>>>>>>>> Aspect-devel at geodynamics.org
>>>>>>>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>>
>>>>>>>>>> Subject: Digest Footer
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Aspect-devel mailing list
>>>>>>>>>> Aspect-devel at geodynamics.org
>>>>>>>>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>>
>>>>>>>>>> End of Aspect-devel Digest, Vol 44, Issue 9
>>>>>>>>>> *******************************************
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> -------------- next part --------------
>>>>>> diff --git a/source/simulator/core.cc b/source/simulator/core.cc
>>>>>> index 6516ac2..ecf881c 100644
>>>>>> --- a/source/simulator/core.cc
>>>>>> +++ b/source/simulator/core.cc
>>>>>> @@ -636,6 +636,8 @@ namespace aspect
>>>>>> gravity_model->update();
>>>>>> heating_model->update();
>>>>>> adiabatic_conditions->update();
>>>>>> +
>>>>>> + pcout << " Updated constraints and plugins." << std::endl;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> @@ -1532,7 +1534,12 @@ namespace aspect
>>>>>> if (parameters.free_surface_enabled)
>>>>>> free_surface->execute ();
>>>>>>
>>>>>> + pcout << " Assemble temperature system." << std::endl;
>>>>>> +
>>>>>> assemble_advection_system (AdvectionField::temperature());
>>>>>> +
>>>>>> + pcout << " Build temperature preconditioner." << std::endl;
>>>>>> +
>>>>>> build_advection_preconditioner(AdvectionField::temperature(),
>>>>>> T_preconditioner);
>>>>>> solve_advection(AdvectionField::temperature());
>>>>>>