[aspect-devel] Aspect hangs after several time steps
Lev Karatun
lev.karatun at gmail.com
Thu Jan 28 13:05:21 PST 2016
Hi Timo,
Thanks for pointing this out. I reran the setup in Debug mode on 1 core, and
the only error messages I got were related to the names of the
compositional fields. After 12 hours of running there is not a single
message in either stdout or stderr. Am I doing something wrong? Do I
need to manually edit the Makefile to add compilation options like -Wall or
-O0? How can I see these errors?
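One way to make such problems visible without a debugger, even in a release build, is to have the plugin check the values it computes and fail loudly. The sketch below uses only the standard library; `require_finite` is a hypothetical helper, not an ASPECT or deal.II function (inside ASPECT, deal.II's `AssertThrow` could play the same role). As far as I know, ASPECT's main program catches exceptions and prints their message, so the thrown error should end up on screen.

```cpp
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ASPECT or deal.II): returns `value`
// unchanged if it is finite; otherwise throws std::runtime_error whose
// message names the quantity and the evaluation point, instead of letting
// a NaN or inf propagate silently into the solver.
inline double require_finite(const double value,
                             const std::string &name,
                             const unsigned int point_index)
{
  if (!std::isfinite(value))
    {
      std::ostringstream message;
      message << "Non-finite " << name << " at evaluation point "
              << point_index << ": " << value;
      throw std::runtime_error(message.str());
    }
  return value;
}
```

In a material model this would wrap each output, e.g. `out.viscosities[i] = require_finite(viscosity, "viscosity", i);`.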
Also, my understanding was that a division by zero prevents the code
from continuing to run. That happened a lot (even when running in release
mode) before I added a couple of checks to the plugin, but it has never
happened since. How does Aspect even run if I have such errors?
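On the question of how the code keeps running: under default IEEE 754 settings, a floating-point division by zero does not terminate a program. It yields +/-inf (or NaN for 0/0) and only sets a status flag. A program receives SIGFPE only if trapping has been switched on, for example with glibc's `feenableexcept`; as far as I know, ASPECT's debug build enables trapping for division by zero and invalid operations, while release builds do not. The sketch below shows the non-trapping behaviour using only standard `<cfenv>`; `divide_and_test_flags` is a hypothetical name, and it assumes no `-ffast-math`.

```cpp
#include <cfenv>

// Result of one floating-point division plus the IEEE 754 status flags it
// raised. Assumes default floating-point settings (no -ffast-math, no
// trapping enabled), under which the division never stops the program.
struct DivisionResult
{
  double value;
  bool   divbyzero_flag;  // FE_DIVBYZERO: nonzero / 0
  bool   invalid_flag;    // FE_INVALID:   e.g. 0 / 0
};

inline DivisionResult divide_and_test_flags(const double numerator,
                                            const double denominator)
{
  std::feclearexcept(FE_ALL_EXCEPT);
  // volatile operands and result keep the compiler from folding the
  // division at compile time or moving it past fetestexcept().
  volatile double n = numerator;
  volatile double d = denominator;
  volatile double result = n / d;
  return {result,
          std::fetestexcept(FE_DIVBYZERO) != 0,
          std::fetestexcept(FE_INVALID) != 0};
}
```

Calling `divide_and_test_flags(1.0, 0.0)` returns +inf with the divide-by-zero flag set, and the program simply continues; only with trapping enabled would the same division raise SIGFPE.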
Also, this expression contains no divisions, so I don't understand how this
can be a division by zero error...
    double strainrate_E2 =
      sqrt(0.5*(in.strain_rate[i][0][0]*in.strain_rate[i][0][0]
                +in.strain_rate[i][1][1]*in.strain_rate[i][1][1]
                +in.strain_rate[i][2][2]*in.strain_rate[i][2][2])
           + in.strain_rate[i][0][1]*in.strain_rate[i][0][1]
           + in.strain_rate[i][0][2]*in.strain_rate[i][0][2]
           + in.strain_rate[i][1][2]*in.strain_rate[i][1][2]);
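Regarding an expression with no divisions: when trapping is on, SIGFPE ("Arithmetic exception") is raised for invalid operations (FE_INVALID) as well, not only for division by zero. In Timo's gdb session for item 2, `in.strain_rate` has length 0, so `in.strain_rate[i]` reads past the end of the vector. That is undefined behaviour, and the garbage it reads can plausibly trigger such an exception. As far as I recall, ASPECT's own material models only compute strain-rate-dependent viscosities when `in.strain_rate` is non-empty. The sketch below mirrors that check in standalone form; `StrainRate` (a plain 3x3 array standing in for deal.II's `SymmetricTensor<2,3>`) and `strain_rate_E2` are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for one strain-rate tensor; in the plugin this would be an
// element of in.strain_rate (a deal.II SymmetricTensor<2,3>).
using StrainRate = std::array<std::array<double, 3>, 3>;

// The expression from the plugin,
//   sqrt(0.5*(e00^2 + e11^2 + e22^2) + e01^2 + e02^2 + e12^2),
// evaluated only if point i actually has a strain rate. Returns
// std::nullopt instead of indexing past the end of an empty vector.
inline std::optional<double>
strain_rate_E2(const std::vector<StrainRate> &strain_rate, const std::size_t i)
{
  if (i >= strain_rate.size())
    return std::nullopt;

  const StrainRate &e = strain_rate[i];
  return std::sqrt(0.5 * (e[0][0] * e[0][0] + e[1][1] * e[1][1] + e[2][2] * e[2][2])
                   + e[0][1] * e[0][1] + e[0][2] * e[0][2] + e[1][2] * e[1][2]);
}
```

For a pure shear with `e01 = e10 = 0.5` and all other components zero, the function returns 0.5. For an empty vector it returns no value, and the caller can skip the viscosity computation for that evaluation.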
Best regards,
Lev Karatun.
2016-01-27 3:36 GMT-05:00 Timo Heister <heister at clemson.edu>:
> Lev,
>
> there are various issues with your setup. Please run in debug mode and
> fix the problems. In particular, you are doing divisions by zero in
> many places. Make sure you can run the same setup on a normal desktop
> before thinking about running on a parallel cluster. I am listing the
> first couple of problems I encountered:
>
> 1.
> Additional Information:
> Invalid character in field weak zone. Names of compositional fields
> should consist of a combination of letters, numbers and underscores.
>
> 2.
> Program received signal SIGFPE, Arithmetic exception.
> 0x00007fffd5e8de75 in aspect::MaterialModel::nz<3>::evaluate
> (this=0x15746e0, in=..., out=...) at
> /home/heister/Downloads/aspect-hang/timo2/nz.cc:77
> 77          double strainrate_E2 =
>               sqrt(0.5*(in.strain_rate[i][0][0]*in.strain_rate[i][0][0]
>                         +in.strain_rate[i][1][1]*in.strain_rate[i][1][1]
>                         +in.strain_rate[i][2][2]*in.strain_rate[i][2][2])
>                    + in.strain_rate[i][0][1]*in.strain_rate[i][0][1]
>                    + in.strain_rate[i][0][2]*in.strain_rate[i][0][2]
>                    + in.strain_rate[i][1][2]*in.strain_rate[i][1][2]);
> (gdb) p in.strain_rate
> $1 = std::vector of length 0, capacity 1
>
> 3.
> Program received signal SIGFPE, Arithmetic exception.
> 0x00007fffd5e8e2a8 in aspect::MaterialModel::nz<3>::evaluate
> (this=0x15746e0, in=..., out=...) at
> /home/heister/Downloads/aspect-hang/timo2/nz.cc:101
> 101           exp((activation_energies[j]+activation_volumes[j]*in.pressure[i])
>                   /(nvs[j]*R*in.temperature[i]));
> (gdb) p nvs
> $1 = std::vector of length 4, capacity 4 = {1.5, 1.5, 3, 3}
> (gdb) p j
> $2 = 0
> (gdb) p in.temperature
> $6 = std::vector of length 1, capacity 1 = {0}
>
>
> 4.
> Program received signal SIGFPE, Arithmetic exception.
> 0x00007fffd5e8e11f in aspect::MaterialModel::nz<3>::evaluate
> (this=0x15746e0, in=..., out=...) at
> /home/heister/Downloads/aspect-hang/timo2/nz.cc:87
> 87 double viscosity_MC =
> (1/(1/(sigma_y/(2*strainrate_E2)+eta_min)+1/eta_max));
> (gdb) p strainrate_E2
> $1 = 0
>
> 5.
> Program received signal SIGFPE, Arithmetic exception.
> 0x0000000000a62363 in
> aspect::Simulator<3>::local_assemble_advection_system
> (this=0x7fffffffb520, advection_field=..., viscosity_per_cell=...,
> cell=..., scratch=..., data=...) at
> /ssd/aspect-local/source/simulator/assembly.cc:2016
> 2016 data.local_rhs(i)
> (gdb) p reaction_term
> $2 = nan(0x4000000000000)
>
> On Tue, Jan 26, 2016 at 11:04 PM, Lev Karatun <lev.karatun at gmail.com> wrote:
> > Hi,
> >
> > The simulation that was run on 1 core resumed after a while (which never
> > happened before - sorry about the confusion) and produced an error at
> > time step 103 (attached). The same simulation on 8 cores is still hanging. The
> > simulation on 8 cores without free surface ran without problems.
> >
> > Best regards,
> > Lev Karatun.
> >
> > 2016-01-26 14:04 GMT-05:00 Lev Karatun <lev.karatun at gmail.com>:
> >>
> >> Hi Timo,
> >>
> >> I have found a good setup that reproduces the problem (after fixing the
> >> error that I had, that is). On both 8 cores and 1 core the simulation stops
> >> at time step 59. I attached the necessary files.
> >>
> >> Best regards,
> >> Lev Karatun.
> >>
> >> 2016-01-26 5:15 GMT-05:00 Lev Karatun <lev.karatun at gmail.com>:
> >>>
> >>> Hi Timo,
> >>>
> >>> I composed a lengthy email with answers to your questions, but then I
> >>> realized I had a mistake in the boundary conditions, so now I need a bit
> >>> more time to check whether the fixed ones work correctly.
> >>>
> >>> Thank you for your help.
> >>>
> >>> Best regards,
> >>> Lev Karatun.
> >>>
> >>> 2016-01-25 7:09 GMT-05:00 Timo Heister <timo.heister at gmail.com>:
> >>>>
> >>>> Lev,
> >>>>
> >>>> I'm not sure what could cause a hang like you observe. Your first goal
> >>>> should be to make this problem reproducible as quickly as possible
> >>>> with the smallest number of processors.
> >>>>
> >>>> > Disabling free surface (changing the top boundary to free-slip) makes
> >>>> > the simulation hang after 1 time step
> >>>>
> >>>> This is great. How many cores do you need to have this happen? Can you
> >>>> reduce it further? Does this happen with any of the included
> >>>> cookbooks? If not, please share your input file and all details
> >>>> (number of cores, when it hangs, etc.).
> >>>>
> >>>>
> >>>> On Mon, Jan 25, 2016 at 12:01 PM, Lev Karatun <lev.karatun at gmail.com> wrote:
> >>>> > Hi,
> >>>> >
> >>>> > (I already reported the problem last year
> >>>> > (http://lists.geodynamics.org/pipermail/aspect-devel/2015-June/000919.html),
> >>>> > but at that time it wasn't as severe. Now that I have moved to
> >>>> > higher-resolution models, it has gotten much worse.)
> >>>> >
> >>>> > The problem is that Aspect hangs after a (seemingly) random time step
> >>>> > without producing any error. I ran several tests in an attempt to narrow
> >>>> > down the problem, and this is what I found:
> >>>> > - Neither assigning a value to $TMP/$TMPDIR nor adding a "Number of
> >>>> >   grouped files" parameter helps
> >>>> > - Disabling the visualization postprocessor doesn't help
> >>>> > - Running Aspect in development mode produced the same results:
> >>>> >   hanging without any error messages
> >>>> > - Restarting the hanging simulation from a checkpoint helps partially:
> >>>> >   it gets past the time step at which it hung, but then it hangs again
> >>>> >   later
> >>>> > - Disabling free surface (changing the top boundary to free-slip)
> >>>> >   makes the simulation hang after 1 time step
> >>>> > - Changing the CFL number affects how long the simulation runs before
> >>>> >   hanging: at 0.05 it hangs at time step 13, at 0.1 at time step 56,
> >>>> >   and at 0.2 it ran without hanging up to time step 153 (with
> >>>> >   visualization disabled, though)
> >>>> >
> >>>> > Could you please help me solve this problem? It's really hurting the
> >>>> > progress of my research.
> >>>> > Thanks in advance!
> >>>> >
> >>>> > Best regards,
> >>>> > Lev Karatun.
> >>>> >
> >>>> > _______________________________________________
> >>>> > Aspect-devel mailing list
> >>>> > Aspect-devel at geodynamics.org
> >>>> > http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
> >>>
> >>>
> >>
> >
> >
>
>
>
> --
> Timo Heister
> http://www.math.clemson.edu/~heister/