[CIG-LONG] Gale 2.0 intermittent bug?
Walter Landry
walter at geodynamics.org
Tue Apr 10 15:38:21 PDT 2012
Hi George,
Cc'ing the list.
George Hilley <hilley at stanford.edu> wrote:
> Hi Walter,
>
> Sorry to keep bothering you. First, your last fix to Gale 2.0 was
> great, and fixed the hanging problem that we experienced after time
> step 3 of the model that I sent you. However, now that the model
> has been able to run longer, another crash has occurred. In this
> case, the crash occurred at time step 60, and I have reproduced this
> twice when starting the model from the initial time step. In an
> attempt to provide you with debugging information, I restarted the
> model at time step 58 to see if it would once again crash at time
> step 60. Unfortunately, when restarted from this time step, Gale
> seems to compute past the point where it crashed before, so the
> problem appears to be intermittent and to depend on the sequence
> (and perhaps duration) of the model run. I am therefore not sure how
> useful it would be to send you checkpoint files from later time
> steps to help you reproduce the problem quickly, but I am happy to
> do so if you think they would help.
>
> In any case, I attach the input and output files to this email for
> your reference. It took about 6 hours of compute time on 16 CPUs to
> get to time step 60, and so reproducing this problem on your end
> could take some time. I am sorry about this.
>
> As a side note, I had a similar problem with the previous version of
> Gale when running a yielding Drucker-Prager (D-P) model of a small 3D
> fault bend in central California. My workaround at the time was to
> repeatedly restart the model several time steps before the crash,
> wait for it to crash, and repeat. Crashes generally seemed to become
> more frequent as the simulation progressed, and they seemed to be
> preceded by the formation of a set of localized, high-strain-rate
> patches near the specified fault zone. This makes me wonder whether
> that problem was a numerical artifact. However, based on my limited
> experience, the intermittency of the problem (and the signal 11,
> i.e. segmentation fault, in the error file) seems more consistent
> with a memory-referencing issue.
>
> Please let me know if there is any more information I can provide on
> this end that is helpful.
What does the model look like just before it crashes? This sounds
like the mesh may have become distorted. When I ran the input
files you sent me earlier, I saw something like that. Could you post
the output files somewhere I can download them?
Cheers,
Walter Landry