[CIG-LONG] Gale 2.0 intermittent bug?

Walter Landry walter at geodynamics.org
Tue Apr 10 15:38:21 PDT 2012


Hi George,

Cc'ing the list.

George Hilley <hilley at stanford.edu> wrote:
> Hi Walter,
> 
> Sorry to keep bothering you.  First, your last fix to Gale 2.0 was
> great, and fixed the hanging problem that we experienced after time
> step 3 of the model that I sent you.  However, now that the model
> has been able to run longer, another crash has occurred.  In this
> case, the crash occurred at time step 60, and I have reproduced this
> twice when starting the model from the initial time step.  In an
> attempt to provide you with debugging information, I restarted the
> model at time step 58 to see if it would once again crash at time
> step 60.  Unfortunately, Gale seems to be computing past the prior
> problem when it has been started at this time step, and so the
> problem appears intermittent and depends on the sequence (and
> perhaps duration) of model evaluation.  Thus, I am not sure how much
> use it would be to provide you with the checkpoint files for later
> time steps in the model to help quickly reproduce the problem on
> your end, but I am happy to do this if you think it is useful.
> 
> In any case, I attach the input and output files to this email for
> your reference.  It took about 6 hours of compute time on 16 CPUs to
> get to time step 60, and so reproducing this problem on your end
> could take some time.  I am sorry about this.
> 
> As a side note, I did have a similar problem to this with the
> previous version of Gale when running a yielding D-P model of a
> small 3D fault bend in central California.  My solution at the time
> was to repeatedly restart the model several time steps before
> the crash, wait for it to crash, and repeat.  The frequency of
> crashes generally seemed to increase later in the simulation, and the
> crashes seemed to be preceded by the formation of a set of
> localized, high-strain-rate patches that developed in the vicinity
> of the specified fault zone.  This makes me wonder if that problem
> was related to a numerical artifact.  However, based on my limited
> experience, the intermittency of the problem (and the Signal 11 in
> the error file) seems more consistent with a memory referencing
> issue.
> 
> Please let me know if there is any more information I can provide on
> this end that is helpful.

What does the model look like just before it crashes?  This sounds
like the mesh may have gotten distorted.  When I ran the input
files you sent me earlier, I saw something like that.  Could you put
up the output files somewhere I can get them?

Cheers,
Walter Landry

