[CIG-LONG] gale : 1 input, 4 different outputs

Wed Feb 27 13:23:22 PST 2008

Walter Landry <walter at geodynamics.org> wrote:
> js at cp.dias.ie wrote:
> > Hi,
> > I recently ran the same model in Gale four times and it generated
> > four differing outputs. While run 1 and run 3 are quite similar, as are
> > run 2 and run 4, 1 and 3 both differ significantly from 2 and 4. When
> > Gale generates the material points for a model, does it do so randomly?
> > Is this the reason for the differences? If so, is it possible to use
> > the starting locations of the material points from one model in a
> > future model?
> 
> There are a few different ways that Gale can lay out the particles, one
> of which is random.  However, there should be a parameter "seed" that
> sets the pseudo-random number generator seed.  If you are getting
> different answers for the same input file and the same number of
> processors, then something might be wrong.  Can I see your input
> file?
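Why a fixed seed should make a random particle layout reproducible can be shown generically. The sketch below is not Gale's code: `layout_particles` and its arguments are hypothetical, and it simply uses Python's standard `random.Random` generator to place particles in a unit square. With the same seed the "random" layout is identical on every run, so if runs with a fixed seed still diverge, the cause must lie elsewhere.

```python
import random

def layout_particles(n, seed):
    """Hypothetical illustration: place n particles uniformly in the unit square.

    A private generator seeded with `seed` yields the same sequence of
    pseudo-random numbers every time, hence the same layout.
    """
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

a = layout_particles(5, seed=42)
b = layout_particles(5, seed=42)
c = layout_particles(5, seed=7)

print(a == b)  # True: same seed, identical layout
print(a == c)  # False: a different seed gives a different layout
```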

I figured out the problem.  The solver is not completely
deterministic.  I am attaching an email I got from Barry Smith (PETSc
maintainer).  If you modify the tolerance parameters so that Gale does
only one iteration, the differences are about the size of the
floating-point round-off error.  With normal tolerances, the difference
is already more significant after the first time step, because the
solver has to do many iterations to reach a solution.  Eventually,
after enough time steps, the runs evolve in significantly different
ways.

I think the root of the problem is that the solutions you are getting
on your grid are just not very accurate.  That gives the solver a lot
of leeway when finding a solution.  To fix this, you probably just need
to increase your resolution.  You may need to decrease the solver
tolerance as well.

Cheers,
Walter Landry
walter at geodynamics.org

Barry Smith <bsmith at mcs.anl.gov> wrote:
> >> When I run a Gale model multiple times on 16 processors, I get
> >> different answers.
> 
> 
>     Different answers? Or all equally correct approximate answers?
> 
>     You will NOT see exactly the same convergence rates because the
> order of the floating-point operations in the parallel dot product is not
> deterministic! What you will see is this: first the last few digits of the
> residual norm start to differ; as the iterations proceed, more of the
> digits in the residual differ; eventually the residual norms may look
> completely different (say 8.73e-10 vs 1.02e-11).  In floating-point
> precision, all of the answers (assuming no bugs) are equally correct.
> 
>    Barry
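Barry's point about the parallel dot product can be reproduced in a few lines of Python. This is an illustration, not PETSc code: the helper `partial_sums` is hypothetical, and the 16 "processes" are just contiguous chunks of one list. Each chunk's local sum is computed once; only the order in which those partial sums are combined changes, as it might in a parallel reduction. Because floating-point addition is not associative, the combined results can differ in the last digits, yet each stays within round-off of an accurately rounded reference sum.

```python
import math
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6

def partial_sums(x, y, nprocs):
    """Hypothetical helper: split the dot product x.y into nprocs contiguous
    chunks, as if each chunk lived on a different process, and return the
    local partial sums."""
    n = len(x)
    bounds = [n * p // nprocs for p in range(nprocs + 1)]
    return [sum(xi * yi for xi, yi in zip(x[lo:hi], y[lo:hi]))
            for lo, hi in zip(bounds[:-1], bounds[1:])]

rng = random.Random(0)
n = 100_000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]

parts = partial_sums(x, y, nprocs=16)
forward = sum(parts)              # combine partial sums in "rank" order
backward = sum(reversed(parts))   # same numbers, opposite order
shuffled_parts = parts[:]
rng.shuffle(shuffled_parts)
shuffled = sum(shuffled_parts)    # an arbitrary combination order

# Accurately rounded sum of the (already rounded) products, for comparison.
reference = math.fsum(xi * yi for xi, yi in zip(x, y))

for name, value in [("forward", forward), ("backward", backward),
                    ("shuffled", shuffled)]:
    print(f"{name:9s} {value:.17g}  diff from fsum: {value - reference:.3g}")
```

The differences printed in the last column are typically of order 1e-13 or smaller. Inside an iterative solver each such discrepancy feeds into the next iterate, which is consistent with the residual histories drifting apart over many iterations while every answer remains equally valid to floating-point precision.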