[CIG-SEISMO] SPECFEM3D: time per time step increasing during simulation
Martin van Driel
vandriel at tomo.ig.erdw.ethz.ch
Tue Oct 7 11:01:53 PDT 2014
Dear Brad,
I am not a SPECFEM user, but I wanted to mention that we saw a similar
effect in AxiSEM a while ago. We finally identified the cause:
denormal floats. Some reading:
http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x
When the wavefield is initialized with zeros, it passes through the
regime of denormal floats at the first rise of the P-wave. As the
wavefront expands, there are more and more denormal floats to process.
In AxiSEM, once the P-wave had arrived at the antipode, the simulation
returned to normal speed. In our case the difference was a factor of
three in performance.
We found several solutions that change how denormal floats are treated:
1) Compiler flags: -ffast-math for gfortran, -ftz for ifort. The Cray
compiler seems to have flush-to-zero enabled by default.
2) Alternatively, the behaviour can be changed using "IA intrinsics", see
https://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior
3) A simplistic solution is to initialize the displacement with some
value just above the denormal range; for single precision, 1e-30 or
1e-35 worked, if I recall correctly.
Our current solution in AxiSEM uses the IA intrinsics and essentially
consists of calling the function set_ftz() once at the very beginning
of the program:
https://github.com/geodynamics/axisem/blob/master/SOLVER/ftz.c
Hope this helps,
Martin
On 10/07/2014 06:39 PM, Brad Aagaard wrote:
> SPECFEM3D users and developers,
>
> I am finding that the average time per time step in a SPECFEM3D
> simulation is increasing as the simulation progresses:
>
> Time step # 400
> Time: -1.002500 seconds
> Elapsed time in seconds = 135.029711008072
> Elapsed time in hh:mm:ss = 0 h 02 m 15 s
> Mean elapsed time per time step in seconds = 0.337574277520180
>
> Time step # 800
> Time: -2.4999999E-03 seconds
> Elapsed time in seconds = 420.503839015961
> Elapsed time in hh:mm:ss = 0 h 07 m 00 s
> Mean elapsed time per time step in seconds = 0.525629798769951
>
> Time step # 1200
> Time: 0.9975000 seconds
> Elapsed time in seconds = 854.967207908630
> Elapsed time in hh:mm:ss = 0 h 14 m 14 s
> Mean elapsed time per time step in seconds = 0.712472673257192
>
> Time step # 1600
> Time: 1.997500 seconds
> Elapsed time in seconds = 1439.92759609222
> Elapsed time in hh:mm:ss = 0 h 23 m 59 s
> Mean elapsed time per time step in seconds = 0.899954747557640
>
> This behavior seems very odd because I would expect the work per time
> step to be constant. The job is running on 4 compute nodes (32 cores
> total) and easily fits in memory. I don't see any anomalous behavior in
> the cluster diagnostics (CPU load, network traffic, etc.) consistent
> with an increasing workload. I have forked off the git master branch to
> my own seismic velocity model.
>
> Has this behavior been observed before?
>
> I can try turning off output to see if that isolates the problem. Does
> anyone have any other suggestions?
>
> Thanks,
> Brad Aagaard
> _______________________________________________
> CIG-SEISMO mailing list
> CIG-SEISMO at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/cig-seismo