[CIG-SEISMO] SPECFEM3D: time per time step increasing during simulation
Martin van Driel
vandriel at tomo.ig.erdw.ethz.ch
Tue Oct 7 11:01:53 PDT 2014
Dear Brad,
I am not a SPECFEM user, but I wanted to mention that we saw a similar
effect in AxiSEM a while ago. We finally identified the cause:
denormal floats. Some reading:
http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x
When the wavefield is initialized with zeros, it passes through the
regime of denormal floats at the first rise of the P-wave. As the
wavefront expands, there are more and more denormal floats to process.
In AxiSEM, once the P-wave had arrived at the antipode, the simulation
returned to normal speed. In our case the difference was a factor of
three in performance.
We found several solutions that change how denormal floats are treated:
1) Compiler flags: -ffast-math for gfortran, -ftz for ifort. The Cray
compiler seems to have flush-to-zero enabled by default.
2) Alternatively, the behaviour can be changed using "IA intrinsics", see
https://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior
3) A simplistic solution is to initialize the displacement with some
value just above the denormal range; for single precision, 1e-30 or
1e-35 worked, if I recall correctly.
Our current solution in AxiSEM uses the IA intrinsics and essentially
consists of calling the function set_ftz() once at the very beginning
of the program:
https://github.com/geodynamics/axisem/blob/master/SOLVER/ftz.c
Hope this helps,
Martin
On 10/07/2014 06:39 PM, Brad Aagaard wrote:
> SPECFEM3D users and developers,
>
> I am finding that the average time per time step in a SPECFEM3D
> simulation is increasing as the simulation progresses:
>
> Time step # 400
> Time: -1.002500 seconds
> Elapsed time in seconds = 135.029711008072
> Elapsed time in hh:mm:ss = 0 h 02 m 15 s
> Mean elapsed time per time step in seconds = 0.337574277520180
>
> Time step # 800
> Time: -2.4999999E-03 seconds
> Elapsed time in seconds = 420.503839015961
> Elapsed time in hh:mm:ss = 0 h 07 m 00 s
> Mean elapsed time per time step in seconds = 0.525629798769951
>
> Time step # 1200
> Time: 0.9975000 seconds
> Elapsed time in seconds = 854.967207908630
> Elapsed time in hh:mm:ss = 0 h 14 m 14 s
> Mean elapsed time per time step in seconds = 0.712472673257192
>
> Time step # 1600
> Time: 1.997500 seconds
> Elapsed time in seconds = 1439.92759609222
> Elapsed time in hh:mm:ss = 0 h 23 m 59 s
> Mean elapsed time per time step in seconds = 0.899954747557640
>
> This behavior seems very odd because I would expect the work per time
> step to be constant. The job is running on 4 compute nodes (32 cores
> total) and easily fits in memory. I don't see any anomalous behavior in
> the cluster diagnostics (CPU load, network traffic, etc.) consistent
> with an increasing workload. I have forked off the git master branch to
> my own seismic velocity model.
>
> Has this behavior been observed before?
>
> I can try turning off output to see if that isolates the problem. Does
> anyone have any other suggestions?
>
> Thanks,
> Brad Aagaard
> _______________________________________________
> CIG-SEISMO mailing list
> CIG-SEISMO at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/cig-seismo