[CIG-SEISMO] SPECFEM3D: time per time step increasing during simulation

Dimitri Komatitsch komatitsch at lma.cnrs-mrs.fr
Tue Oct 7 12:46:52 PDT 2014


Hi Brad and Martin, Hi all,

Yes, that comes from gradual underflow (denormal handling), which is
very slow and unfortunately enabled by default on some Intel
processors; one must turn it off. SPECFEM has three options to handle
that (you can use all of them, it does not hurt; in principle they are
on by default though, thus I am surprised they seem to be off on your
machine):

1/ compile with -ftz (Flush-to-Zero) for the Intel compiler; the
-ffast-math option for gfortran sometimes works and sometimes does
not, for some reason

2/ call a C function I wrote based on some routines I found on the Web
(a minimal sketch of the kind of calls it makes is included after this
list):
https://github.com/geodynamics/specfem3d/blob/devel/src/shared/force_ftz.c

3/ set the initial field to some small value instead of zero before the 
time loop to avoid underflows; here is how we do it in SPECFEM (the flag 
is defined in setup/constants.h.in, and is on by default):

! on some processors it is necessary to suppress underflows
! by using a small initial field instead of zero
   logical, parameter :: FIX_UNDERFLOW_PROBLEM = .true.

   if(FIX_UNDERFLOW_PROBLEM) displ(:,:) = VERYSMALLVAL

(where VERYSMALLVAL is 1.d-24 or so)
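
For reference, here is a minimal sketch of the kind of thing
force_ftz.c (and AxiSEM's ftz.c below) does on x86 hardware: it sets
the SSE flush-to-zero and denormals-are-zero control bits so that
underflowing results and denormal inputs are treated as zero. The
function name set_ftz_ and the trailing underscore used to call it
from Fortran are assumptions made here for illustration only; the
files linked above and below are the actual, more portable
implementations.

/* illustration only: force flush-to-zero / denormals-are-zero on
   x86 SSE hardware; call it once before the time loop */
#ifdef __SSE__
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#endif
#ifdef __SSE3__
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */
#endif

/* the trailing underscore is one common Fortran name-mangling
   convention, assumed here so that "call set_ftz()" works */
void set_ftz_(void)
{
#ifdef __SSE__
  /* results that would underflow are flushed to zero instead of
     becoming denormal numbers */
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#endif
#ifdef __SSE3__
  /* denormal inputs are treated as zero as well */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
#endif
}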
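
To see how large the effect can be, here is a small stand-alone C test
(not part of SPECFEM; the loop count and values are arbitrary) that
performs the same number of multiplications on a float kept either in
the normal range or in the denormal range. On processors that handle
denormals in microcode, and when compiled without -ftz / -ffast-math,
the denormal case is typically several times slower. It also shows why
a start value such as 1.e-24 is safe: it is far above the
single-precision denormal threshold of about 1.2e-38.

/* stand-alone illustration of the denormal slowdown (not SPECFEM code) */
#include <stdio.h>
#include <time.h>

static double time_loop(float start)
{
  volatile float x = start;   /* volatile so the loop is not optimized away */
  clock_t t0 = clock();
  for (long i = 0; i < 20000000L; i++) {
    x *= 0.5f;   /* halve ... */
    x *= 2.0f;   /* ... and restore, so x stays in its original range */
  }
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
  /* 1.e-24f is a normal float (like VERYSMALLVAL above);
     1.e-40f is below the single-precision normal limit, i.e. denormal */
  printf("normal start   (1e-24): %.2f s\n", time_loop(1.e-24f));
  printf("denormal start (1e-40): %.2f s\n", time_loop(1.e-40f));
  return 0;
}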

Best regards,
Dimitri.

On 10/07/2014 08:01 PM, Martin van Driel wrote:
> Dear Brad,
>
> I am not a SPECFEM user, but I wanted to mention that we saw a
> similar effect in AxiSEM a while ago. We finally identified the reason:
> denormal floats. Some reading:
>
> http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x
>
> When the wavefield is initialized with zeros, it goes through the regime
> of denormal floats at the first rise of the P-wave. As the wavefront
> expands, there are more and more denormal floats to treat. In AxiSEM,
> once the P-wave had arrived at the antipode, the simulation went back to
> normal speed. The difference in our case was a factor of three in
> performance.
>
> We found several solutions that change how denormal floats are treated.
>
> 1) Compiler flags: -ffast-math for gfortran, -ftz for ifort. Cray seems
> to have flush-to-zero enabled by default.
>
> 2) Alternatively, the behaviour can be changed using "IA intrinsics", see
>
> https://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior
>
> 3) A simplistic solution is to initialize the displacement with some
> value that is just above the denormal range; for single precision,
> 1e-30 or 1e-35 worked, if I recall correctly.
>
> Our current solution in AxiSEM is the IA intrinsics, and it essentially
> consists of calling the function set_ftz() once at the very beginning of
> the program:
>
> https://github.com/geodynamics/axisem/blob/master/SOLVER/ftz.c
>
> Hope this helps,
> Martin
>
>
> On 10/07/2014 06:39 PM, Brad Aagaard wrote:
>> SPECFEM3D users and developers,
>>
>> I am finding that the average time per time step in a SPECFEM3D
>> simulation is increasing as the simulation progresses:
>>
>>   Time step #          400
>>   Time:   -1.002500      seconds
>>   Elapsed time in seconds =    135.029711008072
>>   Elapsed time in hh:mm:ss =    0 h 02 m 15 s
>>   Mean elapsed time per time step in seconds =   0.337574277520180
>>
>>   Time step #          800
>>   Time:  -2.4999999E-03  seconds
>>   Elapsed time in seconds =    420.503839015961
>>   Elapsed time in hh:mm:ss =    0 h 07 m 00 s
>>   Mean elapsed time per time step in seconds =   0.525629798769951
>>
>>   Time step #         1200
>>   Time:   0.9975000      seconds
>>   Elapsed time in seconds =    854.967207908630
>>   Elapsed time in hh:mm:ss =    0 h 14 m 14 s
>>   Mean elapsed time per time step in seconds =   0.712472673257192
>>
>>   Time step #         1600
>>   Time:    1.997500      seconds
>>   Elapsed time in seconds =    1439.92759609222
>>   Elapsed time in hh:mm:ss =    0 h 23 m 59 s
>>   Mean elapsed time per time step in seconds =   0.899954747557640
>>
>> This behavior seems very odd because I would expect the work per time
>> step to be constant. The job is running on 4 compute nodes (32 cores
>> total) and easily fits in memory. I don't see any anomalous behavior on
>> the cluster diagnostics (CPU load, network traffic, etc.) consistent with
>> an increasing workload. I have forked off the git master branch to add
>> my own seismic velocity model.
>>
>> Has this behavior been observed before?
>>
>> I can try turning off output to see if that isolates the problem. Does
>> anyone have any other suggestions?
>>
>> Thanks,
>> Brad Aagaard

-- 
Dimitri Komatitsch
CNRS Research Director (DR CNRS), Laboratory of Mechanics and Acoustics,
UPR 7051, Marseille, France    http://komatitsch.free.fr

