[CIG-SEISMO] Roundoff inconsistency in assembling routines

surendra at caltech.edu surendra at caltech.edu
Thu Oct 18 20:07:41 PDT 2012


Dear Dimitri,

We have noticed that different processors produce results with different round-off errors during the force assembly in SPECFEM3D. This apparently minor inconvenience actually lead to significant inconsistencies in our dynamic rupture simulations if parallel interfaces cut through faults. The non-linearity of the dynamic rupture problem likely amplified the impact of round-off inconsistencies, beyond what would be noticeable in a linear wave propagation simulation. However, as the issue might cumulate through the iterations in tomography and source inversions with SPECFEM3D, we thought it would be useful to share with you and other users what we found and how we fixed it.

Attached are three figures to illustrate how the issue appears in our rupture simulation results. All are views of the fault plane:
 1) fault_processor_partition.png: each color indicates a different processor
 2) sliprate_snapshot_glitches.png: snapshot of slip velocity using the original assemble routines of SPECFEM3D
 3) sliprate_snapshot_afterFixing_assemblingRoutine.png: snapshot of slip velocity using our modified assemble routine
Comparing carefully figures (2) and (3), there are some glitches in figure (2) that disappeared after using the new assembling routine. One of the most perceivable glitches is close to (x,z)=(5,-5). Remarkably, all glitches are located at intersections between three processors as can be seen figure (1). With time, these glitches spread over neighboring nodes and pollute the solution.

Here is the origin of the issue. In the assembling routines of SPECFEM3D each processor adds the contribution of its neighbors in the order of their rank. However, because that sum is added to the local processor value, the result is that the overall order in which the contribution from each processor (including self) is added is not the same on all processors. For instance, at a triple junction between processors 1, 2 and 3, the sum is
v1+v2+v3 in proc 1, but
v2+v1+v3 in proc 2, and
v3+v1+v2 in proc 3.
Adding floating point numbers in different orders leads to different round-off errors.

We fixed this issue with a few lines of code to insure that the order of sums is the same on all processors. We created a new routine "assemble_MPI_vector_ext_mesh_w_ordered" in assemble_MPI_vector.f90 file. The file can be found on the SVN server from the following link:
http://geodynamics.org/wsvn/cig/seismo/3D/FAULT_SOURCE/branches/new_fault_db/src/assemble_MPI_vector.f90?op=file&rev=0&sc=0

We suspect this version improves the accuracy and consistency of the code without affecting its performance. It might be worth using it in the main branch of SPECFEM3D too.

Best regards,

Surendra Somala and Jean-Paul Ampuero - Caltech

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://geodynamics.org/pipermail/cig-seismo/attachments/20121018/1586b99c/attachment-0001.htm 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sliprate_snapshot_afterFixing_assemblingRoutine.png
Type: image/png
Size: 27609 bytes
Desc: not available
Url : http://geodynamics.org/pipermail/cig-seismo/attachments/20121018/1586b99c/attachment-0003.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fault_processor_partition.png
Type: image/png
Size: 9592 bytes
Desc: not available
Url : http://geodynamics.org/pipermail/cig-seismo/attachments/20121018/1586b99c/attachment-0004.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sliprate_snapshot_glitches.png
Type: image/png
Size: 27607 bytes
Desc: not available
Url : http://geodynamics.org/pipermail/cig-seismo/attachments/20121018/1586b99c/attachment-0005.png 


More information about the CIG-SEISMO mailing list