%0 Article
%J ACM Trans. Math. Softw.
%D 2019
%T Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
%A Kronbichler, Martin
%A Kormann, Katharina
%C New York, NY, USA
%I Association for Computing Machinery
%N 3
%U https://doi.org/10.1145/3325864
%V 45
%8 aug
%1 10.1145/3325864
%K sum factorization, parallelization, vectorization, Matrix free method, finite element method, discontinuous Galerkin method, SPECFEM3D
%X We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.