Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation

Mudalige, G. R. and Reguly, I. Z. and Jammy, S. P. and Jacobs, C. T. and Giles, M. B. (2019) Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 131. pp. 130-146. ISSN 0743-7315


SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and extensively used within the UK Turbulence Consortium. It is capable of performing Direct Numerical Simulations (DNS) or Large Eddy Simulations (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured mesh geometries. SBLI presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging massively parallel hardware platforms. In this paper we present research towards achieving this goal through the OPS embedded domain-specific language. OPS targets the domain of multi-block structured mesh applications. It provides an API embedded in C/C++ and Fortran and makes use of automatic code generation and compilation to produce executables capable of running on a range of parallel hardware systems. The core functionality of SBLI is captured using a new framework called OpenSBLI, which enables a developer to declare the partial differential equations using Einstein notation and then automatically carry out discretization and generation of OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstraction approach, we demonstrate how new opportunities for further optimization can be gained, such as fine-tuning the computational intensity and reducing data movement, and how these optimizations can be applied automatically. Performance results demonstrate that there is no performance loss due to the high-level development strategy with OPS and OpenSBLI, with performance matching or exceeding the hand-tuned original code on all CPU nodes tested. The data movement optimizations provide over 3× speedups on CPU nodes, while GPUs provide 5× speedups over the best performing CPU node. The OPS-generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a Cray XK7 (Titan at ORNL).
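
The separation the abstract describes, between what is computed at each mesh point and how the mesh is traversed, can be illustrated with a minimal sketch. The code below is not the OPS or OpenSBLI API: the names `central_diff_kernel` and `par_loop` are hypothetical, the mesh is a single 1D block, and the loop is serial. It only shows why keeping the per-point kernel independent of the loop driver leaves a code generator free to change the traversal (threads, GPUs, tiling) without touching the numerics.

```python
# Illustrative sketch only. `central_diff_kernel` and `par_loop` are
# hypothetical names and are NOT part of the OPS or OpenSBLI APIs.


def central_diff_kernel(u, i, dx):
    """Second-order central difference of u at interior point i."""
    return (u[i + 1] - u[i - 1]) / (2.0 * dx)


def par_loop(kernel, field, dx, halo=1):
    """Apply `kernel` at every interior point of a 1D structured block.

    The traversal strategy lives here, so it could be replaced (threaded,
    GPU, tiled) without changing the kernel. Halo points are left at 0.0.
    """
    n = len(field)
    out = [0.0] * n
    for i in range(halo, n - halo):
        out[i] = kernel(field, i, dx)
    return out


# Sample field u(x) = x^2 on a uniform grid with spacing dx = 0.5.
dx = 0.5
u = [(i * dx) ** 2 for i in range(5)]

# The central difference is exact for quadratics, so interior values equal 2x.
dudx = par_loop(central_diff_kernel, u, dx)
print(dudx)
```

In this sketch the kernel only declares which neighbours it reads (here, one point on each side), which is the kind of access information a framework needs in order to size halos and schedule data movement.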

Item Type: Article
Uncontrolled Keywords: DSLs, Multi-block Structured mesh applications, OPS, SBLI, Finite Difference Methods
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: MTMT SWORD
Date Deposited: 18 Sep 2019 12:39
Last Modified: 18 Sep 2019 12:39
