Thoroughly exploring GPU buffering options for stencil code by using an efficiency measure and a performance model

Stencil computations form the basis of computer simulations across almost every field of science, including computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to exploit the high computational throughput and memory bandwidth of GPUs, but only if data buffering and related issues are handled properly. Finding a good code generation strategy presents a number of challenges, one of which is making the best use of memory. GPUs have several types of on-chip storage, including registers, shared memory, and a read-only cache. Choosing a storage type and how to use it (a buffering strategy) for each stencil array (grid function, GF) requires not only a good understanding of its stencil pattern, but also of the efficiency of each type of storage for that GF, to avoid squandering storage that would be more beneficial to another GF. For a stencil computation with N GFs, the total number of possible assignments is b^N, where b is the number of buffering strategies. Our code-generation framework supports five buffering strategies (b = 5). Large, complex stencil kernels may consist of dozens of GFs, resulting in significant search overhead. In this work, we present an analytic performance model for stencil computations on GPUs and study the behavior of the read-only cache and the L2 cache. We then propose an efficiency-based assignment algorithm that scores a change in buffering strategy for a GF using a combination of (a) the predicted execution time and (b) on-chip storage usage. Using this scoring, an assignment for N GFs can be determined in 2 steps. Results show that the performance model has good accuracy and that the assignment strategy is highly efficient.
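To make the search-space contrast concrete, the following is a minimal sketch, not the paper's actual algorithm or performance model. It contrasts exhaustive enumeration of b^N assignments with a greedy, efficiency-style pass that scores each candidate strategy by predicted time saved per unit of on-chip storage consumed. The strategy names, cost tables, storage budget, and scoring details are all hypothetical placeholders standing in for the analytic model.

```python
from itertools import product

# Five hypothetical buffering strategies (b = 5, matching the framework's count).
STRATEGIES = ["register", "shared", "read_only_cache", "l2", "none"]

def predicted_time(assignment):
    # Placeholder for the analytic performance model: a toy per-GF cost table.
    cost = {"register": 1.0, "shared": 1.1, "read_only_cache": 1.3,
            "l2": 1.6, "none": 2.0}
    return sum(cost[s] for s in assignment)

def storage_used(assignment):
    # Placeholder on-chip storage cost per strategy (arbitrary units).
    use = {"register": 3, "shared": 2, "read_only_cache": 1, "l2": 0, "none": 0}
    return sum(use[s] for s in assignment)

def exhaustive(n_gfs):
    # Exhaustive search examines all b**n_gfs possible assignments.
    best = min(product(STRATEGIES, repeat=n_gfs), key=predicted_time)
    return list(best)

def efficiency_based(n_gfs, budget=6):
    # Greedy sketch of the efficiency idea: score each candidate change by
    # (predicted time saved) / (extra storage used), within a storage budget.
    assignment = ["none"] * n_gfs
    for i in range(n_gfs):
        best_strategy, best_score = "none", 0.0
        for s in STRATEGIES:
            trial = assignment[:i] + [s] + assignment[i + 1:]
            if storage_used(trial) > budget:
                continue  # would exceed the on-chip storage budget
            saved = predicted_time(assignment) - predicted_time(trial)
            extra = storage_used(trial) - storage_used(assignment)
            score = saved / extra if extra > 0 else saved
            if score > best_score:
                best_strategy, best_score = s, score
        assignment[i] = best_strategy
    return assignment
```

With the toy cost tables above, the greedy pass visits each GF once and considers b candidates per GF, so its work grows linearly in the number of GFs rather than exponentially as with `exhaustive`.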

Publication Source (Journal or Book title)

IEEE Transactions on Multi-Scale Computing Systems
