Analyzing performance within asynchronous many-task-based runtime systems is challenging because millions of tasks are launched concurrently. For long-running simulations in particular, the amount of data collected becomes overwhelming. We study HPX, together with its performance-counter framework and the Autonomic Performance Environment for Exascale (APEX), to collect performance and energy-consumption data. We added HPX application-specific performance counters to Octo-Tiger, a full 3-D adaptive multigrid astrophysics code. This enables the combined visualization of physical and performance data to highlight bottlenecks with respect to different solvers. We examine the overhead introduced by these measurements, which is around 1% of the overall application runtime. We perform a resolution study for four different levels of refinement and analyze the application's performance with respect to adaptive grid refinement. Because the measurement overhead is small, performance data and physical properties can be used together with the goal of improving the code's performance. All results were obtained on NERSC's Cori, the Louisiana Optical Network Infrastructure's QueenBee2, and Indiana University's Big Red 3.
Publication Source (Journal or Book title)
Computing in Science and Engineering
Diehl, P., Marcello, D., Amini, P., Kaiser, H., Shiber, S., Clayton, G., Frank, J., Dais, G., Pfluger, D., Eder, D., Koniges, A., & Huck, K. (2021). Performance Measurements within Asynchronous Task-Based Runtime Systems: A Double White Dwarf Merger as an Application. Computing in Science and Engineering, 23(3), 73-81. https://doi.org/10.1109/MCSE.2021.3073626