Fast GPU Graph Contraction by Combining Efficient Shallow Searches and Post-Culling
Efficient GPU single-source shortest-path (SSSP) queries of road network graphs can be realized by a technique called PHAST (Delling et al.), in which the graph is contracted (pre-processed using Geisberger's Contraction Hierarchies) once and the resulting contracted graph is queried as needed. PHAST accommodates GPUs' parallelism requirements well, resulting in efficient queries. For situations in which a graph is not available well in advance or changes frequently, contraction time itself becomes significant. Karimi et al. recently described a GPU contraction technique, CU-CH, which significantly reduces the contraction time of small- to medium-sized graphs, reporting a speedup of over 20× on an NVIDIA P100 GPU. However, CU-CH realizes little speedup on larger graphs, such as DIMACS' USA and W. Europe graphs. The obstacle to faster contraction of larger graphs is the frequently performed witness path search (WPS). A WPS for a node determines which shortcut edges need to be added between the node's neighbors to maintain distances after the removal of the node. GPUs' strict thread convergence requirements and limited scratchpad preclude the bidirectional Dijkstra approach used in CPU implementations. Instead, CU-CH uses a two-hop-limit WPS tightly coded to fit GPU shared storage and to maintain thread convergence. Where two hops are sufficient, speedup is high, but for larger graphs the hop limit exacts a toll due to the accumulation of unneeded shortcuts. The problem is overcome here by retaining the efficient CU-CH WPS but using it both for its original purpose and to identify unnecessary shortcuts added in prior steps. The unnecessary shortcuts are culled (removed). Culling shortcuts not only dramatically reduces the time needed to contract a graph but also improves the quality of the contracted graph. For smaller graphs such as DIMACS Cal (travel time), contraction is 61% faster than with CU-CH. For the DIMACS Europe and USA graphs, contraction is 40× and 12× faster, respectively. SSSP query times also improve dramatically, approaching those obtained on aggressively contracted graphs. The speedup over Geisberger's CPU code is over 100× for NVIDIA V100 GPUs on most graphs tried.
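The interplay between a hop-limited witness path search and post-culling can be illustrated with a small sequential sketch. This is not the CU-CH GPU code: it assumes an undirected graph with positive weights stored as nested dictionaries, and the names two_hop_witness, contract_node and cull_shortcuts are hypothetical. In the example, the true witness for the pair (a, b) is the three-hop path a-p-q-b, so the two-hop search misses it and adds an unneeded shortcut, which a later two-hop search is able to cull.

```python
INF = float("inf")

def two_hop_witness(graph, u, w, limit, skip=None, use_direct=True):
    """True if a u->w path of at most two edges, avoiding `skip`, has length <= limit."""
    if use_direct and graph[u].get(w, INF) <= limit:
        return True
    for x, d_ux in graph[u].items():
        if x == skip or x == w:
            continue
        d_xw = graph[x].get(w)
        if d_xw is not None and d_ux + d_xw <= limit:
            return True
    return False

def contract_node(graph, v):
    """Remove v, adding a shortcut for each neighbor pair with no two-hop witness."""
    nbrs = list(graph[v].items())
    added = []
    for i, (u, d_uv) in enumerate(nbrs):
        for w, d_vw in nbrs[i + 1:]:
            cost = d_uv + d_vw
            if not two_hop_witness(graph, u, w, cost, skip=v):
                graph[u][w] = graph[w][u] = cost
                added.append((u, w, cost))
    for u, _ in nbrs:
        del graph[u][v]
    del graph[v]
    return added

def cull_shortcuts(graph, shortcuts):
    """Drop shortcuts that a two-hop path in the current graph makes redundant."""
    culled = []
    for u, w, cost in shortcuts:
        still_there = u in graph and graph[u].get(w) == cost
        if still_there and two_hop_witness(graph, u, w, cost, use_direct=False):
            del graph[u][w]
            del graph[w][u]
            culled.append((u, w))
    return culled

# Toy graph (assumed data): v links a and b at cost 10; a-p-q-b costs only 3.
graph = {
    "v": {"a": 5, "b": 5},
    "a": {"v": 5, "p": 1},
    "b": {"v": 5, "q": 1},
    "p": {"a": 1, "q": 1},
    "q": {"p": 1, "b": 1},
}
shortcuts = contract_node(graph, "v")   # adds the unneeded shortcut a-b (10)
shortcuts += contract_node(graph, "p")  # adds the needed shortcut a-q (2)
culled = cull_shortcuts(graph, shortcuts)
print(culled)
```

Once p has been contracted, the path a-q-b is only two hops long, so the same shallow search that could not avoid the a-b shortcut earlier can now detect that it is unnecessary. The sketch only shows this effect on one thread; how the searches are scheduled and fitted into GPU shared storage is outside its scope.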
Publication Source (Journal or Book title)
2020 IEEE High Performance Extreme Computing Conference, HPEC 2020
Karimi, R., Koppelman, D., & Michael, C. (2020). Fast GPU Graph Contraction by Combining Efficient Shallow Searches and Post-Culling. 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020. https://doi.org/10.1109/HPEC43674.2020.9286141