Expressing High Performance Irregular Computations on the GPU

A Google TechTalk, presented by Muhammad Osama, 2022/06/07

ABSTRACT: GPUs excel at data analytics problems with ample, regular parallelism. Problems with fine-grained irregular parallelism (where neighboring data elements are assigned different amounts of work), such as those in sparse machine learning and linear algebra, numerical simulation, and graph analytics, are more challenging to map to the GPU. Today’s best GPU implementations of irregular-parallel problems employ sophisticated low-level primitives to map irregular amounts of work to the GPU’s compute units. Generally, these implementations build application-specific load-balancing techniques that are tightly coupled with application logic. The result is complex code whose load-balancing capabilities cannot easily be reused in other applications. We describe our implementation of a standalone fine-grained load-balancing framework for GPUs that can address these irregular problems. In our work, we focus on two primary problems: (1) an abstraction that eases programmer complexity by separating the concerns of load balancing from work processing, and (2) interfaces that enable programmers to target load-balanced applications, load-balanced kernel launches, and/or in-kernel load-balancing collectives.

About the Speaker: Muhammad Osama is a Ph.D. candidate advised by Professor John Owens in the Electrical and Computer Engineering department at the University of California, Davis. Muhammad’s current research focuses on General Purpose GPU Computing (GPGPU), specifically GPU load balancing for dense and sparse workloads. He is also the lead developer of Gunrock, a GPU graph analytics library, and has been a part of DARPA’s HIVE (a sparse computation accelerator) and SDH (Software Defined Hardware) projects. Muhammad graduated from the University of Washington, Seattle, with a bachelor’s degree in Electrical Engineering, where his research focused on real-time graphics.
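
To make the "irregular parallelism" and "separation of concerns" ideas concrete, below is a minimal CUDA sketch (not the speaker's actual framework or its API) using sparse matrix-vector multiply over a CSR matrix. The first kernel assigns one thread per row, so a heavy row stalls its neighbors; the second rebalances by assigning one thread per nonzero, which is the kind of work-to-thread remapping a load-balancing framework would supply (here the nonzero-to-row map nz_to_row is simply precomputed on the host to keep the sketch short). All names are illustrative.

// Illustrative sketch only: these kernels are hand-written, not the talk's
// load-balancing library. They contrast two mappings of the same SpMV work.

#include <cstdio>
#include <cuda_runtime.h>

// Naive CSR SpMV: one thread per row. Rows with many nonzeros keep a thread
// busy while its neighbors sit idle -- the classic irregular-parallelism issue.
__global__ void spmv_row_per_thread(int num_rows,
                                    const int* row_offsets,
                                    const int* col_indices,
                                    const float* values,
                                    const float* x,
                                    float* y) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= num_rows) return;
  float sum = 0.0f;
  for (int nz = row_offsets[row]; nz < row_offsets[row + 1]; ++nz)
    sum += values[nz] * x[col_indices[nz]];
  y[row] = sum;
}

// Rebalanced SpMV: one thread per *nonzero*, so every thread does the same
// amount of work. A real framework would derive the owning row of each
// nonzero on the fly (e.g., with a search over row_offsets); this sketch
// takes it precomputed as nz_to_row.
__global__ void spmv_nonzero_per_thread(int num_nonzeros,
                                        const int* nz_to_row,
                                        const int* col_indices,
                                        const float* values,
                                        const float* x,
                                        float* y) {
  int nz = blockIdx.x * blockDim.x + threadIdx.x;
  if (nz >= num_nonzeros) return;
  atomicAdd(&y[nz_to_row[nz]], values[nz] * x[col_indices[nz]]);
}

int main() {
  // Tiny 3x3 CSR matrix with an irregular row (row 1 holds 3 of the 5 nonzeros).
  const int num_rows = 3, num_nonzeros = 5;
  int   h_row_offsets[] = {0, 1, 4, 5};
  int   h_col_indices[] = {0, 0, 1, 2, 2};
  int   h_nz_to_row[]   = {0, 1, 1, 1, 2};
  float h_values[]      = {1, 2, 3, 4, 5};
  float h_x[]           = {1, 1, 1};
  float h_y[num_rows];

  int *d_row_offsets, *d_col_indices, *d_nz_to_row;
  float *d_values, *d_x, *d_y;
  cudaMalloc(&d_row_offsets, sizeof(h_row_offsets));
  cudaMalloc(&d_col_indices, sizeof(h_col_indices));
  cudaMalloc(&d_nz_to_row,   sizeof(h_nz_to_row));
  cudaMalloc(&d_values,      sizeof(h_values));
  cudaMalloc(&d_x,           sizeof(h_x));
  cudaMalloc(&d_y,           num_rows * sizeof(float));
  cudaMemcpy(d_row_offsets, h_row_offsets, sizeof(h_row_offsets), cudaMemcpyHostToDevice);
  cudaMemcpy(d_col_indices, h_col_indices, sizeof(h_col_indices), cudaMemcpyHostToDevice);
  cudaMemcpy(d_nz_to_row,   h_nz_to_row,   sizeof(h_nz_to_row),   cudaMemcpyHostToDevice);
  cudaMemcpy(d_values,      h_values,      sizeof(h_values),      cudaMemcpyHostToDevice);
  cudaMemcpy(d_x,           h_x,           sizeof(h_x),           cudaMemcpyHostToDevice);

  // Row-per-thread mapping.
  spmv_row_per_thread<<<1, 32>>>(num_rows, d_row_offsets, d_col_indices, d_values, d_x, d_y);
  cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
  printf("row-per-thread:     y = %.0f %.0f %.0f\n", h_y[0], h_y[1], h_y[2]);

  // Nonzero-per-thread mapping (y must start at zero for the atomic adds).
  cudaMemset(d_y, 0, num_rows * sizeof(float));
  spmv_nonzero_per_thread<<<1, 32>>>(num_nonzeros, d_nz_to_row, d_col_indices, d_values, d_x, d_y);
  cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
  printf("nonzero-per-thread: y = %.0f %.0f %.0f\n", h_y[0], h_y[1], h_y[2]);
  return 0;
}

In the talk's framing, the work-processing logic (multiply a value by the matching x entry and accumulate into y) stays the same in both kernels; only the mapping of work to threads changes. A standalone load-balancing framework lets the programmer write the former once and swap the latter freely.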