[Paper Reading] ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs (arXiv'24)
This blog is a write-up of the paper “ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs” from arXiv'24. Motivation Some workloads (e.g., Simulation Engines for Deep RL, Dynamic DNNs) cannot fully utilize the massive parallelism of GPUs (see Figure 1). The main reason is that these workloads contain lots of small kernels which cannot fully utilize the GPU, and these kernels are not executed concurrently, although most of them are independent and in theory can be executed concurrently. ...