[Paper Reading] ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs (arXiv'24)

This blog is a write-up of the paper “ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs” from arXiv'24. Motivation Some workloads (e.g., Simulation Engines for Deep RL, Dynamic DNNs) cannot fully utilize the massive parallelism of GPUs (see Figure 1). The main reason is that these workloads contain lots of small kernels which cannot fully utilize the GPU, and these kernels are not executed concurrently, although most of them are independent and in theory can be executed concurrently. ...

2024-02-07 · 7 min · Monsoon

[Paper Reading] GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud (PACT'22)

This blog is a write-up of the paper “GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud” from PACT'22. Motivation This paper focuses on the GPU sharing in cloud scenarios. Currently, existing GPU sharing techniques can be categorized into 2 types: Time-sharing means executing each concurrent VM on a full device in a round-robin fashion. Pros: Simple and mature. Cons: VMs could still under-utilize the hardware within each time slice. ...

2024-02-07 · 11 min · Monsoon