Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
Yaoyu Wang, Hankun Dai, Zhidong Yang, Junmin Xiao, Guangming Tan
We propose SparseRL, a deep reinforcement learning framework that generates high-performance CUDA code for sparse matrix operations, achieving significant improvements in both correctness and execution efficiency.
Abstract
Code generation is a crucial research area in the field of artificial intelligence, holding the potential to revolutionize software development and streamline programming processes. However, generating high-performance code that must execute quickly in low-latency scenarios remains a formidable challenge. Existing methods often struggle to account for the irregularity of input sparse data in sparse programs and the need for domain-specific architectural knowledge, leading to sub-optimal performance. To tackle these issues, we propose the SparseRL framework. SparseRL leverages deep reinforcement learning, treating a pre-trained language model as a stochastic policy. It takes the row and column indices of the non-zero elements of a sparse matrix as input and generates CUDA code for sparse matrix operations as output. We also introduce a domain-specific code generation mechanism for dynamic inputs, a sinusoidal embedding technique tailored to sparse matrices, and a hierarchical reward function that considers both code correctness and execution efficiency. Experimental results demonstrate that SparseRL achieves state-of-the-art performance. On sparse matrix-vector multiplication (SpMV) tasks, it improves the compilation rate by 20% compared to existing methods, and the generated code runs 30% faster on average. On sparse matrix-dense matrix multiplication (SpMM) tasks, SparseRL also shows significant performance gains. These results highlight the effectiveness of SparseRL in generating high-performance CUDA code for sparse matrix operations.
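To ground the task, the following is a minimal NumPy sketch of the SpMV computation (y = A·x on a CSR-format matrix) that the generated CUDA kernels target. The function name and CSR array layout here are standard conventions, not code from the paper:

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """Reference SpMV y = A @ x for a matrix stored in CSR format.

    indptr  : row pointer array, length n_rows + 1
    indices : column index of each non-zero element
    data    : value of each non-zero element
    x       : dense input vector
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.result_type(data, x))
    for row in range(n_rows):
        start, end = indptr[row], indptr[row + 1]
        # Dot product of this row's non-zeros with the gathered x entries
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

# 3x3 sparse matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x       = np.array([1.0, 1.0, 1.0])
print(spmv_csr(indptr, indices, data, x))  # [3. 3. 9.]
```

The irregularity the abstract refers to is visible here: each row touches a different, data-dependent set of `x` entries, which is what makes a single fixed CUDA kernel hard to optimize across matrices.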
SparseRL leverages deep RL and pretrained models to generate high-performance CUDA code for sparse matrix operations.
- Domain-specific code generation mechanism for dynamic sparse matrix inputs
- Sinusoidal embedding technique tailored for sparse matrices
- Hierarchical reward function considering both code correctness and execution efficiency
- 20% improvement in compilation rate and 30% faster execution on SpMV tasks
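The paper does not spell out the sinusoidal embedding here, so the following is a plausible sketch assuming a transformer-style sinusoidal encoding applied independently to the row and column indices of the non-zeros; the dimension, frequency base, and concatenation scheme are illustrative assumptions:

```python
import numpy as np

def sinusoidal_embed(idx, dim):
    """Transformer-style sinusoidal encoding of integer indices.

    idx : 1-D sequence of non-negative integers (row or column indices)
    dim : embedding dimension (must be even)
    """
    idx = np.asarray(idx, dtype=np.float64)[:, None]                # (nnz, 1)
    # Geometric frequency schedule, base 10000 (assumed, as in transformers)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = idx * freqs                                            # (nnz, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def embed_nonzeros(rows, cols, dim=64):
    """One vector per non-zero: row and column encodings concatenated."""
    return np.concatenate(
        [sinusoidal_embed(rows, dim), sinusoidal_embed(cols, dim)], axis=-1
    )

rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
emb = embed_nonzeros(rows, cols, dim=64)
print(emb.shape)  # (5, 128)
```

A fixed (non-learned) encoding like this lets the policy condition on sparsity structure of arbitrary size without retraining an index vocabulary.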
- Reinforcement learning
- Pretrained language models
- Domain-specific code generation
- University of Florida Sparse Matrix Collection
- RL-based optimization is computationally expensive during fine-tuning due to compiler and executor interactions (from the paper)
- The method is best suited to scenarios where the generated sparse code can be reused repeatedly, given the generation and execution time overhead (from the paper)
- Extension to other hardware backends is non-trivial (from the paper)
- Replace sparse matrix indices with task-specific structural features and use multi-modal adapters (from the paper)
- Adapt the hierarchical reward to task-specific metrics such as loop execution time and parallelization speedup (from the paper)
- Reuse the pretrain-SFT-RL workflow with task-specific training data for general code optimization (from the paper)
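The hierarchical reward gates efficiency behind correctness: a candidate kernel earns an efficiency bonus only after it compiles and produces correct results. A minimal sketch follows; the stage structure mirrors the abstract's description, but the reward magnitudes and the speedup bonus term are illustrative assumptions, not the paper's exact values:

```python
def hierarchical_reward(compiles, passes_tests, runtime_s=None, baseline_s=None):
    """Staged reward: compilation, then correctness, then efficiency.

    compiles     : did CUDA compilation succeed?
    passes_tests : did the kernel produce numerically correct output?
    runtime_s    : measured runtime of the generated kernel (seconds)
    baseline_s   : runtime of a reference implementation (seconds)
    """
    if not compiles:
        return -1.0                      # compile failure: strong penalty
    if not passes_tests:
        return 0.0                       # compiles but wrong: no reward
    reward = 1.0                         # correct code earns the base reward
    if runtime_s and baseline_s:
        # Bonus proportional to speedup over the baseline, floored at zero
        reward += max(0.0, baseline_s / runtime_s - 1.0)
    return reward

print(hierarchical_reward(False, False))                               # -1.0
print(hierarchical_reward(True, True, runtime_s=0.5, baseline_s=1.0))  # 2.0
```

This gating is also why RL fine-tuning is expensive, as noted in the limitations: every reward evaluation beyond the compile check requires actually executing the candidate kernel.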
Author keywords
- Reinforcement Learning
- CUDA Code Generation
- High-Performance Computing