ICLR 2026 Orals

Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning

Yaoyu Wang, Hankun Dai, Zhidong Yang, Junmin Xiao, Guangming Tan

Reinforcement Learning & Agents · Thu, Apr 23 · 10:30 AM–10:40 AM · 202 A/B · Avg rating: 6.00 (4–8)
Author-provided TL;DR

We propose SparseRL, a deep reinforcement learning framework that generates high-performance CUDA code for sparse matrix operations, achieving significant improvements in both correctness and execution efficiency.

Abstract

Code generation is a crucial research area in artificial intelligence, holding the potential to revolutionize software development and streamline programming processes. However, generating high-performance code, which must execute in a short time for low-latency scenarios, remains a formidable challenge. Existing methods often struggle to account for the irregularity of input sparse data in sparse programs and the need for domain-specific architectural knowledge, leading to sub-optimal performance. To tackle these issues, we propose the SparseRL framework. SparseRL leverages deep reinforcement learning, treating a pre-trained language model as a stochastic policy: it takes the row and column indices of the non-zero elements in a sparse matrix as input and outputs CUDA code for sparse matrix operations. We also introduce a domain-specific code generation mechanism for dynamic inputs, a sinusoidal embedding technique tailored to sparse matrices, and a hierarchical reward function that considers both code correctness and execution efficiency. Experimental results demonstrate that SparseRL achieves state-of-the-art performance. On sparse matrix-vector multiplication (SpMV) tasks, it improves the compilation rate by 20% over existing methods, and the generated code runs 30% faster on average. On sparse matrix-dense matrix multiplication (SpMM) tasks, SparseRL also shows significant performance gains. These results highlight the effectiveness of SparseRL in generating high-performance CUDA code for sparse matrix operations.
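For context on the target operation: SpMV computes y = A·x for a sparse matrix A, typically stored in a compressed format such as CSR so that only non-zero elements are touched. The sketch below is a minimal pure-Python CSR reference, not SparseRL's generated CUDA; it only illustrates the computation the generated kernels must reproduce correctly.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR format.

    values  : non-zero values, row by row
    col_idx : column index of each non-zero value
    row_ptr : row_ptr[i]..row_ptr[i+1] delimit row i's non-zeros
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate only over the non-zeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregularity the abstract refers to is visible here: row lengths (`row_ptr[i+1] - row_ptr[i]`) vary per row, which on a GPU translates into load imbalance and irregular memory access patterns that a generated kernel must handle.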

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

SparseRL leverages deep RL and pretrained models to generate high-performance CUDA code for sparse matrix operations.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Domain-specific code generation mechanism for dynamic sparse matrix inputs
  • Sinusoidal embedding technique tailored for sparse matrices
  • Hierarchical reward function considering both code correctness and execution efficiency
  • 20% improvement in compilation rate and 30% faster execution on SpMV tasks
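The second contribution adapts sinusoidal embeddings to sparse-matrix coordinates. The paper's exact formulation is not given here; the sketch below assumes the standard transformer-style sinusoidal encoding applied independently to each non-zero element's row and column index, with the two embeddings concatenated per element (the dimension and base constant 10000 are illustrative assumptions).

```python
import math

def sinusoidal_embed(index, dim=8):
    """Transformer-style sinusoidal embedding of an integer index.

    Interleaves sin/cos at geometrically spaced frequencies, so nearby
    indices get similar vectors while distant ones remain distinguishable.
    """
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(math.sin(index * freq))
        emb.append(math.cos(index * freq))
    return emb

# Embed the (row, col) coordinates of each non-zero element,
# concatenating the row and column embeddings per element.
coords = [(0, 0), (0, 2), (1, 1)]
tokens = [sinusoidal_embed(r) + sinusoidal_embed(c) for r, c in coords]
```

A continuous embedding like this lets the policy condition on the sparsity pattern without learning a lookup table over every possible matrix dimension.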
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Reinforcement learning
  • Pretrained language models
  • Domain-specific code generation
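The hierarchical reward listed above gates efficiency behind correctness: generated code that fails to compile or produces wrong results cannot earn a speed-based reward. The sketch below illustrates that gating structure only; the reward constants and exact tiers are assumptions, not the paper's values.

```python
def hierarchical_reward(compiled, correct, speedup):
    """Hierarchical reward sketch (illustrative constants).

    Tier 1: the code must compile.
    Tier 2: the compiled code must produce correct results.
    Tier 3: only then is execution efficiency rewarded.
    """
    if not compiled:
        return -1.0          # failed compilation: strongest penalty
    if not correct:
        return -0.5          # compiles, but wrong output
    return 1.0 + speedup     # correct: base reward plus measured speedup
```

Structuring the reward this way keeps the RL policy from being paid for fast-but-wrong kernels, which is the failure mode a flat reward over execution time alone would invite.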
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • University of Florida Sparse Matrix Collection
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • RL-based optimization is computationally expensive during fine-tuning due to compiler and executor interactions
  • Method best-suited for scenarios where sparse code can be reused repeatedly due to generation and execution time overhead
  • Extension to other hardware backends is non-trivial
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Replace sparse matrix indices with task-specific structural features and use multi-modal adapters
  • Adapt hierarchical reward to task-specific metrics like loop execution time and parallelization speedup
  • Reuse pretrain-SFT-RL workflow with task-specific training data for general code optimization

Author keywords

  • Reinforcement Learning
  • CUDA Code Generation
  • High-Performance Computing
