ICLR 2026 Orals

TileLang: Bridge Programmability and Performance in Modern Neural Kernels

Lei Wang, Yu Cheng, Yining Shi, Zhiwen Mo, Zhengju Tang, Wenhao Xie, Tong Wu, Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Zhi Yang

Efficiency, Systems & Kernels · Thu, Apr 23 · 11:06 AM–11:16 AM · 202 A/B · Avg rating: 7.00 (4–8)
Author-provided TL;DR

We introduce TileLang, a controllable programming system for fused neural kernels.

Abstract

Modern AI algorithms increasingly adopt fused kernels for performance, but implementing them remains complex because existing compilers such as Triton lack fine-grained control. We introduce TileLang, a controllable programming system for fused neural kernels. TileLang provides explicit tile-level primitives for memory placement, data movement, and parallel scheduling. To guide developers in hardware-aware programming, TileLang introduces two key techniques: tile inference, which models tile programs as fused graphs and automatically deduces tile configurations from partial annotations; and tile recommendation, which suggests efficient tile configurations based on hardware profiles and heuristics. TileLang makes it easy to express a wide range of fused attention kernels in under 80 lines of Python code, reducing code size by up to 90% compared to manual implementations. Evaluations show that TileLang achieves up to 5x speedup over Triton on NVIDIA H100 and up to 6x on AMD GPUs, demonstrating its ability to bridge programmability and performance.
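
To make the idea of tile recommendation concrete, here is a minimal sketch of how candidate tile configurations could be filtered by a hardware profile and ranked by a heuristic. It is not TileLang's algorithm or API: the function name `recommend_tiles`, the shared-memory budget, the candidate sizes, and the arithmetic-intensity heuristic are all assumptions chosen for illustration.

```python
from itertools import product


def recommend_tiles(smem_bytes, dtype_bytes=2, candidates=(32, 64, 128, 256), top_k=3):
    """Rank hypothetical (block_M, block_N, block_K) GEMM tiles for one hardware profile.

    Illustrative assumption, not TileLang's method: a tile is valid if one
    A tile (block_M x block_K) plus one B tile (block_K x block_N) fits in
    shared memory, and valid tiles are ranked by FLOPs per staged byte.
    """
    ranked = []
    for bm, bn, bk in product(candidates, repeat=3):
        # Shared-memory footprint of the two staged input tiles.
        footprint = (bm * bk + bk * bn) * dtype_bytes
        if footprint > smem_bytes:
            continue
        # Arithmetic intensity: multiply-add FLOPs per byte held in shared memory.
        intensity = (2 * bm * bn * bk) / footprint
        ranked.append((intensity, (bm, bn, bk)))
    # Highest intensity first; ties are broken by the tile tuple for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [tile for _, tile in ranked[:top_k]]


# Example profile: 48 KiB of shared memory and 2-byte (fp16) elements.
best = recommend_tiles(smem_bytes=48 * 1024)
print(best)
```

A real recommender would also account for registers, occupancy, pipelining depth, and tensor-core instruction shapes; the sketch only shows the filter-then-rank structure.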

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

TileLang enables hardware-aware programming of fused kernels through tile inference and tile recommendation, achieving up to 5x speedup over Triton on NVIDIA H100 and up to 6x on AMD GPUs.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Tile-level programming model with explicit primitives for memory, data movement, and parallel scheduling
  • Tile inference that automatically deduces tile configuration from partial annotations via fused graph modeling
  • Tile recommendation suggesting efficient configurations from hardware profiles and heuristics
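
The tile inference contribution above can be pictured with a small sketch: buffers in a fused kernel form a graph, a few of them carry explicit tile annotations, and the rest are deduced by propagation. This is a simplified stand-in, not TileLang's implementation. The function `infer_tiles`, the buffer names, and the rule that linked buffers share one tile shape are all hypothetical.

```python
def infer_tiles(edges, annotations):
    """Propagate tile shapes from partially annotated buffers to linked ones.

    Illustrative assumption: each edge links two buffers that must use the
    same tile shape (for example, the input and output of an elementwise op).
    """
    tiles = dict(annotations)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for a, b in edges:
            for src, dst in ((a, b), (b, a)):
                if src not in tiles:
                    continue
                if dst not in tiles:
                    # Deduce the missing tile shape from the annotated neighbor.
                    tiles[dst] = tiles[src]
                    changed = True
                elif tiles[dst] != tiles[src]:
                    raise ValueError(f"conflicting tiles for {src} and {dst}")
    return tiles


# Hypothetical fused graph: only "scores" is annotated by the developer.
edges = [("acc", "scores"), ("scores", "probs")]
print(infer_tiles(edges, {"scores": (64, 64)}))
```

Only one buffer is annotated here, yet every buffer ends up with a tile shape, and inconsistent annotations are reported as an error instead of being silently resolved.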
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Graph-based optimization
  • Tile-level abstraction
  • Hardware profiling
  • Configuration inference
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • compiler; AI; programming model
