Efficient Resource-Constrained Training of Transformers via Subspace Optimization
WASI applies subspace-based training to transformer models, reducing memory by 62x and FLOPs by 2x while maintaining accuracy on edge devices.
Efficient inference, quantization, pruning, distillation, sparse models, GPU kernels, ML systems.
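A minimal sketch of the general subspace-training idea behind such methods, assuming a fixed random projection basis, a toy quadratic loss, and illustrative dimensions (not WASI's exact algorithm):

```python
# Illustrative subspace-training sketch (assumed rank, basis, and loss;
# not WASI's exact algorithm): gradients are projected into a fixed
# low-dimensional subspace, optimizer state lives there, and updates are
# projected back to the full parameter space.
import numpy as np

rng = np.random.default_rng(0)
d, r = 4096, 64                                # full dim, subspace rank (assumed)
P = rng.standard_normal((d, r)) / np.sqrt(r)   # fixed random projection basis

w = rng.standard_normal(d) * 0.02              # flattened model weights
m = np.zeros(r)                                # momentum kept only in the subspace
lr, beta = 1e-2, 0.9

def loss_grad(w):
    # toy quadratic loss; a real setup would backprop through the transformer
    return w - 1.0

for step in range(100):
    g_sub = P.T @ loss_grad(w)                 # r-dim projected gradient
    m = beta * m + (1 - beta) * g_sub
    w -= lr * (P @ m)                          # map the update back to full space

# Optimizer state is O(r) instead of O(d): 64 floats here instead of 4096.
```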
Introduces MotionStream, enabling motion-controlled, infinite-length video generation with sub-second latency via causal diffusion.
Hierarchical Speculative Decoding uses lossless verification to maximize the number of accepted tokens while exactly preserving the target distribution.
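For reference, a minimal sketch of the standard lossless verification rule that speculative decoding builds on; the hierarchical chaining of multiple drafters described in the paper is not shown:

```python
# Minimal sketch of the lossless acceptance rule used by standard
# speculative decoding (one draft/target pair only).
import numpy as np

rng = np.random.default_rng(0)

def verify(draft_tokens, p_draft, p_target):
    """Accept drafted token t with prob min(1, p_target[t] / p_draft[t]);
    on rejection, resample from the residual max(0, p_target - p_draft).
    This provably leaves the target distribution unchanged."""
    out = []
    for t, q, p in zip(draft_tokens, p_draft, p_target):
        if rng.random() < min(1.0, p[t] / q[t]):
            out.append(t)                       # token accepted losslessly
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            out.append(rng.choice(len(p), p=residual))
            break                               # later drafts are discarded
    return out
```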
Proposes probabilistic kernel functions for angle testing, enabling efficient approximate nearest-neighbor search.
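The classic random-hyperplane angle test illustrates the idea; the paper's specific kernel functions may differ from this sketch:

```python
# Classic probabilistic angle test: for a random normal w,
# P[sign(w @ x) == sign(w @ y)] = 1 - theta / pi, so the fraction of
# agreeing sign bits is an unbiased probe of the angle between x and y.
import numpy as np

rng = np.random.default_rng(0)
d, n_planes = 128, 512

x = rng.standard_normal(d)
y = x + 0.3 * rng.standard_normal(d)           # a near neighbor of x

W = rng.standard_normal((n_planes, d))         # random hyperplane normals
agree = np.mean(np.sign(W @ x) == np.sign(W @ y))

theta_est = np.pi * (1.0 - agree)              # invert the collision formula
theta_true = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(f"estimated angle {theta_est:.3f} rad vs true {theta_true:.3f} rad")
```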
Generates minute-long, high-resolution videos efficiently using linear attention and a constant-memory KV cache for block-wise autoregression.
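A small sketch of why linear attention yields a constant-memory KV cache; the feature map and dimensions below are illustrative assumptions:

```python
# Why linear attention gives a constant-memory "KV cache": with a feature
# map phi, attention reduces to a running d x d outer-product state S and
# a d-dim normalizer z, whose sizes never grow with sequence length.
import numpy as np

d = 64
phi = lambda x: np.maximum(x, 0.0) + 1e-6      # assumed positive feature map

S = np.zeros((d, d))                           # constant-size state
z = np.zeros(d)                                # normalizer

def step(q, k, v):
    """Consume one token; memory stays O(d^2) at any sequence length."""
    global S, z
    S += np.outer(phi(k), v)
    z += phi(k)
    return (phi(q) @ S) / (phi(q) @ z + 1e-9)

rng = np.random.default_rng(0)
for _ in range(10_000):                        # 10k tokens, same memory
    q, k, v = rng.standard_normal((3, d))
    out = step(q, k, v)
```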
Speculative Actions accelerates agent systems by predicting and executing likely future actions in parallel.
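A hedged sketch of the speculative pattern, with hypothetical `predict_next_actions`, `policy_decide`, and `run_action` callables standing in for real agent components:

```python
# While the slow, authoritative policy deliberates, run the guesser's top-k
# candidate actions in parallel and keep a result only if it matches the
# policy's choice. All three callables are hypothetical stand-ins; a real
# system must also guard against unsafe side effects and cancel losing
# speculations rather than waiting for them.
from concurrent.futures import ThreadPoolExecutor

def speculative_step(state, predict_next_actions, policy_decide, run_action, k=3):
    candidates = predict_next_actions(state, k)          # cheap, fast guesser
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = {a: pool.submit(run_action, state, a) for a in candidates}
        chosen = policy_decide(state)                    # slow, authoritative
        if chosen in futures:
            return chosen, futures[chosen].result()      # speculation hit
    return chosen, run_action(state, chosen)             # miss: execute normally
```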
TileLang enables hardware-aware fused-kernel programming with automatic tile inference and recommendation, achieving 5-6x speedups.
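A generic sketch of what tile inference and recommendation involves for a fused GEMM-style kernel; this illustrates the idea only and is not TileLang's actual API, and the shared-memory budget and dtype are assumptions:

```python
# Enumerate candidate tiles, keep those whose working set fits the
# shared-memory budget, and rank by arithmetic intensity.
SMEM_BYTES = 48 * 1024                  # assumed shared memory per block
DTYPE_BYTES = 2                         # fp16

def working_set(tm, tn, tk):
    # A-tile (tm x tk) and B-tile (tk x tn) resident in shared memory
    return (tm * tk + tk * tn) * DTYPE_BYTES

def recommend_tile(sizes=(32, 64, 128, 256), ks=(16, 32, 64)):
    feasible = [(tm, tn, tk)
                for tm in sizes for tn in sizes for tk in ks
                if working_set(tm, tn, tk) <= SMEM_BYTES]
    # prefer tiles that maximize compute per byte staged through shared memory
    return max(feasible, key=lambda t: (t[0] * t[1] * t[2]) / working_set(*t))

print(recommend_tile())                 # -> a (tm, tn, tk) recommendation
```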
Analyzes why low-precision flash-attention training fails, attributing the failure to low-rank representations and biased BF16 rounding errors.
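A tiny PyTorch demo isolating the rounding side of this: once a BF16 running sum grows, each small addend falls below half an ulp of the 8-bit mantissa and rounds away, biasing the sum systematically downward. The paper's flash-attention-specific analysis is not reproduced here:

```python
# Accumulation pathology behind biased BF16 rounding: once the running sum
# exceeds ~0.5, each 1e-3 addend is smaller than half a BF16 ulp and rounds
# away, so the sum stalls far below the exact value.
import torch

n, addend = 10_000, 1e-3
acc = torch.tensor(0.0, dtype=torch.bfloat16)
inc = torch.tensor(addend, dtype=torch.bfloat16)
for _ in range(n):
    acc = acc + inc                     # bf16 round-to-nearest each step

print(f"bf16 sum: {acc.item():.4f}   exact: {n * addend:.4f}")
# the bf16 sum stalls near 0.5 while the exact sum is 10.0
```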