Efficient Resource-Constrained Training of Transformers via Subspace Optimization
WASI applies subspace-based training to transformer models, reducing memory by 62x and FLOPs by 2x while maintaining accuracy on edge devices.
Efficient inference, quantization, pruning, distillation, sparse models, GPU kernels, ML systems.
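A minimal sketch of the general subspace-training idea behind such methods, assuming a fixed random projection basis, a toy quadratic loss, and illustrative dimensions (not WASI's exact algorithm):

```python
# Illustrative subspace-training sketch (assumed rank, basis, and loss;
# not WASI's exact algorithm): gradients are projected into a fixed
# low-dimensional subspace, optimizer state lives there, and updates are
# projected back to the full parameter space.
import numpy as np

rng = np.random.default_rng(0)
d, r = 4096, 64                                # full dim, subspace rank (assumed)
P = rng.standard_normal((d, r)) / np.sqrt(r)   # fixed random projection basis

w = rng.standard_normal(d) * 0.02              # flattened model weights
m = np.zeros(r)                                # momentum kept only in the subspace
lr, beta = 1e-2, 0.9

def loss_grad(w):
    # toy quadratic loss; a real setup would backprop through the transformer
    return w - 1.0

for step in range(100):
    g_sub = P.T @ loss_grad(w)                 # r-dim projected gradient
    m = beta * m + (1 - beta) * g_sub
    w -= lr * (P @ m)                          # map the update back to full space

# Optimizer state is O(r) instead of O(d): 64 floats here instead of 4096.
```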
Introduces MotionStream, enabling motion-controlled, infinite-length video generation with sub-second latency via causal diffusion.
Hierarchical Speculative Decoding uses lossless verification to maximize the number of accepted tokens while exactly preserving the target distribution.
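For reference, a minimal sketch of the standard lossless verification rule that speculative decoding builds on; the hierarchical chaining of multiple drafters described in the paper is not shown:

```python
# Minimal sketch of the lossless acceptance rule used by standard
# speculative decoding (one draft/target pair only).
import numpy as np

rng = np.random.default_rng(0)

def verify(draft_tokens, p_draft, p_target):
    """Accept drafted token t with prob min(1, p_target[t] / p_draft[t]);
    on rejection, resample from the residual max(0, p_target - p_draft).
    This provably leaves the target distribution unchanged."""
    out = []
    for t, q, p in zip(draft_tokens, p_draft, p_target):
        if rng.random() < min(1.0, p[t] / q[t]):
            out.append(t)                       # token accepted losslessly
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            out.append(rng.choice(len(p), p=residual))
            break                               # later drafts are discarded
    return out
```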
Proposes probabilistic kernel functions for angle testing, enabling efficient approximate nearest-neighbor search.
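The classic random-hyperplane angle test illustrates the idea; the paper's specific kernel functions may differ from this sketch:

```python
# Classic probabilistic angle test: for a random normal w,
# P[sign(w @ x) == sign(w @ y)] = 1 - theta / pi, so the fraction of
# agreeing sign bits is an unbiased probe of the angle between x and y.
import numpy as np

rng = np.random.default_rng(0)
d, n_planes = 128, 512

x = rng.standard_normal(d)
y = x + 0.3 * rng.standard_normal(d)           # a near neighbor of x

W = rng.standard_normal((n_planes, d))         # random hyperplane normals
agree = np.mean(np.sign(W @ x) == np.sign(W @ y))

theta_est = np.pi * (1.0 - agree)              # invert the collision formula
theta_true = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(f"estimated angle {theta_est:.3f} rad vs true {theta_true:.3f} rad")
```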
Generates minute-long, high-resolution videos efficiently using linear attention and a constant-memory KV cache for block-wise autoregression.
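A small sketch of why linear attention yields a constant-memory KV cache; the feature map and dimensions below are illustrative assumptions:

```python
# Why linear attention gives a constant-memory "KV cache": with a feature
# map phi, attention reduces to a running d x d outer-product state S and
# a d-dim normalizer z, whose sizes never grow with sequence length.
import numpy as np

d = 64
phi = lambda x: np.maximum(x, 0.0) + 1e-6      # assumed positive feature map

S = np.zeros((d, d))                           # constant-size state
z = np.zeros(d)                                # normalizer

def step(q, k, v):
    """Consume one token; memory stays O(d^2) at any sequence length."""
    global S, z
    S += np.outer(phi(k), v)
    z += phi(k)
    return (phi(q) @ S) / (phi(q) @ z + 1e-9)

rng = np.random.default_rng(0)
for _ in range(10_000):                        # 10k tokens, same memory
    q, k, v = rng.standard_normal((3, d))
    out = step(q, k, v)
```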
Speculative Actions accelerates agent systems by predicting and executing likely future actions in parallel.
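A hedged sketch of the speculative pattern, with hypothetical `predict_next_actions`, `policy_decide`, and `run_action` callables standing in for real agent components:

```python
# While the slow, authoritative policy deliberates, run the guesser's top-k
# candidate actions in parallel and keep a result only if it matches the
# policy's choice. All three callables are hypothetical stand-ins; a real
# system must also guard against unsafe side effects and cancel losing
# speculations rather than waiting for them.
from concurrent.futures import ThreadPoolExecutor

def speculative_step(state, predict_next_actions, policy_decide, run_action, k=3):
    candidates = predict_next_actions(state, k)          # cheap, fast guesser
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = {a: pool.submit(run_action, state, a) for a in candidates}
        chosen = policy_decide(state)                    # slow, authoritative
        if chosen in futures:
            return chosen, futures[chosen].result()      # speculation hit
    return chosen, run_action(state, chosen)             # miss: execute normally
```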
TileLang enables hardware-aware fused-kernel programming with automatic tile inference and recommendation, achieving 5-6x speedups.
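A generic sketch of what tile inference and recommendation involves for a fused GEMM-style kernel; this illustrates the idea only and is not TileLang's actual API, and the shared-memory budget and dtype are assumptions:

```python
# Enumerate candidate tiles, keep those whose working set fits the
# shared-memory budget, and rank by arithmetic intensity.
SMEM_BYTES = 48 * 1024                  # assumed shared memory per block
DTYPE_BYTES = 2                         # fp16

def working_set(tm, tn, tk):
    # A-tile (tm x tk) and B-tile (tk x tn) resident in shared memory
    return (tm * tk + tk * tn) * DTYPE_BYTES

def recommend_tile(sizes=(32, 64, 128, 256), ks=(16, 32, 64)):
    feasible = [(tm, tn, tk)
                for tm in sizes for tn in sizes for tk in ks
                if working_set(tm, tn, tk) <= SMEM_BYTES]
    # prefer tiles that maximize compute per byte staged through shared memory
    return max(feasible, key=lambda t: (t[0] * t[1] * t[2]) / working_set(*t))

print(recommend_tile())                 # -> a (tm, tn, tk) recommendation
```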
Analyzes why low-precision flash-attention training fails, attributing the failure to low-rank representations and biased BF16 rounding errors.
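A tiny PyTorch demo isolating the rounding side of this: once a BF16 running sum grows, each small addend falls below half an ulp of the 8-bit mantissa and rounds away, biasing the sum systematically downward. The paper's flash-attention-specific analysis is not reproduced here:

```python
# Accumulation pathology behind biased BF16 rounding: once the running sum
# exceeds ~0.5, each 1e-3 addend is smaller than half a BF16 ulp and rounds
# away, so the sum stalls far below the exact value.
import torch

n, addend = 10_000, 1e-3
acc = torch.tensor(0.0, dtype=torch.bfloat16)
inc = torch.tensor(addend, dtype=torch.bfloat16)
for _ in range(n):
    acc = acc + inc                     # bf16 round-to-nearest each step

print(f"bf16 sum: {acc.item():.4f}   exact: {n * addend:.4f}")
# the bf16 sum stalls near 0.5 while the exact sum is 10.0
```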