Efficient Resource-Constrained Training of Transformers via Subspace Optimization
Le-Trung Nguyen, Enzo Tartaglione, Van-Tam Nguyen
We propose a novel method that enables training vision transformer models within a low-rank subspace to optimize computational resources, making on-device learning practically feasible.
Abstract
As AI increasingly shapes daily life, energy consumption and data privacy have become pressing concerns. On-device learning trains models directly on edge devices, cutting energy consumption and safeguarding data privacy. However, the expanding scale of modern neural networks creates a major obstacle for on-device training. Although prior work has concentrated on compact convolutional architectures, we instead apply subspace-based training to transformer models. Motivated by the idea that a model's essential information lies in a fixed subspace, we introduce Weight-Activation Subspace Iteration (WASI), a method that mitigates the memory bottleneck of backpropagation and boosts inference efficiency in transformer models by restricting training to this subspace. Our results demonstrate that WASI maintains accuracy comparable to vanilla training while reducing memory usage by up to $62\times$ and computational cost (FLOPs) by up to $2\times$. On a Raspberry Pi 5, WASI achieves roughly $1.4\times$ faster training and inference than vanilla training. The code is available at https://github.com/Le-TrungNguyen/ICLR2026-WASI.git.
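The core idea in the abstract, training only inside a fixed low-rank subspace obtained via SVD so that activations can be cached in their low-dimensional projection, can be sketched in a few lines. This is an illustrative NumPy toy for a single linear layer, not the paper's actual WASI algorithm: the subspace-selection rule, the trainable parameters, and the layer structure here are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: one d_in x d_out linear layer trained at rank r.
d_in, d_out, r = 256, 256, 16

# Fix a low-rank subspace from an SVD of the initial weight.
# (WASI's actual subspace-selection rule may differ; this is illustrative.)
W0 = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
U, s, Vt = np.linalg.svd(W0, full_matrices=False)
U_r, Vt_r = U[:, :r], Vt[:r, :]   # frozen subspace bases

# Only the small r x r core is trainable; U_r and Vt_r stay fixed.
core = np.diag(s[:r])

def forward(x):
    # Project the activation into the r-dimensional subspace first;
    # only this (batch, r) projection needs to be cached for backprop,
    # instead of the full (batch, d_in) activation.
    z = x @ U_r              # (batch, r) -- cached for the backward pass
    return z @ core @ Vt_r   # (batch, d_out)

def backward(z_cached, grad_out, lr=1e-2):
    # Gradient w.r.t. the rank-r core only:
    # y = z @ core @ Vt_r  =>  dL/dcore = z^T @ (grad_out @ Vt_r^T)
    grad_core = z_cached.T @ (grad_out @ Vt_r.T)
    return core - lr * grad_core

x = rng.standard_normal((8, d_in))
y = forward(x)
print(y.shape)        # (8, 256)
print(d_in // r)      # 16x smaller activation cache per sample
```

With d_in = 256 and r = 16 the cached activation shrinks 16x per layer; the abstract's up-to-62x figure comes from applying the idea across a full transformer, which this single-layer sketch does not reproduce.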
WASI applies subspace-based training to transformer models, reducing memory usage by up to 62x and FLOPs by up to 2x while maintaining accuracy, making training feasible on edge devices.
- Weight-Activation Subspace Iteration (WASI), a method that restricts training to a fixed low-dimensional subspace
- Reduces memory usage by up to 62x and computational cost by up to 2x
- Achieves 1.4x speedup over vanilla training on Raspberry Pi 5
- Subspace optimization
- SVD
- Transformer training
- On-device learning
- Low-rank approximation
Limitations (from the paper)
Experiments primarily focus on vision tasks; hardware limitations prevented evaluation of larger-scale models.
Future work (from the paper)
Extend to a broader range of tasks, with emphasis on LLMs.
Author keywords
- Deep Learning
- Computer Vision
- Compression
- Low rank
Related orals
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
TileLang enables hardware-aware fused kernel programming with tile inference and recommendation, achieving a 5-6x speedup.
Probabilistic Kernel Function for Fast Angle Testing
Proposes probabilistic kernel functions for angle testing enabling efficient approximate nearest neighbor search.
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Generates minute-long high-resolution videos efficiently with linear attention and constant-memory KV cache for block autoregression.
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Analyzes low-precision flash attention training failure caused by low-rank representations and biased BF16 rounding errors.
Speculative Actions: A Lossless Framework for Faster AI Agents
Speculative Actions accelerates agent systems by predicting and executing likely future actions in parallel.