Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Leonardo Defilippis, Yizhou Xu, Julius Girardin, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala
We derive a phase diagram of scaling laws for diagonal and quadratic neural networks via a bridge to LASSO and matrix compressed sensing, predicting both generalization and the emergence of power-law weight spectra.
Abstract
Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
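To make the LASSO side of this bridge concrete (this is a generic illustration of the surrogate problem, not the paper's diagonal-network training itself), the classical sparse-recovery risk decay with sample size can be reproduced with a small ISTA solver. All dimensions, the noise level, and the regularization schedule below are illustrative choices, not values from the paper:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_ista(X, y, lam, n_iters=1000):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by ISTA."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
d, s, sigma = 200, 5, 0.5          # dimension, sparsity, noise level (illustrative)
w_star = np.zeros(d)
w_star[:s] = 1.0                   # sparse planted teacher

def excess_risk(n):
    """Estimation error of LASSO on n noisy samples from the teacher."""
    X = rng.normal(size=(n, d))
    y = X @ w_star + sigma * rng.normal(size=n)
    lam = sigma * np.sqrt(2 * np.log(d) / n)  # standard theory-scaled penalty
    w_hat = lasso_ista(X, y, lam)
    return float(np.sum((w_hat - w_star) ** 2))

risks = [excess_risk(n) for n in (50, 400)]
print(risks)  # error shrinks markedly as n grows
```

With the penalty scaled as λ ∝ σ√(log d / n), the error tracks the familiar s σ² log d / n rate; the paper's phase diagram refines this picture by making the exponent depend jointly on sample complexity and weight decay.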
Analyzes scaling laws for shallow networks in the feature learning regime via sparse estimation and matrix compressed sensing.
- Maps shallow network scaling laws to sparse vector and low-rank matrix estimation problems
- Derives comprehensive phase diagram for excess risk scaling laws with distinct scaling regimes
- Establishes connection between weight spectra and generalization from first principles
- Provides theoretical validation of empirical observations linking power-law tails to network generalization
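The connection in the last two bullets, between power-law tails in the weight spectrum and generalization, is typically probed empirically by fitting a tail exponent to the eigenvalue distribution of trained weights. A minimal sketch using the standard Hill estimator on synthetic Pareto-distributed "eigenvalues" (the planted exponent 3.0 and all sample sizes are illustrative, not taken from the paper):

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the power-law tail exponent from the k largest samples."""
    x = np.sort(samples)[::-1]                 # descending order
    logs = np.log(x[:k]) - np.log(x[k])        # log-excesses over the (k+1)-th value
    return k / np.sum(logs)

rng = np.random.default_rng(1)
alpha_true = 3.0
# Pareto-I samples standing in for the eigenvalues of a trained weight matrix
eigs = rng.pareto(alpha_true, size=20000) + 1.0
alpha_hat = hill_estimator(eigs, k=500)
print(alpha_hat)  # should recover a value near alpha_true = 3.0
```

In practice one would apply such a fit to the spectrum of the trained weight (or Gram) matrix; the paper's contribution is to predict from first principles when this tail exponent emerges and how it co-varies with the excess-risk regime.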
Topics
- Scaling laws
- Sparse estimation
- Matrix compressed sensing
Limitations
Authors did not state explicit limitations.
Future directions (from the paper)
- Explore additional data structures, such as non-trivial covariances
- Extend beyond two-layer networks and quadratic activations
- Prove the state evolution conjecture rigorously
- Analyze compute scaling laws of GD/SGD
- Study implicit biases of SGD toward heavy tails
Author keywords
- Scaling laws; Neural networks; LASSO and matrix compressed sensing; Random matrix theory; Approximate message passing; High-dimensional statistics