ICLR 2026 Orals

Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

Leonardo Defilippis, Yizhou Xu, Julius Girardin, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

Graph Learning · Fri, Apr 24 · 10:54 AM–11:04 AM · 204 A/B · Avg rating: 7.00 (6–8)
Author-provided TL;DR

We derive a phase diagram of scaling laws for diagonal and quadratic neural networks via a bridge to LASSO and matrix compressed sensing, predicting both generalization and the emergence of power-law weight spectra.

Abstract

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
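As a concrete illustration of the LASSO bridge the abstract invokes (not the paper's own derivation): the sketch below plants a sparse teacher vector, fits LASSO via a plain ISTA loop at the classical theory-driven regularization level, and shows the excess risk shrinking as the sample size grows. All dimensions, the noise level, and the regularization choice here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 200, 5, 0.5            # ambient dimension, sparsity, noise level
w_star = np.zeros(d)
w_star[:k] = 1.0                     # planted k-sparse teacher

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, iters=2000):
    """Minimize (1/2)||y - Xw||^2 + lam*||w||_1 by ISTA (proximal gradient)."""
    L = np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = soft_threshold(w + X.T @ (y - X @ w) / L, lam / L)
    return w

def excess_risk(n):
    X = rng.standard_normal((n, d))
    y = X @ w_star + sigma * rng.standard_normal(n)
    lam = sigma * np.sqrt(2 * n * np.log(d))   # classical high-dim statistics choice
    w_hat = lasso_ista(X, y, lam)
    # For isotropic Gaussian inputs, ||w_hat - w_star||^2 is the population excess risk.
    return float(np.sum((w_hat - w_star) ** 2))

errs = [excess_risk(n) for n in (50, 200, 800)]
for n, e in zip((50, 200, 800), errs):
    print(f"n={n:4d}  excess risk ≈ {e:.4f}")
```

The standard sparse-estimation rate, excess risk of order k·σ²·log(d)/n, is visible as the error dropping with n; the paper's contribution is to map the training of diagonal and quadratic networks onto problems of this kind and read off the scaling exponents.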

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Analyzes scaling laws for shallow neural networks in the feature learning regime via sparse estimation and matrix compressed sensing.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Maps shallow network scaling laws to sparse vector and low-rank matrix estimation problems
  • Derives comprehensive phase diagram for excess risk scaling laws with distinct scaling regimes
  • Establishes connection between weight spectra and generalization from first principles
  • Provides theoretical validation of empirical observations linking power-law tails to network generalization
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Scaling laws
  • Sparse estimation
  • Matrix compressed sensing
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Explore additional data structures, such as non-trivial covariances
  • Extend beyond two-layer networks and quadratic activations
  • Prove the state evolution conjecture rigorously
  • Analyze compute scaling laws of GD/SGD
  • Study implicit biases of SGD toward heavy tails

Author keywords

  • Scaling laws; Neural networks; LASSO and matrix compressed sensing; Random matrix theory; Approximate message passing; High-dimensional statistics
