Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Leonardo Defilippis, Yizhou Xu, Julius Girardin, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala
We derive a phase diagram of scaling laws for diagonal and quadratic neural networks via a bridge to LASSO and matrix compressed sensing, predicting both generalization and the emergence of power-law weight spectra.
Abstract
Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
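To make the LASSO side of this bridge concrete (this is a generic illustration of the surrogate problem, not the paper's diagonal-network training itself), the classical sparse-recovery risk decay with sample size can be reproduced with a small ISTA solver. All dimensions, the noise level, and the regularization schedule below are illustrative choices, not values from the paper:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_ista(X, y, lam, n_iters=1000):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by ISTA."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
d, s, sigma = 200, 5, 0.5          # dimension, sparsity, noise level (illustrative)
w_star = np.zeros(d)
w_star[:s] = 1.0                   # sparse planted teacher

def excess_risk(n):
    """Estimation error of LASSO on n noisy samples from the teacher."""
    X = rng.normal(size=(n, d))
    y = X @ w_star + sigma * rng.normal(size=n)
    lam = sigma * np.sqrt(2 * np.log(d) / n)  # standard theory-scaled penalty
    w_hat = lasso_ista(X, y, lam)
    return float(np.sum((w_hat - w_star) ** 2))

risks = [excess_risk(n) for n in (50, 400)]
print(risks)  # error shrinks markedly as n grows
```

With the penalty scaled as λ ∝ σ√(log d / n), the error tracks the familiar s σ² log d / n rate; the paper's phase diagram refines this picture by making the exponent depend jointly on sample complexity and weight decay.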
Analyzes scaling laws for shallow networks in the feature learning regime via sparse estimation and matrix compressed sensing.
- Maps shallow network scaling laws to sparse vector and low-rank matrix estimation problems
- Derives comprehensive phase diagram for excess risk scaling laws with distinct scaling regimes
- Establishes connection between weight spectra and generalization from first principles
- Provides theoretical validation of empirical observations linking power-law tails to network generalization
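The connection in the last two bullets, between power-law tails in the weight spectrum and generalization, is typically probed empirically by fitting a tail exponent to the eigenvalue distribution of trained weights. A minimal sketch using the standard Hill estimator on synthetic Pareto-distributed "eigenvalues" (the planted exponent 3.0 and all sample sizes are illustrative, not taken from the paper):

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the power-law tail exponent from the k largest samples."""
    x = np.sort(samples)[::-1]                 # descending order
    logs = np.log(x[:k]) - np.log(x[k])        # log-excesses over the (k+1)-th value
    return k / np.sum(logs)

rng = np.random.default_rng(1)
alpha_true = 3.0
# Pareto-I samples standing in for the eigenvalues of a trained weight matrix
eigs = rng.pareto(alpha_true, size=20000) + 1.0
alpha_hat = hill_estimator(eigs, k=500)
print(alpha_hat)  # should recover a value near alpha_true = 3.0
```

In practice one would apply such a fit to the spectrum of the trained weight (or Gram) matrix; the paper's contribution is to predict from first principles when this tail exponent emerges and how it co-varies with the excess-risk regime.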
Topics
- Scaling laws
- Sparse estimation
- Matrix compressed sensing
Limitations
Authors did not state explicit limitations.
Future directions (from the paper)
- Explore additional data structures, such as non-trivial covariances
- Extend beyond two-layer networks and quadratic activations
- Prove the state evolution conjecture rigorously
- Analyze compute scaling laws of GD/SGD
- Study implicit biases of SGD toward heavy tails
Author keywords
- Scaling laws; Neural networks; LASSO and matrix compressed sensing; Random matrix theory; Approximate message passing; High-dimensional statistics