Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun, Bruno Loureiro, Minh Ha Quang, Masaaki Imaizumi
Abstract
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
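For concreteness, one standard way to write this setting is sketched below; the notation is illustrative rather than the paper's exact symbols ($w_\ast$ for the planted signal, $\Sigma$ for the input covariance, $\alpha$ for the spectral decay exponent).

```latex
% Illustrative notation (not necessarily the paper's): w_* planted signal,
% Sigma input covariance with power-law spectrum, alpha > 0 the decay exponent.
\[
  y \;=\; \langle w_\ast, x \rangle^{2},
  \qquad x \sim \mathcal{N}(0,\Sigma),
  \qquad \lambda_k(\Sigma) \;\asymp\; k^{-\alpha},
\]
\[
  \dot w_t \;=\; -\,\nabla_w\, \mathbb{E}\!\left[\bigl(\langle w, x\rangle^{2} - y\bigr)^{2}\right]\Big|_{w = w_t},
  \qquad
  \mathrm{MSE}(t) \;=\; \mathbb{E}\!\left[\bigl(\langle w_t, x\rangle^{2} - \langle w_\ast, x\rangle^{2}\bigr)^{2}\right].
\]
```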
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
- Develops tractable reduction of infinite hierarchy of equations governing anisotropic phase retrieval dynamics
- Reveals three-phase trajectory: fast escape, slow convergence, spectral-tail learning (a simulation sketch of these phases appears after this list)
- Derives explicit scaling laws showing how spectral decay dictates convergence times
- Provides first rigorous characterization of scaling laws in nonlinear regression with anisotropic data
- Gradient flow
- Phase retrieval
- Scaling laws
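The snippet below is a minimal simulation sketch of this setting, not the paper's code: it Euler-discretizes the population gradient flow for the quadratic link with a power-law covariance spectrum, using closed-form Gaussian moments (our own derivation via Isserlis' theorem). The dimension, exponent, step size, and initialization are illustrative choices; the printed overlap and risk should qualitatively show the fast-escape and slow-convergence phases.

```python
# Minimal sketch (our assumptions, not the paper's code): Euler-discretized
# population gradient flow for phase retrieval y = <w*, x>^2 with
# x ~ N(0, Sigma), Sigma = diag(lam), lambda_k ~ k^{-alpha}.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, dt, steps = 400, 1.5, 0.1, 100_000          # illustrative choices

lam = np.arange(1, d + 1, dtype=float) ** (-alpha)     # power-law spectrum of Sigma
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)                       # planted unit-norm signal

def risk(w):
    # Population risk E[((x.w)^2 - (x.w*)^2)^2] in closed form (Isserlis' theorem).
    A, B, C = w @ (lam * w), w_star @ (lam * w_star), w @ (lam * w_star)
    return 3 * A * A + 3 * B * B - 2 * A * B - 4 * C * C

def grad(w):
    # Population gradient: 4 [ 3 (w'Sw) Sw - (w*'Sw*) Sw - 2 (w'Sw*) Sw* ],  S = Sigma.
    Sw, Sws = lam * w, lam * w_star
    return 4.0 * (3.0 * (w @ Sw) * Sw - (w_star @ Sws) * Sw - 2.0 * (w @ Sws) * Sws)

w = 1e-3 * rng.standard_normal(d)                      # weak initial alignment
for t in range(steps + 1):
    if t % (steps // 10) == 0:
        overlap = abs(w @ w_star) / (np.linalg.norm(w) + 1e-12)
        print(f"t = {t * dt:8.1f}   risk = {risk(w):.3e}   |cos(w, w*)| = {overlap:.3f}")
    w -= dt * grad(w)                                   # explicit Euler step of gradient flow
```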
Limitations
- The analysis assumes Gaussian inputs with a power-law spectrum; this assumption may not be essential, but it simplifies the theory.
- Quantitative results apply only in the gradient-flow limit; extending them to discrete-time SGD remains open.
Future directions
- Derive scaling laws for discrete-time SGD
- Analyze finite-sample effects
- Relax distributional assumptions
- Extend beyond quadratic nonlinearities to single- and multi-index models
- Study phase decompositions in wider neural networks as a bridge to deep learning theory
Author keywords
- scaling laws
- gradient flow
- power-law spectrum
- phase retrieval
- anisotropic data
- learning dynamics
Related orals
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Shows decentralized learning with single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Non-Convex Federated Optimization under Cost-Aware Client Selection
Develops efficient federated optimization algorithm with cost-aware client selection achieving best communication and local complexity.
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
Quantitative Bounds for Length Generalization in Transformers
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective