Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun, Bruno Loureiro, Minh Ha Quang, Masaaki Imaizumi
Abstract
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
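For concreteness, one standard way to write this setting is sketched below; the notation is illustrative rather than the paper's exact symbols ($w_\ast$ for the planted signal, $\Sigma$ for the input covariance, $\alpha$ for the spectral decay exponent).

```latex
% Illustrative notation (not necessarily the paper's): w_* planted signal,
% Sigma input covariance with power-law spectrum, alpha > 0 the decay exponent.
\[
  y \;=\; \langle w_\ast, x \rangle^{2},
  \qquad x \sim \mathcal{N}(0,\Sigma),
  \qquad \lambda_k(\Sigma) \;\asymp\; k^{-\alpha},
\]
\[
  \dot w_t \;=\; -\,\nabla_w\, \mathbb{E}\!\left[\bigl(\langle w, x\rangle^{2} - y\bigr)^{2}\right]\Big|_{w = w_t},
  \qquad
  \mathrm{MSE}(t) \;=\; \mathbb{E}\!\left[\bigl(\langle w_t, x\rangle^{2} - \langle w_\ast, x\rangle^{2}\bigr)^{2}\right].
\]
```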
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
- Develops tractable reduction of infinite hierarchy of equations governing anisotropic phase retrieval dynamics
- Reveals three-phase trajectory: fast escape, slow convergence, spectral-tail learning (a simulation sketch of these phases appears after this list)
- Derives explicit scaling laws showing how spectral decay dictates convergence times
- Provides first rigorous characterization of scaling laws in nonlinear regression with anisotropic data
- Gradient flow
- Phase retrieval
- Scaling laws
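The snippet below is a minimal simulation sketch of this setting, not the paper's code: it Euler-discretizes the population gradient flow for the quadratic link with a power-law covariance spectrum, using closed-form Gaussian moments (our own derivation via Isserlis' theorem). The dimension, exponent, step size, and initialization are illustrative choices; the printed overlap and risk should qualitatively show the fast-escape and slow-convergence phases.

```python
# Minimal sketch (our assumptions, not the paper's code): Euler-discretized
# population gradient flow for phase retrieval y = <w*, x>^2 with
# x ~ N(0, Sigma), Sigma = diag(lam), lambda_k ~ k^{-alpha}.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, dt, steps = 400, 1.5, 0.1, 100_000          # illustrative choices

lam = np.arange(1, d + 1, dtype=float) ** (-alpha)     # power-law spectrum of Sigma
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)                       # planted unit-norm signal

def risk(w):
    # Population risk E[((x.w)^2 - (x.w*)^2)^2] in closed form (Isserlis' theorem).
    A, B, C = w @ (lam * w), w_star @ (lam * w_star), w @ (lam * w_star)
    return 3 * A * A + 3 * B * B - 2 * A * B - 4 * C * C

def grad(w):
    # Population gradient: 4 [ 3 (w'Sw) Sw - (w*'Sw*) Sw - 2 (w'Sw*) Sw* ],  S = Sigma.
    Sw, Sws = lam * w, lam * w_star
    return 4.0 * (3.0 * (w @ Sw) * Sw - (w_star @ Sws) * Sw - 2.0 * (w @ Sws) * Sws)

w = 1e-3 * rng.standard_normal(d)                      # weak initial alignment
for t in range(steps + 1):
    if t % (steps // 10) == 0:
        overlap = abs(w @ w_star) / (np.linalg.norm(w) + 1e-12)
        print(f"t = {t * dt:8.1f}   risk = {risk(w):.3e}   |cos(w, w*)| = {overlap:.3f}")
    w -= dt * grad(w)                                   # explicit Euler step of gradient flow
```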
Limitations
- The analysis assumes Gaussian inputs with a power-law spectrum; this assumption may not be essential, but it simplifies the theory.
- Quantitative results apply only in the gradient-flow limit; extending them to discrete-time SGD remains open.
Future directions
- Derive scaling laws for discrete-time SGD
- Analyze finite-sample effects
- Relax distributional assumptions
- Extend beyond quadratic nonlinearities to single- and multi-index models
- Study phase decompositions in wider neural networks as a bridge to deep learning theory
Author keywords
- scaling laws
- gradient flow
- power-law spectrum
- phase retrieval
- anisotropic data
- learning dynamics
Related orals
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Shows decentralized learning with single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Non-Convex Federated Optimization under Cost-Aware Client Selection
Develops efficient federated optimization algorithm with cost-aware client selection achieving best communication and local complexity.
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
Quantitative Bounds for Length Generalization in Transformers
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective