ICLR 2026 Orals

Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data

Guillaume Braun, Bruno Loureiro, Minh Ha Quang, Masaaki Imaizumi

Theory & Optimization · Thu, Apr 23 · 11:18–11:28 AM · 204 A/B · Avg rating: 5.50 (4–6)

Abstract

Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
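
For concreteness, the setup the abstract describes admits a standard formalization (a generic phase-retrieval formulation in our notation; the paper's exact conventions may differ). Inputs are anisotropic Gaussian with power-law covariance eigenvalues, the target is quadratic, and training follows population gradient flow; the closed-form gradient below follows from Gaussian fourth-moment (Isserlis) identities:

\[
x \sim \mathcal{N}(0, \Sigma), \qquad \Sigma = \operatorname{diag}(\lambda_1, \dots, \lambda_d), \qquad \lambda_i \propto i^{-\alpha},
\]
\[
y = \langle w_\star, x \rangle^2, \qquad \mathcal{L}(w) = \tfrac{1}{4}\, \mathbb{E}\big[ (\langle w, x \rangle^2 - y)^2 \big],
\]
\[
\dot{w} = -\nabla \mathcal{L}(w) = -\Big[ \big( 3 \langle w, \Sigma w \rangle - \langle w_\star, \Sigma w_\star \rangle \big)\, \Sigma w - 2\, \langle w, \Sigma w_\star \rangle\, \Sigma w_\star \Big].
\]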

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Analyzes the learning dynamics of phase retrieval with anisotropic power-law data, revealing a three-phase trajectory and deriving explicit scaling laws for the error.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Develops tractable reduction of infinite hierarchy of equations governing anisotropic phase retrieval dynamics
  • Reveals three-phase trajectory: fast escape, slow convergence, spectral-tail learning (see the sketch after this list)
  • Derives explicit scaling laws showing how spectral decay dictates convergence times
  • Provides first rigorous characterization of scaling laws in nonlinear regression with anisotropic data
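
Below is a minimal NumPy sketch of this setup (an illustrative toy, not the paper's experiment: the dimension d, spectrum exponent alpha, step size eta, and horizon T are hypothetical choices). It runs discretized population gradient flow on phase retrieval with a power-law spectrum and prints the sign-invariant Sigma-weighted error, in which the three phases should be visible: a fast drop once alignment escapes, a plateau while the summary statistics settle, and a slow tail as the small-lambda directions are learned.

    import numpy as np

    # Power-law covariance Sigma = diag(lambda_1, ..., lambda_d), lambda_i ~ i^(-alpha).
    rng = np.random.default_rng(0)
    d, alpha = 100, 1.5
    lam = np.arange(1, d + 1, dtype=float) ** -alpha
    w_star = rng.standard_normal(d)
    w_star /= np.sqrt(w_star @ (lam * w_star))        # normalize so <w*, Sigma w*> = 1

    def grad(w):
        # Closed-form population gradient of L(w) = E[(<w,x>^2 - <w*,x>^2)^2] / 4
        # for x ~ N(0, Sigma), via Gaussian moment identities (see the display above).
        Sw, Ss = lam * w, lam * w_star
        return (3.0 * (w @ Sw) - (w_star @ Ss)) * Sw - 2.0 * (w @ Ss) * Ss

    w = 1e-3 * rng.standard_normal(d)                 # start at low alignment
    eta, T = 5e-3, 400_000                            # discretized gradient flow
    for t in range(T + 1):
        if t % 50_000 == 0:
            s = 1.0 if (w @ (lam * w_star)) >= 0 else -1.0   # phase (sign) ambiguity
            e = w - s * w_star
            print(f"t={t:>7d}  Sigma-MSE={e @ (lam * e):.3e}")
        w -= eta * grad(w)

Plotting the printed error against t on log-log axes should, under these assumptions, expose the power-law decay whose exponent the paper ties to the spectral exponent alpha.
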
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Gradient flow
  • Phase retrieval
  • Scaling laws
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Analysis assumes Gaussian inputs with a power-law spectrum; this may not be essential, but it simplifies the theory
  • Quantitative results apply only in the gradient-flow limit; extending them to discrete-time SGD remains open
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Derive scaling laws for discrete-time SGD
  • Analyze finite-sample effects
  • Relax distributional assumptions
  • Extend beyond quadratic nonlinearities to single- and multi-index models
  • Study phase decompositions in wider neural networks as a bridge to deep learning theory

Author keywords

  • scaling laws
  • gradient flow
  • power-law spectrum
  • phase retrieval
  • anisotropic data
  • learning dynamics
