Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
Brandon Livio Annesi, Dario Bocchi, Chiara Cammarota
We quantitatively analyze how overparametrization reshapes the high-dimensional loss landscape of a teacher-student setup at random initialization, showing that it can lower the threshold for, and qualitatively alter, the transition between successful and failed signal recovery.
Abstract
High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work, we focus on a prototypical simple learning problem that generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik-Ben Arous-Péché (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about the planted signal of a teacher-student setup. Crucially, we demonstrate how overparametrization can "bend" the loss landscape, shifting the transition point, even reaching the information-theoretic weak-recovery threshold in the limit of large overparametrization, while also altering its qualitative nature. We distinguish between continuous and discontinuous BBP transitions, and support our analytical predictions with simulations, examining how they compare to the finite-N behavior. In the case of discontinuous BBP transitions, strong finite-N corrections allow the retrieval of information at a signal-to-noise ratio (SNR) smaller than the predicted BBP transition. In these cases we provide estimates for a new, lower SNR threshold that marks the point at which initialization becomes entirely uninformative.
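To make the Hessian-at-initialization diagnostic concrete, here is a minimal numerical sketch (not the authors' code): it assumes the plain, non-overparametrized Phase Retrieval setting with Gaussian inputs, a specific loss normalization, and an arbitrary sample ratio alpha = P/N, and checks whether the smallest Hessian eigenvalue at a random initialization detaches from the bulk with an eigenvector correlated with the planted signal, which is the BBP phenomenology described in the abstract.

```python
# Minimal sketch (assumptions: plain phase retrieval y_mu = (x_mu . w*)^2 / N,
# quadratic loss L(w) = (1/4P) sum_mu (y_mu - (x_mu . w)^2 / N)^2, Gaussian data).
import numpy as np

rng = np.random.default_rng(0)
N = 250                      # input dimension (illustrative value)
alpha = 8.0                  # sample ratio P/N; sweep it to locate the BBP transition
P = int(alpha * N)

X = rng.standard_normal((P, N))                    # random inputs x_mu
w_star = rng.standard_normal(N)
w_star *= np.sqrt(N) / np.linalg.norm(w_star)      # planted signal, |w*|^2 = N
y = (X @ w_star) ** 2 / N                          # teacher labels

w0 = rng.standard_normal(N)
w0 *= np.sqrt(N) / np.linalg.norm(w0)              # random (uninformed) initialization
h = X @ w0 / np.sqrt(N)                            # pre-activations at initialization

# N * Hessian of L at w0 (rescaled so eigenvalues are O(1)):
#   N * d^2L/dw dw^T = (1/P) sum_mu (3 h_mu^2 - y_mu) x_mu x_mu^T
H = (X.T * (3 * h**2 - y)) @ X / P

eigvals, eigvecs = np.linalg.eigh(H)               # ascending eigenvalues
outlier, v = eigvals[0], eigvecs[:, 0]             # candidate outlier (smallest)
overlap = abs(v @ w_star) / np.linalg.norm(w_star)
print(f"smallest eigenvalue: {outlier:.3f}, next one: {eigvals[1]:.3f}, "
      f"overlap of its eigenvector with w*: {overlap:.3f}")
# Above the BBP transition an isolated eigenvalue detaches below the bulk and its
# eigenvector acquires a macroscopic overlap with w*, so the random initialization
# already carries information about the signal; below it the overlap is O(1/sqrt(N)).
```

Repeating this for an overparametrized student with several quadratic units is what the paper analyzes with field-theory techniques; the sketch only illustrates the single-unit baseline.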
Analyzes how overparametrization shifts the BBP transition point in the loss landscape, bending its geometric properties.
- Analyzes the loss landscape at initialization for a teacher-student setup with quadratic activation (a plausible model sketch follows below)
- Identifies a Baik-Ben Arous-Péché transition in the amount of data that separates informative from uninformative initializations
- Shows overparametrization "bends" the loss landscape, shifting the transition point toward the information-theoretic weak-recovery threshold
- Distinguishes between continuous and discontinuous BBP transitions depending on the degree of overparametrization
- Hessian analysis
- Spectral methods
- Phase transitions
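For reference, a plausible form of the quadratic-activation teacher-student model alluded to in the bullets above is sketched here; the specific normalizations, the noiseless labels, and the uniform second-layer weights are assumptions, and the paper's conventions may differ.

```latex
% Assumed model (sketch): planted vector w^* in R^N, student with m quadratic units,
% P samples, sample ratio alpha = P/N; m = 1 recovers standard Phase Retrieval,
% m > 1 gives the overparametrized setting.
y_\mu = \frac{1}{N}\left(x_\mu \cdot w^*\right)^2, \qquad
\hat{y}_\mu(W) = \frac{1}{mN}\sum_{k=1}^{m}\left(x_\mu \cdot w_k\right)^2, \qquad
\mathcal{L}(W) = \frac{1}{4P}\sum_{\mu=1}^{P}\left(y_\mu - \hat{y}_\mu(W)\right)^2 .
```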
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- Overparametrization
- Loss landscapes
- Signal recovery
- High-dimensional learning
Related orals
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Shows decentralized learning with single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Non-Convex Federated Optimization under Cost-Aware Client Selection
Develops efficient federated optimization algorithm with cost-aware client selection achieving best communication and local complexity.
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
Quantitative Bounds for Length Generalization in Transformers
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.