InfoNCE Induces Gaussian Distribution
Roy Betser, Eyal Gofer, Meir Yossef Levi, Guy Gilboa
Contrastive-learning-based representations can be well approximated by a multivariate Gaussian distribution.
Abstract
Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. Prototypical losses in contrastive training are InfoNCE and its variants. In this work, we show that the InfoNCE objective induces Gaussian structure in representations that emerge from contrastive training. We establish this result in two complementary regimes. First, we show that under certain alignment and concentration assumptions, projections of the high-dimensional representation asymptotically approach a multivariate Gaussian distribution. Next, under less strict assumptions, we show that adding a small asymptotically vanishing regularization term that promotes low feature norm and high feature entropy leads to similar asymptotic results. We support our analysis with experiments on synthetic and CIFAR-10 datasets across multiple encoder architectures and sizes, demonstrating consistent Gaussian behavior. This perspective provides a principled explanation for commonly observed Gaussianity in contrastive representations. The resulting Gaussian model enables principled analytical treatment of learned representations and is expected to support a wide range of applications in contrastive learning.
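For reference, below is a minimal sketch of the standard InfoNCE (NT-Xent) objective discussed in the abstract, written in PyTorch. The function name, batch construction from two augmented views, and the temperature value are illustrative assumptions, not the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE / NT-Xent loss for a batch of positive pairs.

    z1, z2: (N, d) embeddings of two augmented views of the same N samples.
    Each anchor's positive is its counterpart view; the remaining 2N - 2
    embeddings in the batch act as negatives.
    """
    z1 = F.normalize(z1, dim=1)          # project features onto the unit sphere
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)       # (2N, d)
    sim = z @ z.t() / temperature        # temperature-scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))    # exclude self-similarity
    n = z1.shape[0]
    # positive of sample i is i + n, and vice versa
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```

Minimizing this loss pulls the two views of each sample together while spreading embeddings over the unit sphere, which is the setting in which the paper's alignment and concentration assumptions are stated.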
Shows that the InfoNCE loss induces a Gaussian distribution in contrastive representations, providing a principled explanation for commonly observed Gaussianity.
- Proves that InfoNCE-trained representations asymptotically approach a multivariate Gaussian distribution
- Shows that adding a regularization term promoting low feature norm and high feature entropy yields similar asymptotic results
- Validates Gaussian behavior across synthetic data, CIFAR-10, and pretrained models (a simple projection-based check is sketched below)
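As referenced in the last point above, one straightforward way to probe the claimed Gaussianity (not necessarily the paper's own evaluation protocol) is to test 1-D random projections of the learned embeddings for normality. The sketch below assumes a generic embedding matrix and uses NumPy/SciPy.

```python
import numpy as np
from scipy import stats

def projection_gaussianity(embeddings, n_projections=100, seed=0):
    """Test 1-D random projections of learned embeddings for normality.

    embeddings: (N, d) array of representations from a contrastively
    trained encoder. Returns the D'Agostino-Pearson normality p-value
    for each random unit-vector projection; consistently large p-values
    are compatible with approximately Gaussian structure.
    """
    rng = np.random.default_rng(seed)
    X = embeddings - embeddings.mean(axis=0)     # center the features
    d = X.shape[1]
    pvals = []
    for _ in range(n_projections):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)                   # random direction on the sphere
        proj = X @ v
        _, p = stats.normaltest(proj)            # D'Agostino-Pearson K^2 test
        pvals.append(p)
    return np.array(pvals)
```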
- Contrastive learning
- Information theory
- Representation learning
- Gaussian models
- CIFAR-10
- MS-COCO
- ImageNet-R
Results are asymptotic, relying on high-dimensional limits and idealized assumptions.
The analysis does not cover optimization dynamics or prove that training reaches the stated minimizers.
Results characterize population optima under the stated assumptions rather than practical training.
The authors did not state explicit future directions.
Author keywords
- Contrastive learning
- Gaussian distribution
- InfoNCE
Related orals
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Shows that decentralized learning with a single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Non-Convex Federated Optimization under Cost-Aware Client Selection
Develops an efficient federated optimization algorithm with cost-aware client selection, achieving the best communication and local complexity.
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
Quantitative Bounds for Length Generalization in Transformers
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.