Uncover Underlying Correspondence for Robust Multi-view Clustering
Haochen Zhou, Guofeng Ding, Mouxing Yang, Peng Hu, Yijie Lin, Xi Peng
Abstract
Multi-view clustering (MVC) aims to group unlabeled data into semantically meaningful clusters by leveraging cross-view consistency. However, real-world datasets collected from the web often suffer from noisy correspondence (NC), which breaks the consistency prior and results in unreliable alignments. In this paper, we identify two critical forms of NC that particularly harm clustering: i) category-level mismatch, where semantically consistent samples from the same class are mistakenly treated as negatives; and ii) sample-level mismatch, where collected cross-view pairs are misaligned and some samples may even lack any valid counterpart. To address these challenges, we propose \textbf{CorreGen}, a generative framework that formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying cross-view correspondences. The objective is elegantly solved via an Expectation–Maximization algorithm: in the E-step, soft correspondence distributions are inferred across views, capturing class-level relations while adaptively down-weighting noisy or unalignable samples through GMM-guided marginals; in the M-step, the embedding network is updated to maximize the expected log-likelihood. Extensive experiments on both synthetic and real-world noisy datasets demonstrate that our method significantly improves clustering robustness. The code is available at [https://github.com/XLearning-SCU/2026-ICLR-CorreGen](https://github.com/XLearning-SCU/2026-ICLR-CorreGen).
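The abstract's E-step (infer soft correspondence distributions, down-weight noisy samples via a GMM over per-sample losses) and M-step (maximize the expected log-likelihood) can be illustrated with a minimal NumPy sketch. All names here are hypothetical and the details are assumptions, not the paper's implementation; it shows one EM round on fixed embeddings, whereas the actual M-step updates a learnable embedding network:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fit_gmm_1d(losses, iters=50):
    """Fit a 2-component 1D GMM on per-sample losses; return P(clean | loss)."""
    mu = np.array([losses.min(), losses.max()])
    var = np.array([losses.var() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities of each component for each loss value
        ll = -0.5 * (losses[:, None] - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
        r = softmax(np.log(pi) + ll, axis=1)
        # M-step: update mixture weights, means, variances
        nk = r.sum(0) + 1e-12
        pi = nk / len(losses)
        mu = (r * losses[:, None]).sum(0) / nk
        var = (r * (losses[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    clean = np.argmin(mu)  # low-loss component is treated as "clean"
    return r[:, clean]

def em_round(z1, z2, tau=0.1):
    """One EM round. z1, z2: L2-normalized embeddings of two views, shape (n, d)."""
    sim = z1 @ z2.T / tau
    # E-step: soft correspondence distribution of each view-1 sample over view 2
    q = softmax(sim, axis=1)
    # per-sample alignment loss under the given (possibly noisy) diagonal pairing
    idx = np.arange(len(q))
    losses = -np.log(q[idx, idx] + 1e-12)
    w = fit_gmm_1d(losses)  # GMM-guided weight: noisy pairs get small w
    # M-step objective: weighted expected log-likelihood (maximized w.r.t. the
    # embedding network in the real method; here just evaluated)
    obj = (w[:, None] * q * np.log(q + 1e-12)).sum(axis=1).mean()
    return q, w, obj
```

Usage: embed both views, call `em_round`, and use `w` to down-weight suspected mismatched pairs; samples whose rows of `q` are near-uniform and whose `w` is small behave like the unalignable samples the abstract describes.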
Proposes CorreGen, a generative framework for robust multi-view clustering under noisy correspondence, optimized with an Expectation-Maximization algorithm.
- Formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying correspondences
- E-step infers soft correspondence distributions across views with GMM-guided marginals adaptively down-weighting noisy samples
- M-step updates embedding network to maximize expected log-likelihood
- Expectation-Maximization algorithm
- Generative modeling
- Gaussian mixture models
- Multi-view learning
- Clustering
- Scene15
- LandUse21
- Caltech101
- UPMC-Food101
Authors did not state explicit limitations.
- Extend the framework to unpaired multi-modal learning (from the paper)
- Apply to cross-modal retrieval tasks with large-scale noisy data (from the paper)
Author keywords
- Multi-view clustering; Noisy Correspondence