Uncover Underlying Correspondence for Robust Multi-view Clustering
Haochen Zhou, Guofeng Ding, Mouxing Yang, Peng Hu, Yijie Lin, Xi Peng
Abstract
Multi-view clustering (MVC) aims to group unlabeled data into semantically meaningful clusters by leveraging cross-view consistency. However, real-world datasets collected from the web often suffer from noisy correspondence (NC), which breaks the consistency prior and results in unreliable alignments. In this paper, we identify two critical forms of NC that particularly harm clustering: i) category-level mismatch, where semantically consistent samples from the same class are mistakenly treated as negatives; and ii) sample-level mismatch, where collected cross-view pairs are misaligned and some samples may even lack any valid counterpart. To address these challenges, we propose \textbf{CorreGen}, a generative framework that formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying cross-view correspondences. The objective is elegantly solved via an Expectation–Maximization algorithm: in the E-step, soft correspondence distributions are inferred across views, capturing class-level relations while adaptively down-weighting noisy or unalignable samples through GMM-guided marginals; in the M-step, the embedding network is updated to maximize the expected log-likelihood. Extensive experiments on both synthetic and real-world noisy datasets demonstrate that our method significantly improves clustering robustness. The code is available at [https://github.com/XLearning-SCU/2026-ICLR-CorreGen](https://github.com/XLearning-SCU/2026-ICLR-CorreGen).
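The abstract's E-step (infer soft correspondence distributions, down-weight noisy samples via a GMM over per-sample losses) and M-step (maximize the expected log-likelihood) can be illustrated with a minimal NumPy sketch. All names here are hypothetical and the details are assumptions, not the paper's implementation; it shows one EM round on fixed embeddings, whereas the actual M-step updates a learnable embedding network:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fit_gmm_1d(losses, iters=50):
    """Fit a 2-component 1D GMM on per-sample losses; return P(clean | loss)."""
    mu = np.array([losses.min(), losses.max()])
    var = np.array([losses.var() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities of each component for each loss value
        ll = -0.5 * (losses[:, None] - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
        r = softmax(np.log(pi) + ll, axis=1)
        # M-step: update mixture weights, means, variances
        nk = r.sum(0) + 1e-12
        pi = nk / len(losses)
        mu = (r * losses[:, None]).sum(0) / nk
        var = (r * (losses[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    clean = np.argmin(mu)  # low-loss component is treated as "clean"
    return r[:, clean]

def em_round(z1, z2, tau=0.1):
    """One EM round. z1, z2: L2-normalized embeddings of two views, shape (n, d)."""
    sim = z1 @ z2.T / tau
    # E-step: soft correspondence distribution of each view-1 sample over view 2
    q = softmax(sim, axis=1)
    # per-sample alignment loss under the given (possibly noisy) diagonal pairing
    idx = np.arange(len(q))
    losses = -np.log(q[idx, idx] + 1e-12)
    w = fit_gmm_1d(losses)  # GMM-guided weight: noisy pairs get small w
    # M-step objective: weighted expected log-likelihood (maximized w.r.t. the
    # embedding network in the real method; here just evaluated)
    obj = (w[:, None] * q * np.log(q + 1e-12)).sum(axis=1).mean()
    return q, w, obj
```

Usage: embed both views, call `em_round`, and use `w` to down-weight suspected mismatched pairs; samples whose rows of `q` are near-uniform and whose `w` is small behave like the unalignable samples the abstract describes.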
Proposes CorreGen, a generative framework for robust multi-view clustering under noisy correspondence, optimized with an Expectation-Maximization algorithm.
- Formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying correspondences
- E-step infers soft correspondence distributions across views with GMM-guided marginals adaptively down-weighting noisy samples
- M-step updates embedding network to maximize expected log-likelihood
- Expectation-Maximization algorithm
- Generative modeling
- Gaussian mixture models
- Multi-view learning
- Clustering
- Scene15
- LandUse21
- Caltech101
- UPMC-Food101
Authors did not state explicit limitations.
- Extend the framework to unpaired multi-modal learning (from the paper)
- Apply to cross-modal retrieval tasks with large-scale noisy data (from the paper)
Author keywords
- Multi-view clustering; Noisy Correspondence