ICLR 2026 Orals

Uncover Underlying Correspondence for Robust Multi-view Clustering

Haochen Zhou, Guofeng Ding, Mouxing Yang, Peng Hu, Yijie Lin, Xi Peng

Safety, Privacy & Alignment · Sat, Apr 25 · 10:30 AM–10:40 AM · 204 A/B · Avg rating: 7.00 (6–8)

Abstract

Multi-view clustering (MVC) aims to group unlabeled data into semantically meaningful clusters by leveraging cross-view consistency. However, real-world datasets collected from the web often suffer from noisy correspondence (NC), which breaks the consistency prior and results in unreliable alignments. In this paper, we identify two critical forms of NC that particularly harm clustering: i) category-level mismatch, where semantically consistent samples from the same class are mistakenly treated as negatives; and ii) sample-level mismatch, where collected cross-view pairs are misaligned and some samples may even lack any valid counterpart. To address these challenges, we propose **CorreGen**, a generative framework that formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying cross-view correspondences. The objective is elegantly solved via an Expectation–Maximization algorithm: in the E-step, soft correspondence distributions are inferred across views, capturing class-level relations while adaptively down-weighting noisy or unalignable samples through GMM-guided marginals; in the M-step, the embedding network is updated to maximize the expected log-likelihood. Extensive experiments on both synthetic and real-world noisy datasets demonstrate that our method significantly improves clustering robustness. The code is available at [https://github.com/XLearning-SCU/2026-ICLR-CorreGen](https://github.com/XLearning-SCU/2026-ICLR-CorreGen).
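As a rough illustration of the EM scheme the abstract describes, the toy sketch below alternates an E-step (softmax soft correspondences across two views, plus a two-component GMM fit on per-pair alignment losses whose clean-component posterior down-weights noisy pairs) with an M-step (one gradient-ascent step on the weighted log-likelihood). This is not the authors' implementation: the toy data, the diagonal-pairing target, the dot-product similarity, the temperature, and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d, tau = 60, 8, 0.5
z1 = rng.normal(size=(n, d))              # view-1 embeddings
z2 = z1 + 0.1 * rng.normal(size=(n, d))   # view-2: mostly aligned with view 1
z2[:10] = rng.normal(size=(10, d))        # first 10 pairs: noisy correspondence

for _ in range(20):
    # E-step: infer a soft correspondence distribution across views
    sim = z1 @ z2.T / tau
    P = softmax(sim, axis=1)              # P[i, j] ~ p(view-2 sample j | view-1 sample i)
    # per-sample alignment loss of the *given* (diagonal) pairing
    loss_i = -np.log(P[np.arange(n), np.arange(n)] + 1e-12)
    # GMM-guided weighting: the low-loss component is treated as "clean"
    gmm = GaussianMixture(n_components=2, random_state=0).fit(loss_i[:, None])
    clean = int(np.argmin(gmm.means_.ravel()))
    w = gmm.predict_proba(loss_i[:, None])[:, clean]   # down-weights noisy pairs
    # M-step: one gradient-ascent step on the weighted log-likelihood w.r.t. z2:
    # d/dz2_j [sum_i w_i log P[i, i]] = sum_i w_i (I - P)[i, j] z1_i / tau
    grad = ((w[:, None] * (np.eye(n) - P)).T @ z1) / tau
    z2 = z2 + 0.5 * grad / n
```

On this toy, the GMM posterior assigns low weight to the ten misaligned pairs, so the M-step mostly refines embeddings of well-aligned samples, mimicking the adaptive down-weighting described above.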

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Proposes CorreGen, a generative framework for multi-view clustering under noisy correspondence, solved via an EM algorithm.

Contributions
  • Formulates noisy correspondence learning in MVC as maximum likelihood estimation over underlying correspondences
  • E-step infers soft correspondence distributions across views with GMM-guided marginals adaptively down-weighting noisy samples
  • M-step updates embedding network to maximize expected log-likelihood
Methods used
  • Expectation-Maximization algorithm
  • Generative modeling
  • Gaussian mixture models
  • Multi-view learning
  • Clustering
Datasets used
  • Scene15
  • LandUse21
  • Caltech101
  • UPMC-Food101
Limitations (author-stated)

Authors did not state explicit limitations.

Future work (author-stated)
  • Extend framework to unpaired multi-modal learning
  • Apply to cross-modal retrieval tasks with large-scale noisy data

Author keywords

  • Multi-view clustering
  • Noisy correspondence
