ICLR 2026 Orals

AnyUp: Universal Feature Upsampling

Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen

Vision & 3D · Sat, Apr 25 · 11:18 AM–11:28 AM · 204 A/B · Avg rating: 6.50 (6–8)
Author-provided TL;DR

A universal feature upsampling model that can upsample any feature from any resolution to any resolution and generalizes to feature types unseen during training.

Abstract

We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an *inference-time* feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
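To make the core idea concrete, here is a minimal toy sketch of guidance-driven feature upsampling, where each high-resolution output is an attention-weighted linear combination of low-resolution features from a local window. This is an illustrative simplification, not the AnyUp architecture; the function name, window logic, and similarity kernel are assumptions for exposition only.

```python
import numpy as np

def upsample_features(feat_lr, guide_hr, window=3, temperature=0.1):
    """Toy feature upsampler (illustrative only, not the AnyUp model).

    Each high-res output feature is a convex combination of low-res
    features from a local window, weighted by similarity in a
    high-resolution guidance signal (e.g. RGB pixels).
    feat_lr:  (h, w, C) low-resolution features
    guide_hr: (H, W, g) high-resolution guidance, H/W multiples of h/w
    """
    h, w, C = feat_lr.shape
    H, W, _ = guide_hr.shape
    # Low-res guidance: average-pool the high-res guidance to (h, w).
    guide_lr = guide_hr.reshape(h, H // h, w, W // w, -1).mean(axis=(1, 3))
    out = np.zeros((H, W, C))
    r = window // 2
    for y in range(H):
        for x in range(W):
            cy, cx = y * h // H, x * w // W  # nearest low-res cell
            ys = range(max(0, cy - r), min(h, cy + r + 1))
            xs = range(max(0, cx - r), min(w, cx + r + 1))
            keys = np.array([guide_lr[j, i] for j in ys for i in xs])
            vals = np.array([feat_lr[j, i] for j in ys for i in xs])
            # Attention logits: negative squared distance in guidance space.
            logits = -np.sum((keys - guide_hr[y, x]) ** 2, axis=1) / temperature
            wgt = np.exp(logits - logits.max())
            wgt /= wgt.sum()
            out[y, x] = wgt @ vals  # linear combination of low-res features
    return out
```

Because the output is a convex combination of input features, this sketch also illustrates the author-stated limitation: it cannot recover sub-patch information beyond what linear mixing of the low-resolution features allows.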

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

AnyUp is an inference-time feature upsampler that generalizes across feature types and resolutions without encoder-specific retraining.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • First feature-agnostic method for upsampling at inference time to any resolution
  • Feature-agnostic layer, windowed attention, and training strategy enabling generalization to unseen feature types
  • State-of-the-art upsampling quality while preserving feature semantics across diverse downstream tasks
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • feature upsampling
  • attention mechanisms
  • feature-agnostic architecture
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Relies on simplifying assumption that upsampled features are linear combinations of low-resolution inputs
  • Does not extract sub-patch-level spatial information encoded in high-dimensional channels
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Explore larger, more complex upsampling models to extract additional information from patch features

Author keywords

  • feature upsampling
  • representation learning
