AnyUp: Universal Feature Upsampling
Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen
A universal feature upsampling model that upsamples any feature from any resolution to any resolution and generalizes to features unseen during training.
Abstract
We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an *inference-time* feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
AnyUp, an inference-time feature upsampler, generalizes across different feature types and resolutions without encoder-specific retraining.
- First feature-agnostic method for upsampling at inference time to any resolution
- Feature-agnostic layer, windowed attention, and training strategy enabling generalization to unseen feature types
- State-of-the-art upsampling quality while preserving feature semantics across diverse downstream tasks
- feature upsampling
- attention mechanisms
- feature-agnostic architecture
Limitations
- Relies on the simplifying assumption that upsampled features are linear combinations of the low-resolution input features (from the paper)
- Does not extract sub-patch-level spatial information encoded in the high-dimensional feature channels (from the paper)
Future work
- Explore larger, more complex upsampling models to extract additional information from the patch features (from the paper)
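The linear-combination assumption can be illustrated with a minimal cross-attention upsampler: each high-resolution output location attends over the low-resolution patch features and emits a softmax-weighted (hence convex) combination of them. This is a hedged sketch, not the paper's implementation; the function and variable names are illustrative, and AnyUp's actual architecture additionally uses a feature-agnostic layer and windowed attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_upsample(queries, keys, lr_feats, temperature=1.0):
    """Sketch of attention-based feature upsampling (illustrative names).

    queries:  (N, d) one embedding per high-resolution output location,
              e.g. derived from the input image at the target resolution
    keys:     (M, d) one embedding per low-resolution patch
    lr_feats: (M, c) the features to upsample; c is arbitrary, so the same
              routine applies to any encoder's features (feature-agnostic)

    Returns (N, c). Softmax weights are non-negative and sum to 1, so every
    output is a convex combination of the input features: this is exactly
    the linear-combination assumption noted in the limitations above.
    """
    d = queries.shape[1]
    logits = queries @ keys.T / (np.sqrt(d) * temperature)  # (N, M)
    weights = softmax(logits, axis=-1)                      # rows sum to 1
    return weights @ lr_feats                               # (N, c)
```

Because the number of output queries N is decoupled from the input grid size M, the same sketch upsamples from any resolution to any resolution; restricting each query to a local window of keys (as the paper's windowed attention does) would reduce the cost from O(N·M) to O(N·k²) for window size k.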
Author keywords
- feature upsampling
- representation learning
Related orals
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
Capacity manipulation improves diffusion models' handling of class-imbalanced data by reserving capacity for minority classes via low-rank decomposition.
Depth Anything 3: Recovering the Visual Space from Any Views
DA3 predicts spatially consistent 3D geometry from arbitrary camera views using a plain transformer and depth-ray targets.
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.
True Self-Supervised Novel View Synthesis is Transferable
Presents XFactor, the first geometry-free self-supervised model for transferable novel view synthesis without 3D inductive biases.