Vision & 3D

Computer vision, 3D reconstruction, NeRF, Gaussian splatting, video understanding, image generation.

AnyUp: Universal Feature Upsampling

AnyUp is an inference-time feature upsampler that generalizes across feature types and resolutions without encoder-specific retraining.

Avg rating: 6.50 (6–8) · Thomas Wimmer et al.

Depth Anything 3: Recovering the Visual Space from Any Views

DA3 predicts spatially consistent 3D geometry from arbitrary camera views using a plain transformer and depth-ray prediction targets.

Avg rating: 7.00 (6–8) · Haotong Lin et al.
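
A depth value plus a ray fully determines a 3D point, which is why a depth-ray target can encode geometry. The sketch below back-projects one pixel as P = o + depth · d. It assumes a pinhole camera whose camera-frame ray direction is normalized to z = 1 (so depth means z-depth), plus a known camera-to-world rotation and camera center. The helper names are hypothetical and this is not DA3's code.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
              R: Sequence[Sequence[float]], center: Sequence[float]) -> Tuple[Vec3, Vec3]:
    """Hypothetical helper: world-space ray (origin, direction) for pixel (u, v).

    The camera-frame direction has z = 1, so scaling it by depth gives z-depth.
    R is an assumed 3x3 camera-to-world rotation; center is the camera position.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    d_world = tuple(sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    return (center[0], center[1], center[2]), d_world


def point_from_depth_ray(origin: Vec3, direction: Vec3, depth: float) -> Vec3:
    """Back-project along the ray: P = origin + depth * direction."""
    return tuple(o + depth * d for o, d in zip(origin, direction))


# Usage: identity pose, 100 px focal length, principal point (50, 50).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
o, d = pixel_ray(150.0, 50.0, 100.0, 100.0, 50.0, 50.0, I3, (0.0, 0.0, 0.0))
print(point_from_depth_ray(o, d, 2.0))  # (2.0, 0.0, 2.0)
```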

DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation

DTO-KD uses multi-objective optimization to dynamically balance the task and distillation losses at the gradient level, improving knowledge distillation.

Avg rating: 6.67 (6–8) · Zeeshan Hayder et al.
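
DTO-KD's exact update rule is not given here. As a stand-in, the sketch below shows the standard two-objective min-norm (MGDA-style) combination of a task gradient g1 and a distillation gradient g2: it picks the convex weight α in [0, 1] that minimizes ‖α·g1 + (1 − α)·g2‖² in closed form. The function names are hypothetical, and this is a generic gradient-balancing technique, not the paper's method.

```python
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1: Vec, g2: Vec) -> float:
    """Closed-form alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight yields the same vector
    # alpha* = ((g2 - g1) . g2) / ||g1 - g2||^2, then clipped to [0, 1]
    alpha = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, alpha))


def combine_gradients(task_grad: Vec, kd_grad: Vec) -> Tuple[float, Vec]:
    """Hypothetical helper: min-norm convex combination of task and KD gradients."""
    alpha = min_norm_weight(task_grad, kd_grad)
    return alpha, [alpha * a + (1.0 - alpha) * b for a, b in zip(task_grad, kd_grad)]


print(combine_gradients([1.0, 0.0], [0.0, 1.0]))   # (0.5, [0.5, 0.5])
print(combine_gradients([1.0, 0.0], [-1.0, 0.0]))  # (0.5, [0.0, 0.0])
```

When the two gradients directly conflict, the min-norm combination cancels to zero, which is the usual signal that neither objective can be improved without hurting the other.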

Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation

Capacity manipulation improves how diffusion models handle class-imbalanced training data by reserving model capacity for minority classes via low-rank decomposition.

Avg rating: 6.00 (6–6) · Feng Hong et al.
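
The summary does not say how the low-rank component is tied to minority classes, so the sketch below only illustrates a generic LoRA-style decomposition W' = W + BA with B of shape d × r and A of shape r × k, plus the parameter arithmetic that makes such a reserved component cheap: r(d + k) extra weights instead of d · k. All names are hypothetical and this is not the paper's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of (m x n) and (n x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def low_rank_update(w: Matrix, b: Matrix, a: Matrix) -> Matrix:
    """Effective weight W + B @ A, with B (d x r) and A (r x k)."""
    ba = matmul(b, a)
    return [[wij + uij for wij, uij in zip(w_row, u_row)] for w_row, u_row in zip(w, ba)]


def param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """(weights in a full d x k matrix, extra weights in a rank-r factorization)."""
    return d * k, r * (d + k)


W = [[0.0, 0.0], [0.0, 0.0]]
B = [[1.0], [2.0]]  # d x r with r = 1
A = [[3.0, 4.0]]    # r x k
print(low_rank_update(W, B, A))   # [[3.0, 4.0], [6.0, 8.0]]
print(param_counts(512, 512, 8))  # (262144, 8192)
```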

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Introduces locality-aware parallel decoding for autoregressive image generation with flexible generation ordering, achieving a 3.4x latency reduction.

Avg rating: 7.00 (6–8) · Zhuoyang Zhang et al.

Navigating the Latent Space Dynamics of Neural Models

Interprets neural autoencoders as dynamical systems with latent vector fields to analyze generalization, memorization, and out-of-distribution detection.

Avg rating: 6.50 (6–8) · Marco Fumero et al.
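
A minimal sketch of the vector-field view, assuming the field at a latent code z is the displacement f(z) = E(D(z)) − z produced by one encode-decode round trip, and that trajectories are followed by iterating that map until they settle at an attractor. The toy contractive map below is a hypothetical stand-in for a trained autoencoder; it does not reproduce the paper's analysis of generalization, memorization, or out-of-distribution detection.

```python
from typing import Callable, List

Vec = List[float]
Map = Callable[[Vec], Vec]


def latent_field(encode_decode: Map, z: Vec) -> Vec:
    """Assumed vector field f(z) = E(D(z)) - z at latent code z."""
    return [a - b for a, b in zip(encode_decode(z), z)]


def follow_field(encode_decode: Map, z0: Vec, steps: int = 200, tol: float = 1e-9) -> Vec:
    """Iterate z <- E(D(z)) until the step size drops below tol."""
    z = list(z0)
    for _ in range(steps):
        z_next = encode_decode(z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


def toy(z: Vec) -> Vec:
    """Toy stand-in for E(D(z)): a contraction with fixed point z* = 2."""
    return [0.5 * zi + 1.0 for zi in z]


print(follow_field(toy, [0.0, 10.0]))  # approximately [2.0, 2.0]
print(latent_field(toy, [2.0, 2.0]))   # [0.0, 0.0] at the fixed point
```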

Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models

Proposes the CompSLOT framework, which extracts interpretable concepts from vision transformers to enhance continual learning.

Avg rating: 5.33 (4–6) · Weiduo Liao et al.

Radiometrically Consistent Gaussian Surfels for Inverse Rendering

RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.

Avg rating: 5.00 (2–6) · Kyu Beom Han et al.

Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI

The PRISM framework projects fMRI signals into a structured text space for visual stimulus reconstruction, using object-centric diffusion and attribute search modules.

Avg rating: 6.00 (4–8) · Zheng Huang et al.

Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator

VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.

Avg rating: 8.00 (8–8) · Hyojun Go et al.

True Self-Supervised Novel View Synthesis is Transferable

Presents XFactor, the first geometry-free self-supervised model for transferable novel view synthesis, requiring no 3D inductive biases.

Avg rating: 6.00 (4–8) · Thomas Mitchel et al.

WAFT: Warping-Alone Field Transforms for Optical Flow

WAFT replaces cost volumes with high-resolution warping for optical flow, ranking first on Spring, Sintel, and KITTI with 1.3-4.1x faster inference.

Avg rating: 6.67 (6–8) · Yihan Wang et al.
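
Warping in optical flow usually means backward warping: each pixel x of the first frame is compared against the second frame sampled at x + flow(x). The sketch below is a generic single-channel backward warp with bilinear interpolation and border clamping, written in plain Python for illustration; the names are hypothetical and it is not WAFT's implementation.

```python
import math
from typing import List, Tuple

Image = List[List[float]]              # H x W, one channel
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy)


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly sample img at (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bot = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bot


def backward_warp(target: Image, flow: Flow) -> Image:
    """warped[y][x] = target sampled at (x + dx, y + dy)."""
    h, w = len(target), len(target[0])
    return [[bilinear(target, x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
            for y in range(h)]


row = [[0.0, 1.0, 2.0, 3.0]]
print(backward_warp(row, [[(1.0, 0.0)] * 4]))  # [[1.0, 2.0, 3.0, 3.0]]
print(backward_warp(row, [[(0.5, 0.0)] * 4]))  # [[0.5, 1.5, 2.5, 3.0]]
```

Because the warp is differentiable in the flow through bilinear weights, it can replace an explicit cost-volume lookup inside an iterative refinement loop.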