AnyUp: Universal Feature Upsampling
AnyUp is an inference-time feature upsampler that generalizes across feature types and resolutions without encoder-specific retraining.
Computer vision, 3D reconstruction, NeRF, Gaussian splatting, video understanding, image generation.
DA3 predicts spatially consistent 3D geometry from arbitrary camera views using a plain transformer and depth-ray targets.
DTO-KD uses multi-objective optimization to dynamically balance task and distillation losses at the gradient level for better knowledge distillation.
Capacity manipulation improves diffusion models' handling of class-imbalanced data by reserving capacity for minority classes via low-rank decomposition.
Introduces parallel decoding with flexible ordering for autoregressive image generation, achieving a 3.4x latency reduction.
Interprets neural autoencoders as dynamical systems with latent vector fields to analyze generalization, memorization, and out-of-distribution detection.
Proposes the CompSLOT framework, which extracts interpretable concepts from vision transformers to enhance continual learning.
RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.
The PRISM framework projects fMRI signals into a structured text space for visual stimulus reconstruction, using object-centric diffusion and attribute search modules.
VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.
Presents XFactor, the first geometry-free self-supervised model for transferable novel view synthesis without 3D inductive biases.
WAFT replaces cost volumes with high-resolution warping for optical flow, ranking first on Spring, Sintel, and KITTI with 1.3-4.1x faster inference.