Monocular Normal Estimation via Shading Sequence Estimation
Zongrui Li, Xinhua Ma, Minghui Hu, Yunqing Zhao, Yingchen Yu, Qian Zheng, Chang Liu, Xudong Jiang, Song Bai
Abstract
Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under arbitrary lighting. Existing methods rely on deep models to directly predict normal maps. However, they often suffer from 3D misalignment: while the estimated normal maps may look plausible, the reconstructed surfaces often fail to align with the true 3D geometry. We argue that this misalignment stems from the current paradigm: the model struggles to distinguish and estimate the varying geometry represented in normal maps, because differences in the underlying geometry are reflected only through relatively subtle color variations. To address this issue, we propose a new paradigm that reformulates normal estimation as shading sequence estimation, where shading sequences are more sensitive to variations in the underlying geometry. By learning to infer the shading sequence of an object, the model can better capture the underlying 3D geometry and thereby produce more accurate normal predictions. Building on this paradigm, we present RoSE, a method that leverages image-to-video generative models to predict shading sequences, which are then converted into normal maps by solving a simple ordinary least-squares problem. To enhance robustness and better handle complex objects, RoSE is trained on a synthetic dataset, MultiShade, with diverse shapes, materials, and lighting conditions. Experiments demonstrate that RoSE achieves state-of-the-art performance on both synthetic and real-world benchmark datasets for object-based monocular normal estimation.
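The abstract's final conversion step (shading sequence to normal map via ordinary least squares) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it assumes a Lambertian shading model, s_k = max(0, l_k · n), with known light directions, in which case the per-pixel normal (up to albedo scale) is the OLS solution of L g = s over the lit observations, normalized to unit length.

```python
import numpy as np

def normal_from_shadings(lights, shadings):
    """lights: (K, 3) unit light directions; shadings: (K,) shading values.

    Solves the ordinary least-squares problem L @ g = s and returns the
    unit normal g / ||g|| (the albedo is absorbed into the scale of g).
    """
    g, *_ = np.linalg.lstsq(lights, shadings, rcond=None)
    return g / np.linalg.norm(g)

# Toy check: synthesize a shading sequence from a known normal, recover it.
n_true = np.array([0.3, -0.2, 1.0])
n_true /= np.linalg.norm(n_true)
L = np.array([[1.0, 0, 1], [-1, 0, 1], [0, 1, 1], [0, -1, 1], [0, 0, 1]])
L /= np.linalg.norm(L, axis=1, keepdims=True)   # unit light directions
s = np.clip(L @ n_true, 0.0, None)              # Lambertian shading, shadows clipped
lit = s > 0                                     # drop shadowed observations
n_hat = normal_from_shadings(L[lit], s[lit])
print(np.round(n_hat, 4))
```

With at least three linearly independent lit light directions, the least-squares system is well determined and the recovered normal matches the ground truth on this noise-free toy example.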
RoSE estimates surface normals via shading sequence prediction, addressing 3D misalignment in monocular normal estimation.
- New paradigm reformulating monocular normal estimation as shading sequence estimation for better geometry capture
- Leverages image-to-video generative models to predict shading sequences converted to normals via OLS solver
- Trains on MultiShade, a synthetic dataset with diverse shapes, materials, and lighting conditions, for robustness
Topics
- Diffusion models
- Image-to-video generation
- Ordinary least squares solving
- Synthetic dataset training
Datasets
- MultiShade
- DiLiGenT
- LUCES
Limitations (from the paper)
- Video diffusion models introduce computational overhead, limiting real-time applicability
- Struggles under extreme lighting conditions with large regions of insufficient illumination
- Fails on transparent or semi-transparent objects
- Primary evaluation is object-centric; scene-centric extension remains open
Future directions (from the paper)
- Reduce the computational overhead of video diffusion models for real-time use
- Improve handling of extreme lighting conditions and regions of insufficient illumination
- Support normal estimation for transparent and semi-transparent objects
- Extend to scene-centric settings beyond the single-object focus
Author keywords
- Video Diffusion Model
- Shading Estimation
- Single-view Normal Estimation