Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
We propose Omni-Reward, a step towards universal omni-modal reward modeling with free-form preferences.
Abstract
Reward models (RMs) play a critical role in aligning AI behavior with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs focus on text and image modalities and offer limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address these challenges, we propose Omni-Reward, a step towards generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities (text, image, video, audio, and 3D); (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
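To make the free-form preference setup concrete, here is a minimal sketch, assuming a text-only setting and hypothetical names (`PreferencePair`, `reward_model`, `bradley_terry_loss`): our own illustration of how a preference pair conditioned on a natural-language criterion could be represented and trained against with a standard Bradley-Terry pairwise loss, not the paper's released code or schema.

```python
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class PreferencePair:
    """One free-form preference example (hypothetical schema).

    Unlike fixed binary pairs, the `criterion` field carries a
    natural-language description of *which* preference the pair
    expresses, e.g. "prefer concise answers that cite sources".
    """
    prompt: str     # task instruction (may reference image/video/audio/3D inputs)
    criterion: str  # free-form preference the judgment is conditioned on
    chosen: str     # response preferred under this criterion
    rejected: str   # response dispreferred under this criterion


def bradley_terry_loss(reward_model, pair: PreferencePair) -> torch.Tensor:
    """Pairwise loss that pushes r(chosen) above r(rejected).

    `reward_model` is assumed to map (prompt, criterion, response) to a
    scalar reward tensor; conditioning on `criterion` is what lets a
    single discriminative RM serve many personalized preferences.
    """
    r_chosen = reward_model(pair.prompt, pair.criterion, pair.chosen)
    r_rejected = reward_model(pair.prompt, pair.criterion, pair.rejected)
    # -log sigmoid(r_c - r_r): minimized when chosen outscores rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```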
Omni-Reward addresses modality imbalance and preference rigidity with a generalist omni-modal reward modeling framework.
- Omni-RewardBench: first omni-modal RM benchmark with free-form preferences across five modalities
- Omni-RewardData: 248K preference pairs and 69K instruction-tuning pairs for multi-modal training
- Omni-RewardModel: discriminative and generative RMs achieving strong performance across benchmarks; a minimal judging sketch follows this list
- Framework supporting text, image, video, audio, and 3D modalities with diverse task coverage
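The generative variant can likewise be sketched as an LLM-as-judge call in which the free-form criterion is embedded in the judge prompt. This is a sketch under stated assumptions: the template and the `generate` callable below are hypothetical, not the paper's actual interface.

```python
# Hypothetical judge prompt; the real template is not specified here.
JUDGE_TEMPLATE = """You are a reward model. Judge which response better
satisfies the user's preference.

Preference: {criterion}
Prompt: {prompt}

Response A:
{response_a}

Response B:
{response_b}

Answer with exactly "A" or "B"."""


def generative_rm_judge(generate, prompt: str, criterion: str,
                        response_a: str, response_b: str) -> str:
    """Return "A" or "B" using any text-generation callable `generate`."""
    judge_prompt = JUDGE_TEMPLATE.format(
        criterion=criterion, prompt=prompt,
        response_a=response_a, response_b=response_b,
    )
    verdict = generate(judge_prompt).strip()
    return "A" if verdict.startswith("A") else "B"
```

Conditioning the judge on the `criterion` string is what lets one model serve many personalized preferences: swapping in a different criterion can change the verdict without any retraining.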
- Reward modeling
- Multi-modal learning
- Free-form preference modeling
Limitations (from the paper)
- The benchmark scale may not support evaluations involving millions of examples
- Current task definitions remain relatively coarse and require finer-grained categorization
- Preference data is limited to single-turn interactions, without multi-turn conversational preferences
- The RL technique used to train Omni-RewardModel-R1 is preliminary and requires further investigation
- Missing modalities such as thermal, radar, tabular, and time-series data

Future directions (from the paper)
- Incorporate video understanding and generation tasks into the evaluation system
Author keywords
- Omni-Modal Models
- Reward Models
- Alignment