RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
Peiyan Hu, Haodong Feng, Hongyuan Liu, Tongtong Yan, Wenhao Deng, Tianrun Gao, Rong Zheng, Haoren Zheng, Chenglei Yu, Chuanrui Wang, Kaiwen Li, Zhi-Ming Ma, Dezhi Zhou, Xingcai Lu, Dixia Fan, Tailin Wu
We propose the first benchmark for complex physical systems with paired real-world data and simulated data, and explore how to bridge simulated and real-world data.
Abstract
Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific machine learning (ML) models, a critical bottleneck is the scarcity of real-world data, which is expensive to collect; as a result, most current models are trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, nine metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks, which allow comparisons between real-world and simulated data, and facilitate the development of methods to bridge the two. Moreover, we design nine evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. With this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.
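The abstract's finding that simulated-data pretraining improves accuracy and convergence on real data can be illustrated with a minimal, purely hypothetical sketch (this is not the paper's code; the toy dynamics, parameter values, and function names below are all invented for illustration):

```python
# Toy sketch of sim-to-real transfer: pretrain a model on abundant simulated
# data, then fine-tune on scarce real measurements, versus training from
# scratch on the real data alone with the same budget.

def fit(theta, inputs, targets, lr=0.1, steps=20):
    """Plain gradient descent on MSE for a scalar linear model v = theta * u."""
    for _ in range(steps):
        grad = sum(2 * u * (theta * u - v) for u, v in zip(inputs, targets)) / len(inputs)
        theta -= lr * grad
    return theta

# Hypothetical one-step dynamics u_{t+1} = a * u_t; the simulator is close
# to, but not identical with, the real system (a stand-in for the sim-to-real gap).
A_SIM, A_REAL = 0.9, 0.85

sim_u = [1.0, 2.0, 3.0]          # abundant simulated data
sim_v = [A_SIM * u for u in sim_u]
real_u = [1.0, 2.0]              # scarce real measurements
real_v = [A_REAL * u for u in real_u]

theta_pre = fit(0.0, sim_u, sim_v, steps=20)            # pretrain on simulation
theta_tuned = fit(theta_pre, real_u, real_v, steps=5)   # short real fine-tune
theta_scratch = fit(0.0, real_u, real_v, steps=5)       # same budget, no pretrain

err_tuned = abs(theta_tuned - A_REAL)
err_scratch = abs(theta_scratch - A_REAL)
print(err_tuned < err_scratch)  # pretrained model ends closer to the real dynamics
```

Because the simulator parameters sit near the real ones, pretraining places the model a short fine-tuning distance from the real solution, mirroring (in caricature) the convergence benefit the benchmark reports.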
RealPDEBench is the first benchmark integrating real-world measurements with paired simulations across five physical systems for scientific ML evaluation.
- First real-world scientific ML benchmark combining real-world and simulated data for complex physical systems
- Three categories of tasks, nine evaluation metrics, and ten baselines for comprehensive assessment
- Evidence that pretraining with simulated data improves both accuracy and convergence on real-world data
- physics-informed machine learning
- sim-to-real transfer
- five real-world measured datasets with paired simulations
Limitations
- Benchmark scope is limited to specific domains, without coverage of electromagnetics, structural mechanics, or aerodynamics (from the paper)
- No dedicated metrics for the combustion system (from the paper)
- Does not systematically explore strong out-of-distribution regimes (from the paper)
Future work
- Extend to additional physical domains and introduce domain-specific metrics (from the paper)
- Design out-of-distribution tasks and maintain long-term development toward more data and models (from the paper)
Author keywords
- complex physical system
- PDE
- benchmark
- real-world data
- prediction
Related orals
On the Wasserstein Geodesic Principal Component Analysis of probability measures
Geodesic PCA for probability distributions using Wasserstein geometry with neural network parametrization for continuous distributions.
TabStruct: Measuring Structural Fidelity of Tabular Data
TabStruct benchmark evaluates tabular data generators on structural fidelity and conventional dimensions using global utility metric without ground-truth causal structures.
Monocular Normal Estimation via Shading Sequence Estimation
RoSE estimates surface normals via shading sequence prediction, addressing 3D misalignment in monocular normal estimation.
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
TTSDS2 metric robustly correlates with human judgments for TTS evaluation across diverse speech domains maintaining >0.5 Spearman correlation.
World-In-World: World Models in a Closed-Loop World
Introduces closed-loop benchmark evaluating generative world models on embodied task performance rather than visual quality.