LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini

14 Oct 2025

Quick Insight

New Test Shows How AI Videos Learn Real‑World Physics

Ever wondered if a computer can tell the difference between a ball that rolls naturally and one that flies off a table for no reason? Scientists have introduced a training‑free test called LikePhys that checks exactly that. It shows AI video generators pairs of short clips, one that follows the laws of physics and one that breaks them, and measures which version the model itself finds more likely. Think of it as a “spot‑the‑fake” game for machines, much like the way we can instantly tell whether a cup is about to tip over. The resulting score, called Plausibility Preference Error, lines up closely with what people actually judge to be plausible, which suggests the test measures the right thing. Today’s models still stumble on chaotic scenes like swirling water, but they get better as they grow bigger and are given more time to think. This work brings us a step closer to AI that can reliably simulate real‑world events, for uses ranging from virtual training to movie special effects. Imagine a future where every digital scene obeys the same physics we live by. Exciting, isn’t it?

Short Review

Overview

The article introduces LikePhys, a training-free evaluation method for assessing intuitive physics understanding in video diffusion models (VDMs). Rather than judging generated outputs, the method checks whether a model assigns higher likelihood to physically valid videos than to invalid ones, and it summarizes the result in a likelihood-based metric called Plausibility Preference Error (PPE). The study benchmarks twelve scenarios across various physics domains and finds that models are weakest on complex dynamics. By systematically evaluating current VDMs, the research also shows how intuitive physics understanding varies with model architecture and inference settings.
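The comparison behind PPE can be illustrated with a minimal Python sketch. It assumes that each clip has already been scored with a model's average denoising loss, treated here as a stand-in for negative log-likelihood (lower means the model finds the clip more plausible). The function name, the pairwise form, and the handling of ties are assumptions made for illustration; the paper's exact definition may differ.

```python
from typing import Sequence


def plausibility_preference_error(
    valid_losses: Sequence[float],
    invalid_losses: Sequence[float],
) -> float:
    """Fraction of valid/invalid pairs where the model prefers the invalid clip.

    Assumption: valid_losses[k] and invalid_losses[k] are one model's average
    denoising losses on the two clips of pair k. Lower loss is read as higher
    likelihood. Ties count as errors, which is a choice made for this sketch.
    """
    if len(valid_losses) != len(invalid_losses):
        raise ValueError("expected one invalid clip per valid clip")
    if len(valid_losses) == 0:
        raise ValueError("need at least one pair")

    # An error occurs when the valid clip is not strictly more likely,
    # i.e. its loss is not strictly lower than the invalid clip's loss.
    errors = sum(
        1 for valid, invalid in zip(valid_losses, invalid_losses)
        if valid >= invalid
    )
    return errors / len(valid_losses)
```

Under these assumptions, a score of 0 means the model preferred the physically valid clip in every pair, and a score near 0.5 means it could not tell the two apart.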

Critical Evaluation

Strengths

The study's main strength is that its evaluation method is training-free, so intuitive physics understanding can be measured on existing VDMs as they are. The curated dataset pairs each valid video with an invalid counterpart, which supports a controlled assessment of model performance. The alignment of the PPE metric with human preferences further supports its use as an evaluation tool. Finally, the systematic benchmarking across diverse physics scenarios gives an overview of the current state of VDMs and shows performance improving as model capacity scales.
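One way to check how well a metric such as PPE aligns with human judgment is a rank correlation between per-model metric values and per-model human ratings. The sketch below implements Spearman's rank correlation in plain Python. It is an illustration only: the review does not say which agreement measure the authors used, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Because a lower PPE is better, a metric that agrees with raters who score physically plausible models higher would show a strongly negative correlation under this setup.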

Weaknesses

Despite its strengths, the study has notable weaknesses. Because the method is training-free, its conclusions rest on one curated evaluation dataset, which may limit how well the findings generalize, as models may perform differently on other data. Additionally, many models still struggle with complex and chaotic dynamics, indicating a gap in their physical reasoning capabilities. Classifier-free guidance (CFG) appears to have minimal impact on performance, so adjusting guidance alone does not look like a route to better intuitive physics understanding, and other factors need further exploration.

Implications

The implications of this research are significant for the development of future VDMs. By identifying the limitations in current models, the study paves the way for targeted improvements in model training and evaluation methods. The findings emphasize the need for enhanced physical reasoning capabilities, which could lead to more accurate and reliable simulations in various applications, from gaming to scientific modeling.

Conclusion

In summary, the article presents a valuable contribution to the field of video diffusion models by introducing LikePhys and the Plausibility Preference Error metric. The insights gained from benchmarking intuitive physics understanding across different models highlight both the progress made and the challenges that remain. As VDMs continue to evolve, this research serves as a critical reference point for future advancements in physically plausible video generation.

Readability

The article is structured to enhance readability, with clear and concise language that facilitates understanding. Each section flows logically, allowing readers to grasp complex concepts without overwhelming jargon. This approach not only improves user engagement but also encourages further exploration of the topic.