¹Carnegie Mellon University   ²NEC Labs America   ³UC San Diego
TL;DR PhyCo learns controllable physical priors — friction, restitution, deformation, and force — from simple block-sliding and ball-bouncing simulations, enabling physically grounded and continuously controllable video generation without any simulator at inference.
A two-stage pipeline: physics-supervised ControlNet fine-tuning on simulation data, followed by VLM-guided reward optimization for physical consistency.
Photorealistic block-sliding, ball-bouncing, and collision videos rendered with Kubric & PyBullet, with systematically varied physical properties.
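The training data varies physical properties systematically across simulations. As a minimal illustration (not the paper's Kubric/PyBullet pipeline), the sketch below shows how a single varied property, the friction coefficient, changes a block-sliding trajectory; all names and values are assumptions for illustration.

```python
# Minimal sketch of block-sliding dynamics under Coulomb friction.
# Illustrative only -- the actual dataset is rendered with Kubric & PyBullet.

G = 9.81  # gravitational acceleration (m/s^2)

def slide_trajectory(v0, mu, dt=0.01, steps=500):
    """Positions of a block sliding with initial speed v0 under
    friction coefficient mu on a horizontal plane."""
    x, v, xs = 0.0, v0, []
    for _ in range(steps):
        v = max(0.0, v - mu * G * dt)  # friction decelerates the block until rest
        x += v * dt
        xs.append(x)
    return xs

# Higher friction -> the block stops sooner and travels a shorter distance.
low = slide_trajectory(v0=2.0, mu=0.1)
high = slide_trajectory(v0=2.0, mu=0.5)
assert high[-1] < low[-1]
```

Sweeping such coefficients across renders is what gives the model a continuous, controllable axis for each physical property.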
A ControlNet conditioned on pixel-aligned physical property maps is trained on top of a frozen Cosmos-Predict2 video diffusion backbone.
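A pixel-aligned property map assigns each pixel the physical parameters of the object it depicts. The sketch below shows one plausible way such a conditioning tensor could be laid out; the channel order, shapes, and values are illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Hypothetical pixel-aligned physical property map of the kind a ControlNet
# branch could consume. Channels: friction, restitution, deformation, force.
H, W = 64, 64
prop_map = np.zeros((4, H, W), dtype=np.float32)

# Suppose an object occupies a rectangular mask; write its (normalized)
# physical properties into the map only at those pixels.
mask = np.zeros((H, W), dtype=bool)
mask[20:40, 10:50] = True
prop_map[0][mask] = 0.3   # friction coefficient
prop_map[1][mask] = 0.8   # restitution
prop_map[3][mask] = 0.5   # applied force magnitude (normalized)

# Background pixels stay zero; the map is fed to the ControlNet alongside
# the frozen backbone's video latents.
assert prop_map.shape == (4, H, W)
```

Because the conditioning is spatial rather than a global scalar, different objects in the same scene can carry different physical properties.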
A fine-tuned Qwen2.5-VL evaluates generated videos with targeted physics questions, providing differentiable feedback to improve consistency.
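One common way to turn a yes/no VLM judgment into a scalar reward is to take the softmax probability of the "Yes" answer token. The sketch below illustrates that idea with fake logits; the token ids, vocabulary, and reward definition are assumptions, not the paper's implementation, which fine-tunes Qwen2.5-VL for this role.

```python
import numpy as np

def yes_probability(logits, yes_id, no_id):
    """Softmax over the two answer-token logits -> scalar physics reward."""
    pair = np.array([logits[yes_id], logits[no_id]], dtype=np.float64)
    pair -= pair.max()            # shift for numerical stability
    exp = np.exp(pair)
    return exp[0] / exp.sum()     # P("Yes" | question, video)

# Fake vocabulary logits: id 101 = "Yes", id 102 = "No" (illustrative ids).
logits = {101: 2.0, 102: -1.0}
reward = yes_probability(logits, yes_id=101, no_id=102)
assert 0.9 < reward < 1.0
```

Computed over targeted physics questions (e.g. about bounce height or sliding distance), such a probability gives a smooth training signal rather than a hard pass/fail label.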
@misc{narayanan2026phyco,
  title  = {PhyCo: Learning Controllable Physical Priors for Generative Motion},
  author = {Narayanan, Sriram and Jiang, Ziyu and Narasimhan, Srinivasa G. and Chandraker, Manmohan},
  year   = {2026},
}
This work was partially conducted during Sriram's internship at NEC Labs America, and was supported in part by NSF grants IIS-2107236 and IIS-2513219. We also thank Kausik Sivakumar and Yug Ajmera for their insightful discussions.