PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image

¹Shanghai Jiao Tong University  ²Tencent XR Vision Labs
*Work done during an internship at Tencent XR Vision Labs.

PhyCAGE generates physically plausible compositional 3D assets from a single image.

Abstract

We present PhyCAGE, the first approach for Physically plausible Compositional 3D Asset Generation from a single Image. Given an input image, we first generate consistent multi-view images for the components of the asset. These images are then fitted with 3D Gaussian Splatting representations. To ensure that the Gaussians representing different objects are physically compatible with each other, we introduce a Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS) technique that further optimizes the positions of the Gaussians. This is achieved by setting the gradient of the SDS loss as the initial velocity of a physical simulation, which lets the simulator act as a physics-guided optimizer that progressively moves the Gaussians into a physically compatible state. Experimental results demonstrate that the proposed method can generate physically plausible compositional 3D assets from a single image.

Pipeline

Given an input image, we first generate consistent multi-view images for the components of the assets.
Then, we fit 3D Gaussian Splatting representations to the multi-view images.
Finally, we apply Physical Simulation-Enhanced SDS (PSE-SDS) to further optimize the positions of the Gaussians.
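The final stage can be illustrated with a toy, self-contained sketch. It is not the authors' implementation: the SDS gradient is replaced by a hypothetical stand-in (the gradient of a quadratic that pulls each point toward a target), the Gaussians are reduced to 2D centers, and the "physics" is a simple explicit-Euler particle simulator with one fixed disk collider and velocity damping. The helper names (`sds_grad_stand_in`, `resolve_contact`, `pse_sds_optimize`) and all parameter values are illustrative assumptions. What the sketch shows is the structure described in the abstract: the descent direction of the loss becomes the initial velocity, and the simulator, rather than a plain gradient step, decides where the points end up.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def sds_grad_stand_in(p: Vec, target: Vec) -> Vec:
    """HYPOTHETICAL stand-in for the SDS gradient w.r.t. one Gaussian center.

    In PhyCAGE this gradient comes from score distillation with a diffusion
    model; here it is the gradient of 0.5 * ||p - target||^2, which simply
    pulls the point toward `target`.
    """
    return (p[0] - target[0], p[1] - target[1])


def resolve_contact(p: Vec, v: Vec, center: Vec, radius: float) -> Tuple[Vec, Vec]:
    """Keep a point outside a fixed disk that stands in for another component.

    If the point has penetrated, move it back onto the disk surface and drop
    the velocity component that points into the disk.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    if d >= radius:
        return p, v
    # Outward contact normal (arbitrary choice if the point sits at the center).
    nx, ny = (dx / d, dy / d) if d > 0.0 else (0.0, 1.0)
    p = (center[0] + nx * radius, center[1] + ny * radius)
    vn = v[0] * nx + v[1] * ny
    if vn < 0.0:
        v = (v[0] - vn * nx, v[1] - vn * ny)
    return p, v


def pse_sds_optimize(points: List[Vec], target: Vec, center: Vec, radius: float,
                     iters: int = 10, lr: float = 0.5, dt: float = 0.1,
                     sim_steps: int = 20, damping: float = 0.8) -> List[Vec]:
    """Toy PSE-SDS-style loop (assumption-laden sketch).

    Each outer iteration turns the stand-in gradient into an initial velocity
    and lets the simulator move the points, so the contact constraint is
    enforced after every simulation step instead of after a raw gradient step.
    """
    points = list(points)
    for _ in range(iters):
        updated = []
        for p in points:
            g = sds_grad_stand_in(p, target)
            v = (-lr * g[0], -lr * g[1])  # descent direction as initial velocity
            for _step in range(sim_steps):
                p = (p[0] + dt * v[0], p[1] + dt * v[1])  # explicit Euler step
                p, v = resolve_contact(p, v, center, radius)
                v = (v[0] * damping, v[1] * damping)  # damping dissipates energy
            updated.append(p)
        points = updated
    return points


if __name__ == "__main__":
    start = [(0.0, 3.0), (2.0, 2.0), (-3.0, 0.5)]
    result = pse_sds_optimize(start, target=(0.0, 0.0), center=(0.0, 0.0), radius=1.0)
    for q in result:
        print(f"({q[0]:.3f}, {q[1]:.3f}) distance from center = {math.hypot(*q):.3f}")
```

With the target placed at the disk center, a plain gradient step would eventually drive every point inside the collider; in the sketch, contact resolution stops the points on the disk surface instead. Restarting the velocity from the gradient at each outer iteration, rather than carrying momentum across iterations, is a design choice of this sketch.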

Physical Simulation-Enhanced Loss

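For reference, the standard SDS gradient (as introduced in DreamFusion) with respect to the parameters θ of a differentiable renderer x = g(θ) is written below, where w(t) is a timestep weighting, ε ~ N(0, I) is the injected noise, and ε̂_φ is the diffusion model's noise prediction conditioned on the prompt y. The second line is one possible way to write the velocity initialization described in the abstract; the step size η, the negative sign, the restriction to Gaussian centers μ, and the hypothetical `Simulate` operator are our assumptions, since this page does not give the exact PSE-SDS formulation.

```latex
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_{\phi}(\mathbf{x}_t; y, t) - \epsilon\big)\,
    \frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad \mathbf{x}_t = \alpha_t\,\mathbf{x} + \sigma_t\,\epsilon

% Assumed form: SDS gradient w.r.t. Gaussian centers \mu used as the initial velocity
\mathbf{v}_0 = -\eta\,\nabla_{\mu}\mathcal{L}_{\mathrm{SDS}},
\qquad \mu \leftarrow \mathrm{Simulate}(\mu,\ \mathbf{v}_0)
```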

Comparison with Baseline

Gallery

"An axe. A red cloth."

"A baby yoda in white. Clothes."

"A bottle with a stopper. A red cloth."

"A fox. Clothes."

"A frog. A red cloth."

"A plant. A flowerpot."

"A shiba. A red collar."

"A clean smooth skull with no cracks, grey, no light, dark. A helmet."

"A straw hat. Red ribbons."

"An open top trash can. A black plastic."

Physical Simulation

Elastic body simulation (the red cloth):

Rigid body simulation (the helmet):

BibTeX


@misc{yan2024phycage,
  title={PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image},
  author={Han Yan and Mingrui Zhang and Yang Li and Chao Ma and Pan Ji},
  year={2024},
  eprint={2411.18548},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.18548},
}