MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
- Xiangjun Gao 1
- Jiaolong Yang 2
- Jongyoo Kim 2
- Sida Peng 3
- Zicheng Liu 4
- Xin Tong 2
- 1 Beijing Institute of Technology
- 2 Microsoft Research Asia
- 3 Zhejiang University
- 4 Microsoft Azure AI
Abstract
There has been rapid recent progress in 3D human rendering, including novel view synthesis and pose animation, driven by advances in neural radiance fields (NeRF). However, most existing methods rely on person-specific training, which typically requires multi-view videos. This paper addresses a new and challenging task: rendering novel views and novel poses of a person unseen during training, using only multi-view still images (no videos) as input. For this task, we propose a simple yet surprisingly effective method that trains a generalizable NeRF conditioned on multi-view images. The key ingredient is a dedicated representation that combines a canonical NeRF with a volume deformation scheme. Using a canonical space enables our method to learn properties shared across human bodies and to generalize easily to different people. Volume deformation connects the canonical space with the input and target images, so that image features can be queried for radiance and density prediction. We derive the deformation from a parametric 3D human model fitted to the input images, which works quite well in practice when combined with our canonical NeRF. Experiments on both real and synthetic data, covering novel view synthesis and pose animation, collectively demonstrate the efficacy of our method.
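The abstract does not spell out how the deformation is computed. As an illustration only, the sketch below assumes the deformation is linear blend skinning (LBS) driven by the fitted body mesh (for example SMPL), where each 3D point borrows the skinning weights of its nearest mesh vertex. A point sampled along a target-view ray is mapped to canonical space by inverse LBS, then carried into an input view's pose by forward LBS and projected with that view's camera to get the pixel at which image features would be sampled. The helper names (`nearest_vertex_weights`, `target_to_canonical`, `canonical_to_source`, `project`) and the per-joint 4x4 transform convention (canonical to posed) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def nearest_vertex_weights(points, verts, weights):
    """Hypothetical helper: copy skinning weights from the closest mesh vertex.

    points: (N, 3) query points, verts: (V, 3) mesh vertices, weights: (V, J).
    """
    d2 = ((points[:, None, :] - verts[None, :, :]) ** 2).sum(-1)  # (N, V) squared distances
    return weights[d2.argmin(axis=1)]  # (N, J)


def blend_transforms(w, transforms):
    """Blend per-joint 4x4 transforms (J, 4, 4) with per-point weights (N, J)."""
    return np.einsum("nj,jab->nab", w, transforms)  # (N, 4, 4)


def to_homogeneous(p):
    return np.concatenate([p, np.ones((p.shape[0], 1))], axis=1)


def target_to_canonical(points, posed_verts, weights, transforms):
    """Inverse LBS: map target-pose points back to the canonical space."""
    w = nearest_vertex_weights(points, posed_verts, weights)
    T_inv = np.linalg.inv(blend_transforms(w, transforms))  # batched posed -> canonical
    return np.einsum("nab,nb->na", T_inv, to_homogeneous(points))[:, :3]


def canonical_to_source(points, canon_verts, weights, transforms):
    """Forward LBS: pose canonical points with an input view's body pose."""
    w = nearest_vertex_weights(points, canon_verts, weights)
    T = blend_transforms(w, transforms)  # canonical -> posed
    return np.einsum("nab,nb->na", T, to_homogeneous(points))[:, :3]


def project(points, K, R, t):
    """Pinhole projection of world points into pixel coordinates (u, v)."""
    cam = points @ R.T + t  # world -> camera coordinates
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


if __name__ == "__main__":
    # Toy "body": two vertices, each fully bound to one joint.
    canon_verts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
    skin_w = np.eye(2)
    pose = np.stack([np.eye(4), np.eye(4)])
    pose[1, 1, 3] = 1.0  # joint 1 is translated by +1 along y
    posed_verts = canonical_to_source(canon_verts, canon_verts, skin_w, pose)
    query = np.array([[10.0, 1.2, 0.0], [0.1, 0.0, 0.0]])
    canon = target_to_canonical(query, posed_verts, skin_w, pose)
    K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 60.0], [0.0, 0.0, 1.0]])
    print(canon)
    print(project(canon + np.array([0.0, 0.0, 5.0]), K, np.eye(3), np.zeros(3)))
```

The brute-force distance matrix costs O(N·V) memory and is only meant for clarity; at realistic mesh and sample sizes a spatial index such as a KD-tree would be the usual choice for the nearest-vertex lookup.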