1. Self-supervised reconstruction and synthesis of faces
- Author
- Tewari, Ayush
- Abstract
Photorealistic and semantically controllable digital models of human faces are important for a wide range of applications such as movies, virtual reality, and casual photography. Traditional approaches require expensive setups that capture the person with multiple cameras under different illumination conditions. Recent approaches have also explored digitizing faces under less constrained settings, even from a single image of the person. These approaches rely on priors, commonly known as 3D morphable models (3DMMs), which are learned from datasets of 3D scans. This thesis pushes the state of the art in high-quality 3D reconstruction of faces from monocular images. A model-based face autoencoder architecture is introduced which integrates convolutional neural networks, 3DMMs, and differentiable rendering for self-supervised training on large image datasets. This architecture is extended to enable the refinement of a pretrained 3DMM using only a dataset of monocular images, allowing for higher-quality reconstructions. In addition, this thesis demonstrates the learning of the identity components of a 3DMM directly from videos, without using any 3D data. Since videos are more readily available, this model can generalize better than models learned from limited 3D scans. This thesis also presents methods for the photorealistic editing of portrait images. In contrast to traditional approaches, the presented methods do not rely on any supervised training. Self-supervised editing is achieved by integrating the semantically meaningful 3DMM-based monocular reconstructions with a pretrained and fixed generative adversarial network. While this thesis presents several ideas which enable self-supervised learning for the reconstruction and synthesis of faces, several open challenges remain. These challenges, as well as an outlook for future work, are also discussed.
- Published
- 2021
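The self-supervised idea summarized in the abstract can be illustrated with a toy sketch: a fixed linear morphable model decodes a small coefficient vector into an "image", and the coefficients are fitted by minimizing a photometric loss against the input, with no 3D ground truth. Everything here (the 4-element "image", the 2-coefficient basis, the finite-difference optimizer) is a hypothetical stand-in, not the thesis's actual architecture, which uses a CNN encoder and a differentiable renderer.

```python
# Toy linear 3DMM (hypothetical numbers): face = mean + basis @ coeffs.
MEAN = [0.5, 0.5, 0.5, 0.5]
BASIS = [[0.2, -0.1], [0.1, 0.3], [-0.2, 0.1], [0.3, 0.2]]

def decode(coeffs):
    """Stand-in for 3DMM decoding plus differentiable rendering."""
    return [m + sum(b * c for b, c in zip(row, coeffs))
            for m, row in zip(MEAN, BASIS)]

def photometric_loss(coeffs, image):
    """Self-supervised signal: compare the rendering to the input image."""
    return sum((r - p) ** 2 for r, p in zip(decode(coeffs), image))

def fit(image, steps=200, lr=0.5, eps=1e-5):
    """Gradient descent on the coefficients via finite differences,
    mimicking analysis-by-synthesis without any 3D supervision."""
    coeffs = [0.0, 0.0]
    for _ in range(steps):
        grads = []
        for i in range(len(coeffs)):
            bumped = list(coeffs)
            bumped[i] += eps
            grads.append((photometric_loss(bumped, image)
                          - photometric_loss(coeffs, image)) / eps)
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs

target = decode([0.8, -0.4])   # synthetic "input image" with known coefficients
estimated = fit(target)        # recovered coefficients, close to [0.8, -0.4]
```

In the thesis setting, `fit` is replaced by a CNN that predicts the coefficients in a single forward pass, and the same photometric loss trains that network over a large image collection.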