
TSNeRF: Text-driven stylized neural radiance fields via semantic contrastive learning.

Authors :
Wang, Yi
Cheng, Jing-Song
Feng, Qiao
Tao, Wen-Yuan
Lai, Yu-Kun
Li, Kun
Source :
Computers & Graphics. Nov 2023, Vol. 116, p102-114. 13p.
Publication Year :
2023

Abstract

3D scene stylization aims to generate impressive stylized images from arbitrary novel views based on a stylistic reference. Existing image-driven 3D scene stylization methods require a specific style reference image to be given and cannot produce diverse stylization results by combining style information from different aspects. In this paper, we propose a text-driven 3D scene stylization method based on semantic contrastive learning, which takes Neural Radiance Fields (NeRF) as the 3D scene representation and generates diverse 3D stylized scenes by leveraging the semantic capabilities of the Contrastive Language-Image Pre-Training (CLIP) model. To comprehensively exploit this semantic knowledge for finely stylized results, we design a CLIP-based semantic contrast estimation loss, which avoids both the global stylistic inconsistency caused by the NeRF ray sampling method and the tendency to stylize toward neutral descriptions caused by semantic averaging in the CLIP space. In addition, to reduce the memory burden arising from NeRF ray sampling, we propose a novel ray sampling method with gradient accumulation to optimize the NeRF rendering process. Experimental results indicate that our method generates high-quality and plausible results with cross-view consistency. Moreover, our method enables the creation of new styles that match the target text by combining multiple domains. The code will be available at.

• The text-driven 3D implicit stylization method can intuitively and diversely stylize 3D scenes.
• The style transfer direction in the CLIP semantic space can be controlled with contrastive learning, which stylizes the 3D scene more accurately.
• Gradient accumulation for NeRF ray sampling ensures that the stylization losses are computed on full-size rendered images and prevents the memory overload caused by dense NeRF sampling.
• The target stylized semantics are fine-tuned with a nearest-neighbor semantic similarity search over a specialized art database, which enables generating multi-domain stylized scenes.

[ABSTRACT FROM AUTHOR]
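The gradient-accumulation idea in the abstract can be illustrated with a toy sketch (not the authors' code): rays are processed in memory-bounded chunks and per-chunk gradient contributions are summed, so the optimizer sees the same gradient as a full-batch pass while only one chunk of rays is alive at a time. The `render_chunk` function and the decomposable MSE loss here are simplifying assumptions standing in for NeRF volume rendering and the paper's CLIP-based loss, which in the actual method is computed on the assembled full-size image.

```python
import numpy as np

def render_chunk(params, rays):
    """Toy 'renderer': a linear map standing in for NeRF volume rendering."""
    return rays @ params

def loss_and_grad(params, rays, target):
    """MSE loss over a chunk of rays and its analytic gradient w.r.t. params."""
    resid = render_chunk(params, rays) - target
    loss = 0.5 * np.sum(resid ** 2)
    grad = rays.T @ resid
    return loss, grad

def accumulated_grad(params, rays, target, chunk_size):
    """Process rays in chunks of `chunk_size`, summing gradient contributions.

    For a loss that decomposes over rays, the sum over chunks equals the
    full-batch gradient, so peak memory is bounded by the chunk size.
    """
    total_loss = 0.0
    grad_acc = np.zeros_like(params)
    for start in range(0, len(rays), chunk_size):
        sl = slice(start, start + chunk_size)
        loss, grad = loss_and_grad(params, rays[sl], target[sl])
        total_loss += loss
        grad_acc += grad
    return total_loss, grad_acc
```

Chunked accumulation should reproduce the full-batch loss and gradient exactly, which is the property that lets a full-image loss be optimized without rendering all rays at once.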

Details

Language :
English
ISSN :
0097-8493
Volume :
116
Database :
Academic Search Index
Journal :
Computers & Graphics
Publication Type :
Academic Journal
Accession number :
174061408
Full Text :
https://doi.org/10.1016/j.cag.2023.08.009