1. Towards a Method of Dynamic Vocal Tract Shapes Generation by Combining Static 3D and Dynamic 2D MRI Speech Data
- Author
Ioannis Douros, Anastasiia Tsukanova, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); Imagerie Adaptative Diagnostique et Interventionnelle (IADI), Université de Lorraine (UL), Institut National de la Santé et de la Recherche Médicale (INSERM)
- Subjects
speech resources enrichment, Computer science, Image quality, 02 engineering and technology, [INFO] Computer Science [cs], Set (abstract data type), 030507 speech-language pathology & audiology, 03 medical and health sciences, Dimension (vector space), vocal tract, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Spatial analysis, MRI data, Frame (networking), 020206 networking & telecommunications, image transformation, Transformation (function), [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV], modality transformation, Artificial intelligence, 0305 other medical science
- Abstract
We present an algorithm for augmenting the shape of the vocal tract using 3D static and 2D dynamic speech MRI data. Static 3D images have better resolution and provide spatial information, whereas 2D dynamic images capture the transitions. The aim of this work is to combine the strengths of these two types of data in order to improve the image quality of the 2D dynamic images and to extend them to the 3D domain. To produce a 3D dynamic consonant-vowel (CV) sequence, our algorithm takes as input the 2D CV transition and the static 3D targets for C and V. To obtain the enhanced image sequence, the first step is to find a transformation between the 2D images and the mid-sagittal slice of the acoustically corresponding 3D image stack; the second is to find a transformation between neighbouring sagittal slices in the 3D static image stack. Combining these transformations produces the final set of images. In the present study, we first examined the transformation from the 3D mid-sagittal frame to the 2D video in order to improve image quality, and then examined the extension of the 2D video to the third dimension in order to enrich spatial information.
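The way the two kinds of transformation combine can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which class of transformation is used, so the sketch assumes, purely for illustration, that each one is a 2D affine map stored as a 3x3 homogeneous matrix. All names (`compose`, `apply`, `translation`, `chain_to_slice`) and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: each transformation is a 2D affine map in homogeneous form.
# The abstract does not specify the transformation model.
Matrix = List[List[float]]
Point = Tuple[float, float]


def compose(first: Matrix, then: Matrix) -> Matrix:
    """Return the map that applies `first` and then `then` (matrix product then @ first)."""
    return [
        [sum(then[i][k] * first[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply(transform: Matrix, point: Point) -> Point:
    """Apply a homogeneous 2D affine transform to an (x, y) point."""
    x, y = point
    u = transform[0][0] * x + transform[0][1] * y + transform[0][2]
    v = transform[1][0] * x + transform[1][1] * y + transform[1][2]
    return (u, v)


def translation(dx: float, dy: float) -> Matrix:
    """Hypothetical helper: a pure translation, the simplest affine map."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def chain_to_slice(neighbour_steps: Sequence[Matrix], k: int) -> Matrix:
    """Compose the first k neighbour-to-neighbour maps, giving mid-sagittal -> slice k."""
    total = translation(0.0, 0.0)  # identity
    for step in neighbour_steps[:k]:
        total = compose(total, step)
    return total


# Hypothetical map from the static mid-sagittal slice to one 2D dynamic frame.
static_to_dynamic = translation(2.0, -1.0)

# Hypothetical maps between neighbouring sagittal slices of the static 3D stack.
neighbour_steps = [translation(0.5, 0.0), translation(0.5, 0.25)]

# Combined map: static mid-sagittal point -> dynamic frame -> second neighbouring slice.
combined = compose(static_to_dynamic, chain_to_slice(neighbour_steps, 2))
contour_point = (10.0, 20.0)
moved_point = apply(combined, contour_point)
```

Composing the maps once and then applying the result to every point keeps the per-point cost constant however many slices lie between the mid-sagittal plane and the target slice. With non-rigid transformations the same idea would be expressed as a composition of deformation fields instead of matrix products.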
- Published
- 2019