
A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Authors :
Yao Ge
Chong Tang
Haobo Li
Zikang Chen
Jingyan Wang
Wenda Li
Jonathan Cooper
Kevin Chetty
Daniele Faccio
Muhammad Imran
Qammer H. Abbasi
Source :
Scientific Data, Vol 10, Iss 1, Pp 1-17 (2023)
Publication Year :
2023
Publisher :
Nature Portfolio, 2023.

Abstract

Small-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser, and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks, and laser data, offering a unique multimodal approach to speech recognition research. In addition, a depth camera is used to record the landmarks of each subject's lips along with their voice. Approximately 400 minutes of annotated speech profiles are provided, collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition.

Subjects :
Science

Details

Language :
English
ISSN :
2052-4463
Volume :
10
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Scientific Data
Publication Type :
Academic Journal
Accession number :
edsdoj.6a19b27234b20a9751230252188b5
Document Type :
article
Full Text :
https://doi.org/10.1038/s41597-023-02793-w