
Developmental Learning of Audio-Visual Integration From Facial Gestures of a Social Robot

Authors :
Oriane Dermy
Sofiane Boucenna
Alexandre Pitti
Arnaud Blanchard
Affiliations :
Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment (LARSEN), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)
Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS), Inria
Equipes Traitement de l'Information et Systèmes (ETIS - UMR 8051), Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA), Centre National de la Recherche Scientifique (CNRS), CY Cergy Paris Université (CY)
Source :
HAL
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

We present a robot head with facial gestures, audio, and vision capabilities aimed at the emergence of infant-like social features. To this end, we propose a neural architecture that integrates these three modalities following a developmental stage of social interaction with a caregiver. During dyadic interaction with the experimenter, the robot learns to categorize the audio-speech gestures of the vowels /a/, /i/, /o/ as an infant would, by linking someone else's facial expressions to its own movements. We show that multimodal integration in the neural network is more robust than unimodal learning, as it compensates for erroneous or noisy information coming from each modality. Facial mimicry with a partner can therefore be reproduced from redundant audio-visual signals or from noisy information in one modality alone. Statistical experiments with 24 naive participants show the robustness of our algorithm during human-robot interactions in a public environment where many people move and talk constantly. We then discuss our model in light of human-robot communication and the development of social skills and language in infants.
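The central claim, that redundant audio and visual cues compensate for noise in either channel, can be illustrated independently of the paper's neural architecture. The Python sketch below is purely illustrative: the prototype vectors, noise levels, and nearest-prototype decision rule are invented for the example and are not taken from the paper; it only shows why summing evidence from two noisy modalities tends to categorize the vowels /a/, /i/, /o/ more reliably than either modality alone.

```python
# Illustrative sketch only (not the paper's neural architecture):
# shows why fusing redundant audio and visual cues can classify the
# vowels /a/, /i/, /o/ more reliably than either noisy modality alone.
# All prototype values and noise levels below are invented.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-vowel prototypes (rows: /a/, /i/, /o/) for a 4-D
# audio descriptor and a 4-D visual (lip-shape) descriptor.
audio_proto = np.array([[0.9, 0.2, 0.1, 0.3],
                        [0.1, 0.8, 0.7, 0.2],
                        [0.4, 0.1, 0.9, 0.8]])
visual_proto = np.array([[0.8, 0.3, 0.2, 0.1],
                         [0.2, 0.9, 0.6, 0.3],
                         [0.3, 0.2, 0.8, 0.9]])

def unimodal(sample, protos):
    """Nearest-prototype decision from a single modality."""
    return int(np.argmin(np.linalg.norm(protos - sample, axis=1)))

def fused(audio, visual):
    """Late fusion: sum the distances of both modalities before deciding."""
    d = (np.linalg.norm(audio_proto - audio, axis=1)
         + np.linalg.norm(visual_proto - visual, axis=1))
    return int(np.argmin(d))

n_trials, noise = 2000, 0.45
hits = {"audio only": 0, "visual only": 0, "audio + visual": 0}
for _ in range(n_trials):
    true = rng.integers(3)                              # which vowel is produced
    a = audio_proto[true] + rng.normal(0, noise, 4)     # noisy audio cue
    v = visual_proto[true] + rng.normal(0, noise, 4)    # noisy visual cue
    hits["audio only"] += unimodal(a, audio_proto) == true
    hits["visual only"] += unimodal(v, visual_proto) == true
    hits["audio + visual"] += fused(a, v) == true

for name, n_ok in hits.items():
    print(f"{name:>14}: {n_ok / n_trials:.1%} correct")
```

Running this typically shows the fused decision outperforming both unimodal ones; in the paper the analogous effect is obtained by a learned multimodal neural network during interaction with a caregiver, not by fixed prototypes.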

Details

Language :
English
Database :
OpenAIRE
Journal :
HAL
Accession number :
edsair.dedup.wf.001..3e1c0690378a1c46d3d8a6b1f6554c8e