The Time Course of Audio-Visual Phoneme Identification: A High Temporal Resolution Study
- Author
- Sonia Kandel, Salvador Soto-Faraco, Christophe Savariaux, Carolina Sánchez-García; Universitat Pompeu Fabra [Barcelona] (UPF); GIPSA - Voix Systèmes Linguistiques et Dialectologie (GIPSA-VSLD); Département Parole et Cognition (GIPSA-DPC); Grenoble Images Parole Signal Automatique (GIPSA-lab), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Centre National de la Recherche Scientifique (CNRS), Université Grenoble Alpes (UGA); GIPSA-Services; Institució Catalana de Recerca i Estudis Avançats (ICREA)
- Subjects
Speech perception, Speech recognition, Speech processing, Consonant, Multisensory integration, Audio-visual, Gating, Perception, Cognitive Neuroscience, Experimental and Cognitive Psychology, Sensory Systems, Ophthalmology, Computer Vision and Pattern Recognition, Psychology, [SCCO]Cognitive science, [SCCO.LING]Cognitive science/Linguistics, [SCCO.PSYC]Cognitive science/Psychology
- Abstract
Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in the audio, visual and audio-visual conditions, and computed Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength (i.e., saliency) of each modality. In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V) when that modality was very salient. In other cases, both modalities contributed to identification, resulting in an audio-visual advantage or interference relative to the unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise during the identification of the same utterance, at different times. These findings suggest that models of audio-visual speech integration should take into account the time-varying nature of visual and auditory saliency.
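For readers unfamiliar with gating scores, the sketch below illustrates one conventional way an Identification Point could be derived from gate-by-gate responses (the earliest gate from which the listener's answer is correct and remains correct at all later gates). The scoring criterion, function name, and example data are assumptions for illustration only and are not taken from the paper itself.

```python
# Minimal sketch of an Identification Point (IP) computation for a Gating task.
# Assumption: IP = first gate at which the response is correct and stays
# correct for all subsequent gates; this may differ from the authors' exact
# scoring procedure.

def identification_point(responses, target, gate_step_ms=10):
    """Return the IP in ms, or None if the target is never stably identified.

    responses    : list of phoneme labels given at successive gates
    target       : the correct phoneme label (e.g., "f")
    gate_step_ms : duration added at each gate (10 ms in this study)
    """
    ip_gate = None
    for gate, response in enumerate(responses, start=1):
        if response == target:
            if ip_gate is None:
                ip_gate = gate      # candidate IP: first correct response
        else:
            ip_gate = None          # a later error resets the candidate
    return ip_gate * gate_step_ms if ip_gate is not None else None


# Hypothetical example: the listener settles on /f/ from the 5th gate onward.
print(identification_point(["s", "s", "f", "s", "f", "f", "f"], "f"))  # -> 50
```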
- Published
- 2018