Journal: plos one / Publication Year Range: Last 3 years / Topic: speech perception - Searchworks@Jio Institute Digital Library Search Results

Showing total 25 results



1. Combined spectral and speech features for pig speech recognition.

Author: Wu, Xuan, Zhou, Silong, Chen, Mingwei, Zhao, Yihang, Wang, Yifei, Zhao, Xianmeng, Li, Danyang, and Pu, Haibo
Subjects: SPEECH perception, SPEECH, SWINE, COMPUTER vision, EMOTIONAL state, SPECTROGRAMS, SWINE breeding
Abstract: The sound of the pig is one of its important signs, which can reflect various states such as hunger, pain or emotional state, and directly indicates the growth and health status of the pig. Existing speech recognition methods usually start with spectral features. The use of spectrograms to achieve classification of different speech sounds, while working well, may not be the best approach for solving such tasks with single-dimensional feature input. Based on the above assumptions, in order to more accurately grasp the situation of pigs and take timely measures to ensure the health status of pigs, this paper proposes a pig sound classification method based on the dual role of signal spectrum and speech. Spectrograms can visualize information about the characteristics of the sound under different time periods. The audio data are introduced, and the spectrogram features of the model input as well as the audio time-domain features are complemented with each other and passed into a pre-designed parallel network structure. The network model with the best results and the classifier were selected for combination. An accuracy of 93.39% was achieved on the pig speech classification task, while the AUC also reached 0.99163, demonstrating the superiority of the method. This study contributes to the direction of computer vision and acoustics by recognizing the sound of pigs. In addition, a total of 4,000 pig sound datasets in four categories are established in this paper to provide a research basis for later research scholars. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

2. The perception of Mandarin speech conveying communicative functions in Chinese heroin addicts.

Author: Geng P, Fan N, Ling R, Guo H, Lu Q, and Chen X
Subjects: Humans, Male, Speech, Heroin, China, Speech Perception, Substance-Related Disorders
Abstract: Drug addiction can cause severe damage to the human brain, leading to significant problems in cognitive processing, such as irritability, speech distortions, and exaggeration of negative stimuli. Speech plays a fundamental role in social interaction, including both production and perception. The ability to perceive communicative functions conveyed through speech is crucial for successful interpersonal communication and for maintaining good social relationships. However, due to the limited number of previous studies, it remains unclear whether the cognitive disorder caused by drug addiction affects the perception of communicative function conveyed in Mandarin speech. To address this question, we conducted a perception experiment involving sixty male participants, including 25 heroin addicts and 35 healthy controls. The experiment aimed to examine the perception of three communicative functions (i.e., statement, interrogative, and imperative) under three background noise conditions (i.e., no noise, SNR [Signal to Noise Ratio] = 10, and SNR = 0). Eight target sentences were first recorded by two native Mandarin speakers for each of the three communicative functions. Each half was then combined with Gaussian White Noise under two background noise conditions (i.e., SNR = 10 and SNR = 0). Finally, 48 speech stimuli were included in the experiment with four options provided for perceptual judgment. The results showed that, under the three noise conditions, the average perceptual accuracies of the three communicative functions were 80.66% and 38% for the control group and the heroin addicts, respectively. Significant differences were found in the perception of the three communicative functions between the control group and the heroin addicts under the three noise conditions, except for the recognition of imperative under strong noise condition (i.e., SNR = 0). Moreover, heroin addicts showed good accuracy (around 50%) in recognizing imperative and poor accuracy (i.e., lower than the chance level) in recognizing interrogative. This paper not only fills the research gap in the perception of communicative functions in Mandarin speech among drug addicts but also enhances the understanding of the effects of drugs on speech perception and provides a foundation for the speech rehabilitation of drug addicts.
Competing Interests: No authors have competing interests.
Copyright: © 2024 Geng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
Full Text: View/download PDF
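
The stimulus preparation described in this abstract (combining recorded sentences with Gaussian white noise at SNR = 10 and SNR = 0) can be sketched roughly as below. This is a minimal illustration only: it assumes the SNR values are in decibels and that the noise is scaled against the average power of the whole utterance, neither of which the abstract states. The names `add_white_noise` and `snr_db` are hypothetical and do not come from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_white_noise(signal, target_snr_db, seed=0):
    """Return signal plus Gaussian white noise scaled to the target SNR (dB).

    Assumption: SNR is defined over the whole signal as
    10 * log10(signal power / noise power).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Noise power required to reach the requested SNR.
    wanted_noise_power = mean_power(signal) / (10 ** (target_snr_db / 10))
    scale = math.sqrt(wanted_noise_power / mean_power(noise))
    return [x + scale * n for x, n in zip(signal, noise)]


def snr_db(clean, noisy):
    """Measure the SNR (dB) of a noisy signal given its clean version."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(noise))


if __name__ == "__main__":
    # A synthetic tone stands in for a recorded sentence.
    clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    for level in (10.0, 0.0):
        noisy = add_white_noise(clean, level)
        print(level, round(snr_db(clean, noisy), 3))
```

At 0 dB the noise carries as much power as the speech, which is consistent with the abstract calling SNR = 0 the strong noise condition.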

3. Effective mitigation of the belief perseverance bias after the retraction of misinformation: Awareness training and counter-speech.

Author: Siebert, Jana and Siebert, Johannes Ulrich
Subjects: MISINFORMATION, ATTITUDE change (Psychology), AWARENESS, SPEECH perception
Abstract: The spread and influence of misinformation have become a matter of concern in society as misinformation can negatively impact individuals' beliefs, opinions and, consequently, decisions. Research has shown that individuals persevere in their biased beliefs and opinions even after the retraction of misinformation. This phenomenon is known as the belief perseverance bias. However, research on mitigating the belief perseverance bias after the retraction of misinformation has been limited. Only a few debiasing techniques with limited practical applicability have been proposed, and research on comparing various techniques in terms of their effectiveness has been scarce. This paper contributes to research on mitigating the belief perseverance bias after the retraction of misinformation by proposing counter-speech and awareness-training techniques and comparing them in terms of effectiveness to the existing counter-explanation technique in an experiment with N = 251 participants. To determine changes in opinions, the extent of the belief perseverance bias and the effectiveness of the debiasing techniques in mitigating the belief perseverance bias, we measure participants' opinions four times in the experiment by using Likert items and phi-coefficient measures. The effectiveness of the debiasing techniques is assessed by measuring the difference between the baseline opinions before exposure to misinformation and the opinions after exposure to a debiasing technique. Further, we discuss the efforts of the providers and recipients of debiasing and the practical applicability of the debiasing techniques. The counter-speech (CS) technique, with a very large effect size, is the most effective among the three techniques. The counter-explanation (CE) and awareness-training (AT) techniques, with medium effect sizes, are close to being equivalent in terms of their effectiveness. The CS and AT techniques are associated with less cognitive and time effort of the recipients of debiasing than the CE technique, while the AT and CE techniques require less effort from the providers of debiasing than the CS technique. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.

Author: Li, Zhenqing, Basit, Abdul, Daraz, Amil, and Jan, Atif
Subjects: RECURRENT neural networks, ARTIFICIAL neural networks, INTELLIGIBILITY of speech, SPEECH enhancement, CONVOLUTIONAL neural networks, SPEECH perception, GENERATIVE adversarial networks, AUTOMATIC speech recognition
Abstract: Long short-term memory (LSTM) has been effectively used to represent sequential data in recent years. However, LSTM still struggles with capturing the long-term temporal dependencies. In this paper, we propose an hourglass-shaped LSTM that is able to capture long-term temporal correlations by reducing the feature resolutions without data loss. We have used skip connections in non-adjacent layers to avoid gradient decay. In addition, an attention process is incorporated into skip connections to emphasize the essential spectral features and spectral regions. The proposed LSTM model is applied to speech enhancement and recognition applications. The proposed LSTM model uses no future information, resulting in a causal system suitable for real-time processing. The combined spectral feature sets are used to train the LSTM model for improved performance. Using the proposed model, the ideal ratio mask (IRM) is estimated as a training objective. The experimental evaluations using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) have demonstrated that the proposed model with robust feature representation obtained higher speech intelligibility and perceptual quality. With the TIMIT, LibriSpeech, and VoiceBank datasets, the proposed model improved STOI by 16.21%, 16.41%, and 18.33% over noisy speech, whereas PESQ is improved by 31.1%, 32.9%, and 32%. In seen and unseen noisy situations, the proposed model outperformed existing deep neural networks (DNNs), including baseline LSTM, feedforward neural network (FDNN), convolutional neural network (CNN), and generative adversarial network (GAN). With the Kaldi toolkit for automated speech recognition (ASR), the proposed model significantly reduced the word error rates (WERs) and reached an average WER of 15.13% in noisy backgrounds. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
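
The abstract above names the ideal ratio mask (IRM) as the training objective. As a hedged illustration, one commonly used definition computes, for each time-frequency bin, the ratio of speech power to speech-plus-noise power raised to an exponent (often 0.5). The sketch below follows that common definition; the exponent, the function name `ideal_ratio_mask`, and the toy inputs are assumptions and are not taken from the paper.

```python
def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Compute an ideal ratio mask from magnitude spectrograms.

    speech_mag and noise_mag are lists of frames, each frame a list of
    per-frequency magnitudes for the clean speech and the noise.
    Assumed definition: (S^2 / (S^2 + N^2)) ** beta for every bin.
    """
    mask = []
    for speech_frame, noise_frame in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(speech_frame, noise_frame):
            total = s * s + n * n
            # A bin with no energy at all carries no speech, so mask it out.
            row.append((s * s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(noisy_mag, mask):
    """Scale each noisy magnitude bin by its mask value."""
    return [[m * g for m, g in zip(frame, gains)]
            for frame, gains in zip(noisy_mag, mask)]


if __name__ == "__main__":
    speech = [[3.0, 0.0], [1.0, 2.0]]
    noise = [[4.0, 2.0], [1.0, 0.0]]
    print(ideal_ratio_mask(speech, noise))
```

Mask values lie between 0 and 1, so a network trained to estimate them attenuates noise-dominated bins and passes speech-dominated ones.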

5. Portraying accent stereotyping by second language speakers.

Author: Lan Y, Xie T, and Lee A
Subjects: Humans, Speech Intelligibility, Language, Cognition, Stereotyping, Speech Perception
Abstract: Stereotyping towards the second language accent of second language learners is extensively seen even when the content of learner speech can be understood. Previous studies reported conflicting results on accent perception by speakers of second languages, especially among homogenous learners. In this paper, we conducted a survey and two experiments to test whether Mandarin-speaking advanced learners of English may give harsher accent ratings to their fellow learners than to Standard American English speakers. The survey was designed to understand the L2 listeners' beliefs about accented speech. In Experiment 1, participants rated short audio recordings of L2 learners' and Standard American English speech; in Experiment 2, they did the same in a more detailed word-in-sentence accent rating task. Results showed a markedly high level of perceived L2 accentedness for several learner speech stimuli despite good intelligibility, especially for the strongly-accented Cantonese passage and for specific vowel and consonant types. The findings reveal the existence of native-speakerism in China and highlight existing accent stereotypes. Implications for policymaking and language teaching are discussed.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2023 Lan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2023
Full Text: View/download PDF

6. Different stages of emotional prosody processing in healthy ageing-evidence from behavioural responses, ERPs, tDCS, and tRNS.

Author: Maltezou-Papastylianou C, Russo R, Wallace D, Harmsworth C, and Paulmann S
Subjects: Acoustic Stimulation methods, Aged, Brain, Electroencephalography, Emotions physiology, Evoked Potentials physiology, Humans, Healthy Aging, Speech Perception physiology, Transcranial Direct Current Stimulation
Abstract: Past research suggests that the ability to recognise the emotional intent of a speaker decreases as a function of age. Yet, few studies have looked at the underlying cause for this effect in a systematic way. This paper builds on the view that emotional prosody perception is a multi-stage process and explores which step of the recognition processing line is impaired in healthy ageing using time-sensitive event-related brain potentials (ERPs). Results suggest that early processes linked to salience detection as reflected in the P200 component and initial build-up of emotional representation as linked to a subsequent negative ERP component are largely unaffected in healthy ageing. The two groups show, however, emotional prosody recognition differences: older participants recognise emotional intentions of speakers less well than younger participants do. These findings were followed up by two neuro-stimulation studies specifically targeting the inferior frontal cortex to test if recognition improves during active stimulation relative to sham. Overall, results suggest that neither tDCS nor high-frequency tRNS stimulation at 2 mA for 30 minutes facilitates emotional prosody recognition rates in healthy older adults.
Competing Interests: The authors have declared that no competing interests exist.
Published: 2022
Full Text: View/download PDF

7. Linguistic based emotion analysis using softmax over time attention mechanism.

Author: Roshan, Megha, Rawat, Mukul, Aryan, Karan, Lyakso, Elena, Mekala, A. Mary, and Ruban, Nersisson
Subjects: FACIAL expression & emotions (Psychology), EMOTION recognition, SPEECH perception, PSYCHOLOGICAL feedback, EMOTIONS, CUSTOMER feedback, SPEECH
Abstract: Recognizing the real emotion of humans is considered the most essential task for any customer feedback or medical applications. There are many methods available to recognize the type of emotion from a speech signal by extracting frequency, pitch, and other dominant features. These features are used to train various models to auto-detect various human emotions. We cannot completely rely on the features of speech signals to detect the emotion; for instance, a customer may be angry but still speak in a low voice (frequency components), which will eventually lead to wrong predictions. Even a video-based emotion detection system can be fooled by false facial expressions for various emotions. To rectify this issue, we need to make a parallel model that will train on textual data and make predictions based on the words present in the text. The model will then classify the type of emotions using more comprehensive information, thus making it a more robust model. To address this issue, we have tested four text-based classification models to classify the emotions of a customer. We examined the text-based models and compared their results, which showed that the modified encoder-decoder model with attention mechanism trained on textual data achieved an accuracy of 93.5%. This research highlights the pressing need for more robust emotion recognition systems and underscores the potential of transfer models with attention mechanisms to significantly improve feedback management processes and medical applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Confusion2Vec 2.0: Enriching ambiguous spoken language representations with subwords.

Author: Gurunath Shivakumar P, Georgiou P, and Narayanan S
Subjects: Humans, Natural Language Processing, Semantics, Speech, Language, Speech Perception
Abstract: Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation which encodes ambiguities present in human spoken language in addition to semantics and syntactic information. Confusion2vec provides a robust spoken language representation by considering inherent human language ambiguities. In this paper, we propose a novel word vector space estimation by unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in Confusion2vec vector space by its constituent subword character n-grams. We show that the subword encoding helps better represent the acoustic perceptual ambiguities in human spoken language via information modeled on lattice-structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using analogy and word similarity tasks designed for assessing semantic, syntactic and acoustic word relations. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs, providing improvements up to 13.12% relative to previous state-of-the-art in intent detection on the ATIS benchmark dataset. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining/adapting the natural language understanding models on ASR transcripts.
Competing Interests: The authors have declared that no competing interests exist.
Published: 2022
Full Text: View/download PDF
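
The abstract above describes encoding each word by its constituent subword character n-grams. A minimal sketch of such a decomposition is given below. The boundary markers `<` and `>`, the default n-gram lengths, and the function name `char_ngrams` are assumptions borrowed from common subword-embedding practice; they are not confirmed by the abstract as the authors' exact scheme.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Split a word into character n-grams of length n_min..n_max.

    Assumption: the word is wrapped in '<' and '>' so that prefixes and
    suffixes produce n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


def shared_ngrams(word_a, word_b, n_min=3, n_max=6):
    """N-grams that two words have in common, as a sorted list."""
    common = set(char_ngrams(word_a, n_min, n_max)) & set(
        char_ngrams(word_b, n_min, n_max))
    return sorted(common)


if __name__ == "__main__":
    print(char_ngrams("cat", 3, 3))
    print(shared_ngrams("see", "seed", 3, 3))
```

Words that are spelled alike share n-grams, so a vector built from n-gram vectors places them near each other, which is one plausible reason subwords help with words an ASR system tends to confuse.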

9. Predictors of cochlear implant outcomes in pediatric auditory neuropathy: A matched case-control study.

Author: Jafari, Zahra, Fitzpatrick, Elizabeth M., Schramm, David R., Rouillon, Isabelle, and Koravand, Amineh
Subjects: COCHLEAR implants, AUDITORY neuropathy, PERCEPTION testing, SPEECH perception, HEARING aid fitting, CASE-control method
Abstract: Objectives: Current evidence supports the benefits of cochlear implants (CIs) in children with hearing loss, including those with auditory neuropathy spectrum disorder (ANSD). However, there is limited evidence regarding factors that hold predictive value for intervention outcomes. Design: This retrospective case-control study consisted of 66 children with CIs, including 22 with ANSD and 44 with sensorineural hearing loss (SNHL) matched on sex, age, age at CI activation, and the length of follow-up with CIs (1:2 ratio). The case and control groups were compared in the results of five open-set speech perception tests, and a Forward Linear Regression Model was used to identify factors that can predict the post-CI outcomes. Results: There was no significant difference in average scores between the two groups across five outcome measures, ranging from 88.40% to 95.65%. The correlation matrix revealed that younger ages at hearing aid fitting and CI activation positively influenced improvements in speech perception test scores. Furthermore, among the variables incorporated in the regression model, the duration of follow-up with CIs, age at CI activation, and the utilization of two CIs demonstrated prognostic significance for improved post-CI speech perception outcomes. Conclusions: Children with ANSD can achieve similar open-set speech perception outcomes as children with SNHL. A longer CI follow-up, a lower age at CI activation, and the use of two CIs are predictive for optimal CI outcome. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Crossmixed convolutional neural network for digital speech recognition.

Author: Diep, Quoc Bao, Phan, Hong Yen, and Truong, Thanh-Cong
Subjects: CONVOLUTIONAL neural networks, SPEECH perception, BIG data, LEARNING ability, FOURIER transforms, MIMO radar
Abstract: Digital speech recognition is a challenging problem that requires the ability to learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often struggle to recognize. This article introduces three solutions based on convolutional neural networks (CNN) to solve the problem: 1D-CNN is designed to learn directly from digital data; 2DS-CNN and 2DM-CNN have a more complex architecture, transferring raw waveform into transformed images using Fourier transform to learn essential features. Experimental results on four large data sets, each containing 30,000 samples, show that the three proposed models achieve superior performance compared to well-known models such as GoogLeNet and AlexNet, with the best accuracy of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher performance than other models, the proposed solution has demonstrated the ability to effectively learn features, improve recognition accuracy and speed, and open up the potential for broad applications in virtual assistants, medical recording, and voice commands. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. Perceptual formant discrimination during speech movement planning.

Author: Wang, Hantao, Ali, Yusuf, and Max, Ludo
Subjects: SPEECH perception, SILENT reading, AUDITORY perception, SPEECH, EVOKED potentials (Electrophysiology), DEAF children
Abstract: Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
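
The abstract above reports fitting logistic curves to same/different responses and reading off a just-noticeable difference (JND). The sketch below illustrates the general idea under several stated assumptions: a two-parameter logistic, a least-squares grid search (chosen for brevity, not necessarily the authors' fitting procedure), and a JND defined as the stimulus difference at which the curve reaches 75% "different" responses. The abstract specifies none of these details, and all function names are hypothetical.

```python
import math


def logistic(x, midpoint, slope):
    """Probability of a 'different' response at stimulus difference x."""
    return 1.0 / (1.0 + math.exp(-(x - midpoint) / slope))


def fit_logistic(xs, proportions, midpoints, slopes):
    """Grid-search the (midpoint, slope) pair with the smallest squared error."""
    best = None
    for m in midpoints:
        for s in slopes:
            err = sum((logistic(x, m, s) - p) ** 2
                      for x, p in zip(xs, proportions))
            if best is None or err < best[0]:
                best = (err, m, s)
    return best[1], best[2]


def jnd(midpoint, slope, criterion=0.75):
    """Stimulus difference where the fitted curve crosses the criterion.

    Assumption: a 75% criterion; obtained by inverting the logistic.
    """
    return midpoint + slope * math.log(criterion / (1.0 - criterion))


if __name__ == "__main__":
    # Hypothetical formant shifts (arbitrary units) and response proportions.
    shifts = [0, 25, 50, 75, 100]
    observed = [logistic(x, 50, 10) for x in shifts]
    m, s = fit_logistic(shifts, observed, range(0, 101, 5), [5, 10, 15, 20])
    print(m, s, jnd(m, s))
```

Under this definition a larger JND means a larger formant shift is needed before the listener reliably reports a difference, that is, lower discrimination sensitivity.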

12. Unraveling the contributions of prosodic patterns and individual traits on cross-linguistic perception of Spanish sentence modality.

Author: Shang, Peizhu, Li, Yuejiao, and Liang, Yuhao
Subjects: INTONATION (Phonetics), SPANISH language, MUSICAL perception, TONE (Phonetics), MUSICAL ability, SPEECH perception
Abstract: Cross-linguistic perception is known to be molded by native and second language (L2) experiences. Yet, the role of prosodic patterns and individual characteristics on how speakers of tonal languages perceive L2 Spanish sentence modalities remains relatively underexplored. This study addresses the gap by analyzing the auditory performance of 75 Mandarin speakers with varying levels of Spanish proficiency. The experiment consisted of four parts: the first three collected sociolinguistic profiles and assessed participants' pragmatic competence and musical abilities. The last part involved an auditory gating task, where participants were asked to identify Spanish broad focus statements and information-seeking yes/no questions with different stress patterns. Results indicated that the shape of intonation contours and the position of the final stressed syllable significantly impact learners' perceptual accuracy, with effects modulated by utterance length and L2 proficiency. Moreover, individual differences in pragmatic and musical competence were found to refine auditory and cognitive processing in Mandarin learners, thereby influencing their ability to discriminate question-statement contrasts. These findings reveal the complex interplay between prosodic and individual variations in L2 speech perception, providing novel insights into how speakers of tonal languages process intonation in a non-native Romance language like Spanish. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. CROS or hearing aid? Selecting the ideal solution for unilateral CI patients with limited aidable hearing in the contralateral ear.

Author: Lively, Sarah, Agrawal, Smita, Stewart, Matthew, Dwyer, Robert T., Strobel, Laura, Marcinkevich, Paula, Hetlinger, Chris, and Croce, Julia
Subjects: HEARING aids, COCHLEAR implants, SPEECH perception, EAR, SPEECH
Abstract: A hearing aid or a contralateral routing of signal device are options for unilateral cochlear implant listeners with limited hearing in the unimplanted ear; however, it is uncertain which device provides greater benefit beyond unilateral listening alone. Eighteen unilateral cochlear implant listeners participated in this prospective, within-participants, repeated measures study. Participants were tested with the cochlear implant alone, cochlear implant + hearing aid, and cochlear implant + contralateral routing of signal device configurations with a one-month take-home period between each in-person visit. Audiograms, speech perception in noise, and lateralization were evaluated. Subjective feedback was obtained via questionnaires. Marked improvement in speech in noise and non-implanted ear lateralization accuracy were observed with the addition of a contralateral hearing aid. There were no significant differences in speech recognition between listening configurations. However, the chronic device use questionnaires and the final device selection showed a clear preference for the hearing aid in spatial awareness and communication domains. Individuals with limited hearing in their unimplanted ears demonstrate significant improvement with the addition of a contralateral device. Subjective questionnaires somewhat contrast with clinic-based outcome measures, highlighting the delicate decision-making process involved in clinically advising one device or another to maximize communication benefits. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Predictors for estimating subcortical EEG responses to continuous speech.

Author: Kulasingham, Joshua P., Bachmann, Florine L., Eskelund, Kasper, Enqvist, Martin, Innes-Brown, Hamish, and Alickovic, Emina
Subjects: SPEECH, AUDITORY pathways, AUDITORY adaptation, AUDITORY perception, SPEECH perception, ACOUSTIC nerve, ELECTROENCEPHALOGRAPHY, EAR, SCALP
Abstract: Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity using scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), which is a regression model that minimises the error between the measured neural signal and a predictor derived from the stimulus. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors from these simpler models is more than 50 times faster compared to the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnosis metrics for hearing impairment and assistive hearing technology. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Comparing online versus laboratory measures of speech perception in older children and adolescents.

Author: McAllister, Tara, Preston, Jonathan L., Ochs, Laura, Hill, Jennifer, and Hitchcock, Elaine R.
Subjects: SPEECH perception, TEENAGERS, AMERICAN English language, SPEECH-language pathology, JUDGMENT (Psychology), SPEECH disorders, VIRTUAL communities, TEENAGE girls
Abstract: Given the increasing prevalence of online data collection, it is important to know how behavioral data obtained online compare to samples collected in the laboratory. This study compares online and in-person measurement of speech perception in older children and adolescents. Speech perception is important for assessment and treatment planning in speech-language pathology; we focus on the American English /ɹ/ sound because of its frequency as a clinical target. Two speech perception tasks were adapted for web presentation using Gorilla: identification of items along a synthetic continuum from rake to wake, and category goodness judgment of English /ɹ/ sounds in words produced by various talkers with and without speech sound disorder. Fifty typical children aged 9–15 completed these tasks online using a standard headset. These data were compared to a previous sample of 98 typical children aged 9–15 who completed the same tasks in the lab setting. For the identification task, participants exhibited smaller boundary widths (suggestive of more acute perception) in the in-person setting relative to the online setting. For the category goodness judgment task, there was no statistically significant effect of modality. The correlation between scores on the two tasks was significant in the online setting but not in the in-person setting, but the difference in correlation strength was not statistically significant. Overall, our findings agree with previous research in suggesting that online and in-person data collection do not yield identical results, but the two contexts tend to support the same broad conclusions. In addition, these results suggest that online data collection can make it easier for researchers to connect with a more representative sample of participants. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Effects of spectral smearing on speech understanding and masking release in simulated bilateral cochlear implants.

Author: Cychosz, Margaret, Xu, Kevin, and Fu, Qian-Jie
Subjects: COCHLEAR implants, SPEECH perception
Abstract: Differences in spectro-temporal degradation may explain some variability in cochlear implant users' speech outcomes. The present study employs vocoder simulations on listeners with typical hearing to evaluate how differences in degree of channel interaction across ears affects spatial speech recognition. Speech recognition thresholds and spatial release from masking were measured in 16 normal-hearing subjects listening to simulated bilateral cochlear implants. 16-channel sine-vocoded speech simulated limited, broad, or mixed channel interaction, in dichotic and diotic target-masker conditions, across ears. Thresholds were highest with broad channel interaction in both ears but improved when interaction decreased in one ear and again in both ears. Masking release was apparent across conditions. Results from this simulation study on listeners with typical hearing show that channel interaction may impact speech recognition more than masking release, and may have implications for the effects of channel interaction on cochlear implant users' speech recognition outcomes. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

17. Speech extraction from vibration signals based on deep learning.

Author: Wang, Li, Zheng, Weiguang, Li, Shande, and Huang, Qibai
Subjects: DEEP learning, SPEECH, SPEECH perception, FINITE element method, SYSTEM identification
Abstract: Extracting speech information from vibration response signals is a typical system identification problem, and the traditional method is too sensitive to deviations such as model parameters, noise, boundary conditions, and position. A method was proposed to obtain speech signals by collecting vibration signals of vibroacoustic systems for deep learning training in the work. The vibroacoustic coupling finite element model was first established with the voice signal as the excitation source. The vibration acceleration signals of the vibration response point were used as the training set to extract its spectral characteristics. Training was performed by two types of networks: fully connected and convolutional. The fully connected network prediction model was found to have a faster rate of convergence and better quality of extracted speech. The amplitude spectra of the output speech signals (network output) and the phase of the vibration signals were used to convert extracted speech signals back to the time domain during the test set. The simulation results showed that the positions of the vibration response points had little effect on the quality of speech recognition, and good speech extraction quality can be obtained. The noises of the speech signals posed a greater influence on the speech extraction quality than the noises of the vibration signals. Extracted speech quality was poor when both had large noises. This method was robust to the position deviation of vibration responses during training and testing. The smaller the structural flexibility, the better the speech extraction quality. The quality of speech extraction was reduced in a trained system as the mass of node increased in the test set, but with negligible differences. Changes in boundary conditions did not significantly affect extracted speech quality. The speech extraction model proposed in the work has good robustness to position deviations, quality deviations, and boundary conditions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

18. Separated and reunified: An apparent time investigation of the voice quality differences between Hong Kong Cantonese and Guangzhou Cantonese.

Author: Fung, Roxana S. Y. and Wong, Eugene Y. C.
Subjects: NOISE measurement, TONE color (Music theory), CITIES & towns, AGE groups, SPEECH, SPEECH perception
Abstract: Hong Kong Cantonese (HKC) and Guangzhou Cantonese (GZC) are two major accents of Cantonese spoken in two geographically non-contiguous cities in Southern China. Previous studies were unable to identify the phonetic features that discern the two accents since they share the same phonological system. This study attempted to solve the puzzle by investigating the voice quality differences between the two accents through acoustic analysis on the speech output of 191 talkers in three age groups ranging from 18 to 65 years old. Among the various spectral and noise measurements of voice quality, we found that Cepstral Peak Prominence (CPP) was the best acoustic measure to discern the two accents. Based on the CPP measure, GZC had more noise overall than HKC. Covariation of voice quality and tones was studied. The greatest CPP differences between the two accents were found in the two extreme tones: the high-level and the extra-low-level tones. Furthermore, creaky voice was found mainly tied to the extra-low-level tone in both accents. However, HKC exhibited higher frequency of creaky voice than GZC. The creaky voice in GZC was characterized by increased noise and increased tension, compared to those of HKC. Finally, age was found to be a mediating factor in the voice quality of the two accents. Adopting the Apparent Time Framework, voice quality in the two cities has undergone changes over time. The voice quality of the young generations of the two accents has merged among the three low tones. Furthermore, the prevalence of creaky voice was increasing across age groups in both accents, and it increased at a faster rate in HKC than GZC. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

19. Changes in the length of speeches in the plays of William Shakespeare and his contemporaries: A mixed models approach.

Author: Colyvas, Kim, Egan, Gabriel, and Craig, Hugh
Subjects: SPEECH, SPEECH perception, PLAYS on words, ENGLISH language
Abstract: Since 2007 a number of investigators have compiled statistics on the length in words of speeches in plays by William Shakespeare and his contemporaries, focusing on a change to shorter speeches around 1600. In this article we take account of several potentially confounding factors in the variation of speech lengths in these works and present a model of this variation in the period 1538–1642 through Linear Mixed Models. We confirm that the mode of speech lengths in English plays changed from nine words to four words around 1600, and that Shakespeare's plays fit this wider pattern closely. We establish for the first time: that this change is independent of authorship, dramatic genre, theatrical company, and the proportion of verse in a play's dialogue; that the chosen time span can be segmented into pre-1597 plays (with high modes), 1597–1602 plays (with mixed high and low modes), and post-1602 plays (with low modes); that some additional secondary modes are evident in speech lengths, at 16 and 24 words, suggesting that the length of a standard blank verse line (around 8 words) is an underlying unit in speech length; and that the general change to short speeches also holds true when the data is viewed through the perspective of the median and the mean. The change in speech lengths is part of a collective drift in the plays towards liveliness and verisimilitude and is evidence of a hitherto hidden constraint on the playwrights: whether or not they were aware of the fact, playwrights as a group were conforming to a structure for the distribution of speech lengths peculiar to the era they were writing in. The authors hope that the full modelling of this variation in the article will help bring this change to the attention of scholars of Shakespeare and his contemporaries. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. The children's emotional speech recognition by adults: Cross-cultural study on Russian and Tamil language.

Author: Lyakso, Elena, Ruban, Nersisson, Frolova, Olga, and Mekala, Mary A.
Subjects: SPEECH perception, RUSSIAN language, CROSS-cultural studies, EMOTIONAL state, SPEECH, RUSSIANS, DEAF children
Abstract: The current study investigated the features of cross-cultural recognition of four basic emotions "joy–neutral (calm state)–sad–anger" in the spontaneous and acting speech of Indian and Russian children aged 8–12 years across Russian and Tamil languages. The research tasks were to examine the ability of Russian and Indian experts to recognize the state of Russian and Indian children by their speech, determine the acoustic features of correctly recognized speech samples, and specify the influence of the expert's language on the cross-cultural recognition of the emotional states of children. The study includes a perceptual auditory study by listeners and instrumental spectrographic analysis of child speech. Different accuracy and agreement between Russian and Indian experts were shown in recognizing the emotional states of Indian and Russian children by their speech, with more accurate recognition of the emotional state of children in their native language, in acting speech vs spontaneous speech. Both groups of experts recognize the state of anger via acting speech with high agreement. The difference between the groups of experts was in the definition of joy, sadness, and neutral states depending on the test material with a different agreement. Speech signals with emphasized differences in acoustic patterns were more accurately classified by experts as belonging to emotions of different activation. The data showed that, despite the universality of basic emotions, on the one hand, the cultural environment affects their expression and perception, on the other hand, there are universal non-linguistic acoustic features of the voice that allow us to identify emotions via speech. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

21. Comparison and combination of gamified neurofeedback training and general behavioral training.

Author: Chang, Ming, Yokota, Yusuke, Ando, Hideyuki, Maeda, Taro, and Naruse, Yasushi
Subjects: AUDITORY learning, BIOFEEDBACK training, SPEECH perception, LANGUAGE & languages
Abstract: With the rapid development of the international community, foreign language learning has become increasingly important. Listening training is a particularly important component of foreign language learning. The most difficult aspect of listening training is the development of speech discrimination ability, which is crucial to speech perception. General behavioral training requires a substantial amount of time and attention. To address this, we previously developed a neurofeedback (NF) training system that enables unconscious learning of auditory discrimination. However, to our knowledge, no studies have compared NF training and general behavioral training. In the present study, we compared the learning effects of NF training, general behavioral training, and a combination of both strategies. Specifically, we developed a gamified and adapted NF training of auditory discrimination. We found that both NF training and general behavioral training enhanced behavioral performance, whereas only NF training elicited significant changes in brain activity. Furthermore, the participants that used both training methods exhibited the largest improvement in behavioral performance. This indicates that the combined use of NF and general behavioral training methods may be optimal for enhancing auditory discrimination ability when learning foreign languages. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

22. How do people think about the implementation of speech and video recognition technology in emergency medical practice?

Author: Kim, Ki Hong, Hong, Ki Jeong, Shin, Sang Do, Ro, Young Sun, Song, Kyoung Jun, Kim, Tae Han, Park, Jeong Ho, and Jeong, Joo
Subjects: MEDICAL emergencies, SPEECH perception, MEDICAL practice, MEDICAL technology, AUTOMATIC speech recognition, MEDICAL personnel
Abstract: Background: Recently, speech and video information recognition technology (SVRT) has developed rapidly. Introducing SVRT into the emergency medical practice process may lead to improvements in health care. The purpose of this study was to evaluate the level of acceptance of SVRT among patients, caregivers and emergency medical staff. Methods: Structured questionnaires were developed for the patient or caregiver group and the emergency medical staff group. The survey was performed in one tertiary academic hospital emergency department. Questions were optimized for each specific group, and responses were provided mostly using Likert 5-scales. Additional multivariable logistic regression analyses for the whole cohort and subgroups were conducted to calculate odds ratios (OR) and confidence intervals (CI) to examine the association between individual characteristics and SVRT acceptance. Results: Of 264 participants, respondents demonstrated a positive attitude and acceptance toward SVRT and artificial intelligence (AI) in future; 179 (67.8%) for video recordings, and 190 (72.0%) for speech recordings. A multivariable logistic regression model revealed that several factors were associated with acceptance of SVRT in emergency medical practice: belief in health care improvement by signal analysis technology (OR, 95% CIs: 2.48 (1.15–5.42)) and AI (OR, 95% CIs: 1.70 (0.91–3.17)), reliability of AI application in emergency medicine (OR, 95% CIs: 2.36 (1.28–4.35)) and the security of personal information (OR, 95% CIs: 1.98 (1.10–3.63)). Conclusion: A high level of acceptance toward SVRT has been shown in patients or caregivers, and it also appears to be associated with positive attitudes toward new technology, AI and security of personal information. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

23. Automatic analysis of cochlear response using electrocochleography signals during cochlear implant surgery.

Author: Wijewickrema, Sudanthi, Bester, Christofer, Gerard, Jean-Marc, Collins, Aaron, and O'Leary, Stephen
Subjects: COCHLEAR implants, ELECTRIC stimulation, MUSIC appreciation, HEARING impaired, SPEECH perception, SURGERY
Abstract: Cochlear implants (CIs) provide an opportunity for the hearing impaired to perceive sound through electrical stimulation of the hearing (cochlear) nerve. However, there is a high risk of losing a patient's natural hearing during CI surgery, which has been shown to reduce speech perception in noisy environments as well as music appreciation. This is a major barrier to the adoption of CIs by the hearing impaired. Electrocochleography (ECochG) has been used to detect intra-operative trauma that may lead to loss of natural hearing. There is early evidence that ECochG can enable early intervention to save natural hearing of the patient. However, detection of trauma by observing changes in the ECochG response is typically carried out by a human expert. Here, we discuss a method of automating the analysis of cochlear responses during CI surgery. We establish, using historical patient data, that the proposed method is highly accurate (∼94% and ∼95% for sensitivity and specificity respectively) when compared to a human expert. The automation of real-time cochlear response analysis is expected to improve the scalability of ECochG and improve patient safety. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

24. Effects of degraded speech processing and binaural unmasking investigated using functional near-infrared spectroscopy (fNIRS).

Author: Zhou, Xin, Sobczak, Gabriel S., McKay, Colette M., and Litovsky, Ruth Y.
Subjects: SPEECH, TEMPORAL lobe, AUDITORY cortex, PREFRONTAL cortex, SPEECH perception, NEAR infrared spectroscopy, INTELLIGIBILITY of speech
Abstract: The present study aimed to investigate the effects of degraded speech perception and binaural unmasking using functional near-infrared spectroscopy (fNIRS). Normal hearing listeners were tested when attending to unprocessed or vocoded speech, presented to the left ear at two speech-to-noise ratios (SNRs). Additionally, by comparing monaural versus diotic masker noise, we measured binaural unmasking. Our primary research question was whether the prefrontal cortex and temporal cortex responded differently to varying listening configurations. Our a priori regions of interest (ROIs) were located at the left dorsolateral prefrontal cortex (DLPFC) and auditory cortex (AC). The left DLPFC has been reported to be involved in attentional processes when listening to degraded speech and in spatial hearing processing, while the AC has been reported to be sensitive to speech intelligibility. Comparisons of cortical activity between these two ROIs revealed significantly different fNIRS response patterns. Further, we showed a significant and positive correlation between self-reported task difficulty levels and fNIRS responses in the DLPFC, with a negative but non-significant correlation for the left AC, suggesting that the two ROIs played different roles in effortful speech perception. Our secondary question was whether activity within three sub-regions of the lateral PFC (LPFC) including the DLPFC was differentially affected by varying speech-noise configurations. We found significant effects of spectral degradation and SNR, and significant differences in fNIRS response amplitudes between the three regions, but no significant interaction between ROI and speech type, or between ROI and SNR. When attending to speech with monaural and diotic noises, participants reported the latter conditions being easier; however, no significant main effect of masker condition on cortical activity was observed. For cortical responses in the LPFC, a significant interaction between SNR and masker condition was observed. These findings suggest that binaural unmasking affects cortical activity through improving speech reception threshold in noise, rather than by reducing effort exerted. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

25. Different stages of emotional prosody processing in healthy ageing–evidence from behavioural responses, ERPs, tDCS, and tRNS

Author: Constantina Maltezou-Papastylianou, Riccardo Russo, Denise Wallace, Chelsea Harmsworth, and Silke Paulmann
Subjects: Healthy Aging, Multidisciplinary, Acoustic Stimulation, Emotions, Speech Perception, Brain, Humans, Electroencephalography, Transcranial Direct Current Stimulation, Evoked Potentials, Aged
Abstract: Past research suggests that the ability to recognise the emotional intent of a speaker decreases as a function of age. Yet, few studies have looked at the underlying cause for this effect in a systematic way. This paper builds on the view that emotional prosody perception is a multi-stage process and explores which step of the recognition processing line is impaired in healthy ageing using time-sensitive event-related brain potentials (ERPs). Results suggest that early processes linked to salience detection as reflected in the P200 component and initial build-up of emotional representation as linked to a subsequent negative ERP component are largely unaffected in healthy ageing. The two groups show, however, emotional prosody recognition differences: older participants recognise emotional intentions of speakers less well than younger participants do. These findings were followed up by two neuro-stimulation studies specifically targeting the inferior frontal cortex to test if recognition improves during active stimulation relative to sham. Overall, results suggest that neither tDCS nor high-frequency tRNS stimulation at 2 mA for 30 minutes facilitates emotional prosody recognition rates in healthy older adults.
Published: 2022
