Descriptor: "Phoneme recognition" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Phoneme recognition" returned 456 results.


1. Phoneme Recognition in Korean Singing Voices Using Self-Supervised English Speech Representations.

Author: Wu, Wenqin and Lee, Joonwhoan
Subjects: HUMAN voice, SPEECH, KOREAN language, ERROR rates, ENGLISH language, SPEECH perception, AUTOMATIC speech recognition
Abstract: In general, it is difficult to obtain a huge, labeled dataset for deep learning-based phoneme recognition in singing voices. Studying singing voices also offers inherent challenges, compared to speech, because of the distinct variations in pitch, duration, and intensity. This paper proposes a detouring method to overcome this insufficient dataset, and applies it to the recognition of Korean phonemes in singing voices. The method started with pre-training the HuBERT, a self-supervised speech representation model, on a large-scale English corpus. The model was then adapted to the Korean speech domain with a relatively small-scale Korean corpus, in which the Korean phonemes were interpreted as similar English ones. Finally, the speech-adapted model was again trained with a tiny-scale Korean singing voice corpus for speech–singing adaptation. In the final adaptation, melodic supervision was chosen, which utilizes pitch information to improve the performance. For evaluation, the performance on multi-level error rates based on Word Error Rate (WER) was taken. Using the HuBERT-based transfer learning for adaptation improved the phoneme-level error rate of Korean speech by as much as 31.19%. Again, on singing voices by melodic supervision, it improved the rate by 0.55%. The significant improvement in speech recognition underscores the considerable potential of a model equipped with general human voice representations captured from the English corpus that can improve phoneme recognition on less target speech data. Moreover, the musical variation in singing voices is beneficial for phoneme recognition in singing voices. The proposed method could be applied to the phoneme recognition of other languages that have less speech and singing voice corpora. [ABSTRACT FROM AUTHOR]
Published: 2024

2. Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer.

Author: Tits, Noé, Bhatnagar, Prernna, and Dutoit, Thierry
Subjects: SPEECH processing systems, WORK design, KNOWLEDGE representation (Information theory), SPEECH perception, DEEP learning, TEXT recognition
Abstract: In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained using forced-alignment labels (using Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work but the design of the system makes it easily adaptable to other languages. [ABSTRACT FROM AUTHOR]
Published: 2024

3. Establishment and verification of auditory brainstem implant vocoder model

Author: ZHANG Qinjie, HUANG Sui, TAN Haoyue, ZHOU Xiang, WANG Junyi, LIU Yuzi, WEN Wen, GUO Jia, WU Hao, and JIA Huan
Subjects: auditory brainstem implant, vocoder, phoneme recognition, psychoacoustic, electrode array topology, Medicine
Abstract: Objective: To develop an auditory brainstem implant (ABI) vocoder based on cochlear implant (CI) vocoder characteristics and ABI electrode array topology, and to verify its reliability. Methods: An "n-of-m" coding strategy CI/ABI vocoder was constructed in MATLAB. Within each frame, only the envelopes of the n channels with the highest energy were selected. The interaction coefficient (IC) (range: 1‒3), number of channels (range: 5‒22), and electrode array topology (CI/ABI) were adjustable parameters, allowing for the synthesis of simulated speech. Psychoacoustic evaluation was employed, recruiting normal hearing subjects to perform closed-set simulated phoneme perception. The phoneme recognition accuracy (20 vowel questions/condition, 11 consonant questions/condition) was compared with the corresponding conditions of CI and ABI from reference literature to determine the IC value of the vocoder and verify its reliability. Results: The vocoder successfully synthesized all test stimuli. In the closed-set CI-simulated speech recognition, the simulated vowel and consonant recognition accuracy for IC2 and IC3 conditions showed no significant difference compared to the accuracy reported in the CI reference literature (P>0.05). The difference in vowel and consonant accuracy between IC2 and the literature was smaller than that between IC3 and the literature (vowel |d|=1.6% vs. 20%, consonant |d|=8.4% vs. 9.9%), thus determining the optimal interaction coefficient of this model as 2. Subsequently, when modifying the electrode array topology to ABI, it was found that the simulated phoneme recognition accuracy for a 16-channel ABI was significantly lower than that for the 16-channel CI group, consistent with the reported literature. The simulated vowel and consonant accuracy within the 5‒8 channel range for ABI showed no significant difference (P>0.05), also aligning with the trend reported in the literature. Conclusion: A CI/ABI vocoder based on the "n-of-m" coding strategy is established and the optimal IC is determined. The established ABI encoder was shown to be highly reliable in psychoacoustic experiments. It provides a suitable technical means for validating ABI-specific coding strategies.
Published: 2024

4. Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer

Author: Noé Tits, Prernna Bhatnagar, and Thierry Dutoit
Subjects: speech recognition, phoneme recognition, deep learning, transfer learning, Physics, QC1-999
Abstract: In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained using forced-alignment labels (using Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work but the design of the system makes it easily adaptable to other languages.
Published: 2024

5. Phoneme recognition method based on a fusion network under low signal-to-noise ratio.

Author: 黄辉波, 邵玉斌, 龙华, and 杜庆治
Subjects: TRANSFORMER models, FEATURE extraction, IMAGE denoising, CONVOLUTIONAL neural networks, ERROR rates
Abstract: Copyright of Journal of Chongqing University of Posts & Telecommunications (Natural Science Edition) is the property of Chongqing University of Posts & Telecommunications. Users should refer to the original published version of the material for the full abstract.
Published: 2024

6. Hierarchical Multi-task Learning with Articulatory Attributes for Cross-Lingual Phoneme Recognition

Author: Glocker, Kevin, Georges, Munir, Celebi, Emre, Series Editor, Chen, Jingdong, Series Editor, Gopi, E. S., Series Editor, Neustein, Amy, Series Editor, Liotta, Antonio, Series Editor, Di Mauro, Mario, Series Editor, and Abbas, Mourad, editor
Published: 2024

7. Improving Automatic Forced Alignment for Phoneme Segmentation in Quranic Recitation

Author: Ammar Mohammed Ali Alqadasi, Akram M. Zeki, Mohd Shahrizal Sunar, Md. Sah Bin Hj Salam, Rawad Abdulghafor, and Nashwan Abdo Khaled
Subjects: Phoneme alignment, forced alignment, phoneme segmentation, Arabic phoneme segmentation, phoneme duration, phoneme recognition, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Segmentation plays a crucial role in speech processing applications, where high accuracy is essential. The quest for improved accuracy in automatic segmentation, particularly in the context of the Arabic language, has garnered substantial attention. However, the differences between Qur’an recitation and normal Arabic speech, especially with regard to intonation rules affecting the lengthening of long vowels, pose challenges in segmentation, especially for Qur’an recitation. This research endeavors to address these challenges by delving into the domain of automatic segmentation for Qur’an recitation recognition. The proposed scheme employs a hidden Markov model (HMM) forced alignment algorithm. To enhance the precision of segmentation, several refinements have been introduced, with a primary emphasis on the phonetic model of the Qur’an and Tajweed, particularly the intricate rules governing elongation. These enhancements encompass the adaptation of an acoustic model tailored for Qur’anic recitation as preprocessing and culminate in the development of an algorithm aimed at refining forced alignment based on the phonetic nuances of the Qur’an. These enhancements are seamlessly integrated as post-processing components for the classic HMM-based forced alignment. The research utilizes a comprehensive database featuring recordings from 100 renowned Qur’an reciters, encompassing the recitation of 21 Qur’anic verses (Ayat). Additionally, 30 reciters were asked to record the same verses, incorporating various recitation speed patterns. To facilitate the evaluation process, a random sample of the Qur’anic database was manually segmented, comprising 21 Ayats, totaling 19,800 words, with 89 unique words (14 verses x 3 recitation levels: fast, slow and normal x 6 readers). The outcomes of this study manifest notable advancements in the alignment of long vowels within Qur’an recitation, all while maintaining the precise alignment of vowels and consonants. 
Objective comparisons between the proposed automatic methods and manual segmentation were conducted to ascertain the superior approach. The findings affirm that the classic forced alignment method produces satisfactory outcomes when employed on verses lacking long vowels. However, its performance diminishes when confronted with verses containing long vowels. Therefore, the test samples were categorized into three groups based on the presence of long vowels, resulting in a Correct Classification Rate (CCR) that ranged from 6% to 57%, contingent on whether the verse includes long vowels or not. The average CCR across all test samples was 23%. In contrast, the proposed algorithm significantly enhances audio segmentation. It achieved CCR values ranging from 16% to 70% within the same database categories, with an average CCR of 45% across all test samples. This marks a notable advancement of 22% in segmented speech accuracy, particularly within a 30 ms tolerance, for verses containing long vowels.
Published: 2024

8. Phoneme Recognition in Korean Singing Voices Using Self-Supervised English Speech Representations

Author: Wenqin Wu and Joonwhoan Lee
Subjects: phoneme recognition, Korean singing voices, self-supervised learning, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: In general, it is difficult to obtain a huge, labeled dataset for deep learning-based phoneme recognition in singing voices. Studying singing voices also offers inherent challenges, compared to speech, because of the distinct variations in pitch, duration, and intensity. This paper proposes a detouring method to overcome this insufficient dataset, and applies it to the recognition of Korean phonemes in singing voices. The method started with pre-training the HuBERT, a self-supervised speech representation model, on a large-scale English corpus. The model was then adapted to the Korean speech domain with a relatively small-scale Korean corpus, in which the Korean phonemes were interpreted as similar English ones. Finally, the speech-adapted model was again trained with a tiny-scale Korean singing voice corpus for speech–singing adaptation. In the final adaptation, melodic supervision was chosen, which utilizes pitch information to improve the performance. For evaluation, the performance on multi-level error rates based on Word Error Rate (WER) was taken. Using the HuBERT-based transfer learning for adaptation improved the phoneme-level error rate of Korean speech by as much as 31.19%. Again, on singing voices by melodic supervision, it improved the rate by 0.55%. The significant improvement in speech recognition underscores the considerable potential of a model equipped with general human voice representations captured from the English corpus that can improve phoneme recognition on less target speech data. Moreover, the musical variation in singing voices is beneficial for phoneme recognition in singing voices. The proposed method could be applied to the phoneme recognition of other languages that have less speech and singing voice corpora.
Published: 2024

9. Fine-Grained Voice Discrimination for Low-Resource Datasets Using Scalogram Images

Author: Moirangthem, Gourashyam, Nongmeikapam, Kishorjit, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Das, Swagatam, editor, Saha, Snehanshu, editor, Coello Coello, Carlos A., editor, and Bansal, Jagdish Chand, editor
Published: 2023

10. pROnounce: Automatic Pronunciation Assessment for Romanian

Author: Ungureanu, Dan, Ruseti, Stefan, Toma, Irina, Dascalu, Mihai, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Dascalu, Mihai, editor, Marti, Patrizia, editor, and Pozzi, Francesca, editor
Published: 2023

11. Comparable Encoding, Comparable Perceptual Pattern: Acoustic and Electric Hearing

Author: Fanhui Kong, Huali Zhou, Yefei Mo, Mingyue Shi, Qinglin Meng, and Nengheng Zheng
Subjects: Neural prosthetic, cochlear implant, aural rehabilitation, phoneme recognition, vocoder simulation, Medical technology, R855-855.5, Therapeutics. Pharmacology, RM1-950
Abstract: Perception with electric neuroprostheses is sometimes expected to be simulated using properly designed physical stimuli. Here, we examined a new acoustic vocoder model for electric hearing with cochlear implants (CIs) and hypothesized that comparable speech encoding can lead to comparable perceptual patterns for CI and normal hearing (NH) listeners. Speech signals were encoded using FFT-based signal processing stages including band-pass filtering, temporal envelope extraction, maxima selection, and amplitude compression and quantization. These stages were specifically implemented in the same manner by an Advanced Combination Encoder (ACE) strategy in CI processors and Gaussian-enveloped Tones (GET) or Noise (GEN) vocoders for NH. Adaptive speech reception thresholds (SRTs) in noise were measured using four Mandarin sentence corpora. Initial consonant (11 monosyllables) and final vowel (20 monosyllables) recognition were also measured. Naïve NH listeners were tested using vocoded speech with the proposed GET/GEN vocoders as well as conventional vocoders (controls). Experienced CI listeners were tested using their daily-used processors. Results showed that: 1) there was a significant training effect on GET vocoded speech perception; 2) the GEN vocoded scores (SRTs with four corpora and consonant and vowel recognition scores) as well as the phoneme-level confusion pattern matched with the CI scores better than controls. The findings suggest that the same signal encoding implementations may lead to similar perceptual patterns simultaneously in multiple perception tasks. This study highlights the importance of faithfully replicating all signal processing stages in the modeling of perceptual patterns in sensory neuroprostheses. This approach has the potential to enhance our understanding of CI perception and accelerate the engineering of prosthetic interventions. The GET/GEN MATLAB program is freely available at https://github.com/BetterCI/GETVocoder.
Published: 2023

12. Evaluation of phoneme recognition skills in pediatric auditory brainstem implant users.

Author: Baş, Banu and Yücel, Esra
Subjects: *PHONEME (Linguistics), *AUDITORY brain stem implants, *COCHLEAR implants, *RECOGNITION (Psychology), *LANGUAGE ability testing, *DEMOGRAPHIC characteristics
Abstract: Objectives: This study aims to evaluate the relationship between phoneme recognition skills and language development skills in pediatric auditory brainstem implant (ABI) users. It further intends to identify the delays and problems that may occur in the phoneme recognition skills of children with ABI and shed light on rehabilitation programs. Methods: Our study included 20 children using ABI and another 20 using cochlear implants (CI). They were aged between 6 and 8 years 11 months. The participants exhibited homogenous demographic and audiological characteristics. The Turkish version of the Test of Language Development-Primary: Fourth Edition (TOLDP:4) was used to evaluate the language development skills, and the Turkish version of the Phoneme Recognition Test (PRT) was applied to assess the phoneme recognition skills. Results: There was a statistically significant difference (p < 0.05) in the PRT values as well as in the language development skills between the children with ABI and those with CI. It was observed that the values of the children with CI were significantly higher than those of children with ABI. Conclusion: Although children with ABI were not able to match the skills of their peers with CI, their language development and communication skills improved. It is believed that this study will contribute to the literature by demonstrating that the use of ABI improves phoneme recognition skills in children who are not eligible for CI or who do not adequately benefit from CI. [ABSTRACT FROM AUTHOR]
Published: 2022

13. Assessment of Syllable Intelligibility Based on Convolutional Neural Networks for Speech Rehabilitation After Speech Organs Surgical Interventions

Author: Kostuchenko, Evgeny, Novokhrestova, Dariya, Pekarskikh, Svetlana, Shelupanov, Alexander, Nemirovich-Danchenko, Mikhail, Choynzonov, Evgeny, Balatskaya, Lidiya, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Salah, Albert Ali, editor, Karpov, Alexey, editor, and Potapova, Rodmonga, editor
Published: 2019

14. Recognition of vocoded speech in English by Mandarin-speaking English-learners.

Author: Yang, Jing, Wagner, Andrew, Zhang, Yu, and Xu, Li
Subjects: *INTELLIGIBILITY of speech, *SPEECH perception, *ENGLISH as a foreign language, *SECOND language acquisition, *NATIVE language, *VOCODER
Abstract:
• The L2 listeners showed less improvement in phoneme recognition accuracy as a function of number of channels, which was associated with the phoneme confusions due to the L1 impact.
• The L2 listeners were less effective than L1 listeners in applying contextual information and linguistic knowledge to sentence recognition.
• The facilitating role of contextual cues in sentence recognition was consistently present in the L2 listeners.
• The L2 listeners required more spectral information to maximize the contextual benefit in comparison to the L1 listeners.
• The overall perceptual performance of the L2 listeners was positively correlated with and predicted by the length of residence in the U.S.
The purpose of this study was to examine the impact of spectral degradation on speech processing in non-native listeners. The participants included 27 native English (L1) listeners and 43 native Mandarin listeners who learned English as a second language (L2). The speech stimuli included 12 English vowels embedded in a /hVd/ context, 20 English consonants embedded in a /Ca/ context, and HINT, CUNY, and R-SPIN sentences. All stimuli were processed using 2-, 4-, 6-, 8-, and 12-channel noise vocoders. The results showed that compared to the L1 listeners, the L2 listeners demonstrated less improvement in phoneme recognition with increasing number of channels, which was associated with the phoneme confusions due to the impact of their native language. Both consonant and vowel recognition made significant contributions to sentence recognition in the L2 listeners. In addition, the L2 listeners were less effective than the L1 listeners in applying contextual information and linguistic knowledge to sentence recognition. However, the facilitating role of contextual cues in sentence recognition was consistently present in the L2 listeners but they required more spectral information to maximize the contextual benefit in comparison to the L1 listeners. The overall perceptual performance of the L2 listeners was positively correlated with and predicted by the length of residence in the U.S. [ABSTRACT FROM AUTHOR]
Published: 2022

15. Towards Deep Object Detection Techniques for Phoneme Recognition

Author: Mohammed Algabri, Hassan Mathkour, Mohamed Abdelkader Bencherif, Mansour Alsulaiman, and Mohamed Amine Mekhtiche
Subjects: CenterNet, object detection, phoneme recognition, transfer learning, YOLO, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The use of cutting-edge object detection techniques to build an accurate phoneme sequence recognition system for English and Arabic languages is investigated in this study. Recently, numerous techniques have been proposed for object detection in daily life applications using deep learning. In this paper, we propose the use of object detection techniques in speech processing tasks. We selected two state-of-the-art object detectors, namely YOLO and CenterNet, based on a trade-off between detection accuracy and speed. We tackled the problem of phoneme sequence recognition using three systems: the domain transfer learning system (DTS) from image to speech, intra-language transfer learning system (IaTS) between speech corpora within the same language (English to English), and inter-language transfer learning system (IeTS) between speech corpora from dissimilar languages (English to Arabic). For English phoneme recognition, the Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus is used to evaluate the performance of the proposed systems. Our IaTS based on the CenterNet detector achieves the best results using the test core set of TIMIT with 15.89% phone error rate (PER). For Arabic phoneme recognition, the best performance, with 7.58% PER, was achieved using the CenterNet. These results show the effectiveness of using object detection techniques in phoneme recognition tasks. Furthermore, based on the findings of this study, speech processing tasks may be treated as object detection tasks.
Published: 2020

16. Optimizing Arabic Speech Distinctive Phonetic Features and Phoneme Recognition Using Genetic Algorithm

Author: Ahmed B. Ibrahim, Yasser Mohammad Seddiq, Ali Hamid Meftah, Mansour Alghamdi, Sid-Ahmed Selouani, Mustafa A. Qamhan, Yousef A. Alotaibi, and Saleh A. Alshebeili
Subjects: Arabic speech distinctive phonetic feature, phoneme recognition, genetic algorithm, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Distinctive phonetic features have an important role in Arabic speech phoneme recognition. In a given language, distinctive phonetic features are extrapolated from acoustic features using different methods. However, exploiting a lengthy acoustic feature vector for the sake of phoneme recognition has a huge cost in terms of computational complexity, which, in turn, affects real-time applications. The aim of this work is to consider methods to reduce the size of the feature vector employed for distinctive phonetic feature and phoneme recognition. The objective is to select the relevant input features that contribute to the speech recognition process. This, in turn, will lead to a reduced computational complexity of the recognition algorithm and an improved recognition accuracy. In the proposed approach, a genetic algorithm is used to perform optimal feature selection. Therefore, a baseline model based on feedforward neural networks is first built. This model is used to benchmark the results of the proposed feature selection method against a method that employs all elements of the feature vector. Experimental results, utilizing the King Abdulaziz City for Science and Technology Arabic Phonetic Database, show that the average overall phoneme recognition accuracy of the genetic algorithm based method is maintained slightly higher than that of the recognition method employing the full-fledged feature vector. The genetic algorithm based distinctive phonetic feature recognition method has achieved a 50% reduction in the dimension of the input vector while obtaining a recognition accuracy of 90%. Moreover, the results of the proposed method are validated using the Wilcoxon signed-rank test.
Published: 2020

17. Normal hearing and verbal discrimination in real sounds environments.

Author: Lodeiro Colatosti A, Pla Gil I, Morant Ventura A, Latorre Monteagudo E, Chacón Aranda L, and Marco Algarra J
Abstract: Introduction: Human beings are constantly exposed to complex acoustic environments every day, which pose challenges even for individuals with normal hearing. Speech perception relies not only on fixed elements within the acoustic wave but is also influenced by various factors. These factors include speech intensity, environmental noise, the presence of other speakers, individual specific characteristics, spatial separation of sound sources, ambient reverberation, and audiovisual cues. The objective of this study is twofold: to determine the auditory capacity of normal hearing individuals to discriminate spoken words in real-life acoustic conditions and to perform a phonetic analysis of misunderstood spoken words. Materials and Methods: This is a descriptive observational cross-sectional study involving 20 normal hearing individuals. Verbal audiometry was conducted in an open-field environment, with sounds masked by simulated real-world acoustic environments at various sound intensity levels. To enhance sound emission, 2D visual images related to the sounds were displayed on a television. We analyzed the percentage of correct answers and performed a phonetic analysis of misunderstood Spanish bisyllabic words in each environment. Results: Participants were 14 women (70%) and 6 men (30%), with an average age of 26 ± 5.4 years and a mean airway hearing threshold in the right ear of 10.56 ± 3.52 dB SPL and in the left ear of 10.12 ± 2.49 dB SPL. The percentage of verbal discrimination in the "Ocean" sound environment was 97.2 ± 5.04%, "Restaurant" was 94 ± 4.58%, and "Traffic" was 86.2 ± 9.94% (p = 0.000). Regarding the phonetic analysis, the allophones that exhibited statistically significant differences were as follows: [o] (p = 0.002) within the group of vocalic phonemes, [n] (p = 0.000) of voiced nasal consonants, [r] (p = 0.0016) of voiced fricatives, [b] (p = 0.000) and [g] (p = 0.045) of voiced stops. Conclusion: The dynamic properties of the acoustic environment can impact the ability of a normal hearing individual to extract information from a voice signal. Our study demonstrates that this ability decreases when the voice signal is masked by one or more simultaneous interfering voices, as observed in a "Restaurant" environment, and when it is masked by a continuous and intense noise environment such as "Traffic". Regarding the phonetic analysis, when the sound environment was composed of continuous low-frequency noise, we found that nasal consonants were particularly challenging to identify. Furthermore, in situations with distracting verbal signals, vowels and vibrating consonants exhibited the worst intelligibility. (Copyright © 2024 Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. Published by Elsevier España, S.L.U. All rights reserved.)
Published: 2024

18. Toward an Automatic Fongbe Speech Recognition System: Hierarchical Mixtures of Algorithms for Phoneme Recognition

Author: Laleye, Fréjus A. A., Ezin, Eugène C., Motamed, Cina, Madani, Kurosh, editor, Peaucelle, Dimitri, editor, and Gusikhin, Oleg, editor
Published: 2018

19. Bilateral cochlear implantation: an assessment of language sub-skills and phoneme recognition in school-aged children.

Author: Yıldırım Gökay, Nuriye and Yücel, Esra
Subjects: *DEAF children, *COCHLEAR implants, *LANGUAGE ability testing, *SCHOOL children, *PHONEME (Linguistics), *AUDITORY perception
Abstract: Purposes: The main purpose of this study is to investigate whether there is a difference in phoneme recognition and school-age language skills in children with bilateral and unilateral cochlear implants (CI). The second aim of the study is to examine language-based skills in bilateral cochlear implanted children with the first implant, second implant and in the bilateral listening situations. Method: 60 to 108-month-old children with similar demographic and audiological features were included. Of the 64 participants in total, 30 are bilateral cochlear implant users and 34 of them use unilateral cochlear implants. Turkish version of "Test of Language Development-Primary: Fourth edition (TOLD-P:4)" and "Phoneme Recognition Test (PRT)" were implemented for the evaluation of the language sub-components skills and auditory perception. In addition, the PRT test audio file was presented directly to the implant with connection cables via the fitting program methodologically. Results: Children with bilateral cochlear implants were more successful in all language-based skills than children with unilateral cochlear implants (p < 0.05). In the PRT test, the most successful scores were obtained in the bilateral listening conditions, the second with the experienced implant side, and the most unsuccessful scores in the listening conditions with second implant. Conclusion: Bilateral cochlear implants are very useful in terms of language-based skills in children with severe/profound hearing loss. This can positively affect even the future academic and social skills of children. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

20. Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Author: M. Asadolahzade Kermanshahi and M. M. Homayounpour
Subjects: Phoneme Recognition, Deep Neural Network, Hidden Markov Model, Hidden Semi-Markov Model, Extended Viterbi Algorithm, Information technology, T58.5-58.64, Computer software, QA76.75-76.765
Abstract: Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Most previous research attempted to improve the training phase, for example through training algorithms, network types, network architecture, or feature types. In this study, we focus instead on the test phase, which generates the phoneme sequence and is also essential to achieving good phoneme recognition accuracy. Past research used the Viterbi algorithm on a hidden Markov model (HMM) to generate phoneme sequences. We address an important problem associated with this method. To deal with the problem that an HMM assumes a geometric distribution of state duration, we use the real duration probability distribution for each phoneme with the aid of a hidden semi-Markov model (HSMM). We also represent each phoneme with only one state so that phoneme duration information can be used simply in the HSMM. Furthermore, we investigate the performance of a post-processing method, which corrects the phoneme sequence obtained from the neural network, based on our knowledge about phonemes. The experimental results using the Persian FarsDat corpus show that using the extended Viterbi algorithm on the HSMM achieves phoneme recognition accuracy improvements of 2.68% and 0.56% over conventional methods using Gaussian mixture model-hidden Markov models (GMM-HMMs) and Viterbi on HMM, respectively. The post-processing method also increases accuracy relative to the uncorrected output.
Published: 2019
Full Text: View/download PDF

21. Semi-supervised Phoneme Recognition with Recurrent Ladder Networks

Author: Tietz, Marian, Alpay, Tayfun, Twiefel, Johannes, Wermter, Stefan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Lintas, Alessandra, editor, Rovetta, Stefano, editor, Verschure, Paul F.M.J., editor, and Villa, Alessandro E.P., editor
Published: 2017
Full Text: View/download PDF

22. Out of Time: Automated Lip Sync in the Wild

Author: Chung, Joon Son, Zisserman, Andrew, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Chu-Song, editor, Lu, Jiwen, editor, and Ma, Kai-Kuang, editor
Published: 2017
Full Text: View/download PDF

23. Multi-channel spectrograms for speech processing applications using deep learning methods.

Author: Arias-Vergara, T., Klumpp, P., Vasquez-Correa, J. C., Nöth, E., Orozco-Arroyave, J. R., and Schuster, M.
Subjects: *DEEP learning, *RECURRENT neural networks, *CONVOLUTIONAL neural networks, *SIGNAL convolution, *SPECTROGRAMS, *HOUGH transforms, *COCHLEAR implants
Abstract: Time–frequency representations of speech signals provide dynamic information about how the frequency components change with time. In order to process this information, deep learning models with convolution layers can be used to obtain feature maps. In many speech processing applications, the time–frequency representations are obtained by applying the short-time Fourier transform and using single-channel input tensors to feed the models. However, this may limit the potential of convolutional networks to learn different representations of the audio signal. In this paper, we propose a methodology to combine three different time–frequency representations of the signals by computing the continuous wavelet transform, Mel-spectrograms, and Gammatone spectrograms and combining them into 3D-channel spectrograms to analyze speech in two different applications: (1) automatic detection of speech deficits in cochlear implant users and (2) phoneme class recognition to extract phone-attribute features. For this, two different deep learning-based models are considered: convolutional neural networks and recurrent neural networks with convolution layers. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

24. Improving phoneme recognition of throat microphone speech recordings using transfer learning.

Author: Turan, M.A. Tuğtekin and Erzin, Engin
Subjects: *PHONEME (Linguistics), *GAUSSIAN mixture models, *HIDDEN Markov models, *CONVOLUTIONAL neural networks, *THROAT, *PSYCHOACOUSTICS, *LEARNING strategies
Abstract: Throat microphones (TM) are a type of skin-attached non-acoustic sensors, which are robust to environmental noise but carry a lower signal bandwidth characterization than the traditional close-talk microphones (CM). Attaining high-performance phoneme recognition is a challenging task when the training data from a degrading channel, such as TM, is limited. In this paper, we address this challenge for the TM speech recordings using a transfer learning approach based on the stacked denoising auto-encoders (SDA). The proposed transfer learning approach defines an SDA-based domain adaptation framework to map the source domain CM representations and the target domain TM representations into a common latent space, where the mismatch across TM and CM is eliminated to better train an acoustic model and to improve the TM phoneme recognition. For the phoneme recognition task, we use the convolutional neural network (CNN) and the hidden Markov model (HMM) based CNN/HMM hybrid system, which delivers better acoustic modeling performance compared to the conventional Gaussian mixture model (GMM) based models. In the experimental evaluations, we observed more than 12% relative phoneme error rate (PER) improvement for the TM recordings with the proposed transfer learning approach compared to baseline performances. • High-performance phoneme recognition of throat microphone is a challenging task. • Data is scarce for throat microphone speech compared to close-talk microphones. • Transfer learning from close-talk to target throat microphone eliminates mismatch. • Transfer learning as well performs data augmentation from source to target domain. • Proposed learning scheme attains significant phoneme recognition improvements. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

25. Better Phoneme Recognisers Lead to Better Phoneme Posteriorgrams for Search on Speech? An Experimental Analysis

Author: Lopez-Otero, Paula, Docio-Fernandez, Laura, Garcia-Mateo, Carmen, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Abad, Alberto, editor, Ortega, Alfonso, editor, Teixeira, António, editor, García Mateo, Carmen, editor, Martínez Hinarejos, Carlos D., editor, Perdigão, Fernando, editor, Batista, Fernando, editor, and Mamede, Nuno, editor
Published: 2016
Full Text: View/download PDF

26. Gender-Specific Classifiers in Phoneme Recognition and Academic Emotion Detection

Author: Azcarraga, Arnulfo, Talavera, Arces, Azcarraga, Judith, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Hirose, Akira, editor, Ozawa, Seiichi, editor, Doya, Kenji, editor, Ikeda, Kazushi, editor, Lee, Minho, editor, and Liu, Derong, editor
Published: 2016
Full Text: View/download PDF

27. Pronunciation error detection model based on feature fusion.

Author: Zhu, Cuicui, Wumaier, Aishan, Wei, Dongping, Fan, Zhixing, Yang, Jianlei, Yu, Heng, Kadeer, Zaokere, and Wang, Liejun
Subjects: *RECOGNITION (Psychology), *PRONUNCIATION, *SPEECH perception, *ERROR functions, *PHONEME (Linguistics)
Abstract: Mispronunciation detection and diagnosis (MDD) is a specific speech recognition task that aims to recognize the phoneme sequence produced by a user, compare it with the standard phoneme sequence, and identify the type and location of any mispronunciations. However, the lack of large amounts of phoneme-level annotated data limits the performance improvement of the model. In this paper, we propose a joint training approach, Acoustic Error_Type Linguistic (AEL) that utilizes the error type information, acoustic information, and linguistic information from the annotated data, and achieves feature fusion through multiple attention mechanisms. To address the issue of uneven distribution of phonemes in the MDD data, which can cause the model to make overconfident predictions when using the CTC loss, we propose a new loss function, Focal Attention Loss, to improve the performance of the model, such as F1 score accuracy and other metrics. The proposed method in this paper was evaluated on the TIMIT and L2-Arctic public corpora. In ideal conditions, it was compared with the baseline model CNN-RNN-CTC. The F1 score, diagnostic accuracy, and precision were improved by 31.24%, 16.6%, and 17.35% respectively. Compared to the baseline model, our model reduced the phoneme error rate from 29.55% to 8.49% and showed significant improvements in other metrics. Furthermore, experimental results demonstrated that when we have a model capable of accurately obtaining pronunciation error types, our model can achieve results close to the ideal conditions. • The utilization of pronunciation error types in the pronunciation error detection model significantly enhances its performance. • Jointly using Focal loss and multi-task loss effectively resolves overconfidence caused by CTC loss. • The model excels across multiple evaluation metrics by incorporating joint loss functions and error type information. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Wavelet-based techniques for speech recognition

Author: Farooq, Omar
Subjects: 621, Phoneme recognition
Abstract: In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary the wavelet transform is a better choice for time-frequency transformation of these signals. In addition it has compactly supported basis functions, thereby reducing the amount of computation as opposed to STFT where an overlapping window is needed.
Published: 2002

29. Local Feature Extractors Accelerating HNNP for Phoneme Recognition

Author: Janning, Ruth, Schatten, Carlotta, Schmidt-Thieme, Lars, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Lutz, Carsten, editor, and Thielscher, Michael, editor
Published: 2014
Full Text: View/download PDF

30. Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

Author: Justin, Tadej, Mihelič, France, Dobrišek, Simon, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Sojka, Petr, editor, Horák, Aleš, editor, Kopeček, Ivan, editor, and Pala, Karel, editor
Published: 2014
Full Text: View/download PDF

31. Classification of a Sequence of Objects with the Fuzzy Decoding Method

Author: Savchenko, Andrey V., Savchenko, Lyudmila V., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Cornelis, Chris, editor, Kryszkiewicz, Marzena, editor, Ślȩzak, Dominik, editor, Ruiz, Ernestina Menasalvas, editor, Bello, Rafael, editor, and Shang, Lin, editor
Published: 2014
Full Text: View/download PDF

32. Semi-automated Speaker Adaptation: How to Control the Quality of Adaptation?

Author: Savchenko, Andrey V., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Elmoataz, Abderrahim, editor, Lezoray, Olivier, editor, Nouboud, Fathallah, editor, and Mammass, Driss, editor
Published: 2014
Full Text: View/download PDF

33. Incorporation of Manner of Articulation Constraint in LSTM for Speech Recognition.

Author: Pradeep, R. and Rao, K. Sreenivasa
Subjects: *RECURRENT neural networks, *SPEECH perception, *AUTOMATIC speech recognition, *COURTESY, *SHORT-term memory, *ERROR rates
Abstract: Variants of recurrent neural networks such as long short-term memory (LSTM) and the gated recurrent unit are successful in sequence modelling tasks such as automatic speech recognition. However, the decoded sequence is prone to false substitutions, insertions and deletions. In our work, we investigate the outputs of the hidden layers in an LSTM trained on the TIMIT dataset. Interestingly, we found that the first hidden layer was capturing information related to some broad manners of articulation. The successive hidden layers try to cluster among the broad manners of articulation. We detected two broad manners of articulation, namely sonorants (vowels, semi-vowels, nasals) and obstruents (fricatives, stops, affricates), by exploiting the spectral flatness measure (SFM) on the linear prediction coefficients. We define an additional gate, called the manner of articulation gate, that is high if the broad manner of articulation of the t-th frame is the same as that of the (t + 1)-th frame. The manner of articulation detection is embedded at the output of the activation gate of the LSTM at the first hidden layer. By doing so, the substitution of sonorants by obstruents is minimized at the output layer. The proposed method decreased the phone error rate by 0.7% when evaluated on the core test set of TIMIT. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

34. Temperature controlled PSO on optimizing the DBN parameters for phoneme classification.

Author: Laxmi Sree, B. R. and Vijaya, M. S.
Subjects: AUTOMATIC speech recognition, ARTIFICIAL neural networks, PHONEME (Linguistics), PARTICLE swarm optimization, TAMIL language
Abstract: Speech recognition has become an essential component for communicating with the latest gadgets and machines with ease. A phoneme classification model for phonemes in Tamil continuous speech is built here by exploring the power of the deep belief network (DBN), a powerful neural network architecture that is capable of learning complex problems. However, building an efficient DBN relies heavily on several parameters such as the number of layers, number of neurons, connection weights and bias. The effect of increasing the number of layers in a DBN for phoneme recognition has been studied in our previous experiments. In addition, a methodology which employed particle swarm optimization (PSO) or its variants, second generation PSO (SGPSO) and new method PSO (NMPSO), for optimizing the connection weights and bias of the DBN for phoneme classification was studied in our earlier work. Pre-training the DBN with PSO faced the problem of particle stagnation and took a longer time to converge, whereas the DBN with SGPSO or NMPSO converges faster but still suffers from particle stagnation, which prevents it from reaching an optimal solution. Here we try to minimize stagnation of particles in the population, in addition to achieving faster convergence, by proposing a new improved PSO, named Temperature controlled PSO (TPSO), to optimize the initial connection weights and bias parameters that control the DBN efficiency. TPSO seems to converge faster and to better optimize the DBN connection weights and bias parameters than the existing methods, with reduced stagnation of the population. The TPSO-DBN is designed and applied to a phoneme classification problem for Tamil continuous speech and is found to classify phonemes comparatively better, with a classification accuracy of 89.2%. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

35. Speech recognition using cepstral articulatory features.

Author: Najnin, Shamima and Banerjee, Bonny
Subjects: *AUTOMATIC speech recognition, *ARTICULATION (Speech), *CEPSTRUM analysis (Mechanics), *ACCURACY, *PHONEME (Linguistics), *PERFORMANCE evaluation
Abstract: Though speech recognition has been widely investigated in the past decades, the role of articulation in recognition has received scant attention. Recognition accuracy increases when recognizers are trained with acoustic features in conjunction with articulatory ones. Traditionally, acoustic features are represented by mel-frequency cepstral coefficients (MFCCs) while articulatory features are represented by the locations or trajectories of the articulators. We propose the articulatory cepstral coefficients (ACCs) as features which are the cepstral coefficients of the time-location articulatory signal. We show that ACCs yield state-of-the-art results in phoneme classification and recognition on benchmark datasets over a wide range of experiments. The similarity of MFCCs and ACCs and their superior performance in isolation and conjunction indicate that common algorithms can be effectively used for acoustic and articulatory signals. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

36. Progress of machine learning based automatic phoneme recognition and its prospect

Author: Mousumi Malakar and Ravindra B. Keskar
Subjects: Linguistics and Language, Vocabulary, Phoneme recognition, Computer science, Machine learning, ComputingMethodologies_ARTIFICIALINTELLIGENCE, Language and Linguistics, Application domain, Set (psychology), Communication, Automatic speech processing, SIGNAL (programming language), Computer Science Applications, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, Modeling and Simulation, Computer Vision and Pattern Recognition, Artificial intelligence, Software, Scope (computer science)
Abstract: A phoneme is the smallest perceptually distinct sound unit that can be distinguished among words in a particular language. Every language has its own set of phonemes, and all possible words can be considered as ordered sequences of phonemes. The total number of phonemes in a language is always very small in comparison to the size of the vocabulary supported by the language. These facts have made phoneme recognition an attractive proposition throughout the history of Automatic Speech Processing (ASP) to date. As a result, the classification and recognition of phonemes are considered the primary tasks of automatic speech recognition (ASR) systems irrespective of application domain. The dynamic nature of phonemes and several sources of their variability create many barriers to the accurate identification of phonemes from an acoustic signal. The contribution of Machine Learning (ML) based techniques in overcoming these obstructions in automatic phoneme recognition (APR) is remarkable. Nowadays, with a lot of data available, ML based ASR is preferred over acoustic-phonetic based methods because of its simplicity. The ML based techniques do not follow the conventional method based on identification of acoustic properties. Rather, ML techniques build their own trained model (algorithm) using readily available data. They do so by finding hidden patterns in speech signals, and acquire predictive intelligence through learning. Therefore, ML techniques can be said to provide a more generalized model for phoneme classification. In this paper, we present a comprehensive survey of ML tools to build phoneme recognizers. We also highlight some applications of speech (especially phoneme) recognition which illustrate the current scope as well as future prospects of APR.
Published: 2021

37. Introduction

Author: Vasquez, Daniel, Gruhn, Rainer, and Minker, Wolfgang
Published: 2013
Full Text: View/download PDF

38. Extending the Hierarchical Scheme: Inter and Intra Phonetic Information

Author: Vasquez, Daniel, Gruhn, Rainer, and Minker, Wolfgang
Published: 2013
Full Text: View/download PDF

39. Phoneme Recognition Task

Author: Vasquez, Daniel, Gruhn, Rainer, and Minker, Wolfgang
Published: 2013
Full Text: View/download PDF

40. On the Use of Phoneme Lattices in Spoken Language Understanding

Author: Švec, Jan, Šmídl, Luboš, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Habernal, Ivan, editor, and Matoušek, Václav, editor
Published: 2013
Full Text: View/download PDF

41. Phoneme Recognition Using Support Vector Machine and Different Features Representations

Author: Amami, Rimah, Ayed, Dorra Ben, Ellouze, Noureddine, Omatu, Sigeru, editor, De Paz Santana, Juan F., editor, González, Sara Rodríguez, editor, Molina, Jose M., editor, Bernardos, Ana M., editor, and Rodríguez, Juan M. Corchado, editor
Published: 2012
Full Text: View/download PDF

42. Development of a Broadcast Sound Receiver for Elderly Persons

Author: Komori, Tomoyasu, Imai, Atsushi, Seiyama, Nobumasa, Takou, Reiko, Takagi, Tohru, Oikawa, Yasuhiro, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Miesenberger, Klaus, editor, Karshmer, Arthur, editor, Penaz, Petr, editor, and Zagler, Wolfgang, editor
Published: 2012
Full Text: View/download PDF

43. Dysarthric Speech Classification Using Hierarchical Multilayer Perceptrons and Posterior Rhythmic Features

Author: Selouani, Sid-Ahmed, Dahmani, Habiba, Amami, Riadh, Hamam, Habib, Kacprzyk, Janusz, editor, Corchado, Emilio, editor, Snášel, Václav, editor, Sedano, Javier, editor, Hassanien, Aboul Ella, editor, Calvo, José Luis, editor, and Ślȩzak, Dominik, editor
Published: 2011
Full Text: View/download PDF

44. Properties of Non-native Speech

Author: Gruhn, Rainer E., Minker, Wolfgang, and Nakamura, Satoshi
Published: 2011
Full Text: View/download PDF

45. Multiple Source Phoneme Recognition Aided by Articulatory Features

Author: Kane, Mark, Carson-Berndsen, Julie, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Mehrotra, Kishan G., editor, Mohan, Chilukuri K., editor, Oh, Jae C., editor, Varshney, Pramod K., editor, and Ali, Moonis, editor
Published: 2011
Full Text: View/download PDF

46. Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face

Author: Shih, Po-Yi, Wang, Jhing-Fa, Chen, Zong-You, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Zhang, Liqing, editor, Lu, Bao-Liang, editor, and Kwok, James, editor
Published: 2010
Full Text: View/download PDF

47. Phoneme Recognition Using Sparse Random Projections and Ensemble Classifiers

Author: Atsonios, Ioannis, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Solé-Casals, Jordi, editor, and Zaiats, Vladimir, editor
Published: 2010
Full Text: View/download PDF

48. The Growing Hierarchical Recurrent Self Organizing Map for Phoneme Recognition

Author: Jlassi, Chiraz, Arous, Najet, Ellouze, Noureddine, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Solé-Casals, Jordi, editor, and Zaiats, Vladimir, editor
Published: 2010
Full Text: View/download PDF

49. Articulation based admissible wavelet packet feature based on human cochlear frequency response for TIMIT speech recognition

Author: Astik Biswas, P.K. Sahu, Anirban Bhowmick, and Mahesh Chandra
Subjects: Speech recognition, Wavelet packets, ERB scale, HMM, Phoneme recognition, Engineering (General). Civil engineering (General), TA1-2040
Abstract: To deal with non-stationary and quasi-stationary signals, the wavelet transform has been used as an effective tool for time-frequency analysis. In recent years, the wavelet transform has been used extensively for feature extraction in noisy speech recognition. These filters have the benefit of having frequency band spacing similar to the auditory Equivalent Rectangular Bandwidth (ERB) scale. Central frequencies of ERB are equally distributed with the frequency response of the human cochlea. This paper deals with a speaker-independent Automatic Speech Recognition (ASR) system for continuous speech. This Hidden Markov Model (HMM) based ASR system was developed for English using recordings of four regions taken from the TIMIT database. A new set of features was derived using the wavelet packet transform’s multi-resolution capabilities together with the advantages of an ERB filter based on the human cochlea. The new wavelet features show significant improvements in noisy environments, especially at low SNR values.
Published: 2014
Full Text: View/download PDF

50. Using Virtual Characters as TV Presenters

Author: Oyarzun, David, Lehr, Maider, Ortiz, Amalia, del Puy Carretero, Maria, Ugarte, Alejandro, Vivanco, Karmelo, García-Alonso, Alejandro, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Rangan, C. Pandu, editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Hui, Kin-chuen, editor, Pan, Zhigeng, editor, Chung, Ronald Chi-kit, editor, Wang, Charlie C. L., editor, Jin, Xiaogang, editor, Göbel, Stefan, editor, and Li, Eric C.-L., editor
Published: 2007
Full Text: View/download PDF
