13 results for "Elmar Nöth"
Search Results
2. Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages
- Author
-
Elmar Nöth, Jan Rusz, Cristian David Rios-Urrego, Tomas Arias-Vergara, Juan Rafael Orozco-Arroyave, Maria Schuster, and Juan Camilo Vásquez-Correa
- Subjects
Czech, Computer science, Convolutional neural network, German, Artificial intelligence, Artificial neural network, Deep learning, Test set, Signal processing, Computer-aided, Computer vision and pattern recognition, Transfer learning, Software, Natural language processing - Abstract
Patients suffering from neurodegenerative disorders such as Parkinson’s or Huntington’s disease exhibit speech impairments that affect their communication capabilities. Automatic assessment of patients’ speech enables the development of computer-aided tools to support diagnosis and to evaluate disease severity, helping clinicians make timely treatment decisions. This paper extends our previous studies on methods to classify patients with neurodegenerative diseases from speech. The proposed approach uses convolutional neural networks trained on time-frequency representations, together with a transfer learning strategy, to classify different speech impairments in patients who are native speakers of different languages. The transfer learning schemes aim to improve the accuracy of the models when the weights of a neural network are initialized with utterances from a corpus different from the one used for the test set. The proposed methodology is evaluated with speech data from Parkinson’s disease patients who are Spanish, German, and Czech native speakers; Huntington’s disease patients who are Czech native speakers; and English native speakers affected by laryngeal impairments. We performed experiments in two scenarios: (1) transfer learning among languages, where a base model is transferred to classify patients with the same disease who speak a different language, and (2) transfer learning among diseases, where the base model is transferred to a corpus from patients with a different disease. The results suggest that the transfer learning schemes improve the accuracy on the target corpus only when the base model is accurate enough to transfer its knowledge to the target corpus. This behavior is observed in several scenarios of transfer learning among both languages and diseases.
- Published
- 2021
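The cross-corpus transfer scheme described in the abstract can be sketched in miniature. The snippet below is a hedged illustration, not the authors' implementation: it substitutes a toy logistic-regression "model" and synthetic data for the CNNs and real speech corpora, but follows the same recipe of initializing target-corpus training with weights fitted on a base corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, w=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; `w` is the initial weight vector."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # logistic-loss gradient step
    return w

def accuracy(X, y, w):
    return float(np.mean((X @ w > 0) == y))

# Base corpus (e.g. one language's PD speakers): plenty of labeled data.
X_base = rng.normal(size=(400, 10))
w_true = rng.normal(size=10)                   # hypothetical "true" decision rule
y_base = (X_base @ w_true > 0).astype(float)

# Target corpus (a different language, same task): little labeled data.
X_tgt = rng.normal(size=(30, 10))
y_tgt = (X_tgt @ w_true > 0).astype(float)
X_test = rng.normal(size=(200, 10))
y_test = (X_test @ w_true > 0).astype(float)

w_scratch = train(X_tgt, y_tgt)                    # target data only
w_base = train(X_base, y_base)                     # base model
w_transfer = train(X_tgt, y_tgt, w=w_base.copy())  # base weights, then fine-tune

print(accuracy(X_test, y_test, w_scratch), accuracy(X_test, y_test, w_transfer))
```

As in the paper's finding, the initialization only helps when the base model itself captures the task well; here both corpora share the same hypothetical decision rule, so the transferred weights start close to a good solution.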
3. Automatic Score of Articulatory Distortion in Adults with Dysarthria
- Author
-
Viviana Mendoza Ramos, Juan Camilo Vasquez-Correa, Elmar Nöth, Marc De Bodt, and Gwen Van Nuffelen
- Subjects
History, Polymers and Plastics, Business and International Management, Industrial and Manufacturing Engineering - Published
- 2022
4. Identifying failure root causes by visualizing parameter interdependencies with spectrograms
- Author
-
Lukas Baier, Elmar Nöth, Peter Schuderer, Jörg Franke, Julian Frommherz, and Toni Donhauser
- Subjects
Root cause, Audio signal, Computer science, Supply chain, Interdependence, Identification, Industrial and manufacturing engineering, Hardware and architecture, Control and systems engineering, Quality, Data mining, Software - Abstract
Fast identification of failure root causes is a major task in minimizing rejects in manufacturing. Owing to the increasing complexity of products and supply chains, many interdependencies affect final product quality. Since no individual can hold knowledge about every possible interdependence, data analysis strategies are required to evaluate the captured information. Hence, we propose a root cause identification method that determines the main factors causing end products to fail end-of-line tests. Additionally, the results are visualized as spectrograms, similar to those used for audio signals, which facilitates human interpretation. An evaluation of the proposed method on a use case demonstrates its applicability in a real scenario: the method identified the root cause of rejects within a short period of time, shortening the analysis time in this specific case by a factor of about 50. In the future, it can empower smart production systems to automatically identify failure root causes and to take countermeasures such as adjusting process parameters.
- Published
- 2019
5. NeuroSpeech: An open-source software for Parkinson's speech analysis
- Author
-
Milos Cernak, Najim Dehak, Jesús Francisco Vargas-Bonilla, Julius Hannink, Phani Sankar Nidadavolu, Heidi Christensen, Maria Yancheva, Alyssa Vann, Nikolai Vogler, Hamidreza Chinaei, Frank Rudzicz, Elmar Nöth, Tobias Bocklet, Juan Camilo Vásquez-Correa, Raman Arora, and Juan Rafael Orozco-Arroyave
- Subjects
Computer science, Speech recognition, Intelligibility, Dysarthria, Software, Artificial intelligence, Rating scale, Phonation, Prosody, Python, Speech processing, Signal processing, Computer vision and pattern recognition, Natural language processing - Abstract
This paper presents NeuroSpeech, a new software tool for modeling pathological speech signals. It enables the analysis of pathological speech along different speech dimensions: phonation, articulation, prosody, and intelligibility. All methods included in the software have been validated in previous experiments and publications. The current version of NeuroSpeech was developed to model dysarthric speech from people with Parkinson's disease; however, its structure allows other computer scientists or developers to add further pathologies and/or measures to complement the existing options. Three tasks can be performed with the current version: (1) modeling of speech recordings along the aforementioned speech dimensions, (2) automatic discrimination of Parkinson's vs. non-Parkinson's speech (users with access to recordings of other pathologies can retrain the system to detect other diseases), and (3) prediction of the patient's neurological state according to the Unified Parkinson's Disease Rating Scale (UPDRS) score. Prediction of the dysarthria level according to the Frenchay Dysarthria Assessment scale is also provided (the user can likewise train the system to predict other scales or degrees of severity). To the best of our knowledge, this is the first software with the characteristics described above, and we believe it will help other researchers advance the state of the art in pathological speech assessment from different perspectives: from the clinical point of view for interpretation, and from the computer science point of view by enabling the test of different measures and pattern recognition techniques.
- Published
- 2018
6. Characterisation of voice quality of Parkinson’s disease using differential phonological posterior features
- Author
-
Frank Rudzicz, Heidi Christensen, Elmar Nöth, Juan Rafael Orozco-Arroyave, Juan Camilo Vásquez-Correa, and Milos Cernak
- Subjects
Computer science, Speech recognition, Voice analysis, Human-computer interaction, Dysarthria, Hoarse voice, Harsh voice, Falsetto, Phonation, Articulation, Creaky voice, Software - Abstract
Change in voice quality (VQ) is one of the first precursors of Parkinson’s disease (PD). Specifically, impaired phonation and articulation cause the patient to have a breathy, husky-semiwhispered, and hoarse voice. A goal of this paper is to characterise a VQ spectrum, the composition of non-modal phonations, of voice in PD. The paper relates non-modal healthy phonations (breathy, creaky, tense, falsetto, and harsh) to disordered phonation in PD. First, statistics are learned to differentiate modal and non-modal phonations. The statistics are computed from phonological posteriors, the probabilities of phonological features inferred from the speech signal using a deep learning approach. Second, statistics of disordered speech are learned from PD speech data comprising 50 patients and 50 healthy controls. Third, the Euclidean distance is used to calculate the similarity of non-modal and disordered statistics, and the inverse of the distances is used to obtain the composition of non-modal phonation in PD. Thus, pathological voice quality is characterised using a healthy non-modal voice quality “base/eigenspace”. The obtained results, interpreted as the voice of an average patient with PD, can be characterised by a voice quality spectrum composed of 30% breathy voice, 23% creaky voice, 20% tense voice, 15% falsetto voice, and 12% harsh voice. In addition, the proposed features were applied to predict the dysarthria level according to the Frenchay assessment score related to the larynx, and a significant improvement was obtained for the reading speech task. The proposed characterisation of VQ might also be applied to other kinds of pathological speech.
- Published
- 2017
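The composition step described in the abstract (inverse Euclidean distances, normalised to percentages) can be sketched directly. The statistic vectors below are invented for illustration; the paper derives them from phonological posteriors.

```python
import numpy as np

# Hypothetical per-phonation statistic vectors (the real ones come from
# phonological posteriors learned by a deep network).
nonmodal = {
    "breathy":  np.array([0.8, 0.1, 0.3]),
    "creaky":   np.array([0.2, 0.7, 0.4]),
    "tense":    np.array([0.5, 0.5, 0.9]),
    "falsetto": np.array([0.1, 0.9, 0.2]),
    "harsh":    np.array([0.6, 0.3, 0.8]),
}
pd_stats = np.array([0.7, 0.3, 0.5])  # made-up disordered-speech statistics

# Similarity = inverse Euclidean distance; normalise to a percentage spectrum.
inv = {k: 1.0 / np.linalg.norm(pd_stats - v) for k, v in nonmodal.items()}
total = sum(inv.values())
composition = {k: 100.0 * s / total for k, s in inv.items()}

for name, pct in sorted(composition.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

With real data this yields the kind of VQ spectrum reported in the abstract (e.g. 30% breathy, 23% creaky, and so on); here the numbers only demonstrate the mechanics.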
7. Automatic detection of Voice Onset Time in voiceless plosives using gated recurrent units
- Author
-
Patricia Argüello-Vélez, Elmar Nöth, Juan Rafael Orozco-Arroyave, Maria Schuster, Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, and María Claudia González-Rátiva
- Subjects
Speech production, Computer science, Speech recognition, Voice onset time, Recurrent neural network, Motor speech disorders, Signal processing, Computer vision and pattern recognition - Abstract
Voice onset time (VOT) has been used by researchers as an acoustic measure to gain some understanding of the impact of different motor speech disorders on speech production. However, VOT values are usually obtained manually, which is expensive and time consuming. In this paper we propose a method for the automatic detection of VOT based on pre-trained recurrent neural networks with gated recurrent units (GRUs). Speech recordings from 50 Spanish native speakers from Colombia (25 male) are considered for the experiments. The recordings include the diadochokinesis task /pa-ta-ka/, which is typically used for the evaluation of motor speech disorders such as those caused by Parkinson's disease. Additionally, the diadochokinesis task allows us to train a system to detect the VOT of voiceless plosive sounds in intermediate positions. Acoustic analysis is performed by extracting different temporal and spectral features from the recordings. According to the results, it is possible to detect the VOT with F1-scores of 0.66, 0.75, and 0.78 for the three voiceless plosives when the predicted values are compared with the manual VOT labels.
- Published
- 2020
8. Automatic Quantification of Speech Intelligibility in Patients After Treatment for Oral Squamous Cell Carcinoma
- Author
-
Elmar Nöth, Florian Stelzle, Andreas Maier, Tobias Bocklet, Emeka Nkenke, Friedrich Wilhelm Neukam, Maria Schuster, and Christian Knipfer
- Subjects
Adult, Male, Adolescent, Mandible, Speech therapy, Intelligibility, Surgical flaps, Cohort studies, Speech recognition software, Young adult, Tongue, Carcinoma, Humans, Mouth floor, Neoadjuvant therapy, Aged, Neoplasm staging, Aged 80 and over, Speech intelligibility, Multimodal therapy, Neck dissection, Middle aged, Plastic surgery procedures, Alveolectomy, Tongue neoplasms, Surgery, Radiation therapy, Mandibular neoplasms, Cross-sectional studies, Otorhinolaryngology, Squamous cell carcinoma, Female, Mouth neoplasms, Adjuvant radiotherapy, Radiology, Oral surgery - Abstract
Treatment of oral carcinomas often causes reduced speech intelligibility. The aim of this study was to objectively evaluate the speech intelligibility of patients after multimodal therapy for oral squamous cell carcinoma (OSCC) with a computer-based, automatic speech recognition system.

The speech intelligibility of 59 patients after multimodal tumor treatment for OSCC, located at the lateral tongue, floor of the mouth, or the alveolar crest of the lower jaw, was objectively analyzed by a computer-based speech recognition system that calculates the percentage of correct word recognition (WR).

The patients' WR was significantly reduced compared with a healthy control group without speech impairment (P ≤ .001). Higher T-classification was associated with a reduced WR (P < .01). Tumors located at the tongue showed a significantly higher WR than tumors at the floor of the mouth or the alveolar crest (P ≤ .001). Surgical resection and reconstruction of the lower jaw bone significantly reduced the WR (P ≤ .001) compared with cases without osseous tumor infiltration.

Speech intelligibility after treatment for OSCC, objectively quantified by a standardized automatic speech recognition system, is reduced with increasing tumor size, increasing resection volume, and tumor localization near the lower jaw. Surgical reconstruction techniques also seem to have an impact on speech intelligibility.
- Published
- 2011
9. A scalable architecture for multilingual speech recognition on embedded devices
- Author
-
Elmar Nöth, Rainer Gruhn, and Martin Raab
- Subjects
Linguistics and language, Computer science, Communication, Speech recognition, Codebook, Acoustic model, Machine learning, Speech processing, Mixture model, Modeling and simulation, Embedded systems, Scalability, Multilingualism, Computer vision and pattern recognition, Artificial intelligence, Hidden Markov model, Decoding methods, Software - Abstract
In-car infotainment and navigation devices are typical examples where speech-based interfaces are successfully applied. While classical applications are monolingual, such as voice commands or monolingual destination input, the trend is towards multilingual applications, for example music player control or multilingual destination input. As more languages are considered, the training and decoding complexity of the speech recognizer increases. For large multilingual systems, some kind of parameter tying is needed to keep the decoding task feasible on embedded systems with limited resources. A traditional technique for this is to use a semi-continuous hidden Markov model as the acoustic model. However, the monolingual codebook on which such a system relies is not appropriate for multilingual recognition. We introduce Multilingual Weighted Codebooks, which give good results with low decoding complexity. These codebooks depend on the actual language combination and increase the training complexity, so an algorithm is needed to reduce it. Our first proposal is mathematically motivated projections between hidden Markov models defined in Gaussian spaces. Although theoretically optimal, these projections proved difficult to employ directly in speech decoders. We found approximated projections to be most effective in practice, giving good performance without requiring major modifications to the common speech recognizer architecture. Combining the Multilingual Weighted Codebooks with Gaussian mixture model projections, we create an efficient and scalable architecture for non-native speech recognition. Our new architecture offers a solution to the combinatorial problems of training and decoding for multiple languages: it builds new multilingual systems in only 0.002% of the time of a traditional HMM training and achieves comparable performance on foreign languages.
- Published
- 2011
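The core idea of a weighted multilingual codebook, starting from the main language's codebook and adding only those entries from an additional language that it covers poorly, can be sketched as follows. This is a loose illustration under simplifying assumptions: the paper's method operates on Gaussian densities and language-dependent weights, whereas this toy works on plain feature vectors, and all names and sizes are invented.

```python
import numpy as np

def multilingual_codebook(main, extra, n_add):
    """Extend `main` with the `n_add` codevectors from `extra` that are
    farthest from their nearest entry in the current book."""
    book = list(main)
    # distance of each extra codevector to its nearest existing entry
    dists = [min(np.linalg.norm(e - b) for b in book) for e in extra]
    order = np.argsort(dists)[::-1][:n_add]  # worst-covered first
    book.extend(extra[i] for i in order)
    return np.array(book)

rng = np.random.default_rng(1)
main = rng.normal(0.0, 1.0, size=(16, 2))   # e.g. main-language codebook
extra = rng.normal(3.0, 1.0, size=(16, 2))  # e.g. additional-language codebook
book = multilingual_codebook(main, extra, n_add=4)
print(book.shape)  # 16 main + 4 added codevectors
```

The point of limiting `n_add` mirrors the paper's motivation: the combined codebook stays small enough for embedded decoding while still covering the acoustic space of the added language.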
10. PEAKS – A system for the automatic evaluation of voice and speech disorders
- Author
-
Anton Batliner, Andreas Maier, Frank Rosanowski, Tino Haderlein, Ulrich Eysholdt, Elmar Nöth, and Maria Schuster
- Subjects
Linguistics and language, Computer science, Communication, Speech recognition, Speech processing, Voice analysis, Correlation, Modeling and simulation, The internet, Computer vision and pattern recognition, Prosody, Software - Abstract
We present a novel system for the automatic evaluation of speech and voice disorders. The system can be accessed platform-independently via the internet. The patient reads a text or names pictures; his or her speech is then analyzed by automatic speech recognition and prosodic analysis. For patients who had their larynx removed due to cancer and for children with cleft lip and palate, we show that we can achieve significant correlations between the automatic analysis and the judgment of human experts in a leave-one-out experiment (p
- Published
- 2009
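The agreement between automatic scores and expert judgments reported above is a correlation computed over patients. A minimal sketch of that check is below; the Pearson correlation is a standard choice for such validations, and all scores here are invented for illustration.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

automatic = [72.0, 65.5, 80.1, 58.9, 74.3]  # hypothetical word recognition rates (%)
experts   = [2.1, 2.8, 1.5, 3.4, 1.9]       # hypothetical expert ratings (1 = best)
print(round(pearson(automatic, experts), 2))
```

A strongly negative value here simply reflects the rating direction (higher recognition rate, better expert grade); with real data the magnitude of the correlation is what indicates agreement.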
11. Automatic pronunciation scoring of words and sentences independent from the non-native’s first language
- Author
-
Christian Hacker, Satoshi Nakamura, Elmar Nöth, Rainer Gruhn, and Tobias Cincarek
- Subjects
Computer science, First language, Speech recognition, Foreign language, Pronunciation, German, Feature (machine learning), Artificial intelligence, Computational linguistics, Natural language processing, Sentence, Software - Abstract
This paper describes an approach for the automatic scoring of pronunciation quality of non-native speech. It is applicable regardless of the foreign-language student's mother tongue. Sentences and words are considered as scoring units. Additionally, mispronunciation and phoneme confusion statistics for the target-language phoneme set are derived from human annotations and word-level scoring results using a Markov chain model of mispronunciation detection. The proposed methods can be employed to build part of the scoring module of a system for computer-assisted pronunciation training (CAPT). Methods from pattern and speech recognition are applied to develop appropriate feature sets for sentence- and word-level scoring. Besides features well known and approved from previous research, e.g. phoneme accuracy, posterior score, duration score, and recognition accuracy, new features such as high-level phoneme confidence measures are identified. The proposed method is evaluated with native English speech, non-native English speech from German, French, Japanese, Indonesian, and Chinese adults, and non-native speech from German school children. The speech data are annotated with tags for mispronounced words and with sentence-level ratings by native English teachers. Experimental results show that the reliability of automatic sentence-level scoring by the system is almost as high as that of the average human evaluator. Furthermore, good performance in detecting mispronounced words is achieved. In a validation experiment, it was also verified that the system gives the highest pronunciation quality scores to 90% of native speakers' utterances. Automatic error diagnosis based on an automatically derived phoneme mispronunciation statistic showed reasonable results for five non-native speaker groups. These statistics can be exploited to give non-native speakers feedback on mispronounced phonemes.
- Published
- 2009
12. Integrated recognition of words and prosodic phrase boundaries
- Author
-
Volker Warnke, Heinrich Niemann, Elmar Nöth, and Florian Gallwitz
- Subjects
Linguistics and language, Parsing, Phrase, Computer science, Communication, Speech recognition, Word error rate, Classifier (linguistics), Modeling and simulation, Computer vision and pattern recognition, Language model, Artificial intelligence, Hidden Markov model, Prosody, Natural language processing, Word (computer architecture), Software - Abstract
In this paper, we present an integrated approach for recognizing both the word sequence and the syntactic-prosodic structure of a spontaneous utterance. The approach aims at improving the performance of the understanding component of speech understanding systems by exploiting not only acoustic-phonetic and syntactic information but also prosodic information directly within the speech recognition process. Whereas spoken utterances are typically modelled as unstructured word sequences in the speech recognizer, our approach includes phrase boundary information in the language model and provides HMMs to model the acoustic and prosodic characteristics of phrase boundaries. This methodology has two major advantages over purely word-based speech recognizers. First, additional syntactic-prosodic boundaries are determined by the speech recognizer, which facilitates parsing and resolves syntactic and semantic ambiguities. Second, after the boundary information is removed from the recognizer output, the integrated model yields a 4% relative word error rate (WER) reduction compared to a traditional word recognizer. The boundary classification performance is equal to that of a separate prosodic classifier operating on the word recognizer output, making a separate classifier unnecessary for this task and saving the computation time involved. Compared to the baseline word recognizer, the integrated word-and-boundary recognizer does not involve any computational overhead.
- Published
- 2002
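The language-model side of this integration, treating phrase boundaries as ordinary tokens so that one n-gram model scores words and boundaries jointly, can be sketched with a toy bigram model. The boundary symbol, the tiny corpus, and the maximum-likelihood estimation are all illustrative assumptions, not the paper's actual model.

```python
from collections import Counter

BOUND = "<B>"  # hypothetical syntactic-prosodic boundary token

# Invented training "corpus": word sequences with boundary tokens inserted.
corpus = [
    ["yes", BOUND, "on", "monday", BOUND],
    ["yes", BOUND, "monday", "is", "fine", BOUND],
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    toks = ["<s>"] + sent                     # sentence-start symbol
    unigrams.update(toks[:-1])                # history counts
    bigrams.update(zip(toks[:-1], toks[1:]))  # (history, next-token) counts

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(p(BOUND, "yes"), p("monday", BOUND))
```

Because `<B>` sits in the vocabulary like any word, the recognizer's standard decoder hypothesizes boundaries during search; stripping the boundary tokens afterwards leaves an ordinary word sequence, as described in the abstract.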
13. Speech intelligibility in patients with oral squamous cell carcinoma—a prospective study based on automatic, computer-based speech analysis
- Author
-
Florian Stelzle, Elmar Nöth, Emeka Nkenke, Maria Schuster, Friedrich-Wilhelm Neukam, Werner Adler, and Christian Knipfer
- Subjects
Otorhinolaryngology, Computer based, Medicine, Surgery, Basal cell, In patient, Oral surgery, Audiology, Prospective cohort study - Published
- 2013