121 results for '"Linguistic Data Consortium"'
Search Results
2. Taming Digital Texts, Voices and Images for the Wild: Models and Methods for Handling Unconventional Corpora to Engage the Public
- Author
-
Corrigan, Karen P., Mearns, Adam, Corrigan, Karen P., editor, and Mearns, Adam, editor
- Published
- 2016
- Full Text
- View/download PDF
3. Creation and Analysis of (Agriculturally) Speech Database for Uttarakhand
- Author
-
Riyal, Manoj Kumar, Khanduri, Vinod Prasad, Rajput, Nikhil Kumar, and Irfan, Nagma
- Published
- 2016
- Full Text
- View/download PDF
4. An Open Linguistic Infrastructure for Annotated Corpora
- Author
-
Ide, Nancy, Gurevych, Iryna, editor, and Kim, Jungi, editor
- Published
- 2013
- Full Text
- View/download PDF
5. Learning to Match Names Across Languages
- Author
-
Mani, Inderjeet, Yeh, Alex, Condon, Sherri, Poibeau, Thierry, editor, Saggion, Horacio, editor, Piskorski, Jakub, editor, and Yangarber, Roman, editor
- Published
- 2013
- Full Text
- View/download PDF
6. Resources
- Author
-
Leroy, Gondy
- Published
- 2011
- Full Text
- View/download PDF
7. Quantitative Variation in Korean Case Ellipsis: Implications for Case Theory
- Author
-
Lee, Hanjung, de Hoop, Helen, editor, and de Swart, Peter, editor
- Published
- 2009
- Full Text
- View/download PDF
8. Tools and Resources for Visualising Conversational-Speech Interaction
- Author
-
Campbell, Nick, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Kipp, Michael, editor, Martin, Jean-Claude, editor, Paggio, Patrizia, editor, and Heylen, Dirk, editor
- Published
- 2009
- Full Text
- View/download PDF
9. The Czech Broadcast Conversation Corpus
- Author
-
Kolář, Jáchym, Švec, Jan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Matoušek, Václav, editor, and Mautner, Pavel, editor
- Published
- 2009
- Full Text
- View/download PDF
10. The Revival of US Government MT Research in 1990
- Author
-
Wilks, Yorick
- Published
- 2009
- Full Text
- View/download PDF
11. The Rich Transcription 2007 Meeting Recognition Evaluation
- Author
-
Fiscus, Jonathan G., Ajot, Jerome, Garofolo, John S., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Stiefelhagen, Rainer, editor, Bowers, Rachel, editor, and Fiscus, Jonathan, editor
- Published
- 2008
- Full Text
- View/download PDF
12. Adapting Morphology for Arabic Information Retrieval
- Author
-
Darwish, Kareem, Oard, Douglas W., Ide, Nancy, editor, Véronis, Jean, editor, Baayen, Harald, editor, Church, Kenneth W., editor, Klavans, Judith, editor, Barnard, David T., editor, Tufis, Dan, editor, Llisterri, Joaquim, editor, Johansson, Stig, editor, Mariani, Joseph, editor, Soudi, Abdelhadi, editor, Bosch, Antal van den, editor, and Neumann, Günter, editor
- Published
- 2007
- Full Text
- View/download PDF
13. On Arabic Transliteration
- Author
-
Habash, Nizar, Soudi, Abdelhadi, Buckwalter, Timothy, Ide, Nancy, editor, Véronis, Jean, editor, Baayen, Harald, editor, Church, Kenneth W., editor, Klavans, Judith, editor, Barnard, David T., editor, Tufis, Dan, editor, Llisterri, Joaquim, editor, Johansson, Stig, editor, Mariani, Joseph, editor, Soudi, Abdelhadi, editor, Bosch, Antal van den, editor, and Neumann, Günter, editor
- Published
- 2007
- Full Text
- View/download PDF
14. Issues in Arabic Morphological Analysis
- Author
-
Buckwalter, Timothy, Ide, Nancy, editor, Véronis, Jean, editor, Baayen, Harald, editor, Church, Kenneth W., editor, Klavans, Judith, editor, Barnard, David T., editor, Tufis, Dan, editor, Llisterri, Joaquim, editor, Johansson, Stig, editor, Mariani, Joseph, editor, Soudi, Abdelhadi, editor, Bosch, Antal van den, editor, and Neumann, Günter, editor
- Published
- 2007
- Full Text
- View/download PDF
15. Detection of Dialogue Acts Using Perplexity-Based Word Clustering
- Author
-
Mporas, Iosif, Lyras, Dimitrios P., Sgarbas, Kyriakos N., Fakotakis, Nikos, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Matoušek, Václav, editor, and Mautner, Pavel, editor
- Published
- 2007
- Full Text
- View/download PDF
16. The Rich Transcription 2006 Spring Meeting Recognition Evaluation
- Author
-
Fiscus, Jonathan G., Ajot, Jerome, Michel, Martial, Garofolo, John S., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Renals, Steve, editor, Bengio, Samy, editor, and Fiscus, Jonathan G., editor
- Published
- 2006
- Full Text
- View/download PDF
17. The Lexico-Semantic Annotation of PDT: Some Results, Problems and Solutions
- Author
-
Bejček, Eduard, Möllerová, Petra, Straňák, Pavel, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Sojka, Petr, editor, Kopeček, Ivan, editor, and Pala, Karel, editor
- Published
- 2006
- Full Text
- View/download PDF
18. Klex: A Finite-State Transducer Lexicon of Korean
- Author
-
Han, Na-Rae, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Yli-Jyrä, Anssi, editor, Karttunen, Lauri, editor, and Karhumäki, Juhani, editor
- Published
- 2006
- Full Text
- View/download PDF
19. The NITE XML Toolkit Meets the ICSI Meeting Corpus: Import, Annotation, and Browsing
- Author
-
Carletta, Jean, Kilgour, Jonathan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Bengio, Samy, editor, and Bourlard, Hervé, editor
- Published
- 2005
- Full Text
- View/download PDF
20. A Study to Improve the Efficiency of a Discourse Parsing System
- Author
-
Le, Huong T., Abeysinghe, Geetha, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, and Gelbukh, Alexander, editor
- Published
- 2003
- Full Text
- View/download PDF
21. Signal Boosting for Translingual Topic Tracking: Document Expansion and n-best Translation
- Author
-
Levow, Gina-Anne, Oard, Douglas W., Croft, W. Bruce, editor, and Allan, James, editor
- Published
- 2002
- Full Text
- View/download PDF
22. Corpora for Topic Detection and Tracking
- Author
-
Cieri, Christopher, Strassel, Stephanie, Graff, David, Martey, Nii, Rennert, Kara, Liberman, Mark, Croft, W. Bruce, editor, and Allan, James, editor
- Published
- 2002
- Full Text
- View/download PDF
23. Topic Detection and Tracking Evaluation Overview
- Author
-
Fiscus, Jonathan G., Doddington, George R., Croft, W. Bruce, editor, and Allan, James, editor
- Published
- 2002
- Full Text
- View/download PDF
24. King Saud University Emotions Corpus: Construction, Analysis, Evaluation, and Comparison
- Author
-
Yousef Ajami Alotaibi, Mustafa A. Qamhan, Ali H. Meftah, Yasser Mohammad Seddiq, and Sid-Ahmed Selouani
- Subjects
General Computer Science ,media_common.quotation_subject ,emotion ,corpus ,CRNN ,computer.software_genre ,Convolutional neural network ,ResNet ,Linguistic Data Consortium ,General Materials Science ,Electrical and Electronic Engineering ,media_common ,business.industry ,Arabic language ,General Engineering ,Speech corpus ,Speech processing ,Sadness ,Surprise ,Emotional prosody ,Happiness ,digital speech processing ,Artificial intelligence ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Psychology ,computer ,lcsh:TK1-9971 ,Natural language processing - Abstract
Emotional speech recognition for the Arabic language is insufficiently tackled in the literature compared to other languages. In this paper, we present the work of creating and verifying the King Saud University Emotions (KSUEmotions) corpus, which was released by the Linguistic Data Consortium (LDC) in 2017 as the first public Arabic emotional speech corpus. KSUEmotions contains emotional speech from twenty-three speakers from Saudi Arabia, Syria, and Yemen, and covers the emotions neutral, happiness, sadness, surprise, and anger. The corpus content is verified in two different ways: a human perceptual test by nine listeners who rate the emotional performance in audio files, and automatic emotion recognition. Two automatic emotion recognition systems are experimented with: a Residual Neural Network and a Convolutional Neural Network. This work also experiments with emotion recognition for the English language using the Emotional Prosody Speech and Transcripts corpus (EPST). The experimental work is conducted in three tracks: (i) monolingual, where independent experiments for Arabic and English are carried out; (ii) multilingual, where the Arabic and English corpora are merged into a mixed corpus; and (iii) cross-lingual, where models are trained on one language and tested on the other. A challenge encountered in this work is that the two corpora do not contain the same emotions. That problem is tackled by mapping the emotions to the arousal-valence space.
- Published
- 2021
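The cross-corpus emotion mapping described in the KSUEmotions abstract can be sketched as follows. This is a minimal illustration of mapping discrete emotion labels onto a shared arousal-valence space so that corpora with different label inventories become comparable; the coordinate values are illustrative assumptions, not values from the paper.

```python
# Hypothetical (arousal, valence) coordinates for the five KSUEmotions labels.
# The numbers are assumptions for illustration only.
AROUSAL_VALENCE = {
    "neutral":   (0.0,  0.0),
    "happiness": (0.6,  0.8),
    "sadness":   (-0.5, -0.7),
    "surprise":  (0.8,  0.3),
    "anger":     (0.9, -0.8),
}

def to_quadrant(emotion):
    """Collapse an emotion label to a coarse (arousal, valence) sign pair,
    so two corpora with mismatched emotion sets share one target space."""
    arousal, valence = AROUSAL_VALENCE[emotion]
    sign = lambda x: (x > 0) - (x < 0)
    return (sign(arousal), sign(valence))
```

With such a mapping, an "anger" sample from one corpus and a high-arousal negative-valence sample from the other land in the same quadrant and can be pooled for the multilingual and cross-lingual tracks.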
25. Slovene Interactive Text-to-Speech Evaluation Site — SITES
- Author
-
Gros, Jerneja, Mihelič, France, Pavešić, Nikola, Goos, G., editor, Hartmanis, J., editor, van Leeuwen, J., editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Matousek, Václav, editor, Mautner, Pavel, editor, Ocelíková, Jana, editor, and Sojka, Petr, editor
- Published
- 1999
- Full Text
- View/download PDF
26. Creation and Analysis of (Agriculturally) Speech Database for Uttarakhand.
- Author
-
Riyal, Manoj Kumar, Khanduri, Vinod Prasad, Rajput, Nikhil Kumar, and Irfan, Nagma
- Subjects
APPLIED mathematics ,SPEECH perception ,ORAL communication ,PERIODICALS - Abstract
This study aims to develop a speech recognition database and investigate acoustic characteristics of the Hindi/Garhwali vernacular used in the Garhwal region of Uttarakhand, India, ultimately to develop a spoken language interface through which farmers can reap the benefits of Information and Communication Technology. A speech database of words and sentences has been created with emphasis on agricultural commodities, and its acoustic properties were studied. The outcome of this study will be useful for implementing speech interfaces that will be very helpful to farmers as well as the people of Uttarakhand. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
27. Transformer-IC: The Solution to Information Loss
- Author
-
Wenqian Shang, Zhigang Song, Jiazhao Chai, and Guo Yuning
- Subjects
Machine translation ,business.industry ,Computer science ,Information technology ,computer.software_genre ,Machine learning ,Information science ,Data modeling ,Linguistic Data Consortium ,Artificial intelligence ,business ,computer ,Reference model ,Arithmetic mean ,Transformer (machine learning model) - Abstract
With the development of information technology, machine translation plays a crucial role in cross-language communication. However, machine translation suffers from information loss. To address this common problem, this paper proposes three Transformer-Information Combination (Transformer-IC) models based on an information combination method. The models are built on the Transformer and select different middle-layer information to compensate the output, using an arithmetic mean combination method, a linear transformation method, and a multi-layer information combination method, respectively. Experimental results on the Linguistic Data Consortium (LDC) Chinese-to-English corpus and the International Workshop on Spoken Language Translation (IWSLT) English-to-German corpus show that the BLEU scores of all Transformer-IC models are higher than that of the reference model; in particular, the arithmetic mean combination method improves the BLEU score by 1.9. Even though the BERT model performs well, the Transformer-IC models outperform it. Transformer-IC models make full use of middle-layer information and effectively avoid the problem of information loss.
- Published
- 2021
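The arithmetic-mean variant described in this abstract can be sketched as below. This is a toy illustration under assumed shapes (per-layer outputs as `(seq_len, d_model)` arrays) and an assumed residual-style addition; it is not the paper's actual implementation.

```python
import numpy as np

def arithmetic_mean_combination(layer_outputs, selected):
    """Combine Transformer middle-layer information by arithmetic mean.

    layer_outputs: list of (seq_len, d_model) arrays, one per layer,
                   with the final layer last.
    selected: indices of the middle layers whose mean is used to
              compensate the final output for lost information.
    """
    mean_mid = np.mean([layer_outputs[i] for i in selected], axis=0)
    # Assumed combination: add the averaged middle-layer signal to the top layer.
    return layer_outputs[-1] + mean_mid
```

For example, with three constant layers valued 1, 2, and 3, selecting layers 0 and 1 yields an output of 3 + mean(1, 2) = 4.5 everywhere.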
28. Improving the performance of the speaker emotion recognition based on low dimension prosody features vector
- Author
-
Ch. V. Rama Rao, Ashishkumar Prabhakar Gudmalwar, and Anirban Dutta
- Subjects
Linguistics and Language ,Zero-crossing rate ,Artificial neural network ,Computer science ,Speech recognition ,Dimensionality reduction ,Feature vector ,02 engineering and technology ,01 natural sciences ,Language and Linguistics ,Human-Computer Interaction ,Linguistic Data Consortium ,ComputingMethodologies_PATTERNRECOGNITION ,Formant ,Emotional prosody ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Prosody ,010301 acoustics ,Software - Abstract
Speaker emotion recognition is an important research issue, as it finds many applications in human-robot interaction, computer-human interaction, etc. This work deals with recognizing the emotion of the speaker from a speech utterance. Features such as pitch, log energy, zero crossing rate, and the first three formant frequencies are used, and feature vectors are constructed from 11 statistical parameters of each feature. An Artificial Neural Network (ANN) is chosen as the classifier owing to its universal function approximation capabilities. In an ANN-based classifier, the time required for training the network as well as for classification depends on the dimension of the feature vector. This work therefore focuses on developing a speaker emotion recognition system using prosody features together with dimensionality reduction of the feature vectors. Here, principal component analysis (PCA) is used for feature vector dimensionality reduction. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional database are used to evaluate the proposed approach on seven types of emotion. The performance of the proposed method is compared with existing approaches, and better performance is obtained with the proposed method: recognition rates of 75.32% and 84.5% are observed for the Berlin emotional database and the LDC emotional speech database, respectively.
- Published
- 2018
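The PCA step this abstract relies on can be sketched in a few lines. This is a generic SVD-based PCA projection, not the authors' code; the feature dimension (e.g. 6 prosody features times 11 statistics) is an assumption for illustration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples, n_features) onto the top-k principal
    components, computed via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)              # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                 # scores in the reduced space
```

Reducing the prosody feature vector this way shrinks both ANN training and classification time, since both scale with input dimension.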
29. DenseRecognition of Spoken Languages
- Author
-
Jaybrata Chakraborty, Bappaditya Chakraborty, and Ujjwal Bhattacharya
- Subjects
Network architecture ,business.industry ,Computer science ,Speech recognition ,SIGNAL (programming language) ,020206 networking & telecommunications ,02 engineering and technology ,Convolutional neural network ,Linguistic Data Consortium ,Task (computing) ,ComputingMethodologies_PATTERNRECOGNITION ,0202 electrical engineering, electronic engineering, information engineering ,Preprocessor ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data pre-processing ,Architecture ,business - Abstract
In the present study, we have considered a large number (27) of Indian languages for recognition from speech signals of different sources. A dense convolutional network architecture (DenseNet) has been used for this classification task. Dynamic elimination of low-energy frames from the input speech signal is applied as a preprocessing operation, and the Mel-spectrogram of the pre-processed speech signal is fed as input to the DenseNet architecture. The language recognition performance of this architecture has been compared with that of several state-of-the-art deep architectures, including a convolutional neural network (CNN), ResNet, and CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Additionally, we obtained the recognition performance of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purposes. Simulations for both speaker-independent and speaker-dependent scenarios have been performed on two standard datasets: (i) the IITKGP-MLILSC dataset of news clips in 27 Indian languages and (ii) the Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 Indian languages. In each case, the recognition performance of the DenseNet architecture with Mel-spectrogram features has been found to be significantly better than that of all other frameworks implemented in this study.
- Published
- 2021
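The "dynamic elimination of low-energy frames" preprocessing mentioned in this abstract can be sketched as below. The frame length and the threshold (a fraction of the mean frame energy) are assumptions for illustration; the paper's exact criterion may differ.

```python
import numpy as np

def drop_low_energy_frames(signal, frame_len=400, threshold=0.1):
    """Split a 1-D signal into non-overlapping frames and keep only frames
    whose short-time energy is at least `threshold` times the mean energy."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)           # per-frame energy
    keep = energy >= threshold * energy.mean()   # dynamic, utterance-relative cut
    return frames[keep]
```

The surviving frames would then be converted to a Mel-spectrogram and fed to the DenseNet.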
30. The LDC-IL Speech Corpora
- Author
-
Narayan Choudhary and D. G. Rao
- Subjects
Scheme (programming language) ,Government ,Higher education ,business.industry ,Computer science ,Data science ,Linguistic Data Consortium ,Public use ,The Internet ,business ,Human resources ,Activity-based costing ,computer ,computer.programming_language - Abstract
This paper introduces the first set of speech corpora released in 2019 by the Linguistic Data Consortium for Indian Languages (LDC-IL), a scheme under the Department of Higher Education, Ministry of Human Resource Development, Government of India. The datasets cover a total of 13 scheduled languages of India, collected in various environments across the length and breadth of the country from 5,662 speakers of different age groups, with a total size of more than 1,552 hours. The dataset is still growing as recordings are pruned and made ready for release. Each language corpus is usually the largest available at present for that language. Established in 2008 along the lines of the LDC at the University of Pennsylvania, LDC-IL has worked for over 10 years on various types of language resources, including speech corpora. LDC-IL is a fully government-funded project implemented by CIIL, Mysuru. Owing to constraints of government procedure, such as cost analysis and copyright issues, it took rather a long time to release the LDC-IL datasets for public use. This paper gives a brief overview of the raw speech corpora now released and ready for public use (for both commercial and non-commercial purposes). It also discusses how the two major bottlenecks of copyright and costing, which held up the release of these datasets for several years, were addressed.
- Published
- 2020
31. Arabic speaker recognition system based on phoneme fusion
- Author
-
Awais Mahmood
- Subjects
Fusion ,Biometrics ,Computer Networks and Communications ,Arabic ,Computer science ,Speech recognition ,020207 software engineering ,02 engineering and technology ,Speaker recognition ,language.human_language ,Linguistic Data Consortium ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Identity (object-oriented programming) ,language ,Benchmark (computing) ,Software ,Utterance - Abstract
With the increasing number and popularity of smart devices over the past few decades, especially those running various health-related applications, their security is a growing concern. Biometric authentication can be used for device security, and speaker recognition (SR) is one of its elegant forms. In this study, an SR system is developed for the security of smart devices; it identifies a person from the shortest possible utterance of the Arabic language while achieving the highest SR rate, using the benchmark Linguistic Data Consortium (LDC) Arabic dataset. This study focuses on the consonants and vowels of the Arabic language. It is observed that certain consonants, or a fusion of certain consonants, in the Arabic language help ascertain the speaker's identity efficiently. The shortest utterance that best presents the speaker's identity using such consonants is identified. The system was analyzed using different numbers of consonants and achieved the maximum (100%) SR rate with a fusion of only three consonants of the Arabic language. These consonants or their combinations can be used to develop text for any speaker-based authentication system.
- Published
- 2020
32. An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization
- Author
-
Kamal Al-Sabahi, Zuping Zhang, Jun Long, and Khaled Alwesabi
- Subjects
FOS: Computer and information sciences ,Arabic ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,02 engineering and technology ,computer.software_genre ,Linguistic Data Consortium ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Semantic memory ,Computer Science - Computation and Language ,Multidisciplinary ,business.industry ,Latent semantic analysis ,Part of speech ,language.human_language ,Weighting ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Computation and Language (cs.CL) ,computer ,Natural language processing ,Sentence ,Word order - Abstract
The fast-growing amount of information on the Internet makes research in automatic document summarization very urgent; it is an effective solution for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, LSA, when applied to document summarization, has some limitations that diminish its performance. In this work, we try to overcome these limitations by applying statistical and linear-algebraic approaches combined with syntactic and semantic processing of text. First, a part-of-speech tagger is utilized to reduce the dimension of the LSA input. Then, the weight of each term in the four adjacent sentences is added to the weighting scheme while calculating the input matrix, to take into account word order and syntactic relations. In addition, a new LSA-based sentence selection algorithm is proposed, in which the term description is combined with the sentence description for each topic, which in turn makes the generated summary more informative and diverse. To ensure the effectiveness of the proposed algorithm, extensive experiments on Arabic and English were conducted. Four datasets are used to evaluate the new model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on the four datasets show the effectiveness of the proposed model on Arabic and English, where it performs consistently better than state-of-the-art methods. Comment: This is a pre-print of an article published in Arabian Journal for Science and Engineering. The final authenticated version is available online at: https://doi.org/10.1007/s13369-018-3286-z
- Published
- 2018
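The core of LSA-based extractive selection can be sketched as follows: factor a term-sentence matrix with SVD and pick, for each latent topic, the sentence loading most heavily on it. This is a bare-bones sketch of the classic approach, not the enhanced algorithm of this paper (which also mixes term descriptions into the selection); the toy matrix values are assumptions.

```python
import numpy as np

def lsa_select(term_sentence, n_topics):
    """Pick one sentence index per latent topic via SVD.

    term_sentence: (n_terms, n_sentences) weight matrix (e.g. tf-idf).
    Returns a list of sentence indices, one per topic (may repeat).
    """
    _, _, Vt = np.linalg.svd(term_sentence, full_matrices=False)
    # Row t of Vt gives each sentence's loading on topic t.
    return [int(np.argmax(np.abs(Vt[t]))) for t in range(n_topics)]
```

On a matrix whose first column dominates the strongest topic and whose second column dominates the next, the top two picks are sentences 0 and 1.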
33. Evaluation of an Arabic Speech Corpus of Emotions: A Perceptual and Statistical Analysis
- Author
-
Ali H. Meftah, Sid-Ahmed Selouani, and Yousef Ajami Alotaibi
- Subjects
General Computer Science ,media_common.quotation_subject ,02 engineering and technology ,Anger ,Corpus ,Linguistic Data Consortium ,030507 speech-language pathology & audiology ,03 medical and health sciences ,statistical evaluation ,modern standard Arabic ,Emotion perception ,Perception ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,media_common ,General Engineering ,020207 software engineering ,Speech corpus ,perceptual test ,Sadness ,Surprise ,speech emotion recognition ,Happiness ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,0305 other medical science ,lcsh:TK1-9971 ,Cognitive psychology - Abstract
The processing of emotion has a wide range of applications in many different fields and has become the subject of increasing interest and attention for many speech and language researchers. Speech emotion recognition systems face many challenges, one of which is the degree of naturalness of the emotions in speech corpora. To prove the ability of speakers to accurately emulate emotions and to check whether listeners could identify the intended emotion, a human perception test was designed for a new emotional speech corpus. This paper presents an exhaustive statistical and perceptual investigation of the King Saud University Arabic emotional speech corpus (KSUEmotions), released by the Linguistic Data Consortium. The KSUEmotions corpus was built in two phases and involved 23 native speakers (10 males and 13 females) emulating the following five emotions: neutral, sadness, happiness, surprise, and anger. Nine listeners participated in a blind and randomly structured human perceptual test to assess the validity of the intended emotions. Statistical tests were used to analyze the effects of speaker gender, reviewer (listener) gender, emotion type, sentence length, and the interactions between these factors. The tests included the two-way analysis of variance, normality, chi-square, Bonferroni, Tukey, and Mann-Whitney U tests. One outcome of the study is that speaker gender, emotion type, and the interaction between emotion type and speaker gender have significant effects on emotion perception in this corpus.
- Published
- 2018
34. RST Signalling Corpus: a corpus of signals of coherence relations
- Author
-
Debopam Das and Maite Taboada
- Subjects
Linguistics and Language ,Computer science ,Treebank ,02 engineering and technology ,Library and Information Sciences ,computer.software_genre ,Language and Linguistics ,Education ,Linguistic Data Consortium ,Annotation ,0202 electrical engineering, electronic engineering, information engineering ,060201 languages & linguistics ,Parsing ,business.industry ,06 humanities and the arts ,16. Peace & justice ,Syntax ,Linguistics ,Rhetorical Structure Theory ,0602 languages and literature ,020201 artificial intelligence & image processing ,Artificial intelligence ,Computational linguistics ,business ,computer ,Discourse marker ,Coherence (linguistics) ,Natural language processing - Abstract
We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10 , 2015), a corpus annotated for signals of coherence relations. The corpus is developed over the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07 , 2002) which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers which are considered to be the most typical (or sometimes the only type of) signals in discourse, but also for a wide array of other signals such as reference, lexical, semantic, syntactic, graphical and genre features as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.
- Published
- 2017
35. Building a Speech and Text Corpus of Turkish: Large Corpus Collection with Initial Speech Recognition Results
- Author
-
Huseyin Polat and Saadin Oyucu
- Subjects
Text corpus ,Physics and Astronomy (miscellaneous) ,text corpus ,data acquisition ,Turkish ,Computer science ,General Mathematics ,Speech recognition ,Word error rate ,02 engineering and technology ,Session (web analytics) ,automatic speech recognition ,speech corpus ,multi-layer neural network ,natural language processing ,Linguistic Data Consortium ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,lcsh:Mathematics ,Speech corpus ,lcsh:QA1-939 ,language.human_language ,ComputingMethodologies_PATTERNRECOGNITION ,Chemistry (miscellaneous) ,language ,020201 artificial intelligence & image processing ,Language model ,0305 other medical science ,Transfer of learning - Abstract
To build automatic speech recognition (ASR) systems with a low word error rate (WER), a large speech and text corpus is needed. Corpus preparation is the first step in developing an ASR system for a language with few transcribed speech documents available. Turkish is a language with limited resources for ASR. Therefore, developing a Turkish transcribed speech corpus comparable to the corpora of high-resource languages is crucial for improving and promoting Turkish speech recognition activities. In this study, we constructed a viable alternative to classical transcribed-corpus preparation techniques for collecting Turkish speech data, using three different methods. In the first step, subtitles, which are mainly supplied for people with hearing difficulties, were used as transcriptions for speech utterances obtained from movies. In the second step, data were collected via a mobile application. In the third step, a transfer learning approach was applied to the Grand National Assembly of Turkey session records (videotext). We also provide initial Turkish speech recognition results for artificial neural network and Gaussian mixture model based acoustic models. For training the models, the newly collected corpus and other existing corpora published by the Linguistic Data Consortium were used. Test results on the existing corpora show the relative contribution of corpus variability to the speech recognition task. The decrease in WER after including the new corpus was more evident as the amount of verified data increased, compensating for the status of Turkish as a low-resource language. The study also shows the importance of the corpus and the language model for the success of a Turkish ASR system.
- Published
- 2020
36. A Test Collection for Coreferent Mention Retrieval
- Author
-
Rashmi Sankepally, Benjamin Van Durme, Douglas W. Oard, and Tongfei Chen
- Subjects
Coreference, business.industry, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 010102 general mathematics, 05 social sciences, computer.software_genre, 01 natural sciences, Test (assessment), Task (project management), Linguistic Data Consortium, Entity linking, Query by Example, Artificial intelligence, 0509 other social sciences, 0101 mathematics, 050904 information & library sciences, business, computer, Natural language processing, Sentence, computer.programming_language - Abstract
This paper introduces the coreferent mention retrieval task, in which the goal is to retrieve sentences that mention a specific entity based on a query by example in which one sentence mentioning that entity is provided. The development of a coreferent mention retrieval test collection is then described. Results are presented for five coreferent mention retrieval systems, both to illustrate the use of the collection and to document the pooled results on which human coreference judgments were performed. The new test collection is built from content that is available from the Linguistic Data Consortium; the partitioning and human annotations used to create the test collection atop that content are being made freely available.
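The pooling step mentioned here, gathering each system's top-ranked sentences into a set for human judgment, is a standard test-collection technique and can be sketched as follows (the pool depth and naming are illustrative assumptions):

```python
def build_pool(system_rankings, depth=10):
    """Union of each system's top-`depth` sentence IDs, forming the pool
    of candidates sent to human assessors for coreference judgment."""
    pool = set()
    for ranking in system_rankings:
        pool.update(ranking[:depth])
    return sorted(pool)
```

Sentences outside the pool are left unjudged, which is why the number of contributing systems matters for collection reusability.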
- Published
- 2018
37. Probabilistic Detection Methods for Acoustic Surveillance Using Audio Histograms
- Author
-
Rajesh M. Hegde, M. S. Reddy, and Karan Nathwani
- Subjects
business.industry, Event (computing), Computer science, Applied Mathematics, Probabilistic logic, Pattern recognition, Statistical model, computer.software_genre, Novelty detection, Acoustic space, Linguistic Data Consortium, Histogram, Signal Processing, Artificial intelligence, Data mining, Mel-frequency cepstrum, business, computer - Abstract
Acoustic surveillance is gaining importance given the pervasive nature of multimedia sensors being deployed in all environments. In this paper, novel probabilistic detection methods using audio histograms are proposed for acoustic event detection in a multimedia surveillance environment. The proposed detection methods use audio histograms to classify events in a well-defined acoustic space. The proposed methods belong to the category of novelty detection methods, since audio data corresponding to the event is not used in the training process. These methods hence alleviate the problem of collecting a large amount of audio data for training statistical models. They are also computationally efficient, since a conventional audio feature set, the Mel-frequency cepstral coefficients, is used in tandem with audio histograms to perform acoustic event detection. Experiments on acoustic event detection are conducted on the SUSAS database available from the Linguistic Data Consortium. The performance is measured in terms of false detection rate and true detection rate. Receiver operating characteristic curves are obtained for the proposed probabilistic detection methods to evaluate their performance. The proposed probabilistic detection methods perform significantly better than the acoustic event detection methods available in the literature. A cell phone-based alert system for an assisted living environment is also discussed as future scope of the proposed method, with performance evaluation presented as the number of successful cell phone transactions. The results are motivating enough for the system to be used in practice.
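A histogram-based novelty score of the kind described, in which frames that fall into low-probability bins of a background histogram are flagged as events, might be sketched as below. The bin count, feature range, and Laplace smoothing are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def train_histogram(features, bins=32, range_=(-1.0, 1.0)):
    """Per-dimension smoothed histograms over background (non-event) features."""
    hists = []
    for d in range(features.shape[1]):
        counts, edges = np.histogram(features[:, d], bins=bins, range=range_)
        probs = (counts + 1) / (counts.sum() + bins)  # Laplace smoothing
        hists.append((probs, edges))
    return hists

def novelty_score(frame, hists):
    """Negative log-probability of a feature frame under the background model;
    high scores indicate frames unlike the training audio (candidate events)."""
    score = 0.0
    for x, (probs, edges) in zip(frame, hists):
        idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(probs) - 1)
        score -= np.log(probs[idx])
    return score
```

In practice the frames would be MFCC vectors, matching the feature set named in the abstract; thresholding the score trades false detections against true detections, producing the ROC curves the paper reports.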
- Published
- 2014
38. Linguistic Fieldwork and IRB Human Subjects Protocols
- Author
-
Denise DiPersio
- Subjects
Protocol (science), Linguistic Data Consortium, Linguistics and Language, IRB Approval, Process (engineering), Approved Protocol, Social science research, Institutional review board, Psychology, Linguistics - Abstract
Linguistic fieldwork generally requires an approved protocol from an Institutional Review Board (IRB). A carefully prepared protocol – in addition to being a useful planning tool – is likely to secure IRB approval, since most fieldwork plans are well within the purview of federal human subjects regulations. This paper reviews the development of the current IRB system and its shortcomings with respect to social science research generally. The notion of a 'fieldwork problem' in IRB review is then explored, with a discussion of the areas that should be addressed with particular care in fieldwork proposals. This is followed by a description of the Linguistic Data Consortium's IRB experience as an example of how linguists can approach the IRB process for a protocol that meets federal requirements and serves research goals. The final section contains a list of IRB resources for linguists available online.
- Published
- 2014
39. A Rhythm-Based Analysis of Arabic Native and Non-Native Speaking Styles
- Author
-
Yousef Ajami Alotaibi, Sid-Ahmed Selouani, Yacine Benahmed, Soumaya Gharsellaoui, Alaidine Ben Ayed, and Adel Omar Dahmane
- Subjects
Point (typography), Arabic, Computer science, business.industry, First language, Speech recognition, Pronunciation, computer.software_genre, language.human_language, Linguistic Data Consortium, Rhythm, Duration (music), language, Modern Standard Arabic, Artificial intelligence, business, computer, Natural language processing, Spoken language - Abstract
In this paper, we investigate the effect of the mother language (L1) on the rhythm of a second spoken language (L2) uttered by non-native speakers. Rhythm metrics are used to analyze this effect on non-native Arabic speech data using a Modern Standard Arabic corpus, namely the Linguistic Data Consortium West Point corpus. A common problem with available Arabic corpora is that they are usually not time-labeled, either because of the time-consuming nature of such a task or because of a lack of resources. This is especially problematic if we are interested in studying rhythm metrics. To cope with this problem, we propose a framework for automatically labeling corpora using parallel processing. Such labeling allows us to perform a quantitative analysis of the rhythm of Arabic. Results show the effectiveness of acoustic rhythm metrics in analyzing the variable duration patterns observed in the pronunciation of L1/L2 Arabic.
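Once a corpus is time-labeled, duration-based rhythm metrics become simple computations over successive segment durations. As an illustration (the abstract does not name its specific metrics), the widely used normalized Pairwise Variability Index (nPVI) can be sketched as:

```python
def npvi(durations):
    """Normalized Pairwise Variability Index: mean normalized difference
    between successive segment durations, scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)
```

Higher nPVI indicates greater durational contrast between neighboring segments, one of the dimensions along which L1 transfer effects on L2 rhythm are typically measured.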
- Published
- 2013
40. Linguistic Resources for NLP
- Author
-
Mohamed Zakaria Kurdi
- Subjects
Linguistic Data Consortium, business.industry, Computer science, Artificial intelligence, business, computer.software_genre, computer, Natural language processing, Linguistics - Published
- 2016
41. Multistage Data Selection-based Unsupervised Speaker Adaptation for Personalized Speech Emotion Recognition
- Author
-
Jeong-Sik Park and Jaebok Kim
- Subjects
Computer science, Speech recognition, 02 engineering and technology, Machine learning, computer.software_genre, Linguistic Data Consortium, Discriminative model, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, EWI-27463, Adaptation (computer science), Hidden Markov model, Selection (genetic algorithm), METIS-320902, business.industry, Acoustic model, speaker adaptation, IR-102933, 020206 networking & telecommunications, Speech corpus, n/a OA procedure, Control and Systems Engineering, Hidden-Markov-Model, HMI-SLT: Speech and Language Technology, 020201 artificial intelligence & image processing, Artificial intelligence, speech emotion detection, business, computer, EC Grant Agreement nr.: FP7/611153 - Abstract
This paper proposes an efficient speech emotion recognition (SER) approach that utilizes personal voice data accumulated on personal devices. A representative weakness of conventional SER systems is the user-dependent performance induced by the speaker-independent (SI) acoustic model framework. However, handheld communication devices such as smartphones provide a collection of individual voice data, and thus suitable conditions for personalized SER that is more enhanced than the SI model framework. By taking advantage of personal devices, we propose an efficient personalized SER scheme employing maximum likelihood linear regression (MLLR), a representative speaker adaptation technique. To further advance the conventional MLLR technique for SER tasks, the proposed approach selects useful data that convey emotionally discriminative acoustic characteristics and uses only those data for adaptation. For reliable data selection, we conduct multistage selection using a log-likelihood distance-based measure and a universal background model. In SER experiments based on a Linguistic Data Consortium emotional speech corpus, our approach exhibited superior performance compared to conventional adaptation techniques as well as the SI model framework.
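The log-likelihood distance-based selection idea can be illustrated with a one-dimensional sketch: score each utterance under an emotion model and under a universal background model, and keep the utterances the emotion model explains markedly better. All model parameters and names below are illustrative assumptions, not the paper's actual models:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def select_for_adaptation(utterances, emo_model, ubm, top_frac=0.5):
    """Rank utterances by average log-likelihood distance between the
    emotion model and the UBM; keep the top fraction for MLLR adaptation."""
    def distance(utt):
        return sum(gauss_loglik(x, *emo_model) - gauss_loglik(x, *ubm)
                   for x in utt) / len(utt)
    ranked = sorted(utterances, key=distance, reverse=True)
    return ranked[: max(1, int(len(ranked) * top_frac))]
```

In the paper's multistage scheme the same idea is applied with full acoustic models rather than scalar Gaussians, repeated over successive selection stages.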
- Published
- 2016
42. Deep learning based parts of speech tagger for Bengali
- Author
-
Mohammad Nurul Huda, Khandaker Abdullah-Al-Mamun, and Md. Fasihul Kabir
- Subjects
Computer science, business.industry, Deep learning, Speech recognition, 02 engineering and technology, 010501 environmental sciences, Pragmatics, computer.software_genre, Part of speech, 01 natural sciences, language.human_language, Lexical item, Linguistic Data Consortium, Bengali, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, Hidden Markov model, business, computer, Natural language processing, Sentence, 0105 earth and related environmental sciences - Abstract
This paper describes a Part of Speech (POS) tagger for the Bengali language. POS tagging is the process of assigning a part-of-speech tag or other lexical class marker to each word in a sentence, and it is considered one of the basic necessary tools in many Natural Language Processing (NLP) applications. Identifying the ambiguities among lexical items is the main challenge in developing an efficient and accurate POS tagger. Different methods of automating the process have been developed and employed for Bengali. In this paper, we report on our work building a POS tagger for Bengali using deep learning. Bengali is a morphologically rich language, and our taggers make use of morphological and contextual information about the words. Experiments based on the Linguistic Data Consortium (LDC) corpus with catalog number LDC2010T16 (ISBN 1-58563-561-8) show that 93.33% accuracy is obtained for the Bengali POS tagger using deep learning.
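The morphological and contextual information such taggers exploit can be captured with simple per-token features of the sort below; the feature names and suffix lengths are illustrative, since the abstract does not specify the deep learning model's input representation:

```python
def token_features(sentence, i):
    """Morphological and contextual features for the word at position i.
    Suffixes approximate morphology, which is informative for a
    morphologically rich language like Bengali."""
    word = sentence[i]
    return {
        "word": word,
        "suffix2": word[-2:],  # short morphological cue
        "suffix3": word[-3:],  # longer morphological cue
        "prev": sentence[i - 1] if i > 0 else "<s>",
        "next": sentence[i + 1] if i < len(sentence) - 1 else "</s>",
    }
```

Feature dictionaries like these would be embedded or one-hot encoded before being fed to the tagging model.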
- Published
- 2016
43. ARL Arabic Dependency Treebank
- Author
-
Stephen C. Tratz
- Subjects
Parsing, Dependency (UML), Computer science, business.industry, Arabic, Treebank, Technical note, computer.software_genre, Linguistics, language.human_language, Linguistic Data Consortium, Software, language, Artificial intelligence, business, computer, Natural language processing - Abstract
This technical note describes the US Army Research Laboratory (ARL) Arabic Dependency Treebank (AADT) for the purpose of documenting its release. The AADT was derived from existing Arabic treebanks distributed by the Linguistic Data Consortium using constituent-to-dependency conversion software written at ARL. Earlier versions of the AADT, as well as parsers trained from it, have been used in several published ARL research efforts, and, by releasing the data, we hope to facilitate additional Arabic language processing research by the greater community.
- Published
- 2016
44. On Arabic Multi-Genre Corpus Diacritization
- Author
-
Ossama Obeid, Mona Diab, Abdelati Hawwari, Kemal Oflazer, Wajdi Zaghouani, Houda Bouamor, and Mahmoud Ghoneim
- Subjects
Linguistic Data Consortium, Computer science, Lemmatisation, Speech recognition, Diacritic, Modern Standard Arabic, language, Context (language use), Shadda, Sentence, Orthography, language.human_language - Abstract
One of the characteristics of writing in Modern Standard Arabic (MSA) is that the commonly used orthography is mostly consonantal and does not provide full vocalization of the text. It sometimes includes optional diacritical marks (henceforth, diacritics or vowels). Arabic script consists of two classes of symbols: letters and diacritics. Letters comprise long vowels such as A, y, w as well as consonants. Diacritics, on the other hand, comprise short vowels, gemination markers, nunation markers, and other markers (such as hamza, the glottal stop, which appears in conjunction with a small number of letters; dots on letters; and elongation and emphatic markers) which in all, if present, render a more or less exact reading of a word. In this study, we mostly address three types of diacritical marks: short vowels, nunation, and shadda (gemination).

Diacritics are extremely useful for text readability and understanding. Their absence in Arabic text adds another layer of lexical and morphological ambiguity. Naturally occurring Arabic text has some percentage of these diacritics present, depending on genre and domain. For instance, religious text such as the Quran is fully diacritized to minimize the chances of reciting it incorrectly, as are children's educational texts. Classical poetry tends to be diacritized as well. However, news text and other genres are sparsely diacritized (e.g., around 1.5% of tokens in the United Nations Arabic corpus bear at least one diacritic (Diab et al., 2007)).

In general, building models to assign diacritics to each letter in a word requires a large amount of annotated training corpora covering different topics and domains to overcome the sparseness problem. The currently available diacritized MSA corpora are generally limited to the newswire genres (those distributed by the LDC) or religion-related texts such as the Quran or the Tashkeela corpus.

In this paper we present a pilot study in which we annotate a sample of non-diacritized text extracted from five different text genres. We explore different annotation strategies, presenting the data to the annotator in three modes: basic (only forms with no diacritics), intermediate (basic forms plus POS tags), and advanced (a list of forms that is automatically diacritized). We show the impact of the annotation strategy on the annotation quality.

It has been noted in the literature that complete diacritization is not necessary for readability (Hermena et al., 2015) or for NLP applications; in fact, Diab et al. (2007) show that full diacritization has a detrimental effect on SMT. Hence, we are interested in discovering the optimal level of diacritization. In this work, we limit our study to two diacritization schemes: FULL and MIN. For FULL, all diacritics are explicitly specified for every word. For MIN, we explore the minimum and optimal number of diacritics that needs to be added in order to disambiguate a given word in context and make a sentence easily readable and unambiguous for any NLP application.

We conducted several experiments on a set of sentences extracted from five corpora covering different genres. We selected three corpora from the currently available Arabic Treebanks from the Linguistic Data Consortium (LDC). These corpora were chosen because they are fully diacritized and had undergone significant quality control, which allows us to evaluate the annotation accuracy as well as our annotators' understanding of the task. We selected a total of 16,770 words from these corpora for annotation. Three native Arabic annotators with good linguistic backgrounds annotated the corpora samples. Diab et al. (2007) define six different diacritization schemes inspired by the observation of the relevant naturally occurring diacritics in different texts.

We adopt the FULL diacritization scheme, in which all the diacritics should be specified in a word; annotators were asked to fully diacritize each word. The text genres were annotated following the different strategies:

- Basic: In this mode, we ask for annotation of words where all diacritics are absent, including the naturally occurring ones. The words are presented to the annotators in context, in a raw tokenized format.
- Intermediate: In this mode, we provide the annotator with words along with their POS information. The intuition behind adding POS is to help the annotator disambiguate a word by narrowing down the diacritization possibilities.
- Advanced: In this mode, the annotation task is formulated as a selection task instead of an editing task. Annotators are provided with a list of automatically diacritized candidates and are asked to choose the correct one, if it appears in the list. Otherwise, if they are not satisfied with the given candidates, they can manually edit the word and add the correct diacritics. This technique is designed to reduce annotation time and especially annotator workload. For each word, we generate a list of vowelized candidates using MADAMIRA (Pasha et al., 2014), which achieves a lemmatization accuracy of 99.2% and a diacritization accuracy of 86.3%. We present the annotator with the top three candidates suggested by MADAMIRA when possible; otherwise, only the available candidates are provided.

We also provided annotators with detailed guidelines describing our diacritization scheme and specifying how to add diacritics for each annotation strategy. We described the annotation procedure, specified how to deal with borderline cases, and included many annotated examples to illustrate the various rules and exceptions.

In order to determine the optimal annotation setup, in terms of speed and efficiency, we compared the results obtained following the three annotation strategies, all conducted for the FULL scheme. We first calculated the number of words annotated per hour for each annotator in each mode. As expected, in the Advanced mode our three annotators could annotate an average of 618.93 words per hour, double the number annotated in the Basic mode (only 302.14 words). Adding POS tags to the basic forms, as in the Intermediate mode, does not accelerate the process much: only about 90 more words are diacritized per hour compared to the Basic mode.

Then, we evaluated the inter-annotator agreement (IAA) to quantify the extent to which independent annotators agree on the diacritics chosen for each word. For every text genre, two annotators were asked to independently annotate a sample of 100 words. We measured the IAA between two annotators by averaging WER (word error rate) over all pairs of words; the higher the WER between two annotations, the lower their agreement. The results show clearly that the Advanced mode is the best strategy to adopt for this diacritization task: it is the least confusing method on all text genres (with WER between 1.56 and 5.58).

We also conducted a preliminary study of a minimum diacritization scheme, one that encodes the most relevant differentiating diacritics to reduce confusability among words that look the same (homographs) when undiacritized but have different readings. Our hypothesis in MIN is that there is an optimal level of diacritization to render a text unambiguous for processing and to enhance its readability. We show the difficulty of defining such a scheme and how subjective this task can be.

Acknowledgement: This publication was made possible by grant NPRP-6-1020-1-199 from the Qatar National Research Fund (a member of the Qatar Foundation).
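Because the two annotations being compared are word-aligned (each annotator diacritizes the same token sequence), the pairwise WER used for IAA reduces to the fraction of positions where the diacritized forms differ. A minimal sketch (the function name and percentage scaling are illustrative):

```python
def annotation_wer(ann_a, ann_b):
    """Inter-annotator disagreement between two word-aligned diacritized
    annotations: percentage of positions whose forms differ."""
    assert len(ann_a) == len(ann_b), "annotations must cover the same tokens"
    mismatches = sum(a != b for a, b in zip(ann_a, ann_b))
    return 100 * mismatches / len(ann_a)
```

Averaging this quantity over annotator pairs and genres yields figures comparable to the 1.56-5.58 WER range reported for the Advanced mode.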
- Published
- 2016
45. Detection of time varying pitch in tonal languages: an approach based on ensemble empirical mode decomposition
- Author
-
Run-tong Geng, Wei-min Su, Xiao-hua Zhu, Xinlong Wang, and Hong Hong
- Subjects
Linguistic Data Consortium, Computer science, Speech recognition, Frame (networking), Detector, General Engineering, language, Speech processing, Mandarin Chinese, Hilbert–Huang transform, Subspace topology, language.human_language, Event (probability theory) - Abstract
A method based on ensemble empirical mode decomposition (EEMD) is proposed for accurately detecting the time-varying pitch of speech in tonal languages. Unlike frame-, event-, or subspace-based pitch detectors, it accurately extracts the time-varying pitch information within short durations, which is of crucial importance in speech processing of tonal languages. The Chinese Linguistic Data Consortium (CLDC) database for Mandarin Chinese was employed as standard speech data to evaluate the effectiveness of the method. It is shown that the proposed method provides more accurate and reliable results, particularly in estimating tones with non-monotonically varying pitch, such as the third tone in Mandarin Chinese. It is also shown that the new method has strong resistance to noise disturbance.
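After EEMD isolates an intrinsic mode function (IMF) carrying the pitch, its time-varying frequency follows from the Hilbert step of the Hilbert–Huang transform. A sketch using an FFT-based analytic signal (numpy only; the paper's exact decomposition and post-processing are not specified in the abstract):

```python
import numpy as np

def instantaneous_frequency(imf, fs):
    """Time-varying frequency (Hz) of an IMF via the analytic signal.
    Returns one value per adjacent sample pair."""
    n = len(imf)
    spec = np.fft.fft(imf)
    # Zero out negative frequencies, double positive ones (analytic signal)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    analytic = np.fft.ifft(spec * h)
    phase = np.unwrap(np.angle(analytic))
    # Instantaneous frequency = phase derivative / 2*pi
    return np.diff(phase) * fs / (2 * np.pi)
```

Applied sample by sample rather than frame by frame, this is what lets the method track non-monotonic pitch movement within a single syllable.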
- Published
- 2012
46. SpatialML: annotation scheme, resources, and evaluation
- Author
-
Christy Doran, Inderjeet Mani, Janet Hitzeman, Ben Wellner, Dave Harris, Seamus Clancy, Scott A. Mardis, Rob Quimby, and Justin Richer
- Subjects
Scheme (programming language), Linguistics and Language, Computer science, business.industry, Library and Information Sciences, computer.software_genre, Language and Linguistics, Education, Domain (software engineering), Linguistic Data Consortium, Annotation, Information extraction, Corpus linguistics, Artificial intelligence, Computational linguistics, business, computer, Natural language processing, Natural language, computer.programming_language - Abstract
SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with several annotated corpora. Inter-annotator agreement on SpatialML extents is 91.3 F-measure on a corpus of SpatialML-annotated ACE documents released by the Linguistic Data Consortium. Disambiguation agreement on geo-coordinates on ACE is 87.93 F-measure. An automatic tagger for SpatialML extents scores 86.9 F on ACE, while a disambiguator scores 93.0 F on it. Results are also presented for two other corpora. In adapting the extent tagger to new domains, merging the training data from the ACE corpus with annotated data in the new domain provides the best performance.
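The extent-level F-measures reported above can be reproduced from matched annotation spans; a minimal sketch treating extents as exact (start, end) matches (an illustrative simplification of the full scoring procedure):

```python
def extent_f1(gold, predicted):
    """F-measure between gold and predicted extent annotations,
    counting a match only when (start, end) spans agree exactly."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Inter-annotator agreement uses the same formula with one annotator's extents playing the role of gold.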
- Published
- 2010
47. Arabic Speaker Recognition: Babylon Levantine Subset Case Study
- Author
-
Mansour Alsulaiman, Awais Mahmoud, Muhammad Ghulam, Youssef Alotaibi, and Mohamed A. Bencherif
- Subjects
Computer Networks and Communications, Computer science, Gaussian, Speech recognition, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Speaker recognition, Linguistic Data Consortium, symbols.namesake, Computer Science::Sound, Artificial Intelligence, symbols, Mel-frequency cepstrum, Hidden Markov model, Focus (optics), Software - Abstract
Problem statement: Research on Arabic speaker recognition has used local databases unavailable to the public. In this study we investigate Arabic speaker recognition using a publicly available database, namely Babylon Levantine, available from the Linguistic Data Consortium (LDC). Approach: Among the different methods for speaker recognition, we focus on Hidden Markov Models (HMMs). We studied the effect of both the parameters of the HMM models and the size of the speech feature set on the recognition rate. Results: To accomplish this study, we divided the database into small and medium-size datasets. For each subset, we found the effect of the system parameters on the recognition rate. The parameters we varied were the number of HMM states, the number of Gaussian mixtures per state, and the number of speech feature coefficients. From the results, we found that in general the recognition rate increases with the number of mixtures until it reaches a saturation level that depends on the data size and the number of HMM states. Conclusion/Recommendations: The effect of the number of states depends on the data size. For small data, a low number of states gives a higher recognition rate. For larger data, the number of states has very little effect at a low number of mixtures and a negligible effect at a high number of mixtures.
- Published
- 2010
48. Evaluating the MSA West Point Speech Corpus
- Author
-
Sid-Ahmed Selouani and Yousef Ajami Alotaibi
- Subjects
Text corpus, Point (typography), business.industry, Computer science, Speech corpus, Phonetics, computer.software_genre, Speech processing, Linguistics, language.human_language, Linguistic Data Consortium, Corpus linguistics, Modern Standard Arabic, language, Artificial intelligence, business, computer, Natural language processing - Abstract
Compared to other major languages of the world, the Arabic language suffers from a dearth of research initiatives and research resources. As a result, Modern Standard Arabic (MSA) lacks reliable speech corpora for research in phonetics and related areas of linguistics. In recent years the Linguistic Data Consortium (LDC) published the first public MSA speech corpus designed for speech recognition experiments. That corpus was called West Point. Currently, we are using this corpus in our research experiments for speech recognition and other speech processing investigations. The aim of this paper is to evaluate the West Point Corpus from the MSA phonetic and linguistic point of view. The phonemes used and their numbers, the phoneme definitions, the labeling, and the scripts established by the West Point Corpus are included in the evaluation. Weaknesses, strengths, and discrepancies of the West Point Corpus regarding the linguistic rules and phonetic characteristics of MSA are also discussed in this paper.
- Published
- 2009
49. Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program
- Author
-
Ahmad Emami, Hong-Kwang Kuo, Hagen Soltau, Lidia Mangu, George Saon, Brian Kingsbury, and Daniel Povey
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Machine translation, business.industry, Computer science, Speech recognition, Decision tree, Word error rate, computer.software_genre, Data modeling, Linguistic Data Consortium, Unsupervised learning, Artificial intelligence, Language model, Electrical and Electronic Engineering, business, computer, Natural language processing - Abstract
This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 2.5 machine translation evaluation. Key advances include the use of additional training data from the Linguistic Data Consortium (LDC), use of a very large vocabulary comprising 737 K words and 2.5 M pronunciation variants, automatic vowelization using flat-start training, cross-adaptation between unvowelized and vowelized acoustic models, and rescoring with a neural-network language model. The resulting system achieves word error rates below 10% on Arabic broadcasts. Very large scale experiments with unsupervised training demonstrate that the utility of unsupervised data depends on the amount of supervised data available. While unsupervised training improves system performance when a limited amount (135 h) of supervised data is available, these gains disappear when a greater amount (848 h) of supervised data is used, even with a very large (7069 h) corpus of unsupervised data. We also describe a method for modeling Arabic dialects that avoids the problem of data sparseness entailed by dialect-specific acoustic models via the use of non-phonetic, dialect questions in the decision trees. We show how this method can be used with a statically compiled decoding graph by partitioning the decision trees into a static component and a dynamic component, with the dynamic component being replaced by a mapping that is evaluated at run-time.
- Published
- 2009
50. Introduction to the special issue on multimodal corpora for modeling human multimodal behavior
- Author
-
Jean-Claude Martin, Patrizia Paggio, Peter Kuehnlein, Fabio Pianesi, and Rainer Stiefelhagen
- Subjects
World Wide Web, Linguistic Data Consortium, Linguistics and Language, Modalities, Computer science, General Social Sciences, Multimodal communication, Library and Information Sciences, Computational linguistics, Language and Linguistics, Education, Gesture - Abstract
There is an increasing interest in multimodal communication as suggested by several national and international projects (ISLE, HUMAINE, SIMILAR, CHIL, AMI, CALO, VACE, CALLAS), the attention devoted to the topic by well-known institutions and organizations (the National Institute of Standards and Technology, the Linguistic Data Consortium), and the success of conferences related to multimodal communication (ICMI, IVA, Gesture, Measuring Behavior, Nordic Symposium on Multimodal Communication, LREC Workshops on Multimodal Corpora). As Dutoit et al. (2006) lament, however, « there is a lack of multimodal corpora suitable for the evaluation of recognition/synthesis approaches and interaction strategies . . . one must admit that most corpora available today target the study of a limited number of modalities, if not one ». Corpora are not only relevant to evaluation purposes, their importance extending to all the stages of design and development of multimodal systems. Moreover, established practices and guidelines
- Published
- 2008