14 results for "Jan Lehečka"
Search Results
2. Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer
- Author
- Jan Švec, Jan Lehečka, and Luboš Šmídl
- Subjects
- Spoken Term Detection, Wav2Vec, Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS)
- Abstract
- In recent years, standard hybrid DNN-HMM speech recognizers have been outperformed by end-to-end speech recognition systems. One very promising approach is the grapheme-based Wav2Vec 2.0 model, which combines self-supervised pretraining with transfer learning of the fine-tuned speech recognizer. Since it requires neither a pronunciation vocabulary nor a language model, the approach is suitable for tasks where obtaining such models is difficult or almost impossible. In this paper, we use the Wav2Vec speech recognizer for spoken term detection over a large set of spoken documents. The method employs a deep LSTM network that maps the recognized hypothesis and the searched term into a shared pronunciation embedding space in which term occurrences and the assigned scores are easily computed. The paper describes a bootstrapping approach that transfers the knowledge contained in the traditional pronunciation vocabulary of a DNN-HMM hybrid ASR into the context of grapheme-based Wav2Vec. The proposed method outperforms the previously published system, based on a combination of DNN-HMM hybrid ASR and a phoneme recognizer, by a large margin on the MALACH data in both English and Czech.
- Published
- 2022
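As a rough illustration of the shared pronunciation embedding idea, the following PyTorch sketch encodes grapheme sequences of the query term and of sliding windows over the recognized hypothesis into a common space and scores occurrences by cosine similarity. All dimensions, the grapheme inventory, and the window handling are hypothetical; the paper's actual architecture and bootstrapped training are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PronunciationEncoder(nn.Module):
    """Maps a grapheme sequence to a fixed-size pronunciation embedding."""
    def __init__(self, n_graphemes=50, emb_dim=64, hidden=256, out_dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_graphemes, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, grapheme_ids):           # (batch, seq_len)
        x = self.emb(grapheme_ids)
        _, (h, _) = self.lstm(x)               # h: (layers*2, batch, hidden)
        h = torch.cat([h[-2], h[-1]], dim=-1)  # last layer, both directions
        return F.normalize(self.proj(h), dim=-1)

# Detection score: cosine similarity between the embedded query term
# and embeddings of sliding windows over the recognized hypothesis.
encoder = PronunciationEncoder()
term = torch.randint(1, 50, (1, 8))            # dummy grapheme ids of the term
windows = torch.randint(1, 50, (16, 8))        # dummy hypothesis windows
scores = encoder(windows) @ encoder(term).T    # (16, 1) occurrence scores
print(scores.squeeze())
```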
3. Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
- Author
- Jan Lehečka, Josef V. Psutka, and Josef Psutka
- Subjects
- Colloquial speech, ASR, Wav2Vec 2.0, Computation and Language (cs.CL)
- Abstract
- Czech is a very specific language due to the large differences between its formal and colloquial forms of speech. While the formal (written) form is used mainly in official documents, literature, and public speeches, the colloquial (spoken) form is used widely in casual speech. This gap introduces serious problems for ASR systems, especially when training or evaluating ASR models on datasets containing a lot of colloquial speech, such as the MALACH project. In this paper, we address this problem in the light of a new paradigm in end-to-end ASR systems, the recently introduced self-supervised audio Transformers. Specifically, we investigate the influence of colloquial speech on the performance of Wav2Vec 2.0 models and their ability to transcribe colloquial speech directly into formal transcripts. We present results with both formal and colloquial forms in the training transcripts, language models, and evaluation transcripts. To be published in Proceedings of TSD 2022.
- Published
- 2022
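For context, grapheme-based Wav2Vec 2.0 decoding of this kind reduces to a CTC argmax over character logits; when the model is fine-tuned with formal transcripts as targets, the same pipeline emits formal text for colloquial audio. A minimal sketch with the HuggingFace transformers API, using a public English checkpoint as a placeholder (the paper's Czech MALACH models are not assumed):

```python
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Public English checkpoint as a stand-in for the paper's Czech models.
model_id = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech = torch.zeros(16000).numpy()  # 1 s of silence standing in for real audio
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (1, frames, vocab)
pred_ids = torch.argmax(logits, dim=-1)         # greedy CTC decoding
print(processor.batch_decode(pred_ids))         # grapheme-level transcript
```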
4. Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
- Author
- Jan Lehečka, Jan Švec, Aleš Pražák, and Josef Psutka
- Subjects
- speech recognition, audio transformers, Wav2Vec, Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS)
- Abstract
- In this paper, we present our progress in pretraining Czech monolingual audio transformers on a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. We present a wide palette of experiments with various fine-tuning setups evaluated on two public datasets (CommonVoice and VoxPopuli) and one extremely challenging dataset from the MALACH project. Our results show that monolingual Wav2Vec 2.0 models are robust ASR systems that can take advantage of large labeled and unlabeled datasets and successfully compete with state-of-the-art LVCSR systems. Moreover, the Wav2Vec models proved to be good zero-shot learners when no training data are available for the target ASR task. To be published in Proceedings of INTERSPEECH 2022.
- Published
- 2022
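Evaluating such fine-tuning setups typically reduces to computing word error rate between reference and hypothesis transcripts. A minimal sketch using the jiwer package (the toy strings are illustrative, not from the paper's datasets):

```python
from jiwer import wer

# Toy Czech reference/hypothesis pair; real evaluation runs over whole test sets.
reference = ["dobrý den jak se máte"]
hypothesis = ["dobrý den jak se mate"]
print(f"WER: {wer(reference, hypothesis):.2%}")  # one substitution in five words -> 20.00%
```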
5. Transfer Learning of Transformers for Spoken Language Understanding
- Author
- Jan Švec, Adam Frémund, Martin Bulín, and Jan Lehečka
- Subjects
- Speech recognition, Wav2Vec model, T5 model, Spoken language understanding
- Abstract
- Pre-trained models used in transfer-learning scenarios have recently become very popular. Such models benefit from the availability of large sets of unlabeled data. Two such models are the Wav2Vec 2.0 speech recognizer and the T5 text-to-text transformer. In this paper, we describe a novel application of such models in dialog systems, where both the speech recognizer and the spoken language understanding module are represented as Transformer models. This composition outperforms a baseline built from a DNN-HMM speech recognizer and a CNN-based understanding module.
- Published
- 2022
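The composition described above chains the ASR transcript into a text-to-text understanding model. A minimal sketch of the second stage with a public T5 checkpoint; the Czech SLU model and the "parse:" task prefix are assumptions, and the output is meaningless until the model is fine-tuned for the task:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Public checkpoint as a placeholder for the paper's fine-tuned SLU model.
tok = T5Tokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

# In the full pipeline this string would come from the Wav2Vec 2.0 recognizer.
asr_transcript = "i would like to book a table for two tomorrow evening"
inputs = tok("parse: " + asr_transcript, return_tensors="pt")
out = t5.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))  # semantic parse after fine-tuning
```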
6. Comparison of Czech Transformers on Text Classification Tasks
- Author
- Jan Švec and Jan Lehečka
- Subjects
- Czech, multi-label topic identification, text categorization and summarization, sentiment analysis, monolingual transformers, Computation and Language (cs.CL), Natural language processing
- Abstract
- In this paper, we present our progress in pre-training monolingual Transformers for Czech and contribute to the research community by releasing our models to the public. The need for such models emerged from our effort to employ Transformers in our language-specific tasks, but we found the performance of the published multilingual models to be very limited. Since the multilingual models are usually pre-trained on 100+ languages, most low-resource languages (including Czech) are under-represented in them. At the same time, there is a huge amount of monolingual training data available in web archives like Common Crawl. We have pre-trained and publicly released two monolingual Czech Transformers and compared them with relevant public models trained (at least partially) for Czech. The paper presents the Transformers' pre-training procedure as well as a comparison of the pre-trained models on text classification tasks from various domains. The models are available at https://huggingface.co/fav-kky.
- Published
- 2021
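A minimal sketch of using one of the released models for a downstream classification task. The model id fav-kky/FERNET-C5 is assumed from the linked HuggingFace organization, and the freshly initialized classification head must be fine-tuned before the probabilities mean anything:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model id from the fav-kky organization; any Czech BERT-style model works.
model_id = "fav-kky/FERNET-C5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

batch = tok(["Ten film byl skvělý."], return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)
print(probs)  # untrained head: probabilities are meaningless until fine-tuned
```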
7. Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output
- Author
- Jan Lehečka, Pavel Ircing, Luboš Šmídl, and Jan Švec
- Subjects
- Punctuation prediction, Word casing reconstruction, ASR, BERT, T5, Czech, Readability
- Abstract
- The paper proposes a module for automatic punctuation prediction and casing reconstruction based on Transformer architectures (BERT/T5), which constitute the current state of the art in many similar NLP tasks. The main motivation for our work was to increase the readability of the ASR output, which is usually a continuous stream of text without punctuation marks and with all words in lowercase. The resulting punctuation and casing reconstruction module is evaluated on both written text and actual ASR output in three languages (English, Czech, and Slovak).
- Published
- 2021
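One common way to frame punctuation and casing reconstruction with a BERT-style encoder is token classification: for each subword, predict the following punctuation mark and the word's casing. The label set and multilingual backbone below are assumptions, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Assumed label set combining punctuation and casing decisions.
LABELS = ["O", "COMMA", "PERIOD", "QUESTION", "UPPER", "UPPER+PERIOD"]
model_id = "bert-base-multilingual-cased"  # placeholder backbone
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id,
                                                        num_labels=len(LABELS))

text = "hello how are you i am fine"   # typical lowercase unpunctuated ASR output
batch = tok(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**batch).logits.argmax(-1)[0]  # one label per subword
print([LABELS[i] for i in pred])  # untrained: labels are random until fine-tuned
```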
8. Automatic Correction of i/y Spelling in Czech ASR Output
- Author
- Pavel Ircing, Luboš Šmídl, Jan Lehečka, and Jan Švec
- Subjects
- Grammatical error correction, ASR, BERT, Czech, Spelling, Grammar rules
- Abstract
- This paper concentrates on the design and evaluation of a method able to automatically correct the spelling of i/y in Czech words at the output of the ASR decoder. After analyzing both the Czech grammar rules and the data, we decided to deal only with endings consisting of the consonants b/f/l/m/p/s/v/z followed by i/y in both short and long forms. The correction is framed as a classification task in which a word can belong to the "i" class, the "y" class, or the "empty" class. Using the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture, we were able to substantially improve the correctness of the i/y spelling on both simulated and real ASR output. Since the misspelling of i/y in Czech texts is seen by the majority of native Czech speakers as a blatant error, the corrected output greatly improves the perceived quality of the ASR system.
- Published
- 2020
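The restriction to endings with the consonants b/f/l/m/p/s/v/z followed by i/y makes candidate selection a simple pattern match before the BERT classifier assigns the "i", "y", or "empty" class. A small sketch of that first step, covering the short i/y and long í/ý forms (a simplification of whatever preprocessing the paper actually uses):

```python
import re

# Candidate positions: one of the ambiguous consonants followed by i/y (or í/ý).
CANDIDATE = re.compile(r"[bflmpsvz][iyíý]", re.IGNORECASE)

def candidate_spans(text):
    """Yield (start, end) spans of consonant+i/y pairs to be classified."""
    return [(m.start(), m.end()) for m in CANDIDATE.finditer(text)]

# Each returned span becomes one classification target for the BERT model.
print(candidate_spans("brzy byli doma"))  # [(2, 4), (5, 7), (7, 9)]
```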
9. BERT-Based Sentiment Analysis Using Distillation
- Author
- Luboš Šmídl, Pavel Ircing, Jan Švec, and Jan Lehečka
- Subjects
- Sentiment analysis, Knowledge distillation, BERT, Pooling, Movie reviews
- Abstract
- In this paper, we present our experiments with BERT (Bidirectional Encoder Representations from Transformers) models in the task of sentiment analysis, which aims to predict the sentiment polarity of a given text. We trained an ensemble of BERT models on a large self-collected movie-review dataset and distilled the knowledge into a single production model. Moreover, we propose an improved BERT pooling-layer architecture, which outperforms the standard classification layer while enabling per-token sentiment predictions. We demonstrate our improvements on a publicly available dataset of Czech movie reviews.
- Published
- 2020
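Distilling an ensemble into a single production model is usually done by matching the student's softened output distribution to the averaged teacher distribution. A generic Hinton-style sketch; the paper's exact loss and temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """KL divergence between the averaged teacher distribution and the student,
    with temperature T softening both sides (standard soft-label distillation)."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

student = torch.randn(4, 3)                       # 4 examples, 3 sentiment classes
teachers = [torch.randn(4, 3) for _ in range(5)]  # ensemble of 5 teacher models
print(distillation_loss(student, teachers))
```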
10. Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification
- Author
- Jan Švec, Pavel Ircing, Jan Lehečka, and Luboš Šmídl
- Subjects
- Text classification, BERT model, Pooling, Natural language processing
- Abstract
- In this paper, we present our experiments with BERT models in the task of Large-scale Multi-label Text Classification (LMTC). In the LMTC task, each text document can have multiple class labels, and the total number of classes is in the order of thousands. We propose a pooling-layer architecture on top of BERT models which improves the quality of classification by combining information from the standard [CLS] token with pooled sequence output. We demonstrate the improvements on Wikipedia datasets in three different languages using public pre-trained BERT models.
- Published
- 2020
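The abstract suggests combining the [CLS] vector with pooled sequence output before the classifier. One plausible reading, sketched in PyTorch with assumed details (mean and max pooling, concatenation); the paper's exact head may differ:

```python
import torch
import torch.nn as nn

class ClsPlusPooled(nn.Module):
    """Head concatenating the [CLS] vector with masked mean- and max-pooled
    token embeddings, then projecting to the (large) multi-label output."""
    def __init__(self, hidden=768, n_labels=5000):
        super().__init__()
        self.classifier = nn.Linear(3 * hidden, n_labels)

    def forward(self, token_embs, attention_mask):   # (B, L, H), (B, L)
        mask = attention_mask.unsqueeze(-1).float()
        cls = token_embs[:, 0]                              # [CLS] token
        mean = (token_embs * mask).sum(1) / mask.sum(1)     # masked mean pooling
        maxp = token_embs.masked_fill(mask == 0, -1e9).max(1).values
        return self.classifier(torch.cat([cls, mean, maxp], dim=-1))

head = ClsPlusPooled()
embs, mask = torch.randn(2, 16, 768), torch.ones(2, 16, dtype=torch.long)
print(head(embs, mask).shape)  # torch.Size([2, 5000]) multi-label logits
```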
11. Online LDA-Based Language Model Adaptation
- Author
- Aleš Pražák and Jan Lehečka
- Subjects
- Topic model, Latent Dirichlet allocation, Language model, Adaptation, Perplexity, Text corpus
- Abstract
- In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance the automatic speech recognition of multi-topic speech that is to be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in WER over the non-adapted system.
- Published
- 2018
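A toy sketch of the online part using gensim: infer a topic distribution from the partial recognition hypothesis and use it as interpolation weights for topic-specific language models. The corpus, vocabulary, and topic count are placeholders, not the paper's setup:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Tiny stand-in corpus; the real system clusters a large text corpus.
docs = [["economy", "market", "bank"], ["match", "goal", "team"],
        ["economy", "inflation"], ["team", "coach", "goal"]]
vocab = Dictionary(docs)
lda = LdaModel([vocab.doc2bow(d) for d in docs], num_topics=2,
               id2word=vocab, passes=10, random_state=0)

# During recognition, the partial hypothesis is folded in on the fly and the
# inferred topic mixture drives the language model interpolation weights.
partial_hypothesis = ["market", "inflation", "bank"]
topic_weights = lda.get_document_topics(vocab.doc2bow(partial_hypothesis),
                                        minimum_probability=0.0)
print(topic_weights)  # e.g. [(0, 0.9), (1, 0.1)] -> weights for topic-specific LMs
```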
12. General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes
- Author
- Pavel Ircing, Jan Vavruška, Lucie Skorkovská, Jan Lehečka, Aleš Pražák, Jan Švec, Jan Hoidekr, and Petr Stanislav
- Subjects
- Language model, Text processing, Web page, Computational linguistics, Czech
- Abstract
- The paper describes a general framework for mining large amounts of text data from a defined set of Web pages. The acquired data are meant to constitute a corpus for training robust and reliable language models, and thus the framework also incorporates algorithms for appropriate text processing and duplicity detection in order to secure the quality and consistency of the data. As we expect the resulting corpus to be very large, we have also implemented topic detection algorithms that allow us to automatically select subcorpora for domain-specific language models. The description of the framework architecture and the implemented algorithms is complemented with a detailed evaluation section. It analyses the basic properties of the gathered Czech corpus, containing more than one billion text tokens collected using the described framework, shows the results of the topic detection methods, and finally describes the design and outcomes of automatic speech recognition experiments with domain-specific language models estimated from the collected data.
- Published
- 2013
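Duplicity detection in pipelines like this is commonly done by comparing sets of hashed word n-grams (shingles) between documents. A simplified stand-in for the framework's actual algorithm:

```python
import hashlib

def shingle_hashes(text, n=5):
    """Hashes of word n-grams (shingles); documents sharing a large fraction
    of shingles are treated as near-duplicates."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}

def jaccard(a, b):
    """Set overlap in [0, 1]; values near 1 indicate near-duplicate documents."""
    return len(a & b) / len(a | b) if a | b else 0.0

d1 = "the framework mines large amounts of text data from the web"
d2 = "the framework mines large amounts of text data from web archives"
print(jaccard(shingle_hashes(d1), shingle_hashes(d2)))  # ~0.56, near-duplicate candidates
```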
13. Improving Speech Recognition by Detecting Foreign Inclusions and Generating Pronunciations
- Author
- Jan Švec and Jan Lehečka
- Subjects
- Language identification, Speech recognition, Language model, Proper noun, Finite state, Czech
- Abstract
- The aim of this paper is to improve speech recognition by enriching language models with foreign inclusions automatically detected in a training text. The enriching is restricted to foreign proper-noun inclusions, which are typically a dominant part of misrecognized words. In our suggested approach, character-based n-gram language models are used for the detection of foreign single-word inclusions and for language identification, and finite state transducers are used to generate foreign pronunciations. The results of this paper show that by enriching a language model with English proper nouns found in Czech training text, the recognition of speech containing English inclusions can be improved by a 9.4% relative reduction in WER.
- Published
- 2013
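A toy version of the character n-gram language identification step: score a word under smoothed character-bigram models for Czech and English and pick the more likely language. The training lists, smoothing constants, and model order are placeholders for the paper's much larger models:

```python
import math
from collections import Counter

def bigrams(word):
    w = f"^{word.lower()}$"          # boundary markers around the word
    return [w[i:i + 2] for i in range(len(w) - 1)]

def train(words):
    counts = Counter(g for w in words for g in bigrams(w))
    return counts, sum(counts.values())

def logprob(word, model, alpha=1.0, vocab=1000):
    """Add-alpha smoothed character-bigram log-probability (toy model)."""
    counts, total = model
    return sum(math.log((counts[g] + alpha) / (total + alpha * vocab))
               for g in bigrams(word))

# Tiny stand-in training sets; real models are trained on large corpora.
cs = train(["praha", "řeka", "mluvit", "město", "hrad"])
en = train(["washington", "window", "thought", "spring", "york"])
for w in ["washington", "praha"]:
    lang = "en" if logprob(w, en) > logprob(w, cs) else "cs"
    print(w, "->", lang)
```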
14. Automatic Topic Identification for Large Scale Language Modeling Data Filtering
- Author
- Aleš Pražák, Pavel Ircing, Jan Lehečka, and Lucie Skorkovská
- Subjects
- Identification (information), Data filtering, Hierarchy, Information retrieval, Language model
- Abstract
- The paper presents a module for topic identification that is embedded into a complex system for acquiring and storing large volumes of text data from the Web. The module processes each of the acquired data items and assigns keywords to them from a defined topic hierarchy that was developed for this purpose and is also described in the paper. The quality of the topic identification is evaluated in two ways: using classic precision-recall measures and also indirectly, by measuring the ASR performance of topic-specific language models built using the automatically filtered data.
- Published
- 2011
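The precision-recall part of the evaluation reduces to set comparisons between assigned and reference keywords per document. A minimal micro-averaged sketch with made-up keyword sets:

```python
def precision_recall(assigned, reference):
    """Micro-averaged precision/recall over per-document keyword sets."""
    tp = sum(len(a & r) for a, r in zip(assigned, reference))
    fp = sum(len(a - r) for a, r in zip(assigned, reference))
    fn = sum(len(r - a) for a, r in zip(assigned, reference))
    return tp / (tp + fp), tp / (tp + fn)

assigned  = [{"politics", "economy"}, {"sport"}]
reference = [{"politics"}, {"sport", "football"}]
p, r = precision_recall(assigned, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```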