Descriptor: "Text to speech" / Topic: speech - Searchworks@Jio Institute Digital Library Search Results

Your search for "Text to speech" returned 13 results

1. Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis.

Author: Chen, Zhiyong, Ai, Zhiqi, Ma, Youxuan, Li, Xinnuo, and Xu, Shugong
Subjects: SPEECH synthesis, SPEECH, VOICEPRINTS, PROSODIC analysis (Linguistics), VECTOR control
Abstract: In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech by referring to a reference speech sample, voice cloning (VC), or zero-shot TTS (ZS-TTS), stands out as an important subtask. A primary challenge in VC is maintaining speech quality and speaker similarity with limited reference data for a specific speaker. However, existing VC systems often rely on naive combinations of embedded speaker vectors for speaker control, which compromises the capture of speaking style, voice print, and semantic accuracy. To overcome this, we introduce the Two-branch Speaker Control Module (TSCM), a novel and highly adaptable voice cloning module designed to precisely process speaker or style control for a target speaker. Our method uses an advanced fusion of local-level features from a Gated Convolutional Network (GCN) and utterance-level features from a gated recurrent unit (GRU) to enhance speaker control. We demonstrate the effectiveness of TSCM by integrating it into advanced TTS systems such as the FastSpeech 2 and VITS architectures, significantly improving their performance. Experimental results show that TSCM enables accurate voice cloning for a target speaker with minimal data through either zero-shot or few-shot fine-tuning of pretrained TTS models. Furthermore, our TSCM-based VITS (TSCM-VITS) shows superior performance in zero-shot scenarios compared to existing state-of-the-art VC systems, even with basic dataset configurations. Our method's superiority is validated through comprehensive subjective and objective evaluations. A demonstration of our system is available at https://great-research.github.io/tsct-tts-demo/, providing practical insights into its application and effectiveness. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Third Eye: Object Recognition and Speech Generation for Visually Impaired.

Author: Guravaiah, Koppala, Bhavadeesh, Yarlagadda Sai, Shwejan, Peddi, Vardhan, Allu Harsha, and Lavanya, S
Subjects: RECOGNITION (Psychology), PEOPLE with visual disabilities, AUTOMATIC speech recognition, SPEECH perception, OBJECT recognition (Computer vision), SPEECH
Abstract: Detecting and recognizing objects and generating speech about them greatly helps visually impaired people understand their surroundings. A mechanism is required to help a visually impaired person travel independently, with the ability to identify objects in their path and to generate speech describing the objects detected in the scene. This can be achieved with the YOLOv5 image detection model and text-to-speech converters such as the gTTS and pyttsx3 modules in Python. The proposed method, called Third Eye, gives better accuracy in detection and speech generation than existing approaches. YOLOv5 is trained on a custom dataset of 15 objects along with the MS COCO 2017 dataset of 80 objects (95 objects overall). The output labels of the model are transformed to text, converted to audio, and presented to the visually impaired user through a speaker. We compared two Python libraries for audio conversion: pyttsx3 and gTTS. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

3. El audiotexto, una forma de oralidad terciaria y una experiencia alternativa de lectura.

Author: Quiceno, Carlos Suárez and Muñoz, Wilson Castaño
Subjects: *SPEECH, *SPEECH synthesis, *ASSISTANCE in emergencies, *AUDIOBOOKS, *READING, *RESEARCH
Abstract: Based on the research "The text-to-speech resource beyond assistance", it is suggested that reading through voice synthesis applications entails a double relationship with the text: auditory and visual. This article proposes the term audiotextual reading for a textual format that differs from the traditional audiobook and that should be placed within the category of tertiary orality. The term audioreaders is also suggested. Finally, some research findings based on the perceptions of the audiences are considered. These insights contribute to the understanding of the conditions in which text-to-speech is used as an alternative form of reading experience. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. MINTZAI: End-to-end Deep Learning for Speech Translation.

Author: Etchegoyhen, Thierry, Arzelus, Haritz, Gete, Harritxu, Alvarez, Aitor, Hernaez, Inma, Navas, Eva, González-Docasal, Ander, Osácar, Jaime, Benites, Edson, Ellakuria, Igor, Calonge, Eusebi, and Martin, Maite
Subjects: DEEP learning, MACHINE translating, SPEECH synthesis, SPEECH perception, TRANSLATING & interpreting, ARTIFICIAL neural networks, SPEECH
Abstract: Speech Translation consists in translating speech in one language into text or speech in a different language. These systems have numerous applications, particularly in multilingual communities such as the European Union. The standard approach in the field involves the chaining of separate components for speech recognition, machine translation and speech synthesis. With the advances made possible by artificial neural networks and Deep Learning, training end-to-end speech translation systems has given rise to intense research and development activities in recent times. In this paper, we review the state of the art and describe project MINTZAI, which is being carried out in this field. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

5. Algorithm of Allophone Borders Correction in Automatic Segmentation of Acoustic Units

Author: Rafałko, Janusz; Editors: Saeed, Khalid and Homenda, Władysław; Series Editors: Goos, Gerhard and Weikum, Gerhard; Founding Editor: Hartmanis, Juris; Editorial Board Members: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Terzopoulos, Demetri, and Tygar, Doug
Published: 2016
Full Text: View/download PDF

6. A Robust Syllable Centric Pronunciation Model for Tamil Text To Speech Synthesizer.

Author: Rajendran, Vaibhavi and Kumar, G. Bharadwaja
Subjects: *VOCODER, *PRONUNCIATION, *ERROR rates, *INTERACTION design (Human-computer interaction), *SPEECH, *HUMAN-computer interaction
Abstract: The Human–Computer Interaction era has prompted researchers to work on speech and language in order to develop interactive interfaces. A speech synthesizer is one such interface, helping people engage with the digital era. The present work focuses on developing a Letter-To-Sound mapping for a Tamil speech synthesizer, which is an intriguing task because of the script-to-sound mapping irregularities in Tamil. Tamil is a syllable-timed language, so a new syllable-centric rule-based approach is formulated in the present work, with a more extended set of rules than the existing rule bases in the literature. The proposed rule-based system outperforms the existing rule-based systems, with a low Character Error Rate and a high Mean Similarity Score. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

7. An Arabic expert system for voice synthesis.

Author: Tebbi, Hanane, Azzoune, Hamid, and Hamadouche, Maamar
Subjects: *EXPERT systems, *SPEECH synthesis, *ARABIC language, *PRONUNCIATION, *SPEECH
Abstract: In this paper, an efficient Arabic Expert System for Voice Synthesis, referred to as AESVS, is proposed. This system, which is based on the Standard Arabic language, consists of three phases. The first phase is responsible for the automatic acquisition of human expert knowledge. The acquired knowledge consists of Arabic language pronunciation rules along with the required domain dictionary, to be used by the system. The second phase is concerned with the creation of the sound base that contains the acoustic units needed during the voice generation step. The third phase, which represents the speech synthesis operation, allows the generation of a phonemic sequence for any given word. All words in the input text are converted into their corresponding phoneme sequences by using orthographic–phonetic transcription that takes into account their adaptation to the Arabic language. To ensure this conversion from text to speech, we used approximately 350 predicates created by using the PROLOGUE inference engine, which are stored in a knowledge base that can be used upon request, during both the orthographic–phonetic transcription and the voice generation steps. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

8. Natural Speech Synthesizer for Blind Persons Using Hybrid Approach.

Author: Gahlawat, Mukta, Malik, Amita, and Bansal, Poonam
Subjects: VOCODER, BLIND experiment, SPEECH perception, ARTIFICIAL intelligence, SPEECH synthesis
Abstract: The major challenges faced by researchers in speech synthesis are intelligibility and naturalness. Intelligibility means that the speech is easily understandable, and naturalness means that its quality is very near to that of human speech. Because of the dynamic nature of human speech, it is very difficult to mimic: the same content spoken in different situations has different prosodic parameters. This paper discusses an approach to developing a natural-sounding speech synthesizer. The developed Text To Speech system was tested on blind persons using a subjective listening test. The test was scored using the mean opinion score (MOS) and was carried out with ten blind persons aged 14 to 42 years. Five parameters (naturalness, intelligibility, usability, localization awareness, and expressions) were considered in the analysis of the speech synthesizer. As a result, a good MOS was received for naturalness and usability, and a fair MOS for intelligibility and localization. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

9. Computer-Synthesized Speech and Perceptions of the Social Influence of Disabled Users.

Author: Stern, Steven E.
Subjects: *SPEECH, *ASSISTIVE technology, *SPEECH disorders, *AMYOTROPHIC lateral sclerosis, *SOCIAL influence, *SENSORY perception, *LANGUAGE & languages, *LINGUISTICS, *PSYCHOLOGY
Abstract: Computer-synthesized speech is frequently used as an assistive technology for people with speech disabilities, including those with amyotrophic lateral sclerosis (ALS). In this short article, the conditions that lead to speech loss and how people perceive computer-synthesized speech, particularly when it is used by the speaking disabled, are discussed. Specific attention is paid to the author's own program of research, which has examined how perceptions of trustworthiness are moderated by the use of synthetic speech, whether the user is speech disabled, and the purpose for which the synthetic speech is used. Based on this research, four specific conclusions are presented. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

10. The development of the voice read-out system for digital television receiver.

Author: Furuta, Satoru, Kawashima, Keigo, Otsuka, Takahiro, Yamaura, Tadashi, and Otsuka, Reiji
Abstract: We hope that everyone can enjoy the convenient functions of digital broadcasting, such as the Electronic Program Guide (EPG). In 2007, Mitsubishi Electric launched a mass-produced LCD television equipped with an EPG voice read-out system in the Japanese domestic market. This paper describes the development of the voice read-out function of a digital television receiver and summarizes the background of the development, the speech synthesis system that was introduced, and the improvements made to each model based on end users' opinions. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

11. Algorithm of Allophone Borders Correction in Automatic Segmentation of Acoustic Units

Author: Janusz Rafałko, Warsaw University of Technology [Warsaw], Khalid Saeed, Władysław Homenda, and TC 8
Subjects: Computer science, Acoustic units, Humanities and Social Sciences/Library and information sciences, Speech recognition, Speech synthesis, Engineering and technology, TTS, Allophone, Factor (programming language), Electrical engineering, electronic engineering, information engineering, Speech, Border correction, Speech quality, Networking & telecommunications, Pattern recognition, Speech processing, Base (topology), Phoneme, Key (cryptography), Automatic segmentation, Artificial intelligence & image processing, Artificial intelligence, Text to speech, Algorithm
Abstract: Part 6: Algorithms; International audience. In concatenative speech synthesis, the fundamental factor with a heavy influence on synthesized speech quality is the database of acoustic units. When the database is obtained automatically, the key issue is marking the borders of the acoustic units appropriately. This article describes an algorithm for correcting acoustic unit borders that were determined automatically. It is based on two factors specified and tested here. It also describes a method developed for grading the acoustic unit database, which makes it possible to observe the influence of the introduced correction on the quality of the database.
Published: 2016

12. President Obama to Cadets: Lead the Way on Fighting Climate Change.

Author: Obama, Barack
Abstract: President Barack Obama gave this commencement speech at the United States Coast Guard Academy. [ABSTRACT FROM PUBLISHER]
Published: 2015

13. Natural Speech Synthesizer for Blind Persons Using Hybrid Approach

Author: Amita Malik, Mukta Gahlawat, and Poonam Bansal
Subjects: Computer science, Speech recognition, Speech technology, Usability, Speech corpus, Speech synthesis, PSQM, Intelligibility (communication), Expressive Speech, Concatenative Speech Synthesis, General Earth and Planetary Sciences, Speech, Text to Speech, Blind persons, Unit selection, General Environmental Science
Abstract: The major challenges faced by researchers in speech synthesis are intelligibility and naturalness. Intelligibility means that the speech is easily understandable, and naturalness means that its quality is very near to that of human speech. Because of the dynamic nature of human speech, it is very difficult to mimic: the same content spoken in different situations has different prosodic parameters. This paper discusses an approach to developing a natural-sounding speech synthesizer. The developed Text To Speech system was tested on blind persons using a subjective listening test. The test was scored using the mean opinion score (MOS) and was carried out with ten blind persons aged 14 to 42 years. Five parameters (naturalness, intelligibility, usability, localization awareness, and expressions) were considered in the analysis of the speech synthesizer. As a result, a good MOS was received for naturalness and usability, and a fair MOS for intelligibility and localization.
Full Text: View/download PDF
