10,292 results on '"Speech coding"'
Search Results
2. An Efficient FPGA-Based Accelerator for Perceptual Weighting Filter in Speech Coding.
- Author
- Singh, Dilip and Chandel, Rajeevan
- Subjects
- *SPEECH processing systems, *VIDEO coding, *AUTOMATIC speech recognition, *SIGNAL denoising, *MOVING average process, *MATHEMATICAL optimization, *COMPUTATIONAL complexity, *GRAPES
- Abstract
In speech coding, denoising of the speech signal is essential as well as crucial. The filters for minimizing errors through denoising employ the autoregressive moving average (ARMA) approach, introducing higher computational complexity in speech coder design. This research work presents the design and implementation of an effective perceptual weighting filter (PWF) for speech coding. The high-level synthesis of the fixed-point PWF filter is optimized by multiple optimization techniques along with detailed design space exploration using the weighted sum (WS) method. To enhance the performance, an FPGA-based hardware accelerator is proposed using hardware/software (HW/SW) co-design in an embedded environment. Simulative analysis in Vivado HLS and final accelerator design in the Vitis IDE tool validate the proposed architecture by using real-time speech samples, demonstrating a 50% reduction in area and a 99% execution improvement. This makes it well-suited for use in modern speech codecs, enhancing the efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Interactive Multimedia Association-Adaptive Differential Pulse Code Modulation Codec With Gated Recurrent Unit Predictor
- Author
- Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael W. Kimwele, Adane Mamuye, and Salau
- Subjects
- Speech coding, IMA-ADPCM, GRU predictor, predictive model, speech compression, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Speech coding is important for effective storage and transmission of audio signals. However, current Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) speech coding techniques that use a fixed predictor have an impact on the encoding of dynamic and non-stationary speech signals. The limitation of the fixed predictor in IMA-ADPCM speech coding is the motivation for this study. Our goal is to improve the fixed predictor by integrating a GRU predictor that can adapt to and make better predictions of dynamic speech signals. We evaluated the performance of the IMA-ADPCM encoding baseline and the GRU predictor embedded with the IMA-ADPCM codec algorithm. The proposed pre-trained GRU predictor based encoding system outperformed the maximum Signal-to-Noise Ratio (SNR) (43.2 dB and MOS scores 3.8 to 4.3) of 5.0, and our results demonstrated considerable improvements in audio quality. The main contribution of this study is the development of a GRU Predictor that integrates IMA-ADPCM coding algorithms according to the IMA-ADPCM output speech sample and the actual PCM speech sample dataset required. By integrating the GRU predictor model in accordance with these data samples, the newly designed algorithm significantly improved the quality of the IMA-ADPCM speech codec.
- Published
- 2024
4. Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder
- Author
- Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, and Adane Mamuye
- Subjects
- Speech coding, Gated recurrent unit, Nonlinear prediction, Waveform coding, Audio coding, Adaptive differential pulse code modulation, Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Speech coding is a method to reduce the amount of data needed to represent speech signals by exploiting the statistical properties of the speech signal. Recently, in the speech coding process, a neural network prediction model has gained attention as the reconstruction process of a nonlinear and nonstationary speech signal. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus actual sample and the ADPCM fixed-predictor output speech sample. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.
- Published
- 2024
5. Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder.
- Author
- Sheferaw, Gebremichael Kibret, Mwangi, Waweru, Kimwele, Michael, and Mamuye, Adane
- Subjects
- ADAPTIVE modulation, SPEECH
- Abstract
Speech coding is a method to reduce the amount of data needed to represent speech signals by exploiting the statistical properties of the speech signal. Recently, in the speech coding process, a neural network prediction model has gained attention as the reconstruction process of a nonlinear and nonstationary speech signal. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus actual sample and the ADPCM fixed-predictor output speech sample. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Waveform based speech coding using nonlinear predictive techniques: a systematic review.
- Author
- Sheferaw, Gebremichael Kibret, Mwangi, Waweru, Kimwele, Michael, and Mamuye, Adane
- Abstract
Speech coding is a technique that compresses speech signals into a smaller digital form, making it easier to transmit or store, while still maintaining the quality and intelligibility of the speech. The review aimed to identify and analyze the most effective waveform-based nonlinear speech coding prediction techniques, including the use of neural networks and polynomial filters. The study analyzed 29 publications from 2000 to 2023 and found that neural network-based models are widely used for speech compression, with RNN topologies being favored due to their ability to introduce nonlinearity and nonstationarity. While nonlinear adaptive speech prediction techniques have been explored for speech coding, further research is needed to optimize the adaptive algorithms used in these models. The review also identified a need for future research to address quality performance and computational cost, and suggested further exploration of RNN predictor models. The methodology used in this study involved a computer science approach that follows three main phases: planning, conducting, and reporting. Six different stages were followed, including determining research questions, defining research approach, study selection criteria, quality measurement criteria, data extraction strategy, and synthesizing extracted data. Overall, this study highlights the need for continued research in the development and improvement of neural network-based speech compression models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. A Reinforcement Learning Approach to Speech Coding
- Author
- Gibson, Jerry and Oh, Hoontaek
- Subjects
- Behavioral and Social Science, reinforcement learning, speech coding, exploration, exploitation, dual control, Information and Computing Sciences
- Abstract
Speech coding is an essential technology for digital cellular communications, voice over IP, and video conferencing systems. For more than 25 years, the main approach to speech coding for these applications has been block-based analysis-by-synthesis linear predictive coding. An alternative approach that has been less successful is sample-by-sample tree coding of speech. We reformulate this latter approach as a multistage reinforcement learning problem with L step lookahead that incorporates exploration and exploitation to adapt model parameters and to control the speech analysis/synthesis process on a sample-by-sample basis. The minimization of the spectrally shaped reconstruction error to finite depth manages complexity and serves as an effective stand in for the overall subjective evaluation of reconstructed speech quality and intelligibility. Different control policies that attempt to persistently excite the system states and that encourage exploration are studied and evaluated. The resulting methods produce reconstructed speech quality competitive with the most popular speech codec utilized today. This new reinforcement learning formulation provides new insights and opens up new directions for system design and performance improvement.
- Published
- 2022
8. MBMS-GAN: Multi-Band Multi-Scale Adversarial Learning for Enhancement of Coded Speech at Very Low Rate
- Author
- Xu, Qianhui, Tu, Weiping, Luo, Yong, Zhou, Xin, Xiao, Li, Zheng, Youqiang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Iliadis, Lazaros, editor, Papaleonidas, Antonios, editor, Angelov, Plamen, editor, and Jayne, Chrisina, editor
- Published
- 2023
9. Virtual Speech System Based on Sensing Technology and Teaching Management in Universities
- Author
- Niu Yan
- Subjects
- compressed sensing technology, interactive 3d speech, frequency domain parameter compression, speech coding, speech feature extraction, 97c70, Mathematics, QA1-939
- Abstract
In this paper, digital speech is compressed using discrete Fourier transform, discrete cosine transform, and improved discrete cosine transform, and compressed sensing technology is proposed. Based on the compressed sensing technology, the frequency domain parameter compression algorithm and the speech coding and decoding algorithm are designed, and the interactive 3D virtual speech system design is completed through the pre-processing of the speech system, the extraction of speech features and the design of speech control commands. The virtual voice system designed in this paper is introduced in the teaching management mode of colleges and universities, and the main functions of the system include four major sections: notification management, online Q&A, virtual voice system interaction, and teaching resource management. The virtual voice system built using sensing technology is simulated and tested, and the practical application effect of the system is studied through empirical analysis. The experimental results show that the amplitude of the sound recorded by the compression sensor in the voice sensing experiment is more concentrated, the range is concentrated between [-0.025,0.025], and the detected voice is smaller and more effective than the amplitude recorded by the cell phone. Students were mainly satisfied and very satisfied with the four system functions designed in this paper, and in terms of the online Q&A function, only one student expressed great dissatisfaction and the total number of satisfied people was 119, and the students were highly satisfied with the teaching management of the system designed in this study.
- Published
- 2024
10. FPGA-Based Hardware-Accelerated Design of Linear Prediction Analysis for Real-Time Speech Signal.
- Author
- Singh, Dilip and Chandel, Rajeevan
- Subjects
- *LINEAR statistical models, *SPEECH, *SYSTEMS on a chip, *INTELLECTUAL property
- Abstract
Linear prediction analysis is a crucial technique used in speech coding to compress speech signals and facilitate their reliable transmission over limited bandwidth or storage space. However, this technique involves repetitive computations on a wide range of incoming audio data, leading to high resource consumption and execution time. To address this challenge, an FPGA-based acceleration method is proposed in the present work that provides high computational capabilities and is energy efficient. The study focuses on optimizing linear prediction analysis at the sub-block level, specifically by modifying the autocorrelation and Levinson–Durbin algorithm to enhance the performance of the overall system. The suggested algorithm is integrated into a hardware/software co-design as a high-performance intellectual property with an AXI4-Stream interface. The system achieves over 99% speed increase and a 60% reduction in resource utilization, by the proposed hardware acceleration implementation. These findings are significant for designing optimized hardware for low bit rate speech coders with improved execution time and reduced resource consumption. The approach is validated by simulating and testing the complete system-on-chip architecture on a Zynq Zybo FPGA for functionality and real-time data performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features.
- Author
- Sankar, M. S. Arun and Sathidevi, P. S.
- Subjects
- *BIT rate, *INTELLIGIBILITY of speech, *SPEECH perception, *SPEECH, *AUTOMATIC speech recognition, *LINEAR predictive coding, *LINEAR codes
- Abstract
There has been a significant growth in the mobile devices and services, fuelling an increasing demand for voice-activated applications. In this context, it is important that individual speaker characteristics are captured, in addition to the salient information in the speech signal. Thus, efficient speech coders that can achieve the dual goals of compact speech representation that maintains speech intelligibility and quality, and preservation of speaker-specific characteristics are attractive. A wideband scalable bit rate mixed excitation linear prediction-enhanced speech coder with an efficient representation for excitation using glottal instants and linear predictive coding based on mel scale is proposed in this paper. The instantaneous pitch or epoch is included in the excitation to get an accurate estimation of glottal instants, a vital parameter in speaker recognition. By optimizing the bit requirement using speech category-based coding, the proposed wideband coder can operate at bit rates ranging from 3.3 to 5.1 kbps with an average bit rate of 3.6 kbps. The proposed coder provides, at 3.6 kbps, similar perceptual quality, as measured by mean opinion score and perceptual evaluation of speech quality, as that of code excited linear prediction operating at 6.4 kbps. The performance of the proposed coder in speaker recognition is analysed, and it gives an equal error rate of 12.5%, which is very promising. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. Review of analysis methods for speech applications.
- Author
- O'Shaughnessy, Douglas
- Subjects
- *SPEECH perception, *AUTOMATIC speech recognition, *SPEECH, *TIME-frequency analysis, *TEXT recognition
- Abstract
• Methods used to analyze speech signals for automatic recognition of their associated text, speaker identity, and other pertinent information are reviewed.
• The survey focuses on the requirements of different applications, as diverse speech tasks have often used the same methods, despite having very different objectives.
• As relevant information in a speech signal is distributed highly non-uniformly, a wide variety of time and frequency analysis techniques is examined.
• The utility of methods is noted in terms of performance, using accuracy, complexity, cost, and latency as measures.
This paper reviews methods used to analyze speech signals for various applications such as automatic recognition of associated text and speaker identity, and coding. The survey focuses on the requirements of different applications, as diverse speech tasks have often used the same methods, despite having very different objectives. As relevant information in a speech signal is distributed highly non-uniformly, a variety of time and frequency analysis techniques is examined. The utility of methods is noted in terms of performance, using accuracy, complexity, cost, and latency as criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. Design of orthogonal filter banks using a multi-objective genetic algorithm for a speech coding scheme
- Author
- Abdelkader BOUKHOBZA, Nasreddine TALEB, Abdelmalik TALEB-AHMED, and Abdennacer BOUNOUA
- Subjects
- Discrete wavelet transform, Orthogonal filter banks, Speech coding, Optimized wavelet, NSGAIII algorithm, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
In this work, we propose an optimization scheme based on a multi-objective Genetic Algorithm (GA) for the design of orthogonal filter banks for speech compression. A parameterization is adopted to assure that the resulting filter banks satisfy perfect reconstruction and have at least two vanishing moments. We search for a parameter set that optimizes the coding gain and the frequency selectivity. As the objectives are conflicting, we investigate the solution that realizes the best compromise between the objectives criteria using the Non-dominated Sorting Genetic Algorithm (NSGAIII). Experimental results have shown that the optimized filter banks provide a significant gain in coding performances when comparing with the Daubechies orthogonal filter banks for test speech signals.
- Published
- 2022
14. A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion.
- Author
- Krobba, Ahmed, Debyeche, Mohamed, and Selouani, Sid. Ahmed
- Subjects
- SPEECH, FEATURE extraction, BIT rate, VERBAL behavior testing, INNER ear, DEAF children
- Abstract
Currently, the majority of state-of-the-art speaker recognition systems use short-term cepstral feature extraction approaches to parameterize speech signals. In this paper, we propose new auditory features for speaker recognition, called Caelen Auditory Model Gammatone Cepstral Coefficients (CAMGTCC), based on the Caelen auditory model, which simulates the external, middle and inner parts of the ear, and a Gammatone filter. The performance of the proposed features is evaluated on the TIMIT and NIST 2008 corpora. Speech coding is represented by Adaptive Multi-Rate Wideband (AMR-WB), and noisy conditions use various noises at different SNR levels, extracted from NOISEX-92. The speaker recognition system uses GMM-UBM and i-vector-GPLDA modelling. The experimental results demonstrate that the proposed feature extraction method performs better than the Gammatone Cepstral Coefficients (GTCC) and Mel Frequency Cepstral Coefficients (MFCC) features. For speech coding distortion, the proposed feature extraction improves robustness to codec-degraded speech at different bit rates. In addition, when the test speech signals are corrupted with noise at SNRs ranging from 0 dB to 15 dB, we observe that CAMGTCC achieves an overall relative equal error rate (EER) reduction of 10.88% to 6.8% compared to the baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2023
15. Design of orthogonal filter banks using a multi-objective genetic algorithm for a speech coding scheme.
- Author
- BOUKHOBZA, Abdelkader, TALEB, Nasreddine, TALEB-AHMED, Abdelmalik, and BOUNOUA, Abdennacer
- Subjects
- FILTER banks, GENETIC algorithms, VERBAL behavior testing, EVOLUTIONARY algorithms, DISCRETE wavelet transforms, AUTOMATIC speech recognition
- Abstract
In this work, we propose an optimization scheme based on a multi-objective Genetic Algorithm (GA) for the design of orthogonal filter banks for speech compression. A parameterization is adopted to assure that the resulting filter banks satisfy perfect reconstruction and have at least two vanishing moments. We search for a parameter set that optimizes the coding gain and the frequency selectivity. As the objectives are conflicting, we investigate the solution that realizes the best compromise between the objectives criteria using the Non-dominated Sorting Genetic Algorithm (NSGAIII). Experimental results have shown that the optimized filter banks provide a significant gain in coding performances when comparing with the Daubechies orthogonal filter banks for test speech signals. [ABSTRACT FROM AUTHOR]
- Published
- 2022
16. Recursively Adaptive Randomized Multi-Tree Coding (RAR MTC) of Speech with VAD/CNG
- Author
- Oh, Hoontaek
- Subjects
- Electrical engineering, Computer engineering, Backward Adaptive Pole-and-Zero Predictor, Perceptual Weighting Filter, Polarity-pattern-based Gain Control, Speech Coding, Tree Coding, Voice Activity Detection/Comfort Noise Generation
- Abstract
A new form of a tree codec for narrowband speech, “Recursively Adaptive Randomized Multi-tree Coding (RAR MTC) with VAD/CNG”, is developed based on a sample-by-sample analysis-and-synthesis linear predictive model by benchmarking and upgrading the tree coding models suggested by J. D. Gibson, W. Chang and H. C. Woo in the 1990s. A simple structure of the Voice Activity Detection/Comfort Noise Generation (VAD/CNG) algorithm is newly applied to the prior speech tree coder to lower the average bit rate by increasing encoding efficiency. A backward adaptive all-pole short-term predictor, which was cascaded to a pitch-based long-term predictor, is replaced with a backward adaptive pole-zero predictor for better input waveform-tracking performance with higher accuracy of prediction. The RAR MTC encodes the initial samples of each voiced region by spanning a 5-level Pitch Compensating Quantizer (PCQ) tree, and then, our randomly interleaved 4-level and 2-level multitree (4-2 MTC) is used to encode the rest of voiced samples with a set of prediction parameters initialized by the 5-level tree coding. A newly developed gain control algorithm for a 2-level tree based on the polarity pattern of the past 5 excitation values advances its gain tracking performance. In our simulations, the results show that those new features we have developed enable the RAR MTC codec to achieve very competitive performance with a lower delay and more natural tone recovery compared to the widely used standard, AMR-NB, which is built on a CELP structure based on a block-based predictive model.
- Published
- 2023
17. Stage Audio Classifier Using Artificial Neural Network
- Author
- Arun Sankar, M. S., Bobba, Tharak Sai, Sathi Devi, P. S., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Bindhu, V., editor, Chen, Joy, editor, and Tavares, João Manuel R. S., editor
- Published
- 2020
18. Modification of Pitch Parameters in Speech Coding for Information Hiding
- Author
- Radej, Adrian, Janicki, Artur, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sojka, Petr, editor, Kopeček, Ivan, editor, Pala, Karel, editor, and Horák, Aleš, editor
- Published
- 2020
19. Speech Coding Using Discrete Cosine Transform and Chaotic Map.
- Author
- Jamal, Marwa and Hassan, Tariq A.
- Subjects
- SPEECH coding, DISCRETE cosine transforms
- Abstract
Recently, multimedia data has been growing exponentially and saturating daily human life. Various modalities of data, including images, text and video, play an important role in different aspects and are widely used. However, the key problem of utilizing large-scale data is the cost of processing and massive storage. Therefore, efficient communication and economical storage require effective data compression techniques to reduce the volume of data. Speech coding is a main problem in the area of digital speech processing. Speech coding is the process of converting voice signals into a more compressed form. In this work, we demonstrate that a DCT with a chaotic system combined with run-length coding can be utilized to implement very low bit-rate speech coding with high reconstruction quality. Experimental results show that the compression ratio is about 13% when implemented on the Librispeech dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. Speech Compression
- Author
- Gibson, Jerry D
- Subjects
- speech coding, voice coding, speech coding standards, speech coding performance, linear prediction of speech, Information and Computing Sciences
- Published
- 2016
21. Neurally Optimized Decoder for Low Bitrate Speech Codec.
- Author
- Kim, Hyung Yong, Yoon, Ji Won, Cho, Won Ik, and Kim, Nam Soo
- Subjects
- VIDEO coding, AUTOMATIC speech recognition, SPEECH processing systems, GENERATIVE adversarial networks, BINARY sequences, CODECS
- Abstract
Recently, a conventional neural decoder for speech codec has shown promising performance. However, it typically requires some prior knowledge of decoding such as bit allocation or dequantization scheme, which is not a universal solution for many different kinds of speech codecs. In order to address this limitation, we propose a neurally optimized decoder based on a generative model which can directly reconstruct the speech from the bitstream without prior knowledge. The proposed decoder mainly consists of two components: 1) a dequantization model to group and dequantize related bits from the bitstream and 2) a generative model to restore the speech conditioned on the output of the dequantization model. Through experiments with mixed excitation linear prediction (MELP), Advanced multi-band excitation (AMBE), and SPEEX at around 2.4 kb/s, it is shown that the proposed model performed better in most of the objective and subjective evaluations compared to the conventional speech codecs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
22. AN EFFICIENT SPEECH GENERATIVE MODEL BASED ON DETERMINISTIC/STOCHASTIC SEPARATION OF SPECTRAL ENVELOPES
- Author
- M. Taha, E. S. Azarov, D. S. Likhachov, and A. A. Petrovsky
- Subjects
- speech generative model, harmonic plus noise model, speech analysis, speech coding, Electronics, TK7800-8360
- Abstract
The paper presents a speech generative model that provides an efficient way of generating speech waveform from its amplitude spectral envelopes. The model is based on hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach originates from the fact that speech signal has a determined spectral structure that is statistically bound with deterministic/stochastic energy distribution in the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of reconstructed speech is evaluated using objective and subjective methods. Two objective quality characteristics were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with MELP (Mixed Excitation Linear Prediction) speech coder and AMR (Adaptive Multi-Rate) speech coder, respectively. The speech base of two female and two male speakers were used for testing. The performed tests show that overall performance of the proposed approach is speaker-dependent and it is better for male voices. Supposedly, this difference indicates the influence of pitch highness on separation accuracy. In that way, using the proposed approach in experimental speech compression system provides decent MBSD values and comparable PESQ values with AMR speech coder at 6.6 kbit/s. Additional subjective listening tests demonstrate that the implemented coding system retains phonetic content and speaker’s identity. It proves consistency of the proposed approach.
- Published
- 2020
23. Using the Random Components of the Jitter of Speech Pitch Period to Assess the State of the User of Social-Cyber-Physical System
- Author
- E. Pakulova, I. Vatamaniuk, V. Budkov, R. Iakovlev, and M. Nosov
- Subjects
- frequency estimation, jitter, pitch control, speech analysis, speech coding, speech synthesis, Telecommunication, TK5101-6720
- Abstract
Socio-cyber-physical systems are focused on perceptual interaction with users and involve analyzing and translating his or her physiological and psycho-emotional state. An actual scientific problem is to determine the latter basing on the user’s speech signal. In particular, it can be solved relying on investigating the jitter of the speech pitch period. In this paper we propose an algorithm that allows one to improve the noise immunity of determining the pitch period of speech signal and a method of jitter determination based on averaging the change of the pitch period relatively to the current value. We also propose an algorithm for separating periodic and random pitch jitter based on using the discrete Fourier transform on the sequence of the pitch periods with the presence of unknown values in the unvoiced speech frames. Simulation shows that the proposed approach of filling the unknown values of pitch period has better results compared to the existing methods based on interpolation of the nearest known values.
- Published
- 2019
24. An Adaptive Bitrate Switching Algorithm for Speech Applications in Context ofWebRTC.
- Author
-
ALAHMADI, MOHANNAD, POCTA, PETER, and MELVIN, HUGH
- Subjects
ALGORITHMS ,IP networks ,DIGITAL music ,AUDIO codec ,PSYCHOLOGICAL factors ,AUTOMATIC speech recognition - Abstract
Web Real-Time Communication (WebRTC) combines a set of standards and technologies to enable high-quality audio, video, and auxiliary data exchange in web browsers and mobile applications. It enables peer-to-peer multimedia sessions over IP networks without the need for additional plugins. The Opus codec, which is deployed as the default audio codec for speech and music streaming in WebRTC, supports a wide range of bitrates. This range of bitrates covers narrowband, wideband, and super-wideband up to fullband bandwidths. Users of IP-based telephony always demand high-quality audio. In addition to users' expectation, their emotional state, content type, and many other psychological factors; network quality of service; and distortions introduced at the end terminals could determine their quality of experience. To measure the quality experienced by the end user for voice transmission service, the E-model standardized in the ITU-T Rec. G.107 (a narrowband version), ITU-T Rec. G.107.1 (a wideband version), and the most recent ITU-T Rec. G.107.2 extension for the super-wideband E-model can be used. In this work, we present a quality of experience model built on the E-model to measure the impact of coding and packet loss to assess the quality perceived by the end user in WebRTC speech applications. Based on the computed Mean Opinion Score, a real-time adaptive codec parameter switching mechanism is used to switch to the most suitable codec bitrate under the present network conditions. We present the evaluation results to show the effectiveness of the proposed approach when compared with the default codec configuration in WebRTC. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
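The abstract of item 24 relies on a Mean Opinion Score computed from the E-model. As a minimal sketch, the narrowband E-model of ITU-T Rec. G.107 converts its transmission rating factor R to an estimated MOS with a cubic mapping. The selection helper `best_bitrate` is a hypothetical policy added for illustration only: it assumes an R value has already been predicted for each candidate bitrate, and it is not the switching mechanism of the paper.

```python
from typing import Dict


def r_to_mos(r: float) -> float:
    """Narrowband E-model mapping from rating factor R to estimated MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def best_bitrate(r_by_bitrate: Dict[int, float]) -> int:
    """Hypothetical policy: pick the bitrate with the highest estimated MOS.

    Ties are broken in favour of the lower bitrate.
    """
    return max(r_by_bitrate, key=lambda b: (r_to_mos(r_by_bitrate[b]), -b))
```

For example, `r_to_mos(93.2)` is about 4.41, the value usually quoted for the default narrowband E-model parameters. The wideband and super-wideband versions use an extended R scale, so this mapping should not be applied to them unchanged.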
25. Compressed Sensing-Speech Coding Scheme for Mobile Communications.
- Author
-
Haneche, Houria, Ouahabi, Abdeldjalil, and Boudraa, Bachir
- Subjects
- *
MOBILE communication systems , *TELECOMMUNICATION systems , *VECTOR quantization , *INTELLIGIBILITY of speech , *ORAL communication , *COMPRESSED sensing - Abstract
A new source coding scheme is proposed for secure and robust speech communications. The method is based on the combination of compressed sensing and split-multistage vector quantization. The proposed codec is integrated in an end-to-end communication system, and its performance is investigated in real mobile communication conditions. Channel compensation techniques are considered to mitigate the Rayleigh channel effects usually observed in mobile communications. Using the proposed speech coding scheme instead of current standards (e.g., AMR-WB) within the communication system results in a new end-to-end mobile communication design. The proposed design increases the transmission speed, robustness, and security without additional costs. For a bit rate of 8.85 kbit/s and in a 10 dB Rayleigh environment, the recovered speech has a good perceptual evaluation of speech quality score close to 3.14 and a fair coherence speech intelligibility index value of around 0.47. Comparison with recent CS-based speech coding methods shows the merit of the proposed coder. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
26. A Simulation-Based Comparison on Code Excited Linear Prediction (CELP) Coder at Different Bit Rates
- Author
-
Joshi, Swati, Purohit, Hemant, Choudhary, Rita, Kacprzyk, Janusz, Series Editor, Tiwari, Basant, editor, Tiwari, Vivek, editor, Das, Kinkar Chandra, editor, Mishra, Durgesh Kumar, editor, and Bansal, Jagdish C., editor
- Published
- 2018
- Full Text
- View/download PDF
27. Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm.
- Author
-
Sankar, M. S. Arun and Sathidevi, P. S.
- Subjects
- *
LINEAR predictive coding , *VECTOR quantization , *LINEAR codes , *FORECASTING , *CLASSIFICATION algorithms - Abstract
This paper proposes a novel method to reduce the order of the prediction filter from 10 to 7 in the Code Excited Linear Prediction (CELP) coding framework by the inclusion of the psychoacoustic Mel scale into Linear Predictive Coding (Mel-LPC). Efficient quantization methods using 2-split Vector Quantization (VQ) for Mel-LPC obtained a reduction of 4 bits/frame and resulted in a total bit gain of 200 bps. A weighting scheme for the Euclidean distance measure gave a reduction of 6 bits/frame that adds up to a total bit gain of 300 bps. A lower Mel-LPC order of 3 has been employed for unvoiced frames by using the perceptual quality as the selection criterion, and an efficient VQ method using 5 bits is developed which brought down the average bit requirement to 11.5 bits/frame. To incorporate this into the Mel-LPC-based CELP encoding scheme, a neural network-based voiced-unvoiced classification algorithm using 5 derived features as input has been constructed, and this selection of filter order based on signal statistics provides the benefit of bit reduction by 625 and 325 bps, respectively, for 10th order LPC and 7th order Mel-LPC. In addition, the incorporation of Mel-LPC gives a better performance in the estimation of formants. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
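The bit-rate figures in the abstract of item 27 follow from simple arithmetic: a saving of 4 bits/frame is quoted as 200 bps and 6 bits/frame as 300 bps, which is consistent with 50 frames per second (20 ms frames). The frame rate is inferred from those two figures rather than stated in the abstract, so it is an assumption in the sketch below, as is the function name.

```python
def bitrate_saving_bps(bits_saved_per_frame: float,
                       frames_per_second: float = 50.0) -> float:
    """Bit-rate saving in bits/s for a given per-frame saving.

    The default of 50 frames/s is inferred from the abstract's own figures
    (4 bits/frame -> 200 bps); it is an assumption, not a quoted value.
    """
    return bits_saved_per_frame * frames_per_second
```

Under the same assumption, the reported reductions of 625 and 325 bps would correspond to average savings of 12.5 and 6.5 bits/frame.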
28. Combination and Comparison of Sound Coding Strategies Using Cochlear Implant Simulation With Mandarin Speech.
- Author
-
Huang, Enoch Hsin-Ho, Wu, Chao-Min, and Lin, Hung-Ching
- Subjects
COCHLEAR implants ,SPEECH ,INTELLIGIBILITY of speech ,HEARING aids ,NEURAL codes ,AUDITORY pathways - Abstract
Three cochlear implant (CI) sound coding strategies were combined in the same signal processing path and compared for speech intelligibility with vocoded Mandarin sentences. The three CI coding strategies, biologically-inspired hearing aid algorithm (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod), were combined with the advanced combination encoder (ACE) strategy. Hence, four singular coding strategies and four combinational coding strategies were derived. Mandarin sentences with speech-shaped noise were processed using these coding strategies. Speech understanding of vocoded Mandarin sentences was evaluated using short-time objective intelligibility (STOI) and subjective sentence recognition tests with normal-hearing listeners. For signal-to-noise ratios at 5 dB or above, the EE strategy had slightly higher average scores in both STOI and listening tests compared to ACE. The addition of EE to BioAid slightly increased the mean scores for BioAid+EE, which was the combination strategy with the highest scores in both objective and subjective speech intelligibility. The benefits of BioAid, F0mod, and the four combinational coding strategies were not observed in CI simulation. The findings of this study may be useful for the future design of coding strategies and related studies with Mandarin. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
29. Power-Weighted LPC Formant Estimation.
- Author
-
de Frein, Ruairi
- Abstract
A power-weighted formant frequency estimation procedure based on Linear Predictive Coding (LPC) is presented. It works by pre-emphasizing the dominant spectral components of an input signal, which allows a subsequent estimation step to extract formant frequencies with greater accuracy. The accuracy of traditional LPC formant estimation is improved by this new power-weighted formant estimator for different classes of synthetic signals and for speech. Power-weighted LPC significantly and reliably outperforms LPC and variants of LPC at the task of formant estimation using the VTR formants dataset, a database consisting of the Vocal Tract Resonance (VTR) frequency trajectories obtained by human experts for the first three formant frequencies. This performance gain is evident over a range of filter orders. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
30. A Reinforcement Learning Approach to Speech Coding
- Author
-
Jerry Gibson and Hoontaek Oh
- Subjects
reinforcement learning ,speech coding ,exploration ,exploitation ,dual control ,Information technology ,T58.5-58.64 - Abstract
Speech coding is an essential technology for digital cellular communications, voice over IP, and video conferencing systems. For more than 25 years, the main approach to speech coding for these applications has been block-based analysis-by-synthesis linear predictive coding. An alternative approach that has been less successful is sample-by-sample tree coding of speech. We reformulate this latter approach as a multistage reinforcement learning problem with L step lookahead that incorporates exploration and exploitation to adapt model parameters and to control the speech analysis/synthesis process on a sample-by-sample basis. The minimization of the spectrally shaped reconstruction error to finite depth manages complexity and serves as an effective stand in for the overall subjective evaluation of reconstructed speech quality and intelligibility. Different control policies that attempt to persistently excite the system states and that encourage exploration are studied and evaluated. The resulting methods produce reconstructed speech quality competitive with the most popular speech codec utilized today. This new reinforcement learning formulation provides new insights and opens up new directions for system design and performance improvement.
- Published
- 2022
- Full Text
- View/download PDF
31. Afferent Coding and Efferent Control in the Normal and Impaired Cochlea
- Author
-
Sayles, Mark, Heinz, Michael G., Fay, Richard R., Series editor, Popper, Arthur N., Series editor, Manley, Geoffrey A., editor, and Gummer, Anthony W., editor
- Published
- 2017
- Full Text
- View/download PDF
32. Utterance Verification-Based Dysarthric Speech Intelligibility Assessment Using Phonetic Posterior Features.
- Author
-
Fritsch, Julian and Magimai-Doss, Mathew
- Subjects
INTELLIGIBILITY of speech ,AUTOMATIC speech recognition ,PEARSON correlation (Statistics) ,RANK correlation (Statistics) ,STATISTICAL correlation ,DYSARTHRIA - Abstract
In the literature, the task of dysarthric speech intelligibility assessment has been approached through development of different low-level feature representations, subspace modeling, phone confidence estimation or measurement of automatic speech recognition system accuracy. This paper proposes a novel approach where the intelligibility is estimated as the percentage of correct words uttered by a speaker with dysarthria by matching and verifying utterances of the speaker with dysarthria against control speakers’ utterances in phone posterior feature space and broad phonetic posterior feature space. Experimental validation of the proposed approach on the UA-Speech database, with posterior feature estimators trained on the data from auxiliary domain and language, obtained a best Pearson's correlation coefficient (r) of 0.950 and Spearman's correlation coefficient (ρ) of 0.957. Furthermore, replacing control speakers’ speech with speech synthesized by a neural text-to-speech system obtained a best r of 0.931 and ρ of 0.961. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
33. Three-Level Delta Modulation with Second-Order Prediction for Gaussian Source Coding
- Author
-
PERIC, Z., DENIC, B., and DESPOTOVIC, V.
- Subjects
delta modulation ,Huffman coding ,predictive coding ,speech coding ,signal to noise ratio ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 ,Computer engineering. Computer hardware ,TK7885-7895 - Abstract
An adaptive three-level delta modulation with a switched second-order linear prediction is proposed in this paper, intended for encoding the time-varying signals modeled by Gaussian distribution. The input signal is processed frame-by-frame, and the adaptation of the quantizer is performed at the frame level. The signal at the output of quantizer is further processed using variable length encoder to decrease the bit rate. The performance is tested in speech coding, showing that the proposed algorithm provides much wider dynamic range and attains higher Signal to Noise Ratio with respect to the baselines, including CFDM, CVSDM and 2-bit Adaptive Delta Modulation.
- Published
- 2018
- Full Text
- View/download PDF
34. Multilevel Delta Modulation with Switched First-Order Prediction for Wideband Speech Coding
- Author
-
Zoran Peric, Bojan Denic, and Vladimir Despotovic
- Subjects
quantization ,delta modulation ,correlation coefficient ,speech coding ,signal to noise ratio ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
In this paper a delta modulation speech coding scheme based on the ITU-T G.711 standard and the switched first-order predictor is presented. The forward adaptive scheme is used, where the adaptation to the signal variance is performed on frame-by-frame basis. The classification of the frames into weakly and highly correlated was done based on the correlation coefficient calculated for each frame, providing a basis for choosing the appropriate predictor coefficient. The obtained results indicate that the proposed model significantly outperforms the scalar companding system based on the G.711 standard. The obtained experimental results were verified using the theoretical model in the wide dynamic range of the input variance. DOI: http://dx.doi.org/10.5755/j01.eie.24.1.20156
- Published
- 2018
- Full Text
- View/download PDF
35. Dominant frequency component tracking of noisy time‐varying signals using the linear predictive coding pole processing method.
- Author
-
Xu, Jin, Davis, Mark, and de Fréin, Ruairí
- Subjects
- *
LINEAR predictive coding , *TRANSFER functions , *RADIO frequency modulation , *SPEECH coding , *NOISE - Abstract
The linear predictive coding pole processing (LPCPP) method proposed in our previous work overcomes the shortcomings of the LPC method, especially its sensitivity to noise and the filter order. The LPCPP method is a parameterised method that involves processing the LPC poles to produce a series of reduced‐order filter transfer functions to estimate the dominant frequency components of a signal. This paper analyses the ability of the LPCPP method to track the frequency changes of noisy, time‐varying signals in real‐time. Linear chirped frequency modulation signals are used in a series of experiments to simulate signals with different rates of frequency change. The results show that the LPCPP method can achieve real‐time tracking of the dominant frequency in the signal and outperforms the LPC method under different frequency change rates and different noise levels. Specifically, the valid estimate percentage of LPCPP is up to 41.3% higher than that of LPC which indicates that the LPCPP method significantly improves the validity of frequency estimates. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Analysis and Improvement of the Impact of Clipping Distortion on Low-Bit-Rate Speech Coding.
- Author
-
吴彭龙, 邹霞, 孙蒙, and 张星昱
- Abstract
Copyright of Journal of Signal Processing is the property of Journal of Signal Processing and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
- Full Text
- View/download PDF
37. Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants.
- Author
-
Arun Sankar, M. S. and Sathidevi, P. S.
- Subjects
- *
VIDEO coding , *SPEECH synthesis , *LINEAR predictive coding , *ALGORITHMS , *CODING theory , *BIT rate , *VECTOR quantization , *VIDEO compression , *EUCLIDEAN distance - Abstract
In this paper, we propose a variable-bit-rate speech codec based on mixed excitation linear prediction enhanced (MELPe) with an average bit rate of 2 kbps and with a better representation of the excitation signal. The order of the prediction filter in the MELPe coding architecture is reduced from 10 to 7 without affecting the perceptual quality of the decoded speech by using the psychoacoustic Mel scale. An efficient two-split vector quantization is developed with a weighted Euclidean distance measure for Mel scale-based linear predictive coding (Mel-LPC), and it requires only 18 bits/frame. The instantaneous pitch or epoch that is vital for many speech processing applications is preserved in this codec by including it in the excitation signal used for reconstructing the voiced speech. The quantization scheme developed for glottal closure instants (GCIs) causes an increase in the bit requirement for voiced frames by 4–25 bits depending on the position of GCIs. To compensate for that, the Mel-LPC order for both silence and unvoiced frames has been brought down to 4 without compromising the perceptual quality of reconstructed speech. The lowered bit budget for an unvoiced frame is 41 bits/frame, and for silence, it is 31 bits/frame. A further reduction of 10 bits for the silence frame is obtained by reducing the number of transmitted parameters and by tuning the quantization bit requirement for each. For categorizing the speech frames at the entry of the encoder, a neural network-based voiced/unvoiced/silence classification algorithm using a five-dimensional feature set is created. The experimental results show that the proposed coding scheme operates at an average bit rate of 2 kbps, which is less than the bit rate of MELPe (2.4 kbps), but with a better perceptual score. In addition, the incorporation of Mel-LPC gives a better performance in the estimation of formants and GCIs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
38. MASS: Microphone Array Speech Simulator in Room Acoustic Environment for Multi-Channel Speech Coding and Enhancement.
- Author
-
Cheng, Rui, Bao, Changchun, and Cui, Zihao
- Subjects
SPEECH enhancement ,MICROPHONE arrays ,ORAL communication ,ACOUSTIC arrays ,SPEECH - Abstract
Featured Application: The proposed MASS can simulate multiple signals collected by microphone array in room acoustic environment for multi-channel speech coding and enhancement. Multi-channel speech coding and enhancement is an indispensable technology in speech communication. In order to verify the effectiveness of multi-channel speech coding and enhancement methods in the research and development, a microphone array speech simulator (MASS) used in room acoustic environment is proposed. The proposed MASS is the improvement and extension of the existing multi-channel speech simulator. It aims to simulate clean speech, noisy speech, clean speech with reverberation, noisy speech with reverberation, and noise signals by microphone array used for multi-channel coding and enhancement of speech signal in room acoustic environment. The experimental results of the multi-channel speech coding and enhancement prove that the MASS could well simulate the signals used in real room acoustic environment and can be applied to the research of the related fields. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
39. Maximum entropy PLDA for robust speaker recognition under speech coding distortion.
- Author
-
Krobba, Ahmed, Debyeche, Mohamed, and Selouani, Sid. Ahmed
- Subjects
SPEECH perception ,SPEECH synthesis ,AUTOMATIC speech recognition ,FISHER discriminant analysis ,ENTROPY ,VERBAL behavior testing ,SYSTEM identification ,MAXIMUM entropy method - Abstract
The system combining i-vector and probabilistic linear discriminant analysis (PLDA) has been applied with great success in the speaker recognition task. The i-vector space gives a low-dimensional representation of a speech segment and training data of a PLDA model, which offers greater robustness under different conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. The results are reported on the TIMIT database and on coded speech obtained by passing the speech test from the TIMIT database through the AMR encoder/decoder. Our results show that the proposed method achieves improved performance when compared with the i-vector/PLDA and MEGMM. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
40. A Dynamic FEC for Improved Robustness of CELP-Based Codec
- Author
-
Benamirouche, Nadir, Boudraa, Bachir, Gomez, Angel M., Pérez-Córdoba, José L., López-Espejo, Iván, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Abad, Alberto, editor, Ortega, Alfonso, editor, Teixeira, António, editor, García Mateo, Carmen, editor, Martínez Hinarejos, Carlos D., editor, Perdigão, Fernando, editor, Batista, Fernando, editor, and Mamede, Nuno, editor
- Published
- 2016
- Full Text
- View/download PDF
41. Scalable Identity-Oriented Speech Retrieval
- Author
-
Chen, Chaotao, Jiang, Di, Peng, Jinhua, Lian, Rongzhong, Li, Yawen, Zhang, Chen, Chen, Lei, and Fan, Lixin
- Abstract
With the prevalence of voice devices in our daily life, speech data is accumulated at an unprecedented speed. The vast amount of speech data form an invaluable database for security surveillance and financial risk management. However, the speeches collected from different sources are not necessarily annotated with regard to the speaker identity, making the task of retrieving all the speech records for a given identity extremely challenging. In this paper, we propose a scalable system for Identity-Oriented Speech Retrieval (IO-SR), which seamlessly integrates speaker modeling and deep indexing techniques. Given a speech snippet and a speech database, IO-SR efficiently retrieves all speech snippets that are uttered by the same speaker as the given one. Evaluations on an industrial dataset containing millions of speech snippets show that our system achieves superior performance compared with the state-of-the-art methods.
- Published
- 2023
42. Objective measurement of voice activity detectors
- Author
-
Murrin, Paul
- Subjects
621.3822 ,Speech coding ,Detection ,Speech quality - Published
- 1999
43. Hybrid digital-analog coding with bandwidth expansion for correlated Gaussian sources under Rayleigh fading
- Author
-
Pradeepa Yahampath
- Subjects
Hybrid digital-analog coding ,Predictive quantization ,Transform coding ,Fading channels ,Speech coding ,Telecommunication ,TK5101-6720 ,Electronics ,TK7800-8360 - Abstract
Consider communicating a correlated Gaussian source over a Rayleigh fading channel with no knowledge of the channel signal-to-noise ratio (CSNR) at the transmitter. In this case, a digital system cannot be optimal for a range of CSNRs. Analog transmission however is optimal at all CSNRs, if the source and channel are memoryless and bandwidth matched. This paper presents new hybrid digital-analog (HDA) systems for sources with memory and channels with bandwidth expansion, which outperform both digital-only and analog-only systems over a wide range of CSNRs. The digital part is either a predictive quantizer or a transform code, used to achieve a coding gain. The analog part uses linear encoding to transmit the quantization error, which improves the performance under CSNR variations. The hybrid encoder is optimized to achieve the minimum AMMSE (average minimum mean square error) over the CSNR distribution. To this end, analytical expressions are derived for the AMMSE of asymptotically optimal systems. It is shown that the outage CSNR of the channel code and the analog-digital power allocation must be jointly optimized to achieve the minimum AMMSE. In the case of HDA predictive quantization, a simple algorithm is presented to solve the optimization problem. Experimental results are presented for both Gauss-Markov sources and speech signals.
- Published
- 2017
- Full Text
- View/download PDF
44. Three-Level Delta Modulation for Laplacian Source Coding
- Author
-
DENIC, B., PERIC, Z., and DESPOTOVIC, V.
- Subjects
delta modulation ,Huffman coding ,predictive coding ,signal to noise ratio ,speech coding ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 ,Computer engineering. Computer hardware ,TK7885-7895 - Abstract
This paper proposes a novel solution for coding of time varying signals with Laplacian distribution, which is based on delta modulation and three-level quantization. It upgrades the conventional scheme by introducing the quantizer with variable length code. Forward adaptive scheme is used, where the adaptation to the signal variance is performed on frame-by-frame basis. We employ configurations with simple fixed first-order predictor and switched first-order predictor utilizing correlation. Furthermore, we propose different methods for optimizing predictor coefficients. The configurations are tested on speech signal and compared to an adaptive two-level and four-level delta modulation, showing that proposed three-level delta modulation offers performance comparable to a four-level baseline with significant savings in bit rate.
- Published
- 2017
- Full Text
- View/download PDF
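The core idea in the abstract of item 44, delta modulation with a three-level quantizer and a fixed first-order predictor, can be sketched in a few lines. This is a simplified illustration only: the function names, the fixed step size, the default predictor coefficient `a`, and the half-step decision threshold are all assumptions, and the frame-wise variance adaptation and variable-length coding described in the abstract are left out.

```python
from typing import List, Sequence, Tuple


def three_level_dm(signal: Sequence[float], step: float,
                   a: float = 0.9) -> Tuple[List[int], List[float]]:
    """Encode with symbols in {-1, 0, +1}; return (symbols, reconstruction)."""
    symbols: List[int] = []
    recon: List[float] = []
    prev = 0.0  # previous reconstructed sample
    for x in signal:
        pred = a * prev          # fixed first-order prediction
        err = x - pred           # prediction error to be quantized
        if err > step / 2:
            s = 1
        elif err < -step / 2:
            s = -1
        else:
            s = 0
        prev = pred + s * step   # same update the decoder will perform
        symbols.append(s)
        recon.append(prev)
    return symbols, recon


def decode_three_level_dm(symbols: Sequence[int], step: float,
                          a: float = 0.9) -> List[float]:
    """Rebuild the signal from the symbol stream alone."""
    out: List[float] = []
    prev = 0.0
    for s in symbols:
        prev = a * prev + s * step
        out.append(prev)
    return out
```

Because the encoder predicts from its own reconstruction rather than from the input, the decoder reproduces the encoder's reconstruction exactly from the symbols. The zero level lets the coder hold its output when the prediction is already close, and such runs of zeros are where a variable-length code can save bits.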
45. Very low bit rate voice compression for mobile communications
- Author
-
Brooks, Fiona Clare Angharad
- Subjects
621.382 ,Speech coding ,PWI ,MBE - Abstract
This thesis is concerned with very low bit rate voice compression for mobile communications, concentrating exclusively on speech encoders beneath 4kbps, namely Prototype Waveform Interpolation (PWI), Multiband Excitation (MBE) and Sinusoidal Transform Coding (STC). Specifically, a 1.9kbps PWI speech encoder was developed, which employed Zinc Function Excitation (ZFE) to represent the voiced speech. Analysis-by-synthesis was adopted to select the best ZFE for each voiced segment. The error sensitivity of this speech coder was assessed, together with the use of several ZFEs for each excitation optimization sub-segment in order to create a higher speech quality, higher bit rate speech encoder. Furthermore, several MBE speech encoders were developed, where the first version adopted simple pulse excitation with five MBE bands to create a 2.3kbps speech coder. Another speech encoder incorporated three MBE bands in the previously developed PWI-ZFE speech coder, in order to encode speech at 2.35kbps. The performance of MBE at a higher bit rate, where more frequency bands have been added, is also discussed. It was found that the PWI-ZFE speech coder with three MBE bands was preferred by 64.10% of listeners. The third technique, namely STC, was developed into an STC-PWI speech coder operating at 2.4 and 3.8kbps. Analysis-by-synthesis was harnessed to determine the best Fourier coefficients for each segment. A number of wavelet techniques were also examined in this thesis, which were employed in the process of pitch detection and voiced-unvoiced determination. A pitch detector with an overall pitch estimation error rate of 3.9% was developed.
- Published
- 1998
46. A scalable speech coding scheme using compressive sensing and orthogonal mapping based vector quantization
- Author
-
M.S. Arun Sankar and P.S. Sathidevi
- Subjects
Electrical engineering ,Speech processing ,Wavelet ,Speech coding ,CELP ,Compressive sensing ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
A novel scalable speech coding scheme based on Compressive Sensing (CS), which can operate at bit rates from 3.275 to 7.275 kbps, is designed and implemented in this paper. CS-based speech coding offers the benefit of combined compression and encryption with inherent de-noising and bit rate scalability. The non-stationary nature of the speech signal makes the recovery process from CS measurements very complex due to the variation in sparsifying bases. In this work, the complexity of the recovery process is reduced by assigning a suitable basis to each frame of the speech signal based on its statistical properties. As the quality of the reconstructed speech depends on the sensing matrix used at the transmitter, a variant of the Binary Permuted Block Diagonal (BPBD) matrix is also proposed here, which offers a better performance than that of the commonly used Gaussian random matrix. To improve the coding efficiency, formant filter coefficients are quantized using the conventional Vector Quantization (VQ), and an orthogonal mapping based VQ is developed for the quantization of CS measurements. The proposed coding scheme offers a listening quality for reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps. A separate de-noising block is not required in the proposed coding scheme due to the inherent de-noising property of CS. Scalability in bit rate is achieved in the proposed method by varying the number of random measurements and the number of levels for orthogonal mapping in the VQ stage of measurements.
- Published
- 2019
- Full Text
- View/download PDF
47. Advanced linear predictive speech compression at 3.0 kbits/sec and below
- Author
-
Atkinson, Ian Andrew
- Subjects
621.3822 ,Speech coding ,Vocoder modelling ,Mobile phones - Published
- 1997
48. Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition
- Author
-
Sukanya Sonowal, Jin Young Choi, and Tushar Sandhan
- Subjects
FOS: Computer and information sciences ,Audio mining ,Sound (cs.SD) ,Artificial neural network ,Event (computing) ,Computer science ,business.industry ,Feature vector ,Speech recognition ,Speech coding ,Acoustic model ,Pattern recognition ,Computer Science - Sound ,Non-negative matrix factorization ,Computer Science - Information Retrieval ,Support vector machine ,Audio and Speech Processing (eess.AS) ,Computer Science::Sound ,Computer Science::Multimedia ,FOS: Electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,Information Retrieval (cs.IR) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Automatic audio event recognition plays a pivotal role in making human-robot interaction closer and has wide applicability in industrial automation, control and surveillance systems. An audio event is composed of intricate phonic patterns which are harmonically entangled. Audio recognition is dominated by low and mid-level features, which have demonstrated their recognition capability but have high computational cost and low semantic meaning. In this paper, we propose a new computationally efficient framework for audio recognition. Audio Bank, a new high-level representation of audio, is comprised of distinctive audio detectors representing each audio class in frequency-temporal space. The dimensionality of the resulting feature vector is reduced using non-negative matrix factorization, preserving its discriminability and rich semantic information. The high audio recognition performance using several classifiers (SVM, neural network, Gaussian process classification and k-nearest neighbors) shows the effectiveness of the proposed method. 6 pages, 9 figures, published in IEEE International Conf ICCAS 2014 (Best paper award)
- Published
- 2023
49. An investigation into a speaker dependent coding system
- Author
-
Murray, Alan
- Subjects
005 ,Speech repetition ,Speech coding ,Neural networks - Published
- 1996
50. The evaluation and prediction of the performance for future GSM-based digital mobile radio systems
- Author
-
Chung, Yeon Ho
- Subjects
621.382 ,Speech coding - Published
- 1996
- Full Text
- View/download PDF