Descriptor: "SPEECH codecs" / Topic: speech coding - Searchworks@Jio Institute Digital Library Search Results

Your search for "SPEECH codecs" returned 25 results.


1. Neurally Optimized Decoder for Low Bitrate Speech Codec.

Author: Kim, Hyung Yong, Yoon, Ji Won, Cho, Won Ik, and Kim, Nam Soo
Subjects: VIDEO coding, AUTOMATIC speech recognition, SPEECH processing systems, GENERATIVE adversarial networks, BINARY sequences, CODECS
Abstract: Recently, a conventional neural decoder for speech codecs has shown promising performance. However, it typically requires some prior knowledge of decoding, such as the bit allocation or dequantization scheme, so it is not a universal solution for many different kinds of speech codecs. To address this limitation, we propose a neurally optimized decoder based on a generative model that can directly reconstruct the speech from the bitstream without prior knowledge. The proposed decoder mainly consists of two components: 1) a dequantization model to group and dequantize related bits from the bitstream and 2) a generative model to restore the speech conditioned on the output of the dequantization model. Through experiments with mixed excitation linear prediction (MELP), advanced multi-band excitation (AMBE), and SPEEX at around 2.4 kb/s, it is shown that the proposed model performed better than the conventional speech codecs in most of the objective and subjective evaluations. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

2. A scalable wideband speech codec using the wavelet packet transform based on the internet low bitrate codec.

Author: Seto, Koji and Ogunfunmi, Tokunbo
Subjects: *BROADBAND communication systems, *WAVELET transforms, *SPEECH codecs, *DISCRETE cosine transforms, *SPEECH codes theory, *INTERNET, *LINEAR network coding
Highlights:
• A packet-loss robust scalable wideband speech codec is proposed.
• Performance is improved using the wavelet transform instead of the MDCT.
• The proposed codec outperforms the state-of-the-art codec in objective tests.
• High performance of the proposed codec is confirmed in subjective tests.
Abstract: Most recent speech codecs employ code excited linear prediction (CELP) and transmit side information to improve speech quality under packet loss. Another approach to achieve high robustness to packet loss is to use the frame independent coding scheme based on the internet low bitrate codec (iLBC). The scalable wideband speech codec based on the iLBC was previously presented and outperformed G.729.1 at most bit rates according to the objective quality. This paper presents improvements to the previous work. Specifically, we employ the wavelet packet transform (WPT) instead of the modified discrete cosine transform (MDCT) to enhance the quality, and evaluate the proposed codec based on both the objective and subjective quality measures. The objective quality evaluation results show that clear improvement is achieved and that the proposed codec outperforms G.729.1 at the bit rate of 18 kbps or higher under clean channel conditions and has higher robustness to packet loss than G.729.1. The informal subjective test results also show similar trends. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

3. A Novel AMR-WB Speech Steganography Based on Diameter-Neighbor Codebook Partition.

Author: He, Junhui, Chen, Junxi, Xiao, Shichang, Tang, Shaohua, and Huang, Xiaoyu
Subjects: CRYPTOGRAPHY, SPEECH coding, SPEECH codecs, CRYPTOGRAPHY research, CRYPTOGRAPHIC equipment
Abstract: Steganography is a means of covert communication without revealing the occurrence and the real purpose of communication. The adaptive multirate wideband (AMR-WB) is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. In this paper, a novel AMR-WB speech steganography is proposed based on a diameter-neighbor codebook partition algorithm. Different embedding capacities may be achieved by adjusting the iterative parameters during codebook division. The experimental results show that the presented AMR-WB steganography may provide higher and more flexible embedding capacity without inducing perceptible distortion compared with the state-of-the-art methods. With 48 iterations of cluster merging, twice the embedding capacity of the complementary-neighbor-vertices-based embedding method may be obtained with a decrease of only around 2% in speech quality and much the same undetectability. Moreover, both the quality of stego speech and the security regarding statistical steganalysis are better than those of the recent speech steganography based on neighbor-index-division codebook partition. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

4. Automatic speaker verification on narrowband and wideband lossy coded clean speech.

Author: Jarina, Roman, Polacký, Jozef, Počta, Peter, and Chmulík, Michal
Abstract: Substantial progress has been achieved in voice‐based biometrics in recent times, but a variety of challenges still remain for the speech research community. One such obstacle is reliable speaker authentication from speech signals degraded by lossy compression. Compression is commonplace in modern telecommunications, such as mobile telephony, VoIP services, teleconferencing, voice messaging or gaming. In this study, the authors investigate the effect of lossy speech compression on text‐independent speaker verification. Voice biometrics performance is evaluated on clean speech signals distorted by the state‐of‐the‐art narrowband (NB) as well as wideband (WB) speech codecs. The tests are performed in both channel‐matched and channel‐mismatched scenarios. The test results show that coded WB speech improves voice authentication precision by 1–3% of equal error rate over coded NB speech, even at the lowest investigated bitrates. It is also shown that the enhanced voice services codec does not provide better results than the other codecs involved in this study. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

5. Pitch-based steganography for Speex voice codec.

Author: Janicki, Artur
Subjects: SPEECH codecs, CRYPTOGRAPHY, INTONATION (Phonetics), SPEECH processing systems, BIT rate
Abstract: This paper presents an improved version of a steganographic algorithm for IP telephony called HideF0. It is based on approximating the F0 parameter, which is responsible for conveying information about the pitch of the speech signal. The bits saved due to simplification of the pitch contour are used for the hidden transmission. In our experiments, the proposed method was applied to the narrowband Speex codec working in five different modes, with bitrates between 5,950 bps and 24,600 bps. We showed that HideF0 was able to create hidden channels with steganographic bandwidths of around 200 bps at the expense of a steganographic cost of between 0.5 and 0.7 MOS, depending on the Speex mode. Because of placing the approximation flag in the voice packet header, the improved version of the proposed algorithm yielded a significantly lower decrease in speech quality, when compared with the original version of HideF0. In addition, for low bitrates of the hidden channel (i.e., below ca. 50 bps) it was able to operate without introducing any steganographic cost. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

6. Incorporating data hiding into G.729 speech codec.

Author: Yan, Shufan, Tang, Guangming, and Chen, Yanling
Subjects: SPEECH codecs, ENCODING, CODECS, SPEECH coding, SIGNS & symbols
Abstract: The rapid development of speech communication technology has made it possible for low bit-rate speech to become appropriate steganographic cover media. To incorporate data hiding into the low bit-rate speech codec, a novel steganography algorithm is proposed in this paper. By analyzing the encoding rule of fixed codebook vector, the way of transposing encoding locations of adjacent pulses is found to be suitable for data embedding with good imperceptibility. Based on encoding location transposition of adjacent pulses, the relationship between adjacent pulse locations is used to embed secret data while the fixed codebook search is being conducted during the encoding process of G.729 codec, which can maintain synchronization between data embedding and speech encoding. The experimental results demonstrate that the proposed steganography algorithm performs well in imperceptibility with a hiding capacity of 550 bits/s. Furthermore, the real-time and anti-detection performances are also satisfactory. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
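
Note on record 6: to put the reported hiding capacity of 550 bits/s in context, the following minimal sketch converts it into hidden bits per codec frame and into a share of the codec bitrate. It assumes the standard G.729 operating point of 8 kbit/s with 10 ms frames, which the abstract itself does not state; the function and constant names are illustrative only and do not come from the paper.

```python
# Assumed G.729 operating point (not stated in the abstract).
G729_BITRATE_BPS = 8000
G729_FRAME_MS = 10

# Hiding capacity reported in the abstract.
HIDING_CAPACITY_BPS = 550


def hidden_bits_per_frame(capacity_bps, frame_ms):
    """Average number of hidden bits carried by one codec frame."""
    return capacity_bps * frame_ms / 1000.0


def capacity_share(capacity_bps, codec_bps):
    """Hidden-channel rate as a fraction of the codec bitrate."""
    return capacity_bps / codec_bps


bits_per_frame = hidden_bits_per_frame(HIDING_CAPACITY_BPS, G729_FRAME_MS)
share = capacity_share(HIDING_CAPACITY_BPS, G729_BITRATE_BPS)

print(f"hidden bits per frame: {bits_per_frame}")
print(f"share of codec bitrate: {share:.2%}")
```

Under these assumptions the hidden channel averages 5.5 bits per 10 ms frame, a little under 7% of the codec bitrate.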

7. Speech Coding Techniques.

Author: Jagtap, S.K., Mulye, M.S., and Uplane, M.D.
Subjects: SPEECH coding, DIGITAL technology, SIGNAL theory, SPEECH codecs, TECHNOLOGY
Abstract: Speech coding has been a major issue in the area of digital speech processing. Speech coding is the process of transforming the speech signal into a more compressed form, which can then be transmitted with fewer binary digits. Because unlimited channel bandwidth is not available each time a signal is sent, speech signals must be coded and compressed. Speech compression is applied in long distance communication, high-quality speech storage, and message encryption. Speech coding is a lossy type of coding, and hence the output signal does not sound exactly like the input. The speech coding techniques discussed here are linear predictive coding, waveform coding, code excited linear predictive coding, etc. Linear predictive coding and code excited linear predictive coding techniques are studied with the help of MATLAB to check performance measures such as compression ratio and audible speech quality. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

8. Influence of speech codecs selection on transcoding steganography.

Author: Janicki, Artur, Mazurczyk, Wojciech, and Szczypiorski, Krzysztof
Subjects: SPEECH codecs, CRYPTOGRAPHY, SPEECH coding, INTERNET protocols, TELEPHONE systems
Abstract: The typical approach to steganography is to compress the covert data in order to limit its size, which is reasonable in the context of a limited steganographic bandwidth. Transcoding steganography (TranSteg) is a recently proposed IP telephony steganographic method that offers high steganographic bandwidth while retaining good voice quality. In TranSteg, compression of the overt data is used to make space for the steganogram. In this paper we focus on analyzing the influence of the selection of speech codecs on hidden transmission performance, that is, which codecs would be the most advantageous ones for TranSteg. Therefore, by considering the codecs which are currently most popular for IP telephony, we aim to find out which codecs should be chosen for transcoding to minimize the negative influence on voice quality while maximizing the obtained steganographic bandwidth. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

9. The Codebook Design Based on Ant Colony Clustering Algorithm and its Application in AMR-WB.

Author: Li, Fenglian and Zhang, Xueying
Abstract: Codebook design plays a crucial role in the performance of speech signal processing systems based on vector quantization, such as speech codecs and speech recognition. The LBG algorithm is one of the most effective methods for generating codebooks and is commonly used in multidimensional signal processing, but it suffers from the problem of initial codebook design. The paper proposes adopting an ant colony clustering algorithm to design the codebook. However, because the number of clusters in ant colony clustering is formed automatically, it may not be exactly equal to the codebook size, so the paper puts forward remedies based on the nearest neighbor criterion or a decomposition method. The codebook design procedure with the ant colony clustering algorithm is given in detail. Using the proposed method, the paper designs the vector quantizer codebooks of the adaptive multi-rate wideband (AMR-WB) speech codec. Simulation results show that the decoded speech quality of the AMR-WB speech codec is almost not degraded when the newly designed codebooks are adopted, and that transparent quantization is obtained. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

10. Rate distortion performance bounds for wideband speech.

Author: Gibson, Jerry D. and Li, Ying-Yi
Abstract: We develop new rate distortion bounds for wideband speech sources based on phonetically-motivated composite source models, conditional rate distortion theory, and perceptual wideband PESQ (WPESQ) distortion measures. The approach is to calculate rate distortion bounds for MSE distortion for each subsource of the composite source model and use conditional rate distortion theory to calculate the MSE R(D) for the composite source. Since MSE is not a useful distortion measure for today's best-performing voice codecs, we generate a mapping of MSE-to-WPESQ using fully backward adaptive waveform coders, which have MSE distortion values that correctly order their performance, and for which WPESQ values can be generated. We generate the final rate distortion functions with the mapping and show that our new rate distortion curves lower bound the performance of the best known standardized wideband speech codecs. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

11. Highly accurate non-intrusive speech forensics for codec identifications from observed decoded signals.

Author: Jenner, Frank and Kwasinski, Andres
Abstract: The ability to detect a particular speech codec from only the decoded audio has several useful forensic and system performance improvement applications. This paper presents a novel scheme for non-intrusive identification of speech codecs. The identification approach is based upon comparing a profile of a set of noise spectra and a time-domain histogram from the decoded speech to those from the candidate codecs. The presented results show a very high accuracy in identifying contemporary speech codecs from a diverse set of types and encoding rates. The presented codec identification scheme has a very low misidentification rate, including in the high coding rate regime, where it improves on previous works by achieving perfect identification. This performance is achieved while reducing the duration of the analysis window of speech from 2 minutes to only 4 seconds. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

12. Algebraic codebook search strategy for an algebraic code‐excited linear‐prediction speech coder by means of reduced candidate mechanism and iteration‐free pulse replacement.

Author: Yeh, Cheng‐Yu
Abstract: This work aims to present a combined version of reduced candidate mechanism (RCM) and iteration‐free pulse replacement (IFPR) as a novel and efficient way to enhance the performance of algebraic codebook search in an algebraic code‐excited linear‐prediction speech coder. As the first step, individual pulse contribution in each track is given by RCM, and the value of N is then specified. Subsequently, the replacement of a pulse is performed through the search over the sorted top N pulses by IFPR, and those of 2–4 pulses are carried out by a standard IFPR. Implemented on a G.729A speech codec, this proposal requires as few as 20 searches, a search load tantamount to 6.25% of G.729A, 31.25% of the global pulse replacement method (iteration = 2), 41.67% of IFPR, but still provides a comparable speech quality in any case. The aim of significant search performance improvement is hence achieved in this work. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF
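
Note on record 12: the abstract reports the proposed search load (20 searches) as a percentage of three reference methods. The short sketch below simply inverts those percentages to recover the approximate search counts they imply. The values are derived here from the abstract's own figures and are not quoted from the paper; the function name is illustrative.

```python
PROPOSED_SEARCHES = 20

# Search load of the proposal relative to each reference method,
# as reported in the abstract.
REPORTED_FRACTIONS = {
    "G.729A": 0.0625,
    "global pulse replacement (iteration = 2)": 0.3125,
    "IFPR": 0.4167,
}


def implied_reference_searches(proposed, fraction):
    """Search count of a reference method implied by 'proposed is fraction of it'."""
    return proposed / fraction


implied = {
    method: round(implied_reference_searches(PROPOSED_SEARCHES, fraction))
    for method, fraction in REPORTED_FRACTIONS.items()
}

for method, count in implied.items():
    print(f"{method}: about {count} searches")
```

Read this way, the percentages correspond to roughly 320, 64, and 48 searches for the three reference methods, respectively.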

13. Cepstrum-Based Bandwidth Extension for Super-Wideband Coders.

Author: Keunseok Cho, Sangbae Jeong, and Minsoo Hahn
Subjects: CEPSTRUM analysis (Mechanics), BANDWIDTHS, ULTRA-wideband radar, DISCRETE cosine transforms, AUDIO codec
Abstract: This letter proposes a bandwidth extension (BWE) method using cepstral envelope coding and duplication of quantized wideband (WB) signals by means of analysis-by-synthesis (AbS) for super-wideband (SWB) coders. In the proposed method, a high frequency band is generated by utilizing the quantized cepstral coefficients extracted from the envelope and the quantized modified discrete cosine transform (MDCT) shape of the wideband signal. The proposed method is compared with the latest G.718 SWB codec, and the experimental results show that the proposed method outperforms the baseline codec both in subjective listening tests and objective performance measures. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

14. DeepVoCoder: A CNN Model for Compression and Coding of Narrow Band Speech

Author: Miroslav Voznak, Hacer Yalim Keles, H. Gokhan Ilk, and Jan Rozhon
Subjects: General Computer Science, Computer science, Speech recognition, Speech coding, Feature extraction, Speech codecs, Bandwidth extension, Convolutional neural network, Engineering and technology, Source coding, Natural sciences, Wideband audio, Electrical engineering, Electronic engineering, Information engineering, General Materials Science, Decimation, Deep learning, Analytical chemistry, General Engineering, Networking & telecommunications, Chemical sciences, Convolutional code, Bit rate, Electrical engineering. Electronics. Nuclear engineering (LCSH TK1-9971), Artificial intelligence, Encoder
Abstract: This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signal directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.
Published: 2019
Full Text: View/download PDF
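
Note on record 14: the abstract states that pooling after some layers downsamples the encoded speech by half. As a rough illustration of how repeated halving shrinks the time axis toward the bottleneck, the sketch below tracks the sequence length through a number of pooling stages. The frame length and the number of stages are hypothetical values chosen for the example, not figures from the paper, and the function name is illustrative.

```python
def length_after_pooling(num_samples, num_pool_stages):
    """Sequence length after repeatedly halving the time axis.

    Each pooling stage is modeled as a stride-2 reduction (floor division),
    which is an assumption about the pooling configuration.
    """
    length = num_samples
    for _ in range(num_pool_stages):
        length //= 2
    return length


# Hypothetical example: a 512-sample input frame and three pooling stages.
FRAME_SAMPLES = 512
POOL_STAGES = 3

bottleneck_length = length_after_pooling(FRAME_SAMPLES, POOL_STAGES)
print(f"{FRAME_SAMPLES} samples -> {bottleneck_length} time steps "
      f"after {POOL_STAGES} pooling stages")
```

With these hypothetical values, each stage halves the length (512, 256, 128, 64), so the time axis is reduced by a factor of 2 per pooling stage.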

15. Decoder Initializing Technique for Improving Frame-Erasure Resilience of a CELP Speech Codec.

Author: Ehara, H. and Yoshida, K.
Abstract: The authors present and evaluate a technique for synchronizing the internal states of a code-excited-linear-prediction (CELP) encoder and decoder after the occurrence of frame erasure. The designed technique, called "duplicated transmission (DT)," uses some redundant information for realizing synchronization. The encoder performs encoding processes twice and sends two codes for each frame. One code is encoded by an encoder that is initialized. The code is used in cases where the previous frame is erased. An onset detector is combined with the DT technique to select the frames to which the DT should be applied. Subjective test results suggest that, by introducing DT selectively, the number of DT frames is reducible by about 80% without degrading the subjective quality. Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset. The DT technique requires no additional algorithmic delay. For that reason, it would be a better choice for particular applications for which the delay has a significant impact. [ABSTRACT FROM PUBLISHER]
Published: 2008
Full Text: View/download PDF

16. A subspace based progressive coding method for speech compression

Author: Serkan Keser, Ömer Nezih Gerek, Erol Seke, and Mehmet Bilginer Gulmezoglu (affiliations: Anadolu Üniversitesi, Mühendislik Fakültesi, Elektrik ve Elektronik Mühendisliği Bölümü; Kırşehir Ahi Evran Üniversitesi, Teknik Bilimler Meslek Yüksekokulu, Elektrik ve Otomasyon Bölümü)
Subjects: Linguistics and Language, Computer science, Speech recognition, Speech coding, TIMIT, Engineering and technology, Speech codecs, Language and Linguistics, Chemical engineering, Electrical engineering, Electronic engineering, Information engineering, Codec, Karhunen–Loève theorem, Subspace methods, Communication, Vector quantization, Networking & telecommunications, Independent component analysis (ICA), Computer Science Applications, Modeling and Simulation, Computer Vision and Pattern Recognition, Karhunen–Loève transform (KLT), Software, Subspace topology, Coding (social sciences)
Abstract: In this study, two novel methods, which are based on the Karhunen Loeve Transform (KLT) and Independent Component Analysis (ICA), are proposed for coding of speech signals. Instead of immediately dealing with eigenvalue magnitudes, the KLT- and ICA-based methods use eigenvectors of covariance matrices (or independent components for ICA) by geometrically grouping these vectors into fewer numbers of vectors. In this way, a data representation compaction is achieved. Further compression is achieved through discarding autocovariance eigenvectors corresponding to the small eigenvalues and applying vector quantization on the remaining eigenvectors. Additionally, this study proposes an iterative error refinement process, which uses the rest of the available bandwidth in order to transmit an efficient representation of the description error for better SNR. The overall process constitutes a new approach to efficient speech coding, with ICA being used in subspace speech coding for the first time. Constant bit rate (CBR) and variable bit rate (VBR) coding algorithms are employed with the proposed methods. The TIMIT speech database is used in the experimental studies. Speech signals are synthesized at 2.4 kbps, 8 kbps, 12.2 kbps, 16 kbps, 16.4 kbps and 19.85 kbps rates by using various frame lengths. The qualities of synthesized speech signals are compared to those of available speech codecs, i.e., LPC (2.4 kbps), G.728 (LD-CELP, 16 kbps), G.729A (CS-CELP, 8 kbps), EVS (16.4 kbps), AMR-NB (12.2 kbps) and AMR-WB (19.85 kbps).
Published: 2017
Full Text: View/download PDF

17. Recognition of coded speech transmitted over wireless channels.

Author: Gomez, A.M., Peinado, A.M., Sanchez, V., and Rubio, A.J.
Abstract: Network-based speech recognition (NSR) and distributed speech recognition (DSR) have been proposed as solutions to translate speech recognition technologies to mobile environments. NSR is the most straightforward solution since it does not require any modification in the mobile phone; however, DSR offers higher robustness against codec compression and transmission channel degradation. This paper explores an alternative approach for remote speech recognition which combines the advantages of NSR and DSR. In this scheme, a standard speech codec is used for speech transmission but the recognition is performed from the received codec parameters. In particular, we focus on the effect of transmission channel errors, which can cause a more severe performance reduction on speech recognition than codec distortion. First, we show that an NSR solution can approach DSR through a reconstruction technique along with an adapted noise reduction technique originally proposed for acoustic noise. Then, these results are improved by working with recognition features directly extracted from the codec bitstream by means of parameter transcoding. Required modifications on current networks in order to access the bitstream are described. The network upgrading with the tandem free operation (TFO) protocol is an attractive solution. This upgrade not only offers an overall improvement on the end-to-end speech quality, but would also allow a recognition performance similar to, and even higher in poor channel conditions than, that obtained by DSR when parameter transcoding along with the proposed mitigation techniques are applied. [ABSTRACT FROM PUBLISHER]
Published: 2006
Full Text: View/download PDF

18. Speech coding methods, standards, and applications.

Author: Gibson, J.D.
Abstract: Voice is the preferred method of human communication. Although there have been times when it seemed that the voice communications problem was solved, such as when the PSTN was our primary network or later when digital cellular networks reached maturity, such is not the case today. This paper addresses the challenges and opportunities starting from the basic issues in speech coder design, developing the important speech coding techniques and standards, discussing current and future applications, outlining techniques for evaluating speech coder performance, and identifying research directions. The most prominent speech coding standards are presented and their properties, such as performance, complexity, and coding delay, analyzed. Particular networks and applications for each standard are included. Further, reflecting upon the issues and developments highlighted in this paper, it becomes evident that there is a diverse set of challenges and opportunities for research and innovation in speech coding and voice communications. [ABSTRACT FROM PUBLISHER]
Published: 2005
Full Text: View/download PDF

19. Application of NB/WB AMR speech codecs in the 30-kHz TDMA system.

Author: Bo Wei, Hui Dong, and Gibson, J.D.
Abstract: A new system enhancement method is proposed for the EIA/TIA-136 system offering both channel operational range extension and improved performance within the current operational range. The existing time-division multiple-access (TDMA) (136) speech codec, the IS-641 enhanced full rate vocoder, operates at a fixed bit rate and does not allow the reallocation of bits to channel error protection as channel conditions degrade. The research presented here investigates the application of the narrow-band adaptive multirate (NB-AMR) speech codec and the wide-band AMR (WB-AMR) codec, both originally designed for the 200 kHz GSM channel, in the TDMA (TIA/EIA-136) 30-kHz system. In particular, we investigate adaptively allocating bits between NB/WB speech coding and error control coding within the limited channel bandwidth. Four modes out of 17 have been carefully chosen for the new TDMA/AMR system. Switching between codec rates as channel conditions change produces range extension below a C/I of 15 dB while also improving performance in the existing operational range above 15 dB. We keep the time slot formats unchanged so that our method is completely compatible with existing 136 systems. [ABSTRACT FROM PUBLISHER]
Published: 2004
Full Text: View/download PDF

20. Rate adaptive speech coding for universal multimedia access.

Author: Homayounfar, K.
Abstract: This article reviews the state of the art in transport adaptation techniques for mobile networks. It discusses the mechanisms for rate adaptation to combat quality degradations of speech caused by the radio links. It begins with a review of dynamic schemes for adaptation of speech encoders in cellular networks, where we observe two distinct approaches to rate adaptation: network controlled and source controlled. The issues associated with adaptive voice over IP (VoIP) mechanisms are considered next. Here, the encoder detects some form of network congestion to judge how to behave for the good of the network. It is noted that this altruistic behavior will only benefit coordinated IP networks such as private intranets and that its application to the public Internet is improbable. [ABSTRACT FROM PUBLISHER]
Published: 2003
Full Text: View/download PDF

21. An error-protected speech recognition system for wireless communications.

Author: Weerackody, V., Reichl, W., and Potamianos, A.
Abstract: Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system where representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate the satisfactory performance of these schemes for typical wireless cellular channels. In addition, a "soft-feature" error concealment strategy is introduced at the ASR server that uses "soft-outputs" from the channel decoder to compute the marginal distribution of only the reliable features during likelihood computation at the speech recognizer. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. Also considered is a channel decoding technique with source information that improves ASR performance. [ABSTRACT FROM PUBLISHER]
Published: 2002
Full Text: View/download PDF

22. High-performance DSPs.

Author: Junchen Du, Warner, G., Vallow, E., and Hollenbach, T.
Abstract: The processing delay is a serious constraint for speech communication. A one-way end-to-end delay of more than 150 ms can severely degrade the quality of real-time conversations. The components of the total system delay include the speech frame size, the look ahead, other algorithmic delays, multiplexing delay, processing delay for computation, and transmission delay. At the transcoder rate adaptor unit (TRAU), the only delay that can be manipulated is the processing delay. The TRAUs are generally positioned remote to the base transceiver station (BTS). The channel codec units (CCUs) are located in the BTS. In general, 16 kbit/s traffic channels can be used for full rate speech between the TRAU and BTS. By putting the TRAU remote to the BTS, DSPs for speech coding can be utilized more efficiently to cut system cost. High performance DSPs, such as the Lucent DSP16000, can be used to further cut the cost per speech channel. This article presents an implementation of the GSM enhanced full rate (EFR) codec on the Lucent Technologies DSP16000. The original European Telecommunications Standards Institute (ETSI) C code has been restructured to address the issues of MIPS (million instructions per second), RAM usage, and processing delay. We give a performance overview of vocoder implementations on some existing fixed-point DSPs and discuss the architecture of the DSP16000. Details on how the ETSI C code is restructured are presented. The DSP16210 implementation results are then discussed. [ABSTRACT FROM PUBLISHER]
Published: 2000
Full Text: View/download PDF

23. Simulation Support in the Search for an Efficient Speech Coder.

Author: Al-Akaidi, Marwan
Abstract: Speech coding is a well researched area, and researchers are making new proposals for improvements to current algorithms. The family of code-excited linear prediction coders represents a recent breakthrough, which almost caters to the present need for low bit rates. In mobile communications, low bit-rate speech coders play a crucial role in spectrally efficient transmission. Theoretically, a rate of a few hundred bits per second is enough to code speech efficiently. Many coders have been developed using computationally intensive techniques that can achieve rates of 4 to 6 kilobits per second. Theoretically, there is room for improvement. The trend is to use analytical models of speech production and perception. This paper briefly reviews existing speech coders, identifies future trends in this research area, and describes results obtained from simulating vowel and consonant waveforms and from a multi-pulse excitation/linear predictive speech coder algorithm. [ABSTRACT FROM PUBLISHER]
Published: 1998
Full Text: View/download PDF

24. Adapting entropy constrained coding of spectral envelope for fixed-rate coding in AMR speech codec

Author: Tadić, Tihomir, Petrinović, Davor, and Biljanović, Petar
Subjects: Bit rate, Decorrelation, Entropy coding, Frequency, Linear predictive coding, Nonlinear distortion, Speech codecs, Speech coding, Transform coding, Vector quantization, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Data_CODINGANDINFORMATIONTHEORY
Abstract: The Adaptive Multirate (AMR) speech codec operates in 8 different fixed-rate modes. In every mode, it uses a specified number of bits to quantize and encode the current speech frame. It encodes the spectral envelope by means of Line Spectral Frequencies (LSF) parameters, by using the constrained resolution (CR) fixed-rate Split Matrix Quantization (SMQ) and Split Vector Quantization (SVQ) methods. However, by using the Gaussian Mixture Model (GMM) based transform coding technique, the quantization of the spectral envelope can be significantly improved in the spectral distortion (SD) sense. This technique involves adaptive decorrelation of the LSF vectors by applying an orthogonal linear transformation combined with an ordinary scalar quantization of the decorrelated vector's components. We apply uniform scalar quantization followed by entropy constrained (EC) coding as it appears to be generally more efficient than non-uniform scalar quantizers. This paper describes the techniques used to adapt the entropy coded variable bit-rate output bit strings to the AMR codec modes using fixed-rate output bit strings. In order to constrain the length of the code, we use variable quantization step size and vector truncation techniques. Their application aspects are thoroughly investigated and described in this paper.
Published: 2010

25. Avoiding distortions due to speech coding and transmission errors

Author: Gallardo Antolín, Ascensión, Díaz de María, Fernando, and Valverde Albacete, Francisco José
Subjects: Telecomunicaciones, Coding distortion, Research, Speech coding, Automatic speech recognition, Transmission errors, Speech codecs, Speech recognition, Signal representation, GSM ASR tasks, GSM encoder, Coding errors, Digital radio, Decoded half-rate GSM codec, Cellular radio, Speaker independent isolated-digit ASR
Abstract: We have extended our previous research on a new approach to automatic speech recognition (ASR) in the GSM environment. Instead of recognizing from the decoded speech signal, our system works from the digital speech representation used by the GSM encoder. We have compared the performance of a conventional system and the one we propose on a speaker independent, isolated-digit ASR task. For the half and full-rate GSM codecs, from our results, we conclude that the proposed approach is much more effective in coping with the coding distortion and transmission errors. Furthermore, in clean speech conditions, our approach does not impoverish the recognition performance, even recognizing from GSM digital speech, in comparison with a conventional system working on unencoded speech.
Published: 1999
