364 results on '"Audio coding"'
Search Results
2. Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder
- Author
-
Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, and Adane Mamuye
- Subjects
Speech coding, Gated recurrent unit, Nonlinear prediction, Waveform coding, Audio coding, Adaptive differential pulse code modulation, Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Speech coding is a method of reducing the amount of data needed to represent speech signals by exploiting the statistical properties of the speech signal. Recently, neural network prediction models have gained attention in speech coding for reconstructing nonlinear and nonstationary speech signals. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of actual speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus and the corresponding ADPCM fixed-predictor output speech samples. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for the speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.
- Published
- 2024
- Full Text
- View/download PDF
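For readers unfamiliar with the ADPCM scheme named in the abstract above, the following is a minimal sketch of a conventional fixed-predictor ADPCM codec. It is not the authors' GRU-based method: the predictor is simply the previous reconstructed sample, and the function names, the 4-bit code width, and the step-size multipliers (1.5 and 0.9) are illustrative assumptions.

```python
import math

def _update(pred, step, code, levels):
    """Shared encoder/decoder state update (hypothetical adaptation rule)."""
    pred = pred + code * step          # reconstruct the sample from the code
    if abs(code) >= levels // 2:       # large codes: signal moves fast, widen the step
        step *= 1.5
    else:                              # small codes: narrow the step for finer resolution
        step *= 0.9
    step = min(max(step, 1e-4), 1.0)   # keep the step size within sane bounds
    return pred, step

def adpcm_encode(samples, bits=4, step=0.02):
    """Quantise the prediction error of each sample to a signed integer code."""
    levels = 2 ** (bits - 1)
    pred, codes = 0.0, []
    for x in samples:
        code = round((x - pred) / step)
        code = max(-levels, min(levels - 1, code))  # clip to the code range
        codes.append(code)
        pred, step = _update(pred, step, code, levels)
    return codes

def adpcm_decode(codes, bits=4, step=0.02):
    """Rebuild the waveform by replaying the same state updates as the encoder."""
    levels = 2 ** (bits - 1)
    pred, out = 0.0, []
    for code in codes:
        pred, step = _update(pred, step, code, levels)
        out.append(pred)
    return out
```

Because the encoder and decoder run the same `_update` rule, only the integer codes need to be transmitted. A learned predictor such as the GRU model described in the abstract would take the place of the "previous reconstructed sample" prediction in a scheme of this kind.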
3. MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks.
- Author
-
Kemper, Guillermo, Sanchez, Alonso, and Serpa, Sergio
- Abstract
The Moving Picture Experts Group-1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172–3. Currently, there is no general framework to emulate either the MPEG-1 psychoacoustic model or any other psychoacoustic model, although such models are a core piece of many perceptual codecs. This work presents a successful implementation of a convolutional neural network which emulates psychoacoustic model 1 from the MPEG-1 standard, termed "MCNN-PM" (Multiscale Convolutional Neural Network – Psychoacoustic Model). It is then implemented as part of the MPEG-1, Layer I codec. Using the objective difference grade (ODG) to evaluate audio quality, the MCNN-PM MPEG-1, Layer I codec outperforms the original MPEG-1, Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. This work shows that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Deep convolutional neural networks for double compressed AMR audio detection
- Author
-
Aykut Büker and Cemal Hanilçi
- Subjects
audio coding, audio recording, audio signal processing, data compression, feature extraction, signal classification, Telecommunication, TK5101-6720
- Abstract
Detection of double compressed (DC) adaptive multi‐rate (AMR) audio recordings is a challenging audio forensic problem and has received great attention in recent years. Here, the authors propose to use convolutional neural networks (CNN) for DC AMR audio detection. The CNN is used as (i) an end‐to‐end DC AMR audio detection system and (ii) a feature extractor. The end‐to‐end system receives the audio spectrogram as the input and returns the decision whether the input audio is single compressed (SC) or DC. As a feature extractor, in turn, it is used to extract discriminative features, which are then modelled using a support vector machine (SVM) classifier. Our extensive analysis conducted on four different datasets shows the success of the proposed system and provides new findings related to the problem. Firstly, double compression has a considerable impact on the high frequency components of the signal. Secondly, the proposed system yields great performance independent of the recording device or environment. Thirdly, when previously altered files are used in the experiments, a 97.41% detection rate is obtained with the CNN system. Finally, the cross‐dataset evaluation experiments show that the proposed system is very effective in case of a mismatch between training and test datasets.
- Published
- 2021
- Full Text
- View/download PDF
5. Highly Efficient Audio Coding With Blind Spectral Recovery Based on Machine Learning.
- Author
-
Kim, Jae-Won, Beack, Seung Kwon, Lim, Wootaek, and Park, Hochong
- Subjects
MACHINE learning, VIDEO coding, MUSIC conducting, AUDIO equipment, INFORMATION processing
- Abstract
This letter proposes a new method for audio coding that utilizes blind spectral recovery to improve the coding efficiency without compromising performance. The proposed method transmits only a fraction of the spectral coefficients, thereby reducing the coding bit rate. Then, it recovers the remaining coefficients in the decoder using the transmitted coefficients as input. The proposed method is differentiated from conventional spectral recovery in that the coefficients to be recovered are interleaved with the transmitted coefficients to obtain the most data correlation. Further, it enhances the transmitted coefficients, which are degraded by quantization errors, to deliver better information to the recovery process. The spectral recovery is conducted recursively on a band basis such that information recovered in one band is used for the recovery in subsequent bands. An improved level correction for the recovered coefficients and a new sign coding are also developed. A subjective performance evaluation confirms that the proposed method at 40 kbps provides statistically equivalent sound quality to a state-of-the-art coding method at 48 kbps for speech and music categories. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. MPEG Standards for Compressed Representation of Immersive Audio.
- Author
-
Quackenbush, Schuyler R. and Herre, Jurgen
- Subjects
MPEG (Video coding standard), MULTI-degree of freedom, SINGLE-degree-of-freedom systems, HEADPHONES, LOUDSPEAKERS, AUGMENTED reality
- Abstract
The term “immersive audio” is frequently used to describe an audio experience that provides the listener the sensation of being fully immersed or “present” in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above, and below listener ear level), and binaural audio to headphones. This article provides an overview of two recent standards that support the bitrate-efficient carriage of high-quality immersive sound. The first is MPEG-H 3D audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, and higher order ambisonics) and is now being adopted in broadcast and streaming applications. The second is MPEG-I immersive audio, an extension of 3D audio, currently under development, which is targeted for virtual and augmented reality applications. This will support rendering of fully user-interactive immersive sound for three degrees of user movement [three degrees of freedom (3DoF)], i.e., yaw, pitch, and roll head movement, and for six degrees of user movement [six degrees of freedom (6DoF)], i.e., 3DoF plus translational x, y, and z user position movements. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. Speech Coding, Synthesis, and Compression
- Author
-
Farouk, Mohamed Hesham, Gan, Woon-Seng, Series editor, Kuo, C.-C. Jay, Series editor, Zheng, Thomas Fang, Series editor, Barni, Mauro, Series editor, and Farouk, Mohamed Hesham
- Published
- 2018
- Full Text
- View/download PDF
8. Digital audio signal watermarking using minimum‐energy scaling optimisation in the wavelet domain.
- Author
-
Hsu, Chih‐Yu, Tu, Shu‐Yi, Yang, Chao‐Tung, Chang, Ching‐Lung, and Chen, Shuo‐Tsung
- Abstract
This work's contributions include three innovative concepts (an improved model, a two‐stage Lagrange principle, and minimum‐energy scaling optimisation) for quantisation audio watermarking in the wavelet domain. First, discrete wavelet transform (DWT) multi‐coefficients quantisation, composed of arbitrary scaling on the lowest DWT coefficients, and the group‐based signal‐to‐noise ratio (SNR) of these coefficients are connected in a model. Then, the two‐stage Lagrange principle and minimum‐energy approach play two essential roles in obtaining the optimal scaling factors. With the proposed scheme, the best fidelity and robustness of embedded audio can be attained. The perceptual evaluation of audio quality (PEAQ) test is also performed, with an illustration of the relationship between SNR and PEAQ. Simulation results show that each audio signal watermarked by the proposed method attains a high SNR, good PEAQ, and a low bit error rate (BER). The SNR of most audio signals watermarked by the proposed method is above 35 or even above 40, and the corresponding subjective difference grade of PEAQ is close to 0. In terms of BER, most values are 2% or less, indicating better robustness against many attacks, such as re‐sampling, amplitude scaling, and mp3 compression. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
9. A new source‐filter model audio bandwidth extension using high frequency perception feature for IoT communications.
- Author
-
Jiang, Lin, Yu, Shaoqian, Wang, Xiaochen, Wang, Chao, and Wang, Tonghan
- Subjects
BANDWIDTHS, SENSORY perception
- Abstract
Summary: Audio coding is a generic technology for IoT application systems. Audio bandwidth extension is a standard technique within contemporary audio codecs to efficiently code audio signals at low bitrates. In existing methods, the high frequency signal is generated by duplicating the corresponding low frequency signal and applying some high frequency parameters. However, the perceptual quality of coding degrades significantly if the correlation between high and low frequencies becomes weak. In this paper, we propose a new source‐filter model audio bandwidth extension method. In our method, a perception feature of the high frequency signal is extracted to restore the perceptual quality of coding. On the decoder side, the crest factor and noise level are obtained under the constraint of the high frequency perception parameter. The performance shown in our experimental results is superior to that of the classic methods. Compared with the state-of-the-art method, the proposed method is also competitive because the coding bitrate is significantly reduced while the perceptual quality of coding remains close. This paper also provides a new solution for IoT communications that require low bitrates and high coding quality, such as the Internet of Vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
10. Advancement of 22.2 Multichannel Sound Broadcasting Based on MPEG-H 3D Audio.
- Author
-
Sugimoto, Takehiro, Aoki, Shuichi, Hasegawa, Tomomi, and Komori, Tomoyasu
- Subjects
*BROADCASTING industry, *SOUND systems, *AUDIO equipment, *COMPUTER graphics, *LOUDSPEAKERS
- Abstract
This study proposes improvements to 22.2 multichannel (22.2 ch) sound broadcasting service. 22.2 ch sound is currently used in the 8K satellite broadcasting in Japan. In this study, the audio system is migrated from channel-based audio to object-based audio. The object-based audio equips 22.2 ch sound with alternative and adaptive functionalities: the alternative functionality is related to dialogue controls such as multilingual services, while the adaptive functionality enables 22.2 ch sound to be adapted to the audio format of the playback equipment. Moving Picture Experts Group (MPEG)-H 3D Audio (3DA), which is the latest audio coding standard, is used as the audio coding scheme. A real-time encoder and decoder based on 3DA was developed to verify the practicability of the proposed system. The encoded audio data is packetized and transmitted by MPEG-H MPEG Media Transport (MMT) to be multiplexed with video data. A transmission experiment with 8K video was carried out in which the proposed system was proved to operate as designed in this study. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. Secure echo‐hiding audio watermarking method based on improved PN sequence and robust principal component analysis.
- Author
-
Wang, Shengbei, Wang, Chao, Yuan, Weitao, Wang, Lin, and Wang, Jianming
- Abstract
Echo‐hiding has been widely studied for audio watermarking. This study proposes a more secure echo‐hiding method based on modified pseudo‐noise (PN) sequence and robust principal component analysis (RPCA). In the proposed method, the RPCA is used to decompose the original audio signal into low‐rank and sparse parts and then a pair of opposite modified PN sequences is employed to embed watermarks. The modified PN sequence improves the robustness of watermark detection by providing additional correlation peaks. Meanwhile, benefiting from the RPCA and the opposite PN sequences, the security of the proposed method is improved since watermarks cannot be detected from the whole signal even if the PN sequence is known, which is an obvious improvement compared with the previous PN‐based echo‐hiding methods. In the watermark detection process, the authors make use of the low‐rank and sparse characteristics of the watermarked signal to detect watermarks from the low‐rank and sparse parts, respectively. Based on this basic framework, they also propose a multi‐bit embedding scheme, which doubles the embedding capacity compared with the previous PN‐based echo‐hiding methods. The proposed method was evaluated with respect to inaudibility, security, and robustness. The experimental results verified the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
12. AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio.
- Author
-
Narbutt, Miroslaw, Skoglund, Jan, Allen, Andrew, Chinen, Michael, Barry, Dan, and Hines, Andrew
- Subjects
HEADPHONES, BIT rate, STREAMING audio, VIRTUAL reality, STREAMING media, QUALITY of service, IMAGE compression, VIRTUAL reality software
- Abstract
Featured Application: Streaming spatial audio for immersive audio and virtual reality applications will require compression algorithms that maintain the localization accuracy and directionality attributes of sound sources as well as a high-fidelity quality of experience. Models to evaluate quality will be important for media content streaming applications such as YouTube as well as VR gaming and other immersive multimedia experiences. Spatial audio is essential for creating a sense of immersion in virtual environments. Efficient encoding methods are required to deliver spatial audio over networks without compromising Quality of Service (QoS). Streaming service providers such as YouTube typically transcode content into various bit rates and need a perceptually relevant audio quality metric to monitor users' perceived quality and spatial localization accuracy. The aim of the paper is two-fold. First, it is to investigate the effect of Opus codec compression on the quality of spatial audio as perceived by listeners using subjective listening tests. Secondly, it is to introduce AMBIQUAL, a full reference objective metric for spatial audio quality, which derives both listening quality and localization accuracy metrics directly from the B-format Ambisonic audio. We compare AMBIQUAL quality predictions with subjective quality assessments across a variety of audio samples which have been compressed using the Opus 1.2 codec at various bit rates. Listening quality and localization accuracy of first and third-order Ambisonics were evaluated. Several fixed and dynamic audio sources (single and multiple) were used to evaluate localization accuracy. Results show good correlation regarding listening quality and localization accuracy between objective quality scores using AMBIQUAL and subjective scores obtained during listening tests. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
13. On the Consumption of Multimedia Content Using Mobile Devices: a Year to Year User Case Study.
- Author
-
FALKOWSKI-GILSKI, Przemysław
- Subjects
*MULTIMEDIA systems, *DIGITAL music players, *CONCERT halls, *CASE studies, *MUSIC videos
- Abstract
In the early days, consumption of multimedia content related with audio signals was only possible in a stationary manner. The music player was located at home, with a necessary physical drive. An alternative way for an individual was to attend a live performance at a concert hall or host a private concert at home. To sum up, audio-visual effects were only reserved for a narrow group of recipients. Today, thanks to portable players, vision and sound is at last available for everyone. Finally, thanks to multimedia streaming platforms, every music piece or video, e.g. from one's favourite artist or band, can be viewed anytime and everywhere. The background or status of an individual is no longer an issue. Each person who is connected to the global network can have access to the same resources. This paper is focused on the consumption of multimedia content using mobile devices. It describes a year to year user case study carried out between 2015 and 2019, and describes the development of current trends related with the expectations of modern users. The goal of this study is to aid policymakers, as well as providers, when it comes to designing and evaluating systems and services. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
14. Detection of Frequency-Scale Modification Using Robust Audio Watermarking Based on Amplitude Modulation
- Author
-
Nishimura, Akira, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Shi, Yun-Qing, editor, Kim, Hyoung Joong, editor, Pérez-González, Fernando, editor, and Echizen, Isao, editor
- Published
- 2016
- Full Text
- View/download PDF
15. Reviews on Technology and Standard of Spatial Audio Coding
- Author
-
Ikhwana Elfitri and Amirul Luthfi
- Subjects
spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews fundamental concepts of spatial audio coding, including technology, standards, and applications. The basic principle of the object-based audio reproduction system is also elaborated and compared with the traditional channel-based system, to provide a good understanding of this popular interactive audio reproduction system, which gives end users the flexibility to render their own preferred audio composition.
- Published
- 2017
- Full Text
- View/download PDF
16. An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
- Author
-
Cobos, Maximo, Ahrens, Jens, Kowalczyk, Konrad, and Politis, Archontis
- Published
- 2022
- Full Text
- View/download PDF
17. Time domain synchronisation estimation algorithm for FBMC vector signal analysis in 5G system
- Author
-
Yunzhi Ling, Yu Zhang, and Lantian Xu
- Subjects
synchronisation, time-domain analysis, mobile communication, channel bank filters, ofdm modulation, acoustic signal detection, audio coding, signal quality changes, low signal-to-noise ratio, good synchronisation, frequency domain, power synchronisation, fifth-generation mobile communication network system, filter bank multicarrier test applications, fbmc vector signal analysis, time domain synchronisation estimation algorithm, 5g test applications, fbmc signal vector analysis, fbmc signal vector analysers, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
For the challenges of filter bank multi-carrier (FBMC) test applications in the fifth-generation mobile communication network (5G) system, presented is a time domain synchronisation estimation algorithm based on power synchronisation in frequency domain. Through MATLAB simulation, the proposed algorithm is verified to achieve good synchronisation at low signal-to-noise ratio (SNR) and large frequency offset without signal quality changes. As a result, it can be applied in FBMC signal vector analysers, promoting FBMC signal vector analysis functions used in 5G test applications.
- Published
- 2019
- Full Text
- View/download PDF
18. Stethoscope with digital frequency translation for improved audibility
- Author
-
Herbert M. Aumann and Nuri W. Emanetoglu
- Subjects
hilbert transforms, audio coding, analogue-digital conversion, bioacoustics, cardiology, biomedical equipment, medical signal processing, microcontrollers, modulation, intestinal sounds, analog frequency translator, digital frequency translation, acoustic stethoscope, heart sounds, chest sounds, hilbert transformer, single sideband suppressed carrier modulation, hearing impaired physicians, microcontroller, time delay, audio coder-decoder, frequency 200.0 hz, frequency 72.0 mhz, Medical technology, R855-855.5
- Abstract
The performance of an acoustic stethoscope is improved by translating, without loss of fidelity, heart sounds, chest sounds, and intestinal sounds below 50 Hz into a frequency range of 200 Hz, which is easily detectable by the human ear. Such a frequency translation will be of significant benefit to hearing impaired physicians and it will improve the stethoscope performance in a noisy environment. The technique is based on a single sideband suppressed carrier modulation. Stability and bias problems commonly associated with an analog frequency translator are avoided by an all-digital implementation. Real-time audio processing is made possible by approximating a Hilbert transformer with a time delay. The performance of the digital frequency translator was verified with a 16-bit 44.1 kS/s audio coder/decoder and a 32-bit 72 MHz microcontroller.
- Published
- 2019
- Full Text
- View/download PDF
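The frequency translation described in the abstract above (single sideband suppressed carrier modulation, with the Hilbert transformer approximated by a time delay) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function name, the choice of a quarter-period delay at an assumed centre frequency `centre_hz`, and all parameter values are assumptions.

```python
import math

def translate_up(x, fs, shift_hz, centre_hz):
    """Shift a narrow-band signal up by shift_hz using SSB-SC modulation.

    The 90-degree (Hilbert) branch is approximated by delaying the input by a
    quarter period of centre_hz, which is exact only at that frequency.
    """
    delay = round(fs / (4.0 * centre_hz))  # quarter period, in samples
    out = []
    for n in range(len(x)):
        # Delayed copy stands in for the Hilbert transform of x.
        x_hilbert = x[n - delay] if n >= delay else 0.0
        w = 2.0 * math.pi * shift_hz * n / fs
        # Upper sideband: x*cos(w) - H{x}*sin(w)
        out.append(x[n] * math.cos(w) - x_hilbert * math.sin(w))
    return out
```

For example, with `fs = 8000`, a 50 Hz cosine, `centre_hz = 50`, and `shift_hz = 150`, the output after the initial delay is a 200 Hz cosine. Components away from `centre_hz` receive a phase shift other than 90 degrees, so some of their energy leaks into the lower sideband; that is the cost of replacing a true Hilbert transformer with a plain delay.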
19. Reversible and Robust Audio Watermarking Based on Spread Spectrum and Amplitude Expansion
- Author
-
Nishimura, Akira, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Shi, Yun-Qing, editor, Kim, Hyoung Joong, editor, Pérez-González, Fernando, editor, and Yang, Ching-Nung, editor
- Published
- 2015
- Full Text
- View/download PDF
20. Time domain synchronisation estimation algorithm for FBMC vector signal analysis in 5G system.
- Author
-
Ling, Yunzhi, Zhang, Yu, and Xu, Lantian
- Subjects
SYNCHRONIZATION, FILTER banks, SIGNAL-to-noise ratio, 5G networks, MOBILE communication systems
- Abstract
For the challenges of filter bank multi-carrier (FBMC) test applications in the fifth-generation mobile communication network (5G) system, presented is a time domain synchronisation estimation algorithm based on power synchronisation in frequency domain. Through MATLAB simulation, the proposed algorithm is verified to achieve good synchronisation at low signal-to-noise ratio (SNR) and large frequency offset without signal quality changes. As a result, it can be applied in FBMC signal vector analysers, promoting FBMC signal vector analysis functions used in 5G test applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
21. An effective hybrid low delay packet loss concealment algorithm for MDCT-based audio codec.
- Author
-
Lin, Zhibin, Lu, Jing, and Qiu, Xiaojun
- Subjects
*AUDIO codec, *AUDITORY perception, *SECRECY, *MUSIC & language, *ALGORITHMS
- Abstract
This paper proposes a hybrid packet loss concealment (PLC) algorithm for the MDCT-based audio codec with different PLC strategies on tone dominant source signals and noise like signals respectively. It is meaningful to find that the phase angle of the MDCT-MDST coefficients decreases linearly with the increase of the frame index but the amplitude keeps unchanged for the stationary source signal with dominant tonal components. Therefore an efficient frame interpolation method is designed to accurately estimate the phase angle and the magnitude of the MDCT-MDST coefficients of the lost frame. For the noise-like signals without overwhelming tonal components, a modified shaped-noise insertion is proposed to improve the audio perception. Both objective and subjective test results show that the proposed algorithm provides better performance than the existing ones for both music and voiced speech signals. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
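The observation in the abstract above, that for stationary tonal signals the phase of the MDCT-MDST coefficients changes linearly with the frame index while the amplitude stays constant, suggests a simple per-bin interpolation for a lost frame. The sketch below illustrates that idea only; it is not the authors' algorithm. It assumes each frame is given as a list of complex numbers (MDCT as the real part, MDST as the imaginary part), that the frame following the lost one is available, and that the per-frame phase step is smaller than π/2 in magnitude so the wrapped two-frame difference is unambiguous.

```python
import cmath

def conceal_lost_frame(prev_frame, next_frame):
    """Estimate the complex MDCT-MDST coefficients of a lost frame.

    prev_frame, next_frame: lists of complex coefficients from the frames
    immediately before and after the lost one (hypothetical layout).
    """
    estimate = []
    for p, q in zip(prev_frame, next_frame):
        # Amplitude is assumed constant, so average the two neighbours.
        magnitude = 0.5 * (abs(p) + abs(q))
        # Phase change across two frames, wrapped into (-pi, pi].
        two_frame_step = cmath.phase(q * p.conjugate())
        # Linear phase evolution: the lost frame sits halfway.
        phase = cmath.phase(p) + 0.5 * two_frame_step
        estimate.append(cmath.rect(magnitude, phase))
    return estimate
```

For noise-like frames this interpolation has no such justification, which is consistent with the abstract's use of a separate shaped-noise insertion strategy for those signals.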
22. Modelling Timbral Hardness.
- Author
-
Pearce, Andy, Brookes, Tim, and Mason, Russell
- Subjects
METADATA, HARDNESS, ACOUSTIC filters, SOUNDS, LOUDNESS, PSYCHOACOUSTICS, REGRESSION analysis
- Abstract
Featured Application: The model of timbral hardness described in this study is expected to be used for the searching and filtering of sound effects. Hardness is the most commonly searched timbral attribute within freesound.org, a commonly used online sound effect repository. A perceptual model of hardness was developed to enable the automatic generation of metadata to facilitate hardness-based filtering or sorting of search results. A training dataset was collected of 202 stimuli with 32 sound source types, and perceived hardness was assessed by a panel of listeners. A multilinear regression model was developed on six features: maximum bandwidth, attack centroid, midband level, percussive-to-harmonic ratio, onset strength, and log attack time. This model predicted the hardness of the training data with R² = 0.76. It predicted hardness within a new dataset with R² = 0.57, and predicted the rank order of individual sources perfectly, after accounting for the subjective variance of the ratings. Its performance exceeded that of human listeners. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
23. Reversible and Robust Audio Watermarking Based on Quantization Index Modulation and Amplitude Expansion
- Author
-
Nishimura, Akira, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Shi, Yun Qing, editor, Kim, Hyoung-Joong, editor, and Pérez-González, Fernando, editor
- Published
- 2014
- Full Text
- View/download PDF
24. BPL-PLC Voice Communication System for the Oil and Mining Industry
- Author
-
Grzegorz Debita, Przemysław Falkowski-Gilski, Marcin Habrych, Grzegorz Wiśniewski, Bogdan Miedziński, Przemysław Jedlikowski, Agnieszka Waniewska, Jan Wandzio, and Bartosz Polnik
- Subjects
audio coding, digital systems, electrical engineering, ICT, Industry 4.0, IoT, Technology
- Abstract
The application of high-efficiency voice communication systems based on broadband over power line-power line communication (BPL-PLC) technology in medium voltage networks, including hazardous areas (like the oil and mining industry), as a redundant means of wired communication (apart from traditional fiber optics and electrical wires) can be beneficial. Due to the possibility of utilizing existing electrical infrastructure, it can significantly reduce deployment costs. Additionally, it can be applied under difficult conditions, thanks to battery-powered devices. During an emergency situation (e.g., after a coal dust explosion), the medium voltage cables are resistant to mechanical damage, providing a potentially life-saving communication link between the supervisor, rescue team, paramedics, and the trapped personnel. The assessment of such a system requires a comprehensive and accurate examination, including a number of factors. Therefore, various models were tested, considering different transmission paths and types of coupling (inductive and capacitive), as well as various lengths of transmitted data packets. Next, a subjective quality evaluation study was carried out, considering speech signals from a number of languages (English, German, and Polish). Based on the obtained results, including both simulations and measurements, appropriate practical conclusions were formulated. Results confirmed the applicability of BPL-PLC technology as an efficient voice communication system for the oil and mining industry.
- Published
- 2020
- Full Text
- View/download PDF
25. AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio
- Author
-
Miroslaw Narbutt, Jan Skoglund, Andrew Allen, Michael Chinen, Dan Barry, and Andrew Hines
- Subjects
virtual reality, spatial audio, Ambisonics, audio coding, audio compression, Opus codec, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Spatial audio is essential for creating a sense of immersion in virtual environments. Efficient encoding methods are required to deliver spatial audio over networks without compromising Quality of Service (QoS). Streaming service providers such as YouTube typically transcode content into various bit rates and need a perceptually relevant audio quality metric to monitor users’ perceived quality and spatial localization accuracy. The aim of the paper is two-fold. First, it is to investigate the effect of Opus codec compression on the quality of spatial audio as perceived by listeners using subjective listening tests. Secondly, it is to introduce AMBIQUAL, a full reference objective metric for spatial audio quality, which derives both listening quality and localization accuracy metrics directly from the B-format Ambisonic audio. We compare AMBIQUAL quality predictions with subjective quality assessments across a variety of audio samples which have been compressed using the Opus 1.2 codec at various bit rates. Listening quality and localization accuracy of first and third-order Ambisonics were evaluated. Several fixed and dynamic audio sources (single and multiple) were used to evaluate localization accuracy. Results show good correlation regarding listening quality and localization accuracy between objective quality scores using AMBIQUAL and subjective scores obtained during listening tests.
- Published
- 2020
- Full Text
- View/download PDF
26. Adaptive Selection of Embedding Locations for Spread Spectrum Watermarking of Compressed Audio
- Author
-
Koz, Alper, Delpha, Claude, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Shi, Yun Qing, editor, Kim, Hyoung-Joong, editor, and Perez-Gonzalez, Fernando, editor
- Published
- 2012
- Full Text
- View/download PDF
27. Uniform Transition Tree-Structured Critically Sampled Filterbanks
- Author
-
Wixen, Ryan
- Subjects
Signal processing ,Filterbanks ,Audio coding ,Electrical engineering - Abstract
Critically sampled filterbanks are useful in applications like audio coding that involve processing the time-varying spectral characteristics of signals. Critically sampled filterbanks can be implemented with a tree-structure. The odd-numbered subbands of a critically sampled filterbank exhibit frequency inversion, causing subbands to be unordered in frequency past the first layer. We show how subbands can be swapped at each layer to maintain their ordering. Self-similar filterbanks, using the same filters at each layer of the tree, have nonuniform transition widths and long impulse responses. We explore using filters with wider transition widths at deeper layers, and we demonstrate that this technique can be used to implement an efficient uniform transition filterbank. We compare this uniform transition filterbank with a self-similar filterbank in an audio codec, showing that the uniform transition filterbank achieves a smaller maximum transition width at a lower impulse response length.
- Published
- 2023
- Full Text
- View/download PDF
28. Frequency Domain Linear Prediction for QMF Sub-bands and Applications to Audio Coding
- Author
-
Motlicek, Petr, Ganapathy, Sriram, Hermansky, Hynek, Garudadri, Harinath, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Popescu-Belis, Andrei, editor, Renals, Steve, editor, and Bourlard, Hervé, editor
- Published
- 2008
- Full Text
- View/download PDF
29. Modeling Audio with Damped Sinusoids Using Total Least Squares Algorithms
- Author
-
Verhelst, Werner, Hermus, K., Lemmerling, P., Wambacq, P., Van Huffel, S., Van Huffel, Sabine, editor, and Lemmerling, Philippe, editor
- Published
- 2002
- Full Text
- View/download PDF
30. 22.2 ch Audio Encoding/Decoding Hardware System Based on MPEG-4 AAC.
- Author
-
Sugimoto, Takehiro, Nakayama, Yasushige, and Komori, Tomoyasu
- Subjects
*MPEG (Video coding standard) , *MULTIMEDIA communications , *SOUND systems , *DECODERS (Electronics) , *BIT rate - Abstract
A 22.2 multichannel (22.2 ch) sound system has been adopted as an audio system for 8K Super Hi-Vision (8K). The 22.2 ch sound system is an advanced sound system composed of 24 channels three-dimensionally located in a space to envelop listeners in an immersive sound field. NHK has been working on standardizing and developing an 8K broadcasting system via a broadcasting satellite in time for test broadcasting in 2016. For an audio coding scheme, NHK developed a world-first 22.2 ch audio encoding/decoding hardware system (22.2 ch audio codec) capable of real time encoding/decoding. The fabricated 22.2 ch audio codec is based on MPEG-4 AAC and was assembled into the 8K codec together with the 8K video codec and the multiplexer. The audio quality of the fabricated 22.2 ch audio codec was assessed in an objective evaluation, and the evaluation results revealed the operational bit rates of the fabricated codec. An 8K satellite broadcasting experiment was carried out as a final verification test of the 8K broadcasting system, and 22.2 ch audio codec was found to be valid. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
31. Delivering Scalable Audio Experiences using AC-4.
- Author
-
Riedmiller, Jeffrey, Kjorling, Kristofer, Roden, Jonas, Wolters, Martin, Biswas, Arijit, Boon, Prinyar, Carroll, Tim, Ekstrand, Per, Groschel, Alexander, Hedelin, Per, Hirvonen, Toni, Horich, Holger, Klejsa, Janusz, Koppens, Jeroen, Krauss, Kurt, Lehtonen, Heidi-Maria, Linzmeier, Karsten, Mehta, Sripal, Muesch, Hannes, and Mundt, Harald
- Subjects
*AUDIO codec , *DVB-H (Standard) , *INTERNET access control , *VIDEO recording , *IMAGE quality analysis - Abstract
AC-4 is a state-of-the-art audio codec standardized in ETSI (TS 103 190 and TS 103 190-2) and included in the DVB toolbox (TS 101 154 V2.2.1 and DVB BlueBook A157) and, at the time of writing, is a candidate standard for ATSC 3.0 as per A/342 part 2. AC-4 is an audio codec designed to address the current and future needs of video and audio entertainment services, including broadcast and Internet streaming. As such, it incorporates a number of features beyond the traditional audio coding algorithms, such as capabilities to support immersive and personalized audio, support for advanced loudness management, video-frame synchronous coding, dialog enhancement, etc. This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used, the systemic features included, and give an overview of performance and applications. It further outlines metadata aspects (immersive and personalized, essential for broadcast), metadata carriage, aspects of interchange of immersive programming, as well as immersive playback and rendering. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
32. Bandwidth extension method based on nonlinear audio characteristics classification
- Author
-
Li-yan ZHANG, Chang-chun BAO, Xin LIU, and Xing-tao ZHANG
- Subjects
audio coding ,bandwidth extension ,audio classification ,recurrence plot ,recurrence quantification analysis ,Telecommunication ,TK5101-6720 - Abstract
A bandwidth extension method based on audio classification was proposed. Time series of audio signals were classified into four types based on recurrence plots and recurrence quantification analysis, and the fine spectra were recovered using a separate method for each of the four types. In addition, the spectrum envelope and energy gain were adjusted by a Gaussian mixture model and by codebook mapping on the basis of soft decision, respectively. Subjective and objective testing results indicate that the proposed method has good quality compared with conventional blind bandwidth extension methods, and that the performance of the ITU-T G.722.1 codec with the proposed algorithm is better than that of the G.722.1C codec at the same bit rate.
- Published
- 2013
- Full Text
- View/download PDF
33. Modelling Timbral Hardness
- Author
-
Andy Pearce, Tim Brookes, and Russell Mason
- Subjects
audio coding ,artificial intelligence ,sound recording ,sound quality ,psychoacoustics ,timbre ,modelling ,perception ,music information retrieval ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Hardness is the most commonly searched timbral attribute within freesound.org, a commonly used online sound effect repository. A perceptual model of hardness was developed to enable the automatic generation of metadata to facilitate hardness-based filtering or sorting of search results. A training dataset was collected of 202 stimuli with 32 sound source types, and perceived hardness was assessed by a panel of listeners. A multilinear regression model was developed on six features: maximum bandwidth, attack centroid, midband level, percussive-to-harmonic ratio, onset strength, and log attack time. This model predicted the hardness of the training data with R² = 0.76. It predicted hardness within a new dataset with R² = 0.57, and predicted the rank order of individual sources perfectly, after accounting for the subjective variance of the ratings. Its performance exceeded that of human listeners.
- Published
- 2019
- Full Text
- View/download PDF
34. Practical Issues of Dynamic Bit Allocation in Multimedia Source Compression
- Author
-
Ma, Kai-Kuang, Mital, Dinesh P., Terashima, Nobuyoshi, editor, and Altman, Edward, editor
- Published
- 1996
- Full Text
- View/download PDF
35. 8~64kbit/s super-wideband embedded speech and audio coding algorithm
- Author
-
JIA Mao-shen, BAO Chang-chun, and LI Rui
- Subjects
speech processing ,speech coding ,audio coding ,embedded coding ,Telecommunication ,TK5101-6720 - Abstract
Based on the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation G.729.1 and modified modulated lapped transform (MLT) coding, a super-wideband embedded variable bit-rate speech and audio coding algorithm was proposed, with bit rates ranging from 8 kbit/s to 64 kbit/s. The information in the 0~7 kHz band was encoded by the G.729.1 codec at 8~32 kbit/s, the information in the 7~14 kHz band was encoded by transform coding at 36, 40 and 48 kbit/s, and the MDCT (modified discrete cosine transform) of the G.729.1 residual signal was encoded by transform coding at 56 and 64 kbit/s. Objective and subjective listening tests show that this codec performs well compared with the Terms of Reference given by ITU-T.
- Published
- 2009
36. MPEG-H 3D Audio: Immersive Audio Coding
- Author
-
Herre, Jürgen, Quackenbush, S.R., and Publica
- Subjects
Audio data reduction ,Audio coding ,Audio compression ,Immersive audio ,MPEG ,3D audio ,MPEG-H - Abstract
The term "Immersive Audio" is frequently used to describe an audio experience that provides to the listener the sensation of being fully immersed or "present" in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above and below listener ear level) and binaural audio to headphones. This article provides an overview of the recent MPEG standard, MPEG-H 3D Audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, higher order ambisonics), and is now being adopted in broadcast and streaming applications.
- Published
- 2022
37. Audio Coding Using Overlap and Kernel Adaptation.
- Author
-
Helmrich, Christian R. and Edler, Bernd
- Subjects
AUDIO codec ,KERNEL functions ,ENCODING ,QUANTIZATION (Physics) ,ELECTRIC switchgear - Abstract
Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features—transform length, window shape, transform kernel, and overlap ratio switching—into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
38. Parametric Coding of Stereo Audio
- Author
-
Erik Schuijers, Armin Kohlrausch, Steven van de Par, and Jeroen Breebaart
- Subjects
parametric stereo ,audio coding ,perceptual audio coding ,stereo coding ,Telecommunication ,TK5101-6720 ,Electronics ,TK7800-8360 - Abstract
Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
- Published
- 2005
- Full Text
- View/download PDF
39. Gaussian channel transmission of images and audio files using cryptcoding
- Author
-
Aleksandra Popovska-Mitrovikj, Vesna Dimitrova, Vladimir Ilievski, Daniela Mechkaroska, Verica Bakeva, and Boro Jakimovski
- Subjects
packet error rate ,rcbq performance ,decoding ,error-correcting codes ,information security ,Computer science ,channel coding ,quasigroups ,audio files ,image files ,Data_CODINGANDINFORMATIONTHEORY ,02 engineering and technology ,cut-decoding algorithm ,decoding speed ,0203 mechanical engineering ,0202 electrical engineering, electronic engineering, information engineering ,random codes ,Electrical and Electronic Engineering ,image coding ,Computer Science::Information Theory ,cryptography ,gaussian channel transmission ,020206 networking & telecommunications ,020302 automobile design & engineering ,cryptcoding ,4-sets-cut-decoding algorithms ,decoded image ,Computer Science Applications ,error correction codes ,audio coding ,Gaussian channels ,bit-error rate ,Bit error rate ,coding algorithm ,gaussian channels ,Algorithm ,error statistics ,Decoding methods - Abstract
Random codes based on quasigroups (RCBQ) are cryptcodes, i.e. they are error-correcting codes, which provide information security. Cut-Decoding and 4-Sets-Cut-Decoding algorithms for these codes are defined elsewhere. Also, the performance of these codes for the transmission of text messages is investigated elsewhere. In this study, the authors investigate the RCBQ's performance with Cut-Decoding and 4-Sets-Cut-Decoding algorithms for transmission of images and audio files through a Gaussian channel. They compare experimental results for both coding/decoding algorithms and for different values of signal-to-noise ratio. In all experiments, the differences between the transmitted and decoded image or audio file are considered. Experimentally obtained values for bit-error rate and packet error rate and the decoding speed of both algorithms are compared. Also, two filters for enhancing the quality of the images decoded using RCBQ are proposed.
- Published
- 2019
- Full Text
- View/download PDF
40. Speech Coding, Synthesis, and Compression
- Author
-
Farouk, Mohamed Hesham, Neustein, Amy, Series editor, and Farouk, Mohamed Hesham
- Published
- 2014
- Full Text
- View/download PDF
41. Audio security through compressive sampling and cellular automata.
- Author
-
George, Sudhish, Augustine, Nishanth, and Pattathil, Deepthi
- Subjects
COMPRESSED sensing ,SOUND recording & reproducing ,DATA encryption ,CELLULAR automata ,DATA compression ,RANDOM matrices - Abstract
In this paper, a new approach for scrambling the compressive sensed (CS) audio data using two dimensional cellular automata is presented. In order to improve the security, linear feedback shift register (LFSR) based secure measurement matrix for compressive sensing is used. The basic idea is to select the different states of LFSR as the entries of a random matrix and orthonormalize these values to generate a Gaussian random measurement matrix. It is proposed to generate the initial state matrix of cellular automata using an LFSR based random bitstream generator. In order to improve the security and key space of the proposed cryptosystem, piecewise linear chaotic map (PWLCM) based initial seeds generation for LFSRs is used. In the proposed approach, the initial value, parameter value and the number of iterations of PWLCM are kept as secret to provide security. The proposed audio encryption method for CS audio data is validated with different compressive sensing reconstruction approaches. Experimental and analytical verification shows that the proposed encryption system gives good reconstruction performance, robustness to noise, high level of scrambling and good security against several forms of attack. Moreover, since the measurement matrix used for CS operation and the initial state matrix used for 2D cellular automata are generated using the secret key, the storage/transmission requirement of the same can be avoided. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
42. An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching.
- Author
-
Chang, Tai-Ming, Hsieh, Chia-Bin, and Chang, Pao-Chi
- Subjects
AUDIO codec ,DIGITAL image processing ,DIGITIZATION ,INFORMATION processing ,INFORMATION retrieval - Abstract
With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from the compressed files to represent music greatly benefits processing speed when working on a large database. In this study, we focused on advanced audio coding (AAC) files and analyzed the disparity in frequency expression between discrete Fourier transform and discrete cosine transform, considered the frequency resolution to select the appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An added challenge to using AAC files directly is long/short window switching, ignoring which may result in inaccurate frequency mapping and inefficient information retrieval. For a short window in particular, we propose a peak-competition method to enhance the pitch information that does not include ambiguous frequency components when combining eight subframes. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. Our experimental results show that the proposed method increased the accuracy rate by approximately 7 % in Top-1 search results over transform-domain methods described previously and performed nearly as effectively as state-of-the-art waveform-domain approaches did. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
43. Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications.
- Author
-
Pocta, Peter and Beerends, John G.
- Subjects
*AUDIO codec , *DIGITAL audio , *BROADCAST channels , *IMAGE quality analysis , *STREAMING media - Abstract
This paper investigates the impact of different audio codecs typically deployed in current digital audio broadcasting (DAB) systems and web-casting applications, which represent a main source of quality impairment in these systems and applications, on the quality perceived by the end user. Both subjective and objective assessments are used. Two different audio quality prediction models, namely Perceptual Evaluation of Audio Quality (PEAQ) and Perceptual Objective Listening Quality Assessment (POLQA) Music, are evaluated by comparing the predictions with subjectively obtained grades. The results show that the degradations introduced by the typical lossy audio codecs deployed in current DAB systems and web-casting applications, operating at the lowest bit rate typically used in these distribution systems and applications, seriously impact the subjective audio quality perceived by the end user. Furthermore, it is shown that a retrained POLQA Music provides the best overall correlations between predicted objective measurements and subjective scores, making it possible to predict the final perceived quality with good accuracy when scores are averaged over a small set of musical fragments (R = 0.95). [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
44. MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio.
- Author
-
Herre, Jurgen, Hilpert, Johannes, Kuntz, Achim, and Plogsties, Jan
- Abstract
The science and art of Spatial Audio is concerned with the capture, production, transmission, and reproduction of an immersive sound experience. Recently, a new generation of spatial audio technology has been introduced that employs elevated and lowered loudspeakers and thus surpasses previous ‘surround sound’ technology without such speakers in terms of listener immersion and potential for spatial realism. In this context, the ISO/MPEG standardization group has started the MPEG-H 3D Audio development effort to facilitate high-quality bitrate-efficient production, transmission and reproduction of such immersive audio material. The underlying format is designed to provide universal means for carriage of channel-based, object-based and Higher Order Ambisonics based input. High quality reproduction is provided for many output formats from 22.2 and beyond down to 5.1, stereo and binaural reproduction—independently of the original encoding format, thus overcoming the incompatibility between various 3D formats. This paper provides an overview of the MPEG-H 3D Audio project and technology and an assessment of the system capabilities and performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
45. ITU-T SG16 Sapporo Meeting Report.
- Author
-
Kiyoshi Tanaka, Shohei Matsuo, Hitoshi Ohmuro, and Seiichi Sakaya
- Subjects
*STREAMING video & television , *MULTIMEDIA systems , *VIDEO coding , *TELECOMMUNICATION conferences , *CONFERENCES & conventions - Abstract
The article discusses the highlights of the International Telecommunication Union-Telecommunication Standardization Sector Study Group 16 (ITU-T SG16) meeting held from June 30 to July 11, 2014 at the Sapporo Convention Center in Hokkaido, Japan. Topics tackled at the event include Internet protocol television (IPTV), accessibility of multimedia services and systems, video coding, and robust transmission technology.
- Published
- 2015
- Full Text
- View/download PDF
46. Finite‐state entropy‐constrained vector quantiser for audio modified discrete cosine transform coefficients uniform quantisation.
- Author
-
Jiang, Sumxin, Yin, Rendong, and Liu, Peilin
- Abstract
In this paper, an entropy‐constrained vector quantiser (ECVQ) scheme with finite memory, called finite‐state ECVQ (FS‐ECVQ), is presented. This scheme consists of a finite‐state vector quantiser (FSVQ) and multiple component ECVQs. By utilising the FSVQ, the inter‐frame dependencies within the source sequence can be effectively exploited and no side information needs to be transmitted. By employing the ECVQs, the total memory requirements of FS‐ECVQ can be efficiently decreased while the coding performance is improved. An FS‐ECVQ, designed for coding modified discrete cosine transform coefficients, was implemented and evaluated based on the unified speech and audio coding (USAC) scheme. Results showed that the FS‐ECVQ reduced the total memory requirements by 92.3% compared with the encoder in USAC working draft 6 (WD6), and by over 10% compared with the encoder in the USAC final version (FINAL), while maintaining coding performance similar to FINAL, which was about 4% better than that of WD6. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
47. IEEE Standard for System of Advanced Audio and Video Coding.
- Subjects
TRANSPORT protocols (Computer network protocols) ,ELECTRIC standards ,VIDEO codecs ,AUDIO codec - Abstract
Storage file formats and real-time transport protocol (RTP) payload formats for IEEE 1857™ video and IEEE 1857.2™ audio are defined. The storage of video and audio not only uses the existing capabilities of the ISO base media file format, but also defines extensions to support specific features of the IEEE 1857 video codec and IEEE 1857.2 audio codec. The target applications and services include but are not limited to Internet media streaming, IPTV, video conference, video telephony, and video-on-demand. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
48. Comparison of windowing in speech and audio coding.
- Author
-
Backstrom, Tom
- Abstract
Speech and audio coding have during the last decade converged to an increasingly unified technology. This contribution discusses one of the remaining fundamental differences between speech and audio paradigms, namely, windowing of the input signal. Audio codecs generally use lapped transforms and apply a perceptual model in the transform domain, whereby temporal continuity is achieved by windowing and overlap-add. Speech codecs on the other hand achieve temporal continuity by using linear predictive filtering, whereby windowing is applied in the residual domain. Despite these fundamental differences, we demonstrate that the two windowing approaches, combined with perceptual modeling, perform very similarly both in terms of perceptual quality and theoretical properties. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
49. MDCT domain parametric stereo audio coding.
- Author
-
Suresh, K. and Raj, R Akhil
- Abstract
Parametric coding of multichannel audio has gained popularity for low bit rate audio coding applications such as digital audio broadcasting. Most of the existing algorithms use MDCT domain techniques for compressing the audio, while the spatialization parameter estimation is done in a different time-frequency domain. An MDCT domain parametric stereo coding algorithm which represents the stereo channels as the linear combination of the ‘sum’ channel derived from the stereo channels and a reverberated channel generated from the ‘sum’ channel has been reported in the literature. Spatialization parameters are estimated at the encoder by taking the scaled sub-band projections of the stereo channels on the ‘sum’ and reverberated channels. This model is inadequate to represent the stereo image since only four parameters per sub-band are used as spatialization parameters. In this work we improve the perceptual quality of this MDCT domain parametric coder with an augmented parameter extraction scheme using an additional reverberated channel. Subjective evaluation using the MUSHRA test shows that the new algorithm significantly increases the perceptual quality of the encoded audio signal. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
50. An MPEG-4 AAC decoder FPGA implementation for the Brazilian digital television.
- Author
-
Renner, Adriano and Susin, Altamiro Amadeu
- Abstract
This paper presents an MPEG-4 AAC decoder described in VHDL language and compliant with the Brazilian Digital Television standard (SBTVD). It has been synthesized to an Altera Cyclone II 2C35 FPGA using 26549 logic elements and 248704 memory bits. The implemented architecture has been verified using an Altera DE2 prototyping board, being capable of decoding stereo signals coded as MPEG-4 AAC Low Complexity audio objects. The minimum operating frequency required for real time decoding of a stereo audio stream with a sampling rate of 48 kHz is 4 MHz and the implemented decoder is capable of running at 56 MHz, meeting the requirements. This decoder design is intended to be integrated with a system on chip for the SBTVD set-top box. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF