227 results for "Audio coding"
Search Results
2. MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks.
- Author
-
Kemper, Guillermo, Sanchez, Alonso, and Serpa, Sergio
- Abstract
The Moving Picture Experts Group - 1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172–3. Currently, there is no general framework to emulate either the MPEG-1 psychoacoustic model or any other psychoacoustic model, which is a core component of many perceptual codecs. This work presents a successful implementation of a convolutional neural network which emulates psychoacoustic model 1 from the MPEG-1 standard, termed "MCNN-PM" (Multiscale Convolutional Neural Network – Psychoacoustic Model). It is then implemented as part of the MPEG-1, Layer I codec. Using the objective difference grade (ODG) to evaluate audio quality, the MCNN-PM MPEG-1, Layer I codec outperforms the original MPEG-1, Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. This work shows that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
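The entry above describes a CNN that emulates MPEG-1 psychoacoustic model 1. As a rough illustration of the idea, the sketch below (PyTorch) maps a frame's log-power spectrum to 32 per-subband masking values through parallel convolution branches at different spectral scales; the layer sizes, kernel widths, and output interpretation are assumptions for illustration and do not reproduce the MCNN-PM architecture.

```python
# Hypothetical sketch (not the authors' MCNN-PM): a multiscale 1-D CNN mapping a
# 512-bin log-power spectrum of one frame to 32 per-subband masking values, the kind
# of quantity MPEG-1 psychoacoustic model 1 feeds to the Layer I bit allocator.
import torch
import torch.nn as nn

class MultiscaleMaskingCNN(nn.Module):
    def __init__(self, n_bins=512, n_bands=32):
        super().__init__()
        # Parallel branches with different kernel sizes capture narrow tonal
        # maskers and broad noise maskers at different spectral scales.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, 8, k, padding=k // 2), nn.ReLU())
            for k in (3, 9, 27)
        ])
        self.head = nn.Sequential(
            nn.Conv1d(24, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(n_bands),      # collapse to one value per subband
            nn.Conv1d(16, 1, 1),
        )

    def forward(self, log_power_spectrum):      # shape: (batch, 1, n_bins)
        feats = torch.cat([b(log_power_spectrum) for b in self.branches], dim=1)
        return self.head(feats).squeeze(1)      # shape: (batch, n_bands)

model = MultiscaleMaskingCNN()
frame = torch.randn(4, 1, 512)                  # dummy log-power spectra
print(model(frame).shape)                       # torch.Size([4, 32])
```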
3. Deep convolutional neural networks for double compressed AMR audio detection
- Author
-
Aykut Büker and Cemal Hanilçi
- Subjects
audio coding, audio recording, audio signal processing, data compression, feature extraction, signal classification, Telecommunication, TK5101-6720
- Abstract
Detection of double compressed (DC) adaptive multi-rate (AMR) audio recordings is a challenging audio forensic problem and has received great attention in recent years. Here, the authors propose to use convolutional neural networks (CNN) for DC AMR audio detection. The CNN is used as (i) an end-to-end DC AMR audio detection system and (ii) a feature extractor. The end-to-end system receives the audio spectrogram as the input and returns the decision whether the input audio is single compressed (SC) or DC. As a feature extractor, in turn, it is used to extract discriminative features which are then modelled using a support vector machine (SVM) classifier. Our extensive analysis conducted on four different datasets shows the success of the proposed system and provides new findings related to the problem. Firstly, double compression has a considerable impact on the high frequency components of the signal. Secondly, the proposed system yields great performance independent of the recording device or environment. Thirdly, when previously altered files are used in the experiments, a 97.41% detection rate is obtained with the CNN system. Finally, the cross-dataset evaluation experiments show that the proposed system is very effective in the case of a mismatch between training and test datasets.
- Published
- 2021
- Full Text
- View/download PDF
4. Highly Efficient Audio Coding With Blind Spectral Recovery Based on Machine Learning.
- Author
-
Kim, Jae-Won, Beack, Seung Kwon, Lim, Wootaek, and Park, Hochong
- Subjects
MACHINE learning, VIDEO coding, MUSIC conducting, AUDIO equipment, INFORMATION processing
- Abstract
This letter proposes a new method for audio coding that utilizes blind spectral recovery to improve the coding efficiency without compromising performance. The proposed method transmits only a fraction of the spectral coefficients, thereby reducing the coding bit rate. Then, it recovers the remaining coefficients in the decoder using the transmitted coefficients as input. The proposed method is differentiated from conventional spectral recovery in that the coefficients to be recovered are interleaved with the transmitted coefficients to obtain the most data correlation. Further, it enhances the transmitted coefficients, which are degraded by quantization errors, to deliver better information to the recovery process. The spectral recovery is conducted recursively on a band basis such that information recovered in one band is used for the recovery in subsequent bands. An improved level correction for the recovered coefficients and a new sign coding are also developed. A subjective performance evaluation confirms that the proposed method at 40 kbps provides statistically equivalent sound quality to a state-of-the-art coding method at 48 kbps for speech and music categories. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. MPEG Standards for Compressed Representation of Immersive Audio.
- Author
-
Quackenbush, Schuyler R. and Herre, Jurgen
- Subjects
MPEG (Video coding standard), MULTI-degree of freedom, SINGLE-degree-of-freedom systems, HEADPHONES, LOUDSPEAKERS, AUGMENTED reality
- Abstract
The term “immersive audio” is frequently used to describe an audio experience that provides the listener the sensation of being fully immersed or “present” in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above, and below listener ear level), and binaural audio to headphones. This article provides an overview of two recent standards that support the bitrate-efficient carriage of high-quality immersive sound. The first is MPEG-H 3D audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, and higher order ambisonics) and is now being adopted in broadcast and streaming applications. The second is MPEG-I immersive audio, an extension of 3D audio, currently under development, which is targeted for virtual and augmented reality applications. This will support rendering of fully user-interactive immersive sound for three degrees of user movement [three degrees of freedom (3DoF)], i.e., yaw, pitch, and roll head movement, and for six degrees of user movement [six degrees of freedom (6DoF)], i.e., 3DoF plus translational x, y, and z user position movements. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
6. Digital audio signal watermarking using minimum‐energy scaling optimisation in the wavelet domain.
- Author
-
Hsu, Chih‐Yu, Tu, Shu‐Yi, Yang, Chao‐Tung, Chang, Ching‐Lung, and Chen, Shuo‐Tsung
- Abstract
This work contributes three innovative concepts for quantisation audio watermarking in the wavelet domain: an improved model, a two-stage Lagrange principle, and minimum-energy scaling optimisation. First, discrete wavelet transform (DWT) multi-coefficient quantisation, composed of arbitrary scaling of the lowest DWT coefficients, is connected with the group-based signal-to-noise ratio (SNR) of these coefficients in a single model. Then, the two-stage Lagrange principle and the minimum-energy approach play two essential roles in obtaining the optimal scaling factors. With the proposed scheme, the best fidelity and robustness of the embedded audio can be attained; a perceptual evaluation of audio quality (PEAQ) test, with an illustration of the relationship between SNR and PEAQ, is also performed. Simulation results show that each audio file watermarked by the proposed method attains a high SNR, a good PEAQ score, and a low bit error rate (BER). The SNR of most watermarked audio files is above 35, or even above 40, and the corresponding subjective difference grade of PEAQ is close to 0. In terms of BER, most values are as low as 2% or less, indicating good robustness against many attacks such as re-sampling, amplitude scaling, and MP3 compression. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
7. A new source‐filter model audio bandwidth extension using high frequency perception feature for IoT communications.
- Author
-
Jiang, Lin, Yu, Shaoqian, Wang, Xiaochen, Wang, Chao, and Wang, Tonghan
- Subjects
BANDWIDTHS, SENSORY perception
- Abstract
Audio coding is a generic technology for IoT application systems. Audio bandwidth extension is a standard technique within contemporary audio codecs to efficiently code audio signals at low bitrates. In existing methods, the high frequency signal is generated by duplicating the corresponding low frequency content together with a few high frequency parameters. However, the perceptual coding quality degrades significantly if the correlation between the high and low frequencies becomes weak. In this paper, we propose a new source-filter model audio bandwidth extension method, in which a perceptual feature of the high frequency signal is extracted to restore the coding quality. On the decoder side, the crest factor and noise level are obtained under the constraint of the high frequency perception parameter. The performance shown in our experimental results is superior to that of classic methods. Compared with the state-of-the-art method, the proposed method is also competitive, because the coding bitrate is significantly reduced while a similar perceptual coding quality is maintained. This paper also provides a new solution for IoT communications that require low bitrates and high coding quality, such as the Internet of Vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
8. Advancement of 22.2 Multichannel Sound Broadcasting Based on MPEG-H 3D Audio.
- Author
-
Sugimoto, Takehiro, Aoki, Shuichi, Hasegawa, Tomomi, and Komori, Tomoyasu
- Subjects
*BROADCASTING industry, *SOUND systems, *AUDIO equipment, *COMPUTER graphics, *LOUDSPEAKERS
- Abstract
This study proposes improvements to 22.2 multichannel (22.2 ch) sound broadcasting service. 22.2 ch sound is currently used in the 8K satellite broadcasting in Japan. In this study, the audio system is migrated from channel-based audio to object-based audio. The object-based audio equips 22.2 ch sound with alternative and adaptive functionalities: the alternative functionality is related to dialogue controls such as multilingual services, while the adaptive functionality enables 22.2 ch sound to be adapted to the audio format of the playback equipment. Moving Picture Experts Group (MPEG)-H 3D Audio (3DA), which is the latest audio coding standard, is used as the audio coding scheme. A real-time encoder and decoder based on 3DA was developed to verify the practicability of the proposed system. The encoded audio data is packetized and transmitted by MPEG-H MPEG Media Transport (MMT) to be multiplexed with video data. A transmission experiment with 8K video was carried out in which the proposed system was proved to operate as designed in this study. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
9. Secure echo‐hiding audio watermarking method based on improved PN sequence and robust principal component analysis.
- Author
-
Wang, Shengbei, Wang, Chao, Yuan, Weitao, Wang, Lin, and Wang, Jianming
- Abstract
Echo-hiding has been widely studied for audio watermarking. This study proposes a more secure echo-hiding method based on a modified pseudo-noise (PN) sequence and robust principal component analysis (RPCA). In the proposed method, the RPCA is used to decompose the original audio signal into low-rank and sparse parts, and then a pair of opposite modified PN sequences is employed to embed watermarks. The modified PN sequence improves the robustness of watermark detection by providing additional correlation peaks. Meanwhile, benefiting from the RPCA and the opposite PN sequences, the security of the proposed method is improved, since watermarks cannot be detected from the whole signal even if the PN sequence is known, which is an obvious improvement compared with previous PN-based echo-hiding methods. In the watermark detection process, the authors make use of the low-rank and sparse characteristics of the watermarked signal to detect watermarks from the low-rank and sparse parts, respectively. Based on this basic framework, they also propose a multi-bit embedding scheme, which doubles the embedding capacity compared with previous PN-based echo-hiding methods. The proposed method was evaluated with respect to inaudibility, security, and robustness. The experimental results verified the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
10. AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio.
- Author
-
Narbutt, Miroslaw, Skoglund, Jan, Allen, Andrew, Chinen, Michael, Barry, Dan, and Hines, Andrew
- Subjects
HEADPHONES, BIT rate, STREAMING audio, VIRTUAL reality, STREAMING media, QUALITY of service, IMAGE compression, VIRTUAL reality software
- Abstract
Featured Application: Streaming spatial audio for immersive audio and virtual reality applications will require compression algorithms that maintain the localization accuracy and irrationality attributes of sound sources as well as a high-fidelity quality of experience. Models to evaluate quality will be important for media content streaming application such as YouTube as well as VR gaming and other immersive multimedia experiences. Spatial audio is essential for creating a sense of immersion in virtual environments. Efficient encoding methods are required to deliver spatial audio over networks without compromising Quality of Service (QoS). Streaming service providers such as YouTube typically transcode content into various bit rates and need a perceptually relevant audio quality metric to monitor users' perceived quality and spatial localization accuracy. The aim of the paper is two-fold. First, it is to investigate the effect of Opus codec compression on the quality of spatial audio as perceived by listeners using subjective listening tests. Secondly, it is to introduce AMBIQUAL, a full reference objective metric for spatial audio quality, which derives both listening quality and localization accuracy metrics directly from the B-format Ambisonic audio. We compare AMBIQUAL quality predictions with subjective quality assessments across a variety of audio samples which have been compressed using the Opus 1.2 codec at various bit rates. Listening quality and localization accuracy of first and third-order Ambisonics were evaluated. Several fixed and dynamic audio sources (single and multiple) were used to evaluate localization accuracy. Results show good correlation regarding listening quality and localization accuracy between objective quality scores using AMBIQUAL and subjective scores obtained during listening tests. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. On the Consumption of Multimedia Content Using Mobile Devices: a Year to Year User Case Study.
- Author
-
FALKOWSKI-GILSKI, Przemysław
- Subjects
*MULTIMEDIA systems, *DIGITAL music players, *CONCERT halls, *CASE studies, *MUSIC videos
- Abstract
In the early days, consumption of multimedia content related to audio signals was only possible in a stationary manner. The music player was located at home, with a necessary physical drive. An alternative way for an individual was to attend a live performance at a concert hall or host a private concert at home. To sum up, audio-visual effects were reserved for a narrow group of recipients. Today, thanks to portable players, vision and sound are at last available to everyone. Finally, thanks to multimedia streaming platforms, every music piece or video, e.g. from one's favourite artist or band, can be viewed anytime and anywhere. The background or status of an individual is no longer an issue. Each person who is connected to the global network can have access to the same resources. This paper is focused on the consumption of multimedia content using mobile devices. It describes a year-to-year user case study carried out between 2015 and 2019 and the development of current trends related to the expectations of modern users. The goal of this study is to aid policymakers, as well as providers, when it comes to designing and evaluating systems and services. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
12. Reviews on Technology and Standard of Spatial Audio Coding
- Author
-
Ikhwana Elfitri and Amirul Luthfi
- Subjects
spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, including technology, standards, and applications. The basic principle of object-based audio reproduction is also elaborated and compared with the traditional channel-based system, to provide a good understanding of this popular interactive audio reproduction approach, which gives end users the flexibility to render their own preferred audio composition.
- Published
- 2017
- Full Text
- View/download PDF
13. An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
- Author
-
Cobos, Maximo, Ahrens, Jens, Kowalczyk, Konrad, and Politis, Archontis
- Published
- 2022
- Full Text
- View/download PDF
14. Time domain synchronisation estimation algorithm for FBMC vector signal analysis in 5G system
- Author
-
Yunzhi Ling, Yu Zhang, and Lantian Xu
- Subjects
synchronisation, time-domain analysis, mobile communication, channel bank filters, ofdm modulation, acoustic signal detection, audio coding, signal quality changes, low signal-to-noise ratio, good synchronisation, frequency domain, power synchronisation, fifth-generation mobile communication network system, filter bank multicarrier test applications, fbmc vector signal analysis, time domain synchronisation estimation algorithm, 5g test applications, fbmc signal vector analysis, fbmc signal vector analysers, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
To address the challenges of filter bank multi-carrier (FBMC) test applications in the fifth-generation mobile communication network (5G) system, a time domain synchronisation estimation algorithm based on power synchronisation in the frequency domain is presented. Through MATLAB simulation, the proposed algorithm is verified to achieve good synchronisation at a low signal-to-noise ratio (SNR) and a large frequency offset, without changes in signal quality. As a result, it can be applied in FBMC signal vector analysers, promoting FBMC signal vector analysis functions used in 5G test applications.
- Published
- 2019
- Full Text
- View/download PDF
15. Stethoscope with digital frequency translation for improved audibility
- Author
-
Herbert M. Aumann and Nuri W. Emanetoglu
- Subjects
hilbert transforms, audio coding, analogue-digital conversion, bioacoustics, cardiology, biomedical equipment, medical signal processing, microcontrollers, modulation, intestinal sounds, analog frequency translator, digital frequency translation, acoustic stethoscope, heart sounds, chest sounds, hilbert transformer, single sideband suppressed carrier modulation, hearing impaired physicians, microcontroller, time delay, audio coder-decoder, frequency 200.0 hz, frequency 72.0 mhz, Medical technology, R855-855.5
- Abstract
The performance of an acoustic stethoscope is improved by translating, without loss of fidelity, heart sounds, chest sounds, and intestinal sounds below 50 Hz into a frequency range of 200 Hz, which is easily detectable by the human ear. Such a frequency translation will be of significant benefit to hearing impaired physicians and it will improve the stethoscope performance in a noisy environment. The technique is based on a single sideband suppressed carrier modulation. Stability and bias problems commonly associated with an analog frequency translator are avoided by an all-digital implementation. Real-time audio processing is made possible by approximating a Hilbert transformer with a time delay. The performance of the digital frequency translator was verified with a 16-bit 44.1 Ks/s audio coder/decoder and a 32-bit 72 MHz microcontroller.
- Published
- 2019
- Full Text
- View/download PDF
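The frequency translation described above is single-sideband modulation built on the analytic signal. A minimal floating-point sketch with numpy/scipy follows; the sample rate, shift, and test tone are assumptions, and the paper's microcontroller trick of approximating the Hilbert transformer with a time delay is not reproduced.

```python
# Minimal sketch of single-sideband (SSB) frequency translation via the analytic signal.
import numpy as np
from scipy.signal import hilbert

fs = 4000                                   # assumed sample rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
heart = np.sin(2 * np.pi * 30 * t)          # toy "heart sound" component at 30 Hz

shift_hz = 200                              # translate low-frequency content upward
analytic = hilbert(heart)                   # x(t) + j * H{x}(t)
shifted = np.real(analytic * np.exp(2j * np.pi * shift_hz * t))
# 'shifted' now contains the original content moved up by 200 Hz (30 Hz -> 230 Hz)
# without the mirror image that a plain ring modulator would create.
```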
16. Time domain synchronisation estimation algorithm for FBMC vector signal analysis in 5G system.
- Author
-
Ling, Yunzhi, Zhang, Yu, and Xu, Lantian
- Subjects
SYNCHRONIZATION, FILTER banks, SIGNAL-to-noise ratio, 5G networks, MOBILE communication systems
- Abstract
To address the challenges of filter bank multi-carrier (FBMC) test applications in the fifth-generation mobile communication network (5G) system, a time domain synchronisation estimation algorithm based on power synchronisation in the frequency domain is presented. Through MATLAB simulation, the proposed algorithm is verified to achieve good synchronisation at a low signal-to-noise ratio (SNR) and a large frequency offset, without changes in signal quality. As a result, it can be applied in FBMC signal vector analysers, promoting FBMC signal vector analysis functions used in 5G test applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
17. An effective hybrid low delay packet loss concealment algorithm for MDCT-based audio codec.
- Author
-
Lin, Zhibin, Lu, Jing, and Qiu, Xiaojun
- Subjects
*AUDIO codec, *AUDITORY perception, *SECRECY, *MUSIC & language, *ALGORITHMS
- Abstract
This paper proposes a hybrid packet loss concealment (PLC) algorithm for MDCT-based audio codecs, with different PLC strategies for tone-dominant source signals and noise-like signals. A key observation is that, for a stationary source signal with dominant tonal components, the phase angle of the MDCT-MDST coefficients decreases linearly with increasing frame index while the amplitude remains unchanged. Therefore, an efficient frame interpolation method is designed to accurately estimate the phase angle and the magnitude of the MDCT-MDST coefficients of the lost frame. For noise-like signals without overwhelming tonal components, a modified shaped-noise insertion is proposed to improve the audio perception. Both objective and subjective test results show that the proposed algorithm provides better performance than existing ones for both music and voiced speech signals. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
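The tonal-frame concealment idea in the abstract above (constant MDCT-MDST magnitude, linearly decreasing phase across frames) can be paraphrased in a few lines of numpy; this is a sketch of the general extrapolation, not the authors' exact interpolation method.

```python
import numpy as np

def conceal_tonal_frame(prev2, prev1):
    """Estimate the complex MDCT-MDST coefficients of a lost frame.

    prev2, prev1: complex arrays (MDCT + 1j*MDST) of the two frames preceding
    the lost one. For stationary tonal content the magnitude is roughly constant
    and the phase advances by a fixed step per frame, so the lost frame is
    extrapolated with the per-frame phase step observed between prev2 and prev1.
    """
    mag = np.abs(prev1)
    phase_step = np.angle(prev1) - np.angle(prev2)
    return mag * np.exp(1j * (np.angle(prev1) + phase_step))

# usage: the real part of the result would be taken as the concealed MDCT frame
# lost_mdct = np.real(conceal_tonal_frame(frame_n_minus_2, frame_n_minus_1))
```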
18. BPL-PLC Voice Communication System for the Oil and Mining Industry
- Author
-
Grzegorz Debita, Przemysław Falkowski-Gilski, Marcin Habrych, Grzegorz Wiśniewski, Bogdan Miedziński, Przemysław Jedlikowski, Agnieszka Waniewska, Jan Wandzio, and Bartosz Polnik
- Subjects
audio coding, digital systems, electrical engineering, ICT, Industry 4.0, IoT, Technology
- Abstract
Application of high-efficiency voice communication systems based on broadband over power line-power line communication (BPL-PLC) technology in medium voltage networks, including hazardous areas (such as the oil and mining industry), as a redundant means of wired communication (apart from traditional fiber optics and electrical wires) can be beneficial. Due to the possibility of utilizing existing electrical infrastructure, it can significantly reduce deployment costs. Additionally, it can be applied under difficult conditions, thanks to battery-powered devices. During an emergency situation (e.g., after a coal dust explosion), the medium voltage cables are resistant to mechanical damage, providing a potentially life-saving communication link between the supervisor, rescue team, paramedics, and the trapped personnel. The assessment of such a system requires a comprehensive and accurate examination, including a number of factors. Therefore, various models were tested, considering different transmission paths, types of coupling (inductive and capacitive), and various lengths of transmitted data packets. Next, a subjective quality evaluation study was carried out, considering speech signals from a number of languages (English, German, and Polish). Based on the obtained results, including both simulations and measurements, appropriate practical conclusions were formulated. The results confirmed the applicability of BPL-PLC technology as an efficient voice communication system for the oil and mining industry.
- Published
- 2020
- Full Text
- View/download PDF
19. AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio
- Author
-
Miroslaw Narbutt, Jan Skoglund, Andrew Allen, Michael Chinen, Dan Barry, and Andrew Hines
- Subjects
virtual reality, spatial audio, Ambisonics, audio coding, audio compression, Opus codec, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Spatial audio is essential for creating a sense of immersion in virtual environments. Efficient encoding methods are required to deliver spatial audio over networks without compromising Quality of Service (QoS). Streaming service providers such as YouTube typically transcode content into various bit rates and need a perceptually relevant audio quality metric to monitor users’ perceived quality and spatial localization accuracy. The aim of the paper is two-fold. First, it is to investigate the effect of Opus codec compression on the quality of spatial audio as perceived by listeners using subjective listening tests. Secondly, it is to introduce AMBIQUAL, a full reference objective metric for spatial audio quality, which derives both listening quality and localization accuracy metrics directly from the B-format Ambisonic audio. We compare AMBIQUAL quality predictions with subjective quality assessments across a variety of audio samples which have been compressed using the Opus 1.2 codec at various bit rates. Listening quality and localization accuracy of first and third-order Ambisonics were evaluated. Several fixed and dynamic audio sources (single and multiple) were used to evaluate localization accuracy. Results show good correlation regarding listening quality and localization accuracy between objective quality scores using AMBIQUAL and subjective scores obtained during listening tests.
- Published
- 2020
- Full Text
- View/download PDF
20. Modelling Timbral Hardness.
- Author
-
Pearce, Andy, Brookes, Tim, and Mason, Russell
- Subjects
METADATA, HARDNESS, ACOUSTIC filters, SOUNDS, LOUDNESS, PSYCHOACOUSTICS, REGRESSION analysis
- Abstract
Featured Application: The model of timbral hardness described in this study is expected to be used for the searching and filtering of sound effects. Hardness is the most commonly searched timbral attribute within freesound.org, a commonly used online sound effect repository. A perceptual model of hardness was developed to enable the automatic generation of metadata to facilitate hardness-based filtering or sorting of search results. A training dataset was collected of 202 stimuli with 32 sound source types, and perceived hardness was assessed by a panel of listeners. A multilinear regression model was developed on six features: maximum bandwidth, attack centroid, midband level, percussive-to-harmonic ratio, onset strength, and log attack time. This model predicted the hardness of the training data with R² = 0.76. It predicted hardness within a new dataset with R² = 0.57, and predicted the rank order of individual sources perfectly, after accounting for the subjective variance of the ratings. Its performance exceeded that of human listeners. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
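The hardness model above is a multilinear regression on six named features. A scikit-learn sketch of that model class is shown below, with random placeholder data standing in for the 202-stimulus training set and the listener ratings.

```python
# Illustrative only: fits a multilinear regression of perceived hardness on the six
# features named in the abstract; the arrays are random placeholders, not the paper's data.
import numpy as np
from sklearn.linear_model import LinearRegression

feature_names = ["max_bandwidth", "attack_centroid", "midband_level",
                 "percussive_to_harmonic_ratio", "onset_strength", "log_attack_time"]

rng = np.random.default_rng(0)
X = rng.normal(size=(202, len(feature_names)))   # one row per stimulus (placeholder)
y = rng.normal(size=202)                         # panel hardness ratings (placeholder)

model = LinearRegression().fit(X, y)
print(dict(zip(feature_names, model.coef_)))
print("R^2 on training data:", model.score(X, y))
```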
21. 22.2 ch Audio Encoding/Decoding Hardware System Based on MPEG-4 AAC.
- Author
-
Sugimoto, Takehiro, Nakayama, Yasushige, and Komori, Tomoyasu
- Subjects
*MPEG (Video coding standard), *MULTIMEDIA communications, *SOUND systems, *DECODERS (Electronics), *BIT rate
- Abstract
A 22.2 multichannel (22.2 ch) sound system has been adopted as an audio system for 8K Super Hi-Vision (8K). The 22.2 ch sound system is an advanced sound system composed of 24 channels three-dimensionally located in a space to envelop listeners in an immersive sound field. NHK has been working on standardizing and developing an 8K broadcasting system via a broadcasting satellite in time for test broadcasting in 2016. For an audio coding scheme, NHK developed a world-first 22.2 ch audio encoding/decoding hardware system (22.2 ch audio codec) capable of real time encoding/decoding. The fabricated 22.2 ch audio codec is based on MPEG-4 AAC and was assembled into the 8K codec together with the 8K video codec and the multiplexer. The audio quality of the fabricated 22.2 ch audio codec was assessed in an objective evaluation, and the evaluation results revealed the operational bit rates of the fabricated codec. An 8K satellite broadcasting experiment was carried out as a final verification test of the 8K broadcasting system, and 22.2 ch audio codec was found to be valid. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
22. Delivering Scalable Audio Experiences using AC-4.
- Author
-
Riedmiller, Jeffrey, Kjorling, Kristofer, Roden, Jonas, Wolters, Martin, Biswas, Arijit, Boon, Prinyar, Carroll, Tim, Ekstrand, Per, Groschel, Alexander, Hedelin, Per, Hirvonen, Toni, Horich, Holger, Klejsa, Janusz, Koppens, Jeroen, Krauss, Kurt, Lehtonen, Heidi-Maria, Linzmeier, Karsten, Mehta, Sripal, Muesch, Hannes, and Mundt, Harald
- Subjects
*AUDIO codec, *DVB-H (Standard), *INTERNET access control, *VIDEO recording, *IMAGE quality analysis
- Abstract
AC-4 is a state-of-the-art audio codec standardized in ETSI (TS 103 190 and TS 103 190-2) and included in the DVB toolbox (TS 101 154 V2.2.1 and DVB BlueBook A157) and, at the time of writing, is a candidate standard for ATSC 3.0 as per A/342 part 2. AC-4 is an audio codec designed to address the current and future needs of video and audio entertainment services, including broadcast and Internet streaming. As such, it incorporates a number of features beyond the traditional audio coding algorithms, such as capabilities to support immersive and personalized audio, support for advanced loudness management, video-frame synchronous coding, dialog enhancement, etc. This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used, the systemic features included, and give an overview of performance and applications. It further outlines metadata aspects (immersive and personalized, essential for broadcast), metadata carriage, aspects of interchange of immersive programing, as well as immersive playback and rendering. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
23. Modelling Timbral Hardness
- Author
-
Andy Pearce, Tim Brookes, and Russell Mason
- Subjects
audio coding, artificial intelligence, sound recording, sound quality, psychoacoustics, timbre, modelling, perception, music information retrieval, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Hardness is the most commonly searched timbral attribute within freesound.org, a commonly used online sound effect repository. A perceptual model of hardness was developed to enable the automatic generation of metadata to facilitate hardness-based filtering or sorting of search results. A training dataset was collected of 202 stimuli with 32 sound source types, and perceived hardness was assessed by a panel of listeners. A multilinear regression model was developed on six features: maximum bandwidth, attack centroid, midband level, percussive-to-harmonic ratio, onset strength, and log attack time. This model predicted the hardness of the training data with R² = 0.76. It predicted hardness within a new dataset with R² = 0.57, and predicted the rank order of individual sources perfectly, after accounting for the subjective variance of the ratings. Its performance exceeded that of human listeners.
- Published
- 2019
- Full Text
- View/download PDF
24. MPEG-H 3D Audio: Immersive Audio Coding
- Author
-
Herre, Jürgen, Quackenbush, S.R., and Publica
- Subjects
Audio data reduction, Audio coding, Audio compression, Immersive audio, MPEG, 3D audio, MPEG-H
- Abstract
The term "Immersive Audio" is frequently used to describe an audio experience that provides to the listener the sensation of being fully immersed or "present" in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above and below listener ear level) and binaural audio to headphones. This article provides an overview of the recent MPEG standard, MPEG-H 3D Audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, higher order ambisonics), and is now being adopted in broadcast and streaming applications.
- Published
- 2022
25. Audio Coding Using Overlap and Kernel Adaptation.
- Author
-
Helmrich, Christian R. and Edler, Bernd
- Subjects
AUDIO codec, KERNEL functions, ENCODING, QUANTIZATION (Physics), ELECTRIC switchgear
- Abstract
Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features—transform length, window shape, transform kernel, and overlap ratio switching—into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
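The kernel switching discussed above alternates between MDCT and MDST transform kernels. A direct (slow) numpy reference of both lapped transforms, using the usual textbook definitions and a sine window, makes the relationship explicit; the codec's fast realizations and overlap-ratio switching are not shown.

```python
import numpy as np

def lapped_transform(x, window, kernel="mdct"):
    """Direct O(N^2) MDCT/MDST of one 2N-sample windowed block (textbook definition)."""
    two_n = len(x)
    n_out = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_out)[:, None]
    phase = np.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5)
    basis = np.cos(phase) if kernel == "mdct" else np.sin(phase)   # kernel switch
    return basis @ (window * x)

N = 8
sine_window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
block = np.random.randn(2 * N)
print(lapped_transform(block, sine_window, "mdct"))
print(lapped_transform(block, sine_window, "mdst"))
```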
26. Parametric Coding of Stereo Audio
- Author
-
Erik Schuijers, Armin Kohlrausch, Steven van de Par, and Jeroen Breebaart
- Subjects
parametric stereo, audio coding, perceptual audio coding, stereo coding, Telecommunication, TK5101-6720, Electronics, TK7800-8360
- Abstract
Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
- Published
- 2005
- Full Text
- View/download PDF
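As a toy illustration of the parametric-stereo principle summarized above, the numpy/scipy sketch below measures inter-channel level differences per STFT bin, forms a mono downmix, and re-spatializes the downmix from those parameters. Real parametric stereo additionally transmits phase and coherence cues, groups bins into perceptual bands, and uses decorrelation, none of which are modelled here.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 48000
t = np.arange(0, 1.0, 1 / fs)
left = np.sin(2 * np.pi * 440 * t)
right = 0.3 * np.sin(2 * np.pi * 440 * t)            # toy stereo signal, panned left

# Encoder: STFT, per-bin inter-channel level difference (ILD), mono downmix.
_, _, L = stft(left, fs, nperseg=1024)
_, _, R = stft(right, fs, nperseg=1024)
eps = 1e-12
ild_db = 10 * np.log10((np.abs(L) ** 2 + eps) / (np.abs(R) ** 2 + eps))
mono = (left + right) / 2                             # would go to any conventional mono coder

# Decoder: re-spatialize the (here uncoded) downmix from the ILD parameters.
_, _, M = stft(mono, fs, nperseg=1024)
g = 10 ** (ild_db / 20)                               # target left/right amplitude ratio
L_hat = M * 2 * g / (1 + g)                           # split downmix energy by ILD
R_hat = M * 2 / (1 + g)
_, left_hat = istft(L_hat, fs, nperseg=1024)
_, right_hat = istft(R_hat, fs, nperseg=1024)
```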
27. Audio security through compressive sampling and cellular automata.
- Author
-
George, Sudhish, Augustine, Nishanth, and Pattathil, Deepthi
- Subjects
COMPRESSED sensing, SOUND recording & reproducing, DATA encryption, CELLULAR automata, DATA compression, RANDOM matrices
- Abstract
In this paper, a new approach for scrambling the compressive sensed (CS) audio data using two dimensional cellular automata is presented. In order to improve the security, linear feedback shift register (LFSR) based secure measurement matrix for compressive sensing is used. The basic idea is to select the different states of LFSR as the entries of a random matrix and orthonormalize these values to generate a Gaussian random measurement matrix. It is proposed to generate the initial state matrix of cellular automata using an LFSR based random bitstream generator. In order to improve the security and key space of the proposed cryptosystem, piecewise linear chaotic map (PWLCM) based initial seeds generation for LFSRs is used. In the proposed approach, the initial value, parameter value and the number of iterations of PWLCM are kept as secret to provide security. The proposed audio encryption method for CS audio data is validated with different compressive sensing reconstruction approaches. Experimental and analytical verification shows that the proposed encryption system gives good reconstruction performance, robustness to noise, high level of scrambling and good security against several forms of attack. Moreover, since the measurement matrix used for CS operation and the initial state matrix used for 2D cellular automata are generated using the secret key, the storage/transmission requirement of the same can be avoided. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
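A numpy sketch of the measurement-matrix construction described above: LFSR output is packed into matrix entries, the matrix is orthonormalized, and the result is used to take compressive measurements. The tap polynomial, sizes, and bit-to-value mapping are assumptions for illustration; the PWLCM-derived seeds and the 2-D cellular-automata scrambling stage are not shown.

```python
import numpy as np

def lfsr_bits(seed, taps, n_bits, width=16):
    """Fibonacci LFSR bitstream; 'taps' are the bit positions XORed for feedback."""
    state, out = seed, np.empty(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        out[i] = state & 1
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (width - 1))
    return out

m, n = 64, 256                           # measurements x signal length (assumed sizes)
bits = lfsr_bits(seed=0xACE1, taps=(0, 2, 3, 5), n_bits=m * n * 8)
vals = np.packbits(bits.reshape(-1, 8), axis=1).astype(float).ravel() - 127.5
A = vals.reshape(m, n)                   # raw LFSR-derived matrix
Q, _ = np.linalg.qr(A.T)                 # orthonormalise (reduced QR of the transpose)
Phi = Q.T                                # m x n measurement matrix with orthonormal rows

x = np.zeros(n)
x[[10, 80, 130]] = [1.0, -0.7, 0.4]      # a sparse test signal
y = Phi @ x                              # compressive measurements to be sent/encrypted
```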
28. An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching.
- Author
-
Chang, Tai-Ming, Hsieh, Chia-Bin, and Chang, Pao-Chi
- Subjects
AUDIO codec, DIGITAL image processing, DIGITIZATION, INFORMATION processing, INFORMATION retrieval
- Abstract
With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from the compressed files to represent music greatly benefits processing speed when working on a large database. In this study, we focused on advanced audio coding (AAC) files and analyzed the disparity in frequency expression between discrete Fourier transform and discrete cosine transform, considered the frequency resolution to select the appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An added challenge to using AAC files directly is long/short window switching, ignoring which may result in inaccurate frequency mapping and inefficient information retrieval. For a short window in particular, we propose a peak-competition method to enhance the pitch information that does not include ambiguous frequency components when combining eight subframes. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. Our experimental results show that the proposed method increased the accuracy rate by approximately 7 % in Top-1 search results over transform-domain methods described previously and performed nearly as effectively as state-of-the-art waveform-domain approaches did. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
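The core of a direct chroma transformation such as the one above is a mapping from transform-bin centre frequencies to the 12 pitch classes. A minimal numpy version is sketched below under the assumption of MDCT-style bin centres; window switching and the proposed peak-competition step are omitted.

```python
import numpy as np

def chroma_from_spectrum(mag, fs, fmin=55.0, fmax=2000.0, a4=440.0):
    """Fold the magnitudes of transform bins into a 12-dimensional chroma vector."""
    n_bins = len(mag)
    freqs = (np.arange(n_bins) + 0.5) * fs / (2 * n_bins)   # MDCT-style bin centres
    chroma = np.zeros(12)
    for f, m in zip(freqs, mag):
        if fmin <= f <= fmax:
            midi = 69 + 12 * np.log2(f / a4)                 # fractional MIDI pitch
            chroma[int(round(midi)) % 12] += m ** 2          # accumulate bin energy
    return chroma / (np.linalg.norm(chroma) + 1e-12)

# usage: feed |MDCT| magnitudes of one long-window frame decoded from the bit-stream
# chroma = chroma_from_spectrum(np.abs(mdct_frame), fs=44100)
```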
29. Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications.
- Author
-
Pocta, Peter and Beerends, John G.
- Subjects
*AUDIO codec, *DIGITAL audio, *BROADCAST channels, *IMAGE quality analysis, *STREAMING media
- Abstract
This paper investigates the impact of different audio codecs typically deployed in current digital audio broadcasting (DAB) systems and web-casting applications, which represent a main source of quality impairment in these systems and applications, on the quality perceived by the end user. Both subjective and objective assessments are used. Two different audio quality prediction models, namely Perceptual Evaluation of Audio Quality (PEAQ) and Perceptual Objective Listening Quality Assessment (POLQA) Music, are evaluated by comparing the predictions with subjectively obtained grades. The results show that the degradations introduced by the typical lossy audio codecs deployed in current DAB systems and web-casting applications, operating at the lowest bit rate typically used in these distribution systems and applications, seriously impact the subjective audio quality perceived by the end user. Furthermore, it is shown that a retrained POLQA Music provides the best overall correlations between predicted objective measurements and subjective scores, allowing the final perceived quality to be predicted with good accuracy when scores are averaged over a small set of musical fragments (R = 0.95). [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
30. MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio.
- Author
-
Herre, Jurgen, Hilpert, Johannes, Kuntz, Achim, and Plogsties, Jan
- Abstract
The science and art of Spatial Audio is concerned with the capture, production, transmission, and reproduction of an immersive sound experience. Recently, a new generation of spatial audio technology has been introduced that employs elevated and lowered loudspeakers and thus surpasses previous ‘surround sound’ technology without such speakers in terms of listener immersion and potential for spatial realism. In this context, the ISO/MPEG standardization group has started the MPEG-H 3D Audio development effort to facilitate high-quality bitrate-efficient production, transmission and reproduction of such immersive audio material. The underlying format is designed to provide universal means for carriage of channel-based, object-based and Higher Order Ambisonics based input. High quality reproduction is provided for many output formats from 22.2 and beyond down to 5.1, stereo and binaural reproduction—independently of the original encoding format, thus overcoming the incompatibility between various 3D formats. This paper provides an overview of the MPEG-H 3D Audio project and technology and an assessment of the system capabilities and performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
31. IEEE Standard for System of Advanced Audio and Video Coding.
- Subjects
TRANSPORT protocols (Computer network protocols), ELECTRIC standards, VIDEO codecs, AUDIO codec
- Abstract
Storage file formats and real-time transport protocol (RTP) payload formats for IEEE 1857™ video and IEEE 1857.2™ audio are defined. The storage of video and audio not only uses the existing capabilities of the ISO base media file format, but also defines extensions to support specific features of the IEEE 1857 video codec and IEEE 1857.2 audio codec. The target applications and services include but are not limited to Internet media streaming, IPTV, video conference, video telephony, and video-on-demand. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
32. Comparison of windowing in speech and audio coding.
- Author
-
Backstrom, Tom
- Abstract
Speech and audio coding have during the last decade converged to an increasingly unified technology. This contribution discusses one of the remaining fundamental differences between speech and audio paradigms, namely, windowing of the input signal. Audio codecs generally use lapped transforms and apply a perceptual model in the transform domain, whereby temporal continuity is achieved by windowing and overlap-add. Speech codecs on the other hand achieve temporal continuity by using linear predictive filtering, whereby windowing is applied in the residual domain. Despite these fundamental differences, we demonstrate that the two windowing approaches, combined with perceptual modeling, perform very similarly both in terms of perceptual quality and theoretical properties. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
33. MDCT domain parametric stereo audio coding.
- Author
-
Suresh, K. and Raj, R Akhil
- Abstract
Parametric coding of multichannel audio has gained popularity for low bit rate audio coding applications such as digital audio broadcasting. Most of the existing algorithms use MDCT domain techniques for compressing the audio, while the spatialization parameter estimation is done in a different time-frequency domain. An MDCT domain parametric stereo coding algorithm which represents the stereo channels as the linear combination of the ‘sum’ channel derived from the stereo channels and a reverberated channel generated from the ‘sum’ channel has been reported in literature. Spatialization parameters are estimated at the encoder by taking the scaled sub-band projections of stereo channels on ‘sum’ and reverberated channel. This model is inadequate to represent the stereo image since only four parameters per sub-band are used as spatialization parameters. In this work we improve the perceptual quality of this MDCT domain parametric coder with an augmented parameter extraction scheme using an additional reverberated channel. Subjective evaluation using MUSHRA test illustrates that the new algorithm has increased the perceptual audio quality of the encoded audio signal significantly. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
34. An MPEG-4 AAC decoder FPGA implementation for the Brazilian digital television.
- Author
-
Renner, Adriano and Susin, Altamiro Amadeu
- Abstract
This paper presents an MPEG-4 AAC decoder described in VHDL language and compliant with the Brazilian Digital Television standard (SBTVD). It has been synthesized to an Altera Cyclone II 2C35 FPGA using 26549 logic elements and 248704 memory bits. The implemented architecture has been verified using an Altera DE2 prototyping board, being capable of decoding stereo signals coded as MPEG-4 AAC Low Complexity audio objects. The minimum operating frequency required for real time decoding of a stereo audio stream with a sampling rate of 48 kHz is 4 MHz and the implemented decoder is capable of running at 56 MHz, meeting the requirements. This decoder design is intended to be integrated with a system on chip for the SBTVD set-top box. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
35. Speech/Audio Signal Classification Using Spectral Flux Pattern Recognition.
- Author
-
Lee, Sangkil, Kim, Jieun, and Lee, Insung
- Abstract
In this paper, we present a novel method for the improvement of speech and audio signal classification using spectral flux (SF) pattern recognition for the MPEG Unified Speech and Audio Coding (USAC) standard. For effective pattern recognition, the Gaussian mixture model (GMM) probability model is used. For the optimal GMM parameter extraction, we use the expectation maximization (EM) algorithm. The proposed classification algorithm is divided into two significant parts. The first one extracts the optimal parameters for the GMM. The second distinguishes between speech and audio signals using SF pattern recognition. The performance of the proposed classification algorithm shows better results compared to the conventionally implemented USAC scheme. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
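The classifier above combines a spectral-flux feature with GMMs trained by EM. The sketch below computes a simple spectral-flux curve and fits one scikit-learn GaussianMixture per class on placeholder signals; the actual USAC integration and feature design are not reproduced.

```python
import numpy as np
from scipy.signal import stft
from sklearn.mixture import GaussianMixture

def spectral_flux(x, fs, nperseg=1024):
    """Frame-wise spectral flux: L2 distance between consecutive magnitude spectra."""
    _, _, Z = stft(x, fs, nperseg=nperseg)
    mag = np.abs(Z)
    return np.sqrt(np.sum(np.diff(mag, axis=1) ** 2, axis=0))

fs = 16000
rng = np.random.default_rng(0)
speech_like = rng.normal(size=fs) * np.repeat(rng.uniform(0, 1, 50), fs // 50)  # bursty noise
music_like = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)                       # steady tone

# One GMM per class, fitted with EM on the (placeholder) spectral-flux patterns.
gmm_speech = GaussianMixture(n_components=2).fit(spectral_flux(speech_like, fs).reshape(-1, 1))
gmm_music = GaussianMixture(n_components=2).fit(spectral_flux(music_like, fs).reshape(-1, 1))

test = spectral_flux(speech_like, fs).reshape(-1, 1)
decision = "speech" if gmm_speech.score(test) > gmm_music.score(test) else "music/audio"
print(decision)
```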
36. Audio coding with power spectral density preserving quantization.
- Author
-
Li, Minyue, Klejsa, Janusz, Ozerov, Alexey, and Kleijn, W. Bastiaan
- Abstract
The coding of audio-visual signals is generally based on different paradigms for high and low rates. At high rates the signal is approximated directly and at low rates only signal features are transmitted. The recently introduced distribution preserving quantization (DPQ) paradigm provides a seamless transition between these two regimes. In this paper we present a simplified scheme that preserves the power spectral density (PSD) rather than the probability distribution. In a practical system the PSD must be estimated. We show that both forward adaptive and backward adaptive PSD estimation are possible. Our experimental results confirm that preservation of PSD at finite precision leads to a unified coding paradigm that provides effective coding at both high and low rates. An audio coding application shows the perceptual benefits of PSD preserving quantization. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
37. An embedded stereo speech and audio coding method based on principal component analysis.
- Author
-
Jia, Mao-shen, Bao, Chang-chun, Liu, Xin, Li, Xiao-ming, and Li, Ru-wei
- Abstract
In this paper, a compressive sampling method for Modulated Lapped Transform (MLT) coefficients, used for extracting stereo information, is adopted based on principal component analysis (PCA). Building on this method, an embedded variable bit-rate stereo speech and audio coding algorithm is proposed. In this codec, stereo signals sampled at 32 kHz and 16 kHz can be coded at scalable bit rates; the bit-stream structure is embedded and the bit-stream can be divided into several layers. The core codec is ITU-T G.729.1, which processes a mono signal with 7 kHz bandwidth. In addition, four extra bit rates are added: 40, 48, 56, and 64 kb/s. The maximum bit rates of the wideband and super-wideband stereo signals are 48 kb/s and 64 kb/s, respectively. Objective and subjective test results show that the quality of the proposed codec is no worse than that of the reference codec required by ITU-T. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
38. A perceptually reweighted mixed-norm method for sparse approximation of audio signals.
- Author
-
Christensen, Mads Groesboll and Sturm, Bob L.
- Abstract
In this paper, we consider the problem of finding sparse representations of audio signals for coding purposes. In doing so, it is of utmost importance that when only a subset of the present components of an audio signal are extracted, it is the perceptually most important ones. To this end, we propose a new iterative algorithm based on two principles: 1) a reweighted ℓ1-norm based measure of sparsity; and 2) a reweighted ℓ2-norm based measure of perceptual distortion. Using these measures, the considered problem is posed as a constrained convex optimization problem that can be solved optimally using standard software. A prominent feature of the new method is that it solves a problem that is closely related to the objective of coding, namely rate-distortion optimization. In computer simulations, we demonstrate the properties of the algorithm and its application to real audio signals. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
39. Frequency Domain Linear Prediction for QMF Sub-bands and Applications to Audio Coding.
- Author
-
Motlicek, Petr, Ganapathy, Sriram, Hermansky, Hynek, and Garudadri, Harinath
- Abstract
This paper proposes an analysis technique for wide-band audio applications based on the predictability of the temporal evolution of Quadrature Mirror Filter (QMF) sub-band signals. The input audio signal is first decomposed into 64 sub-band signals using QMF decomposition. The temporal envelopes in critically sampled QMF sub-bands are approximated using frequency domain linear prediction applied over relatively long time segments (e.g. 1000 ms). Line Spectral Frequency parameters related to autoregressive models are computed and quantized in each frequency sub-band. The sub-band residuals are quantized in the frequency domain using a combination of split Vector Quantization (VQ) (for magnitudes) and uniform scalar quantization (for phases). In the decoder, the sub-band signal is reconstructed using the quantized residual and the corresponding quantized envelope. Finally, application of inverse QMF reconstructs the audio signal. Even with simple quantization techniques and without any sophisticated modules, the proposed audio coder provides encouraging results in objective quality tests. Also, the proposed coder is easily scalable across a wide range of bit-rates. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
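Frequency domain linear prediction, the core operation above, fits an all-pole model to a frequency-domain representation so that the model's envelope traces the signal's temporal (Hilbert) envelope. A compact full-band sketch is given below; the QMF sub-band split, the long 1000 ms segments, and the residual quantization are not shown, and the DCT-based formulation is an assumption for illustration.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order=20):
    """Sketch of frequency-domain linear prediction (FDLP): an all-pole model of the
    DCT sequence whose 'spectrum' approximates the temporal envelope of x."""
    c = dct(x, type=2, norm="ortho")                   # frequency-domain representation
    r = np.correlate(c, c, mode="full")[len(c) - 1:]   # autocorrelation of DCT coefficients
    a = solve_toeplitz(r[:order], r[1:order + 1])      # Yule-Walker / LPC coefficients
    a = np.concatenate(([1.0], -a))                    # prediction-error filter A(z)
    # Evaluating the all-pole model over [0, pi) traces the squared envelope in time.
    H = np.abs(np.fft.rfft(a, 2 * len(x))) ** 2
    return 1.0 / (H[:len(x)] + 1e-12)

# toy usage: a noise burst should yield an envelope peaking where the burst sits
env = fdlp_envelope(np.concatenate([np.zeros(200), np.random.randn(100), np.zeros(200)]))
```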
40. Low Delay Robust Audio Coding by Noise Shaping, Fractional Sampling, and Source Prediction
- Author
-
Jan Ostergaard, Bilgin, Ali, Marcellin, Michael W., Serra-Sagrista, Joan, and Storer, James A.
- Subjects
PEAQ, Audio signal, Computer science, Network packet, low delay, 020206 networking & telecommunications, 02 engineering and technology, fractional sampling, Noise shaping, 030507 speech-language pathology & audiology, 03 medical and health sciences, audio coding, Packet loss, 0202 electrical engineering, electronic engineering, information engineering, source predictions, Oversampling, Multiple descriptions, 0305 other medical science, Algorithm, noise shaping, Decoding methods, Data compression
- Abstract
It was recently shown that the combination of source prediction, two-times oversampling, and noise shaping can be used to obtain a robust (multiple-description) audio coding framework for networks with packet loss probabilities of less than 10%. Specifically, it was shown that audio signals could be encoded into two descriptions (packets), which were separately sent over a communication channel. Each description yields a desired performance by itself, and when they are combined, the performance is improved. This paper extends the previous work to an arbitrary number of descriptions (packets) by using fractional oversampling and a new decoding principle. We demonstrate that, due to source aliasing, existing MSE-optimized reconstruction rules for noisy sampled data perform poorly from a perceptual point of view. A simple reconstruction rule is proposed that improves the PEAQ objective difference grades (ODG) by more than 2 points. The proposed audio coder enables low-delay, high-quality audio streaming on networks with late packet arrivals or packet losses. With a coding delay of 2.5 ms and a total bitrate of 300 kbps, it is demonstrated that mean PEAQ ODGs around -0.65 can be obtained for 48 kHz (mono) music (pop & rock) at packet loss probabilities of 20%.
- Published
- 2021
- Full Text
- View/download PDF
41. sc3nb: a Python-SuperCollider Interface for Auditory Data Science
- Author
-
Thomas Hermann and Dennis Reinsch
- Subjects
interactive programming, Interactive programming, Interface (Java), Computer science, NumPy, Audification, SuperCollider, Python (programming language), Data science, audio coding, Sonification, auditory data science, sonification, computer, computer.programming_language, Coding (social sciences), Python
- Abstract
This paper introduces sc3nb, a Python package for audio coding and interactive control of the SuperCollider programming environment. sc3nb supports Jupyter notebooks, enables flexible means for sound and music computing such as sound synthesis and analysis, and is particularly tailored for sonification. We present the main concepts and interfaces and illustrate how to use sc3nb by means of selected code examples for basic sonification approaches, such as audification and parameter-mapping sonification. Finally, we introduce TimedQueues, which enable coordinated audiovisual displays, e.g. to synchronize matplotlib data and sc3nb-based sound rendition. sc3nb enables interactive sound applications right in the center of the pandas/numpy/scipy data science ecosystem. The open source package is hosted on GitHub and is available via the Python Package Index (PyPI).
- Published
- 2021
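sc3nb drives SuperCollider for sound rendering, and its API is not reproduced here. As a library-agnostic illustration of the parameter-mapping sonification idea the paper demonstrates, the sketch below maps data values to the pitch of short sine tones and writes them to a WAV file using only numpy and the standard-library wave module.

```python
# Library-agnostic parameter-mapping sonification sketch (does NOT use the sc3nb API):
# each data value is mapped to the pitch of a short sine tone and the tones are
# concatenated into a WAV file.
import wave
import numpy as np

def sonify(data, path="sonification.wav", fs=44100, dur=0.15, fmin=220.0, fmax=880.0):
    lo, hi = float(np.min(data)), float(np.max(data))
    norm = (np.asarray(data, float) - lo) / (hi - lo + 1e-12)   # map data to [0, 1]
    freqs = fmin * (fmax / fmin) ** norm                        # exponential pitch mapping
    t = np.arange(int(fs * dur)) / fs
    env = np.hanning(len(t))                                    # fade in/out to avoid clicks
    tones = [env * np.sin(2 * np.pi * f * t) for f in freqs]
    samples = np.int16(np.concatenate(tones) * 32767 * 0.8)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(samples.tobytes())

sonify(np.sin(np.linspace(0, 4 * np.pi, 40)))    # a rising/falling pitch contour
```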
42. ITU-T SG16 Sapporo Meeting Report.
- Author
-
Kiyoshi Tanaka, Shohei Matsuo, Hitoshi Ohmuro, and Seiichi Sakaya
- Subjects
*STREAMING video & television, *MULTIMEDIA systems, *VIDEO coding, *TELECOMMUNICATION conferences, *CONFERENCES & conventions
- Abstract
The article discusses the highlights of the International Telecommunication Union-Telecommunication Standardization Sector Study Group 16 (ITU-T SG16) meeting held from June 30 to July 11, 2014 at the Sapporo Convention Center in Hokkaido, Japan. Topics tackled at the event include Internet protocol television (IPTV), accessibility of multimedia services and systems, video coding, and robust transmission technology.
- Published
- 2015
- Full Text
- View/download PDF
43. Finite‐state entropy‐constrained vector quantiser for audio modified discrete cosine transform coefficients uniform quantisation.
- Author
-
Jiang, Sumxin, Yin, Rendong, and Liu, Peilin
- Abstract
In this paper, an entropy‐constrained vector quantiser (ECVQ) scheme with finite memory, called finite‐state ECVQ (FS‐ECVQ), is presented. This scheme consists of a finite‐state vector quantiser (FSVQ) and multiple component ECVQs. By utilising the FSVQ, the inter‐frame dependencies within source sequence can be effectively exploited and no side information needs to be transmitted. By employing the ECVQs, the total memory requirements of FS‐ECVQ can be efficiently decreased while the coding performance is improved. An FS‐ECVQ, designed for the modified discrete cosine transform coefficients coding, was implemented and evaluated based on the unified speech and audio coding (USAC) scheme. Results showed that the FS‐ECVQ achieved reduction of the total memory requirements by 92.3%, compared with the encoder in USAC working draft 6 (WD6), and over 10%, compared with the encoder in USAC final version (FINAL), while maintaining coding performance similar to FINAL, which was about 4% better than that of WD6. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
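For readers unfamiliar with entropy-constrained vector quantisation, the following toy sketch shows the Lagrangian codeword selection (distortion plus a rate penalty) and a hypothetical finite-state wrapper in which the previous index selects the next component ECVQ. The codebooks, code lengths and next-state rule are invented for illustration and are not those of the USAC design above.

```python
import numpy as np

def ecvq_encode(x, codebook, code_lengths, lam=0.1):
    """Entropy-constrained VQ: minimise distortion + lambda * rate (toy version)."""
    dist = np.sum((codebook - x) ** 2, axis=1)   # squared error to each codeword
    cost = dist + lam * code_lengths             # Lagrangian cost
    return int(np.argmin(cost))

def fs_ecvq_encode(frames, codebooks, code_lengths, lam=0.1):
    """Finite-state ECVQ sketch: the previous index selects the next component ECVQ."""
    state, indices = 0, []
    for x in frames:
        i = ecvq_encode(x, codebooks[state], code_lengths[state], lam)
        indices.append(i)
        state = i % len(codebooks)               # toy next-state rule (not the USAC one)
    return indices

rng = np.random.default_rng(0)
codebooks = rng.standard_normal((4, 16, 8))               # 4 states, 16 codewords, dim 8
code_lengths = rng.integers(2, 8, size=(4, 16)).astype(float)
frames = rng.standard_normal((10, 8))
print(fs_ecvq_encode(frames, codebooks, code_lengths))
```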
44. HHT-based audio coding.
- Author
-
Khaldi, Kais, Boudraa, Abdel-Ouahab, Torresani, Bruno, and Chonavel, Thierry
- Abstract
In this paper, a new audio coding scheme combining the Hilbert transform and the empirical mode decomposition (EMD) is introduced. Based on the EMD, the coding is a fully data-driven approach. The audio signal is first decomposed adaptively, by EMD, into intrinsic oscillatory components called intrinsic mode functions (IMFs). The key idea of this work is to code both the instantaneous amplitude (IA) and the instantaneous frequency (IF) of the extracted IMFs, calculated using the Hilbert transform. Since the IA (resp. IF) is strongly correlated, it is encoded via a linear prediction technique. The decoder recovers the original signal by superposition of the demodulated IMFs. The proposed approach is applied to audio signals, and the results are compared to those obtained with the advanced audio coding (AAC) and MP3 codecs and with wavelet-based compression. Coding performance is evaluated using the bit rate, the objective difference grade (ODG) and the noise-to-mask ratio (NMR). Based on the analyzed audio signals, our coding scheme overall performs better than wavelet compression and the AAC and MP3 codecs. Results also show that the new scheme achieves good coding performance without significant perceptual distortion, with an ODG in the range [-1, 0] and large negative NMR values. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
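The quantities the scheme above encodes, the instantaneous amplitude and instantaneous frequency of each IMF, can be computed from the analytic signal. A minimal sketch using scipy.signal.hilbert follows; the EMD step and the linear-prediction coding of IA/IF are omitted, and the reconstruction is only approximate (it ignores the initial phase).

```python
import numpy as np
from scipy.signal import hilbert

def ia_if(imf, fs):
    """Instantaneous amplitude and frequency of one IMF via the analytic signal."""
    z = hilbert(imf)                                  # analytic signal
    ia = np.abs(z)                                    # instantaneous amplitude
    phase = np.unwrap(np.angle(z))
    inst_f = np.diff(phase) / (2.0 * np.pi) * fs      # instantaneous frequency in Hz
    return ia, inst_f

fs = 8000
t = np.arange(fs) / fs
imf = np.cos(2 * np.pi * (100 * t + 50 * t ** 2))     # a chirp-like oscillation
ia, inst_f = ia_if(imf, fs)

# Demodulation-style reconstruction from IA and IF (what a decoder would sum over IMFs);
# exact only up to the unknown initial phase of the IMF.
phase_rec = 2 * np.pi * np.cumsum(np.concatenate(([0.0], inst_f))) / fs
imf_rec = ia * np.cos(phase_rec)
```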
45. The IEEE 1857 Standard: Empowering Smart Video Surveillance Systems.
- Author
-
Gao, Wen, Tian, Yonghong, Huang, Tiejun, Ma, Siwei, and Zhang, Xianguo
- Subjects
VIDEO surveillance ,CODING theory ,VIDEO coding ,ELECTRONICS in traffic engineering - Abstract
The IEEE 1857 Standard for Advanced Audio and Video Coding was released as IEEE 1857-2013 in June 2013. Although the standard consists of several different groups, its most significant feature is the Surveillance Groups, which not only achieve at least twice the coding efficiency of H.264/AVC High Profile on surveillance videos but also make IEEE 1857-2013 the most analysis-friendly video coding standard. This article presents an overview of the IEEE 1857 Surveillance Groups, highlighting background-model-based coding technology and analysis-friendly functionalities. IEEE 1857-2013 will present new opportunities and drive research in smart video surveillance communities and industries. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
46. Optimal normalisation of prediction residual for predictive coding with random access.
- Author
-
Shu, Haiyan, Yu, Rongshan, and Huang, Haibin
- Abstract
Linear prediction is a mathematical operation that estimates future values of a discrete-time signal as a linear function of previous samples. When applied to predictive coding of waveforms such as speech and audio, a common issue that degrades compression performance is the non-stationary behaviour of the prediction residuals around the starting point of random access frames. This is because the dependencies between the prediction residuals and the historical waveform are interrupted to satisfy the random access requirement. In such cases, the dynamic range of the prediction residuals fluctuates dramatically in these frames, leading to substantially degraded coding performance in the subsequent entropy coder. In this study, the authors developed a solution to this long-standing issue by establishing a theoretical relationship between the energy envelope of the linear prediction residuals in the random access frames and the prediction coefficients. Using the established relationship, an adaptive normalisation method is formulated as a preprocessor to the entropy coder to mitigate the poor coding performance in the random access frames. Simulation results confirm the superiority of the proposed method over existing solutions in terms of coding efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
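A toy sketch of the general idea in the abstract above: compute the prediction residual of a random-access frame with zero history, estimate a per-sample energy envelope from the prediction coefficients, and normalise the residual before entropy coding. The envelope formula here is a plausible approximation introduced for illustration, not necessarily the authors' derived relationship.

```python
import numpy as np
from scipy.signal import lfilter

def lp_residual_zero_history(x, a):
    """Prediction-error filter A(z) = 1 - sum_k a_k z^{-k}, with the missing
    history before the random-access frame treated as zeros."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[:len(x)]

def residual_envelope(x, a, sigma_e):
    """Per-sample residual standard deviation in a random-access frame.

    Assumed model (not the paper's exact formula): for sample n < p the predictor
    taps a_{n+1..p} reach before the frame start, adding variance that depends on
    the signal autocorrelation."""
    p = len(a)
    R = np.correlate(x, x, mode='full')[len(x) - 1:] / len(x)   # autocorrelation estimate
    env = np.full(len(x), float(sigma_e))
    for n in range(min(p, len(x))):
        tail = a[n:]                                            # taps reaching outside the frame
        extra = sum(tail[i] * tail[j] * R[abs(i - j)]
                    for i in range(len(tail)) for j in range(len(tail)))
        env[n] = np.sqrt(sigma_e ** 2 + max(extra, 0.0))
    return env

# Toy AR(2) frame starting at a random-access point (no history available).
rng = np.random.default_rng(4)
a, sigma_e = np.array([1.5, -0.7]), 1.0
x = lfilter([1.0], np.concatenate(([1.0], -a)), sigma_e * rng.standard_normal(512))
r = lp_residual_zero_history(x, a)
r_norm = r / residual_envelope(x, a, sigma_e)   # flattened residual for the entropy coder
```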
47. A Time-Frequency Hybrid Downmixing Method for AC-3 Decoding.
- Author
-
Hui Li, Yiwen Wang, and Ping Li
- Subjects
TIME-frequency analysis ,DECODING algorithms ,DISCRETE cosine transforms ,FREQUENCY-domain analysis ,HYBRID systems - Abstract
In this letter, a time-frequency hybrid downmixing method is proposed for AC-3 decoding. The proposed method consists of downmixing the frequency coefficients of long blocks and short blocks separately, computing long and short inverse modified discrete cosine transforms (IMDCTs), and summing the results. Compared with a previously reported fast frequency-domain downmixing method, the proposed method does not require conversion of frequency coefficients between long and short blocks, and it reduces the numbers of additions and multiplications by approximately 26% and 32%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
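The saving exploited above rests on the linearity of the IMDCT: downmixing the frequency coefficients first means only one inverse transform per block type is needed instead of one per channel. The sketch below checks this equivalence with a textbook direct-form IMDCT (not AC-3's optimised transform) and hypothetical downmix gains.

```python
import numpy as np

def imdct(X):
    """Direct-form IMDCT of N coefficients to 2N time samples (no windowing)."""
    N = len(X)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    return (2.0 / N) * np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ X

def downmix_then_imdct(mdct_channels, gains):
    """Frequency-domain downmix: one IMDCT on the mixed coefficients
    instead of one IMDCT per channel."""
    mixed = sum(g * X for g, X in zip(gains, mdct_channels))
    return imdct(mixed)

rng = np.random.default_rng(1)
chans = [rng.standard_normal(256) for _ in range(5)]   # e.g. 5 channels, one long block
gains = [0.7, 0.7, 1.0, 0.5, 0.5]                      # hypothetical downmix gains
y = downmix_then_imdct(chans, gains)
# Because the IMDCT is linear, this equals downmixing the per-channel IMDCT outputs:
y_ref = sum(g * imdct(X) for g, X in zip(gains, chans))
assert np.allclose(y, y_ref)
```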
48. Cepstrum-Based Bandwidth Extension for Super-Wideband Coders.
- Author
-
Keunseok Cho, Sangbae Jeong, and Minsoo Hahn
- Subjects
CEPSTRUM analysis (Mechanics) ,BANDWIDTHS ,ULTRA-wideband radar ,DISCRETE cosine transforms ,AUDIO codec - Abstract
This letter proposes a bandwidth extension (BWE) method for super-wideband (SWB) coders that uses cepstral envelope coding and analysis-by-synthesis (AbS) duplication of quantized wideband (WB) signals. In the proposed method, the high frequency band is generated using the quantized cepstral coefficients extracted from the envelope and the quantized modified discrete cosine transform (MDCT) shape of the wideband signal. The proposed method is compared with the latest G.718 SWB codec, and the experimental results show that it outperforms the baseline codec in both subjective listening tests and objective performance measures. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
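The cepstral envelope that the method above codes can be illustrated with a low-quefrency (liftered) real cepstrum of a magnitude spectrum. The numpy sketch below is a generic textbook computation, not the G.718-based pipeline of the letter; the number of kept cepstral coefficients is an arbitrary choice.

```python
import numpy as np

def cepstral_envelope(mag_spectrum, n_ceps=20):
    """Smooth spectral envelope via a low-quefrency (liftered) real cepstrum (n_ceps >= 2)."""
    log_mag = np.log(mag_spectrum + 1e-12)
    c = np.fft.irfft(log_mag)                     # real cepstrum of the log magnitude
    c_lift = np.zeros_like(c)
    c_lift[:n_ceps] = c[:n_ceps]                  # keep the first n_ceps coefficients
    c_lift[-(n_ceps - 1):] = c[-(n_ceps - 1):]    # and their symmetric counterparts
    return np.exp(np.fft.rfft(c_lift).real)       # back to a smooth magnitude envelope

rng = np.random.default_rng(2)
mag = np.abs(np.fft.rfft(rng.standard_normal(640)))   # toy magnitude spectrum
env = cepstral_envelope(mag, n_ceps=10)
# A BWE decoder would shape duplicated wideband coefficients with such an envelope.
```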
49. Audio coding via EMD
- Author
-
Thierry Chonavel, Kais Khaldi, Mounia Turki Hadj-Alouane, Ali Komaty, and Abdel-Ouahab Boudraa
- Subjects
Computer science ,Audio coding ,02 engineering and technology ,Sub-band coding ,Hilbert–Huang transform ,Wavelet ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Codec ,Psychoacoustics ,Electrical and Electronic Engineering ,Empirical mode decomposition ,Audio signal ,Applied Mathematics ,020206 networking & telecommunications ,Stationarity index ,Maxima and minima ,Computational Theory and Mathematics ,Signal Processing ,020201 artificial intelligence & image processing ,Empirical mode compression ,Computer Vision and Pattern Recognition ,Statistics, Probability and Uncertainty ,Algorithm ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Psychoacoustic model ,Coding (social sciences) - Abstract
In this paper, an audio coding scheme based on the empirical mode decomposition (EMD) in association with a psychoacoustic model is presented. The principle of the method consists in adaptively breaking down the audio signal into intrinsic oscillatory components, called intrinsic mode functions (IMFs), that are fully described by their local extrema. These extrema are encoded. The coding is carried out frame by frame and no assumption is made about the signal to be coded. The number of allocated bits varies from mode to mode and obeys the constraint that the coding error remain inaudible. Owing to the symmetry of an IMF, only the extrema (maxima or minima) of one of its interpolating envelopes are perceptually coded. In addition, to deal with rapidly changing audio signals, a stationarity index is used, and when a transient is detected, the frame is split into two overlapping sub-frames. At the decoder side, the IMFs are recovered using the associated coded maxima, and the original signal is reconstructed by summation of the IMFs. The performance of the proposed coding is analyzed and compared to that of the MP3 and AAC codecs and a wavelet-based coding approach. Based on the analyzed mono audio signals, the obtained results show that the proposed coding scheme outperforms the MP3 and the wavelet-based coding methods and performs slightly better than the AAC codec, thus showing the potential of the EMD for data-driven audio coding.
- Published
- 2020
- Full Text
- View/download PDF
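The core step of the scheme above, coding only the local maxima of an IMF and rebuilding its interpolating envelope at the decoder, can be sketched with scipy's peak picking and cubic-spline interpolation; the psychoacoustic bit allocation and the stationarity-index transient handling are omitted.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import CubicSpline

def code_extrema(imf):
    """Keep only the positions and values of the local maxima of an IMF."""
    idx, _ = find_peaks(imf)
    return idx, imf[idx]            # these are what would be quantised and sent

def envelope_from_extrema(idx, vals, n):
    """Rebuild the upper envelope by cubic-spline interpolation of the coded maxima."""
    return CubicSpline(idx, vals)(np.arange(n))

fs = 8000
t = np.arange(fs) / fs
imf = (1 + 0.5 * np.sin(2 * np.pi * 2 * t)) * np.cos(2 * np.pi * 200 * t)
idx, vals = code_extrema(imf)
upper_env = envelope_from_extrema(idx, vals, len(imf))
# By IMF symmetry the lower envelope is approximately -upper_env, which is why
# coding the maxima of one envelope suffices in the scheme described above.
```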
50. Analysis by synthesis spatial audio coding.
- Author
-
Elfitri, Ikhwana, Shi, Xiyu, and Kondoz, Ahmet
- Abstract
This study presents a novel spatial audio coding (SAC) technique, called analysis-by-synthesis SAC (AbS-SAC), capable of minimising the signal distortion introduced during the encoding processes. The reverse one-to-two (R-OTT) module, applied in MPEG Surround to down-mix two channels into a single channel, is first configured as a closed-loop system. This closed-loop module can reduce the quantisation errors of the spatial parameters, leading to an improved quality of the synthesised audio signals. Moreover, a sub-optimal AbS optimisation, based on the closed-loop R-OTT module, is proposed. This algorithm addresses the practicality of implementing an optimal AbS optimisation while still further improving the quality of the reconstructed audio signals. In terms of algorithm complexity, the proposed sub-optimal algorithm provides scalability. The results of objective and subjective tests are presented. A significant improvement in objective performance over the conventional open-loop approach is achieved. Moreover, subjective tests show that the proposed technique achieves higher subjective difference grade scores than the tested multichannel advanced audio coding (AAC). [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
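A toy numpy sketch of the closed-loop idea described above: a two-to-one (R-OTT-like) downmix in which the quantised channel level difference determines the upmix weights, and the downmix signal is chosen to minimise the decoder-side reconstruction error. The quantiser step and weight formulas are simplifications, not the MPEG Surround equations.

```python
import numpy as np

def quantise_cld(cld_db, step=1.5):
    """Uniform channel-level-difference quantiser (hypothetical step size)."""
    return step * np.round(cld_db / step)

def r_ott(left, right, closed_loop=True):
    """Toy R-OTT: one downmix channel plus a quantised channel level difference."""
    eps = 1e-12
    cld_db = 10 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))
    cld_q = quantise_cld(cld_db)
    g = 10 ** (cld_q / 20)             # level ratio implied by the *quantised* CLD
    wl, wr = g / (1 + g), 1 / (1 + g)  # upmix weights the decoder will use
    if closed_loop:
        # Downmix that minimises || [left, right] - [wl, wr] * m ||^2 per sample
        m = (wl * left + wr * right) / (wl ** 2 + wr ** 2)
    else:
        m = 0.5 * (left + right)       # conventional open-loop downmix
    l_hat, r_hat = wl * m, wr * m      # decoder-side reconstruction
    err = np.mean((left - l_hat) ** 2 + (right - r_hat) ** 2)
    return m, cld_q, err

rng = np.random.default_rng(3)
s = rng.standard_normal(1024)
left, right = 1.0 * s + 0.1 * rng.standard_normal(1024), 0.4 * s
_, _, err_cl = r_ott(left, right, closed_loop=True)
_, _, err_ol = r_ott(left, right, closed_loop=False)
assert err_cl <= err_ol + 1e-12        # closed loop never does worse in this toy model
```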