94 results for "Bernd Edler"
Search Results
52. Speech coding in MPEG-4.
- Author
-
Bernd Edler
- Published
- 1999
- Full Text
- View/download PDF
53. A Hands-On Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation
- Author
-
Matteo Torcoli, Bernd Edler, Jouni Paulus, and Martin Strauss
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,Computer science ,Speech recognition ,SIGNAL (programming language) ,Separation (aeronautics) ,Degree (music) ,Computer Science - Sound ,Audio and Speech Processing (eess.AS) ,Application domain ,FOS: Electrical engineering, electronic engineering, information engineering ,Source separation ,Deep neural networks ,Dialog box ,Transfer of learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This paper describes a hands-on comparison of state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models were selected because they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the considered broadcast content, and vocals separation in music is considered a parallel to dialog separation in the target application domain. These similarities are assumed to enable transfer learning between the tasks. Three models pre-trained on music (Open-Unmix, Spleeter, and Conv-TasNet) are considered in the experiments and fine-tuned with real broadcast data. The performance of the models is evaluated before and after fine-tuning with computational evaluation metrics (SI-SIRi, SI-SDRi, 2f-model), as well as with a listening test simulating an application where the non-speech signal is partially attenuated, e.g., for better speech intelligibility. The evaluations include two reference systems specifically developed for dialog separation. The results indicate that pre-trained music source separation models can be used for dialog separation to some degree, and that they benefit from fine-tuning, reaching a performance close to that of task-specific solutions. (Accepted at INTERSPEECH 2021.)
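The SI-SDR-based metrics mentioned in the abstract are straightforward to compute; the sketch below shows scale-invariant SDR and its improvement over the unprocessed mixture (a generic implementation for illustration, not the paper's evaluation code):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference            # optimally scaled reference
    error = estimate - target
    return 10 * np.log10(np.dot(target, target) / np.dot(error, error))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated estimate over the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the reference is optimally rescaled inside the metric, the result is invariant to the overall gain of the estimate, which is why it is preferred over plain SDR for separation evaluation.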
- Published
- 2021
- Full Text
- View/download PDF
54. Tests on MPEG-4 audio codec proposals.
- Author
-
Laura Contin, Bernd Edler, D. Meares, and P. Schreiner
- Published
- 1997
- Full Text
- View/download PDF
55. Spectrum Segmentation Techniques for Edge-RAN Decoding in Telemetry-Based IoT Networks
- Author
-
Albert Heuberger, Bernd Edler, Michael Schadhauser, and Joerg Robert
- Subjects
Base station ,Computer science ,Real-time computing ,Mesh networking ,Enhanced Data Rates for GSM Evolution ,Filter (signal processing) ,Filter bank ,Wireless sensor network ,Random access ,Block (data storage) - Abstract
The possible fields of application for small sensor nodes are tremendous and still growing fast. Concepts like the Internet of Things (IoT), Smart City, or Industry 4.0 adopt wireless sensor networks for environmental interaction or metering purposes. As they commonly operate in license-exempt frequency bands, telemetry transmissions of sensors are subject to strong interference and possible shadowing. Especially in the scope of Low Power Wide Area (LPWA) communications, this scenario results in high computational effort and complexity on the receiver side to perceive the signals of interest. Therefore, this paper investigates methods for adequately segmenting received spectra for a partial spectrum exchange between base stations of telemetry-based IoT sensor networks. The selective interchange of in-phase and quadrature (IQ) data could facilitate, among other approaches, stream-combining techniques that mask out interference. This shall improve decoding rates even under severe operating conditions while simultaneously limiting the required data volume. We refer to this approach of a reception network as Edge-RAN (Random Access Network). To cope with the high data rates and still enable base station collaboration, especially in wirelessly connected receiver mesh networks, different filter bank techniques and block transforms are examined to divide telemetry spectra into distinct frequency sub-channels. Operational constraints for the spectral decomposition are given and different filter methodologies are introduced. Finally, suitable metrics are established to assess the performance of the presented spectrum segmentation schemes for the purpose of a selective partial interchange between sensor network receivers.
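As a rough illustration of dividing a received spectrum into uniform frequency sub-channels, the sketch below uses a critically sampled DFT filter bank with a rectangular prototype window; this is a deliberate simplification, whereas the paper examines more sophisticated filter bank techniques and block transforms:

```python
import numpy as np

def channelize(iq, num_channels):
    """Split a complex baseband IQ stream into uniform frequency
    sub-channels with a critically sampled DFT filter bank
    (rectangular prototype window, for illustration only)."""
    usable = len(iq) - len(iq) % num_channels
    blocks = iq[:usable].reshape(-1, num_channels)
    # FFT across each block: column k becomes decimated sub-channel k
    return np.fft.fft(blocks, axis=1).T   # shape: (channels, time)
```

A real channelizer would add a longer prototype filter (polyphase structure) to suppress leakage between adjacent sub-channels before the base stations exchange them.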
- Published
- 2021
- Full Text
- View/download PDF
56. Overlapping block transform: window design, fast algorithm, and an image coding experiment.
- Author
-
Miodrag R. Temerinac and Bernd Edler
- Published
- 1995
- Full Text
- View/download PDF
57. LINC: a common theory of transform and subband coding.
- Author
-
Miodrag R. Temerinac and Bernd Edler
- Published
- 1993
- Full Text
- View/download PDF
58. Blind Bandwidth Extension of Speech based on LPCNet
- Author
-
Bernd Edler and Konstantin Schmidt
- Subjects
Speech production ,Excitation signal ,Computer science ,Speech recognition ,Speech coding ,Bandwidth (signal processing) ,Bandwidth extension ,020206 networking & telecommunications ,02 engineering and technology ,Signal ,Autoregressive model ,Spectral envelope ,0202 electrical engineering, electronic engineering, information engineering ,Codec ,020201 artificial intelligence & image processing - Abstract
A blind bandwidth extension is presented which improves the perceived quality of 4 kHz speech by artificially extending the speech's frequency range to 8 kHz. Based on the source-filter model of human speech production, the speech signal is decomposed into a spectral envelope and an excitation signal, and each of them is extrapolated separately. With this decomposition, good perceptual quality can be achieved while keeping the computational complexity low. The focus of this work is on the generation of an excitation signal with an autoregressive model that calculates a distribution for each audio sample conditioned on previous samples. This is achieved with a deep neural network following the architecture of LPCNet [1]. A listening test shows that it significantly improves the perceived quality of bandlimited speech. The system has an algorithmic delay of 30 ms and can be applied in state-of-the-art speech and audio codecs.
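The source-filter decomposition described above can be sketched with textbook LPC analysis (autocorrelation method) followed by inverse filtering; this is a generic illustration of the envelope/excitation split, not the authors' implementation:

```python
import numpy as np
from scipy.signal import lfilter

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (the normal
    equations are solved directly; Levinson-Durbin would be faster)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))    # A(z) = 1 - sum_k a_k z^-k

def analyze(x, order=16):
    """Split a speech frame into spectral envelope (LPC polynomial)
    and excitation (residual of the whitening filter)."""
    a = lpc(x, order)
    excitation = lfilter(a, [1.0], x)     # inverse (whitening) filter
    return a, excitation
```

Filtering the excitation back through 1/A(z) reproduces the frame exactly, which is the property a bandwidth extension exploits when the two parts are extrapolated separately.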
- Published
- 2021
- Full Text
- View/download PDF
59. A unified approach to lapped orthogonal transforms.
- Author
-
Miodrag R. Temerinac and Bernd Edler
- Published
- 1992
- Full Text
- View/download PDF
60. Perceptual Audio Coding with Adaptive Non-uniform Time/frequency Tilings Using Subband Merging and Time Domain Aliasing Reduction
- Author
-
Bernd Edler and Nils Werner
- Subjects
Computer science ,Quantization (signal processing) ,media_common.quotation_subject ,020206 networking & telecommunications ,02 engineering and technology ,MUSHRA ,Filter bank ,Time–frequency analysis ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Aliasing ,Perception ,0202 electrical engineering, electronic engineering, information engineering ,Time domain ,0305 other medical science ,Algorithm ,Coding (social sciences) ,media_common - Abstract
In this paper, we investigate the coding efficiency of perceptual coding using an adaptive nonuniform orthogonal filterbank based on MDCT analysis/synthesis and time domain aliasing reduction. We compare its performance to a system using a traditional adaptive uniform MDCT filterbank with window switching. The comparison is performed using a listening test at two different quantization settings. The statistical evaluation shows that the nonuniform filterbank significantly outperforms the uniform filterbank in perceptual quality by 5 to 10 MUSHRA points.
- Published
- 2019
- Full Text
- View/download PDF
61. Audio Coding Using Overlap and Kernel Adaptation
- Author
-
Bernd Edler and Christian Helmrich
- Subjects
Modified discrete cosine transform ,Applied Mathematics ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Sub-band coding ,Discrete sine transform ,Kernel (image processing) ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Discrete cosine transform ,Lapped transform ,Codec ,Electrical and Electronic Engineering ,Algorithm ,Transform coding ,Mathematics - Abstract
Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features (transform length, window shape, transform kernel, and overlap ratio switching) into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported.
- Published
- 2016
- Full Text
- View/download PDF
62. CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning
- Author
-
Bernd Edler, Emanuel A. P. Habets, Fabian-Robert Stöter, and Soumitro Chakrabarty (International Audio Laboratories Erlangen (AUDIO LABS), Friedrich-Alexander Universität Erlangen-Nürnberg (FAU) and Fraunhofer IIS; Scientific Data Management (ZENITH), LIRMM, Inria)
- Subjects
Speaker count estimation ,Reverberation ,cocktail-party ,overlap detection ,Acoustics and Ultrasonics ,Artificial neural network ,Computer science ,Speech recognition ,Supervised learning ,Probabilistic logic ,Blind signal separation ,Speaker diarisation ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Computational Mathematics ,[MATH.MATH-LO]Mathematics [math]/Logic [math.LO] ,Recurrent neural network ,Computer Science (miscellaneous) ,number of concurrent speakers ,Point estimation ,Electrical and Electronic Engineering ,0305 other medical science - Abstract
Estimating the maximum number of concurrent speakers from single-channel mixtures is a challenging problem and an essential first step to address various audio-based tasks such as blind source separation, speaker diarization, and audio surveillance. We propose a unifying probabilistic paradigm, where deep neural network architectures are used to infer output posterior distributions. These probabilities are in turn processed to yield discrete point estimates. Designing such architectures often involves two important and complementary aspects that we investigate and discuss. First, we study how recent advances in deep architectures may be exploited for the task of speaker count estimation. In particular, we show that convolutional recurrent neural networks outperform recurrent networks used in a previous study when adequate input features are used. Even for short segments of speech mixtures, we can estimate up to five speakers, with a significantly lower error than other methods. Second, through comprehensive evaluation, we compare the best-performing method to several baselines, as well as the influence of gain variations, different data sets, and reverberation. The output of our proposed method is compared to human performance. Finally, we give insights into the strategy used by our proposed method.
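The step from output posterior distributions to discrete point estimates can be illustrated as follows; both common decision rules are shown, though the exact rule used in the paper may differ:

```python
import numpy as np

def count_from_posteriors(posteriors, counts=None):
    """Turn a posterior distribution over speaker counts into discrete
    point estimates (illustrative, not the paper's exact rule).

    posteriors: array of shape (num_classes,), summing to 1.
    Returns (MAP estimate, rounded posterior-mean estimate).
    """
    counts = np.arange(len(posteriors)) if counts is None else np.asarray(counts)
    mode = counts[int(np.argmax(posteriors))]          # MAP / mode
    mean = int(np.rint(np.dot(counts, posteriors)))    # rounded mean
    return mode, mean
```

The mode treats the problem as classification; the rounded mean behaves more like a regression output and can be less sensitive to multimodal posteriors.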
- Published
- 2019
- Full Text
- View/download PDF
63. Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks
- Author
-
Sebastian Braun, Soumitro Chakrabarty, Wolfgang Mack, Bernd Edler, Fabian-Robert Stöter, and Emanuel A. P. Habets
- Subjects
Reverberation ,Speech acquisition ,Computer science ,Speech recognition ,Short-time Fourier transform ,020206 networking & telecommunications ,02 engineering and technology ,Impulse (physics) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,symbols.namesake ,Fourier transform ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,0305 other medical science - Abstract
Dereverberation is useful in hands-free communication and voice controlled devices for distant speech acquisition. Single-channel dereverberation can be achieved by applying a time-frequency (TF) mask to the short-time Fourier transform (STFT) representation of a reverberant signal. Recent approaches have used deep neural networks (DNNs) to estimate such masks. Previously proposed DNN-based mask estimation methods train a DNN to minimize the mean-squared-error (MSE) between the desired and estimated masks. Recent TF mask estimation methods for signal separation directly minimize instead the MSE between the desired and estimated STFT magnitudes. We apply this direct optimization concept to dereverberation. Moreover, as reverberation exceeds the duration of a single STFT frame, we propose to use a bidirectional long short-term memory (LSTM) network which is able to take the relation between multiple STFT frames into account. We evaluated our method for different reverberation times and source-microphone distances using simulated as well as measured room impulse responses of different rooms. An evaluation of the proposed method and a comparison with a state-of-the-art method demonstrate the superiority of our approach and its robustness to different acoustic conditions.
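The core operation, applying a time-frequency mask to the STFT of the reverberant signal and resynthesizing, can be sketched as below; the mask estimator is a placeholder callable standing in for the paper's BLSTM:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(reverberant, mask_fn, fs=16000, nperseg=512):
    """Mask the STFT of a reverberant signal and resynthesize.
    mask_fn maps a (freq, time) magnitude array to a mask in [0, 1];
    in the paper it is a BLSTM, here it is any callable."""
    f, t, spec = stft(reverberant, fs=fs, nperseg=nperseg)
    mask = np.clip(mask_fn(np.abs(spec)), 0.0, 1.0)
    _, enhanced = istft(mask * spec, fs=fs, nperseg=nperseg)
    return enhanced[:len(reverberant)]
```

With an all-ones mask the default Hann window and 50% overlap satisfy the COLA condition, so the signal passes through unchanged; a learned mask attenuates the late-reverberant TF bins.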
- Published
- 2018
- Full Text
- View/download PDF
64. webMUSHRA — A Comprehensive Framework for Web-based Listening Tests
- Author
-
Fabian-Robert Stöter, Susanne Westphal, Jürgen Herre, Marlene Roess, Michael Schoeffler, Bernd Edler, and Sarah Bartoschek
- Subjects
Web standards ,Standardization ,Computer science ,web-based ,02 engineering and technology ,MUSHRA ,Library and Information Sciences ,computer.software_genre ,Listening test ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Web application ,Active listening ,Web Audio API ,Audio signal processing ,lcsh:Computer software ,Multimedia ,business.industry ,020206 networking & telecommunications ,listening tests ,lcsh:QA76.75-76.765 ,auditory experiments ,Auditory experiments ,Web programming ,0305 other medical science ,business ,computer ,Software ,Information Systems - Abstract
For a long time, many popular listening test methods, such as ITU-R BS.1534 (MUSHRA), could not be carried out as web-based listening tests, since established web standards did not support all required audio processing features. With the standardization of the Web Audio API, the required features became available and, therefore, also the possibility to implement a wide range of established methods as web-based listening tests. In order to simplify the implementation of MUSHRA listening tests, the development of webMUSHRA was started. By utilizing webMUSHRA, experimenters can configure web-based MUSHRA listening tests without the need for web programming expertise. Today, webMUSHRA supports many more listening test methods, such as ITU-R BS.1116 and forced-choice procedures. Moreover, webMUSHRA is highly customizable and has been used in many auditory studies for different purposes.
- Published
- 2018
65. Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation
- Author
-
Soumitro Chakrabarty, Emanuel A. P. Habets, Fabian-Robert Stöter, and Bernd Edler
- Subjects
FOS: Computer and information sciences ,Sound (cs.SD) ,Channel (digital image) ,Artificial neural network ,Computer science ,Speech recognition ,Supervised learning ,020206 networking & telecommunications ,02 engineering and technology ,Blind signal separation ,Computer Science - Sound ,Speaker diarisation ,Audio and Speech Processing (eess.AS) ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Integer (computer science) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The task of estimating the maximum number of concurrent speakers from single-channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance, or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently map input representations to output targets, it remains unclear how to best handle the network output to infer integer source count estimates, as a discrete count estimate can be tackled either as a regression or as a classification problem. In this paper, we investigate this important design decision and also address complementary parameter choices such as the input representation. We evaluate a state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture for speaker count estimation. Through experimental evaluations, we aim to identify the best overall strategy for the task and show results for five-second speech segments in mixtures of up to ten speakers. (Accepted at ICASSP 2018.)
- Published
- 2017
66. Nonuniform Orthogonal Filterbanks Based on MDCT Analysis/Synthesis and Time-Domain Aliasing Reduction
- Author
-
Bernd Edler and Nils Werner
- Subjects
Modified discrete cosine transform ,Applied Mathematics ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,Time–frequency analysis ,Reduction (complexity) ,Aliasing ,Signal Processing ,Discrete frequency domain ,0202 electrical engineering, electronic engineering, information engineering ,Discrete cosine transform ,020201 artificial intelligence & image processing ,Time domain ,Electrical and Electronic Engineering ,Algorithm ,Impulse response ,Mathematics - Abstract
In this letter we describe nonuniform orthogonal modified discrete cosine transform (MDCT) filterbanks and time-domain aliasing reduction (TDAR). By adding a postprocessing step to the MDCT, our method allows for arbitrary nonuniform frequency resolutions using subband merging with smooth windowing and overlap in frequency. This overlap allows for an improved temporal compactness of the impulse response, which is especially useful for audio coders. The postprocessing step comprises another lapped MDCT transform along the frequency axis and TDAR along each subband signal.
- Published
- 2017
67. Masked threshold for noise bands masked by narrower bands of noise: Effects of masker bandwidth and center frequency
- Author
-
Armin Taghipour, Brian C. J. Moore, and Bernd Edler
- Subjects
Physics ,Acoustics and Ultrasonics ,Equivalent rectangular bandwidth ,Noise effects ,Acoustics ,Bandwidth (signal processing) ,01 natural sciences ,03 medical and health sciences ,Noise ,0302 clinical medicine ,Arts and Humanities (miscellaneous) ,0103 physical sciences ,Center frequency ,030223 otorhinolaryngology ,010301 acoustics ,Staircase method - Abstract
This paper examines how masked thresholds depend on the masker bandwidth and center frequency when the masker has a smaller bandwidth than the signal. The signal bandwidth was equal to the equivalent rectangular bandwidth of the auditory filter and the masker bandwidth was 0.1, 0.35, or 0.6 times the signal bandwidth. The masker and signal were centered at the same frequency of 257, 697, 1538, 3142, or 6930 Hz. Masked thresholds were estimated using a two-interval two-alternative forced-choice paradigm and a three-down one-up adaptive staircase method. Masked thresholds increased with increasing masker bandwidth and were lowest for medium center frequencies.
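The three-down one-up adaptive staircase used here converges toward the 79.4% point of the psychometric function; a minimal sketch with a caller-supplied (e.g., simulated) listener:

```python
def staircase_3down_1up(respond, start_level, step, num_trials=200):
    """Run a 3-down 1-up adaptive staircase.
    respond(level) -> True if the listener answered correctly.
    Returns the list of signal levels visited, one per trial."""
    level, correct_in_row, levels = start_level, 0, []
    for _ in range(num_trials):
        levels.append(level)
        if respond(level):
            correct_in_row += 1
            if correct_in_row == 3:       # three correct: make it harder
                level -= step
                correct_in_row = 0
        else:                             # one wrong: make it easier
            level += step
            correct_in_row = 0
    return levels
```

In practice the threshold is taken as the mean of the last several reversal points; step sizes are usually reduced after the first few reversals, which this sketch omits.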
- Published
- 2016
68. Current Steering and Results From Novel Speech Coding Strategies
- Author
-
Thomas Lenarz, Martina Brendel, Andreas Buechner, Carolin Frohne-Büchner, Beate Krueger, Bernd Edler, and Waldo Nogueira
- Subjects
Adult ,Male ,Bionics ,Speech coding ,Pitch perception ,Deafness ,Prosthesis Design ,otorhinolaryngologic diseases ,Electronic engineering ,Humans ,Prosthesis design ,Medicine ,Prospective Studies ,Pitch Perception ,Cross-Over Studies ,Models, Statistical ,Voice activity detection ,Fourier Analysis ,business.industry ,fungi ,food and beverages ,Middle Aged ,Sensory Systems ,Electrodes, Implanted ,Cochlear Implants ,Otorhinolaryngology ,Speech Perception ,Female ,Neurology (clinical) ,Current (fluid) ,business ,Algorithms ,Communication channel - Abstract
Advanced Bionics' cochlear implants have independent current sources that can share stimulation current between 2 contacts (Current Steering). By stimulating 2 adjacent electrodes with different weights, different pitches can be evoked, allowing the number of processing channels to be increased. A counterbalanced crossover design was used to compare 3 different current steering implementations to the clinical HiRes strategy. The study was a prospective, within-subject, repeated-measure experiment. The study group consisted of 8 postlingually deaf subjects with a minimum of 12 months' experience with HiRes. The following programs were evaluated: 1) a Fast Fourier Transformation (FFT)-based current steering implementation with 120 stimulation sites; 2) the same current steering implementation but with 16,000 stimulation sites; and 3) a current steering implementation based on a sinusoidal decomposition of the original signal with 16,000 stimulation sites. Outcome measures were speech perception tests in quiet and in Comité Consultatif International Télégraphique et Téléphonique (CCITT) noise as well as with a competing talker, an adaptive test of the frequency difference limen, and a Quality Assessment Questionnaire. Current results do not show any improvement in speech perception for any particular current steering strategy compared with HiRes. However, when the optimal current steering strategy was selected per subject, subjects could achieve a significant benefit compared with clinical HiRes. In addition, the frequency difference limen could be reduced significantly at 1,280 Hz. Current steering thus seems to have the potential to improve both understanding in adverse listening situations and frequency resolution; however, the optimal implementation needs further investigation.
- Published
- 2008
- Full Text
- View/download PDF
69. Common Fate Model for Unison source Separation
- Author
-
Roland Badeau, Fabian-Robert Stöter, Bernd Edler, Paul Magron, and Antoine Liutkus (International Audio Laboratories Erlangen (AUDIO LABS); MULTISPEECH, Inria Nancy - Grand Est / LORIA; LTCI, Télécom ParisTech)
- Subjects
Non-Negative tensor factorization ,Speech recognition ,Sound source separation ,Common Fate Model ,020206 networking & telecommunications ,Musical instrument ,02 engineering and technology ,Fundamental frequency ,Discrete Fourier transform ,Amplitude modulation ,030507 speech-language pathology & audiology ,03 medical and health sciences ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Modulation (music) ,0202 electrical engineering, electronic engineering, information engineering ,Source separation ,Harmonic ,Spectrogram ,0305 other medical science ,Algorithm ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Mathematics - Abstract
In this paper we present a novel source separation method aiming to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g., as typically caused by vibrato. It is based on a signal representation that divides the complex spectrogram into a grid of patches of arbitrary size. These complex patches are then processed by a two-dimensional discrete Fourier transform, forming a tensor representation which reveals spectral and temporal modulation textures. Our representation can be seen as an alternative to modulation transforms computed on magnitude spectrograms. An adapted factorization model makes it possible to decompose different time-varying harmonic sources based on their particular common modulation profile: hence the name Common Fate Model. The method is evaluated on musical instrument mixtures playing the same fundamental frequency (unison), showing improvement over other state-of-the-art methods.
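The patch-wise two-dimensional DFT of the complex spectrogram can be sketched as follows; the STFT parameters and patch size are illustrative choices, not those of the paper:

```python
import numpy as np
from scipy.signal import stft

def common_fate_tensor(x, fs=44100, nperseg=1024, patch=(16, 8)):
    """Divide a complex spectrogram into non-overlapping patches and
    take a 2-D DFT of each, exposing modulation textures.
    Returns shape (patches_f, patches_t, patch_f, patch_t)."""
    _, _, spec = stft(x, fs=fs, nperseg=nperseg)
    pf, pt = patch
    F = spec.shape[0] // pf * pf   # trim to whole patches
    T = spec.shape[1] // pt * pt
    grid = spec[:F, :T].reshape(F // pf, pf, T // pt, pt).swapaxes(1, 2)
    return np.fft.fft2(grid, axes=(2, 3))
```

Sources sharing a modulation profile (e.g., common vibrato) concentrate in the same cells of the last two axes, which is what the factorization model exploits.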
- Published
- 2016
70. Durations required to distinguish noise and tone: Effects of noise bandwidth and frequency
- Author
-
Brian C. J. Moore, Armin Taghipour, and Bernd Edler
- Subjects
Audio noise measurement ,Masking (art) ,Adult ,Male ,Time Factors ,Acoustics and Ultrasonics ,Computer science ,Acoustics ,Noise figure ,01 natural sciences ,Pitch Discrimination ,03 medical and health sciences ,Young Adult ,0302 clinical medicine ,Arts and Humanities (miscellaneous) ,0103 physical sciences ,Humans ,Center frequency ,030223 otorhinolaryngology ,010301 acoustics ,Audio signal ,Noise measurement ,Noise (signal processing) ,Equivalent rectangular bandwidth ,Quantization (signal processing) ,Bandwidth (signal processing) ,Auditory Threshold ,Middle Aged ,Noise floor ,Sound recording and reproduction ,Noise ,Amplitude ,Acoustic Stimulation ,Colors of noise ,Audiometry, Pure-Tone ,Female ,Perceptual Masking ,Noise (radio) ,Psychoacoustics - Abstract
Perceptual audio coders exploit the masking properties of the human auditory system to reduce the bit rate in audio recording and transmission systems; it is intended that the quantization noise is just masked by the audio signal. The effectiveness of the audio signal as a masker depends on whether it is tone-like or noise-like. The determination of this, both physically and perceptually, depends on the duration of the stimuli. To gather information that might improve the efficiency of perceptual coders, the duration required to distinguish between a narrowband noise and a tone was measured as a function of center frequency and noise bandwidth. In experiment 1, duration thresholds were measured for isolated noise and tone bursts. In experiment 2, duration thresholds were measured for tone and noise segments embedded within longer tone pulses. In both experiments, center frequencies were 345, 754, 1456, and 2658 Hz and bandwidths were 0.25, 0.5, and 1 times the equivalent rectangular bandwidth of the auditory filter at each center frequency. The duration thresholds decreased with increasing bandwidth and with increasing center frequency up to 1456 Hz. It is argued that the duration thresholds depended mainly on the detection of amplitude fluctuations in the noise bursts.
- Published
- 2016
71. Signal-adaptive transform kernel switching for stereo audio coding
- Author
-
Bernd Edler and Christian Helmrich
- Subjects
Modified discrete cosine transform ,Discrete sine transform ,Speech recognition ,Discrete cosine transform ,Lapped transform ,S transform ,Algorithm ,Decoding methods ,Transform coding ,Sub-band coding ,Mathematics - Abstract
Modern stereo and multi-channel perceptual audio codecs utilizing the modified discrete cosine transform (MDCT) can achieve very good overall coding quality even at low bit-rates but lack efficiency on some material with an inter-channel phase difference (IPD) of about ±90 degrees. To address this issue, a generalization of the lapped transform coding scheme is proposed which retains the perfect reconstruction property while allowing the usage of three further transform kernels, one of which is the modified discrete sine transform (MDST). Blind listening tests indicate that by frame-wise adaptation of each channel's transform kernel to the instantaneous IPD characteristics, notable gains in coding quality are possible with only a negligible increase in decoder complexity and parameter rate.
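An energy-weighted per-frame estimate of the inter-channel phase difference, the quantity the kernel adaptation reacts to, might be computed as below; this is an illustrative analysis, not the codec's actual decision logic:

```python
import numpy as np

def mean_ipd_degrees(left, right, nfft=1024):
    """Energy-weighted mean inter-channel phase difference of one
    stereo frame, in degrees (phase of left relative to right)."""
    L = np.fft.rfft(left, nfft)
    R = np.fft.rfft(right, nfft)
    cross = L * np.conj(R)        # per-bin phase difference, weighted
    return np.degrees(np.angle(np.sum(cross)))
```

Frames whose IPD sits near ±90 degrees are exactly the ones where an MDST-type kernel captures the quadrature component that the MDCT misses.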
- Published
- 2015
- Full Text
- View/download PDF
72. Low-complexity semi-parametric joint-stereo audio transform coding
- Author
-
Stefan Bayer, Andreas Niedermeier, Christian Helmrich, and Bernd Edler
- Subjects
Modified discrete cosine transform ,MPEG-4 Part 3 ,Computer science ,Speech recognition ,Speech coding ,Sub-band coding ,Shannon–Fano coding ,Adaptive Multi-Rate audio codec ,Frequency domain ,Computer Science::Multimedia ,Discrete cosine transform ,Extended Adaptive Multi-Rate – Wideband ,Lapped transform ,Codec ,Algorithm ,Encoder ,Transform coding ,Data compression - Abstract
Traditional audio codecs based on real-valued transforms utilize separate and largely independent algorithmic schemes for parametric coding of noise-like or high-frequency spectral components as well as channel pairs. It is shown that in the frequency-domain part of coders such as Extended HE-AAC, these schemes can be unified into a single algorithmic block located at the core of the modified discrete cosine transform path, enabling greater flexibility, such as semi-parametric coding, and large savings in codec delay and complexity. This paper focuses on the stereo coding aspect of this block and demonstrates that, by using specially chosen spectral configurations when deriving the parametric side-information in the encoder, perceptual artifacts can be reduced and the spatial processing in the decoder can remain real-valued. Listening tests confirm the benefit of our proposal at intermediate bit-rates.
- Published
- 2015
- Full Text
- View/download PDF
73. Comparison of two tonality estimation methods used in a psychoacoustic model
- Author
-
Armin Taghipour, Bernd Edler, and Hao Chen
- Subjects
Amplitude modulation ,Audio signal ,Masking threshold ,Computer science ,Speech recognition ,Spectral flatness ,Estimator ,Psychoacoustics ,Center frequency ,Filter bank - Abstract
Perceptual audio codecs apply psychoacoustic principles such as masking effects of the human auditory system in order to reduce irrelevancies in the input audio signal. Psychoacoustic studies show differences between the masking strength of tonal and noise maskers: the masking effect of narrowband noise is stronger than that of a tone which has the same power and is placed at the center frequency of the noise. In this paper, two tonality estimation methods are discussed which are implemented in a filter-bank-based psychoacoustic model. The first method is called Partial Spectral Flatness Measure (PSFM) and the second is referred to as Amplitude Modulation Ratio (AM-R). The psychoacoustic model uses a set of complex band-pass filters. It was designed according to the temporal/spectral resolution of the human auditory system, and takes into account post-masking as well as the spreading effect of individual local maskers in simultaneous masking. This paper describes the model, the tonality estimation methods, and their implementation. The estimators are compared to each other in subjective tests, whose results are presented and discussed.
- Published
- 2014
- Full Text
- View/download PDF
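A spectral flatness measure of the kind named above rates a band by the ratio of the geometric to the arithmetic mean of its power values: near 1 for a flat, noise-like band and near 0 for a tonal one. The following is a minimal sketch computed over FFT bins; the paper's PSFM operates on the outputs of a complex band-pass filter bank instead, so this is an illustrative stand-in rather than the published estimator.

```python
import numpy as np

def partial_sfm(power_spectrum, lo, hi, eps=1e-12):
    """Spectral flatness over bins [lo, hi): geometric mean / arithmetic mean.

    Values near 1 indicate a noise-like (flat) band, values near 0 a tonal band.
    `eps` guards the logarithm against zero-power bins.
    """
    band = power_spectrum[lo:hi] + eps
    geo = np.exp(np.mean(np.log(band)))
    return geo / np.mean(band)
```

As a sanity check, a pure sinusoid concentrates its energy in one bin and yields a very low flatness, while white noise spreads energy evenly and yields a flatness closer to 1.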
74. Improved low-delay MDCT-based coding of both stationary and transient audio signals
- Author
-
Goran Markovic, Bernd Edler, and Christian Helmrich
- Subjects
Imagination ,Voice over IP ,Audio signal ,business.industry ,Computer science ,media_common.quotation_subject ,Speech recognition ,Speech coding ,Low delay ,Sub-band coding ,Search engine ,business ,Transform coding ,media_common - Abstract
General-purpose MDCT-based audio coders like MP3 or HE-AAC utilize long inter-transform overlap and lookahead-based transform length switching to provide good coding quality for both stationary and non-stationary, i.e. transient, input signals even at low bitrates. In low-delay communication scenarios such as Voice over IP, however, algorithmic delay due to framing and overlap typically needs to be reduced and additional lookahead must be avoided. We show that these restrictions limit the performance of contemporary low-delay transform coders on either stationary or transient material and propose three modifications: an improved noise substitution technique and increased overlap between “long” transforms for stationary frames, and “long to short” transform length switching without lookahead and directly from the long overlap for transient frames. A listening test indicates the merit of these changes when integrated into AAC-LD.
- Published
- 2014
- Full Text
- View/download PDF
75. Cheap beeps - Efficient synthesis of sinusoids and sweeps in the MDCT domain
- Author
-
Bernd Edler, Sascha Disch, and Benjamin Schubert
- Subjects
Tone (musical instrument) ,Noise ,Computer science ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Discrete cosine transform ,Bandwidth extension ,Codec - Abstract
Modern transform audio coders often employ parametric enhancements, like noise substitution or bandwidth extension. In addition to these well-known parametric tools, it might also be desirable to synthesize parametric sinusoidal tones in the decoder. Low computational complexity is an important criterion in codec development and essential for acceptance and deployment. Therefore, efficient ways of generating these tones are needed. Since contemporary codecs like AAC or USAC are based on an MDCT domain representation of audio, we propose to generate synthetic tones by patching tone patterns into the MDCT spectrum at the decoder. We demonstrate how appropriate spectral patterns can be derived and adapted to their target location in (and between) the MDCT time/frequency (t/f) grid to seamlessly synthesize high quality sinusoidal tones including sweeps.
- Published
- 2013
- Full Text
- View/download PDF
76. Dependency of tonality perception on frequency, bandwidth and duration
- Author
-
Armin Taghipour, Jürgen Herre, Bernd Edler, and Masoumeh Amirpour
- Subjects
Acoustics and Ultrasonics ,Acoustics ,media_common.quotation_subject ,Bandwidth (signal processing) ,Estimator ,Arts and Humanities (miscellaneous) ,Perception ,Temporal resolution ,Codec ,Psychoacoustics ,Center frequency ,Tonality ,media_common ,Mathematics - Abstract
Psychoacoustic studies show that a narrowband noise masker exhibits a stronger simultaneous masking effect than a tonal masker with the same signal power placed at the noise center frequency. Consequently, perceptual audio codecs commonly incorporate some sort of tonality estimation as part of their perceptual model. However, common tonality estimation techniques do not necessarily reflect the perception of tonality by human listeners. As long as the tone and narrowband noise signals are long enough, they are easily distinguishable for normal-hearing listeners. However, if the stimulus duration decreases, both signal types approach the shape of impulses and therefore, at some point, become audibly identical. Consequently, at a given frequency and noise bandwidth, there is a duration threshold below which the signals cannot be distinguished. A series of so-called "2-AFC 3-step up-down" psychoacoustic tests was designed and carried out to investigate the frequency and bandwidth dependency of these duration thresholds. The test results, collected from 32 listeners, are statistically evaluated and confirm a decreasing threshold for increasing center frequency and bandwidth. These results can be used to improve psychoacoustic models for audio codecs by using tonality estimators with frequency- and bandwidth-adapted temporal resolution.
- Published
- 2013
- Full Text
- View/download PDF
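Adaptive up-down procedures like the one named above adjust the stimulus (here, duration) after each two-alternative forced-choice trial and estimate the threshold from the track's reversal points. The abstract does not spell out the exact "3-step up-down" rule, so the sketch below uses the standard Levitt 2-down/1-up variant purely as an illustration; the step factor, reversal count, and the simulated observer are all assumptions.

```python
import random

def staircase(respond, start=100.0, step=0.7, n_reversals=8):
    """Generic 2-down/1-up adaptive staircase over stimulus duration (ms).

    `respond(duration)` returns True for a correct 2-AFC response.
    Converges toward the ~70.7%-correct point; the threshold estimate is
    the mean of the last reversal values. Illustrative variant only.
    """
    dur, correct_run, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(dur):
            correct_run += 1
            if correct_run == 2:          # two correct in a row -> make task harder
                correct_run = 0
                if direction == +1:       # direction changed: record a reversal
                    reversals.append(dur)
                direction = -1
                dur *= step
        else:
            correct_run = 0               # one wrong -> make task easier
            if direction == -1:
                reversals.append(dur)
            direction = +1
            dur /= step
    last = reversals[-6:]
    return sum(last) / len(last)
```

With a simulated observer that always answers correctly above a true threshold of 20 ms and guesses at chance below it, the track descends quickly and then oscillates around the threshold.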
77. Frequency selective pitch transposition of audio signals
- Author
-
Bernd Edler and Sascha Disch
- Subjects
Amplitude modulation ,Sound recording and reproduction ,Transposition (music) ,Audio signal ,Computer science ,Speech recognition ,Musical ,Audio signal processing ,computer.software_genre ,computer ,Frequency modulation ,Timbre - Abstract
Modern music production often uses pre-recorded pieces of audio, so-called samples, taken from a huge sample database. Consequently, there is an increasing demand to extensively adapt these samples to their intended new musical environment in a flexible way. Such an application, for instance, retroactively changes the key mode of audio recordings, e.g. from a major key to a minor key, by a frequency selective transposition of pitch. Recently, the modulation vocoder (MODVOC) has been proposed to handle this task. In this paper, two enhancements to the MODVOC are presented and the subjective quality of its application to selective pitch transposition is assessed. Moreover, the proposed scheme is compared with results obtained by applying a commercial computer program which recently became available on the market. The proposed method is clearly preferred in terms of the perceptual quality aspect "melody and chords transposition", while the commercial program is favored by the majority with regard to the aspect "timbre preservation".
- Published
- 2011
- Full Text
- View/download PDF
78. Multiband perceptual modulation analysis, processing and synthesis of audio signals
- Author
-
Bernd Edler and Sascha Disch
- Subjects
Signal processing ,Audio electronics ,Audio signal ,Computer science ,Speech recognition ,Audio signal flow ,computer.software_genre ,Signal ,Adaptive filter ,Band-pass filter ,Modulation (music) ,Audio signal processing ,Timbre ,computer - Abstract
The decomposition of audio signals into perceptually meaningful multiband modulation components opens up new possibilities for advanced signal processing. The signal-adaptive analysis approach proposed in this paper is shown to provide a powerful handle on the signal's perceptual properties: pitch, timbre, and roughness can be manipulated in a straightforward manner. Additionally, a synthesis method is specified that provides high subjective perceptual quality. Furthermore, as an application example, a novel audio processing technique is proposed which changes the key mode of a given piece of music, e.g. from major to minor key or vice versa.
- Published
- 2009
- Full Text
- View/download PDF
79. Results from a psychoacoustic model-based strategy for the nucleus-24 and freedom cochlear implants
- Author
-
Thomas Lenarz, Andreas Büchner, Waldo Nogueira, Rolf-Dieter Battmer, and Bernd Edler
- Subjects
Adult ,Male ,medicine.medical_specialty ,Speech perception ,medicine.medical_treatment ,Speech coding ,Monitoring, Ambulatory ,Audiology ,Deafness ,Models, Biological ,Cochlear implant ,otorhinolaryngologic diseases ,medicine ,Humans ,Psychoacoustics ,Selection algorithm ,Aged ,Cross-Over Studies ,Models, Statistical ,business.industry ,Middle Aged ,Sensory Systems ,Cochlear Implants ,Otorhinolaryngology ,QUIET ,Calibration ,Speech Perception ,Female ,Neurology (clinical) ,business ,Encoder ,Sentence ,Algorithms - Abstract
OBJECTIVE: In normal-hearing listeners, acoustic masking occurs depending on frequency, amplitude, and energy of specific signals. If the selection of stimulated channels in cochlear implant systems were based on psychoacoustic masking models, the bandwidth of the electrode/nerve interface could be used more effectively by concentrating on relevant signal components and neglecting those that are usually not perceived by normal-hearing listeners. Subsequently, a new strategy called PACE (Psychoacoustic Advanced Combination Encoder) has been developed, which uses a psychoacoustic model for the channel selection instead of the simple maxima selection algorithm of the ACE strategy. STUDY DESIGN: Only subjects having at least 2 years of experience with the ACE strategy were included. A counterbalanced cross-over design was used to compare the new speech coding strategy with the ACE strategy. SETTING: The investigation was a prospective, within-subject, repeated-measures experiment. PATIENTS: The study group consisted of 10 postlingually deafened adult subjects. INTERVENTIONS: The following programs were evaluated: (1) ACE with 8 maxima selected; (2) PACE with 8 channels selected; and (3) PACE with 4 channels selected. MAIN OUTCOME MEASURES: Speech perception tests in quiet and noise; Quality Assessment Questionnaire. RESULTS: Results indicate a trend towards better performance with PACE. Scores in the Freiburg monosyllabic word test increased by 8%, while the SNR50 in the Oldenburger sentence test improved significantly by 1.3 dB. CONCLUSION: The use of psychoacoustic masking models in speech coding strategies has the potential to improve speech perception performance in cochlear implant subjects.
- Published
- 2008
80. Aliasing Reduction for Modified Discrete Cosine Transform Domain Filtering and its Application to Speech Enhancement
- Author
-
Bernd Edler and Fabian Kuech
- Subjects
Speech enhancement ,Audio signal ,Modified discrete cosine transform ,Speech recognition ,Speech coding ,Discrete cosine transform ,Filter (signal processing) ,Filter bank ,Algorithm ,Transform coding ,Mathematics - Abstract
Efficient combinations of coding and manipulation of audio signals in the spectral domain are often desirable in communication systems. The modified discrete cosine transform (MDCT) represents a popular spectral transform in audio coding as it leads to compact signal representations. However, as the MDCT corresponds to a critically sampled filter bank, it is in general not appropriate to directly apply it to filtering tasks. In this paper we present a method to compensate for aliasing terms that arise from such direct MDCT domain filtering. The discussion is thereby based on a rigorous matrix representation of critically sampled filter banks which also leads to corresponding efficient realizations. As an application showcase, noise reduction for MDCT based speech coding is considered in simulations.
- Published
- 2007
- Full Text
- View/download PDF
81. Automatic speech recognition with a cochlear implant front-end
- Author
-
Tamás Harczos, Andreas Büchner, Jörn Ostermann, Bernd Edler, and Waldo Nogueira
- Subjects
Signal processing ,Computer science ,medicine.medical_treatment ,Speech recognition ,Intelligibility (communication) ,Speech processing ,Front and back ends ,Profound hearing loss ,Cochlear implant ,otorhinolaryngologic diseases ,medicine ,Psychoacoustics ,Hidden Markov model ,Encoder - Abstract
Today, cochlear implants (CIs) are the treatment of choice in patients with profound hearing loss. However, speech intelligibility with these devices is still limited. A factor that determines hearing performance is the processing method used in CIs. Therefore, research is focused on designing different speech processing methods. The evaluation of these strategies is subject to variability, as it is usually performed with cochlear implant recipients. Hence, an objective evaluation method would give more robustness compared to tests performed with CI patients. This paper proposes a method to evaluate signal processing strategies for CIs based on a hidden Markov model speech recognizer. Two signal processing strategies for CIs, the Advanced Combinational Encoder (ACE) and the Psychoacoustic Advanced Combinational Encoder (PACE), have been compared in a phoneme recognition task. Results show that PACE obtained higher recognition scores than ACE, as found with CI recipients.
- Published
- 2007
- Full Text
- View/download PDF
82. An Auditory Model Based Strategy for Cochlear Implants
- Author
-
Bernd Edler, András Kátai, Waldo Nogueira, Frank Klefenz, Tamas Harczos, and Andreas Buechner
- Subjects
Engineering ,Signal processing ,Hair Cells, Auditory, Inner ,business.industry ,Speech recognition ,Cell model ,Filter bank ,Models, Biological ,Basilar Membrane ,Human auditory system ,Basilar membrane ,Cochlear Implants ,medicine.anatomical_structure ,Evoked Potentials, Auditory ,medicine ,Humans ,sense organs ,Hair cell ,business ,Algorithms - Abstract
A physiological and computational model of the human auditory system has been incorporated into a signal processing strategy for cochlear implants (CIs). The aim of the new strategy is to obtain more natural sound in CIs by better mimicking the human auditory system. The new strategy was built in three independent stages as proposed in [6]. First, the filterbank commonly used in commercial strategies was replaced by a basilar membrane motion model. Second, an inner hair cell model was included in a commercial strategy while maintaining the original filterbank. Third, both the basilar membrane motion model and the inner hair cell model were included in the commercial strategy. This paper analyses the properties of each algorithm designed and presents results obtained with CI recipients.
- Published
- 2007
- Full Text
- View/download PDF
83. Signal analysis by using adaptive filterbanks in cochlear implants
- Author
-
Bernd Edler, Waldo Nogueira, Andreas Büchner, and Amparo Albalate
- Subjects
Engineering ,Signal processing ,Audio signal ,business.industry ,Speech recognition ,medicine.medical_treatment ,Filter bank ,Speech processing ,Signal ,Adaptive filter ,Cochlear implant ,medicine ,business ,Encoder - Abstract
Current speech processing strategies in cochlear implants use a filterbank to decompose audio signals into several frequency bands, each associated with one electrode. Because the processing is performed on input signal blocks of fixed size, the filterbank provides a single time-frequency resolution for representing the various signal features. However, different components of audio signals may require different time-frequency resolutions for an accurate representation and perception. In this paper we investigate the influence on speech intelligibility in cochlear implant users when filterbanks with different time-frequency resolutions are used. In order to represent all signal features accurately, an adaptive filterbank has been developed that accepts input blocks of different sizes. The different resolutions required are achieved by adequately switching between block sizes depending on the input signal characteristics. The filterbank was incorporated into the commercial Advanced Combinational Encoder (ACE) and acutely tested on six cochlear implant recipients.
- Published
- 2006
- Full Text
- View/download PDF
84. Wavelet Packet Filterbank for Speech Processing Strategies in Cochlear Implants
- Author
-
Bernd Edler, Andreas Büchner, Waldo Nogueira, and A. Giese
- Subjects
Signal processing ,Audio signal ,Computer science ,Speech recognition ,medicine.medical_treatment ,Wavelet transform ,Intelligibility (communication) ,Filter bank ,computer.software_genre ,Speech processing ,Wavelet packet decomposition ,Wavelet ,Cochlear implant ,otorhinolaryngologic diseases ,medicine ,Audio signal processing ,computer ,Cochlea - Abstract
Current speech processing strategies for cochlear implants use a filterbank which decomposes the audio signals into multiple frequency bands, each associated with one electrode. Pitch perception with cochlear implants is related to the number of electrodes inserted in the cochlea and to the rate of stimulation of these electrodes. The filterbank should, therefore, be able to analyze the time-frequency features of the audio signals while also exploiting the time-frequency features of the implant. This study investigates the influence on speech intelligibility in cochlear implant users when filterbanks with different time-frequency resolutions are used. Three filterbanks, based on the structure of a wavelet packet transform but using different basis functions, were designed. The filterbanks were incorporated into a commercial speech processing strategy and were tested on device users in an acute study.
- Published
- 2006
- Full Text
- View/download PDF
85. Coding of coefficients of two-dimensional non-separable adaptive Wiener interpolation filter
- Author
-
Jörn Ostermann, Yuri Vatis, I. Wassermann, Bernd Edler, and Dieu Thanh Nguyen
- Subjects
business.industry ,Low-pass filter ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Raised-cosine filter ,Adaptive filter ,Filter design ,Kernel adaptive filter ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Digital filter ,Harmonic Vector Excitation Coding ,Root-raised-cosine filter ,Mathematics - Abstract
Standard video compression techniques apply motion-compensated prediction combined with transform coding of the prediction error. In the context of prediction with fractional-pel motion vector resolution it was shown that aliasing components contained in an image signal limit the prediction accuracy obtained by motion compensation. In order to consider aliasing, quantisation and motion estimation errors, camera noise, etc., we analytically developed a two-dimensional (2D) non-separable interpolation filter, which is calculated for each frame independently by minimising the prediction error energy. For every fractional-pel position to be interpolated, an individual set of 2D filter coefficients is determined. Since transmitting filter coefficients as side information results in an additional bit rate, which is almost independent of the total bit rate and image resolution, the overall gain decreases when total bit rates decrease. In this paper we present an algorithm which regards the non-separable two-dimensional filter as a polyphase filter. For each frame, predicting the interpolation filter impulse response through evaluation of the polyphase filter, we only have to encode the filter coefficient prediction error. This enables savings of up to 75% in the bit rate needed for transmitting filter coefficients compared to PCM coding. A coding gain of up to 1.2 dB Y-PSNR at the same bit rate, or up to a 30% reduction in bit rate, is obtained for HDTV sequences compared to the standard H.264/AVC. Up to 0.5 dB (up to 10% bit rate reduction) is achieved for CIF sequences.
- Published
- 2005
- Full Text
- View/download PDF
86. Verwendung eines psychoakustischen Modells bei einer modifizierten ACE-Strategie für das Nucleus 24 Implantat
- Author
-
W Nogueira, Andreas Büchner, Bernd Edler, T Lenarz, and Battmer Rd
- Subjects
Otorhinolaryngology - Published
- 2004
- Full Text
- View/download PDF
87. Parametric audio coding
- Author
-
Heiko Purnhagen and Bernd Edler
- Subjects
Computer science ,Speech recognition ,Speech coding ,Bandwidth extension ,computer.software_genre ,Computer Science::Multimedia ,Extended Adaptive Multi-Rate – Wideband ,Codec ,Waveform ,Sound quality ,Audio signal processing ,Transform coding ,Digital audio ,Audio signal ,MPEG-4 Part 3 ,Audio bit depth ,Audio signal flow ,Linear predictive coding ,Sub-band coding ,Adaptive Multi-Rate audio codec ,Computer Science::Sound ,Joint (audio engineering) ,computer ,Encoder ,Harmonic Vector Excitation Coding ,Data compression - Abstract
For very low bit rate audio coding applications in mobile communications or on the Internet, parametric audio coding has evolved as a technique complementing the more traditional approaches. These are transform codecs originally designed for achieving CD-like quality on one hand, and specialized speech codecs on the other hand. Both of these techniques usually represent the audio signal waveform in a way such that the decoder output signal gives an approximation of the encoder input signal, while taking into account perceptual criteria. Compared to this approach, in parametric audio coding the models of the signal source and of human perception are extended. The source model is now based on the assumption that the audio signal is the sum of "components," each of which can be approximated by a relatively simple signal model with a small number of parameters. The perception model is based on the assumption that the sound of the decoder output signal should be as similar as possible to that of the encoder input signal. Therefore, the approximation of waveforms is no longer necessary. This approach can lead to a very efficient representation. However, a suitable set of models for signal components, a good decomposition, and a good parameter estimation are all vital for achieving maximum audio quality. We give an overview on the current status of parametric audio coding developments and demonstrate advantages and challenges of this approach. Finally, we indicate possible directions of further improvements.
- Published
- 2002
- Full Text
- View/download PDF
88. Audio coding using a psychoacoustic pre- and post-filter
- Author
-
Bernd Edler and Gerald Schuller
- Subjects
Adaptive filter ,Filter design ,Computer Science::Sound ,Computer science ,Speech recognition ,Speech coding ,Codec ,Image warping ,Transform coding ,Sub-band coding ,Coding (social sciences) ,Interpolation - Abstract
A novel concept for perceptual audio coding is presented which is based on the combination of a pre- and post-filter, controlled by a psychoacoustic model, with a transform coding scheme. This paradigm allows modeling of the temporal and spectral shape of the masked threshold with a resolution independent of the transform used. By using frequency warping techniques, the maximum possible detail for a given filter order can be made frequency-dependent and thus better adapted to the human auditory system. The filter coefficients are represented efficiently by LSF parameters which can be adaptively interpolated over time. First experiments with a system obtained by extending an existing transform codec showed that this approach can significantly improve the performance for speech signals, while the performance for other signals remained the same.
- Published
- 2002
- Full Text
- View/download PDF
89. Perceptual audio coding using adaptive pre- and post-filters and lossless compression
- Author
-
Bernd Edler, Gerald Schuller, Dawei Huang, Bin Yu, and Publica
- Subjects
Lossless compression ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,Speech coding ,Data_CODINGANDINFORMATIONTHEORY ,Dictionary coder ,Lossy compression ,Adaptive coding ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,Software ,Context-adaptive binary arithmetic coding ,Data compression ,Image compression - Abstract
This paper proposes a versatile perceptual audio coding method that achieves high compression ratios and is capable of low encoding/decoding delay. It accommodates a variety of source signals (including both music and speech) with different sampling rates. It is based on separating irrelevance and redundancy reductions into independent functional units. This contrasts with traditional audio coding, where both are integrated within the same subband decomposition. The separation allows for the independent optimization of the irrelevance and redundancy reduction units. For both reductions, we rely on adaptive filtering and predictive coding as much as possible to minimize the delay. A psychoacoustically controlled adaptive linear filter is used for the irrelevance reduction, and the redundancy reduction is carried out by a predictive lossless coding scheme, termed the weighted cascaded least mean squared (WCLMS) method. Experiments are carried out on a moderately sized database containing mono signals of different sampling rates and varying nature (music, speech, or mixed). They show that the proposed WCLMS lossless coder outperforms other competing lossless coders in terms of compression ratios and delay, as applied to the pre-filtered signal. Moreover, a subjective listening test of the combined pre-filter/lossless coder and a state-of-the-art perceptual audio coder (PAC) shows that the new method achieves a comparable compression ratio and audio quality with a lower delay.
- Published
- 2002
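Predictive lossless coding of the kind used for the redundancy reduction stage can be illustrated with a single normalized-LMS predictor: the encoder transmits integer prediction residuals, and a decoder running the identical adaptive predictor reconstructs the samples exactly. This is a deliberate simplification; the paper's WCLMS method cascades several LMS stages and combines them with adaptive weights, which is not reproduced here.

```python
import numpy as np

def nlms_residuals(x, order=8, mu=0.5):
    """Integer residuals of a normalized-LMS predictor (lossless encoder side)."""
    w = np.zeros(order)
    hist = np.zeros(order)
    res = np.empty(len(x), dtype=np.int64)
    for i, s in enumerate(x):
        pred = int(round(w @ hist))
        res[i] = s - pred                          # rounded residual -> lossless
        err = s - w @ hist                         # unrounded error drives adaptation
        w += mu * err * hist / (hist @ hist + 1e-6)
        hist = np.roll(hist, 1)
        hist[0] = s
    return res

def nlms_reconstruct(res, order=8, mu=0.5):
    """Decoder mirror: same predictor state, add the residual back."""
    w = np.zeros(order)
    hist = np.zeros(order)
    out = np.empty(len(res), dtype=np.int64)
    for i, r in enumerate(res):
        pred = int(round(w @ hist))
        s = r + pred
        out[i] = s
        err = s - w @ hist
        w += mu * err * hist / (hist @ hist + 1e-6)
        hist = np.roll(hist, 1)
        hist[0] = s
    return out
```

Because encoder and decoder run bit-identical predictor updates, reconstruction is exact, while the residuals of a predictable signal are much smaller than the samples themselves and therefore cheaper to entropy-code.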
90. 125. Simulation of a cochlear implant device diminishes the electrophysiological auditory orienting reaction to speech
- Author
-
T. F. Muente, Bernd Edler, C. Dethlefsen, Waldo Nogueira, Andreas Buechner, Wido Nager, Reinhard Dengler, and Thomas Lenarz
- Subjects
medicine.medical_specialty ,Electrophysiology ,Neurology ,business.industry ,Physiology (medical) ,Cochlear implant ,medicine.medical_treatment ,Medicine ,Neurology (clinical) ,Audiology ,business ,Sensory Systems - Published
- 2009
- Full Text
- View/download PDF
91. WO31 Electrophysiological comparison between ‘Hi Res’ and ‘Spec Res’ speech coding strategies in a virtual cochlea implant (CI) prosthesis
- Author
-
Waldo Nogueira, Wido Nager, T. F. Muente, Bernd Edler, Reinhard Dengler, Thomas Lenarz, J. Lambrecht, Andreas Buechner, and J. Ostermann
- Subjects
medicine.medical_specialty ,business.industry ,medicine.medical_treatment ,Speech coding ,Spec# ,Audiology ,Prosthesis ,Sensory Systems ,Electrophysiology ,Neurology ,Cochlea implant ,Physiology (medical) ,Medicine ,Neurology (clinical) ,business ,computer ,computer.programming_language - Published
- 2008
- Full Text
- View/download PDF
92. A Psychoacoustic 'NofM'-Type Speech Coding Strategy for Cochlear Implants
- Author
-
Thomas Lenarz, Andreas Büchner, Waldo Nogueira, and Bernd Edler
- Subjects
Masking (art) ,noise ,cis ,additivity ,Computer science ,speech coding ,medicine.medical_treatment ,Speech recognition ,ddc:621,3 ,Speech coding ,lcsh:TK7800-8360 ,signal processors ,lcsh:Telecommunication ,lcsh:TK5101-6720 ,Cochlear implant ,audio ,Data_FILES ,medicine ,Psychoacoustics ,psychoacoustic model ,Electrical and Electronic Engineering ,speak ,ACE ,masking ,Signal processing ,Audio signal ,nucleus ,NofM ,lcsh:Electronics ,cochlear implant ,Dewey Decimal Classification::600 | Technik::620 | Ingenieurwissenschaften und Maschinenbau::621 | Angewandte Physik::621,3 | Elektrotechnik, Elektronik ,Speech processing ,Noise ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Hardware and Architecture ,Signal Processing ,recognition - Abstract
We describe a new signal processing technique for cochlear implants using a psychoacoustic-masking model. The technique is based on the principle of a so-called "NofM" strategy. These strategies stimulate fewer channels (N) per cycle than the number of active electrodes (M), with N < M. In "NofM" strategies such as ACE or SPEAK, only the channels with the highest amplitudes are stimulated. The new strategy is based on the ACE strategy but uses a psychoacoustic-masking model in order to determine the essential components of any given audio signal. This new strategy was tested on device users in an acute study, with either 4 or 8 channels stimulated per cycle. For the first condition (4 channels), speech intelligibility improved on average over the ACE strategy. For the second condition (8 channels), no significant difference was found between the two strategies.
- Full Text
- View/download PDF
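The contrast above is between plain maxima selection (ACE-style: keep the N largest of M band envelopes) and a masking-aware selection. The sketch below shows both as a toy: the "psychoacoustic" variant greedily picks the band most above a running masked threshold and then raises the threshold around it with a triangular spreading function. The spreading slope and threshold floor are invented for illustration; the published PACE model is considerably more detailed.

```python
import numpy as np

def select_nofm_ace(env, n):
    # ACE-style maxima selection: indices of the n largest band envelopes
    return np.sort(np.argsort(env)[-n:])

def select_nofm_psychoacoustic(env, n, spread_db_per_band=10.0):
    """Toy masking-aware NofM selection (illustrative only).

    Greedily picks the band most above the running masked threshold, then
    raises the threshold around the pick with a triangular spreading function,
    so components masked by an already-selected neighbor are skipped.
    """
    env_db = 20 * np.log10(np.maximum(env, 1e-9))
    thr = np.full_like(env_db, -100.0)          # assumed threshold-in-quiet floor
    chosen = []
    for _ in range(n):
        cand = env_db - thr                     # level above masked threshold
        cand[chosen] = -np.inf                  # never pick a band twice
        b = int(np.argmax(cand))
        chosen.append(b)
        dist = np.abs(np.arange(len(env)) - b)
        thr = np.maximum(thr, env_db[b] - spread_db_per_band * dist)
    return np.sort(np.array(chosen))
```

On an envelope with one dominant cluster, maxima selection spends all N picks on adjacent bands, while the masking-aware variant spreads the picks to spectrally distant components.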
93. Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen
- Author
-
Bernd Edler
- Subjects
Signal processing ,Audio signal ,Computer science ,Audio equipment ,Speech recognition ,ddc:621,3 ,computer.file_format ,Dewey Decimal Classification::600 | Technik::620 | Ingenieurwissenschaften und Maschinenbau::621 | Angewandte Physik::621,3 | Elektrotechnik, Elektronik ,Overlapping Block Transform ,Bit Rate Reduction ,Transform Coder ,Time Domain Aliasing Cancellation ,Audio Signals ,Codes, Symbolic--Applications ,Signal Processing ,Electrical and Electronic Engineering ,Audio Equipment ,computer ,Adaptive Window Functions - Abstract
The bit rate reduction method for audio signals presented here is based on overlapping transforms with time domain aliasing cancellation, whose window functions and transform lengths are switched depending on the input signal. The adaptive windowing improves the behavior of transform coding with overlapping blocks, which is characterized by a high coding gain, in the presence of impulses and amplitude steps in the input signal. © 1989, Walter de Gruyter. All rights reserved.
- Published
- 1989
94. Proceedings of the 17th International Conference on Digital Audio Effects, DAFx-14, Erlangen, Germany, September 1-5, 2014
- Author
-
Sascha Disch, Jürgen Herre, Rudolf Rabenstein, Bernd Edler, Meinard Müller, and Stefan Turowski
- Published
- 2014