25 results for "Rainer Martin"
Search Results
2. First-Order Recursive Smoothing of Short-Time Power Spectra in the Presence of Interference
- Author
-
Jalal Taghia, Daniel Neudek, Tobias Rosenkranz, Henning Puder, and Rainer Martin
- Subjects
Applied Mathematics, Signal Processing, Electrical and Electronic Engineering - Published
- 2022
- Full Text
- View/download PDF
3. Privacy-Preserving Audio Classification Using Variational Information Feature Extraction
- Author
-
Rainer Martin and Alexandru Nelus
- Subjects
Information privacy, Acoustics and Ultrasonics, Computer science, Feature extraction, Speaker recognition, Data modeling, Computational Mathematics, Robustness (computer science), Feature (computer vision), Computer Science (miscellaneous), Data mining, Electrical and Electronic Engineering, Representation (mathematics) - Abstract
In this paper we investigate and tackle the privacy risks of deep-neural-network-based feature extraction for sound classification in acoustic sensor networks. To this end, we analyze a single-label domestic activity monitoring and a multi-label urban sound tagging scenario. We show that in both cases, the feature representations designed for sound classification also carry a significant amount of speaker-dependent data, thus posing serious privacy risks for speaker recognition attacks based on feature interception. We then propose to mitigate the aforementioned privacy risks by introducing a variational information feature extraction scheme that allows sound classification while, concurrently, minimizing the feature representation’s level of information and hence, inhibiting speaker recognition attempts. We control and analyze the balance between the performance of the trusted and attacker tasks via the resulting model’s composite loss function, its budget scaling factor, and latent space size. It is empirically demonstrated that the proposed privacy-preserving feature representation generalizes well to both single-label and multi-label scenarios with vast as well as reduced training-dataset resources. Furthermore, it exhibits robustness against x-vector-based, state-of-the-art speaker recognition attacks.
- Published
- 2021
- Full Text
- View/download PDF
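The composite objective sketched in the abstract above pairs a trusted-task loss with a budget-scaled information penalty on the latent feature representation. A minimal sketch of that idea, assuming a diagonal-Gaussian latent regularized toward a standard normal; the function names and plain-numpy setting are illustrative, not the authors' implementation:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions;
    # this term upper-bounds the information the features carry about the input
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def composite_loss(task_loss, mu, log_var, budget_beta):
    # trusted-task loss plus the budget-scaled information penalty; larger
    # budget_beta squeezes more speaker-dependent information out of the features
    return task_loss + budget_beta * kl_to_standard_normal(mu, log_var)
```

Raising `budget_beta` trades classification accuracy for privacy, which mirrors the balance the paper controls via its budget scaling factor and latent space size.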
4. Improved Target Detection through DNN-based Multi-channel Interference Mitigation in Automotive Radar
- Author
-
Shengyi Chen, Marvin Klemp, Jalal Taghia, Uwe Kühnau, Nils Pohl, and Rainer Martin
- Published
- 2023
- Full Text
- View/download PDF
5. Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation
- Author
-
Rainer Martin and Mehdi Zohourian
- Subjects
Beamforming, Sound localization, Reverberation, Critical distance, Acoustics and Ultrasonics, Computer science, Maximum likelihood, Computational Mathematics, Computer Science (miscellaneous), Coherence (signal processing), Electrical and Electronic Engineering, Algorithm, Binaural recording - Abstract
This article addresses the problem of distance estimation using binaural hearing aid microphones in reverberant rooms. Among several distance indicators, the direct-to-reverberant energy ratio (DRR) has been shown to be more effective than other features. Therefore, we present two novel approaches to estimate the DRR of binaural signals. The first method is based on the interaural magnitude-squared coherence whereas the second approach uses stochastic maximum likelihood beamforming to estimate the power of the direct and reverberant components. The proposed DRR estimation algorithms are integrated into a distance estimation technique. When based solely on DRR, the distance estimation algorithm requires calibration where naturally the critical distance is a good calibration point. We thus propose two approaches for the calibration of the distance estimation algorithm: Informed calibration using the critical distance of the reverberant room and blind calibration using the listener's own voice. Results across various acoustical environments show the benefit of the proposed algorithms for the estimation of sound source distances up to 3 m with an estimation error of about 35 cm using informed calibration and about 1 m using the fully blind calibration strategy.
- Published
- 2020
- Full Text
- View/download PDF
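Under a strongly simplified signal model (direct sound fully coherent across the two hearing-aid microphones, reverberation fully incoherent), the interaural coherence maps directly to the DRR. The toy sketch below illustrates only that mapping, not the paper's stochastic maximum-likelihood beamforming variant or its calibration step:

```python
import numpy as np

def interaural_coherence(left, right, eps=1e-12):
    # magnitude of the normalized cross-correlation over STFT frames of one bin
    cross = np.mean(left * np.conj(right))
    power = np.mean(np.abs(left)**2) * np.mean(np.abs(right)**2)
    return np.abs(cross) / np.sqrt(power + eps)

def drr_from_coherence(coh):
    # direct path coherent, reverberation incoherent:
    # coh = DRR / (DRR + 1)  =>  DRR = coh / (1 - coh)
    coh = np.clip(coh, 0.0, 1.0 - 1e-9)
    return coh / (1.0 - coh)
```

With equal direct and reverberant power at both ears the coherence is 0.5 and the estimated DRR is 1 (0 dB), i.e. the source sits at the critical distance used as a calibration point in the paper.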
6. Making Music More Accessible for Cochlear Implant Listeners: Recent Developments
- Author
-
Anil Nagathil, Rainer Martin, and Waldo Nogueira
- Subjects
Applied Mathematics, Music perception, Instrumental evaluation, Cochlear implant, Perception, Signal Processing, Auditory system, Electrical and Electronic Engineering, Psychology, Timbre, Cognitive psychology - Abstract
Cochlear implants (CIs) have become remarkably successful in restoring the hearing abilities of profoundly hearing-impaired or deaf people. Although in most cases the understanding of continuously spoken speech reaches around 90% after a training and adaptation time of two years, key musical features like pitch and timbre are poorly transmitted by CIs, leading to a severely distorted perception of music. Because music is a ubiquitous means of sociocultural interaction, this handicap significantly degrades the quality of life of CI users. Therefore, in this article, we present recent developments that enable CI users to access music. After a brief review of the state of the art of CIs, we point out the problems of inaccurate pitch and timbre transmission as well as its implications for music perception with CIs. The main part of this article encompasses different emerging strategies for improving CI users' music enjoyment, such as customized music compositions, music preprocessing methods for the reduction of signal complexity, and improved sound coding strategies, and we describe subjective and objective instrumental evaluation procedures.
- Published
- 2019
- Full Text
- View/download PDF
7. Binaural Speaker Localization Integrated Into an Adaptive Beamformer for Hearing Aids
- Author
-
Rainer Martin, Mehdi Zohourian, and Gerald Enzner
- Subjects
Beamforming, Acoustics and Ultrasonics, Computer science, Speech recognition, Statistical model, Background noise, Computational Mathematics, Noise, Computer Science (miscellaneous), Source separation, Electrical and Electronic Engineering, Binaural recording, Adaptive beamformer - Abstract
In this paper, we present and compare novel algorithms to localize simultaneous speakers using four microphones distributed on a pair of binaural hearing aids. The framework consists of two groups of localization algorithms, namely, beamforming-based and statistical-model-based localization algorithms. We first generalize our previously proposed methods based on beamforming techniques to the binaural configuration with 2 × 2 microphones. Next, we contribute two statistical-model-based methods for binaural localization using the maximum likelihood approach that also takes head-related transfer functions and unknown noise conditions into account. The methods enable the localization of multiple source positions for all azimuth angles and do not require prior training of binaural cues. The proposed localization algorithms are integrated into a generalized side-lobe canceller (GSC) to extract the desired speaker in the presence of competing speakers and background noise and when the head of the listener turns. The GSC components are adapted with the frequency-wise target presence probability and the frame-wise broadband direction-of-arrival (DOA) estimates that track the turns of the listener's head. We evaluate the performance of the localization algorithms individually and also in the context of the adaptive binaural beamformer in various noisy and reverberant conditions. Finally, we introduce a new adaptive beamformer, which combines the GSC with multichannel speech presence probability estimation and achieves superior source separation performance in noisy environments.
- Published
- 2018
- Full Text
- View/download PDF
8. A Frequency-Domain Adaptive Line Enhancer With Step-Size Control Based on Mutual Information for Harmonic Noise Reduction
- Author
-
Rainer Martin and Jalal Taghia
- Subjects
Acoustics and Ultrasonics, Noise measurement, Computer science, Speech recognition, Noise reduction, Noise floor, Speech enhancement, Computational Mathematics, Gaussian noise, Colors of noise, Computer Science (miscellaneous), Electrical and Electronic Engineering - Abstract
We propose an adaptive line enhancer with a frequency-dependent step-size. The proposed frequency-domain adaptive line enhancer is used as a single-channel noise reduction system for removing harmonic noise from noisy speech. Our main contribution is to exploit the temporal dependence in the log-magnitude and phase spectra of the noisy speech using mutual information, and to derive a frequency-dependent step-size which detects the presence of harmonic noise in different frequency bins. Our proposed step-size control allows the suppression of harmonic noise and the preservation of speech components. The experiments are performed with different real-life acoustic noises which contain harmonic components. Using instrumental speech intelligibility and quality measures, we demonstrate that the proposed approach can outperform the conventional frequency-domain adaptive line enhancer with a fixed step-size for harmonic noise reduction.
- Published
- 2016
- Full Text
- View/download PDF
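The core of a frequency-domain adaptive line enhancer is a per-bin predictor that subtracts the predictable (harmonic) part of the signal. The sketch below uses a one-tap NLMS predictor per bin with a fixed frequency-dependent step-size vector; the paper's contribution, controlling that step-size via mutual information between successive log-magnitude and phase spectra, is omitted here:

```python
import numpy as np

def ale_filter(frames, delay, mu, eps=1e-8):
    """One-tap-per-bin frequency-domain adaptive line enhancer (sketch).

    frames: (T, K) complex STFT frames; delay: prediction delay in frames;
    mu: (K,) frequency-dependent step-sizes. Returns the prediction-error
    frames, i.e. the signal with the predictable harmonic noise removed.
    """
    T, K = frames.shape
    w = np.zeros(K, dtype=complex)            # per-bin predictor weights
    errors = np.empty_like(frames)
    for t in range(T):
        ref = frames[t - delay] if t >= delay else np.zeros(K, dtype=complex)
        pred = w * ref                         # predict the periodic component
        err = frames[t] - pred                 # residual = enhanced signal
        w += mu * np.conj(ref) * err / (np.abs(ref)**2 + eps)  # NLMS update
        errors[t] = err
    return errors
```

Bins where `mu[k]` is near zero are passed through unchanged, which is how a frequency-dependent step-size preserves speech components while suppressing stationary harmonic noise.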
9. Spectral Complexity Reduction of Music Signals for Mitigating Effects of Cochlear Hearing Loss
- Author
-
Anil Nagathil, Rainer Martin, and Claus Weihs
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Distortion, Cochlear implant, Computer Science (miscellaneous), Source separation, Auditory system, Electrical and Electronic Engineering, Dimensionality reduction, Computational Mathematics, Principal component analysis - Abstract
In this paper we study reduced-rank approximations of music signals in the constant-Q spectral domain as a means to reduce effects stemming from cochlear hearing loss. The rationale behind computing reduced-rank approximations is that they allow us to reduce the spectral complexity of a music signal. The method is motivated by studies with cochlear implant listeners which have shown that solo instrumental music or music remixed at higher signal-to-interference ratios is preferred over complex music ensembles or orchestras. For computing the reduced-rank approximations we investigate methods based on principal component analysis and partial least squares analysis, and compare them to source separation algorithms. The strategies, which are applied to music with a predominant leading voice, are compared in terms of their ability to mitigate effects of simulated reduced frequency selectivity and with respect to source signal distortions. Established instrumental measures and a newly developed measure indicate a considerable reduction of the auditory distortion resulting from cochlear hearing loss. Furthermore, a listening test reveals a significant preference for the reduced-rank approximations in terms of melody clarity and ease of listening.
- Published
- 2016
- Full Text
- View/download PDF
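A reduced-rank approximation of a (constant-Q) magnitude spectrogram can be sketched with a truncated SVD, which coincides with PCA for mean-centered data; centering and the partial-least-squares variant investigated in the paper are omitted in this illustration:

```python
import numpy as np

def reduced_rank(spectrogram, rank):
    # rank-r approximation via truncated SVD: keep only the leading
    # singular components, discarding the "spectrally complex" remainder
    U, s, Vt = np.linalg.svd(spectrogram, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

The approximation error is monotonically non-increasing in the retained rank, so the rank parameter directly trades spectral complexity against fidelity to the original mix.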
10. Two-Stage Filter-Bank System for Improved Single-Channel Noise Reduction in Hearing Aids
- Author
-
Timo Gerkmann, Rainer Martin, Henning Puder, Alexander Schasse, Thomas Pilgrim, and Wolfgang Sörgel
- Subjects
Acoustics and Ultrasonics, Noise measurement, Computer science, Speech recognition, Noise figure, Speech processing, Noise floor, Computational Mathematics, Noise, Phase noise, Computer Science (miscellaneous), Electrical and Electronic Engineering - Abstract
The filter-bank system implemented in hearing aids has to fulfill various constraints such as low latency and high stop-band attenuation, usually at the cost of low frequency resolution. In the context of frequency-domain noise-reduction algorithms, insufficient frequency resolution may lead to annoying residual noise artifacts since the spectral harmonics of the speech cannot properly be resolved. Especially in case of female speech signals, the noise between the spectral harmonics causes a distinct roughness of the processed signals. Therefore, this work proposes a two-stage filter-bank system, such that the frequency resolution can be improved for the purpose of noise reduction, while the original first-stage hearing-aid filter-bank system can still be used for compression and amplification. We also propose methods to implement the second filter-bank stage with little additional algorithmic delay. Furthermore, the computational complexity is an important design criterion. This finally leads to an application of the second filter-bank stage to lower frequency bands only, resulting in the ability to resolve the harmonics of speech. The paper presents a systematic description of the second filter-bank stage, discusses its influence on the processed signals in detail and further presents the results of a listening test which indicates the improved performance compared to the original single-stage filter-bank system.
- Published
- 2015
- Full Text
- View/download PDF
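The second filter-bank stage can be pictured as a DFT applied across consecutive complex subband samples of the first stage, splitting each coarse bin into finer sub-bins. This sketch ignores prototype-filter design, delay compensation, and the restriction to lower frequency bands discussed in the paper:

```python
import numpy as np

def second_stage(frames, m):
    """Apply a second DFT of length m across consecutive first-stage frames.

    frames: (T, K) complex subband samples from the first filter bank.
    Returns (T // m, K, m): each first-stage bin k is split into m sub-bins,
    improving the frequency resolution by a factor of m.
    """
    T, K = frames.shape
    blocks = frames[: (T // m) * m].reshape(T // m, m, K)
    return np.fft.fft(blocks, axis=1).transpose(0, 2, 1)
```

A tone that falls inside one first-stage bin rotates from frame to frame at a rate set by its frequency offset, so the second DFT localizes it to one sub-bin, which is what lets the noise between speech harmonics be resolved and attenuated.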
11. Estimation of Subband Speech Correlations for Noise Reduction via MVDR Processing
- Author
-
Rainer Martin and Alexander Schasse
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Noise reduction, Wiener filter, Speech coding, Estimator, Intelligibility (communication), Linear predictive coding, Speech enhancement, Computational Mathematics, Frequency domain, Computer Science (miscellaneous), Electrical and Electronic Engineering - Abstract
Recently, it has been proposed to use the minimum-variance distortionless-response (MVDR) approach in single-channel speech enhancement in the short-time frequency domain. By applying optimal FIR filters to each subband signal, these filters reduce additive noise components with less speech distortion compared to conventional approaches. An important ingredient of these filters is the temporal correlation of the speech signals. We derive algorithms that provide a blind estimate of this quantity based on maximum-likelihood and maximum a posteriori estimation. To derive proper models for the inter-frame correlation of the speech and noise signals, we investigate their statistics on a large dataset. If the speech correlation is properly estimated, the previously derived subband filters discussed in this work show significantly less speech distortion than conventional noise reduction algorithms. Therefore, the focus of the experimental parts of this work lies on the quality and intelligibility of the processed signals. To evaluate the performance of the subband filters in combination with the clean speech inter-frame correlation estimators, we predict the speech quality and intelligibility by objective measures.
- Published
- 2014
- Full Text
- View/download PDF
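The subband MVDR filter takes L consecutive frames of one frequency bin and minimizes the residual noise subject to a distortionless constraint on the speech inter-frame correlation vector. A minimal numpy sketch of that filter; the paper's actual contribution, the blind ML/MAP estimation of the correlation vector itself, is not shown:

```python
import numpy as np

def mvdr_interframe(noise_cov, speech_corr):
    """MVDR inter-frame filter for one subband.

    noise_cov: (L, L) noise inter-frame covariance; speech_corr: (L,)
    normalized speech inter-frame correlation vector (first entry 1).
    Returns w = R^-1 g / (g^H R^-1 g), so that g^H w = 1 (distortionless).
    """
    rinv_g = np.linalg.solve(noise_cov, speech_corr)
    return rinv_g / (speech_corr.conj() @ rinv_g)
```

Stacking frames this way is what distinguishes the approach from a conventional per-frame gain: the filter exploits temporal speech correlation instead of attenuating each frame independently.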
12. Variational Bayesian Inference for Multichannel Dereverberation and Noise Reduction
- Author
-
Dominic Schmid, Sarmad Malik, Dorothea Kolossa, Gerald Enzner, and Rainer Martin
- Subjects
Reverberation, Acoustics and Ultrasonics, Computer science, Speech recognition, Noise reduction, Bayesian inference, Markov model, Background noise, Computational Mathematics, Noise, Expectation–maximization algorithm, Computer Science (miscellaneous), Electrical and Electronic Engineering, Blind equalization - Abstract
Room reverberation and background noise severely degrade the quality of hands-free speech communication systems. In this work, we address the problem of combined speech dereverberation and noise reduction using a variational Bayesian (VB) inference approach. Our method relies on a multichannel state-space model for the acoustic channels that combines frame-based observation equations in the frequency domain with a first-order Markov model to describe the time-varying nature of the room impulse responses. By modeling the channels and the source signal as latent random variables, we formulate a lower bound on the log-likelihood function of the model parameters given the observed microphone signals and iteratively maximize it using an online expectation-maximization approach. Our derivation yields update equations to jointly estimate the channel and source posterior distributions and the remaining model parameters. An inspection of the resulting VB algorithm for blind equalization and channel identification (VB-BENCH) reveals that the presented framework includes previously proposed methods as special cases. Finally, we evaluate the performance of our approach in terms of speech quality, adaptation times, and speech recognition results to demonstrate its effectiveness for a wide range of reverberation and noise conditions.
- Published
- 2014
- Full Text
- View/download PDF
13. Objective Intelligibility Measures Based on Mutual Information for Speech Subjected to Speech Enhancement Processing
- Author
-
Jalal Taghia and Rainer Martin
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Higher-order statistics, Mutual information, Intelligibility (communication), Information theory, Correlation, Speech enhancement, Computational Mathematics, Computer Science (miscellaneous), Hearing instruments, Electrical and Electronic Engineering - Abstract
We propose a novel method for objective speech intelligibility prediction which can be useful in many application domains such as hearing instruments and forensics. Most objective intelligibility measures available in the literature employ some kind of signal-to-noise ratio (SNR) or a correlation-based comparison between the spectro-temporal representations of clean and processed speech. In this paper, we investigate speech intelligibility prediction from the viewpoint of information theory and introduce novel objective intelligibility measures based on the estimated mutual information between the temporal envelopes of clean speech and processed speech in the subband domain. Mutual information allows us to account for higher-order statistics and hence to consider dependencies beyond the conventional second-order statistics. Using data from three different listening tests it is shown that the proposed objective intelligibility measures provide promising results for speech intelligibility prediction in different scenarios of speech enhancement where speech is processed by non-linear modification strategies.
- Published
- 2014
- Full Text
- View/download PDF
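A simple plug-in estimate of the mutual information between two temporal envelopes discretizes both into histogram bins and evaluates the joint-versus-product divergence. The bin count and the nats convention below are arbitrary illustrative choices, and the paper's estimator is more refined than this sketch:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    # histogram-based MI estimate (in nats) between two envelope sequences
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```

Unlike a correlation coefficient, this quantity also registers non-linear dependencies, which is the motivation for using it on non-linearly modified speech.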
14. Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing
- Author
-
Ramón Fernandez Astudillo, Dorothea Kolossa, Rainer Martin, and Robert M. Nickel
- Subjects
Voice activity detection, Acoustics and Ultrasonics, Computer science, Speech recognition, Pattern recognition, Speech synthesis, Speech processing, Speech enhancement, Background noise, Cepstrum, Artificial intelligence, Electrical and Electronic Engineering, PESQ, Smoothing - Abstract
We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
- Published
- 2013
- Full Text
- View/download PDF
15. Spectral Domain Speech Enhancement Using HMM State-Dependent Super-Gaussian Priors
- Author
-
Rainer Martin, Nasser Mohammadiha, and Arne Leijon
- Subjects
Super-Gaussian pdf, Applied Mathematics, Gaussian, Estimator, Pattern recognition, Linear predictive coding, Exponential function, Speech enhancement, Signal Processing, Prior probability, Gamma distribution, Artificial intelligence, Hidden Markov model (HMM), Electrical and Electronic Engineering, Mathematics - Abstract
The derivation of MMSE estimators for the DFT coefficients of speech signals, given an observed noisy signal and super-Gaussian prior distributions, has received a lot of interest recently. In this letter, we look at the distribution of the periodogram coefficients of different phonemes, and show that they have a gamma distribution with shape parameters less than one. This verifies that the DFT coefficients for not only the whole speech signal but also for individual phonemes have super-Gaussian distributions. We develop a spectral domain speech enhancement algorithm, and derive hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain. The simulations show that the performance is improved using a gamma distribution compared to the exponential case. Moreover, we show that, even though beneficial in some aspects, the Mel-domain processing does not lead to better results than the algorithms in the high-resolution domain.
- Published
- 2013
- Full Text
- View/download PDF
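The central claim above, that periodogram coefficients follow a gamma distribution with shape parameter below one, can be probed with a simple method-of-moments estimate, k ≈ mean²/variance (the shape is 1 exactly when the periodogram is exponential, i.e. the DFT coefficients are Gaussian). Synthetic gamma samples stand in for real speech periodograms in this sketch:

```python
import numpy as np

def gamma_shape_moment(x):
    # method-of-moments shape estimate for gamma-distributed data:
    # mean = k * theta, var = k * theta^2  =>  k = mean^2 / var
    return np.mean(x)**2 / np.var(x)
```

An estimated shape well below one on measured periodograms would support the super-Gaussian prior adopted in the letter.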
16. A Versatile Framework for Speaker Separation Using a Model-Based Speaker Localization Approach
- Author
-
Nilesh Madhu and Rainer Martin
- Subjects
Acoustics and Ultrasonics, Computer science, Robustness (computer science), Estimation theory, Speech recognition, Source separation, Electrical and Electronic Engineering, Mixture model, Speaker recognition, Adaptive beamformer, Algorithm, Blind signal separation - Abstract
We build upon our speaker localization framework developed in a previous work (N. Madhu and R. Martin, “A scalable framework for multiple speaker localization and tracking,” in Proc. Int. Workshop Acoustic Echo Noise Control (IWAENC), Sep. 2008) to perform source separation. The proposed approach, exploiting the supplementary information from the mixture-of-Gaussians-based localization model, allows for the incorporation of a wide class of separation algorithms, from the nonlinear time-frequency mask-based approaches to a fully adaptive beamformer in the generalized sidelobe canceller (GSC) structure. We propose, in addition, a generalized estimation of the blocking matrix based on subspace projectors. The adaptive beamformer realized as proposed is insensitive to gain mismatches among the sensors, obviating the need for magnitude calibration of the microphones. It is also demonstrated that the proposed linear approach has a performance comparable to that of an optimal (oracle) GSC implementation. In comparison to ICA-based approaches, another advantage of the separation framework described herein is its robustness to ambient noise and scenarios with an unknown number of sources.
- Published
- 2011
- Full Text
- View/download PDF
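The subspace-projector view of the blocking matrix mentioned in the abstract can be sketched as the projector onto the complement of the (possibly gain-mismatched) steering vector. Because the projector is built from the estimated vector itself, a per-sensor gain error scales the vector without moving its subspace, which is why no magnitude calibration of the microphones is needed:

```python
import numpy as np

def blocking_matrix(steering):
    # orthogonal projector B = I - a a^H / (a^H a); by construction B a = 0,
    # so the target component is blocked from the noise-reference path of a GSC
    a = np.asarray(steering, dtype=complex).reshape(-1, 1)
    return np.eye(len(steering), dtype=complex) - (a @ a.conj().T) / (a.conj().T @ a)
```

The projector is idempotent (B B = B), so applying it repeatedly changes nothing beyond the first application.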
17. Huge Music Archives on Mobile Devices
- Author
-
W Theimer, Holger Blume, Igor Vatolkin, Martin Botteck, Günter Rudolph, C Igel, G Roetter, Claus Weihs, Rainer Martin, and Bernd Bischl
- Subjects
Hardware architecture, Multimedia, Computer science, Applied Mathematics, Feature extraction, Mobile computing, User experience design, Signal Processing, Music information retrieval, Mobile telephony, Electrical and Electronic Engineering, User interface, Mobile device - Abstract
The availability of huge nonvolatile storage capacities such as flash memory allows large music archives to be maintained even in mobile devices. With the increase in size, manual organization of these archives and manual search for specific music becomes very inconvenient. Automated dynamic organization enables an attractive new class of applications for managing ever-increasing music databases. For these types of applications, extraction of music features as well as subsequent feature processing and music classification have to be performed. However, these are computationally intensive tasks and difficult to tackle on mobile platforms. Against this background, we provide an overview of algorithms for music classification as well as their computation times and other hardware-related aspects, such as power consumption on various hardware architectures. For mobile platforms such as smartphones, a careful balance of algorithm complexity, hardware architecture, and classification accuracy has to be found to provide a high quality user experience.
- Published
- 2011
- Full Text
- View/download PDF
18. Analysis of the Decision-Directed SNR Estimator for Speech Enhancement With Respect to Low-SNR and Transient Conditions
- Author
-
Colin Breithaupt and Rainer Martin
- Subjects
Speech enhancement, Noise, Acoustics and Ultrasonics, Signal-to-noise ratio, Noise measurement, Computer science, Speech recognition, Noise reduction, Estimator, Spectral density estimation, Electrical and Electronic Engineering, Smoothing - Abstract
Because of their many applications and their relative ease of implementation, single-channel speech enhancement algorithms have received much attention. As a consequence, a vast amount of publications on estimation procedures and their implementation in noise reduction systems exists. However, there has been little systematic research on the theoretic performance of such estimators. In this paper, we provide a systematic analysis of the performance of noise reduction algorithms in low signal-to-noise ratio (SNR) and transient conditions, where we consider approaches using the well-known decision-directed SNR estimator. We show that the smoothing properties of the decision-directed SNR estimator in low SNR conditions can be analytically described and that the limits of noise reduction for widely used spectral speech estimators based on the decision-directed approach can be predicted. We also illustrate that achieving both a good preservation of speech onsets in transient conditions on one side and the suppression of musical noise on the other can be especially problematic when the decision-directed SNR estimation is used.
- Published
- 2011
- Full Text
- View/download PDF
19. On the Statistics of Spectral Amplitudes After Variance Reduction by Temporal Cepstrum Smoothing and Cepstral Nulling
- Author
-
Timo Gerkmann and Rainer Martin
- Subjects
Degrees of freedom (statistics) ,Covariance ,Time–frequency analysis ,symbols.namesake ,Signal Processing ,Cepstrum ,Statistics ,symbols ,Variance reduction ,Mel-frequency cepstrum ,Electrical and Electronic Engineering ,Gaussian process ,Smoothing ,Mathematics - Abstract
In this paper, we derive the signal power bias that arises when spectral amplitudes are smoothed by reducing their variance in the cepstral domain (often referred to as cepstral smoothing) and develop a power bias compensation method. We show that if chi-distributed spectral amplitudes are smoothed in the cepstral domain, the resulting smoothed spectral amplitudes are also approximately chi-distributed but with more degrees of freedom and less signal power. The key finding for the proposed power bias compensation method is that the degrees of freedom of chi-distributed spectral amplitudes are directly related to their average cepstral variance. Furthermore, this work gives new insights into the statistics of the cepstral coefficients derived from chi-distributed spectral amplitudes using tapered spectral analysis windows. We derive explicit expressions for the variance and covariance of correlated chi-distributed spectral amplitudes and the resulting cepstral coefficients, parameterized by the degrees of freedom. The results in this work allow for a cepstral smoothing of spectral quantities without affecting their signal power. As we assume the parameterized chi-distribution for the spectral amplitudes, the results hold for Gaussian, super-Gaussian, and sub-Gaussian distributed complex spectral coefficients. The proposed bias compensation method is computationally inexpensive and shown to work very well for white and colored signals, as well as for rectangular and tapered spectral analysis windows.
- Published
- 2009
- Full Text
- View/download PDF
20. Improved A Posteriori Speech Presence Probability Estimation Based on a Likelihood Ratio With Fixed Priors
- Author
-
Rainer Martin, Colin Breithaupt, and Timo Gerkmann
- Subjects
A priori probability ,Acoustics and Ultrasonics ,Speech recognition ,Estimator ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Speech processing ,Speech enhancement ,Signal-to-noise ratio ,Computer Science::Sound ,Prior probability ,A priori and a posteriori ,Detection theory ,Electrical and Electronic Engineering ,Mathematics - Abstract
In this paper, we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier transform domain. In contrast to existing approaches, this estimator does not rely on an adaptively estimated and thus signal-dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation of the speech presence probability from the estimation of the clean speech spectral coefficients in a speech enhancement task. Using both a fixed a priori signal-to-noise ratio and a fixed prior probability of speech presence, the proposed a posteriori speech presence probability estimator achieves probabilities close to zero for speech absence and probabilities close to one for speech presence. While state-of-the-art speech presence probability estimators use adaptive prior probabilities and signal-to-noise ratio estimates, we argue that these quantities should reflect true a priori information that shall not depend on the observed signal. We present a detection theoretic framework for determining the fixed a priori signal-to-noise ratio. The proposed estimator is conceptually simple and yields a better tradeoff between speech distortion and noise leakage than state-of-the-art estimators.
- Published
- 2008
- Full Text
- View/download PDF
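Under a complex Gaussian model for the spectral coefficients, the fixed-prior a posteriori speech presence probability can be sketched per time-frequency bin as below. The 15 dB fixed a priori SNR and the prior of 0.5 are illustrative assumptions, not necessarily the values chosen in the paper.

```python
import math

def speech_presence_prob(gamma, xi_h1=10 ** (15 / 10), p_h1=0.5):
    """A posteriori speech presence probability for one time-frequency bin.

    gamma: a posteriori SNR, |Y(k, l)|^2 divided by the noise PSD.
    xi_h1: FIXED a priori SNR under speech presence (15 dB assumed here).
    p_h1:  FIXED prior probability of speech presence (0.5 assumed here).
    """
    # generalized likelihood ratio of speech presence vs. absence
    # under complex Gaussian models for speech and noise coefficients
    lr = (p_h1 / (1 - p_h1)) / (1 + xi_h1) * math.exp(gamma * xi_h1 / (1 + xi_h1))
    return lr / (1 + lr)
```

Because the priors are fixed rather than estimated from the signal, the estimator saturates near zero for speech absence (gamma around 1) and near one for strong speech, as the abstract describes.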
21. Cepstral Smoothing of Spectral Filter Gains for Speech Enhancement Without Musical Noise
- Author
-
Timo Gerkmann, Colin Breithaupt, and Rainer Martin
- Subjects
business.industry ,Applied Mathematics ,Speech recognition ,Pattern recognition ,Filter (signal processing) ,Speech enhancement ,Adaptive filter ,Noise ,Signal-to-noise ratio ,Computer Science::Sound ,Signal Processing ,Cepstrum ,Mel-frequency cepstrum ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Smoothing ,Mathematics - Abstract
Many speech enhancement algorithms that modify short-term spectral magnitudes of the noisy signal by means of adaptive spectral gain functions are plagued by annoying spectral outliers. In this letter, we propose cepstral smoothing as a solution to this problem. We show that cepstral smoothing can effectively prevent spectral peaks of short duration that may be perceived as musical noise. At the same time, cepstral smoothing preserves speech onsets, plosives, and quasi-stationary narrowband structures like voiced speech. The proposed recursive temporal smoothing is applied to higher cepstral coefficients only, excluding those representing the pitch information. As the higher cepstral coefficients describe the finer spectral structure of the Fourier spectrum, smoothing them along time prevents single coefficients of the filter function from changing excessively and independently of their neighboring bins, thus suppressing musical noise. The proposed cepstral smoothing technique is very effective in nonstationary noise.
- Published
- 2007
- Full Text
- View/download PDF
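A minimal sketch of the idea, assuming a conjugate-symmetric (real) gain spectrum: only the higher cepstral coefficients of the log-gain are smoothed recursively over time. The quefrency cutoff `q_low` and smoothing constant `beta` are illustrative values, and the paper's explicit exclusion of the pitch quefrency is omitted here.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]

class CepstralGainSmoother:
    """Recursive temporal smoothing of higher cepstral coefficients only."""

    def __init__(self, q_low=4, beta=0.8):
        self.q_low = q_low  # lowest quefrency treated as "high" (assumed)
        self.beta = beta    # recursive smoothing constant (assumed)
        self.prev = None

    def smooth(self, gains):
        # gains: symmetric spectral gain function, gains[k] == gains[-k % N]
        ceps = idft([math.log(g) for g in gains])  # cepstrum of the log-gain
        if self.prev is None:
            self.prev = list(ceps)
        n = len(ceps)
        out = []
        for q, c in enumerate(ceps):
            if min(q, n - q) >= self.q_low:  # smooth high quefrencies only
                c = self.beta * self.prev[q] + (1.0 - self.beta) * c
            out.append(c)
        self.prev = out
        return [math.exp(v.real) for v in dft(out)]
```

Smoothing the high quefrencies limits how fast the fine spectral structure of the gain can fluctuate, which suppresses short-lived spectral outliers while the low quefrencies, carrying the coarse spectral envelope, pass through unchanged.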
22. MAP Estimators for Speech Enhancement Under Normal and Rayleigh Inverse Gaussian Distributions
- Author
-
Richard C. Hendriks and Rainer Martin
- Subjects
Acoustics and Ultrasonics ,Rayleigh distribution ,Speech recognition ,Estimator ,Speech processing ,Discrete Fourier transform ,Inverse Gaussian distribution ,Speech enhancement ,symbols.namesake ,Signal-to-noise ratio ,Range (statistics) ,symbols ,Statistical physics ,Electrical and Electronic Engineering ,Mathematics - Abstract
This paper presents a new class of estimators for speech enhancement in the discrete Fourier transform (DFT) domain, where we consider a multidimensional normal inverse Gaussian (MNIG) distribution for the speech DFT coefficients. The MNIG distribution can model a wide range of processes, from heavy-tailed to less heavy-tailed processes. Under the MNIG distribution complex DFT and amplitude estimators are derived. In contrast to other estimators, the suppression characteristics of the MNIG-based estimators can be adapted online to the underlying distribution of the speech DFT coefficients. Compared to noise suppression algorithms based on preselected super-Gaussian distributions, the MNIG-based complex DFT and amplitude estimators lead to a performance improvement in terms of segmental signal-to-noise ratio (SNR) on the order of 0.3 to 0.6 dB and 0.2 to 0.6 dB, respectively.
- Published
- 2007
- Full Text
- View/download PDF
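The heavy-tailed flexibility of the normal inverse Gaussian family can be illustrated in one dimension: a symmetric NIG variate is a normal variance mixture whose mixing density is inverse Gaussian. The sketch below samples the inverse Gaussian with the Michael-Schucany-Haas transformation method; the parameter values are illustrative only, and this shows the distribution family, not the paper's MAP estimators.

```python
import math
import random

random.seed(7)

def sample_inverse_gaussian(mu, lam):
    # Michael, Schucany & Haas transformation method for IG(mu, lam)
    y = random.gauss(0.0, 1.0) ** 2
    x = mu + (mu * mu * y) / (2 * lam) - (mu / (2 * lam)) * math.sqrt(
        4 * mu * lam * y + mu * mu * y * y)
    if random.random() <= mu / (mu + x):
        return x
    return mu * mu / x

def sample_symmetric_nig(mu=1.0, lam=0.5):
    # normal variance mixture: S = sqrt(Z) * N(0, 1) with Z ~ IG(mu, lam)
    return math.sqrt(sample_inverse_gaussian(mu, lam)) * random.gauss(0.0, 1.0)

samples = [sample_symmetric_nig() for _ in range(200000)]
m2 = sum(s * s for s in samples) / len(samples)
m4 = sum(s ** 4 for s in samples) / len(samples)
kurtosis = m4 / (m2 * m2)  # equals 3 for a Gaussian; larger here (heavy tails)
```

Varying `mu` and `lam` moves the kurtosis continuously, which is the property the abstract exploits: the estimators can adapt online from heavy-tailed to nearly Gaussian behavior.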
23. Speech enhancement based on minimum mean-square error estimation and supergaussian priors
- Author
-
Rainer Martin
- Subjects
Minimum mean square error ,Acoustics and Ultrasonics ,business.industry ,Wiener filter ,Estimator ,Pattern recognition ,Noise (electronics) ,Discrete Fourier transform ,Complex normal distribution ,Speech enhancement ,symbols.namesake ,symbols ,Applied mathematics ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Gaussian process ,Software ,Mathematics - Abstract
This paper presents a class of minimum mean-square error (MMSE) estimators for enhancing short-time spectral coefficients of a noisy speech signal. In contrast to most of the presently used methods, we do not assume that the spectral coefficients of the noise or of the clean speech signal obey a (complex) Gaussian probability density. We derive analytical solutions to the problem of estimating discrete Fourier transform (DFT) coefficients in the MMSE sense when the prior probability density function of the clean speech DFT coefficients can be modeled by a complex Laplace or by a complex bilateral Gamma density. The probability density function of the noise DFT coefficients may be modeled either by a complex Gaussian or by a complex Laplacian density. Compared to algorithms based on the Gaussian assumption, such as the Wiener filter or the Ephraim and Malah (1984) MMSE short-time spectral amplitude estimator, the estimators based on these supergaussian densities deliver an improved signal-to-noise ratio.
- Published
- 2005
- Full Text
- View/download PDF
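The MMSE principle behind these estimators can be sketched in a simplified one-dimensional, real-valued setting, with numerical quadrature standing in for the paper's analytical solutions. The ranges and densities below are illustrative assumptions.

```python
import math

def mmse_estimate(y, prior_pdf, noise_var=1.0, lo=-20.0, hi=20.0, n=4000):
    # numerical MMSE estimate E[S | Y = y] for a real-valued coefficient S
    # observed in additive Gaussian noise; a stand-in for the closed forms
    step = (hi - lo) / n
    num = den = 0.0
    for i in range(n + 1):
        s = lo + i * step
        w = math.exp(-(y - s) ** 2 / (2 * noise_var)) * prior_pdf(s)
        num += s * w
        den += w
    return num / den

def laplace_pdf(s, b=1.0):
    # supergaussian (heavy-tailed) prior for the clean-speech coefficient
    return math.exp(-abs(s) / b) / (2 * b)

def gauss_pdf(s, var=2.0):
    # Gaussian prior; with Gaussian noise this reproduces the Wiener gain
    return math.exp(-s * s / (2 * var)) / math.sqrt(2 * math.pi * var)
```

With the Gaussian prior the estimate reduces to the Wiener solution `y * var / (var + noise_var)`; swapping in the Laplacian prior changes the suppression characteristic, which is the effect the paper derives in closed form for complex DFT coefficients.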
24. A psychoacoustic approach to combined acoustic echo cancellation and noise reduction
- Author
-
Rainer Martin, Peter Vary, Peter Jax, and Stefan Gustafsson
- Subjects
Acoustics and Ultrasonics ,Noise measurement ,Computer science ,Speech recognition ,Noise reduction ,Acoustics ,Echo (computing) ,Weighting ,Speech enhancement ,Background noise ,Noise ,Noise control ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,Software - Abstract
This paper presents and compares algorithms for combined acoustic echo cancellation and noise reduction for hands-free telephones. A structure is proposed, consisting of a conventional acoustic echo canceler and a frequency domain postfilter in the sending path of the hands-free system. The postfilter applies the spectral weighting technique and attenuates both the background noise and the residual echo which remains after imperfect echo cancellation. Two weighting rules for the postfilter are discussed. The first is a conventional one, known from noise reduction, which is extended to attenuate residual echo as well as noise. The second is a psychoacoustically motivated weighting rule. Both rules are evaluated and compared by instrumental measures and listening tests. They succeed about equally well in attenuating the noise and the residual echo. In listening tests, however, the psychoacoustically motivated weighting rule is mostly preferred since it leads to more natural near-end speech and to less annoying residual noise.
- Published
- 2002
- Full Text
- View/download PDF
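The conventional weighting rule can be sketched per frequency bin as spectral subtraction that treats the residual echo as additional interference. The gain floor and PSD inputs are illustrative; the psychoacoustically motivated rule, which limits attenuation to what the masked threshold requires, is not reproduced here.

```python
def postfilter_gain(noisy_psd, noise_psd, residual_echo_psd, floor=0.1):
    # spectral-subtraction-style weighting: attenuate both the background
    # noise and the residual echo left over after imperfect echo cancellation
    interference = noise_psd + residual_echo_psd
    return max(floor, 1.0 - interference / max(noisy_psd, 1e-12))

def apply_postfilter(noisy_spectrum, noise_psd, echo_psd, floor=0.1):
    # noisy_spectrum: complex STFT bins of the sending-path microphone signal
    out = []
    for y, n, e in zip(noisy_spectrum, noise_psd, echo_psd):
        out.append(postfilter_gain(abs(y) ** 2, n, e, floor) * y)
    return out
```

The gain floor trades residual interference against audible distortion; the paper's psychoacoustic rule makes this trade-off frequency-dependent using a masking model.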
25. Noise power spectral density estimation based on optimal smoothing and minimum statistics
- Author
-
Rainer Martin
- Subjects
Noise power ,Acoustics and Ultrasonics ,business.industry ,Speech coding ,Spectral density estimation ,Pattern recognition ,Speech processing ,Linear predictive coding ,Background noise ,Speech enhancement ,Noise ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Sound ,Statistics ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Software ,Mathematics - Abstract
We describe a method to estimate the power spectral density of nonstationary noise when a noisy speech signal is given. The method can be combined with any speech enhancement algorithm which requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a voice activity detector. Instead it tracks spectral minima in each frequency band without any distinction between speech activity and speech pause. By minimizing a conditional mean square estimation error criterion in each time step we derive the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal. Based on the optimally smoothed power spectral density estimate and the analysis of the statistics of spectral minima an unbiased noise estimator is developed. The estimator is well suited for real time implementations. Furthermore, to improve the performance in nonstationary noise we introduce a method to speed up the tracking of the spectral minima. Finally, we evaluate the proposed method in the context of speech enhancement and low bit rate speech coding with various noise types.
- Published
- 2001
- Full Text
- View/download PDF
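The core of the method can be sketched for a single frequency bin as below. This is a minimal illustration: the window length and maximum smoothing constant are assumed values, and the paper's bias compensation and fast minimum tracking are omitted.

```python
def optimal_alpha(p_smooth, noise_psd, alpha_max=0.96):
    # time-varying smoothing parameter that minimizes the conditional MSE:
    # close to alpha_max when the smoothed power matches the noise estimate,
    # small when speech drives the smoothed power well above it
    ratio = p_smooth / noise_psd
    return min(alpha_max, 1.0 / (1.0 + (ratio - 1.0) ** 2))

def minimum_statistics(periodogram, window=96):
    # periodogram: |Y(k, frame)|^2 values over time for one frequency bin;
    # returns the noise power track from sliding-window minimum statistics
    noise_est = periodogram[0]
    smoothed = periodogram[0]
    history = []
    track = []
    for p in periodogram:
        alpha = optimal_alpha(smoothed, noise_est)
        smoothed = alpha * smoothed + (1.0 - alpha) * p
        history.append(smoothed)
        if len(history) > window:
            history.pop(0)
        # no voice activity detection: the minimum of the smoothed power
        # over the window tracks the noise floor through speech activity
        noise_est = min(history)
        track.append(noise_est)
    return track
```

Because the minimum of a smoothed power estimate is biased below the true noise power, the full method multiplies the tracked minimum by a bias correction factor derived from the statistics of spectral minima.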