Author: "Shengkui Zhao" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement

Author: Shengkui Zhao and Bin Ma
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Monaural speech enhancement has been widely studied using real-valued networks in the time-frequency (TF) domain. However, since the input and the target are naturally complex-valued in the TF domain, a fully complex network is highly desirable for effectively learning the feature representation and modelling the sequence in the complex domain. Moreover, phase, an important factor for the perceptual quality of speech, has been shown to be learnable together with magnitude from noisy speech using complex masking or complex spectral mapping. Many recent studies focus on either complex masking or complex spectral mapping, ignoring their performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former) using joint complex masking and complex spectral mapping for monaural speech enhancement. In D2Former, we extend the conformer network into the complex domain and form a dual-path complex TF self-attention architecture for effectively modelling the complex-valued TF sequence. We further boost the TF feature representation in the encoder and the decoders using a dual-path learning structure that exploits complex dilated convolutions for time dependency and complex feedforward sequential memory networks (CFSMN) for frequency recurrence. In addition, we improve the performance boundaries of complex masking and complex spectral mapping by combining the strengths of the two training targets in a joint-learning framework. As a consequence, D2Former takes full advantage of the complex-valued operations, the dual-path processing, and the joint training targets. Compared to previous models, D2Former achieves state-of-the-art results on the VoiceBank+Demand benchmark with the smallest model size of 0.87M parameters. Comment: 5 pages, 3 figures, accepted by ICASSP 2023
Published: 2023
Full Text: View/download PDF
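The two training targets that D2Former combines can be illustrated on a single time-frequency bin. The sketch below is a minimal illustration, assuming STFT bins are plain Python complex numbers and using toy values: complex masking multiplies the noisy bin Y by a complex mask M (here the ideal ratio S/Y), while complex spectral mapping predicts the clean bin S directly. The function `joint_estimate` and its fixed weight `alpha` are hypothetical; the abstract does not say how D2Former fuses its two decoder outputs.

```python
def ideal_complex_ratio_mask(clean: complex, noisy: complex, eps: float = 1e-12) -> complex:
    """Complex ratio mask M with M * noisy == clean (complex division per TF bin)."""
    if abs(noisy) < eps:
        return 0j  # avoid dividing by an (almost) silent noisy bin
    return clean / noisy


def apply_complex_mask(mask: complex, noisy: complex) -> complex:
    """Complex masking: scales the magnitude and rotates the phase of the noisy bin."""
    return mask * noisy


def joint_estimate(masked: complex, mapped: complex, alpha: float = 0.5) -> complex:
    """HYPOTHETICAL fusion of the masking and mapping estimates (a fixed weighted average)."""
    return alpha * masked + (1.0 - alpha) * mapped


noisy = 0.8 + 0.6j   # noisy STFT bin Y (toy value)
clean = 0.5 - 0.2j   # clean STFT bin S (toy value)
mask = ideal_complex_ratio_mask(clean, noisy)
masked = apply_complex_mask(mask, noisy)  # masking branch: M * Y
mapped = clean                             # mapping branch: predicts S directly (oracle here)
print(masked, joint_estimate(masked, mapped))
```

Because the ideal mask is complex, it corrects phase as well as magnitude, which is why both targets can recover the clean bin in this oracle setting.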

2. FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement

Author: Shengkui Zhao, Bin Ma, Karn N. Watcharasupat, and Woon-Seng Gan
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED) structure and a recurrent structure have achieved promising performance for monaural speech enhancement. However, feature representation across frequency context is highly constrained due to limited receptive fields in the convolutions of CED. In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. The CRED applies frequency recurrence on 3D convolutional feature maps along the frequency axis following each convolution and is therefore capable of capturing long-range frequency correlations and enhancing feature representations of speech inputs. The proposed frequency recurrence is realized efficiently using a feedforward sequential memory network (FSMN). Besides the CRED, we insert two stacked FSMN layers between the encoder and the decoder to further model temporal dynamics. We name the proposed framework Frequency Recurrent CRN (FRCRN). We design FRCRN to predict the complex Ideal Ratio Mask (cIRM) in the complex-valued domain and optimize FRCRN using both time-frequency-domain and time-domain losses. Our proposed approach achieved state-of-the-art performance on wideband benchmark datasets and achieved 2nd place for the real-time fullband track in terms of Mean Opinion Score (MOS) and Word Accuracy (WAcc) in the ICASSP 2022 Deep Noise Suppression (DNS) challenge (https://github.com/alibabasglab/FRCRN). The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables
Published: 2022
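The FSMN memory block behind FRCRN's frequency recurrence is essentially a learnable FIR filter over neighbouring steps. The sketch below is a minimal, assumption-laden illustration: one real-valued channel, a fixed identity skip term (one common FSMN variant), and placeholder coefficients rather than learned FRCRN weights. The helper names `fsmn_memory` and `frequency_recurrence` are hypothetical.

```python
from typing import List, Sequence


def fsmn_memory(h: Sequence[float], lookback: Sequence[float],
                lookahead: Sequence[float] = ()) -> List[float]:
    """FSMN-style memory block: an FIR filter over neighbouring steps plus a skip term.

    out[k] = h[k] + sum_i lookback[i] * h[k-1-i] + sum_j lookahead[j] * h[k+1+j],
    with out-of-range neighbours treated as zero.
    """
    n = len(h)
    out = []
    for k in range(n):
        acc = h[k]  # skip (identity) connection
        for i, a in enumerate(lookback):
            if k - 1 - i >= 0:
                acc += a * h[k - 1 - i]
        for j, c in enumerate(lookahead):
            if k + 1 + j < n:
                acc += c * h[k + 1 + j]
        out.append(acc)
    return out


def frequency_recurrence(feature_map: Sequence[Sequence[float]], lookback: Sequence[float],
                         lookahead: Sequence[float] = ()) -> List[List[float]]:
    """Run the memory block along the frequency axis of every time frame ([time][freq] layout)."""
    return [fsmn_memory(frame, lookback, lookahead) for frame in feature_map]


# One time frame with energy only in the lowest frequency bin; placeholder coefficients.
print(frequency_recurrence([[1.0, 0.0, 0.0, 0.0]], lookback=[0.5, 0.25]))  # [[1.0, 0.5, 0.25, 0.0]]
```

The example shows how a single active bin spreads into higher bins, which is the kind of cross-frequency context a small convolution kernel alone cannot reach.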

3. Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Author: Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, and Bin Ma
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Computer science, Mean opinion score, Speech recognition, Inference, Phonetics, Mandarin Chinese, Computer Science - Sound, language.human_language, Machine Learning (cs.LG), Naturalness, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, language, Set (psychology), Prosody, Electrical Engineering and Systems Science - Audio and Speech Processing, Transformer (machine learning model)
Abstract: Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches in the phonetic set and the speech prosody of different languages. In this paper, we build upon a neural text-to-speech (TTS) model, i.e., FastSpeech, and the LPCNet neural vocoder to design a new cross-lingual VC framework named FastSpeech-VC. We address the mismatches in the phonetic set and the speech prosody by applying Phonetic PosteriorGrams (PPGs), which have been shown to bridge across speaker and language boundaries. Moreover, we add normalized logarithm-scale fundamental frequency (Log-F0) to further compensate for the prosodic mismatches and significantly improve naturalness. Our experiments on English and Mandarin demonstrate that, with only monolingual corpora, the proposed FastSpeech-VC can achieve high-quality converted speech with a mean opinion score (MOS) close to that of professional recordings while maintaining good speaker similarity. Compared to the baselines using Tacotron2 and Transformer TTS models, FastSpeech-VC achieves a controllable converted speech rate and much faster inference. More importantly, FastSpeech-VC can easily be adapted to a speaker with limited training utterances. 5 pages, 2 figures, 4 tables, accepted by ICASSP 2021
Published: 2021
Full Text: View/download PDF
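The abstract does not define how Log-F0 is normalized. A common choice, assumed here, is per-utterance z-score normalization of log F0 over voiced frames, optionally mapped back onto a target speaker's log-F0 statistics. The functions and the toy contour below are illustrative only, not FastSpeech-VC code.

```python
import math
from typing import List, Optional, Sequence


def normalize_log_f0(f0_hz: Sequence[float]) -> List[Optional[float]]:
    """Z-score normalise log-F0 over voiced frames; unvoiced frames (f0 <= 0) become None."""
    logs = [math.log(v) for v in f0_hz if v > 0]
    if not logs:
        return [None] * len(f0_hz)
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs)) or 1.0  # guard flat contours
    return [(math.log(v) - mean) / std if v > 0 else None for v in f0_hz]


def denormalize_log_f0(z: Sequence[Optional[float]], target_mean: float,
                       target_std: float) -> List[float]:
    """Map normalised log-F0 onto a target speaker's log-F0 statistics; unvoiced frames -> 0 Hz."""
    return [math.exp(target_mean + target_std * v) if v is not None else 0.0 for v in z]


source = [100.0, 0.0, 200.0]  # toy F0 contour in Hz (0 marks an unvoiced frame)
z = normalize_log_f0(source)
print(z)
print(denormalize_log_f0(z, target_mean=math.log(150.0), target_std=0.2))
```

Working in the log domain makes the normalization speaker-independent in a multiplicative sense: an octave jump has the same size whether the speaker's average pitch is high or low.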

4. Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Author: Trung Hieu Nguyen, Bin Ma, and Shengkui Zhao
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Computer science, Speech recognition, Computer Science - Sound, Machine Learning (cs.LG), Time–frequency analysis, Speech enhancement, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Time domain, Representation (mathematics), Joint (audio engineering), Encoder, Decoding methods, Block (data storage), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The deep complex U-Net and convolutional recurrent network (CRN) structures achieve state-of-the-art performance for monaural speech enhancement. Both deep complex U-Net and CRN are encoder-decoder structures with skip connections, which rely heavily on the representation power of the complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features. The CCBAM is a lightweight and general module that can be easily integrated into any complex-valued convolutional layer. We integrate CCBAM with the deep complex U-Net and CRN to enhance their performance for speech enhancement. We further propose a mixed loss function to jointly optimize the complex models in both the time-frequency (TF) domain and the time domain. By integrating CCBAM and the mixed loss, we form a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches. 5 pages, 4 figures, 2 tables, accepted by ICASSP 2021
Published: 2021
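CCBAM's internal design is not described in the abstract, so this sketch covers only the complex-valued convolutional layer it plugs into. Following the usual deep-complex-network formulation (an assumption about the specific layer used), a complex convolution is assembled from four real convolutions on the real and imaginary parts. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def conv1d_valid(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Real 1-D 'valid' convolution in the cross-correlation form used by deep-learning layers."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def complex_conv1d(x_re: Sequence[float], x_im: Sequence[float],
                   w_re: Sequence[float], w_im: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Complex convolution from four real ones:
    (W_r + i W_i) * (x_r + i x_i) = (W_r*x_r - W_i*x_i) + i (W_r*x_i + W_i*x_r)."""
    out_re = [a - b for a, b in zip(conv1d_valid(x_re, w_re), conv1d_valid(x_im, w_im))]
    out_im = [a + b for a, b in zip(conv1d_valid(x_im, w_re), conv1d_valid(x_re, w_im))]
    return out_re, out_im


# Toy complex input and kernel, split into real and imaginary parts.
x = [1 + 2j, -1 + 0.5j, 3 - 1j, 0.5 + 0.5j]
w = [0.5 - 1j, 2 + 0.25j]
re, im = complex_conv1d([v.real for v in x], [v.imag for v in x],
                        [v.real for v in w], [v.imag for v in w])
print([complex(a, b) for a, b in zip(re, im)])
```

Keeping the real and imaginary paths coupled in this way is what lets such a layer learn phase-aware features instead of treating the two parts as independent channels.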

5. End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Author: Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, and Bin Ma
Subjects: Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)
Abstract: Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension of the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilizes the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies. Comment: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Published: 2021
Full Text: View/download PDF

6. Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

Author: Hao Wang, Shengkui Zhao, Trung Hieu Nguyen, and Bin Ma
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Computer science, Speech recognition, Speech synthesis, Intelligibility (communication), computer.software_genre, Mandarin Chinese, language.human_language, Computer Science - Sound, Machine Learning (cs.LG), Fluency, Naturalness, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Code (cryptography), language, Natural (music), computer, Transformer (machine learning model), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent state-of-the-art neural text-to-speech (TTS) synthesis models have dramatically improved the intelligibility and naturalness of speech generated from text. However, building a good bilingual or code-switched TTS system for a particular voice is still a challenge. The main reason is that it is not easy to obtain a bilingual corpus from a speaker who achieves native-level fluency in both languages. In this paper, we explore the use of Mandarin speech recordings from a Mandarin speaker and English speech recordings from a different English speaker to build high-quality bilingual and code-switched TTS for both speakers. A Tacotron2-based cross-lingual voice conversion system is employed to generate the Mandarin speaker's English speech and the English speaker's Mandarin speech, which show good naturalness and speaker similarity. The obtained bilingual data are then augmented with code-switched utterances synthesized using a Transformer model. With these data, three neural TTS models (Tacotron2, Transformer, and FastSpeech) are applied to build bilingual and code-switched TTS. Subjective evaluation results show that all three systems can produce (near-)native-level speech in both languages for each of the speakers. 5 pages, 2 figures, INTERSPEECH 2020
Published: 2020

7. Large-region acoustic source mapping using a movable array and sparse covariance fitting

Author: Cagdas Tuna, Shengkui Zhao, Thi Ngoc Tho Nguyen, and Douglas L. Jones
Subjects: Beamforming, Signal processing, Acoustics and Ultrasonics, Covariance matrix, Computer science, Linear model, 020206 networking & telecommunications, Reconstruction algorithm, 02 engineering and technology, Covariance, 01 natural sciences, Sample mean and sample covariance, Noise, Arts and Humanities (miscellaneous), Region of interest, 0103 physical sciences, Statistics, 0202 electrical engineering, electronic engineering, information engineering, Sound pressure, 010301 acoustics, Algorithm
Abstract: Large-region acoustic source mapping is important for city-scale noise monitoring. Approaches using a single-position measurement scheme to scan large regions using small arrays cannot provide clean acoustic source maps, while deploying large arrays spanning the entire region of interest is prohibitively expensive. A multiple-position measurement scheme is applied to scan large regions at multiple spatial positions using a movable array of small size. Based on the multiple-position measurement scheme, a sparse-constrained multiple-position vectorized covariance matrix fitting approach is presented. In the proposed approach, the overall sample covariance matrix of the incoherent virtual array is first estimated using the multiple-position array data and then vectorized using the Khatri-Rao (KR) product. A linear model is then constructed for fitting the vectorized covariance matrix and a sparse-constrained reconstruction algorithm is proposed for recovering source powers from the model. The user parameter settings are discussed. The proposed approach is tested on a 30 m × 40 m region and a 60 m × 40 m region using simulated and measured data. Much cleaner acoustic source maps and lower sound pressure level errors are obtained compared to the beamforming approaches and the previous sparse approach [Zhao, Tuna, Nguyen, and Jones, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (2016)].
Published: 2017
Full Text: View/download PDF
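The Khatri-Rao step relies on a linear identity: for uncorrelated sources, vec(R) = (conj(A) ⊙ A) p, where ⊙ is the column-wise Khatri-Rao product and p holds the source powers (spatially white noise would add σ² vec(I)). Because vec(R) is linear in p, a sparse solver can fit source powers directly. The sketch below only checks that identity numerically for a single array position; the multiple-position virtual array, the sparse reconstruction, and the toy geometry (a half-wavelength uniform linear array) are assumptions, not the paper's setup.

```python
import cmath
import math
from typing import List, Sequence

Matrix = List[List[complex]]


def covariance(A: Sequence[Sequence[complex]], p: Sequence[float]) -> Matrix:
    """R = A diag(p) A^H: covariance of K uncorrelated sources (column k of A = steering vector a_k)."""
    M, K = len(A), len(p)
    return [[sum(p[k] * A[i][k] * A[j][k].conjugate() for k in range(K)) for j in range(M)]
            for i in range(M)]


def vec(R: Matrix) -> List[complex]:
    """Column-major vectorisation: stack the columns of R."""
    M = len(R)
    return [R[i][j] for j in range(M) for i in range(M)]


def khatri_rao_conj(A: Sequence[Sequence[complex]]) -> Matrix:
    """(M*M) x K matrix whose k-th column is kron(conj(a_k), a_k), i.e. the KR product of conj(A) and A."""
    M, K = len(A), len(A[0])
    return [[A[j][k].conjugate() * A[i][k] for k in range(K)] for j in range(M) for i in range(M)]


# Toy 3-microphone half-wavelength uniform linear array and two sources (angles in radians).
M, thetas, p = 3, [0.2, -0.5], [1.0, 0.5]
A = [[cmath.exp(-1j * math.pi * m * math.sin(t)) for t in thetas] for m in range(M)]
lhs = vec(covariance(A, p))
rhs = [sum(row[k] * p[k] for k in range(len(p))) for row in khatri_rao_conj(A)]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # numerically zero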

8. Performance analysis and enhancements of adaptive algorithms and their applications

Author: Shengkui Zhao, Man Zhihong, Cai Jianfei, School of Computer Engineering, and Centre for Computational Intelligence
Subjects: Least mean squares filter, Adaptive filter, Signal processing, Noise (signal processing), Computer science, Adaptive system, Asymptotic computational complexity, Algorithm engineering, Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity [DRNTU], Probabilistic analysis of algorithms, Algorithm
Abstract: Adaptive filters that self-adjust their transfer functions according to optimizing algorithms are powerful adaptive systems with numerous applications in signal processing, communications, radar, sonar, seismology, navigation systems, and biomedical engineering. An adaptive signal processing algorithm, e.g., the least mean squares (LMS) algorithm or the recursive least squares (RLS) algorithm, is used to govern the adaptation of adaptive filters. Adaptive algorithms are expected to be computationally simple, numerically robust, fast-converging, and low in fluctuation. Unfortunately, none of the adaptive algorithms developed so far perfectly fulfils these requirements. The stability and convergence performance of the widely used adaptive algorithms also have not been fully explored. This work addresses performance analysis and enhancements of adaptive algorithms and their applications. We first develop a new variable step-size adjustment scheme for the LMS algorithm using a quotient form of filtered quadratic output errors. Compared to existing approaches, the proposed scheme reduces the sensitivity of convergence to the power of the measurement noise and improves steady-state performance and tracking capability for comparable transient behavior, with a negligible increase in computational cost. We then develop variable step-size approaches for the normalized least mean squares (NLMS) algorithm. We derive the optimal step size that minimizes the mean square deviation at each iteration and propose four approximated step sizes according to the correlation properties of the additive noise and the variations of the input excitation. We next analyze the stability and performance of the transform-domain LMS algorithms which preprocess the inputs with a fixed data-independent orthogonal transform such as
Published: 2019
Full Text: View/download PDF
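For reference, the baseline the thesis builds on can be sketched as a standard normalized LMS (NLMS) filter with a fixed step size, here identifying a toy FIR system. This is an assumption-labelled sketch: the thesis's variable step-size schemes (the quotient of filtered quadratic output errors for LMS and the MSD-optimal step sizes for NLMS) are not reproduced, and `nlms`, the system `h`, and all parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nlms(x: Sequence[float], d: Sequence[float], taps: int, mu: float = 0.5,
         eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Normalised LMS with a fixed step size: w <- w + mu * e[n] * u[n] / (eps + ||u[n]||^2)."""
    w = [0.0] * taps
    errors = []
    for n in range(len(x)):
        # Regressor u[n] = [x[n], x[n-1], ..., x[n-taps+1]], zero before the first sample.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))  # a priori output error
        norm = eps + sum(uk * uk for uk in u)
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


# Identify a toy 3-tap FIR system from white-noise input and slightly noisy observations.
rng = random.Random(0)
h = [0.6, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) + 0.001 * rng.gauss(0.0, 1.0)
     for n in range(len(x))]
w, errors = nlms(x, d, taps=3)
print([round(v, 3) for v in w])
```

With a fixed `mu`, a larger step converges faster but settles with more steady-state fluctuation; a variable step-size scheme aims to get both fast convergence and low fluctuation by adapting `mu` over time.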

9. Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Author: Shengkui Zhao, Chongjia Ni, Rong Tong, and Bin Ma
Subjects: Adversarial system, Computer science, Consistency (statistics), Speech recognition, Residual, Joint (audio engineering), Generative grammar, Task (project management)
Published: 2019
Full Text: View/download PDF

10. Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks

Author: Trung Hieu Nguyen, Shengkui Zhao, Hao Wang, and Bin Ma
Subjects: Adversarial system, Computer science, business.industry, Artificial intelligence, Many-to-many (data model), Star (graph theory), Residual, business, Generative grammar
Published: 2019
Full Text: View/download PDF

11. Drive-by large-region acoustic noise-source mapping via sparse beamforming tomography

Author: Cagdas Tuna, Thi Ngoc Tho Nguyen, Douglas L. Jones, and Shengkui Zhao
Subjects: Beamforming, Microphone array, Acoustics and Ultrasonics, Computer science, Acoustics, 020206 networking & telecommunications, 02 engineering and technology, 01 natural sciences, Noise, Arts and Humanities (miscellaneous), Noise-canceling microphone, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Tomography, Loudspeaker, Environmental noise, Sound pressure, 010301 acoustics
Abstract: Environmental noise is a risk factor for human physical and mental health, demanding an efficient large-scale noise-monitoring scheme. The current technology, however, involves extensive sound pressure level (SPL) measurements at a dense grid of locations, making it impractical on a city-wide scale. This paper presents an alternative approach using a microphone array mounted on a moving vehicle to generate two-dimensional acoustic tomographic maps that yield the locations and SPLs of the noise-sources sparsely distributed in the neighborhood traveled by the vehicle. The far-field frequency-domain delay-and-sum beamforming output power values computed at multiple locations as the vehicle drives by are used as tomographic measurements. The proposed method is tested with acoustic data collected by driving an electric vehicle with a rooftop-mounted microphone array along a straight road next to a large open field, on which various pre-recorded noise-sources were produced by a loudspeaker at different locations. The accuracy of the tomographic imaging results demonstrates the promise of this approach for rapid, low-cost environmental noise-monitoring.
Published: 2016
Full Text: View/download PDF
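The tomographic measurements are far-field frequency-domain delay-and-sum output powers. The sketch below computes that power for a single stationary uniform linear array and a single frequency bin, scanning candidate angles; the geometry, frequency, and speed of sound are assumptions, and the drive-by tomography that combines powers from many vehicle positions is not shown.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s (assumed value for air)


def steering_vector(theta: float, freq: float, n_mics: int, spacing: float) -> List[complex]:
    """Far-field plane-wave phase terms for a uniform linear array; theta in radians from broadside."""
    return [cmath.exp(-2j * math.pi * freq * m * spacing * math.sin(theta) / SPEED_OF_SOUND)
            for m in range(n_mics)]


def das_power(snapshot: Sequence[complex], theta: float, freq: float, spacing: float) -> float:
    """Delay-and-sum output power |a(theta)^H x|^2 / M^2 for one frequency bin."""
    a = steering_vector(theta, freq, len(snapshot), spacing)
    y = sum(am.conjugate() * xm for am, xm in zip(a, snapshot)) / len(snapshot)
    return abs(y) ** 2


# Toy scene: 8 mics spaced 0.1 m (below half a wavelength at 1 kHz), one source at 30 degrees.
freq, spacing = 1000.0, 0.1
snapshot = steering_vector(math.radians(30), freq, 8, spacing)
best = max(range(-90, 91), key=lambda deg: das_power(snapshot, math.radians(deg), freq, spacing))
print(best)  # 30
```

Keeping the spacing below half a wavelength avoids grating lobes, so the power peak is unique; a single array position still gives only a direction, which is why combining powers from multiple positions is needed to localize sources on a 2D map.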

12. Wideband compressive beamforming tomography for drive-by large-scale acoustic source mapping

Author: Cagdas Tuna, Thi Ngoc Tho Nguyen, Douglas L. Jones, and Shengkui Zhao
Subjects: Beamforming, Microphone array, Tomographic reconstruction, Acoustics and Ultrasonics, Computer science, Noise pollution, Acoustics, Spectral density, 020206 networking & telecommunications, 02 engineering and technology, 01 natural sciences, Frequency spectrum, Noise, Compressed sensing, Arts and Humanities (miscellaneous), 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Loudspeaker, Wideband, Environmental noise, 010301 acoustics
Abstract: Noise mapping is an effective sound visualization tool for identifying urban noise hotspots, which is crucial for taking targeted measures against environmental noise pollution. This paper develops a high-resolution wideband acoustic source mapping methodology using a portable microphone array, in which the joint localization and power spectrum estimation of individual sources sparsely distributed over a large region are achieved by tomographic imaging with the multi-frequency delay-and-sum beamforming power outputs from multiple array positions. Exploiting the fact that a wideband source has a common spatial signal support across the frequency spectrum, two-dimensional tomographic maps are produced by applying compressive sensing techniques, including a group least absolute shrinkage and selection operator (group LASSO) formulation and sparse Bayesian learning, to promote group sparsity over multiple frequency bands. The high-resolution mapping is demonstrated with experimental data recorded by a microphone array mounted atop an electric vehicle driven along a road while a loudspeaker positioned in the adjacent open field played audio clips.
Published: 2018
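Group sparsity across frequency bands is typically enforced in proximal-gradient solvers by a group soft-thresholding step, which zeroes a candidate location in all bands at once. This is a generic sketch of that operator (the proximal map of the group LASSO penalty), assuming each group holds one grid point's coefficients across bands; it is not the paper's solver, and the sparse Bayesian learning alternative is not shown.

```python
import math
from typing import List, Sequence


def group_soft_threshold(groups: Sequence[Sequence[float]], lam: float) -> List[List[float]]:
    """Proximal operator of lam * sum_g ||x_g||_2 (group LASSO penalty).

    Each group's Euclidean norm is shrunk by lam; groups with norm <= lam are zeroed entirely.
    """
    out = []
    for g in groups:
        norm = math.sqrt(sum(v * v for v in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in g])
    return out


# Two candidate grid points, each with source-power coefficients in two frequency bands.
print(group_soft_threshold([[3.0, 4.0], [0.1, 0.2]], lam=1.0))
```

The strong grid point survives with a uniformly shrunk spectrum, while the weak one is removed in every band together, which is how the common-spatial-support assumption is encoded.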

13. ITEM: Immersive Telepresence for Entertainment and Meetings—A Practical Approach

Author: Dung T. Vu, Hongsheng Yang, Viet-Anh Nguyen, Jiangbo Lu, Minh N. Do, Shengkui Zhao, and Douglas L. Jones
Subjects: FOS: Computer and information sciences, 3D sound localization, business.product_category, Multimedia, Computer science, business.industry, Teleconference, H.5.1, I.4.9, computer.software_genre, Multimedia (cs.MM), Videoconferencing, Software deployment, Laptop, Signal Processing, Scalability, The Internet, Electrical and Electronic Engineering, business, computer, Computer Science - Multimedia, Coding (social sciences)
Abstract: This paper presents an Immersive Telepresence system for Entertainment and Meetings (ITEM). The system aims to provide a radically new video communication experience by seamlessly merging participants into the same virtual space, allowing natural interaction among them and with shared collaborative content. With the goal of making the system scalable, flexible for various business solutions, and easily accessible to mass-market consumers, we address the challenges across the whole pipeline of media processing, communication, and display in our design and realization of such a system. In particular, in this paper we focus on the system aspects that maximize the end-user experience, optimize system and network resources, and enable various teleimmersive application scenarios. In addition, we present a few key technologies, i.e., fast object-based video coding for real-world data, and spatialized audio capture and 3D sound localization for group teleconferencing. Our effort is to investigate and optimize the key system components and provide efficient end-to-end optimization and integration by considering user needs and preferences. Extensive experiments show that the developed system runs reliably and comfortably in real time with a minimal setup requirement (e.g., a webcam and/or a depth camera, an optional microphone array, and a laptop/desktop connected to the public Internet) for teleimmersive communication. With such a minimal deployment requirement, we present a variety of interesting applications and user experiences created by ITEM.
Published: 2015
Full Text: View/download PDF

14. On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

Author: Xiong Xiao, Eng Siong Chng, Haizhou Li, Douglas L. Jones, and Shengkui Zhao
Subjects: Beamforming, Artificial neural network, Covariance function, Computer science, business.industry, Speech recognition, Word error rate, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Time–frequency analysis, 030507 speech-language pathology & audiology, 03 medical and health sciences, Recurrent neural network, Minimum-variance unbiased estimator, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, 0305 other medical science, business
Abstract: Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCMs) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve the performance of MVDR beamforming in ASR tasks. In this paper, we focus on TF mask estimation using recurrent neural networks (RNNs). Specifically, our methods include training the RNN to estimate the speech and noise masks independently, training the RNN to minimize the ASR cost function directly, and performing multiple passes to iteratively improve the mask estimation. The proposed methods are evaluated individually and in combination on the CHiME-4 challenge. The results show that the proposed methods improve ASR performance individually and are also complementary. The overall system achieves a word error rate of 8.9% with the 6-microphone configuration, which is much better than the 12.0% achieved with the state-of-the-art MVDR implementation.
Published: 2017
Full Text: View/download PDF
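Mask-based MVDR can be sketched for one frequency bin: the masks weight the frames when forming the speech and noise SCMs, and the MVDR weights follow from w = Phi_n^{-1} d / (d^H Phi_n^{-1} d). Assumptions, all labelled: an oracle toy mask stands in for the RNN-estimated masks, the steering vector d is taken as the principal eigenvector of the speech SCM (a common choice the abstract does not specify), and the small linear solver is hand-written to stay dependency-free. All names are hypothetical.

```python
import cmath
import random
from typing import List, Sequence

Vector = List[complex]
Matrix = List[List[complex]]


def masked_scm(frames: Sequence[Sequence[complex]], mask: Sequence[float]) -> Matrix:
    """Mask-weighted spatial covariance for one frequency bin: sum_t m_t y_t y_t^H / sum_t m_t."""
    M, total = len(frames[0]), sum(mask)
    return [[sum(m * y[i] * y[j].conjugate() for m, y in zip(mask, frames)) / total
             for j in range(M)] for i in range(M)]


def principal_eigenvector(R: Matrix, iters: int = 100) -> Vector:
    """Power iteration (assumed steering-vector estimate from the speech SCM)."""
    v = [1.0 + 0j] * len(R)
    for _ in range(iters):
        v = [sum(R[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = sum(abs(c) ** 2 for c in v) ** 0.5
        v = [c / norm for c in v]
    return v


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def mvdr_weights(phi_noise: Matrix, d: Vector) -> Vector:
    """MVDR weights w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), so that w^H d = 1."""
    z = solve(phi_noise, d)
    denom = sum(di.conjugate() * zi for di, zi in zip(d, z))
    return [zi / denom for zi in z]


# Toy single-bin data: 3 microphones; speech is active in even frames only (oracle mask).
rng = random.Random(1)
d_true = [cmath.exp(-0.7j * m) for m in range(3)]


def cgauss(sigma: float) -> complex:
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


frames, speech_mask = [], []
for t in range(100):
    active = t % 2 == 0
    s = cgauss(3.0) if active else 0j
    frames.append([s * dm + cgauss(0.1) for dm in d_true])
    speech_mask.append(1.0 if active else 0.0)
noise_mask = [1.0 - m for m in speech_mask]

d_est = principal_eigenvector(masked_scm(frames, speech_mask))
w = mvdr_weights(masked_scm(frames, noise_mask), d_est)
print(sum(wi.conjugate() * di for wi, di in zip(w, d_est)))  # distortionless gain, about 1
```

The sketch makes the dependency in the abstract concrete: both SCMs, and therefore the steering vector and the noise whitening, are only as good as the masks used to weight the frames.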

15. A novel sparse model for multi-source localization using distributed microphone array

Author: Thi Ngoc Tho Nguyen, Douglas L. Jones, Shengkui Zhao, and Cagdas Tuna
Subjects: 0209 industrial biotechnology, Microphone array, Cross-correlation, Microphone, Speech recognition, 02 engineering and technology, Sparse approximation, Inverse problem, Multilateration, Power (physics), 030507 speech-language pathology & audiology, 03 medical and health sciences, 020901 industrial engineering & automation, 0305 other medical science, Algorithm, Multi-source, Mathematics
Abstract: When the distances between microphone pairs are larger than half the signal wavelength, source localization methods based on cross-correlation, such as time difference of arrival (TDOA) and steered response power (SRP), are commonly used in practice. We present here a novel model that expresses pairwise microphone cross-correlations as a sum of source-signal autocorrelations shifted by the relative delays of the signals arriving at the microphone pairs and weighted by the source powers and the distances between the sources and the microphone pairs. The model is formulated as a linear inverse problem and is sparse with respect to the source power map. The source power map, which directly shows the locations of all the sound sources, can be reconstructed using ℓ1-norm minimization algorithms. We demonstrate the effectiveness of our model in a wildlife monitoring application, where the goal is to locate multiple frogs in a dense chorus.
Published: 2017
Full Text: View/download PDF
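The reconstruction step is an ℓ1-regularized linear inverse problem, which iterative shrinkage-thresholding (ISTA) solves with a gradient step followed by soft thresholding. In this minimal sketch the matrix A is a random placeholder, not the paper's dictionary of shifted, power- and distance-weighted autocorrelations, and the grid size, `lam`, and iteration count are assumptions. Since source powers are non-negative, one could also project onto x >= 0 instead of using signed soft thresholding.

```python
import math
import random
from typing import List, Sequence


def ista(A: Sequence[Sequence[float]], b: Sequence[float], lam: float,
         iters: int = 2000) -> List[float]:
    """Iterative shrinkage-thresholding for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # Step 1/L, where the squared Frobenius norm upper-bounds the Lipschitz constant ||A||_2^2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # residual A x - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]          # gradient A^T r
        z = [x[j] - step * g[j] for j in range(n)]
        # Soft threshold: sign(v) * max(|v| - step * lam, 0).
        x = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]
    return x


# Toy problem: a 10-point "power map" with two active sources, observed through a random A.
rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(30)]
x_true = [0.0] * 10
x_true[2], x_true[7] = 1.0, 0.6
b = [sum(row[j] * x_true[j] for j in range(10)) for row in A]
x_hat = ista(A, b, lam=0.001)
print([round(v, 3) for v in x_hat])
```

Using the Frobenius norm for the step size is conservative (it slows convergence) but guarantees a valid step without computing the largest singular value of A.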

16. Teleimmersive Audio-Visual Communication Using Commodity Hardware [Applications Corner]

Author: Minh N. Do, Douglas L. Jones, Viet-Anh Nguyen, Jiangbo Lu, and Shengkui Zhao
Subjects: Multimedia, Computer science, Commodity hardware, Human–computer interaction, Applied Mathematics, Signal Processing, Audio visual, Electrical and Electronic Engineering, computer.software_genre, computer
Published: 2014
Full Text: View/download PDF

17. Underdetermined direction of arrival estimation using acoustic vector sensor

Author: Shengkui Zhao, Tigran Saluev, and Douglas L. Jones
Subjects: Engineering, Underdetermined system, Covariance matrix, business.industry, Direction of arrival, Covariance, Sensor array, Control and Systems Engineering, Signal Processing, Electronic engineering, Identifiability, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Sound pressure, business, Triangular array, Algorithm, Software
Abstract: This paper presents a new approach for estimating the two-dimensional (2D) directions of arrival (DOAs) of more sources than sensors using an Acoustic Vector Sensor (AVS). The approach is based on the Khatri-Rao (KR) product and exploits the subspace characteristics of the time-varying covariance matrices of uncorrelated quasi-stationary source signals. An AVS measures both the acoustic pressure and the pressure gradients of a complete sound field, so the DOAs can be determined in both the horizontal and vertical planes. The identifiability of the presented KR-AVS approach is studied through both theoretical analysis and computer simulations. Computer simulations demonstrate that the 2D DOAs of six speech sources are successfully estimated. The new KR-AVS approach achieves a lower root mean square error (RMSE) than the other array geometries compared: the non-uniform linear array, the 2D L-shape array, and the 2D triangular array.
Published: 2014
Full Text: View/download PDF

18. Multi-surface sliding control for fast finite-time leader-follower consensus with high order SISO uncertain nonlinear agents

Author: Lihua Xie, Zhihong Man, Suiyang Khoo, and Shengkui Zhao
Subjects: Lyapunov stability, Engineering, Variable structure control, business.industry, Mechanical Engineering, General Chemical Engineering, Biomedical Engineering, Aerospace Engineering, Sliding mode control, Industrial and Manufacturing Engineering, Power (physics), Nonlinear system, Control and Systems Engineering, Control theory, Bounded function, Integrator, Graph (abstract data type), Electrical and Electronic Engineering, business
Abstract: In this paper, a multi-surface sliding cooperative control scheme is presented and new multiple sliding surfaces are proposed. It is proven that, in the setup where each agent is described by a chain of integrators whose last integrator is perturbed by a bounded disturbance, leader–follower consensus can be achieved on these sliding surfaces if the communication graph has a directed spanning tree. In addition, the sliding variables can be driven to the sliding surfaces in fast finite time by the nonsmooth control law. The fast finite-time Lyapunov stability theorem, the terminal sliding control technique, and the adding-a-power-integrator design approach are used in the proposed control. Simulation results demonstrate the effectiveness of the proposed scheme. Copyright © 2013 John Wiley & Sons, Ltd.
Published: 2013
Full Text: View/download PDF

19. An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources

Author: Shengkui Zhao, Thi Ngoc Tho Nguyen, Xiong Xiao, Douglas L. Jones, Haizhou Li, and Eng Siong Chng
Subjects: Microphone array, Covariance function, Mean squared error, business.industry, Direction of arrival, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, 030507 speech-language pathology & audiology, 03 medical and health sciences, Expectation–maximization algorithm, 0202 electrical engineering, electronic engineering, information engineering, Mixture distribution, Artificial intelligence, 0305 other medical science, business, Cluster analysis, Eigenvalues and eigenvectors, Mathematics
Abstract: This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. Time-frequency (TF) bins dominated by only one source are first detected using a combination of noise-floor tracking, onset detection, and a coherence test. For each selected TF bin, the largest eigenvector of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use the eigenvectors from all frequencies, the sources' steering vectors at all frequencies are included in the distribution function. The source DOAs are then estimated by maximizing the likelihood of the eigenvectors using an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources compared with the MUSIC algorithm applied to the single-source-dominated TF bins and with our previous clustering approach.
Published: 2016
Full Text: View/download PDF
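The per-bin feature this abstract clusters is the principal eigenvector of a spatial covariance matrix. The Python sketch below shows only that extraction step, under assumptions of ours: the covariance is averaged over a hypothetical block of L neighbouring STFT frames, and the eigenvector's arbitrary phase is removed by making its first element real. The `ratio` output (largest eigenvalue over the trace) is a simple single-source indicator added for illustration; it is not the noise-floor, onset, and coherence test the paper uses, and the EM mixture model is not reproduced.

```python
import numpy as np


def principal_eigvec(snapshots):
    """Largest eigenvector of the spatial covariance of one TF region.

    snapshots: complex array (M, L) with M microphones and L STFT frames
    taken around the bin of interest (the neighbourhood size is an
    assumption of this sketch).
    Returns the unit-norm eigenvector, with its first element made real and
    non-negative to remove the arbitrary phase, and the fraction of the
    total energy captured by the largest eigenvalue.
    """
    M, L = snapshots.shape
    R = snapshots @ snapshots.conj().T / L   # sample spatial covariance (M x M)
    vals, vecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    v = vecs[:, -1]                          # eigenvector of the largest eigenvalue
    v = v * np.exp(-1j * np.angle(v[0]))     # fix the phase reference
    ratio = vals[-1] / max(vals.sum(), 1e-12)
    return v, ratio
```

For a single dominant source, the returned vector is, up to phase, the normalized steering vector of that source at the bin's frequency, which is what makes these vectors usable as DOA-dependent clustering features.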

20. Large region acoustic source mapping: A generalized sparse constrained deconvolution approach

Author: Cagdas Tuna, Thi Ngoc Tho Nguyen, Douglas L. Jones, and Shengkui Zhao
Subjects: 010302 applied physics, Generalized inverse, Speech recognition, Astrophysics::Instrumentation and Methods for Astrophysics, Absolute power, Inverse problem, 01 natural sciences, Noise, Robustness (computer science), 0103 physical sciences, Source localization, Deconvolution, 010301 acoustics, Algorithm, Adaptive beamformer, Mathematics
Abstract: This paper presents a generalized multiple-point sparse constrained deconvolution approach for mapping acoustic noise sources in large regions using a movable array. Extending our previous MPSC-DAMAS approach, we first derive a generalized inverse problem relating the source powers to the array manifold, using a generic beamformer and an explicit measurement-noise model. We then propose a generalized MPSC-DAMAS (GMPSC-DAMAS) approach for solving this inverse problem. A new parameter-setting method based on a multiple-point minimum-variance-distortionless-response (MVDR) beamformer is also presented. Realizations of the GMPSC-DAMAS approach using the delay-and-sum (DAS) beamformer and the MVDR beamformer are evaluated. Simulation results show that the proposed GMPSC-DAMAS approach achieves much lower absolute power estimation errors and shorter processing times than the MPSC-DAMAS approach, with respect to both the number of sources and robustness to measurement noise.
Published: 2016
Full Text: View/download PDF

21. A generalized data windowing scheme for adaptive conjugate gradient algorithms

Author: Suiyang Khoo, Shengkui Zhao, and Zhihong Man
Subjects: Recursive least squares filter, Adaptive algorithm, Iterative method, Adaptive filter, Least mean squares filter, Rate of convergence, Control and Systems Engineering, Conjugate gradient method, Signal Processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Algorithm, Gradient method, Software, Mathematics
Abstract: The performance of modified adaptive conjugate gradient (CG) algorithms for adaptive filtering, which are based on the iterative CG method, depends strongly on how the correlation matrix and the cross-correlation vector are estimated. Existing implementations of the CG algorithms that use exponential or sliding data windows result in either a loss of convergence or an increase in misadjustment. This paper presents and analyzes a new approach to implementing the CG algorithms for adaptive filtering using a generalized data windowing scheme. For the new modified CG algorithms, we show that convergence is accelerated and that misadjustment and tracking capability comparable to those of the recursive least squares (RLS) algorithm are achieved. Computer simulations of a linear system modeling problem demonstrate the improvements of the new modifications.
Published: 2009
Full Text: View/download PDF

22. Stability and Convergence Analysis of Transform-Domain LMS Adaptive Filters With Second-Order Autoregressive Process

Author: Hong Ren Wu, Zhihong Man, Suiyang Khoo, and Shengkui Zhao
Subjects: Least mean squares filter, Adaptive filter, Normalization (statistics), Mathematical optimization, Signal processing, Autoregressive model, Signal Processing, Convergence (routing), Electrical and Electronic Engineering, Algorithm, Stability (probability), Moving-average model, Mathematics
Abstract: In this paper, the stability and convergence properties of the class of transform-domain least mean square (LMS) adaptive filters with a second-order autoregressive (AR) input process are investigated. It is well known that this class of adaptive filters improves the convergence of the standard LMS adaptive filter by applying fixed, data-independent orthogonal transforms and power normalization. However, the convergence performance of this class of adaptive filters can differ considerably for different input processes, and this has not been fully explored. We first discuss the mean-square stability and steady-state performance of this class of adaptive filters. We then analyze the effects of the transforms and power normalization performed in the various adaptive filters for both first-order and second-order AR processes, derive the asymptotic eigenvalue distributions of the inputs, and compare their convergence performance. Finally, computer simulations with AR, moving-average (MA), and autoregressive-moving-average (ARMA) input processes are presented to support the analytical results.
Published: 2009
Full Text: View/download PDF
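A minimal Python sketch of one member of the class analysed here, DCT-LMS, may help make "fixed data-independent orthogonal transform plus power normalization" concrete. The orthonormal DCT-II, the running power estimate with forgetting factor `beta`, and all parameter values are assumptions of this sketch, not taken from the paper.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix T (n x n), so that T @ T.T == I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    T[0, :] /= np.sqrt(2.0)
    return T


def dct_lms(x, d, n_taps, mu=0.01, beta=0.99, eps=1e-8):
    """DCT-LMS: LMS on DCT-transformed taps with per-bin power normalization.

    Returns the transform-domain weights, the equivalent time-domain
    filter T.T @ w, and the error signal.
    """
    T = dct_matrix(n_taps)
    w = np.zeros(n_taps)
    power = np.ones(n_taps)          # running power estimate per transform bin
    buf = np.zeros(n_taps)           # tapped delay line, buf[k] = x[n - k]
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        u = T @ buf                                  # fixed, data-independent transform
        power = beta * power + (1 - beta) * u ** 2   # track each bin's power
        e[n] = d[n] - w @ u
        w = w + mu * e[n] * u / (power + eps)        # power-normalized LMS step
    return w, T.T @ w, e
```

Because `T` is orthonormal, the transform-domain filter is equivalent to the time-domain filter `T.T @ w`; the transform only changes the coordinates seen by the LMS update, and the per-bin normalization is what is intended to reduce the eigenvalue spread for correlated inputs such as an AR(2) process.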

23. Variable step-size LMS algorithm with a quotient form

Author: Suiyang Khoo, Shengkui Zhao, Zhihong Man, and Hong Ren Wu
Subjects: Computational complexity theory, System identification, Least mean squares filter, Noise, Variable (computer science), Control and Systems Engineering, Signal Processing, Convergence (routing), Computer Vision and Pattern Recognition, Sensitivity (control systems), Electrical and Electronic Engineering, Algorithm, Software, Quotient, Mathematics
Abstract: An improved robust variable step-size least mean square (LMS) algorithm is developed in this paper. Unlike many existing approaches, we adjust the variable step size using a quotient of filtered versions of the squared error. The filtered error estimates are based on exponential windows, with different decay factors for the estimates in the numerator and the denominator. The new algorithm, called more robust variable step-size (MRVSS), reduces the sensitivity to the power of the measurement noise and improves the steady-state performance for comparable transient behavior, with a negligible increase in computational cost. The mean convergence, the steady-state performance, and the mean step-size behavior of the MRVSS algorithm are studied under a slowly time-varying system model; the results can serve as guidelines for designing the MRVSS algorithm in practical applications. Simulation results corroborate the analytical results and compare MRVSS with existing representative approaches, indicating the superior properties of the MRVSS algorithm.
Published: 2009
Full Text: View/download PDF
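The quotient-form step size can be sketched as below in Python. The exact recursion is our reading of the abstract and should be checked against the paper: two exponentially windowed sums of the squared error, A with decay `a` and B with decay `b` (a < b), form the ratio theta = A/B, which drives mu(n+1) = alpha*mu(n) + gamma*theta(n), clipped to [mu_min, mu_max]. All default parameter values are hypothetical.

```python
import numpy as np


def mrvss_lms(x, d, n_taps, alpha=0.97, gamma=4.8e-4, a=0.9, b=0.99,
              mu0=0.01, mu_min=1e-5, mu_max=0.05):
    """LMS with a step size driven by a quotient of two exponentially
    windowed squared-error estimates (sketch; parameters are hypothetical)."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)           # buf[k] = x[n - k]
    A = B = 0.0                      # numerator / denominator error estimates
    mu = mu0
    e = np.zeros(len(x))
    mus = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf
        w = w + mu * e[n] * buf                  # standard LMS update
        A = a * A + e[n] ** 2                    # shorter memory (decay a)
        B = b * B + e[n] ** 2                    # longer memory (decay b)
        theta = A / B if B > 0 else 0.0          # quotient of filtered errors
        mu = float(np.clip(alpha * mu + gamma * theta, mu_min, mu_max))
        mus[n] = mu
    return w, e, mus
```

If the squared error settles to a stationary level σ², A and B approach σ²/(1 − a) and σ²/(1 − b), so theta tends to roughly (1 − b)/(1 − a) regardless of the noise power; with the defaults this gives theta ≈ 0.1 and a steady-state step of about gamma·0.1/(1 − alpha) ≈ 1.6e-3. This illustrates how a quotient form can reduce sensitivity to the measurement-noise power.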

24. Comments on 'Adaptive multiple-surface sliding control for non-autonomous systems with mismatched uncertainties'

Author: Suiyang Khoo, Shengkui Zhao, and Zhihong Man
Subjects: Surface (mathematics), Chen, biology, Control and Systems Engineering, Control theory, Mathematical induction, Electrical and Electronic Engineering, Control (linguistics), biology.organism_classification, Time complexity, Mathematics
Abstract: This note points out that the time complexity of the main multiple-surface sliding control (MSSC) algorithm in Huang and Chen [Huang, A. C. & Chen, Y. C. (2004). Adaptive multiple-surface sliding control for non-autonomous systems with mismatched uncertainties. Automatica, 40(11), 1939-1945] is O(2^n). We propose a simplified recursive MSSC design algorithm with time complexity O(n) and use mathematical induction to show that it agrees with the original MSSC law.
Published: 2008
Full Text: View/download PDF

25. Learning to estimate reverberation time in noisy and reverberant rooms

Author: Haizhou Li, Douglas L. Jones, Xiong Xiao, Shengkui Zhao, Xionghu Zhong, and Eng Siong Chng
Subjects: Reverberation, business.industry, Computer science, Speech recognition, Deep learning, Deep neural networks, Artificial intelligence, business, Machine learning, computer.software_genre, computer
Published: 2015
Full Text: View/download PDF

26. Large region acoustic source mapping using movable arrays

Author: Thi Ngoc Tho Nguyen, Douglas L. Jones, and Shengkui Zhao
Subjects: Noise, Computer science, Region of interest, Covariance matrix, Acoustics, Speech recognition, Deconvolution, Environmental noise, Scale (map), Power (physics)
Abstract: Mapping environmental noise with high resolution on a large scale (such as a city) is prohibitively expensive with current approaches, which use either a large, dense array spanning the entire region of interest or sequential noise measurements at thousands of locations on a dense grid. We instead propose a new acoustic measurement scheme that uses a small movable array (for example, mounted on a vehicle driving along city streets) to rapidly acquire measurements at many different locations. A multiple-point sparse constrained deconvolution approach for the mapping of acoustic sources (MPSC-DAMAS) and a multiple-point covariance matrix fitting (MPCMF) approach are developed to accurately estimate the locations and powers of stationary noise sources across the region of interest. Computer simulations of large-region acoustic mapping demonstrate that the proposed approaches achieve superior resolution and much lower power estimation errors than the state-of-the-art SC-DAMAS and CMF approaches.
Published: 2015
Full Text: View/download PDF

27. A learning-based approach to direction of arrival estimation in noisy and reverberant environments

Author: Haizhou Li, Douglas L. Jones, Xiong Xiao, Xionghu Zhong, Shengkui Zhao, and Eng Siong Chng
Subjects: Microphone array, Reverberation, Signal processing, Mean squared error, Artificial neural network, Robustness (computer science), business.industry, Computer science, Direction of arrival, Pattern recognition, Artificial intelligence, Multilateration, business
Abstract: This paper presents a learning-based approach to direction-of-arrival (DOA) estimation from microphone array input. Traditional signal processing methods, such as the classic least squares (LS) method, rely on strong assumptions about signal models and on accurate estimation of the time difference of arrival (TDOA). They work well only in relatively clean conditions and degrade under noise and reverberation. In this paper, we propose a learning-based approach that learns from a large amount of simulated noisy and reverberant microphone array input for robust DOA estimation. Specifically, we extract features from the generalised cross-correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from these features to the DOA. One advantage of the learning-based method is that DOA estimation becomes more accurate as more training data becomes available. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method. Tests on real data recorded in meeting rooms show improved root-mean-square error (RMSE) compared with the LS method.
Published: 2015
Full Text: View/download PDF
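The GCC vectors used as network features can be computed per microphone pair as below. This Python sketch uses the PHAT weighting, which is a common choice but an assumption here, since the abstract does not state which GCC weighting or lag range the authors used; the multilayer perceptron itself is not shown.

```python
import numpy as np


def gcc_phat(sig, ref, max_lag, eps=1e-12):
    """GCC-PHAT between two microphone signals (max_lag must be >= 1).

    Returns lags in samples and the cross-correlation values; a positive
    peak lag means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + eps             # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # reorder so that index 0 corresponds to lag -max_lag
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc
```

For a pair spaced D metres apart and sampled at fs, only lags up to about fs·D/c samples (c ≈ 343 m/s in air) are physically possible, so `max_lag` would typically be set from the array geometry, and the truncated `cc` vectors from all pairs concatenated into one feature vector.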

28. Frequency-domain beamformers using conjugate gradient techniques for speech enhancement

Author: Shengkui Zhao, Suiyang Khoo, Douglas L. Jones, and Zhihong Man
Subjects: Acoustics and Ultrasonics, Iterative method, Computer science, Acoustics, Estimator, Speech enhancement, symbols.namesake, Arts and Humanities (miscellaneous), Autocorrelation matrix, Lagrange multiplier, Frequency domain, Conjugate gradient method, symbols, Algorithm
Abstract: A multiple-iteration constrained conjugate gradient (MICCG) algorithm and a single-iteration constrained conjugate gradient (SICCG) algorithm are proposed to realize the widely used frequency-domain minimum-variance-distortionless-response (MVDR) beamformers, and the resulting algorithms are applied to speech enhancement. The algorithms are derived using the Lagrange method and conjugate gradient techniques, and their implementations avoid any form of explicit or implicit autocorrelation matrix inversion. Theoretical analysis establishes formal convergence of the algorithms. Specifically, the MICCG algorithm is developed using a block adaptation approach and generates a finite sequence of estimates that converges to the MVDR solution. For limited data records, the MICCG estimates are better than those of conventional estimators and equivalent to those of the auxiliary-vector algorithms. The SICCG algorithm is developed using a continuous adaptation approach with a sample-by-sample update, and its estimates asymptotically converge to the MVDR solution. An illustrative example using synthetic data from a uniform linear array is studied, and an evaluation on real data recorded by an acoustic vector sensor array is presented. The performance of the MICCG and SICCG algorithms is compared with that of state-of-the-art approaches.
Published: 2014
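The idea of replacing the explicit inverse in w = R⁻¹d / (dᴴR⁻¹d) with conjugate gradient iterations can be sketched in Python as follows. This is textbook CG for a Hermitian positive-definite R followed by the distortionless normalization; it is not the MICCG or SICCG algorithm, whose Lagrange-based constrained updates are not given in the abstract. `R` and the steering vector `d` are assumed to be given for one frequency bin, and the function names are hypothetical.

```python
import numpy as np


def cg_solve(R, b, max_iter=None, tol=1e-10):
    """Solve R x = b for Hermitian positive-definite R by conjugate gradients."""
    x = np.zeros_like(b, dtype=complex)
    r = b - R @ x                       # residual
    p = r.copy()                        # search direction
    rs_old = np.vdot(r, r).real
    max_iter = 2 * len(b) if max_iter is None else max_iter
    for _ in range(max_iter):
        Rp = R @ p
        alpha = rs_old / np.vdot(p, Rp).real
        x = x + alpha * p
        r = r - alpha * Rp
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # next R-conjugate direction
        rs_old = rs_new
    return x


def mvdr_weights_cg(R, d):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) without forming R^{-1}."""
    u = cg_solve(R, d)                  # u approximates R^{-1} d
    return u / np.vdot(d, u)            # enforces the constraint w^H d = 1
```

In exact arithmetic CG reaches the solution in at most M iterations for an M-element array; the cap of 2M iterations in this sketch only leaves some margin for rounding error.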

29. Underdetermined 2D DOA estimation using Acoustic Vector Sensor

Author: Shengkui Zhao and Douglas L. Jones
Subjects: Engineering, Signal-to-noise ratio, Underdetermined system, business.industry, Acoustics, Identifiability, Acoustic source localization, Covariance, business, Sound pressure, Measure (mathematics), Subspace topology
Abstract: This paper presents an approach for estimating the two-dimensional (2D) direction of arrival (DOA) of more sources than sensors using an Acoustic Vector Sensor (AVS). The approach is based on the Khatri-Rao (KR) product and exploits the subspace characteristics of the time-varying covariance matrices of uncorrelated quasi-stationary source signals. An AVS measures both the acoustic pressure and the pressure gradients in a complete sound field, and the DOAs are determined in both the horizontal and vertical planes. The identifiability of the presented approach is studied using computer simulations. The 2D DOAs of six speech sources are successfully determined with the new approach, which significantly outperforms the existing linear array approach.
Published: 2014
Full Text: View/download PDF
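The Khatri-Rao product behind this approach is the column-wise Kronecker product. The Python sketch below, with a hypothetical function name, shows the identity that such methods rely on: for a covariance R_k = A diag(p_k) Aᴴ of uncorrelated sources, vec(R_k) = (A* ⊙ A) p_k with column-major vectorization, so M sensors yield a virtual manifold with M² rows. This is why more sources than sensors can become identifiable; the actual conditions for the AVS case are those studied in the paper.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Kronecker product: column k equals kron(A[:, k], B[:, k])."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    return np.stack([np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])], axis=1)
```

With a 4-channel model and six sources, `khatri_rao(A.conj(), A)` is 16 x 6 and generically has full column rank; the subspace estimation over the time-varying covariances is not shown.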

30. Robust DOA estimation of multiple speech sources

Author: Nguyen Thi Ngoc Tho, Douglas L. Jones, and Shengkui Zhao
Subjects: Microphone array, Reverberation, business.industry, Speech recognition, Centroid, Pattern recognition, Covariance, Time–frequency analysis, Background noise, Coherence (signal processing), Artificial intelligence, Omnidirectional antenna, business, Mathematics
Abstract: It is challenging to determine the directions of arrival of speech signals when there are fewer sensors than sources, particularly in noisy and reverberant environments. The coherence test by Mohan et al. exploits the time-frequency sparseness of non-stationary speech signals to select more relevant time-frequency bins to estimate directions of arrival. With no prior knowledge about the incoming sources, this work proposes a combination of noise-floor tracking, onset detection and a coherence test to robustly identify time-frequency bins where only one source is dominant. After that, the largest eigenvectors of covariance matrices corresponding to these bins are clustered and the directions of arrival of the sources are estimated based on the cluster centroids. Simulation and experimental results show that this method is able to localize 8 sources with small errors using only 3 omnidirectional microphones. The proposed method is robust to background noise and reverberation.
Published: 2014
Full Text: View/download PDF

31. A New Auxiliary-Vector Algorithm with Conjugate Orthogonality for Speech Enhancement

Author: Douglas L. Jones and Shengkui Zhao
Subjects: Speech enhancement, Adaptive filter, Dimension (vector space), Orthogonality, Microphone, Computer science, Filter (signal processing), Algorithm, Conjugate
Abstract: In this paper, we propose a new auxiliary-vector (AV) algorithm that uses conjugate orthogonality for speech enhancement. When only a limited data record is available, the AV algorithm is the state of the art for obtaining the minimum-variance-distortionless-response (MVDR) filter. However, current AV algorithms suffer from convergence problems when applied to speech enhancement. Based on the conjugate Gram-Schmidt process, we develop new auxiliary vectors that are conjugate orthogonal and apply them to the AV algorithm. The proposed conjugate AV algorithm converges to the optimal MVDR solution in a finite number of steps no greater than the filter dimension. Theoretical analysis establishes formal convergence of the proposed conjugate AV algorithm. Experiments using synthetic and real speech data show the advantages of the new proposal over state-of-the-art approaches. Index Terms: speech enhancement, microphone arrays, correlation, convergence, adaptive signal processing
Published: 2014
Full Text: View/download PDF
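Conjugate orthogonality means p_iᴴ R p_j = 0 for i ≠ j. The Python sketch below shows the classical conjugate Gram-Schmidt process and why M such directions give the MVDR solution exactly: with the columns of P mutually R-conjugate, R⁻¹ = Σ_k p_k p_kᴴ / (p_kᴴ R p_k). It illustrates the finite-step property only; how the paper's AV algorithm generates its auxiliary vectors is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def conjugate_gram_schmidt(V, R):
    """Make the columns of V mutually R-conjugate (classical Gram-Schmidt form)."""
    P = np.zeros(V.shape, dtype=complex)
    for k in range(V.shape[1]):
        v = V[:, k].astype(complex)
        p = v.copy()
        for j in range(k):
            pj = P[:, j]
            # remove the component of v along p_j in the R-weighted inner product
            p = p - (np.vdot(pj, R @ v) / np.vdot(pj, R @ pj)) * pj
        P[:, k] = p
    return P


def mvdr_from_conjugate_basis(R, d, P):
    """MVDR weights from M R-conjugate directions.

    Uses R^{-1} d = sum_k p_k (p_k^H d) / (p_k^H R p_k), then normalizes
    so that w^H d = 1.
    """
    u = np.zeros(len(d), dtype=complex)
    for k in range(P.shape[1]):
        pk = P[:, k]
        u = u + pk * np.vdot(pk, d) / np.vdot(pk, R @ pk)
    return u / np.vdot(d, u)
```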

32. The NTU-ADSC Systems for Reverberation Challenge 2014

Author: Xiong Xiao, Shengkui Zhao, Duc Hoang Ha Nguyen, Xionghu Zhong, Douglas L. Jones, Eng Siong Chng, and Haizhou Li
Published: 2014
Full Text: View/download PDF

33. Spatialized audio multiparty teleconferencing with commodity miniature microphone array

Author: Minh N. Do, Douglas L. Jones, Shengkui Zhao, Tien Dung Vu, and Viet-Anh Nguyen
Subjects: Microphone array, 3D sound localization, Videoconferencing, Audio signal, Multimedia, Computer science, Teleconference, Acoustic source localization, computer.software_genre, computer, Digital audio
Abstract: This paper presents a Spatialized Audio Multiparty Teleconferencing (SAMT) system that offers a radically new communication experience for group teleconferencing. The system includes our recently developed 3D audio technologies: 3D sound source localization (SSL) and 3D audio capture and reproduction using a low-cost, compact microphone array. In essence, the SAMT system offers 3D audio capture and spatial audio perception for multiple participants at a site, capabilities that current teleconferencing solutions still lack. In addition to identifying and automatically tracking the active speaker, the system allows a more compelling visual presentation for effective communication. Requiring only a low-cost microphone array and a consumer depth camera, the proposed system runs reliably and comfortably in real time on a commodity laptop or desktop PC. With such minimal deployment requirements, we present a variety of user experiences created by SAMT.
Published: 2013
Full Text: View/download PDF

34. Sparse tomographic acoustic imaging for environmental noise-source mapping

Author: Cagdas Tuna, Shengkui Zhao, Thi Ngoc Tho Nguyen, and Douglas L. Jones
Subjects: education.field_of_study, Microphone array, Acoustics and Ultrasonics, Computer science, Microphone, Acoustics, Population, Noise, Arts and Humanities (miscellaneous), Computer Science::Sound, education, Sound pressure, Environmental noise, Optoacoustic imaging
Abstract: Environmental noise has become a major problem in large cities, increasing health risks for the urban population. Current methods are overly expensive for city-wide noise monitoring because they generally require a dense deployment of fixed microphones for noise-level measurements. We present here alternative sparse tomographic acoustic imaging techniques that use arrays with relatively few microphones for large-region acoustic noise mapping. We first demonstrate that the locations and sound pressure levels of fixed noise sources sparsely located in a large field can be recovered by collecting acoustic data at multiple locations with a portable microphone array for tomographic reconstruction. We then introduce a nonstationary tomographic imaging approach using fixed microphone arrays, which can also capture intermittent changes in the acoustic field due to transient and/or moving noise sources. We test both the sparse static and dynamic imaging models with acoustic measurements collected with a circular ...
Published: 2016
Full Text: View/download PDF

35. A fast-converging adaptive frequency-domain MVDR beamformer for speech enhancement

Author: Shengkui Zhao and Douglas L. Jones
Subjects: Speech enhancement, Adaptive filter, Microphone array, Covariance matrix, Computer science, Speech recognition, Frequency domain, Gradient descent, Adaptive beamformer
Abstract: In this paper, we present a fast-converging adaptive frequency-domain minimum-variance-distortionless-response (MVDR) beamformer (FMV) for speech enhancement. The well-known FMV solution is optimal in microphone array processing. However, direct computation of the optimal FMV solution is often undesirable because it requires inverting the spatio-spectral correlation matrix, an operation that is often unstable and is expensive for large arrays. To avoid the matrix inversion, we develop a fast-converging conjugate gradient (CG) algorithm for iteratively computing the FMV solution. Compared with the existing steepest descent (SD) algorithm, the CG algorithm can dramatically improve the convergence speed in the case of multiple interfering signals in speech enhancement, so the computational load and processing time can be significantly reduced. Speech enhancement experiments using a four-channel acoustic-vector-sensor (AVS) microphone array are presented for a target speech signal corrupted by two and five interfering speech signals, and superior performance is achieved.
Published: 2012
Full Text: View/download PDF

36. Adaptive data based neural network leader-follower control of multi-agent networks

Author: Shengkui Zhao, Juliang Yin, Suiyang Khoo, Bin Wang, and Zhihong Man
Subjects: Computer Science::Multiagent Systems, Nonlinear system, Approximation theory, Adaptive control, Artificial neural network, Control theory, Computer science, Multi-agent system, Fourier series
Abstract: In this paper, we propose a data-based neural network leader-follower control for multi-agent networks in which each agent is described by a class of high-order uncertain nonlinear systems with input perturbation. The control laws are developed using the multiple-surface sliding control technique. In particular, a novel set of sliding variables is proposed to guarantee leader-follower consensus on the sliding surfaces, and a novel switching scheme is proposed to overcome the unavailability of a neighbor's instantaneous control output. By using an RBF neural network and Fourier series to approximate the unknown functions, leader-follower consensus can be reached even when the dynamic equations of all agents are unknown. An O(n) data-based algorithm is developed that uses only the network's measurable input/output data to generate the distributed virtual control laws. Simulation results demonstrate the effectiveness of the approach.
Published: 2011
Full Text: View/download PDF

37. Nonlinear image restoration using recurrent radial basis function network

Author: Jianfei Cai, Zhihong Man, and Shengkui Zhao
Subjects: Radial basis function network, Computational complexity theory, Pixel, business.industry, Pattern recognition, Computer Science::Computational Geometry, Nonlinear system, Distortion, Feedforward neural network, Computer vision, Artificial intelligence, Noise (video), business, Image restoration, Mathematics
Abstract: For nonlinearly distorted images, the performance of existing image restoration methods is limited in either visual quality or computational complexity. In this paper, we apply the recently developed recurrent radial basis function network (RBFN) to nonlinear image restoration. We describe the construction of the recurrent RBFN and the determination of its parameters. Simulation results show that the proposed recurrent RBFN scheme outperforms existing RBFN-based methods in both visual quality and complexity when the degradation process is recursive.
Published: 2010
Full Text: View/download PDF

38. Observer-based robust finite-time cooperative consensus control for multi-agent networks

Author: Lihua Xie, Zhihong Man, Shengkui Zhao, and Suiyang Khoo
Subjects: Computer Science::Multiagent Systems, Observer (quantum physics), Computer science, Control theory, Control system, Terminal sliding mode, Mobile robot, State observer, Robust control, Network topology, Sliding mode control
Abstract: This paper studies finite-time consensus tracking control for multi-agent networks. The time-varying control input and the velocity of the leader are unknown to all followers, and only the leader's position is known to its neighbors. We first propose a new finite-time multiple-surface sliding mode observer to estimate the leader's velocity and show that the estimation error of the observer converges to zero in finite time. We then prove that finite-time consensus tracking of multi-agent networks can be achieved on a new terminal sliding mode surface. Simulation results are presented to validate the analysis.
Published: 2009
Full Text: View/download PDF

39. A class of modified variable step-size NLMS algorithms for system identification

Author: Suiyang Khoo, Shengkui Zhao, and Zhihong Man
Subjects: Least mean squares filter, Adaptive filter, Noise, Rate of convergence, Computer Science::Sound, Convergence (routing), System identification, Algorithm design, Root-mean-square deviation, Algorithm, Mathematics
Abstract: This paper proposes a class of modified variable step-size normalized least mean square (VS NLMS) algorithms. The schemes are obtained by estimating the optimum NLMS step size that minimizes the mean square deviation (MSD). The estimation takes into account the properties of both the additive noise and the input excitation. The resulting VS NLMS algorithms have simple forms and give an improved tradeoff between fast convergence and low misadjustment in system identification.
Published: 2009
Full Text: View/download PDF
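For reference, the baseline NLMS recursion whose step size these schemes adapt is sketched below in Python. The fixed `mu` and the regularizer `delta` are hypothetical values; a VS NLMS algorithm would replace the constant `mu` with an estimate recomputed at every sample, and the paper's MSD-based estimator is not reproduced here.

```python
import numpy as np


def nlms(x, d, n_taps, mu=0.5, delta=1e-6):
    """Normalized LMS: the update is scaled by the energy in the tap buffer."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)            # buf[k] = x[n - k]
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf
        # fixed step size mu; a VS NLMS scheme would use mu(n) here
        w = w + (mu / (delta + buf @ buf)) * e[n] * buf
    return w, e
```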

40. Adaptive fast finite-time multiple-surface sliding control for a class of uncertain non-linear systems

Author: Shengkui Zhao, Suiyang Khoo, and Zhihong Man
Subjects: Nonlinear system, Variable structure control, Adaptive control, Control theory, Applied Mathematics, Modeling and Simulation, Bounded function, Integrator, Stability (learning theory), Sliding mode control, Computer Science Applications, Control-Lyapunov function, Mathematics
Abstract: This paper addresses the adaptive fast finite-time multiple-surface sliding control (AFFTMSSC) problem for a class of high-order uncertain non-linear systems whose uncertainty upper bounds are unknown. Using the fast control Lyapunov function and the so-called adding-a-power-integrator method combined with an adaptive technique, a recursive design procedure is provided that guarantees fast finite-time stability of the closed-loop system. Furthermore, the control input is proved to be bounded.
Published: 2012
Full Text: View/download PDF

41. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

Author: Xiong Xiao, Xionghu Zhong, Eng Siong Chng, Shengkui Zhao, Duc Hoang Ha Nguyen, Haizhou Li, Douglas L. Jones, School of Computer Engineering, and Temasek Laboratories
Subjects: Reverberation, Computer science, Speech recognition, Feature vector, Speech enhancement, 02 engineering and technology, 030507 speech-language pathology & audiology, 03 medical and health sciences, Distortion, Beamforming, Deep neural networks, 0202 electrical engineering, electronic engineering, information engineering, Reverberation challenge, business.industry, 020206 networking & telecommunications, Pattern recognition, Speech corpus, Dynamic features, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Constraint (information theory), Computer Science::Sound, Feature adaptation, Artificial intelligence, Robust speech recognition, 0305 other medical science, business, Energy (signal processing)
Abstract: This paper investigates deep neural network (DNN)-based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of the predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients by least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the dynamic-feature constraint directly into DNN training using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients into a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption about distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce reverberation and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log-likelihood ratio metrics while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross-transform feature adaptation significantly improves ASR performance for acoustic models trained on clean data.
Published version
Full Text: View/download PDF
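The first smoothing route in this abstract, a least-squares estimate from predicted static and dynamic features, can be sketched in Python as below. The ±1-frame delta window, edge replication, and equal weighting of the static and delta terms are assumptions of this sketch; in practice the two terms may be weighted (for example by variances), and the function names are hypothetical.

```python
import numpy as np


def delta_matrix(T):
    """Matrix D with (D @ c)[t] = (c[t+1] - c[t-1]) / 2, replicating the
    first and last frames at the edges (window of +/-1 frame)."""
    D = np.zeros((T, T))
    for t in range(T):
        D[t, min(t + 1, T - 1)] += 0.5
        D[t, max(t - 1, 0)] -= 0.5
    return D


def ls_trajectory(static, delta):
    """Least-squares trajectory c minimizing ||c - static||^2 + ||D c - delta||^2.

    static, delta: arrays of shape (T, F) (frames x coefficients), e.g. the
    static and dynamic features predicted by a network.
    """
    T = static.shape[0]
    D = delta_matrix(T)
    lhs = np.eye(T) + D.T @ D          # normal equations of the LS problem
    rhs = static + D.T @ delta
    return np.linalg.solve(lhs, rhs)   # solves all F coefficient tracks at once
```

When the predicted deltas are consistent with the predicted statics, the solution returns the statics unchanged; otherwise the normal equations (I + DᵀD) c = s + Dᵀδ pull the trajectory toward one whose frame-to-frame slopes match the predicted deltas.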

41 results on '"Shengkui Zhao"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources