48 results on '"Zhenyang Wu"'
Search Results
2. ROLE OF CYTOKINES AND CORRELATED CELLS IN HOST IMMUNITY AGAINST NEMATODE IN RUMINANTS
- Author
Muhammad Awais, Sumayyah Qadri, Muhammad Naveed, Zhenyang Wu, Abdur Rahman Ansari, Akhtar Rasool Asif, Tongren University, Xiaohui Tang, Shakeel Ahmed, Xiaoyong Du, and Ruheena Javed
- Subjects
0301 basic medicine; business.industry; Host (biology); Parasitism; Virulence; Drug resistance; Biology; biology.organism_classification; Microbiology; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Nematode; Immunity; Immunology; Parasite hosting; Livestock; business; 030215 immunology
- Abstract
Nematode infection is a major impediment to profitable dairy production on livestock farms. Gastrointestinal (GI) parasitism causes weight loss and low milk production, along with high mortality in sheep and goats. Intensive anthelmintic therapy to control the parasite burden has led to the appearance of drug-resistant parasite strains. High demand for unadulterated animal products free from drugs has motivated alternative strategies, including breeding plans for parasite control in ruminants. As protective resistance to nematode infection develops, the host genome shows particular expression patterns that are frequently confounded by mechanisms concurrently needed to control multiple nematode species as well as protozoan ecto-parasites and viral and microbial pathogens. Understanding the molecular mechanisms underlying these processes is a crucial step toward developing efficient new parasite control strategies. Knowledge of the host's immune mechanisms and of the regulation of parasite development, physiology, and virulence can help identify targets for parasite control. This review summarizes recent progress and limitations concerning promising regulatory biological pathways and genetic networks involved in host susceptibility and resistance to GI nematode infection in ruminants.
- Published
- 2016
3. Design of a Network Sensing System Based on Android Platform
- Author
Dengyin Zhang, En Tong, Zhenyang Wu, and Fei Ding
- Subjects
Smart phone; Computer science; Wireless network; business.industry; ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS; Local area network; 020206 networking & telecommunications; 02 engineering and technology; Embedded system; 0202 electrical engineering, electronic engineering, information engineering; Software design; Wireless; 020201 artificial intelligence & image processing; Android (operating system); business; Sensing system; Mobile device
- Abstract
In recent years, the scale of the wireless network of communication operators has been expanding. Various WLAN (Wireless Local Area Networks) hotspots are increasing and the application scenarios are becoming more diverse. The traditional WLAN network testing has limitations of cost, application scenarios, and sampling points. With the rapid popularization of smart phones, their functions are becoming increasingly powerful and intelligent. As a simple and portable functional mobile device, the development of the smart phone provides a better platform for portable wireless detection. In this paper, a software design and implementation method of WLAN sensing application based on an Android intelligent terminal is proposed. The developed sensing APP client implements the WLAN hotspot network quality testing and technology validation, providing a low-cost and efficient way to realize the development of the nationwide wireless network perception.
- Published
- 2018
4. Object tracking via collaborative multi-task learning and appearance model updating
- Author
Tongchi Zhou, Lin Zhou, Nijun Li, Zhenyang Wu, and Xu Cheng
- Subjects
Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Multi-task learning; Pattern recognition; Sparse approximation; Object (computer science); Tracking (particle physics); Active appearance model; Discriminative model; Video tracking; Eye tracking; Computer vision; Artificial intelligence; business; Software
- Abstract
We propose a tracking method using both overall information of object and local representation. The object is divided into m overlapped patches to improve tracking performance in background clutter and partial occlusions. We take the prior information into account for avoiding ambiguity in discriminative model. The dictionary is updated by Metropolis-Hastings to capture the appearance changes. We achieve comparable performances with other methods on challenging sequences. In this paper, we propose a novel visual tracking algorithm using the collaboration of generative and discriminative trackers under the particle filter framework. Each particle denotes a single task, and we encode all the tasks simultaneously in a structured multi-task learning manner. Then, we implement generative and discriminative trackers, respectively. The discriminative tracker considers the overall information of object to represent the object appearance; while the generative tracker takes the local information of object into account for handling partial occlusions. Therefore, two models are complementary during the tracking. Furthermore, we design an effective dictionary updating mechanism. The dictionary is composed of fixed and variational parts. The variational parts are progressively updated using Metropolis-Hastings strategy. Experiments on different challenging video sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.
- Published
- 2015
5. Single-channel Speech Separation Using Dictionary-updated Orthogonal Matching Pursuit and Temporal Structure Information
- Author
Guo Haiyan, Xiaoxiong Li, Zhenyang Wu, and Lin Zhou
- Subjects
K-SVD; Channel (digital image); Computer science; business.industry; Applied Mathematics; Frame (networking); Pattern recognition; Sparse approximation; Matching pursuit; Matrix decomposition; Signal Processing; Time domain; Limit (mathematics); Artificial intelligence; business
- Abstract
In this paper, we propose a two-stage sparse decomposition-based method for single-channel speech separation in the time domain. First, we propose a Dictionary-updated orthogonal matching pursuit (DUOMP) algorithm, which is used in both separation stages. In the proposed DUOMP algorithm, all atoms of each source-specific dictionary are updated by subtracting off the current approximation of each source to the original atoms. It is proved that the DUOMP algorithm can more quickly limit the separated sources to a region where they are statistically uncorrelated. Then, we propose an adaptive dictionary generation method followed by a frame labeling method to perform a second-stage separation on the mixed frames that have a certain temporal structure. Experiments show that, overall, the proposed method outperforms a separation method using sparse non-negative matrix factorization (SNMF), a separation method using OMP, and a source-filter-based method using pitch information. The factors that affect the performance of the proposed method are also shown.
- Published
- 2015
6. Recognizing human interactions by genetic algorithm-based random forest spatio-temporal correlation
- Author
Xu Cheng, Nijun Li, Zhenyang Wu, and Haiyan Guo
- Subjects
Computer science; business.industry; Reliability (computer networking); 020207 software engineering; Pattern recognition; Context (language use); 02 engineering and technology; Machine learning; computer.software_genre; Random forest; Correlation; Artificial Intelligence; Feature (computer vision); Pattern recognition (psychology); Genetic algorithm; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Point (geometry); Computer Vision and Pattern Recognition; Artificial intelligence; business; computer
- Abstract
Recognizing human interactions is a more challenging task than recognizing single-person activities and has attracted much attention from the computer vision community. This paper proposes an innovative and effective way to recognize human interactions, which incorporates the advantages of both the global motion context (MC) feature and the spatio-temporal (S-T) correlation of local spatio-temporal interest point features. The MC feature is used to train a random forest, where a genetic algorithm (GA) is applied in the training phase to achieve a good compromise between reliability and efficiency. In addition, we propose an S-T correlation-based match, where MC's structure and the Needleman–Wunsch algorithm are used to calculate the spatial and temporal correlation scores of two videos, respectively. Experiments on the UT-Interaction dataset show that our approaches outperform other prevalent machine learning methods, and that the combination of GA search-based random forest and S-T correlation achieves state-of-the-art performance.
- Published
- 2015
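The abstract of entry 6 names the Needleman–Wunsch algorithm as the tool for scoring temporal correlation between two videos. The listing gives no implementation details, so the following is only a minimal sketch of the textbook global-alignment score; the function name `needleman_wunsch`, the scoring values, and the idea of aligning per-frame label strings are assumptions for illustration, not the authors' method.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Hypothetical helper: the scoring parameters are illustrative defaults,
    not values taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]
```

Under this sketch, two videos would first have to be turned into comparable sequences (for example one symbol per frame), which is a design step the abstract does not describe.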
7. Multi-Task Object Tracking with Feature Selection
- Author
Xu Cheng, Nijun Li, Lin Zhou, Zhenyang Wu, and Tongchi Zhou
- Subjects
business.industry; Computer science; Applied Mathematics; Multi-task learning; Feature selection; Pattern recognition; Sparse approximation; Computer Graphics and Computer-Aided Design; Task (project management); Feature (computer vision); Video tracking; Signal Processing; Eye tracking; Computer vision; Artificial intelligence; Electrical and Electronic Engineering; business; Feature learning
- Published
- 2015
8. Realistic human action recognition by Fast HOG3D and self-organization feature map
- Author
Zhenyang Wu, Xu Cheng, Suofei Zhang, and Nijun Li
- Subjects
Self-organization; business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Local feature descriptor; Machine learning; computer.software_genre; Computer Science Applications; Support vector machine; Task (computing); Hardware and Architecture; Pattern recognition (psychology); Feature (machine learning); Preprocessor; Action recognition; Computer Vision and Pattern Recognition; Artificial intelligence; business; computer; Software
- Abstract
Local features are now very popular in vision-based human action recognition, especially in "wild" or unconstrained videos. This paper proposes a novel framework that combines Fast HOG3D and a self-organization feature map (SOM) network for action recognition from unconstrained videos, bypassing demanding preprocessing such as human detection, tracking or contour extraction. The contributions of our work lie not only in creating a more compact and computationally efficient local feature descriptor than the original HOG3D, but also in the first successful application of SOM to a realistic action recognition task and in studying the influence of its training parameters. We mainly test our approach on the UCF-YouTube dataset with 11 realistic sport actions, achieving promising results that outperform a local feature-based support vector machine and are comparable with bag-of-words. Experiments are also carried out on the KTH and UT-Interaction datasets for comparison. Results on all three datasets confirm that our work has comparable, if not better, performance compared with the state of the art.
- Published
- 2014
9. Robust Visual Tracking with SIFT Features and Fragments Based on Particle Swarm Optimization
- Author
Xu Cheng, Suofei Zhang, Nijun Li, and Zhenyang Wu
- Subjects
Fitness function; Computer science; business.industry; Applied Mathematics; Particle swarm optimization; Scale-invariant feature transform; RANSAC; Tracking (particle physics); Active appearance model; Feature (computer vision); Signal Processing; Eye tracking; Computer vision; Artificial intelligence; business
- Abstract
We propose a novel approach for visual tracking based on a particle swarm optimization (PSO) framework using SIFT feature point correspondence and multiple fragments in a candidate target region to cope with the problems of partial occlusions, illumination changes, and large motion changes of the tracked target. Firstly, optimal search in the successive frame tracking process is performed by the PSO algorithm, which guides all particles towards the global optimum based on a fitness function. Then, the SIFT feature information is integrated into the iterative results of PSO to acquire a more accurate tracking state. Secondly, we present an effective appearance model updating criterion, which evaluates which fragments in the appearance model need updating at each frame. Fragments with occluded parts or low quality measure values are not updated. The method for updating the appearance model is introduced to improve the tracking performance. Compared with state-of-the-art algorithms, the proposed method can still stably track the target during long-term partial occlusions using superior fragments of the tracked target. The experimental results demonstrate the effectiveness of our algorithm in complex environments where the target object undergoes partial occlusions and large changes in pose and illumination.
- Published
- 2013
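Entry 9 describes a PSO search in which particles are guided toward the global optimum by a fitness function. As a rough illustration of that update rule only, here is a minimal generic PSO sketch. It minimizes an arbitrary cost function rather than maximizing a tracking fitness, and the function name `pso_minimize` and the inertia and acceleration values (`w`, `c1`, `c2`) are assumptions, not parameters from the paper.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using a basic global-best PSO.

    Hypothetical sketch: parameter values are common textbook choices.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In a tracker, the cost would presumably be derived from an appearance similarity measure over candidate target states; that part is not specified in the listing and is left out here.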
10. Adaptive object detection by implicit sub-class sharing features
- Author
Nijun Li, Zhenyang Wu, Xu Cheng, and Suofei Zhang
- Subjects
Boosting (machine learning); Computer science; business.industry; Machine learning; computer.software_genre; Boosting methods for object categorization; Object detection; Method; Control and Systems Engineering; Signal Processing; Viola–Jones object detection framework; Computer Vision and Pattern Recognition; Artificial intelligence; Electrical and Electronic Engineering; business; Cluster analysis; computer; Software
- Abstract
In this paper, we describe an adaptive object detection system based on boosted weak classifiers. We formulate the learning of object detection as finding representative features shared across the examples of an object. Objects with low intra-class variation imply that more generic features can be selected for detection. In contrast, for objects with high intra-class variation, more specific features that cover only a subset of examples should be encouraged in order to gain an elegant representation. In this spirit, we implement an implicit partition over positive examples to obtain a data-driven clustering. Based on the implicit partition, we encourage intra-class and inter-class feature sharing between sub-categories and build an adaptive hierarchical cascade of weak classifiers. Experimental results show that the intra-class sharing makes an adaptive trade-off between performance and efficiency on various object detection tasks. The inter-class sharing allows objects to borrow semantic information from related, well-labeled objects. We hope that the proposed implicit feature sharing over sub-categories can extend the application of traditional boosting methods.
- Published
- 2013
11. Maximum likelihood subband polynomial regression for robust speech recognition
- Author
Yong Lü and Zhenyang Wu
- Subjects
Polynomial regression; Polynomial; Acoustics and Ultrasonics; business.industry; Speech recognition; Pattern recognition; Adaptation (eye); Function (mathematics); Weighting; ComputingMethodologies_PATTERNRECOGNITION; Computer Science::Multimedia; Cepstrum; Discrete cosine transform; Artificial intelligence; Hidden Markov model; business; Mathematics
- Abstract
In this paper, we propose a model adaptation algorithm based on maximum likelihood subband polynomial regression (MLSPR) for robust speech recognition. In this algorithm, the cepstral mean vectors of prior trained hidden Markov models (HMMs) are converted to the log-spectral domain by the inverse discrete cosine transform (DCT) and each log-spectral mean vector is divided into several subband vectors. The relationship between the training and testing subband vectors is approximated by a polynomial function. The polynomial coefficients are estimated from adaptation data using the expectation–maximization (EM) algorithm under the maximum likelihood (ML) criterion. The experimental results show that the proposed MLSPR algorithm is superior to both the maximum likelihood linear regression (MLLR) adaptation and maximum likelihood subband weighting (MLSW) approach. In the MLSPR adaptation, only a very small amount of adaptation data is required and therefore it is more useful for fast model adaptation.
- Published
- 2013
12. Sound source localization based on discrimination of cross-correlation functions
- Author
Zhenyang Wu and Xinwang Wan
- Subjects
Reverberation; Engineering; Microphone array; Acoustics and Ultrasonics; Cross-correlation; Microphone; business.industry; Speech recognition; Pattern recognition; Ranging; Acoustic source localization; Speech enhancement; Naive Bayes classifier; Computer Science::Sound; Artificial intelligence; business
- Abstract
Sound source localization plays a crucial role in many microphone array applications, ranging from speech enhancement to human–computer interfaces in reverberant noisy environments. The steered response power (SRP) using the phase transform (SRP-PHAT) method is one of the most popular modern localization algorithms. SRP-based source localizers have proved robust; however, they may fail to locate the sound source in adverse noise and reverberation conditions, especially when the direct paths to the microphones are unavailable. This paper proposes a localization algorithm based on discrimination of cross-correlation functions. The cross-correlation functions are calculated by the generalized cross-correlation phase transform (GCC-PHAT) method. Using the cross-correlation functions, the sound source location is estimated by one of two classifiers: a Naive Bayes classifier or a Euclidean distance classifier. Simulation results demonstrate that the proposed algorithms provide higher localization accuracy than the SRP-PHAT algorithm in reverberant noisy environments.
- Published
- 2013
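Entry 12 computes cross-correlation functions with GCC-PHAT before classifying them. The sketch below shows only the standard GCC-PHAT step for one microphone pair: whiten the cross-spectrum so that only phase remains, transform back, and read the delay off the peak. It uses a naive O(n²) DFT to stay dependency-free; the names `dft` and `gcc_phat_delay` are hypothetical, and the paper's Naive Bayes and Euclidean distance classifiers are not reproduced.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_phat_delay(x1, x2, eps=1e-12):
    """Estimate the integer sample delay of x2 relative to x1 (positive if x2 lags)."""
    n = len(x1) + len(x2)  # zero-pad so circular correlation equals linear correlation
    X1 = dft(list(x1) + [0.0] * (n - len(x1)))
    X2 = dft(list(x2) + [0.0] * (n - len(x2)))
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # PHAT weighting: discard magnitude, keep phase only.
    whitened = [c / (abs(c) + eps) for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda k: cc[k])
    # Indices in the upper half of the buffer represent negative lags.
    return peak if peak < n // 2 else peak - n
```

A practical implementation would use an FFT and sub-sample interpolation around the peak; those refinements are omitted to keep the sketch short.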
13. Exploring encoding and normalization methods on probabilistic latent semantic analysis model for action recognition
- Author
Zhenyang Wu, Tongchi Zhou, Qinjun Xu, and Lin Zhou
- Subjects
Topic model; Normalization (statistics); Probabilistic latent semantic analysis; business.industry; Computer science; Pattern recognition; 02 engineering and technology; Machine learning; computer.software_genre; Activity recognition; 0202 electrical engineering, electronic engineering, information engineering; Action recognition; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer
- Abstract
Topic models have been widely applied in the field of computer vision, yielding superior performance in various recognition tasks. Among them, the probabilistic latent semantic analysis model has earned much attention due to its simplicity and effectiveness. However, the effect of encoding and normalization methods on topic models has so far been ignored. This paper explores the impact of encoding methods combined with different normalization methods on the probabilistic latent semantic analysis model in the context of action classification in videos. Detailed experiments are conducted on the KTH and UT-interaction datasets. The results show that an appropriate combination of encoding and normalization methods can significantly improve the performance of the probabilistic latent semantic analysis model. The recognition accuracy reaches 96.44% and 93.33% on UT-interaction set1 and set2 respectively, which outperforms the state of the art. In particular, we obtain 94.24% on UT-interaction set1 using sparse STIPs.
- Published
- 2016
14. A New Algorithm for Image Registration and Super-Resolution Reconstruction
- Author
Jianpo Gao, Hao Yang, and Zhenyang Wu
- Subjects
business.industry; Computer science; Image registration; Computer vision; Artificial intelligence; Electrical and Electronic Engineering; business; Superresolution
- Published
- 2011
15. Improved steered response power method for sound source localization based on principal eigenvector
- Author
Zhenyang Wu and Xinwang Wan
- Subjects
Microphone array; Engineering; Acoustics and Ultrasonics; Microphone; Covariance matrix; business.industry; Acoustics; Ranging; Acoustic source localization; Noise; Power iteration; business; Algorithm; Eigenvalues and eigenvectors
- Abstract
Sound source localization is essential in many microphone array applications, ranging from teleconferencing systems to artificial perception in reverberant noisy environments. The steered response power (SRP) using the phase transform (SRP-PHAT) source localization algorithm has proved robust; however, its performance degrades in highly reverberant noisy environments. Though the SRP-based maximum likelihood localizers are more robust than SRP-PHAT, they have the drawback of requiring the noise variance to be estimated in a silent room. This paper presents an improved SRP-PHAT algorithm based on the principal eigenvector. The sound source location is estimated from the principal eigenvector computed from the frequency-domain correlation matrix. Using both simulated and real data, we show that the proposed algorithm achieves higher source localization accuracy than the SRP-PHAT algorithm.
- Published
- 2010
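Entry 15 relies on the principal eigenvector of a correlation matrix, and its subject tags mention power iteration. The following is a minimal power-iteration sketch for a real symmetric positive semi-definite matrix. The paper works with a complex frequency-domain correlation matrix, so the real-valued setting, the function name `principal_eigenvector`, and the stopping rule are simplifying assumptions.

```python
import math


def principal_eigenvector(matrix, iters=200, tol=1e-12):
    """Return (eigenvalue, unit eigenvector) for the dominant eigenpair.

    Assumes a real symmetric positive semi-definite matrix, given as a list of rows.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-norm starting vector
    eigval = 0.0
    for _ in range(iters):
        # Multiply by the matrix, then renormalize.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; nothing more to do
        w = [x / norm for x in w]
        # Rayleigh quotient of the normalized iterate estimates the eigenvalue.
        eigval = sum(w[i] * sum(matrix[i][j] * w[j] for j in range(n))
                     for i in range(n))
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return eigval, v
```

How the eigenvector is then mapped to a source location is specific to the paper and is not sketched here.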
16. Robust speech recognition using improved vector taylor series algorithm for embedded systems
- Author
Zhenyang Wu, Haiyang Wu, and Yong Lu
- Subjects
Computational complexity theory; business.industry; Computer science; Iterative method; Speech recognition; Pattern recognition; Variance (accounting); Mixture model; symbols.namesake; Noise; Computer Science::Sound; Embedded system; Media Technology; Taylor series; symbols; Feature (machine learning); Artificial intelligence; Electrical and Electronic Engineering; business; Hidden Markov model; Algorithm
- Abstract
This paper proposes a novel robust speech recognition technique using an improved vector Taylor series (VTS) algorithm for embedded systems. It uses a hidden Markov model (HMM) to replace the Gaussian mixture model (GMM) for estimating the clean speech feature, and gives closed-form solutions for the noise parameters, including the mean and variance, at each expectation-maximization (EM) iteration. The experimental results show that the proposed algorithm strikes a good balance between computational complexity and recognition accuracy, and is thus more useful for embedded systems.
- Published
- 2010
17. Accelerated steered response power method for sound source localization using orthogonal linear array
- Author
Zhenyang Wu, Shikui Wang, and Weiping Cai
- Subjects
Microphone array; Engineering; Acoustics and Ultrasonics; Microphone; business.industry; Computation; Direction of arrival; Acoustic source localization; Space (mathematics); Task (computing); Power iteration; Electronic engineering; business; Algorithm
- Abstract
In microphone array applications, it is difficult to localize a sound source quickly and accurately in a noisy, reverberant environment. Many approaches have been presented to solve this problem. Among them, the steered response power-phase transform weighted (SRP–PHAT) source localization algorithm has proved robust. However, SRP–PHAT incurs a high computational cost when searching a large location space. To overcome this shortcoming, an improved SRP–PHAT is presented that reduces a two-dimensional search space to a pair of one-dimensional ones by using an orthogonal linear array. In this method, the direction of arrival (DOA) parameters are separated. The main computation can be carried out independently in two one-dimensional spaces, so the computational load is greatly reduced. Simulations show that the proposed method incurs no loss in accuracy.
- Published
- 2010
18. A novel fast moving object contour tracking algorithm
- Author
Zhenyang Wu, Guo-Cheng An, and Hao Yang
- Subjects
Motion compensation; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Tracking (particle physics); Object (computer science); Maxima and minima; Position (vector); Computer Science::Computer Vision and Pattern Recognition; Computer vision; Node (circuits); Mean-shift; Artificial intelligence; Electrical and Electronic Engineering; Particle filter; business; Algorithm; Mathematics
- Abstract
If a somewhat fast-moving object exists in a complicated tracking environment, a snake’s nodes may fall into inaccurate local minima. We propose a mean shift snake algorithm to solve this problem. However, if the object goes beyond the operating limits of the mean shift snake module in successive frames, the mean shift snake’s nodes may also fall into local minima while moving to the new object position. This paper presents a motion compensation strategy using a particle filter; a new Particle Filter Mean Shift Snake (PFMSS) algorithm is therefore proposed, which combines the particle filter with the mean shift snake to estimate the contour of the fast-moving object. Firstly, the fast-moving object is tracked by the particle filter to create a coarse position, which is used to initialize the mean shift algorithm. Secondly, all the relevant motion information is used to compensate the snake’s node positions. Finally, the snake algorithm is used to extract the exact object contour, and the useful information about the object is fed back. Several real-world sequences are tested, and the results show that the novel tracking method performs well, with high accuracy, on fast-moving objects in cluttered backgrounds.
- Published
- 2009
19. HRTF personalization based on artificial neural network in individual virtual auditory space
- Author
Lin Zhou, Hongmei Hu, Zhenyang Wu, and Hao Ma
- Subjects
Sound localization; Engineering; Acoustics and Ultrasonics; Mean squared error; Artificial neural network; business.industry; Speech recognition; Pattern recognition; Interaural time difference; Virtual reality; Head-related transfer function; Transfer function; Set (abstract data type); Artificial intelligence; business
- Abstract
The synthesis of individual virtual auditory space (VAS) is an important and challenging task in virtual reality. One of the key factors for individual VAS is to obtain a set of individual head related transfer functions (HRTFs). A customization method based on back-propagation (BP) artificial neural network (ANN) is proposed to obtain an individual HRTF without complex measurement. The inputs of the neural network are the anthropometric parameters chosen by correlation analysis and the outputs are the characteristic parameters of HRTFs together with the interaural time difference (ITD). Objective simulation experiments and subjective sound localization experiments are implemented to evaluate the performance of the proposed method. Experiments show that the estimated non-individual HRTF has small mean square error, and has similar perception effect to the corresponding one obtained from the database. Furthermore, the localization accuracy of personalized HRTF is increased compared to the non-individual HRTF.
- Published
- 2008
20. Blur Identification and Image Super-Resolution Reconstruction Using an Approach Similar to Variable Projection
- Author
Zhenyang Wu, Hao Yang, and Jianpo Gao
- Subjects
Blind deconvolution; Point spread function; business.industry; Applied Mathematics; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Lanczos algorithm; Iterative reconstruction; Computer Science::Computer Vision and Pattern Recognition; Signal Processing; Projection method; Computer vision; Artificial intelligence; Electrical and Electronic Engineering; business; Projection (set theory); Image resolution; Image restoration; Mathematics
- Abstract
Super-resolution reconstruction (SRR) produces a high-resolution image from multiple low-resolution images. Many image SRR algorithms assume that the blurring process, i.e., point spread function (PSF) of the imaging system is known in advance. However, the blurring process is not known or is known only to within a set of parameters in many practical applications. In this letter, we propose an approach for solving the joint blur identification and image SRR based on the principle similar to the variable projection method. The approach can avoid some shortcomings of cyclic coordinate descent optimization procedure. We also propose an efficient implementation based on Lanczos algorithm and Gauss quadrature theory. Experimental results are presented to demonstrate the effectiveness of our method.
- Published
- 2008
21. Combination of pitch synchronous analysis and fisher criterion for speaker identification
- Author
Yumin Zeng and Zhenyang Wu
- Subjects
business.industry; Speech recognition; Feature vector; Pattern recognition; Mixture model; Identification (information); Computer Science::Sound; Cepstrum; Perceptual linear predictive; Fisher criterion; Speaker identification; Artificial intelligence; Mel-frequency cepstrum; Electrical and Electronic Engineering; business; Mathematics
- Abstract
A novel text-independent speaker identification system is proposed. In the proposed system, the 12-order perceptual linear predictive cepstrum and its delta coefficients over a span of five frames are extracted from the segmented speech using pitch synchronous analysis. The Fisher ratios of the original coefficients are then calculated, and the coefficients with the largest Fisher ratios are selected to form the 13-dimensional feature vectors of each speaker. The Gaussian mixture model is used to model the speakers. The experimental results show that the identification accuracy of the proposed system is clearly better than that of systems based on other conventional coefficients, such as the linear predictive cepstral coefficients and the Mel-frequency cepstral coefficients.
- Published
- 2007
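Entry 21 selects the coefficients with the largest Fisher ratios. The abstract does not state the exact formula, so the sketch below assumes one common definition: for each feature dimension, the variance of the per-class means divided by the average within-class variance. The function names `fisher_ratios` and `select_top` and the input layout (a dict mapping a speaker label to a list of feature vectors) are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratios(features_by_class):
    """Return one Fisher ratio per feature dimension.

    features_by_class: {label: [feature_vector, ...]}, all vectors the same length.
    Assumed definition: variance of class means / mean within-class variance.
    """
    dims = len(next(iter(features_by_class.values()))[0])
    ratios = []
    for d in range(dims):
        per_class = [[vec[d] for vec in vecs] for vecs in features_by_class.values()]
        class_means = [mean(vals) for vals in per_class]
        between = pvariance(class_means)                      # spread of class means
        within = mean([pvariance(vals) for vals in per_class])  # average class spread
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def select_top(ratios, k):
    """Return the indices of the k dimensions with the largest ratios, in index order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])
```

With the abstract's numbers, `k` would be 13, applied to the cepstral and delta coefficients together; the toy usage below uses two dimensions only.

```python
data = {"a": [[0.0, 1.0], [0.2, 3.0]], "b": [[1.0, 1.5], [1.2, 2.5]]}
print(select_top(fisher_ratios(data), 1))  # dimension 0 separates the classes best
```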
22. A study of relative motion point trajectories for action recognition
- Author
Nijun Li, Qinjun Xu, Zhenyang Wu, Xu Cheng, Tongchi Zhou, and Lin Zhou
- Subjects
Multiple kernel learning; Constant of motion; Orientation (computer vision); business.industry; Feature extraction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; Quarter-pixel motion; Motion field; Match moving; Motion estimation; Computer vision; Artificial intelligence; business; Mathematics
- Abstract
Trajectories extracted by previous methods for human action recognition contain irrelevant changes, and the Orientation-Magnitude descriptors of their shapes lack robustness to camera motion. To solve these problems, this paper proposes action recognition by tracking salient relative motion points. Firstly, a motion boundary detector that suppresses constant camera motion is used to extract motion features. After the detected boundaries are processed with an adaptive threshold, the super-pixels that contain salient points are defined as relative motion regions. Trajectories are then generated by tracking the points within these super-pixels. For the trajectory shape, pre-defined orientation assignments with coarse-to-fine quantization levels are used to produce orientation statistics. Finally, the descriptors of oriented gradient, motion boundary, and orientation statistics, together with their combination, are each adopted to represent action videos. On the benchmark KTH and UCF-sports action datasets, experimental results show that the extracted trajectories can describe the movement of the object. Compared with conventional algorithms, our method with multiple kernel learning obtains good performance.
- Published
- 2015
23. Action recognition by Huffman coding and implicit action model
- Author
-
Tongchi Zhou, Nijun Li, Zhenyang Wu, and Lin Zhou
- Subjects
business.industry ,Computer science ,Feature extraction ,Corner detection ,Pattern recognition ,Machine learning ,computer.software_genre ,Huffman coding ,Visualization ,symbols.namesake ,Naive Bayes classifier ,Feature (machine learning) ,symbols ,Visual Word ,Artificial intelligence ,Neural coding ,business ,computer - Abstract
Human action recognition is at the core of computer vision and has great application value in intelligent human-computer interaction. Building on Bag-of-Words (BoW), this work presents a framework for action recognition that combines Huffman coding with an Implicit Action Model (IAM). Specifically, Huffman coding, which outperforms the naive Bayesian method, provides a robust estimate of the conditional probabilities of visual words, whereas the IAM captures the spatio-temporal relationships of local features and outperforms most other common machine learning methods. Spatio-Temporal Interest Points (STIPs) and Harris corners are employed as local features, and multichannel feature description is adopted to exploit the complementarity among different features. Experiments on the UCF-YouTube and HOHA2 datasets systematically compare the performance of various feature channels and machine learning methods, demonstrating the effectiveness of the proposed approaches. Finally, multiple augmentation mechanisms such as feature fusion, hierarchical codebooks, and sparse coding are integrated into the recognition system, achieving the best performance in comparison with the state of the art.
- Published
- 2015
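The abstract above does not say how Huffman coding is turned into a probability estimate, so the following is only a sketch of standard Huffman coding over visual-word counts. It relies on the well-known property that a Huffman code length of l bits corresponds to a coarse probability estimate of 2^-l. The function name `huffman_code_lengths` and the toy counts are hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(counts):
    """Return {symbol: Huffman code length in bits} for a dict of positive counts."""
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(c, next(tie), {sym: 0}) for sym, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


# Hypothetical counts of four visual words observed for one action class.
word_counts = {"w1": 8, "w2": 4, "w3": 2, "w4": 2}
lengths = huffman_code_lengths(word_counts)
# Coarse probability estimate implied by each code length.
implied_probability = {w: 2.0 ** -l for w, l in lengths.items()}
```

For these dyadic counts the implied probabilities equal the empirical frequencies exactly; for other counts they are only approximations.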
24. Natural color image enhancement and evaluation algorithm based on human visual system
- Author
-
Kaiqi Huang, Qiao Wang, and Zhenyang Wu
- Subjects
Color histogram ,business.industry ,Image quality ,Color image ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Color balance ,Color quantization ,Signal Processing ,Human visual system model ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Graphics ,business ,Software ,Image restoration ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
To a significant degree, multimedia applications derive their effectiveness from the use of color graphics, images, and videos. In these applications, the human visual system (HVS) often gives the final evaluation of the processed results. In this paper, we first propose a novel color image enhancement method, named the HVS Controlled Color Image Enhancement and Evaluation (HCCIEE) algorithm. We then apply the HCCIEE algorithm to color images by considering natural image quality metrics. The HCCIEE algorithm is based on a multiscale representation of pattern, luminance, and color processing in the HVS. Experiments illustrate that the HCCIEE algorithm can produce distinguished details without the ringing or halo artifacts that often occur in conventional multiscale enhancement techniques. As a result, the experimental results appear as similar as possible to the viewers' perception of the actual scenes.
- Published
- 2006
25. Color image denoising with wavelet thresholding based on human visual system model
- Author
-
Francis H. Y. Chan, Zhenyang Wu, Kaiqi Huang, and George S.K. Fung
- Subjects
business.industry ,Second-generation wavelet transform ,Stationary wavelet transform ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Wavelet transform ,Data_CODINGANDINFORMATIONTHEORY ,Non-local means ,Wavelet packet decomposition ,Wavelet ,Computer Science::Computer Vision and Pattern Recognition ,Signal Processing ,Human visual system model ,Video denoising ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Software ,Mathematics - Abstract
Recent research in transform-based image denoising has focused on the wavelet transform because of its superior performance over other transforms. Performance is often measured solely in terms of PSNR, and denoising algorithms are optimized for this quantitative metric. Performance in terms of subjective quality is typically not evaluated. Moreover, the human visual system (HVS) is often not incorporated into denoising algorithms. This paper presents a new approach to color image denoising that takes an HVS model into consideration. The denoising process takes place in the wavelet transform domain. A Contrast Sensitivity Function (CSF) implementation is employed in the subbands of the wavelet domain based on an invariant single-factor weighting, and noise masking is then applied. Significant improvement is reported in the experimental results in terms of perceptual error metrics and visual effect.
- Published
- 2005
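Wavelet thresholding, the core operation named in the title above, can be sketched in one dimension as follows. This is a generic illustration only: it uses a one-level Haar transform and plain soft thresholding, and it does not model the CSF weighting or the noise masking described in the abstract. All function names are hypothetical, and the signal length is assumed to be even.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length sequence."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


noisy = [4.0, 4.2, 1.0, 0.8]
smoothed = denoise(noisy, threshold=1.0)
```

A perceptually weighted variant would presumably choose a different threshold per subband; how the paper derives those weights from the CSF is not stated in the abstract.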
26. Image enhancement based on the statistics of visual representation
- Author
-
Qiao Wang, Kaiqi Huang, and Zhenyang Wu
- Subjects
Brightness ,business.industry ,Dynamic range ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image (mathematics) ,Wavelet ,Signal Processing ,Contrast (vision) ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Representation (mathematics) ,Image restoration ,Feature detection (computer vision) ,Mathematics ,media_common - Abstract
This paper introduces a novel algorithm for image enhancement that exploits the multi-scale wavelet and the statistical characteristics of visual representation. Processing includes global dynamic range (brightness) correction and local contrast adjustment, whose parameters are selected automatically from the information contained in the image itself. Experimental results show that the new algorithm outperforms many other existing image enhancement methods and is highly resilient to the effects of image-source variations.
- Published
- 2005
27. Single-channel Speech Separation Using Orthogonal Matching Pursuit
- Author
-
Zhenyang Wu, Xiaoxiong Li, Lin Zhou, and Haiyan Guo
- Subjects
K-SVD ,Channel (digital image) ,Computer science ,business.industry ,Separation (statistics) ,Pattern recognition ,Sparse approximation ,Matching pursuit ,Matrix decomposition ,Human-Computer Interaction ,Artificial Intelligence ,Artificial intelligence ,business ,Software - Abstract
In this paper, we propose a new single-channel speech separation method based on sparse decomposition using orthogonal matching pursuit (OMP). The separation is performed using source-individual dictionaries whose atoms are time-domain training frames. OMP is used to compute the sparse coefficients from which the sources are estimated. We report the separation results of the proposed method and compare them with those of a method based on sparse non-negative matrix factorization (SNMF), which is a classical separation method based on sparse decomposition. Experiments show that the proposed method yields a higher signal-to-noise ratio (SNR) and signal-to-interference ratio (SIR).
- Published
- 2014
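The OMP step named in the abstract above can be sketched as below. This is the textbook greedy algorithm written with the standard library only, not the authors' code: the helper names (`dot`, `solve`, `omp`), the toy dictionary, and the assumption of unit-norm atoms are all illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms, signal, n_nonzero):
    """Greedy sparse approximation of `signal` over unit-norm `atoms`.

    Returns ({atom index: coefficient}, residual).
    """
    residual = list(signal)
    support, coeffs = [], []
    for _ in range(n_nonzero):
        # 1. Pick the unused atom most correlated with the current residual.
        candidates = [j for j in range(len(atoms)) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(atoms[j], residual)))
        support.append(best)
        # 2. Re-fit all selected atoms jointly by least squares (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coeffs = solve(gram, [dot(atoms[i], signal) for i in support])
        # 3. Update the residual with the new approximation.
        approx = [sum(c * atoms[j][t] for c, j in zip(coeffs, support))
                  for t in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs)), residual


# Toy dictionary (hypothetical): atoms 0-1 stand in for source A, atoms 2-3 for source B.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
mixture = [1.0, 0.0, 3.0]
coefficients, leftover = omp(dictionary, mixture, n_nonzero=2)
```

With source-individual dictionaries concatenated as in the toy example, each source estimate would be rebuilt from only the coefficients whose atoms belong to that source's dictionary.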
28. A Particle Swarm Optimization Algorithm with Local Sparse Representation for Visual Tracking
- Author
-
Nijun Li, Zhenyang Wu, Tongchi Zhou, Lin Zhou, and Xu Cheng
- Subjects
General Computer Science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-invariant feature transform ,Particle swarm optimization ,Pattern recognition ,Sparse approximation ,Object (computer science) ,Tracking (particle physics) ,Feature (computer vision) ,Position (vector) ,Eye tracking ,Computer vision ,Artificial intelligence ,business ,Mathematics - Abstract
Handling appearance variations caused by occlusion or abrupt motion is a challenging task for visual tracking. In this paper, we propose a novel tracking method that deals with appearance changes based on sparse representation in a particle swarm optimization (PSO) framework. First, we divide each candidate state into multiple structural patches to cope with partial occlusions of the object. Once the object is lost, an object recovery scheme uses the scale-invariant feature transform (SIFT) correspondence between two frames to reacquire the rough object position. The tracking state is then searched in the vicinity of the rough object position using the PSO iteration. In addition, an online dictionary updating mechanism is presented to capture the object appearance variations. The object information from the initial frame is never updated during tracking, while the other templates in the dictionary are progressively updated based on the coefficients of the templates. Compared with several conventional trackers, the experimental results demonstrate that our approach is more robust in dealing with occlusions and abrupt motion variations.
- Published
- 2014
29. A Hybrid Method for Human Interaction Recognition Using Spatio-temporal Interest Points
- Author
-
Haiyan Guo, Xu Cheng, Zhenyang Wu, and Nijun Li
- Subjects
Structure (mathematical logic) ,Computer science ,business.industry ,Reliability (computer networking) ,Context (language use) ,Pattern recognition ,computer.software_genre ,Random forest ,Feature (computer vision) ,Human interaction ,Genetic algorithm ,Artificial intelligence ,Data mining ,business ,computer - Abstract
This paper proposes an innovative and effective hybrid method for recognizing human interactions, which incorporates the advantages of both a global feature (Motion Context, MC) and the Spatio-Temporal (S-T) correlation of local Spatio-Temporal Interest Points (STIPs). The MC feature, which is also derived from STIPs, is used to train a random forest in which a Genetic Algorithm (GA) is applied to the training phase to achieve a good compromise between reliability and efficiency. In addition, we design an effective and efficient S-T correlation based match to assist the MC feature, in which the MC structure and a biological sequence matching algorithm are employed to calculate the spatial and temporal correlation scores, respectively. Experiments on the UT-Interaction dataset show that our GA search based random forest and S-T correlation based match achieve better performance than some other prevalent machine learning methods, and that a combination of the two methods outperforms most state-of-the-art works.
- Published
- 2014
30. Visual Tracking via Sparse Representation and Online Dictionary Learning
- Author
-
Lin Zhou, Nijun Li, Xu Cheng, Tongchi Zhou, and Zhenyang Wu
- Subjects
Scheme (programming language) ,K-SVD ,Computer science ,business.industry ,Association (object-oriented programming) ,Pattern recognition ,Sparse approximation ,Tracking (particle physics) ,Active appearance model ,Video tracking ,Eye tracking ,Artificial intelligence ,business ,computer ,computer.programming_language - Abstract
Sparse representation has shown competitive performance on single object tracking. In this paper, we extend this technique to tracking multiple interacting objects and present a novel sparse tracker under the tracking-by-detection framework, with a saliency detector for object detection and sparse representation for object association. Furthermore, we propose an online dictionary learning scheme to capture appearance variations of objects. To avoid using trivial templates, the dictionary contains not only object templates but also background information, resulting in more robust estimation. The experiments demonstrate that our approach achieves favorable performance compared with state-of-the-art algorithms.
- Published
- 2014
31. Recognizing human actions by BP-AdaBoost algorithm under a hierarchical recognition framework
- Author
-
Nijun Li, Zhenyang Wu, Xu Cheng, and Suofei Zhang
- Subjects
Artificial neural network ,Computer science ,business.industry ,Time delay neural network ,3D single-object recognition ,Frame (networking) ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Machine learning ,computer.software_genre ,Backpropagation ,ComputingMethodologies_PATTERNRECOGNITION ,Feature (machine learning) ,Artificial intelligence ,business ,computer - Abstract
This paper explores the performance of Neural Network (NN) for human action recognition and proposes a novel hierarchical and boosting-based action recognition system. Specifically, the main contributions of our work are three-fold: (1) A boosted NN based scheme is applied to the human action recognition task for the first time, during which we extend the standard binary AdaBoost algorithm to a multiclass version; (2) A novel hierarchical recognition framework with pre-decision and post-decision modules is proposed, which can significantly enhance the training efficiency as well as the frame-based recognition accuracy; (3) Numerous modified features (both motion and shape features) are utilized and combined in this paper. Experiments on the Weizmann dataset show promising results of our approach in comparison with other state-of-the-art methods.
- Published
- 2013
32. Tracking Deformable Parts via Dynamic Conditional Random Fields
- Author
-
Xu Cheng, Lin Zhou, Haiyan Guo, Suofei Zhang, and Zhenyang Wu
- Subjects
Conditional random field ,FOS: Computer and information sciences ,Computer science ,business.industry ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Variation (game tree) ,Object (computer science) ,Tracking (particle physics) ,Object detection ,Multimedia (cs.MM) ,Video tracking ,Computer vision ,Artificial intelligence ,business ,Computer Science - Multimedia - Abstract
Despite the success of many advanced tracking methods in this area, tracking targets with drastic variation of appearance such as deformation, view change, and partial occlusion in video sequences is still a challenge in practical applications. In this letter, we take these serious tracking problems into account simultaneously, proposing a dynamic graph based model to track an object and its deformable parts at multiple resolutions. The method introduces well-learned structural object detection models into object tracking applications as prior knowledge to deal with deformation and view change. Meanwhile, it explicitly formulates partial occlusion by integrating spatial potentials and temporal potentials with an unparameterized occlusion handling mechanism in the dynamic conditional random field framework. Empirical results demonstrate that the method outperforms state-of-the-art trackers on different challenging video sequences. Comment: 4 pages, 5 figures; the manuscript has been submitted to IEEE Signal Processing Letters.
- Published
- 2013
33. Implicit JointBoost for multiclass object detection under high intra-class variation
- Author
-
Suofei Zhang, Xu Cheng, Nijun Li, and Zhenyang Wu
- Subjects
Boosting (machine learning) ,Training set ,Computer science ,business.industry ,Prior assumption ,Pattern recognition ,Machine learning ,computer.software_genre ,Object detection ,Local optimum ,Learning methods ,Artificial intelligence ,Semantic information ,business ,computer ,Classifier (UML) - Abstract
In this paper, we address the problem of using boosting to detect different classes of objects with significant intra-class variation. Current approaches generally require prior knowledge such as semantic information to obtain an explicit partition over positive samples. This can require a lot of manually labeled samples and can limit the performance of the classifier because of subjective error. We present a novel JointBoost based learning method that can learn the variant appearances of targets without any prior assumption. With an implicit partition, the specific features for fractional samples can contribute to classification alongside the generic features for the whole training set. By encouraging intra-class and inter-class feature sharing between implicit sub-categories, our data-driven learning approach avoids a local optimum in the candidate weak classifier space. Experimental results on two popular tasks demonstrate the considerable improvements brought by the new approach. We hope that implicit JointBoost will extend the application of traditional boosting methods.
- Published
- 2012
34. Localization of Multiple Speech Sources Based on Sub-band Steered Response Power
- Author
-
Xiaoyan Zhao, Zhenyang Wu, and Weiping Cai
- Subjects
Microphone array ,Engineering ,business.industry ,Speech recognition ,Phase (waves) ,Pattern recognition ,Acoustic source localization ,Speech processing ,Signal ,Hierarchical clustering ,Power (physics) ,Artificial intelligence ,business ,Cluster analysis - Abstract
The steered response power with phase transform weighting (SRP-PHAT) is a robust sound source localization method based on a microphone array. Multiple source localization has been implemented using SRP-PHAT with agglomerative clustering (AC). In this paper, a novel method of multiple speech source localization based on sub-band SRP is proposed. In this method, the speech signal is divided into several sub-bands, the SRP is computed in each sub-band, initial estimates are generated by searching for the maximum of every sub-band SRP, and the final source locations are determined from the initial estimates using AC. The proposed method is tested on real-world recordings under the condition that the number of active speakers is unknown. The results show that our method provides higher localization performance than the conventional SRP-PHAT with AC in cases with up to 3 concurrent speakers.
- Published
- 2010
35. Improved AdaBoost Algorithm Using VQMAP for Speaker Identification
- Author
-
Haiyang Wu, Yong Lu, and Zhenyang Wu
- Subjects
Boosting (machine learning) ,business.industry ,Computer science ,Speech recognition ,Vector quantization ,Pattern recognition ,Mixture model ,Speaker recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Margin classifier ,Maximum a posteriori estimation ,AdaBoost ,Artificial intelligence ,business ,Classifier (UML) - Abstract
The adaptive boosting (AdaBoost) learning method can improve the performance of a base classifier by mining feature information in depth. However, it is computationally expensive, and a base classifier without suitable accuracy will cause overfitting. In this paper, an improved AdaBoost algorithm using a maximum a posteriori vector quantization model (VQMAP) for speaker identification is presented. A VQMAP classifier matched to the size of the speaker identification problem is constructed first. It is then boosted to a strong classifier by AdaBoost with early stopping. Experiments show that the boosted VQMAP classifier performs better than VQMAP and slightly worse than the maximum a posteriori adapted Gaussian mixture model (GMMMAP), but with faster recognition. With limited data and a predictable number of speakers, it matches or exceeds GMMMAP.
- Published
- 2010
36. A mean shift algorithm based on modified Parzen window for small target tracking
- Author
-
Suofei Zhang, Zhenyang Wu, Jianjun Chen, and Guocheng An
- Subjects
business.industry ,Kernel (statistics) ,Computation ,Histogram ,Kernel density estimation ,Computer vision ,Density estimation ,Mean-shift ,Artificial intelligence ,Similarity measure ,Tracking (particle physics) ,business ,Mathematics - Abstract
This paper addresses the problem of small-scale target tracking. The divide-by-zero problem in the weight computation of the mean shift algorithm and the associated tracking interruption problem are presented. To tackle these problems, the Parzen window density estimation method is modified to interpolate the histogram of the target candidate. The Kullback-Leibler distance is then employed as a new similarity measure between the target model and the target candidate, and the corresponding weight computation and new location expressions are derived. On this basis, we propose a new small target tracking algorithm using the mean shift framework. Tracking experiments on real-world video sequences show that the proposed algorithm can track the target successively and accurately. It can successfully track very small targets of only 6×12 pixels.
- Published
- 2010
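The divide-by-zero issue mentioned in the abstract above arises because a small target fills few histogram bins, so a candidate histogram can contain empty bins that later appear in a denominator. The sketch below illustrates the general idea of interpolating the histogram so that every bin stays positive, followed by a Kullback-Leibler distance. It is an assumption-laden illustration, not the paper's modified Parzen estimator: the 1-2-1 spreading kernel, the `floor` constant, and the function names are hypothetical.

```python
import math


def smoothed_histogram(bin_indices, n_bins, floor=1e-6):
    """Normalized histogram in which each sample is spread over neighbouring bins.

    Hypothetical kernel: half of each sample's mass goes to its own bin and a
    quarter to each neighbour (clamped at the borders). `floor` keeps every
    bin strictly positive.
    """
    hist = [floor] * n_bins
    for b in bin_indices:
        for offset, weight in ((-1, 0.25), (0, 0.5), (1, 0.25)):
            j = min(max(b + offset, 0), n_bins - 1)
            hist[j] += weight
    total = sum(hist)
    return [h / total for h in hist]


def kl_distance(q, p):
    """Kullback-Leibler distance from model q to candidate p (both strictly positive)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))


# Very few pixels per region, as with a tiny target (toy values).
target_model = smoothed_histogram([0, 0, 1], n_bins=4)
candidate = smoothed_histogram([3, 3, 2], n_bins=4)
distance = kl_distance(target_model, candidate)
```

Because no bin is ever exactly zero, the ratio inside the logarithm is always defined, which is the property the tracker needs when the target covers only a handful of pixels.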
37. The Estimation of Personalized HRTFs in Individual VAS
- Author
-
Liang Chen, Zhenyang Wu, and Hongmei Hu
- Subjects
Frequency response ,business.industry ,Computer science ,Speech recognition ,Pattern recognition ,Virtual reality ,computer.software_genre ,medicine.anatomical_structure ,medicine ,Auditory system ,Artificial intelligence ,business ,Audio signal processing ,computer - Abstract
The synthesis of an individual virtual auditory system (VAS) is an important and state-of-the-art technology in virtual reality. One of the key factors for an individual VAS is obtaining a set of proper head related transfer functions (HRTFs). A personalization method is presented in this paper. First, multiple linear regression analysis is applied to obtain the linear relationship between HRTFs and some anthropometric parameters; second, the magnitude response of the individual HRTF is estimated; finally, the center frequencies of the two prominent notches in the estimated magnitude response are modified using the center frequencies estimated from the subject's pinna parameters. The results show that the modified HRTF is closer to the measured one.
- Published
- 2008
38. An efficient approach for registration and super-resolution of aliased images
- Author
-
Zhenyang Wu, Jianpo Gao, and Hao Yang
- Subjects
business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image registration ,Image processing ,Iterative reconstruction ,Translation (geometry) ,Computer Science::Computer Vision and Pattern Recognition ,Digital image processing ,Computer vision ,Artificial intelligence ,Aliasing (computing) ,business ,Image resolution ,Sub-pixel resolution ,Mathematics - Abstract
This paper addresses the problem of reconstructing a super-resolved image from a set of aliased, noisy, low-resolution (LR), and blurry images. Accurate knowledge of the sub-pixel registration parameters for each LR image is the key to this problem. However, the presence of aliasing is the main challenge in registering these images. In this paper, we analyze the method that combines the registration problem with the super-resolution reconstruction (SSR) and propose a novel approach to solve it. The proposed approach uses a principle similar to the variable projection method, which results in a better-conditioned problem and avoids some shortcomings of the cyclic coordinate descent optimization procedure. It can be efficiently implemented using the Lanczos algorithm and Gauss quadrature theory. As a result, the proposed approach can deal with translation and rotation between the observed LR images. Experimental results demonstrate the effectiveness of our approach.
- Published
- 2007
39. Head Related Transfer Function Personalization Based on Multiple Regression Analysis
- Author
-
Zhenyang Wu, Hao Ma, Jie Zhang, Lin Zhou, and Hongmei Hu
- Subjects
business.industry ,Computer science ,Speech recognition ,Regression analysis ,Pattern recognition ,Artificial intelligence ,Virtual reality ,Set (psychology) ,business ,Head-related transfer function ,Personalization - Abstract
The synthesis of a personalized virtual auditory space (VAS) is an important and challenging task in virtual reality. One of the key factors for a personalized VAS is obtaining a set of proper head related transfer functions (HRTFs). In this paper, a customization method is presented that applies multiple linear regression analysis to HRTFs and some anthropometric parameters chosen by correlation analysis. Experiments show that the estimated HRTF has a small mean square error and a perceptual effect similar to that of the measured HRTF. Furthermore, the localization accuracy of the personalized HRTF is higher than that of the non-individual HRTF.
- Published
- 2006
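The regression step in the abstract above can be illustrated with ordinary least squares, fitting one HRTF quantity (for example the magnitude at a single frequency bin) as a linear function of a few anthropometric parameters. This is a minimal sketch under that assumption; the function names, the two made-up parameters, and the toy measurements are hypothetical, and real HRTF data would need one such fit per frequency bin and direction.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(samples, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in samples]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, x):
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


# Toy data: two made-up anthropometric parameters per subject, and a target
# generated exactly as 1 + 2*x1 - 0.5*x2 so the fit can be checked.
subjects = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
magnitudes = [1.0 + 2.0 * a - 0.5 * b for a, b in subjects]
coefficients = fit_linear(subjects, magnitudes)
estimate = predict(coefficients, [2.5, 2.5])
```

A new listener's parameters are then passed to `predict` to estimate the HRTF value without an acoustic measurement, which is the point of the customization method.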
40. A Novel Video Object Spatial Segmenting Strategy Based on Morphological Filtering
- Author
-
Zhenyang Wu, Yujian Wang, and Jianpo Gao
- Subjects
Watershed ,Pixel ,Computer science ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Iterative reconstruction ,Image segmentation ,Mathematical morphology ,Reduction (complexity) ,Maxima and minima ,Computer vision ,Artificial intelligence ,business - Abstract
Video object extraction has been a hot topic in the field of modern information processing. This paper proposes a novel video object watershed segmenting strategy, which combines the merits of alternating sequential filtering by reconstruction and an adaptive threshold algorithm. The strategy automatically determines both the structural element size of the morphological filter and the threshold value of the nonlinear transformation. The iteration of opening-closing by reconstruction with gradually expanding structural elements guarantees a reduction in the number of segmented regions. In addition, the iteration makes it easier to distinguish the gradient of edge pixels from that of the pixels inside the plateaus, which facilitates removing the small regions of local gradient extrema with the adaptive threshold algorithm. Experimental results show that the proposed strategy produces satisfactory spatial segmentation results.
- Published
- 2006
41. Robust GMM Based Gender Classification using Pitch and RASTA-PLP Parameters of Speech
- Author
-
Yumin Zeng, Tiago H. Falk, Wai-Yip Chan, and Zhenyang Wu
- Subjects
business.industry ,Computer science ,Speech recognition ,Feature extraction ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Pattern recognition ,Speech processing ,Mixture model ,computer.software_genre ,Linear predictive coding ,symbols.namesake ,Computer Science::Sound ,symbols ,Artificial intelligence ,Audio signal processing ,business ,computer ,Gaussian process ,Classifier (UML) - Abstract
A novel gender classification system based on Gaussian Mixture Models is proposed, which applies the combined parameters of pitch and 10th-order relative spectral perceptual linear predictive coefficients to model the characteristics of male and female speech. The performance of the gender classification system is evaluated under conditions of clean speech, noisy speech, and multiple languages. The simulations show that the performance of the proposed gender classifier is excellent: it is very robust to noise and completely independent of language, and the classification accuracy is above 98% for all clean speech and remains at 95% for most noisy speech, even when the SNR of the speech is degraded to 0 dB.
- Published
- 2006
42. Face Tracking Algorithm Based on Mean Shift and Ellipse Fitting
- Author
-
Zhenyang Wu, Yujian Wang, and Jianpo Gao
- Subjects
Scale (ratio) ,business.industry ,Facial motion capture ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Kernel Bandwidth ,Ellipse ,Tracking (particle physics) ,Video tracking ,Computer vision ,Artificial intelligence ,Mean-shift ,business ,Algorithm ,Mathematics - Abstract
The mean shift algorithm is an efficient technique for object tracking. However, it has the shortcoming that it cannot adapt its scale to the object during tracking. There are presently no effective ways to solve this problem. The kernel bandwidth of the mean shift tracker in one frame is generally steered by the object scale obtained in the previous frame, so it is very important for the mean shift tracker to describe the scale of the target correctly in every frame. In accordance with the effect of the kernel bandwidth on the mean shift tracker and the properties of the face, this paper introduces a new idea that uses direct least squares ellipse fitting to adjust the facial scale. The experimental results demonstrate the efficiency of this algorithm. Its performance has been shown to be superior to that of the original mean shift tracking algorithm.
- Published
- 2006
43. Multi-band maximum a posteriori multi-transformation algorithm based on the discriminative combination
- Author
-
Hong-Mei Hu, Wei Sun, Zhenyang Wu, and Yumin Zeng
- Subjects
Computer science ,business.industry ,Maximum likelihood ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Function (mathematics) ,Speech processing ,Radio spectrum ,Noise ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Computer Science::Sound ,Linear regression ,Maximum a posteriori estimation ,Artificial intelligence ,Hidden Markov model ,business - Abstract
Based on the auditory characteristics of the human hearing system, a multi-band maximum a posteriori multi-transformation algorithm based on the discriminative combination is developed to improve the performance of speech recognition systems in noisy environments. The algorithm exploits the difference between the noise spectrum and the speech spectrum, and the different effects of noise on recognition performance in different frequency bands. It compensates for the effect of noise with a discriminative function and maximum a posteriori multi-transformation. Experimental results show that the proposed algorithm outperforms the maximum a posteriori linear regression algorithm. The results also show that using effective bands with information redundancy helps to improve the recognition performance.
- Published
- 2005
44. Mean and covariance adaptation based on minimum classification error linear regression for continuous density HMMs
- Author
-
Zhenyang Wu and Haibin Liu
- Subjects
Polynomial regression ,General linear model ,Proper linear model ,Continuous density ,Computer science ,business.industry ,Maximum likelihood ,Mean and predicted response ,Pattern recognition ,Covariance ,ComputingMethodologies_PATTERNRECOGNITION ,Bayesian multivariate linear regression ,Linear regression ,Maximum a posteriori estimation ,Principal component regression ,Artificial intelligence ,business ,Hidden Markov model - Abstract
The performance of a speech recognition system deteriorates significantly when there are mismatches between training and testing conditions. This paper addresses the problem and proposes an algorithm to adapt the mean and covariance of the HMM simultaneously within the minimum classification error linear regression (MCELR) framework. Rather than estimating the transformation parameters using maximum likelihood estimation (MLE) or maximum a posteriori estimation, we propose to use minimum classification error (MCE) as the estimation criterion. The proposed algorithm, called IMCELR (Improved MCELR), has been evaluated on a Chinese digit recognition task based on continuous density HMMs. The experiments show that the proposed algorithm is more efficient than maximum likelihood linear regression with the same amount of adaptation data.
- Published
- 2004
45. Color image enhancement and evaluation algorithm based on human visual system
- Author
-
Zhenyang Wu, Qiao Wang, and Kaiqi Huang
- Subjects
Color histogram ,Demosaicing ,business.industry ,Image quality ,Color image ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Wavelet transform ,Color balance ,Image processing ,Luminance ,Color quantization ,Human visual system model ,Computer vision ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
We propose a novel color image enhancement method, the human visual system controlled color image enhancement and evaluation (HCCIEE) algorithm, and apply it to color images by considering natural image quality metrics. This HCCIEE algorithm is based on a multiscale representation of pattern, luminance and color processing in the human visual system. Experiments illustrate that the HCCIEE algorithm can produce distinguishing details while avoiding artifacts, which often occur in conventional multiscale enhancement methods, as well as producing images that appear as similar as possible to the viewer's perception of actual scenes.
- Published
- 2004
46. Color image denoising with wavelet thresholding based on human visual system model
- Author
-
Kaiqi Huang and Zhenyang Wu
- Subjects
business.industry ,Stationary wavelet transform ,Second-generation wavelet transform ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Wavelet transform ,Data_CODINGANDINFORMATIONTHEORY ,Non-local means ,Wavelet packet decomposition ,Wavelet ,Computer Science::Computer Vision and Pattern Recognition ,Human visual system model ,Computer vision ,Video denoising ,Artificial intelligence ,business ,Mathematics - Abstract
Recent research in transform-based image denoising has focused on the wavelet transform due to its superior performance over other transforms. Performance is often measured solely in terms of PSNR, and denoising algorithms are optimized for this quantitative metric. The performance in terms of subjective quality is typically not evaluated. Moreover, the human visual system (HVS) is often not incorporated into denoising algorithms. This paper presents a new approach to color image denoising that takes an HVS model into consideration. The denoising process takes place in the wavelet transform domain. A Contrast Sensitivity Function (CSF) implementation based on an invariant single-factor weighting is employed in the subbands of the wavelet domain, and noise masking is then applied. Significant improvement is reported in the experimental results in terms of perceptual error metrics and visual effect. (C) 2004 Elsevier B.V. All rights reserved.
- Published
- 2003
47. Wavelet analysis of head-related transfer functions
- Author
-
J.C.K. Chan, F.K. Lam, F.H.Y. Chan, T.F. Lo, and Zhenyang Wu
- Subjects
Discrete wavelet transform ,Wavelet ,Relation (database) ,Computer science ,Head (linguistics) ,business.industry ,Acoustics ,Pattern recognition ,Artificial intelligence ,business ,Transfer function - Abstract
The direction-dependent information in the head-related transfer function (HRTF) is important for the study of the human sound localization system and the synthesis of virtual auditory signals. Its time-domain and frequency-domain characteristics have been widely studied by researchers. The purpose of this paper is to explore the ability of the discrete wavelet transform to describe the time-scale characteristics of HRTFs. Both the time-domain characteristics and the energy distribution of different frequency subbands were studied. Discrete wavelet analysis is found to provide new direction-dependent information that relates the characteristics of the HRTFs to sound source directions.
- Published
- 2002
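The subband energy distribution mentioned in the abstract of record 47 can be illustrated with a small sketch. It assumes an orthonormal Haar wavelet, which the abstract does not specify, and the function names `haar_step` and `subband_energies` are hypothetical. It is not the authors' analysis, only a minimal instance of computing per-subband energy with a discrete wavelet transform.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def subband_energies(signal, levels):
    """Energy of each detail subband (finest first), then the final approximation.

    Assumption: the signal length is a multiple of 2**levels.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies
```

Because the Haar transform used here is orthonormal, the subband energies sum to the energy of the input, so the list can be read directly as a distribution of energy across scales. An impulse response measured for one source direction (hypothetical input) would be passed in as `signal`.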
48. Applications of least-squares FIR filters to virtual acoustic space
- Author
-
Richard A. Reale, Jiashu Chen, and Zhenyang Wu
- Subjects
Signal processing ,Finite impulse response ,Computer science ,business.industry ,Acoustics ,Ear, Middle ,Transfer function ,Models, Biological ,Sensory Systems ,Acoustic Stimulation ,Frequency domain ,Humans ,Computer Simulation ,Time domain ,Sound Localization ,Least-Squares Analysis ,business ,Sound pressure ,Impulse response ,Digital signal processing - Abstract
A virtual acoustic space (VAS) employs the localization cues specified by the direction-dependent 'free-field to eardrum transfer function' (FETF) to synthesize sound-pressure waveforms present near the tympanum. The combination of a VAS and the earphone delivery of synthesized waveforms is useful to study parametrically the neural mechanisms of directional hearing. The VAS-earphone procedure requires accurate FETF estimation from free-field measurements and appropriate compensation for the undesirable spectral characteristics of the closed-field earphone sound delivery and measurement systems. Here we describe how specially designed finite-impulse-response (FIR) filters improve these two operations. The coefficients of an FIR filter are determined using a least-squares error criterion. The least-squares FIR filter is implemented entirely in the time domain and avoids the usual problems with division inherent in a frequency-domain approach. The estimation of an FETF by a least-squares FIR filter is veracious since its impulse response can recover signals that were recorded near the eardrum in the free field with a very high fidelity. The correlation coefficient between recorded and recovered time waveforms typically exceeds 0.999. Similarly, least-squares FIR filters prove excellent in compensating closed-field sound systems since comparisons of waveforms delivered by a compensated earphone to their corresponding predistorted signals yield correlation coefficients that exceed 0.99 on average.
- Published
- 1994
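The time-domain least-squares criterion described in the abstract of record 48 can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes the filter is found by solving the normal equations of the convolution model, and the names `convolve`, `solve`, `least_squares_fir`, and `correlation` are hypothetical. No division in the frequency domain is involved; the only linear algebra is a small system whose size equals the number of taps.

```python
import math


def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def least_squares_fir(x, y, taps):
    """FIR coefficients h minimising sum_n (y[n] - (x * h)[n])**2."""
    n_out = len(x) + taps - 1
    # Zero-pad or truncate the target to the length of the full convolution.
    y = (list(y) + [0.0] * (n_out - len(y)))[:n_out]

    def delayed(k, n):
        # Sample n of the input delayed by k samples (zero outside the record).
        i = n - k
        return x[i] if 0 <= i < len(x) else 0.0

    # Normal equations: (X^T X) h = X^T y, with X the convolution matrix of x.
    r = [[sum(delayed(i, n) * delayed(j, n) for n in range(n_out))
          for j in range(taps)] for i in range(taps)]
    p = [sum(delayed(i, n) * y[n] for n in range(n_out)) for i in range(taps)]
    return solve(r, p)


def correlation(a, b):
    """Pearson correlation coefficient between two waveforms."""
    n = min(len(a), len(b))
    ma, mb = sum(a[:n]) / n, sum(b[:n]) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in range(n))
                    * sum((b[i] - mb) ** 2 for i in range(n)))
    return num / den
```

In use, `x` would be the free-field source signal and `y` the waveform recorded near the eardrum (both hypothetical inputs here); filtering `x` with the estimated coefficients and comparing the result with `y` through `correlation` mirrors the fidelity check the abstract reports.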