41 results for "Yonghong Song"
Search Results
2. Multi-view gait recognition using NMF and 2DLDA
- Author
-
Yonghong Song, Yuanlin Zhang, and Chen Wu
- Subjects
Computer Networks and Communications, Computer science, Deep learning, Pattern recognition, Linear discriminant analysis, Matrix decomposition, Non-negative matrix factorization, Gait (human), Transformation (function), Hardware and Architecture, Face (geometry), Principal component analysis, Media Technology, Artificial intelligence, Software - Abstract
The View Transformation Model (VTM) is extensively employed in multi-view gait recognition. However, matching accuracy still declines during the view transformation procedure, and the loss grows rapidly as the disparity between views increases. To address this difficulty, firstly, Non-negative Matrix Factorization (NMF) is introduced to obtain local structured features of the human body and compensate for the accuracy loss. Moreover, 2D Linear Discriminant Analysis (2DLDA) is applied to improve classification ability by projecting features into a discriminant space. Finally, the gait features, Gait Energy Images (GEIs), are strengthened into 2D Enhanced GEIs (2D-EGEIs) using the reconstruction of 2D Principal Component Analysis (2DPCA). The proposed method significantly outperforms the state-of-the-art. Furthermore, comparisons with two deep learning methods are evaluated as well. Experimental outcomes show that the proposed method provides an alternative way to obtain results approximating those of the deep learning methods.
- Published
- 2019
- Full Text
- View/download PDF
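The NMF step this abstract relies on can be sketched with the standard Lee-Seung multiplicative updates; the following is an illustrative numpy sketch under assumed toy data and rank, not the authors' implementation.

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (e.g. stacked GEI feature vectors)
    as V ~ W @ H with multiplicative updates, yielding part-based features."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis (body "parts")
    return W, H

# toy non-negative data standing in for gait features: 6 samples, 8 dims
V = np.random.default_rng(1).random((8, 6))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H)
```

The multiplicative form guarantees W and H stay non-negative, which is what gives NMF its part-based, locally structured factors.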
3. You Ought to Look Around: Precise, Large Span Action Detection
- Author
-
Ge Pan, Yuanlin Zhang, Fan Yu, Yonghong Song, Han Zhang, and Han Yuan
- Subjects
Computer science, Feature extraction, Pattern recognition, Variable (computer science), Feature (computer vision), Pattern recognition (psychology), Graph (abstract data type), Overhead (computing), Leverage (statistics), Pyramid (image processing), Artificial intelligence - Abstract
For the action localization task, pre-defined action anchors are the cornerstone of mainstream techniques. State-of-the-art models mostly rely on a dense segmenting scheme, where anchors are sampled uniformly over the temporal domain with a predefined set of scales. However, this is not sufficient because action duration varies greatly, so the anchors or proposals need a variable receptive field. In this paper, we propose a method called YOLA (You Ought to Look Around) which includes three parts: 1) a robust backbone, SPN-I3D, for extracting spatio-temporal features, where we employ the stronger backbone I3D with an SPN (Segment Pyramid Network) instead of C3D to obtain multi-scale features; 2) a simple but useful feature fusion module named LFE (Local Feature Extraction), which, compared with a fully connected layer and global average pooling, is more advantageous for the network to fit and fuse features; and 3) a new feature segment aligning method called TPGC (Two Pathway Graph Convolution), which allows one proposal to leverage semantic features of adjacent proposals to update its content and ensures the proposals have a variable receptive field. YOLA adds only a small overhead to the baseline network, is easy to train in an end-to-end manner, and runs at a speed of 1097 fps. YOLA achieves an mAP of 58.3%, outperforming all existing models, both RGB-based and two-stream, on THUMOS'14, and achieves competitive results on ActivityNet 1.3.
- Published
- 2021
- Full Text
- View/download PDF
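The dense anchoring scheme the abstract argues against (uniform centers, a fixed set of scales) is easy to sketch; the function below is a hypothetical illustration of that baseline scheme, not code from YOLA.

```python
def temporal_anchors(num_segments, scales):
    """Place one anchor of every scale at each uniformly sampled center,
    returning (start, end) intervals normalized to [0, 1]."""
    anchors = []
    for i in range(num_segments):
        center = (i + 0.5) / num_segments          # uniform temporal sampling
        for s in scales:
            start = max(0.0, center - s / 2)
            end = min(1.0, center + s / 2)
            anchors.append((start, end))
    return anchors

# 4 centers x 3 fixed scales = 12 anchors, regardless of actual action length
anchors = temporal_anchors(num_segments=4, scales=[0.1, 0.25, 0.5])
```

Because every center gets the same scale set, an action much longer or shorter than the predefined scales is poorly covered, which motivates the variable receptive field above.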
4. Multi-view Gait Recognition by Inception-Encoder and CL-GEI
- Author
-
Chongdong Huang, Chen Wu, and Yonghong Song
- Subjects
Identification (information), Gait (human), Transformation (function), Computer science, Feature vector, Pattern recognition, Artificial intelligence, Invariant (physics), Representation (mathematics), Encoder, Image (mathematics) - Abstract
To solve the multi-view problem in gait recognition, several methods based on Generative Adversarial Networks (GANs) have been proposed. These methods mainly transform multi-view gait features with walking variations into a common view without these variations. However, the direct pixel-to-pixel transformation is inefficient and inaccurate. Moreover, the transformed features often do not preserve enough identification information, which leads to an accuracy decline. Besides, the Gait Energy Image (GEI) often loses the temporal information of sequences. To address these problems, an Inception-encoder is proposed that extracts effective gait features into feature vectors invariant to views and other walking variations by adopting generative constraints from GANs. To preserve more identification information, identification constraints are adopted from labels. Furthermore, an inception model is embedded into the encoder to improve representation ability. Moreover, CL-GEI is proposed to preserve more temporal information. Experiments on CASIA-B and OU-ISIR demonstrate the competitive performance of the combination of the Inception-encoder with CL-GEI compared with the state-of-the-art.
- Published
- 2021
- Full Text
- View/download PDF
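The Gait Energy Image (GEI) mentioned above is simply the per-pixel average of aligned binary silhouettes over a gait cycle, which is exactly why it discards temporal ordering. A minimal sketch, assuming a (T, H, W) binary silhouette stack:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: per-pixel mean of aligned binary silhouettes over the sequence.
    Averaging over the time axis removes frame order entirely."""
    seq = np.asarray(silhouettes, dtype=float)  # shape (T, H, W), values in {0, 1}
    return seq.mean(axis=0)

# toy sequence of 4 tiny 2x2 silhouette frames
frames = np.zeros((4, 2, 2))
frames[0, 0, 0] = 1.0   # this pixel is foreground in 1 of 4 frames
gei = gait_energy_image(frames)
```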
5. Feature Separation GAN for Cross View Gait Recognition
- Author
-
Yuanlin Zhang, Chongdong Huang, and Yonghong Song
- Subjects
Discriminator, Basis (linear algebra), Computer science, Image processing and computer vision, Pattern recognition, Image (mathematics), Constraint (information theory), Transformation (function), Gait (human), Feature (computer vision), Artificial intelligence, Generator (mathematics) - Abstract
Gait information can be collected by a long-distance camera, but the relative angle between the subject and the camera changes, resulting in the cross-view gait recognition problem. This paper proposes a view transformation model based on feature separation generative adversarial networks. Built on the GAN model, this method separates the features of the input data as an additional discriminant basis. With a single model, it can convert an image to any angle as needed. To make the images generated by the GAN more realistic, the proposed method separates view and dress information from the identity data and encodes them. The discriminator is also optimized by adding the conditional codes as an additional basis, so that the generator can generate the corresponding image more realistically based on the encoded information. In addition, the proposed method adds a constraint to increase the inter-class variation of subjects and reduce their intra-class distance, so the synthesized image retains more feature information of the original subject. The proposed method achieves a strong generating effect and improves the performance of cross-view gait recognition.
- Published
- 2021
- Full Text
- View/download PDF
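The constraint described above, increasing inter-class variation while reducing intra-class distance, is in the spirit of a triplet-style margin loss. The sketch below is a hypothetical stand-in for that idea, not the paper's exact formulation.

```python
import numpy as np

def separation_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (intra-class distance) - (inter-class distance) + margin:
    zero once same-subject features are closer than different-subject ones
    by at least the margin."""
    intra = np.linalg.norm(anchor - positive)   # same subject, different view
    inter = np.linalg.norm(anchor - negative)   # different subject
    return max(0.0, intra - inter + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])     # close: same identity
n = np.array([3.0, 4.0])     # far: different identity
loss = separation_loss(a, p, n)
```

A well-separated triplet (as here) incurs zero loss; swapping the positive and negative would make the hinge active.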
6. Gait Recognition Based on 3D Skeleton Data and Graph Convolutional Network
- Author
-
Yonghong Song and Mengge Mao
- Subjects
Computational complexity theory, Biometrics, Computer science, Feature extraction, Image processing and computer vision, Pattern recognition, Gait (human), Dual graph, Softmax function, Feature (machine learning), Graph (abstract data type), Artificial intelligence - Abstract
Gait recognition is a hot topic in the field of biometrics because of its unique advantages, such as being non-contact and working at long distances. Appearance-based gait recognition methods usually extract features from silhouettes of the human body, which are easily affected by factors such as clothing and carried objects. Although model-based methods can effectively reduce the influence of appearance factors, they have high computational complexity. Therefore, this paper proposes a gait recognition method based on 3D skeleton data and a graph convolutional network; the 3D skeleton data is robust to changes of view. In this paper, we extract 3D joint features and 3D bone features from the 3D skeleton data, design a dual graph convolutional network to extract the corresponding gait features, and fuse them at the feature level. At the same time, we use a multi-loss strategy combining center loss and softmax loss to optimize the network. Our method is evaluated on the CASIA B dataset. The experimental results show that the proposed method achieves state-of-the-art performance and effectively reduces the influence of view, clothing and other factors.
- Published
- 2020
- Full Text
- View/download PDF
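The multi-loss strategy above (softmax cross-entropy plus a weighted center loss pulling features toward their class centers) can be illustrated as follows; the λ value and toy inputs are assumptions for the sketch.

```python
import numpy as np

def softmax_ce(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def center_loss(feature, center):
    """Squared distance from a feature to its class center."""
    return 0.5 * np.sum((feature - center) ** 2)

def total_loss(logits, label, feature, center, lam=0.01):
    # lam trades off inter-class separability (CE) vs intra-class compactness
    return softmax_ce(logits, label) + lam * center_loss(feature, center)

loss = total_loss(np.array([2.0, 0.5]), 0,
                  np.array([1.0, 1.0]), np.array([1.0, 0.0]))
```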
7. A coarse-to-fine scene text detection method based on Skeleton-cut detector and Binary-Tree-Search based rectification
- Author
-
He Xiang, Yuanlin Zhang, and Yonghong Song
- Subjects
Computer science, Detector, Image processing and computer vision, Pattern recognition, Text detection, Coarse to fine, Rectification, Binary search tree, Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, Classifier (UML), Software - Abstract
Scene text detection has been a long-standing, hot and challenging research topic in pattern recognition. In this paper a novel coarse-to-fine text detection method is proposed to solve the edge-adhesion problem. In the coarse detection stage, a Skeleton-cut detector is proposed. First, 8-Neighborhoods-Search is applied on the skeleton map to find adhesion junctions between text and background skeletons. Then junctions in disordered skeletons are picked out by hysteresis selection and cut to separate text skeletons from the background, and the text skeletons are verified through a two-stage classifier to obtain the coarse detection result. In the fine detection stage, the bounding boxes of all filtered skeletons are weighted and accumulated to obtain the Static Skeleton Response (SSR). Finer text line candidates are then calculated through a gradient operation on the SSR's horizontal projection, and a text rectification based on Binary-Tree-Search is proposed to find a path from the text lines' search space to the fine detection result. Experimental results on the ICDAR, SVT and MSRA-TD500 datasets demonstrate that our algorithm achieves state-of-the-art performance in scene text detection.
- Published
- 2018
- Full Text
- View/download PDF
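The 8-Neighborhoods-Search for adhesion junctions can be approximated by flagging skeleton pixels that have three or more skeleton neighbors in their 8-neighborhood. This is an illustrative sketch of that idea, not the paper's detector.

```python
import numpy as np

def find_junctions(skel):
    """Return (y, x) of skeleton pixels with >= 3 skeleton neighbors
    in the 8-neighborhood -- candidate adhesion junctions."""
    skel = np.asarray(skel, dtype=int)
    padded = np.pad(skel, 1)                     # zero border avoids edge checks
    junctions = []
    for y in range(skel.shape[0]):
        for x in range(skel.shape[1]):
            if skel[y, x]:
                window = padded[y:y + 3, x:x + 3]
                if window.sum() - 1 >= 3:        # neighbors, excluding the pixel
                    junctions.append((y, x))
    return junctions

# a small "T"-shaped skeleton: the center pixel (1, 1) has 3 neighbors
skel = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 0, 0]])
junctions = find_junctions(skel)
```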
8. Enhanced Darknet53 Combine MLFPN Based Real-Time Defect Detection in Steel Surface
- Author
-
Yonghong Song, Yuanlin Zhang, and Xiao Yi
- Subjects
Computer science, Detector, Pooling, Network structure, Pattern recognition, Artificial intelligence, Detection rate - Abstract
Real-time detection of wire surface defects is an important part of wire quality inspection. Traditional algorithms need many parameters and have weak generality, and the time performance of detectors based on candidate regions is poor. To solve these problems, we study the effectiveness of single-stage detectors in real-time detection, and propose a detection algorithm for wire surface defects combining an enhanced Darknet53 and a feature pyramid network (FPN). Firstly, we use CBAM_Darknet53, which introduces channel attention and spatial attention, to extract more differentiated features. Secondly, considering the large scale variation of defects, we use a multi-level feature pyramid (MLFPN), which adds a maximum pooling layer to fuse multi-level features, to detect multi-scale defects. Then we reprocess the detector outputs to improve the detection rate of defects and the accuracy of the detection boxes. Finally, the network structure is optimized by modifying the loss function. Experiments on defect datasets from real industrial environments show that the recall and mAP of this method reach 94.49% and 88.46%, higher than state-of-the-art methods.
- Published
- 2020
- Full Text
- View/download PDF
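The channel attention that CBAM_Darknet53 introduces can be sketched as a squeeze (average and max pooling per channel) followed by a sigmoid gate; the shared MLP of full CBAM is omitted here, so this is only a schematic, not the paper's network.

```python
import numpy as np

def channel_attention(x):
    """x: feature map of shape (C, H, W); returns the channel-reweighted map.
    Each channel is gated by a sigmoid of its avg- and max-pooled statistics."""
    avg = x.mean(axis=(1, 2))                      # (C,) average pooling
    mx = x.max(axis=(1, 2))                        # (C,) max pooling
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))       # shared MLP omitted
    return x * gate[:, None, None]

x = np.ones((4, 2, 2))
y = channel_attention(x)
```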
9. Body Part Level Attention Model For Skeleton-Based Action Recognition
- Author
-
Yuanlin Zhang, Han Zhang, and Yonghong Song
- Subjects
Computer science, Pattern recognition, Skeleton (category theory), Human skeleton, Recurrent neural network, Discriminative model, Action (philosophy), Benchmark (computing), Artificial intelligence, Representation (mathematics) - Abstract
Skeleton-based action recognition has attracted increasing attention recently. Many approaches model the spatio-temporal representation of human skeleton sequences with Recurrent Neural Networks, but RNN-based approaches do not explicitly have an attention scheme. In this paper, a body part level attention method is proposed for determining which parts are discriminative. Based on the natural connections of human joints, the human body is divided into disjoint parts, which are then fed into corresponding modules to generate action scores. A body part level attention module that assigns different importance to different action scores is proposed to adaptively learn informative body parts. The final representation of the action video is produced by a weighted average and fed into DNNs for classification. Our method outperforms other attention models, and we obtain state-of-the-art performance compared with other methods on two widely used benchmarks, the NTU and UT-Kinect datasets.
- Published
- 2019
- Full Text
- View/download PDF
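The weighted-average fusion of per-part action scores can be sketched as a softmax over part attention logits; the part names, weights and scores below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fuse_part_scores(part_scores, attention_logits):
    """part_scores: (parts, classes); attention_logits: (parts,).
    Softmax the logits into part weights, then take the weighted average."""
    w = np.exp(attention_logits - attention_logits.max())
    w = w / w.sum()                        # attention weights over body parts
    return w @ part_scores                 # (classes,) fused score

scores = np.array([[0.9, 0.1],             # e.g. legs: discriminative here
                   [0.5, 0.5]])            # e.g. torso: uninformative
fused = fuse_part_scores(scores, np.array([2.0, 0.0]))
```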
10. Weak Supervised Surface Defect Detection Method Based on Selective Search and CAM
- Author
-
Yonghong Song, Xiao Yi, and Xu Tang
- Subjects
Surface (mathematics), Computer science, Pattern recognition, Filter (signal processing), Image (mathematics), Minimum bounding box, Segmentation, Artificial intelligence - Abstract
Due to the large scale variation of surface defects across different types of strip steel, threshold segmentation has limitations in locating objects, so we propose a surface defect detection algorithm combining selective search and class activation mapping (CAM) to improve object localization. First, we use selective search to generate defect bounding boxes in the image, and predict the classification and CAM of the defect through the trained model. Then, during defect detection, we filter the bounding boxes using the classification information of the defect as prior knowledge: only the bounding boxes that approximate the shape of the defect are retained and mapped onto the CAM of the corresponding defect. Finally, the bounding box with the highest score is selected as the detection result. Experimental results show that the proposed method achieves a mean average precision of 91.1% on our dataset and locates defects in the image more accurately. Compared with traditional CAM, our method delivers better detection performance in surface defect detection applications for strip steel.
- Published
- 2019
- Full Text
- View/download PDF
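The final selection step, scoring candidate boxes against the class activation map and keeping the best one, might look like the sketch below; the scoring rule (mean activation inside the box) is an assumption, not necessarily the paper's.

```python
import numpy as np

def best_box(cam, boxes):
    """cam: (H, W) class activation map; boxes: (y0, x0, y1, x1), exclusive ends.
    Return the box whose interior has the highest mean activation."""
    def score(b):
        y0, x0, y1, x1 = b
        return cam[y0:y1, x0:x1].mean()
    return max(boxes, key=score)

cam = np.zeros((4, 4))
cam[1:3, 1:3] = 1.0                        # the "defect" activates here
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 2, 4, 4)]
chosen = best_box(cam, boxes)
```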
11. Enhanced EAST: Improving Network's Feature Extraction Ability and Text Complete Shape Perception
- Author
-
Liu Yang, Yonghong Song, and Yuanlin Zhang
- Subjects
Network architecture, Training set, Computer science, Detector, Feature extraction, Pattern recognition, Text mining, Kernel (image processing), Perception, Artificial intelligence - Abstract
EAST [1] is a popular end-to-end text detector; however, it performs poorly on long or large texts because of the network's limited feature extraction capacity and narrow receptive fields. Based on EAST, we propose an approach named Enhanced EAST. Firstly, we give low-level feature layers more semantic information by introducing information from high levels into low levels, which reduces the information gap between different layers. Meanwhile, we utilize a two-stream large-kernel convolution to increase receptive fields at reasonable computational cost, improving the network's feature detection and fusion ability. In addition, we optimize the label generation of the training data and design a weighted mask for each text, which guides the training process to enhance the network's perception of complete text shapes, so the predicted text boxes are located more accurately. Finally, we perform data equalization and augmentation in the experiments; results on the ICDAR 2015, MSRA-TD500 and ICDAR 2017 MLT datasets demonstrate that the proposed algorithm achieves state-of-the-art performance in multi-oriented scene text detection.
- Published
- 2019
- Full Text
- View/download PDF
12. Graph Convolutional LSTM Model for Skeleton-Based Action Recognition
- Author
-
Han Zhang, Yuanlin Zhang, and Yonghong Song
- Subjects
Spatial configuration, Artificial neural network, Computer science, Feature extraction, Graph theory, Pattern recognition, Graph (abstract data type), Discriminative model, Action recognition, Artificial intelligence - Abstract
Skeleton-based action recognition has made impressive progress in recent years, yet few methods consider the spatial configuration of joints and temporal correlation together as a unity. To model action sequences in a way that regards both dimensions, a Graph Convolutional Long Short-Term Memory (GC-LSTM) model is proposed in this paper, which automatically learns spatio-temporal features to model the action. Our model introduces the GCN operation into a conventional RNN unit, applying graph convolution at each time step for the input-to-state and state-to-state transitions. Extensive experimental analyses show that the proposed GC-LSTM model (1) focuses more on discriminative parts at discriminative frames and (2) is insensitive to redundant parts that are irrelevant for recognition. Moreover, several methods are compared with ours on two publicly available datasets, and experimental results demonstrate that our model achieves state-of-the-art performance.
- Published
- 2019
- Full Text
- View/download PDF
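The per-time-step graph convolution inside GC-LSTM has the familiar form: normalized skeleton adjacency times joint features times a learned projection. A minimal numpy sketch, with the adjacency and weight matrix as illustrative stand-ins:

```python
import numpy as np

def graph_conv(X, A, W):
    """X: (joints, in_dim) joint features; A: (joints, joints) adjacency
    with self-loops; W: (in_dim, out_dim) learned projection."""
    deg = A.sum(axis=1)
    A_norm = A / deg[:, None]              # simple row normalization
    return A_norm @ X @ W                  # propagate over bones, then project

A = np.array([[1, 1, 0],                   # a 3-joint chain with self-loops
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.eye(3)                              # one-hot joint features
W = np.ones((3, 2))                        # stand-in for learned weights
out = graph_conv(X, A, W)
```

Inside GC-LSTM this operation would replace the dense matrix products of a standard LSTM cell, once for the input and once for the hidden state.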
13. Spatial Mask ConvLSTM Network and Intra-Class Joint Training Method for Human Action Recognition in Video
- Author
-
Yuanlin Zhang, Jingjun Chen, and Yonghong Song
- Subjects
Pixel, Computer science, Feature extraction, Pattern recognition, Class (biology), Receptive field, Position (vector), RGB color model, Artificial intelligence, Joint (audio engineering) - Abstract
For action recognition, attention models are widely used, but most of them do not consider the relationship between spatial and temporal information. We thus propose a Spatial Mask ConvLSTM Network (SM_ConvLSTM-Net) to determine the attention score of each pixel position. SM_ConvLSTM-Net combines spatial and temporal information to obtain a more precise spatial mask, with a long receptive field in the time domain. Furthermore, to exploit the connection between different samples of the same category, a novel training method called intra-class joint training is proposed, which makes the network extract the common action-related characteristics of the same class across different backgrounds. Extensive experiments illustrate the effectiveness of our method: it significantly outperforms the baseline C3D network on UCF101 and HMDB51. Moreover, our approach achieves the best performance on UCF101 and comparable results on HMDB51 in comparison with state-of-the-art approaches using RGB input.
- Published
- 2019
- Full Text
- View/download PDF
14. A hierarchical recursive method for text detection in natural scene images
- Author
-
Yuanlin Zhang, Jingmin Xin, Yonghong Song, and Xiaobing Wang
- Subjects
Connected component, Artificial neural network, Computer Networks and Communications, Computer science, Pattern recognition, Image (mathematics), Hardware and Architecture, Reading (process), Document and text processing, Media Technology, Segmentation, Artificial intelligence, Line (text file), Software - Abstract
Text detection in natural scene images is a challenging problem in computer vision. To robustly detect various texts in complex scenes, a hierarchical recursive text detection method is proposed in this paper. Usually, texts in natural scenes are not isolated but arranged into lines for easy reading. To find all possible text lines in an image, candidate text lines are first obtained using text edge boxes and a convolutional neural network. Then, to accurately identify the true text lines in the image, these candidates are analyzed in a hierarchical recursive architecture: for each of them, connected component segmentation and hierarchical random field based analysis are recursively applied until the detected text line no longer changes. The detected text lines are then output as the text detection result. Experiments on the ICDAR 2003, ICDAR 2013 and Street View datasets show that the hierarchical recursive architecture improves text detection performance and the proposed method achieves the state-of-the-art in scene text detection.
- Published
- 2016
- Full Text
- View/download PDF
15. Improved Stacked Denoising Autoencoders-Based Defect Detection in Bar Surface
- Author
-
Yonghui He, Yonghong Song, and Qianwen Lv
- Subjects
Normalization (statistics), Computer science, Noise reduction, Deep learning, Activation function, Pattern recognition, Artificial intelligence - Abstract
Traditional pattern recognition methods are widely used to detect defects in industry; however, most existing methods are not universal for all kinds of defects on bars. This paper proposes a method that combines Rectified Linear Units (ReLU) and Batch Normalization (BN) in Stacked Denoising Autoencoders (SDA) for surface defect detection of bars. Gradient diffusion often occurs in traditional SDA, leading to inefficient learning; to solve this, we replace the sigmoid activation function with the ReLU function. The activation values calculated by ReLU are often over-sparse, leading to loss of features; to address this, we add a BN layer into the SDA to normalize each batch. Finally, we obtain the network weights through unsupervised pre-training and supervised fine-tuning, and we train two models, one for prediction and the other for reconstruction. Experimental results show that the proposed method achieves an average accuracy of 99.1% on our dataset. Compared with traditional pattern recognition methods, traditional SDA and Fisher criterion-based stacked denoising autoencoders (FCSDA), our method shows higher accuracy and TPR. Moreover, due to the addition of BN, the time complexity of our method is significantly lower than that of SDA and FCSDA.
- Published
- 2018
- Full Text
- View/download PDF
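The two ingredients the abstract combines, batch normalization and ReLU, can be sketched in a few lines; the learnable scale/shift of real BN and the autoencoder itself are omitted, so this is a schematic only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the mini-batch to zero mean, unit variance.
    Learnable scale and shift parameters are omitted for brevity."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    """ReLU avoids the saturating gradients of sigmoid (gradient diffusion)."""
    return np.maximum(x, 0.0)

batch = np.array([[1.0, -2.0],
                  [3.0,  2.0]])
out = relu(batch_norm(batch))       # BN keeps ReLU's input centered
```

Normalizing before ReLU keeps roughly half the activations positive per batch, counteracting the over-sparsity the abstract describes.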
16. Feature Fusion for Weakly Supervised Object Localization
- Author
-
Xu Tang, Yuanlin Zhang, and Yonghong Song
- Subjects
Pointwise, Connected component, Feature fusion, Computer science, Pattern recognition, Pascal (programming language), Convolutional neural network, Minimum bounding box, Pyramid, Artificial intelligence - Abstract
Improving the precision of weakly supervised multi-scale object localization is a significant challenge in computer vision, especially for small objects. In this paper, we propose to integrate a feature pyramid network (FPN) with a convolutional neural network (CNN) for weakly supervised object localization, where the FPN is built upon the outputs of different layers of the CNN. We then upsample the high-level maps by nearest-neighbor interpolation and fuse them with the low-level maps in the FPN to produce multi-scale fused maps that feature both high resolution and strong semantics. Finally, we produce class activation maps from each layer of the FPN and obtain multiple prediction scores by WILDCAT spatial pooling. To acquire more precise localization, we select the class activation map corresponding to the highest score across all multi-scale maps. In particular, we choose the maximum response regions of the class activation map for pointwise localization, and the largest connected component above a threshold for bounding box localization. Applying the proposed strategy on the PASCAL VOC and MS COCO datasets demonstrates that it is highly effective in improving the precision of weakly supervised object localization compared with state-of-the-art weakly supervised methods.
- Published
- 2018
- Full Text
- View/download PDF
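The fusion step described above, nearest-neighbor upsampling of a coarse high-level map followed by element-wise addition to a finer low-level map, can be sketched as follows (toy maps are assumptions):

```python
import numpy as np

def upsample_nn(x, factor=2):
    """Nearest-neighbor upsampling: repeat each pixel factor x factor times."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

high = np.array([[1.0, 2.0],
                 [3.0, 4.0]])              # low-resolution, semantically strong
low = np.zeros((4, 4))                     # high-resolution, low-level map
fused = low + upsample_nn(high)            # element-wise FPN-style fusion
```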
17. Face Aging with Improved Invertible Conditional GANs
- Author
-
Yuanlin Zhang, Yonghong Song, and Li Jia
- Subjects
Computer science, Age progression, Pattern recognition, Iterative reconstruction, Image (mathematics), Invertible matrix, Face (geometry), Artificial intelligence, Encoder - Abstract
Due to the continuous development of GANs, vivid faces can now be generated, and using GANs for face aging has become a novel trend. However, many existing works on face aging require tedious pre-processing of datasets, which brings a large computational burden and limits the application of face aging. To solve these problems, a face aging network is constructed using IcGAN without any data pre-processing, which maps a face image into personality and age vector spaces through encoders Z and Y. Different from previous work, we emphasize the preservation of both personalized and aging features. Thus, a minimal absolute reconstruction loss is proposed to optimize the vector z, which retains the personality characteristics while preserving the pose, hairstyle and background of the input face. Additionally, we introduce a novel age vector optimization approach based on a classifying reconstruction loss, with a parameter λ that balances large age features against subtle texture features. The experimental results demonstrate that our proposed AIGAN produces better aging faces than other state-of-the-art age progression methods.
- Published
- 2018
- Full Text
- View/download PDF
18. Which Part is Better: Multi-Part Competition Network for person Re-Identification
- Author
-
Yuanlin Zhang, Peng Du, and Yonghong Song
- Subjects
Computer science, Feature extraction, Image processing and computer vision, Pattern recognition, Construct (Python library), Task (computing), Discriminative model, Artificial intelligence, Layer (object-oriented design) - Abstract
Person re-identification is a challenging task due to background clutter, occlusion and illumination variations. In addition, pedestrian misalignment always exists in some automatically detected datasets. In this paper, we propose a Multi-Part Competition Network (MPCN) consisting of a Multi-Part Network (MPN) and a Part Competition Network (PCN), which aims to solve the misalignment problem caused by detector errors and human pose variations. First, we construct original body parts and enlarged body parts using a human pose estimation algorithm. These two kinds of body parts not only alleviate the misalignment caused by background and varying human poses but also compensate for the missing details and imprecise body parts introduced by the human pose estimator. Then, we use the MPN to acquire global features and two different body part features; its components, a global branch and two part branches, are combined by an ROI pooling layer. Finally, we apply the PCN to achieve a tradeoff between the original body parts and the enlarged body parts and to acquire discriminative part features from these two kinds of body parts. Extensive evaluations on three widely used re-id datasets, Market-1501, CUHK03 and VIPeR, demonstrate that our proposed network achieves competitive results compared with the state-of-the-art methods.
- Published
- 2018
- Full Text
- View/download PDF
19. A Bidirectional Information Aggregation Architecture for Scene Text Detection
- Author
-
Yuanlin Zhang, Li Xiaoyu, and Yonghong Song
- Subjects
Computer science, Detector, Feature extraction, Pattern recognition, Text detection, Information aggregation, Deconvolution, Artificial intelligence, Architecture, Classifier (UML) - Abstract
TextBoxes [1] is one of the most advanced text detection methods in terms of both accuracy and efficiency, but it is still not very sensitive to small text in natural scenes and often cannot localize text regions precisely. To tackle these problems, we first present a Bidirectional Information Aggregation (BIA) architecture that effectively aggregates multi-scale feature maps to enhance local details and strengthen context information, making the detector not only work reliably on multi-scale text, especially small text, but also predict more precise boxes. This architecture also results in a single classifier network, which allows our model to be trained much faster and more easily, with better generalization power. Then, we propose to use multiple symmetrical feature maps for feature extraction at test time to further improve performance on small text. To further promote precise box prediction, we present a statistical grouping method that operates on the training-set bounding boxes to generate aspect ratios for default boxes. Finally, our model not only outperforms TextBoxes without much time overhead, but also provides promising performance compared to recent state-of-the-art methods on the ICDAR 2011 and 2013 databases.
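The statistical grouping of training-set aspect ratios could be realised as a simple 1-D k-means; a sketch under assumptions (the function name, k=3 and the deterministic evenly spaced initialisation are illustrative, not the paper's exact procedure):

```python
import numpy as np

def aspect_ratio_clusters(boxes, k=3, iters=20):
    """Group training-box aspect ratios (w/h) into k default-box ratios."""
    ratios = np.array([w / h for (w, h) in boxes], dtype=float)
    # Deterministic init: k evenly spaced values between min and max ratio.
    centers = np.linspace(ratios.min(), ratios.max(), k)
    for _ in range(iters):
        # Assign each ratio to its nearest centre, then recompute centres.
        assign = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = ratios[assign == j].mean()
    return centers
```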
- Published
- 2018
- Full Text
- View/download PDF
20. Scene text localization using edge analysis and feature pool
- Author
-
Yuanlin Zhang, Chong Yu, and Yonghong Song
- Subjects
Computer science ,Character (computing) ,business.industry ,Cognitive Neuroscience ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Filter (signal processing) ,Edge detection ,Computer Science Applications ,Image (mathematics) ,Support vector machine ,Task (computing) ,Artificial Intelligence ,Feature (computer vision) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,business - Abstract
Due to the rapid development of machine learning and data mining, how to acquire information from images is becoming more and more important. The most direct information in an image is the text inside it. However, detecting such text is a long-standing challenge in computer vision. Edges are one of the most important clues in scene character detection. However, many edge-based text detection methods have trouble with sticky edges and do not fully utilize the characteristics of text. In this paper, we propose a method for detecting and localizing text in natural scene images by edge recombining, edge filtering and multi-channel processing. In order to segment text from the background, edges are first over-segmented into edge segments during edge analysis. These edge segments are then recombined into candidate characters, and an edge filter is used to remove most background edges. The remaining candidate character edges are linked into candidate text lines. We use two different classifiers to filter out non-text lines. To classify more accurately, extracted edge-based and region-based features are first stored in feature pools. Then a linear SVM selects the most effective features from the feature pool to train the classifiers. Finally, multiple channels are used to ensure recall, and a modified non-maximal suppression is applied to eliminate duplicate results. Experimental results on the ICDAR 2011 competition dataset and the SVT database demonstrate the effectiveness of our method.
- Published
- 2016
- Full Text
- View/download PDF
21. Multilingual corpus construction based on printed and handwritten character separation
- Author
-
Yuping Lin, Kai He, Fang Wang, Yingyu Li, and Yonghong Song
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Character (computing) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,computer.software_genre ,Image (mathematics) ,Set (abstract data type) ,Hardware and Architecture ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,020201 artificial intelligence & image processing ,Artificial intelligence ,AdaBoost ,Precision and recall ,business ,computer ,Connected-component labeling ,Software ,Natural language processing - Abstract
This paper proposes an effective method to extract printed and handwritten characters from multilingual document images in order to build a corpus. To extract the characters from the document images, a connected component analysis method is used to remove graphics. After that, multiple types of features and the AdaBoost algorithm are introduced to classify printed and handwritten characters in a more versatile and robust way. Firstly, the content of the image is divided into several text patches, which are then used to distinguish different languages. Secondly, we use the multiple types of features and the AdaBoost algorithm to train classifiers based on the segmented patches. Finally, we separate the printed and handwritten parts of a new image set with the trained classifiers. The proposed method improves the precision of extraction of written materials from text images in different languages. Experimental results demonstrate that the proposed method is more accurate in terms of precision and recall compared with state-of-the-art methods.
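The AdaBoost patch classifier can be sketched with decision stumps over scalar patch features. Everything below (feature values, stump form, number of rounds) is illustrative rather than the paper's configuration:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """Discrete AdaBoost with 1-D threshold stumps; labels y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)               # sample weights
    model = []
    for _ in range(rounds):
        best = None                        # (error, feature, threshold, sign)
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] < t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-9), 1 - 1e-9)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] < t, 1, -1)
        w *= np.exp(-alpha * y * pred)        # reweight misclassified samples
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model

def boost_predict(model, X):
    score = sum(a * s * np.where(X[:, j] < t, 1, -1) for a, j, t, s in model)
    return np.sign(score)
```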
- Published
- 2015
- Full Text
- View/download PDF
22. Fast document image comparison in multilingual corpus without OCR
- Author
-
Yuping Lin, Fang Wang, Yonghong Song, and Yingyu Li
- Subjects
060201 languages & linguistics ,Matching (graph theory) ,Computer Networks and Communications ,Computer science ,business.industry ,Character (computing) ,Template matching ,Feature extraction ,Pattern recognition ,06 humanities and the arts ,Similarity measure ,Set (abstract data type) ,Hardware and Architecture ,0602 languages and literature ,Media Technology ,Segmentation ,Artificial intelligence ,Projection (set theory) ,business ,Software ,Information Systems - Abstract
This paper proposes a method to compare document images in a multilingual corpus, composed of character segmentation, feature extraction and similarity measurement. In character segmentation, a top-down strategy is used. We apply projection and a self-adaptive threshold to analyze the layout, and then segment text lines by horizontal projection. English, Chinese and Japanese are then recognized by different methods based on the distribution and ratios of the text lines, and characters in different languages are segmented using different strategies. In feature extraction and similarity measurement, four features are used for coarse measurement, and then a template is set up. Based on the templates, a fast template matching method using a coarse-to-fine strategy and bit memory is presented for precise matching. The experimental results demonstrate that our method can handle multilingual document images of different resolutions and font sizes with high precision and speed.
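The bit-memory idea behind the fast template matching, packing binarised rows into machine words so that similarity reduces to XOR plus popcount, can be sketched as follows (the function names and the Hamming-style score are assumptions, not the paper's exact measure):

```python
def pack_bits(rows):
    # Pack each binary row (list of 0/1) into one integer for
    # bit-parallel comparison.
    return [int("".join(map(str, r)), 2) for r in rows]

def hamming_similarity(a_rows, b_rows, width):
    # Fraction of matching bits between two binarised templates,
    # computed row by row via XOR and popcount.
    total = width * len(a_rows)
    diff = sum(bin(x ^ y).count("1")
               for x, y in zip(pack_bits(a_rows), pack_bits(b_rows)))
    return (total - diff) / total
```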
- Published
- 2015
- Full Text
- View/download PDF
23. Natural scene text detection with multi-layer segmentation and higher order conditional random field based analysis
- Author
-
Yuanlin Zhang, Jingmin Xin, Xiaobing Wang, and Yonghong Song
- Subjects
Conditional random field ,Connected component ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Artificial Intelligence ,Cut ,Signal Processing ,Pattern recognition (psychology) ,RGB color model ,Segmentation ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Connected-component labeling ,Software - Abstract
The contrasts in RGB channels are integrated to segment the image into multiple layers. The multi-layer segmentation is implemented with a graph-cuts-based model. A higher order CRF based connected component analysis is used. Text detection in natural scene images is a hot and challenging problem in pattern recognition and computer vision. Considering the complex situations in natural scene images, we propose a robust two-step method in this paper based on multi-layer segmentation and a higher order conditional random field (CRF). Given an input image, the method separates text from its background using multi-layer segmentation, which decomposes the input image into nine layers. Then, the connected components (CCs) in these layers are obtained as candidate text. These candidate text CCs are verified by higher order CRF based analysis. Inspired by the multistage information integration mechanism of visual brains, features from three different levels, including separate CCs, CC pairs and CC strings, are integrated by a higher order CRF model to distinguish text from non-text. The remaining CCs are then grouped into words for easy evaluation. Experiments on the ICDAR datasets and a street view dataset show that the proposed method achieves the state of the art in natural scene text detection.
- Published
- 2015
- Full Text
- View/download PDF
24. Multi-view gait recognition using 2D-EGEI and NMF
- Author
-
Yuanlin Zhang, Yonghong Song, and Chen Wu
- Subjects
Transformation (function) ,Gait (human) ,Discriminant ,Computer science ,business.industry ,Deep learning ,Feature extraction ,Pattern recognition ,Iterative reconstruction ,Artificial intelligence ,Linear discriminant analysis ,business ,Non-negative matrix factorization - Abstract
The View Transformation Model (VTM) is a widely used approach to the multi-view problem in gait recognition, but accuracy loss always occurs during the view transformation procedure, especially as the difference between the viewing angles of two gait features grows. Faced with this difficulty, on one hand, 2D Enhanced GEI (2D-EGEI) is proposed to extract effective gait features using the reconstruction of 2DPCA. On the other hand, Non-negative Matrix Factorization (NMF) is adopted to learn local structured features to compensate for the accuracy loss. Moreover, 2D Linear Discriminant Analysis (2DLDA) is introduced to project features into a discriminant space to improve classification ability. Compared with two deep learning methods, experimental results show that the proposed method significantly outperforms the Stacked Progressive Auto-Encoders (SPAE) method and approaches the performance of a deep CNN.
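The NMF step can be sketched with the standard Lee-Seung multiplicative updates (a generic implementation, not the paper's exact training setup; the rank and iteration count below are illustrative):

```python
import numpy as np

def nmf(V, r, iters=300, seed=0):
    """Factor a nonnegative matrix V (n x m) as W @ H with rank r,
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1   # positive random init
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```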
- Published
- 2018
- Full Text
- View/download PDF
25. Multi-view Gait Recognition Method Based on RBF Network
- Author
-
Yaru Qiu and Yonghong Song
- Subjects
Biometrics ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Variation (game tree) ,Image (mathematics) ,Identification (information) ,ComputingMethodologies_PATTERNRECOGNITION ,Gait (human) ,Discriminative model ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Focus (optics) ,Eigenvalues and eigenvectors - Abstract
Gait is an important biometric for human identification, but view variation seriously affects the accuracy of gait recognition. Existing methods for multi-view gait-based identification mainly focus on transforming the features of one view into another view, which might be unsuitable for real applications. In this paper, we propose a multi-view gait recognition method based on an RBF network that employs a unique view-invariant model. First, it extracts gait features by calculating the gait individual image (GII), which better captures the discriminative information for cross-view gait recognition. Then it constructs a joint model and uses the DLDA algorithm to obtain a projection matrix. Finally, the projected eigenvectors are classified by the RBF network. Experiments have been conducted on the CASIA-B database to prove the validity of the proposed method. The results show that our method performs better than the state-of-the-art multi-view methods.
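A minimal RBF network classifier of the kind the method relies on: a Gaussian RBF layer followed by output weights fitted by regularized least squares. The centres, gamma and regularisation below are assumptions for illustration, not the paper's training procedure.

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    # Gaussian RBF layer: phi[i, j] = exp(-gamma * ||x_i - c_j||^2).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_rbf(X, Y, centers, gamma=1.0, reg=1e-6):
    # Fit output weights by ridge-regularized least squares.
    Phi = rbf_features(X, centers, gamma)
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ Y)

def rbf_predict(X, centers, W, gamma=1.0):
    return rbf_features(X, centers, gamma) @ W
```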
- Published
- 2018
- Full Text
- View/download PDF
26. Scene text detection based on skeleton-cut detector
- Author
-
Yuanlin Zhang, Yonghong Song, and He Xiang
- Subjects
Pixel ,Computer science ,business.industry ,Detector ,0202 electrical engineering, electronic engineering, information engineering ,020207 software engineering ,020201 artificial intelligence & image processing ,Pattern recognition ,02 engineering and technology ,Text detection ,Artificial intelligence ,business ,Classifier (UML) - Abstract
As the structural information of an object can be well described by edge pixels, we observe that the greatest challenge in locating text edges in scene images is handling the edge-adhesion problem. In this paper we propose the Skeleton-cut Text Detector, which takes text-specific edge cues, including a novel skeleton representation, into account to hunt text efficiently with an improved recall rate. To address the edge-adhesion problem, skeleton junctions are first detected and eliminated to cut candidate text out of the edge map. Then the candidates are verified by a two-stage classifier based on properties such as concentration ratio. Finally, iterative local refinement (IRL) is applied to enhance the overlap of proposals. Experimental results on the public ICDAR 2013 and MSRA benchmarks demonstrate that our algorithm achieves state-of-the-art performance. Moreover, in severe scenarios, our method shows stronger adaptability to text by exploiting the skeleton compared to conventional representations such as MSERs.
- Published
- 2017
- Full Text
- View/download PDF
27. Human skeleton tree recurrent neural network with joint relative motion feature for skeleton based action recognition
- Author
-
Yuanlin Zhang, Shenghua Wei, and Yonghong Song
- Subjects
business.industry ,Computer science ,Node (networking) ,010401 analytical chemistry ,Feature extraction ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,01 natural sciences ,0104 chemical sciences ,Tree (data structure) ,Human skeleton ,Recurrent neural network ,Tree structure ,medicine.anatomical_structure ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,medicine ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
Recently, recurrent neural networks (RNNs) have been widely used for skeleton-based action recognition because of their ability to model long-term temporal dependencies automatically. However, current methods cannot accurately describe the characteristics of actions, because they consider only joint positions rather than higher order features such as relative motion to different joints, and they ignore the impact of the human physical structure. In this paper, a novel high order joint relative motion feature (JRMF) and a novel human skeleton tree RNN (HST-RNN) are proposed. The structure of human skeleton joints can be represented by a tree. The JRMF for each skeleton joint consists of the relative position, velocity and acceleration to this joint of all its descendant joints; it describes the instantaneous status of the skeleton joint better than joint positions. The HST-RNN is constructed with the same tree structure as the human skeleton. Each node of the tree is a Gated Recurrent Unit (GRU) and represents a skeleton joint. The outputs of its child nodes and the corresponding JRMF are concatenated and fed into each GRU. The network combines low-level features and extracts high-level features from the leaf nodes to the root node in a hierarchical way according to the human physical structure. The experimental results demonstrate that the proposed HST-RNN with JRMF achieves state-of-the-art performance on challenging datasets such as MSR-Action3D, UT-Kinect and UTD-MHAD.
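Following the description above, the JRMF of a joint stacks the relative positions of its descendant joints together with their first and second temporal differences. A sketch (the array layout and the use of simple frame differences for velocity and acceleration are assumptions):

```python
import numpy as np

def jrmf(seq, joint, descendants):
    """Joint relative motion feature for one joint.

    seq: (T, J, 3) array of 3-D joint positions over T frames.
    Returns (T, D, 9): relative position, velocity and acceleration of
    each of the D descendant joints with respect to `joint`.
    """
    rel = seq[:, descendants, :] - seq[:, [joint], :]   # relative position
    vel = np.diff(rel, axis=0, prepend=rel[:1])         # first difference
    acc = np.diff(vel, axis=0, prepend=vel[:1])         # second difference
    return np.concatenate([rel, vel, acc], axis=-1)
```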
- Published
- 2017
- Full Text
- View/download PDF
28. Scene text extraction with local symmetry transform
- Author
-
Yuanlin Zhang, Chen Qi, and Yonghong Song
- Subjects
Pixel ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Text recognition ,Local symmetry ,0202 electrical engineering, electronic engineering, information engineering ,Feature descriptor ,020201 artificial intelligence & image processing ,Segmentation ,Computer vision ,Artificial intelligence ,Invariant (mathematics) ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
As an important part of scene understanding, scene text extraction can significantly promote the performance of text recognition. This paper focuses on extracting text pixels from the background in natural scene images. First, a novel feature descriptor called the local symmetry transform is proposed to detect the local symmetry relationship of strokes for each image pixel. The presented method is local and color invariant, which makes it robust enough to handle many complex situations in scene text extraction, such as lighting variation, blurred text regions, shadows and complex backgrounds. Then, a guided filter is introduced to enhance the local symmetry map produced by the transform, which smooths the feature map and noticeably reduces noise. Finally, text candidates are generated as seed points, and by seed-based segmentation we automatically judge the text polarity to obtain correct text regions. Experimental results on the KAIST dataset show that our method achieves high performance.
- Published
- 2017
- Full Text
- View/download PDF
29. Scene text detection based on multi-scale SWT and edge filtering
- Author
-
Yuanlin Zhang, Yonghong Song, and Yuanyuan Feng
- Subjects
Computer science ,business.industry ,Feature extraction ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Text detection ,Reflection (mathematics) ,Pyramid ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Pyramid (image processing) ,Enhanced Data Rates for GSM Evolution ,Artificial intelligence ,Scale (map) ,business - Abstract
This paper presents a text detection method based on multi-scale Stroke Width Transform (SWT). First, an image pyramid is built and SWT is performed on each level of the pyramid. Second, edge components are filtered using two novel features, the stroke pair ratio (SPR) and the edge density of a connected component (EDC). Next, the remaining edge components on each level are grouped into text lines, and these lines are projected back onto a single image and merged. Finally, candidate text lines are verified by integrating block-level and line-level features. The multi-scale mechanism makes it possible to detect text degraded by reflection or blurring, and the two features prove both effective and efficient in filtering non-text edges. Moreover, experimental results on the ICDAR Robust Reading Competition datasets show that the proposed method provides promising performance.
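The multi-scale mechanism starts from an image pyramid. A minimal sketch using 2x2 mean downsampling (the paper does not specify the downsampling filter, so this choice is an assumption):

```python
import numpy as np

def pyramid(img, levels=3):
    """Build an image pyramid by repeated 2x2 mean downsampling.
    In the paper's pipeline, SWT would then run on every level."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        # Crop to even dimensions, then average each 2x2 block.
        h = (out[-1].shape[0] // 2) * 2
        w = (out[-1].shape[1] // 2) * 2
        a = out[-1][:h, :w]
        out.append((a[0::2, 0::2] + a[1::2, 0::2] +
                    a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return out
```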
- Published
- 2016
- Full Text
- View/download PDF
30. Hand gesture recognition using view projection from point cloud
- Author
-
Yonghong Song, Yuanlin Zhang, and Chaoyu Liang
- Subjects
Computer science ,business.industry ,Feature vector ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Point cloud ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Image segmentation ,Convolutional neural network ,Feature (computer vision) ,Gesture recognition ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Segmentation ,Artificial intelligence ,business ,Gesture - Abstract
In this paper we propose a multi-view method to recognize hand gestures using point clouds. The main idea is to project the point cloud into view images and describe hand gestures by extracting and fusing features from these view images. The conversion of the feature space increases intra-class similarity while reducing inter-class similarity. The features of the view images are extracted in parallel, so the scale of each feature extractor can be reduced, making it converge easily. In our method we first perform a refined hand segmentation to separate the hand from the background. Then the segmented hand point cloud is projected onto different view planes to form view images. Next we use convolutional neural networks as feature extractors to extract features from the view images. The extracted view image features are fused to form the features of hand gestures. Finally an SVM is trained for hand gesture recognition. The experimental results show that our multi-view method achieves a higher recognition rate and is more robust to challenging rotation changes, especially out-of-plane rotations.
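The view projection step can be sketched as orthographic projections of the point cloud onto the three coordinate planes, rasterised as occupancy images. This is a simplified stand-in: the resolution, the choice of planes and binary occupancy are assumptions, not the paper's exact rendering.

```python
import numpy as np

def project_views(points, res=8):
    """Project an (N, 3) point cloud onto the XY, XZ and YZ planes,
    each rasterised as a res x res binary occupancy image."""
    views = []
    for drop in (2, 1, 0):                       # drop z, y, x in turn
        keep = [a for a in range(3) if a != drop]
        p = points[:, keep]
        lo, hi = p.min(0), p.max(0)
        # Normalise to [0, res-1] pixel coordinates and mark occupancy.
        ij = np.clip(((p - lo) / np.maximum(hi - lo, 1e-9)
                      * (res - 1)).astype(int), 0, res - 1)
        img = np.zeros((res, res))
        img[ij[:, 1], ij[:, 0]] = 1.0
        views.append(img)
    return views
```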
- Published
- 2016
- Full Text
- View/download PDF
31. Enhanced Active Color Image for Gait Recognition
- Author
-
Yufei Shang, Yuanlin Zhang, and Yonghong Song
- Subjects
Biometrics ,Color image ,Computer science ,business.industry ,Frame (networking) ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Image (mathematics) ,Gait (human) ,Position (vector) ,0202 electrical engineering, electronic engineering, information engineering ,RGB color model ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Energy (signal processing) - Abstract
The Active Energy Image (AEI) is an efficient template for gait recognition, but it lacks temporal information. In this paper, we present a novel gait template named the Enhanced Active Color Image (EACI). The EACI is built by extracting the difference between interval frames in each gait sequence, calculating the width of each difference image, mapping it into RGB space with a ratio that describes its relative position, and composing the results into a single EACI. To prove the validity of the EACI, we conduct experiments on the USF HumanID database. The results show that the EACI better describes dynamic, static and temporal information. Compared with other published gait recognition approaches, we achieve competitive performance.
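The AEI baseline that the EACI builds on averages the absolute differences of consecutive silhouette frames; a minimal sketch (the EACI's width computation and RGB mapping are not reproduced here):

```python
import numpy as np

def active_energy_image(frames):
    """AEI-style template: mean absolute difference of consecutive
    silhouette frames, highlighting the dynamic regions of the gait."""
    frames = np.asarray(frames, dtype=float)   # (T, H, W)
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)
```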
- Published
- 2016
- Full Text
- View/download PDF
32. Scene text localization using extremal regions and Corner-HOG feature
- Author
-
Yuanlin Zhang, Yuanyuan Feng, and Yonghong Song
- Subjects
Tree (data structure) ,Local histogram ,Feature (computer vision) ,business.industry ,Computer vision ,Pattern recognition ,Text detection ,Artificial intelligence ,Covariance ,business ,Pruning (morphology) ,Mathematics ,Image (mathematics) - Abstract
This paper presents a text detection method based on Extremal Regions (ERs) and Corner-HOG feature. Local Histogram of Oriented Gradient (HOG) extracted around corners (Corner-HOG) is used to effectively prune the non-text components in the component tree. Experimental results show that the Corner-HOG based pruning method can discard an average of 83.06% of all ERs in an image while preserving a recall of 90.51% of the text components. The remaining ERs are then grouped into text lines and candidate text lines are verified using black-white transition feature and the covariance descriptor of HOG. Experimental results on the 2011 Robust Reading Competition dataset show that the proposed text detection method provides promising performance.
- Published
- 2015
- Full Text
- View/download PDF
33. Text detection in natural scene with edge analysis
- Author
-
Yuanlin Zhang, Quan Meng, Yang Liu, and Yonghong Song
- Subjects
Contextual image classification ,Computer science ,business.industry ,Shadow ,Feature extraction ,Pattern recognition ,Computer vision ,Image segmentation ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,business ,Edge detection ,Image gradient - Abstract
Text plays an important role in daily life due to its rich information, so automatic text detection in natural scenes has many attractive applications. However, detecting such text is a challenging problem because of variations in scale, font, color, lighting and shadow. In this paper, we propose a method that detects text in natural scenes through two steps of edge analysis: candidate edge combination and edge classification. In the candidate edge combination step, the edges of the input image are first divided into small segments; neighboring edge segments are then merged when they have similar stroke width and color. Through this step, each character is described by one set of edge segments. Because isolated letters rarely appear in natural scenes, in the edge classification step candidate edges are aggregated into text chains, followed by chain classification based on character-based and chain-based features. To evaluate the effectiveness of our method, we run our algorithm on the public ICDAR 2011 database and the Street View Text database. The experimental results show that the proposed method provides promising performance in comparison with existing methods.
- Published
- 2013
- Full Text
- View/download PDF
34. Natural Scene Text Detection with Multi-channel Connected Component Segmentation
- Author
-
Yuanlin Zhang, Xiaobing Wang, and Yonghong Song
- Subjects
Connected component ,Markov random field ,business.industry ,Computer science ,Segmentation-based object categorization ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-space segmentation ,Markov process ,Pattern recognition ,Image segmentation ,symbols.namesake ,symbols ,RGB color model ,Computer vision ,Segmentation ,Artificial intelligence ,business - Abstract
Text detection has attracted more and more attention in recent years, but natural scene text detection is still a challenging problem due to the variations of text and the complexity of the background. In this paper, an efficient text detection method with multi-channel connected component segmentation is proposed. First, connected component segmentation is performed using a Markov Random Field with local contrasts, colors and gradients of the RGB channels, yielding one segmentation image per channel. Then, non-text connected components in the three segmentation images are removed. Finally, the remaining text components in the three segmentation images are merged and then grouped into words. Experiments on the ICDAR 2003 and ICDAR 2011 datasets demonstrate that this method compares favorably with the state-of-the-art methods.
- Published
- 2013
- Full Text
- View/download PDF
35. A Novel Multi-oriented Chinese Text Extraction Approach from Videos
- Author
-
Yuanlin Zhang, Yang Liu, Yonghong Song, and Quan Meng
- Subjects
Contextual image classification ,business.industry ,Orientation (computer vision) ,Computer science ,Search engine indexing ,Feature extraction ,Pattern recognition ,Image segmentation ,Optical character recognition ,computer.software_genre ,Support vector machine ,Text mining ,Segmentation ,Computer vision ,Artificial intelligence ,business ,computer - Abstract
Video text contains useful high-level information which contributes to video indexing and retrieval. This paper proposes a novel video text extraction method for Chinese text of any orientation. Firstly, candidate text regions are detected by a wavelet-based algorithm. Secondly, horizontal, vertical, slanted, curved or arc text lines are merged by color and spatial relationships, based on the arrangement structures of multi-oriented text lines in these candidate regions. Then character segmentation automatically chooses best-fit strategies based on structure analysis, owing to the complex structures (single, up-down, left-right, encircling) of Chinese characters. Finally, an SVM classifier eliminates false positives. The experimental results show that the proposed approach is robust for Chinese text of any orientation in videos.
- Published
- 2013
- Full Text
- View/download PDF
36. Text Detection in Natural Scenes with Salient Region
- Author
-
Yonghong Song and Quan Meng
- Subjects
Conditional random field ,Connected component ,Similarity (geometry) ,business.industry ,Computer science ,Computation ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Set (abstract data type) ,Salient ,Histogram ,Computer vision ,Artificial intelligence ,business ,Connected-component labeling - Abstract
In this paper, we present a novel approach to detect text in natural scenes. This approach is a bionic method that imitates how human beings detect text exactly and robustly. In practice, human beings follow two steps to detect text: the first is to find salient regions in a scene, and the second is to determine whether these salient regions are text or not. Therefore, two similar steps, salient region computation and text localization, are used in our method. In the salient region computation step, a set of salient features including multi-scale contrast, a modified center-surround histogram, color spatial distribution and similarity of stroke width is used to describe an image, followed by computation of salient regions based on the combination of a Conditional Random Fields model and the above features. Because isolated letters rarely appear, in the text localization step salient regions are segmented and the connected components are grouped into text strings based on features such as spatial relationships, color difference and stroke width. As an elementary unit, the text string is refined by connected component analysis. We tested the effectiveness of our method on the ICDAR 2003 database. The experimental results show that the proposed method provides promising performance in comparison with existing methods.
- Published
- 2012
- Full Text
- View/download PDF
37. A new approach of color image quantization based on Normalized Cut algorithm
- Author
-
Jin Zhang, Yuanlin Zhang, Wang Xiao-bing, and Yonghong Song
- Subjects
Color histogram ,Color image ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Color balance ,Pattern recognition ,Data_CODINGANDINFORMATIONTHEORY ,Color space ,Color quantization ,Median cut ,Color depth ,High color ,Artificial intelligence ,business ,Algorithm ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
This paper presents a novel color quantization method based on the Normalized Cut clustering algorithm, which generates a quantized image with minimum loss of information and maximum compression ratio, benefiting the storage and transmission of color images. The method uses a modified Median Cut algorithm as a coarse partition of color pixels in the RGB color space, then takes the average color of each partition as the representative color of a node to construct a condensed graph. By employing the Normalized Cut clustering algorithm, we obtain a palette with a defined number of colors and then reconstruct the quantized image. Experiments on commonly used test images demonstrate that our method is very competitive with state-of-the-art color quantization methods in terms of image quality, compression ratio and computation time.
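The Median Cut coarse partition repeatedly splits the colour box with the widest channel range at its median. A compact sketch, with the representative colour taken as the box mean as in the paper's node construction (the function name and the stopping rule are illustrative):

```python
import numpy as np

def median_cut(pixels, n_colors):
    """Classic median-cut palette: split the box whose widest channel
    range is largest, at the median of that channel; one mean colour
    per final box."""
    boxes = [np.asarray(pixels, dtype=float)]
    while len(boxes) < n_colors:
        # Pick the box with the largest per-channel range.
        i = max(range(len(boxes)),
                key=lambda b: np.ptp(boxes[b], axis=0).max())
        box = boxes.pop(i)
        ch = np.ptp(box, axis=0).argmax()        # widest channel
        box = box[box[:, ch].argsort()]          # sort along that channel
        mid = len(box) // 2
        boxes += [box[:mid], box[mid:]]          # split at the median
    return np.array([b.mean(axis=0) for b in boxes])
```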
- Published
- 2011
- Full Text
- View/download PDF
38. A spatial FCM color quantization algorithm with pyramid data structure
- Author
-
Yuanlin Zhang, Yonghong Song, and Xiaobing Wang
- Subjects
Linde–Buzo–Gray algorithm ,Mathematics::General Mathematics ,Color image ,business.industry ,Computer Science::Information Retrieval ,Fuzzy set ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Data_CODINGANDINFORMATIONTHEORY ,Data structure ,Fuzzy logic ,Color quantization ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Computational Engineering, Finance, and Science ,Computer Science::Computer Vision and Pattern Recognition ,Pyramid (image processing) ,Artificial intelligence ,business ,Quantization (image processing) ,Algorithm ,Mathematics - Abstract
The Fuzzy C-Means (FCM) algorithm is an important color quantization technique. Although it is widely used, its runtime is long and its quantization quality is limited. In this paper, a spatial FCM color quantization algorithm is proposed that uses a pyramid data structure for the hierarchical analysis of a color image. Experiments show that the algorithm achieves better quantization results and shorter runtime than both the conventional FCM algorithm and the previous spatial FCM algorithm.
- Published
- 2011
- Full Text
- View/download PDF
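The coarse-to-fine idea behind a pyramid FCM can be sketched as below: cluster a mean-pooled pyramid level first, then refine the resulting centers on the full-resolution pixels. This is a minimal sketch under assumptions, not the paper's algorithm: it uses a single pyramid level and plain FCM updates, omitting the spatial membership term.

```python
import numpy as np

def fcm(pixels, centers, m=2.0, iters=10, eps=1e-9):
    """Standard Fuzzy C-Means updates on an (N, 3) color array."""
    for _ in range(iters):
        d = np.linalg.norm(pixels[:, None, :] - centers[None], axis=-1) + eps
        u = 1.0 / (d ** (2 / (m - 1)))       # inverse-distance memberships
        u /= u.sum(1, keepdims=True)
        um = u ** m
        centers = (um.T @ pixels) / um.sum(0)[:, None]
    return centers, u

def pyramid_fcm_quantize(img, n_colors=8, seed=0):
    """Coarse-to-fine FCM quantization of an RGB uint8 image: converge on a
    2x mean-pooled level, then do a few refinement passes at full resolution."""
    h, w, _ = img.shape
    coarse = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, 3).mean((1, 3))
    rng = np.random.default_rng(seed)
    init = coarse.reshape(-1, 3)[rng.choice(coarse.size // 3, n_colors, replace=False)]
    centers, _ = fcm(coarse.reshape(-1, 3), init.astype(float))
    pix = img.reshape(-1, 3).astype(float)
    centers, u = fcm(pix, centers, iters=3)  # few full-resolution refinements
    labels = u.argmax(1)                     # defuzzify: hardest membership wins
    return centers[labels].reshape(h, w, 3).astype(np.uint8)
```

The runtime saving comes from the pyramid: the expensive early iterations run on a quarter of the pixels, and the full-resolution pass only needs a handful of iterations to polish already-good centers.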
39. A Handwritten Character Extraction Algorithm for Multi-language Document Image
- Author
-
Yuanlin Zhang, Guilin Xiao, Liuliu Zhao, Yonghong Song, and Lei Yang
- Subjects
Connected component, Markov random field, Contextual image classification, Language identification, Computer science, Feature extraction, Pattern recognition, Context (language use), Image segmentation, Artificial intelligence, Connected-component labeling, Natural language processing - Abstract
In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g. Chinese, English, Japanese, or a mixture of them. First, text patches in the document image are segmented based on connected component analysis; the rules for merging connected components are chosen according to the result of language identification. Features are then extracted for each basic analysis unit, the text patch. A genetic algorithm is applied for feature fusion and patch-type classification. Finally, a Markov Random Field model is used as a post-processing step to correct misclassifications of patch type by taking the document context into account. Experimental results show that the proposed algorithm clearly improves the performance of handwritten character extraction.
- Published
- 2011
- Full Text
- View/download PDF
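One common way to run inference in an MRF post-processing step like the one the abstract describes is Iterated Conditional Modes (ICM). The abstract does not specify the inference method, so this is an assumed stand-in: `unary` holds per-patch classifier costs for two illustrative labels (0 = printed, 1 = handwritten), `neighbors` is a hypothetical patch adjacency, and `beta` is an assumed smoothness weight.

```python
import numpy as np

def icm_smooth(unary, neighbors, beta=0.8, iters=5):
    """Iterated Conditional Modes on a patch graph: each patch repeatedly
    takes the label minimizing its own classifier cost plus a Potts penalty
    for disagreeing with each adjacent patch's current label."""
    labels = unary.argmin(1)                     # start from the raw classifier
    n_labels = unary.shape[1]
    for _ in range(iters):
        for i in range(len(labels)):
            cost = unary[i].copy()
            for j in neighbors[i]:
                # Add beta to every label that differs from neighbor j's label.
                cost += beta * (np.arange(n_labels) != labels[j])
            labels[i] = cost.argmin()
    return labels
```

A usage example: with four patches in a chain where the classifier weakly mislabels the third one, the context penalty flips it back to agree with its neighbors, which is exactly the kind of correction the abstract attributes to the MRF step.

```python
unary = np.array([[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
icm_smooth(unary, neighbors)  # the weakly misclassified patch 2 becomes 0
```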
40. A robust inverse halftoning algorithm based on parameter estimation for AM halftone image
- Author
-
Yong Xudong, Yonghong Song, and Yuanlin Zhang
- Subjects
Halftone, Estimation theory, Binary image, Feature extraction, Pattern recognition, Grayscale, Edge detection, Robustness (computer science), Artificial intelligence, Algorithm, Image retrieval, Mathematics - Abstract
In the printing industry, Amplitude Modulation (AM) screening is one of the most important halftone methods for creating a dot screen; it converts a continuous-tone image into a binary image. Conversely, to obtain a good visual result and to enable further OCR, the dot screen must be removed by an inverse halftoning algorithm, and the difficulty lies in finding a trade-off between screen suppression and the preservation of edge details. In this paper, a robust inverse halftoning algorithm based on parameter estimation is proposed that handles this difficulty well. The algorithm comprises three steps: extraction of the screening region, estimation of the screening parameters, and the inverse transform. Experimental results show that the proposed method is independent of the screening parameters and robustly achieves good visual quality.
- Published
- 2011
- Full Text
- View/download PDF
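The estimation-then-inversion idea can be sketched as below: estimate the dominant screen frequency from the strongest non-DC peak of the image spectrum, then choose the strength of a smoothing filter from it. This is a simplified illustration, not the paper's method: the screening-region extraction step is omitted, and the frequency-to-sigma mapping is an assumed heuristic.

```python
import numpy as np

def estimate_screen_freq(halftone):
    """Estimate the dominant AM screen frequency (cycles/pixel) as the
    strongest non-DC peak of the 2D spectrum of a grayscale halftone."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(halftone - halftone.mean())))
    h, w = f.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.mgrid[:h, :w]
    f[np.hypot(yy - cy, xx - cx) < 3] = 0   # suppress the DC neighborhood
    py, px = np.unravel_index(f.argmax(), f.shape)
    return np.hypot(py - cy, px - cx) / max(h, w)

def gaussian_lowpass(img, sigma):
    """Separable Gaussian blur in plain numpy (edge-padded)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img.astype(float), radius, mode="edge")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, tmp)

def inverse_halftone(halftone):
    """Tie the estimated screen frequency to the blur strength: a higher
    screen frequency means smaller dots, so a smaller sigma suffices and
    more edge detail survives (an assumed mapping, for illustration)."""
    freq = estimate_screen_freq(halftone)
    sigma = max(0.8, 0.5 / max(freq, 1e-3))
    return gaussian_lowpass(halftone, sigma)
```

Adapting the filter to the measured screen parameters, rather than using one fixed blur, is what gives this family of methods its claimed parameter independence.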
41. Document Images Retrieval Based on Multiple Features Combination
- Author
-
Gaofeng Meng, Yonghong Song, Yuanlin Zhang, and Nanning Zheng
- Subjects
Structure (mathematical logic), Computer science, Local binary patterns, Feature extraction, Pattern recognition, Image (mathematics), Feature (computer vision), Histogram, Computer vision, Artificial intelligence, Projection (set theory), Image retrieval - Abstract
Retrieving relevant document images from a large number of digitized pages, with various kinds of artificial variations and quality deteriorations caused by scanning and printing, is a meaningful and challenging problem. We address this problem by combining multiple different kinds of document features in a hybrid way. First, two new document image features, based on the projection histograms and the crossings-number histograms of an image, are proposed. Second, these two features, together with a density distribution feature and a local binary pattern feature, are combined in a multistage structure to build a novel document image retrieval system. Experimental results show that the proposed system is efficient and robust for retrieving different kinds of document images, even when some of them are severely degraded.
- Published
- 2007
- Full Text
- View/download PDF
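The two proposed features can be sketched for a binary document image as below: a projection histogram counts ink per row/column, and a crossings-number histogram counts black-white transitions per row/column. This is a minimal sketch, not the paper's system: the density distribution and local binary pattern features and the multistage combination are omitted, and the fixed histogram length `bins` is an assumed parameter.

```python
import numpy as np

def doc_features(binary, bins=32):
    """Build a fixed-length descriptor from a 2D 0/1 document image:
    row/column projection histograms plus row/column crossings histograms,
    each resampled to `bins` values and L2-normalized for scale invariance."""
    def resample(v, n):
        idx = np.linspace(0, len(v) - 1, n)
        return np.interp(idx, np.arange(len(v)), v.astype(float))

    proj_h = binary.sum(1)                          # ink per row
    proj_v = binary.sum(0)                          # ink per column
    cross_h = np.abs(np.diff(binary, axis=1)).sum(1)  # transitions per row
    cross_v = np.abs(np.diff(binary, axis=0)).sum(0)  # transitions per column
    feats = [resample(f, bins) for f in (proj_h, proj_v, cross_h, cross_v)]
    feats = [f / (np.linalg.norm(f) + 1e-9) for f in feats]
    return np.concatenate(feats)

def retrieve(query, database):
    """Rank database images by Euclidean distance in the combined space."""
    q = doc_features(query)
    dists = [np.linalg.norm(q - doc_features(img)) for img in database]
    return np.argsort(dists)
```

Because both histograms are resampled and normalized, a lightly degraded copy of a page stays close to the original in feature space, while a page with a different layout (e.g. columns instead of text lines) lands far away.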