38 results for "Wu, Fei"
Search Results
2. Degradation trajectory prediction of lithium‐ion batteries based on charging‐discharging health features extraction and integrated data‐driven models.
- Author
-
Zheng, Xiujuan, Wu, Fei, and Tao, Liujun
- Subjects
- *FEATURE extraction, *REMAINING useful life, *LITHIUM-ion batteries, *ELECTRIC charge, *CONVOLUTIONAL neural networks, *SHORT-term memory, *LONG-term memory
- Abstract
As one of the key technologies in battery management systems, accurate remaining useful life (RUL) prediction is critical to guaranteeing the reliability and safety of electrical equipment. However, the generalization and robustness of a single method are limited. A novel fused data-driven RUL prediction method, CSSA-ELM-LSSVR, based on charging-discharging health feature extraction is proposed in this paper, which combines the chaotic sparrow search algorithm (CSSA), extreme learning machine (ELM), and least squares support vector regression (LSSVR). First, four health indicators (HIs) are extracted from the charging-discharging process, which can reflect the battery degradation phenomenon from multiple perspectives; the Pearson correlation coefficient is then used to numerically analyze the correlation between the HIs and battery aging capacities. Second, the extracted HIs are used as the inputs for ELM and LSSVR to predict the degradation trend of the battery, where CSSA is used for hyperparameter optimization in ELM. Finally, considering that CSSA-ELM can capture the general trend of degradation curves while LSSVR can trace detailed changes, a fusion framework based on CSSA-ELM and LSSVR is proposed for RUL prediction. Two weighting schemes, namely precision-based weighting (PW) and random forest regressor-based weighting (RFRW), are put forward to fix the weights of the CSSA-ELM and LSSVR algorithms. Two publicly available datasets, from the National Aeronautics and Space Administration (NASA) and MIT, are adopted to verify the feasibility and effectiveness of the proposed method. The results indicate that the proposed method with either weighting scheme has overall superior prediction performance for different kinds of batteries compared with CSSA-ELM, LSSVR, a convolutional neural network, and long short-term memory; moreover, the RFRW scheme has better overall performance. Specifically, the maximum root mean square error of the proposed method is 2.5126%, the mean absolute percentage error is 12.9167%, the mean absolute error is 1.8376%, the predicted RUL errors are within one cycle, and the determination coefficient R^2 is above 0.97. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
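Note: the precision-based weighting (PW) scheme described in this entry can be illustrated with a minimal sketch in which two base predictors are combined with weights inversely proportional to their validation errors. The inverse-error rule and the toy capacity values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def precision_weighted_fusion(pred_a, pred_b, err_a, err_b):
    """Fuse two degradation-trend predictions with precision-based weights.

    Weights are inversely proportional to each model's validation error,
    a plausible reading of the PW scheme (the paper's exact rule may differ).
    """
    w_a = (1.0 / err_a) / (1.0 / err_a + 1.0 / err_b)
    return w_a * np.asarray(pred_a) + (1.0 - w_a) * np.asarray(pred_b)

# Hypothetical capacity-trend predictions from CSSA-ELM and LSSVR
capacity_elm = np.array([1.80, 1.74, 1.69, 1.63])
capacity_lssvr = np.array([1.82, 1.76, 1.68, 1.62])
print(precision_weighted_fusion(capacity_elm, capacity_lssvr,
                                err_a=0.025, err_b=0.018))
```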
3. Joint MR image reconstruction and super-resolution via mutual co-attention network.
- Author
-
Chen, Jiacheng, Wu, Fei, and Wang, Wanliang
- Subjects
MAGNETIC resonance imaging, HIGH resolution imaging, IMAGE reconstruction, FEATURE extraction, MAGNETIC domain, DIAGNOSIS
- Abstract
In the realm of medical diagnosis, recent strides in deep neural network-guided magnetic resonance imaging (MRI) restoration have shown promise. Nevertheless, persistent drawbacks overshadow these advancements. Challenges persist in balancing acquisition speed and image quality, while existing methods primarily focus on singular tasks like MRI reconstruction or super-resolution (SR), neglecting the interplay between these tasks. To tackle these challenges, this paper introduces the mutual co-attention network (MCAN), specifically designed to concurrently address both the MRI reconstruction and SR tasks. Comprising multiple mutual cooperation attention blocks (MCABs) in succession, MCAN is tailored to maintain consistency between local physiological details and global anatomical structures. The intricately crafted MCAB includes a feature extraction block, a local attention block and a global attention block. Additionally, to ensure data fidelity without compromising acquired data, we propose the channel-wise data consistency block. Thorough experimentation on the IXI and fastMRI datasets showcases MCAN's superiority over existing state-of-the-art methods. Both quantitative metrics and visual quality assessments validate the enhanced performance of MCAN in MRI restoration. The findings underscore MCAN's potential to significantly advance therapeutic applications. By mitigating the trade-off between acquisition speed and image quality while simultaneously addressing both MRI reconstruction and SR tasks, MCAN emerges as a promising solution in the domain of magnetic resonance image restoration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
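Note: the channel-wise data consistency block mentioned above serves the standard purpose in MRI restoration of never overwriting actually acquired k-space samples. A minimal, generic data-consistency step is sketched below; it is an assumption about the mechanism, not MCAN's exact block.

```python
import numpy as np

def data_consistency(img_pred, kspace_acq, mask):
    """Replace predicted k-space values with acquired samples where available.

    img_pred:   network-reconstructed image (H, W)
    kspace_acq: undersampled acquired k-space (H, W), zeros where unsampled
    mask:       boolean sampling mask (H, W)
    """
    k_pred = np.fft.fft2(img_pred)
    k_pred[mask] = kspace_acq[mask]  # enforce fidelity to measured data
    return np.fft.ifft2(k_pred)

# Toy example: 50% of k-space treated as "acquired"
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
mask = rng.random((64, 64)) < 0.5
recon = data_consistency(img, np.fft.fft2(img) * mask, mask)
```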
4. Group recursive discriminant subspace learning with image set decomposition
- Author
-
Wu, Fei, Jing, Xiao-Yuan, Yao, Yong-Fang, Yue, Dong, and Chen, Jun
- Published
- 2016
- Full Text
- View/download PDF
5. RT‐Unet: An advanced network based on residual network and transformer for medical image segmentation.
- Author
-
Li, Bo, Liu, Sikai, Wu, Fei, Li, GuangHui, Zhong, Meiling, and Guan, Xiaohui
- Subjects
COMPUTER-assisted image analysis (Medicine), IMAGE segmentation, DEEP learning, DIAGNOSTIC imaging, IMAGE processing, FEATURE extraction
- Abstract
For the past several years, semantic segmentation methods based on deep learning, especially Unet, have achieved tremendous success in medical image processing. The U-shaped topology of Unet is well suited to image segmentation tasks. However, due to the limitations of traditional convolution operations, Unet cannot realize global semantic information interaction. To address this problem, this paper proposes RT-Unet, which combines the advantages of Transformer and Residual networks for accurate medical segmentation. In RT-Unet, the Residual block is taken as the image feature extraction layer to alleviate the problem of gradient degradation and obtain more effective features. Meanwhile, the Skip-Transformer is proposed, which takes Multi-head Self-Attention as the main algorithmic framework and replaces the original Skip-Connection layer in Unet to avoid the influence of shallow features on the network's performance. Besides, we add an attention module at the decoder to reduce semantic differences. According to the experiments on the MoNuSeg and ISBI_2018cell datasets, RT-Unet achieves better segmentation performance than other deep learning-based algorithms. In addition, a series of ablation experiments on the Residual network and the Skip-Transformer verified the effectiveness and efficiency of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
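Note: replacing a Unet skip connection with Multi-head Self-Attention, as the Skip-Transformer described above does, can be sketched with PyTorch's built-in attention module. The single-layer design, head count, and residual-plus-norm wiring below are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class SkipAttention(nn.Module):
    """Attention-based skip path: encoder features attend to themselves
    before being handed to the decoder (a sketch, not RT-Unet's exact block)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)          # residual + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

skip = SkipAttention(channels=64)
print(skip(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```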
6. Local-Global Graph Pooling via Mutual Information Maximization for Video-Paragraph Retrieval.
- Author
-
Zhang, Pengcheng, Zhao, Zhou, Wang, Nannan, Yu, Jun, and Wu, Fei
- Subjects
VIDEOS, FEATURE extraction
- Abstract
As a task of cross-modal retrieval between long videos and paragraphs, video-paragraph retrieval is non-trivial. Unlike traditional video-text retrieval, the video in video-paragraph retrieval usually contains multiple clips. Each clip corresponds to a descriptive sentence; all the sentences constitute the corresponding paragraph of the video. Previous methods for video-paragraph retrieval usually encode videos and paragraphs at the segment level (clips and sentences) and the overall level (videos and paragraphs). However, segments also contain finer-grained content about actions and objects. Hence, we propose a Local-Global Graph Pooling Network (LGGP) via Mutual Information Maximization for video-paragraph retrieval. Our model disentangles videos and paragraphs into four levels: overall-level, segment-level, motion-level, and object-level. We construct the Hierarchical Local Graph (segment-level, motion-level, and object-level) and the Hierarchical Global Graph (overall-level, segment-level, motion-level, and object-level), respectively, for semantic interaction among different levels. Meanwhile, to obtain hierarchical pooling features with fine-grained semantic information, we design hierarchical graph pooling methods that maximize the mutual information between pooling features and the corresponding graph nodes. We evaluate our model on two video-paragraph retrieval datasets with three different video features. The experimental results show that our model establishes state-of-the-art results for video-paragraph retrieval. Our code will be released at https://github.com/PengchengZhang1997/LGGP. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Memory-Efficient Class-Incremental Learning for Image Classification.
- Author
-
Zhao, Hanbin, Wang, Hui, Fu, Yongjian, Wu, Fei, and Li, Xi
- Subjects
KNOWLEDGE transfer, CLASSIFICATION
- Abstract
Under memory-resource-limited constraints, class-incremental learning (CIL) usually suffers from the "catastrophic forgetting" problem when updating the joint classification model on the arrival of newly added classes. To cope with the forgetting problem, many CIL methods transfer the knowledge of old classes by preserving some exemplar samples in the size-constrained memory buffer. To utilize the memory buffer more efficiently, we propose to keep more auxiliary low-fidelity exemplar samples rather than the original real high-fidelity exemplar samples. Such a memory-efficient exemplar preserving scheme makes the old-class knowledge transfer more effective. However, the low-fidelity exemplar samples are often distributed in a different domain from that of the original exemplar samples, that is, a domain shift. To alleviate this problem, we propose a duplet learning scheme that seeks to construct domain-compatible feature extractors and classifiers, which greatly narrows down the above domain gap. As a result, these low-fidelity auxiliary exemplar samples can moderately replace the original exemplar samples at a lower memory cost. In addition, we present a robust classifier adaptation scheme, which further refines the biased classifier (learned with the samples containing distillation label knowledge about old classes) with the help of samples with pure true class labels. Experimental results demonstrate the effectiveness of this work against the state-of-the-art approaches. We will release the code, baselines, and training statistics for all models to facilitate future research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
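Note: the memory trade-off behind the low-fidelity exemplar scheme above is easy to quantify, since shrinking each exemplar lets proportionally more fit in the same buffer. The uncompressed-pixel storage model and the 4x downsampling factor below are illustrative assumptions.

```python
def exemplars_per_budget(budget_bytes, h, w, c=3, downsample=1):
    """How many exemplar images fit in a fixed memory buffer.

    Storing exemplars at 1/downsample resolution per side cuts the
    per-image cost by downsample**2 (the paper's actual low-fidelity
    encoding may differ).
    """
    bytes_per_image = (h // downsample) * (w // downsample) * c
    return budget_bytes // bytes_per_image

budget = 32 * 1024 * 1024  # hypothetical 32 MB buffer
print(exemplars_per_budget(budget, 224, 224))                # full fidelity
print(exemplars_per_budget(budget, 224, 224, downsample=4))  # ~16x more
```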
8. Modality and Event Adversarial Networks for Multi-Modal Fake News Detection.
- Author
-
Wei, Pengfei, Wu, Fei, Sun, Ying, Zhou, Hong, and Jing, Xiao-Yuan
- Subjects
FAKE news, SOCIAL media, IMAGE representation, IMAGE reconstruction, ELECTRONIC newspapers
- Abstract
With the popularity of news on social media, fake news has become an important issue for the public and governments. Some existing fake news detection methods focus on exploring and utilizing information from multiple modalities, e.g., text and image. However, how to effectively learn both modality-invariant and event-invariant discriminant features is still a challenge. In this paper, we propose a novel approach named Modality and Event Adversarial Networks (MEAN) for fake news detection. It contains two parts: a multi-modal generator and a dual discriminator. The multi-modal generator extracts latent discriminant feature representations of the text and image modalities; a decoder is adopted to reduce information loss in the generation process for each modality. The dual discriminator includes a modality discriminator and an event discriminator, which learn to classify the modality or the event of features, and network training is guided by the adversarial scheme. Experiments on two widely used datasets show that MEAN performs better than state-of-the-art multi-modal fake news detection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
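Note: adversarial feature learning of the kind MEAN's dual discriminator performs is commonly implemented with a gradient reversal layer between the generator and the discriminators. The sketch below shows that standard mechanism; whether MEAN uses exactly this construction is an assumption.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward
    pass, so the feature generator learns to fool the modality/event
    discriminator while the discriminator learns to classify."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

feat = torch.randn(8, 128, requires_grad=True)
GradReverse.apply(feat, 1.0).sum().backward()
print(feat.grad[0, :3])  # all -1: gradients are negated
```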
9. TapLab: A Fast Framework for Semantic Video Segmentation Tapping Into Compressed-Domain Knowledge.
- Author
-
Feng, Junyi, Li, Songyuan, Li, Xi, Wu, Fei, Tian, Qi, Yang, Ming-Hsuan, and Ling, Haibin
- Subjects
IMAGE segmentation, VIDEOS, VIDEO coding, SPEED, GRAPHICS processing units
- Abstract
Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed. Recent approaches mainly devote great efforts to reducing the model size for high efficiency. In this paper, we rethink this problem from a different viewpoint: using knowledge contained in compressed videos. We propose a simple and effective framework, dubbed TapLab, to tap into resources from the compressed domain. Specifically, we design a fast feature warping module using motion vectors for acceleration. To reduce the noise introduced by motion vectors, we design a residual-guided correction module and a residual-guided frame selection module using residuals. TapLab significantly reduces redundant computations of the state-of-the-art fast semantic image segmentation models, running 3 to 10 times faster with controllable accuracy degradation. The experimental results show that TapLab achieves 70.6 percent mIoU on the Cityscapes dataset at 99.8 FPS with a single GPU card for the 1024×2048 videos. A high-speed version even reaches the speed of 160+ FPS. Code will be available soon at https://github.com/Sixkplus/TapLab. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
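Note: the fast feature warping that TapLab performs with motion vectors can be sketched as a simple gather: each location pulls features from where its block came from in the previous frame. Integer per-pixel motion vectors and nearest-neighbor sampling are simplifying assumptions here.

```python
import numpy as np

def warp_features(prev_feat, motion_vectors):
    """Warp a previous frame's feature map using per-location motion vectors.

    prev_feat:      (H, W, C) features of the last fully segmented frame
    motion_vectors: (H, W, 2) integer (dy, dx) offsets, e.g. decoded from
                    the compressed video stream
    """
    h, w, _ = prev_feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - motion_vectors[..., 0], 0, h - 1)
    src_x = np.clip(xs - motion_vectors[..., 1], 0, w - 1)
    return prev_feat[src_y, src_x]

feat = np.random.rand(16, 16, 8)
mv = np.zeros((16, 16, 2), dtype=int)
mv[..., 1] = 2                     # whole scene shifted 2 px to the right
warped = warp_features(feat, mv)
```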
10. LLM: Learning Cross-Modality Person Re-Identification via Low-Rank Local Matching.
- Author
-
Feng, Yujian, Xu, Jing, Ji, Yi-mu, and Wu, Fei
- Subjects
PROBABILITY measures, DISTRIBUTION (Probability theory), FEATURE extraction
- Abstract
Existing methods of cross-modality person re-identification usually use set-level global feature constraints to reduce the cross-modality discrepancy. However, they often ignore modality-specific and modality-shared local feature matching. Modality-specific local matching can help extract discriminative identity-consistent features and alleviate spatial misalignment, while modality-shared local matching can correlate identity-consistent features across the two modalities and mitigate the modality misalignment. We view the modality-specific and modality-shared local feature matching of identity parts as intra-modality and inter-modality low-rank relation-finding problems. Therefore, in this work, we propose a low-rank local matching (LLM) approach to establish the intra-modality and inter-modality co-occurrence relations between identity parts. First, in order to reinforce identity-consistent features in the two modalities and correlate these cross-modality identity-consistent features together, the local matching (LM) module is designed to estimate partial co-occurrence probability by measuring local feature similarity. Moreover, we propose a cross-modality triplet-center loss (CTLoss), which adds a global constraint on each class distribution in the embedding space to alleviate dramatic data expansion due to modality variance. Extensive experiments on two datasets demonstrate the superior performance of our approach over the existing state-of-the-art. Our code is released at https://github.com/fegnyujian/LLM. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
11. Training Robust Object Detectors From Noisy Category Labels and Imprecise Bounding Boxes.
- Author
-
Xu, Youjiang, Zhu, Linchao, Yang, Yi, and Wu, Fei
- Subjects
OBJECT recognition (Computer vision), CONVOLUTIONAL neural networks, DETECTORS, FOOD labeling, SUPERVISED learning
- Abstract
Object detection has gained great improvements with the advances of convolutional neural networks and the availability of large amounts of accurate training data. Though the amount of data is increasing significantly, the quality of data annotations is not guaranteed on existing crowd-sourced labeling platforms. In addition to noisy category labels, imprecise bounding box annotations commonly exist in object detection data. When the quality of training data degenerates, the performance of typical object detectors is severely impaired. In this paper, we propose a Meta-Refine-Net (MRNet) to train object detectors from noisy category labels and imprecise bounding boxes. First, MRNet learns to adaptively assign lower weights to proposals with incorrect labels so as to suppress the large loss values these proposals generate on the classification branch. Second, MRNet learns to dynamically generate more accurate bounding box annotations to overcome the misleading effect of imprecisely annotated bounding boxes. Thus, the imprecise bounding boxes can impose positive impacts on the regression branch rather than simply being ignored. Third, we propose to refine the imprecise bounding box annotations by jointly learning from both the category and the localization information. By doing this, the approximation of ground-truth bounding boxes is more accurate while the misleading effect is further alleviated. Our MRNet is model-agnostic and is capable of learning from noisy object detection data with only a few clean examples (less than 2%). Extensive experiments on PASCAL VOC 2012 and MS COCO 2017 demonstrate the effectiveness and efficiency of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
12. ResKD: Residual-Guided Knowledge Distillation.
- Author
-
Li, Xuewei, Li, Songyuan, Omar, Bourahla, Wu, Fei, and Li, Xi
- Subjects
TRAINING of student teachers, KNOWLEDGE transfer
- Abstract
Knowledge distillation, aimed at transferring the knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the heavy teacher and the lightweight student, a significant performance gap still exists between them. In this article, we see knowledge distillation in a fresh light, using the knowledge gap, or the residual, between a teacher and a student as guidance to train a much more lightweight student, called a res-student. We combine the student and the res-student into a new student, where the res-student rectifies the errors of the former student. Such a residual-guided process can be repeated until the user strikes a balance between accuracy and cost. At inference time, we propose a sample-adaptive strategy to decide which res-students are unnecessary for each sample, which saves computational cost. Experimental results show that we achieve competitive performance with 18.04%, 23.14%, 53.59%, and 56.86% of the teachers' computational costs on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets. Finally, we provide a thorough theoretical and empirical analysis of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
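Note: ResKD's inference combines a base student with res-students that correct its errors, optionally skipping res-students for easy samples. The logit-summing loop below is a minimal sketch; the confidence-based gating rule is an illustrative assumption.

```python
import torch

def reskd_predict(x, students, conf_threshold=0.9):
    """Accumulate logits from the student and successive res-students,
    stopping early once the batch already looks confident enough."""
    logits = 0.0
    for net in students:
        logits = logits + net(x)
        conf = torch.softmax(logits, dim=-1).max(dim=-1).values
        if bool((conf > conf_threshold).all()):  # illustrative gating rule
            break
    return logits

# Toy stand-ins for the student and one res-student
students = [torch.nn.Linear(32, 10), torch.nn.Linear(32, 10)]
print(reskd_predict(torch.randn(4, 32), students).shape)  # (4, 10)
```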
13. FREE: A Fast and Robust End-to-End Video Text Spotter.
- Author
-
Cheng, Zhanzhan, Lu, Jing, Zou, Baorui, Qiao, Liang, Xu, Yunlu, Pu, Shiliang, Niu, Yi, Wu, Fei, and Zhou, Shuigeng
- Subjects
TEXT recognition, STREAMING media, VIDEO surveillance, GLOBAL optimization, VIDEOS
- Abstract
Currently, video text spotting tasks usually follow a four-stage pipeline: detecting text regions in individual images, recognizing localized text regions frame-wise, tracking text streams, and post-processing to generate final results. However, this pipeline may suffer from huge computational cost as well as sub-optimal results due to the interference of low-quality text and the non-trainable pipeline strategy. In this article, we propose a fast and robust end-to-end video text spotting framework named FREE, which recognizes each localized text stream only once instead of frame by frame. Specifically, FREE first employs a well-designed spatial-temporal detector that learns text locations among video frames. Then a novel text recommender is developed to select the highest-quality text from text streams for recognition. The recommender is implemented by assembling text tracking, quality scoring and recognition into a trainable module. It not only avoids interference from low-quality text but also dramatically speeds up video text spotting. FREE unites the detector and recommender into a whole framework and helps achieve global optimization. Besides, we collect a large-scale video text dataset to promote the video text spotting community, containing 100 videos from 21 real-life scenarios. Extensive experiments on public benchmarks show that our method greatly speeds up the text spotting process and achieves remarkable state-of-the-art results. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
14. High-low level task combination for object detection in foggy weather conditions.
- Author
-
Hu, Ke, Wu, Fei, Zhan, Zhenfei, Luo, Jun, and Pu, Huayan
- Subjects
- *OBJECT recognition (Computer vision), *FOG, *IMAGE processing, *PIXELS, *FEATURE extraction
- Abstract
For the object detection task in foggy weather conditions, an image dehazing network is often used as a preprocessing step to obtain a clear input. However, the image dehazing task is not strictly positively correlated with the object detection task, and the preprocessing module increases the inference time of the whole model to a certain extent. To alleviate these problems, we propose a novel High-Low level task combination network (HLNet) based on multitask learning, which can learn both high-level and low-level tasks. Specifically, instead of restoring features to a clear pixel-wise feature space as common image dehazing methods do, we opt to perform restoration at the feature level to mitigate the influence of the Batch Normalization (BN) layers of the encoder on the dehazing task. HLNet jointly learns the dehazing task and the detection task in an end-to-end fashion, which ensures that the weather-specific information in the latent feature space is suppressed. Moreover, we applied the HLNet framework to three different object detection networks, RetinaNet, YOLOv3 and YOLOv5s, and achieved improvements of 1.7 percent, 2.3 percent, and 1.2 percent in mAP, respectively. The experimental results demonstrate the effectiveness and generalization ability of our proposed HLNet framework in real foggy scenarios.
• The proposed HLNet initially explores how to combine high-level and low-level tasks and improves the detection performance of RetinaNet on a real foggy test dataset without increasing the inference time.
• The application to the YOLOv3 and YOLOv5s models also proves the effectiveness and generalization performance of the strategy in this article.
• A contrastive loss is used to enhance the learning of task-relevant factors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
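Note: HLNet's joint learning of a high-level detection task and a low-level, feature-level restoration task amounts to optimizing a weighted multitask objective. The loss composition and weight below are assumptions for illustration, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def hlnet_style_loss(det_loss, clear_feat, dehazed_feat, lambda_restore=0.1):
    """Combine the detection loss with a restoration loss computed in
    latent feature space rather than pixel space (a sketch of the idea)."""
    restore_loss = F.mse_loss(dehazed_feat, clear_feat)
    return det_loss + lambda_restore * restore_loss

det_loss = torch.tensor(2.3)             # hypothetical detection loss value
clear = torch.randn(4, 256, 20, 20)      # encoder features of the clear image
dehazed = torch.randn(4, 256, 20, 20)    # features restored from the foggy image
print(hlnet_style_loss(det_loss, clear, dehazed))
```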
15. A Feature Extraction Method of Network Traffic for Time-Frequency Synchronization Applications
- Author
-
Meng Yin, Weizha Ma, Guan Song, Song Manrui, Xiangdong Jiang, Wu Fei, Li Xuebin, Ling Wang, Ming Liu, and Jiang Yundou
- Subjects
Feature (computer vision), Computer science, Synchronization (computer science), Feature extraction, Anomaly detection, Time domain, Data mining, Time-frequency analysis, Domain (software engineering), Wavelet packet decomposition
- Abstract
Network traffic embodies network behaviors and users' activity patterns, which hold many inherent features and dynamic properties. Feature extraction for network traffic plays a significant role in time-frequency synchronization applications. How accurately the hidden properties and features of network traffic are extracted has an important impact on network activities, such as network failure positioning, anomaly detection, and performance analysis. To this end, this paper proposes a new feature extraction method to characterize network traffic. First, time-frequency analysis theory is used to transform network traffic in time-frequency synchronization applications to the time-frequency domain. Then cluster analysis theory is used to mine and extract network traffic feature components, and the k-means analysis method is exploited to refine the hidden features of network traffic in the time domain. Finally, to validate our feature analysis method, we conduct an anomaly detection test. Simulation results show that our approach is promising.
- Published
- 2017
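Note: the k-means step this entry describes can be sketched end to end: cluster time-frequency feature vectors, then flag windows that sit far from every cluster center as anomalous. The feature matrix and the 3-sigma threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy traffic features: rows are time windows, columns are time-frequency
# components (stand-ins for the paper's wavelet-derived features).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8))
X[::50] += 6.0                          # inject a few anomalous windows

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
dist = km.transform(X).min(axis=1)      # distance to the nearest center
threshold = dist.mean() + 3 * dist.std()
print(np.where(dist > threshold)[0])    # indices of flagged windows
```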
16. Adaptive Graph Representation Learning for Video Person Re-Identification.
- Author
-
Wu, Yiming, Bourahla, Omar El Farouk, Li, Xi, Wu, Fei, Tian, Qi, and Zhou, Xue
- Subjects
REPRESENTATIONS of graphs, DEEP learning, VIDEOS, LEARNING
- Abstract
Recent years have witnessed remarkable progress in applying deep learning models to video person re-identification (Re-ID). A key factor for video person Re-ID is to effectively construct discriminative and robust video feature representations for many complicated situations. Part-based approaches employ spatial and temporal attention to extract representative local features. Because correlations between parts are ignored in previous methods, to leverage the relations of different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables contextual interactions between relevant regional features. Specifically, we exploit the pose alignment connection and the feature affinity connection to construct an adaptive structure-aware adjacency graph, which models the intrinsic relations between graph nodes. We perform feature propagation on the adjacency graph to refine regional features iteratively, and the neighbor nodes' information is taken into account for part feature representation. To learn compact and discriminative representations, we further propose a novel temporal resolution-aware regularization, which enforces consistency among different temporal resolutions for the same identities. We conduct extensive evaluations on four benchmarks, i.e., iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID; the experimental results show competitive performance, demonstrating the effectiveness of our proposed method. Code is available at https://github.com/weleen/AGRL.pytorch. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
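Note: the iterative feature propagation on the adjacency graph described above can be sketched as repeated neighborhood averaging with a row-normalized adjacency matrix; the chain-graph toy structure below is an assumption, not the paper's learned adaptive graph.

```python
import numpy as np

def propagate(features, adjacency, steps=2):
    """Refine regional features by averaging over graph neighbors."""
    a = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    a_norm = a / a.sum(axis=1, keepdims=True)   # row-normalize
    for _ in range(steps):
        features = a_norm @ features
    return features

# 6 body-part nodes with 16-dim features, linked in a simple chain
feat = np.random.rand(6, 16)
adj = np.eye(6, k=1) + np.eye(6, k=-1)
refined = propagate(feat, adj)
```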
17. Frame Augmented Alternating Attention Network for Video Question Answering.
- Author
-
Zhang, Wenqiao, Tang, Siliang, Cao, Yanpeng, Pu, Shiliang, Wu, Fei, and Zhuang, Yueting
- Abstract
Vision and language understanding is one of the most fundamental and challenging problems in Multimedia Intelligence. Simultaneously understanding video actions with a related natural language question and producing an accurate answer is even more challenging, since it requires jointly modeling information across modalities. In the past few years, some studies have begun to attack this problem by utilizing attention-enhanced deep neural networks. However, simple attention mechanisms such as unidirectional attention fail to yield a good mapping between different modalities. Moreover, none of these Video QA models explore high-level semantics at the augmented video-frame level. In this paper, we augment each frame representation with its context information via a novel feature extractor that combines the advantages of ResNet and a variant of C3D. In addition, we propose a novel alternating attention network that can alternately attend to frame regions, video frames and words in the question over multiple turns. This yields better joint representations of the video and the question, further helping the deep model discover the deeper relationship between the two modalities. Our method outperforms state-of-the-art Video QA models on two existing video question answering datasets. Further ablation studies show that our feature extractor and the alternating attention mechanism improve performance jointly. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
18. Intraspectrum Discrimination and Interspectrum Correlation Analysis Deep Network for Multispectral Face Recognition.
- Author
-
Wu, Fei, Jing, Xiao-Yuan, Dong, Xiwei, Hu, Ruimin, Yue, Dong, Wang, Lina, Ji, Yi-Mu, Wang, Ruchuan, and Chen, Guoliang
- Abstract
Multispectral images contain rich recognition information since the multispectral camera can reveal information that is not visible to the human eye or to a conventional RGB camera. Due to this characteristic, multispectral face recognition has attracted considerable research interest. Although some multispectral face recognition methods have been presented in the last decade, how to fully and effectively explore the intraspectrum discriminant information and the useful interspectrum correlation information in multispectral face images for recognition has not been well studied. To boost the performance of multispectral face recognition, we propose an intraspectrum discrimination and interspectrum correlation analysis deep network (IDICN) approach. Multiple spectra are divided into several spectrum-sets, each containing a group of spectra within a small spectral range. The IDICN network contains a set of spectrum-set-specific deep convolutional neural networks that extract spectrum-set-specific features, followed by a spectrum pooling layer, whose target is to adaptively select a group of spectra with favorable discriminative abilities. IDICN jointly learns the nonlinear representations of the selected spectra such that the intraspectrum Fisher loss and the interspectrum discriminant correlation are minimized. Experiments on the well-known Hong Kong Polytechnic University, Carnegie Mellon University, and University of Western Australia multispectral face datasets demonstrate the superior performance of the proposed approach over several state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
19. Context-Aware Deep Spatiotemporal Network for Hand Pose Estimation From Depth Images.
- Author
-
Wu, Yiming, Ji, Wei, Li, Xi, Wang, Gang, Yin, Jianwei, and Wu, Fei
- Abstract
As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate the hand joint locations from depth images. Typically, the problem is modeled as learning a mapping function from images to hand joint coordinates in a data-driven manner. In this paper, we propose a context-aware deep spatiotemporal network, a novel method to jointly model the spatiotemporal properties for hand pose estimation. Our proposed network is able to learn representations of the spatial information and the temporal structure from image sequences. Moreover, by adopting an adaptive fusion method, the model is capable of dynamically weighting different predictions to lay emphasis on sufficient context. Our method is examined on two common benchmarks; the experimental results demonstrate that our proposed approach achieves the best or second-best performance among the state-of-the-art methods and runs at 60 fps. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
20. Deep Q Learning Driven CT Pancreas Segmentation With Geometry-Aware U-Net.
- Author
-
Man, Yunze, Huang, Yangsibo, Feng, Junyi, Li, Xi, and Wu, Fei
- Subjects
PANCREAS, DEEP learning, IMAGE processing, IMAGE segmentation, DIAGNOSTIC imaging
- Abstract
The segmentation of the pancreas is important for medical image analysis, yet it faces great challenges of class imbalance, background distractions, and non-rigid geometrical features. To address these difficulties, we introduce a deep Q network (DQN) driven approach with a deformable U-Net to accurately segment the pancreas by explicitly interacting with contextual information and extracting anisotropic features from the pancreas. The DQN-based model learns a context-adaptive localization policy to produce a visually tightened and precise localization bounding box of the pancreas. Furthermore, the deformable U-Net captures geometry-aware information of the pancreas by learning geometrically deformable filters for feature extraction. Experiments on the NIH dataset validate the effectiveness of the proposed framework for pancreas segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
21. Multi-Task Structure-Aware Context Modeling for Robust Keypoint-Based Object Tracking.
- Author
-
Li, Xi, Zhao, Liming, Ji, Wei, Wu, Yiming, Wu, Fei, Yang, Ming-Hsuan, Tao, Dacheng, and Reid, Ian
- Subjects
ARTIFICIAL satellite tracking, TRACKING & trailing, MODELING (Sculpture), LEARNING, OBJECT (Aesthetics)
- Abstract
In the fields of computer vision and graphics, keypoint-based object tracking is a fundamental and challenging problem, which is typically formulated in a spatio-temporal context modeling framework. However, many existing keypoint trackers are incapable of effectively modeling and balancing the following three aspects in a simultaneous manner: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this problem, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Consequently, temporal model coherence is characterized by multi-task structured keypoint model learning over several adjacent frames; spatial model consistency is modeled by solving a geometric verification based structured learning problem; discriminative feature construction is enabled by metric learning to ensure the intra-class compactness and inter-class separability. To achieve the goal of effective object tracking, we jointly optimize the above three modules in a spatio-temporal multi-task learning scheme. Furthermore, we incorporate this joint learning scheme into both single-object and multi-object tracking scenarios, resulting in robust tracking results. Experiments over several challenging datasets have justified the effectiveness of our single-object and multi-object trackers against the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
22. “Like charges repulsion and opposite charges attraction” law based multilinear subspace analysis for face recognition.
- Author
-
Wu, Fei, Jing, Xiao-Yuan, Wu, Songsong, Gao, Guangwei, Ge, Qi, and Wang, Ruchuan
- Subjects
- *HUMAN facial recognition software, *MACHINE learning, *FEATURE extraction, *CLUSTER analysis (Statistics), *IMAGE processing
- Abstract
Multiple image variations occur in natural face images, such as changes of pose, illumination, occlusion and expression. For face recognition under non-specific variations, learning effective features is an important research topic. Subspace learning is a widely used face recognition technique; however, numerous subspace analysis methods do not fully utilize the prior information of facial variations. Tensor-based multilinear subspace analysis methods can take advantage of this prior information, but they need to be further improved. With respect to a single facial variation, we observe that image samples belonging to the same variation-state but different classes tend to cluster together, whereas those belonging to different variation-states but the same class tend to remain separate. This is adverse to classification. In this paper, motivated by the charge law "like charges repulsion and opposite charges attraction", in which like and opposite charges are regarded as same and different variation-states, respectively, we propose a non-specific variations based discriminant analysis (NVDA) criterion. It searches for an optimal discriminant subspace in which samples belonging to the same variation-state but different classes are separable, whereas those belonging to different variation-states but the same class cluster together. We then propose a novel face recognition approach called non-specific variations based multi-subspace analysis (NVMSA), which serially utilizes the NVDA criterion to learn multiple discriminant subspaces corresponding to different variations. In the proposed approach, we design a strategy to select the serial calculation order of variations and provide a rule to choose projection vectors with favorable discriminant capabilities. Furthermore, we formulate locally statistical orthogonal constraints for the multiple-subspace learning to remove the local correlation of discriminant features obtained from multiple variations. Experiments on the AR, Weizmann, PIE and LFW databases demonstrate the effectiveness and efficiency of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
23. Deep Context-Sensitive Facial Landmark Detection With Tree-Structured Modeling.
- Author
-
Zeng, Jiajian, Liu, Siyuan, Li, Xi, Mahdi, Debbah Abderrahmane, Wu, Fei, and Wang, Gang
- Subjects
IMAGE analysis, CARTOGRAPHY, HUMAN facial recognition software, GENETIC algorithms, SOFTWARE engineering
- Abstract
Facial landmark detection is typically cast as a point-wise regression problem that focuses on how to build an effective image-to-point mapping function. In this paper, we propose an end-to-end deep learning approach for contextually discriminative feature construction together with effective facial structure modeling. The proposed learning approach is able to predict more contextually discriminative facial landmarks by capturing their associated contextual information. Moreover, we present a tree model to characterize human face structure and a structural loss function to measure the deformation cost between the ground-truth and predicted tree model, which are further incorporated into the proposed learning approach and jointly optimized within a unified framework. The presented tree model is able to well characterize the spatial layout patterns of facial landmarks for capturing the facial structure information. Experimental results demonstrate the effectiveness of the proposed approach against the state-of-the-art over the MTFL and AFLW-full data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
24. Multimodal Deep Embedding via Hierarchical Grounded Compositional Semantics.
- Author
-
Zhuang, Yueting, Song, Jun, Wu, Fei, Li, Xi, Zhang, Zhongfei, and Rui, Yong
- Subjects
SEMANTICS, COMPARATIVE linguistics, INFORMATION theory, LANGUAGE & languages, IMAGE databases
- Abstract
For a number of important problems, isolated semantic representations of individual syntactic words or visual objects do not suffice; instead, a compositional semantic representation is required, for example, a literal phrase or a set of spatially concurrent objects. In this paper, we aim to harness existing image–sentence databases to exploit the compositional nature of image–sentence data for multimodal deep embedding. In particular, we propose an approach called hierarchical-alike (bottom–up, two-layer) multimodal grounded compositional semantics (hiMoCS) learning. The proposed hiMoCS systematically captures the compositional semantic connotation of multimodal data in a hierarchical-alike deep learning setting by modeling the inherent correlations between two modalities of collaboratively grounded semantics, such as a textual entity (with its describing attributes) and a visual object, a phrase (e.g., a subject–verb–object triplet), and spatially concurrent objects. We argue that hiMoCS is more appropriate for reflecting the multimodal compositional semantics of an image and its narrative textual sentence, which are strongly coupled. We evaluate hiMoCS on several benchmark data sets and show that utilizing hiMoCS (textual entities and visual objects, textual phrases, and spatially concurrent objects) achieves much better performance than only using flat grounded compositional semantics. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
25. Using CONDENSATION Tracking to Recover Stroke Order of Chinese Calligraphic Handwritings with CCM
- Author
-
Wu Fei, Wu Ying-fei, Zhuang Yue-ting, and Pan Yun-he
- Subjects
Line segment, Computer science, Handwriting, Speech recognition, Feature extraction, Stroke order, Feature (machine learning), Pattern matching, Condensation algorithm, Visualization
- Abstract
Stroke order is an intrinsic writing feature that is missing from off-line handwriting. Recovering stroke order from off-line handwriting is an important task that makes visualization of the writing process possible. Chinese calligraphic handwriting differs somewhat from Western handwriting: the traditional method, which assumes minimal curvature between two adjacent line segments, is not applicable to Chinese calligraphic handwriting. We propose a new framework named the Chinese Calligraphic Model (CCM). Within this framework, the CONDENSATION algorithm (conditional density propagation over time) is performed to find the model in the model database that best matches the input pattern and thus recover the stroke order. Experimental results show that our method is a promising way to recover stroke order from Chinese calligraphic handwriting based on CCM with CONDENSATION tracking.
- Published
- 2007
26. Regularized Deep Belief Network for Image Attribute Detection.
- Author
-
Wu, Fei, Wang, Zhuhao, Lu, Weiming, Li, Xi, Yang, Yi, Luo, Jiebo, and Zhuang, Yueting
- Subjects
- *BOLTZMANN machine, *DEEP learning, *ARTIFICIAL neural networks, *BACK propagation, *MACHINE learning
- Abstract
In general, an image attribute is a human-nameable visual property with a semantic connotation. Appropriate modeling of the intrinsic contextual correlations among attributes plays a fundamental role in attribute detection. In this paper, we consider image attribute detection from the perspective of regularized deep learning. In particular, we propose a regularized deep belief network (rDBN) to perform the image attribute detection task, which is composed of two parts: 1) a detection DBN (dDBN) that models the joint distribution of images and their corresponding attributes, acting as an attribute detector; and 2) a contextual restricted Boltzmann machine that explicitly models the correlations among attributes, acting as a regularizer that constrains the detection results output by the dDBN to meet the contextual prior of attributes. Furthermore, we propose an efficient fine-tuning scheme that can further optimize the performance of the dDBN by backpropagation. Experimental results show that the proposed rDBN improves over the state-of-the-art methods for attribute detection on the benchmark data sets. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
27. Uncorrelated multi-set feature learning for color face recognition.
- Author
-
Wu, Fei, Jing, Xiao-Yuan, Dong, Xiwei, Ge, Qi, Wu, Songsong, Liu, Qian, Yue, Dong, and Yang, Jing-Yu
- Subjects
- *FEATURE selection, *FACE perception, *FEATURE extraction, *ORTHOGONAL functions, *DATABASES
- Abstract
Most existing color face feature extraction methods need to perform a color space transformation, and they reduce the correlation of color components at the data level, which has no direct connection with classification. Some methods extract features from the R, G and B components serially with orthogonal constraints at the feature level, yet the serial extraction manner might make the discriminabilities of features derived from the three components distinctly different. Multi-set feature learning can jointly and effectively learn features from multiple sets of data. In this paper, we propose two novel color face recognition approaches, namely multi-set statistical uncorrelated projection analysis (MSUPA) and multi-set discriminating uncorrelated projection analysis (MDUPA), which extract discriminant features from the three color components together and simultaneously reduce the global statistical and global discriminating feature-level correlation between color components in a multi-set manner, respectively. Experiments on multiple public color face databases demonstrate that the proposed approaches outperform several related state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
28. A Novel Inshore Ship Detection via Ship Head Classification and Body Boundary Determination.
- Author
-
Li, Sun, Zhou, Zhiqiang, Wang, Bo, and Wu, Fei
- Abstract
In this letter, we propose a novel method for inshore ship detection via ship head classification and body boundary determination. Compared with some traditional ship head detection methods that depend on accurate ship head segmentation, we generate novel ship head features in the transformed polar coordinate domain, where the ship heads have an approximately trapezoidal shape and can be more easily detected. These features are then used in support vector machine classification to detect ship head candidates and provide important information about the initial ship head direction. Next, the surrounding consistent line segments are utilized to refine the ship direction, and the ship boundary is determined based on the saliency of directional gradient information symmetrical about the ship body. Finally, the context information of sea areas is introduced to remove false alarms. Experimental results show that the proposed method can accurately and robustly detect inshore ships in high-resolution optical remote sensing images. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
29. Multi-spectral low-rank structured dictionary learning for face recognition.
- Author
-
Jing, Xiao-Yuan, Wu, Fei, Zhu, Xiaoke, Dong, Xiwei, Ma, Fei, and Li, Zhiqiang
- Subjects
- *MACHINE learning, *HUMAN facial recognition software, *FEATURE extraction, *INFORMATION theory, *MATHEMATICAL regularization
- Abstract
Multi-spectral face recognition has been attracting increasing interest, and several multi-spectral face recognition methods have been presented in the last decade. However, how to jointly learn effective features with favorable discriminability from multiple spectra, even when multi-spectral face images are severely contaminated by noise, has not been well studied. Multi-view dictionary learning is an effective feature learning technique, which learns dictionaries from multiple views of the same object and has achieved state-of-the-art classification results. In this paper, we introduce the multi-view dictionary learning technique into the field of multi-spectral face recognition for the first time and propose a multi-spectral low-rank structured dictionary learning (MLSDL) approach. It learns multiple structured dictionaries, including a spectrum-common dictionary and multiple spectrum-specific dictionaries, which can fully explore both the correlated information and the complementary information among multiple spectra. Each dictionary contains a set of class-specific sub-dictionaries. Based on low-rank matrix recovery theory, we apply low-rank regularization in the multi-spectral dictionary learning procedure so that MLSDL can handle multi-spectral face recognition with high levels of noise. We also design a low-rank structural incoherence term for multi-spectral dictionary learning to reduce the redundancy among the spectrum-specific dictionaries. In addition, to enhance the efficiency of the classification procedure, we design a low-rank structured collaborative representation classification scheme for MLSDL. Experimental results on the HK PolyU, CMU and UWA hyper-spectral face databases demonstrate the effectiveness of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
30. Sparse Multi-Modal Hashing.
- Author
-
Wu, Fei, Yu, Zhou, Yang, Yi, Tang, Siliang, Zhang, Yin, and Zhuang, Yueting
- Abstract
Learning hash functions across heterogeneous high-dimensional features is very desirable for many applications involving multi-modal data objects. In this paper, we propose an approach to obtain sparse codesets for data objects across different modalities via joint multi-modal dictionary learning, which we call sparse multi-modal hashing (abbreviated as SM^2H). In SM^2H, both intra-modality similarity and inter-modality similarity are first modeled by a hypergraph, then multi-modal dictionaries are jointly learned by Hypergraph Laplacian sparse coding. Based on the learned dictionaries, the sparse codeset of each data object is acquired and used for multi-modal approximate nearest neighbor retrieval under a sensitive Jaccard metric. The experimental results show that SM^2H outperforms other methods in terms of mAP and Percentage on two real-world data sets. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
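Note: retrieval over sparse codesets, as in SM^2H, can be approximated by comparing the supports (nonzero atom indices) of two sparse codes. The plain Jaccard similarity below is a simplification of the paper's "sensitive Jaccard metric".

```python
import numpy as np

def jaccard_codeset(code_a, code_b, eps=1e-8):
    """Jaccard similarity between the supports of two sparse codes."""
    sa = set(np.flatnonzero(np.abs(code_a) > eps))
    sb = set(np.flatnonzero(np.abs(code_b) > eps))
    return len(sa & sb) / max(len(sa | sb), 1)

a = np.array([0.0, 0.7, 0.0, 0.2, 0.0])
b = np.array([0.0, 0.5, 0.1, 0.0, 0.0])
print(jaccard_codeset(a, b))  # 1 shared atom of 3 active -> 0.333...
```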
31. Sparse Unsupervised Dimensionality Reduction for Multiple View Data.
- Author
-
Han, Yahong, Wu, Fei, Tao, Dacheng, Shao, Jian, Zhuang, Yueting, and Jiang, Jianmin
- Subjects
- *SPARSE matrices, *DIMENSION reduction (Statistics), *PRINCIPAL components analysis, *FEATURE extraction, *RENDERING (Computer graphics), *COMPUTER algorithms
- Abstract
Different kinds of high-dimensional visual features can be extracted from a single image. Images can thus be treated as multiple view data when taking each type of extracted high-dimensional visual feature as a particular understanding of images. In this paper, we propose a framework of sparse unsupervised dimensionality reduction for multiple view data. The goal of our framework is to find a low-dimensional optimal consensus representation from multiple heterogeneous features by multiview learning. In this framework, we first learn low-dimensional patterns individually from each view, considering the specific statistical property of each view. We construct a low-dimensional optimal consensus representation from those learned patterns, the goal of which is to leverage the complementary nature of the multiple views. We formulate the construction of the low-dimensional consensus representation to approximate the matrix of patterns by means of a low-dimensional consensus base matrix and a loading matrix. To select the most discriminative features for the spectral embedding of multiple views, we propose to add an $\ell_1$-norm to the loading matrix's columns and impose orthogonal constraints on the base matrix. We develop a new alternating algorithm, i.e., spectral sparse multiview embedding, to efficiently obtain the solution. Each row of the loading matrix encodes structured information corresponding to multiple patterns. In order to gain flexibility in sharing information across subsets of the views, we impose a novel structured sparsity-inducing norm penalty on the loading matrix's rows. This penalty makes the loading coefficients adaptively load shared information across subsets of the learned patterns. We call this method structured sparse multiview dimensionality reduction. Experiments on a toy benchmark image data set and two real-world Web image data sets demonstrate the effectiveness of the proposed algorithms. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
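Note: the construction this abstract describes, approximating the matrix of per-view patterns with an orthogonal consensus base matrix and a sparse loading matrix, plausibly takes roughly the following form (a hedged reconstruction from the abstract's wording, not the paper's exact objective):

```latex
\min_{B,\,W}\ \lVert P - B W \rVert_F^2 + \lambda \sum_j \lVert w_j \rVert_1
\qquad \text{s.t.}\quad B^\top B = I
```

Here P stacks the low-dimensional patterns learned from each view, B is the consensus base matrix with orthogonal columns, the columns w_j of the loading matrix W carry the $\ell_1$ penalty, and $\lambda$ controls sparsity.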
32. Image Annotation by Input–Output Structural Grouping Sparsity.
- Author
-
Han, Yahong, Wu, Fei, Tian, Qi, and Zhuang, Yueting
- Subjects
- *THREE-dimensional imaging, *FEATURE extraction, *STATISTICAL correlation, *SEMANTICS, *FEATURE selection, *ANNOTATIONS, *IMAGE retrieval, *IMAGE processing
- Abstract
Automatic image annotation (AIA) is very important to image retrieval and image understanding. Two key issues in AIA are explored in detail in this paper, i.e., structured visual feature selection and the implementation of hierarchical correlated structures among multiple tags to boost the performance of image annotation. This paper simultaneously introduces input and output structural grouping sparsity into a regularized regression model for image annotation. For input high-dimensional heterogeneous features such as color, texture, and shape, different kinds (groups) of features have different intrinsic discriminative power for the recognition of certain concepts. The proposed structured feature selection by structural grouping sparsity can be used not only to select groups of features but also to conduct within-group selection. Hierarchical correlations among output labels are well represented by a tree structure, and therefore the proposed tree-structured grouping sparsity can be used to boost the performance of multitag image annotation. In order to efficiently solve the proposed regression model, we relax the solving process into a framework of bilayer regression for multilabel boosting by the selection of heterogeneous features with structural grouping sparsity (Bi-MtBGS). The first-layer regression selects the discriminative features for each label. The aim of the second-layer regression is to refine the feature selection model learned from the first layer, which can be taken as a multilabel boosting process. Extensive experiments on public benchmark image data sets and real-world image data sets demonstrate that the proposed approach achieves better multitag image annotation performance and leads to a quite interpretable model for image understanding. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
33. Combining 2D image and point cloud deep learning to predict wheat above ground biomass.
- Author
-
Zhu, Shaolong, Zhang, Weijun, Yang, Tianle, Wu, Fei, Jiang, Yihan, Yang, Guanshuo, Zain, Muhammad, Zhao, Yuanyuan, Yao, Zhaosheng, Liu, Tao, and Sun, Chengming
- Subjects
- *OPTICAL radar, *LIDAR, *NITROGEN fertilizers, *FEATURE extraction, *CULTIVARS
- Abstract
The use of unmanned aerial vehicle (UAV) data for predicting crop above-ground biomass (AGB) is becoming a more feasible alternative to destructive methods. However, canopy height, vegetation index (VI), and other traditional features can become saturated during the mid to late stages of crop growth, significantly impacting the accuracy of AGB prediction. In 2022 and 2023, UAV multispectral, RGB, and light detection and ranging point cloud data of wheat populations were collected at seven growth stages across two experimental fields. The point cloud depth features were extracted using an improved PointNet++ network, and AGB was predicted by fusing them with VI, color index (CI), and texture index (TI) raster image features. The findings indicate that when the point cloud depth features were fused, the R^2 values predicted from VI, CI, TI, and canopy height model images increased by 0.05, 0.08, 0.06, and 0.07, respectively. For the combination of VI, CI, and TI, R^2 increased from 0.86 to a maximum of 0.90, while the root-mean-square error (RMSE) and mean absolute error were 1.80 t ha^-1 and 1.36 t ha^-1, respectively. Additionally, the hybrid fusion exhibits the highest accuracy and demonstrates robust adaptability in predicting AGB across various years, growth stages, crop varieties, nitrogen fertilizer applications, and planting densities. This study effectively addresses the saturation of spectral and chemical information, provides valuable insights for high-precision phenotyping and advanced crop field management, and serves as a reference for studying other crops and phenotypic parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
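The feature-level fusion described in the abstract above lends itself to a short illustration. Below is a minimal, hypothetical PyTorch sketch, not the authors' improved PointNet++ pipeline: a pooled point-cloud depth feature is concatenated with plot-level VI/CI/TI statistics before a regression head. Every module name, layer size, and the feature count are illustrative assumptions.

```python
# Hedged sketch: PointNet-style depth features fused with raster index
# features (VI/CI/TI) for AGB regression. Not the published architecture.
import torch
import torch.nn as nn

class PointFeatureEncoder(nn.Module):
    """Shared per-point MLP followed by max pooling (PointNet-style)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, pts):          # pts: (B, N, 3) xyz points
        f = self.mlp(pts)            # (B, N, out_dim) per-point features
        return f.max(dim=1).values   # (B, out_dim) global depth feature

class FusionAGBRegressor(nn.Module):
    """Concatenates point-cloud and raster index features, regresses AGB."""
    def __init__(self, n_index_feats=24, point_dim=128):
        super().__init__()
        self.point_enc = PointFeatureEncoder(point_dim)
        self.head = nn.Sequential(
            nn.Linear(point_dim + n_index_feats, 64), nn.ReLU(),
            nn.Linear(64, 1),        # predicted AGB (t/ha)
        )

    def forward(self, pts, index_feats):
        fused = torch.cat([self.point_enc(pts), index_feats], dim=-1)
        return self.head(fused).squeeze(-1)

# Hypothetical usage on one mini-batch of field plots:
model = FusionAGBRegressor()
pts = torch.randn(4, 1024, 3)        # 4 plots, 1024 LiDAR points each
idx = torch.randn(4, 24)             # stacked VI/CI/TI plot statistics
print(model(pts, idx).shape)         # torch.Size([4])
```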
34. Vein Centerline Extraction of Visible Images Based on Tracking Method
- Author
-
Zhang, Yufeng, Tang, Chaoying, Yang, Jiarui, Wang, Biao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ma, Huimin, editor, Wang, Liang, editor, Zhang, Changshui, editor, Wu, Fei, editor, Tan, Tieniu, editor, Wang, Yaonan, editor, Lai, Jianhuang, editor, and Zhao, Yao, editor
- Published
- 2021
- Full Text
- View/download PDF
35. A Unified Modular Framework with Deep Graph Convolutional Networks for Multi-label Image Recognition
- Author
-
Lin, Qifan, Chen, Zhaoliang, Wang, Shiping, Guo, Wenzhong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ma, Huimin, editor, Wang, Liang, editor, Zhang, Changshui, editor, Wu, Fei, editor, Tan, Tieniu, editor, Wang, Yaonan, editor, Lai, Jianhuang, editor, and Zhao, Yao, editor
- Published
- 2021
- Full Text
- View/download PDF
36. Two-Stage Recognition Algorithm for Untrimmed Converter Steelmaking Flame Video
- Author
-
Chen, Yi, Liu, Jiyuan, Xiong, Huilin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ma, Huimin, editor, Wang, Liang, editor, Zhang, Changshui, editor, Wu, Fei, editor, Tan, Tieniu, editor, Wang, Yaonan, editor, Lai, Jianhuang, editor, and Zhao, Yao, editor
- Published
- 2021
- Full Text
- View/download PDF
37. Cascaded multi-3D-view fusion for 3D-oriented object detection.
- Author
-
Sun, Jing, Xu, Jing, Ji, Yi-Mu, Wu, Fei, and Sun, Yanfei
- Subjects
- *
OBJECT recognition (Computer vision) , *IMPLICIT learning , *GRAPHICAL projection , *POINT cloud , *FEATURE extraction - Abstract
Currently, multi-view fusion methods fuse point- or proposal-level features from different views only at the end stage of the backbone. Such once-at-the-end fusion is not conducive to the timely adjustment of spatial misalignment between features from different views; consequently, the discriminative depth and orientation details of 3D-oriented point cloud objects may be filtered out. To enhance the feature-capture capability of the network, we introduce a cascaded multi-3D-view fusion method (CM3DV) to learn the implicit representation of object orientation. In particular, the proposed CM3DV method incorporates the cylindrical front-view projection into a voxelised 3D bird's-eye-view representation in a cascaded manner, and vice versa. Through the learning of 3D-regulated instance representations, this bi-directional mutual fusion module, called the cascaded multi-view feature fusion module, alleviates the spatial misalignment between the two views. Furthermore, to learn rotation- and shape-invariant features of objects, the modulated rotation head (MRH) adopts a direction-guided adjustment instead of an axis-aligned structure to extract instance-consistent features. By excluding irrelevant content using the MRH, these instance-consistent features benefit object classification and orientation regression. Extensive experiments on the KITTI dataset show that the proposed method achieves a significant improvement over existing advanced methods, especially for orientation estimation. • A multi-3D-view fusion method for accurate 3D orientation detection is proposed. • A cascaded multi-view feature fusion module is proposed. • Multi-view features are fused in a cascaded bi-directional fusion manner. • A modulated rotation head to learn deformable features is proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
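The cascaded bi-directional fusion described in entry 37 can be caricatured in a few lines. The sketch below is a strongly simplified stand-in: the actual CM3DV aligns a cylindrical front view with a voxelised bird's-eye view via geometric projection, whereas here each view merely modulates the other with globally pooled context at every cascade stage. All names, shapes, and stage counts are assumptions.

```python
# Hedged sketch: cascaded bi-directional two-view fusion in the spirit of
# CM3DV. Geometric view alignment is replaced by global-context gating.
import torch
import torch.nn as nn

class BiDirectionalFusionStage(nn.Module):
    """One cascade stage: each view is modulated by the other's context."""
    def __init__(self, ch):
        super().__init__()
        self.bev_gate = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())
        self.fv_gate = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())
        self.bev_conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.fv_conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, bev, fv):          # bev: (B,C,H,W), fv: (B,C,H',W')
        bev_ctx = bev.mean(dim=(2, 3))   # (B, C) global BEV context
        fv_ctx = fv.mean(dim=(2, 3))     # (B, C) global front-view context
        # Cross-view gating: each branch is re-weighted by the other view.
        bev = self.bev_conv(bev * self.bev_gate(fv_ctx)[:, :, None, None])
        fv = self.fv_conv(fv * self.fv_gate(bev_ctx)[:, :, None, None])
        return torch.relu(bev), torch.relu(fv)

class CascadedFusion(nn.Module):
    """Stacks fusion stages so views exchange information repeatedly."""
    def __init__(self, ch=64, n_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            BiDirectionalFusionStage(ch) for _ in range(n_stages))

    def forward(self, bev, fv):
        for stage in self.stages:        # fusion repeats along the backbone,
            bev, fv = stage(bev, fv)     # not only once at the end
        return bev, fv
```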
38. Adversarial domain adaptation network with pseudo-siamese feature extractors for cross-bearing fault transfer diagnosis.
- Author
-
Yao, Qunwang, Qian, Quan, Qin, Yi, Guo, Liang, and Wu, Fei
- Subjects
- *
FEATURE extraction - Abstract
The traditional domain adaptation model uses a single (siamese) feature extractor to map the source-domain and target-domain data into a feature space simultaneously, but this may not be well suited to cross-machine feature mapping. To improve cross-bearing fault transfer diagnosis, an adversarial domain adaptation network with pseudo-siamese feature extractors (PSFEN) is proposed. The core idea is to construct a pair of feature extractors with the same structure but without shared parameters, forming a pair of pseudo-siamese feature extractors. When the source-domain data differ greatly from the target-domain data in cross-machine transfer diagnosis, the pseudo-siamese feature extractors extract the features of the source and target domains respectively, so that characteristics exclusive to each domain can be captured in addition to the common ones. It is shown theoretically that the distribution discrepancy obtained by the pseudo-siamese feature extractors can be closer to its actual upper limit; by reducing this more realistic supremum, domain adaptation can be better achieved, thus improving transfer diagnosis accuracy. Then, a maximum mean discrepancy distance metric and an unbalanced adversarial training algorithm are integrated to train the pseudo-siamese feature extractors and reduce the discrepancy between the source and target domains. The effectiveness of the proposed method is verified by experiments on six cross-bearing fault transfer diagnosis tasks. The comparative results show that the proposed method has much higher diagnostic accuracy than six classical models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
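Entry 38's pseudo-siamese idea, identical architectures with independent weights aligned by a discrepancy measure, is easy to sketch. The toy below pairs two unshared extractors with an RBF-kernel maximum mean discrepancy penalty; PSFEN's adversarial branch and unbalanced training schedule are omitted, and every name, dimension, and data tensor is hypothetical.

```python
# Hedged sketch: pseudo-siamese feature extractors (same architecture,
# unshared weights) trained with a biased RBF-kernel MMD^2 alignment term.
import torch
import torch.nn as nn

def make_extractor(in_dim=1024, feat_dim=128):
    """Same architecture for both domains; weights are NOT shared."""
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, feat_dim), nn.ReLU(),
    )

def rbf_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate with a Gaussian kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

src_enc = make_extractor()   # source-domain branch
tgt_enc = make_extractor()   # target-domain branch (own parameters)
clf = nn.Linear(128, 4)      # fault classifier on source features

x_src = torch.randn(32, 1024)          # vibration spectra (hypothetical)
x_tgt = torch.randn(32, 1024)
y_src = torch.randint(0, 4, (32,))     # source fault labels

f_src, f_tgt = src_enc(x_src), tgt_enc(x_tgt)
loss = nn.functional.cross_entropy(clf(f_src), y_src) + rbf_mmd(f_src, f_tgt)
loss.backward()              # both branches receive alignment gradients
```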