4,548 results
Search Results
2. Exploring global information for session-based recommendation
- Author
-
Wang, Ziyang, Wei, Wei, Zou, Ding, Liu, Yifan, Li, Xiao-Li, Mao, Xian-Ling, and Qiu, Minghui
- Published
- 2024
- Full Text
- View/download PDF
3. Automatically classifying non-functional requirements using deep neural network
- Author
-
Li, Bing and Nong, Xiuwen
- Published
- 2022
- Full Text
- View/download PDF
4. Imprecise Gaussian discriminant classification
- Author
-
Carranza Alarcón, Yonatan Carlos and Destercke, Sébastien
- Published
- 2021
- Full Text
- View/download PDF
5. AlignedReID++: Dynamically matching local information for person re-identification
- Author
-
Luo, Hao, Jiang, Wei, Zhang, Xuan, Fan, Xing, Qian, Jingjing, and Zhang, Chi
- Published
- 2019
- Full Text
- View/download PDF
6. Fast main density peak clustering within relevant regions via a robust decision graph
- Author
-
Guan, Junyi, Li, Sheng, Zhu, Jinhui, He, Xiongxiong, and Chen, Jiajia
- Published
- 2024
- Full Text
- View/download PDF
7. On the classification of dynamical data streams using novel “Anti-Bayesian” techniques
- Author
-
Hammer, Hugo Lewi, Yazidi, Anis, and Oommen, B. John
- Published
- 2018
- Full Text
- View/download PDF
8. Robust feature selection via central point link information and sparse latent representation.
- Author
-
Kong, Jiarui, Shang, Ronghua, Zhang, Weitong, Wang, Chao, and Xu, Songhua
- Subjects
- FEATURE selection; LAPLACIAN matrices; SPARSE matrices; DATA mining
- Abstract
• This paper proposes a novel unsupervised feature selection method called CPSLR. • CPSLR uses a specific formula to obtain the central point matrix, then constructs a link graph from the central matrix and the Laplacian matrix to retain the similarity between the data. • The link graph and the data graph form a dual graph structure, which not only preserves more complete data information but also holds on to the manifold structure of the data. • Feature selection is conducted in the latent representation space, and interconnection information among data is mined by latent representation learning to preserve the connections among the data itself. • CPSLR applies an l2,1/2-norm constraint on the feature transformation matrix to select robust, low-redundancy features. Before conducting unsupervised feature selection, it is usually assumed that data samples are independent of each other. In reality, however, samples influence each other, so traditional feature selection methods may lose this mutual information, which can lead to inaccurate pseudo-label information and poor feature selection results. To address this issue, this paper proposes robust feature selection via central point link information and sparse latent representation (CPSLR). Firstly, CPSLR structures a link graph by calculating the center matrix, which stores the distance information from each sample to the center point. If two samples have similar distances to the center point, they can be assumed to belong to the same class; thus the similarity between samples is preserved and more accurate pseudo-label information is obtained. Secondly, CPSLR uses the data graph and the link graph to form a dual graph structure, which retains both the link information between samples and the manifold structure of the samples. Then, CPSLR preserves the interconnection content between samples via sparse latent representation.
That is, an l2,1-norm constraint is exerted on the latent representation, preserving sparse, non-redundant interconnection information. Combining central point link information with sparse latent representation preserves the interconnections between data more comprehensively; that is to say, the pseudo-labels obtained are closer to the real class labels. Finally, CPSLR constrains the feature transformation matrix with an l2,1/2-norm constraint so as to select robust and sparse features. The l2,1/2-norm constraint ensures that the feature transformation matrix is sparse, selecting more discriminative features and improving feature selection efficiency. The experiments demonstrate that the clustering results of CPSLR outperform six classical or recent algorithms on eight datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
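The central-point link idea in the CPSLR abstract above can be illustrated with a small sketch. This is not the paper's exact formulation (the abstract does not give the formulas); it assumes the data center is the sample mean and that link strength is a Gaussian of the gap between two samples' distances to that center:

```python
import numpy as np

def central_point_link_graph(X, sigma=1.0):
    """Sketch of a central-point link graph: samples whose distances to the
    data center are similar receive a high link weight (assumed Gaussian)."""
    center = X.mean(axis=0)                  # central point of the data
    d = np.linalg.norm(X - center, axis=1)   # each sample's distance to center
    diff = np.abs(d[:, None] - d[None, :])   # pairwise distance-to-center gaps
    W = np.exp(-diff**2 / (2 * sigma**2))    # similar gaps -> strong link
    np.fill_diagonal(W, 0.0)
    return W

X = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
W = central_point_link_graph(X)
```

Samples equidistant from the center (rows 1 and 2 here) receive the strongest link, matching the abstract's claim that such samples likely share a class.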
9. UVMO: Deep unsupervised visual reconstruction-based multimodal-assisted odometry.
- Author
-
Han, Songrui, Li, Mingchi, Tang, Hongying, Song, Yaozhe, and Tong, Guanjun
- Subjects
- VISUAL odometry; DEEP learning; POSE estimation (Computer vision); MONOCULARS
- Abstract
In recent years, unsupervised visual odometry (VO) based on visual reconstruction has attracted much attention due to its end-to-end pose estimation approach and the advantage of not requiring real labels for training. Unsupervised VO inputs monocular video frames into a pose estimation network to output predicted poses, and optimizes the pose prediction by minimizing a visual reconstruction loss under an epipolar geometry constraint. However, a lack of depth information and complex environments such as rapid turns and uneven lighting in monocular video frames can result in insufficient visual information for pose estimation. Additionally, dynamic objects and discontinuous occlusions in monocular video frames can introduce inappropriate errors into visual reconstruction. In this paper, an Unsupervised Visual reconstruction-based Multimodal-assisted Odometry (UVMO) is proposed. UVMO leverages inertial and lidar information to complement visual information and acquire more accurate pose estimates. Specifically, a triple-modal fusion strategy called SMPF is proposed to conduct a more comprehensive and stable fusion of the three modalities' data. Additionally, an image-based mask is introduced to filter out dynamic occlusion regions in video frames, improving the accuracy of visual reconstruction. To the best of our knowledge, this paper is the first to propose a pure deep learning-based visual-inertial-lidar odometry. Experiments show that UVMO achieves state-of-the-art performance among pure deep learning-based unsupervised odometry methods. • This paper mainly focuses on learning-based odometry. • A triple-modal fusion strategy is proposed to improve the fusion effect. • An image-based mask is proposed to address the dynamic occlusion problem. • UVMO achieves state-of-the-art results among learning-based unsupervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Synthetic unknown class learning for learning unknowns.
- Author
-
Jang, Jaeyeon
- Subjects
- POSSIBILITY
- Abstract
This paper addresses the open set recognition (OSR) problem, where the goal is to correctly classify samples of known classes while detecting and rejecting unknown samples. In the OSR problem, "unknown" is assumed to have infinite possibilities because we have no knowledge about unknowns until they emerge. Intuitively, the more an OSR system explores the possibilities of unknowns, the more likely it is to detect them. Even though several generative OSR models have been proposed to explore this space by generating synthetic samples and learning them as unknowns, the generated samples are limited to a small subspace of the known classes. Thus, this paper proposes a novel synthetic unknown class learning method that constantly generates unknown-like samples while maintaining diversity between the generated samples. By learning the unknown-like samples and known samples in an alternating manner, the proposed method can not only experience diverse synthetic unknowns but also reduce overgeneralization with respect to known classes. Experiments on several benchmark datasets show that the proposed method significantly outperforms other state-of-the-art approaches by generating diverse, realistic unknown samples. • A novel generative open set recognition (OSR) model is developed. • The limitation of generative OSR models that generate limited samples is addressed. • A new learning technique generates realistic unknown-like samples and learns them. • Knowledge distillation is employed to reduce overgeneralization on known classes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
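At inference time, the OSR goal described above (classify known classes, reject unknowns) is often reduced to thresholding a confidence score. This baseline is only a point of reference, not the paper's generative method, and the threshold value is an assumption:

```python
import numpy as np

def open_set_predict(logits, threshold=0.9):
    """Baseline open-set rule: accept the argmax class only when softmax
    confidence exceeds the threshold; otherwise reject as 'unknown' (-1)."""
    z = logits - logits.max(axis=1, keepdims=True)      # numerically stable
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pred = p.argmax(axis=1)
    pred[p.max(axis=1) < threshold] = -1                # reject uncertain ones
    return pred

logits = np.array([[8.0, 0.0, 0.0],    # confident: kept as class 0
                   [1.0, 1.1, 1.0]])   # near-uniform: rejected as unknown
pred = open_set_predict(logits)
```

Generative OSR methods like the one above aim to push this boundary tighter by training on synthetic unknowns rather than relying on a fixed confidence cutoff.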
11. DynamicKD: An effective knowledge distillation via dynamic entropy correction-based distillation for gap optimizing.
- Author
-
Zhu, Songling, Shang, Ronghua, Yuan, Bo, Zhang, Weitong, Li, Wenjie, Li, Yangyang, and Jiao, Licheng
- Subjects
- DISTILLATION; KNOWLEDGE gap theory; ENTROPY; ENTROPY (Information theory)
- Abstract
Knowledge distillation uses a high-performance teacher network to guide the student network. However, the performance gap between the teacher and student networks can affect the student's training. This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction, which adjusts the student instead of the teacher to reduce the gap. Firstly, the effect of changing the output entropy (short for output information entropy) on the student's distillation loss is analyzed theoretically, showing that correcting the output entropy can reduce the gap. Then, a knowledge distillation algorithm based on dynamic entropy correction is created, which corrects the output entropy in real time with an entropy controller updated dynamically by the distillation loss. The proposed algorithm is validated on CIFAR100, ImageNet, and PASCAL VOC 2007. The comparison with various state-of-the-art distillation algorithms shows impressive results, especially in the CIFAR100 experiment with the teacher–student pair resnet32x4–resnet8x4: the proposed algorithm gains 2.64 points over the traditional distillation algorithm and 0.87 points over the state-of-the-art algorithm CRD in classification accuracy, demonstrating its effectiveness and efficiency. • This paper proposes a novel knowledge distillation algorithm called DynamicKD. • DynamicKD designs an entropy controller to reduce the distillation gap in real time. • DynamicKD uses dynamic entropy correction to reduce the learning difficulty. • DynamicKD uses a single entropy controller to aid the student's learning. • Experimental results show the effectiveness and efficiency of DynamicKD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
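The quantity the DynamicKD abstract above manipulates, the student's output information entropy, can be made concrete with temperature scaling. The controller update rule itself is not specified in the abstract, so this sketch only shows how a temperature-like knob moves the output entropy:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = np.asarray(z, float) / T
    z -= z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy of a probability vector (nats)."""
    return float(-(p * np.log(p + 1e-12)).sum())

logits = np.array([6.0, 1.0, 0.5])
# Raising the temperature softens the student's output and raises its entropy,
# the quantity an entropy controller could adjust to shrink the
# teacher-student gap.
h1 = entropy(softmax(logits, T=1.0))
h4 = entropy(softmax(logits, T=4.0))
```

The entropy is bounded above by log(num_classes), so a controller has a well-defined range to operate in.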
12. Deep and wide nonnegative matrix factorization with embedded regularization.
- Author
-
Moayed, Hojjat and Mansoori, Eghbal G.
- Subjects
- MATRIX decomposition; NONNEGATIVE matrices; PATTERN recognition systems; TIME complexity; COMPUTATIONAL complexity; NEURONS; DEEP learning; FEATURE extraction
- Abstract
• This paper proposes Deep and Wide Nonnegative Matrix Factorization with embedded regularization (DWNMF) as an end-to-end model. • It prevents overfitting via embedded regularization and mitigates vanishing gradients by training layers independently. • Model size can grow incrementally until the desired performance is reached, saving memory since only each layer's parameters are held at a time. • Experimental results showed that DWNMF performs better than end-to-end feature learning models in terms of complexity and CPU time. End-to-end learning is an advanced framework in deep learning. It combines feature extraction, followed by pattern recognition (classification, clustering, etc.), in a unified learning structure. However, these deep networks face several challenges such as overfitting, vanishing gradients, computational complexity, information loss across layers, and weak robustness to noisy data/features. To address these challenges, this paper presents Deep and Wide Nonnegative Matrix Factorization (DWNMF) with embedded regularization for the feature extraction stage of end-to-end models. DWNMF aims to identify more robust features while preventing overfitting via embedded regularization. For this purpose, DWNMF integrates input data with its noisy versions as diverse augmented channels. Then, the features in all channels are extracted in parallel using distinct network branches. The parameters of this model learn the intrinsic hierarchical features in the channels of complex data objects. Finally, the extracted features in the different channels are aggregated in a single feature space to perform the classification task. To embed regularization in the DWNMF model, some NMF neurons in the layers are substituted by random neurons to increase the stability and robustness of the extracted features.
Experimental results confirm that the DWNMF model extracts more robust features, prevents overfitting, and achieves better classification accuracy compared to state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
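The building block underlying the DWNMF abstract above is nonnegative matrix factorization. As a point of reference (not the paper's deep/wide architecture), a minimal NMF with the classic Lee-Seung multiplicative updates looks like:

```python
import numpy as np

def nmf(V, r, iters=500, eps=1e-9, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates: V ~= W @ H with
    nonnegative factors. A deep model stacks such factorizations layerwise."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

rng = np.random.default_rng(1)
V = rng.random((8, 4)) @ rng.random((4, 6))    # exactly nonneg rank-4 data
W, H = nmf(V, r=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative updates preserve nonnegativity by construction, which is why they remain the default inner solver when NMF layers are trained independently.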
13. Dual GroupGAN: An unsupervised four-competitor (2V2) approach for video anomaly detection.
- Author
-
Sun, Zhe, Wang, Panpan, Zheng, Wang, and Zhang, Meng
- Subjects
- ANOMALY detection (Computer security); INTRUSION detection systems (Computer security); VIDEOS; GLOBAL method of teaching
- Abstract
• We propose a dual GroupGAN network constructed as a SENet-based four-competitor (2V2), which leverages the predicted video frame as input to the reconstruction network to amplify the reconstruction error and improve the detection of anomalies. • The proposed approach can effectively enhance crucial spatial-temporal features present in video frames, thereby facilitating better preservation of normal patterns in memory. • The effectiveness of the proposed approach is demonstrated through extensive experiments on three standard public VAD datasets. In response to the issues of overgeneralization in reconstruction-based methods and noise sensitivity in prediction-based methods for video anomaly detection, this paper proposes a novel unsupervised video anomaly detection approach using a dual GroupGAN, referred to as a four-competitor (2V2) approach, based on a channel attention mechanism. Our approach incorporates a channel attention mechanism into two generators, namely SE-U-Net and SE-VAE, which respectively serve as the prediction and reconstruction networks. SE-U-Net captures essential spatio-temporal features and automatically calibrates the channel dimension, while SE-VAE learns global features from associated video frames. A weighting strategy is used to fuse the anomaly scores of the two networks and balance their emphasis on spatio-temporal feature representation. In summary, the proposed prediction network (SE-U-Net) is resistant to overgeneralization and improves the quality of the reconstruction network (SE-VAE) when the predicted frame is used as the input of SE-VAE. Also, SE-VAE enhances predicted future frames from normal events, thereby increasing the robustness of SE-U-Net. Experimental results on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets demonstrate the effectiveness of the proposed approach both qualitatively and quantitatively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
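The weighting strategy mentioned in the abstract above, fusing the anomaly scores of the prediction and reconstruction networks, can be sketched as follows. The min-max normalization and the weight value are assumptions, as the abstract does not give the exact scheme:

```python
import numpy as np

def anomaly_score(pred_err, recon_err, lam=0.6):
    """Fuse prediction-network and reconstruction-network errors into one
    per-frame anomaly score, after min-max normalizing each over the
    sequence (lam is an assumed balance weight)."""
    def norm(e):
        e = np.asarray(e, float)
        return (e - e.min()) / (e.max() - e.min() + 1e-12)
    return lam * norm(pred_err) + (1 - lam) * norm(recon_err)

pred_err  = [0.1, 0.1, 0.9, 0.1]   # frame 2 predicted poorly
recon_err = [0.2, 0.2, 0.8, 0.2]   # and reconstructed poorly
s = anomaly_score(pred_err, recon_err)
```

A frame flagged by both networks dominates the fused score, which is the point of feeding the predicted frame into the reconstruction network to amplify the error.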
14. Incremental convolutional transformer for baggage threat detection.
- Author
-
Hassan, Taimur, Hassan, Bilal, Owais, Muhammad, Velayudhan, Divya, Dias, Jorge, Ghazal, Mohammed, and Werghi, Naoufel
- Subjects
- DEEP learning; LUGGAGE; MACHINE learning
- Abstract
Detecting cluttered and overlapping contraband items from baggage scans is one of the most challenging tasks, even for human experts. Recently, considerable literature has grown up around the theme of deep learning-based X-ray screening for localizing contraband data. However, the existing threat detection systems are still vulnerable to high occlusion, clutter, and concealment. Furthermore, they require exhaustive training routines on large-scale and well-annotated data in order to produce accurate results. To overcome the above-mentioned limitations, this paper presents a novel convolutional transformer system that recognizes different overlapping instances of prohibited objects in complex baggage X-ray scans via a distillation-driven incremental instance segmentation scheme. Furthermore, unlike its competitors, the proposed framework allows an incremental integration of new item instances while avoiding costly training routines. In addition to this, the proposed framework also outperforms state-of-the-art approaches by achieving a mean average precision score of 0.7896, 0.5974, and 0.7569 on publicly available GDXray, SIXray, and OPIXray datasets for detecting concealed and cluttered baggage threats. • This paper presents a novel incremental convolutional transformer model. • A β hyperparameter is introduced in the paper to control catastrophic forgetting. • A unique segmentation scheme is proposed to extract cluttered object instances. • The proposed system is thoroughly tested on three public X-ray datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Cross co-teaching for semi-supervised medical image segmentation.
- Author
-
Zhang, Fan, Liu, Huiying, Wang, Jinjiang, Lyu, Jun, Cai, Qing, Li, Huafeng, Dong, Junyu, and Zhang, David
- Subjects
- DIAGNOSTIC imaging; TEACHING teams
- Abstract
Excellent performance has been achieved on semi-supervised medical image segmentation, but existing algorithms perform relatively poorly for objects with variable topologies and weak boundaries. In this paper, we propose a novel cross co-teaching framework, called Cross-structure-task Collaborative Teaching (CroCT), which not only effectively handles variable topologies but also strengthens learning for the weak boundaries of unlabeled data. Specifically, a new cross-structure-task collaborative teaching mechanism is developed based on our designed "E-Net" structure, composed of a shared encoder and two decoder branches with distinct learning paradigms, which asks these two branches to regress topology-aware signed distance functions and densely-predicted segmentation masks for each other. Powered by the collaboration across different structural biases and sequence-related tasks, our CroCT can extract more discriminative yet complementary representations from abundant raw medical data to promote consistency-learning generalization, further boosting performance in tackling highly diverse shapes and topological changes intra-/inter-slice. Besides, it guarantees diversity at multiple levels, i.e., from the structure and task perspectives, to reduce prediction uncertainty. In addition, a novel adaptive boundary enhancing (ABE) module is proposed to introduce compact, annularly enhanced boundary features into semi-supervised training, which significantly improves weak-boundary perception for unlabeled data while facilitating collaborative teaching for efficiently propagating complementary knowledge across the different branches. Extensive experiments on three challenging medical benchmarks, employing different labeled settings, demonstrate the superiority of our CroCT over recent state-of-the-art competitors. • A novel cross-structure-task collaborative teaching framework is presented.
• ABE facilitates the efficient collaboration and fusion of complementary knowledge. • Variable topologies and weak boundary issues in SSMIS are well solved in this paper. • Results on three challenging SSMIS benchmarks confirm the superiority of our CroCT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. SceneFake: An initial dataset and benchmarks for scene fake audio detection.
- Author
-
Yi, Jiangyan, Wang, Chenglong, Tao, Jianhua, Zhang, Chu Yuan, Fan, Cunhang, Tian, Zhengkun, Ma, Haoxin, and Fu, Ruibo
- Subjects
- SPEECH enhancement; SOURCE code; SIGNAL-to-noise ratio; PROSODIC analysis (Linguistics)
- Abstract
Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering the timbre, prosody, linguistic content, or channel noise of the original audio. These datasets leave out a scenario in which the acoustic scene of an original audio is manipulated with a forged one. Manipulated audio misused with malicious intent would pose a major threat to society, which motivates us to fill this gap. This paper proposes such a dataset for scene fake audio detection, named SceneFake, where a manipulated audio is generated by tampering only with the acoustic scene of a real utterance using speech enhancement technologies. Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper. In addition, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios is presented. The results indicate that scene fake utterances cannot be reliably detected by baseline models trained on the ASVspoof 2019 dataset. Although these models perform well on the SceneFake training set and seen testing set, their performance is poor on the unseen test set. The dataset (https://zenodo.org/record/7663324) and benchmark source code (https://github.com/ADDchallenge/SceneFake) are publicly available. • This paper proposes a new problem: scene fake audio detection. • This is the first attempt to pose such an audio fake attack using speech enhancement. • This paper designs a dataset and provides benchmarks for scene fake audio detection. • The dataset provides speech enhancement technology information for fake utterances. • The dataset and benchmark source codes are publicly available. • The dataset will further foster research on fake audio detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
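The core manipulation described in the SceneFake abstract above, swapping an utterance's acoustic scene, ends with remixing speech and a new scene at a controlled signal-to-noise ratio. This sketch assumes the speech has already been separated from its original scene (the step the paper performs with speech enhancement):

```python
import numpy as np

def mix_at_snr(speech, scene, snr_db):
    """Mix (enhanced) speech with a new acoustic-scene noise at a target SNR,
    scaling the scene so that 10*log10(P_speech / P_scene) equals snr_db."""
    ps = np.mean(speech**2)                          # speech power
    pn = np.mean(scene**2)                           # scene power
    scale = np.sqrt(ps / (pn * 10 ** (snr_db / 10)))
    return speech + scale * scene

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-ins for 1 s of 16 kHz audio
scene = rng.standard_normal(16000)
mixed = mix_at_snr(speech, scene, snr_db=10.0)
```

Varying `snr_db` reproduces the abstract's analysis axis: detectors can be probed with the same forged scene injected at different signal-to-noise ratios.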
17. Graph embedding orthogonal decomposition: A synchronous feature selection technique based on collaborative particle swarm optimization.
- Author
-
Zhong, Jingyu, Shang, Ronghua, Xu, Songhua, and Li, Yangyang
- Subjects
- ORTHOGONAL decompositions; FEATURE selection; PARTICLE swarm optimization
- Abstract
• This paper proposes a synchronous feature selection technique based on graph-embedded cluster label orthogonal decomposition and collaborative particle swarm optimization (GOD-cPSO). • GOD-cPSO extends the feature selection framework of clustering label orthogonal decomposition by graph embedding. • The l2,1-2-norm with strong global convergence is extended to the graph embedding clustering label orthogonal decomposition framework. • The local structure preservation of low-dimensional manifolds is integrated into the graph-embedded clustering label orthogonal decomposition framework. • GOD-cPSO synchronously guides the graph-embedded clustering label orthogonal decomposition framework for feature selection through collaborative particle swarm optimization. In unsupervised feature selection, the clustering label matrix has the ability to distinguish between projection clusters. However, the latent geometric structure of the clustering labels is often ignored. In addition, the optimal sub-feature selection performance of feature selection techniques relies greatly on the choice of balance parameters, and the selection range of most technical parameters is limited and fixed. To solve the above-mentioned problems, this paper proposes a synchronous feature selection technique based on graph-embedded cluster label orthogonal decomposition and collaborative particle swarm optimization (GOD-cPSO). First, GOD-cPSO extends the feature selection framework of clustering label orthogonal decomposition by graph embedding to retain the latent geometric structure of clustering labels, thus maintaining the correlation between clustered sample labels. Then, the l2,1-2-norm with strong global convergence is extended to the graph embedding clustering label orthogonal decomposition framework. By imposing this non-convex constraint, GOD-cPSO can achieve low-dimensional, sparse, and low-redundancy sub-features.
In addition, the local structure preservation of low-dimensional manifolds is integrated into the graph-embedded clustering label orthogonal decomposition framework to obtain good cluster separation and effectively maintain the latent local structure of the data. Finally, to ensure adaptive parameter selection over a large range, GOD-cPSO synchronously guides the graph-embedded clustering label orthogonal decomposition framework for feature selection through collaborative particle swarm optimization. GOD-cPSO performs parameter optimization and feature selection synchronously and selects parameters over a larger range. Comprehensive numerical experiments are performed on nine datasets to test the validity of GOD-cPSO. The experimental results demonstrate that the sub-features selected by GOD-cPSO have stronger discriminative power and are superior to other techniques in the clustering assignments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
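The collaborative particle swarm optimizer that GOD-cPSO uses for synchronous parameter selection builds on standard PSO. A minimal (non-collaborative) PSO sketch, with conventional inertia and acceleration coefficients as assumptions, looks like:

```python
import numpy as np

def pso(f, dim, n=20, iters=100, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle is pulled toward its
    own best position (c1 term) and the swarm's best (c2 term)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))           # positions
    v = np.zeros((n, dim))                      # velocities
    pbest = x.copy()
    pval = np.array([f(p) for p in x])          # personal best values
    g = pbest[pval.argmin()].copy()             # global best position
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([f(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()].copy()
    return g, float(pval.min())

sphere = lambda p: float(np.sum(p**2))          # toy objective
best, best_val = pso(sphere, dim=2)
```

In GOD-cPSO the objective would instead score a candidate set of balance parameters by the resulting feature-selection quality, letting the swarm search a wide parameter range.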
18. Multi-target label backdoor attacks on graph neural networks.
- Author
-
Wang, Kaiyang, Deng, Huaxin, Xu, Yijia, Liu, Zhonglin, and Fang, Yong
- Subjects
- GRAPH neural networks; POISONS
- Abstract
Graph neural networks have been shown to have characteristics that make them susceptible to backdoor attacks, and many recent works have proposed feasible graph backdoor attack methods. However, existing graph backdoor attack methods only target one-to-one attack types and cannot address one-to-many attack requirements. This paper is the first research work on one-to-many graph backdoor attacks and proposes the backdoor attack method MLGB, which can achieve multi-target label attacks for GNN node classification tasks. We design an encoding mechanism that allows MLGB to customize triggers for different target labels, and loss functions that ensure differentiation between the triggers of different target labels. Additionally, we design an innovative poisoned node selection method to further improve the efficiency of MLGB's attacks. Extensive experiments validate MLGB's effectiveness across multiple datasets and model architectures, demonstrating its robustness against graph backdoor attack defense mechanisms. Furthermore, ablation experiments and explainability analyses provide deeper insights into MLGB. Our work reveals that graph neural networks are also vulnerable to one-to-many backdoor attacks, which is important for practitioners to understand model risks comprehensively. • To our knowledge, this paper is the first work in the field of one-to-many backdoor attacks on graph neural networks. • We propose MLGB, a graph backdoor attack method that enables attackers to set multiple target labels simultaneously. • We design a poisoned node selection method to enhance the efficiency of graph backdoor attacks. • We design an encoding mechanism and loss functions tailored to multi-target requirements. • We perform large-scale experiments and comprehensively evaluate the effectiveness and stealthiness of MLGB. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. GhostFormer: Efficiently amalgamated CNN-transformer architecture for object detection
- Author
-
Xie, Xin, Wu, Dengquan, Xie, Mingye, and Li, Zixi
- Published
- 2024
- Full Text
- View/download PDF
20. Scalable and accurate subsequence transform for time series classification
- Author
-
Mbouopda, Michael Franklin and Mephu Nguifo, Engelbert
- Published
- 2024
- Full Text
- View/download PDF
21. GITGAN: Generative inter-subject transfer for EEG motor imagery analysis
- Author
-
Yin, Kang, Lim, Elissa Yanting, and Lee, Seong-Whan
- Published
- 2024
- Full Text
- View/download PDF
22. Federated learning for medical image analysis: A survey.
- Author
-
Guan, Hao, Yap, Pew-Thian, Bozoki, Andrea, and Liu, Mingxia
- Subjects
- FEDERATED learning; COMPUTER-assisted image analysis (Medicine); IMAGE analysis; MACHINE learning; DIAGNOSTIC imaging
- Abstract
Machine learning in medical imaging often faces a fundamental dilemma, namely, the small sample size problem. Many recent studies suggest using multi-domain data pooled from different acquisition sites/centers to improve statistical power. However, medical images from different sites cannot be easily shared to build large datasets for model training due to privacy protection reasons. As a promising solution, federated learning, which enables collaborative training of machine learning models based on data from different sites without cross-site data sharing, has attracted considerable attention recently. In this paper, we conduct a comprehensive survey of the recent development of federated learning methods in medical image analysis. We have systematically gathered research papers on federated learning and its applications in medical image analysis published between 2017 and 2023. Our search and compilation were conducted using databases from IEEE Xplore, ACM Digital Library, Science Direct, Springer Link, Web of Science, Google Scholar, and PubMed. In this survey, we first introduce the background of federated learning for dealing with privacy protection and collaborative learning issues. We then present a comprehensive review of recent advances in federated learning methods for medical image analysis. Specifically, existing methods are categorized based on three critical aspects of a federated learning system, including client end, server end, and communication techniques. In each category, we summarize the existing federated learning methods according to specific research problems in medical image analysis and also provide insights into the motivations of different approaches. In addition, we provide a review of existing benchmark medical imaging datasets and software platforms for current federated learning research. We also conduct an experimental study to empirically evaluate typical federated learning methods for medical image analysis. 
This survey can help to better understand the current research status, challenges, and potential research opportunities in this promising research field. • Summarize existing methods from a system perspective. • Introduce different methods in a "question–answer" paradigm. • Introduce software platforms and benchmark datasets. • Conduct an experimental study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
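The collaborative-training idea the survey above revolves around is easiest to see in FedAvg, the canonical federated aggregation rule: clients train locally and the server averages their parameters weighted by local data size. A minimal sketch (the client values and sizes are illustrative):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg server step: average client model parameters weighted by each
    client's local sample count, without ever pooling the raw data."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Three sites (e.g., hospitals) with different amounts of local data.
w1, w2, w3 = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])
global_w = fed_avg([w1, w2, w3], [100, 100, 200])
```

Much of the client-end and server-end literature the survey categorizes amounts to modifying this step: robust aggregation, personalization, or compressed communication of the `client_weights`.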
23. Rigid pairwise 3D point cloud registration: A survey.
- Author
-
Lyu, Mengjin, Yang, Jie, Qi, Zhiquan, Xu, Ruijie, and Liu, Jiabin
- Subjects
- POINT cloud; RECORDING & registration; RESEARCH personnel; BINOCULAR vision
- Abstract
Over the past years, 3D point cloud registration has attracted unprecedented attention. Researchers have developed various approaches to tackle this challenging task, such as optimization-based and deep learning-based methods. To systematically sort out the relevant literature and track state-of-the-art solutions, this paper conducts a thorough survey. We propose a novel taxonomy dubbed Intermediates Based Taxon (IBTaxon), which effectively categorizes the multifarious registration approaches by the introduced intermediate variables or the leveraged intermediate modules. We further delve into each of the categories and present a comprehensive technique review with a focus on the distinct insight behind each method. Besides, the relevant datasets and evaluation metrics are also combed and reorganized. We conclude our paper by discussing possible open research problems and presenting our visions for future research in the field of 3D point cloud registration. • A novel taxonomy dubbed IBTaxon is proposed, which categorizes registration methods as one-stage and two-stage approaches. • Following the IBTaxon, alternate and sequential optimization are identified as two strategies for two-stage approaches to achieve alignment. • Several widely used datasets and metrics for the standard evaluation and comparison of various registration methods are combed. • Experimental performances of several representative methods have been arranged and analyzed to provide a reference. • We conclude that balancing accuracy, speed, and robustness matters more than optimizing any single indicator. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
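Among the optimization-based approaches such a survey covers, the classic closed-form building block for rigid alignment with known correspondences is the SVD-based Kabsch/Procrustes solution. The sketch below is a minimal generic illustration of that step, not code from the paper:

```python
import numpy as np

def rigid_align(P, Q):
    """Closed-form rigid registration (Kabsch): find R, t minimizing ||R @ p_i + t - q_i||
    for corresponding points P, Q of shape (n, 3) with rows as points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

Given a noisy correspondence set, the same step is typically run inside an outlier-robust loop (e.g. RANSAC or ICP iterations).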
24. Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval.
- Author
-
Ji, Zhong, Lin, Zhigang, Wang, Haoran, Pang, Yanwei, and Li, Xuelong
- Subjects
- *
CONVOLUTIONAL neural networks , *MODAL logic , *SEMANTICS - Abstract
Bridging visual and textual representations plays a central role in multimedia data understanding. The main challenge arises from the fact that images and texts exist in heterogeneous spaces, making it difficult to preserve semantic consistency between the two modalities. To narrow the modality gap, most recent methods resort to extra object detectors or parsers to obtain hierarchical representations. In this work, we address this problem by introducing our Multi-Task Hierarchical Convolutional Neural Network (MT-HCN), which mines hierarchical semantic information without the aid of any extra supervision. Firstly, from the perspective of the representing architecture, we leverage the intrinsic hierarchical structure of Convolutional Neural Networks (CNNs) to decompose the representations of both modalities into two semantically complementary levels, i.e., exterior representations and concept representations. The former focuses on discovering fine-grained low-level associations between the modalities, while the latter captures more high-level abstract semantics. Specifically, we present a Self-Supervised Clustering (SSC) loss to preserve more fine-grained semantic clues in the exterior representations; it is built by viewing multiple image/text pairs with similar exteriors as one category. In addition, a novel harmonious bidirectional triplet ranking (HBTR) loss is proposed, which mitigates the adverse effects of biased and noisy negative samples. Besides the hardest negatives, it also imposes constraints on the distance between the positive pairs and the centroid of the negative pairs. Extensive experimental results on two popular cross-modal retrieval benchmarks demonstrate that the proposed MT-HCN achieves competitive results compared with state-of-the-art methods. 
• This paper proposes a novel Multi-Task Hierarchical Convolutional Network (MT-HCN) for visual-semantic cross-modal retrieval, characterized by adopting a classification task to improve hierarchical multi-modal representation learning. • This paper proposes a novel Self-Supervised Clustering (SSC) loss to learn exterior representations that fully exploit low-level fine-grained correlations for associating images and texts. • This paper presents an effective bidirectional ranking loss, namely Harmonious Bidirectional Ranking (HBR), for cross-modal correlation preserving. It not only efficiently helps seek out more representative hard negative samples, but also leverages the category center of negatives to enhance the robustness of cross-modal representations. • Extensive experiments on two benchmark datasets validate the superiority of the proposed model in comparison to state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
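The HBTR loss described above extends the familiar hardest-negative bidirectional triplet ranking loss. As a rough illustration of that base formulation only (the centroid-of-negatives term is omitted, and the function name is ours, not the paper's), a NumPy sketch might look like:

```python
import numpy as np

def bidirectional_triplet_loss(img, txt, margin=0.2):
    """Hardest-negative bidirectional triplet ranking loss.
    img, txt: (n, d) embeddings where row i of each is a matched pair."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    S = img @ txt.T                                   # cosine similarities; S[i, i] are positives
    pos = np.diag(S)
    mask = np.eye(len(S), dtype=bool)
    neg_i2t = np.where(mask, -np.inf, S).max(axis=1)  # hardest text negative per image
    neg_t2i = np.where(mask, -np.inf, S).max(axis=0)  # hardest image negative per text
    loss = np.maximum(0, margin - pos + neg_i2t) + np.maximum(0, margin - pos + neg_t2i)
    return loss.mean()
```

With orthogonal positives the loss is zero; when all embeddings collapse to one point it approaches twice the margin.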
25. Survey of spectral clustering based on graph theory.
- Author
-
Ding, Ling, Li, Chao, Jin, Di, and Ding, Shifei
- Subjects
- *
GRAPH theory , *LAPLACIAN matrices , *TIME complexity , *CUTTING stock problem , *EIGENVECTORS - Abstract
• This paper introduces the basic concepts of graph theory, and reviews the properties of the Laplacian matrix and the traditional graph-cut methods. Starting from four aspects of the realization process of spectral clustering (construction of the similarity matrix, establishment of the Laplacian matrix, selection of eigenvectors, and determination of the number of clusters), we summarize in detail some representative algorithms of recent years. • Some successful applications of spectral clustering are summarized. For each aspect, the shortcomings of spectral clustering and some representative improved algorithms are analyzed in depth. • This paper comprehensively analyzes some research on spectral clustering that has not yet been explored in depth, and offers prospects for valuable research directions. Spectral clustering converts the data clustering problem into a graph-cut problem and is grounded in graph theory. Owing to its reliable theoretical basis and good clustering performance, spectral clustering has been successfully applied in many fields. Despite these advantages, it faces the challenge of high time and space complexity when dealing with large-scale complex data. Firstly, this paper introduces the basic concepts of graph theory and reviews the properties of the Laplacian matrix and the traditional graph-cut methods. Then, it focuses on four aspects of the realization process of spectral clustering: the construction of the similarity matrix, the establishment of the Laplacian matrix, the selection of eigenvectors, and the determination of the number of clusters. In addition, some successful applications of spectral clustering are summarized. For each aspect, the shortcomings of spectral clustering and some representative improved algorithms are analyzed in depth. Finally, the paper comprehensively analyzes some research on spectral clustering that has not yet been explored in depth, and offers prospects for valuable research directions. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Certainty weighted voting-based noise correction for crowdsourcing.
- Author
-
Li, Huiru, Jiang, Liangxiao, and Li, Chaoqun
- Subjects
- *
CROWDSOURCING , *INDIVIDUALS' preferences , *NOISE , *CERTAINTY , *POPULAR music genres - Abstract
In crowdsourcing scenarios, we can obtain a multiple noisy label set for each instance from different workers and then use a ground truth inference algorithm to infer its integrated label. Despite the effectiveness of ground truth inference algorithms, a certain level of noise remains in the integrated labels. To reduce the impact of this noise, many noise correction algorithms have been proposed in recent years. To the best of our knowledge, almost all of these algorithms assume that workers have the same labeling certainty across different classes and instances. However, this is rarely true in reality, owing to differences in workers' individual preferences and cognitive abilities. In this paper, we argue that the labeling certainty of a worker should be class-dependent and instance-dependent. Based on this premise, we propose a certainty weighted voting-based noise correction (CWVNC) algorithm. First, we use the consistency between worker-provided labels and integrated labels on different classes to estimate the class-dependent certainty. Then, we train a probability-based classifier on the instances labeled by each worker separately and use it to estimate the instance-dependent certainty. Finally, we correct the integrated label of each instance by weighted voting based on the class-dependent and instance-dependent certainties. In our experiments, the average noise ratio of CWVNC is 15.08% on 34 simulated datasets, and 25.77% and 26.94% on the two real-world datasets "Income" and "Music_genre", respectively. The results show that CWVNC significantly outperforms all other state-of-the-art noise correction algorithms used for comparison. • Crowdsourcing provides an effective way to collect labels from crowd workers. • Noise correction algorithms have been proposed to reduce the noise. • Existing algorithms assume that workers have the same labeling certainty. 
• This paper proposes a certainty weighted voting-based noise correction algorithm. • Extensive experiments validate the effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
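As a rough, hypothetical sketch of the class-dependent half of such a scheme (the classifier-based instance-dependent certainty is omitted, and all names here are ours, not the paper's):

```python
from collections import defaultdict

def class_certainty(worker_labels, integrated):
    """Per-(worker, class) certainty: how often the worker agrees with the
    integrated label on instances of that class.
    worker_labels: list of dicts {worker: label}, one per instance."""
    agree, total = defaultdict(int), defaultdict(int)
    for i, y in enumerate(integrated):
        for w, lab in worker_labels[i].items():
            total[(w, y)] += 1
            agree[(w, y)] += (lab == y)
    return {k: agree[k] / total[k] for k in total}

def corrected_label(votes, certainty, classes):
    """Weighted vote: a worker's vote for class c counts with that worker's certainty on c;
    unseen (worker, class) pairs fall back to a neutral 0.5 weight."""
    score = {c: sum(certainty.get((w, c), 0.5) * (lab == c) for w, lab in votes.items())
             for c in classes}
    return max(score, key=score.get)
```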
27. HDR light field imaging of dynamic scenes: A learning-based method and a benchmark dataset.
- Author
-
Chen, Yeyao, Jiang, Gangyi, Yu, Mei, Jin, Chongchong, Xu, Haiyong, and Ho, Yo-Sung
- Subjects
- *
HIGH dynamic range imaging , *FEATURE extraction , *IMAGE sensors , *POTENTIAL well - Abstract
• A novel learning-based method is proposed for ghost-free high dynamic range (HDR) light field imaging. • A multi-scale architecture integrating a deformable alignment module and an angular embedding module is designed. • A new large-scale benchmark dataset is established to serve the HDR light field imaging task for dynamic scenes. • The proposed method achieves superior spatial quality and preserves accurate angular consistency. Light field (LF) imaging is an effective way to enable immersive applications. However, limited by the potential well capacity of the image sensor, acquired LF images suffer from a low dynamic range and are thus prone to under- or over-exposure. High dynamic range (HDR) LF imaging is an efficacious avenue to improve the dynamic range of LF imaging. Unfortunately, for dynamic scenes, existing methods are inclined to produce ghosting artifacts and lose details in saturated regions, while potentially damaging the parallax structure of the generated HDR LF images. To address these challenges, in this paper we propose a new ghost-free HDR LF imaging method based on a deformable aggregation and angular embedding network. Specifically, considering the four-dimensional geometric structure of the LF image, a deformable alignment module is first designed to handle dynamic regions in the spatial domain, and the aligned spatial features are then fully fused through an aggregation operation. Subsequently, an angular embedding module is constructed to exploit angular information and enhance the aggregated spatial features. The two modules are cascaded in a multi-scale manner to achieve multi-level feature extraction and enhance the feature representation ability. Finally, a decoder is leveraged to recover the ghost-free HDR LF image from the enhanced multi-scale features. For performance evaluation, this paper establishes a large-scale benchmark dataset with multi-exposure inputs and ground truth images. 
Extensive experimental results show that the proposed method generates visually pleasing HDR LF images while preserving accurate angular consistency. Moreover, the proposed method surpasses state-of-the-art methods in both quantitative and qualitative comparisons. The code and dataset will be available at https://github.com/YeyaoChen/HDRLFI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Fusion-competition framework of local topology and global texture for head pose estimation.
- Author
-
Ma, Dongsheng, Fu, Tianyu, Yang, Yifei, Cao, Kaibin, Fan, Jingfan, Xiao, Deqiang, Song, Hong, Gu, Ying, and Yang, Jian
- Subjects
- *
POINT cloud , *SUBSPACES (Mathematics) - Abstract
• The proposed method combines heterogeneous data to fully utilize the texture information of the RGB image and the geometric information of the point cloud. Compared with a depth image, the point cloud carries stronger topological features, which can be learned together with texture features for accurate and robust head pose estimation. • The proposed framework achieves feature fusion at the texture-topology level and generates feature competition among local regions. This fusion-competition framework enhances the expression of features of different categories at different levels, decreasing the estimation error and increasing stability. • This paper constructed an RGB-Depth dataset using HoloLens2 for training and testing in head pose estimation. The dataset contains abundant head pose samples, comprising 24 sessions with 12K frames from 21 males and 1 female, and the ground-truth pose in each frame is labeled by an accurate tracking device attached to the head. RGB images and point clouds carry texture and geometric structure, respectively, and are widely used for head pose estimation. However, images lack spatial information, and the quality of point clouds is easily affected by sensor noise. In this paper, a novel fusion-competition framework (FCF) is proposed to overcome the limitations of a single modality. Global texture information is extracted from the image and local topology information from the point cloud, projecting the heterogeneous data into a common feature subspace. The projected texture feature, weighted by a channel attention mechanism, is embedded into each local point cloud region with different topological features for fusion. A scoring mechanism creates competition among the regions with local-global fused features to predict the final pose with the highest score. 
According to the evaluation results on the public and our constructed datasets, the FCF improves estimation accuracy and stability by an average of 13.6% and 12.7%, respectively, compared with nine state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Multi-scale hypergraph-based feature alignment network for cell localization.
- Author
-
Li, Bo, Zhang, Yong, Zhang, Chengyang, Piao, Xinglin, Hu, Yongli, and Yin, Baocai
- Subjects
- *
CELL imaging , *GRAPH algorithms , *IMAGE analysis , *CELL morphology , *MULTISCALE modeling , *BIOLOGICAL networks , *HYPERGRAPHS - Abstract
Cell localization in medical image analysis is a challenging task due to significant variations in cell shape, size, and color. Existing localization methods tackle these challenges separately and frequently face complications when these difficulties intersect, adversely impacting model performance. In this paper, these challenges are first reframed as issues of feature misalignment between cell images and location maps, and are then addressed collectively. Specifically, we propose a feature alignment model based on a multi-scale hypergraph attention network. The model treats local regions in the feature map as nodes and utilizes a learnable similarity metric to construct hypergraphs at various scales. We then utilize a hypergraph convolutional network to aggregate the features associated with the nodes and achieve feature alignment between the cell images and location maps. Furthermore, we introduce a stepwise adaptive fusion module to fuse features at different levels effectively and adaptively. Comprehensive experimental results demonstrate the effectiveness of the proposed multi-scale hypergraph attention module in addressing feature misalignment, and our model achieves state-of-the-art performance across various cell localization datasets. • This paper innovatively addresses the challenges stemming from significant variations in cell shape, scale, and color by reframing them as a feature misalignment problem between cell images and location maps, thereby presenting a unified solution to these complexities. • We propose an innovative multi-scale hypergraph attention module that achieves feature alignment through the adaptive aggregation of features across various scale ranges. • The proposed model achieves state-of-the-art performance on multiple cell localization datasets and reveals great potential. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. PDRLRR: A novel low-rank representation with projection distance regularization via manifold optimization for clustering.
- Author
-
Chen, Haoran, Chen, Xu, Tao, Hongwei, Li, Zuhe, and Wang, Boyue
- Subjects
- *
MACHINE learning , *DATA reduction - Abstract
The low-rank representation (LRR) method has attracted widespread attention due to its excellent performance in pattern recognition and machine learning. LRR-based variants have been proposed to address three existing problems in LRR: (1) the projection matrix is permanently fixed when dimensionality reduction techniques are adopted; (2) LRR fails to capture the local geometric structure; and (3) the solution deviates from the real low-rank solution. To address these problems, this paper proposes a low-rank representation with projection distance regularization (PDRLRR) via manifold optimization for clustering. In detail, we first introduce a low-dimensional projection matrix and a projection distance regularization term to fit the projected data automatically and capture the local structure of the data, respectively. Consequently, the projection matrix and representation matrix are obtained jointly. Then, we obtain a more accurate low-rank solution by minimizing the Schatten-p norm instead of the nuclear norm. Next, the projection matrix is optimized over a generalized Stiefel manifold. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art methods. • This paper proposes a novel PDRLRR model that simultaneously addresses the three common problems in LRR. • Data dimensionality reduction is integrated into LRR, reducing the data dimensions while learning the representation matrices. • To extract complete information, the projection distance regularization term is introduced to capture the global and local structure of the data. • The Schatten-p norm instead of the nuclear norm is employed to solve the rank minimization problem of the representation matrix, which can more accurately approximate the real low-rank solution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
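The Schatten-p norm that PDRLRR minimizes is defined on the singular values of the matrix; for p = 1 it reduces to the nuclear norm and for p = 2 to the Frobenius norm, while p < 1 gives a tighter (non-convex) surrogate of the rank. A minimal sketch of the quantity itself:

```python
import numpy as np

def schatten_p_norm(M, p):
    """Schatten-p norm: (sum_i sigma_i^p)^(1/p) over the singular values of M.
    p=1 is the nuclear norm, p=2 the Frobenius norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)
```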
31. SED: Searching Enhanced Decoder with switchable skip connection for semantic segmentation.
- Author
-
Zhang, Xian, Quan, Zhibin, Li, Qiang, Zhu, Dejun, and Yang, Wankou
- Subjects
- *
IMAGE segmentation , *OCCUPATIONAL retraining , *SPINE , *TEMPORAL lobe - Abstract
Neural architecture search (NAS) has shown excellent performance. However, existing semantic segmentation models rely heavily on pre-training on ImageNet or COCO and mainly focus on the design of decoders. Directly training encoder–decoder architecture search models from scratch to state-of-the-art performance for semantic segmentation can require thousands of GPU days, which greatly limits the application of NAS. To address this issue, we propose a novel neural architecture Search framework for an Enhanced Decoder (SED). Utilizing a pre-trained, hand-designed backbone and a search space composed of light-weight cells, SED searches for a decoder that performs high-quality segmentation. Furthermore, we attach switchable skip connection operations to the search space, expanding the diversity of possible network structures. The parameters of the backbone and the operations selected in the searching phase are copied to the retraining process. As a result, searching, pruning, and retraining can be done in just one day. The experimental results show that the proposed SED needs only 1/4 of the parameters and computation of a hand-designed decoder, and obtains higher segmentation accuracy on Cityscapes. Transferring the same decoder architecture to other datasets, such as Pascal VOC 2012, CamVid, and ADE20K, proves the robustness of SED. • For the task of image semantic segmentation, we propose SED, a gradient-based, pre-trainable neural network architecture search framework that simultaneously considers decoder and skip connection search. Our method maximizes the advantages of NAS and a pre-trained backbone. • SED compresses the retraining iterations to several thousand. The whole searching, pruning, and retraining process can be compressed into one day. Furthermore, after searching on Cityscapes, the searched network architecture achieves 80.2% mIoU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Robust multi-scale weighting-based edge-smoothing filter for single image dehazing.
- Author
-
Yadav, Sumit Kr. and Sarawadekar, Kishor
- Subjects
- *
COST functions , *REGULARIZATION parameter , *SMOOTHING (Numerical analysis) , *QUANTITATIVE research , *HAZE - Abstract
The guided image filter (GIF) and weighted guided image filter (WGIF) are good edge-preserving filters based on a local linear model. However, due to their fixed regularization parameter, they suffer from halo artifacts (morphological artifacts) in sharp regions. To overcome this issue, a robust multi-scale weighting-based edge-smoothing filter (RMWEF) for single image dehazing is proposed in this paper. It strongly suppresses morphological artifacts and over-smoothing while precisely preserving edge information in both flat and sharp regions. The proposed dehazing method has four steps. First, the initial transmission map and atmospheric map are estimated using a novel dark channel prior (DCP) method. Then, the morphological artifacts of the initial transmission map are reduced by the non-local haze line averaging (NL-HLA) method. In the third step, the transmission map is refined by the proposed RMWEF. Finally, the haze-free image is restored. Theoretical and experimental analysis proves that the proposed algorithm produces effective dehazing results faster than existing methods. • A robust multi-scale weighting-based edge-smoothing filter (RMWEF) is proposed in this paper. • The value of the cost function a_{x′,y′} must vary in the range 0 to 1 depending on the edge-aware smoothing parameter γ_{x′,y′}. The mathematical formulation presented in this paper shows that the proposed filter maintains this relationship, as expected. • This article presents a quantitative analysis for three different values of the regularization parameter ϵ, viz. 0.01², 0.1², and 1², showing the trade-off between the regularization parameter and morphological artifacts. • The proposed RMWEF strongly removes morphological artifacts and over-smoothing effects in fine structures and preserves details in such areas very well by choosing a large window radius ζ₁ = 60. 
• The proposed method is tested on 6,618 images from different datasets and the results are compared with 9 existing methods. The experimental results show that it is independent of the nature of the images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
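The dark channel prior step in such pipelines takes a per-pixel minimum over color channels followed by a window minimum, and derives an initial transmission estimate from it. Below is a slow but straightforward NumPy sketch of the generic DCP estimate (not the paper's novel variant; the atmospheric light A is a scalar here for simplicity, though per-channel values are common in practice):

```python
import numpy as np

def dark_channel(img, radius=7):
    """Dark channel: per-pixel min over color channels, then a min filter
    over a (2*radius+1)^2 window. img: (H, W, 3) float array."""
    mins = img.min(axis=2)
    H, W = mins.shape
    padded = np.pad(mins, radius, mode='edge')
    out = np.empty_like(mins)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1].min()
    return out

def transmission(img, A, omega=0.95, radius=7):
    """Initial transmission estimate from the DCP: t = 1 - omega * dark(I / A).
    A: atmospheric light (scalar here; per-channel in practice)."""
    return 1.0 - omega * dark_channel(img / A, radius)
```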
33. Deep federated learning hybrid optimization model based on encrypted aligned data.
- Author
-
Zhao, Zhongnan, Liang, Xiaoliang, Huang, Hai, and Wang, Kun
- Subjects
- *
DEEP learning , *FEDERATED learning , *BLENDED learning , *GAUSSIAN mixture models , *SEARCH algorithms , *FEATURE extraction , *RECEIVER operating characteristic curves - Abstract
• Improving the quality of federated learning encrypted aligned data. • Using Gaussian mixture clustering to cluster samples and a threshold to filter them. • Using an encrypted sample attribute searching algorithm to fill in missing sample values. • Designing a model combining a variational autoencoder, Gaussian mixture clustering, and federated learning. Federated learning can achieve multi-party data-collaborative applications while safeguarding personal privacy. However, the process often suffers from degraded sample quality due to a substantial amount of missing encrypted aligned data, and there is a lack of research on improving model learning by increasing the number of encrypted aligned samples in federated learning. Therefore, this paper integrates the functional characteristics of deep learning models and proposes a Variational AutoEncoder Gaussian Mixture Model Clustering Vertical Federated Learning Model (VAEGMMC-VFL), which leverages the feature extraction capability of the autoencoder and the clustering and pattern discovery capabilities of Gaussian mixture clustering on diverse datasets to uncover a large number of potentially usable samples. Firstly, the variational autoencoder is used to achieve dimensionality reduction and sample feature reconstruction for high-dimensional data samples. Subsequently, Gaussian mixture clustering is employed to partition the dataset into multiple potential Gaussian-distributed clusters, and the sample data are filtered by thresholding. Additionally, the paper introduces a labeled sample attribute value finding algorithm to fill in attribute values for encrypted unaligned samples that meet the requirements, allowing the full recovery of encrypted unaligned data. 
In the experimental section, the paper selects four datasets from different industries and compares the proposed method with three federated learning clustering methods in terms of clustering loss, reconstruction loss, and other metrics. Tests on precision, accuracy, recall, ROC curve, and F1-score indicate that the proposed method outperforms similar approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Dual subspace manifold learning based on GCN for intensity-invariant facial expression recognition.
- Author
-
Chen, Jingying, Shi, Jinxin, and Xu, Ruyi
- Subjects
- *
FACIAL expression , *EMOTIONS , *SUPERVISED learning , *COMPUTER vision - Abstract
Facial expression recognition (FER) is one of the most important computer vision tasks for understanding human inner emotions. However, the poor generalization ability of FER models limits their applicability due to tremendous intra-class variation. Especially for expressions of varying intensities, the appearance differences among weak expressions are subtle, which makes FER tasks challenging. In response to these issues, this paper presents a dual subspace manifold learning method based on a graph convolutional network (GCN) for intensity-invariant FER. Our method treats the target task as a node classification problem and learns the manifold representation using two subspace analysis methods: locality preserving projection (LPP) and peak-piloted locality preserving projection (PLPP). Inspired by the classic LPP, which maintains local similarity among data, this paper introduces a novel PLPP that maintains the locality between peak and non-peak expressions to enhance the representation of weak expressions. This paper also reports two subspace fusion methods, one based on a weighted adjacency matrix and another on a self-attention mechanism, that combine the LPP and PLPP to further improve FER performance. The second method achieves recognition accuracies of 93.83% on CK+, 74.86% on Oulu-CASIA, and 75.37% on MMI for weak expressions, outperforming state-of-the-art methods. • A semi-supervised learning framework based on a GCN for intensity-invariant FER tasks. • A novel PLPP method to keep the locality between peak and non-peak expressions. • Two different subspace fusion methods that combine the LPP and PLPP results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
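Classic LPP, which the method builds on, solves the generalized eigenproblem XLX^T a = λ XDX^T a (L the graph Laplacian of the affinity matrix W, D its degree matrix) and keeps the eigenvectors of the smallest eigenvalues. A NumPy sketch under the usual conventions (samples as columns; a small ridge added so the right-hand matrix is invertible):

```python
import numpy as np

def lpp(X, W, dim):
    """Locality Preserving Projection.
    X: (d, n) data with samples as columns; W: (n, n) symmetric affinity matrix.
    Returns a (d, dim) projection matrix P; project with Y = P.T @ X."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-8 * np.eye(X.shape[0])    # small ridge keeps B invertible
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))  # eigenpairs of B^{-1} A
    order = np.argsort(vals.real)
    return vecs[:, order[:dim]].real               # eigenvectors of the smallest eigenvalues
```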
35. Linearized alternating direction method of multipliers for elastic-net support vector machines.
- Author
-
Liang, Rongmei, Wu, Xiaofei, and Zhang, Zhimin
- Subjects
- *
SUPPORT vector machines , *PETRI nets , *MULTIPLIERS (Mathematical analysis) , *CONVEX functions - Abstract
In many high-dimensional datasets, features are often highly correlated. Elastic-net regularization is widely used in support vector machines (SVMs) because it can automatically perform feature selection and encourages highly correlated features to be selected or removed together. Recently, some effective algorithms have been proposed to solve elastic-net SVMs with different convex loss functions, such as the hinge, squared hinge, huberized hinge, pinball, and huberized pinball losses. In this paper, we develop a linearized alternating direction method of multipliers (LADMM) algorithm to solve the above elastic-net SVMs. In addition, our algorithm can be applied to some new elastic-net SVMs, such as the elastic-net least squares SVM. Compared with existing algorithms, ours has comparable or better performance in terms of computational cost and accuracy. Under mild conditions, we prove the convergence and derive the convergence rate of our algorithm. Furthermore, numerical experiments on synthetic and real datasets demonstrate the feasibility and validity of the proposed algorithm. • We develop a linearized ADMM algorithm for effectively solving a series of elastic-net SVMs. • The proposed algorithm is accompanied by convergence and complexity analysis. • A new elastic-net SVM called LESVM is proposed in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
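The elastic-net SVM objective combines a loss (here the hinge) with an l1 and a squared l2 penalty. The paper solves it with LADMM; the sketch below instead uses plain subgradient descent, which is far simpler but illustrates the same objective (hyperparameter values are illustrative only):

```python
import numpy as np

def elastic_net_svm(X, y, lam1=0.01, lam2=0.01, lr=0.1, epochs=200):
    """Subgradient descent on mean hinge(y * (X @ w + b)) + lam1*||w||_1 + (lam2/2)*||w||_2^2.
    X: (n, d); labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b)
        active = margin < 1                        # samples violating the margin
        gw = -(X[active] * y[active][:, None]).sum(axis=0) / n + lam1 * np.sign(w) + lam2 * w
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

The l1 term drives small weights to (near) zero while the l2 term spreads weight across correlated features, which is the grouping effect the abstract refers to.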
36. Disturbance rejection with compensation on features.
- Author
-
Hu, Xiaobo, Su, Jianbo, and Zhang, Jun
- Subjects
- *
CONVOLUTIONAL neural networks , *FEATURE extraction , *TRANSFORMER models - Abstract
In pattern recognition tasks, the information from the system input is modeled through a series of nonlinear operations, including but not limited to feature extraction, regression, and classification. Both theoretically and practically, these operations are inevitably subject to internal modeling error and external disturbance, resulting in a performance challenge. State-of-the-art methods, e.g. the Convolutional Neural Network and the Transformer, still display significant instabilities and failures in practical applications, and thus a lack of generalization. Consequently, more robust pattern recognition methods and related theories still merit further study. This paper first reviews the state-of-the-art technologies in the field. The performance bottleneck of the latest research is associated with a lack of disturbance estimation and corresponding compensation. Therefore, the implications of disturbance rejection for pattern recognition are further discussed from a control point of view. Then, the open problems are summarized. Finally, a discussion of potential solutions, related to the application of compensation on features, is given to highlight future study. Through the systematic review in this paper, disturbance rejection in pattern recognition is developed into a control problem. Hopefully, more effective control technologies for compensation on features can be used to improve the robustness of pattern recognition both theoretically and practically. • The pattern recognition process is inevitably subject to internal modeling error and external disturbance. • The shortcomings of existing disturbance rejection technologies are summarized. • The performance bottleneck of the latest research is associated with a lack of disturbance estimation and corresponding compensation. 
• The open problems and potential solutions are summarized to discuss future study from an innovative control point of view, i.e. compensation on features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Time pattern reconstruction for classification of irregularly sampled time series.
- Author
-
Sun, Chenxi, Li, Hongyan, Song, Moxian, Cai, Derun, Zhang, Baofeng, and Hong, Shenda
- Subjects
- *
TIME series analysis , *DEEP learning , *MULTIPLE imputation (Statistics) , *PERIODIC health examinations , *DIAGNOSIS methods , *INDEPENDENT variables , *DIAGNOSIS - Abstract
Irregularly Sampled Time Series (ISTS) contain partially observed feature vectors caused by the lack of temporal alignment across dimensions and the presence of variable time intervals. Especially in medical applications, because patients' examinations depend on their health status, observations in such event-based medical time series are nonuniformly distributed. When using deep learning models to classify ISTS, most work frames the problem as missing data caused by misalignment or dependency change caused by nonuniformity. However, such work models only the relationships between observed values, ignoring the fact that time is the independent variable of a time series. In this paper, we emphasize that irregularity is active, time-dependent, and class-associated, and is reflected in the Time Pattern (TP). To this end, this paper focuses on the TP of ISTS for the first time, proposing a Time Pattern Reconstruction (TPR) method. It first encodes time information with a time encoding mechanism, then imputes values from the time codes using a continuous-discrete Kalman network, selects key time points with a conditional masking mechanism, and finally classifies the ISTS based on the reconstructed TP. Experiments on four real-world medical datasets and three other datasets show that TPR outperforms all baselines. We also show that the TP can reveal biomarkers and key time points for diseases. • Classifying irregularly sampled time series is crucial in real-world applications. • Incorporate time-dependent mapping into time series classification. • Extract time patterns of irregularly sampled time series. • Propose a plug-in for time-series deep learning models. • Promote the knowledge discovery ability of medical diagnosis methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Regional context-based recalibration network for cataract recognition in AS-OCT.
- Author
-
Zhang, Xiaoqing, Xiao, Zunjie, Yang, Bing, Wu, Xiao, Higashita, Risa, and Liu, Jiang
- Subjects
- *
CATARACT , *CONVOLUTIONAL neural networks , *FEMTOSECOND lasers , *OPTICAL coherence tomography , *IMAGE recognition (Computer vision) , *BAYESIAN analysis - Abstract
Deep convolutional neural networks (CNNs) have been widely applied to cataract recognition tasks and have achieved promising results. However, most existing methods have focused on designing data-driven CNN architectures and failed to exploit the asymmetric opacity distribution prior of cataract, which is significant for cataract diagnosis. To this end, this paper proposes a regional context-based recalibration (RCR) module, which fully leverages this clinical prior to recalibrate the feature maps with regional pooling, region-based context integration, and integrated context fusion. We stack these RCR modules to form an RCRNet based on anterior segment optical coherence tomography (AS-OCT) images for cataract recognition. Experiments on the AS-OCT-NC2 dataset and two publicly available medical datasets demonstrate that RCRNet achieves a better trade-off between performance and efficiency than state-of-the-art channel attention-based networks. We also explain the inherent behavior of RCRNet with the aid of visual analysis. In addition, this paper is the first to study the effects of two performance evaluation methods on AS-OCT image-based cataract classification results, the single-image level and the single-eye level, suggesting that the single-eye level should be adopted to evaluate cataract classification performance in line with clinical diagnosis requirements. • We design a regional context-based recalibration module by leveraging the prior of cataract. • We construct an RCRNet for cataract recognition based on AS-OCT images. • We test the effects of two performance evaluation methods on cataract classification performance. [ABSTRACT FROM AUTHOR]
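The RCR module's exact operations are not given beyond "regional pooling, region-based context integration, and integrated context fusion"; as a rough intuition, one can picture a squeeze-and-excitation-style channel gate driven by regional rather than global pooling. The toy stand-in below assumes that reading — the region split, fusion rule, and names are all illustrative, not the paper's design:

```python
import numpy as np

def regional_recalibration(fmap, n_regions=2):
    """SE-style recalibration with regional (rather than global) pooling.

    The (C, H, W) map is split into horizontal bands, each band is
    average-pooled per channel, the band descriptors are fused, and the
    fused descriptor gates the channels through a sigmoid.
    """
    c, h, w = fmap.shape
    bounds = np.linspace(0, h, n_regions + 1, dtype=int)
    # one per-channel descriptor per region
    descs = [fmap[:, a:b, :].mean(axis=(1, 2)) for a, b in zip(bounds, bounds[1:])]
    fused = np.mean(descs, axis=0)           # integrate regional context
    gate = 1.0 / (1.0 + np.exp(-fused))      # per-channel sigmoid gate
    return fmap * gate[:, None, None]

out = regional_recalibration(np.ones((3, 4, 4)))
```

The point of pooling per region first is that an asymmetric spatial distribution (as with cataract opacity) survives into the descriptors instead of being averaged away globally.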
- Published
- 2024
- Full Text
- View/download PDF
39. Semi-supervised medical image segmentation via hard positives oriented contrastive learning.
- Author
-
Tang, Cheng, Zeng, Xinyi, Zhou, Luping, Zhou, Qizheng, Wang, Peng, Wu, Xi, Ren, Hongping, Zhou, Jiliu, and Wang, Yan
- Subjects
- *
IMAGE segmentation , *SUPERVISED learning , *DIAGNOSTIC imaging , *LEARNING strategies , *IMAGE recognition (Computer vision) , *SOURCE code - Abstract
• The paper proposes a hard positives oriented contrastive learning strategy for semi-supervised medical image segmentation. • The HPC strategy is constructed from two levels: an unsupervised image-level HPC and a supervised pixel-level HPC. • The pixel-level HPC is implemented in a region-based manner to save memory and deliver more multi-granularity information. • The paper inserts several feature swap modules into the pre-trained decoder to encourage robust segmentation predictions. • The proposed framework outperforms the state-of-the-art methods on two public clinical datasets. Semi-supervised learning (SSL) has been a popular technique to resolve the annotation scarcity problem in pattern recognition and medical image segmentation, which usually focuses on two critical issues: 1) learning a well-structured categorizable embedding space, and 2) establishing a robust mapping from the embedding space to the pixel space. In this paper, to resolve the first issue, we propose a h ard p ositives oriented c ontrastive (HPC) learning strategy to pre-train an encoder-decoder-based segmentation model. Different from vanilla contrastive learning tending to focus only on hard negatives, our HPC learning strategy additionally concentrates on hard positives (i.e., samples with the same category but dissimilar feature representations to the anchor), which are considered to play an even more crucial role in delivering discriminative knowledge for semi-supervised medical image segmentation. Specifically, the HPC is constructed from two levels, including an unsupervised image-level HPC (IHPC) and a supervised pixel-level HPC (PHPC), empowering the embedding space learned by the encoder with both local and global senses. Particularly, the PHPC learning strategy is implemented in a region-based manner, saving memory usage while delivering more multi-granularity information. In response to the second issue, we insert several feature swap (FS) modules into the pre-trained decoder. 
These FS modules aim to perturb the mapping from the intermediate embedding space towards the pixel space, trying to encourage more robust segmentation predictions. Experiments on two public clinical datasets demonstrate that our proposed framework surpasses the state-of-the-art methods by a large margin. Source codes are available at https://github.com/PerPerZXY/BHPC. [ABSTRACT FROM AUTHOR]
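The abstract defines hard positives as same-class samples with dissimilar features; one plausible way to "concentrate on" them is to weight each positive's contrastive term by its dissimilarity to the anchor. The following is a minimal sketch of that idea, not the paper's actual loss — temperature, weighting, and normalization are assumptions:

```python
import numpy as np

def hard_positive_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss that emphasises hard positives.

    For each anchor, same-class positives are weighted by how dissimilar
    they are to the anchor, so low-similarity ("hard") positives dominate
    the pull term; all other-class samples act as negatives.
    """
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [j for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue
        w = np.exp(-sim[i, pos])          # harder positives -> larger weight
        w = w / w.sum()
        denom = np.exp(sim[i, neg]).sum()
        for j, wj in zip(pos, w):
            total -= wj * np.log(np.exp(sim[i, j]) / (np.exp(sim[i, j]) + denom))
    return total / n
```

With this weighting, a well-clustered batch (positives already close to their anchors) yields a much lower loss than one where same-class samples point in opposite directions.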
- Published
- 2024
- Full Text
- View/download PDF
40. Deep image clustering with contrastive learning and multi-scale graph convolutional networks.
- Author
-
Xu, Yuankun, Huang, Dong, Wang, Chang-Dong, and Lai, Jian-Huang
- Subjects
- *
CONVOLUTIONAL neural networks , *GRAPH neural networks - Abstract
Deep clustering has shown its promising capability in joint representation learning and clustering via deep neural networks. Despite the significant progress, the existing deep clustering works mostly utilize some distribution-based clustering loss, lacking the ability to unify representation learning and multi-scale structure learning. To address this, this paper presents a new deep clustering approach termed Image clustering with contrastive learning and multi-scale Graph Convolutional Networks (IcicleGCN), which bridges the gap between convolutional neural network (CNN) and graph convolutional network (GCN) as well as the gap between contrastive learning and multi-scale structure learning for the deep clustering task. Our framework consists of four main modules, namely, the CNN-based backbone, the Instance Similarity Module (ISM), the Joint Cluster Structure Learning and Instance reconstruction Module (JC-SLIM), and the Multi-scale GCN module (M-GCN). Specifically, the backbone network with two weight-sharing views is utilized to learn the representations for the two augmented samples (from each image). The learned representations are then fed to ISM and JC-SLIM for joint instance-level and cluster-level contrastive learning, respectively, during which an auto-encoder in JC-SLIM is also pretrained to serve as a bridge to the M-GCN module. Further, to enforce multi-scale neighborhood structure learning, two streams of GCNs and the auto-encoder are simultaneously trained via (i) the layer-wise interaction with representation fusion and (ii) the joint self-adaptive learning. Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art. The code is available at https://github.com/xuyuankun631/IcicleGCN. • This paper for the first time enables multi-scale structure learning for image clustering via GCNs.
• Multi-scale structure learning and two levels of contrastive learning are jointly enforced. • A novel deep image clustering approach termed IcicleGCN is proposed. • Extensive experimental results confirm the superiority of IcicleGCN over the state-of-the-art. [ABSTRACT FROM AUTHOR]
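For readers unfamiliar with the GCN side of IcicleGCN, a single symmetrically normalized graph-convolution step (the generic Kipf-Welling formulation, which modules like M-GCN build on in spirit; nothing here is taken from the paper itself) looks like:

```python
import numpy as np

def gcn_layer(adjacency, features, weight):
    """One graph-convolution propagation step:
    H' = relu(D^{-1/2} (A + I) D^{-1/2} H W).

    Adding the identity keeps each node's own features; the symmetric
    degree normalization stops high-degree nodes from dominating.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)
```

Stacking such layers at different neighborhood scales is what lets a GCN stream capture multi-scale structure that a pure CNN backbone misses.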
- Published
- 2024
- Full Text
- View/download PDF
41. High-order relational generative adversarial network for video super-resolution.
- Author
-
Chen, Rui, Mu, Yang, and Zhang, Yan
- Subjects
- *
GENERATIVE adversarial networks , *VIDEO compression , *VIDEO coding , *ACQUISITION of manuscripts , *OPTICAL flow , *IMAGE processing , *IMAGE representation - Abstract
• Both the instability of motion compensation and feature aggregation often undermine the SR results of videos. Only the global spatio-temporal dependencies of same-scale features are considered as one-order relations, with difficulty in handling the complex motions and context variations. To deal with these issues, we argue that high-order relations should be modelled and applied in the generative strategies to exploit feature alignments and long-range information fusion. In fact, the high-order relations mean that the dependencies of features among pixel positions are mined not only from global feature representations but also through local patch interactions. Moreover, the stronger relations can be further revealed across different scales, by which the aggregation weight matrices with high-order statistics are adaptively determined. The videos are regarded as the complementary fusion of motion and context. Thus, simultaneously capturing the underlying relations of the motion and context is the key to video SR. In this paper, we propose a high-order relational generative adversarial network (HOR-GAN) for accurate video SR, which has strong capacity of alleviating the erroneous motion compensation and merging the useful information among the consecutive frames. • We highlight the main contributions of this paper as follows: We propose a HOR-GAN framework for highly accurate and realistic video SR.
By exploiting high-order relations of feature patches under the construction of the pyramid graphs, the proposed HOR-GAN works well to produce a better feature alignment and fuse more spatio-temporal information. We adopt the dual discriminators to provide spatially coherent feedback to the generator, thus making the generator focus more on fine-grained features. The effectiveness of the proposed method has been justified through the extensive experiments. • We design a motion-aware relation module to accurately align neighboring frames with the reference ones. In this module, a patch-wise matching strategy is first used to build cross-scale correspondences of similar patches. The graph attention layers then adaptively aggregate the local patch features to further decrease the alignment error. We develop a context-aware relation module to make full use of high-order dependencies among all warped feature patches. We introduce the multi-scale graph convolution layers to mine contextual interaction relations for better fusing the spatio-temporal features, in which the position information is encoded to augment the detail recovery. Finally, each pixel is enhanced via a global self-attention. Video super-resolution can reconstruct a sequence of high-resolution frames with temporally consistent contents from their corresponding low-resolution sequences. The key challenge for this task is how to effectively utilize both inter-frame temporal relations and intra-frame spatial relations. The existing methods for super-resolving the videos commonly estimate optical flows to align the features of multiple frames based on temporal correlations. However, motion estimation is often error-prone and hence largely hinders the recovery of plausible details.
Moreover, high-order contextual dependencies in the feature space are rarely exploited for further enhancing the spatio-temporal information fusion. To this end, we propose a novel generative adversarial network to super-resolve low-resolution videos, which makes full use of patch embeddings and is effective in exploring high-order spatio-temporal relations of the feature patches. Specifically, a motion-aware relation module is designed to handle the alignment between neighboring frames and reference ones. Depending on a patch-matching strategy for adaptive selection of multiple most similar patches, the cross-scale graph is constructed to reliably aggregate these patches using a feature pyramid. Based on the structure of multi-scale graph, a context-aware relation module is developed to capture high-order dependencies among resulting warped patches for better leveraging long-range complementary contexts. To further enhance reconstruction ability, the temporal position information of video sequences is also encoded into this module. Dual discriminators with cycle consistent constraints are adopted to provide more informative feedback to the generator while maintaining the global coherence. Extensive experiments have demonstrated the effectiveness of the proposed method in terms of quantitative and qualitative evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. A Sparse Local Binary Pattern extraction algorithm applied to event sensor data for object classification.
- Author
-
Fardo, Fernando Azevedo and Rodrigues, Paulo Sérgio Silva
- Subjects
- *
CONVOLUTIONAL neural networks , *COMPUTER vision , *DATA extraction , *MOTION detectors , *COMPUTER operators , *DETECTORS , *OBJECT recognition (Computer vision) - Abstract
Recently, new sensors with active pixels were brought to market. These sensors export local variations of light intensity in the form of asynchronous events with low latency. Since the data output format is a stream of addressable events rather than a complete image of light intensities, new algorithms are required for known problems in the field of Computer Vision, such as segmentation, visual odometry (VO), SLAM, and object and scene recognition. There are some proposed methodologies for object recognition using conventional methods, convolutional neural networks, and third-generation neural networks based on spikes. However, convolutional neural networks and spiking neural networks require specific hardware for processing that is hard to miniaturize. Also, several traditional Computer Vision operators and feature descriptors have been neglected in the context of event sensors and could contribute to lighter methodologies in object recognition. This paper proposes an algorithm for local binary pattern extraction in sparse structures, typically found in this context. This paper also proposes two methodologies using local binary patterns on captures from event-based sensors for object recognition. The first methodology exploits the known motion performed by the sensor, while the second is motion agnostic. It is demonstrated experimentally that the LBP operator is a fast and light alternative that enables variable reduction using PCA in some cases. The experiments also show that it is possible to reduce the final feature vector for classification by up to 99.73% when compared to conventional methods considered state-of-the-art, while maintaining comparable accuracy. • Sparse-LBP extraction algorithm. • Methodologies for memory-efficient object recognition with event-based sensors. • Surface of active events representation using hash tables. [ABSTRACT FROM AUTHOR]
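Treating the surface of active events as a hash table, as the highlights suggest, a sparse LBP pass only ever visits stored pixels and treats absent neighbours as zero. A minimal sketch of that idea (the thresholding rule and neighbour order are illustrative assumptions, not the paper's exact algorithm):

```python
def sparse_lbp(events):
    """Compute an 8-bit Local Binary Pattern for each active pixel in a
    sparse event surface stored as a {(x, y): intensity} hash table.

    Neighbours missing from the sparse structure are read as zero, so
    the cost scales with the number of active pixels, not the frame size.
    """
    # clockwise 8-neighbourhood starting at the top-left pixel
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    codes = {}
    for (x, y), centre in events.items():
        code = 0
        for bit, (dx, dy) in enumerate(offsets):
            if events.get((x + dx, y + dy), 0) >= centre:
                code |= 1 << bit
        codes[(x, y)] = code
    return codes

codes = sparse_lbp({(0, 0): 5, (1, 0): 7})
```

Histograms of these per-pixel codes then give a compact feature vector that can be further reduced with PCA, as the abstract describes.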
- Published
- 2024
- Full Text
- View/download PDF
43. Customized meta-dataset for automatic classifier accuracy evaluation.
- Author
-
Huang, Yan, Zhang, Zhang, Wu, Qiang, Huang, Han, Zhong, Yi, and Wang, Liang
- Subjects
- *
IMAGE recognition (Computer vision) , *PERFORMANCE standards , *NAIVE Bayes classification - Abstract
Automatic classifier accuracy evaluation (ACAEval) on unlabeled test sets is critical for unseen real-world environments. The use of dataset-level regression on synthesized meta-datasets (comprised of many sample sets) has shown promising results for ACAEval. However, the existing meta-dataset for ACAEval is created using simple image transformations such as rotation and background substitution, which can make it difficult to ensure a reasonable distribution shift between the sample set and the test set. When the distribution shift is large, it becomes challenging to estimate the classifier accuracy on the test set using those sample sets. To ensure more robust ACAEval, this paper attempts to customize a meta-dataset in which each sample set has a reasonable distribution shift to the test set. An intra-class cycle-consistent adversarial learning (ICAL) method is introduced to transfer the style of a labeled training set to the style of the test set, by jointly considering the domain shift issue, the label flipping issue (the semantic information may be changed after style transformation), and the diversity of multiple sample sets in the meta-dataset. Experiments validate that under the same experimental setup, our method outperforms the existing ACAEval methods by a good margin, and achieves state-of-the-art performance on several standard benchmark datasets, including digit classification and natural image classification. • Paper on ACAEval for classifier accuracy. Uses a meta-dataset technique for unlabeled real-world data. • Customized meta-dataset for ACAEval to address distribution shift between samples and test set. • Our sample set considers the label flip issue. Introduces a random indicator and an FD margin loss. • Experiments demonstrate that our ACAEval method outperforms existing methods, including dataset-level regression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. CS-GAC: Compressively sensed geodesic active contours.
- Author
-
Shan, Hao
- Subjects
- *
GEODESICS , *COMPRESSED sensing , *IMAGE reconstruction algorithms , *IMAGE segmentation , *SENSES - Abstract
This paper proposes an edge-based compressively sensed (CS) geodesic active contour (GAC) model, termed CS-GAC, to ensure faithful edge detection and accurate object segmentation. The motivation behind this paper is that edge information driving the contour evolution can be iteratively obtained from incomplete CS measurements. In each iteration, the CS-GAC is a three-step process including edge detection, active contouring and sparse reconstruction. Instead of working on the final reconstructed images themselves, the evolution of the CS-GAC is driven by a few CS measurements and guided by updatable edge information. The edge information is generated by a complex shearlet transform (CST) based edge map. In the framework, reconstruction and edge detection work alternately. The iterative update property that takes advantage of both edge sparsity and edge detection can largely improve the evolution precision. Numerical experiments show that the CS-GAC obtains strong segmentation results on challenging cases in comparison with state-of-the-art methods, and has competitive prospects. • Edge information driving the contour evolution is iteratively obtained from CS measurements. • The updating property of the edge indicator takes advantage of both edge sparsity and edge detection. • The complex shearlet transform based edge map has been improved by the iterative updating mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Weakly-supervised semantic segmentation with superpixel guided local and global consistency
- Author
-
Yi, Sheng, Ma, Huimin, Wang, Xiang, Hu, Tianyu, Li, Xi, and Wang, Yu
- Published
- 2022
- Full Text
- View/download PDF
46. Distance learning by mining hard and easy negative samples for person re-identification.
- Author
-
Zhu, Xiaoke, Jing, Xiao-Yuan, Zhang, Fan, Zhang, Xinyu, You, Xinge, and Cui, Xiang
- Subjects
- *
DISTANCE education , *IDENTIFICATION - Abstract
• We have proposed a Hard and Easy Negative samples mining based Distance learning (HEND) approach for person re-identification. • We have designed a symmetric triplet constraint for the proposed HEND approach. • We have proposed a Projection based HEND (PHEND) approach, which simultaneously learns a projection matrix and a distance metric. • We have conducted extensive experiments in this paper to evaluate our approaches. Distance learning is an effective technique for person re-identification. In practice, hard negative samples usually contain more discriminative information than easy negative samples. Therefore, it is necessary to investigate how to make full use of the discriminative information conveyed by different types of negative samples in the distance learning process. In this paper, we propose a Hard and Easy Negative samples mining based Distance learning (HEND) approach for person re-identification, which learns the distance metric by designing different objective functions for hard and easy negative samples, such that the discriminative information contained in negative samples can be exploited more effectively. Moreover, considering that there usually exist large differences between the images captured by different cameras, we further propose a projection-based HEND approach to reduce the influence of between-camera differences on re-identification. Experimental results on seven pedestrian image datasets demonstrate the effectiveness of the proposed approaches. [ABSTRACT FROM AUTHOR]
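The abstract states that hard and easy negatives receive different objective functions but does not give them; a toy way to realize the idea is a margin loss with a tight, full-weight margin for hard negatives and a looser, down-weighted margin for easy ones. The margins and weights below are invented for illustration and are not HEND's actual objectives:

```python
def hend_style_loss(d_pos, d_negs, hard_margin=0.3, easy_margin=1.0):
    """Toy distance-learning objective in the spirit of HEND.

    Negatives closer than the positive-plus-tight-margin are "hard" and
    penalised at full weight; negatives inside a looser margin are "easy"
    and penalised at half weight; well-separated negatives contribute 0.

    d_pos:  distance between the anchor and its positive match.
    d_negs: distances between the anchor and negative samples.
    """
    loss = 0.0
    for d_neg in d_negs:
        if d_neg < d_pos + hard_margin:        # hard negative
            loss += (d_pos + hard_margin) - d_neg
        elif d_neg < d_pos + easy_margin:      # easy but not yet separated
            loss += 0.5 * ((d_pos + easy_margin) - d_neg)
    return loss
```

Splitting the penalty this way lets hard negatives dominate the gradient, which is the behaviour the abstract motivates, while easy negatives still exert a mild push.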
- Published
- 2019
- Full Text
- View/download PDF
47. Two-hop walks indicate PageRank order.
- Author
-
Tang, Ying
- Subjects
- *
RANDOM matrices , *PROBABILISTIC number theory , *EIGENVECTORS , *WALKING , *MATRICES (Mathematics) , *ORDER - Abstract
• This paper opens a new door for studying PageRank based on the spectral distribution law of random matrices. • We provide a method for pairwise comparisons in PageRank in O(1) without computing the exact value of the principal eigenvectors of Google matrices, from a probabilistic view. • We provide a method for extracting the top-k list in O(kn), where n is the total number of involved items. This paper shows that pairwise PageRank orders emerge from two-hop walks. The main tool used here is a specially designed sign-mirror function and a parameter curve, whose low-order derivative information implies pairwise PageRank orders with high probability. We study the pairwise correct rate by placing the Google matrix G in a probabilistic framework, where G may be equipped with different random ensembles for model-generated or real-world networks with sparse, small-world, or scale-free features; the proof is supported by a mix of mathematical and numerical evidence. We believe that the underlying spectral distribution of the aforementioned networks is responsible for the high pairwise correct rate. Moreover, the perspective of this paper naturally leads to an O(1) algorithm for any single pairwise PageRank comparison, assuming both A = G − I_n, where I_n denotes the identity matrix of order n, and A^2 are ready at hand (e.g., constructed offline in an incremental manner), based on which it is easy to extract the top-k list in O(kn), thus making it possible for the PageRank algorithm to deal with super large-scale datasets in real time. [ABSTRACT FROM AUTHOR]
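The sign-mirror function and parameter curve cannot be reconstructed from the abstract alone, but the core premise, that two-hop walk mass already indicates pairwise PageRank order, can be sketched directly. The graph, damping factor, and comparison rule below are illustrative, not the paper's construction:

```python
import numpy as np

def google_matrix(adj, d=0.85):
    """Column-stochastic Google matrix from a 0/1 adjacency matrix
    (adj[i, j] = 1 means an edge j -> i)."""
    n = adj.shape[0]
    out = adj.sum(axis=0)
    m = adj / np.where(out == 0, 1, out)
    m[:, out == 0] = 1.0 / n            # dangling nodes spread uniformly
    return d * m + (1 - d) / n

def two_hop_compare(g, i, j, v=None):
    """Guess the PageRank order of nodes i and j from two-hop walk mass
    (G^2 v), instead of running power iteration to convergence."""
    v = np.full(g.shape[0], 1.0 / g.shape[0]) if v is None else v
    s = g @ (g @ v)
    return 1 if s[i] > s[j] else -1

# star graph: leaves 1..3 all link to the hub 0, hub links back to all
adj = np.zeros((4, 4))
adj[0, 1:] = 1          # edges 1->0, 2->0, 3->0
adj[1:, 0] = 1          # edges 0->1, 0->2, 0->3
g = google_matrix(adj)
order = two_hop_compare(g, 0, 1)   # hub vs. a leaf
```

On this graph, two hops already rank the hub above a leaf, agreeing with fully converged PageRank; the paper's contribution is characterizing when such early agreement holds with high probability.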
- Published
- 2019
- Full Text
- View/download PDF
48. Fusion of evolvable genome structure and multi-objective optimization for subspace clustering.
- Author
-
Paul, Dipanjyoti, Saha, Sriparna, and Mathew, Jimson
- Subjects
- *
GENETIC techniques , *PATTERN recognition systems , *GENE expression , *KEY performance indicators (Management) - Abstract
• This paper reports the first attempt at integrating multiobjective optimization (MOO) and genomic structure for solving the subspace clustering problem. • A multiobjective optimization framework is utilized to simultaneously optimize several subspace cluster quality measures. • The existing cluster quality measures are modified to handle the subspace clustering problem. • As a part of the experiments, the proposed algorithm is applied to the bi-clustering of gene expression data to show the efficacy of the proposed technique in solving a real-life problem. Bi-clustering of gene expression data is a subspace clustering problem where a subset of rows and a subset of columns need to be selected. Subspace clustering techniques become paramount in pattern recognition for detecting local variations in high-dimensional data. Several techniques exist in the recent literature for subspace clustering, the majority of which optimize, implicitly or explicitly, a single cluster quality measure. Inspired by the success of multi-objective optimization in solving the clustering problem, we develop a multi-objective subspace clustering technique in this paper. The proposed technique simultaneously optimizes two subspace cluster quality measures, capable of capturing different cluster shapes/properties. Two existing cluster quality measures, the XB-index and the PBM-index, are modified to develop subspace cluster validity indices, and these are then used as optimization criteria. These cluster validity indices measure the appropriateness of generated subspace clusters in terms of intra-subspace cluster similarity and separation between subspace clusters. The proposed approach utilizes a new evolvable genome structure which stores the information about subspaces in its phenotype and genotype and evolves this genome structure with the help of different genetic operators.
The developed algorithm is applied to ten standard real-life datasets and sixteen synthetic datasets for identifying different subspace clusters. The results obtained by this algorithm are compared against some state-of-the-art techniques with respect to different performance metrics. Experimentation reveals that the proposed algorithm is able to take advantage of its evolvable genomic structure and multi-objective framework, and it can be applied to any dataset. In a part of the paper, the efficacy of the proposed technique is also shown for bi-clustering of gene-expression datasets. [ABSTRACT FROM AUTHOR]
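The paper modifies the XB-index and PBM-index into subspace validity indices; those modifications are not given in the abstract, but the crisp Xie-Beni index they start from is standard: mean within-cluster scatter divided by the minimum squared separation between cluster centres (lower is better). This sketch computes the standard index, not the paper's subspace variant:

```python
import numpy as np

def xb_index(data, centers, labels):
    """Crisp Xie-Beni cluster validity index.

    Numerator: total squared distance of points to their assigned centre.
    Denominator: n times the minimum squared distance between any two
    centres, so compact, well-separated clusterings score low.
    """
    data, centers = np.asarray(data, float), np.asarray(centers, float)
    scatter = sum(np.sum((x - centers[l]) ** 2) for x, l in zip(data, labels))
    sep = min(np.sum((a - b) ** 2)
              for i, a in enumerate(centers)
              for b in centers[i + 1:])
    return scatter / (len(data) * sep)
```

In a multi-objective setting, an index like this is optimized alongside a second index of a different character (e.g., PBM), so the search is not biased toward one cluster shape.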
- Published
- 2019
- Full Text
- View/download PDF
49. PSCFormer: A lightweight hybrid network for gas identification in electronic nose system.
- Author
-
Li, Ziyang, Kang, Siyuan, Feng, Ninghui, Yin, Chongbo, and Shi, Yan
- Subjects
- *
ELECTRONIC noses , *CONVOLUTIONAL neural networks , *ELECTRONIC systems , *TRANSFORMER models , *FEATURE extraction , *METHODS engineering - Abstract
• The LPSF module is designed to improve the gas features. • Collaborative attention is proposed for local and global gas features. • A feature compensation mechanism for CNN and Transformer is designed. • PSCFormer is proposed for gas recognition. Owing to their powerful feature extraction capability, convolutional neural networks (CNNs) have gradually been applied to gas identification in the electronic nose (e-nose) system. The responses of different intensities in the e-nose system are significantly correlated, and a CNN extracts local gas features by convolution while ignoring their global correlation. A Transformer combines different responses and obtains the correlation between global features by self-attention. This paper proposes a lightweight hybrid network called Peak Search-based Convolutional Transformers (PSCFormer). First, combining the data characteristics of gas information, the Local Peak Search and Feature Fusion (LPSF) module is proposed to focus on the key gas features. Second, a Transformer Encoder (TE) is proposed to obtain the correlation between global features, and a parallel Convolution Encoder (CE) is proposed to capture local dependence. Finally, a reasonable feature complementation mechanism is presented, which alleviates the TE's preference for the slow-down response while solving the receptive field limitation of the CE. PSCFormer is evaluated on three different datasets, on all of which it shows stable and excellent performance with a good tradeoff between efficiency and complexity. The results prove that PSCFormer is an efficient and lightweight gas identification network, which provides a method to promote the engineering application of the e-nose system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Crisis event summary generative model based on hierarchical multimodal fusion.
- Author
-
Wang, Jing, Yang, Shuo, and Zhao, Hui
- Subjects
- *
RESCUE work , *CRISES , *SOCIAL media - Abstract
How to quickly obtain information about crisis events on social media such as Twitter and Weibo is crucial for follow-up rescue work and the promotion of post-disaster reconstruction. Therefore, it is very important to obtain useful information through multimodal summary generation technology. Current techniques for generating crisis event summaries are mainly affected by unimodal bias and disregard the diversity of information in text and images. To solve these problems, this paper proposes a hierarchical multimodal crisis event summary generation model based on the modal alignment premise and hierarchical thinking. First, the visual context vector and text context vector are obtained, and then the hierarchical multimodal pointer model is employed to generate the text summary. Thus, the modal deviation is resolved. Second, to select high-quality images, this paper proposes a dynamic selection strategy, which considers both the requirement of high correlation between text and images and the diversity of crisis information. Last, the experimental results based on the crisis event data in the MSMO dataset show that the proposed model achieves good performance in the summary generation and image selection of crisis events. • This paper proposes a multimodal hierarchical fusion model for crisis event summarization, which maintains the independence of images. • This paper proposes a dynamic selection strategy to better select suitable images. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF