3,554 results
Search Results
2. Special section: Best papers of the international conference on pattern recognition and artificial intelligence (ICPRAI) 2022.
- Author
-
El-Yacoubi, Mounîm A., Pal, Umapada, Granger, Eric, and Yuen, Pong Chi
- Subjects
- *
ARTIFICIAL intelligence , *CONFERENCE papers , *CONFERENCES & conventions , *PATTERN recognition systems - Published
- 2024
- Full Text
- View/download PDF
3. Scientific papers citation analysis using textual features and SMOTE resampling techniques.
- Author
-
Umer, Muhammad, Sadiq, Saima, Missen, Malik Muhammad Saad, Hameed, Zahid, Aslam, Zahid, Siddique, Muhammad Abubakar, and NAPPI, Michele
- Subjects
- *
CITATION analysis , *CONTENT analysis , *MACHINE learning , *SENTIMENT analysis , *PATTERN recognition systems , *USER-generated content - Abstract
• Explore qualitative aspects of citations to measure the influence of a research article. • Apply a feature representation technique in combination with machine learning models to find the sentiment of citations. • Classify the sentiment of citation instances as positive, negative, or neutral. • Analyze the efficacy of SMOTE in balancing the citation sentiment dataset. Ascertaining the impact of research is significant for the research community and academia across all disciplines. The only prevalent measure associated with quantifying research quality is the citation count. Although citations play a significant role in academic research, they can be biased or made only to discuss the weaknesses and shortcomings of a work. Considering the sentiment of citations and recognizing patterns in their text can aid in understanding the opinion of the peer research community and help quantify the quality of research articles. Efficient feature representation combined with machine learning classifiers has yielded significant improvement in text classification, but the effectiveness of such combinations has not been analyzed for citation sentiment analysis. This study investigates pattern recognition using machine learning models in combination with frequency-based and prediction-based feature representation techniques, with and without the Synthetic Minority Oversampling Technique (SMOTE), on a publicly available citation sentiment dataset. The sentiment of citation instances is classified as positive, negative, or neutral. Results indicate that the Extra Trees classifier in combination with Term Frequency-Inverse Document Frequency achieves 98.26% accuracy on the SMOTE-balanced dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
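The core SMOTE step the abstract above relies on, synthesizing minority-class samples by interpolating each point toward one of its nearest minority-class neighbours, can be sketched in a few lines. This is a minimal illustration of the technique, not the authors' pipeline; the helper name `smote_oversample` and the toy data are hypothetical.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Generate synthetic minority samples by interpolating each chosen point
    toward one of its k nearest minority-class neighbours (core SMOTE idea)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)
        # distances from sample i to every other minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        neighbours = np.argsort(d)[:k]
        j = rng.choice(neighbours)
        gap = rng.random()                 # interpolation coefficient in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# toy minority class: 4 points in 2-D
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_oversample(X_min, n_new=6, k=2, rng=0)
print(X_new.shape)  # (6, 2); each row lies on a segment between two originals
```

In the paper's setting the rows would be TF-IDF vectors of the minority sentiment class rather than 2-D points, but the interpolation step is the same.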
4. Special section: Best papers of the 14th Mexican conference on pattern recognition (MCPR) 2022.
- Author
-
Vergara Villegas, Osslan Osiris, Cruz Sánchez, Vianey Guadalupe, Sossa Azuela, Juan Humberto, Carrasco Ochoa, Jesús Ariel, Martínez Trinidad, José Francisco, and Olvera López, José Arturo
- Subjects
- *
CONFERENCES & conventions - Published
- 2023
- Full Text
- View/download PDF
5. Editorial paper for pattern recognition letters VSI on advances in graph-based recognition for pattern recognition.
- Author
-
Conte, Donatello, Ramel, Jean-Yves, and Foggia, Pasquale
- Published
- 2022
- Full Text
- View/download PDF
6. Editorial paper for Pattern Recognition Letters VSI on cross model understanding for visual question answering.
- Author
-
Wan, Shaohua, Gao, Zan, Zhang, Hanwang, Xiaojun, Chang, Chen, Chen, and Tefas, Anastasios
- Published
- 2022
- Full Text
- View/download PDF
7. Editorial paper for pattern recognition letters VSI on multi-view representation learning and multi-modal information representation.
- Author
-
Song, Dan, Zhang, Wenshu, Ren, Tongwei, and Chang, Xiaojun
- Subjects
- *
PATTERN recognition systems - Published
- 2022
- Full Text
- View/download PDF
8. Detection and recognition of erasures in on-line captured paper forms
- Author
-
Wiart, Alain, Paquet, Thierry, and Heutte, Laurent
- Subjects
- *
ERASERS , *IMAGE processing , *ELECTRONIC data processing , *GRAPHOLOGY ,WRITING - Abstract
Abstract: This paper presents a method to automatically locate and recognize erasures in on-line captured handwritten documents, in order to avoid subsequent misrecognition of characters and words. We offer a comprehensive definition of the ambiguous concept of an erasure in handwriting, which results in a more accurate characterization of the different types of erasures. Thanks to this characterization, a preprocessing step placed upstream of the word recognition engine uses an MLP with a low-level feature set to classify each pair of connected strokes as an erasure or not. We evaluate our system on a real handwritten document database and show how it can be tuned to operate with various recognition engines, leading to high performance in erasure detection and recognition. [Copyright Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
9. Special issue on ICPR 2014 awarded papers.
- Author
-
Chellappa, Rama, Heyden, Anders, Laurendeau, Denis, Felsberg, Michael, and Borga, Magnus
- Subjects
- *
PATTERN recognition systems , *IMAGE processing , *CONFERENCES & conventions - Published
- 2016
- Full Text
- View/download PDF
10. Award winning papers from the 23rd International Conference on Pattern Recognition (ICPR).
- Author
-
Davis, Larry, Bimbo, Alberto Del, and Lovell, Brian
- Subjects
- *
PATTERN perception , *CONFERENCES & conventions , *AWARDS , *LIGHT - Published
- 2019
- Full Text
- View/download PDF
11. Neural network based cognitive approaches from face perception with human performance benchmark.
- Author
-
Chen, Yiyang, Li, Yi-Fan, Cheng, Chuanxin, and Ying, Haojiang
- Abstract
Artificial neural network models are able to achieve great performance at numerous computationally challenging tasks such as face recognition. It is important to explore the differences between neural network models and human brains in terms of computational mechanism. This issue has become an experimental focus in recent studies, and it is believed that using human behavior to understand neural network models can address it. This paper compares neural network performance with human performance on a classic yet important task: judging the ethnicity of a given face. This study uses Caucasian and East Asian faces to train four neural networks: AlexNet, VGG11, VGG13, and VGG16. The ethnicity judgments of the neural networks are then compared with human data using classical psychophysical methods by fitting psychometric curves. The results suggest that VGG11, followed by VGG16, shows a response pattern similar to humans, while the simpler AlexNet and the more complex VGG13 do not resemble human performance. Thus, this paper explores a new paradigm for comparing neural networks and human brains. • Neural networks are able to provide cognitive approaches from face perception. • Human perception is used as a benchmark to evaluate the performance of neural networks. • Neural networks utilizing eye regions are confirmed to be better at perceiving faces. • Attentional region analysis would unveil the processing details of a neural network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Conditional Information Gain Trellis.
- Author
-
Bicici, Ufuk Can, Meral, Tuna Han Salih, and Akarun, Lale
- Abstract
Conditional computing processes an input using only part of the neural network's computational units. Learning to execute parts of a deep convolutional network by routing individual samples has several advantages: This can facilitate the interpretability of the model, reduce the model complexity, and reduce the computational burden during training and inference. Furthermore, if similar classes are routed to the same path, that part of the network learns to discriminate between finer differences and better classification accuracies can be attained with fewer parameters. Recently, several papers have exploited this idea to select a particular child of a node in a tree-shaped network or to skip parts of a network. In this work, we follow a Trellis-based approach for generating specific execution paths in a deep convolutional neural network. We have designed routing mechanisms that use differentiable information gain-based cost functions to determine which subset of features in a convolutional layer will be executed. We call our method Conditional Information Gain Trellis (CIGT). We show that our conditional execution mechanism achieves comparable or better model performance compared to unconditional baselines, using only a fraction of the computational resources. We provide our code and model checkpoints used in the paper at: https://github.com/ufukcbicici/cigt/tree/prl/prl_scripts. • We introduce Conditional Information Gain Trellis (CIGT) for conditional computing. • We derive the CIGT loss function based on classification and information gain losses. • CIGT performs better or comparably using a fraction of the computational resources. • We give tests on MNIST, Fashion MNIST, and CIFAR-10, showing CIGT compares favorably. • Supplementary materials show that semantically similar classes are grouped together. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
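The information-gain criterion that drives the routing in the abstract above can be illustrated numerically: a good router makes the class distribution within each route as pure as possible, i.e., it maximizes H(class) − H(class | route). Below is a hedged numpy sketch of that quantity for a soft routing; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def routing_information_gain(p_route, labels, n_classes):
    """Information gain of a soft routing: H(class) - H(class | route).
    p_route: (n_samples, n_routes) softmax routing probabilities."""
    n, r = p_route.shape
    # joint distribution p(route, class) under uniform sample weights
    joint = np.zeros((r, n_classes))
    for i in range(n):
        joint[:, labels[i]] += p_route[i]
    joint /= n
    p_r = joint.sum(axis=1)          # marginal over routes
    p_c = joint.sum(axis=0)          # marginal over classes

    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    h_c = H(p_c)
    h_c_given_r = sum(p_r[k] * H(joint[k] / p_r[k]) for k in range(r) if p_r[k] > 0)
    return h_c - h_c_given_r

# perfect routing: class 0 -> route 0, class 1 -> route 1
p = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(routing_information_gain(p, y, 2))  # 1.0 bit: routes fully separate classes
```

A uniform routing (every sample split 50/50 across routes) scores 0 bits, which is why maximizing this quantity as a differentiable loss encourages the trellis to send similar classes down the same path.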
13. FDM: Document image seen-through removal via Fuzzy Diffusion Models.
- Author
-
Wang, Yijie, Xu, Jindong, Liang, Zongbao, Chong, Qianpeng, and Cheng, Xiang
- Abstract
While scanning or shooting a document, factors like ink density and paper transparency may cause content from the reverse side to become visible through the paper, resulting in a digital image with a 'seen-through' phenomenon that affects practical applications. In addition, document images can be affected by random factors during the imaging process, such as differences in the performance of camera equipment and variations in the physical properties of the document itself. These random factors increase the noise of the document image and may cause the seen-through phenomena to become more complex and diverse. To tackle this issue, we propose the Fuzzy Diffusion Model (FDM), which combines fuzzy logic with diffusion models. It effectively models complex seen-through effects and handles uncertainties in document images. Specifically, we gradually degrade the original image with a mean-reverting stochastic differential equation (SDE) to transform it into a seen-through mean state with fixed Gaussian noise. Following this, fuzzy operations are introduced into the noise network, which helps the model better learn the noise and data distributions by reasoning about the membership of each pixel through fuzzy logic. Eventually, in the reverse process, the low-quality image is gradually restored by simulating the corresponding reverse-time SDE. Extensive quantitative and qualitative experiments conducted on various datasets demonstrate that the proposed method significantly removes seen-through effects and achieves good results under several metrics. The proposed FDM effectively resolves the seen-through effects of document images and obtains better visual quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A simple and efficient filter feature selection method via document-term matrix unitization.
- Author
-
Li, Qing, Zhao, Shuai, He, Tengjiao, and Wen, Jinming
- Subjects
- *
FEATURE selection , *FILM reviewing , *ABSOLUTE value , *PRODUCT reviews - Abstract
Text processing tasks commonly grapple with the challenge of high dimensionality. One of the most effective solutions to this challenge is to preprocess text data through feature selection methods. Feature selection can select the most advantageous features for subsequent operations (e.g., classification) from the native feature space of the text. This process effectively trims the feature space's dimensionality, enhancing the efficiency and accuracy of subsequent operations. This paper proposes a straightforward and efficient filter feature selection method based on document-term matrix unitization (DTMU) for text processing. Diverging from previous filter feature selection methods that concentrate on defining scoring criteria, our method achieves more optimal feature selection by unitizing each column of the document-term matrix. This approach mitigates feature-to-feature influence and reinforces the role of the weighting proportion within the features. Subsequently, our scoring criterion subtracts the sum of weights for negative samples from that for positive samples and takes the absolute value. We conduct numerical experiments to compare DTMU with four advanced filter feature selection methods: max–min ratio metric, proportional rough feature selector, least loss, and relative discrimination criterion, along with two classical filter feature selection methods: Chi-square and information gain. The experiments are performed on four ten-thousand-dimensional feature space datasets (book, dvd, music, movie) and two thousand-dimensional feature space datasets (imdb, amazon_cells), sourced from Amazon product reviews and movie reviews. Experimental findings demonstrate that DTMU selects more advantageous features for subsequent operations and achieves a higher dimensionality reduction rate than the other six methods used for comparison.
Moreover, DTMU exhibits robust generalization capabilities across various classifiers and dimensional datasets. Notably, the average CPU time for a single run of DTMU is measured at 1.455 s. • This paper offers DTMU, a filter feature selection method enhancing feature quality via unitization for improved properties. • DTMU is notably user-friendly, involving only two straightforward steps. • This paper substantiates, through numerical experiments, that DTMU stands as an advanced and effective method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
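The DTMU recipe described in this abstract, unitizing each column of the document-term matrix and then scoring each term by the absolute difference between its summed weights over positive and negative documents, is simple enough to sketch directly. A minimal numpy illustration; the function name and toy matrix are hypothetical, not the authors' code.

```python
import numpy as np

def dtmu_scores(D, y):
    """Filter feature scores per the DTMU recipe: unitize each column of the
    document-term matrix D (rows = documents, columns = terms), then take
    |sum of weights over positive docs - sum over negative docs| per term."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0          # leave all-zero columns untouched
    U = D / norms                    # each column now has unit L2 norm
    pos = U[y == 1].sum(axis=0)
    neg = U[y == 0].sum(axis=0)
    return np.abs(pos - neg)

# toy 4-document x 3-term matrix; term 0 appears only in positive docs
D = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
y = np.array([1, 1, 0, 0])
scores = dtmu_scores(D, y)
print(scores.argmax())  # term 0 discriminates the classes best
```

Feature selection then keeps the top-k terms by score; a term spread evenly across both classes (term 1 here) scores exactly zero and is dropped first.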
15. Call for Papers.
- Subjects
- *
PATTERN recognition systems , *PUBLISHING , *PERIODICAL articles , *COMPUTER science research , *COMPUTER science periodicals - Published
- 2014
- Full Text
- View/download PDF
16. Call for Papers: special issue on 'Meta-heuristic intelligence based image processing'
- Published
- 2009
- Full Text
- View/download PDF
17. Neural ordinary differential equation for irregular human motion prediction.
- Author
-
Chen, Yang, Liu, Hong, Song, Pinhao, and Li, Wenhao
- Subjects
- *
MOTION capture (Cinematography) , *ORDINARY differential equations , *MOTION capture (Human mechanics) , *FIXED interest rates , *QUATERNIONS - Abstract
Human motion prediction often assumes that the input sequence has a fixed frame rate. However, in real-world applications, the motion capture system may sometimes work unstably and miss frames, which leads to inferior performance. To solve this problem, this paper leverages neural Ordinary Differential Equations and proposes a human Motion Prediction method named MP-ODE to handle irregular-time human pose series. First, a Difference Operator and a Positional Encoding are proposed to explicitly provide kinematic and time information for the model. Second, we construct an encoder–decoder model with an ODE-GRU unit, which enables us to learn continuous-time dynamics of human motion. Third, a Quaternion Loss transforms exponential maps to quaternions to train MP-ODE. The Quaternion Loss avoids the discontinuities and singularities of exponential maps, boosting the convergence of the model. Comprehensive experiments on the Human3.6M and CMU-Mocap datasets demonstrate that the proposed MP-ODE achieves promising performance in both normal and irregular-time conditions. • This paper designs a framework, MP-ODE, to tackle irregular human motion prediction. • With Neural ODEs, MP-ODE has continuous-time series modeling ability. • MP-ODE incorporates dynamics information as well as Positional Encoding into the input features. • A Quaternion Loss is proposed to avoid discontinuities and singularities during training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
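The exponential-map-to-quaternion conversion behind the Quaternion Loss above is a standard identity: a rotation of angle θ about unit axis n maps to the quaternion q = (cos(θ/2), sin(θ/2)·n). A small sketch, independent of the paper's code; the helper name is illustrative.

```python
import numpy as np

def expmap_to_quat(e):
    """Convert an exponential-map rotation (axis * angle, 3-vector) to a unit
    quaternion (w, x, y, z), handling the singularity at zero angle."""
    theta = np.linalg.norm(e)
    if theta < 1e-8:
        return np.array([1.0, 0.0, 0.0, 0.0])   # identity rotation
    axis = e / theta
    half = theta / 2.0
    return np.concatenate(([np.cos(half)], np.sin(half) * axis))

# 180 degrees about the x-axis
q = expmap_to_quat(np.array([np.pi, 0.0, 0.0]))
print(np.round(q, 6))  # ~[0, 1, 0, 0]
```

Exponential maps wrap around at 2π (θ and θ + 2π encode the same rotation), which is the discontinuity the abstract mentions; the quaternion form is smooth in the rotation itself, up to the benign q ↔ −q sign ambiguity.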
18. Abductive natural language inference by interactive model with structural loss.
- Author
-
Li, Linhao, Wang, Ao, Xu, Ming, Dong, Yongfeng, and Li, Xin
- Subjects
- *
STRUCTURAL models , *NATURAL languages , *LANGUAGE models , *INFERENCE (Logic) , *STRUCTURAL design - Abstract
The abductive natural language inference task (αNLI) is proposed to infer the most plausible explanation between the cause and the event. In the αNLI task, two observations are given, and the most plausible hypothesis must be picked out from the candidates. Existing methods model the relation between each candidate hypothesis separately and penalize the inference network uniformly. In this paper, we argue that it is unnecessary to distinguish the reasoning abilities among correct hypotheses; similarly, all wrong hypotheses contribute the same when explaining the reasons for the observations. Therefore, we propose to group rather than rank the hypotheses, and design a structural loss called the "joint softmax focal loss". Based on the observation that the hypotheses are generally semantically related, we design a novel interactive language model aiming to exploit the rich interaction among competing hypotheses. We name this new model for αNLI the Interactive Model with Structural Loss (IMSL). The experimental results show that our IMSL achieves the highest performance on the RoBERTa-large pretrained model, with ACC and AUC increased by about 1% and 5%, respectively. We also compare precision and sensitivity with publicly available code, demonstrating the efficiency and robustness of the proposed approach. • For the αNLI task, we group rather than rank all hypotheses. • We design a softmax focal loss for each group and combine them into a joint loss. • We design an information interaction layer that increases the AUC by about 5%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Two-step multi-view data classification based on dynamic Graph-ELM.
- Author
-
Li, Li, Han, Qihong, Li, Jiayao, and Cui, Zhanqi
- Subjects
- *
MACHINE learning , *REPRESENTATIONS of graphs , *CLASSIFICATION algorithms , *GRAPH algorithms , *WEIGHTED graphs , *LINEAR complementarity problem - Abstract
This paper focuses on the classification of multi-view data, aiming to improve the classification accuracy of current algorithms on such data. Previous multi-view classification algorithms usually exploit the complementarity of different views and fuse features from different views. A representative category is the graph-based method, which builds a graph matrix for each view and then fuses the graph matrices of different views to obtain a unified graph. These methods have the following problems: first, the graph matrix is usually based simply on sample similarity; second, the learned graph matrix does not change dynamically; third, the weight of each single view's graph representation matrix cannot be learned in the unified graph matrix. Therefore, this paper designs a Two-Step classification algorithm based on a Dynamic Graph-ELM, called TSDGELM. In TSDGELM, the dynamic Graph-ELM is used to obtain the graph representation matrix of each view, preserving the local neighbor information of the data; a joint graph learning algorithm based on the GBS (Graph-Based System) mechanism then fuses the single-view graph matrices; and finally the unified graph is input to the classifier. To evaluate the effectiveness of the proposed method, we conduct a series of experiments on eight datasets, and the results demonstrate its superiority. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. Behavioral analysis of bar charts in documents via stochastic petri-net modeling.
- Author
-
Alexiou, Michail S. and Bourbakis, Nikolaos G.
- Subjects
- *
BEHAVIORAL assessment , *CONVOLUTIONAL neural networks , *SUPERVISED learning , *STOCHASTIC models , *MACHINE translating - Abstract
• A new method is proposed for the behavioral analysis of bar chart images from documents. • The method consists of supervised learning combined with formal analysis. • Extracted information is mapped into a Stochastic Petri-net to deduce the monotonic behavior of the chart data. • User-based and mathematical evaluation against state-of-the-art tools proves the efficiency of our approach. The accurate understanding of documents depends on the effective processing of their individual modalities, such as text, diagrams, tables, and charts. While many research papers focus on extracting the illustrated values in bar charts, little work has been conducted on analyzing this data to deduce behavioral information. In this paper, we present a methodology for the recognition and behavioral analysis of bar chart images. In particular, a Convolutional Neural Network model is trained for the initial chart classification, and keypoints are extracted for the translation of identified columns into curves. By analyzing the curves' associations and interactions with each other, and converting them into Stochastic Petri-nets, the methodology can perform behavioral analysis and deduce their functional characteristics. Empirical evaluation against state-of-the-art chart analysis tools shows high user-approval scores for the proposed method regarding the depth of extracted information and quality of responses. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. E3ID: An efficient end to end person search model.
- Author
-
Wang, Siyang, Liang, Yanchun, Li, Ao, Wang, Zeqing, and Han, Xiaosong
- Subjects
- *
PEDESTRIANS , *VIDEO surveillance , *STREAMING video & television , *IMAGE processing , *VIDEO processing - Abstract
Person Search has recently emerged as a challenging task that aims to jointly solve Pedestrian Detection and Person Re-identification (Re-ID). However, the existing approaches still stay at the image processing stage. In this paper, we propose E3ID, an efficient end-to-end person search model for video, to better solve the person search problem in real densely populated areas. To speed up the model, a low-quality image filter is proposed to adaptively adjust the sampling frequency of Yolov5 according to the moving speed of pedestrians in the video, with minimal computational effort. The cropped pedestrian images are stored in the gallery library for the Re-ID step. Then, we exploit a color feature enhancer to mine color features in specific regions that contribute significantly to the Re-ID process. Compared to other state-of-the-art methods, our proposed E3ID reaches 77.94% on the rank-1 evaluation index, which makes it feasible to use this model in other complex real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Distributed edge-event-triggered consensus of multi-agent system under DoS attack.
- Author
-
Xue, Shuangsi, Li, Huan, Cao, Hui, and Tan, Junkai
- Subjects
- *
DENIAL of service attacks , *MULTIAGENT systems , *DISTRIBUTED algorithms , *LYAPUNOV stability , *STABILITY theory - Abstract
This paper designs a consensus control protocol for multi-agent systems (MAS) based on an edge-event-triggering mechanism to achieve leader-following consensus under Denial-of-Service (DoS) attacks. First, the dynamics model of the multi-agent system is given, and the DoS attack is modeled. A compensator is designed for each follower separately to keep tracking the state of the leader. In this paper, agents are classified into leaders, informed followers, and uninformed followers, and event-triggering conditions are designed separately for each. Then, the controller of each agent is designed using the difference between the state of the compensator and the agent state. Using Lyapunov stability theory, the effectiveness of the designed control strategy under DoS attack is analyzed, and constraints on the control parameters are given. Additionally, we prove that the proposed control strategy avoids Zeno behavior. Finally, the effectiveness of the proposed control strategy is verified through numerical simulations. • Designed follower adaptive compensators without global information. • Designed edge-event-triggering conditions for each type of agent. • Employed state estimation to make the most of communication resources when an edge is under attack. • Zeno behavior is avoided by ensuring that triggering events occur at discrete intervals. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Handwriting word spotting in the space of difference between representations using vision transformers.
- Author
-
Mhiri, Mohamed, Hamdan, Mohammed, and Cheriet, Mohamed
- Subjects
- *
TRANSFORMER models , *WORD recognition , *SPACE perception , *HANDWRITING - Abstract
Word spotting in handwritten documents is challenging due to the high intra-class and inter-class variability of handwritten forms. This paper addresses the word spotting problem in both the segmentation and the training scenarios. Overall, this paper makes three contributions: (1) a new word text representation, called the Pyramid of Bidirectional Character Sequences (PBCS), which can solve both the word spotting and the word recognition problem. The PBCS representation allows trained models to identify the character subsequences shared by words, so words not seen during training can still be represented and spotted. In addition, the PBCS representation encodes word texts redundantly, allowing for word discrimination. (2) A binary classification modeling of the word spotting problem in the difference space between representations, where spotting of out-of-vocabulary words is more efficient. Finally, (3) a new deep neural network architecture that combines the strengths of convolutional layers and transformers. We evaluate our solution on the IAM and RIMES datasets and show that it outperforms recent state-of-the-art methods in the query-by-example scenario. • The Pyramid of Bidirectional Character Sequences (PBCS) word text representation. • The word spotting task as a binary classification problem. • Combining the strengths of convolutional layers and transformers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Mask-guided network for image captioning.
- Author
-
Lim, Jian Han and Chan, Chee Seng
- Subjects
- *
PIXELS , *DEEP learning - Abstract
Attention mechanisms have been widely adopted for image captioning because of their powerful performance. In this paper, we propose a Mask Captioning Network (MaC) consisting of an object layer and a background layer to capture the objects and scenes of an image and generate a sentence. To this end, we leverage Mask R-CNN to detect salient regions at the pixel level, instead of a bounding box, in the object layer. Meanwhile, in the background layer, a CNN model is used to encode the scene features. In addition, MaC is implemented in both LSTM-based and Transformer-based image captioning architectures. We introduce a mask-guided transformer encoder with additional features to enhance the model. Experimental results show that our model significantly outperforms baseline models (with much richer sentences) and achieves comparable results with state-of-the-art methods on the MSCOCO and Flickr30k datasets. • This paper proposes a new image captioning model consisting of mask and scene layers. • The idea of the mask layer is to eliminate background information and focus on the image objects only. • The scene layer focuses on generating scene features to ensure that the overall meaning of the images is captured. • Our method can be easily integrated into a transformer-based image captioning model to achieve better performance. • Our method obtains comparable or better results across four popular metrics against the state of the art on the MSCOCO and Flickr30k datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. HARWE: A multi-modal large-scale dataset for context-aware human activity recognition in smart working environments.
- Author
-
Esmaeilzehi, Alireza, Khazaei, Ensieh, Wang, Kai, Kaur Kalsi, Navjot, Ng, Pai Chet, Liu, Huan, Yu, Yuanhao, Hatzinakos, Dimitrios, and Plataniotis, Konstantinos
- Abstract
In recent years, deep neural networks (DNNs) have provided high performance on various tasks, such as human activity recognition (HAR), owing to their end-to-end training process between input data and output labels. However, the performance of DNNs is highly dependent on the availability of large-scale data for training. In this paper, we propose a novel dataset for the task of HAR in which the labels are specified for working environments (WE). Our proposed dataset, namely HARWE, considers multiple signal modalities, including visual, audio, inertial sensor, and biological signals, acquired using four different electronic devices. Furthermore, our HARWE dataset is acquired from a large number of participants while considering the realistic disturbances that can occur in the wild. Our HARWE data is context-driven, meaning it contains labels that, although correlated with each other, have contextual differences. A conventional deep multi-modal neural network achieves an accuracy of 99.06% and 68.60% in the easy and difficult settings of our dataset, respectively, which indicates its applicability to the task of human activity recognition. • We have proposed a novel dataset for the task of human activity recognition. • Our human activity recognition dataset is specified for smart workplaces. • The proposed dataset is multi-modal and large-scale. • The labeling process of the proposed dataset is performed in a context-aware manner. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Deep motion estimation through adversarial learning for gait recognition.
- Author
-
Yue, Yuanhao, Shi, Laixiang, Zheng, Zheng, Chen, Long, Wang, Zhongyuan, and Zou, Qin
- Abstract
Gait recognition is a form of identity verification that can be performed over long distances without requiring the subject's cooperation, making it particularly valuable for applications such as access control, surveillance, and criminal investigation. The essence of gait lies in the motion dynamics of a walking individual, so accurate gait-motion estimation is crucial for high-performance gait recognition. In this paper, we introduce two main designs for gait motion estimation. First, we propose a fully convolutional neural network named W-Net for silhouette segmentation from video sequences. Second, we present an adversarial learning-based algorithm for robust gait motion estimation. Together, these designs contribute to a high-performance system for gait recognition and user authentication. In the experiments, two datasets, i.e., OU-ISIR and our own dataset, are used for performance evaluation. Experimental results show that W-Net achieves an accuracy of 89.46% in silhouette segmentation, and the proposed user-authentication method achieves over 99.6% and 93.8% accuracy on the two datasets, respectively. • A novel GAN-based learning approach for gait motion extraction. • A W-Net for enhanced gait silhouette extraction. • A new dataset containing 40 subjects for gait recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Ensemble clustering via synchronized relabelling.
- Author
-
Alziati, Michele, Amarù, Fiore, Magri, Luca, and Arrigoni, Federica
- Abstract
Ensemble clustering is an important problem in unsupervised learning that aims at aggregating multiple noisy partitions into a unique clustering solution. It can be formulated in terms of relabelling and voting, where relabelling refers to the task of finding optimal permutations that bring coherence among labels in input partitions. In this paper we propose a novel solution to the relabelling problem based on permutation synchronization. By effectively circumventing the need for a reference clustering, our method achieves superior performance to previous work under varying assumptions and scenarios, demonstrating its capability to handle diverse and complex datasets. • Novel relabelling method for Ensemble Clustering based on permutation synchronization. • Flexible formulation that can manage partitions with different numbers of clusters. • Compares favourably against previous Ensemble Clustering techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
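The relabelling subproblem described in the abstract above can be illustrated with the classical reference-based formulation that the paper's permutation-synchronization method improves upon: search over label permutations for the one that best aligns a partition with a reference clustering. This is an illustrative sketch (function name and brute-force search are our own, not the authors' algorithm), practical only for small numbers of clusters:

```python
from itertools import permutations

def relabel(reference, partition):
    """Brute-force relabelling: find the permutation of `partition`'s
    labels that maximizes per-element agreement with `reference`."""
    labels = sorted(set(partition))
    best_map, best_score = None, -1
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        score = sum(mapping[p] == r for p, r in zip(partition, reference))
        if score > best_score:
            best_score, best_map = score, mapping
    return [best_map[p] for p in partition]
```

For many clusters this permutation search is replaced by a linear assignment solve on the contingency table; the paper goes further by synchronizing permutations across all partitions jointly, with no reference.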
28. Learning to learn point signature for 3D shape geometry.
- Author
-
Huang, Hao, Wang, Lingjing, Li, Xiang, Yuan, Shuaihang, Wen, Congcong, Hao, Yu, and Fang, Yi
- Abstract
Point signature is a representation that describes the structural geometry of a point within a neighborhood in 3D shapes. Conventional approaches apply a weight-sharing network, e.g., a Graph Neural Network (GNN), to all neighborhoods of all points to directly generate point signatures, and gain the generalization ability of the network through extensive training over large numbers of samples from scratch. However, such approaches lack the flexibility to rapidly adapt to unseen neighborhood structures and thus cannot generalize well to new point sets. In this paper, we propose a novel meta-learning 3D point signature model, the 3D meta point signature (MEPS) network, which is capable of learning robust 3D point signatures. Regarding each point signature learning process as a task, our method obtains a model optimized over the distribution of all tasks, generating reliable signatures for new tasks, i.e., signatures of unseen point neighborhoods. Specifically, our MEPS consists of two modules: a base signature learner and a meta signature learner. During training, the base-learner is trained to perform specific signature learning tasks. Meanwhile, the meta-learner is trained to update the base-learner with optimal parameters. During testing, the meta-learner, trained over the distribution of all tasks, can adaptively change the base-learner parameters to accommodate unseen local neighborhoods. We evaluate our MEPS model on 3D shape correspondence and segmentation. Experimental results demonstrate that our method not only gains significant improvements over the baseline model to achieve state-of-the-art performance, but is also capable of handling unseen 3D geometry. Our implementation is available at https://github.com/hhuang-code/MEPS. [Display omitted] • A meta-learning-based 3D point signature generation for 3D shape geometry learning. • A theoretical proof justifying the necessity of the meta-learning process.
• A bi-level optimization framework to instantiate the 3D meta point signature learning. • Evaluation of meta point signature on 3D shape correspondence and part segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Global-local graph neural networks for node-classification.
- Author
-
Eliasof, Moshe and Treister, Eran
- Abstract
The task of graph node classification is often approached by utilizing a local Graph Neural Network (GNN), that learns only local information from the node input features and their adjacency. In this paper, we propose to improve the performance of node classification GNNs by utilizing both global and local information, specifically by learning label- and node-features. We therefore call our method Global-Local-GNN (GLGNN). To learn proper label features, for each label, we maximize the similarity between its features and the features of nodes that belong to the label, while maximizing the distance to nodes that do not belong to the considered label. We then use the learnt label features to predict the node classification map. We demonstrate our GLGNN using three different GNN backbones, and show that our approach improves baseline performance, revealing the importance of global information utilization for node classification. • We propose to learn label features to capture global information of the input graph. • We fuse label and node features to predict a node-classification map. • We qualitatively demonstrate our method by illustrating the learnt label and node features. • We quantitatively demonstrate the benefit of using our global label features approach on 12 real-world datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
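The label-feature idea above can be grounded with a minimal, non-learned analogue: one "label feature" per class as the mean of its member nodes, with nodes assigned to the nearest label feature. In GLGNN these features are learned jointly with the GNN via similarity losses; the names and the prototype-by-averaging shortcut here are illustrative assumptions, not the paper's method:

```python
import numpy as np

def label_features(X, y):
    """One global 'label feature' per class: the mean of that class's node features."""
    classes = np.unique(y)
    return np.stack([X[y == c].mean(axis=0) for c in classes]), classes

def classify(X, protos, classes):
    """Assign each node to the class whose label feature is nearest."""
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]
```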
30. Self-supervised learning with automatic data augmentation for enhancing representation.
- Author
-
Park, Chanjong and Kim, Eunwoo
- Abstract
Self-supervised learning has become an increasingly popular method for learning effective representations from unlabeled data. One prominent approach in self-supervised learning is contrastive learning, which trains models to distinguish between similar and dissimilar sample pairs by pulling similar pairs closer and pushing dissimilar pairs farther apart. The key to the success of contrastive learning lies in the quality of the data augmentation, which increases the diversity of the data and helps the model learn more powerful and generalizable representations. While many studies have emphasized the importance of data augmentation, however, most of them rely on human-crafted augmentation strategies. In this paper, we propose a novel method, S elf A ugmentation on C ontrastive L earning with Cl ustering (SACL), searching for the optimal data augmentation policy automatically using Bayesian optimization and clustering. The proposed approach overcomes the limitations of relying on domain knowledge and avoids the high costs associated with manually designing data augmentation rules. It automatically captures informative and useful features within the data by exploring augmentation policies. We demonstrate that the proposed method surpasses existing approaches that rely on manually designed augmentation rules. Our experiments show SACL outperforms manual strategies, achieving a performance improvement of 1.68% and 1.57% over MoCo v2 on the CIFAR10 and SVHN datasets, respectively. • Optimal augmentation for robust, discriminative representations in contrastive learning. • Diverse transformations for adaptable augmentation strategies across datasets. • Bayesian optimization to find effective augmentation policies with minimal computation. • Weighted combination of contrastive loss and clustering score for data-specific optimization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
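The policy-search loop above can be sketched with random search standing in for Bayesian optimization, and an abstract `eval_fn` standing in for the paper's weighted contrastive-loss-plus-clustering objective. Everything here (names, the candidate format, the search strategy) is an illustrative assumption, not SACL itself:

```python
import random

def search_policy(candidates, eval_fn, n_trials=20, seed=0):
    """Random search over augmentation policies: each candidate op has a
    magnitude range; sample (op, magnitude) policies and keep the best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = [(op, rng.uniform(lo, hi)) for op, (lo, hi) in candidates.items()]
        score = eval_fn(policy)          # stand-in for the SSL objective
        if score > best_score:
            best, best_score = policy, score
    return best, best_score
```

A Bayesian optimizer would replace the uniform sampling with a surrogate model that proposes promising magnitudes, which is what lets SACL keep the number of expensive evaluations small.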
31. Discovering the signal subgraph: An iterative screening approach on graphs.
- Author
-
Shen, Cencheng, Wang, Shangsi, Badea, Alexandra, Priebe, Carey E., and Vogelstein, Joshua T.
- Abstract
Supervised learning on graphs is a challenging task due to the high dimensionality and inherent structural dependencies in the data, where each edge depends on a pair of vertices. Existing conventional methods are designed for standard Euclidean data and do not account for the structural information inherent in graphs. In this paper, we propose an iterative vertex screening method to achieve dimension reduction across multiple graph datasets with matched vertex sets and associated graph attributes. Our method aims to identify a signal subgraph to provide a more concise representation of the full graphs, potentially benefiting subsequent vertex classification tasks. The method screens the rows and columns of the adjacency matrix concurrently and stops when the resulting distance correlation is maximized. We establish the theoretical foundation of our method by proving that it estimates the true signal subgraph with high probability. Additionally, we establish the convergence rate of classification error under the Erdos-Renyi random graph model and prove that the subsequent classification can be asymptotically optimal, outperforming the entire graph under high-dimensional conditions. Our method is evaluated on various simulated datasets and real-world human and murine graphs derived from functional and structural magnetic resonance images. The results demonstrate its excellent performance in estimating the ground-truth signal subgraph and achieving superior classification accuracy. • An iterative feature screening method for identifying signal vertices in graphs. • Theoretical guarantee for high-probability recovery of ground-truth vertices. • The signal subgraph is Bayes optimal under the Erdos-Renyi graph model. • Excellent accuracy in identifying true signal vertices in simulations. • Application to identify potential brain regions as signal subgraphs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
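The screening criterion used in the abstract above, distance correlation, is straightforward to compute directly. Below is a minimal numpy version of the biased (V-statistic) sample estimate, without the paper's iterative row/column screening loop; the function name is ours:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)

    def centered(z):
        # pairwise Euclidean distances, then double-centering
        d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    dvarx, dvary = (A * A).mean(), (B * B).mean()
    if dvarx * dvary == 0:
        return 0.0                      # a constant argument is independent
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvarx * dvary)))
```

Unlike Pearson correlation, this is zero only under independence (in the population), which is what makes it a usable stopping criterion for vertex screening.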
32. Query-guided generalizable medical image segmentation.
- Author
-
Yang, Zhiyi, Zhao, Zhou, Gu, Yuliang, and Xu, Yongchao
- Abstract
The practical implementation of deep neural networks in clinical settings faces hurdles due to variations in data distribution across different centers. While the incorporation of query-guided Transformer has improved performance across diverse tasks, the full scope of their generalization capabilities remains unexplored. Given the ability of the query-guided Transformer to dynamically adjust to individual samples, fulfilling the need for domain generalization, this paper explores the potential of query-based Transformer for cross-center generalization and introduces a novel Query-based Cross-Center medical image Segmentation mechanism (QuCCeS). By integrating a query-guided Transformer into a U-Net-like architecture, QuCCeS utilizes attribution modeling capability of query-guided Transformer decoder for segmentation in fluctuating scenarios with limited data. Additionally, QuCCeS incorporates an auxiliary task with adaptive sample weighting for coarse mask prediction. Experimental results demonstrate QuCCeS's superior generalization on unseen domains. • Introducing a plug-and-play module for adapting to varying distribution shifts. • Segmenting directly on updated queries rather than parametric classification. • Incorporating an auxiliary task to improve model convergence and generalization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Edge-preserving image restoration based on a weighted anisotropic diffusion model.
- Author
-
Qi, Huiqing, Li, Fang, Chen, Peng, Tan, Shengli, Luo, Xiaoliu, and Xie, Ting
- Abstract
Partial differential equation-based methods have been widely applied in image restoration. The anisotropic diffusion model has good noise-removal capability without affecting significant edges. However, existing anisotropic diffusion-based models depend closely on the diffusion coefficient function and threshold parameter. This paper proposes a new weighted anisotropic diffusion coefficient model with multiple scales; its coefficient converges to the X-axis more quickly, and it exploits adaptive threshold parameters. Meanwhile, the proposed algorithm is verified to be suitable for multiple types of noise. Numerical metrics and visual comparison in simulation experiments show the proposed model has significant superiority in edge preservation and staircase-artifact reduction over existing anisotropic diffusion-based techniques. • We find a weighted anisotropic diffusion coefficient function with high convergence speed. • The adaptive threshold parameter helps keep more details in restored images. • Multi-scale feature-map fusing can reduce staircase artifacts along edges. • The performance of the proposed method is promising for real natural and medical images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
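For context, the classic anisotropic diffusion scheme the abstract builds on (Perona-Malik with the exponential coefficient) fits in a few lines: the edge-stopping coefficient shuts down diffusion where the gradient exceeds the threshold `kappa`, which is exactly the parameter the paper makes adaptive. A minimal sketch, not the paper's weighted multi-scale model:

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, lam=0.2):
    """Classic Perona-Malik diffusion with exp edge-stopping coefficient.
    lam <= 0.25 keeps the explicit 4-neighbor scheme stable."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # differences toward the four neighbors (np.roll wraps at borders)
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        # small gradient -> coefficient near 1 (smooth); large -> near 0 (keep edge)
        cn, cs = np.exp(-(dn / kappa) ** 2), np.exp(-(ds / kappa) ** 2)
        ce, cw = np.exp(-(de / kappa) ** 2), np.exp(-(dw / kappa) ** 2)
        u += lam * (cn * dn + cs * ds + ce * de + cw * dw)
    return u
```

On a noisy step image this removes noise in the flat regions while leaving the step largely intact, which is the behavior the paper's weighted coefficient aims to improve.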
34. Structural self-similarity pattern in global food prices: Utilizing a segmented multifractal detrended fluctuation analysis.
- Author
-
Saâdaoui, Foued
- Abstract
This paper provides a comprehensive analysis of the structural self-similarity observed in global food prices, focusing specifically on key commodities such as olive oil, eggs, bread, chicken, and beef. Employing Segmented Multifractal Detrended Fluctuation Analysis (SMF-DFA), we investigate the multifractal intricacies within the price dynamics of these essential food items. SMF-DFA facilitates a detailed examination of piecewise self-similarity, delineating segments by change-points and offering a nuanced understanding of the complex structures inherent in global market prices. Furthermore, our proposal incorporates Levene's test to examine whether the volatility differs significantly among the segments separated by change-points, thereby enhancing the robustness of this analytical stage. This study surpasses conventional methods, providing valuable insights into the multifractal characteristics of food prices across various scales. These findings contribute to a deeper comprehension of the intricate patterns governing global food prices, crucial for decision-making in agricultural economics, financial markets, and the dynamics of global trade. • The structural self-similarity in global food prices is studied. • We use a segmented multifractal detrended fluctuation analysis (SMF-DFA) for this aim. • SMF-DFA provides piecewise multifractal analysis tools. • Levene's test, introduced for variance inequality, enhances the proposal's robustness. • The study offers vital insights for decision-making in agriculture and global trade. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
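The building block of the method above is DFA itself. Below is the plain monofractal version (the q = 2 slice of MF-DFA), without the paper's segmentation or change-point machinery; white noise should yield a scaling exponent near 0.5 and a random walk near 1.5:

```python
import numpy as np

def dfa_exponent(x, scales=(8, 16, 32, 64)):
    """Monofractal DFA: slope of log F(s) vs log s over the given scales."""
    y = np.cumsum(x - np.mean(x))          # integrated profile
    F = []
    for s in scales:
        n = len(y) // s
        segs = y[:n * s].reshape(n, s)
        t = np.arange(s)
        # linearly detrend each segment, collect residual power
        rms = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)
            rms.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(rms)))
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return slope
```

MF-DFA generalizes this by averaging the segment fluctuations with varying exponents q, and the paper's SMF-DFA additionally fits separate exponents between change-points.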
35. Decoding class dynamics in learning with noisy labels.
- Author
-
Tatjer, Albert, Nagarajan, Bhalaji, Marques, Ricardo, and Radeva, Petia
- Abstract
The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising performance improvements. The selection of clean samples amongst the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight the aspect that effective demarcation of samples would lead to better performance. We identify the Global Noise Conundrum in the existing models, where the distribution of samples is treated globally. We propose a per-class-based local distribution of samples and demonstrate the effectiveness of this approach in having a better clean-noise split. We validate our proposal on several benchmarks — both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness, to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at https://github.com/aldakata/CCLM/ • Label noise leads to reduced generalization in deep learning models. • Global Noise Conundrum exists in several Learning with Noisy Labels sample-selection methods. • Class-Conditional Local noise Model (CCLM) uses per-class-based local distribution of samples with local thresholds. • Class-aware decision boundary of CCLM leads to a better clean-noise split. • Locally adapted clean-noise split yielded improvements in both real and synthetic noise benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
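The Global Noise Conundrum described above is easy to demonstrate: when classes have different loss scales, one global threshold flags most of the hard class as noisy. A minimal per-class split (a simple quantile threshold standing in for the per-class mixture modeling; names are illustrative):

```python
import numpy as np

def clean_split(losses, labels, quantile=0.5):
    """Per-class clean/noisy split: threshold each class's own loss
    distribution instead of applying one global cutoff."""
    losses, labels = np.asarray(losses, float), np.asarray(labels)
    clean = np.zeros(len(losses), bool)
    for c in np.unique(labels):
        idx = labels == c
        clean[idx] = losses[idx] <= np.quantile(losses[idx], quantile)
    return clean
```

With one easy class (low losses) and one hard class (high losses), the per-class split keeps the low-loss half of each class, whereas a single global median keeps the entire easy class and rejects the entire hard one.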
36. Weight Saliency search with Semantic Constraint for Neural Machine Translation attacks.
- Author
-
Han, Wen, Yang, Xinghao, Liu, Baodi, Zhang, Kai, and Liu, Weifeng
- Abstract
Text adversarial attack is an effective way to improve the robustness of Neural Machine Translation (NMT) models. Existing NMT attack tasks are often completed by replacing words. However, most previous works pursue a high attack success rate but produce semantically inconsistent sentences, leading to wrong translations even for humans. In this paper, we propose a Weight Saliency search with Semantic Constraint (WSSC) algorithm to make semantically consistent word modifications to the input sentence for black-box NMT attacks. Specifically, our WSSC has two major merits. First, it optimizes the word substitution with a word saliency method, which helps to reduce the word replacement rate. Second, it constrains the objective function with a semantic similarity loss, ensuring no modification leads to significant semantic changes. We evaluate the effectiveness of the proposed WSSC by attacking three popular NMT models, i.e., T5, Marian, and BART, on three widely used datasets, i.e., WMT14, WMT16, and TED. Experimental results validate that our WSSC improves Attack Success Rate (ASR) by 4.02% and Semantic Similarity score (USE) by 1.28% on average. Besides, our WSSC also shows good properties in preserving grammatical correctness and in transfer attacks. • Optimize word substitution with word saliency to reduce word replacement rate. • Constrain objective function with semantic similarity loss to ensure inconspicuous semantic changes. • Generate adversarial examples with higher grammatical accuracy and transferability with the WSSC algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
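The word-saliency idea underlying attacks like the one above is commonly implemented by deletion: score how much the victim model's confidence drops when each word is removed, then attack the most salient words first. A toy black-box sketch where `score_fn` stands in for the victim model (the function and its signature are our assumptions, not the WSSC implementation):

```python
def word_saliency(sentence, score_fn):
    """Rank words by the drop in a model score when each word is deleted.
    Higher drop = more salient = better substitution candidate."""
    words = sentence.split()
    base = score_fn(" ".join(words))
    saliencies = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        saliencies.append((base - score_fn(reduced), words[i], i))
    return sorted(saliencies, reverse=True)  # most salient first
```

WSSC then pairs this ordering with a semantic-similarity constraint so that each substitution at a salient position is rejected if it moves the sentence embedding too far from the original.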
37. Select & Enhance: Masked-based image enhancement through tree-search theory and deep reinforcement learning.
- Author
-
Cotogni, Marco and Cusano, Claudio
- Subjects
- *
DEEP reinforcement learning , *IMAGE intensifiers , *COMPUTER vision , *COMPUTATIONAL photography , *IMAGE processing - Abstract
The enhancement of low-quality images is both a challenging task and an essential endeavor in many fields including computer vision, computational photography, and image processing. In this paper, we propose a novel and fully explainable method for image enhancement that combines spatial selection and histogram equalization. Our approach leverages tree-search theory and deep reinforcement learning to iteratively select areas to be processed. Extensive experimentation on two datasets demonstrates the quality of our method compared to other state-of-the-art models. We also conducted a multi-user experiment which shows that our method can emulate a variety of enhancement styles. These results highlight the effectiveness and versatility of the proposed method in producing high-quality images through an explainable enhancement process. • A fully explainable image enhancement method based on reinforcement learning. • The method alternates spatial selection and histogram equalization through deep RL. • An extensive experimentation shows that our method is competitive with SOTA methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. MOD-YOLO: Multispectral object detection based on transformer dual-stream YOLO.
- Author
-
Shao, Yanhua, Huang, Qimeng, Mei, Yanying, and Chu, Hongyu
- Subjects
- *
CRYSTAL field theory - Abstract
• Design a Cross Stage Partial CFT (Cross-Modality Fusion Transformer) module named CSP-CFT. • CSP-CFT reduces computing cost by 60%–70% while preserving the high accuracy of CFT. • A powerful and lightweight multispectral object detection dual-stream YOLO (MOD-YOLO), based on CSP-CFT, is proposed. • Propose MOD-YOLO-Tiny, which maintains a high level of accuracy while greatly reducing computation. Multispectral object detection can effectively improve the precision of object detection in low-visibility scenes, which increases the reliability and stability of object detection applications in open environments. The Cross-Modality Fusion Transformer (CFT) can effectively fuse different spectral information, but relies on large models and expensive computing resources. In this paper, we propose the multispectral object detection dual-stream YOLO (MOD-YOLO), based on Cross Stage Partial CFT (CSP-CFT), to address the issue that prior studies require heavy inference computation from the recurrent fusing of multispectral features. This network divides the fused feature map into two parts, used respectively for cross-stage output and for combination with the next stage's features, to achieve the right speed/memory/precision balance. To further improve accuracy, SIoU is selected as the loss function. Ultimately, extensive experiments on multiple publicly available datasets demonstrate that our model, which achieves the smallest model size and excellent performance, produces better tradeoffs between accuracy and model size than other popular models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Patch-based probabilistic identification of plant roots using convolutional neural networks.
- Author
-
Cardellicchio, A., Solimani, F., Dimauro, G., Summerer, S., and Renò, V.
- Subjects
- *
CONVOLUTIONAL neural networks , *PLANT identification , *PLANT roots , *ARTIFICIAL intelligence , *ARTIFICIAL vision - Abstract
Recently, computer vision and artificial intelligence have been used as enabling technologies for plant phenotyping studies, since they allow the analysis of the large amounts of data gathered by sensors. Plant phenotyping studies can be devoted to the evaluation of complex plant traits on both the aerial and the underground parts of the plant, to extract meaningful information about the growth, development, tolerance, or resistance of the plant itself. All plant traits should be evaluated automatically and quantitatively measured in a non-destructive way. This paper describes a novel approach for identifying plant roots from images of the root system architecture using a convolutional neural network (CNN) that operates on small image patches, calculating the probability that the center point of the patch is a root pixel. The underlying idea is that the CNN model should embed as much information as possible about the variability of the patches, which can show chaotic and heterogeneous backgrounds. Results on a real dataset demonstrate the feasibility of the proposed approach, as it overcomes the current state of the art. • Root systems must be monitored to assess the growth and well-being of a plant. • State-of-the-art approaches mainly use classic ML or U-networks for segmentation. • CNNs can be used for monitoring considering patch-based information. • These models are simpler and faster, and provide better segmentation performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
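The patch-based setup above reduces segmentation to per-pixel classification: extract a square patch around each pixel and let a CNN predict P(center is root). The patch-extraction step can be sketched as follows (patch size and reflect padding are our assumptions; the paper's CNN itself is omitted):

```python
import numpy as np

def extract_patches(img, coords, size=5):
    """Square patches centered on the given (row, col) pixels; a classifier
    would predict, per patch, the probability that its center is a root pixel."""
    r = size // 2
    padded = np.pad(img, r, mode="reflect")   # handle patches at the border
    return np.stack([padded[y:y + size, x:x + size] for (y, x) in coords])
```

At inference time, running this over every pixel (or a stride of pixels) and thresholding the predicted probabilities yields the root mask.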
40. A shared-private sentiment analysis approach based on cross-modal information interaction.
- Author
-
Hou, Yilin, Zhong, Xianjing, Cao, Hui, Zhu, Zheng, Zhou, Yunfeng, and Zhang, Jie
- Subjects
- *
SENTIMENT analysis , *EMOTION recognition , *TRANSFORMER models , *USER-generated content , *AFFECTIVE computing , *EMOTIONS - Abstract
To explore the heterogeneous sentiment information in each modal feature and improve the accuracy of sentiment analysis, this paper proposes a Multimodal Sentiment Analysis based on Text-Centric Sharing-Private Affective Semantics (TCSP). First, the Deep Canonical Time Warping (DCTW) algorithm is employed to effectively align the timing deviations of the Audio and Picture modalities. Then, a cross-modal shared mask matrix is designed, and a mutual attention mechanism is introduced to compute the shared affective semantic features of audio-picture-to-text. Following this, the private affective semantic features within the Audio and Picture modalities are derived via a self-attention mechanism with LSTM. Finally, the Transformer Encoder structure is improved, achieving deep interaction and feature fusion of cross-modal emotional information for sentiment analysis. Experiments are conducted on the IEMOCAP and MELD datasets. Compared with current state-of-the-art models, the accuracy of the TCSP model reaches 82.02%, fully validating its effectiveness. In addition, the rationality of the design of each structure within the model is verified through ablation experiments. • Proposed an emotion recognition method that includes shared and private emotions. • Utilized DCTW for audio-picture time-series feature alignment, effectively improving recognition accuracy. • TCSP achieved 82.02% accuracy on the IEMOCAP dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Adversarial regularized attributed network embedding for graph anomaly detection.
- Author
-
Tian, Chongrui, Zhang, Fengbin, and Wang, Ruidong
- Subjects
- *
ANOMALY detection (Computer security) , *COMPACT spaces (Topology) , *LATENT variables - Abstract
Graph anomaly detection aims to identify the nodes that display significantly different behavior from the majority. However, existing methods neglect the combined interaction between the network structure and node attributes, resulting in suboptimal latent representations of nodes due to network noise. In this paper, we introduce a novel approach called adversarial regularized attributed network embedding (ARANE) for graph anomaly detection. ARANE addresses this issue by forcing normal nodes to inhabit a compact manifold in the latent space, taking into account both the network structure and node attributes. It ensures that data points from the normal class, originating from different distributions, are distributed within a single compact latent space, while excluding anomalies from this region. ARANE employs a dual-encoder architecture consisting of an attribute encoder and a structure encoder. The attribute encoder learns node attribute embeddings, while the structure encoder focuses on learning structure embeddings. To obtain high-quality node embeddings for effective anomaly detection, we apply adversarial learning to regularize the learned embeddings separately in both the structure and attribute spaces. Furthermore, we introduce a fusion module that combines the final node embeddings derived from the structure and attribute spaces. These joint embeddings serve as inputs to a dual-decoder for graph reconstruction, where the resulting reconstruction errors are utilized as anomaly scores for anomaly detection. Extensive experiments conducted on real-world attributed networks demonstrate the superior effectiveness of our proposed method compared to state-of-the-art approaches. • We introduce ARANE for one-class graph classification. It uses network structure & node attributes to tightly cluster normal nodes, excluding anomalies. 
• Our method uses a compact fusion module, capturing interactions between structural & attribute data in networks, for seamless integration. • Extensive experiments on real-world networks prove ARANE's effectiveness as a top solution for one-class graph classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. A three-stream fusion and self-differential attention network for multi-modal crowd counting.
- Author
-
Tang, Haihan, Wang, Yi, Lin, Zhiping, Chau, Lap-Pui, and Zhuang, Huiping
- Subjects
- *
COUNTING , *CROWDS - Abstract
Multi-modal crowd counting aims at using multiple types of data, like RGB-Thermal and RGB-Depth, to count the number of people in crowded scenes. Current methods mainly focus on two-stream multi-modal information fusing in the encoder and single-scale semantic features in the decoder. In this paper, we propose an end-to-end three-stream fusion and self-differential attention network to simultaneously address the multi-modal fusion and scale variation problems for multi-modal crowd counting. Specifically, the encoder adopts three-stream fusion to fuse stage-wise modality-paired and modality-specific features. The decoder applies a self-differential attention mechanism on multi-level fused features to extract basic and differential information adaptively, and finally, the counting head predicts the density map. Experimental results on RGB-T and RGB-D benchmarks show the superiority of our proposed method compared with the state-of-the-art multi-modal crowd counting methods. Ablation studies and visualization demonstrate the advantages of the proposed modules in our model. • We propose a novel multi-modal crowd counting model to address information fusion and scale variation problems. • The model uses the three-stream fusion encoder with IIM to fuse modality-paired and modality-specific features. • The model adaptively integrates multi-scale features by SDAM to emphasize discriminative scale information. • Our method outperforms its counterparts and performs consistently well in the daytime and nighttime. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
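Crowd-counting models like the one above are typically supervised with ground-truth density maps: one normalized Gaussian per annotated head, so the map integrates to the person count and the predicted count is just the sum of the output. A minimal sketch of that standard construction (fixed sigma; names are illustrative):

```python
import numpy as np

def density_map(points, shape, sigma=2.0):
    """Ground-truth density map: one normalized Gaussian per head position
    (x, y), so the whole map sums to the number of people."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, float)
    for (x, y) in points:
        g = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()              # each person contributes mass 1
    return dmap
```

Training then regresses the network output against these maps (e.g. with an L2 loss), and evaluation compares the summed densities to the true counts.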
43. Efficient label-free pruning and retraining for Text-VQA Transformers.
- Author
-
Poh, Soon Chang, Chan, Chee Seng, and Lim, Chee Kau
- Subjects
- *
OCCUPATIONAL retraining , *QUESTION answering systems , *RESEARCH personnel - Abstract
Recent advancements in Scene Text Visual Question Answering (Text-VQA) employ autoregressive Transformers, showing improved performance with larger models and pre-training datasets. Although various pruning frameworks exist to simplify Transformers, many are integrated into the time-consuming training process. Researchers have recently explored post-training pruning techniques, which separate pruning from training and reduce time consumption. Some methods use gradient-based importance scores that rely on labeled data, while others offer retraining-free algorithms that quickly enhance pruned model accuracy. This paper proposes a novel gradient-based importance score that requires only raw, unlabeled data for post-training structured autoregressive Transformer pruning. Additionally, we introduce a Retraining Strategy (ReSt) for efficient performance restoration of pruned models of arbitrary sizes. We evaluate our approach on the TextVQA and ST-VQA datasets using TAP, TAP††, and SaL‡-Base, all of which utilize autoregressive Transformers. On TAP and TAP††, our pruning approach achieves up to a 60% reduction in size with less than a 2.4% accuracy drop, and the proposed ReSt retraining approach takes only 3 to 34 min, comparable to existing retraining-free techniques. On SaL‡-Base, the proposed method achieves up to a 50% parameter reduction with less than a 2.9% accuracy drop, requiring only 1.19 h of retraining using the proposed ReSt approach. The code is publicly accessible at https://github.com/soonchangAI/LFPR. • We study a label-free importance score for structured pruning of autoregressive Transformers. • We propose an adaptive retraining approach for pruned Transformer models of varying sizes. • Our pruned models achieve up to a 60% reduction in size with only a <2.4% drop in accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Segmentation assisted Prostate Cancer Grading with Multitask Collaborative Learning.
- Author
-
Zhang, Zheng, Song, Yushan, Tan, Yunpeng, Yan, Shuo, Zhang, Bo, and Zhuang, Yufeng
- Subjects
- *
PROSTATE cancer , *COLLABORATIVE learning , *PROSTATE-specific antigen , *IMAGE segmentation , *INFORMATION networks , *COMPUTER assisted instruction - Abstract
Medical image segmentation can provide doctors with more direct information on the location and size of organs or lesions, which can serve as a valuable auxiliary task for prostate cancer grading. Meanwhile, other types of diagnostic data besides images are also essential, such as patient age, Prostate-Specific Antigen (PSA), etc. Currently, there is a lack of in-depth research on how to effectively differentiate and select shared features and task-specific features in multitask learning, as well as how to balance and explore the potential correlations between different tasks. In this paper, we propose a novel Shared Feature Hybrid Gating Experts (SFHGE) architecture for collaborative main (lesion grading) and auxiliary (lesion segmentation) task learning, dynamically selecting shared and task-specific features. To efficiently utilize complementary features, we also introduce a Cross-Task Attention module (CrossTA) to capture cross-task integrated representation. Additionally, recognizing that non-image clinical information often provides crucial diagnostic insights, we further design a Heterogeneous Information Fusion Network (HIFN) to better integrate clinical data, thereby improving grading performance. Extensive experiments on the PI-CAI dataset demonstrate that our approach outperforms mainstream classification and segmentation models. • A shared feature hybrid gating experts framework is proposed for segmentation-assisted prostate cancer grading. • A cross-task attention module is designed to provide effective complementary information between tasks. • A heterogeneous information fusion network is designed to integrate multimodal diagnostic data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
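The gating-experts idea in the abstract, routing an input through shared and task-specific experts via a learned gate, can be sketched in a few lines. This is a minimal numpy illustration of generic mixture-of-experts gating, not the paper's SFHGE implementation; all names (`gated_mixture`, `W_gate`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gated_mixture(x, shared_experts, task_experts, W_gate):
    """Blend shared and task-specific expert outputs with a learned gate.

    x: (d,) input feature vector; each expert is a callable d -> d;
    W_gate: (d, n_experts) gating weights. All names are illustrative.
    """
    experts = shared_experts + task_experts
    outputs = np.stack([e(x) for e in experts])   # (n_experts, d)
    gate = softmax(x @ W_gate)                    # (n_experts,) mixing weights
    return gate @ outputs                         # convex combination, (d,)
```

Because the gate is a softmax, the result is a convex combination of expert outputs, which is what lets the network softly switch between shared and task-specific features per input.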
45. Manifold information through neighbor embedding projection for image retrieval.
- Author
-
Leticio, Gustavo Rosseto, Kawai, Vinicius Sato, Valem, Lucas Pascotti, Pedronette, Daniel Carlos Guimarães, and da S. Torres, Ricardo
- Subjects
- *
IMAGE retrieval , *CONVOLUTIONAL neural networks , *TRANSFORMER models , *DATA visualization , *DIMENSION reduction (Statistics) - Abstract
Although studied for decades, constructing effective image retrieval systems remains an open problem in a wide range of relevant applications. Impressive advances have been made in representing image content, mainly supported by the development of Convolutional Neural Networks (CNNs) and Transformer-based models. On the other hand, effectively computing the similarity between such representations is still challenging, especially in collections in which images are structured in manifolds. This paper introduces a novel solution to this problem based on dimensionality reduction techniques often used for data visualization. The key idea is to exploit the spatial relationships defined by neighbor embedding data visualization methods, such as t-SNE and UMAP, to compute a more effective distance/similarity measure between images. Experiments were conducted on several widely used datasets. The results indicate that the proposed approach leads to significant gains over the original feature representations, as well as competitive results in comparison with state-of-the-art image retrieval approaches. • Manifold information encoded by the Neighbor Embedding framework for image retrieval. • Use of 2D spatial relationships given by Neighbor Embedding for similarity definition. • A simple, yet effective and efficient image retrieval scheme is proposed. • A late fusion method is used to combine distances given by t-SNE and UMAP projections. • Significant gains obtained on diverse datasets and features based on CNNs and Transformers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
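The core mechanism described above, measuring similarity in the 2D neighbor-embedding space and late-fusing the t-SNE and UMAP views, reduces to pairwise distances over projected points plus a fusion rule. Below is a small numpy sketch under the assumption that the projections are already computed; the min-max normalized average is one simple fusion rule, and the paper's exact combination may differ.

```python
import numpy as np

def projected_distances(P):
    """Pairwise Euclidean distances in a 2D projection P of shape (n, 2)."""
    diff = P[:, None, :] - P[None, :, :]          # (n, n, 2) coordinate gaps
    return np.sqrt((diff ** 2).sum(axis=-1))      # (n, n) distance matrix

def late_fusion(D_tsne, D_umap):
    """Average two distance matrices after min-max normalization.

    A simple illustrative fusion rule; both inputs are (n, n) matrices
    computed from t-SNE and UMAP projections of the same collection.
    """
    norm = lambda D: (D - D.min()) / (D.max() - D.min() + 1e-12)
    return 0.5 * (norm(D_tsne) + norm(D_umap))
```

Retrieval then ranks the collection by a row of the fused matrix instead of by distances in the original high-dimensional feature space.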
46. Deepfake face discrimination based on self-attention mechanism.
- Author
-
Wang, Shuai, Zhu, Donghui, Chen, Jian, Bi, Jiangbo, and Wang, Wenyi
- Subjects
- *
NATIONAL security , *EVERYDAY life - Abstract
With the rapid progress of deepfake technology, the improper use of manipulated images and videos presenting synthetic faces has become a noteworthy concern, posing threats to both daily life and national security. While numerous CNN-based deepfake face detection methods have been proposed, most existing approaches struggle to effectively capture image content across different scales and positions. In this paper, we present a novel two-branch structural network, referred to as the Self-Attention Deepfake Face Discrimination Network (SADFFD). Specifically, a branch incorporating cascaded multi self-attention mechanism (SAM) modules is integrated in parallel with EfficientNet-B4 (EffB4). The multi-SAM branch supplies additional features that concentrate on the image regions essential for discriminating between real and fake faces. The EffB4 network is adopted for its efficiency, obtained by jointly scaling the resolution, depth, and width of the network. In comprehensive experiments on FaceForensics++, Celeb-DF, and our self-constructed SAMGAN3 dataset, the proposed SADFFD achieved the highest detection accuracy, averaging 99.01% on FaceForensics++, 98.65% on Celeb-DF, and an impressive 99.99% on SAMGAN3, surpassing other state-of-the-art (SOTA) methods. • A novel two-branch CNN structure is proposed for deepfake face discrimination. • The self-attention mechanism is utilized to enhance discrimination accuracy. • FaceForensics++, Celeb-DF, and our self-built dataset are used for evaluation in terms of detection accuracy. • Forged face images/videos from various generating methods are included in our evaluation datasets. • Comprehensive experiments demonstrate the superior performance of the proposed method in discriminating deepfake faces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
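The SAM modules above build on standard scaled dot-product self-attention, which weights every token (image region) by its affinity to every other token. The sketch below is the textbook single-head formulation in numpy, not the paper's cascaded multi-SAM architecture; the projection matrices `Wq`, `Wk`, `Wv` are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over tokens X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # (n, n) pairwise affinities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A = A / A.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return A @ V                                          # attention-weighted values
```

Because every output row mixes information from all positions, a branch of such modules can attend to forgery artifacts regardless of their scale or location, which is the capability the abstract says plain CNN stacks lack.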
47. Data-agnostic Face Image Synthesis Detection using Bayesian CNNs.
- Author
-
Leyva, Roberto, Sanchez, Victor, Epiphaniou, Gregory, and Maple, Carsten
- Subjects
- *
CONVOLUTIONAL neural networks , *ANOMALY detection (Computer security) , *COMPUTER security - Abstract
Face image synthesis detection is gaining considerable attention because of the potential negative impact that this type of synthetic data has on society. In this paper, we propose a data-agnostic solution to detect the face image synthesis process. Specifically, our solution is based on an anomaly detection framework that requires only real data to learn the inference process; it is data-agnostic in the sense that it requires no synthetic face images. The solution uses the posterior probability with respect to the reference data to determine whether new samples are synthetic. Our evaluation results using different synthesizers show that our solution is very competitive against the state-of-the-art, which requires synthetic data for training. • We use an anomaly detection framework to detect synthetic data. • Our proposed solution requires only real data to detect the synthesis process. • Our solution achieves very competitive performance, outperforming existing solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
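The anomaly-detection recipe described, fit a density model on real data only and flag low-probability samples as synthetic, can be illustrated with a diagonal Gaussian standing in for the paper's Bayesian CNN. Everything here (`fit_real`, `log_density`, the percentile threshold) is an assumption for illustration, not the authors' actual model.

```python
import numpy as np

def fit_real(features):
    """Fit a diagonal Gaussian to features extracted from real images only."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6              # avoid division by zero
    return mu, var

def log_density(x, mu, var):
    """Diagonal-Gaussian log-density of a sample under the real-data model."""
    return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

def is_synthetic(x, mu, var, threshold):
    """Flag a sample as synthetic when it is too unlikely under real data."""
    return log_density(x, mu, var) < threshold
```

The threshold can be set from the real data itself, e.g. at a low percentile of training log-densities, so no synthetic example is ever needed, which is exactly the "data-agnostic" property the abstract claims.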
48. Decomposition via elastic-band transform.
- Author
-
Choi, Guebin and Oh, Hee-Seok
- Subjects
- *
DECOMPOSITION method , *DATA analysis , *SIGNALS & signaling - Abstract
In this paper, we propose a novel decomposition method using the elastic-band transform (EBT), which mimics eye scanning and is utilized for multiscale analysis of signals. The proposed EBT-based method can efficiently extract the features of various signals, with three advantages. First, it is a data-driven approach that extracts several important modes based solely on the data, without using predetermined basis functions. Second, it does not assume that the signal consists of (locally) sinusoidal intrinsic mode functions, a common assumption in existing methods; the proposed method can therefore handle a wide range of signals. Finally, it is robust to noise. A practical algorithm for decomposition is presented, along with some theoretical properties. Simulation examples and real data analysis results show promising empirical properties of the proposed method. • The proposed method is a data-driven approach that extracts several important modes based solely on data. • The proposed method does not assume that the signal consists of (locally) sinusoidal intrinsic mode functions. • The proposed method is robust to noise. • The proposed method significantly extends the scope of signals amenable to decomposition. • A practical algorithm for decomposition is presented along with some theoretical properties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Learning from feature and label spaces' bias for uncertainty-adaptive facial emotion recognition.
- Author
-
Xu, Luhui, Gan, Yanling, and Xia, Haiying
- Subjects
- *
EMOTION recognition , *LEARNING modules , *BAYESIAN analysis , *KNOWLEDGE transfer , *EMOTIONS - Abstract
Developing an accurate deep model for facial emotion recognition is a long-standing challenge, because the uncertainty of emotions, stemming from the ambiguity between emotional categories and differences in subjective annotations, can prevent the model from reaching the desired optimum. This paper constructs two distinct datasets, an original sample set and an ambiguous sample set, to explore an effective ambiguous knowledge transfer method that realizes adaptive awareness of uncertainty in facial emotion recognition. The original sample set is weakly-augmented data with relatively low uncertainty, as most emotions are clean in reality. Meanwhile, the ambiguous sample set is strongly-augmented data that introduces feature and label bias with regard to emotion, and thus has relatively high uncertainty. The proposed framework consists of two sub-nets, trained on the original set and the ambiguous set respectively. To achieve uncertainty-adaptive learning for the two sub-nets, we introduce two modules. One is a cross-space attention consistency learning module that performs attention coupling across the original and ambiguous feature spaces, achieving uncertainty-aware representation learning at feature granularity. The other is a soft-label learning module that models and utilizes uncertainty at label granularity by aligning the posterior distributions between the original and ambiguous label spaces. Experimental studies on public datasets indicate that our method is competitive with the state-of-the-art. • We establish an uncertainty-adaptive framework by exploring the bias between two kinds of sample sets. • We design two custom modules, a cross-space attention consistency learning module and a soft-label learning module. • Experimental results on public datasets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
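Aligning posterior distributions between the original and ambiguous label spaces, as the soft-label module does, is commonly implemented as a divergence penalty between the two branches' softmax outputs. The sketch below uses KL divergence as one plausible choice; the paper's actual alignment objective may differ, and all names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions."""
    p, q = p + eps, q + eps                      # guard against log(0)
    return float(np.sum(p * np.log(p / q)))

def soft_label_loss(logits_orig, logits_ambig):
    """Penalize disagreement between the two branches' class posteriors."""
    return kl_divergence(softmax(logits_orig), softmax(logits_ambig))
```

The loss is zero when both sub-nets agree and grows with disagreement, so minimizing it pulls the ambiguous branch's posterior toward the cleaner original branch.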
50. Topological optimization of continuous action iterated dilemma based on finite-time strategy using DQN.
- Author
-
Jin, Xiaoyue, Li, Haojing, Yu, Dengxiu, Wang, Zhen, and Li, Xuelong
- Subjects
- *
DILEMMA , *LYAPUNOV functions , *DISCOUNT prices , *PROBLEM solving , *DYNAMIC models - Abstract
In this paper, a finite-time convergent continuous action iterated dilemma (CAID) with topological optimization is proposed to overcome the limitations of traditional methods. Asymptotic stability in traditional CAID provides no information about the rate of convergence or the dynamics of the system in finite time, and previous works offer no effective method to analyze its convergence time; we address these problems. First, CAID is formulated by making the players' strategies continuous, so that a player can choose an intermediate state between cooperation and defection, and a discount rate is introduced to model the fact that players cannot learn accurately from strategic differences alone. Then, to analyze the convergence time of CAID, a finite-time convergence analysis based on a Lyapunov function is introduced. Furthermore, an optimal communication topology generation method based on Deep Q-learning (DQN) is proposed to explore a better game structure. Finally, simulations show the effectiveness of the proposed method. • The dynamic model of Continuous Action Iterated Dilemma (CAID) with continuous strategies is more realistic. • The convergence time of CAID is analyzed by the proposed finite-time convergence analysis method based on the Lyapunov function. • The optimal communication topology generation method based on DQN is proposed to enhance the game structure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
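At the heart of the DQN-based topology generation is the Q-learning update, which a deep network merely approximates at scale. The tabular sketch below shows that update rule in isolation; states and actions (e.g. "add or remove a communication edge") are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Bellman backup on a (n_states, n_actions) Q-table.

    alpha is the learning rate and gamma the discount factor; in the
    DQN setting the table is replaced by a neural network trained on
    the same target.
    """
    target = reward + gamma * Q[next_state].max()      # bootstrapped return
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```

Repeatedly rewarding topology edits that improve the game outcome drives the corresponding Q-values upward, so a greedy policy over Q eventually proposes the better communication structure.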