88 results
Search Results
2. Neural network based cognitive approaches from face perception with human performance benchmark.
- Author
-
Chen, Yiyang, Li, Yi-Fan, Cheng, Chuanxin, and Ying, Haojiang
- Abstract
Artificial neural network models are able to achieve great performance at numerous computationally challenging tasks like face recognition. It is of significant importance to explore the difference between neural network models and human brains in terms of computational mechanism. This issue has become an experimental focus for some researchers in recent studies, and it is believed that using human behavior to understand neural network models can address this issue. This paper compares the neural network model performance with human performance on a classic yet important task: judging the ethnicity of a given face. This study uses Caucasian and East Asian faces to train 4 neural networks including AlexNet, VGG11, VGG13, and VGG16. Then, the ethnicity judgments of the neural networks are compared with human data using classical psychophysical methods by fitting psychometric curves. The results suggest that VGG11, followed by VGG16, shows a similar response pattern as humans, while simpler AlexNet and more complex VGG13 do not resemble human performance. Thus, this paper explores a new paradigm to compare neural networks and human brains. • Neural networks are able to provide cognitive approaches from face perception. • Human perception is used as a benchmark to evaluate the performance of neural networks. • Neural networks utilizing eye regions are confirmed to be better to perceive faces. • Attentional region analysis would unveil the processing details of a neural network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Conditional Information Gain Trellis.
- Author
-
Bicici, Ufuk Can, Meral, Tuna Han Salih, and Akarun, Lale
- Abstract
Conditional computing processes an input using only part of the neural network's computational units. Learning to execute parts of a deep convolutional network by routing individual samples has several advantages: This can facilitate the interpretability of the model, reduce the model complexity, and reduce the computational burden during training and inference. Furthermore, if similar classes are routed to the same path, that part of the network learns to discriminate between finer differences and better classification accuracies can be attained with fewer parameters. Recently, several papers have exploited this idea to select a particular child of a node in a tree-shaped network or to skip parts of a network. In this work, we follow a Trellis-based approach for generating specific execution paths in a deep convolutional neural network. We have designed routing mechanisms that use differentiable information gain-based cost functions to determine which subset of features in a convolutional layer will be executed. We call our method Conditional Information Gain Trellis (CIGT). We show that our conditional execution mechanism achieves comparable or better model performance compared to unconditional baselines, using only a fraction of the computational resources. We provide our code and model checkpoints used in the paper at: https://github.com/ufukcbicici/cigt/tree/prl/prl_scripts. • We introduce Conditional Information Gain Trellis (CIGT) for conditional computing. • We derive the CIGT loss function based on classification and information gain losses. • CIGT performs better or comparably using a fraction of the computational resources. • We give tests on MNIST, Fashion MNIST, and CIFAR 10, showing CIGT compares favorably. • Supplementary materials show that semantically similar classes are grouped together. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. FDM: Document image seen-through removal via Fuzzy Diffusion Models.
- Author
-
Wang, Yijie, Xu, Jindong, Liang, Zongbao, Chong, Qianpeng, and Cheng, Xiang
- Abstract
While scanning or shooting a document, factors like ink density and paper transparency may cause the content from the reverse side to become visible through the paper, resulting in a digital image with a 'seen-through' phenomenon, which will affect practical applications. In addition, document images can be affected by random factors during the imaging process, such as differences in the performance of camera equipment and variations in the physical properties of the document itself. These random factors increase the noise of the document image and may cause the seen-through phenomena to become more complex and diverse. To tackle this issue, we propose the Fuzzy Diffusion Model (FDM), which combines fuzzy logic with diffusion models. It effectively models complex seen-through effects and handles uncertainties in document images. Specifically, we gradually degrade the original image with mean-reverting stochastic differential equation(SDE) to transform it into seen-through mean state with fixed Gaussian noise version. Following this, fuzzy operations are introduced into the noise network. Which helps the model better learn noise and data distributions by reasoning about the affiliation relationship of each pixel point through fuzzy logic. Eventually, in the reverse process, the low-quality image is gradually restored by simulating the corresponding reverse-time SDE. Extensive quantitative and qualitative experiments conducted on various datasets demonstrate that the proposed method significantly removes the seen-through effects and achieves good results under several metrics. The proposed FDM effectively solves the seen-through effects of document images and obtains better visual quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. A simple and efficient filter feature selection method via document-term matrix unitization.
- Author
-
Li, Qing, Zhao, Shuai, He, Tengjiao, and Wen, Jinming
- Subjects
- *
FEATURE selection , *FILM reviewing , *ABSOLUTE value , *PRODUCT reviews - Abstract
Text processing tasks commonly grapple with the challenge of high dimensionality. One of the most effective solutions to this challenge is to preprocess text data through feature selection methods. Feature selection can select the most advantageous features for subsequent operations (e.g., classification) from the native feature space of the text. This process effectively trims the feature space's dimensionality, enhancing subsequent operations' efficiency and accuracy. This paper proposes a straightforward and efficient filter feature selection method based on document-term matrix unitization (DTMU) for text processing. Diverging from previous filter feature selection methods that concentrate on scoring criteria definition, our method achieves more optimal feature selection by unitizing each column of the document-term matrix. This approach mitigates feature-to-feature influence and reinforces the role of the weighting proportion within the features. Subsequently, our scoring criterion subtracts the sum of weights for negative samples from positive samples and takes the absolute value. We conduct numerical experiments to compare DTMU with four advanced filter feature selection methods: max–min ratio metric, proportional rough feature selector, least loss, and relative discrimination criterion, along with two classical filter feature selection methods: Chi-square and information gain. The experiments are performed on four ten-thousand-dimensional feature space datasets: b o o k , d v d , m u s i c , m o v i e and two thousand-dimensional feature space datasets: i m d b , a m a z o n _ c e l l s , sourced from Amazon product reviews and movie reviews. Experimental findings demonstrate that DTMU selects more advantageous features for subsequent operations and achieves a higher dimensionality reduction rate than those of the other six methods used for comparison. Moreover, DTMU exhibits robust generalization capabilities across various classifiers and dimensional datasets. Notably, the average CPU time for a single run of DTMU is measured at 1.455 s. • This paper offers DTMU, a filter feature selection method enhancing feature quality via unitization for improved properties. • DTMU is notably user-friendly, involving only two straightforward steps. • This paper substantiates, through numerical experiments, that DTMU stands as an advanced and effective method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Neural ordinary differential equation for irregular human motion prediction.
- Author
-
Chen, Yang, Liu, Hong, Song, Pinhao, and Li, Wenhao
- Subjects
- *
MOTION capture (Cinematography) , *ORDINARY differential equations , *MOTION capture (Human mechanics) , *FIXED interest rates , *QUATERNIONS - Abstract
Human motion prediction often assumes that the input sequence is of fixed frame rates. However, in real-world applications, the motion capture system may work unstably sometimes and miss some frames, which leads to inferior performance. To solve this problem, this paper leverages neural Ordinary Differential Equations and proposes a human Motion Prediction method named MP-ODE to handle irregular-time human pose series. First, a Difference Operator and a Positional Encoding are proposed to explicitly provide the kinematic and time information for the model. Second, we construct the encoder–decoder model with ODE-GRU unit, which enables us to learn continuous-time dynamics of human motion. Third, a Quaternion Loss transforms exponential maps to quaternion to train MP-ODE. The Quaternion Loss can avoid the discontinuities and singularities of exponential maps, boosting the convergence of the model. Comprehensive experiments on Human3.6 m and CMU-Mocap datasets demonstrate that the proposed MP-ODE achieves promising performance in both normal and irregular-time conditions. • This paper designs a framework MP-ODE to tackle irregular human motion prediction. • With Neural ODEs, MP-ODE has the continuous-time series modeling ability. • MP-ODE incorporates dynamics information as well as Positional Encoding into the input features. • A Quaternion Loss is proposed to avoids discontinuities and singularities during the training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Abductive natural language inference by interactive model with structural loss.
- Author
-
Li, Linhao, Wang, Ao, Xu, Ming, Dong, Yongfeng, and Li, Xin
- Subjects
- *
STRUCTURAL models , *NATURAL languages , *LANGUAGE models , *INFERENCE (Logic) , *STRUCTURAL design - Abstract
The abductive natural language inference task (α NLI) is proposed to infer the most plausible explanation between the cause and the event. In the α NLI task, two observations are given, and the most plausible hypothesis is asked to pick out from the candidates. Existing methods model the relation between each candidate hypothesis separately and penalize the inference network uniformly. In this paper, we argue that it is unnecessary to distinguish the reasoning abilities among correct hypotheses; and similarly, all wrong hypotheses contribute the same when explaining the reasons of the observations. Therefore, we propose to group instead of ranking the hypotheses and design a structural loss called "joint softmax focal loss" in this paper. Based on the observation that the hypotheses are generally semantically related, we design a novel interactive language model aiming at exploiting the rich interaction among competing hypotheses. We name this new model for α NLI: Interactive Model with Structural Loss (IMSL). The experimental results show that our IMSL has achieved the highest performance on the RoBERTa-large pretrained model, with ACC and AUC results increased by about 1% and 5% respectively. We also compared the performance in terms of precision and sensitivity with publicly available code, demonstrating the efficiency and robustness of the proposed approach. • For α NLI task, we regroup instead of ranking all hypotheses. • We design a softmax focal loss for each group and combine them into a joint loss. • we design an information interaction layer that increases the AUC by about 5%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. HARWE: A multi-modal large-scale dataset for context-aware human activity recognition in smart working environments.
- Author
-
Esmaeilzehi, Alireza, Khazaei, Ensieh, Wang, Kai, Kaur Kalsi, Navjot, Ng, Pai Chet, Liu, Huan, Yu, Yuanhao, Hatzinakos, Dimitrios, and Plataniotis, Konstantinos
- Abstract
In recent years, deep neural networks (DNNs) have provided high performances for various tasks, such as human activity recognition (HAR), in view of their end-to-end training process between the input data and output labels. However, the performances of the DNNs are highly dependent on the availability of large-scale data for their training processes. In this paper, we propose a novel dataset for the task of HAR , in which the labels are specified for the working environments (WE). Our proposed dataset, namely HARWE , considers multiple signal modalities, including visual signal, audio signal, inertial sensor signals, and biological signals, that are acquired using four different electronic devices. Furthermore, our HARWE dataset is acquired from a large number of participants while considering the realistic disturbances that can occur in the wild. Our HARWE data is context-driven, which means there exist a number of labels in it that even though they are correlated with each other, they have contextual differences. A deep conventional multi-modal neural network provides an accuracy of 99.06% and 68.60%, for the cases of the easy and difficult settings of our dataset, respectively, which indicates its applicability for the task of human activity recognition. • We have proposed a novel dataset for the task of human activity recognition. • Our human activity recognition dataset is specified for the smart workplaces. • The proposed human activity recognition is multi-modal and large-scale. • The labeling process of the proposed dataset is performed in a context-aware manner. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Deep motion estimation through adversarial learning for gait recognition.
- Author
-
Yue, Yuanhao, Shi, Laixiang, Zheng, Zheng, Chen, Long, Wang, Zhongyuan, and Zou, Qin
- Abstract
Gait recognition is a form of identity verification that can be performed over long distances without requiring the subject's cooperation, making it particularly valuable for applications such as access control, surveillance, and criminal investigation. The essence of gait lies in the motion dynamics of a walking individual. Accurate gait-motion estimation is crucial for high-performance gait recognition. In this paper, we introduce two main designs for gait motion estimation. Firstly, we propose a fully convolutional neural network named W-Net for silhouette segmentation from video sequences. Secondly, we present an adversarial learning-based algorithm for robust gait motion estimation. Together, these designs contribute to a high-performance system for gait recognition and user authentication. In the experiment, two datasets, i.e., OU-IRIS and our own dataset, are used for performance evaluation. Experimental results show that, the W-Net achieves an accuracy of 89.46% in silhouette segmentation, and the proposed user-authentication method achieves over 99.6% and 93.8% accuracy on the two datasets, respectively. • A novel GAN-based learning approach for gait motion extraction. • A W-Net for enhanced gait silhouette extraction. • A new dataset containing 40 subjects for gait recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Ensemble clustering via synchronized relabelling.
- Author
-
Alziati, Michele, Amarù, Fiore, Magri, Luca, and Arrigoni, Federica
- Abstract
Ensemble clustering is an important problem in unsupervised learning that aims at aggregating multiple noisy partitions into a unique clustering solution. It can be formulated in terms of relabelling and voting, where relabelling refers to the task of finding optimal permutations that bring coherence among labels in input partitions. In this paper we propose a novel solution to the relabelling problem based on permutation synchronization. By effectively circumventing the need for a reference clustering, our method achieves superior performance than previous work under varying assumptions and scenarios, demonstrating its capability to handle diverse and complex datasets. • Novel relabelling method for Ensemble Clustering based on permutation synchronization. • Flexible formulation that can manage partitions with different numbers of clusters. • Compares favourably against previous Ensemble Clustering techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Learning to learn point signature for 3D shape geometry.
- Author
-
Huang, Hao, Wang, Lingjing, Li, Xiang, Yuan, Shuaihang, Wen, Congcong, Hao, Yu, and Fang, Yi
- Abstract
Point signature is a representation that describes the structural geometry of a point within a neighborhood in 3D shapes. Conventional approaches apply a weight-sharing network, e.g. , Graph Neural Network (GNN), to all neighborhoods of all points to directly generate point signatures and gain the generalization ability of the network by extensive training over amounts of samples from scratch. However, such approaches lack the flexibility to rapidly adapt to unseen neighborhood structures and thus cannot generalize well to new point sets. In this paper, we propose a novel meta-learning 3D point signature model, 3D me ta p oint s ignature (MEPS) network , which is capable of learning robust 3D point signatures. Regarding each point signature learning process as a task, our method obtains an optimized model over the best performance on the distribution of all tasks, generating reliable signatures for new tasks, i.e. , signatures of unseen point neighborhoods. Specifically, our MEPS consists of two modules: a base signature learner and a meta signature learner. During training, a base-learner is trained to perform specific signature learning tasks. Meanwhile, a meta-learner is trained to update the base-learner with optimal parameters. During testing, the meta-learner learned with the distribution of all tasks can adaptively change the base-learner parameters to accommodate unseen local neighborhoods. We evaluate our MEPS model on 3D shape correspondence and segmentation. Experimental results demonstrate that our method not only gains significant improvements over the baseline model to achieve state-of-the-art performance, but also is capable of handling unseen 3D geometry. Our implementation is available at https://github.com/hhuang-code/MEPS. [Display omitted] • A meta-learning-based 3D point signature generation for 3d shape geometry learning. • A theoretical proof justifying the necessity of the meta-learning process. • A bi-level optimiaztion framework to instantiate the 3D meta point signature learning. • Evaluation of meta point signature on 3D shape correspondence and part segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Global-local graph neural networks for node-classification.
- Author
-
Eliasof, Moshe and Treister, Eran
- Abstract
The task of graph node classification is often approached by utilizing a local Graph Neural Network (GNN), that learns only local information from the node input features and their adjacency. In this paper, we propose to improve the performance of node classification GNNs by utilizing both global and local information, specifically by learning label - and node - features. We therefore call our method Global-Local-GNN (GLGNN). To learn proper label features, for each label, we maximize the similarity between its features and nodes features that belong to the label, while maximizing the distance between nodes that do not belong to the considered label. We then use the learnt label features to predict the node classification map. We demonstrate our GLGNN using three different GNN backbones, and show that our approach improves baseline performance, revealing the importance of global information utilization for node classification. • We propose to learn label features to capture global information of the input graph. • We fuse label and node features to predict a node-classification map. • We qualitatively demonstrate our method by illustrating the learnt label and node features. • We quantitatively demonstrate the benefit of using our global label features approach on 12 real-world datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Self-supervised learning with automatic data augmentation for enhancing representation.
- Author
-
Park, Chanjong and Kim, Eunwoo
- Abstract
Self-supervised learning has become an increasingly popular method for learning effective representations from unlabeled data. One prominent approach in self-supervised learning is contrastive learning, which trains models to distinguish between similar and dissimilar sample pairs by pulling similar pairs closer and pushing dissimilar pairs farther apart. The key to the success of contrastive learning lies in the quality of the data augmentation, which increases the diversity of the data and helps the model learn more powerful and generalizable representations. While many studies have emphasized the importance of data augmentation, however, most of them rely on human-crafted augmentation strategies. In this paper, we propose a novel method, S elf A ugmentation on C ontrastive L earning with Cl ustering (SACL), searching for the optimal data augmentation policy automatically using Bayesian optimization and clustering. The proposed approach overcomes the limitations of relying on domain knowledge and avoids the high costs associated with manually designing data augmentation rules. It automatically captures informative and useful features within the data by exploring augmentation policies. We demonstrate that the proposed method surpasses existing approaches that rely on manually designed augmentation rules. Our experiments show SACL outperforms manual strategies, achieving a performance improvement of 1.68% and 1.57% over MoCo v2 on the CIFAR10 and SVHN datasets, respectively. • Optimal augmentation for robust, discriminative representations in contrastive learning. • Diverse transformations for adaptable augmentation strategies across datasets. • Bayesian optimization to find effective augmentation policies with minimal computation. • Weighted combination of contrastive loss and clustering score for data-specific optimization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Discovering the signal subgraph: An iterative screening approach on graphs.
- Author
-
Shen, Cencheng, Wang, Shangsi, Badea, Alexandra, Priebe, Carey E., and Vogelstein, Joshua T.
- Abstract
Supervised learning on graphs is a challenging task due to the high dimensionality and inherent structural dependencies in the data, where each edge depends on a pair of vertices. Existing conventional methods are designed for standard Euclidean data and do not account for the structural information inherent in graphs. In this paper, we propose an iterative vertex screening method to achieve dimension reduction across multiple graph datasets with matched vertex sets and associated graph attributes. Our method aims to identify a signal subgraph to provide a more concise representation of the full graphs, potentially benefiting subsequent vertex classification tasks. The method screens the rows and columns of the adjacency matrix concurrently and stops when the resulting distance correlation is maximized. We establish the theoretical foundation of our method by proving that it estimates the true signal subgraph with high probability. Additionally, we establish the convergence rate of classification error under the Erdos-Renyi random graph model and prove that the subsequent classification can be asymptotically optimal, outperforming the entire graph under high-dimensional conditions. Our method is evaluated on various simulated datasets and real-world human and murine graphs derived from functional and structural magnetic resonance images. The results demonstrate its excellent performance in estimating the ground-truth signal subgraph and achieving superior classification accuracy. • An iterative feature screening method for identifying signal vertices in graphs. • Theoretical guarantee for high-probability recovery of ground-truth vertices. • The signal subgraph is Bayes optimal under the Erdos-Renyi graph model. • Excellent accuracy in identifying true signal vertices in simulations. • Application to identify potential brain regions as signal subgraphs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Query-guided generalizable medical image segmentation.
- Author
-
Yang, Zhiyi, Zhao, Zhou, Gu, Yuliang, and Xu, Yongchao
- Abstract
The practical implementation of deep neural networks in clinical settings faces hurdles due to variations in data distribution across different centers. While the incorporation of query-guided Transformer has improved performance across diverse tasks, the full scope of their generalization capabilities remains unexplored. Given the ability of the query-guided Transformer to dynamically adjust to individual samples, fulfilling the need for domain generalization, this paper explores the potential of query-based Transformer for cross-center generalization and introduces a novel Query-based Cross-Center medical image Segmentation mechanism (QuCCeS). By integrating a query-guided Transformer into a U-Net-like architecture, QuCCeS utilizes attribution modeling capability of query-guided Transformer decoder for segmentation in fluctuating scenarios with limited data. Additionally, QuCCeS incorporates an auxiliary task with adaptive sample weighting for coarse mask prediction. Experimental results demonstrate QuCCeS's superior generalization on unseen domains. • Introducing a plug-and-play module for adapting to varying distribution shifts. • Segmenting directly on updated queries rather than parametric classification. • Incorporating an auxiliary task to improve model convergence and generalization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Edge-preserving image restoration based on a weighted anisotropic diffusion model.
- Author
-
Qi, Huiqing, Li, Fang, Chen, Peng, Tan, Shengli, Luo, Xiaoliu, and Xie, Ting
- Abstract
Partial differential equation-based methods have been widely applied in image restoration. The anisotropic diffusion model has a good noise removal capability without affecting significant edges. However, existing anisotropic diffusion-based models closely depend on the diffusion coefficient function and threshold parameter. This paper proposes a new weighted anisotropic diffusion coefficient model with multiple scales, and it has a higher speed of closing to X-axis and exploits adaptive threshold parameters. Meanwhile, the proposed algorithm is verified to be suitable for multiple types of noise. Numerical metrics and visual comparison of simulation experiments show the proposed model has significant superiority in edge-preserving and staircase artifacts reducing over the existing anisotropic diffusion-based techniques. • We find the weighted anisotropic diffusion coefficient function with high convergence speed. • The adaptive threshold parameter helps keep more details in restored images. • Multi-scale feature map fusing can reduce staircase artifacts along edges. • The performance of the proposed method is promising for real natural and medical images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Structural self-similarity pattern in global food prices: Utilizing a segmented multifractal detrended fluctuation analysis.
- Author
-
Saâdaoui, Foued
- Abstract
This paper provides a comprehensive analysis of the structural self-similarity observed in global food prices, focusing specifically on key commodities such as olive oil, eggs, bread, chicken, and beef. Employing Segmented Multifractal Detrended Fluctuation Analysis (SMF-DFA), we investigate the multifractal intricacies within the price dynamics of these essential food items. SMF-DFA facilitates a detailed examination of piecewise self-similarity, delineating segments by change-points and offering a nuanced understanding of the complex structures inherent in global market prices. Furthermore, our proposal incorporates Levene's test to examine whether the volatility differs significantly among the segments separated by change-points, thereby enhancing the robustness of this analytical stage. This study surpasses conventional methods, providing valuable insights into the multifractal characteristics of food prices across various scales. These findings contribute to a deeper comprehension of the intricate patterns governing global food prices, crucial for decision-making in agricultural economics, financial markets, and the dynamics of global trade. • The structural self-similarity in global food prices is studied. • We use a segmented multifractal detrended fluctuation analysis (SMF-DFA) for this aim. • SMF-DFA allows a piecewise multifractal analysis tools. • Levene's test, introduced for variance inequality, enhances the proposal's robustness. • The study offers vital insights for decision-making in agriculture and global trade. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Decoding class dynamics in learning with noisy labels.
- Author
-
Tatjer, Albert, Nagarajan, Bhalaji, Marques, Ricardo, and Radeva, Petia
- Abstract
The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising upbeat performance improvements. The selection of clean samples amongst the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight the aspect that effective demarcation of samples would lead to better performance. We identify the Global Noise Conundrum in the existing models, where the distribution of samples is treated globally. We propose a per-class-based local distribution of samples and demonstrate the effectiveness of this approach in having a better clean-noise split. We validate our proposal on several benchmarks — both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at https://github.com/aldakata/CCLM/ • Label noise leads to reduced generalization in deep learning models. • Global Noise Conundrum exists in several Learning with Noisy Labels sample-selection methods. • Class-Conditional Local noise Model (CCLM) uses per-class-based local distribution of samples with local thresholds. • Class-aware decision boundary of CCLM leads to a better clean-noise split. • Locally adapted clean-noise split yielded improvements in both real and synthetic noise benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Weight Saliency search with Semantic Constraint for Neural Machine Translation attacks.
- Author
-
Han, Wen, Yang, Xinghao, Liu, Baodi, Zhang, Kai, and Liu, Weifeng
- Abstract
Text adversarial attack is an effective way to improve the robustness of Neural Machine Translation (NMT) models. Existing NMT attack tasks are often completed by replacing words. However, most of previous works pursue a high attack success rate but produce semantic inconsistency sentences, leading to wrong translations even for humans. In this paper, we propose a Weight Saliency search with Semantic Constraint (WSSC) algorithm to make semantic consistency word modifications to the input sentence for black-box NMT attacks. Specifically, our WSSC has two major merits. First, it optimizes the word substitution with a word saliency method, which is helpful to reduce word replacement rate. Second, it constrains the objective function with a semantic similarity loss, ensuring every modification does not lead to significant semantic changes. We evaluate the effectiveness of the proposed WSSC by attacking three popular NMT models, i.e., T5, Marian, and BART, on three widely used datasets, i.e., WMT14, WMT16, and TED. Experimental results validate that our WSSC improves Attack Success Rate (ASR) by 4.02% and Semantic Similarity score (USE) by 1.28% on average. Besides, our WSSC also shows good properties in keeping grammar correctness and transfer attack. • Optimize word substitution with word saliency to reduce word replacement rate. • Constrain objective function with semantic similarity loss to ensure inconspicuous semantic changes. • Generate higher grammar accuracy and transferability adversarial examples with WSSC algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Select & Enhance: Masked-based image enhancement through tree-search theory and deep reinforcement learning.
- Author
-
Cotogni, Marco and Cusano, Claudio
- Subjects
- *
DEEP reinforcement learning , *IMAGE intensifiers , *COMPUTER vision , *COMPUTATIONAL photography , *IMAGE processing - Abstract
The enhancement of low-quality images is both a challenging task and an essential endeavor in many fields including computer vision, computational photography, and image processing. In this paper, we propose a novel and fully explainable method for image enhancement that combines spatial selection and histogram equalization. Our approach leverages tree-search theory and deep reinforcement learning to iteratively select areas to be processed. Extensive experimentation on two datasets demonstrates the quality of our method compared to other state-of-the-art models. We also conducted a multi-user experiment which shows that our method can emulate a variety of enhancement styles. These results highlight the effectiveness and versatility of the proposed method in producing high-quality images through an explainable enhancement process. • A fully explainable image enhancement method based on reinforcement learning. • The method alternates spatial selection and histogram equalization through deep RL. • An extensive experimentation shows that our method is competitive with SOTA methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. MOD-YOLO: Multispectral object detection based on transformer dual-stream YOLO.
- Author
-
Shao, Yanhua, Huang, Qimeng, Mei, yanying, and Chu, hongyu
- Subjects
- *
CRYSTAL field theory - Abstract
• Design a Cross Stage Partial CFT (Cross-Modality Fusion Transformer) module named CSP-CFT. • CSP-CFT can reduce the computing cost by 60 %−70 % on the premise of ensuring high accuracy with CFT. • A powerful and lightweight multispectral object detection dual-stream YOLO (MOD-YOLO), based on CSP-CFT, is proposed. • Propose MOD-YOLO-Tiny, ensuring a high level of accuracy and reducing a lot of computation. Multispectral object detection can effectively improve the precision of object detection in low-visibility scenes, which increases the reliability and stability of the object detection application in the open environment. Cross-Modality Fusion Transformer (CFT) can effectively fuse different spectral information, but this method relies on large models and expensive computing resources. In this paper, we propose multispectral object detection dual-stream YOLO (MOD-YOLO), based on Cross Stage Partial CFT (CSP-CFT), to address the issue that prior studies need heavy inference calculations from the recurrent fusing of multispectral features. This network can divide the fused feature map into two parts, respectively for cross stage output and combined with the next stage feature, to achieve the correct speed/memory/precision balance. To further improve the accuracy, SIoU was selected as the loss function. Ultimately, extensive experiments on multiple publicly available datasets demonstrate that our model, which achieves the smallest model size and excellent performance, produces better tradeoffs between accuracy and model size than other popular models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Patch-based probabilistic identification of plant roots using convolutional neural networks.
- Author
-
Cardellicchio, A., Solimani, F., Dimauro, G., Summerer, S., and Renò, V.
- Subjects
- *
CONVOLUTIONAL neural networks , *PLANT identification , *PLANT roots , *ARTIFICIAL intelligence , *ARTIFICIAL vision - Abstract
Recently, computer vision and artificial intelligence are being used as enabling technologies for plant phenotyping studies, since they allow the analysis of large amounts of data gathered by the sensors. Plant phenotyping studies can be devoted to the evaluation of complex plant traits either on the aerial part of the plant as well as on the underground part, to extract meaningful information about the growth, development, tolerance, or resistance of the plant itself. All plant traits should be evaluated automatically and quantitatively measured in a non-destructive way. This paper describes a novel approach for identifying plant roots from images of the root system architecture using a convolutional neural network (CNN) that operates on small image patches calculating the probability that the center point of the patch is a root pixel. The underlying idea is that the CNN model should embed as much information as possible about the variability of the patches that can show chaotic and heterogeneous backgrounds. Results on a real dataset demonstrate the feasibility of the proposed approach, as it overcomes the current state of the art. • Root systems must be monitored to assess the growth and well-being of a plant. • State-of-the-art approaches mainly use classic ML or U-networks for segmentation. • CNNs can be used for monitoring considering patch-based information. • These models are simpler and faster, and provide better segmentation performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. A shared-private sentiment analysis approach based on cross-modal information interaction.
- Author
-
Hou, Yilin, Zhong, Xianjing, Cao, Hui, Zhu, Zheng, Zhou, Yunfeng, and Zhang, Jie
- Subjects
- *
SENTIMENT analysis , *EMOTION recognition , *TRANSFORMER models , *USER-generated content , *AFFECTIVE computing , *EMOTIONS - Abstract
To explore the heterogeneous sentiment information in each modal feature and improve the accuracy of sentiment analysis, this paper proposes a Multimodal Sentiment Analysis based on Text-Centric Sharing-Private Affective Semantics (TCSP). First, the Deep Canonical Time Wrapping (DCTW) algorithm is employed to effectively align the timing deviations of Audio and Picture modalities. Then, a cross-modal shared mask matrix is designed, and a mutual attention mechanism is introduced to compute the shared affective semantic features of Audio-picture-to-text. Following this, the private affective semantic features within Audio and Picture modalities are derived via the self-attention mechanism with LSTM. Finally, the Transformer Encoder structure is improved, achieving deep interaction and feature fusion of cross-modal emotional information, and conducting emotional analysis. Experiments are conducted on the IEMOCAP and MELD datasets. By comparing with current state-of-the-art models, the accuracy of the TCSP model reaches 82.02%, fully validating the effectiveness. In addition, the rationality of the design of each structure within the model is verified through ablation experiments. • Proposed an emotion recognition method that includes shared and private emotions. • Utilized the DCTW for audio-picture time-series features alignment effectively improved recognition accuracy. • TCSP Achieved 82.02% accuracy on the IEMOCAP dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Adversarial regularized attributed network embedding for graph anomaly detection.
- Author
-
Tian, Chongrui, Zhang, Fengbin, and Wang, Ruidong
- Subjects
- *
ANOMALY detection (Computer security) , *COMPACT spaces (Topology) , *LATENT variables - Abstract
Graph anomaly detection aims to identify the nodes that display significantly different behavior from the majority. However, existing methods neglect the combined interaction between the network structure and node attributes, resulting in suboptimal latent representations of nodes due to network noise. In this paper, we introduce a novel approach called adversarial regularized attributed network embedding (ARANE) for graph anomaly detection. ARANE addresses this issue by forcing normal nodes to inhabit a compact manifold in the latent space, taking into account both the network structure and node attributes. It ensures that data points from the normal class, originating from different distributions, are distributed within a single compact latent space, while excluding anomalies from this region. ARANE employs a dual-encoder architecture consisting of an attribute encoder and a structure encoder. The attribute encoder learns node attribute embeddings, while the structure encoder focuses on learning structure embeddings. To obtain high-quality node embeddings for effective anomaly detection, we apply adversarial learning to regularize the learned embeddings separately in both the structure and attribute spaces. Furthermore, we introduce a fusion module that combines the final node embeddings derived from the structure and attribute spaces. These joint embeddings serve as inputs to a dual-decoder for graph reconstruction, where the resulting reconstruction errors are utilized as anomaly scores for anomaly detection. Extensive experiments conducted on real-world attributed networks demonstrate the superior effectiveness of our proposed method compared to state-of-the-art approaches. • We introduce ARANE for one-class graph classification. It uses network structure & node attributes to tightly cluster normal nodes, excluding anomalies. • Our method uses a compact fusion module, capturing interactions between structural & attribute data in networks, for seamless integration. • Extensive experiments on real-world networks prove ARANE's effectiveness as a top solution for one-class graph classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A three-stream fusion and self-differential attention network for multi-modal crowd counting.
- Author
-
Tang, Haihan, Wang, Yi, Lin, Zhiping, Chau, Lap-Pui, and Zhuang, Huiping
- Subjects
- *
COUNTING , *CROWDS - Abstract
Multi-modal crowd counting aims at using multiple types of data, like RGB-Thermal and RGB-Depth, to count the number of people in crowded scenes. Current methods mainly focus on two-stream multi-modal information fusing in the encoder and single-scale semantic features in the decoder. In this paper, we propose an end-to-end three-stream fusion and self-differential attention network to simultaneously address the multi-modal fusion and scale variation problems for multi-modal crowd counting. Specifically, the encoder adopts three-stream fusion to fuse stage-wise modality-paired and modality-specific features. The decoder applies a self-differential attention mechanism on multi-level fused features to extract basic and differential information adaptively, and finally, the counting head predicts the density map. Experimental results on RGB-T and RGB-D benchmarks show the superiority of our proposed method compared with the state-of-the-art multi-modal crowd counting methods. Ablation studies and visualization demonstrate the advantages of the proposed modules in our model. • We propose a novel multi-modal crowd counting model to address information fusion and scale variation problems. • The model uses the three-stream fusion encoder with IIM to fuse modality-paired and modality-specific features. • The model adaptively integrates multi-scale features by SDAM to emphasize discriminative scale information. • Our method outperforms its counterparts and performs consistently well in the daytime and nighttime. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Efficient label-free pruning and retraining for Text-VQA Transformers.
- Author
-
Poh, Soon Chang, Chan, Chee Seng, and Lim, Chee Kau
- Subjects
- *
OCCUPATIONAL retraining , *QUESTION answering systems , *RESEARCH personnel - Abstract
Recent advancements in Scene Text Visual Question Answering (Text-VQA) employ autoregressive Transformers, showing improved performance with larger models and pre-training datasets. Although various pruning frameworks exist to simplify Transformers, many are integrated into the time-consuming training process. Researchers have recently explored post-training pruning techniques, which separate pruning from training and reduce time consumption. Some methods use gradient-based importance scores that rely on labeled data, while others offer retraining-free algorithms that quickly enhance pruned model accuracy. This paper proposes a novel gradient-based importance score that only necessitates raw, unlabeled data for post-training structured autoregressive Transformer pruning. Additionally, we introduce a Retraining Strategy (ReSt) for efficient performance restoration of pruned models of arbitrary sizes. We evaluate our approach on TextVQA and ST-VQA datasets using TAP, TAP†† and SaL‡-Base where all utilize autoregressive Transformers. On TAP and TAP†† , our pruning approach achieves up to 60% reduction in size with less than a 2.4% accuracy drop and the proposed ReSt retraining approach takes only 3 to 34 min, comparable to existing retraining-free techniques. On SaL‡-Base , the proposed method achieves up to 50% parameter reduction with less than 2.9% accuracy drop requiring only 1.19 h of retraining using the proposed ReSt approach. The code is publicly accessible at https://github.com/soonchangAI/LFPR. • We study a label-free importance score for structured pruning of autoregressive Transformers. • We propose an adaptive retraining approach for pruned Transformer models of varying sizes. • Our pruned model achieve up to 60% reduction in size with only ¡2.4% drop in accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Segmentation assisted Prostate Cancer Grading with Multitask Collaborative Learning.
- Author
-
Zhang, Zheng, Song, Yushan, Tan, Yunpeng, Yan, Shuo, Zhang, Bo, and Zhuang, Yufeng
- Subjects
- *
PROSTATE cancer , *COLLABORATIVE learning , *PROSTATE-specific antigen , *IMAGE segmentation , *INFORMATION networks , *COMPUTER assisted instruction - Abstract
Medical image segmentation can provide doctors with more direct information on the location and size of organs or lesions, which can serve as an valuable auxiliary task for prostate cancer grading. Meanwhile, other types of diagnostic data besides images are also essential, such as patient age, Prostate-Specific Antigen (PSA), etc. Currently, there is a lack of in-depth research on how to effectively differentiate and select shared features and task-specific features in multitask learning, as well as how to balance and explore the potential correlations between different tasks. In this paper, we propose a novel Shared Feature Hybrid Gating Experts (SFHGE) architecture for collaborative main (lesion grading) and auxiliary (lesion segmentation) task learning, dynamically selecting shared and task-specific features. To efficiently utilize complementary features, we also introduce a Cross-Task Attention module (CrossTA) to capture cross-task integrated representation. Additionally, recognizing that non-image clinical information often provides crucial diagnostic insights, we further design a Heterogeneous Information Fusion Network (HIFN) to better integrate clinical data, thereby improving grading performance. Extensive experiments on the PI-CAI dataset demonstrate that our approach outperforms mainstream classification and segmentation models. • A shared feature hybrid gating experts framework is proposed for segmentation assisted prostate cancer grading. • A crosstask attention module is designed to provide effective complementary information between tasks. • A heterogeneous information fusion network is designed to integrate multimodal diagnostic data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Manifold information through neighbor embedding projection for image retrieval.
- Author
-
Leticio, Gustavo Rosseto, Kawai, Vinicius Sato, Valem, Lucas Pascotti, Pedronette, Daniel Carlos Guimarães, and da S. Torres, Ricardo
- Subjects
- *
IMAGE retrieval , *CONVOLUTIONAL neural networks , *TRANSFORMER models , *DATA visualization , *DIMENSION reduction (Statistics) - Abstract
Although studied for decades, constructing effective image retrieval remains an open problem in a wide range of relevant applications. Impressive advances have been made to represent image content, mainly supported by the development of Convolution Neural Networks (CNNs) and Transformer-based models. On the other hand, effectively computing the similarity between such representations is still challenging, especially in collections in which images are structured in manifolds. This paper introduces a novel solution to this problem based on dimensionality reduction techniques, often used for data visualization. The key idea consists in exploiting the spatial relationships defined by neighbor embedding data visualization methods, such as t-SNE and UMAP, to compute a more effective distance/similarity measure between images. Experiments were conducted on several widely-used datasets. Obtained results indicate that the proposed approach leads to significant gains in comparison to the original feature representations. Experiments also indicate competitive results in comparison with state-of-the-art image retrieval approaches. • Manifold information encoded by the Neighbor Embedding framework for image retrieval. • Use of 2D spatial relationships given by Neighbor Embedding for similarity definition. • A simple, yet effective and efficient image retrieval scheme is proposed. • A late fusion method is used to combine distance given by t-SNE and UMAP projections. • Significant gains obtained on diverse datasets and features based on CNNs and Transformers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Deepfake face discrimination based on self-attention mechanism.
- Author
-
Wang, Shuai, Zhu, Donghui, Chen, Jian, Bi, Jiangbo, and Wang, Wenyi
- Subjects
- *
NATIONAL security , *EVERYDAY life - Abstract
With the rapid progress of deepfake technology, the improper use of manipulated images and videos presenting synthetic faces has arisen as a noteworthy concern, thereby posing threats to both daily life and national security. While numerous CNN based deepfake face detection methods were proposed, most of the existing approaches encounter challenges in effectively capturing the image contents across different scales and positions. In this paper, we present a novel two-branch structural network, referred to as the Self-Attention Deepfake Face Discrimination Network (SADFFD). Specifically, a branch incorporating cascaded multi self-attention mechanism (SAM) modules, is parallelly integrated with EfficientNet-B4 (EffB4). The multi SAM branch supplies additional features that concentrate on image regions essential for discriminating between real and fake. The EffB4 network is adopted because of its efficiency by adjusting the resolution, depth, and width of the network. According to our comprehensive experiments conducted on FaceForensics++, Celeb-DF, and our self-constructed SAMGAN3 datasets, the proposed SADFFD demonstrated the highest detection accuracy, averaging 99.01% in FaceForensics++, 98.65% in Celeb-DF, and an impressive 99.99% in SAMGAN3, surpassing other state-of-the-art (SOTA) methods. • A novel two-branch CNN structure is proposed for deepfake face discrimination. • The self-attention mechanism is utilized to enhance the accuracy of discrimination. • FaceForensics++, Celeb-DF and our self-built dataset are used in evaluation in terms of detection accuracy. • Forged face images/videos from various generating methods are included in our evaluation datasets. • Comprehensive experiments demonstrate the superior performance of our proposed method in discriminating deepfake face. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Data-agnostic Face Image Synthesis Detection using Bayesian CNNs.
- Author
-
Leyva, Roberto, Sanchez, Victor, Epiphaniou, Gregory, and Maple, Carsten
- Subjects
- *
CONVOLUTIONAL neural networks , *ANOMALY detection (Computer security) , *COMPUTER security - Abstract
Face image synthesis detection is considerably gaining attention because of the potential negative impact on society that this type of synthetic data brings. In this paper, we propose a data-agnostic solution to detect the face image synthesis process. Specifically, our solution is based on an anomaly detection framework that requires only real data to learn the inference process. It is therefore data-agnostic in the sense that it requires no synthetic face images. The solution uses the posterior probability with respect to the reference data to determine if new samples are synthetic or not. Our evaluation results using different synthesizers show that our solution is very competitive against the state-of-the-art, which requires synthetic data for training. • We use an anomaly detection framework to detect synthetic data. • Our proposed solution requires only real data to detect the synthesis process. • Our solution achieves very competitive performance, outperforming existing solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Decomposition via elastic-band transform.
- Author
-
Choi, Guebin and Oh, Hee-Seok
- Subjects
- *
DECOMPOSITION method , *DATA analysis , *SIGNALS & signaling - Abstract
In this paper, we propose a novel decomposition method using elastic-band transform (EBT), which mimics eye scanning and is utilized for multiscale analysis of signals. The proposed EBT-based method can efficiently extract the features of various signals with the following three advantages. First, it is a data-driven approach that extracts several important modes based solely on data without using predetermined basis functions. Second, it does not assume that the signal consists of (locally) sinusoidal intrinsic mode functions, which is a common assumption in existing methods. Therefore, the proposed method can handle a wide range of signals. Finally, it is robust to noise. A practical algorithm for decomposition is presented, along with some theoretical properties. Simulation examples and real data analysis results show promising empirical properties of the proposed method. • The proposed is a data-driven approach that extracts several important modes based solely on data. • The proposed method does not assume that the signal consists of (locally) sinusoidal intrinsic mode functions. • The proposed method is robust to noise. • The proposed method extends the scope of signals for decomposition significantly. • A practical algorithm for decomposition is presented along with some theoretical properties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Learning from feature and label spaces' bias for uncertainty-adaptive facial emotion recognition.
- Author
-
Xu, Luhui, Gan, Yanling, and Xia, Haiying
- Subjects
- *
EMOTION recognition , *LEARNING modules , *BAYESIAN analysis , *KNOWLEDGE transfer , *EMOTIONS - Abstract
Developing an accurate deep model for facial emotion recognition is a long-term challenge. It is because the uncertainty of emotions, stemming from the ambiguity of different emotional categories and the difference of subjective annotations, can ruin the ability of model to achieve the desired optimization. This paper constructs two distinct datasets, namely original sample set and ambiguous sample set, to explore an effective ambiguous knowledge transfer method to realize the adaptive awareness of uncertainty in facial emotion recognition. The original sample set is the weakly-augmented data with relatively low uncertainty, as most emotions are clean in reality. Meanwhile, the ambiguous sample set is strongly-augmented data that introduces feature and label bias with regard to emotion, which are with relatively high uncertainty. The proposed framework consists of two sub-nets, which are trained using the original set and the ambiguous set respectively. To achieve uncertainty-adaptive learning for two sub-nets, we introduce two modules. One is the cross-space attention consistency learning module that performs attention coupling across original and ambiguous feature spaces, achieving uncertainty-aware representation learning in feature granularity. The other is the soft-label learning module that models and utilizes uncertainty in label granularity, through aligning the posterior distributions between original label space and ambiguous label space. Experimental studies on public datasets indicate that our method is competitive with the state-of-the-art. • We establish an uncertainty-adaptive framework via exploring the bias between two kinds of sample sets. • We custom two modules namely cross-space attention consistency learning module and soft-label learning module. • The experimental results on public datasets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Topological optimization of continuous action iterated dilemma based on finite-time strategy using DQN.
- Author
-
Jin, Xiaoyue, Li, Haojing, Yu, Dengxiu, Wang, Zhen, and Li, Xuelong
- Subjects
- DILEMMA, LYAPUNOV functions, DISCOUNT prices, PROBLEM solving, DYNAMIC models
- Abstract
In this paper, a finite-time convergent continuous action iterated dilemma (CAID) with topological optimization is proposed to overcome the limitations of traditional methods. Asymptotic stability in traditional CAID provides no information about the rate of convergence or the dynamics of the system in finite time, and previous works offer no effective method for analyzing its convergence time. We address these problems as follows. First, CAID is formulated by enriching the players' strategies to be continuous, meaning a player can choose an intermediate state between cooperation and defection; a discount rate is introduced to model the fact that players cannot learn accurately from strategic differences alone. Then, to analyze the convergence time of CAID, a finite-time convergence analysis based on the Lyapunov function is introduced. Furthermore, an optimal communication topology generation method based on Deep Q-learning (DQN) is proposed to explore a better game structure. Finally, simulations show the effectiveness of the proposed method. • The dynamic model of Continuous Action Iterated Dilemma (CAID) with continuous strategies is more realistic. • The convergence time of CAID is analyzed by the proposed finite-time convergence analysis method based on the Lyapunov function. • The optimal communication topology generation method based on DQN is proposed to enhance the game structure. [ABSTRACT FROM AUTHOR]
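As a rough illustration of the continuous-strategy idea (an intermediate state between cooperation and defection, with a discount rate damping how much players learn from strategic differences), here is a hedged NumPy sketch. The update rule, parameter names, and values are assumptions chosen only for illustration; the paper's Lyapunov analysis and DQN topology search are omitted.

    import numpy as np

    def caid_step(strategies, payoff, adjacency, eta=0.1, gamma=0.9):
        # strategies: (n,) array in [0, 1]; 0 = defect, 1 = cooperate.
        # payoff: (n,) current payoffs; adjacency: (n, n) communication topology.
        new = strategies.copy()
        for i in range(len(strategies)):
            neighbors = np.nonzero(adjacency[i])[0]
            if len(neighbors) == 0:
                continue
            # Discounted imitation: pull toward better-performing neighbors.
            diffs = payoff[neighbors] - payoff[i]
            pull = np.mean(diffs * (strategies[neighbors] - strategies[i]))
            new[i] = np.clip(strategies[i] + eta * gamma * pull, 0.0, 1.0)
        return new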
- Published
- 2024
- Full Text
- View/download PDF
34. Time to retire F1-binary score for action unit detection.
- Author
-
Hinduja, Saurabh, Nourivandi, Tara, Cohn, Jeffrey F., and Canavan, Shaun
- Subjects
- FACIAL expression, FACE perception, TASK analysis, CLASS actions
- Abstract
Detecting action units is an important task in face analysis, especially in facial expression recognition, due in part to the idea that expressions can be decomposed into multiple action units. To evaluate systems that detect action units, F1-binary score is often used as the evaluation metric. In this paper, we argue that F1-binary score does not reliably evaluate these models, largely because of class imbalance; it should therefore be retired and a suitable replacement used. We justify this argument through a detailed evaluation of the negative influence of class imbalance on action unit detection, including an investigation into the influence of class imbalance in train and test sets and in new data (i.e., generalizability). We empirically show that F1-micro should be used as the replacement for F1-binary. • We show that AU base rates have a large influence on detection across different architectures. • We show how different evaluation metrics are impacted by AU base rates. • We argue that F1-binary should not be used for AU detection and that F1-micro should replace it. [ABSTRACT FROM AUTHOR]
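The core argument is easy to reproduce numerically: under heavy class imbalance, F1-binary is dominated by the rare positive class, while F1-micro reflects overall per-instance performance. A small scikit-learn example with synthetic labels (the counts are chosen only for illustration):

    from sklearn.metrics import f1_score

    # Highly imbalanced AU labels: the AU occurs in only 5 of 100 frames.
    y_true = [1] * 5 + [0] * 95
    # A detector that finds 3 of the 5 occurrences but over-predicts elsewhere.
    y_pred = [1] * 3 + [0] * 2 + [1] * 7 + [0] * 88

    print(f1_score(y_true, y_pred, average="binary"))  # 0.40, driven by the rare class
    print(f1_score(y_true, y_pred, average="micro"))   # 0.91, overall per-frame accuracy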
- Published
- 2024
- Full Text
- View/download PDF
35. Attention-map augmentation for hypercomplex breast cancer classification.
- Author
-
Lopez, Eleonora, Betello, Filippo, Carmignani, Federico, Grassucci, Eleonora, and Comminiello, Danilo
- Subjects
- TUMOR classification, BREAST cancer, EARLY diagnosis, CANCER diagnosis, NETWORK performance
- Abstract
Breast cancer is the most widespread neoplasm among women, and early detection of this disease is critical. Deep learning techniques have become of great interest for improving diagnostic performance. However, distinguishing between malignant and benign masses in whole mammograms poses a challenge, as they appear nearly identical to an untrained eye and the region of interest (ROI) constitutes only a small fraction of the entire image. In this paper, we propose a framework, parameterized hypercomplex attention maps (PHAM), to overcome these problems. Specifically, we deploy an augmentation step based on computing attention maps. Then, the attention maps are used to condition the classification step by constructing a multi-dimensional input comprised of the original breast cancer image and the corresponding attention map. In this step, a parameterized hypercomplex neural network (PHNN) is employed to perform breast cancer classification. The framework offers two main advantages. First, attention maps provide critical information regarding the ROI and allow the neural model to concentrate on it. Second, the hypercomplex architecture has the ability to model local relations between input dimensions thanks to hypercomplex algebra rules, thus properly exploiting the information provided by the attention map. We demonstrate the efficacy of the proposed framework on both mammography and histopathology images, surpassing attention-based state-of-the-art networks and the real-valued counterpart of our approach. The code of our work is available at https://github.com/ispamm/AttentionBCS. • Deep learning enhances breast cancer diagnosis. • Mammogram mass discrimination is challenging. • Attention maps can highlight the small ROI. • Attention-map augmentation can condition a hypercomplex network to improve performance. • Hypercomplex algebra exploits the additional information provided by the attention map. [ABSTRACT FROM AUTHOR]
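A minimal sketch of the augmentation-then-conditioning step: the attention map is computed first and concatenated with the image so the classifier receives a multi-dimensional input. Both sub-networks below are placeholders; in the paper the classifier is a PHNN, for which any PyTorch module can stand in here.

    import torch
    import torch.nn as nn

    class AttentionConditionedClassifier(nn.Module):
        def __init__(self, attention_net: nn.Module, classifier: nn.Module):
            super().__init__()
            self.attention_net = attention_net  # produces a (B, 1, H, W) attention map
            self.classifier = classifier        # placeholder for the hypercomplex PHNN

        def forward(self, image):               # image: (B, 1, H, W) grayscale mammogram
            attn = torch.sigmoid(self.attention_net(image))
            conditioned = torch.cat([image, attn], dim=1)  # (B, 2, H, W) multi-dim input
            return self.classifier(conditioned)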
- Published
- 2024
- Full Text
- View/download PDF
36. Kreĭn twin support vector machines for imbalanced data classification.
- Author
-
Jimenez-Castaño, C., Álvarez-Meza, A., Cárdenas-Peña, D., Orozco-Gutíerrez, A., and Guerrero-Erazo, J.
- Subjects
- SUPPORT vector machines, KERNEL functions, CLASSIFICATION, KERNEL operating systems
- Abstract
Conventional classification assumes a balanced sample distribution among classes. However, such a premise leads to biased performance in favor of the majority class (the one with the highest number of instances). Twin Support Vector Machines (TWSVM) gained great prominence due to their low computational burden compared to the standard SVM. Besides, traditional machine learning seeks methods whose solution depends on a convex problem or positive semi-definite similarity matrices, yet such matrices cannot adequately represent many real-world applications. This motivates the use of non-negative measures as indefinite functions in a Reproducing Kernel Kreĭn Space (RKKS). This paper proposes a novel approach called Kreĭn Twin Support Vector Machines (KTSVM), which appropriately incorporates indefinite kernels within a TWSVM-based gradient optimization. To encode pertinent input patterns for imbalanced data discrimination, our KTSVM employs an implicit mapping to an RKKS. Our approach also takes advantage of the TWSVM scheme's benefits by creating two parallel hyperplanes, which promotes KTSVM optimization in a gradient-descent framework. Results obtained on synthetic and real-world datasets demonstrate that our solution performs better in terms of imbalanced data classification than state-of-the-art techniques. • Kreĭn Twin Support Vector Machines (KTSVM) enhance nonlinear TWSVM with indefinite kernels for imbalanced classification. • KTSVM represents high dimensions with indefinite kernels, improving imbalanced classification. • Our approach transforms the traditional TWSVM's convex dual problem into a gradient-based optimization. • KTSVM builds dual hyperplanes in RKKS, refined with gradient descent. • KTSVM excels over SVM, KSVM, and TWSVM in binary imbalanced classification. [ABSTRACT FROM AUTHOR]
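The motivation for the Kreĭn-space treatment can be checked directly: an indefinite similarity such as the sigmoid (tanh) kernel yields a Gram matrix with negative eigenvalues, so it admits no valid Hilbert-space embedding. A small NumPy check (the kernel parameters here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    K = np.tanh(X @ X.T + 1.0)           # sigmoid kernel Gram matrix (symmetric)
    eigvals = np.linalg.eigvalsh(K)
    print(eigvals.min(), eigvals.max())  # min is negative -> indefinite kernel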
- Published
- 2024
- Full Text
- View/download PDF
37. Weakly-supervised Incremental learning for Semantic segmentation with Class Hierarchy.
- Author
-
Kim, Hyoseo and Choe, Junsuk
- Subjects
- MACHINE learning, SUPERVISED learning, DEEP learning, LEARNING ability
- Abstract
Although current semantic segmentation approaches have achieved impressive performance, their ability to incrementally learn new classes is limited. Moreover, pixel-by-pixel annotations are costly and time-consuming. Therefore, a new field called Weakly-supervised Incremental Learning for Semantic Segmentation (WILSS) has emerged, which learns new classes using image-level labels. However, image-level labels do not provide sufficient detail, and we discover that the state-of-the-art of WILSS suffers from confusion between old knowledge and new knowledge. To address this issue, we propose Weakly-supervised Incremental learning for Semantic segmentation with Class Hierarchy (WISH), a method that considers the hierarchical structure of each class when determining which knowledge to trust in cases of confusion between old and new knowledge. Our method achieves new state-of-the-art performance in all settings compared to previous methods on the Pascal VOC and MS COCO datasets. • Our paper introduces WISH, utilizing hierarchy and image labels for class-incremental learning in semantic segmentation. • We efficiently utilize the class hierarchy, boosting segmentation performance without added costs. • Our method outperforms existing methods in all configurations, showcasing the effectiveness of hierarchical integration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Enhancing mass spectrometry data analysis: A novel framework for calibration, outlier detection, and classification.
- Author
-
Peng, Weili, Zhou, Tao, and Chen, Yuanyuan
- Subjects
- OUTLIER detection, TRANSFORMER models, AUTOMATIC classification, CORONARY disease, MYOCARDIAL infarction
- Abstract
Mass spectrometry (MS) is a powerful analytical technique in metabolomics, enabling the identification and quantification of metabolites. However, analyzing MS data poses challenges such as batch effects, outliers, and high-dimensional data. In this paper, we propose a comprehensive framework for MS data analysis. The framework integrates data calibration, outlier detection, and automatic classification modules. Data calibration is performed using a deep autoencoder to remove batch effects. Outlier detection combines multiple algorithms through ensemble learning to identify and remove outliers. Automatic classification utilizes a transformer model to handle high-dimensional data and capture global feature relationships. Experimental results on myocardial infarction (MI) and coronary heart disease (CHD) datasets demonstrate the effectiveness of the framework. It outperforms traditional classification models and achieves higher accuracy. The proposed framework provides a robust solution for MS data analysis, facilitating more accurate classification and enabling reliable biological insights in metabolomics research. • Novel framework integrates three steps for MS data analysis. • Deep AE removes batch effects, while an ensemble approach removes outliers. • Transformer captures intrinsic relationships in MS data for accurate classification. • The framework provides a comprehensive solution to improve classification accuracy. • The framework also facilitates reliable biological insights. [ABSTRACT FROM AUTHOR]
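A minimal sketch of the calibration module as described: a deep autoencoder trained to reconstruct spectra, whose output (or latent code) serves as the batch-calibrated representation. Layer widths and the exact calibration rule are assumptions.

    import torch
    import torch.nn as nn

    class CalibrationAE(nn.Module):
        def __init__(self, n_features: int, latent: int = 32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 256), nn.ReLU(),
                nn.Linear(256, latent),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent, 256), nn.ReLU(),
                nn.Linear(256, n_features),
            )

        def forward(self, x):
            z = self.encoder(x)
            # Training minimizes reconstruction error; the reconstruction (or z)
            # can then serve as the batch-calibrated representation of a spectrum.
            return self.decoder(z), z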
- Published
- 2024
- Full Text
- View/download PDF
39. Deep neural networks for automatic speaker recognition do not learn supra-segmental temporal features.
- Author
-
Neururer, Daniel, Dellwo, Volker, and Stadelmann, Thilo
- Subjects
- ARTIFICIAL neural networks, SPEECH perception, DEEP learning
- Abstract
While deep neural networks have shown impressive results in automatic speaker recognition and related tasks, it is unsatisfying how little is understood about what exactly is responsible for these results. Part of the success has been attributed in prior work to their capability to model supra-segmental temporal information (SST), i.e., to learn rhythmic-prosodic characteristics of speech in addition to spectral features. In this paper, we (i) present and apply a novel test to quantify to what extent the performance of state-of-the-art neural networks for speaker recognition can be explained by modeling SST; and (ii) present several means to force respective nets to focus more on SST and evaluate their merits. We find that a variety of CNN- and RNN-based neural network architectures for speaker recognition do not model SST to any sufficient degree, even when forced. The results provide a highly relevant basis for impactful future research into better exploitation of the full speech signal and give insights into the inner workings of such networks, enhancing the explainability of deep learning for speech technologies. • Literature explains speaker recognition in neural nets by the modeling of voice dynamics. • Diagnostic: we quantify how well deep learning models actually capture these dynamics. • Observation: state-of-the-art deep nets do not model speaker prosody but ignore it. • Interpretation as "cheating": achieving high performance without putting in due effort. • Outlook: increasing task difficulty biases models towards prosody, but not enough. [ABSTRACT FROM AUTHOR]
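One plausible form of such a diagnostic, sketched here only for illustration (the paper's exact test may differ): destroy supra-segmental temporal structure by shuffling short segments of an utterance, then compare the recognition score on original versus shuffled input; if the score barely drops, the model is not relying on SST. Segment length and the scoring model are assumptions.

    import random
    import numpy as np

    def shuffle_segments(waveform: np.ndarray, sr: int, seg_ms: int = 100) -> np.ndarray:
        # Permute seg_ms chunks: this destroys SST while leaving short-term
        # spectral content largely intact.
        seg = int(sr * seg_ms / 1000)
        n = len(waveform) // seg
        chunks = [waveform[i * seg:(i + 1) * seg] for i in range(n)]
        random.shuffle(chunks)
        return np.concatenate(chunks + [waveform[n * seg:]])

    # sst_reliance = score(model, wav) - score(model, shuffle_segments(wav, sr))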
- Published
- 2024
- Full Text
- View/download PDF
40. Multi-layer encoder–decoder time-domain single channel speech separation.
- Author
-
Liu, Debang, Zhang, Tianqi, Christensen, Mads Græsbøll, Yi, Chen, and Wei, Ying
- Subjects
- SPEECH, COMPUTATIONAL complexity, VIDEO coding, DEAF children
- Abstract
With the emergence of more advanced separation networks, significant progress has been made in time-domain speech separation methods. These methods typically use a temporal encoder–decoder structure to encode speech feature sequences, thereby accomplishing the separation task. However, due to the limitations of the traditional encoder–decoder structure, separation performance drops sharply when the encoded sequence is short; when the encoded sequence is sufficiently long, separation performance improves, but at the cost of increased computational complexity and training cost. Therefore, this paper compresses and reconstructs the speech feature sequence through a multi-layer convolution structure and proposes a multi-layer encoder–decoder time-domain speech separation model (MLED). In this model, our encoder–decoder structure can compress the speech sequence to a short length while ensuring that separation performance does not decrease. Combined with our multi-scale temporal attention (MSTA) separation network, MLED achieves efficient and precise separation of short encoded sequences. Compared to previous advanced time-domain separation methods, our experiments show that MLED achieves competitive separation performance with a smaller model size, lower computational complexity, and lower training cost. • Our designed encoder–decoder network is more effective on shorter encoded sequences. • Because the encoded sequence is shorter, MLED performs the separation task efficiently. • MLED better balances performance, model size, and computational and training costs. [ABSTRACT FROM AUTHOR]
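A minimal sketch of the multi-layer compression idea: stacked strided 1-D convolutions shorten the encoded sequence at every layer, so the separation network operates on far fewer frames. Channel counts and strides are assumptions, not the paper's MLED configuration.

    import torch
    import torch.nn as nn

    class MultiLayerEncoder(nn.Module):
        def __init__(self, channels=(1, 64, 128, 256), stride=4):
            super().__init__()
            self.layers = nn.Sequential(*[
                nn.Sequential(nn.Conv1d(c_in, c_out, kernel_size=stride * 2,
                                        stride=stride, padding=stride // 2),
                              nn.ReLU())
                for c_in, c_out in zip(channels[:-1], channels[1:])
            ])

        def forward(self, x):          # x: (B, 1, T) waveform
            return self.layers(x)      # (B, 256, ~T / stride**3): a much shorter sequence

    x = torch.randn(2, 1, 16000)
    print(MultiLayerEncoder()(x).shape)  # torch.Size([2, 256, 250])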
- Published
- 2024
- Full Text
- View/download PDF
41. A guided-based approach for deepfake detection: RGB-depth integration via features fusion.
- Author
-
Leporoni, Giorgio, Maiano, Luca, Papa, Lorenzo, and Amerini, Irene
- Subjects
- DEEPFAKES, DETECTORS, DISINFORMATION, COMPUTER vision
- Abstract
Deepfake technology paves the way for a new generation of super-realistic artificial content. While this opens the door to extraordinary new applications, the malicious use of deepfakes allows for far more realistic disinformation attacks than ever before. In this paper, we start from the intuition that generating fake content introduces possible inconsistencies in the depth of the generated images. This extra information provides valuable spatial and semantic cues that can reveal inconsistencies that facial generative methods introduce. To test this idea, we evaluate different strategies for integrating depth information into an RGB detector, and we propose an attention mechanism that makes it possible to integrate information from depth effectively. In addition to being more accurate than an RGB model, our Masked Depthfake Network method is on average +3.2% more robust against common adversarial attacks than a typical RGB detector. Furthermore, we show how this technique allows the model to learn more discriminative features than RGB alone. • Integrating depth and RGB improves the accuracy and robustness of deepfake detectors. • Late fusion is the best fusion strategy for integrating RGB and depth features. • We guide the integration of depth information via a self-attention mechanism. • Depth integration makes the model more robust to some adversarial attacks. [ABSTRACT FROM AUTHOR]
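A minimal sketch of late fusion with an attention weighting over the two modalities, in the spirit of the strategy the abstract identifies as best; the backbones, feature size, and gating design are assumptions.

    import torch
    import torch.nn as nn

    class LateFusionDetector(nn.Module):
        def __init__(self, rgb_net, depth_net, feat_dim=512):
            super().__init__()
            self.rgb_net, self.depth_net = rgb_net, depth_net
            self.attn = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=1))
            self.head = nn.Linear(feat_dim, 2)   # real vs. fake

        def forward(self, rgb, depth):
            f_rgb, f_d = self.rgb_net(rgb), self.depth_net(depth)  # (B, feat_dim) each
            w = self.attn(torch.cat([f_rgb, f_d], dim=1))          # (B, 2) modality weights
            fused = w[:, :1] * f_rgb + w[:, 1:] * f_d              # attention-guided fusion
            return self.head(fused)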
- Published
- 2024
- Full Text
- View/download PDF
42. Joint facial action unit recognition and self-supervised optical flow estimation.
- Author
-
Shao, Zhiwen, Zhou, Yong, Li, Feiran, Zhu, Hancheng, and Liu, Bing
- Subjects
- OPTICAL flow, FACIAL muscles
- Abstract
Facial action unit (AU) recognition and optical flow estimation are two highly correlated tasks, since optical flow can provide motion information of facial muscles to facilitate AU recognition. However, most existing AU recognition methods handle the two tasks independently by offline extracting optical flow as auxiliary information or directly ignoring the use of optical flow. In this paper, we propose a novel end-to-end joint framework of AU recognition and optical flow estimation, in which the two tasks contribute to each other. Moreover, due to the lack of optical flow annotations in AU datasets, we propose to estimate optical flow in a self-supervised manner. To regularize the self-supervised estimation of optical flow, we propose an identical mapping constraint for the optical flow guided image warping process, in which the estimated optical flow between two same images is required to not change the image during warping. Experiments demonstrate that our framework (i) outperforms most of the state-of-the-art AU recognition methods on the challenging BP4D and GFT benchmarks, and (ii) also achieves competitive self-supervised optical flow estimation performance. • An end-to-end joint framework of AU recognition and self-supervised optical flow estimation. • An identity mapping constraint to ensure the reliability of self-supervised optical flow estimation. • An AU occurrence probability map regression loss for exploiting AU location information and facial spatial information. • Our approach outperforms most of the state-of-the-art AU recognition methods. • Our approach achieves comparable self-supervised optical flow estimation performance. [ABSTRACT FROM AUTHOR]
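A minimal sketch of the identical mapping constraint: the flow a network estimates between an image and itself should leave that image unchanged after warping. The warping below is standard flow-based bilinear sampling; the flow network itself is a placeholder.

    import torch
    import torch.nn.functional as F

    def warp(img, flow):
        # img: (B, C, H, W); flow: (B, 2, H, W) in pixels, channel order (dx, dy).
        B, _, H, W = img.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        base = torch.stack([xs, ys], dim=0).float().to(img.device)   # (2, H, W)
        coords = base.unsqueeze(0) + flow                            # (B, 2, H, W)
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
        grid = torch.stack([gx, gy], dim=-1)                         # (B, H, W, 2)
        return F.grid_sample(img, grid, align_corners=True)

    def identity_mapping_loss(flow_net, img):
        flow = flow_net(img, img)              # flow between two identical images
        return F.l1_loss(warp(img, flow), img) # warping must not change the image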
- Published
- 2024
- Full Text
- View/download PDF
43. Paired relation feature network for spatial relation recognition.
- Author
-
Chen, Nanxi, Wang, Xu, Sun, Qi, Li, Jiamao, and Zhang, Xiaolin
- Abstract
Recognizing relations between objects in an image is challenging for neural networks because some relations may not have obvious dedicated visual features. This paper proposes a Paired Relation Feature Network (PRFN), where all spatial and semantic features are extracted from the subject–object pair jointly, without using any hand-crafted features. PRFN includes a paired 2D spatial feature module that can learn the representative features from a pair of bounding boxes. By focusing on the paired depth feature between the subject and object, the problem of depth feature extraction is simplified to the recognition of a ternary relation {−1, 0, 1}, which is much easier to learn from training data. Experimental results demonstrate the effectiveness of PRFN for both the cases of RGB-D images and RGB images with estimated depth. • Features are extracted from the subject–object pair. No hand-crafted feature is used. • 2D features are extracted from a pair of bounding boxes in a data-driven approach. • Depth feature extraction is simplified to the recognition of a ternary relation. • It is better to work with estimated disparity than to convert the disparity to depth. [ABSTRACT FROM AUTHOR]
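A minimal sketch of the ternary depth relation: compare the average depth (or disparity) inside the subject and object boxes and map the difference to {−1, 0, 1}. The tolerance and sign convention are assumptions.

    import numpy as np

    def depth_relation(depth_map, box_subj, box_obj, tol=0.05):
        def mean_depth(box):
            x1, y1, x2, y2 = box
            return float(np.mean(depth_map[y1:y2, x1:x2]))
        d = mean_depth(box_subj) - mean_depth(box_obj)
        if abs(d) <= tol:
            return 0                    # roughly the same depth
        return -1 if d < 0 else 1       # subject closer (-1) or farther (+1); sign assumed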
- Published
- 2024
- Full Text
- View/download PDF
44. Uncovering the authorship: Linking media content to social user profiles.
- Author
-
Baracchi, Daniele, Shullani, Dasara, Iuliani, Massimo, Giani, Damiano, and Piva, Alessandro
- Subjects
- SOCIAL media, ARTIFICIAL neural networks, DISCRETE cosine transforms, FAKE news, SOCIAL exchange
- Abstract
The extensive spread of fake news on social networks is carried out by a diverse range of users, encompassing private individuals, newspapers, and organizations. With widely accessible image and video editing tools, malicious users can easily create manipulated media and distribute it through multiple fake profiles, aiming to maximize its social impact. To tackle this problem effectively, it is crucial to be able to analyze shared media and identify the originators of fake news. To this end, multimedia forensics research has produced advanced tools that examine traces in media, revealing valuable insights into their origins. While combining these tools has proven highly efficient for profiling image and video creators, most of them are not specifically designed to function effectively in the complex environment of content exchange on social networks. In this paper, we introduce the problem of establishing associations between images and their source profiles as a means to tackle the spread of disinformation on social platforms. To this end, we assembled SocialNews, an extensive image dataset comprising more than 12,000 images sourced from 21 user profiles across Facebook, Instagram, and Twitter, and we propose three increasingly realistic and challenging experimental scenarios. We present two simple yet effective techniques as benchmarks, one based on statistical analysis of Discrete Cosine Transform (DCT) coefficients and one employing a neural network model based on ResNet, and we compare their performance against the state of the art. Experimental results show that the proposed approaches exhibit superior performance in accurately classifying the originating user profiles. • We introduce SocialNews, a novel dataset for online disinformation detection. • Images were sourced primarily from social profiles of news agencies around the world. • The goal is to automatically identify the profile of the user that shared an image. • We introduce two benchmark methods: a DCT-based one and a ResNet-based one. • Experiments show that the proposed benchmark methods outperform the state of the art. [ABSTRACT FROM AUTHOR]
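A minimal sketch of a DCT-statistics feature in the spirit of the first benchmark: histogram the block-DCT coefficients of a grayscale image, which retain traces of each platform's recompression. The block size, histogram range, and bin count are assumptions.

    import numpy as np
    from scipy.fft import dctn

    def dct_feature(gray: np.ndarray, block: int = 8, bins: int = 64):
        # Crop to a multiple of the block size, then collect block-DCT coefficients.
        h, w = (gray.shape[0] // block) * block, (gray.shape[1] // block) * block
        coefs = []
        for y in range(0, h, block):
            for x in range(0, w, block):
                c = dctn(gray[y:y + block, x:x + block], norm="ortho")
                coefs.append(c.ravel()[1:])          # skip the DC coefficient
        coefs = np.concatenate(coefs)
        hist, _ = np.histogram(coefs, bins=bins, range=(-50, 50), density=True)
        return hist                                  # fixed-length statistical descriptor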
- Published
- 2024
- Full Text
- View/download PDF
45. Improvised contrastive loss for improved face recognition in open-set nature.
- Author
-
Khan, Zafran, Boragule, Abhijeet, d'Auriol, Brian J., and Jeon, Moongu
- Subjects
- FACE perception, STIMULUS generalization, MACHINE learning, ACTIVE learning, COMPUTATIONAL complexity
- Abstract
Face recognition models often encounter various unseen domains and environments in real-world applications, leading to unsatisfactory performance due to the open-set nature of face recognition. Models trained on central datasets may exhibit poor generalization when faced with different candidates under varying illumination and blur conditions. In this paper, our goal is to enhance the generalization of face recognition models to diverse target conditions without relying on active or incremental learning. We propose an approach for face recognition that utilizes contrastive learning to synthesize positive and multiple negative samples. To address the combinatorial challenges posed by positive and negative samples, our framework incorporates a combination of a contrastive regularizer loss and the ArcFace loss, along with an effective sampling strategy for batch model learning. We update the model weights by jointly back-propagating the contrastive and ArcFace gradients. We validate our method on both generalized and standard face recognition benchmark datasets, namely IJB-B and IJB-C. A series of experiments revealed that the proposed framework outperforms other state-of-the-art methods. • We propose generalized face recognition aimed at handling unknown target domains without model updates or fine-tuning. • We propose a contrastive learning-based approach to address illumination and motion blur for registered candidates. • We employ augmentation techniques to generate positive and negative samples and mitigate computational complexity. • We integrate ArcFace and a contrastive regularizer loss to learn a distinctive face representation for each identity. • We performed a series of experiments demonstrating the convergence of the proposed model on the IJB-B and IJB-C datasets. [ABSTRACT FROM AUTHOR]
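A minimal sketch of jointly optimizing ArcFace with an InfoNCE-style contrastive regularizer over synthesized positive and negative samples; the scale, margin, temperature, and sampling scheme are assumptions rather than the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def arcface_loss(emb, weight, labels, s=64.0, m=0.5):
        # emb: (B, D) embeddings; weight: (C, D) class centers; labels: (B,) long.
        cos = F.linear(F.normalize(emb), F.normalize(weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = F.one_hot(labels, weight.size(0)).bool()
        logits = s * torch.where(target, torch.cos(theta + m), cos)  # additive angular margin
        return F.cross_entropy(logits, labels)

    def contrastive_regularizer(anchor, positive, negatives, tau=0.1):
        # One positive (e.g., augmented view) and K negatives per anchor.
        pos = F.cosine_similarity(anchor, positive) / tau                       # (B,)
        neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=2) / tau  # (B, K)
        return -torch.log(pos.exp() / (pos.exp() + neg.exp().sum(1))).mean()

    # total_loss = arcface_loss(...) + lambda_c * contrastive_regularizer(...)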
- Published
- 2024
- Full Text
- View/download PDF
46. A siamese-based verification system for open-set architecture attribution of synthetic images.
- Author
-
Abady, Lydia, Wang, Jun, Tondi, Benedetta, and Barni, Mauro
- Subjects
- ARTIFICIAL neural networks, TRANSFORMER models, GENERATIVE adversarial networks, DEEP learning
- Abstract
Despite the wide variety of methods developed for synthetic image attribution, most of them can only attribute images generated by models or architectures included in the training set and do not work with unknown architectures, hindering their applicability in real-world scenarios. In this paper, we propose a verification framework that relies on a Siamese network to address the problem of open-set attribution of synthetic images to the architecture that generated them. We consider two different settings. In the first setting, the system determines whether two images have been produced by the same generative architecture or not. In the second setting, the system verifies a claim about the architecture used to generate a synthetic image, utilizing one or multiple reference images generated by the claimed architecture. The main strength of the proposed system is its ability to operate in both closed-set and open-set scenarios, so that the input images, both query and reference, may or may not belong to the architectures considered during training. Experimental evaluations encompassing various generative architectures such as GANs, diffusion models, and transformers, focusing on synthetic face image generation, confirm the excellent performance of our method in both closed-set and open-set settings, as well as its strong generalization capabilities. • New verification framework for open-set architecture attribution of synthetic images. • Tested with several types of generative architectures in closed- and open-set scenarios. • Generalization tests prove that the system can verify unknown models of the architecture. • Outperforms state-of-the-art methods for open-set architecture attribution with rejection. [ABSTRACT FROM AUTHOR]
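A minimal sketch of the verification step: a Siamese embedding network scores whether a query image and one or more reference images come from the same generative architecture, aggregating cosine similarities against a threshold. The embedding network and threshold are assumptions.

    import torch
    import torch.nn.functional as F

    def same_architecture(embed_net, query, references, threshold=0.5):
        # query: (1, C, H, W); references: (N, C, H, W) from the claimed architecture.
        q = F.normalize(embed_net(query), dim=1)       # (1, D)
        r = F.normalize(embed_net(references), dim=1)  # (N, D)
        score = (q @ r.T).mean()                       # mean cosine similarity
        return bool(score > threshold), float(score)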
- Published
- 2024
- Full Text
- View/download PDF
47. Learning on sample-efficient and label-efficient multi-view cardiac data with graph transformer.
- Author
-
Wang, Lujing, Ma, Yunting, Zhang, Wanqiu, Zhao, Xiaoying, and Zhao, Xinxiang
- Subjects
- TRANSFORMER models, HEART diseases, CARDIOVASCULAR diseases, PREDICTION models
- Abstract
Predicting cardiovascular disease has been a challenging task, as assessing samples based on a single view of information may be insufficient. Therefore, in this paper, we focus on the challenge of predicting cardiovascular disease using multi-view cardiac data. However, multi-view cardiac data is usually difficult to collect and label. Based on this motivation, learning an effective predictive model on sample-efficient and label-efficient multi-view cardiac data is urgently needed. To address the aforementioned issues, we propose a multi-view learning method: (i) our method utilizes graph learning to establish and extract relationships between data, enabling learning from a small number of labeled data and a small number of samples; (ii) our method integrates features from multiple views to utilize complementary information in the data; (iii) for data without a provided graph of relationships between samples, we utilize the mechanism of transformers to learn the relationships between samples in a data-driven manner. We validate the effectiveness of our method on real heart disease datasets. • Our method considers multi-view cardiac data to provide comprehensive and accurate information for diagnosis. • Our method overcomes the limitations of sample-efficient and label-efficient data. • Our method captures global relationships between subjects and achieves high diagnostic accuracy. [ABSTRACT FROM AUTHOR]
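A minimal sketch of point (iii): treat samples as tokens, fuse the per-view features, and let a transformer encoder learn sample-to-sample relations in a data-driven manner when no graph is provided. Dimensions and layer counts are assumptions.

    import torch
    import torch.nn as nn

    class MultiViewSampleTransformer(nn.Module):
        def __init__(self, view_dims, d_model=64, n_classes=2):
            super().__init__()
            self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in view_dims])
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, views):          # list of (N, d_v) tensors, one per view
            fused = sum(p(v) for p, v in zip(self.proj, views))   # (N, d_model)
            z = self.encoder(fused.unsqueeze(0)).squeeze(0)       # attention across samples
            return self.head(z)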
- Published
- 2024
- Full Text
- View/download PDF
48. Channel-spatial knowledge distillation for efficient semantic segmentation.
- Author
-
Karine, Ayoub, Napoléon, Thibault, and Jridi, Maher
- Subjects
- DISTILLATION, IMAGE segmentation
- Abstract
In this paper, we propose a new lightweight Channel-Spatial Knowledge Distillation (CSKD) method to handle the task of efficient image semantic segmentation. More precisely, we investigate the KD approach that trains a compressed neural network, called the student, under the supervision of a heavy one, called the teacher. In this context, we propose to improve the distillation mechanism by capturing the contextual dependencies in the spatial and channel dimensions through a self-attention principle. In addition, to quantify the difference between the teacher and student knowledge, we adopt the Centered Kernel Alignment (CKA) metric, which spares the student from adding extra learning layers to match the teacher's feature size. Experimental results on the Cityscapes, CamVid and Pascal VOC datasets demonstrate that our method achieves outstanding performance. The code is available at https://github.com/ayoubkarine/CSKD. • Heavy semantic segmentation methods require high computational costs. • Knowledge distillation is adopted for efficient semantic segmentation. • Spatial and channel distillation through self-attention between teacher and student networks is proposed. • The Centered Kernel Alignment metric is used to measure the difference between teacher and student knowledge. • An ablation study and comparisons with state-of-the-art methods on different image semantic segmentation datasets are presented. [ABSTRACT FROM AUTHOR]
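Linear Centered Kernel Alignment (CKA) is a standard similarity metric and is easy to state concretely; because it compares Gram matrices, the teacher and student feature dimensions need not match, which is what removes the need for extra matching layers. A NumPy sketch:

    import numpy as np

    def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
        # X: (n, d1) student features; Y: (n, d2) teacher features, same n samples.
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
        return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))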
- Published
- 2024
- Full Text
- View/download PDF
49. Frame-part-activated deep reinforcement learning for Action Prediction.
- Author
-
Chen, Lei and Song, Zhanjie
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, ACTIVE learning, REINFORCEMENT (Psychology), HUMAN body
- Abstract
In this paper, we propose frame-part-activated deep reinforcement learning (FPA-DRL) for action prediction. Most existing methods for action prediction utilize the evolution of whole frames to model actions, which cannot avoid the noise of the current action, especially in early prediction. Moreover, the loss of structural information of the human body diminishes the capacity of features to describe actions. To address this, we design FPA-DRL to exploit the structure of the human body by extracting skeleton proposals and to reduce the redundancy of frames under a deep reinforcement learning framework. Specifically, we extract features from different parts of the human body individually, then activate the action-related parts in features and the action-related frames in videos to enhance the representation. Our method not only exploits the structural information of the human body, but also attends to the frames that express actions. We evaluate our method on three popular action prediction datasets: UT-Interaction, BIT-Interaction and UCF101. Our experimental results demonstrate that our method achieves very competitive performance compared with the state of the art. • We design the part-activated module to enhance the action-related parts of features. • We design the frame-activated module to reduce the redundancy of frames. • We achieve very competitive results against the state of the art on three datasets. [ABSTRACT FROM AUTHOR]
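A minimal sketch of the part-activation idea: gate per-part skeleton features with learned weights so that action-related parts dominate the pooled representation. The frame-activation module and the reinforcement learning loop are omitted, and all shapes are assumptions.

    import torch
    import torch.nn as nn

    class PartActivation(nn.Module):
        def __init__(self, feat_dim=128):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

        def forward(self, part_feats):          # (B, n_parts, feat_dim) per-part features
            w = self.gate(part_feats)           # (B, n_parts, 1) activation weight per part
            return (w * part_feats).sum(dim=1)  # (B, feat_dim) activated representation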
- Published
- 2024
- Full Text
- View/download PDF
50. Continual learning for adaptive social network identification.
- Author
-
Magistri, Simone, Baracchi, Daniele, Shullani, Dasara, Bagdanov, Andrew D., and Piva, Alessandro
- Subjects
- SOCIAL networks, SOCIAL media, FOLKSONOMIES, RESEARCH personnel
- Abstract
The popularity of social networks as primary mediums for sharing visual content has made it crucial for forensic experts to identify the original platform of multimedia content. Various methods address this challenge, but the constant emergence of new platforms and updates to existing ones often render forensic tools ineffective shortly after release. This necessitates regularly updating methods and models, which can be particularly cumbersome for techniques based on neural networks, which cannot quickly adapt to new classes without sacrificing performance on previously learned ones – a phenomenon known as catastrophic forgetting. Recently, researchers have aimed to mitigate this problem via a family of techniques known as continual learning. In this paper we study the applicability of continual learning techniques to the social network identification task by evaluating two relevant forensic scenarios: Incremental Social Platform Classification, for handling newly introduced social media platforms, and Incremental Social Version Classification, for addressing updated versions of a set of existing social networks. We perform an extensive experimental evaluation of a variety of continual learning approaches applied to these two scenarios. Experimental results demonstrate that, although continual social network identification remains a difficult problem, catastrophic forgetting can be significantly mitigated in both scenarios by retaining only a fraction of the image patches from past tasks' training samples or by employing prototypes from previous tasks. • We investigate continual learning methods for social network identification. • We exploit a state-of-the-art dual-branch neural network designed for this task. • We define two realistic experimental scenarios on multiple datasets. • Exemplar-based methods yield good performance with limited memory requirements. • Prototype-based methods are a viable solution when storing exemplars is not feasible. [ABSTRACT FROM AUTHOR]
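A minimal sketch of the exemplar-based ingredient the experiments favor: keep a small per-class buffer of image patches from past tasks and mix them into each new task's batches to mitigate catastrophic forgetting. Capacity and sampling strategy are assumptions.

    import random

    class ExemplarBuffer:
        def __init__(self, capacity_per_class=20):
            self.capacity = capacity_per_class
            self.store = {}                      # class id -> list of stored patches

        def add(self, patch, label):
            bucket = self.store.setdefault(label, [])
            if len(bucket) < self.capacity:      # keep only a fraction of past samples
                bucket.append(patch)

        def sample(self, k):
            pool = [(p, c) for c, ps in self.store.items() for p in ps]
            return random.sample(pool, min(k, len(pool)))

    # During task t+1 training: batch = new_task_batch + buffer.sample(k)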
- Published
- 2024
- Full Text
- View/download PDF