3,927 results for "fine-tuning"
Search Results
2. Explainable AI for Plant Disease Detection: Assessing Explainability in Classifying Maize Leaves Diseases with Focus Score and Ablation-CAM
- Author
-
Quach, Luyl-Da, Quoc, Khang Nguyen, Nguyen, Chi-Ngon, and Thai-Nghe, Nguyen; edited by Thai-Nghe, Nguyen, Do, Thanh-Nghi, and Benferhat, Salem
- Published
- 2025
- Full Text
- View/download PDF
3. Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
- Author
-
Jiao, Siyu, Zhu, Hongguang, Huang, Jiannan, Zhao, Yao, Wei, Yunchao, and Shi, Humphrey; edited by Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
- Published
- 2025
- Full Text
- View/download PDF
4. Cross-Language Code Mapping with Transformer Encoder-Decoder Model
- Author
-
Naik, M. V. Deepak and Jayaraman, Swaminathan; edited by Pal, Sankar K., Thampi, Sabu M., and Abraham, Ajith
- Published
- 2025
- Full Text
- View/download PDF
5. A Deep Learning Approach for Non-invasive Body Mass Index Calculation
- Author
-
Nandhan, S. Harish, Zean, J. Remoon, Mahi, A. R., Meena, R., and Mahalakshmi, S.; edited by Geetha, R., Dao, Nhu-Ngoc, and Khalid, Saeed
- Published
- 2025
- Full Text
- View/download PDF
6. Enhancing domain-specific text generation for power grid maintenance with P2FT.
- Author
-
Yang, Yi, Li, Chenhao, Zhu, Binghang, Zheng, Wenjie, Zhang, Fengda, and Li, Zhuangzhuang
- Subjects
LANGUAGE models, NATURAL language processing, ELECTRIC power distribution grids, PROCESS capability, COMPUTER performance
- Abstract
The digitization of operation and maintenance for intelligent power grid equipment relies on a diverse array of information for smart decision-making. Proficiency in intelligent decision generation depends on extensive learning from large volumes of text, which demands not only robust processing capability but also a high degree of specialization. In situations where authorization is lacking, pre-trained language models (PLMs) already offer a starting point for specialized domains and tasks. Because textual content in the power grid field encompasses a large body of specialized knowledge and an abundance of proprietary terminology, we explore pre-trained model specialization using the power grid domain as an example, specifically for the task of generating maintenance strategies. A two-stage fine-tuning approach (P2FT) is employed, built on a large-scale pre-trained model for natural language processing. The efficacy and practical value of this method were evaluated on multiple metrics and compared against other advanced low-parameter or parameter-free fine-tuning methods. Careful analysis and validation of the experimental outcomes corroborate the feasibility and practical value of this approach to pre-trained model specialization, and offer guidance for text generation in both the Chinese-language and power grid domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
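The P2FT record above describes a two-stage recipe: continued pre-training on raw in-domain text, then supervised fine-tuning for strategy generation. A minimal sketch with Hugging Face transformers follows; the gpt2 stand-in model, the file names grid_corpus.txt and maintenance_pairs.txt, and all hyperparameters are illustrative assumptions, not the paper's settings.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in for the PLM
tok.pad_token = tok.eos_token                          # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def lm_dataset(path):
    """Tokenize a plain-text file into a causal-LM training set."""
    ds = load_dataset("text", data_files=path)["train"]
    return ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
                  batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tok, mlm=False)

# Stage 1: continued pre-training on raw in-domain text (manuals, logs).
Trainer(model=model, args=TrainingArguments("stage1", num_train_epochs=1),
        train_dataset=lm_dataset("grid_corpus.txt"),       # hypothetical file
        data_collator=collator).train()

# Stage 2: supervised fine-tuning on task pairs, serialized one per line,
# e.g. "fault description <sep> maintenance strategy" (format is an assumption).
Trainer(model=model, args=TrainingArguments("stage2", num_train_epochs=3,
                                            learning_rate=2e-5),
        train_dataset=lm_dataset("maintenance_pairs.txt"), # hypothetical file
        data_collator=collator).train()
```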
7. Perceptual super-resolution in multiple sclerosis MRI.
- Author
-
Giraldo, Diana L., Khan, Hamza, Pineda, Gustavo, Liang, Zhihua, Lozano-Castillo, Alfonso, Van Wijmeersch, Bart, Woodruff, Henry C., Lambin, Philippe, Romero, Eduardo, Peeters, Liesbet M., and Sijbers, Jan
- Subjects
CONVOLUTIONAL neural networks, MAGNETIC resonance imaging, DEEP learning, BRAIN damage, SPINAL cord
- Abstract
Introduction: Magnetic resonance imaging (MRI) is crucial for the diagnosis and monitoring of multiple sclerosis (MS), as it is used to assess lesions in the brain and spinal cord. However, in real-world clinical settings, MRI scans are often acquired with thick slices, limiting their utility for automated quantitative analyses. This work presents a single-image super-resolution (SR) reconstruction framework that leverages SR convolutional neural networks (CNNs) to enhance the through-plane resolution of structural MRI in people with MS (PwMS). Methods: Our strategy involves the supervised fine-tuning of CNN architectures, guided by a content loss function that promotes perceptual quality as well as reconstruction accuracy, to recover high-level image features. Results: Extensive evaluation with MRI data of PwMS shows that our SR strategy leads to more accurate MRI reconstructions than competing methods. Furthermore, it improves lesion segmentation on low-resolution MRI, approaching the performance achievable with high-resolution images. Discussion: These results demonstrate the potential of our SR framework to facilitate the use of low-resolution retrospective MRI from real-world clinical settings to investigate quantitative image-based biomarkers of MS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
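The content loss described in the record above, balancing reconstruction accuracy against perceptual quality, is commonly built from a frozen VGG feature extractor. A minimal sketch under that assumption; the layer cut-off and the 0.01 weight are illustrative, and single-channel MRI slices would first be replicated to three channels.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen feature extractor; the cut-off at layer 16 is an assumption.
vgg_feats = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def content_loss(sr, hr, w_perc=0.01):
    """sr, hr: (N, 3, H, W) tensors in [0, 1]; MRI slices repeated to 3 channels."""
    pixel = F.l1_loss(sr, hr)                             # reconstruction accuracy
    perceptual = F.mse_loss(vgg_feats(sr), vgg_feats(hr)) # high-level features
    return pixel + w_perc * perceptual

# usage during fine-tuning: loss = content_loss(model(lr), hr); loss.backward()
```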
8. Detection of malicious smart contracts by fine‐tuning GPT‐3.
- Author
-
Sathvik, MSVPJ and Mazumdar, Hirak
- Subjects
RESEARCH questions, NATURAL languages, CONTRACTS, CLASSIFICATION
- Abstract
This paper introduces a comprehensive framework for the detection and identification of malicious smart contracts, emphasizing their vulnerabilities. The framework leverages the capabilities of GPT‐3, which have been adapted and fine‐tuned for binary and multi‐class classification tasks. To the best of our knowledge, this study is the first to explore the use of GPT‐3 specifically for detecting and identifying malicious smart contracts. The framework addresses previously unexplored research questions and provides insightful answers through rigorous experimentation. The contributions of this work include proposing a novel approach, pioneering the adaptation of GPT‐3 for this purpose, and offering valuable insights into the detection of malicious smart contracts and vulnerabilities. Notably, our research reveals that GPT‐3 excels not only in understanding natural language but also in decoding the secrets embedded in numerical codes like opcodes. This finding extends the applicability of GPT‐3 beyond language‐based tasks and highlights its potential in enhancing smart contract security. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Imputation Strategies in Time Series Based on Language Models.
- Author
-
Jacobsen, Michel and Tropmann-Frick, Marina
- Abstract
Incomplete time series present a significant challenge for downstream analysis. In the field of time series, Large Language Models are already being used for prediction, classification, and, in rare cases, imputation. This study thoroughly examines the imputation of time series using Large Language Models. Within a defined experimental setup, current state-of-the-art time series imputation methods are compared with the performance of Large Language Models. Parameter-efficient fine-tuning methods are applied to adapt the Large Language Models to the imputation task. The results indicate that the models are suitable for time series imputation. The performance of these models depends on the number of parameters and the type of pre-training. Small specialized models, such as BERT, compete with models like Llama2 and outperform them on selected datasets. Furthermore, it becomes clear that the attention and feedforward network components of Large Language Models are particularly well-suited for adaptation to imputation, and parameter-efficient methods are also performance-enhancing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
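The study above finds the attention and feed-forward components of LLMs especially amenable to parameter-efficient adaptation. A sketch of that targeting with the peft library; the Llama-style module names and LoRA hyperparameters are assumptions and vary by architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Gated checkpoint; any Llama-style causal LM can stand in.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",   # attention
                    "gate_proj", "up_proj", "down_proj"],     # feed-forward
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically well under 1% of all weights
```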
10. Enhancing Fire Detection Performance Based on Fine-Tuned YOLOv10.
- Author
-
Huynh, Trong Thua, Nguyen, Hoang Thanh, and Phu, Du Thang
- Abstract
In recent years, early detection and warning of fires have posed a significant challenge to environmental protection and human safety. Deep learning models such as Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLO (You Only Look Once), and their variants have demonstrated superiority in quickly detecting objects from images and videos, creating new opportunities to enhance automatic and efficient fire detection. The YOLO model, especially newer versions like YOLOv10, stands out for its fast processing capability, making it suitable for low-latency applications. However, when applied to real-world datasets, the accuracy of fire prediction is still not high. This study improves the accuracy of YOLOv10 for real-time applications through model fine-tuning techniques and data augmentation. The core work involves creating a diverse fire image dataset suited to fire detection in buildings and factories, freezing the initial layers of the model to retain the general features learned from the dataset, applying the Squeeze-and-Excitation attention mechanism, and employing Stochastic Gradient Descent (SGD) with momentum to enhance accuracy while ensuring real-time fire detection. Experimental results demonstrate the effectiveness of the proposed fire prediction approach, with the YOLOv10 small model exhibiting the best balance among the YOLO family variants such as nano, medium, and balanced. Additionally, the study provides an experimental evaluation highlighting the effectiveness of model fine-tuning against the YOLOv10 baseline, YOLOv8, and Faster R-CNN on two criteria: accuracy and prediction time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
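A hedged sketch of the fine-tuning knobs named above (frozen initial layers, SGD with momentum) using the ultralytics training interface; the checkpoint name, freeze depth, dataset config, and hyperparameters are assumptions, and the paper's Squeeze-and-Excitation insertion would additionally require editing the model definition.

```python
from ultralytics import YOLO

model = YOLO("yolov10s.pt")        # small variant, the reported best balance
model.train(
    data="fire_dataset.yaml",      # hypothetical dataset config
    epochs=100,
    freeze=10,                     # keep early layers' general features fixed
    optimizer="SGD",
    momentum=0.937,
    lr0=0.01,
)
```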
11. LM4OPT: Unveiling the potential of Large Language Models in formulating mathematical optimization problems.
- Author
-
Ahmed, Tasnim and Choudhury, Salimur
- Abstract
In the fast-paced domain of natural language processing, converting linguistic descriptions into mathematical optimization problems is a complex task, requiring profound comprehension and processing skills from Large Language Models (LLMs). In this study, various LLMs were evaluated, including GPT-3.5, GPT-4, and smaller variants with seven billion parameters: Llama-2, Falcon, Mistral, and Zephyr. This research investigated their performance in both zero-shot and one-shot settings for this task, revealing that GPT-4 outperformed the others, particularly in the one-shot scenario. A core contribution of this study is the development of LM4OPT, a progressive fine-tuning framework specifically designed for smaller LLMs. This framework leverages noisy embeddings and specialized datasets to enhance model performance. Despite the inherent limitations of smaller models in processing complex and lengthy input contexts, our experimental results indicate a significant reduction in the performance disparity between smaller and larger models when the former are fine-tuned using LM4OPT. Our empirical study, utilizing the NL4Opt dataset, reveals that GPT-4 surpasses the baseline performance established by previous research, achieving an accuracy of 63.30%, based solely on the problem description in natural language and without relying on any additional named entity information. GPT-3.5 follows closely; both outperform the progressively fine-tuned smaller models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
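The noisy-embedding fine-tuning that LM4OPT leverages can be reproduced in the NEFTune style: scaled uniform noise added to the embedding output during training. A minimal sketch via a forward hook; the gpt2 stand-in and the alpha value are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a 7B model
alpha = 5.0                                           # noise scale (assumption)

def add_embedding_noise(module, inputs, output):
    if module.training:                               # no-op at inference
        seq_len, dim = output.shape[-2], output.shape[-1]
        scale = alpha / (seq_len * dim) ** 0.5
        output = output + torch.empty_like(output).uniform_(-scale, scale)
    return output

model.get_input_embeddings().register_forward_hook(add_embedding_noise)
# Fine-tune as usual; the hook perturbs embeddings only while model.train() is set.
```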
12. Enhancing Misinformation Detection in Spanish Language with Deep Learning: BERT and RoBERTa Transformer Models.
- Author
-
Blanco-Fernández, Yolanda, Otero-Vizoso, Javier, Gil-Solla, Alberto, and García-Duque, Jorge
- Subjects
LANGUAGE models, TRANSFORMER models, SPANISH language, FAKE news, POLITICIANS
- Abstract
This paper presents an approach to identifying political fake news in Spanish using Transformer architectures. Current methodologies often overlook political news due to the lack of quality datasets, especially in Spanish. To address this, we created a synthetic dataset of 57,231 Spanish political news articles, gathered via automated web scraping and enhanced with generative large language models. This dataset is used for fine-tuning and benchmarking Transformer models like BERT and RoBERTa for fake news detection. Our fine-tuned models showed outstanding performance on this dataset, with accuracy ranging from 97.4% to 98.6%. However, testing with a smaller, independent hand-curated dataset, including statements from political leaders during Spain's July 2023 electoral debates, revealed a performance drop to 71%. Although this suggests that the model needs additional refinements to handle the complexity and variability of real-world political discourse, achieving over 70% accuracy seems a promising result in the under-explored domain of Spanish political fake news detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
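The fine-tuning setup described above follows the standard sequence-classification recipe. A sketch with transformers; the BETO checkpoint, the CSV layout (text and label columns), and the hyperparameters are illustrative assumptions, with RoBERTa variants swapping in via the checkpoint name.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

name = "dccuchile/bert-base-spanish-wwm-cased"   # BETO, a Spanish BERT
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

ds = load_dataset("csv", data_files={"train": "news_train.csv",   # hypothetical
                                     "test": "news_test.csv"})    # files
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=256),
            batched=True)

Trainer(model=model,
        args=TrainingArguments("fakenews", num_train_epochs=3,
                               per_device_train_batch_size=16,
                               learning_rate=2e-5),
        train_dataset=ds["train"], eval_dataset=ds["test"]).train()
```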
13. Exploring the potential of DistilBERT architecture for automatic essay scoring task.
- Author
-
Ikiss, Soumia, Daoudi, Najima, Abourezq, Manar, and Bellafkih, Mostafa
- Subjects
ARTIFICIAL neural networks, LANGUAGE models, TRANSFORMER models, ALTERNATIVE education, ESSAYS
- Abstract
Automatic assessment of written essays, the process of using computers to evaluate and assign grades to written text, is much needed in education as a way to reduce human burden and time consumption, especially for large-scale tests. The task has received growing attention in recent years as one of the major applications of natural language processing (NLP). Traditional automatic scoring systems typically rely on handcrafted features, whereas recent studies have used deep neural networks. Since the advent of transformers, pre-trained language models have performed well in many downstream tasks. We utilize the Kaggle Automated Student Assessment Prize benchmark dataset to fine-tune the pre-trained DistilBERT in three different scenarios, and we compare results with existing neural network-based approaches to achieve improved performance on the automatic essay scoring task. We use quadratic weighted kappa (QWK) as the main metric to evaluate the performance of our proposed method. Results show that fine-tuning DistilBERT gives good results, especially in the scenario of training all parameters, which achieves a QWK of 0.90 and outperforms neural network models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
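Quadratic weighted kappa, the headline metric above, penalizes disagreements by the square of their distance on the score scale and is a one-liner with scikit-learn; the scores below are made up.

```python
from sklearn.metrics import cohen_kappa_score

y_true = [1, 2, 3, 4, 2, 3]   # human-assigned essay scores (illustrative)
y_pred = [1, 2, 4, 4, 2, 2]   # model-predicted scores
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```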
14. Transformer-based thin crack segmentation: an efficient multiscale approach for automatic visual inspection.
- Author
-
Al-maqtari, Omar, Peng, Bo, Al-Huda, Zaid, and Hu, Jie
- Abstract
Road safety relies heavily on accurate crack detection. However, existing models encounter challenges when confronted with complex crack patterns, challenging backgrounds, and computational inefficiency. These obstacles hinder their practicality, particularly given the diversity of cracks, e.g., thin cracks. To overcome these limitations, we propose a thin crack segmentation model based on an efficient multiscale transformer. The proposed model consists of four main components: a Multiscale Self-attention Module for comprehensive global and local self-attention, V approximation (V-apprx) to reduce self-attention parameters and computational cost, a Hierarchical Multiscale Self-attention Decoder for compressing and applying multiscale self-attention, and a Fine-tuning Model for refined feature extraction through the application of Fourier Feature Mapping. By incorporating V vector approximation within the multiscale transformer, the proposed model achieves both efficiency and robustness. Evaluation on the largest crack dataset, i.e., Crack11k, and its sub-datasets demonstrates that the proposed model outperforms other comparable models across multiple evaluation metrics while requiring only 1.33M parameters and 18.32 GFLOPs. Code can be found at https://zenodo.org/doi/10.5281/zenodo.11166041. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Wildlife Species Classification from Camera Trap Images Using Fine-tuning EfficientNetV2.
- Author
-
Thanh-Nghi Doan and Duc-Ngoc Le-Thi
- Subjects
CONVOLUTIONAL neural networks, ARTIFICIAL neural networks, DATA augmentation, WILDLIFE conservation, CAMERAS, ANIMAL traps
- Abstract
Camera traps are a valuable tool for wildlife research and conservation, but wildlife species classification in camera trap imagery is challenging due to the variation in species appearance, pose, and lighting conditions. This study explores the use of transfer learning and fine-tuning to develop a robust deep convolutional neural network model for wildlife species classification from camera trap images. To prevent overfitting, data augmentation techniques were applied during the image pre-processing stage. ResNet-50 and various EfficientNetV2 variants have been evaluated, and the EfficientNetV2-L model emerged as the top performer. Fine-tuning methods were then applied to the EfficientNetV2-L model to further improve its performance. Experimental results show that the fine-tuned EfficientNetV2-L model outperformed other methods with an accuracy of 88.822%, a precision of 86.941%, a recall of 87.638%, and an F1-score of 87.193% on a held-out test set, demonstrating its effectiveness for wildlife species classification from camera trap images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
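The transfer-learning pattern above (pretrained backbone, new classification head, selective fine-tuning) looks roughly like this in torchvision; the class count and the choice to unfreeze only the last feature blocks are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

num_classes = 20   # hypothetical number of wildlife species

model = models.efficientnet_v2_l(weights=models.EfficientNet_V2_L_Weights.DEFAULT)
for p in model.parameters():                 # freeze the pretrained backbone
    p.requires_grad = False

in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, num_classes)  # new trainable head

for p in model.features[-2:].parameters():   # optionally unfreeze top blocks
    p.requires_grad = True
```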
16. Advanced Deep Learning Fusion Model for Early Multi-Classification of Lung and Colon Cancer Using Histopathological Images.
- Author
-
Abd El-Aziz, A. A., Mahmood, Mahmood A., and Abd El-Ghany, Sameh
- Subjects
LUNG cancer, COLON cancer, DIGITAL image processing, EARLY detection of cancer, DEEP learning
- Abstract
Background: In recent years, the healthcare field has experienced significant advancements. New diagnostic techniques, treatments, and insights into the causes of various diseases have emerged. Despite these progressions, cancer remains a major concern. It is a widespread illness affecting individuals of all ages and leads to one out of every six deaths. Lung and colon cancer alone account for nearly two million fatalities. Though it is rare for lung and colon cancers to co-occur, the spread of cancer cells between these two areas—known as metastasis—is notably high. Early detection of cancer greatly increases survival rates. Currently, histopathological image (HI) diagnosis and appropriate treatment are key methods for reducing cancer mortality and enhancing survival rates. Digital image processing (DIP) and deep learning (DL) algorithms can be employed to analyze the HIs of five different types of lung and colon tissues. Methods: Therefore, this paper proposes a refined DL model that integrates feature fusion for the multi-classification of lung and colon cancers. The proposed model incorporates three DL architectures: ResNet-101V2, NASNetMobile, and EfficientNet-B0. Each model has limitations concerning variations in the shape and texture of input images. To address this, the proposed model utilizes a concatenate layer to merge the pre-trained individual feature vectors from ResNet-101V2, NASNetMobile, and EfficientNet-B0 into a single feature vector, which is then fine-tuned. As a result, the proposed DL model achieves high success in multi-classification by leveraging the strengths of all three models to enhance overall accuracy. This model aims to assist pathologists in the early detection of lung and colon cancer with reduced effort, time, and cost. The proposed DL model was evaluated using the LC25000 dataset, which contains colon and lung HIs. The dataset was pre-processed using resizing and normalization techniques. Results: The model was tested and compared with recent DL models, achieving impressive results: 99.8% for precision, 99.8% for recall, 99.8% for F1-score, 99.96% for specificity, and 99.94% for accuracy. Conclusions: Thus, the proposed DL model demonstrates exceptional performance across all classification categories. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
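The concatenation-based fusion described above can be sketched in Keras: pool each backbone's features, merge them into one vector, and train a shared head. The shared 224x224 input and head size are assumptions; each backbone normally expects its own preprocessing, omitted here for brevity.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import (EfficientNetB0, NASNetMobile,
                                           ResNet101V2)

inp = keras.Input(shape=(224, 224, 3))
branches = []
for Backbone in (ResNet101V2, NASNetMobile, EfficientNetB0):
    net = Backbone(include_top=False, weights="imagenet", input_tensor=inp)
    branches.append(layers.GlobalAveragePooling2D()(net.output))

fused = layers.Concatenate()(branches)               # single fused feature vector
out = layers.Dense(5, activation="softmax")(fused)   # 5 lung/colon tissue classes
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```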
17. Developing an Automatic Asbestos Detection Method Based on a Convolutional Neural Network and Support Vector Machine.
- Author
-
Matsuo, Tomohito, Takimoto, Mitsuteru, Tanaka, Suzuyo, Futamura, Ayami, Shimadera, Hikari, and Kondo, Akira
- Subjects
CONVOLUTIONAL neural networks, PHASE-contrast microscopy, SUPPORT vector machines, MACHINE learning, MANUAL labor
- Abstract
When buildings containing asbestos are demolished, fine asbestos fibers are released, which can result in serious adverse health effects. Therefore, leakage is monitored to prevent the dispersion of asbestos fibers. Airborne asbestos fibers are monitored via microscopic observation, which requires significant manual labor. In this study, we developed a machine-learning model to automatically detect asbestos fibers in phase-contrast microscopy images. The model was based on a pre-trained convolutional neural network as its foundation, with fully connected layers and a support vector machine (SVM) serving as classifiers. The effects of fine-tuning, class weighting, and hyperparameters were assessed to improve model performance. Consequently, the SVM was chosen as a classifier to improve overall model performance. In addition, fine-tuning improved the performance of the models. The optimized detection model exhibited high classification performance with an F1 score of 0.83. The findings of this study provide valuable insights into effectively detecting asbestos fibers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
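The CNN-plus-SVM pipeline above separates cleanly into a frozen feature extractor and a scikit-learn classifier. A sketch assuming a ResNet-50 backbone (the record does not name the exact CNN) with balanced class weighting for the rare fiber class.

```python
import torch
from torchvision import models
from sklearn.svm import SVC

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # expose the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def extract(images):
    """images: (N, 3, 224, 224) float tensor of preprocessed micrographs."""
    return backbone(images).numpy()

clf = SVC(kernel="rbf", class_weight="balanced")   # weighting for class imbalance
# clf.fit(extract(train_images), train_labels)
# preds = clf.predict(extract(test_images))
```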
18. A "Region-Specific Model Adaptation (RSMA)"-Based Training Data Method for Large-Scale Land Cover Mapping.
- Author
-
Li, Congcong, Xian, George, and Jin, Suming
- Subjects
MACHINE learning, LAND cover, HABITAT conservation, BIOGEOCHEMICAL cycles, DATABASES, DEEP learning
- Abstract
An accurate and historical land cover monitoring dataset for Alaska could provide fundamental information for a range of studies, such as habitat conservation, biogeochemical cycles, and climate systems, in this distinctive region. This research addresses challenges associated with the extraction of training data for timely and accurate land cover classifications in Alaska over longer time periods (e.g., greater than 10 years). Specifically, we designed the "Region-Specific Model Adaptation (RSMA)" method for training data. The method integrates land cover information from the National Land Cover Database (NLCD), LANDFIRE's Existing Vegetation Type (EVT), and the National Wetlands Inventory (NWI) and machine learning techniques to generate robust training samples based on the Anderson Level II classification legend. The assumption of the method is that spectral signatures vary across regions because of diverse land surface compositions; however, despite these variations, there are consistent, collective land cover characteristics that span the entire region. Building upon this assumption, this research utilized the classification power of deep learning algorithms and the generalization ability of RSMA to construct a model for the RSMA method. Additionally, we interpreted existing vegetation plot information for land cover labels as validation data to reduce inconsistency in the human interpretation. Our validation results indicate that the RSMA method improved the quality of the training data derived solely from the NLCD by approximately 30% for the overall accuracy. The validation assessment also demonstrates that the RSMA method can generate reliable training data on large scales in regions that lack sufficient reliable data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Remaining useful life prediction of slewing bearings using attention mechanism enabled multivariable gated recurrent unit network.
- Author
-
Shao, Yiyu, Qian, Qinrong, and Wang, Hua
- Subjects
REMAINING useful life, ACCELERATED life testing, PREDICTION models, GENERALIZATION, TORQUE, DEEP learning
- Abstract
It is difficult to obtain damage information on large slewing bearings from vibration signals alone. In addition, deep learning models trained on old samples do not achieve high accuracy on new tasks. Therefore, this paper uses vibration, temperature, and torque signals of slewing bearings to build a model. Meanwhile, we add an attention mechanism to capture their internal correlations and consider the factors related to remaining useful life (RUL) from multiple angles. An attention-based multivariable gated recurrent unit (attention-MGRU) model is adopted to improve prediction performance. On this foundation, a fine-tuning strategy is introduced to improve the generalization ability of the model. A full-life accelerated test was carried out on a slewing bearing test bench. The model proposed in this paper is compared with a GRU prediction model that uses vibration signals only and with a multivariable GRU prediction model. Mean absolute error (MAE) and root-mean-square error (RMSE) are used as measurement indicators. Across the compared methods, the indicators generated by attention-MGRU show significant superiority. Moreover, the fine-tuned model performs better on new tasks than the original model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
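A minimal attention-over-time multivariable GRU in the spirit of attention-MGRU: the three signals enter one recurrent encoder, attention weights pool the hidden states, and a linear head regresses RUL. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionMGRU(nn.Module):
    def __init__(self, n_signals=3, hidden=64):  # vibration, temperature, torque
        super().__init__()
        self.gru = nn.GRU(n_signals, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)         # attention energy per time step
        self.head = nn.Linear(hidden, 1)          # RUL estimate

    def forward(self, x):                         # x: (batch, time, n_signals)
        h, _ = self.gru(x)                        # (batch, time, hidden)
        w = torch.softmax(self.score(h), dim=1)   # weights over time steps
        context = (w * h).sum(dim=1)              # attention-pooled summary
        return self.head(context).squeeze(-1)

model = AttentionMGRU()
rul = model(torch.randn(8, 200, 3))               # 8 windows of 200 samples each
```

The fine-tuning strategy the record mentions would then reuse these weights and continue training on a new bearing's data.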
20. Construction and preliminary application of large language model for reservoir performance analysis.
- Author
-
PAN Huanquan, LIU Jianqiao, GONG Bin, ZHU Yiheng, BAI Junhui, HUANG Hu, FANG Zhengbao, JING Hongbin, LIU Chen, KUANG Tie, LAN Yubo, WANG Tianzhi, XIE Tian, CHENG Mingzhe, QIN Bin, and SHEN Yujiang
- Subjects
LANGUAGE models, GAS reservoirs, PETROLEUM prospecting, ARTIFICIAL intelligence, DATA analysis
- Abstract
A large language model (LLM) is constructed to address the sophisticated demands of data retrieval and analysis, detailed well profiling, computation of key technical indicators, and the solutions to complex problems in reservoir performance analysis (RPA). The LLM is constructed for RPA scenarios with incremental pre-training, fine-tuning, and functional subsystems coupling. Functional subsystem and efficient coupling methods are proposed based on named entity recognition (NER), tool invocation, and Text-to-SQL construction, all aimed at resolving pivotal challenges in developing the specific application of LLMs for RPA. This study conducted a detailed accuracy test on feature extraction models, tool classification models, data retrieval models, and analysis recommendation models. The results indicate that these models have demonstrated good performance in various key aspects of reservoir performance analysis. The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing. Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RPA. The research results provide powerful support for the application of LLMs in reservoir performance analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. EFTNet: an efficient fine-tuning method for few-shot segmentation.
- Author
-
Li, Jiaguang, Wang, Yubo, Gao, Zihan, and Wei, Ying
- Subjects
MACHINE learning, PROTOTYPES, FORECASTING, DESIGN
- Abstract
Few-shot segmentation (FSS) aims to segment novel classes given a small number of labeled samples. Most of the existing studies do not fine-tune the model during meta-testing, thus biasing the model towards the base classes and preventing the prediction of novel classes. Other studies only use support images for fine-tuning, which biases the model towards the support images rather than the target query images, especially when there is a large difference between the support and the query images. To alleviate these issues, we propose an efficient fine-tuning network (EFTNet) that uses unlabeled query images and predicted pseudo labels to fine-tune the trained model parameters during meta-testing, which can bias the model towards the target query images. In addition, we design a query-to-support module, a support-to-query module, and a discrimination module to evaluate which fine-tuning round achieves optimal results. Moreover, the query-to-support module also takes the query images and their pseudo masks as part of the support images and support masks, which causes the prototypes to contain query information and tend to obtain better predictions. As a new meta-testing scheme, our EFTNet can be easily combined with existing studies and greatly improve their model performance without repeating the meta-training phase. Extensive experiments on PASCAL-5i and COCO-20i prove the effectiveness of our EFTNet. The EFTNet also achieves new state-of-the-art performance. Codes are available at https://github.com/Jiaguang-NEU/EFTNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Military Equipment Entity Extraction Based on Large Language Model.
- Author
-
Liu, Xuhong, Yu, Zhipeng, Liu, Xiulei, Miao, Lin, and Yang, Tao
- Subjects
LANGUAGE models, MILITARY supplies, DATA augmentation, MILITARY technology, DEEP learning, KNOWLEDGE base
- Abstract
The technology of military equipment entity extraction, a crucial component in constructing military knowledge bases, holds significant research value and theoretical importance for guiding the development and improvement of equipment support forces. In the military domain, equipment entities exhibit a phenomenon of nesting, where one entity is contained within another, and abbreviations or codes are frequently used to represent these entities. To address this complexity, this paper proposes a method named CoTNER for extracting entities. Initially, a large-scale language model is used to perform chain-of-thought data augmentation on the original dataset, providing additional semantic and contextual information. Subsequently, a small-scale language model is fine-tuned on the augmented dataset to adapt it to the task of military equipment entity extraction and to enhance its ability to learn complex rules specific to the domain. Additionally, a high-quality data filtering strategy based on instruction-following difficulty scoring is proposed to address the catastrophic forgetting issue that may occur during the fine-tuning of large language models. The experimental results demonstrate that the proposed military equipment entity extraction method outperforms mainstream traditional deep learning methods, validating the effectiveness of CoTNER. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Improving Automated Detection of Cataract Disease through Transfer Learning using ResNet50.
- Author
-
Mahmood, Salwa Shakir, Chaabouni, Sihem, and Fakhfakh, Ahmed
- Subjects
MACULAR degeneration, EYE diseases, CATARACT, DIABETIC retinopathy, DIAGNOSIS, FUNDUS oculi
- Abstract
Manual diagnosis of eye diseases through ocular fundus scans is a challenging and complicated task because it is time-consuming and prone to errors. Deep learning techniques are used to detect various ocular diseases from fundus images. Such techniques can accurately classify ocular scans, enabling automated and precise detection of ocular diseases. This study uses the ResNet50 transfer learning model, data augmentation, fine-tuning, binary classification, and rigorous evaluation to achieve state-of-the-art results in the detection of cataract eye disease. This study was primarily implemented on a heavily skewed ODIR-5K dataset comprising 5000 fundus images. These ocular images are distributed unevenly among eight disease classes, including cataract, glaucoma, diabetic retinopathy, age-related macular degeneration, and others. In response to this imbalance and disparity, the proposed approach involved converting the multiclass problem into binary classification tasks, maintaining an equitable distribution of samples within each class. A balanced dataset was used to train a binary classifier using the ResNet50 CNN model. The system achieved an overall test accuracy of 96.63%, outperforming previous methods in differentiating between normal and cataract cases. In general, achieving dataset balance and employing the ResNet50 model enhances the accuracy of automated diagnosis of ocular diseases based on fundus images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Systematic exploration and in-depth analysis of ChatGPT architectures progression.
- Author
-
Banik, Debajyoty, Pati, Natasha, and Sharma, Atul
- Subjects
CHATGPT, LANGUAGE models, ARTIFICIAL intelligence, COMMON sense
- Abstract
The fast evolution of artificial intelligence frameworks has resulted in the creation of increasingly sophisticated large language models (LLMs), ChatGPT being the most famous one. This paper dives into this LLM with a case study of ChatGPT's architecture and provides a thorough comparative analysis of its numerous versions, tracking its history from its conception to its most recent incarnations. This research intends to give a full understanding of the model's history by investigating the underlying mechanisms and enhancements introduced in each edition. The comparative analysis covers key aspects such as model size, training data, fine-tuning techniques, and performance metrics. Furthermore, this study evaluates the limits of ChatGPT in its many incarnations. These limitations include common-sense reasoning difficulties, biased replies, verbosity, sensitivity to input wording, and others. Each constraint is investigated for potential remedies and workarounds. The article also provides a complete analysis of the ChatGPT architecture and its progress through multiple iterations, and, by exploring both the model's strengths and limitations, gives vital insights for academics, developers, and users who want to harness the promise of ChatGPT while managing its restrictions. The distinctiveness of this paper rests in its comprehensive assessment of ChatGPT's architectural development and its practical strategy for resolving the myriad difficulties in producing cohesive and contextually relevant replies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines.
- Author
-
Marie, Alban, Desnos, Karol, Mercat, Alexandre, Morin, Luce, Vanne, Jarno, and Zhang, Lu
- Subjects
ARTIFICIAL neural networks, IMAGE recognition (Computer vision), MACHINE learning, VIDEO coding, DEEP learning, MPEG (Video coding standard)
- Abstract
Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industry manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing the following three advanced fine-tuning procedures for their training: (1) the progressive increase of the distortion strength as the training proceeds; (2) the incorporation of a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the proposed two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two different machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. Our joint training procedure is shown to reduce the training time in both cases and still obtain a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for the segmentation. All these coding gains are obtained without any additional inference or encoding time. As these advanced fine-tuning procedures are standard-compliant, they offer the potential to have a significant impact on visual coding for machine applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
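Proposal (2) above, a regularization term tying predictions on compressed content to those on the originals, might look like the following for classification; the KL form and the lambda weight are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def vcm_finetune_loss(model, x_orig, x_compressed, target, lam=1.0):
    logits_c = model(x_compressed)
    with torch.no_grad():                 # original-content predictions as reference
        logits_o = model(x_orig)
    task = F.cross_entropy(logits_c, target)          # original task loss
    consistency = F.kl_div(F.log_softmax(logits_c, dim=-1),
                           F.softmax(logits_o, dim=-1),
                           reduction="batchmean")     # distance between predictions
    return task + lam * consistency
```

Proposal (1), the progressive distortion increase, would then amount to scheduling the compression strength used to produce x_compressed across training epochs.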
26. BF-SAM: enhancing SAM through multi-modal fusion for fine-grained building function identification.
- Author
-
Gong, Zhaoya, Li, Binbo, Wang, Chenglong, Chen, Jun, and Zhao, Pengjun
- Subjects
FEATURE extraction, URBAN planning, REMOTE sensing, POPULATION density, BIG data
- Abstract
Building function identification (BFI) is crucial for urban planning and governance. The traditional remote sensing approach primarily focuses on extracting the physical features of buildings, overlooking their functional uses. Recently, progress has been made in urban functional area identification through multi-modal representation learning from multi-source spatial big data. However, the two approaches are disconnected, and each approach is inadequate to tackle the fine-grained BFI problem solely. To address this challenge, this study proposes a multi-modal foundation model for BFI, called BF-SAM, by fine-tuning a large visual model, Segment Anything Model (SAM), with multi-modal features related to urban functions. This model harnesses the segmentation capability of SAM for building delineation and fuses it with multi-modal representation learning for functional identification through a novel multi-modal fine-tuning method for SAM. Modality-dedicated feature extraction methods are devised to learn geographic features from road networks, population density, and points of interest. The validity of BF-SAM was evaluated on datasets from Munich, Beijing, Suzhou, and Hefei, and the importance of multi-modal geographic features was examined through extensive experiments. BF-SAM achieved a superior performance compared to a series of benchmarks. The potential of model transferability of BF-SAM was further explored under different spatial contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Revolutionizing breast ultrasound diagnostics with EfficientNet-B7 and Explainable AI.
- Author
-
Latha, M., Kumar, P. Santhosh, Chandrika, R. Roopa, Mahesh, T. R., Kumar, V. Vinoth, and Guluwadi, Suresh
- Subjects
IMAGE recognition (Computer vision), BREAST ultrasound, ULTRASONIC imaging, DATA augmentation, BREAST imaging
- Abstract
Breast cancer is a leading cause of mortality among women globally, necessitating precise classification of breast ultrasound images for early diagnosis and treatment. Traditional methods using CNN architectures such as VGG, ResNet, and DenseNet, though somewhat effective, often struggle with class imbalances and subtle texture variations, leading to reduced accuracy for minority classes such as malignant tumors. To address these issues, we propose a methodology that leverages EfficientNet-B7, a scalable CNN architecture, combined with advanced data augmentation techniques to enhance minority class representation and improve model robustness. Our approach involves fine-tuning EfficientNet-B7 on the BUSI dataset, implementing RandomHorizontalFlip, RandomRotation, and ColorJitter to balance the dataset and improve model robustness. The training process includes early stopping to prevent overfitting and optimize performance metrics. Additionally, we integrate Explainable AI (XAI) techniques, such as Grad-CAM, to enhance the interpretability and transparency of the model's predictions, providing visual and quantitative insights into the features and regions of ultrasound images influencing classification outcomes. Our model achieves a classification accuracy of 99.14%, significantly outperforming existing CNN-based approaches in breast ultrasound image classification. The incorporation of XAI techniques enhances our understanding of the model's decision-making process, thereby increasing its reliability and facilitating clinical adoption. This comprehensive framework offers a robust and interpretable tool for the early detection and diagnosis of breast cancer, advancing the capabilities of automated diagnostic systems and supporting clinical decision-making processes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
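The augmentations named above map directly onto torchvision transforms, and the early stopping reduces to a small bookkeeping class; jitter strengths, rotation range, and patience are assumptions.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=5):
        self.best, self.patience, self.bad = float("inf"), patience, 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
        return self.bad >= self.patience   # True means stop training
```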
28. Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes.
- Author
-
Giuffrè, Mauro, Kresevic, Simone, Pugliese, Nicola, You, Kisung, and Shung, Dennis L.
- Subjects
LANGUAGE models, DIGESTIVE system diseases, REINFORCEMENT learning, CHATGPT, LEARNING
- Abstract
Large Language Models (LLMs) are transformer‐based neural networks with billions of parameters trained on very large text corpora from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context‐based responses. The interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy, which ranges from 25% to 90% and is influenced by the lack of standardized rules to report methodologies and results for LLM‐oriented research. In addition, a critical issue is the absence of a universally accepted definition of accuracy, varying from binary to scalar interpretations, often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine‐Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with in‐context window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase patient quality of care across the field of digestive diseases, where physicians are often engaged in screening, treatment and surveillance for a broad range of pathologies for which in‐context learning or SFT with RLHF could improve clinical decision‐making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Entity Extraction of Key Elements in 110 Police Reports Based on Large Language Models.
- Author
-
Xing, Xintao and Chen, Peng
- Subjects
LANGUAGE models, POLICE reports, CHATGPT, DATA mining, DATA augmentation
- Abstract
With the rapid advancement of Internet technology and the increasing volume of police reports, relying solely on extensive human labor and traditional natural language processing methods for key element extraction has become impractical. Applying advanced technologies such as large language models to improve the effectiveness of police report extraction has become an inevitable trend in the field of police data analysis. This study addresses the characteristics of Chinese police reports and the need to extract key elements by employing large language models specific to the public security domain for entity extraction. Several lightweight (6B/7B-parameter) open-source large language models were tested as base models. To enhance model performance, LoRA fine-tuning was employed, combined with data engineering approaches. A zero-shot data augmentation method based on ChatGPT and prompt engineering techniques tailored for police reports were proposed to further improve model performance. The key police report data from a certain city in 2019 were used as a sample for testing. Compared to the base models, prompt engineering improved the F1 score by approximately 3%, while fine-tuning led to an increase of 10–50% in the F1 score. After fine-tuning and comparing different base models, the Baichuan model demonstrated the best overall performance in extracting key elements from police reports. Using the data augmentation method to double the data size resulted in an additional 4% increase in the F1 score, achieving optimal model performance. Compared to the fine-tuned universal information extraction (UIE) large language model, the police report entity extraction model constructed in this study improved the F1 score for each element by approximately 5%, with a 42% improvement in the F1 score for the "organization" element. Finally, ChatGPT was employed to align the extracted entities, resulting in a high-quality entity extraction outcome. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Fine-tuning large language models for rare disease concept normalization.
- Author
-
Wang, Andy, Liu, Cong, Yang, Jingye, and Weng, Chunhua
- Abstract
Objective: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of each concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results: When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN is 10.2% and 36.1%, respectively, but increases to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives while successfully normalizing them to standard concepts in a controlled vocabulary. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
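The in-house corpus generator is not published; a plausible template-based sketch of the NAME+SYN construction (concept name plus half of its synonyms, each paired with its identifier) follows. The HPO entries and sentence templates here are illustrative assumptions.

```python
import json
import random

hpo = {  # tiny illustrative slice of the ontology
    "HP:0001250": {"name": "Seizure", "synonyms": ["Epileptic seizure", "Fits"]},
    "HP:0000256": {"name": "Macrocephaly", "synonyms": ["Big head", "Large head"]},
}

templates = ["The HPO term for '{t}' is {i}.",
             "'{t}' normalizes to the concept {i}."]

with open("name_syn_corpus.jsonl", "w") as f:
    for hpo_id, entry in hpo.items():
        # NAME+SYN: the standardized name plus half of the synonyms.
        half = entry["synonyms"][: len(entry["synonyms"]) // 2]
        for term in [entry["name"]] + half:
            line = random.choice(templates).format(t=term, i=hpo_id)
            f.write(json.dumps({"text": line}) + "\n")
```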
31. Comparing Fine-Tuning, Zero and Few-Shot Strategies with Large Language Models in Hate Speech Detection in English.
- Author
-
Pan, Ronghao, García-Díaz, José Antonio, and Valencia-García, Rafael
- Subjects
LANGUAGE models, NATURAL language processing, CONTEXTUAL learning, HATE speech, WOMEN immigrants
- Abstract
Large Language Models (LLMs) are increasingly demonstrating their ability to understand natural language and solve complex tasks, especially through text generation. One relevant capability is in-context learning: receiving instructions in natural language or task demonstrations and generating the expected outputs for test instances without any additional training or gradient updates. In recent years, the popularity of social networking has provided a medium through which some users engage in offensive and harmful online behavior. In this study, we investigate the ability of different LLMs under strategies ranging from zero-shot and few-shot learning to fine-tuning. Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval. Furthermore, the model called Zephyr achieves the best results with the fine-tuning approach, scoring 86.811% on the Explainable Detection of Online Sexism (EDOS) test set and 57.453% on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) test set. Finally, the evaluated models are confirmed to perform well in hate text detection, as they beat the best result on the HatEval task leaderboard. The error analysis shows that in-context learning had difficulty distinguishing between types of hate speech and figurative language. However, the fine-tuned approach tends to produce many false positives. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
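The zero-shot and few-shot conditions compared above differ only in whether task demonstrations are prepended to the input. A minimal prompt-construction sketch; the wording and exemplars are made up.

```python
def zero_shot(text):
    return (f"Classify the following text as 'sexist' or 'not sexist'.\n"
            f"Text: {text}\nLabel:")

def few_shot(text, examples):
    demos = "\n".join(f"Text: {t}\nLabel: {y}" for t, y in examples)
    return (f"Classify each text as 'sexist' or 'not sexist'.\n"
            f"{demos}\nText: {text}\nLabel:")

prompt = few_shot("some new post",
                  [("Women can't drive.", "sexist"),
                   ("The meeting moved to 10 am.", "not sexist")])
```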
32. Contrastive learning fine-tuning and applications of masked-language-enhanced representations.
- Author
-
张德驰 and 万卫兵
- Published
- 2024
- Full Text
- View/download PDF
33. Flat band fine-tuning and its photonic applications.
- Author
-
Danieli, Carlo, Andreanov, Alexei, Leykam, Daniel, and Flach, Sergej
- Subjects
METAL-insulator transitions, PHOTONIC crystals, ENERGY bands, SYMMETRY
- Abstract
Flat bands – dispersionless single-particle energy bands – in tight-binding lattices, aka networks, have attracted attention due to the presence of macroscopic degeneracies and their sensitivity to perturbations. They support compact localized eigenstates protected by destructive interference. This makes them natural candidates for emerging exotic phases and unconventional orders. In this review we consider the recently proposed systematic ways to construct flat band networks based on symmetries or fine-tuning. We then discuss how the construction methods can be further extended, adapted or exploited in the presence of perturbations, both single-particle and many-body. This strategy has led to the discovery of non-perturbative metal-insulator transitions, fractal phases, nonlinear and quantum caging, and many-body nonergodic quantum models. We discuss what implications these results may have for the design of fine-tuned nanophotonic systems including photonic crystals, nanocavities, and metasurfaces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Segment anything model for few-shot medical image segmentation with domain tuning
- Author
-
Weili Shi, Penglong Zhang, Yuqin Li, and Zhengang Jiang
- Subjects
Medical image segmentation, Convolutional neural network, Segment anything model, Fine-tuning, Few shot segmentation
- Abstract
Medical image segmentation constitutes a crucial step in the analysis of medical images, possessing extensive applications and research significance within medical research and practice. Convolutional neural networks have achieved great success in medical image segmentation. However, acquiring large labeled datasets remains unattainable due to the substantial expertise and time required for image labeling, as well as heightened patient privacy concerns. To address the scarcity of labeled medical image data, we propose a powerful network, Domain Tuning SAM for Medical images (DT-SAM). We construct an encoder utilizing a parameter-efficient fine-tuning strategy and SAM. This strategy selectively updates a small fraction of the weight increments while preserving the majority of the pre-trained weights in the SAM encoder, consequently reducing the required number of training samples. Meanwhile, our approach retains only the SAM encoder structure while incorporating a decoder similar to the U-Net decoder and redesigning the skip connections to concatenate encoder-extracted features, which effectively decodes the features extracted by the encoder and preserves edge information. We have conducted comprehensive experiments on three publicly available medical image segmentation datasets. The combined experimental results show that our method can effectively perform few-shot medical image segmentation. With just one labeled sample, it achieves a Dice score of 63.51%, an HD of 17.94, and an IoU of 73.55% on the Heart task; a Dice score of 46.01%, an HD of 10.25, and an IoU of 65.92% on the Prostate task; and Dice, HD, and IoU scores of 88.67%, 10.63, and 90.19% on BUSI. Remarkably, with few training samples, our method consistently outperforms various SAM- and CNN-based methods.
- Published
- 2024
- Full Text
- View/download PDF
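The strategy above (update a small fraction of weight increments, freeze the rest of the SAM encoder) reduces to selective unfreezing once the increment parameters are in place. A generic name-matching sketch; the substrings assume adapter- or LoRA-style modules have already been wrapped into the encoder.

```python
import torch.nn as nn

def freeze_except(model: nn.Module, trainable=("adapter", "lora")):
    """Freeze all parameters except those whose names match the substrings."""
    n_train = 0
    for name, p in model.named_parameters():
        p.requires_grad = any(s in name for s in trainable)
        if p.requires_grad:
            n_train += p.numel()
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {n_train}/{total} ({100 * n_train / total:.2f}%)")

# usage: freeze_except(sam.image_encoder) after inserting the increment modules
```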
35. Enhancing domain-specific text generation for power grid maintenance with P2FT
- Author
-
Yi Yang, Chenhao Li, Binghang Zhu, Wenjie Zheng, Fengda Zhang, and Zhuangzhuang Li
- Subjects
Natural language processing ,Language model ,Power grid domain ,Text generation ,Fine-tuning ,Medicine ,Science - Abstract
Abstract The digitization of operation and maintenance in the intelligent power grid equipment relies on a diverse array of information for smart decision-making. In the domain of intelligent decision generation, proficiency is contingent upon extensive learning from copious amounts of text. This necessitates not only robust processing capabilities but also a high level of specialization. In addressing situations where authorization is lacking, pre-trained language models (PLMs) have already provided ideas when confronted with specialized domains or tasks. In consideration of the complexity of textual content in the field of the power grid, which encompasses a multitude of specialized knowledge and involves an abundance of proprietary terminology, we have undertaken an exploration of pre-trained model specialization using the power grid domain as an example, specifically for the task of generating maintenance strategies. A two-stage fine-tuning approach (P2FT) is employed, utilizing a large-scale pre-training model specifically designed for natural language processing. The efficacy and practical value of this method were evaluated through multiple metrics, juxtaposed with other advanced approaches involving low-parameter or parameter-free fine-tuning methods. Through a meticulous analysis and validation of experimental outcomes, we have corroborated the feasibility and practical application value of employing this approach for pre-trained model specialization. Additionally, it has furnished valuable guidance for text generation within both the Chinese language domain and the power grid domain.
- Published
- 2024
- Full Text
- View/download PDF
36. Construction and preliminary application of large language model for reservoir performance analysis
- Author
-
Huanquan PAN, Jianqiao LIU, Bin GONG, Yiheng ZHU, Junhui BAI, Hu HUANG, Zhengbao FANG, Hongbin JING, Chen LIU, Tie KUANG, Yubo LAN, Tianzhi WANG, Tian XIE, Mingzhe CHENG, Bin QIN, and Yujiang SHEN
- Subjects
reservoir performance analysis ,artificial intelligence large model ,application-specific large language model ,incremental pre-training ,fine-tuning ,subsystems coupling ,Petroleum refining. Petroleum products ,TP690-692.5 - Abstract
A large language model (LLM) is constructed to address the sophisticated demands of data retrieval and analysis, detailed well profiling, computation of key technical indicators, and the solutions to complex problems in reservoir performance analysis (RPA). The LLM is constructed for RPA scenarios with incremental pre-training, fine-tuning, and functional subsystems coupling. Functional subsystem and efficient coupling methods are proposed based on named entity recognition (NER), tool invocation, and Text-to-SQL construction, all aimed at resolving pivotal challenges in developing the specific application of LLMs for RDA. This study conducted a detailed accuracy test on feature extraction models, tool classification models, data retrieval models and analysis recommendation models. The results indicate that these models have demonstrated good performance in various key aspects of reservoir dynamic analysis. The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing. Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RDA. The research results provide a powerful support to the application of LLM in reservoir performance analysis.
- Published
- 2024
- Full Text
- View/download PDF
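The coupling of NER, tool invocation, and Text-to-SQL described above amounts to routing each request to the right subsystem. The toy sketch below illustrates that routing pattern only; the regex "NER", the water-cut tool, and the table schema are invented for illustration and are not the paper's components.

```python
# Toy routing between a computation tool and a Text-to-SQL path; the rules,
# 'NER' regex, tool, and schema are invented for illustration.
import re

def water_cut(oil_rate, water_rate):
    """Example engineering 'tool': water cut = water / (water + oil)."""
    return water_rate / (water_rate + oil_rate)

def to_sql(question):
    """Toy Text-to-SQL: map a recognized well name into a templated query."""
    m = re.search(r"well\s+([\w-]+)", question, re.IGNORECASE)  # crude NER stand-in
    well = m.group(1) if m else "UNKNOWN"
    return f"SELECT date, oil_rate, water_rate FROM production WHERE well = '{well}'"

def route(question):
    if "water cut" in question.lower():
        return ("tool", water_cut(oil_rate=80.0, water_rate=20.0))
    return ("sql", to_sql(question))

print(route("What is the water cut for this test?"))    # ('tool', 0.2)
print(route("Show production history of well PK3-12"))  # ('sql', "SELECT ...")
```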
37. Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines
- Author
- Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, and Lu Zhang
- Subjects
Video Coding for Machines (VCM), Coding artifacts, Image and video coding, Deep neural network (DNN), Deep learning, Fine-tuning, Electronics, TK7800-8360 - Abstract
Abstract Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industrial manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing three advanced fine-tuning procedures for their training: (1) progressively increasing the distortion strength as training proceeds; (2) incorporating a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. The joint training procedure reduces the training time in both cases and still obtains a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for segmentation. All these coding gains are obtained without any additional inference or encoding time. As these fine-tuning procedures are standard-compliant, they have the potential for significant impact on visual coding for machine applications. (A sketch of the two training ideas follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
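The two training ideas in the abstract, progressive distortion strength and a compressed-vs-original consistency regularizer, can be sketched as a single loss function. The JPEG round-trip used as a coding-artifact proxy and the linear quality schedule are assumptions for illustration, not the MPEG/JVET codecs or the paper's exact schedule.

```python
# Sketch of (1) a distortion-strength ramp and (2) a consistency regularizer
# between predictions on compressed and original inputs. The JPEG proxy and
# linear schedule are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.io import decode_jpeg, encode_jpeg

def jpeg_round_trip(img_uint8, quality):
    """Compress/decompress one CHW uint8 image to inject coding artifacts."""
    return decode_jpeg(encode_jpeg(img_uint8, quality=quality))

def joint_loss(model, x, y, epoch, total_epochs, lam=1.0):
    # (1) progressive distortion: JPEG quality falls as training proceeds
    quality = int(95 - 70 * epoch / max(1, total_epochs - 1))  # 95 -> 25
    x_uint8 = (x.clamp(0, 1) * 255).to(torch.uint8)
    x_comp = torch.stack([jpeg_round_trip(img, quality)
                          for img in x_uint8]).float() / 255
    logits_comp = model(x_comp)
    # (2) pull compressed-input predictions toward original-input predictions
    consistency = F.mse_loss(logits_comp, model(x).detach())
    return F.cross_entropy(logits_comp, y) + lam * consistency
```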
38. Revolutionizing breast ultrasound diagnostics with EfficientNet-B7 and Explainable AI
- Author
- M. Latha, P. Santhosh Kumar, R. Roopa Chandrika, T. R. Mahesh, V. Vinoth Kumar, and Suresh Guluwadi
- Subjects
Breast ultrasound imaging, EfficientNet, Fine-tuning, Data augmentation, Image classification, Deep learning, Medical technology, R855-855.5 - Abstract
Abstract Breast cancer is a leading cause of mortality among women globally, necessitating precise classification of breast ultrasound images for early diagnosis and treatment. Traditional methods using CNN architectures such as VGG, ResNet, and DenseNet, though somewhat effective, often struggle with class imbalance and subtle texture variations, leading to reduced accuracy for minority classes such as malignant tumors. To address these issues, we propose a methodology that leverages EfficientNet-B7, a scalable CNN architecture, combined with advanced data augmentation to enhance minority-class representation. Our approach fine-tunes EfficientNet-B7 on the BUSI dataset, applying RandomHorizontalFlip, RandomRotation, and ColorJitter to balance the dataset and improve model robustness; training uses early stopping to prevent overfitting and optimize performance metrics. Additionally, we integrate Explainable AI (XAI) techniques such as Grad-CAM to enhance the interpretability and transparency of the model's predictions, providing visual and quantitative insight into the features and regions of ultrasound images that influence classification outcomes. The model achieves a classification accuracy of 99.14%, significantly outperforming existing CNN-based approaches to breast ultrasound image classification. The incorporation of XAI techniques deepens understanding of the model's decision-making process, increasing its reliability and facilitating clinical adoption. This framework offers a robust and interpretable tool for the early detection and diagnosis of breast cancer, advancing automated diagnostic systems and supporting clinical decision-making. (An illustrative fine-tuning setup follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
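Since the abstract names the exact augmentations, a plausible PyTorch fine-tuning setup is easy to sketch. The dataset directory layout, image size, and jitter strengths below are assumptions; only the B7 backbone and the three transform types come from the abstract.

```python
# Illustrative setup following the augmentations named in the abstract.
# Directory layout, image size, and jitter strengths are assumptions.
import torch.nn as nn
from torchvision import datasets, models, transforms

train_tf = transforms.Compose([
    transforms.Resize((600, 600)),      # close to B7's native input resolution
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
# BUSI has three classes (benign / malignant / normal), so replace the head.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)

# Hypothetical layout: busi/train/{benign,malignant,normal}/*.png
train_ds = datasets.ImageFolder("busi/train", transform=train_tf)
```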
39. A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness.
- Author
- Pecher, Branislav, Srba, Ivan, and Bielikova, Maria
- Published
- 2025
- Full Text
- View/download PDF
40. Segment anything model for few-shot medical image segmentation with domain tuning.
- Author
- Shi, Weili, Zhang, Penglong, Li, Yuqin, and Jiang, Zhengang
- Abstract
Medical image segmentation is a crucial step in the analysis of medical images, with extensive applications and research significance in medical research and practice. Convolutional neural networks have achieved great success in medical image segmentation, but acquiring large labeled datasets remains impractical given the expertise and time required for image labeling and heightened patient privacy concerns. To cope with scarce medical image data, we propose Domain Tuning SAM for Medical images (DT-SAM). We construct the encoder from SAM using a parameter-efficient fine-tuning strategy that selectively updates a small fraction of the weight increments while preserving the majority of the pre-trained weights in the SAM encoder, thereby reducing the number of training samples required. Our approach retains only the SAM encoder structure while incorporating a U-Net-like decoder with redesigned skip connections that concatenate encoder-extracted features, effectively decoding the encoder features and preserving edge information. Comprehensive experiments on three publicly available medical image segmentation datasets show that the method performs few-shot medical image segmentation effectively. With just one labeled sample, it achieves a Dice score of 63.51%, an HD of 17.94, and an IoU of 73.55% on the heart task; an average Dice of 46.01%, an HD of 10.25, and an IoU of 65.92% on the prostate task; and Dice, HD, and IoU of 88.67%, 10.63, and 90.19% on BUSI. Remarkably, with few training samples, the method consistently outperforms various SAM-based and CNN-based baselines. [ABSTRACT FROM AUTHOR] (A generic sketch of low-rank weight increments follows this record.)
- Published
- 2025
- Full Text
- View/download PDF
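The abstract describes updating "a small fraction of the weight increments" while freezing most of the SAM encoder. A common realization of that idea is a LoRA-style low-rank increment on a frozen linear layer; the sketch below shows the generic pattern in plain PyTorch and is not claimed to be DT-SAM's exact parameterization.

```python
# Generic low-rank "weight increment" on a frozen linear layer (LoRA-style);
# not claimed to be DT-SAM's exact parameterization.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank increment scale * B @ A."""
    def __init__(self, base, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # keep pre-trained weights fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # only a small fraction
```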
41. Visual Tuning.
- Author
- Yu, Bruce X.B., Chang, Jianlong, Wang, Haixin, Liu, Lingbo, Wang, Shijie, Wang, Zhiyu, Lin, Junfan, Xie, Lingxi, Li, Haojie, Lin, Zhouchen, Tian, Qi, and Chen, Chang Wen
- Published
- 2024
- Full Text
- View/download PDF
42. Enhancing recognition and interpretation of functional phenotypic sequences through fine-tuning pre-trained genomic models
- Author
- Duo Du, Fan Zhong, and Lei Liu
- Subjects
Genomic sequences, Genotype-phenotype, Fine-tuning, HERV, Motif, Medicine - Abstract
Abstract Background Decoding human genomic sequences requires comprehensive analysis of DNA sequence functionality. Through computational and experimental approaches, researchers have studied the genotype-phenotype relationship and generated important datasets that help unravel complicated genetic blueprints. Recently developed artificial intelligence methods can therefore be used to interpret the functions of those DNA sequences. Methods This study explores the use of deep learning, particularly pre-trained genomic models like DNA_bert_6 and human_gpt2-v1, in interpreting and representing human genome sequences. Initially, we meticulously constructed multiple datasets linking genotypes and phenotypes to fine-tune those models for precise DNA sequence classification. Additionally, we evaluated the influence of sequence length on classification results and analyzed the impact of feature extraction in the hidden layers of our model using the HERV dataset. To enhance our understanding of phenotype-specific patterns recognized by the model, we performed enrichment, pathogenicity, and conservation analyses of specific motifs in the human endogenous retrovirus (HERV) sequence with high average local representation weight (ALRW) scores. Results We constructed multiple genotype-phenotype datasets displaying commendable classification performance in comparison with random genomic sequences, particularly on the HERV dataset, which achieved binary and multi-classification accuracies and F1 values exceeding 0.935 and 0.888, respectively. Notably, fine-tuning on the HERV dataset not only improved our ability to identify and distinguish diverse information types within DNA sequences but also successfully identified specific motifs associated with neurological disorders and cancers in regions with high ALRW scores. Subsequent analysis of these motifs shed light on the adaptive responses of species to environmental pressures and their co-evolution with pathogens. Conclusions These findings highlight the potential of pre-trained genomic models in learning DNA sequence representations, particularly when utilizing the HERV dataset, and provide valuable insights for future research. The study combines pre-trained genomic model representations with classical methods for analyzing the functionality of genome sequences, promoting cross-fertilization between genomics and artificial intelligence. (A k-mer fine-tuning sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
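A typical way to fine-tune a DNABERT-style model, as described above, is to split each sequence into overlapping k-mers and attach a classification head. The checkpoint name below is the public DNA_bert_6 release (assumed accessible) and the two-label setup is illustrative; the authors' dataset construction is not reproduced.

```python
# k-mer tokenization plus a classification head on a pre-trained genomic BERT.
# The checkpoint is the public DNA_bert_6 release (assumed accessible); the
# two-label setup is illustrative.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def kmers(seq, k=6):
    """DNABERT-style input: overlapping k-mers joined by spaces."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

ckpt = "zhihan1996/DNA_bert_6"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)

enc = tok(kmers("ATGCGTACGTTAGC"), return_tensors="pt")
print(model(**enc).logits.shape)  # torch.Size([1, 2]); fine-tune before use
```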
43. Fast and Accurate Pupil Estimation Through Semantic Segmentation Fine-Tuning on a Shallow Convolutional Backbone
- Author
- Wattanapong Kurdthongmee and Piyadhida Kurdthongmee
- Subjects
pupil estimation, semantic segmentation, shallow convolutional neural network, fine-tuning, deep learning, Technological innovations. Automation, HD45-45.2 - Abstract
In the diverse realms of computer vision, psychology, biometrics, medicine, and robotics, accurate estimation of pupil size and position is of paramount importance for applications like eye tracking, medical diagnostics, and facial recognition. Traditional pupil estimation techniques often grapple with speed and error issues, impeding their applicability in real-world scenarios. To address this challenge, our study introduces an approach that significantly enhances both the speed and accuracy of pupil estimation by fine-tuning a pre-trained semantic segmentation model integrated with a shallow convolutional neural network (CNN) backbone. The methodology is a dual-phase process: a robust pre-trained semantic segmentation model is refined through targeted fine-tuning on a diverse collection of eye images, learning pupil characteristics in detail and substantially elevating detection precision. The shallow CNN backbone streamlines the model, ensuring processing rapid enough for real-time applications. The approach adeptly handles varying lighting and camera conditions, establishing new benchmarks in both speed and accuracy, as evidenced by our experimental findings. This advancement marks a significant step forward in pupil estimation technology, offering a practical, efficient solution with far-reaching implications in several key technological domains. Doi: 10.28991/HIJ-2024-05-02-016 (A sketch of extracting pupil position and size from a segmentation mask follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
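Once a segmentation model predicts a pupil mask, recovering the pupil's position and size is straightforward geometry. A minimal sketch follows; the threshold and the equivalent-circle assumption are illustrative choices, not the paper's exact post-processing.

```python
# Center and equivalent-circle radius from a predicted pupil mask.
import numpy as np

def pupil_from_mask(mask, thresh=0.5):
    """Return ((cx, cy), radius) from a probability mask, or None if empty."""
    ys, xs = np.nonzero(mask > thresh)
    if xs.size == 0:
        return None                      # no pupil detected
    cx, cy = xs.mean(), ys.mean()        # centroid of the pupil region
    radius = np.sqrt(xs.size / np.pi)    # invert area = pi * r**2
    return (cx, cy), radius

demo = np.zeros((64, 64))
demo[28:36, 30:38] = 1.0                 # fake 8x8 pupil blob
print(pupil_from_mask(demo))             # ((33.5, 31.5), ~4.51)
```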
44. Fine-tuned protein-lipid interactions in biological membranes: exploration and implications of the ORMDL-ceramide negative feedback loop in the endoplasmic reticulum.
- Author
- Dingjan, Tamir and Futerman, Anthony H.
- Subjects
PROTEIN-lipid interactions, BIOLOGICAL membranes, ENDOPLASMIC reticulum, BILAYER lipid membranes, MEMBRANE proteins - Abstract
Biological membranes consist of a lipid bilayer in which integral membrane proteins are embedded. Based on the compositional complexity of the lipid species found in membranes, and on their specific and selective interactions with membrane proteins, we recently suggested that membrane bilayers can be best described as "finely-tuned molecular machines." We now discuss one such set of lipid-protein interactions by describing a negative feedback mechanism operating in the de novo sphingolipid biosynthetic pathway, which occurs in the membrane of the endoplasmic reticulum, and describe the atomic interactions between the first enzyme in the pathway, namely serine palmitoyl transferase, and the product of the fourth enzyme in the pathway, ceramide. We explore how hydrogen-bonding and hydrophobic interactions formed between Asn13 and Phe63 in the serine palmitoyl transferase complex and ceramide can influence the ceramide content of the endoplasmic reticulum. This example of finely-tuned biochemical interactions raises intriguing mechanistic questions about how sphingolipids and their biosynthetic enzymes could have evolved, particularly in light of their metabolic co-dependence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Brain Tumor Detection and Classification Using an Optimized Convolutional Neural Network.
- Author
- Aamir, Muhammad, Namoun, Abdallah, Munir, Sehrish, Aljohani, Nasser, Alanazi, Meshari Huwaytim, Alsahafi, Yaser, and Alotibi, Faris
- Subjects
- *CONVOLUTIONAL neural networks, *CANCER diagnosis, *TUMOR classification, *DIAGNOSIS, *FEATURE extraction, *BRAIN tumors - Abstract
Brain tumors are a leading cause of death globally, with numerous types varying in malignancy; only 12% of adults diagnosed with brain cancer survive beyond five years. This research introduces a hyperparameter-tuned convolutional neural network (CNN) model to identify brain tumors, with significant practical implications. By fine-tuning the hyperparameters of the CNN model, we optimize feature extraction and systematically reduce model complexity, thereby enhancing the accuracy of brain tumor diagnosis. The critical hyperparameters include batch size, layer count, learning rate, activation function, pooling strategy, padding, and filter size. The hyperparameter-tuned CNN model was trained on three different brain MRI datasets available on Kaggle, producing outstanding performance, with an average value of 97% for accuracy, precision, recall, and F1-score. Methodical comparisons with state-of-the-art approaches demonstrate that the optimized model is effective. The hyperparameter modifications enhanced performance and strengthened the model's capacity for generalization, giving medical practitioners a more accurate and effective tool for making crucial judgments regarding brain tumor diagnosis. The model is a significant step toward trustworthy and accurate medical diagnosis, with practical implications for improving patient outcomes. [ABSTRACT FROM AUTHOR] (A toy hyperparameter-grid sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
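The abstract enumerates the hyperparameters that were tuned; a standard way to explore such a space is a grid search over a parameterized model builder. The grid values and the tiny CNN below are invented for illustration and are much smaller than the authors' model.

```python
# Toy grid search over the kinds of hyperparameters the abstract lists.
# Grid values and the tiny CNN are invented for illustration.
import itertools
import torch.nn as nn

def build_cnn(filters, kernel, act):
    activation = {"relu": nn.ReLU(), "elu": nn.ELU()}[act]
    return nn.Sequential(
        nn.Conv2d(1, filters, kernel, padding=kernel // 2), activation,
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(4),   # e.g. glioma/meningioma/pituitary/no-tumor classes
    )

grid = {"batch_size": [16, 32], "lr": [1e-3, 1e-4],
        "filters": [16, 32], "kernel": [3, 5], "act": ["relu", "elu"]}

for bs, lr, f, k, a in itertools.product(*grid.values()):
    model = build_cnn(f, k, a)
    # train(model, bs, lr) and keep the best validation accuracy (omitted)
    print(f"candidate: bs={bs} lr={lr} filters={f} kernel={k} act={a}")
```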
46. Adaptive Dynamic Learning Rate Optimization Technique for Colorectal Cancer Diagnosis Based on Histopathological Image Using EfficientNet-B0 Deep Learning Model.
- Author
- Abd El-Ghany, Sameh, Mahmood, Mahmood A., and Abd El-Aziz, A. A.
- Subjects
COLORECTAL cancer, PATIENTS' attitudes, MEDICAL errors, TRANSFER of training, IMAGE analysis - Abstract
The elevated death rate associated with colorectal cancer (CRC) continues to impact human life worldwide; early detection helps prevent the disease and extend human life. CRC is frequently diagnosed and detected through histopathological examination, with decisions based on clinicians' subjective perceptions and daily image analyses. Histological image (HI) classification is difficult because HIs contain multiple tissue types and characteristics, so deep learning (DL) models are employed to classify different kinds of CRC HIs. To increase the efficiency of the CRC diagnostic procedure from HIs, we propose a fine-tuned EfficientNet-B0 DL model that performs multi-class classification of HIs. It uses an adaptive learning rate (ALR) to overcome the overfitting caused by a static learning rate (SLR) and to enhance CRC detection performance: the training loss at the start of each epoch is compared with that of the previous epoch, and the learning rate is increased if the loss has decreased and decreased if the loss has grown. The proposed model speeds diagnosis, reduces diagnostic costs, and reduces medical errors, thereby improving the diagnostic procedure from the patient's perspective. We trained and evaluated the model on two datasets (NCT-CRC-HE-100K and CRC-VAL-HE-7K), pre-processing NCT-CRC-HE-100K with normalization and scaling. On NCT-CRC-HE-100K, the EfficientNet-B0 model attained accuracy, sensitivity, specificity, precision, and F1-score of 99.87%, 99.64%, 99.95%, 99.62%, and 99.63%, respectively; on CRC-VAL-HE-7K, it achieved 99%, 94.52%, 99.45%, 94.41%, and 94.36%. As a result, the EfficientNet-B0 model outperforms the state of the art in this field. [ABSTRACT FROM AUTHOR] (A direct sketch of the learning-rate rule follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
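The adaptive-learning-rate rule is stated explicitly in the abstract, so it translates almost directly to code. Only the scaling factors are assumptions, since the abstract does not give them.

```python
# The rule as stated: if this epoch's training loss is lower than the last
# epoch's, raise the learning rate; otherwise lower it. Scaling factors are
# assumptions (the abstract does not give them).
def adapt_lr(optimizer, loss, prev_loss, up=1.05, down=0.5):
    factor = up if loss < prev_loss else down
    for group in optimizer.param_groups:
        group["lr"] *= factor

# Usage inside a training loop:
# prev = float("inf")
# for epoch in range(num_epochs):
#     epoch_loss = train_one_epoch(model, loader, optimizer)
#     adapt_lr(optimizer, epoch_loss, prev)
#     prev = epoch_loss
```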
47. MédicoBERT: A Medical Language Model for Spanish Natural Language Processing Tasks with a Question-Answering Application Using Hyperparameter Optimization.
- Author
- Padilla Cuevas, Josué, Reyes-Ortiz, José A., Cuevas-Rasgado, Alma D., Mora-Gutiérrez, Román A., and Bravo, Maricela
- Subjects
LANGUAGE models, MEDICAL language, MEDICAL terminology, NATURAL languages, SPANISH language - Abstract
The increasing volume of medical information available in digital format presents a significant challenge for researchers seeking to extract relevant information, since manually analyzing voluminous data is time-consuming and constrains productivity. In this context, intelligent computational approaches to information search, such as large language models (LLMs), offer a promising solution: LLMs understand natural-language questions and respond accurately to complex queries, even in the specialized domain of medicine. This paper presents MédicoBERT, a Spanish medical language model developed by adapting a general-domain language model (BERT) to medical terminology and vocabulary related to diseases, treatments, symptoms, and medications. The model was pre-trained on 3 million medical texts containing 1.1 billion words. MédicoBERT was then adapted and evaluated for answering medical questions in Spanish, with promising results: the question-answering (QA) task was fine-tuned on a Spanish corpus of over 34,000 medical questions and answers, followed by a search for the optimal hyperparameter configuration using heuristic methods and nonlinear regression models. MédicoBERT obtained a perplexity of 4.28, measuring the adaptation of the language model to Spanish medical vocabulary, and an average F1 of 62.35% on the medical QA task. The aim of MédicoBERT is to support natural language processing (NLP) research in Spanish, with particular emphasis on applications in the medical domain. [ABSTRACT FROM AUTHOR] (A hedged QA usage sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
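A hedged sketch of the question-answering usage evaluated above. MédicoBERT's weights are not assumed to be public, so a general Spanish BERT checkpoint stands in; its QA head is randomly initialized and would give meaningful answers only after fine-tuning on a corpus like the 34,000-pair one the authors describe.

```python
# Stand-in Spanish BERT; the QA head is freshly initialized, so answers are
# meaningful only after fine-tuning on a medical QA corpus.
from transformers import pipeline

qa = pipeline("question-answering",
              model="dccuchile/bert-base-spanish-wwm-cased")  # stand-in model
print(qa(question="¿Qué medicamento trata la hipertensión?",
         context="El enalapril es un medicamento utilizado para tratar "
                 "la hipertensión arterial."))
# -> {'answer': ..., 'score': ..., 'start': ..., 'end': ...}
```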
48. Enhancing recognition and interpretation of functional phenotypic sequences through fine-tuning pre-trained genomic models.
- Author
- Du, Duo, Zhong, Fan, and Liu, Lei
- Subjects
- *DNA sequencing, *NUCLEOTIDE sequence, *ARTIFICIAL intelligence, *PHENOTYPES, *DEEP learning - Abstract
Background: Decoding human genomic sequences requires comprehensive analysis of DNA sequence functionality. Through computational and experimental approaches, researchers have studied the genotype-phenotype relationship and generated important datasets that help unravel complicated genetic blueprints. Recently developed artificial intelligence methods can therefore be used to interpret the functions of those DNA sequences. Methods: This study explores the use of deep learning, particularly pre-trained genomic models like DNA_bert_6 and human_gpt2-v1, in interpreting and representing human genome sequences. Initially, we meticulously constructed multiple datasets linking genotypes and phenotypes to fine-tune those models for precise DNA sequence classification. Additionally, we evaluated the influence of sequence length on classification results and analyzed the impact of feature extraction in the hidden layers of our model using the HERV dataset. To enhance our understanding of phenotype-specific patterns recognized by the model, we performed enrichment, pathogenicity, and conservation analyses of specific motifs in the human endogenous retrovirus (HERV) sequence with high average local representation weight (ALRW) scores. Results: We constructed multiple genotype-phenotype datasets displaying commendable classification performance in comparison with random genomic sequences, particularly on the HERV dataset, which achieved binary and multi-classification accuracies and F1 values exceeding 0.935 and 0.888, respectively. Notably, fine-tuning on the HERV dataset not only improved our ability to identify and distinguish diverse information types within DNA sequences but also successfully identified specific motifs associated with neurological disorders and cancers in regions with high ALRW scores. Subsequent analysis of these motifs shed light on the adaptive responses of species to environmental pressures and their co-evolution with pathogens. Conclusions: These findings highlight the potential of pre-trained genomic models in learning DNA sequence representations, particularly when utilizing the HERV dataset, and provide valuable insights for future research. This study represents an innovative strategy that combines pre-trained genomic model representations with classical methods for analyzing the functionality of genome sequences, thereby promoting cross-fertilization between genomics and artificial intelligence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. RuGECToR: Rule-Based Neural Network Model for Russian Language Grammatical Error Correction.
- Author
- Khabutdinov, I. A., Chashchin, A. V., Grabovoy, A. V., Kildyakov, A. S., and Chekhovich, U. V.
- Subjects
- *ARTIFICIAL neural networks, *RUSSIAN language - Abstract
Grammatical error correction is one of the core natural language processing tasks. At present, the open-source state of the art in sequence tagging for English is the GECToR model. For Russian, the problem lacks equally effective solutions due to the scarcity of annotated datasets, which motivated the current research. In this paper, we describe the process of creating a synthetic dataset and training the model on it. The GECToR architecture is adapted for the Russian language and called RuGECToR. This architecture is chosen because, unlike the sequence-to-sequence approach, it is easy to interpret and does not require a large amount of training data. The aim is to train the model so that it generalizes the morphological properties of the language rather than adapting to a specific training sample. The presented model achieves a quality of 82.5 on synthetic data and 22.2 on the RULEC dataset, which was not used at the training stage. [ABSTRACT FROM AUTHOR] (A toy synthetic-corruption sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
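Building the synthetic dataset mentioned above usually means corrupting clean sentences and keeping (corrupted, clean) pairs for the tagger to learn from. The rule set below is a toy stand-in; RuGECToR's actual corruption rules are not reproduced in the abstract.

```python
# Toy rule-based corruption for synthetic GEC data: randomly delete,
# duplicate, or swap adjacent tokens; pair the result with the clean source.
import random

def corrupt(tokens, p=0.15):
    out, i = [], 0
    while i < len(tokens):
        r = random.random()
        if r < p / 3:                          # deletion error
            i += 1
            continue
        if r < 2 * p / 3:                      # duplication error
            out += [tokens[i], tokens[i]]
        elif r < p and i + 1 < len(tokens):    # adjacent-swap error
            out += [tokens[i + 1], tokens[i]]
            i += 1
        else:
            out.append(tokens[i])
        i += 1
    return out

clean = "мы поехали в город на выходных".split()
print((corrupt(clean), clean))   # (corrupted, clean) training pair
```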
50. Research status and application of artificial intelligence large models in the oil and gas industry.
- Author
- LIU, He, REN, Yili, LI, Xin, DENG, Yue, WANG, Yongtao, CAO, Qianwen, DU, Jinyang, LIN, Zhiwei, and WANG, Wenjie
- Subjects
ARTIFICIAL intelligence, PETROLEUM industry, DEEP learning, ARTIFICIAL neural networks, NATURAL language processing, COMPUTER vision, ELECTRONIC data processing - Abstract
This article elucidates the concept of large model technology, summarizes its research status domestically and internationally, reviews the application of large models in vertical industries, outlines the challenges of applying large models in the oil and gas sector, and offers prospects for their application in the industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Building on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations using visual and multimodal foundation models, and a few researchers have constructed pre-trained foundation models for seismic data processing and interpretation as well as core analysis. Applying large models in the oil and gas industry faces several challenges: current data quantity and quality are insufficient to support training, research and development costs are high, and algorithmic autonomy and controllability are limited. Applications should be guided by the needs of the oil and gas business, taking large models as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the building of "artificial intelligence + energy" composite teams, and increase the autonomy and controllability of large model technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF