"Michael J. Hill" / Journal: applied sciences (2076-3417) - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Michael J. Hill"' showing total 25 results

Start Over "Michael J. Hill" Journal applied sciences (2076-3417)

25 results on '"Michael J. Hill"'

1. Autocorrelation Matrix Knowledge Distillation: A Task-Specific Distillation Method for BERT Models.

Author: Zhang, Kai, Li, Jinqiu, Wang, Bingqian, and Meng, Haoran
Subjects: LANGUAGE models, NATURAL language processing, DISTILLATION, GLUE, TEACHERS
Abstract: Pre-trained language models perform well in various natural language processing tasks. However, their large number of parameters poses significant challenges for edge devices with limited resources, greatly limiting their application in practical deployment. This paper introduces a simple and efficient method called Autocorrelation Matrix Knowledge Distillation (AMKD), aimed at improving the performance of smaller BERT models for specific tasks and making them more applicable in practical deployment scenarios. The AMKD method effectively captures the relationships between features using the autocorrelation matrix, enabling the student model to learn not only the performance of individual features from the teacher model but also the correlations among these features. Additionally, it addresses the issue of dimensional mismatch between the hidden states of the student and teacher models. Even in cases where the dimensions are smaller, AMKD retains the essential features from the teacher model, thereby minimizing information loss. Experimental results demonstrate that BERTTINY-AMKD outperforms traditional distillation methods and baseline models, achieving an average score of 83.6% on GLUE tasks. This represents a 4.1% improvement over BERTTINY-KD and exceeds the performance of BERT4-PKD and DistilBERT4 by 2.6% and 3.9%, respectively. Moreover, despite having only 13.3% of the parameters of BERTBASE, the BERTTINY-AMKD model retains over 96.3% of the performance of the teacher model, BERTBASE. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Machine Learning Approach for Arabic Handwritten Recognition.

Author: Mutawa, A. M., Allaho, Mohammad Y., and Al-Hajeri, Monirah
Subjects: ARTIFICIAL neural networks, NATURAL language processing, DEEP learning, MACHINE learning, FEATURE extraction, HANDWRITING recognition (Computer science), TEXT recognition
Abstract: Text recognition is an important area of the pattern recognition field. Natural language processing (NLP) and pattern recognition have been utilized efficiently in script recognition. Much research has been conducted on handwritten script recognition. However, the research on the Arabic language for handwritten text recognition received little attention compared with other languages. Therefore, it is crucial to develop a new model that can recognize Arabic handwritten text. Most of the existing models used to acknowledge Arabic text are based on traditional machine learning techniques. Therefore, we implemented a new model using deep machine learning techniques by integrating two deep neural networks. In the new model, the architecture of the Residual Network (ResNet) model is used to extract features from raw images. Then, the Bidirectional Long Short-Term Memory (BiLSTM) and connectionist temporal classification (CTC) are used for sequence modeling. Our system improved the recognition rate of Arabic handwritten text compared to other models of a similar type with a character error rate of 13.2% and word error rate of 27.31%. In conclusion, the domain of Arabic handwritten recognition is advancing swiftly with the use of sophisticated deep learning methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Extracting Sentence Embeddings from Pretrained Transformer Models.

Author: Stankevičius, Lukas and Lukoševičius, Mantas
Subjects: LANGUAGE models, NATURAL language processing, TRANSFORMER models, NATURAL languages, TEST methods
Abstract: Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in retrieval-augmented generation. But do commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations and our original extensions of the most promising ones on pretrained models. Namely, given 110 M parameters, BERT's hidden representations from multiple layers, and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using a general Wikitext dataset to complement BERT's sentence embeddings. All methods are tested on eight Semantic Textual Similarity (STS), six short text clustering, and twelve classification tasks. We also evaluate our representation-shaping techniques on other static models, including random token representations. Proposed representation extraction methods improve the performance on STS and clustering tasks for all models considered. Very high improvements for static token-based models, especially random embeddings for STS tasks, almost reach the performance of BERT-derived representations. Our work shows that the representation-shaping techniques significantly improve sentence embeddings extracted from BERT-based and simple baseline models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model.

Author: Zheng, Dashun, Li, Jiaxuan, Yang, Yunchu, Wang, Yapeng, and Pang, Patrick Cheong-Iao
Subjects: LANGUAGE models, GENERATIVE adversarial networks, KNOWLEDGE transfer, DISTILLATION, GLUE
Abstract: Natural language-processing tasks have been improved greatly by large language models (LLMs). However, numerous parameters make their execution computationally expensive and difficult on resource-constrained devices. For this problem, as well as maintaining accuracy, some techniques such as distillation and quantization have been proposed. Unfortunately, current methods fail to integrate model pruning with downstream tasks and overlook sentence-level semantic modeling, resulting in reduced efficiency of distillation. To alleviate these limitations, we propose a novel distilled lightweight model for BERT named MicroBERT. This method can transfer the knowledge contained in the "teacher" BERT model to a "student" BERT model. The sentence-level feature alignment loss (FAL) distillation mechanism, guided by Mixture-of-Experts (MoE), captures comprehensive contextual semantic knowledge from the "teacher" model to enhance the "student" model's performance while reducing its parameters. To make the outputs of "teacher" and "student" models comparable, we introduce the idea of a generative adversarial network (GAN) to train a discriminator. Our experimental results based on four datasets show that all steps of our distillation mechanism are effective, and the MicroBERT (101.14%) model outperforms TinyBERT (99%) by 2.24% in terms of average distillation reductions in various tasks on the GLUE dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. LCAS-DetNet: A Ship Target Detection Network for Synthetic Aperture Radar Images.

Author: Liu, Junlin, Liao, Dingyi, Wang, Xianyao, Li, Jun, Yang, Bing, and Chen, Guanyu
Subjects: SYNTHETIC aperture radar, SYNTHETIC apertures, SPECKLE interference, SPATIAL arrangement, SHIPS, SPATIAL ability
Abstract: Monitoring ships on water surfaces encounters obstacles such as weather conditions, sunlight, and water ripples, posing significant challenges in accurately detecting target ships in real time. Synthetic Aperture Radar (SAR) offers a viable solution for real-time ship detection, unaffected by cloud coverage, precipitation, or light levels. However, SAR images are often affected by speckle noise, salt-and-pepper noise, and water surface ripple interference. This study introduces LCAS-DetNet, a Multi-Location Cross-Attention Ship Detection Network tailored for the ships in SAR images. Modeled on the YOLO architecture, LCAS-DetNet comprises a feature extractor, an intermediate layer ("Neck"), and a detection head. The feature extractor includes the computation of Multi-Location Cross-Attention (MLCA) for precise extraction of ship features at multiple scales. Incorporating both local and global branches, MLCA bolsters the network's ability to discern spatial arrangements and identify targets via a cross-attention mechanism. Each branch utilizes Multi-Location Attention (MLA) and calculates pixel-level correlations in both channel and spatial dimensions, further combating the impact of salt-and-pepper noise on the distribution of objective ship pixels. The feature extractor integrates downsampling and MLCA stacking, enhanced with residual connections and Patch Embedding, to improve the network's multi-scale spatial recognition capabilities. As the network deepens, we consider this structure to be cascaded and multi-scale, providing the network with a richer receptive field. Additionally, we introduce a loss function based on Wise-IoUv3 to address the influence of label quality on the gradient updates. The effectiveness of our network was validated on the HRSID and SSDD datasets, where it achieved state-of-the-art performance: a 96.59% precision on HRSID and 97.52% on SSDD. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. A Historical Survey of Advances in Transformer Architectures.

Author: Sajun, Ali Reza, Zualkernan, Imran, and Sankalpa, Donthi
Subjects: LANGUAGE models, TRANSFORMER models, DEEP learning, COMPUTER vision, MACHINE learning
Abstract: In recent times, transformer-based deep learning models have risen in prominence in the field of machine learning for a variety of tasks such as computer vision and text generation. Given this increased interest, a historical outlook at the development and rapid progression of transformer-based models becomes imperative in order to gain an understanding of the rise of this key architecture. This paper presents a survey of key works related to the early development and implementation of transformer models in various domains such as generative deep learning and as backbones of large language models. Previous works are classified based on their historical approaches, followed by key works in the domain of text-based applications, image-based applications, and miscellaneous applications. A quantitative and qualitative analysis of the various approaches is presented. Additionally, recent directions of transformer-related research such as those in the biomedical and timeseries domains are discussed. Finally, future research opportunities, especially regarding the multi-modality and optimization of the transformer training process, are identified. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Natural Language Processing in Knowledge-Based Support for Operator Assistance.

Author: Besharati Moghaddam, Fatemeh, Lopez, Angel J., De Vuyst, Stijn, and Gautama, Sidharta
Subjects: NATURAL language processing, LINGUISTICS, MANUFACTURING industries
Abstract: Manufacturing industry faces increasing complexity in the performance of assembly tasks due to escalating demand for complex products with a greater number of variations. Operators require robust assistance systems to enhance productivity, efficiency, and safety. However, existing support services often fall short when operators encounter unstructured open questions and incomplete sentences due to primarily relying on procedural digital work instructions. This draws attention to the need for practical application of natural language processing (NLP) techniques. This study addresses these challenges by introducing a domain-specific dataset tailored to assembly tasks, capturing unique language patterns and linguistic characteristics. We explore strategies to process declarative and imperative sentences, including incomplete ones, effectively. Thorough evaluation of three pre-trained NLP libraries—NLTK, SPACY, and Stanford—is performed to assess their effectiveness in handling assembly-related concepts and ability to address the domain's distinctive challenges. Our findings demonstrate the efficient performance of these open-source NLP libraries in accurately handling assembly-related concepts. By providing valuable insights, our research contributes to developing intelligent operator assistance systems, bridging the gap between NLP techniques and the assembly domain within manufacturing industry. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Research and Development of a Modern Deep Learning Model for Emotional Analysis Management of Text Data.

Author: Bashynska, Iryna, Sarafanov, Mykhailo, and Manikaeva, Olga
Subjects: DEEP learning, AFFECTIVE forecasting (Psychology), SENTIMENT analysis, DISCONTENT, EMBARRASSMENT, AVERSION
Abstract: There are many ways people express their reactions in the media. Text data is one of them, for example, comments, reviews, blog posts, messages, etc. Analysis of emotions expressed there is in high demand nowadays for various purposes. This research provides a method of performing sentiment analysis of text information using machine learning. The authors trained a classifier based on the BERT encoder, which recognizes emotions in text messages in English written in chat style. To handle raw chat-style messages, authors developed an enhanced text standardization layer. The list of emotions identified includes admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise. The model solves the problem of multiclass multilabel text classification, which means that more than one class can be predicted from one piece of text. The authors trained the model on the GoEmotions dataset, which consists of 54,263 text comments from Reddit. The model reached a macro-averaged F1-Score of 0.50704 in emotions prediction and 0.7349 in sentiments prediction on the testing dataset. The presented model increased the quality of emotions prediction by 10.2% and sentiments prediction by 6.5% in comparison to the baseline approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. An Improved Nested Named-Entity Recognition Model for Subject Recognition Task under Knowledge Base Question Answering.

Author: Wang, Ziming, Xu, Xirong, Li, Xinzi, Li, Haochen, Wei, Xiaopeng, and Huang, Degen
Subjects: KNOWLEDGE base, NATURAL language processing
Abstract: In the subject recognition (SR) task under Knowledge Base Question Answering (KBQA), a common method is by training and employing a general flat Named-Entity Recognition (NER) model. However, it is not effective and robust enough in the case that the recognized entity could not be strictly matched to any subjects in the Knowledge Base (KB). Compared to flat NER models, nested NER models show more flexibility and robustness in general NER tasks, whereas it is difficult to employ a nested NER model directly in an SR task. In this paper, we take advantage of features of a nested NER model and propose an Improved Nested NER Model (INNM) for the SR task under KBQA. In our model, each question token is labeled as either an entity token, a start token, or an end token by a modified nested NER model based on semantics. Then, entity candidates would be generated based on such labels, and an approximate matching strategy is employed to score all subjects in the KB based on string similarity to find the best-matched subject. Experimental results show that our model is effective and robust to both single-relation questions and complex questions, which outperforms the baseline flat NER model by a margin of 3.3% accuracy on the SimpleQuestions dataset and a margin of 11.0% accuracy on the WebQuestionsSP dataset. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

10. F-ALBERT: A Distilled Model from a Two-Time Distillation System for Reduced Computational Complexity in ALBERT Model.

Author: Kim, Kyeong-Hwan and Jeong, Chang-Sung
Subjects: COMPUTATIONAL complexity, LANGUAGE models, DISTILLATION, TRANSFORMER models, NATURAL language processing
Abstract: Recently, language models based on the Transformer architecture have been predominantly used in AI natural language processing. These models, which have been proven to perform better with more parameters, have led to a significant increase in model size and computational load. ALBERT solves this problem by significantly reducing the number of parameters it retains by repeatedly reusing parameters. Although ALBERT significantly reduces the parameters it maintains, it requires a computational load similar to the original language model due to the reuse process. In this study, we develop a distillation system that decreases the number of times the ALBERT model reuses parameters and progressively reduces the parameters being reused. We propose a representation in this distillation system that can effectively distill the knowledge of the original model and develop a new architecture with reduced computation. Through this system, F-ALBERT, which had about half the computational load compared to the ALBERT model, restored about 98% of the performance of the original model on the GLUE benchmark. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

11. EvoText: Enhancing Natural Language Generation Models via Self-Escalation Learning for Up-to-Date Knowledge and Improved Performance.

Author: Yuan, Zhengqing, Xue, Huiwen, Zhang, Chao, and Liu, Yongming
Subjects: NATURAL languages, COMPUTER vision, NATURAL language processing, AUTOREGRESSIVE models
Abstract: In recent years, pretrained models have been widely used in various fields, including natural language understanding, computer vision, and natural language generation. However, the performance of these language generation models is highly dependent on the model size and the dataset size. While larger models excel in some aspects, they cannot learn up-to-date knowledge and are relatively difficult to relearn. In this paper, we introduce EvoText, a novel training method that enhances the performance of any natural language generation model without requiring additional datasets during the entire training process (although a prior dataset is necessary for pretraining). EvoText employs two models: G, a text generation model, and D, a model that can determine whether the data generated by G is legitimate. Initially, the fine-tuned D model serves as the knowledge base. The text generated by G is then input to D to determine whether it is legitimate. Finally, G is fine-tuned based on D's output. EvoText enables the model to learn up-to-date knowledge through a self-escalation process that builds on a priori knowledge. When EvoText needs to learn something new, it simply fine-tunes the D model. Our approach applies to autoregressive language modeling for all Transformer classes. With EvoText, eight models achieved stable improvements in seven natural language processing tasks without any changes to the model structure. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

12. PrivacyGLUE: A Benchmark Dataset for General Language Understanding in Privacy Policies.

Author: Shankar, Atreya, Waldis, Andreas, Bless, Christof, Andueza Rodriguez, Maria, and Mazzola, Luca
Subjects: LANGUAGE models, PRIVACY, LANGUAGE research, NATURAL language processing, LEGAL language, TASK analysis
Abstract: Featured Application: We propose the PrivacyGLUE benchmark to compare and contrast NLP models' general language understanding in the privacy language domain. This will help practitioners in selecting understanding models for applications within the privacy language domain. Benchmarks for general language understanding have been rapidly developing in recent years of NLP research, particularly because of their utility in choosing strong-performing models for practical downstream applications. While benchmarks have been proposed in the legal language domain, virtually no such benchmarks exist for privacy policies despite their increasing importance in modern digital life. This could be explained by privacy policies falling under the legal language domain, but we find evidence to the contrary that motivates a separate benchmark for privacy policies. Consequently, we propose PrivacyGLUE as the first comprehensive benchmark of relevant and high-quality privacy tasks for measuring general language understanding in the privacy language domain. Furthermore, we release performances from multiple transformer language models and perform model–pair agreement analysis to detect tasks where models benefited from domain specialization. Our findings show the importance of in-domain pretraining for privacy policies. We believe PrivacyGLUE can accelerate NLP research and improve general language understanding for humans and AI algorithms in the privacy language domain, thus supporting the adoption and acceptance rates of solutions based on it. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

13. Evaluating Deep Learning Techniques for Natural Language Inference.

Author: Eleftheriadis, Petros, Perikos, Isidoros, and Hatzilygeroudis, Ioannis
Subjects: NATURAL languages, DEEP learning, INFERENCE (Logic), NATURAL language processing, WRITTEN communication
Abstract: Natural language inference (NLI) is one of the most important natural language understanding (NLU) tasks. NLI expresses the ability to infer information during spoken or written communication. The NLI task concerns the determination of the entailment relation of a pair of sentences, called the premise and hypothesis. If the premise entails the hypothesis, the pair is labeled as an "entailment". If the hypothesis contradicts the premise, the pair is labeled a "contradiction", and if there is not enough information to infer a relationship, the pair is labeled as "neutral". In this paper, we present experimentation results of using modern deep learning (DL) models, such as the pre-trained transformer BERT, as well as additional models that relay on LSTM networks, for the NLI task. We compare five DL models (and variations of them) on eight widely used NLI datasets. We trained and fine-tuned the hyperparameters for each model to achieve the best performance for each dataset, where we achieved some state-of-the-art results. Next, we examined the inference ability of the models on the BreakingNLI dataset, which evaluates the model's ability to recognize lexical inferences. Finally, we tested the generalization power of our models across all the NLI datasets. The results of the study are quite interesting. In the first part of our experimentation, the results indicate the performance advantage of the pre-trained transformers BERT, RoBERTa, and ALBERT over other deep learning models. This became more evident when they were tested on the BreakingNLI dataset. We also see a pattern of improved performance when the larger models are used. However, ALBERT, given that it has 18 times fewer parameters, achieved quite remarkable performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

14. Ensemble-NQG-T5: Ensemble Neural Question Generation Model Based on Text-to-Text Transfer Transformer.

Author: Hwang, Myeong-Ha, Shin, Jikang, Seo, Hojin, Im, Jeong-Seon, Cho, Hee, and Lee, Chun-Kwon
Subjects: ROBOTIC process automation, CHATBOTS, DEEP learning, NATURAL language processing, INTERNET of things, HUMAN resources departments
Abstract: Deep learning chatbot research and development is exploding recently to offer customers in numerous industries personalized services. However, human resources are used to create a learning dataset for a deep learning chatbot. In order to augment this, the idea of neural question generation (NQG) has evolved, although it has restrictions on how questions can be expressed in different ways and has a finite capacity for question generation. In this paper, we propose an ensemble-type NQG model based on the text-to-text transfer transformer (T5). Through the proposed model, the number of generated questions for each single NQG model can be greatly increased by considering the mutual similarity and the quality of the questions using the soft-voting method. For the training of the soft-voting algorithm, the evaluation score and mutual similarity score weights based on the context and the question–answer (QA) dataset are used as the threshold weight. Performance comparison results with existing T5-based NQG models using the SQuAD 2.0 dataset demonstrate the effectiveness of the proposed method for QG. The implementation of the proposed ensemble model is anticipated to span diverse industrial fields, including interactive chatbots, robotic process automation (RPA), and Internet of Things (IoT) services in the future. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

15. Vision-Guided Object Recognition and 6D Pose Estimation System Based on Deep Neural Network for Unmanned Aerial Vehicles towards Intelligent Logistics.

Author: Luo, Sijin, Liang, Yu, Luo, Zhehao, Liang, Guoyuan, Wang, Can, and Wu, Xinyu
Subjects: ARTIFICIAL neural networks, POSE estimation (Computer vision), OBJECT recognition (Computer vision), DRONE aircraft, HOUSEHOLD employees, GEOSTATIONARY satellites, WORKFLOW, LABOR costs, THREE-dimensional imaging
Abstract: Unmanned aerial vehicle (UAV) express delivery is facing a period of rapid development and continues to promote the aviation logistics industry due to its advantages of elevated delivery efficiency and low labor costs. Automatic detection, localization, and estimation of 6D poses of targets in dynamic environments are key prerequisites for UAV intelligent logistics. In this study, we proposed a novel vision system based on deep neural networks to locate targets and estimate their 6D pose parameters from 2D color images and 3D point clouds captured by an RGB-D sensor mounted on a UAV. The workflow of this system can be summarized as follows: detect the targets and locate them, separate the object region from the background using a segmentation network, and estimate the 6D pose parameters from a regression network. The proposed system provides a solid foundation for various complex operations for UAVs. To better verify the performance of the proposed system, we built a small dataset called SIAT comprising some household staff. Comparative experiments with several state-of-the-art networks on the YCB-Video dataset and SIAT dataset verified the effectiveness, robustness, and superior performance of the proposed method, indicating its promising applications in UAV-based delivery tasks. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

16. Compressing BERT for Binary Text Classification via Adaptive Truncation before Fine-Tuning.

Author: Zhang, Xin, Fan, Jing, and Hei, Mengzhe
Subjects: SPAM email, LOTTERY tickets, SENTIMENT analysis, NATURAL languages, ECOLOGICAL impact, CLASSIFICATION
Abstract: Featured Application: The proposed approach can be used to reduce the memory footprint, training and inference time of BERT based binary text classifiers which could serve as spam or anomaly detector, sentiment discriminator, and so on. Large-scale pre-trained language models such as BERT have brought much better performance to text classification. However, their large sizes can lead to sometimes prohibitively slow fine-tuning and inference. To alleviate this, various compression methods have been proposed; however, most of these methods solely consider reducing inference time, often ignoring significant increases in training time, and thus are even more resource consuming. In this article, we focus on lottery ticket extraction for the BERT architecture. Inspired by observations that representations at lower layers are often more useful for text classification, we propose that we can identify the winning ticket of BERT for binary text classification through adaptive truncation, i.e., a process that drops the top-k layers of the pre-trained model based on simple, fast computations. In this way, the cost for compressing and fine-tuning, as well as inference, can be vastly reduced. We present experiments on eight mainstream binary text classification datasets covering different input styles (i.e., single-text and text-pair), as well as different typical tasks (e.g., sentiment analysis, acceptability judgement, textual entailment, semantic similarity analysis and natural language inference). Compared with some strong baselines, our method saved 78.1% time and 31.7% memory on average, and up to 86.7 and 48% in extreme cases, respectively. We also saw good performance, often outperforming the original language model. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

17. Detecting Deceptive Utterances Using Deep Pre-Trained Neural Networks.

Author: Wawer, Aleksander and Sarzyńska-Wawer, Justyna
Subjects: ARTIFICIAL neural networks, LIE detectors & detection, POLISH language, NATURAL language processing, FALSE testimony
Abstract: Lying is an integral part of everyday communication in both written and oral forms. Detecting lies is therefore essential and has many possible applications. Our study aims to investigate the performance of automated lie detection methods, namely the most recent breed of pre-trained transformer neural networks capable of processing the Polish language. We used a dataset of nearly 1500 true and false statements, half of which were transcripts and the other half written statements, originating from possibly the largest study of deception in the Polish language. Technically, the problem was posed as text classification. We found that models perform better on typed than spoken utterances. The best-performing model achieved an accuracy of 0.69, which is much higher than the human performance average of 0.56. For transcribed utterances, human performance was at 0.58 and the models reached 0.62. We also explored model interpretability based on integrated gradient to shed light on classifier decisions. Our observations highlight the role of first words and phrases in model decisions, but more work is needed to systematically explore the observed patterns. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

18. Relational Graph Convolutional Network for Text-Mining-Based Accident Causal Classification.

Author: Chen, Zaili, Huang, Kai, Wu, Li, Zhong, Zhenyu, and Jiao, Zeyu
Subjects: ACCIDENT investigation, NATURAL language processing, INVESTIGATION reports, ACCIDENT prevention, CLASSIFICATION
Abstract: Accident investigation reports are text documents that systematically review and analyze the cause and process of accidents after accidents have occurred and have been widely used in the fields such as transportation, construction and aerospace. With the aid of accident investigation reports, the cause of the accident can be clearly identified, which provides an important basis for accident prevention and reliability assessment. However, since accident record reports are mostly composed of unstructured data such as text, the analysis of accident causes inevitably relies on a lot of expert experience and statistical analyses also require a lot of manual classification. Although, in recent years, with the development of natural language processing technology, there have been many efforts to automatically analyze and classify text. However, the existing methods either rely on large corpus and data preprocessing methods, which are cumbersome, or extract text information based on bidirectional encoder representation from transformers (BERT), but the computational cost is extremely high. These shortcomings make it still a great challenge to automatically analyze accident investigation reports and extract the information therein. To address the aforementioned problems, this study proposes a text-mining-based accident causal classification method based on a relational graph convolutional network (R-GCN) and pre-trained BERT. On the one hand, the proposed method avoids preprocessing such as stop word removal and word segmentation, which not only preserves the information of accident investigation reports to the greatest extent, but also avoids tedious operations. On the other hand, with the help of R-GCN to process the semantic features obtained by BERT representation, the dependence of BERT retraining on computing resources can be avoided. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

19. Commonsense Knowledge-Aware Prompt Tuning for Few-Shot NOTA Relation Classification.

Author: Lv, Bo, Jin, Li, Zhang, Yanan, Wang, Hao, Li, Xiaoyu, and Guo, Zhi
Subjects: SHOT peening, CLASSIFICATION, SEMANTICS
Abstract: Compared with the traditional few-shot task, the few-shot none-of-the-above (NOTA) relation classification focuses on the realistic scenario of few-shot learning, in which a test instance might not belong to any of the target categories. This undoubtedly increases the task's difficulty because given only a few support samples, this cannot represent the distribution of NOTA categories in space. The model needs to make full use of the syntactic information and word meaning information learned in the pre-training stage to distinguish the NOTA category and the support sample category in the embedding space. However, previous fine-tuning methods mainly focus on optimizing the extra classifiers (on top of pre-trained language models (PLMs)) and neglect the connection between pre-training objectives and downstream tasks. In this paper, we propose the commonsense knowledge-aware prompt tuning (CKPT) method for a few-shot NOTA relation classification task. First, a simple and effective prompt-learning method is developed by constructing relation-oriented templates, which can further stimulate the rich knowledge distributed in PLMs to better serve downstream tasks. Second, external knowledge is incorporated into the model by a label-extension operation, which forms knowledgeable prompt tuning to improve and stabilize prompt tuning. Third, to distinguish the NOTA pairs and positive pairs in embedding space more accurately, a learned scoring strategy is proposed, which introduces a learned threshold classification function and improves the loss function by adding a new term focused on NOTA identification. Experiments on two widely used benchmarks (FewRel 2.0 and Few-shot TACRED) show that our method is a simple and effective framework, and a new state of the art is established in the few-shot classification field. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

20. DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering.

Author: Zhou, Shuohua and Zhang, Yanping
Abstract: With the outbreak of COVID-19 that has prompted an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation tasks have become an important part of natural language processing (NLP). However, there are limited samples of medical questions and answers, and the question generation systems cannot fully meet the needs of non-professionals for medical questions. In this research, we propose a BERT medical pretraining model, using GPT-2 for question augmentation and T5-Small for topic extraction, calculating the cosine similarity of the extracted topic and using XGBoost for prediction. With augmentation using GPT-2, the prediction accuracy of our model outperforms the state-of-the-art (SOTA) model performance. Our experiment results demonstrate the outstanding performance of our model in medical question answering and question generation tasks, and its great potential to solve other biomedical question answering challenges. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

21. Universal Adversarial Attack via Conditional Sampling for Text Classification.

Author: Zhang, Yu, Shao, Kun, Yang, Junan, and Liu, Hui
Subjects: NATURAL language processing, COMPUTER vision, CLASSIFICATION
Abstract: Despite deep neural networks (DNNs) having achieved impressive performance in various domains, it has been revealed that DNNs are vulnerable in the face of adversarial examples, which are maliciously crafted by adding human-imperceptible perturbations to an original sample to cause the wrong output by the DNNs. Encouraged by numerous researches on adversarial examples for computer vision, there has been growing interest in designing adversarial attacks for Natural Language Processing (NLP) tasks. However, the adversarial attacking for NLP is challenging because text is discrete data and a small perturbation can bring a notable shift to the original input. In this paper, we propose a novel method, based on conditional BERT sampling with multiple standards, for generating universal adversarial perturbations: input-agnostic of words that can be concatenated to any input in order to produce a specific prediction. Our universal adversarial attack can create an appearance closer to natural phrases and yet fool sentiment classifiers when added to benign inputs. Based on automatic detection metrics and human evaluations, the adversarial attack we developed dramatically reduces the accuracy of the model on classification tasks, and the trigger is less easily distinguished from natural text. Experimental results demonstrate that our method crafts more high-quality adversarial examples as compared to baseline methods. Further experiments show that our method has high transferability. Our goal is to prove that adversarial attacks are more difficult to detect than previously thought and enable appropriate defenses. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

22. From General Language Understanding to Noisy Text Comprehension.

Author: Kasthuriarachchy, Buddhika, Chetty, Madhu, Shatte, Adrian, and Walls, Darren
Subjects: SURGICAL gloves, SOCIAL media, COLLECTIVE representation, PREDICTION models, MICROBLOGS
Abstract: Obtaining meaning-rich representations of social media inputs, such as Tweets (unstructured and noisy text), from general-purpose pre-trained language models has become challenging, as these inputs typically deviate from mainstream English usage. The proposed research establishes effective methods for improving the comprehension of noisy texts. For this, we propose a new generic methodology to derive a diverse set of sentence vectors combining and extracting various linguistic characteristics from latent representations of multi-layer, pre-trained language models. Further, we clearly establish how BERT, a state-of-the-art pre-trained language model, comprehends the linguistic attributes of Tweets to identify appropriate sentence representations. Five new probing tasks are developed for Tweets, which can serve as benchmark probing tasks to study noisy text comprehension. Experiments are carried out for classification accuracy by deriving the sentence vectors from GloVe-based pre-trained models and Sentence-BERT, and by using different hidden layers from the BERT model. We show that the initial and middle layers of BERT have better capability for capturing the key linguistic characteristics of noisy texts than its latter layers. With complex predictive models, we further show that the sentence vector length has lesser importance to capture linguistic information, and the proposed sentence vectors for noisy texts perform better than the existing state-of-the-art sentence vectors. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

23. A Sequential and Intensive Weighted Language Modeling Scheme for Multi-Task Learning-Based Natural Language Understanding.

Author: Son, Suhyune, Hwang, Seonjeong, Bae, Sohyeun, Park, Soo Jun, Choi, Jang-Hwan, and Mozgovoy, Maxim
Subjects: NATURAL languages, SEQUENTIAL learning, NATURAL language processing
Abstract: Multi-task learning (MTL) approaches are actively used for various natural language processing (NLP) tasks. The Multi-Task Deep Neural Network (MT-DNN) has contributed significantly to improving the performance of natural language understanding (NLU) tasks. However, one drawback is that confusion about the language representation of various tasks arises during the training of the MT-DNN model. Inspired by the internal-transfer weighting of MTL in medical imaging, we introduce a Sequential and Intensive Weighted Language Modeling (SIWLM) scheme. The SIWLM consists of two stages: (1) Sequential weighted learning (SWL), which trains a model to learn entire tasks sequentially and concentrically, and (2) Intensive weighted learning (IWL), which enables the model to focus on the central task. We apply this scheme to the MT-DNN model and call this model the MTDNN-SIWLM. Our model achieves higher performance than the existing reference algorithms on six out of the eight GLUE benchmark tasks. Moreover, our model outperforms MT-DNN by 0.77 on average on the overall task. Finally, we conducted a thorough empirical investigation to determine the optimal weight for each GLUE task. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

24. Using BiLSTM Networks for Context-Aware Deep Sensitivity Labelling on Conversational Data.

Author: Pogiatzis, Antreas and Samakovitis, Georgios
Subjects: DATA privacy, RANDOM fields
Abstract: Information privacy is a critical design feature for any exchange system, with privacy-preserving applications requiring, most of the time, the identification and labelling of sensitive information. However, privacy and the concept of "sensitive information" are extremely elusive terms, as they are heavily dependent upon the context they are conveyed in. To accommodate such specificity, we first introduce a taxonomy of four context classes to categorise relationships of terms with their textual surroundings by meaning, interaction, precedence, and preference. We then propose a predictive context-aware model based on a Bidirectional Long Short Term Memory network with Conditional Random Fields (BiLSTM + CRF) to identify and label sensitive information in conversational data (multi-class sensitivity labelling). We train our model on a synthetic annotated dataset of real-world conversational data categorised in 13 sensitivity classes that we derive from the P3P standard. We parameterise and run a series of experiments featuring word and character embeddings and introduce a set of auxiliary features to improve model performance. Our results demonstrate that the BiLSTM + CRF model architecture with BERT embeddings and WordShape features is the most effective (F1 score 96.73%). Evaluation of the model is conducted under both temporal and semantic contexts, achieving a 76.33% F1 score on unseen data and outperforms Google's Data Loss Prevention (DLP) system on sensitivity labelling tasks. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

25. Paraphrase Identification with Lexical, Syntactic and Sentential Encodings.

Author: Xu, Sheng, Shen, Xingfa, Fukumoto, Fumiyo, Li, Jiyi, Suzuki, Yoshimi, and Nishizaki, Hiromitsu
Subjects: PARAPHRASE, CONTEXTUAL learning, ENCODING
Abstract: Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural-based learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs as well as lexical and sentential encodings by Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. The experimental results by using the two benchmark datasets, Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP) show that the improvement compared with the baseline, BERT sentential encodings model, was 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to performance improvement. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

25 results on '"Michael J. Hill"'

1. Autocorrelation Matrix Knowledge Distillation: A Task-Specific Distillation Method for BERT Models.

2. Machine Learning Approach for Arabic Handwritten Recognition.

3. Extracting Sentence Embeddings from Pretrained Transformer Models.

4. MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model.

5. LCAS-DetNet: A Ship Target Detection Network for Synthetic Aperture Radar Images.

6. A Historical Survey of Advances in Transformer Architectures.

7. Natural Language Processing in Knowledge-Based Support for Operator Assistance.

8. Research and Development of a Modern Deep Learning Model for Emotional Analysis Management of Text Data.

9. An Improved Nested Named-Entity Recognition Model for Subject Recognition Task under Knowledge Base Question Answering.

10. F-ALBERT: A Distilled Model from a Two-Time Distillation System for Reduced Computational Complexity in ALBERT Model.

11. EvoText: Enhancing Natural Language Generation Models via Self-Escalation Learning for Up-to-Date Knowledge and Improved Performance.

12. PrivacyGLUE: A Benchmark Dataset for General Language Understanding in Privacy Policies.

13. Evaluating Deep Learning Techniques for Natural Language Inference.

14. Ensemble-NQG-T5: Ensemble Neural Question Generation Model Based on Text-to-Text Transfer Transformer.

15. Vision-Guided Object Recognition and 6D Pose Estimation System Based on Deep Neural Network for Unmanned Aerial Vehicles towards Intelligent Logistics.

16. Compressing BERT for Binary Text Classification via Adaptive Truncation before Fine-Tuning.

17. Detecting Deceptive Utterances Using Deep Pre-Trained Neural Networks.

18. Relational Graph Convolutional Network for Text-Mining-Based Accident Causal Classification.

19. Commonsense Knowledge-Aware Prompt Tuning for Few-Shot NOTA Relation Classification.

20. DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering.

21. Universal Adversarial Attack via Conditional Sampling for Text Classification.

22. From General Language Understanding to Noisy Text Comprehension.

23. A Sequential and Intensive Weighted Language Modeling Scheme for Multi-Task Learning-Based Natural Language Understanding.

24. Using BiLSTM Networks for Context-Aware Deep Sensitivity Labelling on Conversational Data.

25. Paraphrase Identification with Lexical, Syntactic and Sentential Encodings.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

25 results on '"Michael J. Hill"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources