"Language Modeling" / Topic: deep learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Language Modeling"' showing total 2,653 results

Start Over "Language Modeling" Topic deep learning

2,653 results on '"Language Modeling"'

1. Current status and trends in large language modeling research

Author: Yaozu WANG, Qing LI, Zhangjie DAI, and Yue XU
Subjects: large language models (llms), natural language processing, chatgpt, deep learning, artificial intelligence, Mining engineering. Metallurgy, TN1-997, Environmental engineering, TA170-171
Abstract: Over the past two decades, language modeling (LM) has emerged as a primary methodology for language understanding and generation. This technology has become a cornerstone within the field of natural language processing (NLP). At its core, LM is designed to train models to predict the probability of the next word or token, thereby generating natural and fluent language. The advent of large language models (LLMs), such as Bidirectional Encoder Representations from Transformers and GPT-3, marks a significant milestone in the evolution of LM. These LLMs have left a profound impact on the field of artificial intelligence (AI) while also paving the way for advancements in other domains. This progression underscores the power and efficacy of AI, illustrating how the landscape of AI research has been reshaped by the rapid advancement of LLMs. This paper provides a comprehensive review of the evolution of LLMs, focusing on the technical architecture, model scale, training methods, optimization techniques, and evaluation metrics. Language models have evolved significantly over time, starting from initial statistical language models, moving onto neural network-based models, and now embracing the era of advanced pre-trained language models. As the scale of these models has expanded, so has their performance in language understanding and generation. This has led to notable results across various sectors, including education, healthcare, finance, and industry. However, the application of LLMs also presents certain challenges, such as data quality, model generalization capabilities, and computational resources. This paper delves into these issues, providing an analysis of the strengths and limitations of LLMs. Furthermore, the rise of LLMs has sparked a series of ethical, privacy, and security concerns. For instance, LLMs may generate discriminatory, false, or misleading information, infringe on personal privacy, or even be exploited for malicious activities such as cyber-attacks. To tackle these issues, this paper explores relevant technical measures, such as model interpretability, privacy protection, and security assessments. Ultimately, the paper outlines potential future research trends of LLMs. With ongoing enhancements to model scale and efficiency, LLMs are expected to play an even greater role in multimodal processing and societal impact. For example, by integrating information from different modalities, such as images and sound, LLMs can better understand and generate language. Additionally, they can be employed for societal impact assessment, providing support for policy formulation and decision-making. By thoroughly analyzing the current state of research and potential future directions, this paper aims to offer researchers valuable insights and inspiration regarding LLMs, thereby fostering further advancement in the field.
Published: 2024
Full Text: View/download PDF

2. Sentiment-based masked language modeling for improving sentence-level valence–arousal prediction

Author: Wu, Jheng-Long and Chung, Wei-Yi
Published: 2022
Full Text: View/download PDF

3. Deep-Learning Language-Modeling Approach for Automated, Personalized, and Iterative Radiology-Pathology Correlation.

Author: Filice RW
Subjects: Artificial Intelligence, Automation, Databases, Factual, Forecasting, Humans, Pathology, Clinical, Radiology methods, Radiology trends, Deep Learning trends, Natural Language Processing, Precision Medicine trends, Quality Improvement, Radiology Information Systems statistics & numerical data
Abstract: Purpose: Radiology-pathology correlation has long been foundational to continuing education, peer learning, quality assurance, and multidisciplinary patient care. The objective of this study was to determine whether modern deep-learning language-modeling techniques could reliably match pathology reports to pertinent radiology reports., Methods: The recently proposed Universal Language Model Fine-Tuning for Text Classification methodology was used. Two hundred thousand radiology and pathology reports were used for adaptation to the radiology-pathology space. One hundred thousand candidate radiology-pathology pairs, evenly split into match and no-match categories, were used for training the final binary classification model. Matches were defined by a previous-generation artificial intelligence anatomic concept radiology-pathology correlation system., Results: The language model rapidly adapted very closely to the prior anatomic concept-matching approach, with 100% specificity, 65.1% sensitivity, and 73.7% accuracy. For comparison, the previous methodology, which was intentionally designed to be specific at the expense of sensitivity, had 98.0% specificity, 65.1% sensitivity, and 73.2% accuracy., Conclusions: Modern deep-learning language-modeling approaches are promising for radiology-pathology correlation. Because of their rapid adaptation to underlying training labels, these models advance previous artificial intelligence work in that they can be continuously improved and tuned to improve performance and adjust to user and site-level preference., (Copyright © 2019 American College of Radiology. Published by Elsevier Inc. All rights reserved.)
Published: 2019
Full Text: View/download PDF

4. Transformer-Based Deep Neural Language Modeling for Construct-Specific Automatic Item Generation

Author: Hommel, Björn E., Wollang, Franz-Josef M., Kotova, Veronika, Zacher, Hannes, and Schmukle, Stefan C.
Published: 2022
Full Text: View/download PDF

5. Review of Deep Learning for Language Modeling

Author: WANG Sili, ZHANG Ling, YANG Heng, LIU Wei
Subjects: deep learning, language model, neural network, pre-trained model, word embedding, Bibliography. Library science. Information resources, Agriculture
Abstract: [Purpose/Significance] Deep learning for language modeling is one of the major methods and advanced technologies to enhance language intelligence of machines at present, which has become an indispensable important technical means for automatic processing and analysis of data resources, and intelligent mining of information and knowledge. However, there are still some difficulties in using deep learning for language modeling for technology development and application service in the library and information science (LIS) field. Therefore, this study systematically reviews and reveals the research progress, technical principles, and development methods of deep learning for language modeling, with the aim at providing reliable theoretical basis and feasible methodological paths for the deep understanding and application of deep learning for language modeling for librarians and fellow practitioners. [Method/Process] The data used in this study were collected from the WOS core database, CNKI literature database, arXiv preprint repository, GitHub open-source software hosting platform and the open resources on the Internet. Based on these data, this paper first systematically investigates the background, basic feature representation algorithms, and representative application development tools of deep learning for language modeling, reveals their dynamic evolution and technical principles, and analyzes the advantages and disadvantages and applicability of each algorithm model and development tool. Second, an in-depth analysis of the possible challenging problems faced by the development and application of deep learning for language modeling was performed, and two strategic approaches to expand their application capabilities were put forward. [Results/Conclusions] The important challenges faced by the application and development of deep learning for language modeling include numerous parameters and difficulties to adjust accuracy, relying on a large amount of accurate training data, difficulties in making changes, and the intellectual property and information security issues. In the future, we will start from two aspects of specific domains and feature engineering to expand and improve the application capabilities of deep learning for language modeling. Specifically, we focus on consideration of the collection and preparation of domain data, selection of model architecture, participation of domain experts, and optimization for specific tasks, in order to ensure that the data source of the model is more reliable and secure, and the application effect is more accurate and practical. Moreover, the strategic methods for feature engineering to expand the application capabilities of deep learning for language modeling include selecting appropriate features, feature pre-processing, feature selection, and feature dimensionality reduction. These strategies can help improve the performance and efficiency of deep learning for language models, making them more suitable for specific tasks or domains. To sum up, LIS institutions should leverage the deep learning for language modeling related technologies, guided by the needs of scientific research and social development, and based on advantages of existing literature data resources and knowledge services; they should carry out innovative professional or vertical domain intelligent knowledge management and application service, and develop technology and systems with independent intellectual property rights, which is their long-term sustainable development path.
Published: 2023
Full Text: View/download PDF

6. MLM: Masked Language Modeling Using Deep Learning for Efficient Summarization of Unstructured Data

Author: Bedi, Parminder Pal Singh, Bala, Manju, Sharma, Kapil, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Swaroop, Abhishek, editor, Polkowski, Zdzislaw, editor, Correia, Sérgio Duarte, editor, and Virdee, Bal, editor
Published: 2024
Full Text: View/download PDF

7. Arabic Rumor Detection Using Contextual Deep Bidirectional Language Modeling

Author: Naelah O. Bahurmuz, Ghada A. Amoudi, Fatmah A. Baothman, Amani T. Jamal, Hanan S. Alghamdi, and Areej M. Alhothali
Subjects: Classification, deep learning, fake news, imbalanced data, machine learning, natural language processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In today’s world, news outlets have changed dramatically; newspapers are obsolete, and radio is no longer in the picture. People look for news online and on social media, such as Twitter and Facebook. Social media contributors share information and trending stories before verifying their truthfulness, thus, spreading rumors. Early identification of rumors from social media has attracted many researchers. However, a relatively smaller number of studies focused on other languages, such as Arabic. In this study, an Arabic rumor detection model is proposed. The model was built using transformer-based deep learning architecture. According to the literature, transformers are neural networks with outstanding performance in natural language processing tasks. Two transformers-based models, AraBERT and MARBERT, were employed, tested, and evaluated using three recently developed Arabic datasets. These models are extensions to the BERT, Bidirectional Encoder Representations from Transformers, a deep learning model that uses transformer architecture to learn the text representations and leverages the attention mechanism. We have also mitigated the challenges introduced by the imbalanced training datasets by employing two sampling techniques. The experimental results of our proposed approaches achieved a maximum accuracy of 0.97. This result demonstrated the effectiveness of the proposed method and outperformed other existing Arabic rumor detection methods.
Published: 2022
Full Text: View/download PDF

8. Deep dynamic neural networks for temporal language modeling in author communities

Author: Delasalles, Edouard, Lamprier, Sylvain, and Denoyer, Ludovic
Published: 2021
Full Text: View/download PDF

9. Psychological Human Traits Detection based on Universal Language Modeling

Author: Kamal El-Demerdash, Reda A. El-Khoribi, Mahmoud A. Ismail Shoman, and Sherif Abdou
Subjects: Big Five Personality Model, Personality Traits, LSTM, NLP, Text Analytics, Deep Learning, Electronic computers. Computer science, QA75.5-76.95
Abstract: Personality Traits Detection is one of the important problems as a text analytics task in Natural Language Processing (NLP). Text analytics is the process of finding out insight knowledge over written text. Although most deep learning models give high performance, they often lack interpretability. Computer Vision (CV) has been affected significantly with inductive transfer learning, however training from scratch and task-specific modifications are still wanted in many NLP techniques.This paper addresses the problem of personality traits classification. We adopted the use of the Universal Language Model Fine-Tuning (ULMFiT) in personality traits detection. The model makes use of transfer learning rather than the classical shallow methods of word embedding and proved to be the most powerful model in many NLP problems.The basic advantage of using this model is that there is no need to do feature engineering before classification. When applied to benchmark dataset, the proposed method shows a statistical accuracy improvement of about 1% compared to the state-of-the-art results for the big five personality traits.
Published: 2021
Full Text: View/download PDF

10. 深度学习语言模型的研究综述.

Author: 王思丽, 张伶, 杨恒, and 刘巍
Abstract: [Purpose/Significance] Deep learning for language modeling is one of the major methods and advanced technologies to enhance language intelligence of machines at present, which has become an indispensable important technical means for automatic processing and analysis of data resources, and intelligent mining of information and knowledge. However, there are still some difficulties in using deep learning for language modeling for technology development and application service in the library and information science (LIS) field. Therefore, this study systematically reviews and reveals the research progress, technical principles, and development methods of deep learning for language modeling, with the aim at providing reliable theoretical basis and feasible methodological paths for the deep understanding and application of deep learning for language modeling for librarians and fellow practitioners. [Method/Process] The data used in this study were collected from the WOS core database, CNKI literature database, arXiv preprint repository, GitHub open-source software hosting platform and the open resources on the Internet. Based on these data, this paper first systematically investigates the background, basic feature representation algorithms, and representative application development tools of deep learning for language modeling, reveals their dynamic evolution and technical principles, and analyzes the advantages and disadvantages and applicability of each algorithm model and development tool. Second, an in-depth analysis of the possible challenging problems faced by the development and application of deep learning for language modeling was performed, and two strategic approaches to expand their application capabilities were put forward. [Results/Conclusions] The important challenges faced by the application and development of deep learning for language modeling include numerous parameters and difficulties to adjust accuracy, relying on a large amount of accurate training data, difficulties in making changes, and the intellectual property and information security issues. In the future, we will start from two aspects of specific domains and feature engineering to expand and improve the application capabilities of deep learning for language modeling. Specifically, we focus on consideration of the collection and preparation of domain data, selection of model architecture, participation of domain experts, and optimization for specific tasks, in order to ensure that the data source of the model is more reliable and secure, and the application effect is more accurate and practical. Moreover, the strategic methods for feature engineering to expand the application capabilities of deep learning for language modeling include selecting appropriate features, feature pre-processing, feature selection, and feature dimensionality reduction. These strategies can help improve the performance and efficiency of deep learning for language models, making them more suitable for specific tasks or domains. To sum up, LIS institutions should leverage the deep learning for language modeling related technologies, guided by the needs of scientific research and social development, and based on advantages of existing literature data resources and knowledge services; they should carry out innovative professional or vertical domain intelligent knowledge management and application service, and develop technology and systems with independent intellectual property rights, which is their long-term sustainable development path. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

11. Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports.

Author: Elmarakeby, Haitham A., Trukhanov, Pavel S., Arroyo, Vidal M., Riaz, Irbaz Bin, Schrag, Deborah, Van Allen, Eliezer M., and Kehl, Kenneth L.
Subjects: *LANGUAGE models, *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *DEEP learning, *NATURAL language processing, *CANCER prognosis, *NON-small-cell lung carcinoma, *NEUROLINGUISTICS
Abstract: Background: Longitudinal data on key cancer outcomes for clinical research, such as response to treatment and disease progression, are not captured in standard cancer registry reporting. Manual extraction of such outcomes from unstructured electronic health records is a slow, resource-intensive process. Natural language processing (NLP) methods can accelerate outcome annotation, but they require substantial labeled data. Transfer learning based on language modeling, particularly using the Transformer architecture, has achieved improvements in NLP performance. However, there has been no systematic evaluation of NLP model training strategies on the extraction of cancer outcomes from unstructured text. Results: We evaluated the performance of nine NLP models at the two tasks of identifying cancer response and cancer progression within imaging reports at a single academic center among patients with non-small cell lung cancer. We trained the classification models under different conditions, including training sample size, classification architecture, and language model pre-training. The training involved a labeled dataset of 14,218 imaging reports for 1112 patients with lung cancer. A subset of models was based on a pre-trained language model, DFCI-ImagingBERT, created by further pre-training a BERT-based model using an unlabeled dataset of 662,579 reports from 27,483 patients with cancer from our center. A classifier based on our DFCI-ImagingBERT, trained on more than 200 patients, achieved the best results in most experiments; however, these results were marginally better than simpler "bag of words" or convolutional neural network models. Conclusion: When developing AI models to extract outcomes from imaging reports for clinical cancer research, if computational resources are plentiful but labeled training data are limited, large language models can be used for zero- or few-shot learning to achieve reasonable performance. When computational resources are more limited but labeled training data are readily available, even simple machine learning architectures can achieve good performance for such tasks. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

12. Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling.

Author: Hartout, Philip, Počuča, Bojana, Méndez-García, Celia, and Schleberger, Christian
Subjects: *DEEP learning, *MACHINE learning, *PROTEIN models, *T cell receptors, *DRUG discovery, *TYPE 1 diabetes, *PEPTIDES
Abstract: Motivation Identifying peptides associated with the major histocompability complex class II (MHCII) is a central task in the evaluation of the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the safety assessment of biologics and engineered derivatives in silico , or the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning algorithmic advances in protein language modeling have shown potential in leveraging large collections of sequence data and improve MHC presentation prediction. Results Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms increases model performance. We trained variants of the model with and without MHCII information. In both alternatives, the inclusion of peptides presented by the I-Ag7 MHC class II molecule expressed by nonobese diabetic mice enabled for the first time the accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications. Availability and implementation The source code is available at https://github.com/Novartis/AEGIS. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

13. Psychological Human Traits Detection based on Universal Language Modeling.

Author: El-Demerdash, Kamal, El-Khoribi, Reda A., Shoman, Mahmoud A. Ismail, and Abdou, Sherif
Subjects: FIVE-factor model of personality, UNIVERSAL language, NATURAL language processing, DEEP learning, COMPUTER vision, PERSONALITY, STATISTICAL accuracy
Abstract: Personality Traits Detection is one of the important problems as a text analytics task in Natural Language Processing (NLP). Text analytics is the process of finding out insight knowledge over written text. Although most deep learning models give high performance, they often lack interpretability. Computer Vision (CV) has been affected significantly with inductive transfer learning, however training from scratch and task-specific modifications are still wanted in many NLP techniques. This paper addresses the problem of personality traits classification. We adopted the use of the Universal Language Model Fine-Tuning (ULMFiT) in personality traits detection. The model makes use of transfer learning rather than the classical shallow methods of word embedding and proved to be the most powerful model in many NLP problems. The basic advantage of using this model is that there is no need to do feature engineering before classification. When applied to benchmark dataset, the proposed method shows a statistical accuracy improvement of about 1% compared to the state-of-the-art results for the big five personality traits. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

14. Progress in Neural Network Based Statistical Language Modeling

Author: Kunte, Anup Shrikant, Attar, Vahida Z., Kacprzyk, Janusz, Series Editor, Pedrycz, Witold, editor, and Chen, Shyi-Ming, editor
Published: 2020
Full Text: View/download PDF

15. Advancing neural language modeling in automatic speech recognition

Author: Irie, Kazuki, Ney, Hermann, and de Mori, Renato
Subjects: language modeling, speech recognition , language modeling , artificial neural networks , deep learning, speech recognition, deep learning, ddc:004, artificial neural networks
Abstract: Dissertation, RWTH Aachen University, 2020; Aachen 1 Online-Ressource (xi, 151 Seiten) : Illustrationen (2020). = Dissertation, RWTH Aachen University, 2020, Statistical language modeling is one of the fundamental problems in natural language processing. In the recent years, language modeling has seen great advances by active research and engineering efforts in applying artificial neural networks, especially those which are recurrent. The application of neural language models to speech recognition has now become well established and ubiquitous. Despite this impression of some degree of maturity, we claim that the full potential of the neural network based language modeling is yet to be explored. In this thesis, we further advance neural language modeling in automatic speech recognition, by investigating a number of new perspectives. From the architectural view point, we investigate the newly proposed Transformer neural net- works for language modeling application. The original model architecture proposed for machine translation is studied and modified to accommodate the specific task of language modeling. Particularly deep models with about one hundred layers are developed. We present an in-depth comparison with the state-of-the-art recurrent neural network language models based on the long short-term memory. While scaling up language modeling to larger scale datasets, the diversity of the data emerges as an opportunity and a challenge. The current state-of-the-art neural language modeling lacks a mechanism of handling diverse data from different domains for a single model to perform well across different domains. In this context, we introduce domain robust language modeling with neural networks, and propose two solutions. As a first solution, we propose a new type of adaptive mixture of experts model which is fully based on neural networks. In the second approach, we investigate knowledge distillation from multiple domain expert models, as a solution to the large model size problem seen in the first approach. Methods for practical applications of knowledge distillation to large vocabulary language modeling are proposed, and studied to a large extent. Finally, we investigate the potential of neural language models to leverage long-span cross- sentence contexts for cross-utterance speech recognition. The appropriate training method for such a scenario is under-explored in the existing works. We carry out systematic comparisons of the training methods, allowing us to achieve improvements in cross-utterance speech recognition. In the same context, we study the sequence length robustness for both recurrent neural networks based on the long short-term memory and Transformers, because such a robustness is one of the fundamental properties we wish to have, in neural networks with the ability to handle variable length contexts. Throughout the thesis, we tackle these problems through novel perspectives of neural language modeling, while keeping the traditional spirit of language modeling in speech recognition., Published by Aachen
Published: 2020
Full Text: View/download PDF

16. Transformer-Based Deep Neural Language Modeling for Construct-Specific Automatic Item Generation

Author: Hannes Zacher, Veronika Kotova, Franz-Josef Wollang, Björn Hommel, and Stefan Schmukle
Subjects: Psychometrics, Applied Mathematics, Humans, Neural Networks, Computer, Models, Theoretical, Application Reviews and Case Studies, Application Reviews and Case Studies (ARCS), automatic item generation, natural language processing, deep learning, neural networks, language modeling, General Psychology, ddc, Language, Natural Language Processing
Abstract: Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not ideal for the creation of items for non-cognitive constructs. Progress in this area has been made recently by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires. To date, such items have been produced unconditionally, without the possibility of selectively targeting personality domains. In this article, we offer a brief synopsis on past developments in natural language processing and explain why the automatic generation of construct-specific items has become attainable only due to recent technological progress. We propose that pre-trained causal transformer models can be fine-tuned to achieve this task using implicit parameterization in conjunction with conditional generation. We demonstrate this method in a tutorial-like fashion and finally compare aspects of validity in human- and machine-authored items using empirical data. Our study finds that approximately two-thirds of the automatically generated items show good psychometric properties (factor loadings above .40) and that one-third even have properties equivalent to established and highly curated human-authored items. Our work thus demonstrates the practical use of deep neural networks for non-cognitive automatic item generation.
Published: 2021

17. Feature memory-based deep recurrent neural network for language modeling.

Author: Deng, Hongli, Zhang, Lei, and Shu, Xin
Subjects: FEATURE extraction, LANGUAGE & languages, BACK propagation, RECURRENT neural networks, DEEP learning
Abstract: Recently, deep recurrent neural networks (DRNNs) have been widely proposed for language modeling. DRNNs can learn higher-level features of input data by stacking multiple recurrent layers, making them achieve better performance than single-layer models. However, due to their simple linear stacking patterns, the gradient information vanishes when it is backward propagated through too many layers. As a result, DRNNs become hard to train and their performance degrades rapidly with the number of recurrent layers increasing. To address this problem, the feature memory-based deep recurrent neural network (FMDRNN) is proposed in this paper. FMDRNN presents a new stacking pattern by introducing a special feature memory module (FM), which makes the hidden units of each layer can see and reuse all the features generated by preceding stacked layers, not just the feature from previous layer as in DRNNs. FM is like a traffic hub to provide direct connections between each two layers, and the attention network in FM controls the switch of these connections. These direct connections enable FMDRNN can alleviate the vanishing of gradient in the process of backward propagation and also make the learned features do not wash away when they reach the end of the network. FMDRNN is evaluated by performing extensive experiments on the widely used English Penn Treebank dataset and five more complex non-English language corpora. The experimental results show that FMDRNN can be effectively trained even if a larger number of layers are stacked, so that it benefits from deeper networks instead of degrading performance, and consistently achieves markedly better results than other models through deeper but thinner network. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

18. Psychological Human Traits Detection based on Universal Language Modeling

Author: Mahmoud A. Ismail Shoman, Reda A. El-Khoribi, Sherif M. Abdou, and Kamal El-Demerdash
Subjects: Feature engineering, Word embedding, Process (engineering), Computer science, 02 engineering and technology, Management Science and Operations Research, computer.software_genre, NLP, Deep Learning, Inductive transfer, 0202 electrical engineering, electronic engineering, information engineering, Big Five personality traits, Interpretability, Personality Traits, business.industry, Deep learning, 020206 networking & telecommunications, QA75.5-76.95, Computer Science Applications, Big Five Personality Model, Electronic computers. Computer science, 020201 artificial intelligence & image processing, Artificial intelligence, Text Analytics, business, Transfer of learning, LSTM, computer, Natural language processing, Information Systems
Abstract: Personality Traits Detection is one of the important problems as a text analytics task in Natural Language Processing (NLP). Text analytics is the process of finding out insight knowledge over written text. Although most deep learning models give high performance, they often lack interpretability. Computer Vision (CV) has been affected significantly with inductive transfer learning, however training from scratch and task-specific modifications are still wanted in many NLP techniques. This paper addresses the problem of personality traits classification. We adopted the use of the Universal Language Model Fine-Tuning (ULMFiT) in personality traits detection. The model makes use of transfer learning rather than the classical shallow methods of word embedding and proved to be the most powerful model in many NLP problems. The basic advantage of using this model is that there is no need to do feature engineering before classification. When applied to benchmark dataset, the proposed method shows a statistical accuracy improvement of about 1% compared to the state-of-the-art results for the big five personality traits.
Published: 2021

19. Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach.

Author: Khalid Saifullah, Muhammad Ibrahim Khan, Jamal, Suhaima, and Sarker, Iqbal H.
Subjects: CYBERBULLYING, LANGUAGE models, DEEP learning, TRANSFORMER models, SOCIAL media, NATURAL language processing, MICROBLOGS, AUTOMATIC classification
Abstract: In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, there is a notable scarcity of resources for low-resource languages like Bengali, Arabic, Tamil, etc., particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT tailored for social media texts, considering Bengali as a test case. The intelligent BullyFilterNeT system devised overcomes Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive understanding, three non-contextual embedding models GloVe, FastText, and Word2Vec are developed for feature extraction in Bengali. These embedding models are utilized in the classification models, employing three statistical models (SVM, SGD, Libsvm), and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XML-RoBERTa, DistilBERT, and BanglaBERT, respectively to overcome the limitations of earlier models. Remarkably, BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% in our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. GDP: Generic Document Pretraining to Improve Document Understanding

Author: Trivedi, Akkshita, Upadhyay, Akarsh, Mukhopadhyay, Rudrabha, Chaudhury, Santanu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barney Smith, Elisa H., editor, Liwicki, Marcus, editor, and Peng, Liangrui, editor
Published: 2024
Full Text: View/download PDF

21. Patterns Versus Characters in Subword-Aware Neural Language Modeling

Author: Takhanov, Rustem, Assylbekov, Zhenisbek, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Liu, Derong, editor, Xie, Shengli, editor, Li, Yuanqing, editor, Zhao, Dongbin, editor, and El-Alfy, El-Sayed M., editor
Published: 2017
Full Text: View/download PDF

22. Split Attention Pointer Network for Source Code Language Modeling

Author: Zhongwen Chen and Zhimin Zhou
Subjects: Source code, Computer Networks and Communications, Computer science, business.industry, Programming language, media_common.quotation_subject, Deep learning, computer.software_genre, Computer Graphics and Computer-Aided Design, Recurrent neural network, Artificial Intelligence, Pointer (computer programming), Program completion, Leverage (statistics), Language model, Artificial intelligence, business, computer, Software, media_common
Abstract: There is a growing interest in leveraging Deep Learning (DL) for automating Software Engineering tasks such as program completion. In this paper, we leverage Recurrent Neural Networks (RNNs) for Abstract Syntax Tree (AST)-based code completion. Our approach converts source code into AST nodes and a language model predicts the type and value attributes of next tokens. Our work demonstrates that the attention augmented RNN-based language models are able to understand local context and copy from recent past tokens which have never appeared in the training data set. We observed a drop of performances of both type and value predictions when using a traditional pointer network architecture for out-of-vocabulary (OoV) copying and context understanding, which we call multi-task conflict. To address this challenge, we have devised a new structure of self-attention called Split Attention, where two separate dot-product layers are applied to different parts of the history cache. Based on this structure, we propose a new network called Split Attention Pointer Network (SAPN), which is efficient and flexible in both learning local context and copying OoV tokens from history. The empirical results suggest that our model is superior in syntax-aware generation and OoV token prediction by demonstrating attention behavior similar to human programmers. The results also indicate that our model out performs previous state-of-the-art approaches by more than 6% on widely recognized program completion benchmarks.
Published: 2020
Full Text: View/download PDF

23. Deep-Learning Language-Modeling Approach for Automated, Personalized, and Iterative Radiology-Pathology Correlation

Author: Ross W. Filice
Subjects: Pathology, medicine.medical_specialty, Databases, Factual, Computer science, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 030218 nuclear medicine & medical imaging, Automation, 03 medical and health sciences, Deep Learning, 0302 clinical medicine, Artificial Intelligence, Multidisciplinary approach, medicine, Humans, Radiology, Nuclear Medicine and imaging, Quality (business), Precision Medicine, Peer learning, Adaptation (computer science), Natural Language Processing, media_common, Pathology, Clinical, business.industry, Deep learning, Quality Improvement, Radiology Information Systems, Binary classification, 030220 oncology & carcinogenesis, Language model, Artificial intelligence, Radiology, business, Quality assurance, Forecasting
Abstract: Purpose Radiology-pathology correlation has long been foundational to continuing education, peer learning, quality assurance, and multidisciplinary patient care. The objective of this study was to determine whether modern deep-learning language-modeling techniques could reliably match pathology reports to pertinent radiology reports. Methods The recently proposed Universal Language Model Fine-Tuning for Text Classification methodology was used. Two hundred thousand radiology and pathology reports were used for adaptation to the radiology-pathology space. One hundred thousand candidate radiology-pathology pairs, evenly split into match and no-match categories, were used for training the final binary classification model. Matches were defined by a previous-generation artificial intelligence anatomic concept radiology-pathology correlation system. Results The language model rapidly adapted very closely to the prior anatomic concept-matching approach, with 100% specificity, 65.1% sensitivity, and 73.7% accuracy. For comparison, the previous methodology, which was intentionally designed to be specific at the expense of sensitivity, had 98.0% specificity, 65.1% sensitivity, and 73.2% accuracy. Conclusions Modern deep-learning language-modeling approaches are promising for radiology-pathology correlation. Because of their rapid adaptation to underlying training labels, these models advance previous artificial intelligence work in that they can be continuously improved and tuned to improve performance and adjust to user and site-level preference.
Published: 2019
Full Text: View/download PDF

24. Neural Language Modeling for Molecule Generation

Author: Adilov S
Subjects: Matching (statistics), Artificial neural network, business.industry, Computer science, Deep learning, Benchmarking, Machine learning, computer.software_genre, chEMBL, Recurrent neural network, Artificial intelligence, Language model, business, computer, Generative grammar
Abstract: Generative neural networks have shown promising results in de novo drug design. Recent studies suggest that one of the efficient ways to produce novel molecules matching target properties is to model SMILES sequences using deep learning in a way similar to language modeling in natural language processing. In this paper, we present a survey of various machine learning methods for SMILES-based language modeling and propose our benchmarking results on a standardized subset of ChEMBL database.
Published: 2021
Full Text: View/download PDF

25. Cultural Understanding Using In-context Learning and Masked Language Modeling

Author: Charles Newton, Davis Qian, and Ming Qian
Subjects: Descriptive knowledge, Computer science, business.industry, Deep learning, Variety (linguistics), computer.software_genre, Schema (psychology), Language technology, Language model, Artificial intelligence, business, computer, Natural language processing, Generative grammar, Gesture
Abstract: With the rapid advancement of natural language processing (NLP) as a sub-field of artificial intelligence (AI), a number of unsupervised pre-trained language models trained on large corpus have become available (e.g. BERT and GPT-3). While these models have tremendous linguistic knowledge, a lot of other types of knowledge are embedded in them as well. We perform cross-culture analysis experiments using AI-based Masked Language Modeling (MLM) and GPT-based Generative Language Modeling (In-context learning modeling). The designed approach is to set up a cultural context in sentences with masked words (for MLM) or in a human-prompted text segment (for GPT-based NLG). Consequently, the predicted masked words or the machine generated stories will reflect measurable intercultural differences because language models are trained on different corpus in different languages, and on English corpus containing a significant amount of knowledge on foreign cultures. We show a variety of examples: geopolitical knowledge, holidays, gestures, customs, social norms, emotion schema, role schema, procedure schema, and emotion change detection based on a diplomatic speech. The deep learning neural network model encodes its knowledge in the weights of a neural network instead of as organized semantic concepts. The model can reflect biases brought in by the training data and can give us inaccurate or faulty answers. Overall, with the rapid advancement of language technology, pre-trained language models have grown more powerful, and have great potential to serve as a culturalization tool.
Published: 2021
Full Text: View/download PDF

26. Deep neural language modeling enables functional protein generation across families

Author: Nikhil Naik, Caiming Xiong, Ali Madani, Zachary Z. Sun, Subu Subramanian, Richard Socher, Jose L. Olmos, Ben Krause, James S. Fraser, Eric R. Greene, Benjamin P. Mohr, and James M. Holton
Subjects: chemistry.chemical_classification, Protein family, business.industry, Deep learning, Computational biology, Biology, Amino acid, White (mutation), chemistry, Chorismate mutase, Identity (object-oriented programming), Artificial intelligence, Language model, business, Natural language
Abstract: Bypassing nature’s evolutionary trajectory,de novoprotein generation—defined as creating artificial protein sequences from scratch—could enable breakthrough solutions for biomedical and environmental challenges. Viewing amino acid sequences as a language, we demonstrate that a deep learning-based language model can generate functional artificial protein sequences across families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. Our protein language model is trained by simply learning to predict the next amino acid for over 280 million protein sequences from thousands of protein families, without biophysical or coevolutionary modeling. We experimentally evaluate model-generated artificial proteins on five distinct antibacterial lysozyme families. Artificial proteins show similar activities and catalytic efficiencies as representative natural lysozymes, including hen egg white lysozyme, while reaching as low as 44% identity to any known naturally-evolved protein. The X-ray crystal structure of an enzymatically active artificial protein recapitulates the conserved fold and positioning of active site residues found in natural proteins. We demonstrate our language model’s ability to be adapted to different protein families by accurately predicting the functionality of artificial chorismate mutase and malate dehydrogenase proteins. These results indicate that neural language models successfully performde novoprotein generation across protein families and may prove to be a tool to shortcut evolution.
Published: 2021
Full Text: View/download PDF

27. Sequential estimation techniques and application to multiple speaker tracking and language modeling

Author: Oualil, Youssef and Klakow, Dietrich
Subjects: language modeling, microphone arrays, deep learning, Bayesian estimation, neural networks, machine learning, multiple speaker tracking, ddc:400, ddc:510, ddc:620, signal processing
Abstract: For many real-word applications, the considered data is given as a time sequence that becomes available in an orderly fashion, where the order incorporates important information about the entities of interest. The work presented in this thesis deals with two such cases by introducing new sequential estimation solutions. More precisely, we introduce a: I. Sequential Bayesian estimation framework to solve the multiple speaker localization, detection and tracking problem. This framework is a complete pipeline that includes 1) new observation estimators, which extract a fixed number of potential locations per time frame; 2) new unsupervised Bayesian detectors, which classify these estimates into noise/speaker classes and 3) new Bayesian filters, which use the speaker class estimates to track multiple speakers. This framework was developed to tackle the low overlap detection rate of multiple speakers and to reduce the number of constraints generally imposed in standard solutions. II. Sequential neural estimation framework for language modeling, which overcomes some of the shortcomings of standard approaches through merging of different models in a hybrid architecture. That is, we introduce two solutions that tightly merge particular models and then show how a generalization can be achieved through a new mixture model. In order to speed-up the training of large vocabulary language models, we introduce a new extension of the noise contrastive estimation approach to batch training. Bei vielen Anwendungen kommen Daten als zeitliche Sequenz vor, deren Reihenfolge wichtige Informationen über die betrachteten Entitäten enthält. In der vorliegenden Arbeit werden zwei derartige Fälle bearbeitet, indem neue sequenzielle Schätzverfahren eingeführt werden: I. Ein Framework für ein sequenzielles bayessches Schätzverfahren zur Lokalisation, Erkennung und Verfolgung mehrerer Sprecher. Es besteht aus 1) neuen Beobachtungsschätzern, welche pro Zeitfenster eine bestimmte Anzahl möglicher Aufenthaltsorte bestimmen; 2) neuen, unüberwachten bayesschen Erkennern, die diese Abschätzungen nach Sprechern/Rauschen klassifizieren und 3) neuen bayesschen Filtern, die Schätzungen aus der Sprecher-Klasse zur Verfolgung mehrerer Sprecher verwenden. Dieses Framework wurde speziell zur Verbesserung der i.A. niedrigen Erkennungsrate bei gleichzeitig Sprechenden entwickelt und benötigt weniger Randbedingungen als Standardlösungen. II. Ein sequenzielles neuronales Vorhersageframework für Sprachmodelle, das einige Nachteile von Standardansätzen durch das Zusammenführen verschiedener Modelle in einer Hybridarchitektur beseitigt. Konkret stellen wir zwei Lösungen vor, die bestimmte Modelle integrieren, und leiten dann eine Verallgemeinerung durch die Verwendung eines neuen Mischmodells her. Um das Trainieren von Sprachmodellen mit sehr großem Vokabular zu beschleunigen, wird eine Erweiterung des rauschkontrastiven Schätzverfahrens für Batch-Training vorgestellt.
Published: 2017
Full Text: View/download PDF

28. The Go Transformer: Natural Language Modeling for Game Play

Author: David Noever, Josh Kalin, and Matthew Ciolino
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, 0209 industrial biotechnology, Computer Science - Computation and Language, business.industry, Computer science, Deep learning, ComputingMilieux_PERSONALCOMPUTING, 02 engineering and technology, Machine Learning (cs.LG), Visualization, 020901 industrial engineering & automation, Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, Championship, business, Computation and Language (cs.CL), Natural language, Transformer (machine learning model)
Abstract: This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project's game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6)., 8 Pages, 5 Figures, 1 Table, IEEE Format, Ai4i 2020
Published: 2020
Full Text: View/download PDF

29. Transformer-based temporal sequence learners for arrhythmia classification.

Author: Varghese A, Kamal S, and Kurian J
Subjects: Humans, Datasets as Topic, Arrhythmias, Cardiac classification, Arrhythmias, Cardiac diagnosis, Electrocardiography, Deep Learning
Abstract: An electrocardiogram (ECG) plays a crucial role in identifying and classifying cardiac arrhythmia. Traditional methods employ handcrafted features, and more recently, deep learning methods use convolution and recursive structures to classify heart signals. Considering the time sequence nature of the ECG signal, a transformer-based model with its high parallelism is proposed to classify ECG arrhythmia. The DistilBERT transformer model, pre-trained for natural language processing tasks, is used in the proposed work. The signals are denoised and then segmented around the R peak and oversampled to get a balanced dataset. The input embedding step is skipped, and only positional encoding is done. The final probabilities are obtained by adding a classification head to the transformer encoder output. The experiments on the MIT-BIH dataset show that the suggested model is excellent in classifying various arrhythmias. The model achieved 99.92% accuracy, 0.99 precision, sensitivity, and F1 score on the augmented dataset with a ROC-AUC score of 0.999., (© 2023. International Federation for Medical and Biological Engineering.)
Published: 2023
Full Text: View/download PDF

30. Diagnosing the benign paroxysmal positional vertigo via 1D and deep-learning composite model.

Author: Wu P, Liu X, Dai Q, Yu J, Zhao J, Yu F, Liu Y, Gao Y, Li H, and Li W
Subjects: Humans, Benign Paroxysmal Positional Vertigo diagnosis, Artificial Intelligence, Semicircular Canals, Deep Learning, Nystagmus, Pathologic etiology
Abstract: Background: Benign Paroxysmal Positional Vertigo (BPPV) is the leading cause of vertigo, and its characteristic nystagmus induced by positional maneuvers makes it a good model for Artificial Intelligence (AI) diagnosis. However, during the testing procedure, up to 10 min of indivisible long-range temporal correlation data are produced, making the AI-informed real-time diagnosing unlikely in clinical practice., Methods: A combined 1D and Deep-Learning (DL) composite model was proposed. Two separate cohorts were recruited, with one for model generation and the other for evaluation of model's real-world generalizability. Eight features, including two head traces and three eye traces and their corresponding slow phase velocity (SPV) value, were served as the inputs. Three candidate models were tested, and a sensitivity study was conducted to determine the saliently important features., Results: The study included 2671 patients in the training cohort and 703 in the test cohort. A hybrid DL model achieved a micro-area under the receiver operating curve (AUROC) of 0.982 (95% CI 0.965, 0.994) and macro-AUROC of 0.965 (95% CI 0.898, 0.999) for overall classification. The highest accuracy was observed for right posterior BPPV, with an AUROC of 0.991 (95% CI 0.972, 1.000), followed by left posterior BPPV, with an AUROC of 0.979 (95% CI 0.940, 0.998), the lowest AUROC was 0.928 (95% CI 0.878, 0.966) for lateral BPPV. The SPV was consistently identified as the most predictive feature in the models. If the model process is carried out 100 times for a 10-min data, one single running takes 0.79 ± 0.06 s., Conclusion: This study designed DL models which can accurately detect and categorize the subtype of BPPV, enabling a quick and straightforward diagnosis of BPPV in clinical setting. The critical feature identified in the model helps expand our understanding of this disorder., (© 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany.)
Published: 2023
Full Text: View/download PDF

31. 6VecLM: Language Modeling in Vector Space for IPv6 Target Generation

Author: Gaopeng Gou, Gang Xiong, Tianyu Cui, Wei Xia, and Junzheng Shi
Subjects: Sequence, IPv6 address, Address space, Computer science, business.industry, Deep learning, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Field (computer science), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Language model, Artificial intelligence, business, computer, Word (computer architecture), Transformer (machine learning model)
Abstract: Fast IPv6 scanning is challenging in the field of network measurement as it requires exploring the whole IPv6 address space but limited by current computational power. Researchers propose to obtain possible active target candidate sets to probe by algorithmically analyzing the active seed sets. However, IPv6 addresses lack semantic information and contain numerous addressing schemes, leading to the difficulty of designing effective algorithms. In this paper, we introduce our approach 6VecLM to explore achieving such target generation algorithms. The architecture can map addresses into a vector space to interpret semantic relationships and uses a Transformer network to build IPv6 language models for predicting address sequence. Experiments indicate that our approach can perform semantic classification on address space. By adding a new generation approach, our model possesses a controllable word innovation capability compared to conventional language models. The work outperformed the state-of-the-art target generation algorithms on two active address datasets by reaching more quality candidate sets.
Published: 2021
Full Text: View/download PDF

32. Deep Learning for Language Modeling of the Croatian Language

Author: Vidak, Sven and Šnajder, Jan
Subjects: optical character recognition correction, ispravljanje optičkog raspoznavanja znakova, language modeling, obrada prirodnog jezika, povratne neuronske mreže, geneiranje teksta, natural lanugage processing, TEHNIČKE ZNANOSTI. Računarstvo, deep learning, TECHNICAL SCIENCES. Computing, text generation, recurrent neural network, duboko učenje, jezično modeliranje
Abstract: Jezično modeliranje jedan je od osnovnih problema unutar područja obrade prirodnog jezika. Sposobnost računala da samostalno generira ili razumije tekst omogućava da mobitelom ili računalom upravljavmo pomoću glasa, ali i da komu- niciramo s njima. Pojavom dubokog učenja i povećanjem interesa znanstvenika za područje neuronskih mreža došlo je do razvoja metoda koje su omogućile izni- man napredak na području jezičnog modeliranja, ali i drugih područja. U okviru ovog rada proučeni su tipovi neuronskih mreža koje se koriste za problem je- zičnog modeliranja, načini njihovog učenja i neke naprednije metode koje daju danas najbolje poznate rezultate. Dana je usporedba tih metoda s klasičnim me- todama jezičnog modeliranja koje se temelje ne statistici te su opisani problemi koje te metode imaju i dani su savjeti za rješavanje tih problema. Naučeni modeli primijenjeni su na problem samostalnog generiranja teksta te ispravljanja krivo prepoznatog teksta tokom postupka optičkog prepoznavanja. Dobiveni rezultati su očekivani, ali svakako mogu biti bolji ako se uloži više vremena u proces učenja neuronskih mreža. Language modeling is one of the basic problem in natural language processing. Ability of the computers to generate or understand a text (or a sound) enables us to communicate with them or to instruct them to achieve some goal. Deep learning and greater interest in neural networks led to the development of new and efficient techniques that achieve significant progress in language modeling and other areas. This thesis gave an overview of neural networks used for language modeling problems, algorithms used to train them and some advanced techniques that give state of the art results. Comparison of neural networks and standard language modeling techbiques is givven and it is also shown that neural networks have some problems that can be solved using advanced models. Learnt models are used for text generation and optical character recognition correction. Obtained results are expected, although not best possible which can be achieved with more training time.
Published: 2016

33. AI Sensing for Robotics using Deep Learning based Visual and Language Modeling

Author: Kameshwar Rao Jv and yuvaram singh
Subjects: Closed captioning, business.industry, Computer science, Deep learning, Robotics, Language model, Artificial intelligence, Modular design, Autonomous system (mathematics), Semantics, business, Convolutional neural network
Abstract: An artificial intelligence(AI) system should be capable of processing the sensory inputs to extract both task-specific and general information about its environment. However, most of the existing algorithms extract only task specific information. In this work, an innovative approach to address the problem of processing visual sensory data is presented by utilizing convolutional neural network (CNN). It recognizes and represents the physical and semantic nature of the surrounding in both human readable and machine processable format. This work utilizes the image captioning model to capture the semantics of the input image and a modular design to generate a probability distribution for semantic topics. It gives any autonomous system the ability to process visual information in a human-like way and generates more insights which are hardly possible with a conventional algorithm. Here a model and data collection method are proposed.
Published: 2020
Full Text: View/download PDF

34. New achievement of DeepO Band: Making Santour music by artificial intelligence using language modeling and deep learning

Author: Ali Olyaei Torqabeh, Amir Hossein Sedghi, and Mohammad Hasan Olyaei Torqabeh
Subjects: Computer science, business.industry, Deep learning, Language model, Artificial intelligence, business
Published: 2018
Full Text: View/download PDF

35. New achievement of DeepO Band: Composing Persian Modern poetry by Artificial Intelligence Using Language Modeling and Deep Learning

Author: Mohammad Hasan Olyaei Torqabeh, Hosein Olyaei Torqabeh, and Ali Olyaei Torqabeh
Subjects: Poetry, business.industry, Computer science, Deep learning, language, Language model, Artificial intelligence, business, Linguistics, language.human_language, Persian
Published: 2018
Full Text: View/download PDF

36. Cyberbullying Text Identification based on Deep Learning and Transformer-based Language Models

Author: Khalid Saifullah, Muhammad Ibrahim Khan, Suhaima Jamal, and Iqbal H. Sarker
Subjects: Cyberbullying, large language modeling, deep learning, transformers models, natural language processing, NLP, Computer engineering. Computer hardware, TK7885-7895, Systems engineering, TA168
Abstract: In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, there is a notable scarcity of resources for low-resource languages like Bengali, Arabic, Tamil, etc., particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT tailored for social media texts, considering Bengali as a test case. The intelligent BullyFilterNeT system devised overcomes Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive understanding, three non-contextual embedding models GloVe, FastText, and Word2Vec are developed for feature extraction in Bengali. These embedding models are utilized in the classification models, employing three statistical models (SVM, SGD, Libsvm), and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XML-RoBERTa, DistilBERT, and BanglaBERT, respectively to overcome the limitations of earlier models. Remarkably, BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% in our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language.
Published: 2024
Full Text: View/download PDF

37. A neural document language modeling framework for spoken document retrieval

Author: Li-Phen Yen, Kuan-Yu Chen, and Zhen-Yu Wu
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, business.industry, Computer science, Deep learning, Context (language use), Subject (documents), 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Question answering, Language model, Artificial intelligence, Document retrieval, business, Baseline (configuration management), Computation and Language (cs.CL), computer, Natural language processing, 0105 earth and related environmental sciences
Abstract: Recent developments in deep learning have led to a significant innovation in various classic and practical subjects, including speech recognition, computer vision, question answering, information retrieval and so on. In the context of natural language processing (NLP), language representations learned by referring to autoregressive language modeling or autoencoding have shown giant successes in many downstream tasks, so the school of studies have become a major stream of research recently. Because the immenseness of multimedia data along with speech have spread around the world in our daily life, spoken document retrieval (SDR), which aims at retrieving relevant multimedia contents to satisfy users’ queries, has become an important research subject in the past decades. Targeting on enhancing the SDR performance, the paper concentrates on proposing a neural retrieval framework, which assembles the merits of using language modeling (LM) mechanism in SDR and leveraging the abstractive information learned by the language representation models. Consequently, to our knowledge, this is a pioneer study on supervised training of a neural LM-based SDR framework, especially combined with the pretrained language representation methods. A series of empirical SDR experiments conducted on a benchmark collection demonstrate the good efficacy of the proposed framework, compared to several existing strong baseline systems.
Published: 2019

38. Progress in Neural Network Based Statistical Language Modeling

Author: Anup Shrikant Kunte and Vahida Attar
Subjects: Class (computer programming), Artificial neural network, Machine translation, business.industry, Deep learning, computer.software_genre, Field (computer science), Recurrent neural network, Rule-based machine translation, Language model, Artificial intelligence, business, computer, Natural language processing
Abstract: Statistical Language Modeling (LM) is one of the central steps in many Natural Language Processing (NLP) tasks including Automatic Speech recognition (ASR), Statistical Machine Translation (SMT) , Sentence completion, Automatic Text Generation to name a few. Good Quality Language Model has been one of the key success factors for many commercial NLP applications. Since past three decades diverse research communities like psychology, neuroscience, data compression, machine translation, speech recognition, linguistics etc, have advanced research in the field of Language Modeling. First we understand the mathematical background of LM problem. Further we review various Neural Network based LM techniques in the order they were developed. We also review recent developments in Recurrent Neural Network (RNN) Based Language Models. Early LM research in ASR gave rise to commercially successful class of LMs called as N-gram LMs. These class of models were purely statistical based and lacked in utilising the linguistic information present in the text itself. With the advancement in the computing power, availability of humongous and rich sources of textual data Neural Network based LM paved their way into the arena. These techniques proved significant, since they mapped word tokens into a continuous space than treating them as discrete. As NNLM performance was proved to be comparable to existing state of the art N-gram LMs researchers also successfully used Deep Neural Network to LM. Researchers soon realised that the inherent sequential nature of textual input make LM problem a good Candidate for use of Recurrent Neural Network (RNN) architecture. Today RNN is the choice of Neural Architecture to solve LM by most practitioners. This chapter sheds light on variants of Neural Network Based LMs.
Published: 2019
Full Text: View/download PDF

39. Deep Learning Language Modeling Workloads: Where Time Goes on Graphics Processors

Author: Zissis Poulos, Andreas Moshovos, and Ali Hadi Zadeh
Subjects: Vocabulary, Computer science, business.industry, Computation, media_common.quotation_subject, Deep learning, 05 social sciences, Parallel computing, 010501 environmental sciences, 01 natural sciences, Execution time, 0502 economics and business, Softmax function, Language model, Artificial intelligence, 050207 economics, Graphics, business, 0105 earth and related environmental sciences, media_common, Transformer (machine learning model)
Abstract: Language Modeling is at the core of many natural language processing tasks. We analyze two such recent models: a Gated Convolutional Network (GCN) with five layers on the Wikitext-2 dataset and a Transformer network with 24 layers on the Google Billion Word dataset. We find that when executed on modern graphics processors, 30% - 40% of the execution time is due to the final adaptive softmax layer. Analytical modeling of the computation and memory demands of the GCN shows that this behavior will persist even if the hidden state is increased - which could be needed to improve accuracy or to support a wider vocabulary. We present variations of the adaptive softmax layer that reduce execution time for the layer by 40% and that scale better with the hidden state.
Published: 2019
Full Text: View/download PDF

40. Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Author: Artem M. Grachev, Maxim Kodryan, Dmitry I. Ignatov, and Dmitry Vetrov
Subjects: 0209 industrial biotechnology, Artificial neural network, Computer science, business.industry, Deep learning, Inference, 02 engineering and technology, Machine learning, computer.software_genre, Reduction (complexity), 020901 industrial engineering & automation, Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Relevance (information retrieval), Artificial intelligence, Language model, business, Encoder, computer
Abstract: Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural networks compression. We find this method to be especially useful in language modeling tasks, where large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying allowing to achieve even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both encoder and decoder layers can be removed with a minimal quality loss.
Published: 2019
Full Text: View/download PDF

41. On the Predictive Power of Neural Language Models for Human Real-TimeComprehension Behavior

Author: Wilcox, Ethan G., Gauthier, Jon, Hu, Jennifer, Qian, Peng, and Levy, Roger P.
Subjects: Language modeling, real-time language compre-hension, Deep learning, eye-tracking, self-paced reading
Abstract: Human reading behavior is tuned to the statistics of natural lan-guage: the time it takes human subjects to read a word can bepredicted from estimates of the word’s probability in context.However, it remains an open question what computational ar-chitecture best characterizes the expectations deployed in realtime by humans that determine the behavioral signatures ofreading. Here we test over two dozen models, independentlymanipulating computational architecture and training datasetsize, on how well their next-word expectations predict humanreading time behavior on naturalistic text corpora. Consistentwith previous work, we find that across model architecturesand training dataset sizes the relationship between word log-probability and reading time is (near-)linear. We next evalu-ate how features of these models determine their psychometricpredictive power, or ability to predict human reading behav-ior. In general, the better a model’s next-word expectations(as measured by the traditional language modeling perplexityobjective), the better its psychometric predictive power. How-ever, we find nontrivial differences in psychometric predictivepower across model architectures. For any given perplexity,deep Transformer models and n-gram models generally showsuperior psychometric predictive power over LSTM or struc-turally supervised neural models, especially for eye movementdata. Finally, we compare models’ psychometric predictivepower to the depth of their syntactic knowledge, as measuredby a battery of syntactic generalization tests developed usingmethods from controlled psycholinguistic experiments. Onceperplexity is controlled for, we find no significant relationshipbetween syntactic knowledge and predictive power. These re-sults suggest that, at least for the present state of natural lan-guage technology, different approaches may be required to bestmodel human real-time language comprehension behavior innaturalistic reading versus behavior for controlled linguisticmaterials designed for targeted probing of syntactic knowl-edge.
Published: 2020

42. Language Modeling at Scale

Author: Jiaji Huang, Kenneth Church, Md. Mostofa Ali Patwary, Milind Chabbi, Gregory Diamos, and Heewoo Jun
Subjects: FOS: Computer and information sciences, 020203 distributed computing, Training set, Theoretical computer science, Computer Science - Computation and Language, Machine translation, Zipf's law, Scale (ratio), Computer science, business.industry, Deep learning, 02 engineering and technology, computer.software_genre, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, business, Computation and Language (cs.CL), computer, Word (computer architecture), Natural language
Abstract: We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabytes (TBs) of text (trillions of words). Modern training methods are far from this goal, because of various bottlenecks, especially memory (within GPUs) and communication (across GPUs). This paper shows how Zipf's Law can address these bottlenecks by grouping parameters for common words and character sequences, because $U \ll N$, where $U$ is the number of unique words (types) and $N$ is the size of the training set (tokens). For a local batch size $K$ with $G$ GPUs and a $D$-dimension embedding matrix, we reduce the original per-GPU memory and communication asymptotic complexity from $\Theta(GKD)$ to $\Theta(GK + UD)$. Empirically, we find $U \propto (GK)^{0.64}$ on four publicly available large datasets. When we scale up the number of GPUs to 64, a factor of 8, training time speeds up by factors up to 6.7$\times$ (for character LMs) and 6.3$\times$ (for word LMs) with negligible loss of accuracy. Our weak scaling on 192 GPUs on the Tieba dataset shows a 35\% improvement in LM prediction accuracy by training on 93 GB of data (2.5$\times$ larger than publicly available SOTA dataset), but taking only 1.25$\times$ increase in training time, compared to 3 GB of the same dataset running on 6 GPUs.
Published: 2018

43. A Method for Sharing Cell State for LSTM-Based Language Model

Author: Park, Seongik, Kim, Yanggon, Kacprzyk, Janusz, Series Editor, and Lee, Roger, editor
Published: 2020
Full Text: View/download PDF

44. Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

Author: Nikolai Yakovenko, Robert M. Kirby, Bryan Catanzaro, and Raul Puri
Subjects: FOS: Computer and information sciences, Schedule, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Convolutional neural network, Machine Learning (cs.LG), CUDA, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, 0105 earth and related environmental sciences, Computer Science - Computation and Language, Artificial neural network, business.industry, Deep learning, Recurrent neural network, Scalability, 020201 artificial intelligence & image processing, Artificial intelligence, Language model, business, Computation and Language (cs.CL), computer
Abstract: Recent work has shown how to train Convolutional Neural Networks (CNNs) rapidly on large image datasets, then transfer the knowledge gained from these models to a variety of tasks. Following [Radford 2017], in this work, we demonstrate similar scalability and transfer for Recurrent Neural Networks (RNNs) for Natural Language tasks. By utilizing mixed precision arithmetic and a 32k batch size distributed across 128 NVIDIA Tesla V100 GPUs, we are able to train a character-level 4096-dimension multiplicative LSTM (mLSTM) for unsupervised text reconstruction over 3 epochs of the 40 GB Amazon Reviews dataset in four hours. This runtime compares favorably with previous work taking one month to train the same size and configuration for one epoch over the same dataset. Converging large batch RNN models can be challenging. Recent work has suggested scaling the learning rate as a function of batch size, but we find that simply scaling the learning rate as a function of batch size leads either to significantly worse convergence or immediate divergence for this problem. We provide a learning rate schedule that allows our model to converge with a 32k batch size. Since our model converges over the Amazon Reviews dataset in hours, and our compute requirement of 128 Tesla V100 GPUs, while substantial, is commercially available, this work opens up large scale unsupervised NLP training to most commercial applications and deep learning researchers. A model can be trained over most public or private text datasets overnight., 8 pages; To appear in High Performance Machine Learning Workshop (HPML) 2018
Published: 2018

45. Text Generation, Machine Translation, and Other Recurrent Language Modeling Tasks

Author: Taweh Beysolow
Subjects: Machine translation, Computer science, business.industry, Deep learning, Text generation, Language model, Artificial intelligence, computer.software_genre, business, computer, Natural language processing, Word (computer architecture)
Abstract: In Chapter 4, I introduced you to some of the more advanced deep learning and NLP techniques, and I discussed how to implement these models in some basic problems, such as mapping word vectors. Before we conclude this book, I will discuss a handful of other NLP tasks that are more domain-specific, but nonetheless useful to go through.
Published: 2018
Full Text: View/download PDF

46. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling

Author: Ralf Schlüter, Martin Sundermeyer, and Hermann Ney
Subjects: Acoustics and Ultrasonics, Artificial neural network, business.industry, Computer science, Time delay neural network, Speech recognition, Deep learning, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Machine learning, computer.software_genre, Computational Mathematics, Probabilistic neural network, Recurrent neural network, Computer Science (miscellaneous), Feedforward neural network, Language model, Artificial intelligence, Electrical and Electronic Engineering, Types of artificial neural networks, business, computer
Abstract: Language models have traditionally been estimated based on relative frequencies, using count statistics that can be extracted from huge amounts of text data. More recently, it has been found that neural networks are particularly powerful at estimating probability distributions over word sequences, giving substantial improvements over state-of-the-art count models. However, the performance of neural network language models strongly depends on their architectural structure. This paper compares count models to feedforward, recurrent, and long short-term memory (LSTM) neural network variants on two large-vocabulary speech recognition tasks. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation of the two quantities, which we find to hold regardless of the underlying type of the language model. Furthermore, neural networks incur an increased computational complexity compared to count models, and they differently model context dependences, often exceeding the number of words that are taken into account by count based approaches. These differences require efficient search methods for neural networks, and we analyze the potential improvements that can be obtained when applying advanced algorithms to the rescoring of word lattices on large-scale setups.
Published: 2015
Full Text: View/download PDF

47. Patterns Versus Characters in Subword-Aware Neural Language Modeling

Author: Rustem Takhanov and Zhenisbek Assylbekov
Subjects: Conditional random field, Root (linguistics), Perplexity, business.industry, Computer science, Deep learning, 05 social sciences, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Prefix, Character (mathematics), 0502 economics and business, Word representation, Artificial intelligence, Language model, 050207 economics, Alphabet, business, computer, Natural language, Word (computer architecture), Natural language processing, 0105 earth and related environmental sciences
Abstract: Words in some natural languages can have a composite structure. Elements of this structure include the root (that could also be composite), prefixes and suffixes with which various nuances and relations to other words can be expressed. Thus, in order to build a proper word representation one must take into account its internal structure. From a corpus of texts we extract a set of frequent subwords and from the latter set we select patterns, i.e. subwords which encapsulate information on character n-gram regularities. The selection is made using the pattern-based Conditional Random Field model [19, 23] with $l_1$ regularization. Further, for every word we construct a new sequence over an alphabet of patterns. The new alphabet’s symbols confine a local statistical context stronger than the characters, therefore they allow better representations in ${\mathbb {R}}^n$ and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform character-based analogues by 2–20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture.
Published: 2017
Full Text: View/download PDF

48. Highly Efficient and Accurate Deep Learning-Based Classification of MRI Contrast on a CPU and GPU.

Author: Gai ND
Subjects: Humans, Image Processing, Computer-Assisted methods, Magnetic Resonance Imaging, Neural Networks, Computer, Deep Learning
Abstract: Classifying MR images based on their contrast mechanism can be useful in image segmentation where additional information from different contrast mechanisms can improve intensity-based segmentation and help separate the class distributions. In addition, automated processing of image type can be beneficial in archive management, image retrieval, and staff training. Different clinics and scanners have their own image labeling scheme, resulting in ambiguity when sorting images. Manual sorting of thousands of images would be a laborious task and prone to error. In this work, we used the power of transfer learning to modify pretrained residual convolution neural networks to classify MRI images based on their contrast mechanisms. Training and validation were performed on a total of 5169 images belonging to 10 different classes and from different MRI vendors and field strengths. Time for training and validation was 36 min. Testing was performed on a different data set with 2474 images. Percentage of correctly classified images (accuracy) was 99.76%. (A deeper version of the residual network was trained for 103 min and showed slightly lower accuracy of 99.68%.) In consideration of model deployment in the real world, performance on a single CPU computer was compared with GPU implementation. Highly accurate classification, training, and testing can be achieved without use of a GPU in a relatively short training time, through proper choice of a convolutional neural network and hyperparameters, making it feasible to improve accuracy by repeated training with cumulative training sets. Techniques to improve accuracy further are discussed and demonstrated. Derived heatmaps indicate areas of image used in decision making and correspond well with expert human perception. The methods used can be easily extended to other classification tasks with minimal changes., (© 2022. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.)
Published: 2022
Full Text: View/download PDF

49. Combining deep learning and language modeling for segmentation-free OCR from raw pixels

Author: Prem Natarajan, Stephen Rawls, Ekraam Sabir, and Huaigu Cao
Subjects: Artificial neural network, business.industry, Computer science, Speech recognition, Deep learning, Feature extraction, Pattern recognition, Image segmentation, Sliding window protocol, Segmentation, Artificial intelligence, Language model, business, Hidden Markov model
Abstract: We present a simple yet effective LSTM-based approach for recognizing machine-print text from raw pixels. We use a fully-connected feed-forward neural network for feature extraction over a sliding window, the output of which is directly fed into a stacked bi-directional LSTM. We train the network using the CTC objective function and use a WFST language model during recognition. Experimental results show that this simple system outperforms extensively tuned state-of-the-art HMM models on the DARPA Arabic Machine Print corpus.
Published: 2017
Full Text: View/download PDF

50. Recurrent Neural Network based language modeling with controllable external Memory

Author: Bo-Hsiang Tseng, Hung-yi Lee, and Wei-Jen Ko
Subjects: Artificial neural network, Computer science, business.industry, Time delay neural network, Speech recognition, Deep learning, 020207 software engineering, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Memory controller, Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, Feedforward neural network, Artificial intelligence, Language model, Types of artificial neural networks, business, Auxiliary memory, 0105 earth and related environmental sciences
Abstract: It is crucial for language models to model long-term dependency in word sequences, which can be achieved to some good extent by recurrent neural network (RNN) based language models with long short-term memory (LSTM) units. To accurately model the sophisticated long-term information in human languages, large memory in language models is necessary. However, the size of RNN-based language models cannot be arbitrarily increased because the computational resources required and the model complexity will also be increase accordingly, due to the limitation of the structure. To overcome this problem, inspired from Neural Turing Machine and Memory Network, we equip RNN-based language models with controllable external memory. With a learnable memory controller, the size of the external memory is independent to the number of model parameters, so the proposed language model can have larger memory without increasing the parameters. In the experiments, the proposed model yielded lower perplexities than RNN-based language models with LSTM units on both English and Chinese corpora.
Published: 2017
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

2,653 results on '"Language Modeling"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources