"Language Modeling" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Language Modeling"' showing total 30,112 results

201. Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Author: Sicherman, Amitay and Adi, Yossi
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This work profoundly analyzes discrete self-supervised speech representations (units) through the eyes of Generative Spoken Language Modeling (GSLM). Following the findings of such an analysis, we propose practical improvements to the discrete unit for the GSLM. First, we start comprehending these units by analyzing them in three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units to phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we found redundancies in the extracted units and claim that one reason may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of units' clustering and show significant improvement considering zero-resource speech metrics such as ABX. Code and analysis tools are available under the following link: https://github.com/slp-rl/SLM-Discrete-Representations, Comment: Accepted at ICASSP 2023
Published: 2023

202. The Observed T Cell Receptor Space database enables paired-chain repertoire mining, coherence analysis, and language modeling

Author: Raybould, Matthew I.J., Greenshields-Watson, Alexander, Agarwal, Parth, Aguilar-Sanjuan, Broncio, Olsen, Tobias H., Turnbull, Oliver M., Quast, Nele P., and Deane, Charlotte M.
Published: 2024

203. Sentiment-based masked language modeling for improving sentence-level valence–arousal prediction

Author: Wu, Jheng-Long and Chung, Wei-Yi
Published: 2022

204. Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Author: Fu, Daniel Y., Dao, Tri, Saab, Khaled K., Thomas, Armin W., Rudra, Atri, and Ré, Christopher
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2× speedup on the long-range arena benchmark and allows hybrid language models to generate text 2.4× faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 2.7B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark., Comment: ICLR 2023 Camera-Ready (Notable-top-25% / Spotlight)
Published: 2022

205. Quantum State Tomography Inspired by Language Modeling

Author: Zhong, Lu, Guo, Chu, and Wang, Xiaoting
Subjects: Quantum Physics
Abstract: Quantum state tomography is an elementary tool to fully characterize an unknown quantum state. As the quantum hardware scales up in size, the standard quantum state tomography becomes increasingly challenging due to its exponentially growing complexity. In this work, we propose a scalable solution by considering state tomography as a language modeling task, where the unknown quantum state is treated as an unknown language, the correlation of the quantum state is interpreted as the semantic information specific to this language, and the measurement outcomes are simply the text instances generated from the language. Based on a customized transformer model from language modeling, we demonstrate that our method can accurately reconstruct prototypical pure and mixed quantum states using less samples than state-of-the-art methods. More importantly, our method can reconstruct a class of similar states simultaneously, in comparison with the existing neural network methods that need to train a model for each unknown state.
Published: 2022

206. Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning

Author: Shin, Kyuyong, Kwak, Hanock, Kim, Wonjae, Jeong, Jisu, Jung, Seungjae, Kim, Kyung-Min, Ha, Jung-Woo, and Lee, Sang-Woo
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services., Comment: ACL 2023 main conference
Published: 2022

207. ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

Author: Fang, Shancheng, Mao, Zhendong, Xie, Hongtao, Wang, Yuxin, Yan, Chenggang, and Zhang, Yongdong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how to effectively model the linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. Firstly, the autonomous suggests enforcing explicitly language modeling by decoupling the recognizer into vision model and language model and blocking gradient flow between both models. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for the language model which can effectively alleviate the impact of noise input. Finally, to polish ABINet++ in long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module which integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments especially on low-quality images. Besides, extensive experiments including in English and Chinese also prove that, a text spotter that incorporates our language modeling method can significantly improve its performance both in accuracy and speed compared with commonly used attention-based recognizers., Comment: Accepted by TPAMI. 
Code is available at https://github.com/FangShancheng/ABINet-PP. arXiv admin note: substantial text overlap with arXiv:2103.06495 (conference version)
Published: 2022

208. Audio Language Modeling using Perceptually-Guided Discrete Representations

Author: Kreuk, Felix, Taigman, Yaniv, Polyak, Adam, Copet, Jade, Synnaeve, Gabriel, Défossez, Alexandre, and Adi, Yossi
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this work, we study the task of Audio Language Modeling, in which we aim at learning probabilistic models for audio that can be used for generation and completion. We use a state-of-the-art perceptually-guided audio compression model, to encode audio to discrete representations. Next, we train a transformer-based causal language model using these representations. At inference time, we perform audio auto-completion by encoding an audio prompt as a discrete sequence, feeding it to the audio language model, sampling from the model, and synthesizing the corresponding time-domain signal. We evaluate the quality of samples generated by our method on Audioset, the largest dataset for general audio to date, and show that it is superior to the evaluated baseline audio encoders. We additionally provide an extensive analysis to better understand the trade-off between audio-quality and language-modeling capabilities.
Published: 2022

209. Cyberbullying text identification: A deep learning and transformer-based language modeling approach

Author: Saifullah, Khalid, Khan, Muhammad Ibrahim, Jamal, Suhaima, and Sarker, Iqbal H.
Abstract: In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, there is a notable scarcity of resources for low-resource languages like Bengali, Arabic, Tamil, etc., particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT tailored for social media texts, considering Bengali as a test case. The intelligent BullyFilterNeT system devised overcomes Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive understanding, three non-contextual embedding models GloVe, FastText, and Word2Vec are developed for feature extraction in Bengali. These embedding models are utilized in the classification models, employing three statistical models (SVM, SGD, Libsvm), and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT, respectively to overcome the limitations of earlier models. Remarkably, BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% in our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language.
Published: 2024

210. Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation

Author: Casini, Luca, Jonason, Nicolas, and Sturm, Bob
Abstract: The dominating approach for modeling sequences (e.g. text, music) with deep learning is the causal approach, which consists in learning to predict tokens sequentially given those preceding it. Another paradigm is masked language modeling, which consists of learning to predict the masked tokens of a sequence in no specific order, given all non-masked tokens. Both approaches can be used for generation, but the latter is more flexible for editing, e.g. changing the middle of a sequence. This paper investigates the viability of masked language modeling applied to Irish traditional music represented in the text-based format abc-notation. Our model, called abcMLM, enables a user to edit tunes in arbitrary ways while retaining similar generation capabilities to causal models. We find that generation using masked language modeling is more challenging, but leveraging additional information from a dataset, e.g., imputing musical structure, can generate sequences that are on par with previous models. Part of ISBN 978-3-031-56991-3; 978-3-031-56992-0
Published: 2024

211. The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling

Author: Öhman, Joey, Verlinden, Severine, Ekgren, Ariel, Gyllensten, Amaru Cuba, Isbister, Tim, Gogoulou, Evangelia, Carlsson, Fredrik, and Sahlgren, Magnus
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Pre-training Large Language Models (LLMs) requires massive amounts of text data, and the performance of the LLMs typically correlates with the scale and quality of the datasets. This means that it may be challenging to build LLMs for smaller languages such as Nordic ones, where the availability of text corpora is limited. In order to facilitate the development of LLMs in the Nordic languages, we curate a high-quality dataset consisting of 1.2TB of text, in all of the major North Germanic languages (Danish, Icelandic, Norwegian, and Swedish), as well as some high-quality English data. This paper details our considerations and processes for collecting, cleaning, and filtering the dataset.
Published: 2023

212. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Author: Zhang, Ziqiang, Zhou, Long, Wang, Chengyi, Chen, Sanyuan, Wu, Yu, Liu, Shujie, Chen, Zhuo, Liu, Yanqing, Wang, Huaming, Li, Jinyu, He, Lei, Zhao, Sheng, and Wei, Furu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language speech by using both the source language speech and the target language text as prompts. VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks. Experimental results show that it can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker's voice, emotion, and acoustic environment. Moreover, VALL-E X effectively alleviates the foreign accent problems, which can be controlled by a language ID. Audio samples are available at https://aka.ms/vallex., Comment: We encourage readers to listen to the audio samples on our demo page: https://aka.ms/vallex
Published: 2023

213. Capturing Topic Framing via Masked Language Modeling

Author: Guo, Xiaobo, Ma, Weicheng, and Vosoughi, Soroush
Subjects: Computer Science - Computation and Language
Abstract: Differential framing of issues can lead to divergent world views on important issues. This is especially true in domains where the information presented can reach a large audience, such as traditional and social media. Scalable and reliable measurement of such differential framing is an important first step in addressing them. In this work, based on the intuition that framing affects the tone and word choices in written language, we propose a framework for modeling the differential framing of issues through masked token prediction via large-scale fine-tuned language models (LMs). Specifically, we explore three key factors for our framework: 1) prompt generation methods for the masked token prediction; 2) methods for normalizing the output of fine-tuned LMs; 3) robustness to the choice of pre-trained LMs used for fine-tuning. Through experiments on a dataset of articles from traditional media outlets covering five diverse and politically polarized topics, we show that our framework can capture differential framing of these topics with high reliability., Comment: In Findings of EMNLP 2022
Published: 2023

214. Context-Aware Differential Privacy for Language Modeling

Author: Dinh, My H. and Fioretto, Ferdinando
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The remarkable ability of language models (LMs) has also brought challenges at the interface of AI and security. A critical challenge pertains to how much information these models retain and leak about the training data. This is particularly urgent as the typical development of LMs relies on huge, often highly sensitive data, such as emails and chat logs. To counter this shortcoming, this paper introduces Context-Aware Differentially Private Language Model (CADP-LM), a privacy-preserving LM framework that relies on two key insights: First, it utilizes the notion of context to define and audit the potentially sensitive information. Second, it adopts the notion of Differential Privacy to protect sensitive information and characterize the privacy leakage. A unique characteristic of CADP-LM is its ability to target the protection of sensitive sentences and contexts only, providing a highly accurate private model. Experiments on a variety of datasets and settings demonstrate these strengths of CADP-LM.
Published: 2023

215. Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval

Author: Ohad Rubin and Jonathan Berant
Subjects: Computational linguistics. Natural language processing, P98-98.5
Published: 2024

216. MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling

Author: Godey, Nathan, Castagné, Roman, de la Clergerie, Éric, and Sagot, Benoît
Subjects: Computer Science - Computation and Language
Abstract: Static subword tokenization algorithms have been an essential component of recent works on language modeling. However, their static nature results in important flaws that degrade the models' downstream performance and robustness. In this work, we propose MANTa, a Module for Adaptive Neural TokenizAtion. MANTa is a differentiable tokenizer trained end-to-end with the language model. The resulting system offers a trade-off between the expressiveness of byte-level models and the speed of models trained using subword tokenization. In addition, our tokenizer is highly explainable since it produces an explicit segmentation of sequences into blocks. We evaluate our pre-trained model on several English datasets from different domains as well as on synthetic noise. We find that MANTa improves robustness to character perturbations and out-of-domain data. We then show that MANTa performs comparably to other models on the general-domain GLUE benchmark. Finally, we show that it is considerably faster than strictly byte-level models., Comment: EMNLP 2022 Findings (https://aclanthology.org/2022.findings-emnlp.207/)
Published: 2022

217. Word-Level Representation From Bytes For Language Modeling

Author: Lee, Chu-Tak, Guo, Qipeng, and Qiu, Xipeng
Subjects: Computer Science - Computation and Language
Abstract: Modern language models mostly take sub-words as input, a design that balances the trade-off between vocabulary size, number of parameters, and performance. However, sub-word tokenization still has disadvantages like not being robust to noise and being difficult to generalize to new languages. Also, the current trend of scaling up models reveals that larger models require larger embeddings but that makes parallelization hard. Previous work on image classification proves splitting raw input into a sequence of chunks is a strong, model-agnostic inductive bias. Based on this observation, we rethink the existing character-aware method that takes character-level inputs but makes word-level sequence modeling and prediction. We overhaul this method by introducing a cross-attention network that builds word-level representation directly from bytes, and a sub-word level prediction based on word-level hidden states to avoid the time and space requirement of word-level prediction. With these two improvements combined, we have a token free model with slim input embeddings for downstream tasks. We name our method Byte2Word and perform evaluations on language modeling and text classification. Experiments show that Byte2Word is on par with the strong sub-word baseline BERT but only takes up 10% of embedding size. We further test our method on synthetic noise and cross-lingual transfer and find it competitive to baseline methods on both settings., Comment: preprint
Published: 2022

218. Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning

Author: Seeberger, Philipp and Riedhammer, Korbinian
Subjects: Computer Science - Computation and Language
Abstract: Social media has become an important information source for crisis management and provides quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions which still poses a challenging task. To address these challenges, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute performance gain w.r.t. F1-score of up to 10% for actionable information types. Moreover, we found that entity-masking reduces the effect of overfitting to in-domain events and enables improvements in cross-event generalization., Comment: Accepted at NLP4PI (EMNLP 2022)
Published: 2022

219. Suffix Retrieval-Augmented Language Modeling

Author: Wang, Zecheng and Tam, Yik-Cheung
Subjects: Computer Science - Computation and Language
Abstract: Causal language modeling (LM) uses word history to predict the next word. BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions. While BERT is effective in sequence encoding, it is non-causal by nature and is not designed for sequence generation. In this paper, we propose a novel language model, SUffix REtrieval-Augmented LM (SUREALM), that simulates a bi-directional contextual effect in an autoregressive manner. SUREALM employs an embedding retriever to search for training sentences in a data store that share similar word history during sequence generation. In particular, the suffix portions of the retrieved sentences mimic the "future" context. We evaluated our proposed model on the DSTC9 spoken dialogue corpus and showed promising word perplexity reduction on the validation and test set compared to competitive baselines., Comment: 5 pages, 1 figure. Submitted to ICASSP 2023
Published: 2022

220. Decoupled Context Processing for Context Augmented Language Modeling

Author: Li, Zonglin, Guo, Ruiqi, and Kumar, Sanjiv
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity. In this paper we examined a simple yet effective architecture for incorporating external context into language models based on decoupled Encoder Decoder architecture. We showed that such a simple architecture achieves competitive results on auto-regressive language modeling and open domain question answering tasks. We also analyzed the behavior of the proposed model which performs grounded context transfer. Finally we discussed the computational implications of such retrieval augmented models.
Published: 2022

221. The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection

Author: Wold, Sondre
Subjects: Computer Science - Computation and Language
Abstract: This paper studies the problem of injecting factual knowledge into large pre-trained language models. We train adapter modules on parts of the ConceptNet knowledge graph using the masked language modeling objective and evaluate the success of the method by a series of probing experiments on the LAMA probe. Mean P@K curves for different configurations indicate that the technique is effective, increasing the performance on subsets of the LAMA probe for large values of k by adding as little as 2.1% additional parameters to the original models., Comment: Camera ready version for the 16th TextGraphs workshop, located at Coling 2022
Published: 2022

222. Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution

Author: McMilin, Emily
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Modern language modeling tasks are often underspecified: for a given token prediction, many words may satisfy the user's intent of producing natural language at inference time, however only one word will minimize the task's loss function at training time. We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations. Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods, that we apply to gendered pronoun resolution tasks on a wide range of LLMs to 1) aid in the detection of inference-time task underspecification by exploiting 2) previously unreported gender vs. time and gender vs. location spurious correlations on LLMs with a range of A) sizes: from BERT-base to GPT-4 Turbo Preview, B) pre-training objectives: from masked & autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF). Code and open-source demos available at https://github.com/2dot71mily/uspec., Comment: 24 pages, 41 figures
Published: 2022

223. Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

Author: Gat, Itai, Kreuk, Felix, Nguyen, Tu Anh, Lee, Ann, Copet, Jade, Synnaeve, Gabriel, Dupoux, Emmanuel, and Adi, Yossi
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensively investigated. This work focuses on improving the robustness of discrete input representations for generative spoken language modeling. First, we formally define how to measure the robustness of such representations to various signal variations that do not alter the spoken information (e.g., time-stretch). Next, we empirically demonstrate how current state-of-the-art representation models lack robustness to such variations. To overcome this, we propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines when considering encoding and modeling metrics. We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.
Published: 2022

224. LGDN: Language-Guided Denoising Network for Video-Language Modeling

Author: Lu, Haoyu, Ding, Mingyu, Fei, Nanyi, Huo, Yuqi, and Lu, Zhiwu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at video level. However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e.g., scenery shot, transition or teaser). Although a number of recent works deploy attention mechanism to alleviate this problem, the irrelevant/noisy information still makes it very difficult to address. To overcome such challenge, we thus propose an efficient and effective model, termed Language-Guided Denoising Network (LGDN), for video-language modeling. Different from most existing methods that utilize all extracted video frames, LGDN dynamically filters out the misaligned or redundant frames under the language supervision and obtains only 2-4 salient frames per video for cross-modal token-level alignment. Extensive experiments on five public datasets show that our LGDN outperforms the state-of-the-arts by large margins. We also provide detailed ablation study to reveal the critical importance of solving the noise issue, in hope of inspiring future video-language work., Comment: Accepted by NeurIPS2022
Published: 2022

225. Masked Vision and Language Modeling for Multi-modal Representation Learning

Author: Kwon, Gukyeong, Cai, Zhaowei, Ravichandran, Avinash, Bas, Erhan, Bhotika, Rahul, and Soatto, Stefano
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality. This is motivated by the nature of image-text paired data that both of the image and the text convey almost the same information but in different formats. The masked signal reconstruction of one modality conditioned on another modality can also implicitly learn cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method, along with common V+L alignment losses, achieves state-of-the-art performance in the regime of millions of pre-training data. Also, we outperform the other competitors by a significant margin in limited data scenarios., Comment: International Conference on Learning Representations (ICLR) 2023
Published: 2022

226. Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation

Author: Casini, Luca, primary, Jonason, Nicolas, additional, and Sturm, Bob L. T., additional
Published: 2024
Full Text: View/download PDF

227. Generative AI and Large Language Modeling in Cybersecurity

Author: Sarker, Iqbal H., primary
Published: 2024
Full Text: View/download PDF

228. GViG: Generative Visual Grounding Using Prompt-Based Language Modeling for Visual Question Answering

Author: Li, Yi-Ting, primary, Lin, Ying-Jia, additional, Yeh, Chia-Jen, additional, Lin, Chun-Yi, additional, and Kao, Hung-Yu, additional
Published: 2024
Full Text: View/download PDF

229. Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach.

Author: Maxime Poli, Emmanuel Chemla, and Emmanuel Dupoux
Published: 2024

230. Structured Object Language Modeling (SO-LM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising.

Author: Amir Tavanaei, Kee Kiat Koo, Hayreddin Çeker, Shaobai Jiang, Qi Li, Julien Han, and Karim Bouyarmane
Published: 2024

231. When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages.

Author: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, and Ben Bergen
Published: 2024

232. Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens.

Author: Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, and Zhifang Sui
Published: 2024

233. Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling.

Author: Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, and Arash Eshghi
Published: 2024

234. Birdie: Advancing State Space Language Modeling with Dynamic Mixtures of Training Objectives.

Author: Sam Blouir, Jimmy Smith, Antonios Anastasopoulos, and Amarda Shehu
Published: 2024

235. Target-Aware Language Modeling via Granular Data Sampling.

Author: Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, and Vikas Chandra
Published: 2024

236. A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives.

Author: Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, and Jörg Tiedemann
Published: 2024

237. TransferCVLM: Transferring Cross-Modal Knowledge for Vision-Language Modeling.

Author: Dongha Choi, Jung-Jae Kim, and Hyunju Lee
Published: 2024

238. Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling.

Author: Matúš Pikuliak, Štefan Oreško, Andrea Hrčková, and Marián Šimko
Published: 2024

239. UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks.

Author: Yuanhao Xiong, Yixin Nie, Haotian Liu, Boxin Wang, Jun Chen, Rong Jin, Cho-Jui Hsieh, Lorenzo Torresani, and Jie Lei
Published: 2024

240. ProtContext-DTI: Protein Contextual Representation Using Masked Language Modeling in Drug Target Interaction Prediction.

Author: Leila Baghaarabani, Parvin Razaghi, Mennatolla Magdy Mostafa, Ahmad Albaqsami, and Masoud Al Rawahi
Published: 2024
Full Text: View/download PDF

241. ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling.

Author: William Yicheng Zhu, Keren Ye, Junjie Ke, Jiahui Yu, Leonidas J. Guibas, Peyman Milanfar, and Feng Yang
Published: 2024
Full Text: View/download PDF

242. X-former Elucidator: Reviving Efficient Attention for Long Context Language Modeling.

Author: Xupeng Miao, Shenhan Zhu, Fangcheng Fu, Ziyu Guo, Zhi Yang, Yaofeng Tu, Zhihao Jia, and Bin Cui
Published: 2024

243. ReALM: Reference Resolution as Language Modeling.

Author: Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Özyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, and Hong Yu
Published: 2024

244. Laying Anchors: Semantically Priming Numerals in Language Modeling.

Author: Mandar Sharma, Rutuja Murlidhar Taware, Pravesh Koirala, Nikhil Muralidhar, and Naren Ramakrishnan
Published: 2024
Full Text: View/download PDF