Author: "Su, Yixuan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Su, Yixuan"' showing total 139 results

1. Prompt Compression for Large Language Models: A Survey

Author: Li, Zongqian, Liu, Yinhong, Su, Yixuan, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompt methods, and leveraging insights from multimodality.
Published: 2024

2. ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering

Author: Li, Huayang, Verga, Pat, Sen, Priyanka, Yang, Bowen, Viswanathan, Vijay, Lewis, Patrick, Watanabe, Taro, and Su, Yixuan
Subjects: Computer Science - Computation and Language
Abstract: The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that the LLM can process has grown, the capability of the model to accurately reason over that context degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason framework, enabling LLMs to reason over relevant evidence collected during an intermediate retrieval step. We find that modern LLMs struggle to accurately retrieve relevant facts and instead, often hallucinate "retrieved facts", resulting in flawed reasoning and the production of incorrect answers. To address these issues, we introduce ALR$^2$, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure, i.e., aligning LLMs with the objectives of both retrieval and reasoning. We demonstrate the efficacy of ALR$^2$ for mitigating performance degradation in long-context reasoning tasks. Through extensive experiments on long-context QA benchmarks, we find our method to outperform competitive baselines by large margins, achieving at least 8.4 and 7.9 EM gains on the long-context versions of HotpotQA and SQuAD datasets, respectively.
Published: 2024

3. To Code, or Not To Code? Exploring Impact of Code in Pre-training

Author: Aryabumi, Viraat, Su, Yixuan, Ma, Raymond, Morisot, Adrien, Zhang, Ivan, Locatelli, Acyr, Fadaee, Marzieh, Üstün, Ahmet, and Hooker, Sara
Subjects: Computer Science - Computation and Language
Abstract: Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLM pre-training. While there has been anecdotal consensus among practitioners that code data plays a vital role in general LLMs' performance, there is only limited work analyzing the precise impact of code on non-code tasks. In this work, we systematically investigate the impact of code data on general performance. We ask "what is the impact of code data used in pre-training on a large variety of downstream tasks beyond code generation". We conduct extensive ablations and evaluate across a broad range of natural language reasoning tasks, world knowledge tasks, code benchmarks, and LLM-as-a-judge win-rates for models with sizes ranging from 470M to 2.8B parameters. Across settings, we find consistent results that code is a critical building block for generalization far beyond coding tasks and that improvements to code quality have an outsized impact across all tasks. In particular, compared to text-only pre-training, the addition of code results in a relative increase of up to 8.2% in natural language (NL) reasoning, 4.2% in world knowledge, a 6.6% improvement in generative win-rates, and a 12x boost in code performance, respectively. Our work suggests investments in code quality and preserving code during pre-training have positive impacts.
Published: 2024

4. 500xCompressor: Generalized Prompt Compression for Large Language Models

Author: Li, Zongqian, Su, Yixuan, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Prompt compression is crucial for enhancing inference speed, reducing costs, and improving user experience. However, current methods face challenges such as low compression ratios and potential data leakage during evaluation. To address these issues, we propose 500xCompressor, a method that compresses extensive natural language contexts into a minimum of one single special token. The 500xCompressor introduces approximately 0.3% additional parameters and achieves compression ratios ranging from 6x to 480x. It is designed to compress any text, answer various types of questions, and could be utilized by the original large language model (LLM) without requiring fine-tuning. Initially, 500xCompressor was pretrained on the Arxiv Corpus, followed by fine-tuning on the ArxivQA dataset, and subsequently evaluated on strictly unseen and classical question answering (QA) datasets. The results demonstrate that the LLM retained 62.26-72.89% of its capabilities compared to using non-compressed prompts. This study also shows that not all the compressed tokens are equally utilized and that KV values have significant advantages over embeddings in preserving information at high compression ratios. The highly compressive nature of natural language prompts, even for fine-grained complex information, suggests promising potential for future applications and further research into developing a new LLM language.
Published: 2024

5. Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Author: Verga, Pat, Hofstatter, Sebastian, Althammer, Sophia, Su, Yixuan, Piktus, Aleksandra, Arkhangorodsky, Arkady, Xu, Minjie, White, Naomi, and Lewis, Patrick
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As Large Language Models (LLMs) have become more advanced, they have outpaced our abilities to accurately evaluate their quality. Not only is finding data to adequately probe particular model properties difficult, but evaluating the correctness of a model's freeform generation alone is a challenge. To address this, many evaluations now rely on using LLMs themselves as judges to score the quality of outputs from other LLMs. Evaluations most commonly use a single large model like GPT4. While this method has grown in popularity, it is costly, has been shown to introduce intramodel bias, and in this work, we find that very large models are often unnecessary. We propose instead to evaluate models using a Panel of LLm evaluators (PoLL). Across three distinct judge settings and spanning six different datasets, we find that using a PoLL composed of a larger number of smaller models outperforms a single large judge, exhibits less intra-model bias due to its composition of disjoint model families, and does so while being over seven times less expensive.
Published: 2024
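
The panel idea in the abstract above can be illustrated with a short sketch. The abstract does not say how the panel's judgments are combined, so the two pooling functions here (majority vote and score averaging) are assumptions, and the names `majority_vote`, `average_pool`, `panel_verdicts`, and `panel_scores` are hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_vote(verdicts):
    """Return the verdict given by the largest number of judges.

    Assumed pooling rule; on a tie, the verdict seen first wins.
    """
    return Counter(verdicts).most_common(1)[0][0]


def average_pool(scores):
    """Return the mean of the judges' numeric scores (assumed pooling rule)."""
    return mean(scores)


# Hypothetical outputs from three small judge models of different families.
panel_verdicts = ["correct", "incorrect", "correct"]
panel_scores = [4, 5, 3]

print(majority_vote(panel_verdicts))
print(average_pool(panel_scores))
```

Either function takes one judgment per panel member, so swapping a judge in or out only changes the length of the input list.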

6. StarCoder 2 and The Stack v2: The Next Generation

Author: Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, and de Vries, Harm
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
Published: 2024

7. Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence

Author: Liu, Yinhong, Su, Yixuan, Shareghi, Ehsan, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. When it comes to long-form text generation, there has been a growing interest in generation from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture the discourse coherence. The development of discourse-specific automatic evaluation methods for assessing the output of LLMs warrants greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
Comment: Accepted by NAACL 2024 main conference
Published: 2024

8. Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions

Author: Liu, Yinhong, Su, Yixuan, Shareghi, Ehsan, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Instruction-tuned large language models have shown remarkable performance in aligning generated text with user intentions across various tasks. However, maintaining human-like discourse structure in the generated text remains a challenging research question. In this paper, we propose Instruct-SCTG, a flexible and effective sequential framework that harnesses instruction-tuned language models to generate structurally coherent text in both fine-tuned and zero-shot setups. Our framework generates articles in a section-by-section manner, aligned with the desired human structure using natural language instructions. Furthermore, we introduce a new automatic metric that measures discourse divergence in a fuzzy manner. Extensive experiments on three datasets from representative domains of news and recipes demonstrate the state-of-the-art performance of our framework in imposing discourse structure during text generation, as verified by both automatic and human evaluation. Our code will be available on Github.
Published: 2023

9. Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

Author: Shi, Chufan, Su, Yixuan, Yang, Cheng, Yang, Yujiu, and Cai, Deng
Subjects: Computer Science - Computation and Language
Abstract: The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. We hypothesize that its efficacy depends on task specificity and skill requirements. Our experiments assess four target tasks with distinct coverage levels, revealing that integrating generalist instruction tuning consistently enhances model performance when the task coverage is broad. The effect is particularly pronounced when the amount of task-specific training data is limited. Further investigation into three target tasks focusing on different capabilities demonstrates that generalist instruction tuning improves understanding and reasoning abilities. However, for tasks requiring factual knowledge, generalist data containing hallucinatory information may negatively affect the model's performance. Overall, our work provides a systematic guide for developing specialist models with general instruction tuning. Our code and other related resources can be found at https://github.com/DavidFanzz/Generalist_or_Specialist.
Comment: Accepted to EMNLP 2023
Published: 2023

10. Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Author: Li, Huayang, Lan, Tian, Fu, Zihao, Cai, Deng, Liu, Lemao, Collier, Nigel, Watanabe, Taro, and Su, Yixuan
Subjects: Computer Science - Computation and Language
Abstract: There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that by selectively dropping out the attention to repetitive words in training data, degeneration can be significantly minimized. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as the high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be interpreted by one simple explanation. That is, penalizing the repetitions in training data is a common and fundamental factor for their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
Comment: Accepted to NeurIPS 2023
Published: 2023

11. Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Author: Huang, Yupan, Meng, Zaiqiao, Liu, Fangyu, Su, Yixuan, Collier, Nigel, and Lu, Yutong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Large language models exhibit enhanced zero-shot performance on various tasks when fine-tuned with instruction-following data. Multimodal instruction-following models extend these capabilities by integrating both text and images. However, existing models such as MiniGPT-4 and LLaVA face challenges in maintaining dialogue coherence in scenarios involving multiple images. A primary reason is the lack of a specialized dataset for this critical application. To bridge these gaps, we introduce SparklesDialogue, the first machine-generated dialogue dataset tailored for word-level interleaved multi-image and text interactions. Furthermore, we construct SparklesEval, a GPT-assisted benchmark for quantitatively assessing a model's conversational competence across multiple images and dialogue turns. We then present SparklesChat, a multimodal instruction-following model for open-ended dialogues across multiple images. Our experiments validate the effectiveness of training SparklesChat with SparklesDialogue based on MiniGPT-4 and LLaVA-v1.5, which enhances comprehension across multiple images and dialogue turns, and does not compromise single-image understanding capabilities. Qualitative evaluations further demonstrate SparklesChat's generality in handling real-world applications. All resources related to this study are publicly available at https://github.com/HYPJUDY/Sparkles.
Comment: ICLR 2024 Workshop (Navigating and Addressing Data Problems for Foundation Models)
Published: 2023

12. YOLOFM: an improved fire and smoke object detection algorithm based on YOLOv5n

Author: Geng, Xin, Su, Yixuan, Cao, Xianghong, Li, Huaizhou, and Liu, Linggong
Published: 2024

13. PandaGPT: One Model To Instruction-Follow Them All

Author: Su, Yixuan, Lan, Tian, Li, Huayang, Xu, Jialu, Wang, Yan, and Cai, Deng
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audios. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image/video and how they sound in an audio. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e. zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at https://panda-gpt.github.io/.
Comment: Technical report, work in progress. Our project page is at https://panda-gpt.github.io/
Published: 2023

14. Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Author: Fu, Zihao, Su, Yixuan, Meng, Zaiqiao, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Biomedical named entity recognition is one of the core tasks in biomedical natural language processing (BioNLP). To tackle this task, numerous supervised/distantly supervised approaches have been proposed. Despite their remarkable success, these approaches inescapably demand laborious human effort. To alleviate the need for human effort, dictionary-based approaches have been proposed to extract named entities simply based on a given dictionary. However, one downside of existing dictionary-based approaches is that they are challenged to identify concept synonyms that are not listed in the given dictionary, which we refer to as the synonym generalization problem. In this study, we propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions. In particular, SynGen introduces two regularization terms, namely, (1) a synonym distance regularizer; and (2) a noise perturbation regularizer, to minimize the synonym generalization error. To demonstrate the effectiveness of our approach, we provide a theoretical analysis of the bound of synonym generalization error. We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins. Lastly, we provide a detailed analysis to further reveal the merits and inner-workings of our approach.
Published: 2023

15. COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Author: Zhang, Meiru, Su, Yixuan, Meng, Zaiqiao, Fu, Zihao, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Event extraction is a complex information extraction task that involves extracting events from unstructured text. Prior classification-based methods require comprehensive entity annotations for joint training, while newer generation-based methods rely on heuristic templates containing oracle information such as event type, which is often unavailable in real-world scenarios. In this study, we consider a more realistic setting of this task, namely the Oracle-Free Event Extraction (OFEE) task, where only the input context is given without any oracle information, including event type, event ontology and trigger word. To solve this task, we propose a new framework, called COFFEE, which extracts the events solely based on the document context without referring to any oracle information. In particular, a contrastive selection model is introduced in COFFEE to rectify the generated triggers and handle multi-event instances. The proposed COFFEE outperforms state-of-the-art approaches under the oracle-free setting of the event extraction task, as evaluated on a public event extraction benchmark ACE05.
Comment: Accepted to MATCHING Workshop at ACL 2023
Published: 2023

16. Plug-and-Play Recipe Generation with Content Planning

Author: Liu, Yinhong, Su, Yixuan, Shareghi, Ehsan, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-existing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves state-of-the-art performance on the task of recipe generation.
Comment: Paper accepted by EMNLP 2022 GEM workshop
Published: 2022

17. Momentum Decoding: Open-ended Text Generation As Graph Exploration

Author: Lan, Tian, Su, Yixuan, Liu, Shuhang, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method -- momentum decoding -- which encourages the LM to greedily explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to the existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while enjoying notably improved inference speed and computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our codes and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.
Comment: Work in progress
Published: 2022

18. An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation

Author: Su, Yixuan and Xu, Jialu
Subjects: Computer Science - Computation and Language
Abstract: In this study, we empirically compare two recently proposed decoding methods, i.e. Contrastive Search (CS) and Contrastive Decoding (CD), for open-ended text generation. The automatic evaluation results suggest that, while CS performs worse than CD on the MAUVE metric, it substantially surpasses CD on the diversity and coherence metrics. More notably, extensive human evaluations across three different domains demonstrate that human annotators are universally more in favor of CS over CD with substantial margins. The contradictory results between MAUVE and human evaluations reveal that MAUVE does not accurately reflect human preferences. Therefore, we call upon the research community to develop better evaluation metrics for open-ended text generation. To ensure the reproducibility of our work, we have open-sourced all our code, evaluation results, as well as human annotations at https://github.com/yxuansu/Contrastive_Search_versus_Contrastive_Decoding.
Comment: Technical report with 9 pages, 5 tables, and 6 figures
Published: 2022

19. Contrastive Search Is What You Need For Neural Text Generation

Author: Su, Yixuan and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Generating text with autoregressive language models (LMs) is of great importance to many natural language processing (NLP) applications. Previous solutions for this task often produce text that contains degenerative expressions or lacks semantic consistency. Recently, Su et al. introduced a new decoding method, contrastive search, based on the isotropic representation space of the language model and obtained new state of the art on various benchmarks. Additionally, Su et al. argued that the representations of autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic which is also shared by previous studies. Therefore, to ensure the language model follows an isotropic distribution, Su et al. proposed a contrastive learning scheme, SimCTG, which calibrates the language model's representations through additional training. In this study, we first answer the question: "Are autoregressive LMs really anisotropic?". To this end, we extensively evaluate the isotropy of LMs across 16 major languages. Surprisingly, we find that the anisotropic problem only exists in the two specific English GPT-2-small/medium models. On the other hand, all other evaluated LMs are naturally isotropic which is in contrast to the conclusion drawn by previous studies. Based on our findings, we further assess the contrastive search decoding method using off-the-shelf LMs on four generation tasks across 16 languages. Our experimental results demonstrate that contrastive search significantly outperforms previous decoding methods without any additional training. More notably, on 12 out of the 16 evaluated languages, contrastive search performs comparably with human-level performances as judged by human evaluations. Our code and other related resources are publicly available at https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need.
Comment: TMLR'23
Published: 2022

20. From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking

Author: Zhu, Yutao, Nie, Jian-Yun, Su, Yixuan, Chen, Haonan, Zhang, Xinyu, and Dou, Zhicheng
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Contextual information in search sessions is important for capturing users' search intents. Various approaches have been proposed to model user behavior sequences to improve document ranking in a session. Typically, training samples of (search context, document) pairs are sampled randomly in each training epoch. In reality, the difficulty of understanding a user's search intent and judging a document's relevance varies greatly from one search context to another. Mixing up training samples of different difficulties may confuse the model's optimization process. In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner. In so doing, we aim to guide the model gradually toward a global optimum. To leverage both positive and negative examples, two curricula are designed. Experiments on two real query log datasets show that our proposed framework can improve the performance of several existing methods significantly, demonstrating the effectiveness of curriculum learning for context-aware document ranking.
Comment: CIKM 2022 Camera Ready
Published: 2022

21. Language Models Can See: Plugging Visual Controls in Text Generation

Author: Su, Yixuan, Lan, Tian, Liu, Yahui, Liu, Fangyu, Yogatama, Dani, Wang, Yan, Kong, Lingpeng, and Collier, Nigel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Generative language models (LMs) such as GPT-2/3 can be prompted to generate text with remarkable quality. While they are designed for text-prompted generation, it remains an open question how the generation process could be guided by modalities beyond text such as images. In this work, we propose a training-free framework, called MAGIC (iMAge-Guided text generatIon with CLIP), for plugging in visual controls in the generation process and enabling LMs to perform multimodal tasks (e.g., image captioning) in a zero-shot manner. MAGIC is a simple yet efficient plug-and-play framework, which directly combines an off-the-shelf LM (i.e., GPT-2) and an image-text matching model (i.e., CLIP) for image-grounded text generation. During decoding, MAGIC influences the generation of the LM by introducing a CLIP-induced score, called magic score, which regularizes the generated result to be semantically related to a given image while being coherent to the previously generated context. Notably, the proposed decoding scheme does not involve any gradient update operation, therefore being computationally efficient. On the challenging task of zero-shot image captioning, MAGIC outperforms the state-of-the-art method by notable margins with a nearly 27 times decoding speedup. MAGIC is a flexible framework and is theoretically compatible with any text generation tasks that incorporate image grounding. In the experiments, we showcase that it is also capable of performing visually grounded story generation given both an image and a text prompt.
Comment: 21 pages, 5 figures, 5 tables; (v2 adds some experimental details)
Published: 2022

22. Low-carbon upcycling of spent anode graphite: Integrating graphene and dislocations for sustainable lithium/potassium storage

Author: Su, Yixuan, Chen, Hongjie, Chen, Yucong, Li, Jia, Wu, Mingjun, Cheng, Yao, and Ru, Qiang
Published: 2025

23. A Contrastive Framework for Neural Text Generation

Author: Su, Yixuan, Lan, Tian, Wang, Yan, Yogatama, Dani, Kong, Lingpeng, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Text generation is of great importance to many natural language processing applications. However, maximization-based decoding methods (e.g. beam search) of neural language models often lead to degenerate solutions -- the generated text is unnatural and contains undesirable repetitions. Existing approaches introduce stochasticity via sampling or modify training objectives to decrease probabilities of certain tokens (e.g., unlikelihood training). However, they often lead to solutions that lack coherence. In this work, we show that an underlying reason for model degeneration is the anisotropic distribution of token representations. We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text. Extensive experiments and analyses on three benchmarks from two languages demonstrate that our proposed approach significantly outperforms current state-of-the-art text generation methods as evaluated by both human and automatic metrics.
Comment: NeurIPS 2022
Published: 2022

24. Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Author: Cai, Deng, Mansimov, Elman, Lai, Yi-An, Su, Yixuan, Shu, Lei, and Zhang, Yi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent advance in deep learning has led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP. First, we measure and analyze model update regression in different model update settings. Next, we explore and benchmark existing techniques for reducing model update regression including model ensemble and knowledge distillation. We further propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured prediction. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches.
Comment: NeurIPS 2022
Published: 2022

25. A Survey on Retrieval-Augmented Text Generation

Author: Li, Huayang, Su, Yixuan, Cai, Deng, Wang, Yan, and Liu, Lemao
Subjects: Computer Science - Computation and Language
Abstract: Recently, retrieval-augmented text generation attracted increasing attention of the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and particularly has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey about retrieval-augmented text generation. It firstly highlights the generic paradigm of retrieval-augmented generation, and then it reviews notable approaches according to different tasks including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.
Comment: all authors contributed equally
Published: 2022

26. Luoshi Neiyi Prescription inhibits estradiol synthesis and inflammation in endometriosis through the HIF1A/EZH2/SF-1 pathway

Author: Wu, Lizheng, Lan, Dantong, Sun, Bowen, Su, Rui, Pei, Fangli, Kuang, Zijun, Su, Yixuan, Lin, Shuhong, Wang, Xuanyin, Zhang, Siyuan, Chen, Xiaoxin, Jia, Jinjin, and Zeng, Cheng
Published: 2024

27. TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Author: Su, Yixuan, Liu, Fangyu, Meng, Zaiqiao, Lan, Tian, Shu, Lei, Shareghi, Ehsan, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct detailed analysis to reveal the merits and inner-workings of our approach.
Comment: Camera-ready for NAACL 2022
Published: 2021

28. Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models

Author: Meng, Zaiqiao, Liu, Fangyu, Shareghi, Ehsan, Su, Yixuan, Collins, Charlotte, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Knowledge probing is crucial for understanding the knowledge transfer mechanism behind the pre-trained language models (PLMs). Despite the growing progress of probing knowledge for PLMs in the general domain, specialised areas such as biomedical domain are vastly under-explored. To catalyse the research in this direction, we release a well-curated biomedical knowledge probing benchmark, MedLAMA, which is constructed based on the Unified Medical Language System (UMLS) Metathesaurus. We test a wide spectrum of state-of-the-art PLMs and probing approaches on our benchmark, reaching at most 3% of acc@10. While highlighting various sources of domain-specific challenges that amount to this underwhelming performance, we illustrate that the underlying PLMs have a higher potential for probing tasks. To achieve this, we propose Contrastive-Probe, a novel self-supervised contrastive probing approach, that adjusts the underlying PLMs without using any probing data. While Contrastive-Probe pushes the acc@10 to 28%, the performance gap still remains notable. Our human expert evaluation suggests that the probing performance of our Contrastive-Probe is still under-estimated as UMLS still does not include the full spectrum of factual knowledge. We hope MedLAMA and Contrastive-Probe facilitate further developments of more suited probing techniques for this domain.
Comment: ACL 2022; code and data are released at https://github.com/cambridgeltl/medlama
Published: 2021

29. Exploring Dense Retrieval for Dialogue Response Selection

Author: Lan, Tian, Cai, Deng, Wang, Yan, Su, Yixuan, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent progress in deep learning has continuously improved the accuracy of dialogue response selection. In particular, sophisticated neural network architectures are leveraged to capture the rich interactions between dialogue context and response candidates. While remarkably effective, these models also bring in a steep increase in computational cost. Consequently, such models can only be used as a re-rank module in practice. In this study, we present a solution to directly select proper responses from a large corpus or even a nonparallel corpus that only consists of unpaired sentences, using a dense retrieval model. To push the limits of dense retrieval, we design an interaction layer upon the dense retrieval models and apply a set of tailor-designed learning strategies. Our model shows superiority over strong baselines on the conventional re-rank evaluation setting, which is remarkable given its efficiency. To verify the effectiveness of our approach in realistic scenarios, we also conduct full-rank evaluation, where the target is to select proper responses from a full candidate pool that may contain millions of candidates and evaluate them fairly through human annotations. Our proposed model notably outperforms pipeline baselines that integrate fast recall and expressive re-rank modules. Human evaluation results show that enlarging the candidate pool with nonparallel corpora improves response quality further.
Comment: 11 pages, 4 figures, 6 tables
Published: 2021

30. Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Author: Su, Yixuan, Shu, Lei, Mansimov, Elman, Gupta, Arshit, Cai, Deng, Lai, Yi-An, and Zhang, Yi
Subjects: Computer Science - Computation and Language
Abstract: Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves new state of the art on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent as judged by human annotators.
Comment: Camera-ready for ACL 2022 main conference
Published: 2021

31. Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Author: Su, Yixuan, Vandyke, David, Wang, Sihui, Fang, Yimai, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Recent developments in neural networks have led to the advance in data-to-text generation. However, the lack of ability of neural models to control the structure of generated output can be limiting in certain real-world applications. In this study, we propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. Extensive experiments and analyses are conducted on two benchmark datasets, ToTTo and WebNLG. The results show that our model is able to control both the intra-sentence and inter-sentence structure of the generated output. Furthermore, empirical comparisons against previous state-of-the-art methods show that our model improves the generation quality as well as the output diversity as judged by human and automatic evaluations.
Comment: Accepted to Findings of EMNLP 2021
Published: 2021

32. Few-Shot Table-to-Text Generation with Prototype Memory

Author: Su, Yixuan, Meng, Zaiqiao, Baker, Simon, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Neural table-to-text generation models have achieved remarkable progress on an array of tasks. However, due to the data-hungry nature of neural models, their performances strongly rely on large-scale training examples, limiting their applicability in real-world applications. To address this, we propose a new framework: Prototype-to-Generate (P2G), for table-to-text generation under the few-shot scenario. The proposed framework utilizes the retrieved prototypes, which are jointly selected by an IR system and a novel prototype selector to help the model bridging the structural gap between tables and texts. Experimental results on three benchmark datasets with three state-of-the-art models demonstrate that the proposed framework significantly improves the model performance across various evaluation metrics.
Comment: Accepted to Findings of EMNLP 2021
Published: 2021

33. Non-Autoregressive Text Generation with Pre-trained Language Models

Author: Su, Yixuan, Cai, Deng, Wang, Yan, Vandyke, David, Baker, Simon, Li, Piji, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: Non-autoregressive generation (NAG) has recently attracted great attention due to its fast inference speed. However, the generation quality of existing NAG models still lags behind their autoregressive counterparts. In this work, we show that BERT can be employed as the backbone of a NAG model to greatly improve performance. Additionally, we devise mechanisms to alleviate the two common problems of vanilla NAG models: the inflexibility of prefixed output length and the conditional independence of individual token predictions. Lastly, to further increase the speed advantage of the proposed model, we propose a new decoding strategy, ratio-first, for applications where the output lengths can be approximately estimated beforehand. For a comprehensive evaluation, we test the proposed model on three text generation tasks, including text summarization, sentence compression and machine translation. Experimental results show that our model significantly outperforms existing non-autoregressive baselines and achieves competitive performance with many strong autoregressive models. In addition, we also conduct extensive analysis experiments to reveal the effect of each proposed component.
Comment: Accepted to EACL 2021
Published: 2021

34. Dialogue Response Selection with Hierarchical Curriculum Learning

Author: Su, Yixuan, Cai, Deng, Zhou, Qingyu, Lin, Zibo, Baker, Simon, Cao, Yunbo, Shi, Shuming, Collier, Nigel, and Wang, Yan
Subjects: Computer Science - Computation and Language
Abstract: We study the learning of a matching model for dialogue response selection. Motivated by the recent finding that models trained with random negative samples are not ideal in real-world scenarios, we propose a hierarchical curriculum learning framework that trains the matching model in an "easy-to-difficult" scheme. Our learning framework consists of two complementary curricula: (1) corpus-level curriculum (CC); and (2) instance-level curriculum (IC). In CC, the model gradually increases its ability in finding the matching clues between the dialogue context and a response candidate. As for IC, it progressively strengthens the model's ability in identifying the mismatching information between the dialogue context and a response candidate. Empirical studies on three benchmark datasets with three state-of-the-art matching models demonstrate that the proposed learning framework significantly improves the model performance across various evaluation metrics.
Comment: Accepted as long paper to the main conference of ACL 2021
Published: 2020

35. Exploring provincial carbon-pollutant emission efficiency in China: An integrated approach with social network analysis and spatial econometrics

Author: Zhu, Chaoping, Su, Yixuan, Fan, Ruguo, Qin, Min, and Fu, Haifeng
Published: 2024

36. UV resistance of sol-gel hydrophobic silica antireflective coatings

Author: Su, Yixuan, Wang, Xiaodong, Zhao, Huiyue, Zhang, Chen, Yuan, Fang, Guo, Jingwen, Feng, Chen, and Shen, Jun
Published: 2023

37. Prototype-to-Style: Dialogue Generation with Style-Aware Editing on Retrieval Memory

Author: Su, Yixuan, Wang, Yan, Baker, Simon, Cai, Deng, Liu, Xiaojiang, Korhonen, Anna, and Collier, Nigel
Subjects: Computer Science - Computation and Language
Abstract: The ability of a dialog system to express prespecified language style during conversations has a direct, positive impact on its usability and on user satisfaction. We introduce a new prototype-to-style (PS) framework to tackle the challenge of stylistic dialogue generation. The framework uses an Information Retrieval (IR) system and extracts a response prototype from the retrieved response. A stylistic response generator then takes the prototype and the desired language style as model input to obtain a high-quality and stylistic response. To effectively train the proposed model, we propose a new style-aware learning objective as well as a de-noising learning strategy. Results on three benchmark datasets from two languages demonstrate that the proposed approach significantly outperforms existing baselines in both in-domain and cross-domain evaluations.
Published: 2020

38. Stylistic Dialogue Generation via Information-Guided Reinforcement Learning Strategy

Author: Su, Yixuan, Cai, Deng, Wang, Yan, Baker, Simon, Korhonen, Anna, Collier, Nigel, and Liu, Xiaojiang
Subjects: Computer Science - Computation and Language
Abstract: Stylistic response generation is crucial for building an engaging dialogue system for industrial use. While it has attracted much research interest, existing methods often generate stylistic responses at the cost of the content quality (relevance and fluency). To enable better balance between the content quality and the style, we introduce a new training strategy, known as Information-Guided Reinforcement Learning (IG-RL). In IG-RL, a training model is encouraged to explore stylistic expressions while being constrained to maintain its content quality. This is achieved by adopting a reinforcement learning strategy with statistical style information guidance for quality-preserving explorations. Experiments on two datasets show that the proposed approach outperforms several strong baselines in terms of the overall response performance.
Published: 2020

39. An extensible multi-block layout warehouse routing optimization model

Author: Su, Yixuan, Zhu, Xi, Yuan, Jinlong, Teo, Kok Lay, Li, Meixia, and Li, Chunfa
Published: 2023

40. Steiner TSP based on aisle as a unit for order picking

Author: Su, Yixuan, Li, Meixia, Zhu, Xi, and Li, Chunfa
Published: 2022

41. Suppressing the metal-metal interaction by CoZn0.5V1.5O4 derived from two-dimensional metal-organic frameworks for supercapacitors

Author: Yuan, Fang, Gao, Guohua, Jiang, Xiaodi, Bi, Wenchao, Su, Yixuan, Guo, Jingwen, Bao, Zhihao, Shen, Jun, and Wu, Guangming
Published: 2022

42. Embedding Constructed Refractive Index Graded Antireflective Coating with High Abrasion Resistance and Environmental Stability for Polycarbonate Glass

Author: Zhang, Chen, Zhao, Huiyue, Su, Yixuan, Wang, Hongqiang, Shen, Jun, and Wang, Xiaodong
Published: 2021

43. Exploring Dense Retrieval for Dialogue Response Selection

Author: Lan, Tian, Cai, Deng, Wang, Yan, Su, Yixuan, Huang, Heyan, and Mao, Xian-Ling
Published: 2024

44. Surface free energy and microstructure dependent environmental stability of sol–gel SiO2 antireflective coatings: Effect of combined vapor phase surface treatment

Author: Wang, Xiaodong, Zhao, Huiyue, Cao, Yuanyuan, Su, Yixuan, Hui, Haohao, and Shen, Jun
Published: 2019

45. Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Author: Fu, Zihao, Su, Yixuan, Meng, Zaiqiao, and Collier, Nigel
Published: 2023

46. Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

Author: Shi, Chufan, Su, Yixuan, Yang, Cheng, Yang, Yujiu, and Cai, Deng
Published: 2023

47. COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Author: Zhang, Meiru, Su, Yixuan, Meng, Zaiqiao, Fu, Zihao, and Collier, Nigel
Published: 2023

48. From Easy to Hard

Author: Zhu, Yutao, Nie, Jian-Yun, Su, Yixuan, Chen, Haonan, Zhang, Xinyu, and Dou, Zhicheng
Published: 2022

49. Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Author: Su, Yixuan, Vandyke, David, Wang, Sihui, Fang, Yimai, and Collier, Nigel
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Recent developments in neural networks have led to the advance in data-to-text generation. However, the lack of ability of neural models to control the structure of generated output can be limiting in certain real-world applications. In this study, we propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. Extensive experiments and analyses are conducted on two benchmark datasets, ToTTo and WebNLG. The results show that our model is able to control both the intra-sentence and inter-sentence structure of the generated output. Furthermore, empirical comparisons against previous state-of-the-art methods show that our model improves the generation quality as well as the output diversity as judged by human and automatic evaluations.
Comment: Accepted to Findings of EMNLP 2021
Published: 2022

50. UV resistance of sol-gel hydrophobic silica antireflective coatings

Author: Su, Yixuan, Wang, Xiaodong, Zhao, Huiyue, Zhang, Chen, Yuan, Fang, Guo, Jingwen, Feng, Chen, and Shen, Jun
Published: 2022
