Author: "He, Junxian" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"He, Junxian"' showing total 318 results

1. Non-myopic Generation of Language Models for Reasoning and Planning

Author: Ma, Chang, Zhao, Haiteng, Zhang, Junlei, He, Junxian, and Kong, Lingpeng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in various domains like mathematical problem-solving and coding, LLMs face challenges in ensuring reliable and optimal planning due to their inherent myopic nature of autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements in a wide range of tasks for math, coding, and agents. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines with reduced computational resources. This study provides insights into optimizing LLM planning capabilities.
Published: 2024

2. Can ChatGPT assist visually impaired people with micro-navigation?

Author: He, Junxian, Pundlik, Shrinivas, and Luo, Gang
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Computer Vision and Pattern Recognition
Abstract: Objective: Micro-navigation poses challenges for blind and visually impaired individuals. They often need to ask for sighted assistance. We explored the feasibility of utilizing ChatGPT as a virtual assistant to provide navigation directions. Methods: We created a test set of outdoor and indoor micro-navigation scenarios consisting of 113 scene images and their human-generated text descriptions. A total of 412 way-finding queries and their expected responses were compiled based on the scenarios. Not all queries are answerable based on the information available in the scene image. An "I do not know" response was expected for unanswerable queries, which served as negative cases. High-level orientation responses were expected, and step-by-step guidance was not required. ChatGPT 4o was evaluated based on sensitivity (SEN) and specificity (SPE) under different conditions. Results: The default ChatGPT 4o, with scene images as inputs, resulted in SEN and SPE values of 64.8% and 75.9%, respectively. Instruction on how to respond to unanswerable questions did not improve SEN substantially but SPE increased by around 14 percentage points. SEN and SPE both improved substantially, by about 17 and 16 percentage points on average respectively, when human-written descriptions of the scenes were provided as input instead of images. Providing further prompt instructions to the assistants when the input was text description did not substantially change the SEN and SPE values. Conclusion: Current native ChatGPT 4o is still unable to provide correct micro-navigation guidance in some cases, probably because its scene understanding is not optimized for navigation purposes. If multi-modal chatbots could interpret scenes with a level of clarity comparable to humans, and were also guided by appropriate prompts, they may have the potential to provide assistance to visually impaired people for micro-navigation.
Published: 2024

3. On the Universal Truthfulness Hyperplane Inside LLMs

Author: Liu, Junteng, Chen, Shiqi, Cheng, Yu, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: While large language models (LLMs) have demonstrated remarkable abilities across various fields, hallucination remains a significant challenge. Recent studies have explored hallucinations through the lens of internal representations, proposing mechanisms to decipher LLMs' adherence to facts. However, these approaches often fail to generalize to out-of-distribution data, leading to concerns about whether internal representation patterns reflect fundamental factual awareness, or only overfit spurious correlations on the specific datasets. In this work, we investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model. To this end, we scale up the number of training datasets and conduct an extensive evaluation -- we train the truthfulness hyperplane on a diverse collection of over 40 datasets and examine its cross-task, cross-domain, and in-domain generalization. Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios, while the volume of data samples plays a less critical role. This finding supports the optimistic hypothesis that a universal truthfulness hyperplane may indeed exist within the model, offering promising directions for future research.
Published: 2024

4. Belief Revision: The Adaptability of Large Language Models Reasoning

Author: Wilie, Bryan, Cahyawijaya, Samuel, Ishii, Etsuko, He, Junxian, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: The capability to reason from text is crucial for real-world NLP applications. Real-world scenarios often involve incomplete or evolving data. In response, individuals update their beliefs and understandings accordingly. However, most existing evaluations assume that language models (LMs) operate with consistent information. We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence. Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning (ΔR) framework. Belief-R features sequences of premises designed to simulate scenarios where additional information could necessitate revising prior conclusions drawn by LMs. We evaluate ~30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information. Further, models adept at updating often underperformed in scenarios without necessary updates, highlighting a critical trade-off. These insights underscore the importance of improving LMs' adaptiveness to changing information, a step toward more reliable AI systems.
Published: 2024

5. DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

Author: Tong, Yuxuan, Zhang, Xiwen, Wang, Rui, Wu, Ruidong, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Solving mathematical problems requires advanced reasoning abilities and presents notable challenges for large language models. Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate any correct response for the most challenging queries. Hypothesizing that difficult queries are crucial to learn complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates difficult queries more trials during the synthesis phase, enabling more extensive training on difficult samples. Utilizing DART, we have created new datasets for mathematical problem-solving that focus more on difficult queries and are substantially smaller than previous ones. Remarkably, our synthesis process solely relies on a 7B-sized open-weight model, without reliance on the commonly used proprietary GPT-4. We fine-tune various base models on our datasets ranging from 7B to 70B in size, resulting in a series of strong models called DART-MATH. In comprehensive in-domain and out-of-domain evaluation on 6 mathematical benchmarks, DART-MATH outperforms vanilla rejection tuning significantly, being superior or comparable to previous arts, despite using much smaller datasets and no proprietary models. Furthermore, our results position our synthetic datasets as the most effective and cost-efficient publicly available resources for advancing mathematical problem-solving.
Comment: Preprint. Data and model checkpoints are available at https://github.com/hkust-nlp/dart-math
Published: 2024

6. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

Author: Ding, Wenxuan, Wang, Weiqi, Kwok, Sze Heng Douglas, Liu, Minghao, Fang, Tianqing, Bai, Jiaxin, Liu, Xin, Yu, Changlong, Li, Zheng, Luo, Chen, Yin, Qingyu, Yin, Bing, He, Junxian, and Song, Yangqiu
Subjects: Computer Science - Computation and Language
Abstract: Enhancing Language Models' (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performances. Our code and data are publicly available at https://github.com/HKUST-KnowComp/IntentionQA.
Comment: Findings of EMNLP 2024
Published: 2024

7. Compression Represents Intelligence Linearly

Author: Huang, Yuzhen, Zhang, Jinghan, Shan, Zifei, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression, which offers a compelling rationale for the success of large language models (LLMs): the development of more advanced language models is essentially enhancing compression which facilitates intelligence. Despite such appealing discussions, little empirical evidence is present for the interplay between compression and intelligence. In this work, we examine their relationship in the context of LLMs, treating LLMs as data compressors. Given the abstract concept of "intelligence", we adopt the average downstream benchmark scores as a surrogate, specifically targeting intelligence related to knowledge and commonsense, coding, and mathematical reasoning. Across 12 benchmarks, our study brings together 31 public LLMs that originate from diverse organizations. Remarkably, we find that LLMs' intelligence -- reflected by average benchmark scores -- almost linearly correlates with their ability to compress external text corpora. These results provide concrete evidence supporting the belief that superior compression indicates greater intelligence. Furthermore, our findings suggest that compression efficiency, as an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure that is linearly associated with the model capabilities. We open-source our compression datasets as well as our data collection pipelines to facilitate future researchers to assess compression properly.
Comment: COLM 2024. Data and code are available at https://github.com/hkust-nlp/llm-compression-intelligence
Published: 2024

8. In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Author: Chen, Shiqi, Xiong, Miao, Liu, Junteng, Wu, Zhengxuan, Xiao, Teng, Gao, Siyang, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to the incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6 point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
Comment: Code repo is available at https://github.com/hkust-nlp/Activation_decoding.git
Published: 2024

9. Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Author: Hu, Zhiyuan, Liu, Chumin, Feng, Xidong, Zhao, Yilun, Ng, See-Kiong, Luu, Anh Tuan, He, Junxian, Koh, Pang Wei, and Hooi, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In the face of uncertainty, the ability to seek information is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an uncertainty-aware simulation approach which enables the model to simulate possible future scenarios and how likely they are to occur, 2) uncertainty-based rewards motivated by information gain which incentivizes the model to seek information, and 3) a reward propagation scheme to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting, and the "20 Questions" game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion across multiple LLMs compared with direct prompting and also improves efficiency (i.e., the number of questions needed to complete the task). Our code has been released at https://github.com/zhiyuanhubj/UoT
Comment: Update Results
Published: 2024

10. AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents

Author: Ma, Chang, Zhang, Junlei, Zhu, Zhihao, Yang, Cheng, Yang, Yujiu, Jin, Yaohui, Lan, Zhenzhong, Kong, Lingpeng, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit that features easy assessment of agents for multi-faceted analysis through interactive visualization. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront. Ultimately, AgentBoard serves as a significant step towards demystifying agent behaviors and accelerating the development of stronger LLM agents.
Comment: Preprint
Published: 2024

11. GeoGalactica: A Scientific Large Language Model in Geoscience

Author: Lin, Zhouhan, Deng, Cheng, Zhou, Le, Zhang, Tianhang, Xu, Yi, Xu, Yutong, He, Zhongmou, Shi, Yuanyuan, Dai, Beiya, Song, Yunchong, Zeng, Boyi, Chen, Qiyuan, Miao, Yuxun, Xue, Bo, Wang, Shu, Fu, Luoyi, Zhang, Weinan, He, Junxian, Zhu, Yunqiang, Wang, Xinbing, and Zhou, Chenghu
Subjects: Computer Science - Computation and Language, I.2.7, F.4.1
Abstract: Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utilizing NLP techniques in geoscience research and practice is wide and convoluted, contributing from knowledge extraction and document classification to question answering and knowledge discovery. In this work, we take the initial step to leverage LLM for science, through a rather straightforward approach. We try to specialize an LLM into geoscience, by further pre-training the model with a vast amount of texts in geoscience, as well as supervised fine-tuning (SFT) the resulting model with our custom collected instruction tuning dataset. These efforts result in a model GeoGalactica consisting of 30 billion parameters. To our best knowledge, it is the largest language model for the geoscience domain. More specifically, GeoGalactica is from further pre-training of Galactica. We train GeoGalactica over a geoscience-related text corpus containing 65 billion tokens, preserving as the largest geoscience-specific text corpus. Then we fine-tune the model with 1 million pairs of instruction-tuning data consisting of questions that demand professional geoscience knowledge to answer. In this technical report, we will illustrate in detail all aspects of GeoGalactica, including data collection, data cleaning, base model selection, pre-training, SFT, and evaluation. We open-source our data curation tools and the checkpoints of GeoGalactica during the first 3/4 of pre-training.
Published: 2023

12. What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

Author: Liu, Wei, Zeng, Weihao, He, Keqing, Jiang, Yong, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Instruction tuning is a standard technique employed to align large language models to end tasks and user preferences after the initial pretraining phase. Recent research indicates the critical role of data engineering in instruction tuning -- when appropriately selected, only limited data is necessary to achieve superior performance. However, we still lack a principled understanding of what makes good instruction tuning data for alignment, and how we should select data automatically and effectively. In this work, we delve deeply into automatic data selection strategies for alignment. We start with controlled studies to measure data across three dimensions: complexity, quality, and diversity, along which we examine existing methods and introduce novel techniques for enhanced data measurement. Subsequently, we propose a simple strategy to select data samples based on the measurement. We present deita (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models using data samples automatically selected with our proposed approach. Empirically, deita performs better or on par with the state-of-the-art open-source alignment models with only 6K SFT training data samples -- over 10x less than the data used in the baselines. When further trained with direct preference optimization (DPO), deita-Mistral-7B + DPO trained with 6K SFT and 10K DPO samples achieve 7.55 MT-Bench and 90.06% AlpacaEval scores. We anticipate this work to provide tools on automatic data selection, facilitating data-efficient alignment. We release our models as well as the selected datasets for future researches to effectively align models more efficiently.
Comment: ICLR 2024 Camera Ready. Data and model checkpoints are available at https://github.com/hkust-nlp/deita
Published: 2023

13. A Survey of Reasoning with Foundation Models

Author: Sun, Jiankai, Zheng, Chuanyang, Xie, Enze, Liu, Zhengying, Chu, Ruihang, Qiu, Jianing, Xu, Jiaqi, Ding, Mingyu, Li, Hongyang, Geng, Mengzhe, Wu, Yue, Wang, Wenhai, Chen, Junsong, Yin, Zhangyue, Ren, Xiaozhe, Fu, Jie, He, Junxian, Yuan, Wu, Liu, Qi, Liu, Xihui, Li, Yu, Dong, Hao, Cheng, Yu, Zhang, Ming, Heng, Pheng Ann, Dai, Jifeng, Luo, Ping, Wang, Jingdong, Wen, Ji-Rong, Qiu, Xipeng, Guo, Yike, Xiong, Hui, Liu, Qun, and Li, Zhenguo
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
Comment: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models
Published: 2023

14. Prompt Optimization via Adversarial In-Context Learning

Author: Do, Xuan Long, Zhao, Yiran, Brown, Hannah, Xie, Yuxi, Zhao, James Xu, Chen, Nancy F., Kawaguchi, Kenji, Shieh, Michael, and He, Junxian
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough output to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and BIG-Bench Hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
Comment: ACL 2024
Published: 2023

15. InstructCoder: Instruction Tuning Large Language Models for Code Editing

Author: Li, Kaixin, Hu, Qisheng, Zhao, Xu, Chen, Hui, Xie, Yuxi, Liu, Tiedong, Xie, Qizhe, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of Large Language Models (LLMs) to edit code based on user instructions. Evaluated on a novel human-written execution-based benchmark dubbed EditEval, we found current models often struggle to fulfill the instructions. In light of this, we contribute InstructCoder, the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing, containing high-diversity code-editing tasks such as comment insertion, code optimization, and code refactoring. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code editing scenarios. The collection process starts with filtered commit data sourced from GitHub Python repositories as seeds. Subsequently, the dataset is systematically expanded through an iterative process, where both seed and generated tasks are used to prompt ChatGPT for more data. Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits, exhibiting superior code-editing performance matching advanced proprietary LLMs. The datasets and the source code are publicly available at https://github.com/qishenghu/CodeInstruct.
Published: 2023

16. FELM: Benchmarking Factuality Evaluation of Large Language Models

Author: Chen, Shiqi, Zhao, Yiran, Zhang, Jinghan, Chern, I-Chun, Gao, Siyang, Liu, Pengfei, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: Assessing factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as felm. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), felm focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors. The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on felm, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory to faithfully detect factual errors.
Comment: Accepted by NeurIPS 2023 Track on Datasets and Benchmarks
Published: 2023

17. SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning

Author: Duan, Keyu, Liu, Qian, Chua, Tat-Seng, Yan, Shuicheng, Ooi, Wei Tsang, Xie, Qizhe, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Textual graphs (TGs) are graphs whose nodes correspond to text (sentences or documents), which are widely prevalent. The representation learning of TGs involves two stages: (i) unsupervised feature extraction and (ii) supervised graph representation learning. In recent years, extensive efforts have been devoted to the latter stage, where Graph Neural Networks (GNNs) have dominated. However, the former stage for most existing graph benchmarks still relies on traditional feature engineering techniques. More recently, with the rapid development of language models (LMs), researchers have focused on leveraging LMs to facilitate the learning of TGs, either by jointly training them in a computationally intensive framework (merging the two stages), or designing complex self-supervised training tasks for feature extraction (enhancing the first stage). In this work, we present SimTeG, a frustratingly Simple approach for Textual Graph learning that does not innovate in frameworks, models, and tasks. Instead, we first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task, such as node classification. We then generate node embeddings using the last hidden states of finetuned LM. These derived features can be further utilized by any GNN for training on the same task. We evaluate our approach on two fundamental graph representation learning tasks: node classification and link prediction. Through extensive experiments, we show that our approach significantly improves the performance of various GNNs on multiple graph benchmarks.
Comment: 9 pages, 3 figures
Published: 2023

18. FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Author: Chern, I-Chun, Chern, Steffi, Chen, Shiqi, Yuan, Weizhe, Feng, Kehua, Zhou, Chunting, He, Junxian, Neubig, Graham, and Liu, Pengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool
Published: 2023

19. Composing Parameter-Efficient Modules with Arithmetic Operations

Author: Zhang, Jinghan, Chen, Shiqi, Liu, Junteng, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: As an efficient alternative to conventional full finetuning, parameter-efficient finetuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. Specifically, we first define addition and negation operators for the module, and then further compose these two basic operators to perform flexible arithmetic. Our approach requires no additional training and enables highly flexible module composition. We apply different arithmetic operations to compose the parameter-efficient modules for (1) distribution generalization, (2) multi-tasking, (3) unlearning, and (4) domain transfer. Additionally, we extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA. Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings.
Comment: NeurIPS 2023. Code is available at https://github.com/SJTU-LIT/PEM_composition
Published: 2023
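
As a rough illustration of the addition and negation operators mentioned in the abstract above, the sketch below treats each parameter-efficient module as a mapping from parameter names to flat lists of weights. This representation is a simplifying assumption, and the names `add_modules`, `negate_module`, and the weight `lam` are hypothetical; they are not taken from the paper or its repository, where the operators may be defined differently for each PEFT architecture.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of module weights.
Module = Dict[str, List[float]]


def add_modules(a: Module, b: Module, lam: float = 0.5) -> Module:
    """Return the weighted sum lam * a + (1 - lam) * b, parameter by parameter."""
    if a.keys() != b.keys():
        raise ValueError("modules must share the same parameter names")
    merged: Module = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [lam * x + (1.0 - lam) * y for x, y in zip(a[name], b[name])]
    return merged


def negate_module(a: Module) -> Module:
    """Return a module whose weights are the element-wise negation of a."""
    return {name: [-x for x in values] for name, values in a.items()}


if __name__ == "__main__":
    skill_a = {"w": [1.0, 2.0]}
    skill_b = {"w": [3.0, 4.0]}
    print(add_modules(skill_a, skill_b, lam=0.5))
    print(negate_module(skill_a))
```

Both operators work purely in weight space, so composing modules this way needs no gradient steps, which matches the abstract's claim of no additional training.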

20. Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Author: Xiong, Miao, Hu, Zhiyuan, Lu, Xinyang, Li, Yifei, Fu, Jie, He, Junxian, and Hooi, Bryan
Subjects: Computer Science - Computation and Language
Abstract: Empowering large language models to accurately express confidence in their answers is essential for trustworthy decision-making. Previous confidence elicitation methods, which primarily rely on white-box access to internal model information or model fine-tuning, have become less suitable for LLMs, especially closed-source commercial APIs. This leads to a growing need to explore the untapped area of black-box approaches for LLM uncertainty estimation. To better break down the problem, we define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency. We then benchmark these methods on two key tasks (confidence calibration and failure prediction) across five types of datasets (e.g., commonsense and arithmetic reasoning) and five widely-used LLMs including GPT-4 and LLaMA 2 Chat. Our analysis uncovers several key insights: 1) LLMs, when verbalizing their confidence, tend to be overconfident, potentially imitating human patterns of expressing confidence. 2) As model capability scales up, both calibration and failure prediction performance improve. 3) Employing our proposed strategies, such as human-inspired prompts, consistency among multiple responses, and better aggregation strategies can help mitigate this overconfidence from various perspectives. 4) Comparisons with white-box methods indicate that while white-box methods perform better, the gap is narrow, e.g., 0.522 to 0.605 in AUROC. Despite these advancements, none of these techniques consistently outperform others, and all investigated methods struggle in challenging tasks, such as those requiring professional knowledge, indicating significant scope for improvement. We believe this study can serve as a strong baseline and provide insights for eliciting confidence in black-box LLMs.
Comment: The paper is accepted by ICLR 2024. The code is publicly available at https://github.com/MiaoXiong2320/llm-uncertainty
Published: 2023
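
The abstract above lists "aggregation techniques for computing consistency" over multiple sampled responses as one framework component. A minimal sketch of one such aggregation, assuming the simplest possible rule (the share of samples that agree with the most frequent answer), is given below. The function name `consistency_confidence` and the normalization step are hypothetical and are not claimed to match the paper's implementation.

```python
from collections import Counter
from typing import List, Tuple


def consistency_confidence(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("at least one sampled answer is required")
    # Light normalization (an assumption) so trivially different strings match.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


if __name__ == "__main__":
    sampled = ["42", "42", "41", " 42 "]
    print(consistency_confidence(sampled))
```

The returned fraction can then be used as a black-box confidence score, since it requires only sampled outputs and no access to model internals.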

21. K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization

Author: Deng, Cheng, Zhang, Tianhang, He, Zhongmou, Xu, Yi, Chen, Qiyuan, Shi, Yuanyuan, Fu, Luoyi, Zhang, Weinan, Wang, Xinbing, Zhou, Chenghu, Lin, Zhouhan, and He, Junxian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7, F.4.1
Abstract: Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBench, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pre-trained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on 5.5B tokens of geoscience text corpus, including over 1 million pieces of geoscience literature, and utilize GeoSignal's supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Meanwhile, we equip K2 with the abilities of using tools to be a naive geoscience aide. Experiments conducted on the GeoBench demonstrate the effectiveness of our approach and datasets on geoscience knowledge understanding and utilization. We open-source all the training data and K2 model checkpoints at https://github.com/davendw49/k2.
Published: 2023

22. Contrastive Learning of Sentence Embeddings from Scratch

Author: Zhang, Junlei, Lan, Zhenzhong, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: Contrastive learning has been the dominant approach to train state-of-the-art sentence embeddings. Previous studies have typically learned sentence embeddings either through the use of human-annotated natural language inference (NLI) data or via large-scale unlabeled sentences in an unsupervised manner. However, even in the case of unlabeled data, their acquisition presents challenges in certain domains due to various reasons. To address these issues, we present SynCSE, a contrastive learning framework that trains sentence embeddings with synthesized data. Specifically, we explore utilizing large language models to synthesize the required data samples for contrastive learning, including (1) producing positive and negative annotations given unlabeled sentences (SynCSE-partial), and (2) generating sentences along with their corresponding annotations from scratch (SynCSE-scratch). Experimental results on sentence similarity and reranking tasks indicate that both SynCSE-partial and SynCSE-scratch greatly outperform unsupervised baselines, and SynCSE-partial even achieves comparable performance to the supervised models in most settings.
Comment: EMNLP 2023
Published: 2023

23. Automatic Model Selection with Large Language Models for Reasoning

Author: Zhao, James Xu, Xie, Yuxi, Kawaguchi, Kenji, He, Junxian, and Xie, Michael Qizhe
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths. CoT employs natural language, offering flexibility and interpretability, while PAL utilizes programming language, yielding more structured and rigorous logic. We introduce a model selection method to combine the best of both worlds by employing a large language model (LLM) to dynamically select between them. Our theoretical analysis underscores the feasibility of this method, which is further corroborated by empirical results. Our proposed method demonstrates significant performance improvements across eight reasoning datasets with Codex, ChatGPT, and GPT-4. Additionally, our method is complementary to self-consistency; when integrated, it can further enhance performance while significantly reducing computation costs. Moreover, we achieve new state-of-the-art results on GSM8K and SVAMP, with respective accuracies of 96.8% and 93.7%. Our code, data and prompts are available at https://github.com/XuZhao0/Model-Selection-Reasoning
Comment: EMNLP 2023 Findings
Published: 2023

24. Evaluating Factual Consistency of Summaries with Large Language Models

Author: Chen, Shiqi, Gao, Siyang, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: Detecting factual errors in summaries has been an important and challenging subject in summarization research. Inspired by the emergent ability of large language models (LLMs), we explore evaluating factual consistency of summaries by directly prompting LLMs. We present a comprehensive empirical study to assess the ability of LLMs as factual consistency evaluators, which consists of (1) analyzing different LLMs such as the GPT model series and Flan-T5; (2) investigating a variety of prompting methods including vanilla prompting, chain-of-thought prompting, and a sentence-by-sentence prompting method to tackle long summaries; and (3) evaluating on diverse summaries generated by multiple summarization systems, ranging from pre-transformer methods to SOTA pretrained models. Our experiments demonstrate that prompting LLMs is able to outperform the previous best factuality systems in all settings, by up to 12.2 absolute points in terms of the binary classification accuracy on inconsistency detection.
Published: 2023

25. C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Author: Huang, Yuzhen, Bai, Yuzhuo, Zhu, Zhihao, Zhang, Junlei, Zhang, Jinghan, Su, Tangjun, Liu, Junteng, Lv, Chuancheng, Zhang, Yikai, Lei, Jiayi, Fu, Yao, Sun, Maosong, and He, Junxian
Subjects: Computer Science - Computation and Language
Abstract: New NLP benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. C-Eval comprises multiple-choice questions across four difficulty levels: middle school, high school, college, and professional. The questions span 52 diverse disciplines, ranging from humanities to science and engineering. C-Eval is accompanied by C-Eval Hard, a subset of very challenging subjects in C-Eval that requires advanced reasoning abilities to solve. We conduct a comprehensive evaluation of the most advanced LLMs on C-Eval, including both English- and Chinese-oriented models. Results indicate that only GPT-4 could achieve an average accuracy of over 60%, suggesting that there is still significant room for improvement for current LLMs. We anticipate C-Eval will help analyze important strengths and shortcomings of foundation models, and foster their development and growth for Chinese users.
Comment: NeurIPS 2023. Website: https://cevalbenchmark.com
Published: 2023

26. Self-Evaluation Guided Beam Search for Reasoning

Author: Xie, Yuxi, Kawaguchi, Kenji, Zhao, Yiran, Zhao, Xu, Kan, Min-Yen, He, Junxian, and Xie, Qizhe
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Breaking down a problem into intermediate steps has demonstrated impressive performance in Large Language Model (LLM) reasoning. However, the growth of the reasoning chain introduces uncertainty and error accumulation, making it challenging to elicit accurate final results. To tackle this challenge of uncertainty in multi-step reasoning, we introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of LLMs. We propose a decoding algorithm integrating the self-evaluation guidance via stochastic beam search. The self-evaluation guidance serves as a better-calibrated automatic criterion, facilitating an efficient search in the reasoning space and resulting in superior prediction quality. Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness. Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on the GSM8K, AQuA, and StrategyQA benchmarks, respectively. Experiment results with Llama-2 on arithmetic reasoning demonstrate the efficiency of our method in outperforming the baseline methods with comparable computational budgets. Further analysis in multi-step reasoning finds our self-evaluation guidance pinpoints logic failures and leads to higher consistency and robustness. Our code is publicly available at https://guideddecoding.github.io/.
Comment: NeurIPS 2023. 10 pages, 7 figures, 4 tables (33 pages, 14 figures, 15 tables including references and appendices)
Published: 2023

27. Mega: Moving Average Equipped Gated Attention

Author: Ma, Xuezhe, Zhou, Chunting, Kong, Xiang, He, Junxian, Gui, Liangke, Neubig, Graham, May, Jonathan, and Zettlemoyer, Luke
Subjects: Computer Science - Machine Learning
Abstract: The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences. In this paper, we introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism. We further propose a variant of Mega that offers linear time and space complexity yet yields only minimal quality loss, by efficiently splitting the whole sequence into multiple chunks with fixed length. Extensive experiments on a wide range of sequence modeling benchmarks, including the Long Range Arena, neural machine translation, auto-regressive language modeling, and image and speech classification, show that Mega achieves significant improvements over other sequence models, including variants of Transformers and recent state space models.
Comment: Accepted by ICLR 2023. Final version (updating MT results). 13 pages, 4 figures and 7 tables
Published: 2022

28. Non-Parametric Temporal Adaptation for Social Media Topic Classification

Author: Mireshghallah, Fatemehsadat, Vogler, Nikolai, He, Junxian, Florez, Omar, El-Kishky, Ahmed, and Berg-Kirkpatrick, Taylor
Subjects: Computer Science - Computation and Language
Abstract: User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, most current NLP models are static and rely on fixed training data, which means they are unable to adapt to temporal change (both test distribution shift and deleted training data) without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a non-parametric dense retrieval technique, which does not require re-training, as a simple but effective solution. In experiments on a newly collected, publicly available, year-long Twitter dataset exhibiting temporal distribution shift, our method improves by 64.12% over the best parametric baseline without any of its costly gradient-based updating. Our dense retrieval approach is also particularly well-suited to dynamically deleted user data in line with data privacy laws, with negligible computational cost and performance loss.
Published: 2022

29. Comparison of characteristics of cervical cancer screening history between cervical adenocarcinoma and cervical squamous cell carcinoma

Author: Ye Minjuan, Li Jing, Chen Zifei, He Junxian, Chen Lifa, Zhang Yu
Subjects: cervical adenocarcinoma, cervical squamous cell carcinoma, cervical cancer screening, precancerous lesion, Medicine
Abstract: Objective: To compare the characteristics of cervical cancer screening history between patients with cervical adenocarcinoma and cervical squamous cell carcinoma, and to preliminarily evaluate the efficacy of cervical cancer screening for precancerous lesions of cervical adenocarcinoma. Methods: Clinical data of 117 patients with cervical adenocarcinoma and 712 patients with cervical squamous cell carcinoma were retrospectively analyzed, and the differences in cervical cancer screening history were statistically analyzed between the two groups. Results: The proportion of cervical adenocarcinoma patients receiving cervical cancer screening was 24.5%, significantly higher than 6.8% of those with cervical squamous cell carcinoma (P < 0.001). The proportion of cervical adenocarcinoma patients receiving regular screening or above was 18.4%, significantly higher than 2.8% of those with cervical squamous cell carcinoma (P < 0.001). The proportion of symptom-detected cervical squamous cell carcinoma was 91.6%, significantly higher than 79.1% of their counterparts with cervical adenocarcinoma (P < 0.001). The proportion of screening-detected stage Ⅰ-ⅡA cervical adenocarcinoma was 24.6%, significantly higher than 11.1% of those with screening-detected stage Ⅰ-ⅡA cervical squamous cell carcinoma (P = 0.004). The proportion of screening-detected stage Ⅰ-ⅡA cervical adenocarcinoma was 24.6%, significantly higher than 4.0% of those with screening-detected stage ⅡB-Ⅳ cervical adenocarcinoma (P = 0.022). Conclusions: The current cervical cancer screening regimen yields higher efficacy for precancerous lesions of cervical squamous cell carcinoma compared with cervical adenocarcinoma. However, it still contributes to the diagnosis of early cervical adenocarcinoma. Therefore, extensive attention should be paid to cervical cancer screening. The cervical cancer screening regimen remains to be further optimized.
Published: 2024

30. Prompt Consistency for Zero-Shot Task Generalization

Author: Zhou, Chunting, He, Junxian, Ma, Xuezhe, Berg-Kirkpatrick, Taylor, and Neubig, Graham
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting. To achieve this, NLP tasks are framed as natural language prompts, generating a response indicating the predicted output. Nonetheless, the performance in such settings often lags far behind its supervised counterpart, suggesting a large space for potential improvement. In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance. Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency, encouraging consistent predictions over this diverse set of prompts. Our method makes it possible to fine-tune the model either with extra unlabeled training data, or directly on test input at inference time in an unsupervised manner. In experiments, our approach outperforms the state-of-the-art zero-shot learner, T0 (Sanh et al., 2022), on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy. The gains are often attained with a small number of unlabeled examples.
Comment: EMNLP 2022 Findings. Code is available at https://github.com/violet-zct/swarm-distillation-zero-shot
Published: 2022
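
To make the idea of "regularizing prompt consistency" in the abstract above more concrete, the sketch below scores how much a model's predicted label distributions disagree when the same input is phrased with different prompts. It assumes a mean pairwise KL divergence as the disagreement measure; this choice, along with the names `kl_divergence` and `prompt_consistency_loss`, is hypothetical, and the loss used in the paper may differ.

```python
import math
from itertools import permutations
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same label set."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def prompt_consistency_loss(distributions: List[Sequence[float]]) -> float:
    """Mean KL divergence over all ordered pairs of per-prompt predictions."""
    pairs = list(permutations(range(len(distributions)), 2))
    if not pairs:
        return 0.0  # a single prompt cannot disagree with itself
    total = sum(kl_divergence(distributions[i], distributions[j]) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Predicted label distributions for one input under three prompts.
    predictions = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
    print(prompt_consistency_loss(predictions))
```

The score is zero when all prompts yield the same distribution and grows as they diverge, so minimizing it on unlabeled inputs needs no gold labels.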

31. FOXO4-DRI improves spermatogenesis in aged mice through reducing senescence-associated secretory phenotype secretion from Leydig cells

Author: Li, Yanqing, Zhang, Chi, Cheng, Haicheng, Lv, LinYan, Zhu, Xinning, Ma, Menghui, Xu, Zhenhan, He, Junxian, Xie, Yun, Yang, Xing, Liang, Xiaoyan, Deng, Chunhua, and Liu, Guihua
Published: 2024

32. Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval

Author: Alon, Uri, Xu, Frank F., He, Junxian, Sengupta, Sudipta, Roth, Dan, and Neubig, Graham
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Retrieval-based language models (R-LM) model the probability of natural language text by combining a standard language model (LM) with examples retrieved from an external datastore at test time. While effective, a major bottleneck of using these models in practice is the computationally costly datastore search, which can be performed as frequently as every time step. In this paper, we present RetoMaton (retrieval automaton), which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states". This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RetoMaton can be constructed from any text collection: either the original training corpus or from another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest neighbor searches over kNN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton .
Comment: Accepted to ICML'2022. Code and models are available at https://github.com/neulab/retomaton
Published: 2022

33. A visual cortex-inspired edge neuromorphic hardware architecture with on-chip multi-layer STDP learning

Author: He, Junxian, Tian, Min, Jiang, Ying, Wang, Haibing, Wang, Tengxiao, Zhou, Xichuan, Liu, Liyuan, Wu, Nanjian, Wang, Ying, and Shi, Cong
Published: 2024

34. Towards a Unified View of Parameter-Efficient Transfer Learning

Author: He, Junxian, Zhou, Chunting, Ma, Xuezhe, Berg-Kirkpatrick, Taylor, and Neubig, Graham
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pre-trained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pre-trained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
Comment: ICLR 2022 (spotlight presentation). Code is available at https://github.com/jxhe/unify-parameter-efficient-tuning
Published: 2021

35. Capturing Structural Locality in Non-parametric Language Models

Author: Xu, Frank F., He, Junxian, Neubig, Graham, and Hellendoorn, Vincent J.
Subjects: Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: Structural locality is a ubiquitous feature of real-world datasets, wherein data points are organized into local hierarchies. Some examples include topical clusters in text or project hierarchies in source code repositories. In this paper, we explore utilizing this structural locality within non-parametric language models, which generate sequences that reference retrieved examples from an external source. We propose a simple yet effective approach for adding locality information into such models by adding learned parameters that improve the likelihood of retrieving examples from local neighborhoods. Experiments on two different domains, Java source code and Wikipedia text, demonstrate that locality features improve model efficacy over models without access to these features, with interesting differences. We also perform an analysis of how and where locality features contribute to improved performance and why the traditionally used contextual similarity metrics alone are not enough to grasp the locality structure.
Comment: ICLR 2022
Published: 2021

36. Dependency Induction Through the Lens of Visual Perception

Author: Su, Ruisi, Rijhwani, Shruti, Zhu, Hao, He, Junxian, Wang, Xinyu, Bisk, Yonatan, and Neubig, Graham
Subjects: Computer Science - Computation and Language
Abstract: Most previous work on grammar induction focuses on learning phrasal or dependency structure purely from text. However, because the signal provided by text alone is limited, recently introduced visually grounded syntax models make use of multimodal information leading to improved performance in constituency grammar induction. However, as compared to dependency grammars, constituency grammars do not provide a straightforward way to incorporate visual information without enforcing language-specific heuristics. In this paper, we propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars. Our experiments find that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50% as compared to state-of-the-art models trained on pure text. Next, we propose an extension of our model that leverages both word concreteness and visual semantic role labels in constituency and dependency parsing. Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
Comment: Accepted to CoNLL 2021
Published: 2021

37. Efficient Nearest Neighbor Language Models

Author: He, Junxian, Neubig, Graham, and Berg-Kirkpatrick, Taylor
Subjects: Computer Science - Computation and Language
Abstract: Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed k-nearest neighbors language model (Khandelwal et al., 2020) as an example, exploring methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods are able to achieve up to a 6x speed-up in inference speed while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
Comment: EMNLP 2021. Update to fix typos. Code is at https://github.com/jxhe/efficient-knnlm
Published: 2021

38. Hydrogen peroxide receptors regulate chilling injury of banana fruit during low-temperature storage

Author: Zhang, Shuting, Shan, Youxia, Li, Ying, He, Junxian, and Jiang, Yueming
Published: 2024

39. Determination of sulfide in complex biofilm matrices using silver-coated, 4-mercaptobenzonitrile-modified gold nanoparticles, encapsulated in ZIF-8 as surface-enhanced Raman scattering nanoprobe

Author: He, Junxian, Qi, Peng, Zhang, Dun, Zeng, Yan, Zhao, Ping, and Wang, Peng
Published: 2023

40. ATP homeostasis and signaling in plants

Author: Xiao, Jiaqi, Zhou, Yijie, Xie, Yunyun, Li, Taotao, Su, Xinguo, He, Junxian, Jiang, Yueming, Zhu, Hong, and Qu, Hongxia
Published: 2024

41. CTRLsum: Towards Generic Controllable Text Summarization

Author: He, Junxian, Kryściński, Wojciech, McCann, Bryan, Rajani, Nazneen, and Xiong, Caiming
Subjects: Computer Science - Computation and Language
Abstract: Current summarization systems yield generic summaries that are disconnected from users' preferences and expectations. To address this limitation, we present CTRLsum, a novel framework for controllable summarization. Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts. Using a single unified model, CTRLsum is able to achieve a broad scope of summary manipulation at inference time without requiring additional human annotations or pre-defining a set of control aspects during training. We quantitatively demonstrate the effectiveness of our approach on three domains of summarization datasets and five control aspects: 1) entity-centric and 2) length-controllable summarization, 3) contribution summarization on scientific papers, 4) invention purpose summarization on patent filings, and 5) question-guided summarization on news articles in a reading comprehension setting. Moreover, when used in a standard, uncontrolled summarization setting, CTRLsum achieves state-of-the-art results on the CNN/DailyMail dataset. Code and model checkpoints are available at https://github.com/salesforce/ctrl-sum
Comment: Preprint
Published: 2020

42. A Mechanism of Unstable Growth of Hydraulic Fractures in Laboratory Experiments

Author: Dyskin, Arcady V., Pasternak, Elena, He, Junxian, Wu, Wei, Series Editor, Pasternak, Elena, editor, and Dyskin, Arcady, editor
Published: 2023

43. On the Sentence Embeddings from Pre-trained Language Models

Author: Li, Bohan, Zhou, Hao, He, Junxian, Wang, Mingxuan, Yang, Yiming, and Li, Lei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
Comment: EMNLP 2020
Published: 2020

44. Learning Sparse Prototypes for Text Generation

Author: He, Junxian, Berg-Kirkpatrick, Taylor, and Neubig, Graham
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Prototype-driven text generation uses non-parametric models that first choose from a library of sentence "prototypes" and then modify the prototype to generate the output text. While effective, these methods are inefficient at test time as a result of needing to store and index the entire training corpus. Further, existing methods often require heuristics to identify which prototypes to reference at training time. In this paper, we propose a novel generative model that automatically learns a sparse prototype support set that, nonetheless, achieves strong language modeling performance. This is achieved by (1) imposing a sparsity-inducing prior on the prototype selection distribution, and (2) utilizing amortized variational inference to learn a prototype retrieval function. In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction, as well as a 1000x speed-up at test time. More interestingly, we show that the learned prototypes are able to capture semantics and syntax at different granularity as we vary the sparsity of prototype selection, and that certain sentence attributes can be controlled by specifying the prototype for generation., Comment: NeurIPS 2020 Conference Paper
Published: 2020

45. A Probabilistic Formulation of Unsupervised Text Style Transfer

Author: He, Junxian, Wang, Xinyi, Neubig, Graham, and Berg-Kirkpatrick, Taylor
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion. In contrast with traditional generative sequence models (e.g. the HMM), our model makes few assumptions about the data it generates: it uses a recurrent language model as a prior and an encoder-decoder as a transduction distribution. While computation of marginal data likelihood is intractable in this model class, we show that amortized variational inference admits a practical surrogate. Further, by drawing connections between our variational objective and other recent unsupervised style transfer and machine translation techniques, we show how our probabilistic view can unify some known non-generative objectives such as backtranslation and adversarial loss. Finally, we demonstrate the effectiveness of our method on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation. Across all style transfer tasks, our approach yields substantial gains over state-of-the-art non-generative baselines, including the state-of-the-art unsupervised machine translation techniques that our approach generalizes. Further, we conduct experiments on a standard unsupervised machine translation task and find that our unified approach matches the current state-of-the-art., Comment: ICLR 2020 conference paper (spotlight). The first two authors contributed equally
Published: 2020

46. Revisiting Self-Training for Neural Sequence Generation

Author: He, Junxian, Gu, Jiatao, Shen, Jiajun, and Ranzato, Marc'Aurelio
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. the pseudo-parallel data). While self-training has been extensively studied on classification problems, in complex sequence generation tasks (e.g. machine translation) it is still unclear how self-training works due to the compositionality of the target space. In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks. Through careful examination of the performance gains, we find that the perturbation on the hidden states (i.e. dropout) is critical for self-training to benefit from the pseudo-parallel data, which acts as a regularizer and forces the model to yield close predictions for similar unlabeled inputs. This effect helps the model correct some incorrect predictions on unlabeled data. To further encourage this mechanism, we propose to inject noise into the input space, resulting in a "noisy" version of self-training. Empirical study on standard machine translation and text summarization benchmarks shows that noisy self-training is able to effectively utilize unlabeled data and improve the performance of the supervised baseline by a large margin., Comment: ICLR 2020. The first two authors contributed equally. Updated to fix typos
Published: 2019

47. The Source-Target Domain Mismatch Problem in Machine Translation

Author: Shen, Jiajun, Chen, Peng-Jen, Le, Matt, He, Junxian, Gu, Jiatao, Ott, Myle, Auli, Michael, and Ranzato, Marc'Aurelio
Subjects: Computer Science - Computation and Language
Abstract: While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that, particularly in low-resource settings, this causes the domains of the source and target language to greatly mismatch, as the two languages are often spoken in regions of the world that are further apart, with more distinctive cultural traits and unrelated local events. We first formalize the concept of source-target domain mismatch, propose a metric to quantify it, and provide empirical evidence corroborating our intuition that organic text produced by people speaking very different languages exhibits the most dramatic differences. We conclude with an empirical study of how source-target domain mismatch affects training of machine translation systems for low-resource language pairs. In particular, we find that it severely affects back-translation, but the degradation can be alleviated by combining back-translation with self-training and by increasing the relative amount of target-side monolingual data.
Published: 2019

48. A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text

Author: Li, Bohan, He, Junxian, Neubig, Graham, Berg-Kirkpatrick, Taylor, and Yang, Yiming
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: When trained effectively, the Variational Autoencoder (VAE) is both a powerful language model and an effective representation learning framework. In practice, however, VAEs are trained with the evidence lower bound (ELBO) as a surrogate objective to the intractable marginal data likelihood. This approach to training yields unstable results, frequently leading to a disastrous local optimum known as posterior collapse. In this paper, we investigate a simple fix for posterior collapse which yields surprisingly effective results. The combination of two known heuristics, previously considered only in isolation, substantially improves held-out likelihood, reconstruction, and latent representation learning when compared with previous state-of-the-art methods. More interestingly, while our experiments demonstrate superiority on these principal evaluations, our method obtains a worse ELBO. We use these results to argue that the typical surrogate objective for VAEs may not be sufficient or necessarily appropriate for balancing the goals of representation learning and data distribution modeling., Comment: EMNLP 2019 short paper. The first two authors contributed equally
Published: 2019

49. Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections

Author: He, Junxian, Zhang, Zhisong, Berg-Kirkpatrick, Taylor, and Neubig, Graham
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Cross-lingual transfer is an effective way to build syntactic analysis tools in low-resource languages. However, transfer is difficult when transferring to typologically distant languages, especially when neither annotated target data nor parallel corpora are available. In this paper, we focus on methods for cross-lingual transfer to distant languages and propose to learn a generative model with a structured prior that utilizes labeled source data and unlabeled target data jointly. The parameters of source model and target model are softly shared through a regularized log likelihood objective. An invertible projection is employed to learn a new interlingual latent embedding space that compensates for imperfect cross-lingual word embedding input. We evaluate our method on two syntactic tasks: part-of-speech (POS) tagging and dependency parsing. On the Universal Dependency Treebanks, we use English as the only source corpus and transfer to a wide range of target languages. On the 10 languages in this dataset that are distant from English, our method yields an average of 5.2% absolute improvement on POS tagging and 8.3% absolute improvement on dependency parsing over a direct transfer method using state-of-the-art discriminative models., Comment: ACL 2019 long paper
Published: 2019

50. Choosing Transfer Languages for Cross-Lingual Learning

Author: Lin, Yu-Hsiang, Chen, Chian-Yu, Lee, Jean, Li, Zirui, Zhang, Yuyan, Xia, Mengzhou, Rijhwani, Shruti, He, Junxian, Zhang, Zhisong, Ma, Xuezhe, Anastasopoulos, Antonios, Littell, Patrick, and Neubig, Graham
Subjects: Computer Science - Computation and Language
Abstract: Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights on which features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank, Comment: Proceedings of ACL 2019
Published: 2019
