Author: "Hu, Tianxiang" / Topic: computer science - computation and language - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hu, Tianxiang"' showing total 6 results

Start Over Author "Hu, Tianxiang" Topic computer science - computation and language

6 results on '"Hu, Tianxiang"'

1. FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Author: Fan, Zhiting, Chen, Ruizhe, Hu, Tianxiang, and Liu, Zuozhu
Subjects: Computer Science - Computation and Language
Abstract: The growing use of large language model (LLM)-based chatbots has raised concerns about fairness. Fairness issues in LLMs can lead to severe consequences, such as bias amplification, discrimination, and harm to marginalized communities. While existing fairness benchmarks mainly focus on single-turn dialogues, multi-turn scenarios, which in fact better reflect real-world conversations, present greater challenges due to conversational complexity and potential bias accumulation. In this paper, we propose a comprehensive fairness benchmark for LLMs in multi-turn dialogue scenarios, \textbf{FairMT-Bench}. Specifically, we formulate a task taxonomy targeting LLM fairness capabilities across three stages: context understanding, user interaction, and instruction trade-offs, with each stage comprising two tasks. To ensure coverage of diverse bias types and attributes, we draw from existing fairness datasets and employ our template to construct a multi-turn dialogue dataset, \texttt{FairMT-10K}. For evaluation, GPT-4 is applied, alongside bias classifiers including Llama-Guard-3 and human validation to ensure robustness. Experiments and analyses on \texttt{FairMT-10K} reveal that in multi-turn dialogue scenarios, current LLMs are more likely to generate biased responses, and there is significant variation in performance across different tasks and models. Based on this, we curate a challenging dataset, \texttt{FairMT-1K}, and test 15 current state-of-the-art (SOTA) LLMs on this dataset. The results show the current state of fairness in LLMs and showcase the utility of this novel approach for assessing fairness in more realistic multi-turn dialogue contexts, calling for future work to focus on LLM fairness improvement and the adoption of \texttt{FairMT-1K} in such efforts.
Published: 2024

2. Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning

Author: Hu, Tianxiang, Zhang, Pei, Yang, Baosong, Xie, Jun, Wong, Derek F., and Wang, Rui
Subjects: Computer Science - Computation and Language
Abstract: Achieving consistent high-quality machine translation (MT) across diverse domains remains a significant challenge, primarily due to the limited and imbalanced parallel training data available in various domains. While large language models (LLMs) have demonstrated impressive general understanding and generation abilities, their potential in multi-domain MT is under-explored. We establish a comprehensive benchmark for multi-domain translation, featuring 25 German$\Leftrightarrow$English and 22 Chinese$\Leftrightarrow$English test sets respectively covering 15 domains. Our evaluation of prominent LLMs reveals a discernible performance gap against traditional MT systems, highlighting domain overfitting and catastrophic forgetting issues after fine-tuning on domain-limited corpora. To mitigate this, we propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance. This method inspires the LLM to perceive domain information from the source text, which then serves as a helpful hint to guide the translation process. Despite being trained on a small dataset of four domains, our CoT fine-tune approach achieves notable enhancements in translation accuracy and domain robustness than traditional fine-tuning, as evidenced by an average 1.53 BLEU score increase in over 20 German$\rightarrow$English distinct out-of-domain tests.
Published: 2024

3. Learnable Privacy Neurons Localization in Language Models

Author: Chen, Ruizhe, Hu, Tianxiang, Feng, Yang, and Liu, Zuozhu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Concerns regarding Large Language Models (LLMs) to memorize and disclose private information, particularly Personally Identifiable Information (PII), become prominent within the community. Many efforts have been made to mitigate the privacy risks. However, the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a pioneering method for pinpointing PII-sensitive neurons (privacy neurons) within LLMs. Our method employs learnable binary weight masks to localize specific neurons that account for the memorization of PII in LLMs through adversarial training. Our investigations discover that PII is memorized by a small subset of neurons across all layers, which shows the property of PII specificity. Furthermore, we propose to validate the potential in PII risk mitigation by deactivating the localized privacy neurons. Both quantitative and qualitative experiments demonstrate the effectiveness of our neuron localization algorithm., Comment: ACL 2024 main conference
Published: 2024

4. PolyLM: An Open Source Polyglot Large Language Model

Author: Wei, Xiangpeng, Wei, Haoran, Lin, Huan, Li, Tianhao, Zhang, Pei, Ren, Xingzhang, Li, Mei, Wan, Yu, Cao, Zhiwei, Xie, Binbin, Hu, Tianxiang, Li, Shangjie, Hui, Binyuan, Yu, Bowen, Liu, Dayiheng, Yang, Baosong, Huang, Fei, and Xie, Jun
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following nature language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, avaliable in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, alone with the instruction data and multilingual benchmark, are available at: \url{https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation}.
Published: 2023

5. WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition

Author: Zhou, Renjie, Hu, Qiang, Wan, Jian, Zhang, Jilin, Liu, Qiang, Hu, Tianxiang, and Li, Jianjun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Named Entity Recognition task is one of the core tasks of information extraction. Word ambiguity and word abbreviation are important reasons for the low recognition rate of named entities. In this paper, we propose a novel named entity recognition model WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia), which incorporates the idea of contrastive learning. The model first trains the sentence pairs in the text, calculate similarity between sentence pairs, and fine-tunes BERT used for the named entity recognition task according to the similarity, so as to alleviate word ambiguity. Then, the fine-tuned BERT is combined with BiLSTM-CRF to perform the named entity recognition task. Finally, the recognition results are corrected in combination with prior knowledge such as knowledge graphs, so as to alleviate the low-recognition-rate problem caused by word abbreviations. The results of experimentals conducted on the CoNLL-2003 English dataset and OntoNotes V5 English dataset show that our model outperforms other similar models on.
Published: 2022

6. Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Author: Ye, Wei, Xie, Rui, Zhang, Jinglei, Hu, Tianxiang, Wang, Xiaoyin, and Zhang, Shikun
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined these two tasks to improve their performance. However, researchers have yet been able to effectively leverage the intrinsic connection between the two tasks as they train these tasks in a separate or pipeline manner, which means their performance can not be well balanced. In this paper, we propose a novel end-to-end model for the two tasks by introducing an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and utilize the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python, and results show that our model can significantly improve the results of the code retrieval task over the-state-of-art models, as well as achieve competitive performance in terms of BLEU score for the code summarization task., Comment: Published at The Web Conference (WWW) 2020, full paper
Published: 2020
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

6 results on '"Hu, Tianxiang"'

1. FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

2. Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning

3. Learnable Privacy Neurons Localization in Language Models

4. PolyLM: An Open Source Polyglot Large Language Model

5. WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition

6. Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

6 results on '"Hu, Tianxiang"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources