Author: "Zhao, Hai" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhao, Hai"' showing total 5,553 results

Start Over Author "Zhao, Hai" Publication Year Range Last 10 years

5,553 results on '"Zhao, Hai"'

1. KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing

Author: Yang, Yifei, Cao, Zouying, Chen, Qiguang, Qin, Libo, Yang, Dongjie, Zhao, Hai, and Chen, Zhi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The development of large language models (LLMs) has significantly expanded model sizes, resulting in substantial GPU memory requirements during inference. The key and value storage of the attention map in the KV (key-value) cache accounts for more than 80\% of this memory consumption. Nowadays, most existing KV cache compression methods focus on intra-layer compression within a single Transformer layer but few works consider layer-wise compression. In this paper, we propose a plug-and-play method called \textit{KVSharer}, which shares the KV cache between layers to achieve layer-wise compression. Rather than intuitively sharing based on higher similarity, we discover a counterintuitive phenomenon: sharing dissimilar KV caches better preserves the model performance. Experiments show that \textit{KVSharer} can reduce KV cache computation by 30\%, thereby lowering memory consumption without significantly impacting model performance and it can also achieve at least 1.3 times generation acceleration. Additionally, we verify that \textit{KVSharer} is compatible with existing intra-layer KV cache compression methods, and combining both can further save memory., Comment: Under Review by ICLR2025
Published: 2024

2. Instruction-Driven Game Engine: A Poker Case Study

Author: Wu, Hongqiu, Liu, Xingyuan, Wang, Yan, and Zhao, Hai
Subjects: Computer Science - Artificial Intelligence, Computer Science - Software Engineering
Abstract: The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game descriptions and generate game-play processes. The IDGE allows users to create games simply by natural language instructions, which significantly lowers the barrier for game development. We approach the learning process for IDGEs as a Next State Prediction task, wherein the model autoregressively predicts the game states given player actions. The computation of game states must be precise; otherwise, slight errors could corrupt the game-play experience. This is challenging because of the gap between stability and diversity. To address this, we train the IDGE in a curriculum manner that progressively increases its exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, which not only supports a wide range of poker variants but also allows for highly individualized new poker games through natural language inputs. This work lays the groundwork for future advancements in transforming how games are created and played., Comment: EMNLP 2024 Demo. arXiv admin note: substantial text overlap with arXiv:2404.00276
Published: 2024

3. Are LLMs Aware that Some Questions are not Open-ended?

Author: Yang, Dongjie and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have shown the impressive capability of answering questions in a wide range of scenarios. However, when LLMs face different types of questions, it is worth exploring whether LLMs are aware that some questions have limited answers and need to respond more deterministically but some do not. We refer to this as question awareness of LLMs. The lack of question awareness in LLMs leads to two phenomena that LLMs are: (1) too casual to answer non-open-ended questions or (2) too boring to answer open-ended questions. In this paper, we first evaluate the question awareness in LLMs. The experimental results show that LLMs have the issues of lacking awareness of questions in certain domains, e.g. factual knowledge, resulting in hallucinations during the generation. To mitigate these, we propose a method called Question Awareness Temperature Sampling (QuATS). This method enhances the question awareness of LLMs by adaptively adjusting the output distributions based on question features. The automatic adjustment in QuATS eliminates the need for manual temperature tuning in text generation and consistently improves model performance in various benchmarks., Comment: Accepted by EMNLP 2024
Published: 2024

4. VHASR: A Multimodal Speech Recognition System With Vision Hotwords

Author: Hu, Jiliang, Li, Zuchao, Wang, Ping, Ai, Haojun, Zhang, Lefei, and Zhao, Hai
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The image-based multimodal automatic speech recognition (ASR) model enhances speech recognition performance by incorporating audio-related image. However, some works suggest that introducing image information to model does not help improving ASR performance. In this paper, we propose a novel approach effectively utilizing audio-related image information and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model's speech recognition capability. Our system utilizes a dual-stream architecture, which firstly transcribes the text on the two streams separately, and then combines the outputs. We evaluate the proposed model on four datasets: Flickr8k, ADE20k, COCO, and OpenImages. The experimental results show that VHASR can effectively utilize key information in images to enhance the model's speech recognition ability. Its performance not only surpasses unimodal ASR, but also achieves SOTA among existing image-based multimodal ASR., Comment: 14 pages, 6 figures, accepted by EMNLP 2024
Published: 2024

5. Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models

Author: Shi, Luohe, Yao, Yao, Li, Zuchao, Zhang, Lefei, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. ICL typically constructs a few-shot learning scenario, either manually or by setting up a Retrieval-Augmented Generation (RAG) system, helping models quickly grasp domain knowledge or question-answering patterns without changing model parameters. However, this approach involves trade-offs, such as slower inference speed and increased space occupancy. PEFT assists the model in adapting to tasks through minimal parameter modifications, but the training process still demands high hardware requirements, even with a small number of parameters involved. To address these challenges, we propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning, maintaining low inference costs. RTD constructs a reference datastore from the provided training examples and optimizes the LLM's final vocabulary distribution by flexibly selecting suitable references based on the input, resulting in more trustable responses and enabling the model to adapt to downstream tasks at a low cost. Experimental evaluations on various LLMs using different benchmarks demonstrate that RTD establishes a new paradigm for augmenting models to downstream tasks. Furthermore, our method exhibits strong orthogonality with traditional methods, allowing for concurrent usage.
Published: 2024

6. A Coin Has Two Sides: A Novel Detector-Corrector Framework for Chinese Spelling Correction

Author: Zeng, Xiangke, Li, Zuchao, Zhang, Lefei, Wang, Ping, Wu, Hongqiu, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Chinese Spelling Correction (CSC) stands as a foundational Natural Language Processing (NLP) task, which primarily focuses on the correction of erroneous characters in Chinese texts. Certain existing methodologies opt to disentangle the error correction process, employing an additional error detector to pinpoint error positions. However, owing to the inherent performance limitations of error detector, precision and recall are like two sides of the coin which can not be both facing up simultaneously. Furthermore, it is also worth investigating how the error position information can be judiciously applied to assist the error correction. In this paper, we introduce a novel approach based on error detector-corrector framework. Our detector is designed to yield two error detection results, each characterized by high precision and recall. Given that the occurrence of errors is context-dependent and detection outcomes may be less precise, we incorporate the error detection results into the CSC task using an innovative feature fusion strategy and a selective masking strategy. Empirical experiments conducted on mainstream CSC datasets substantiate the efficacy of our proposed method., Comment: ECAI-2024
Published: 2024

7. Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Author: Cao, Zouying, Yang, Yifei, and Zhao, Hai
Subjects: Computer Science - Artificial Intelligence
Abstract: Safety alignment is indispensable for Large language models (LLMs) to defend threats from malicious instructions. However, recent researches reveal safety-aligned LLMs prone to reject benign queries due to the exaggerated safety issue, limiting their helpfulness. In this paper, we propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns in aligned LLMs. First, SCANS extracts the refusal steering vectors within the activation space and utilizes vocabulary projection to anchor some specific safety-critical layers which influence model refusal behavior. Second, by tracking the hidden state transition, SCANS identifies the steering direction and steers the model behavior accordingly, achieving a balance between exaggerated safety and adequate safety. Experiments show that SCANS achieves new state-of-the-art performance on XSTest and OKTest benchmarks, without impairing their defense capability against harmful queries and maintaining almost unchanged model capability.
Published: 2024

8. MEGen: Generative Backdoor in Large Language Models via Model Editing

Author: Qiu, Jiyang, Ma, Xinbei, Zhang, Zhuosheng, and Zhao, Hai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities. Their powerful generative abilities enable flexible responses based on various queries or instructions. Emerging as widely adopted generalists for diverse tasks, LLMs are still vulnerable to backdoors. This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects. In our approach, we first leverage a language model to insert a trigger selected on fixed metrics into the input, then design a pipeline of model editing to directly embed a backdoor into an LLM. By adjusting a small set of local parameters with a mini-batch of samples, MEGen significantly enhances time efficiency and achieves high robustness. Experimental results indicate that our backdoor attack strategy achieves a high attack success rate on poison data while maintaining the model's performance on clean data. Notably, the backdoored model, when triggered, can freely output pre-set dangerous information while successfully completing downstream tasks. This suggests that future LLM applications could be guided to deliver certain dangerous information, thus altering the LLM's generative style. We believe this approach provides insights for future LLM applications and the execution of backdoor attacks on conversational AI systems., Comment: Working in progress
Published: 2024

9. BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction

Author: Yang, Yifei, Shi, Runhan, Li, Zuchao, Jiang, Shu, Lu, Bao-Liang, Yang, Yang, and Zhao, Hai
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: Retrosynthesis analysis is pivotal yet challenging in drug discovery and organic chemistry. Despite the proliferation of computational tools over the past decade, AI-based systems often fall short in generalizing across diverse reaction types and exploring alternative synthetic pathways. This paper presents BatGPT-Chem, a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Integrating chemical tasks via a unified framework of natural language and SMILES notation, this approach synthesizes extensive instructional data from an expansive chemical database. Employing both autoregressive and bidirectional training techniques across over one hundred million instances, BatGPT-Chem captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions and exhibiting strong zero-shot capabilities. Superior to existing AI methods, our model demonstrates significant advancements in generating effective strategies for complex molecules, as validated by stringent benchmark tests. BatGPT-Chem not only boosts the efficiency and creativity of retrosynthetic analysis but also establishes a new standard for computational tools in synthetic design. This development empowers chemists to adeptly address the synthesis of novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science. We release our trial platform at \url{https://www.batgpt.net/dapp/chem}.
Published: 2024

10. Self-Directed Turing Test for Large Language Models

Author: Wu, Weiqi, Wu, Hongqiu, and Zhao, Hai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations. Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time and require continuous human involvement to direct the entire interaction with the test subject. This fails to reflect a natural conversational style and hinders the evaluation of Large Language Models (LLMs) in complex and prolonged dialogues. This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format, allowing more dynamic exchanges by multiple consecutive messages. It further efficiently reduces human workload by having the LLM self-direct the majority of the test process, iteratively generating dialogues that simulate its interaction with humans. With the pseudo-dialogue history, the model then engages in a shorter dialogue with a human, which is paired with a human-human conversation on the same topic to be judged using questionnaires. We introduce the X-Turn Pass-Rate metric to assess the human likeness of LLMs across varying durations. While LLMs like GPT-4 initially perform well, achieving pass rates of 51.9% and 38.9% during 3 turns and 10 turns of dialogues respectively, their performance drops as the dialogue progresses, which underscores the difficulty in maintaining consistency in the long term.
Published: 2024

11. Game Development as Human-LLM Interaction

Author: Hong, Jiale, Wu, Hongqiu, and Zhao, Hai
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Abstract: Game development is a highly specialized task that relies on a complex game engine powered by complex programming languages, preventing many gaming enthusiasts from handling it. This paper introduces the Interaction-driven Game Engine (IGE) powered by LLM, which allows everyone to develop a custom game using natural language through Human-LLM interaction. To enable an LLM to function as an IGE, we instruct it to perform the following processes in each turn: (1) $P_{script}$ : configure the game script segment based on the user's input; (2) $P_{code}$ : generate the corresponding code snippet based on the game script segment; (3) $P_{utter}$ : interact with the user, including guidance and feedback. We propose a data synthesis pipeline based on the LLM to generate game script-code pairs and interactions from a few manually crafted seed data. We propose a three-stage progressive training strategy to transfer the dialogue-based LLM to our IGE smoothly. We construct an IGE for poker games as a case study and comprehensively evaluate it from two perspectives: interaction quality and code correctness. The code and data are available at \url{https://github.com/alterego238/IGE}.
Published: 2024

12. Evolving Virtual World with Delta-Engine

Author: Wu, Hongqiu, Xu, Zekai, Xu, Tianyang, Wei, Shize, Wang, Yan, Hong, Jiale, Wu, Weiqi, Zhao, Hai, Zhang, Min, and He, Zhezhi
Subjects: Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: In this paper, we focus on the \emph{virtual world}, a cyberspace where people can live in. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics is unpredictable and beyond the reach of existing systems. For this, we propose a special engine called \textbf{\emph{Delta-Engine}} to drive this virtual world. $\Delta$ associates the world's evolution to the engine's scalability. It consists of a base engine and a neural proxy. The base engine programs the prototype of the virtual world; given a trigger, the neural proxy generates new snippets on the base engine through \emph{incremental prediction}. This paper presents a full-stack introduction to the delta-engine. The key feature of the delta-engine is its scalability to unknown elements within the world, Technically, it derives from the prefect co-work of the neural proxy and the base engine, and the alignment with high-quality data. We introduce an engine-oriented fine-tuning method that embeds the base engine into the proxy. We then discuss the human-LLM collaborative design to produce novel and interesting data efficiently. Eventually, we propose three evaluation principles to comprehensively assess the performance of a delta engine: naive evaluation, incremental evaluation, and adversarial evaluation.
Published: 2024

13. Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions

Author: Ma, Xinbei, Wang, Yiting, Yao, Yao, Yuan, Tongxin, Zhang, Aston, Zhang, Zhuosheng, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. A wide range of MLLMs are evaluated as GUI agents using our simulated dataset, following three working patterns with different levels of perception. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies predominantly focus on the helpfulness (i.e., action accuracy) of multimodal agents, our findings indicate that these agents are prone to environmental distractions, resulting in unfaithful behaviors. Furthermore, we switch to the adversarial perspective and implement environment injection, demonstrating that such unfaithfulness can be exploited, leading to unexpected risks.
Published: 2024

14. Study of Wide-Field-of-View X-ray Observations of the Virgo Cluster Using the Lobster Eye Imager for Astronomy

Author: Feng, Wen-Cheng, Jia, Shu-Mei, Zhao, Hai-Hui, Yu, Heng, Pan, Hai-Wu, Li, Cheng-Kui, Cheng, Yu-Lin, Weng, Shan-Shan, Chen, Yong, Liu, Yuan, Ling, Zhi-Xing, and Zhang, Chen
Subjects: Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Instrumentation and Methods for Astrophysics
Abstract: The Lobster Eye Imager for Astronomy (LEIA) is the pathfinder of the wide-field X-ray telescope used in the Einstein Probe mission. In this study, we present an image of the Virgo Cluster taken by LEIA in the 0.5-4.5 keV band with an exposure time of $\sim$17.3 ks in the central region. This extended emission is generally consistent with the results obtained by ROSAT. However, the field is affected by bright point sources due to the instrument's Point Spread Function (PSF) effect. Through fitting of the LEIA spectrum of the Virgo Cluster, we obtained a temperature of $2.1^{+0.3}_{-0.1}$ keV, which is consistent with the XMM-Newton results ($\sim$2.3 keV). Above 1.6 keV, the spectrum is dominated by the X-ray background. In summary, this study validates LEIA's extended source imaging and spectral resolution capabilities for the first time., Comment: 9 pages, 6 figures, 1 table
Published: 2024

15. Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache Consumption

Author: Shi, Luohe, Zhang, Hongyi, Yao, Yao, Li, Zuchao, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs), epitomized by ChatGPT' s release in late 2022, have revolutionized various industries with their advanced language comprehension. However, their efficiency is challenged by the Transformer architecture' s struggle with handling long texts. KV-Cache has emerged as a pivotal solution to this issue, converting the time complexity of token generation from quadratic to linear, albeit with increased GPU memory overhead proportional to conversation length. With the development of the LLM community and academia, various KV-Cache compression methods have been proposed. In this review, we dissect the various properties of KV-Cache and elaborate on various methods currently used to optimize the KV-Cache space usage of LLMs. These methods span the pre-training phase, deployment phase, and inference phase, and we summarize the commonalities and differences among these methods. Additionally, we list some metrics for evaluating the long-text capabilities of large language models, from both efficiency and capability perspectives. Our review thus sheds light on the evolving landscape of LLM optimization, offering insights into future advancements in this dynamic field., Comment: to be published in CoLM 2024
Published: 2024

16. DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Author: Zou, Anni, Yu, Wenhao, Zhang, Hongming, Ma, Kaixin, Cai, Deng, Zhang, Zhuosheng, Zhao, Hai, and Yu, Dong
Subjects: Computer Science - Computation and Language
Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, multi-modal information understanding and long-context reading. However, no current benchmark exists to evaluate their performance in such scenarios, where a raw file and questions are provided as input, and a corresponding response is expected as output. In this paper, we introduce DocBench, a new benchmark designed to evaluate LLM-based document reading systems. Our benchmark involves a meticulously crafted process, including the recruitment of human annotators and the generation of synthetic questions. It includes 229 real documents and 1,102 questions, spanning across five different domains and four major types of questions. We evaluate both proprietary LLM-based systems accessible via web interfaces or APIs, and a parse-then-read pipeline employing open-source LLMs. Our evaluations reveal noticeable gaps between existing LLM-based document reading systems and human performance, underscoring the challenges of developing proficient systems. To summarize, DocBench aims to establish a standardized benchmark for evaluating LLM-based document reading systems under diverse real-world scenarios, thereby guiding future advancements in this research area., Comment: Work in progress
Published: 2024

17. Hypergraph based Understanding for Document Semantic Entity Recognition

Author: Li, Qiwei, Li, Zuchao, Wang, Ping, Ai, Haojun, and Zhao, Hai
Subjects: Computer Science - Artificial Intelligence
Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time. It can conduct a more detailed analysis of the document text representation analyzed by the upstream model and achieves a better performance of semantic information. We apply this method on the basis of GraphLayoutLM to construct a new semantic entity recognition model HGALayoutLM. Our experiment results on FUNSD, CORD, XFUND and SROIE show that our method can effectively improve the performance of semantic entity recognition tasks based on the original model. The results of HGALayoutLM on FUNSD and XFUND reach the new state-of-the-art results.
Published: 2024

18. Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

Author: Zou, Yuchen, Chen, Yineng, Li, Zuchao, Zhang, Lefei, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing for a comparison of their mathematical nature within a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.
Published: 2024

19. The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Author: Li, Jiajia, Yang, Lu, Tang, Mingni, Chen, Cong, Li, Zuchao, Wang, Ping, and Zhao, Hai
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation over 16 LLMs to evaluate and analyze LLMs' performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available at GitHub\footnote{https://github.com/zcli-charlie/ZIQI-Eval} and HuggingFace\footnote{https://huggingface.co/datasets/MYTH-Lab/ZIQI-Eval}., Comment: Accepted to ACL-Findings 2024
Published: 2024

20. Vript: A Video Is Worth Thousands of Words

Author: Yang, Dongjie, Huang, Suyuan, Lu, Chengqiang, Han, Xiaodong, Zhang, Haoxin, Gao, Yan, Hu, Yao, and Zhao, Hai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution videos, offering detailed, dense, and script-like captions for over 420K clips. Each clip has a caption of ~145 words, which is over 10x longer than most video-text datasets. Unlike captions only documenting static content in previous datasets, we enhance video captioning to video scripting by documenting not just the content, but also the camera operations, which include the shot types (medium shot, close-up, etc) and camera movements (panning, tilting, etc). By utilizing the Vript, we explore three training paradigms of aligning more text with the video modality rather than clip-caption pairs. This results in Vriptor, a top-performing video captioning model among open-source models, comparable to GPT-4V in performance. Vriptor is also a powerful model capable of end-to-end generation of dense and detailed captions for long videos. Moreover, we introduce Vript-Hard, a benchmark consisting of three video understanding tasks that are more challenging than existing benchmarks: Vript-HAL is the first benchmark evaluating action and object hallucinations in video LLMs, Vript-RR combines reasoning with retrieval resolving question ambiguity in long-video QAs, and Vript-ERO is a new task to evaluate the temporal understanding of events in long videos rather than actions in short videos in previous works. All code, models, and datasets are available in https://github.com/mutonix/Vript. PS: We have included more video-text datasets (Vript_CN & Vript_Multilingual) in the Vript series., Comment: Accepted by NeurIPS Dataset & Benchmark track
Published: 2024

21. GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment

Author: Yao, Yao, Li, Zuchao, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: The burgeoning size of Large Language Models (LLMs) has led to enhanced capabilities in generating responses, albeit at the expense of increased inference times and elevated resource demands. Existing methods of acceleration, predominantly hinged on knowledge distillation, generally necessitate fine-tuning of considerably large models, such as Llama-7B, posing a challenge for average users. Furthermore, present techniques for expediting inference and reducing costs operate independently. To address these issues, we introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework. This approach leverages a larger LLM as a ''teacher'' to create guidance prompts, paired with a smaller ''student'' model to finalize responses. Remarkably, GKT requires no fine-tuning and doesn't necessitate the teacher and student models to have the same vocabulary, allowing for extensive batch generation to accelerate the process while ensuring user customization. GKT can be seamlessly integrated into cloud-edge collaboration architectures, and is versatile enough for plug-and-play application across various models. It excels in both efficiency and affordability, epitomizing a ''cheap and cheerful'' solution. GKT achieves a maximum accuracy improvement of 14.18%, along with a 10.72 times speed-up on GSM8K and an accuracy improvement of 14.00 % along with a 7.73 times speed-up in CSQA. When utilizing ChatGPT as teacher model and Llama2-70B as the student model, we can achieve 95.00% of ChatGPT's performance at 52% of the cost. The results highlight substantial enhancements in accuracy and processing speed on the GSM8K and CSQA datasets, surpassing the performance of using either the student or teacher models in isolation.
Published: 2024

22. From Role-Play to Drama-Interaction: An LLM Solution

Author: Wu, Weiqi, Wu, Hongqiu, Jiang, Lai, Liu, Xingyuan, Hong, Jiale, Zhao, Hai, and Zhang, Min
Subjects: Computer Science - Computation and Language
Abstract: Drama is a form of storytelling inspired by human creativity, proceeding with a predefined storyline, carrying emotions and thoughts. This paper introduces \emph{LLM-based interactive drama}, which endows traditional drama with an unprecedented immersion, where a person is allowed to walk into it and interact with the characters and scenes. We define this new artistic genre by 6 essential elements-plot, character, thought, diction, spectacle and interaction-and study the entire pipeline to forge a backbone \emph{drama LLM} to drive the playing process, which is challenged by limited drama resources, uncontrollable narrative development, and complicated instruction following. We propose \emph{Narrative Chain} to offer finer control over the narrative progression during interaction with players; \emph{Auto-Drama} to synthesize drama scripts given arbitrary stories; \emph{Sparse Instruction Tuning} to allow the model to follow sophisticated instructions. We manually craft 3 scripts, \emph{Detective Conan}, \emph{Harry Potter}, \emph{Romeo and Juliet}, and design a 5-dimension principle to evaluate the drama LLM comprehensively., Comment: Accepted by ACL 2024 Findings
Published: 2024

23. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference

Author: Yang, Dongjie, Han, XiaoDong, Gao, Yan, Hu, Yao, Zhang, Shilin, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference, hindering their scalability for real-time applications like chatbots. To accelerate inference, we store computed keys and values (KV cache) in the GPU memory. Existing methods study the KV cache compression to reduce memory by pruning the pre-computed KV cache. However, they neglect the inter-layer dependency between layers and huge memory consumption in pre-computation. To explore these deficiencies, we find that the number of crucial keys and values that influence future generations decreases layer by layer and we can extract them by the consistency in attention weights. Based on the findings, we propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context. PyramidInfer saves significant memory by computing fewer keys and values without sacrificing performance. Experimental results show PyramidInfer improves 2.2x throughput compared to Accelerate with over 54% GPU memory reduction in KV cache., Comment: Accepted by ACL 2024
Published: 2024

24. SirLLM: Streaming Infinite Retentive LLM

Author: Yao, Yao, Li, Zuchao, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, the one-off input of overly long texts is limited, as studies have shown that when input lengths exceed the LLMs' pre-trained text length, there is a dramatic decline in text generation capabilities. Moreover, simply extending the length of pre-training texts is impractical due to the difficulty in obtaining long text data and the substantial memory consumption costs this would entail for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model's long-term memory capabilities. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. SirLLM utilizes the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with both long-lasting and flexible memory. We designed three distinct tasks and constructed three datasets to measure the effectiveness of SirLLM from various angles: (1) DailyDialog; (2) Grocery Shopping; (3) Rock-Paper-Scissors. Our experimental results robustly demonstrate that SirLLM can achieve stable and significant improvements across different LLMs and tasks, compellingly proving its effectiveness. When having a coversation, "A sir could forget himself," but SirLLM never does! Our code is publicly available at https://github.com/Zoeyyao27/SirLLM
Published: 2024

25. Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning

Author: Yu, Chong, Shen, Shuaiqi, Wang, Shiqiang, Zhang, Kuan, and Zhao, Hai
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnosis. By allowing multiple devices to train models collaboratively, federated learning is a promising solution to address the communication and privacy issues in e-health. However, applying federated learning in e-health faces many challenges. First, medical data is both horizontally and vertically partitioned. Since single Horizontal Federated Learning (HFL) or Vertical Federated Learning (VFL) techniques cannot deal with both types of data partitioning, directly applying them may consume excessive communication cost due to transmitting a part of raw data when requiring high modeling accuracy. Second, a naive combination of HFL and VFL has limitations including low training efficiency, unsound convergence analysis, and lack of parameter tuning strategies. In this paper, we provide a thorough study on an effective integration of HFL and VFL, to achieve communication efficiency and overcome the above limitations when data is both horizontally and vertically partitioned. Specifically, we propose a hybrid federated learning framework with one intermediate result exchange and two aggregation phases. Based on this framework, we develop a Hybrid Stochastic Gradient Descent (HSGD) algorithm to train models. Then, we theoretically analyze the convergence upper bound of the proposed algorithm. Using the convergence results, we design adaptive strategies to adjust the training parameters and shrink the size of transmitted data. Experimental results validate that the proposed HSGD algorithm can achieve the desired accuracy while reducing communication cost, and they also verify the effectiveness of the adaptive strategies.
Published: 2024

26. Instruction-Driven Game Engines on Large Language Models

Author: Wu, Hongqiu, Wang, Yan, Liu, Xingyuan, Zhao, Hai, and Zhang, Min
Subjects: Computer Science - Artificial Intelligence
Abstract: The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game rules and autonomously generate game-play processes. The IDGE allows users to create games by issuing simple natural language instructions, which significantly lowers the barrier for game development. We approach the learning process for IDGEs as a Next State Prediction task, wherein the model autoregressively predicts in-game states given player actions. It is a challenging task because the computation of in-game states must be precise; otherwise, slight errors could disrupt the game-play. To address this, we train the IDGE in a curriculum manner that progressively increases the model's exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, a universally cherished card game. The engine we've designed not only supports a wide range of poker variants but also allows for high customization of rules through natural language inputs. Furthermore, it also favors rapid prototyping of new games from minimal samples, proposing an innovative paradigm in game development that relies on minimal prompt and data engineering. This work lays the groundwork for future advancements in instruction-driven game creation, potentially transforming how games are designed and played.
Published: 2024

27. Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering

Author: Wu, Yexin, Zhang, Zhuosheng, and Zhao, Hai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models have manifested remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions through step-by-step reasoning chains. Despite its success, the efficacy of such reasoning is inherently contingent upon the quality of CoT. However, flawless CoT reasoning cannot be guaranteed due to the presence of indecomposable questions and the potential for erroneous reasoning chains, particularly in the case of small-scale language models. To tackle this challenge, we propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain. Then, we proceed with CoT reasoning when the reasoning chain demonstrates confidence; otherwise, we opt to predict the answer directly. SelF-Reasoner improves the fine-tuned T5 baseline consistently over the ScienceQA, ECQA, and LastLetter tasks. Code is available at \texttt{https://github.com/LibroWu/SelF-Reasoner}.
Published: 2024

28. Multi-modal Auto-regressive Modeling via Visual Words

Author: Peng, Tianshuo, Li, Zuchao, Zhang, Lefei, Zhao, Hai, Wang, Ping, and Du, Bo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated texts corpora, demonstrates powerful perceptual and reasoning capabilities. However, as for extending auto-regressive modelling to multi-modal scenarios to build Large Multi-modal Models (LMMs), there lies a great difficulty that the image information is processed in the LMM as continuous visual embeddings, which cannot obtain discrete supervised labels for classification.In this paper, we successfully perform multi-modal auto-regressive modeling with a unified objective for the first time.Specifically, we propose the concept of visual tokens, which maps the visual features to probability distributions over LLM's vocabulary, providing supervision information for visual modelling.We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information.Experimental results and ablation studies on 5 VQA tasks and 4 benchmark toolkits validate the powerful performance of our proposed approach., Comment: ACM MM 2024
Published: 2024

29. Hypertext Entity Extraction in Webpage

Author: Yang, Yifei, Liu, Tianqiao, Shao, Bo, Zhao, Hai, Shou, Linjun, Gong, Ming, and Jiang, Daxin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual content and its structure information. However, existing datasets all overlook the rich hypertext features (e.g., font color, font size) which show their effectiveness in previous works. To this end, we first collect a \textbf{H}ypertext \textbf{E}ntity \textbf{E}xtraction \textbf{D}ataset (\textit{HEED}) from the e-commerce domains, scraping both the text and the corresponding explicit hypertext features with high-quality manual entity annotations. Furthermore, we present the \textbf{Mo}E-based \textbf{E}ntity \textbf{E}xtraction \textbf{F}ramework (\textit{MoEEF}), which efficiently integrates multiple features to enhance model performance by Mixture of Experts and outperforms strong baselines, including the state-of-the-art small-scale models and GPT-3.5-turbo. Moreover, the effectiveness of hypertext features in \textit{HEED} and several model components in \textit{MoEEF} are analyzed.
Published: 2024

30. Cuproptosis and Cu: a new paradigm in cellular death and their role in non-cancerous diseases

Author: Yang, Zhibo, Feng, Ridong, and Zhao, Hai
Published: 2024
Full Text: View/download PDF

31. Characteristics and distribution of microstructures in high pressure die cast alloys with X-ray microtomography: A review

Author: Zhao, Hai-dong, Wang, Xue-ling, Wan, Qian, Bai, Wen-hui, and Liu, Fei
Published: 2024
Full Text: View/download PDF

32. Improving RSW nugget diameter prediction method: unleashing the power of multi-fidelity neural networks and transfer learning

Author: Yue, Zhong-Jie, Chen, Qiu-Ren, Bao, Zu-Guo, Huang, Li, Tan, Guo-Bi, Hou, Ze-Hong, Li, Mu-Shi, Huang, Shi-Yao, Zhao, Hai-Long, Kong, Jing-Yu, Wang, Jia, and Liu, Qing
Published: 2024
Full Text: View/download PDF

33. Unveiling Vulnerability of Self-Attention

Author: Liong, Khai Jiet, Wu, Hongqiu, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Pre-trained language models (PLMs) are shown to be vulnerable to minor word changes, which poses a big threat to real-world systems. While previous studies directly focus on manipulating word inputs, they are limited by their means of generating adversarial samples, lacking generalization to versatile real-world attack. This paper studies the basic structure of transformer-based PLMs, the self-attention (SA) mechanism. (1) We propose a powerful perturbation technique \textit{HackAttend}, which perturbs the attention scores within the SA matrices via meticulously crafted attention masks. We show that state-of-the-art PLMs fall into heavy vulnerability that minor attention perturbations $(1\%)$ can produce a very high attack success rate $(98\%)$. Our paper expands the conventional text attack of word perturbations to more general structural perturbations. (2) We introduce \textit{S-Attend}, a novel smoothing technique that effectively makes SA robust via structural perturbations. We empirically demonstrate that this simple yet effective technique achieves robust performance on par with adversarial training when facing various text attackers. Code is publicly available at \url{github.com/liongkj/HackAttend}., Comment: https://github.com/liongkj/HackAttend
Published: 2024

34. CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation

Author: Ma, Xinbei, Zhang, Zhuosheng, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Multimodal large language models (MLLMs) have shown remarkable potential as human-like autonomous language agents to interact with real-world environments, especially for graphical user interface (GUI) automation. However, those GUI agents require comprehensive cognition ability including exhaustive perception and reliable action response. We propose a Comprehensive Cognitive LLM Agent, CoCo-Agent, with two novel approaches, comprehensive environment perception (CEP) and conditional action prediction (CAP), to systematically improve the GUI automation performance. First, CEP facilitates the GUI perception through different aspects and granularity, including screenshots and complementary detailed layouts for the visual channel and historical actions for the textual channel. Second, CAP decomposes the action prediction into sub-problems: action type prediction and action target conditioned on the action type. With our technical design, our agent achieves new state-of-the-art performance on AITW and META-GUI benchmarks, showing promising abilities in realistic scenarios. Code is available at https://github.com/xbmxb/CoCo-Agent., Comment: ACL'2024 Findings
Published: 2024

35. Head-wise Shareable Attention for Large Language Models

Author: Cao, Zouying, Yang, Yifei, and Zhao, Hai
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) suffer from huge number of parameters, which restricts their deployment on edge devices. Weight sharing is one promising solution that encourages weight reuse, effectively reducing memory usage with less performance drop. However, current weight sharing techniques primarily focus on small-scale models like BERT and employ coarse-grained sharing rules, e.g., layer-wise. This becomes limiting given the prevalence of LLMs and sharing an entire layer or block obviously diminishes the flexibility of weight sharing. In this paper, we present a perspective on head-wise shareable attention for large language models. We further propose two memory-efficient methods that share parameters across attention heads, with a specific focus on LLMs. Both of them use the same dynamic strategy to select the shared weight matrices. The first method directly reuses the pre-trained weights without retraining, denoted as $\textbf{DirectShare}$. The second method first post-trains with constraint on weight matrix similarity and then shares, denoted as $\textbf{PostShare}$. Experimental results reveal our head-wise shared models still maintain satisfactory capabilities, demonstrating the feasibility of fine-grained weight sharing applied to LLMs., Comment: 17 pages, 7 figures, 21 tables, EMNLP'24 Findings
Published: 2024

36. Dissecting Human and LLM Preferences

Author: Li, Junlong, Zhou, Fan, Sun, Shichao, Zhang, Yikai, Zhao, Hai, and Liu, Pengfei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As a relative quality comparison of model responses, human and Large Language Model (LLM) preferences serve as common alignment goals in model fine-tuning and criteria in evaluation. Yet, these preferences merely reflect broad tendencies, resulting in less explainable and controllable models with potential safety risks. In this work, we dissect the preferences of human and 32 different LLMs to understand their quantitative composition, using annotations from real-world user-model conversations for a fine-grained, scenario-wise analysis. We find that humans are less sensitive to errors, favor responses that support their stances, and show clear dislike when models admit their limits. On the contrary, advanced LLMs like GPT-4-Turbo emphasize correctness, clarity, and harmlessness more. Additionally, LLMs of similar sizes tend to exhibit similar preferences, regardless of their training methods, and fine-tuning for alignment does not significantly alter the preferences of pretrained-only LLMs. Finally, we show that preference-based evaluation can be intentionally manipulated. In both training-free and training-based settings, aligning a model with the preferences of judges boosts scores, while injecting the least preferred properties lowers them. This results in notable score shifts: up to 0.59 on MT-Bench (1-10 scale) and 31.94 on AlpacaEval 2.0 (0-100 scale), highlighting the significant impact of this strategic adaptation. Interactive Demo: https://huggingface.co/spaces/GAIR/Preference-Dissection-Visualization Dataset: https://huggingface.co/datasets/GAIR/preference-dissection Code: https://github.com/GAIR-NLP/Preference-Dissection
Published: 2024

37. LaCo: Large Language Model Pruning via Layer Collapse

Author: Yang, Yifei, Cao, Zouying, and Zhao, Hai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) based on transformer are witnessing a notable trend of size expansion, which brings considerable costs to both model training and inference. However, existing methods such as model quantization, knowledge distillation, and model pruning are constrained by various issues, including hardware support limitations, the need for extensive training, and alterations to the model internal structure. In this paper, we propose a concise layer-wise structured pruner called \textit{Layer Collapse (LaCo)}, in which rear model layers collapse into a prior layer, enabling a rapid reduction in model size while preserving the model structure. Comprehensive experiments show that our method maintains an average task performance of over 80\% at pruning ratios of 25-30\%, significantly outperforming existing state-of-the-art structured pruning methods. We also conduct post-training experiments to confirm that the \textit{LaCo} effectively inherits the parameters of the original model. Additionally, we perform ablation studies on various settings of \textit{LaCo}. Finally, we discuss our motivation from the perspective of layer-wise similarity and evaluate the performance of the pruned LLMs across various pruning ratios\footnote{\url{https://github.com/yangyifei729/LaCo}}., Comment: Accepted as Findings of EMNLP2024
Published: 2024

38. On the Robustness of Editing Large Language Models

Author: Ma, Xinbei, Ju, Tianjie, Qiu, Jiyang, Zhang, Zhuosheng, Zhao, Hai, Liu, Lifeng, and Wang, Yulong
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have played a pivotal role in building communicative AI, yet they encounter the challenge of efficient updates. Model editing enables the manipulation of specific knowledge memories and the behavior of language generation without retraining. However, the robustness of model editing remains an open question. This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI. We focus on three key research questions. RQ1: Can edited LLMs behave consistently resembling communicative AI in realistic situations? RQ2: To what extent does the rephrasing of prompts lead LLMs to deviate from the edited knowledge memory? RQ3: Which knowledge features are correlated with the performance and robustness of editing? Our empirical studies uncover a substantial disparity between existing editing methods and the practical application of LLMs. On rephrased prompts that are flexible but common in realistic applications, the performance of editing experiences a significant decline. Further analysis shows that more popular knowledge is memorized better, easier to recall, and more challenging to edit effectively. Code is publicly available at https://github.com/xbmxb/edit_analysis ., Comment: EMNLP2024. Code is publicly available at https://github.com/xbmxb/edit_analysis
Published: 2024

39. GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Model

Author: Zhang, Xuanchang, Zhang, Zhuosheng, and Zhao, Hai
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Despite the rapid progress of large language models (LLMs), their task performance remains sensitive to prompt design. Recent studies have explored leveraging the LLM itself as an optimizer to identify optimal prompts that maximize task accuracy. However, when evaluating prompts, such approaches heavily rely on elusive manually annotated gold labels to calculate task accuracy for each candidate prompt, which hinders the widespread implementation and generality. To overcome the limitation, this work proposes a gold label-agnostic prompt evaluation (GLaPE) to alleviate dependence on gold labels. Motivated by the observed correlation between self-consistency and the accuracy of the answer, we adopt self-consistency as the initial evaluation score. Subsequently, we refine the scores of prompts producing identical answers to be mutually consistent. Experimental results show that GLaPE provides reliable evaluations uniform with accuracy, even in the absence of gold labels. Moreover, on six popular reasoning tasks, our GLaPE-based prompt optimization yields effective prompts comparable to accuracy-based ones. The code is publicly available at https://github.com/thunderous77/GLaPE.
Published: 2024

40. Sparse is Enough in Fine-tuning Pre-trained Large Language Models

Author: Song, Weixi, Li, Zuchao, Zhang, Lefei, Zhao, Hai, and Du, Bo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: With the prevalence of pre-training-fine-tuning paradigm, how to efficiently adapt the pre-trained model to the downstream tasks has been an intriguing issue. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has demonstrated effectiveness and been widely applied, the underlying principles are still unclear. In this paper, we adopt the PAC-Bayesian generalization error bound, viewing pre-training as a shift of prior distribution which leads to a tighter bound for generalization error. We validate this shift from the perspectives of oscillations in the loss landscape and the quasi-sparsity in gradient distribution. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/., Comment: Accepted at ICML 2024 Spotlight
Published: 2023

41. A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis

Author: Peng, Tianshuo, Li, Zuchao, Wang, Ping, Zhang, Lefei, and Zhao, Hai
Subjects: Computer Science - Artificial Intelligence
Abstract: Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. The span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis due to their joint modeling of input sequences and target labels. However, previous methods still have certain limitations: (i) They ignore the difference in the focus of visual information between different analysis targets (aspect or sentiment). (ii) Combining features from uni-modal encoders directly may not be sufficient to eliminate the modal gap and can cause difficulties in capturing the image-text pairwise relevance. (iii) Existing span-based methods for MABSA ignore the pairwise relevance of target span boundaries. To tackle these limitations, we propose a novel framework called DQPSA for multi-modal sentiment analysis. Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses the prompt as both a visual query and a language query to extract prompt-aware visual information and strengthen the pairwise relevance between visual information and the analysis target. Additionally, we introduce an Energy-based Pairwise Expert (EPE) module that models the boundaries pairing of the analysis target from the perspective of an Energy-based Model. This expert predicts aspect or sentiment span based on pairwise stability. Experiments on three widely used benchmarks demonstrate that DQPSA outperforms previous approaches and achieves a new state-of-the-art performance., Comment: AAAI2024
Published: 2023

42. Influence of typical elements and heat treatment parameters on hardenability in steel: a review

Author: Wang, Bin-bin, Zhu, De-xin, Zhang, Chao-lei, Zhou, Xiao-ye, Wu, Hong-hui, Wang, Shui-ze, Wu, Gui-lin, Gao, Jun-heng, Zhao, Hai-tao, and Mao, Xin-ping
Published: 2024
Full Text: View/download PDF

43. Detoxification with resin promotes the shift from acidogenesis to solventogenesis and prevents acid crash during butanol fermentation from wheat straw

Author: Liu, Guoqiang, Yi, Zhuolin, Li, Jiang, Yang, Lin, Fang, Yang, Du, Anping, He, Kaize, Zhao, Hai, and Jin, Yanling
Published: 2024
Full Text: View/download PDF

44. Numerical simulation of microstructure and microporosity morphology in directional solidification of aluminum-copper alloys: Effect of copper content and withdrawal rate

Author: Yuan, Wei, Zhao, Hai-dong, Shen, Xu, Zou, Chun, Liu, Yuan, and Xu, Qing-yan
Published: 2024
Full Text: View/download PDF

45. Strength and ductility optimization of HPDC AlSi8MgCuZn2 alloys by modifying pre-aging treatment

Author: Jiang, Yuan-hang, Zheng, Hui-ting, Liu, Fei, and Zhao, Hai-dong
Published: 2024
Full Text: View/download PDF

46. Improvement of impact properties of Al–Si–Mg alloy via solution treatment and joint modification with Sr and La

Author: Zheng, Xiong-Ling, Li, Shao-Xiang, Ma, Jia-Le, Xu, Qing-Yan, Zhao, Hai-Dong, and Han, Zhi-Qiang
Published: 2024
Full Text: View/download PDF

47. Radiological characteristics predicting early poor drug response in patients with hemifacial spasm

Author: Li, Bo, Huang, Linwen, Luo, Chun, Jin, Yabin, Zhong, Xuguang, Wang, Guofu, Xu, Zhifeng, Chen, Jingxian, Huang, Shengqiao, Zhao, Hai, and Gao, Mingyong
Published: 2024
Full Text: View/download PDF

48. Photocatalytic in situ H2O2 production and activation for enhanced ciprofloxacin degradation over CeO2-Co3O4/g-C3N4: key role of CeO2

Author: Zhu, Hong-Jie, Yang, Yi-Kai, Li, Ming-Hui, Zou, Lu-Ning, and Zhao, Hai-Tao
Published: 2024
Full Text: View/download PDF

49. The performance of out-of-time events of the follow-up X-ray telescope onboard Einstein probe

Author: Zhao, Hai-Sheng, Zhang, Juan, Guan, Ju, Li, Cheng-Kui, Jia, Shu-Mei, and Song, Li-Ming
Published: 2024
Full Text: View/download PDF

50. Subspace and second-order statistical distribution alignment for cross-domain recognition of human hand motions

Author: Kou, Hanwen, Shi, Han, and Zhao, Hai
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

5,553 results on '"Zhao, Hai"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources