2,131 results for "Gu, Jia"
Search Results
2. BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
- Author
-
Li, Yuankai, Gu, Jia-Chen, Wu, Di, Chang, Kai-Wei, and Peng, Nanyun
- Subjects
Computer Science - Computation and Language - Abstract
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. However, as the number of retrieved documents increases, the input length to LLMs grows linearly, causing a dramatic increase in latency and a degradation in long-context understanding. This is particularly serious for multi-hop questions that require a chain of reasoning across documents. To accelerate inference, reduce costs, and minimize distractions, this paper presents BRIEF (Bridging Retrieval and Inference through Evidence Fusion), a lightweight approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries to integrate into in-context RAG. To enable learning compression for multi-hop reasoning, we curate synthetic data by extracting atomic propositions that encapsulate distinct factoids from the source documents to compose synthetic summaries. Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries and enables a range of LLMs to achieve exceptional open-domain question answering (QA) performance. For example, on HotpotQA, BRIEF improves the compression rate by 2 times compared to the state-of-the-art baseline, while outperforming it by 3.00% EM and 4.16% F1 with Flan-UL2 as the reader model. It also generates more concise summaries than proprietary GPT-3.5, while demonstrating nearly identical QA performance., Comment: Accepted by NAACL 2025 Findings. Project page: https://jasonforjoy.github.io/BRIEF/
- Published
- 2024
3. MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
- Author
-
Hu, Wenbo, Gu, Jia-Chen, Dou, Zi-Yi, Fayyaz, Mohsen, Lu, Pan, Chang, Kai-Wei, and Peng, Nanyun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Existing multimodal retrieval benchmarks primarily focus on evaluating whether models can retrieve and utilize external textual knowledge for question answering. However, there are scenarios where retrieving visual information is either more beneficial or easier to access than textual data. In this paper, we introduce a multimodal retrieval-augmented generation benchmark, MRAG-Bench, in which we systematically identify and categorize scenarios where visually augmented knowledge is better than textual knowledge, for instance, when more images from varying viewpoints are available. MRAG-Bench consists of 16,130 images and 1,353 human-annotated multiple-choice questions across 9 distinct scenarios. With MRAG-Bench, we conduct an evaluation of 10 open-source and 4 proprietary large vision-language models (LVLMs). Our results show that all LVLMs exhibit greater improvements when augmented with images than with textual knowledge, confirming that MRAG-Bench is vision-centric. Additionally, we conduct extensive analysis with MRAG-Bench, which offers valuable insights into retrieval-augmented LVLMs. Notably, the top-performing model, GPT-4o, faces challenges in effectively leveraging retrieved knowledge, achieving only a 5.82% improvement with ground-truth information, in contrast to the 33.16% improvement observed in human participants. These findings highlight the importance of MRAG-Bench in encouraging the community to enhance LVLMs' ability to utilize retrieved visual knowledge more effectively., Comment: https://mragbench.github.io
- Published
- 2024
4. Gravitational waves associated with the r-mode instability from neutron star-white dwarf mergers
- Author
-
Zhong, Shu-Qing, Meng, Yan-Zhi, and Gu, Jia-Hong
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
Neutron star-white dwarf (NS-WD) binaries evolve into either ultra-compact X-ray binaries undergoing stable mass transfer or direct mergers via unstable mass transfer. While much attention has been paid to gravitational wave (GW) emissions from NS-WD binaries following the former evolutionary pathway, this work explores GW emissions related to the {\em r}-mode instability of the accreting NSs in NS-WD mergers, particularly those with a WD mass $\gtrsim 1M_{\odot}$. Owing to considerably high accretion rates, the GW emissions associated with both {\em r}-modes and the magnetic deformation intrinsically induced by {\em r}-modes presented in this work are much stronger than those in NS-WD binaries categorized as intermediate-mass or low-mass X-ray binaries, rendering them interesting sources for the advanced Laser Interferometer Gravitational Wave Observatory and the upcoming Einstein Telescope. Moreover, these strong GW emissions might accompany some intriguing electromagnetic emissions such as peculiar long gamma-ray bursts (LGRBs), fast blue optical transients including kilonova-like emissions associated with peculiar LGRBs, and/or fast radio bursts., Comment: Published in PRD
- Published
- 2024
- Full Text
- View/download PDF
5. A simple nanoplatform of thermo-sensitive liposomes and gold nanorods to treat bone metastasis through improved chemotherapy combined with photothermal therapy.
- Author
-
Gu, Jia, Jiang, Lifan, Chen, Zhongping, and Qi, Jun
- Subjects
Bone metastasis ,Gold nanorods ,Liposomes ,Photothermal therapy ,Sensitive release - Abstract
Bone metastasis remains a clinical challenge and is still considered incurable. While nanoparticle-based drug delivery and photothermal therapy (PTT) show promise in treating subcutaneous solid tumors, their therapeutic outcomes in treating bone metastasis are limited, owing to the inaccessibility of the bone metastatic site and the complexity of bone metastasis. Herein, we report a simple nanoplatform composed of thermo-sensitive liposomes (TSL) and gold nanorods (GNR) to treat bone metastasis through improved chemotherapy combined with GNR-assisted PTT. The lipid composition of the TSL was first tailored to regulate its stability under physiological conditions as well as its sensitivity in responding to PTT-induced mild hyperthermia. The drug-loaded TSL was then combined with GNR to form the nanoplatform through simple incubation. Cell experiments revealed that upon near-infrared (NIR) irradiation, the nanoplatform effectively inhibited the viability and migration ability of tumor cells through PTT, PTT-triggered thermo-sensitive drug release, and PTT-augmented sensitivity of tumor cells to the drug. In a murine model of bone metastasis, the nanoplatform enabled effective delivery of the loaded drug and GNR to the bone metastatic site for rapid drug release upon local NIR irradiation. By killing tumor cells and rebalancing the turnover of osteoclasts and osteoblasts, the nanoplatform largely preserved bone structure, relieving pain and extending survival. Given the simplicity of nanoplatform preparation and treatment operation, the strategy of liposome-based thermo-sensitive drug delivery combined with GNR-assisted PTT holds great promise for treating bone metastasis.
- Published
- 2024
6. Can Editing LLMs Inject Harm?
- Author
-
Chen, Canyu, Huang, Baixiang, Li, Zekun, Chen, Zhaorun, Lai, Shiyang, Xu, Xiongxiao, Gu, Jia-Chen, Gu, Jindong, Yao, Huaxiu, Xiao, Chaowei, Yan, Xifeng, Wang, William Yang, Torr, Philip, Song, Dawn, and Shu, Kai
- Subjects
Computer Science - Computation and Language - Abstract
Knowledge editing has been increasingly adopted to correct false or outdated knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset, EditAttack. Specifically, we focus on two typical safety risks of Editing Attack: Misinformation Injection and Bias Injection. For the risk of misinformation injection, we first categorize it into commonsense misinformation injection and long-tail misinformation injection. We find that editing attacks can inject both types of misinformation into LLMs, and the effectiveness is particularly high for commonsense misinformation injection. For the risk of bias injection, we discover that not only can biased sentences be injected into LLMs with high effectiveness, but a single biased-sentence injection can also increase bias in general outputs of LLMs, even outputs highly irrelevant to the injected sentence, indicating a catastrophic impact on the overall fairness of LLMs. We then further illustrate the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and show the difficulty of defending against editing attacks with empirical evidence. Our findings demonstrate the emerging misuse risks of knowledge editing techniques for compromising the safety alignment of LLMs and the feasibility of disseminating misinformation or bias through LLMs as new channels., Comment: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io
- Published
- 2024
7. Knowledge Mechanisms in Large Language Models: A Survey and Perspective
- Author
-
Wang, Mengru, Yao, Yunzhi, Xu, Ziwen, Qiao, Shuofei, Deng, Shumin, Wang, Peng, Chen, Xiang, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun, and Zhang, Ningyu
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Human-Computer Interaction ,Computer Science - Machine Learning - Abstract
Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research., Comment: EMNLP 2024 Findings; 39 pages (v4)
- Published
- 2024
8. Leveraging single-case results to Bayesian hierarchical modelling
- Author
-
Si, Shijing, Gu, Jia-wen, and Tian, Maozai
- Published
- 2025
- Full Text
- View/download PDF
9. Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
- Author
-
Wu, Di, Gu, Jia-Chen, Yin, Fan, Peng, Nanyun, and Chang, Kai-Wei
- Subjects
Computer Science - Computation and Language - Abstract
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics, including sequence likelihood, uncertainty quantification, context influence, and semantic alignment, to synchronously detect unfaithful sentences. By integrating efficiently measurable and complementary signals, SynCheck enables accurate and immediate feedback and intervention, achieving 0.85 AUROC in detecting faithfulness errors across six long-form retrieval-augmented generation tasks, improving on the prior best method by 4%. Leveraging SynCheck, we further introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation. Empirical results demonstrate that FOD significantly outperforms traditional strategies such as abstention, reranking, and contrastive decoding in terms of faithfulness, achieving over 10% improvement across six datasets., Comment: EMNLP 2024
- Published
- 2024
10. Perturbation-Restrained Sequential Model Editing
- Author
-
Ma, Jun-Yu, Wang, Hong, Xu, Hao-Xiang, Ling, Zhen-Hua, and Gu, Jia-Chen
- Subjects
Computer Science - Computation and Language - Abstract
Model editing is an emerging field that focuses on updating the knowledge embedded within large language models (LLMs) without extensive retraining. However, current model editing methods significantly compromise the general abilities of LLMs as the number of edits increases, and this trade-off poses a substantial challenge to the continual learning of LLMs. In this paper, we first show theoretically that the factor affecting the general abilities in sequential model editing is the condition number of the edited matrix. The condition number of a matrix represents its numerical sensitivity, and can therefore be used to indicate the extent to which the original knowledge associations stored in LLMs are perturbed after editing. Subsequently, statistical findings demonstrate that this factor grows as the number of edits increases, thereby exacerbating the deterioration of general abilities. To this end, a framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE) is proposed, which applies condition number restraints in sequential editing. These restraints lower the upper bound on the perturbation to edited models, thus preserving their general abilities. We systematically conduct experiments employing three popular editing methods on three LLMs across four representative downstream tasks. Evaluation results show that PRUNE can preserve considerable general abilities while maintaining editing performance effectively in sequential model editing., Comment: Accepted by ICLR 2025
- Published
- 2024
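The condition-number restraint described in the PRUNE abstract above can be illustrated with a small sketch. This is our own toy code, not the authors' implementation; the function names and the singular-value clipping scheme are assumptions made only for illustration. The idea: the condition number (ratio of largest to smallest singular value) measures how strongly a matrix can amplify a perturbation, so capping it bounds how much an edited weight matrix can disturb stored associations.

```python
import numpy as np

def condition_number(W: np.ndarray) -> float:
    """Ratio of the largest to the smallest singular value of W."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

def restrain(W: np.ndarray, max_cond: float) -> np.ndarray:
    """Clip the large singular values of W so its condition number does not
    exceed max_cond, lowering the upper bound on perturbation amplification."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_clipped = np.minimum(s, s[-1] * max_cond)  # cap at max_cond * smallest
    return U @ np.diag(s_clipped) @ Vt

rng = np.random.default_rng(0)
W_edited = rng.normal(size=(16, 16))   # stands in for an edited weight matrix
W_restrained = restrain(W_edited, max_cond=10.0)
```

A well-conditioned matrix passes through unchanged, while an ill-conditioned edit has its dominant directions damped until the bound holds.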
11. Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
- Author
-
Zhu, Yun, Gu, Jia-Chen, Sikora, Caitlin, Ko, Ho, Liu, Yinxiao, Lin, Chu-Cheng, Shu, Lei, Luo, Liangchen, Meng, Lei, Liu, Bang, and Chen, Jindong
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly with the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, eliminating the latency introduced by long-range attention over retrieved documents. LLMs then selectively decode the output by attending auto-regressively only to highly relevant caches, which are chosen by prompting the LLMs with special control tokens. Notably, Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. This sparse mechanism reduces the number of documents loaded during decoding, accelerating inference, while filtering out undesirable contexts sharpens the model's focus on relevant context and inherently improves generation quality. Evaluation results on two datasets show that Sparse RAG strikes an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across both short- and long-form generation tasks.
- Published
- 2024
12. Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
- Author
-
Gu, Jia, Pang, Liang, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language - Abstract
With the rapid advancement of large language models (LLMs) for handling complex language tasks, an increasing number of studies employ LLMs as agents to emulate the sequential decision-making processes of humans, often represented as Markov decision processes (MDPs). The actions in MDPs adhere to specific probability distributions and require iterative sampling. This raises the question of whether LLM agents can comprehend probability distributions and thereby guide their behavioral decision-making through probabilistic sampling to generate behavioral sequences. To answer this question, we divide the problem into two main aspects: sequence simulation with a known probability distribution and sequence simulation with an unknown probability distribution. Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling. Their ability to perform probabilistic sampling can be improved to some extent by integrating coding tools, but this level of sampling precision still makes it difficult to simulate human behavior as agents., Comment: The 31st International Conference on Computational Linguistics (COLING 2025)
- Published
- 2024
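The "known probability distribution" setting in the abstract above can be made concrete with a toy sketch (our own illustration, not the paper's code; the action names and the distribution are invented). It shows what a coding tool does on an LLM agent's behalf: draw a behavioral sequence from a known categorical distribution, then judge sampling faithfulness by the total variation distance between target and empirical frequencies.

```python
import numpy as np

rng = np.random.default_rng(42)
actions = ["post", "reply", "like"]          # hypothetical agent actions
target = np.array([0.5, 0.3, 0.2])           # known action distribution

# Sample a behavioral sequence, as a code-interpreter tool would.
seq = rng.choice(len(actions), size=10_000, p=target)
empirical = np.bincount(seq, minlength=len(actions)) / len(seq)

# Total variation distance between target and empirical frequencies;
# a faithful sampler drives this toward zero as the sequence grows.
tv = 0.5 * np.abs(target - empirical).sum()
```

With 10,000 draws the empirical frequencies land within about half a percent of the target, which is the kind of precision the paper finds LLMs cannot reach without tool support.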
13. Achieving ultrahigh-specific strength and enhanced GFA in Ti-based bulk metallic glasses via a two-step alloying strategy
- Author
-
Bu, Heng-Tong, Gu, Jia-Lun, Su, Yun-Shuai, Shao, Yang, and Yao, Ke-Fu
- Published
- 2024
- Full Text
- View/download PDF
14. Precommitted Strategies with Initial-Time and Intermediate-Time Value-at-Risk Constraints
- Author
-
Wu, Chufang, Gu, Jia-Wen, Ching, Wai-Ki, and Wong, Chi-Wing
- Published
- 2024
- Full Text
- View/download PDF
15. Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent
- Author
-
Zhou, Junkai, Pang, Liang, Jing, Ya, Gu, Jia, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, two key problems remain in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on its current actions, which reduces the personalization and anthropomorphism of the agent. To solve these problems, we construct a social media agent based on personalized knowledge and dynamic persona information. For personalized knowledge, we add external knowledge sources and match them with the persona information of the agent, thereby giving the agent personalized world knowledge. For dynamic persona information, we use current action information to internally retrieve the persona information of the agent, thereby reducing the interference of diverse persona information on the current action. To make the agent suitable for social media, we design five basic modules for it: persona, planning, action, memory, and reflection. To provide an interaction and verification environment for the agent, we build a social media simulation sandbox. In experimental verification, automatic and human evaluations demonstrate the effectiveness of the agent we constructed.
- Published
- 2024
16. Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
- Author
-
Wang, Qian, Gu, Jia-Chen, and Ling, Zhen-Hua
- Subjects
Computer Science - Sound ,Computer Science - Information Retrieval ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR datasets lack comprehensive alignment information, and simple binary contrastive learning labels overlook the measurement of fine-grained semantic differences between samples. To counter these challenges, we present a novel ATR framework that comprehensively captures the matching relationships of multimodal information from different perspectives and finer granularities. Specifically, a fine-grained alignment method is introduced, achieving a more detail-oriented matching through a multiscale process from local to global levels to capture meticulous cross-modal relationships. In addition, we pioneer the application of cross-modal similarity consistency, leveraging intra-modal similarity relationships as soft supervision to boost more intricate alignment. Extensive experiments validate the effectiveness of our approach, outperforming previous methods by significant margins of at least 3.9% (T2A) / 6.9% (A2T) R@1 on the AudioCaps dataset and 2.9% (T2A) / 5.4% (A2T) R@1 on the Clotho dataset., Comment: 5 pages, accepted to ICASSP2024
- Published
- 2024
17. Neighboring Perturbations of Knowledge Editing on Large Language Models
- Author
-
Ma, Jun-Yu, Ling, Zhen-Hua, Zhang, Ningyu, and Gu, Jia-Chen
- Subjects
Computer Science - Computation and Language - Abstract
Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper studies whether updating new knowledge in LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we seek to determine whether appending a new answer to the answer list of a factual question leads to catastrophic forgetting of the original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced, and a benchmark dubbed Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. Besides, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP coupled with four editing methods on four LLMs. The code and data are available at https://github.com/mjy1111/PEAK., Comment: Accepted by ICML 2024
- Published
- 2024
18. Corrective Retrieval Augmented Generation
- Author
-
Yan, Shi-Qi, Gu, Jia-Chen, Zhu, Yun, and Ling, Zhen-Hua
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches., Comment: Update results, add more analysis, and fix typos
- Published
- 2024
19. Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
- Author
-
Li, Zhen, Xu, Xiaohan, Shen, Tao, Xu, Can, Gu, Jia-Chen, Lai, Yuxuan, Tao, Chongyang, and Ma, Shuai
- Subjects
Computer Science - Computation and Language - Abstract
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques., Comment: 21 pages, 5 figures
- Published
- 2024
20. Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
- Author
-
Gu, Jia-Chen, Xu, Hao-Xiang, Ma, Jun-Yu, Lu, Pan, Ling, Zhen-Hua, Chang, Kai-Wei, and Peng, Nanyun
- Subjects
Computer Science - Computation and Language - Abstract
Model editing is a technique that edits large language models (LLMs) with updated knowledge to alleviate hallucinations without resource-intensive retraining. While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural language inference, and question answering. In this paper, we raise concerns that model editing's improvements on factuality may come at the cost of a significant degradation of the model's general abilities. We systematically analyze the side effects by evaluating four popular editing methods on three LLMs across eight representative tasks. Our extensive empirical experiments show that it is challenging for current editing methods to simultaneously improve the factuality of LLMs and maintain their general abilities. Our analysis reveals that the side effects are caused by model editing altering the original model weights excessively, leading to overfitting to the edited facts. To mitigate this, a method named RECT is proposed to regularize the edit update weights by imposing constraints on their complexity based on the RElative Change in weighT. Evaluation results show that RECT can significantly mitigate the side effects of editing while still maintaining over 94% editing performance., Comment: Accepted by EMNLP 2024
- Published
- 2024
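The "RElative Change in weighT" constraint in the abstract above can be sketched as follows. This is a minimal toy version based only on the abstract's description, not the authors' code; the `keep_ratio` parameter and the top-k sparsification scheme are our own assumptions. The idea: rank each entry of an edit update by the relative change it induces in the corresponding weight, keep only the largest, and zero the rest so the edit cannot alter the original weights excessively.

```python
import numpy as np

def rect_like_update(W: np.ndarray, delta: np.ndarray, keep_ratio: float = 0.1):
    """Sparsify an edit update delta by relative change |delta| / |W|:
    keep the top keep_ratio fraction of entries, zero the remainder,
    and apply the restrained update to W."""
    rel = np.abs(delta) / (np.abs(W) + 1e-8)     # relative change per weight
    k = max(1, int(keep_ratio * rel.size))
    thresh = np.partition(rel.ravel(), -k)[-k]   # k-th largest relative change
    mask = rel >= thresh                         # entries allowed to change
    return W + delta * mask, mask

rng = np.random.default_rng(1)
W = rng.normal(size=(20, 20))            # original weights
delta = 0.1 * rng.normal(size=(20, 20))  # raw edit update
W_new, mask = rect_like_update(W, delta, keep_ratio=0.1)
```

Only a tenth of the weights move under this restraint; the rest stay exactly as they were, which is the mechanism by which such a constraint limits overfitting to the edited facts.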
21. A Comprehensive Study of Knowledge Editing for Large Language Models
- Author
-
Zhang, Ningyu, Yao, Yunzhi, Tian, Bozhong, Wang, Peng, Deng, Shumin, Wang, Mengru, Xi, Zekun, Mao, Shengyu, Zhang, Jintian, Ni, Yuansheng, Cheng, Siyuan, Xu, Ziwen, Xu, Xin, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Liang, Lei, Zhang, Zhiqiang, Zhu, Xiaowei, Zhou, Jun, and Chen, Huajun
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Human-Computer Interaction ,Computer Science - Machine Learning - Abstract
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications., Comment: Ongoing work (v5): we have updated the Table 4 results after optimizing certain methods (related to AdaLoRA) and fixing computational bugs (related to ROME and MEMIT) in EasyEdit. These improvements have led to better results than before. We will continue updating this paper and welcome everyone to discuss and exchange ideas
- Published
- 2024
22. A single-pixel elemental imaging method using neutron-induced gamma-ray activation
- Author
-
Cheng, Can, Xie, Yong-Ji, Xia, Xun-Rong, Gu, Jia-Yu, Zhao, Dong, Chen, Yi-Ze, Sun, Ai-Yun, Liang, Xu-Wen, Jia, Wen-Bao, and Hei, Da-Qian
- Published
- 2025
- Full Text
- View/download PDF
23. Is ChatGPT a Good Multi-Party Conversation Solver?
- Author
-
Tan, Chao-Hong, Gu, Jia-Chen, and Ling, Zhen-Hua
- Subjects
Computer Science - Computation and Language - Abstract
Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and GPT-4 within the context of MPCs. An empirical analysis is conducted to assess the zero-shot learning capabilities of ChatGPT and GPT-4 by subjecting them to evaluation across three MPC datasets that encompass five representative tasks. The findings reveal that ChatGPT's performance on a number of evaluated MPC tasks leaves much to be desired, whilst GPT-4's results portend a promising future. Additionally, we endeavor to bolster performance through the incorporation of MPC structures, encompassing both speaker and addressee architecture. This study provides an exhaustive evaluation and analysis of applying generative LLMs to MPCs, casting a light upon the conception and creation of increasingly effective and robust MPC agents. Concurrently, this work underscores the challenges implicit in the utilization of LLMs for MPCs, such as deciphering graphical information flows and generating stylistically consistent responses., Comment: Accepted by Findings of EMNLP 2023
- Published
- 2023
24. Untying the Reversal Curse via Bidirectional Language Model Editing
- Author
-
Ma, Jun-Yu, Gu, Jia-Chen, Ling, Zhen-Hua, Liu, Quan, and Liu, Cong
- Subjects
Computer Science - Computation and Language - Abstract
Recent studies have demonstrated that large language models (LLMs) store massive factual knowledge within their parameters. But existing LLMs are prone to hallucinate unintended text due to false or outdated knowledge. Since retraining LLMs is resource-intensive, there has been a growing interest in the concept of model editing. Despite the emergence of benchmarks and approaches, these unidirectional editing and evaluation paradigms fail to explore the reversal curse. Intuitively, if "The capital of France is" is edited to be a counterfact "London" within a model, then it should be able to naturally reason and recall the reverse fact, i.e., "London is the capital of" followed by "France" instead of "England". In this paper, we study bidirectional language model editing, aiming to provide rigorous model editing evaluation to assess if edited LLMs can recall the editing knowledge bidirectionally. A new evaluation metric of reversibility is introduced, and a benchmark dubbed as Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate the reversibility of edited models in recalling knowledge in the reverse direction of editing. We surprisingly observe that while current editing methods and LLMs can effectively recall editing facts in the direction of editing, they suffer serious deficiencies when evaluated in the reverse direction. To mitigate the reversal curse, a method named Bidirectionally Inversible Relationship moDeling (BIRD) is proposed. A set of editing objectives that incorporate bidirectional relationships between subject and object into the updated model weights is designed. Experiments show that BIRD improves the performance of four representative LLMs of different sizes on question answering and judgement.
- Published
- 2023
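The reversibility criterion described in the abstract above can be illustrated with a toy knowledge store. This is a hypothetical stand-in for an edited LLM, not the BAKE benchmark code: `edit` rewrites a (subject, relation) fact, and `reversibility` checks whether the inverse query also recalls the edit.

```python
# Toy illustration of bidirectional recall after a knowledge edit.
# An edit rewrites (subject, relation) -> object; reversibility asks
# whether the inverse query (object, inverse relation) -> subject holds.

class ToyKnowledgeStore:
    def __init__(self):
        self.forward = {}   # (subject, relation) -> object
        self.backward = {}  # (object, inverse_relation) -> subject

    def edit(self, subject, relation, new_object, inverse_relation,
             bidirectional=True):
        """Apply an edit; optionally update the reverse direction too."""
        self.forward[(subject, relation)] = new_object
        if bidirectional:
            self.backward[(new_object, inverse_relation)] = subject

    def recall(self, head, relation, reverse=False):
        table = self.backward if reverse else self.forward
        return table.get((head, relation))


def reversibility(store, edits):
    """Fraction of edits whose reverse fact is also recalled correctly."""
    hits = sum(
        1 for (s, r, o, inv) in edits
        if store.recall(o, inv, reverse=True) == s
    )
    return hits / len(edits)
```

A purely unidirectional edit (the failure mode the paper reports) leaves the reverse query unanswered, giving a reversibility of zero on that edit.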
25. Associations of oxidative balance score with lumbar spine osteopenia in 20–40 years adults: NHANES 2011–2018
- Author
-
Tao, Yu-Ao, Long, Ling, Gu, Jia-Xiang, Wang, Pei-Yang, Li, Xi, Li, Xiao-Long, Fan, Pan, and Wang, Yuntao
- Published
- 2024
- Full Text
- View/download PDF
26. Synergistic Effects of Glutamine Deprivation and Metformin in Acute Myeloid Leukemia
- Author
-
Liu, Tong-yuan, Fu, Xing, Yang, Ying, Gu, Jia, Xiao, Min, and Li, Deng-ju
- Published
- 2024
- Full Text
- View/download PDF
27. SEAD reference panel with 22,134 haplotypes boosts rare variant imputation and genome-wide association analysis in Asian populations
- Author
-
Yang, Meng-Yuan, Zhong, Jia-Dong, Li, Xin, Tian, Geng, Bai, Wei-Yang, Fang, Yi-Hu, Qiu, Mo-Chang, Yuan, Cheng-Da, Yu, Chun-Fu, Li, Nan, Yang, Ji-Jian, Liu, Yu-Heng, Yu, Shi-Hui, Zhao, Wei-Wei, Liu, Jun-Quan, Sun, Yi, Cong, Pei-Kuan, Khederzadeh, Saber, Zhao, Pian-Pian, Qian, Yu, Guan, Peng-Lin, Gu, Jia-Xuan, Gai, Si-Rui, Yi, Xiang-Jiao, Tao, Jian-Guo, Chen, Xiang, Miao, Mao-Mao, Lei, Lan-Xin, Xu, Lin, Xie, Shu-Yang, Li, Jin-Chen, Guo, Ji-Feng, Karasik, David, Yang, Liu, Tang, Bei-Sha, Huang, Fei, and Zheng, Hou-Feng
- Published
- 2024
- Full Text
- View/download PDF
28. Molecular genetics, therapeutics and RET inhibitor resistance for medullary thyroid carcinoma and future perspectives
- Author
-
Zhang, Ying, Zheng, Wei-Hui, Zhou, Shi-Hong, Gu, Jia-Lei, Yu, Qing, Zhu, Yi-Zhou, Yan, Yu-Jie, Zhu, Zhi, and Shang, Jin-Biao
- Published
- 2024
- Full Text
- View/download PDF
29. Effectiveness of chest pain center accreditation on the hospital outcome of acute aortic dissection: a nationwide study in China
- Author
-
Liu, Li-Wei, Cui, Yi-Kai, Zhang, Lin, Jia, Dai-Le, Wang, Jing, Gu, Jia-Wei, Zhang, Jin-Yan, Dong, Zhen, Jin, Xue-Juan, Zou, Xiao-Yi, Sun, Guo-Li, Dai, Yu-Xiang, Sun, Ai-Jun, and Ge, Jun-Bo
- Published
- 2024
- Full Text
- View/download PDF
30. P. gingivalis in oral-prostate axis exacerbates benign prostatic hyperplasia via IL-6/IL-6R pathway
- Author
-
Wang, Shuang-Ying, Cai, Yi, Hu, Xiao, Li, Fei, Qian, Xin-Hang, Xia, Ling-Yun, Gao, Bo, Wu, Lan, Xie, Wen-Zhong, Gu, Jia-Min, Deng, Tong, Zhu, Cong, Jia, Hai-Chang, Peng, Wan-Qi, Huang, Jiao, Fang, Cheng, and Zeng, Xian-Tao
- Published
- 2024
- Full Text
- View/download PDF
31. Danggui Sini decoction alleviates oxaliplatin-induced peripheral neuropathy by regulating gut microbiota and potentially relieving neuroinflammation related metabolic disorder
- Author
-
Chen, Chen, Xu, Jian-Lin, Gu, Zhan-Cheng, Zhou, Shan-Shan, Wei, Guo-Li, Gu, Jia-Lin, Ma, Hai-Long, Feng, Yan-Qi, Song, Zi-Wei, Yan, Zhan-Peng, Deng, Shan, Ding, Rong, Li, Song-Lin, and Huo, Jie-Ge
- Published
- 2024
- Full Text
- View/download PDF
32. Lifestyle intervention in children with obesity and nonalcoholic fatty liver disease (NAFLD): study protocol for a randomized controlled trial in Ningbo city (the SCIENT study)
- Author
-
Zhang, Ping-ping, Wang, You-xin, Shen, Fang-jing, Xing, Yun-fei, Gu, Jia-ying, Li, Xue-ying, Jin, Han, Jin, Shi-feng, Xu, Miao, Wang, Hai-jun, Wang, Hui, and Li, Li
- Published
- 2024
- Full Text
- View/download PDF
33. MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation
- Author
-
Gu, Jia-Chen, Tan, Chao-Hong, Chu, Caiyuan, Ling, Zhen-Hua, Tao, Chongyang, Liu, Quan, and Liu, Cong
- Subjects
Computer Science - Computation and Language - Abstract
Modeling multi-party conversations (MPCs) with graph neural networks has been proven effective at capturing complicated and graphical information flows. However, existing methods rely heavily on the necessary addressee labels and can only be applied to an ideal setting where each utterance must be tagged with an addressee label. To study the scarcity of addressee labels, which is a common issue in MPCs, we propose MADNet that maximizes addressee deduction expectation in heterogeneous graph neural networks for MPC generation. Given an MPC with a few addressee labels missing, existing methods fail to build a consecutively connected conversation graph, but only a few separate conversation fragments instead. To ensure message passing between these conversation fragments, four additional types of latent edges are designed to complete a fully-connected graph. Besides, to optimize the edge-type-dependent message passing for those utterances without addressee labels, an Expectation-Maximization-based method that iteratively generates silver addressee labels (E step), and optimizes the quality of generated responses (M step), is designed. Experimental results on two Ubuntu IRC channel benchmarks show that MADNet outperforms various baseline models on the task of MPC generation, especially under the more common and challenging setting where part of the addressee labels are missing., Comment: Accepted by EMNLP 2023. arXiv admin note: text overlap with arXiv:2203.08500
- Published
- 2023
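The E/M alternation described in the abstract above can be sketched in miniature. This is a hypothetical simplification, not MADNet itself: the E step fills in missing addressee labels with the current best-scoring candidate ("silver" labels), and the M step re-estimates candidate scores from the completed assignment.

```python
# Toy EM sketch for filling in missing addressee labels.
# Each utterance is (speaker, addressee-or-None); the E step assigns
# silver labels to unlabeled utterances, the M step re-estimates
# per-speaker scores from assignment frequencies.

def em_addressee(utterances, speakers, iters=10):
    scores = {s: 1.0 for s in speakers}        # uniform initialization
    labels = {}
    for _ in range(iters):
        # E step: silver labels for unlabeled utterances
        for i, (spk, addressee) in enumerate(utterances):
            if addressee is not None:
                labels[i] = addressee          # keep gold labels
            else:
                # best-scoring speaker other than the speaker itself
                labels[i] = max(
                    (s for s in speakers if s != spk),
                    key=lambda s: scores[s],
                )
        # M step: re-estimate scores from assignment frequencies
        counts = {s: 1e-6 for s in speakers}   # additive smoothing
        for addressee in labels.values():
            counts[addressee] += 1.0
        total = sum(counts.values())
        scores = {s: c / total for s, c in counts.items()}
    return labels, scores
```

In the actual model the E step scores addressees with the graph neural network and the M step optimizes response generation quality; the loop structure, not the scoring function, is what this sketch shows.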
34. SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction
- Author
-
Ma, Jun-Yu, Gu, Jia-Chen, Ling, Zhen-Hua, Liu, Quan, Liu, Cong, and Hu, Guoping
- Subjects
Computer Science - Computation and Language - Abstract
Zero-shot cross-lingual information extraction (IE) aims at constructing an IE model for some low-resource target languages, given annotations exclusively in some rich-resource languages. Recent studies based on language-universal features have shown their effectiveness and are attracting increasing attention. However, prior work has neither explored the potential of establishing interactions between language-universal features and contextual representations nor incorporated features that can effectively model constituent span attributes and relationships between multiple spans. In this study, a syntax-augmented hierarchical interactive encoder (SHINE) is proposed to transfer cross-lingual IE knowledge. The proposed encoder is capable of interactively capturing complementary information between features and contextual information, to derive language-agnostic representations for various IE tasks. Concretely, a multi-level interaction network is designed to hierarchically fuse the complementary information to strengthen domain adaptability. Besides, in addition to the well-studied syntax features of part-of-speech and dependency relation, a new syntax feature of constituency structure is introduced to model the constituent span information which is crucial for IE. Experiments across seven languages on three IE tasks and four benchmarks verify the effectiveness and generalization ability of the proposed method., Comment: 15 pages
- Published
- 2023
35. DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion
- Author
-
Tan, Chao-Hong, Gu, Jia-Chen, and Ling, Zhen-Hua
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Diffusion models have emerged as the new state-of-the-art family of deep generative models, and their promising potential for text generation has recently attracted increasing attention. Existing studies mostly adopt a single encoder architecture with partially noising processes for conditional text generation, but its degree of flexibility for conditional modeling is limited. In fact, the encoder-decoder architecture is naturally more flexible for its detachable encoder and decoder modules, which is extensible to multilingual and multimodal generation tasks for conditions and target texts. However, the encoding process of conditional texts lacks the understanding of target texts. To this end, a spiral interaction architecture for encoder-decoder text diffusion (DiffuSIA) is proposed. Concretely, the conditional information from the encoder is designed to be captured by the diffusion decoder, while the target information from the decoder is designed to be captured by the conditional encoder. These two types of information flow run through multilayer interaction spirally for deep fusion and understanding. DiffuSIA is evaluated on four text generation tasks, including paraphrase, text simplification, question generation, and open-domain dialogue generation. Experimental results show that DiffuSIA achieves competitive performance among previous methods on all four tasks, demonstrating the effectiveness and generalization ability of the proposed method., Comment: Work in Progress
- Published
- 2023
36. GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding
- Author
-
Gu, Jia-Chen, Ling, Zhen-Hua, Liu, Quan, Liu, Cong, and Hu, Guoping
- Subjects
Computer Science - Computation and Language - Abstract
Addressing the issues of who is saying what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention. However, existing methods on MPC understanding typically embed interlocutors and utterances into sequential information flows, or utilize only the superficial features of the inherent graph structures in MPCs. To this end, we present a plug-and-play and lightweight method named graph-induced fine-tuning (GIFT) which can adapt various Transformer-based pre-trained language models (PLMs) for universal MPC understanding. In detail, the full and equivalent connections among utterances in a regular Transformer ignore the sparse but distinctive dependency of an utterance on another in MPCs. To distinguish different relationships between utterances, four types of edges are designed to integrate graph-induced signals into attention mechanisms to refine PLMs originally designed for processing sequential texts. We evaluate GIFT by implementing it into three PLMs, and test the performance on three downstream tasks including addressee recognition, speaker identification and response selection. Experimental results show that GIFT can significantly improve the performance of three PLMs on three downstream tasks and two benchmarks with only 4 additional parameters per encoding layer, achieving new state-of-the-art performance on MPC understanding., Comment: Accepted by ACL 2023. arXiv admin note: substantial text overlap with arXiv:2106.01541
- Published
- 2023
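The core mechanism in the abstract above, injecting edge-type-dependent signals into attention, can be sketched with a scalar bias per edge type added to the attention logits before the softmax. This is a hypothetical simplification (the edge-type names and the single-scalar parameterization are illustrative, not GIFT's actual design):

```python
import math

# Sketch of graph-induced attention refinement: one scalar bias per
# utterance-pair edge type is added to the raw attention logits, so
# structurally related utterances can attend to each other more strongly.

EDGE_TYPES = ["reply_to", "replied_by", "same_speaker", "other"]

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def graph_induced_attention(logits, edge_types, biases):
    """Add an edge-type-dependent bias to each attention logit,
    then renormalize with softmax."""
    biased = [l + biases[t] for l, t in zip(logits, edge_types)]
    return softmax(biased)
```

With only one learnable scalar per edge type and layer, this kind of refinement stays lightweight, which matches the abstract's claim of very few additional parameters per encoding layer.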
37. USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER
- Author
-
Ma, Jun-Yu, Gu, Jia-Chen, Qi, Jiajun, Ling, Zhen-Hua, Liu, Quan, and Zhao, Xiaoyi
- Subjects
Computer Science - Computation and Language - Abstract
This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER II). A method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) is proposed for Multilingual Complex NER. The method first utilizes a statistics-based approach to construct a gazetteer. Secondly, the representations of gazetteer networks and language models are adapted by minimizing the KL divergence between them at both the sentence-level and entity-level. Finally, these two networks are then integrated for supervised named entity recognition (NER) training. The proposed method is applied to XLM-R with a gazetteer built from Wikidata, and shows great generalization ability across different tracks. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on one track (Hindi) in this task., Comment: Winner system (USTC-NELSLIP) of SemEval 2023 MultiCoNER II shared task on Hindi track. arXiv admin note: substantial text overlap with arXiv:2203.03216
- Published
- 2023
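The representation-alignment step in the abstract above, minimizing KL divergence between the gazetteer network and the language model, can be sketched generically. The function names and the symmetrized form are illustrative assumptions, not the system's actual training objective:

```python
import math

# Minimal sketch of KL-divergence alignment between two networks'
# predicted label distributions (e.g. a gazetteer network and a
# language model); the loss is small when the two distributions agree.

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(gazetteer_dist, lm_dist):
    """Symmetrized KL: zero when the two networks agree exactly."""
    return 0.5 * (kl_divergence(gazetteer_dist, lm_dist)
                  + kl_divergence(lm_dist, gazetteer_dist))
```

In the described system this adaptation is applied at both the sentence level and the entity level before the two networks are integrated for supervised NER training.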
38. Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction
- Author
-
Chu, Caiyuan, Li, Ya, Liu, Yifan, Gu, Jia-Chen, Liu, Quan, Ge, Yongxin, and Hu, Guoping
- Subjects
Computer Science - Computation and Language - Abstract
Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Therefore, the automatic induction of dialogue intention is very important for intelligent dialogue systems. This paper presents our solution to Track 2 of Intent Induction from Conversations for Task-Oriented Dialogue at the Eleventh Dialogue System Technology Challenge (DSTC11). The essence of intention clustering lies in distinguishing the representation of different dialogue utterances. The key to automatic intention induction is that, for any given set of new data, the sentence representation obtained by the model can be well distinguished from different labels. Therefore, we propose a multi-stage coarse-to-fine contrastive learning model training scheme including unsupervised contrastive learning pre-training, supervised contrastive learning pre-training, and fine-tuning with joint contrastive learning and clustering to obtain a better dialogue utterance representation model for the clustering task. In the released DSTC11 Track 2 evaluation results, our proposed system ranked first on both of the two subtasks of this Track., Comment: Ranked 1st on Track 2 at DSTC 11, Accepted by DSTC 11 Workshop
- Published
- 2023
39. Effective drug and shRNA delivery for synergistic treatment of triple-negative breast cancer by sequentially targeting tumor hypoxia
- Author
-
Liu, Xuemeng, Sun, Jiajia, Gu, Jia, Weng, Lingyan, Wang, Xueting, Zhu, Li, Luo, Qianqian, and Chen, Zhongping
- Subjects
Chemical Engineering ,Engineering ,Materials Engineering ,Environmental Engineering ,Cancer ,Biotechnology ,Breast Cancer ,5.1 Pharmaceuticals ,Development of treatments and therapeutic interventions ,5.2 Cellular and gene therapies ,Civil Engineering ,Chemical engineering ,Environmental engineering ,Materials engineering - Abstract
Clinical treatment of TNBC remains challenging, due to the lack of targeted therapies. As TNBC is highly hypoxic with higher HIF-1α expression than other subtypes, we fabricated hypoxia-responsive polymeric micelles co-loading drug and shRNA to treat TNBC by targeting hypoxic tumor microenvironment and subsequently targeting overexpressed HIF-1α under hypoxia. The micelles were assembled from methoxy-polyethylene glycol (mPEG) and poly-L-lysine (PLL) copolymer with AZO as a hypoxia-responsive bridge of mPEG and PLL. Once exposed to hypoxia, AZO bridge was cleaved, resulting in the disassembly of the micelles for rapid release. In vitro and in vivo results showed that the micelles enabled simultaneous delivery of drug and shRNA to hypoxic sites for site-specific rapid release, facilitated by sensitive response to hypoxia; hypoxia-responsive shRNA delivery effectively silenced HIF-1α and its downstream genes, which not only ameliorated the response of hypoxic tumor to drug, but also modulated tumor microenvironment for further improved drug and shRNA delivery; as a result, synergistic treatment of chemotherapy and HIF-1α targeted gene therapy inhibited the growth of primary TNBC tumor and its distant metastasis in a murine model of orthotopic TNBC. Together with their good biocompatibility, hypoxia-responsive polymeric micelles thus emerged as a safe, effective, and universally applicable drug and gene carrier for treatment of TNBC as well as other hypoxic tumors.
- Published
- 2023
40. WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition
- Author
-
Ma, Jun-Yu, Chen, Beiduo, Gu, Jia-Chen, Ling, Zhen-Hua, Guo, Wu, Liu, Quan, Chen, Zhigang, and Liu, Cong
- Subjects
Computer Science - Computation and Language - Abstract
Zero-shot cross-lingual named entity recognition (NER) aims at transferring knowledge from annotated and rich-resource data in source languages to unlabeled and lean-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information lying in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to fully exploit the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. Besides, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models to preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows great generalization and compatibility across languages and fields., Comment: 13 pages, 4 figures, accepted by EMNLP 2022
- Published
- 2022
41. A New Recovery Calibration Method of Steam Stimulation in Shallow Heavy Oil Reservoirs in Kazakhstan
- Author
-
Gu, Jia-qing, Liang, Li-dong, Xu, Jia-long, Wang, Gang, Wang, Miao-miao, Wu, Wei, Series Editor, and Lin, Jia'en, editor
- Published
- 2024
- Full Text
- View/download PDF
42. TREM2 modulates macrophage pyroptosis and inflammatory responses to ameliorate aortic valve calcification
- Author
-
Bian, Jin-Hui, Yuan, Chun-Ze, Gu, Jia-Xi, Lin, Wen-Feng, Xiong, Jia-Qi, Tang, Zhi-Wei, Li, Ao, and Shao, Yong-Feng
- Published
- 2025
- Full Text
- View/download PDF
43. RAMIS: Increasing robustness and accuracy in medical image segmentation with hybrid CNN-transformer synergy
- Author
-
Gu, Jia, Tian, Fangzheng, and Oh, Il-Seok
- Published
- 2025
- Full Text
- View/download PDF
44. Emerging advances in drug delivery systems (DDSs) for optimizing cancer complications
- Author
-
Li, Kerui, Guo, Bei, Gu, Junmou, Ta, Na, Gu, Jia, Yu, Hao, Sun, Mengchi, and Han, Tao
- Published
- 2025
- Full Text
- View/download PDF
45. Weighted Distributed Estimation under Heterogeneity
- Author
-
Gu, Jia and Chen, Songxi
- Subjects
Mathematics - Statistics Theory - Abstract
This paper considers distributed M-estimation under heterogeneous distributions among distributed data blocks. A weighted distributed (WD) estimator is proposed to improve the efficiency of the standard "Split-And-Conquer" (SaC) estimator for the common parameter shared by all the data blocks. The weighted distributed estimator is shown to be at least as efficient as the would-be full sample and the generalized method of moment estimators, with the latter two estimators requiring full data access. A bias reduction is formulated for the WD estimator to accommodate much larger numbers of data blocks than the existing methods without sacrificing the estimation efficiency, and a similar debiasing operation is applied to the SaC estimator. The mean squared error (MSE) bounds and the asymptotic distributions of the WD and the two debiased estimators are derived, which show the advantageous performance of the debiased estimators when the number of data blocks is large., Comment: 35 pages, 1 figure
- Published
- 2022
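The contrast between the plain SaC estimator and a weighted combination can be illustrated with the simplest M-estimator, the mean. This is a standard inverse-variance weighting construction used here for illustration only; the paper's WD estimator covers general M-estimation with theoretically derived weights:

```python
# Toy sketch of Split-And-Conquer vs. weighted combination: each data
# block contributes its local mean, and blocks are combined either
# uniformly (SaC) or with inverse-variance weights.

def block_stats(block):
    n = len(block)
    mean = sum(block) / n
    var = sum((x - mean) ** 2 for x in block) / (n - 1)
    return mean, var / n          # block mean and its estimated variance

def sac_estimate(blocks):
    """Plain Split-And-Conquer: unweighted average of block estimates."""
    means = [block_stats(b)[0] for b in blocks]
    return sum(means) / len(means)

def weighted_estimate(blocks):
    """Inverse-variance weighted combination of block estimates."""
    stats = [block_stats(b) for b in blocks]
    weights = [1.0 / v for _, v in stats]
    total = sum(weights)
    return sum(w * m for (m, _), w in zip(stats, weights)) / total
```

When one block is much noisier than another, the weighted estimate stays near the precise block's value while the unweighted SaC average is pulled toward the noisy one, which is the efficiency gap the weighting is designed to close.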
46. MicroRNA-126: From biology to therapeutics
- Author
-
Guo, Bei, Gu, Jia, Zhuang, Tongtian, Zhang, Jingbin, Fan, Chunyang, Li, Yiyao, Zhao, Mengdi, Chen, Ruoran, Wang, Rui, Kong, Yuan, Xu, Shuang, Gao, Wei, Liang, Linlang, Yu, Hao, and Han, Tao
- Published
- 2025
- Full Text
- View/download PDF
47. RACS2: A Framework of Remote Autonomous Control System for Telescope Observation and its application
- Author
-
Wang, Zhi-yue, Zhang, Guang-yu, Wang, Jian, Zhang, Qian, Genga, Zhe, Zhu, Ze-yu, Gu, Jia-Yao, Zheng, Zhen-hao, Zhu, Lu-cheng, Ge, Kun, and Zhang, Hong-fei
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
As the demand for astronomical observation rises, telescope systems are becoming more and more complex. Thus, observatory control software needs to be more intelligent: it has to control each instrument inside the observatory, finish observation tasks autonomously, and report information to users when needed. We developed a distributed autonomous observatory control framework named Remote Autonomous Control System 2nd (RACS2) to meet these requirements. The RACS2 framework uses a decentralized distributed architecture; instrument control software and system services such as the observation control service are implemented as separate components. The communication between components is implemented based on a high-performance serialization library and a lightweight messaging library. Interfaces towards Python and the Experimental Physics and Industrial Control System (EPICS) are implemented, so the RACS2 framework can communicate with EPICS-based device control software and Python-based software. Several system components, including log, executor, scheduler and other modules, are developed to support observation. Observation tasks can be programmed in Python, and the plans are scheduled by the scheduler component to achieve autonomous observation. A set of web services is implemented based on the FastAPI framework, with which users can control and manage the framework remotely. Based on the RACS2 framework, we have implemented the DATs telescope's observation system and the space object observation system. We performed remote autonomous observations and collected substantial data with these systems., Comment: 22 pages, 16 figures
- Published
- 2022
48. TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge
- Author
-
Tan, Chao-Hong, Gu, Jia-Chen, Tao, Chongyang, Ling, Zhen-Hua, Xu, Can, Hu, Huang, Geng, Xiubo, and Jiang, Daxin
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowledge graphs or wiki pages. However, their ability to access and manipulate the task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks of dialogue generation and question generation, and on two datasets, show that our method achieves better performance than various baseline models., Comment: Accepted by Findings of ACL 2022
- Published
- 2022
49. HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations
- Author
-
Gu, Jia-Chen, Tan, Chao-Hong, Tao, Chongyang, Ling, Zhen-Hua, Hu, Huang, Geng, Xiubo, and Jiang, Daxin
- Subjects
Computer Science - Computation and Language - Abstract
Recently, various response generation models for two-party conversations have achieved impressive improvements, but less attention has been paid to multi-party conversations (MPCs), which are more practical and complicated. Compared with a two-party conversation where a dialogue context is a sequence of utterances, building a response generation model for MPCs is more challenging, since there exist complicated context structures and the generated responses heavily rely on both interlocutors (i.e., speaker and addressee) and history utterances. To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph. Besides, we also design six types of meta relations with node-edge-type-dependent parameters to characterize the heterogeneous interactions within the graph. Through multi-hop updating, HeterMPC can adequately utilize the structural knowledge of conversations for response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark show that HeterMPC outperforms various baseline models for response generation in MPCs., Comment: Accepted by ACL 2022
- Published
- 2022
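The two-node-type graph described in the abstract above can be sketched as a plain data structure. The edge names here ("spoken-by", "addressed-to", "follows") are illustrative assumptions, not the paper's six meta relations:

```python
# Hypothetical sketch of a heterogeneous conversation graph with
# utterance nodes and interlocutor nodes, connected by typed edges.

def build_mpc_graph(turns):
    """turns: list of (speaker, addressee, utterance_text).
    Returns utterance nodes, an interlocutor->node-id map, and typed edges."""
    utterance_nodes, interlocutor_nodes, edges = [], {}, []
    for i, (speaker, addressee, text) in enumerate(turns):
        u = f"utt{i}"
        utterance_nodes.append((u, text))
        # register both interlocutor nodes (addressee may be unknown)
        for person in (speaker, addressee):
            if person is not None and person not in interlocutor_nodes:
                interlocutor_nodes[person] = f"spk_{person}"
        edges.append((u, "spoken-by", interlocutor_nodes[speaker]))
        if addressee is not None:
            edges.append((u, "addressed-to", interlocutor_nodes[addressee]))
        if i > 0:
            edges.append((u, "follows", f"utt{i - 1}"))  # history link
    return utterance_nodes, interlocutor_nodes, edges
```

In a full model, message passing over such typed edges (with edge-type-dependent parameters, as the abstract describes) is what lets utterance representations absorb information about both their speakers and their addressees.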
50. Amelioration of immunoglobulin A vasculitis by suppression of the pathological expansion of T follicular helper 17 cells
- Author
-
Jiang, Qinglian, Chi, Xuyang, Wei, Tong, Nakayamada, Shingo, Shan, Yu, Sun, Yini, Zhao, Xing, Zhou, Jieqing, Fan, Yan, Gu, Jia, Jiang, Hong, and Ma, Xiaoxue
- Published
- 2024
- Full Text
- View/download PDF