1. LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
- Authors
Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Chenkun Tan, Pengyu Wang, Qipeng Guo, Zhe Xu, Linyang Li, Zhikai Lei, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, and Xuanjing Huang
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract
With the development of large language models (LLMs), the sequence lengths these models support continue to increase, drawing significant attention to long-context language models. However, evaluation of these models has been largely limited to their capabilities, with little research into their safety. Existing work, such as ManyShotJailbreak, has demonstrated to some extent that long-context language models can exhibit safety concerns, but the methods used are limited and lack comprehensiveness. In response, we introduce **LongSafetyBench**, the first benchmark designed to objectively and comprehensively evaluate the safety of long-context models. LongSafetyBench consists of 10 task categories, with an average length of 41,889 words. After testing eight long-context language models on LongSafetyBench, we found that existing models generally exhibit insufficient safety capabilities: the proportion of safe responses from most mainstream long-context LLMs is below 50%. Moreover, models' safety performance in long-context scenarios does not always align with that in short-context scenarios. Further investigation revealed that long-context models tend to overlook harmful content within lengthy texts. We also propose a simple yet effective solution that allows open-source models to achieve performance comparable to that of top-tier closed-source models. We believe LongSafetyBench can serve as a valuable benchmark for evaluating the safety capabilities of long-context language models, and we hope our work encourages the broader community to pay attention to the safety of long-context models and to contribute solutions that improve the safety of long-context LLMs.
- Published
2024
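
The abstract's headline metric is the proportion of safe responses per model. As a minimal sketch of how such a rate might be computed from judged outputs (this is illustrative only, not the authors' evaluation harness; the record fields `category` and `is_safe` are hypothetical):

```python
from collections import defaultdict

# Hypothetical judged records: each holds a task category and a boolean
# safety verdict. Field names are illustrative, not from the paper.
records = [
    {"category": "harmful-extraction", "is_safe": True},
    {"category": "harmful-extraction", "is_safe": False},
    {"category": "many-shot-jailbreak", "is_safe": False},
]

def safe_response_rate(records):
    """Fraction of responses judged safe, overall and per task category."""
    totals, safes = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        safes[r["category"]] += r["is_safe"]  # bool counts as 0/1
    per_category = {c: safes[c] / totals[c] for c in totals}
    overall = sum(safes.values()) / sum(totals.values())
    return overall, per_category

overall, per_category = safe_response_rate(records)
print(f"overall safe-response rate: {overall:.1%}")  # 33.3% for this toy data
for cat, rate in per_category.items():
    print(f"  {cat}: {rate:.1%}")
```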