6,960 results for "Sun Xu"
Search Results
52. Object-centered family interactions for young autistic children: a diary study
- Author
- Hu, Yuqi, Sun, Xu, Yao, Cheng, Luo, Shijian, Liu, Bingjian, Xue, Mengru, and Lyu, Hui
- Published
- 2024
- Full Text
- View/download PDF
53. The associations between peripheral inflammatory and lipid parameters, white matter hyperintensity, and cognitive function in patients with non-disabling ischemic cerebrovascular events
- Author
- Li, Binghan, Gu, Zhengsheng, Wang, Weisen, Du, Bingying, Wu, Chenghao, Li, Bin, Wang, Tianren, Yin, Ge, Gao, Xin, Chen, Jingjing, Bi, Xiaoying, Zhang, Hailing, and Sun, Xu
- Published
- 2024
- Full Text
- View/download PDF
54. Constructing and validating the museum product creativity measurement (MPCM): dimensions for creativity assessment of souvenir products in Chinese urban historical museums
- Author
- Cheng, Hui, Sun, Xu, Xie, Jing, Liu, Bing-Jian, Xia, Liang, Luo, Shi-Jian, Tian, Xin, Qiu, Xiao, Li, Wei, and Li, Yang
- Published
- 2024
- Full Text
- View/download PDF
55. Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation
- Author
- Fang, Huihui, Li, Fei, Wu, Junde, Fu, Huazhu, Sun, Xu, Orlando, José Ignacio, Bogunović, Hrvoje, Zhang, Xiulan, and Xu, Yanwu
- Published
- 2024
- Full Text
- View/download PDF
56. Validation of Roussouly classification in predicting the occurrence of adjacent segment disease after short-level lumbar fusion surgery
- Author
- Wang, Muyi, Wang, Xin, Wang, Hao, Shen, Yifei, Qiu, Yong, Sun, Xu, Zhou, Dong, and Jiang, Yuqing
- Published
- 2024
- Full Text
- View/download PDF
57. Analysis of abnormal expansion of pipe system and optimization of structural stress in 350MW unit
- Author
- Liu Xin, Zhang Yanming, Liu Qun, Sun Xu, Wang Yu, and Zhao Liye
- Subjects
- Environmental sciences, GE1-350
- Abstract
Sections of the main steam and hot reheat steam piping of a 350 MW power plant unit had subsided. Through field inspection, calculation and checking analysis, combined with support-and-hanger adjustment, load testing, and elevation measurement, the settled piping was corrected to optimize the overall stress state of the pipe system. Checks of the thermal displacement at the support lifting points, the selection calculations for the support hangers, and the overall design-state stress of the pipe system showed the design state to be essentially consistent with the checked state. Load tests at key nodes showed that undersized spring hangers were the primary cause of the section settlement, and part of the hangers were adjusted within the available spring adjustment range according to the calculation results. The pipeline settlement was finally corrected, the primary and secondary stresses of the main steam pipe each decreased by about 15%, and the piping stress was further optimized.
- Published
- 2021
- Full Text
- View/download PDF
58. Diagnosing oral squamous cell carcinoma using salivary biomarkers
- Author
- Mohammad Sayedur Rahman Khan, Fatama Siddika, Sun Xu, Xiao Lin Liu, Mei Shuang, and Hao Fu Liang
- Subjects
- Biomarker, DNA marker, RNA marker, Protein marker, Saliva, Squamous cell carcinoma, Medicine
- Abstract
Oral cancer is becoming a frightening public health issue because of its rising incidence and mortality worldwide. Among all types of oral cancer, oral squamous cell carcinoma is the most common malignant tumor, accounting for about 90% of cases. This fatal disease is diagnosed through a comprehensive clinical examination followed by histological assessment, which forms the diagnostic gold standard. Although the oral cavity is easily accessible, most oral cancers are diagnosed at a late stage. Consequently, newer screening and early-diagnosis approaches are needed to reduce the morbidity and mortality related to this disease. Saliva, a complex biological fluid, is in direct contact with oral cancer lesions and contains abnormal DNA, RNA, and protein molecules released by malignant cells. These can be labelled as neoplastic biomarkers and are proposed to play an important role in diagnostic, therapeutic, and prognostic purposes for oral cancers as well as other diseases. The aim of this review paper is to concisely discuss the different types of potential salivary biomarkers and their utility in screening for oral cancers.
- Published
- 2018
- Full Text
- View/download PDF
59. VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
- Author
- Li, Shicheng, Li, Lei, Ren, Shuhuai, Liu, Yuanxin, Liu, Yi, Gao, Rundong, Sun, Xu, and Hou, Lu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of static visual shortcuts. To remedy this issue, we present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStanding. Specifically, we first introduce a fine-grained taxonomy of temporal concepts in natural language in order to diagnose the capability of VidLMs to comprehend different temporal aspects. Furthermore, to disentangle the correlation between static and temporal information, we generate counterfactual video descriptions that differ from the original one only in the specified temporal aspect. We employ a semi-automatic data collection framework using large language models and human-in-the-loop annotation to obtain high-quality counterfactual descriptions efficiently. Evaluation of representative video-language understanding models confirms their deficiency in temporal understanding, revealing the need for greater emphasis on the temporal elements in video-language research., Comment: Accepted by ECCV 2024. 19 pages, 3 figures, 8 tables. Data is available at https://github.com/lscpku/VITATECS
- Published
- 2023
60. RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge
- Author
- Liu, Yi, Huang, Lianzhe, Li, Shicheng, Chen, Sishuo, Zhou, Hao, Meng, Fandong, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet may include counterfactual information that will confuse the model and lead to an incorrect response. Thus there is a pressing need for LLMs to possess the ability to distinguish reliable information from external knowledge. Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information. Evaluation results show that existing LLMs are susceptible to interference from unreliable external knowledge with counterfactual information, and simple intervention methods make limited contributions to the alleviation of this issue.
- Published
- 2023
61. FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation
- Author
- Liu, Yuanxin, Li, Lei, Ren, Shuhuai, Gao, Rundong, Li, Shicheng, Chen, Sishuo, Sun, Xu, and Hou, Lu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly shown by the qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two critical problems. Firstly, existing studies lack fine-grained evaluation of T2V models on different categories of text prompts. Although some benchmarks have categorized the prompts, their categorization either only focuses on a single aspect or fails to consider the temporal information in video generation. Secondly, it is unclear whether the automatic evaluation metrics are consistent with human standards. To address these problems, we propose FETV, a benchmark for Fine-grained Evaluation of Text-to-Video generation. FETV is multi-aspect, categorizing the prompts based on three orthogonal aspects: the major content, the attributes to control and the prompt complexity. FETV is also temporal-aware, which introduces several temporal categories tailored for video generation. Based on FETV, we conduct comprehensive manual evaluations of four representative T2V models, revealing their pros and cons on different categories of prompts from different aspects. We also extend FETV as a testbed to evaluate the reliability of automatic T2V metrics. The multi-aspect categorization of FETV enables fine-grained analysis of the metrics' reliability in different scenarios. We find that existing automatic metrics (e.g., CLIPScore and FVD) correlate poorly with human evaluation. To address this problem, we explore several solutions to improve CLIPScore and FVD, and develop two automatic metrics that exhibit significantly higher correlation with humans than existing metrics. Benchmark page: https://github.com/llyx97/FETV., Comment: NeurIPS 2023 Datasets and Benchmarks Track
- Published
- 2023
62. TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
- Author
- Ren, Shuhuai, Chen, Sishuo, Li, Shicheng, Sun, Xu, and Hou, Lu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Large-scale video-language pre-training has made remarkable strides in advancing video-language understanding tasks. However, the heavy computational burden of video encoding remains a formidable efficiency bottleneck, particularly for long-form videos. These videos contain massive visual tokens due to their inherent 3D properties and spatiotemporal redundancy, making it challenging to capture complex temporal and spatial relationships. To tackle this issue, we propose an efficient method called TEmporal-Spatial Token Aggregation (TESTA). TESTA condenses video semantics by adaptively aggregating similar frames, as well as similar patches within each frame. TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding. Building upon TESTA, we introduce a pre-trained video-language model equipped with a divided space-time token aggregation module in each video encoder block. We evaluate our model on five datasets for paragraph-to-video retrieval and long-form VideoQA tasks. Experimental results show that TESTA improves computing efficiency by 1.7 times, and achieves significant performance gains from its scalability in processing longer input frames, e.g., +13.7 R@1 on QuerYD and +6.5 R@1 on Condensed Movie., Comment: 16 pages, 9 figures, code is available at https://github.com/RenShuhuai-Andy/TESTA
- Published
- 2023
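The efficiency gain described in entry 62 comes from aggregating similar frames and similar patches. Below is a toy, greedy sketch of similarity-based token aggregation to illustrate the idea only; TESTA's actual module is a learned, divided space-time aggregation inside each encoder block, so the merging rule and all names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_tokens(tokens: torch.Tensor, keep: int) -> torch.Tensor:
    """Greedily merge the most similar adjacent tokens until `keep` remain.

    tokens: [num_tokens, dim] frame or patch features. Merging by
    averaging shrinks the sequence while preserving its semantics,
    which is the effect TESTA exploits to cut encoding cost.
    """
    tokens = tokens.clone()
    while tokens.size(0) > keep:
        sims = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)  # [n-1]
        i = int(sims.argmax())                    # most redundant neighbor pair
        merged = (tokens[i] + tokens[i + 1]) / 2  # average the pair
        tokens = torch.cat([tokens[:i], merged.unsqueeze(0), tokens[i + 2:]])
    return tokens

frames = torch.randn(64, 768)           # e.g., 64 frame tokens
reduced = aggregate_tokens(frames, 16)  # 75% token reduction, as in the paper
print(reduced.shape)                    # torch.Size([16, 768])
```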
63. Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction
- Author
- Chen, Ruibo, Zhang, Zhiyuan, Liu, Yi, Bao, Ruihan, Harimoto, Keiko, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Computational Engineering, Finance, and Science
- Abstract
Multimodal stock trading volume movement prediction with stock-related news is one of the fundamental problems in the financial area. Existing multimodal works that train models from scratch face the problem of lacking universal knowledge when modeling financial news. In addition, the model's ability may be limited by the lack of domain-related knowledge due to insufficient data in the datasets. To handle this issue, we propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities. We use pre-trained language models for better comprehension of financial news and adopt prompt learning methods to leverage their capability in universal knowledge to model textual information. Besides, simply fusing two modalities can cause harm to the unimodal representations. Thus, we propose a novel cross-modality contrastive alignment while reserving the unimodal heads beside the fusion head to mitigate this problem. Extensive experiments demonstrate that our proposed ProMUSE outperforms existing baselines. Comprehensive analyses further validate the effectiveness of our architecture compared to potential variants and learning mechanisms., Comment: 9 pages, 3 figures, 7 tables. Accepted by 2023 KDD Workshop on Machine Learning in Finance
- Published
- 2023
64. Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
- Author
- Das Gupta, Shuvomoy, Freund, Robert M., Sun, Xu Andy, and Taylor, Adrien
- Published
- 2024
- Full Text
- View/download PDF
65. The contagion of ethical voice among peers: an attribution perspective
- Author
- Zhao, Nan, He, Bin, and Sun, Xu
- Published
- 2024
- Full Text
- View/download PDF
66. E3 Ubiquitin Ligase ASB14 Inhibits Cardiomyocyte Proliferation by Regulating MAPRE2 Ubiquitination
- Author
- Yang, Yanpeng, Ma, Dongpu, Liu, Bo, Sun, Xu, Fu, Wei, Lv, Feifei, and Qiu, Chunguang
- Published
- 2024
- Full Text
- View/download PDF
67. Reduced nitrogen rate improves post-anthesis assimilates to grain and ameliorates grain-filling characteristics of winter wheat in dry land
- Author
- Wang, Jinjin, Sun, Xu, Hussain, Sadam, Yang, Lihua, Gao, Sisi, Zhang, Peng, Chen, Xiaoli, and Ren, Xiaolong
- Published
- 2024
- Full Text
- View/download PDF
68. Current concept in alveolar cleft management
- Author
- Mohammad Sayedur Rahman Khan, Mei Shuang, Xiao Lin Liu, Sun Xu, and Hao Fu Liang
- Subjects
- Alveolar cleft, Alveolar osteoplasty, Bone graft, Bone graft substitutes, Medicine
- Abstract
The alveolar cleft is a developmental bone defect of the alveolar process of the maxilla that occurs in 75% of cleft lip and palate patients, with varied clinical presentations: unilateral or bilateral, and complete or incomplete. Secondary alveolar cleft reconstruction with autogenous spongy bone grafting (osteoplasty) at the mixed-dentition stage is the commonly accepted treatment; it helps maintain maxillary arch continuity, repair oronasal fistulae, support eruption of the permanent dentition, enhance nasal symmetry by providing alar base support, and improve speech. Debate over alveolar cleft management continues regarding treatment planning and timing, graft materials, surgical techniques, and methods of evaluating the progress of alveolar osteoplasty. Recently, allogeneic bone, artificial bone, and recombinant human bone morphogenetic protein (rhBMP), together with growth factors, have been tried as ways to reduce the donor-site morbidity associated with autogenous bone grafting. The purpose of this review is to discuss the pathogenesis and aetiology of cleft defects, surgical techniques, assessment of the progress of alveolar bone grafts, and proposed future bone graft materials.
- Published
- 2017
- Full Text
- View/download PDF
69. MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
- Author
- Yang, Bang, Liu, Fenglin, Wu, Xian, Wang, Yaowei, Sun, Xu, and Zou, Yuexian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Supervised visual captioning models typically require a large scale of images or videos paired with descriptions in a specific language (i.e., the vision-caption pairs) for training. However, collecting and labeling large-scale datasets is time-consuming and expensive for many scenarios and languages. Therefore, sufficient labeled pairs are usually not available. To deal with the label shortage problem, we present a simple yet effective zero-shot approach MultiCapCLIP that can generate visual captions for different scenarios and languages without any labeled vision-caption pairs of downstream datasets. In the training stage, MultiCapCLIP only requires text data for input. Then it conducts two main steps: 1) retrieving concept prompts that preserve the corresponding domain knowledge of new scenarios; 2) auto-encoding the prompts to learn writing styles to output captions in a desired language. In the testing stage, MultiCapCLIP instead takes visual data as input directly to retrieve the concept prompts to generate the final visual descriptions. The extensive experiments on image and video captioning across four benchmarks and four languages (i.e., English, Chinese, German, and French) confirm the effectiveness of our approach. Compared with state-of-the-art zero-shot and weakly-supervised methods, our method achieves 4.8% and 21.5% absolute improvements in terms of BLEU@4 and CIDEr metrics. Our code is available at https://github.com/yangbang18/MultiCapCLIP., Comment: ACL'2023, 13 pages, 4 figures
- Published
- 2023
- Full Text
- View/download PDF
70. Towards Codable Watermarking for Injecting Multi-bits Information to LLMs
- Author
- Wang, Lean, Yang, Wenkai, Chen, Deli, Zhou, Hao, Lin, Yankai, Meng, Fandong, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Computation and Language
- Abstract
As large language models (LLMs) generate texts with increasing fluency and realism, there is a growing need to identify the source of texts to prevent the abuse of LLMs. Text watermarking techniques have proven reliable in distinguishing whether a text is generated by LLMs by injecting hidden patterns. However, we argue that existing LLM watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs (such as encoding model version, generation time, user id, etc.). In this work, we conduct the first systematic study on the topic of Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information. First of all, we study the taxonomy of LLM watermarking technologies and give a mathematical formulation for CTWL. Additionally, we provide a comprehensive evaluation system for CTWL: (1) watermarking success rate, (2) robustness against various corruptions, (3) coding rate of payload information, (4) encoding and decoding efficiency, (5) impacts on the quality of the generated text. To meet the requirements of these non-Pareto-improving metrics, we follow the most prominent vocabulary partition-based watermarking direction, and devise an advanced CTWL method named Balance-Marking. The core idea of our method is to use a proxy language model to split the vocabulary into probability-balanced parts, thereby effectively maintaining the quality of the watermarked text. Our code is available at https://github.com/lancopku/codable-watermarking-for-llm., Comment: ICLR 2024 poster
- Published
- 2023
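Entry 70's Balance-Marking partitions the vocabulary into probability-balanced parts using a proxy language model and biases generation toward the part selected by the payload bit. The sketch below is a minimal single-bit illustration of that description; the greedy partition heuristic, the fixed bias `delta`, and all names are assumptions, and the real method handles multi-bit payloads and decoding.

```python
import torch

def balanced_partition(proxy_probs: torch.Tensor) -> torch.Tensor:
    """Greedily split the vocabulary into two parts whose total
    probability under the proxy LM is roughly equal."""
    order = torch.argsort(proxy_probs, descending=True)
    in_part_one = torch.zeros_like(proxy_probs, dtype=torch.bool)
    mass = [0.0, 0.0]
    for idx in order:  # always assign to the currently lighter part
        side = 0 if mass[0] <= mass[1] else 1
        in_part_one[idx] = side == 1
        mass[side] += float(proxy_probs[idx])
    return in_part_one

def watermark_logits(logits, proxy_probs, bit, delta=2.0):
    """Encode one payload bit by boosting the matching vocabulary part."""
    part_one = balanced_partition(proxy_probs)
    boosted = logits.clone()
    boosted[part_one == bool(bit)] += delta  # bias sampling toward this part
    return boosted

vocab_size = 1000
logits = torch.randn(vocab_size)                       # next-token logits
proxy = torch.softmax(torch.randn(vocab_size), dim=0)  # proxy LM distribution
watermarked = watermark_logits(logits, proxy, bit=1)
```

Balancing the two parts by proxy-LM probability (rather than splitting the vocabulary randomly) is what lets the bias preserve text quality, since each part retains enough probable tokens to continue the sentence.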
71. M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
- Author
- Li, Lei, Yin, Yuwei, Li, Shicheng, Chen, Liang, Wang, Peiyi, Ren, Shuhuai, Li, Mukai, Yang, Yazheng, Xu, Jingjing, Sun, Xu, Kong, Lingpeng, and Liu, Qi
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Instruction tuning has significantly advanced large language models (LLMs) such as ChatGPT, enabling them to align with human instructions across diverse tasks. However, progress in open vision-language models (VLMs) has been limited due to the scarcity of high-quality instruction datasets. To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions. Our M$^3$IT dataset comprises 40 carefully curated datasets, including 2.4 million instances and 400 manually written task instructions, reformatted into a vision-to-text structure. Key tasks are translated into 80 languages with an advanced translation system, ensuring broader accessibility. M$^3$IT surpasses previous datasets regarding task coverage, instruction number and instance scale. Moreover, we develop Ying-VLM, a VLM model trained on our M$^3$IT dataset, showcasing its potential to answer complex questions requiring world knowledge, generalize to unseen video tasks, and comprehend unseen instructions in Chinese. We have open-sourced the dataset to encourage further research., Comment: Fix dataset url: https://huggingface.co/datasets/MMInstruction/M3IT Project: https://m3-it.github.io/
- Published
- 2023
72. Exploring sex differences in collaborative virtual environments for participation equality and user experience
- Author
- Yang, Yifan, Zhang, Sheng, Sun, Xu, Zhang, Xingyi, Sun, Xiaotong, Jing, Ying, and Yang, Canjun
- Published
- 2024
- Full Text
- View/download PDF
73. Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
- Author
- Wang, Lean, Li, Lei, Dai, Damai, Chen, Deli, Zhou, Hao, Meng, Fandong, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
In-context learning (ICL) emerges as a promising capability of large language models (LLMs) by providing them with demonstration examples to perform diverse tasks. However, the underlying mechanism of how LLMs learn from the provided context remains under-explored. In this paper, we investigate the working mechanism of ICL through an information flow lens. Our findings reveal that label words in the demonstration examples function as anchors: (1) semantic information aggregates into label word representations during the shallow computation layers' processing; (2) the consolidated information in label words serves as a reference for LLMs' final predictions. Based on these insights, we introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL. The promising applications of our findings again validate the uncovered ICL working mechanism and pave the way for future studies., Comment: Accepted by EMNLP 2023
- Published
- 2023
74. Can Language Models Understand Physical Concepts?
- Author
- Li, Lei, Xu, Jingjing, Dong, Qingxiu, Zheng, Ce, Liu, Qi, Kong, Lingpeng, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Language models (LMs) gradually become general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite. However, it is not yet clear whether LMs can understand physical concepts in the human world. To investigate this, we design a benchmark VEC that covers the tasks of (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned from the interaction with the world such as the temperature of objects. Our zero (few)-shot prompting results show that the understanding of certain visual concepts emerges as scaling up LMs, but there are still basic concepts to which the scaling law does not apply. For example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet behaves like random guessing on the mass concept. Instead, vision-augmented LMs such as CLIP and BLIP achieve a human-level understanding of embodied concepts. Analysis indicates that the rich semantics in visual representation can serve as a valuable source of embodied knowledge. Inspired by this, we propose a distillation method to transfer embodied knowledge from VLMs to LMs, achieving performance gain comparable with that by scaling up the parameters of LMs 134x. Our dataset is available at https://github.com/TobiasLee/VEC
- Published
- 2023
75. Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter
- Author
- Liu, Yi, Bi, Xiaohan, Li, Lei, Chen, Sishuo, Yang, Wenkai, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources. This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training. This significantly reduces the cost of corpus collection and preserves data privacy. However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck. In this paper, we propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients. Since different language pairs exhibit substantial discrepancies in data distributions, adapter parameters of clients may conflict with each other. To tackle this, we explore various clustering strategies to group parameters for integration and mitigate the negative effects of conflicting parameters. Experimental results demonstrate that our framework reduces communication cost by over 98% while achieving similar or even better performance compared to competitive baselines. Further analysis reveals that clustering strategies effectively solve the problem of linguistic discrepancy and pruning adapter modules further improves communication efficiency., Comment: Findings of ACL 2023
- Published
- 2023
76. Edit As You Wish: Video Caption Editing with Multi-grained User Control
- Author
- Yao, Linli, Zhang, Yuanmeng, Wang, Ziheng, Hou, Xinglin, Ge, Tiezheng, Jiang, Yuning, Sun, Xu, and Jin, Qin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
Automatically narrating videos in natural language complying with user requests, i.e. Controllable Video Captioning task, can help people manage massive videos with desired intentions. However, existing works suffer from two shortcomings: 1) the control signal is single-grained which can not satisfy diverse user intentions; 2) the video description is generated in a single round which can not be further edited to meet dynamic needs. In this paper, we propose a novel Video Caption Editing (VCE) task to automatically revise an existing video description guided by multi-grained user requests. Inspired by human writing-revision habits, we design the user command as a pivotal triplet {operation, position, attribute} to cover diverse user needs from coarse-grained to fine-grained. To facilitate the VCE task, we automatically construct an open-domain benchmark dataset named VATEX-EDIT and manually collect an e-commerce dataset called EMMAD-EDIT. We further propose a specialized small-scale model (i.e., OPA) compared with two generalist Large Multi-modal Models to perform an exhaustive analysis of the novel task. For evaluation, we adopt comprehensive metrics considering caption fluency, command-caption consistency, and video-caption alignment. Experiments reveal the task challenges of fine-grained multi-modal semantics understanding and processing. Our datasets, codes, and evaluation tools are available at https://github.com/yaolinli/VCE., Comment: Accepted by ACM MM 2024
- Published
- 2023
77. PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation
- Author
- Fang, Huihui, Li, Fei, Wu, Junde, Fu, Huazhu, Sun, Xu, Orlando, José Ignacio, Bogunović, Hrvoje, Zhang, Xiulan, and Xu, Yanwu
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Pathologic myopia (PM) is a common blinding retinal degeneration suffered by the highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by aiding clinicians to identify disease signs or to screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our database comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details such as the labeling process used to construct the database, the quality and characteristics of the samples and provides other relevant usage notes., Comment: 10 pages, 6 figures
- Published
- 2023
78. Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias
- Author
- Zhang, Zhiyuan, Chen, Deli, Zhou, Hao, Meng, Fandong, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Computation and Language
- Abstract
Pre-trained Language Models (PLMs) may be poisonous with backdoors or bias injected by the suspicious attacker during the fine-tuning process. A core challenge of purifying potentially poisonous PLMs is precisely finding poisonous dimensions. To settle this issue, we propose the Fine-purifying approach, which utilizes the diffusion theory to study the dynamic process of fine-tuning for finding potentially poisonous dimensions. According to the relationship between parameter drifts and Hessians of different dimensions, we can detect poisonous dimensions with abnormal dynamics, purify them by resetting them to clean pre-trained weights, and then fine-tune the purified weights on a small clean dataset. To the best of our knowledge, we are the first to study the dynamics guided by the diffusion theory for safety or defense purposes. Experimental results validate the effectiveness of Fine-purifying even with a small clean dataset., Comment: Accepted by Findings of ACL 2023
- Published
- 2023
79. Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
- Author
- Ren, Shuhuai, Zhang, Aston, Zhu, Yi, Zhang, Shuai, Zheng, Shuai, Li, Mu, Smola, Alex, and Sun, Xu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
This work proposes POMP, a prompt pre-training method for vision-language models. Being memory and computation efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts with over twenty-thousand classes. Once pre-trained, the prompt with a strong transferable ability can be directly plugged into a variety of visual recognition tasks including image classification, semantic segmentation, and object detection, to boost recognition performances in a zero-shot manner. Empirical evaluation shows that POMP achieves state-of-the-art performances on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% compared to CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 compared to ZSSeg). Our code is available at https://github.com/amazon-science/prompt-pretraining., Comment: Code is available at https://github.com/amazon-science/prompt-pretraining
- Published
- 2023
80. Equilibrium distribution and diffusion of mixed hydrogen-methane gas in gravity field
- Author
- Peng, Shiyao, He, Qiao, Peng, Ducheng, Ouyang, Xin, Zhang, Xiaorui, Chai, Chong, Zhang, Lianlai, Sun, Xu, Deng, Huiqiu, Hu, Wangyu, and Hou, Jie
- Subjects
- Condensed Matter - Statistical Mechanics
- Abstract
Repurposing existing natural gas pipelines is a promising solution for large-scale transportation of mixed hydrogen-methane gas. However, it remains debatable whether gravitational stratification can notably affect hydrogen partial pressure in the gas mixture. To address this issue, we combined molecular dynamics simulation with thermodynamic and diffusion theories. Our study systematically examined the equilibrium distribution of hydrogen-methane mixtures in gravity fields. We demonstrated that partial pressures of both gases decrease with altitude, with hydrogen showing slower decrease due to its smaller molar mass. As a result, the volume fraction of hydrogen is maximized at the top end of pipes. The stratification is more favorable at low temperature and large altitude drops, with notable gas stratification only occurring at extremely large drops in altitude, being generally negligible even at a drop of 1500 m. Furthermore, we showed that the diffusion time required to achieve the equilibrium distribution is proportional to gas pressure and the square of pipeline height. This requires approximately 300 years for a 1500 m pipeline at 1 bar. Therefore, temporary interruptions in pipeline gas transportation will not cause visible stratification. Our work clarifies the effect of gravity on hydrogen-methane gas mixtures and provides quantitative insights into assessing the stratification of gas mixtures in pipelines., Comment: 14 pages, 8 figures
- Published
- 2023
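The altitude dependence reported in entry 80 follows the standard isothermal barometric relation; the equations below are textbook results consistent with the abstract, not formulas quoted from the paper.

```latex
% Isothermal equilibrium of component i (molar mass M_i) in a gravity field:
p_i(h) = p_i(0)\,\exp\!\left(-\frac{M_i\,g\,h}{R\,T}\right),
\qquad i \in \{\mathrm{H}_2,\ \mathrm{CH}_4\}.

% Hydrogen is lighter, so its volume fraction grows with altitude:
x_{\mathrm{H}_2}(h) = \frac{p_{\mathrm{H}_2}(h)}{p_{\mathrm{H}_2}(h)+p_{\mathrm{CH}_4}(h)}
\quad\text{increases in } h \text{ because } M_{\mathrm{H}_2} < M_{\mathrm{CH}_4}.

% Characteristic time to diffuse into this profile over a height H:
\tau \sim \frac{H^{2}}{D}.
```

Since the interdiffusion coefficient D scales roughly as 1/p, the equilibration time tau ~ H^2/D grows with both gas pressure and the square of the pipeline height, which matches the scaling stated in the abstract.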
81. Robust adaptive beamforming method for active sonar in single snapshot
- Author
- Sun Xu and Li Ranwei
- Subjects
- Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Forming narrow beams is a useful way for active sonar to counter reverberation in shallow water. High-resolution adaptive beamforming, with narrow beamwidths and low sidelobe levels, is a better and more efficient approach, particularly where the installation space for the sonar array is limited, as with hull-mounted sonar. Because the target echo is short-lived in a complex and varying acoustic channel, conventional adaptive beamforming methods fail. This paper therefore proposes a robust adaptive beamforming method for active sonar operating on a single snapshot, called steered dominant mode rejection (STDMR). First, STDMR steers the sample covariance matrix (STCM) using wide-band focusing, which greatly reduces the number of snapshots needed. Second, by partial eigendecomposition, the large eigenvalues of the STCM that exceed the noise energy, together with their eigenvectors, are used for dominant mode rejection (DMR), a typical eigenspace-based algorithm with a small computational load and fast convergence. Finally, diagonal loading of 3-5 dB over the noise energy and signal-mismatch protection improve the robustness of the method (a generic DMR sketch follows this entry). Simulation and experimental data analysis show that STDMR achieves narrow beams and low sidelobe levels from a single snapshot, making it an appropriate implementation for active sonar detection systems.
- Published
- 2019
- Full Text
- View/download PDF
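Entry 81 builds on dominant mode rejection with diagonal loading of 3-5 dB over the noise energy. Below is a generic numpy sketch of a textbook DMR beamformer using those ingredients; the array model, steering vector, and noise-floor safeguard are illustrative assumptions, and STDMR additionally steers the covariance via wide-band focusing and adds signal-mismatch protection.

```python
import numpy as np

def dmr_weights(R, d, n_dominant, loading_db=4.0):
    """Dominant mode rejection beamformer with diagonal loading.

    R: (N, N) focused sample covariance matrix (Hermitian).
    d: (N,) steering vector for the look direction.
    n_dominant: number of large eigenvalues treated as interference.
    """
    lam, V = np.linalg.eigh(R)                 # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]             # sort descending
    # Noise-floor estimate from the small eigenvalues, with a small
    # safeguard so a rank-deficient (single-snapshot) R stays invertible.
    noise = max(lam[n_dominant:].mean(), 1e-6 * lam[0])
    sigma2 = noise * 10.0 ** (loading_db / 10.0)  # 3-5 dB loading
    # DMR: keep the dominant modes, replace the rest by a loaded identity.
    Vd = V[:, :n_dominant]
    R_dmr = Vd @ np.diag(lam[:n_dominant]) @ Vd.conj().T + sigma2 * np.eye(len(d))
    w = np.linalg.solve(R_dmr, d)
    return w / (d.conj() @ w)                  # MVDR-style normalization

# Single-snapshot example: rank-one covariance from one data vector.
N = 16
x = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
R = np.outer(x, x.conj())
d = np.exp(-1j * np.pi * np.arange(N) * np.sin(0.2)) / np.sqrt(N)
w = dmr_weights(R, d, n_dominant=1)
```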
82. Can digital technology promote the equalization of regional basic public services?
- Author
- Han, Chaoliang, Sun, Xu, and Liu, Mingyu
- Published
- 2024
- Full Text
- View/download PDF
83. Exploring the effect of supervisor bottom-line mentality on subordinate work well-being: a self-determination theory perspective
- Author
- Zhao, Nan, He, Bin, Sun, Xu, and Hu, Weimin
- Published
- 2024
- Full Text
- View/download PDF
84. Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features
- Author
- Chen, Sishuo, Yang, Wenkai, Bi, Xiaohan, and Sun, Xu
- Subjects
- Computer Science - Computation and Language
- Abstract
Detecting out-of-distribution (OOD) inputs is crucial for the safe deployment of natural language processing (NLP) models. Though existing methods, especially those based on the statistics in the feature space of fine-tuned pre-trained language models (PLMs), are claimed to be effective, their effectiveness on different types of distribution shifts remains underexplored. In this work, we take the first step to comprehensively evaluate the mainstream textual OOD detection methods for detecting semantic and non-semantic shifts. We find that: (1) no existing method behaves well in both settings; (2) fine-tuning PLMs on in-distribution data benefits detecting semantic shifts but severely deteriorates detecting non-semantic shifts, which can be attributed to the distortion of task-agnostic features. To alleviate the issue, we present a simple yet effective general OOD score named GNOME that integrates the confidence scores derived from the task-agnostic and task-specific representations. Experiments show that GNOME works well in both semantic and non-semantic shift scenarios, and further brings significant improvement on two cross-task benchmarks where both kinds of shifts simultaneously take place. Our code is available at https://github.com/lancopku/GNOME., Comment: Findings of EACL 2023
- Published
- 2023
85. When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning
- Author
- Yang, Wenkai, Lin, Yankai, Zhao, Guangxiang, Li, Peng, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
- Abstract
Federated Learning has become a widely-used framework which allows learning a global model on decentralized local datasets under the condition of protecting local data privacy. However, federated learning faces severe optimization difficulty when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that the client sampling practice plays a decisive role in the aforementioned optimization difficulty. We find that the negative client sampling will cause the merged data distribution of currently sampled clients heavily inconsistent with that of all available clients, and further make the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism to adaptively adjust the server learning rate for the aggregated gradient in each round, according to the consistency between the merged data distribution of currently sampled clients and that of all available clients. Specifically, we make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate and can effectively reflect the merged data distribution of sampled clients, and we utilize it for the server learning rate adaptation. Extensive experiments on multiple image and text classification tasks validate the great effectiveness of our method.
- Published
- 2023
86. Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning
- Author
- Yang, Wenkai, Chen, Deli, Zhou, Hao, Meng, Fandong, Zhou, Jie, and Sun, Xu
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively in a data privacy-preserving manner. However, the data samples usually follow a long-tailed distribution in the real world, and FL on the decentralized and long-tailed data yields a poorly-behaved global model severely biased to the head classes with the majority of the training samples. To alleviate this issue, decoupled training has recently been introduced to FL, considering it has achieved promising results in centralized long-tailed learning by re-balancing the biased classifier after the instance-balanced training. However, the current study restricts the capacity of decoupled training in federated long-tailed learning with a sub-optimal classifier re-trained on a set of pseudo features, due to the unavailability of a global balanced dataset in FL. In this work, in order to re-balance the classifier more effectively, we integrate the local real data with the global gradient prototypes to form the local balanced datasets, and thus re-balance the classifier during the local training. Furthermore, we introduce an extra classifier in the training phase to help model the global data distribution, which addresses the problem of contradictory optimization goals caused by performing classifier re-balancing locally. Extensive experiments show that our method consistently outperforms the existing state-of-the-art methods in various settings.
- Published
- 2023
87. Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
- Author
- Gupta, Shuvomoy Das, Freund, Robert M., Sun, Xu Andy, and Taylor, Adrien
- Subjects
- Mathematics - Optimization and Control
- Abstract
We propose a computer-assisted approach to the analysis of the worst-case convergence of nonlinear conjugate gradient methods (NCGMs). Those methods are known for their generally good empirical performances for large-scale optimization, while having relatively incomplete analyses. Using our computer-assisted approach, we establish novel complexity bounds for the Polak-Ribière-Polyak (PRP) and the Fletcher-Reeves (FR) NCGMs for smooth strongly convex minimization. In particular, we construct mathematical proofs that establish the first non-asymptotic convergence bound for FR (which is historically the first developed NCGM), and a much improved non-asymptotic convergence bound for PRP. Additionally, we provide simple adversarial examples on which these methods do not perform better than gradient descent with exact line search, leaving very little room for improvements on the same class of problems., Comment: Published in Mathematical Programming Series A. DOI: https://doi.org/10.1007/s10107-024-02127-7
- Published
- 2023
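For reference, the two methods analyzed in entry 87 differ only in their conjugate-direction coefficient; these are the classical textbook definitions, not results from the paper.

```latex
% Nonlinear CG: x_{k+1} = x_k + \alpha_k d_k, with d_0 = -\nabla f(x_0) and
d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k,

% where FR and PRP differ only in the choice of \beta_k:
\beta_k^{\mathrm{FR}} = \frac{\lVert \nabla f(x_{k+1}) \rVert^{2}}{\lVert \nabla f(x_k) \rVert^{2}},
\qquad
\beta_k^{\mathrm{PRP}} = \frac{\nabla f(x_{k+1})^{\top}\bigl(\nabla f(x_{k+1}) - \nabla f(x_k)\bigr)}{\lVert \nabla f(x_k) \rVert^{2}}.
```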
88. A Survey on In-context Learning
- Author
- Dong, Qingxiu, Li, Lei, Dai, Damai, Zheng, Ce, Ma, Jingyuan, Li, Rui, Xia, Heming, Xu, Jingjing, Wu, Zhiyong, Liu, Tianyu, Chang, Baobao, Sun, Xu, and Sui, Zhifang
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL., Comment: Update
- Published
- 2022
89. Utility of MRI-based vertebral bone quality scores and CT-based Hounsfield unit values in vertebral bone mineral density assessment for patients with diffuse idiopathic skeletal hyperostosis
- Author
- Chen, Haojie, Zhu, Xiufen, Zhou, Qingshuang, Pu, Xiaojiang, Wang, Bin, Lin, Hua, Zhu, Zezhang, Qiu, Yong, and Sun, Xu
- Published
- 2024
- Full Text
- View/download PDF
90. Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
- Author
- Liu, Fenglin, Wu, Xian, You, Chenyu, Ge, Shen, Zou, Yuexian, and Sun, Xu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Training supervised video captioning model requires coupled video-caption pairs. However, for many targeted languages, sufficient paired data are not available. To this end, we introduce the unpaired video captioning task aiming to train models without coupled video-caption pairs in target language. To solve the task, a natural choice is to employ a two-step pipeline system: first utilizing video-to-pivot captioning model to generate captions in pivot language and then utilizing pivot-to-target translation model to translate the pivot captions to the target language. However, in such a pipeline system, 1) visual information cannot reach the translation model, generating visual irrelevant target captions; 2) the errors in the generated pivot captions will be propagated to the translation model, resulting in disfluent target captions. To address these problems, we propose the Unpaired Video Captioning with Visual Injection system (UVC-VI). UVC-VI first introduces the Visual Injection Module (VIM), which aligns source visual and target language domains to inject the source visual information into the target language domain. Meanwhile, VIM directly connects the encoder of the video-to-pivot model and the decoder of the pivot-to-target model, allowing end-to-end inference by completely skipping the generation of pivot captions. To enhance the cross-modality injection of the VIM, UVC-VI further introduces a pluggable video encoder, i.e., Multimodal Collaborative Encoder (MCE). The experiments show that UVC-VI outperforms pipeline systems and exceeds several supervised systems. Furthermore, equipping existing supervised systems with our MCE can achieve 4% and 7% relative margins on the CIDEr scores to current state-of-the-art models on the benchmark MSVD and MSR-VTT datasets, respectively., Comment: Published at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- Published
- 2022
91. Gradient Knowledge Distillation for Pre-trained Language Models
- Author
- Wang, Lean, Li, Lei, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning instance-wise outputs between the teacher and student, while neglecting an important knowledge source, i.e., the gradient of the teacher. The gradient characterizes how the teacher responds to changes in inputs, which we assume is beneficial for the student to better approximate the underlying mapping function of the teacher. Therefore, we propose Gradient Knowledge Distillation (GKD) to incorporate the gradient alignment objective into the distillation process. Experimental results show that GKD outperforms previous KD methods regarding student performance. Further analysis shows that incorporating gradient knowledge makes the student behave more consistently with the teacher, improving the interpretability greatly., Comment: Accepted by NeurIPS ENLSP 2022 workshop(spotlight)
- Published
- 2022
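Entry 91's GKD adds a gradient-alignment objective to standard distillation. The sketch below shows one plausible form of such a loss, assuming HuggingFace-style models that accept `inputs_embeds` and a sequence-classification head; the exact objective and weighting in the paper may differ.

```python
import torch
import torch.nn.functional as F

def gkd_loss(student, teacher, embeds, labels, T=2.0, lam=1.0):
    """Cross-entropy + logit distillation + input-gradient alignment.

    embeds: shared input embeddings with requires_grad=True, fed to
    both models so gradients w.r.t. the inputs can be compared.
    """
    s_logits = student(inputs_embeds=embeds).logits
    t_logits = teacher(inputs_embeds=embeds).logits  # grad w.r.t. inputs only

    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T

    # How each model responds to input perturbations: d(loss)/d(embeddings).
    g_student = torch.autograd.grad(ce, embeds, create_graph=True)[0]
    g_teacher = torch.autograd.grad(F.cross_entropy(t_logits, labels),
                                    embeds)[0]
    align = F.mse_loss(g_student, g_teacher.detach())  # gradient alignment
    return ce + kd + lam * align
```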
92. DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
- Author
- Liu, Fenglin, Wu, Xian, Ge, Shen, Ren, Xuancheng, Fan, Wei, Sun, Xu, and Zou, Yuexian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Vision-and-language (V-L) tasks require the system to understand both vision content and natural language, thus learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models are proposed to learn V-L representations and achieve improved results in many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices. As a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), which is a novel framework that applies separated attention spaces for vision and language, and the representations of multi-modalities can thus be disentangled explicitly. To enhance the correlation between vision and language in disentangled spaces, we introduce the visual concepts to DiMBERT which represent visual information in textual format. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large amount of image-sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-train, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, up to a 5% increase on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM and the introduced visual concepts., Comment: Published in ACM TKDD2022 (ACM Transactions on Knowledge Discovery from Data)
- Published
- 2022
93. Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
- Author
- Liu, Fenglin, Yang, Bang, You, Chenyu, Wu, Xian, Ge, Shen, Liu, Zhangdaihong, Sun, Xu, Yang, Yang, and Clifton, David A.
- Subjects
- Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Language models (LMs), including large language models (such as ChatGPT), have the potential to assist clinicians in generating various clinical notes. However, LMs are prone to produce "hallucinations", i.e., generated content that is not aligned with facts and knowledge. In this paper, we propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning to enable LMs to generate faithful clinical texts. We demonstrate the effectiveness of our method in generating patient discharge instructions. It requires the LMs not only to understand the patients' long clinical documents, i.e., the health records during hospitalization, but also to generate critical instructional information provided both to carers and to the patient at the time of discharge. The proposed Re$^3$Writer imitates the working patterns of physicians to first retrieve related working experience from historical instructions written by physicians, then reason related medical knowledge. Finally, it refines the retrieved working experience and reasoned medical knowledge to extract useful information, which is used to generate the discharge instructions for previously-unseen patients. Our experiments show that, using our method, the performance of five representative LMs can be substantially boosted across all metrics. Meanwhile, we show results from human evaluations to measure the effectiveness in terms of fluency, faithfulness, and comprehensiveness.
- Published
- 2022
94. Prophet Attention: Predicting Attention with Future Attention for Image Captioning
- Author
- Liu, Fenglin, Ren, Xuancheng, Wu, Xian, Fan, Wei, Zou, Yuexian, and Sun, Xu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Recently, attention based models have been used extensively in many sequence-to-sequence learning systems. Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words. However, for each time step in the decoding process, the attention based models usually use the hidden state of the current input to attend to the image regions. Under this setting, these attention models have a "deviated focus" problem that they calculate the attention weights based on previous words instead of the one to be generated, impairing the performance of both grounding and captioning. In this paper, we propose the Prophet Attention, similar to the form of self-supervision. In the training stage, this module utilizes the future information to calculate the "ideal" attention weights towards image regions. These calculated "ideal" weights are further used to regularize the "deviated" attention. In this manner, image regions are grounded with the correct words. The proposed Prophet Attention can be easily incorporated into existing image captioning models to improve their performance of both grounding and captioning. The experiments on the Flickr30k Entities and the MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. It is worth noticing that we set new state-of-the-arts on the two benchmark datasets and achieve the 1st place on the leaderboard of the online MSCOCO benchmark in terms of the default ranking score, i.e., CIDEr-c40., Comment: Accepted by NeurIPS 2020
- Published
- 2022
95. Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
- Author
- Zhang, Zhiyuan, Lyu, Lingjuan, Ma, Xingjun, Wang, Chenguang, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Machine Learning
- Abstract
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of PLMs are readily available, existing methods have ignored this information in defending NLP models against backdoor attacks. In this work, we take the first step to exploit the pre-trained (unfine-tuned) weights to mitigate backdoors in fine-tuned language models. Specifically, we leverage the clean pre-trained weights via two complementary techniques: (1) a two-step Fine-mixing technique, which first mixes the backdoored weights (fine-tuned on poisoned data) with the pre-trained weights, then fine-tunes the mixed weights on a small subset of clean data; (2) an Embedding Purification (E-PUR) technique, which mitigates potential backdoors existing in the word embeddings. We compare Fine-mixing with typical backdoor mitigation methods on three single-sentence sentiment classification tasks and two sentence-pair classification tasks and show that it outperforms the baselines by a considerable margin in all scenarios. We also show that our E-PUR method can benefit existing mitigation methods. Our work establishes a simple but strong baseline defense for secure fine-tuned NLP models against backdoor attacks., Comment: Accepted by Findings of EMNLP 2022
- Published
- 2022
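Step one of entry 95's Fine-mixing pulls the (possibly backdoored) fine-tuned weights back toward the clean pre-trained checkpoint before re-fine-tuning on a small clean set. A schematic sketch, assuming a simple uniform mixing ratio `alpha` (the paper's actual mixing procedure may be more selective):

```python
import torch

def fine_mix(finetuned: dict, pretrained: dict, alpha: float = 0.5) -> dict:
    """Step 1 of Fine-mixing: interpolate fine-tuned (possibly
    backdoored) weights with the clean pre-trained weights.

    alpha: fraction of the clean pre-trained weight kept (assumed
    uniform here). The mixed model is then fine-tuned on a small
    clean dataset (step 2), which restores task performance.
    """
    mixed = {}
    for name, w_ft in finetuned.items():
        w_pt = pretrained[name]
        mixed[name] = alpha * w_pt + (1 - alpha) * w_ft
    return mixed

# Usage with HuggingFace-style state dicts (names are hypothetical):
# mixed_sd = fine_mix(backdoored_model.state_dict(),
#                     clean_plm.state_dict(), alpha=0.5)
# backdoored_model.load_state_dict(mixed_sd)  # then fine-tune on clean data
```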
96. On Distributionally Robust Multistage Convex Optimization: Data-driven Models and Performance
- Author
- Zhang, Shixuan and Sun, Xu Andy
- Subjects
- Mathematics - Optimization and Control
- Abstract
This paper presents a novel algorithmic study with extensive numerical experiments of distributionally robust multistage convex optimization (DR-MCO). Following the previous work on dual dynamic programming (DDP) algorithmic framework for DR-MCO, we focus on data-driven DR-MCO models with Wasserstein ambiguity sets that allow probability measures with infinite supports. These data-driven Wasserstein DR-MCO models have out-of-sample performance guarantees and adjustable in-sample conservatism. Then by exploiting additional concavity or convexity in the uncertain cost functions, we design exact single stage subproblem oracle (SSSO) implementations that ensure the convergence of DDP algorithms. We test the data-driven Wasserstein DR-MCO models against multistage robust convex optimization (MRCO), risk-neutral and risk-averse multistage stochastic convex optimization (MSCO) models on multi-commodity inventory problems and hydro-thermal power planning problems. The results show that our DR-MCO models could outperform MRCO and MSCO models when the data size is small.
- Published
- 2022
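The data-driven ambiguity sets in entry 96 are Wasserstein balls around the empirical distribution. In generic notation (a standard definition, not copied from the paper):

```latex
% Wasserstein ball of radius \rho around the empirical measure
% \hat{P}_N = (1/N) \sum_{j=1}^{N} \delta_{\hat{\xi}_j} built from the data:
\mathcal{M} = \Bigl\{ P \in \mathcal{P}(\Xi) : W_p\bigl(P, \hat{P}_N\bigr) \le \rho \Bigr\},

% and each stage hedges against the worst distribution in the ball:
\min_{x \in X} \; \sup_{P \in \mathcal{M}} \; \mathbb{E}_{\xi \sim P}\bigl[ f(x,\xi) \bigr].
```

The radius rho is the knob behind the abstract's "adjustable in-sample conservatism": rho = 0 recovers the risk-neutral empirical (MSCO-like) model, while a large rho approaches the fully robust (MRCO-like) end.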
97. Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
- Author
- Chen, Sishuo, Yang, Wenkai, Zhang, Zhiyuan, Bi, Xiaohan, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Natural language processing (NLP) models are known to be vulnerable to backdoor attacks, which poses a newly arisen threat to NLP models. Prior online backdoor defense methods for NLP models only focus on the anomalies at either the input or output level, still suffering from fragility to adaptive attacks and high computational cost. In this work, we take the first step to investigate the unconcealment of textual poisoned samples at the intermediate-feature level and propose a feature-based efficient online defense method. Through extensive experiments on existing attacking methods, we find that the poisoned samples are far away from clean samples in the intermediate feature space of a poisoned NLP model. Motivated by this observation, we devise a distance-based anomaly score (DAN) to distinguish poisoned samples from clean samples at the feature level. Experiments on sentiment analysis and offense detection tasks demonstrate the superiority of DAN, as it substantially surpasses existing online defense methods in terms of defending performance and enjoys lower inference costs. Moreover, we show that DAN is also resistant to adaptive attacks based on feature-level regularization. Our code is available at https://github.com/lancopku/DAN., Comment: Findings of EMNLP 2022
- Published
- 2022
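Entry 97's DAN scores inputs by their distance from clean samples in a poisoned model's intermediate feature space. The numpy sketch below uses a Mahalanobis distance to class-conditional means, one common instantiation of such a feature-space score; the specific distance, layer choice, and names here are assumptions, not the paper's exact formulation.

```python
import numpy as np

class FeatureAnomalyScore:
    """Distance-based anomaly score in a model's feature space."""

    def fit(self, feats: np.ndarray, labels: np.ndarray):
        # Class-conditional means and a shared covariance from CLEAN data.
        self.means = {c: feats[labels == c].mean(0) for c in np.unique(labels)}
        centered = np.concatenate(
            [feats[labels == c] - self.means[c] for c in self.means])
        self.prec = np.linalg.pinv(np.cov(centered, rowvar=False))
        return self

    def score(self, feats: np.ndarray) -> np.ndarray:
        # Minimum Mahalanobis distance to any class mean; poisoned inputs,
        # lying far from all clean clusters, receive large scores.
        d = [np.einsum("nd,dk,nk->n", feats - m, self.prec, feats - m)
             for m in self.means.values()]
        return np.min(np.stack(d), axis=0)

feats = np.random.randn(200, 32)          # clean intermediate features
labels = np.random.randint(0, 2, 200)
scorer = FeatureAnomalyScore().fit(feats, labels)
scores = scorer.score(np.random.randn(5, 32))  # higher = more suspicious
```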
98. Holistic Sentence Embeddings for Better Out-of-Distribution Detection
- Author
- Chen, Sishuo, Bi, Xiaohan, Gao, Rundong, and Sun, Xu
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Detecting out-of-distribution (OOD) instances is significant for the safe deployment of NLP models. Among recent textual OOD detection works based on pretrained language models (PLMs), distance-based methods have shown superior performance. However, they estimate sample distance scores in the last-layer CLS embedding space and thus do not make full use of linguistic information underlying in PLMs. To address the issue, we propose to boost OOD detection by deriving more holistic sentence embeddings. On the basis of the observations that token averaging and layer combination contribute to improving OOD detection, we propose a simple embedding approach named Avg-Avg, which averages all token representations from each intermediate layer as the sentence embedding and significantly surpasses the state-of-the-art on a comprehensive suite of benchmarks by a 9.33% FAR95 margin. Furthermore, our analysis demonstrates that it indeed helps preserve general linguistic knowledge in fine-tuned PLMs and substantially benefits detecting background shifts. The simple yet effective embedding method can be applied to fine-tuned PLMs with negligible extra costs, providing a free gain in OOD detection. Our code is available at https://github.com/lancopku/Avg-Avg., Comment: Findings of EMNLP 2022
- Published
- 2022
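Entry 98's Avg-Avg embedding is described concretely enough to sketch: average token representations within every layer, then average across layers. A minimal PyTorch/transformers version follows (the checkpoint name and the inclusion of the embedding-layer output are illustrative choices):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def avg_avg_embedding(texts):
    """Avg-Avg: mean over tokens in every layer, then mean over layers."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, output_hidden_states=True)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # [B, L, 1]
    layer_means = []
    for h in out.hidden_states:                               # every layer
        layer_means.append((h * mask).sum(1) / mask.sum(1))   # masked token mean
    return torch.stack(layer_means).mean(0)                   # mean over layers

emb = avg_avg_embedding(["an in-distribution sentence"])
print(emb.shape)  # torch.Size([1, 768])
```

Compared with last-layer CLS pooling, this costs only a few extra tensor means per forward pass, which is why the abstract describes the gain as essentially free.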
99. GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
- Author
- Zhang, Zhiyuan, Luo, Ruixuan, Su, Qi, and Sun, Xu
- Subjects
- Computer Science - Machine Learning
- Abstract
Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it is difficult to apply SAM to some natural language tasks, especially to models with drastic gradient changes, such as RNNs. In this work, we analyze the relation between the flatness of the local minimum and its generalization ability from a novel and straightforward theoretical perspective. We propose that the shift of the training and test distributions can be equivalently seen as a virtual parameter corruption or perturbation, which can explain why flat minima that are robust against parameter corruptions or perturbations have better generalization performances. On this basis, we propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better (a sketch of the base SAM step follows this entry). Results on various language benchmarks validate the effectiveness of the proposed GA-SAM algorithm on natural language tasks., Comment: Accepted by EMNLP 2022
- Published
- 2022
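Entry 99 builds on Sharpness-Aware Minimization; the sketch below is the vanilla SAM step with a fixed radius `rho`, assuming `loss_fn` is a closure that re-evaluates the current batch. GA-SAM's contribution, adapting this neighborhood using gradient strength, is deliberately omitted here.

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One vanilla SAM update: ascend to the worst-case nearby point,
    then descend using the gradient measured there."""
    loss_fn(model).backward()                  # gradient at current weights
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (norm + 1e-12)  # perturb toward higher loss
            p.add_(e)
            eps.append((p, e))
    model.zero_grad()
    loss_fn(model).backward()                  # gradient at perturbed point
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                          # restore original weights
    optimizer.step()                           # flat-minimum-seeking update
    optimizer.zero_grad()
```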
100. Dim-Krum: Backdoor-Resistant Federated Learning for NLP with Dimension-wise Krum-Based Aggregation
- Author
- Zhang, Zhiyuan, Su, Qi, and Sun, Xu
- Subjects
- Computer Science - Machine Learning, Computer Science - Cryptography and Security
- Abstract
Despite the potential of federated learning, it is known to be vulnerable to backdoor attacks. Many robust federated aggregation methods have been proposed to reduce the potential backdoor risk, but they are mainly validated in the CV field. In this paper, we find that NLP backdoors are harder to defend against than CV backdoors, and we provide a theoretical analysis showing that the malicious-update detection error probabilities are determined by the relative backdoor strengths. NLP attacks tend to have small relative backdoor strengths, which may cause robust federated aggregation methods to fail against them. Inspired by the theoretical results, we can choose the dimensions with higher backdoor strengths to settle this issue. We propose a novel federated aggregation algorithm for NLP tasks, Dim-Krum, and experimental results validate its effectiveness (a simplified sketch follows this entry)., Comment: Accepted by Findings of EMNLP 2022
- Published
- 2022
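Entry 100 adapts Krum aggregation to act dimension-wise. Below, `krum_select` is the classical Krum rule (pick the update closest to its n - f - 2 nearest peers, with `f` assumed byzantine clients), and `dim_krum` applies that selection per coordinate; this is a simplified reading of the idea, not the paper's exact algorithm.

```python
import numpy as np

def krum_select(updates: np.ndarray, f: int) -> int:
    """Classical Krum: pick the update closest to its n-f-2 nearest peers."""
    n = len(updates)
    d2 = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(-1)
    scores = [np.sort(d2[i])[1:n - f - 1].sum() for i in range(n)]  # skip self
    return int(np.argmin(scores))

def dim_krum(updates: np.ndarray, f: int) -> np.ndarray:
    """Dimension-wise variant: run the Krum selection independently on
    each coordinate, so anomalous clients are rejected per dimension."""
    n, dim = updates.shape
    out = np.empty(dim)
    for j in range(dim):
        col = updates[:, j:j + 1]              # one dimension at a time
        out[j] = updates[krum_select(col, f), j]
    return out

updates = np.random.randn(10, 6)  # 10 client updates for a 6-dim toy model
aggregated = dim_krum(updates, f=2)
```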