Author: "Wang, Zifeng" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wang, Zifeng"' showing total 1,137 results

1. SynRL: Aligning Synthetic Clinical Trial Data with Human-preferred Clinical Endpoints Using Reinforcement Learning

Author: Das, Trisha, Wang, Zifeng, Shafquat, Afrah, Beigi, Mandis, Mezey, Jason, and Sun, Jimeng
Subjects: Computer Science - Machine Learning
Abstract: Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical trial data disregard the usage requirements of these data, including maintaining specific properties of clinical outcomes, and only use post hoc assessments that are not coupled with the data generation process. In this paper, we propose SynRL which leverages reinforcement learning to improve the performance of patient data generators by customizing the generated data to meet the user-specified requirements for synthetic data outcomes and endpoints. Our method includes a data value critic function to evaluate the quality of the generated data and uses reinforcement learning to align the data generator with the users' needs based on the critic's feedback. We performed experiments on four clinical trial datasets and demonstrated the advantages of SynRL in improving the quality of the generated synthetic data while keeping the privacy risks low. We also show that SynRL can be utilized as a general framework that can customize data generation of multiple types of synthetic data generators. Our code is available at https://anonymous.4open.science/r/SynRL-DB0F/.
Published: 2024

2. A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges

Author: Wang, Zifeng, Wang, Hanyin, Danek, Benjamin, Li, Ying, Mack, Christina, Poon, Hoifung, Wang, Yajun, Rajpurkar, Pranav, and Sun, Jimeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The integration of Large Language Models (LLMs) into medical applications has sparked widespread interest across the healthcare industry, from drug discovery and development to clinical decision support, assisting telemedicine, medical devices, and healthcare insurance applications. This perspective paper aims to discuss the inner workings of building LLM-powered medical AI applications and introduces a comprehensive framework for their development. We review existing literature and outline the unique challenges of applying LLMs in specialized medical contexts. Additionally, we introduce a three-step framework to organize medical LLM research activities: 1) Modeling: breaking down complex medical workflows into manageable steps for developing medical-specific models; 2) Optimization: optimizing the model performance with crafted prompts and integrating external knowledge and tools, and 3) System engineering: decomposing complex tasks into subtasks and leveraging human expertise for building medical AI applications. Furthermore, we offer a detailed use case playbook that describes various LLM-powered medical AI applications, such as optimizing clinical trial design, enhancing clinical decision support, and advancing medical imaging analysis. Finally, we discuss various challenges and considerations for building medical AI applications with LLMs, such as handling hallucination issues, data ownership and compliance, privacy, intellectual property considerations, compute cost, sustainability issues, and responsible AI requirements.
Published: 2024

3. Can Large Language Models Replace Data Scientists in Clinical Research?

Author: Wang, Zifeng, Danek, Benjamin, Yang, Ziwei, Chen, Zheng, and Sun, Jimeng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Quantitative Biology - Genomics, Quantitative Biology - Quantitative Methods
Abstract: Data science plays a critical role in clinical research, but it requires professionals with expertise in coding and medical data analysis. Large language models (LLMs) have shown great potential in supporting medical tasks and performing well in general coding tests. However, these tests do not assess LLMs' ability to handle data science tasks in medicine, nor do they explore their practical utility in clinical research. To address this, we developed a dataset consisting of 293 real-world data science coding tasks, based on 39 published clinical studies, covering 128 tasks in Python and 165 tasks in R. This dataset simulates realistic clinical research scenarios using patient data. Our findings reveal that cutting-edge LLMs struggle to generate perfect solutions, frequently failing to follow input instructions, understand target data, and adhere to standard analysis practices. Consequently, LLMs are not yet ready to fully automate data science tasks. We benchmarked advanced adaptation methods and found two to be particularly effective: chain-of-thought prompting, which provides a step-by-step plan for data analysis and led to a 60% improvement in code accuracy; and self-reflection, which enables LLMs to iteratively refine their code and yielded a 38% accuracy improvement. Building on these insights, we developed a platform that integrates LLMs into the data science workflow for medical professionals. In a user study with five medical doctors, we found that while LLMs cannot fully automate coding tasks, they significantly streamline the programming process. We found that 80% of their submitted code solutions were incorporated from LLM-generated code, with up to 96% reuse in some cases. Our analysis highlights the potential of LLMs, when integrated into expert workflows, to enhance data science efficiency in clinical research.
Published: 2024

4. Demystifying Large Language Models for Medicine: A Primer

Author: Jin, Qiao, Wan, Nicholas, Leaman, Robert, Tian, Shubo, Wang, Zhizheng, Yang, Yifan, Wang, Zifeng, Xiong, Guangzhi, Lai, Po-Ting, Zhu, Qingqing, Hou, Benjamin, Sarfo-Gyamfi, Maame, Zhang, Gongbo, Gilson, Aidan, Bhasuran, Balu, He, Zhe, Zhang, Aidong, Sun, Jimeng, Weng, Chunhua, Summers, Ronald M., Chen, Qingyu, Peng, Yifan, and Lu, Zhiyong
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.
Comment: Under review
Published: 2024

5. OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

Author: Chen, Lichang, Hu, Hexiang, Zhang, Mingda, Chen, Yiwen, Wang, Zifeng, Li, Yandong, Shyam, Pranav, Zhou, Tianyi, Huang, Heng, Yang, Ming-Hsuan, and Gong, Boqing
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple modalities such as text, vision, and audio, presents unique challenges. Particularly, the user message might often consist of multiple modalities, such that OLMs have to establish holistic understanding and reasoning across modalities to accomplish the task. Existing benchmarks are limited to single modality or dual-modality tasks, overlooking comprehensive multi-modal assessments of model reasoning. To address this, OmnixR offers two evaluation variants: (1) synthetic subset: a synthetic dataset generated automatically by translating text into multiple modalities--audio, images, video, and hybrids (Omnify). (2) realistic subset: a real-world dataset, manually curated and annotated by experts, for evaluating cross-modal reasoning in natural settings. OmnixR presents a unique evaluation towards assessing OLMs over a diverse mix of modalities, such as a question that involves video, audio, and text, providing a rigorous cross-modal reasoning testbed unlike any existing benchmarks. Our experiments find that all state-of-the-art OLMs struggle with OmnixR questions that require integrating information from multiple modalities to answer. Further analysis highlights differences in reasoning behavior, underscoring the challenges of omni-modal AI alignment.
Comment: 19 pages, 6 figures, 12 tables
Published: 2024

6. Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Author: Xu, Wenda, Han, Rujun, Wang, Zifeng, Le, Long T., Madeka, Dhruv, Li, Lei, Wang, William Yang, Agarwal, Rishabh, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods, such as supervised KD and on-policy KD, are adversely impacted by the knowledge gaps between teacher and student in practical scenarios. Supervised KD suffers from a distribution mismatch between training with a static dataset and inference over final student-generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are not familiar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively. We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following, and show that SKD consistently outperforms existing KD methods across different domains, data sizes, and model initialization strategies.
Published: 2024

7. Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence

Author: Feng, Shangbin, Wang, Zifeng, Wang, Yike, Ebrahimi, Sayna, Palangi, Hamid, Miculicich, Lesly, Kulshrestha, Achin, Rauschmayr, Nathalie, Choi, Yejin, Tsvetkov, Yulia, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language
Abstract: We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, Model Swarms starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and optimize a utility function representing model adaptation objectives. Compared to existing model composition approaches, Model Swarms offers tuning-free model adaptation, works in low-data regimes with as few as 200 examples, and does not require assumptions about specific experts in the swarm or how they should be composed. Extensive experiments demonstrate that Model Swarms could flexibly adapt LLM experts to a single task, multi-task domains, reward models, as well as diverse human interests, improving over 12 model composition baselines by up to 21.0% across tasks and contexts. Further analysis reveals that LLM experts discover previously unseen capabilities in initial checkpoints and that Model Swarms enable the weak-to-strong transition of experts through the collaborative search process.
Published: 2024

8. TableRAG: Million-Token Table Understanding with Language Models

Author: Chen, Si-An, Miculicich, Lesly, Eisenschlos, Julian Martin, Wang, Zifeng, Wang, Zilong, Chen, Yanfei, Fujii, Yasuhisa, Lin, Hsuan-Tien, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
Comment: Accepted to NeurIPS 2024
Published: 2024

9. Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Author: Wang, Zilong, Wang, Zifeng, Le, Long, Zheng, Huaixiu Steven, Mishra, Swaroop, Perot, Vincent, Zhang, Yuwei, Mattapalli, Anush, Taly, Ankur, Shang, Jingbo, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
Comment: Preprint
Published: 2024

10. Off-site production of plasma-activated water for efficient sterilization: the crucial role of high-valence NOx and new chemical pathways

Author: Wang, Zifeng, Wang, Xiangyu, Xu, Shenghang, Zhou, Renwu, Zhang, Mingyan, Li, Wanchun, Zhang, Zizhu, Wang, Luge, Chen, Jinkun, Zhang, Jishen, Guo, Li, Pei, Dandan, Liu, Dingxin, and Rong, Mingzhe
Subjects: Physics - Plasma Physics
Abstract: Efficient sterilization of pathogens with cleaner methods is a critical concern for environmental disinfection and clinical anti-infective treatment. Plasma-activated water (PAW) is a promising alternative to chemical disinfectants and antibiotics because it has strong sterilization ability, does not induce any acute toxicity, and consumes only water and air during production. For more efficient water activation, plasma sources are commonly placed as near as possible to, or fully in contact with, water, but the risks of electrode corrosion and metal contamination of water threaten the safety and stability of PAW production. Herein, plasma-activated gas rich in high-valence NOx is generated by a hybrid plasma configuration and introduced into water for off-site PAW production. Plasma-generated O3 is found to dominate the gas-phase reactions for the formation of high-valence NOx. With the time-evolution of O3 concentration, gaseous NO3 radicals are produced behind N2O5 formation, but will be decomposed before N2O5 quenching. By decoupling the roles of gaseous NO3, N2O5, and O3 in the water activation, results show that short-lived aqueous species induced by gaseous NO3 radicals play the most crucial role in PAW sterilization, and the acidic environment induced by N2O5 is also essential. Moreover, SEM photographs and biomacromolecule leakage assays demonstrate that PAW disrupts the cell membranes of bacteria to achieve inactivation. In real-life applications, an integrated device for off-site PAW production with a yield of 2 L/h and a bactericidal efficiency of >99.9% is developed. The PAW of 50 mL produced in 3 minutes using this device is more effective in disinfection than 0.5% NaClO and 3% H2O2 with the same bacterial contact time. This work provides new avenues for efficient PAW production and deepens insights into the fundamental processes that govern the reactive chemistry in PAW sterilization.
Published: 2024

11. Panacea: A foundation model for clinical trial search, summarization, design, and recruitment

Author: Lin, Jiacheng, Xu, Hanwen, Wang, Zifeng, Wang, Sheng, and Sun, Jimeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and not adaptable to diverse clinical trial tasks. To address this challenge, we propose a clinical trial foundation model named Panacea, designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers, to infuse clinical knowledge into the model by pre-training. We further curate TrialInstruct, which has 200,866 of instruction data for fine-tuning. These resources enable Panacea to be widely applicable for a range of clinical trial tasks based on user requirements. We evaluated Panacea on a new benchmark, named TrialPanorama, which covers eight clinical trial tasks. Our method performed the best on seven of the eight tasks compared to six cutting-edge generic or medicine-specific LLMs. Specifically, Panacea showed great potential to collaborate with human experts in crafting the design of eligibility criteria, study arms, and outcome measures, in multi-round conversations. In addition, Panacea achieved 14.42% improvement in patient-trial matching, 41.78% to 52.02% improvement in trial search, and consistently ranked at the top for five aspects of trial summarization. Our approach demonstrates the effectiveness of Panacea in clinical trials and establishes a comprehensive resource, including training data, model, and benchmark, for developing clinical trial foundation models, paving the path for AI-based clinical trial development.
Published: 2024

12. Accelerating Clinical Evidence Synthesis with Large Language Models

Author: Wang, Zifeng, Cao, Lang, Danek, Benjamin, Jin, Qiao, Lu, Zhiyong, and Sun, Jimeng
Subjects: Computer Science - Computation and Language
Abstract: Synthesizing clinical evidence largely relies on systematic reviews of clinical trials and retrospective analyses from medical literature. However, the rapid expansion of publications presents challenges in efficiently identifying, summarizing, and updating clinical evidence. Here, we introduce TrialMind, a generative artificial intelligence (AI) pipeline for facilitating human-AI collaboration in three crucial tasks for evidence synthesis: study search, screening, and data extraction. To assess its performance, we chose published systematic reviews to build the benchmark dataset, named TrialReviewBench, which contains 100 systematic reviews and the associated 2,220 clinical studies. Our results show that TrialMind excels across all three tasks. In study search, it generates diverse and comprehensive search queries to achieve high recall rates (Ours 0.711-0.834 v.s. Human baseline 0.138-0.232). For study screening, TrialMind surpasses traditional embedding-based methods by 30% to 160%. In data extraction, it outperforms a GPT-4 baseline by 29.6% to 61.5%. We further conducted user studies to confirm its practical utility. Compared to manual efforts, human-AI collaboration using TrialMind yielded a 71.4% recall lift and 44.2% time savings in study screening and a 23.5% accuracy lift and 63.4% time savings in data extraction. Additionally, when comparing synthesized clinical evidence presented in forest plots, medical experts favored TrialMind's outputs over GPT-4's outputs in 62.5% to 100% of cases. These findings show the promise of LLM-based approaches like TrialMind to accelerate clinical evidence synthesis via streamlining study search, screening, and data extraction from medical literature, with exceptional performance improvement when working with human experts.
Published: 2024

13. Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Author: Hsieh, Cheng-Yu, Chuang, Yung-Sung, Li, Chun-Liang, Wang, Zifeng, Le, Long T., Kumar, Abhishek, Glass, James, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of their input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
Comment: ACL Findings 2024
Published: 2024

14. CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

Author: Hsu, I-Hung, Wang, Zifeng, Le, Long T., Miculicich, Lesly, Peng, Nanyun, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning.
Comment: ACL 2024 Camera Ready Version
Published: 2024

15. Continual Learning of Large Language Models: A Comprehensive Survey

Author: Shi, Haizhou, Xu, Zihao, Wang, Hengyi, Qin, Weiyi, Wang, Wenyuan, Wang, Yibin, Wang, Zifeng, Ebrahimi, Sayna, and Wang, Hao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.
Comment: 47 pages, 2 figures, 4 tables. Work in progress
Published: 2024

16. CodecLM: Aligning Language Models with Tailored Synthetic Data

Author: Wang, Zifeng, Li, Chun-Liang, Perot, Vincent, Le, Long T., Miao, Jin, Zhang, Zizhao, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost to collect or annotate data by humans, researchers start to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works focus on generating diverse instructions and applying LLM to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data to elicit better instruction-following abilities in different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution, and then decode metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples. Extensive experiments on four open-domain instruction following benchmarks validate the effectiveness of CodecLM over the current state-of-the-arts.
Comment: Accepted to Findings of NAACL 2024
Published: 2024

17. ADAPT to Robustify Prompt Tuning Vision Transformers

Author: Eskandar, Masih, Imtiaz, Tooba, Wang, Zifeng, and Dy, Jennifer
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, that can have billions of parameters, for each task. At the same time, parameter-efficient prompt tuning is used to adapt large transformer-based models to downstream tasks without the need to save large copies. In this paper, we examine parameter-efficient prompt tuning of Vision Transformers for downstream tasks under the lens of robustness. We show that previous adversarial defense methods, when applied to the prompt tuning paradigm, suffer from gradient obfuscation and are vulnerable to adaptive attacks. We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm. Our method achieves competitive robust accuracy of ~40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only ~1% of the number of parameters.
Published: 2024

18. TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

Author: Jiang, Pengcheng, Xiao, Cao, Wang, Zifeng, Bhatia, Parminder, Sun, Jimeng, and Han, Jiawei
Subjects: Computer Science - Computation and Language
Abstract: The advent of large language models (LLMs) has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model. Initially, LLMs extract a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained with these tasks, employing a curriculum learning strategy that evolves from simple to complex tasks. Our method enhances local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insights into the summarization rationale., Comment: NAACL'24
Published: 2024

19. TOX2 nuclear-cytosol translocation is linked to leukemogenesis of acute T-cell leukemia by repressing TIM3 transcription

Author: Li, Anzhou, Zhang, Junbao, Zhan, Liangping, Liu, Xiufeng, Zeng, Xiliang, Zhu, Qian, Wang, Zifeng, and Li, Jiang
Published: 2024
Full Text: View/download PDF

20. GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models

Author: Jiang, Pengcheng, Lin, Jiacheng, Wang, Zifeng, Sun, Jimeng, and Han, Jiawei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs). However, we discovered that traditional relation extraction (RE) metrics like precision and recall fall short in evaluating GRE methods. This shortfall arises because these metrics rely on exact matching with human-annotated reference relations, while GRE methods often produce diverse and semantically accurate relations that differ from the references. To fill this gap, we introduce GenRES for a multi-dimensional assessment in terms of the topic similarity, uniqueness, granularity, factualness, and completeness of the GRE results. With GenRES, we empirically identified that (1) precision/recall fails to justify the performance of GRE methods; (2) human-annotated referential relations can be incomplete; (3) prompting LLMs with a fixed set of relations or entities can cause hallucinations. Next, we conducted a human evaluation of GRE methods that shows GenRES is consistent with human preferences for RE quality. Last, we made a comprehensive evaluation of fourteen leading LLMs using GenRES across document, bag, and sentence level RE datasets, respectively, to set the benchmark for future research in GRE.
Published: 2024

21. PILOT: Legal Case Outcome Prediction with Case Law

Author: Cao, Lang, Wang, Zifeng, Xiao, Cao, and Sun, Jimeng
Subjects: Computer Science - Computation and Language
Abstract: Machine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil law cases rather than case law systems. We identified two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary to consider the evolution of legal principles over time, as early cases may adhere to different legal contexts. In this paper, we proposed a new framework named PILOT (PredictIng Legal case OuTcome) for case outcome prediction. It comprises two modules for relevant case retrieval and temporal pattern handling, respectively. To benchmark the performance of existing legal case outcome prediction models, we curated a dataset from a large-scale case law database. We demonstrate the importance of accurately identifying precedent cases and mitigating the temporal shift when making predictions for case law, as our method shows a significant improvement over the prior methods that focus on civil law case outcome predictions.
Published: 2024

22. Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Author: Wang, Zilong, Zhang, Hao, Li, Chun-Liang, Eisenschlos, Julian Martin, Perot, Vincent, Wang, Zifeng, Miculicich, Lesly, Fujii, Yasuhisa, Shang, Jingbo, Lee, Chen-Yu, and Pfister, Tomas
Subjects: Computer Science - Computation and Language
Abstract: Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and its similar approaches incorporate the reasoning chain in the form of textual context, but it is still an open question how to effectively leverage tabular data in the reasoning chain. We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts. Specifically, we guide LLMs using in-context learning to iteratively generate operations and update the table to represent a tabular reasoning chain. LLMs can therefore dynamically plan the next operation based on the results of the previous ones. This continuous evolution of the table forms a chain, showing the reasoning process for a given tabular problem. The chain carries structured information of the intermediate results, enabling more accurate and reliable predictions. Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks across multiple LLM choices., Comment: Accepted to ICLR 2024
Published: 2024

23. The effect of hypo-sulfonamide accelerators on the induction period of rubber vulcanization based on experiments and molecular simulations

Author: Liu, Junying, Jiang, Shuangyan, Wang, Zifeng, Song, Jianhui, and Yong, Zhanfu
Subjects: Molecular dynamics -- Research, Crosslinked polymers -- Research, Sulfonamides -- Research, Simulation methods -- Research, Engineering and manufacturing industries, Science and technology
Abstract: Accelerators are indispensable additives in rubber vulcanization reactions. Hypo-sulfonamide accelerators are widely used due to their fast vulcanization speed. However, the short vulcanization induction period compromises the processing safety of rubber. In this paper, we incorporated experiments with molecular simulations to investigate the underlying reasons for the different effects of various hypo-sulfonamide accelerators. The vulcanization induction period of each accelerator was investigated according to the vulcanization curves. Furthermore, the corresponding reaction mechanism was explored from the molecular scale through quantum mechanical and molecular dynamics simulations. Subsequently, the research focus was discussed in depth. The compatibility relationship between the accelerators and the rubber materials was verified through the crosslinking density test, and a coincidence between the experimental and simulation results was found. This study guides the selection and improvement of hypo-sulfonamide accelerators and corroborates the factors influencing the vulcanization induction period of natural rubber by accelerators. Highlights * Reactivity of accelerators calculated by quantum mechanical simulations. * Difference in solubility parameter characterizes accelerator/NR compatibility. * Mobility and diffusion coefficient reflect the diffusion ability of the promoter. * Molecular simulation results are consistent with experimental results. KEYWORDS hypo-sulfonamide accelerator, molecular dynamics simulation, natural rubber, quantum mechanics simulation, vulcanization induction period. 1 | INTRODUCTION Vulcanization involves heating the rubber with additives (i.e., sulfur) to transform the linear rubber molecular chain into a three-dimensional mesh structure, achieving the transition of the plastic [...]
Published: 2024
Full Text: View/download PDF

24. BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

Author: Wang, Zifeng, Wang, Zichen, Srinivasan, Balasubramaniam, Ioannidis, Vassilis N., Rangwala, Huzefa, and Anubhai, Rishita
Subjects: Computer Science - Machine Learning
Abstract: Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs., Comment: ICLR 2024
Published: 2023

25. UniPredict: Large Language Models are Universal Tabular Classifiers

Author: Wang, Ruiyu, Wang, Zifeng, and Sun, Jimeng
Subjects: Computer Science - Machine Learning
Abstract: Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we demonstrate the scalability of an LLM to extensive tabular datasets, enabling it to comprehend diverse tabular inputs and predict target variables following the provided instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks. In low-resource few-shot setup, we observed a 100%+ performance advantage compared with XGBoost, and significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
Published: 2023

26. CITING: Large Language Models Create Curriculum for Instruction Tuning

Author: Feng, Tao, Wang, Zifeng, and Sun, Jimeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The recent advancement of large language models (LLMs) has been achieved through a combination of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment have become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revision made by the teacher. We carry out these two steps iteratively to embody the procedure of CITING. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvement in terms of articulateness, depth, and comprehensiveness under GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT, respectively.
Published: 2023

27. LMDX: Language Model-based Document Information Extraction and Localization

Author: Perot, Vincent, Kang, Kai, Luisier, Florian, Su, Guolong, Sun, Xiaoyu, Boppana, Ramya Sree, Wang, Zilong, Wang, Zifeng, Mu, Jiaqi, Zhang, Hao, Lee, Chen-Yu, and Hua, Nan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLM) have revolutionized Natural Language Processing (NLP), improving state-of-the-art and exhibiting emergent capabilities across various tasks. However, their application in extracting information from visually rich documents, which is at the core of many document processing workflows and involving the extraction of key entities from semi-structured documents, has not yet been successful. The main obstacles to adopting LLMs for this task include the absence of layout encoding within LLMs, which is critical for high quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to reframe the document information extraction task for a LLM. LMDX enables extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. Finally, we apply LMDX to the PaLM 2-S and Gemini Pro LLMs and evaluate it on VRDU and CORD benchmarks, setting a new state-of-the-art and showing how LMDX enables the creation of high quality, data-efficient parsers.
Published: 2023

28. Critical behavior of the van der Waals ferromagnet Cr1.2Te2

Author: Zhang, Ying, Liu, Yonglai, He, Miao, Wang, Zifeng, and Quan, Guiying
Published: 2024
Full Text: View/download PDF

29. MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models

Author: Wen, Yilin, Wang, Zifeng, and Sun, Jimeng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks. However, they often suffer from limitations such as difficulty in incorporating new knowledge, generating hallucinations, and explaining their reasoning process. To address these challenges, we propose a novel prompting pipeline, named MindMap, that leverages knowledge graphs (KGs) to enhance LLMs' inference and transparency. Our method enables LLMs to comprehend KG inputs and infer with a combination of implicit and external knowledge. Moreover, our method elicits the mind map of LLMs, which reveals their reasoning pathways based on the ontology of knowledge. We evaluate our method on diverse question & answering tasks, especially in medical domains, and show significant improvements over baselines. We also introduce a new hallucination evaluation benchmark and analyze the effects of different components of our method. Our results demonstrate the effectiveness and robustness of our method in merging knowledge from LLMs and KGs for combined inference. To reproduce our results and extend the framework further, we make our codebase available at https://github.com/wyl-willing/MindMap., Comment: 8 pages, 8 figures, 12 tables
Published: 2023

30. Matching Patients to Clinical Trials with Large Language Models

Author: Jin, Qiao, Wang, Zifeng, Floudas, Charalampos S., Chen, Fangyuan, Gong, Changlin, Bracken-Clarke, Dara, Xue, Elisabetta, Yang, Yifan, Sun, Jimeng, and Lu, Zhiyong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT., Comment: Under review
Published: 2023

31. PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications

Author: Wang, Zifeng, Theodorou, Brandon, Fu, Tianfan, Xiao, Cao, and Sun, Jimeng
Subjects: Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods
Abstract: Clinical trials are conducted to test the effectiveness and safety of potential drugs in humans for regulatory approval. Machine learning (ML) has recently emerged as a new tool to assist in clinical trials. Despite this progress, there have been few efforts to document and benchmark ML4Trial algorithms available to the ML research community. Additionally, the accessibility to clinical trial-related datasets is limited, and there is a lack of well-defined clinical tasks to facilitate the development of new algorithms. To fill this gap, we have developed PyTrial that provides benchmarks and open-source implementations of a series of ML algorithms for clinical trial design and operations. In this paper, we thoroughly investigate 34 ML algorithms for clinical trials across 6 different tasks, including patient outcome prediction, trial site selection, trial outcome prediction, patient-trial matching, trial similarity search, and synthetic data generation. We have also collected and prepared 23 ML-ready datasets as well as their working examples in Jupyter Notebooks for quick implementation and testing. PyTrial defines each task through a simple four-step process: data loading, model specification, model training, and model evaluation, all achievable with just a few lines of code. Furthermore, our modular API architecture empowers practitioners to expand the framework to incorporate new algorithms and tasks effortlessly. The code is available at https://github.com/RyanWangZf/PyTrial.
Published: 2023

32. Matching patients to clinical trials with large language models

Author: Jin, Qiao, Wang, Zifeng, Floudas, Charalampos S., Chen, Fangyuan, Gong, Changlin, Bracken-Clarke, Dara, Xue, Elisabetta, Yang, Yifan, Sun, Jimeng, and Lu, Zhiyong
Published: 2024
Full Text: View/download PDF

33. CSRP1 gene: a potential novel prognostic marker in acute myeloid leukemia with implications for immune response

Author: Zhao, Chunxia, Wang, Yulu, Wang, Huan, Sharma, Amit, Wu, Yun, Schmidt-Wolf, Ingo G. H., and Wang, Zifeng
Published: 2024
Full Text: View/download PDF

34. FOXA2 rewires AP-1 for transcriptional reprogramming and lineage plasticity in prostate cancer

Author: Wang, Zifeng, Townley, Scott L., Zhang, Songqi, Liu, Mingyu, Li, Muqing, Labaf, Maryam, Patalano, Susan, Venkataramani, Kavita, Siegfried, Kellee R., Macoska, Jill A., Han, Dong, Gao, Shuai, Risbridger, Gail P., Taylor, Renea A., Lawrence, Mitchell G., He, Housheng Hansen, Selth, Luke A., and Cai, Changmeng
Published: 2024
Full Text: View/download PDF

35. Multiomics profiling reveals VDR as a central regulator of mesenchymal stem cell senescence with a known association with osteoporosis after high-fat diet exposure

Author: Chen, Jiayao, Kuang, Shuhong, Cen, Jietao, Zhang, Yong, Shen, Zongshan, Qin, Wei, Huang, Qiting, Wang, Zifeng, Gao, Xianling, Huang, Fang, and Lin, Zhengmei
Published: 2024
Full Text: View/download PDF

36. SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

Author: Sun, Ruoxi, Arik, Sercan Ö., Muzio, Alex, Miculicich, Lesly, Gundabathula, Satya, Yin, Pengcheng, Dai, Hanjun, Nakhost, Hootan, Sinha, Rajarishi, Wang, Zifeng, and Pfister, Tomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases
Abstract: Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL using LLMs in the learning regimes of few-shot prompting and instruction fine-tuning. With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error filtering. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs. In particular, we investigate how performance can be improved through expanded training data coverage and diversity, synthetic data augmentation, and integrating query-specific database content. We propose a test-time selection method to further refine accuracy by integrating SQL outputs from multiple paradigms with execution feedback as guidance. Additionally, we tackle the practical challenge of navigating intricate databases with a significant number of tables and columns, proposing efficient techniques for accurately selecting relevant database elements to enhance Text-to-SQL performance. Our holistic approach yields substantial advancements in Text-to-SQL, as demonstrated on two key public benchmarks, Spider and BIRD. Through comprehensive ablations and error analyses, we shed light on the strengths and weaknesses of our framework, offering valuable insights into Text-to-SQL's future work.
Published: 2023

37. MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement

Author: Wang, Zifeng, Gao, Chufan, Xiao, Cao, and Sun, Jimeng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively., Comment: IJCAI 2024
Published: 2023

38. AutoTrial: Prompting Language Models for Clinical Trial Design

Author: Wang, Zifeng, Xiao, Cao, and Sun, Jimeng
Subjects: Computer Science - Computation and Language
Abstract: Clinical trials are critical for drug development. Constructing the appropriate eligibility criteria (i.e., the inclusion/exclusion criteria for patient recruitment) is essential for the trial's success. Proper design of clinical trial protocols should consider similar precedent trials and their eligibility criteria to ensure sufficient patient coverage. In this paper, we present a method named AutoTrial to aid the design of clinical eligibility criteria using language models. It allows (1) controllable generation under instructions via a hybrid of discrete and neural prompting, (2) scalable knowledge incorporation via in-context learning, and (3) explicit reasoning chains to provide rationales for understanding the outputs. Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts that are fluent and coherent and with high accuracy in capturing the relevant clinical concepts to the target trial. It is noteworthy that our method, with a much smaller parameter size, gains around 60% winning rate against the GPT-3.5 baselines via human evaluations., Comment: EMNLP 2023 Main
Published: 2023

39. DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning

Author: Wang, Zifeng, Zhan, Zheng, Gong, Yifan, Shao, Yucai, Ioannidis, Stratis, Wang, Yanzhi, and Dy, Jennifer
Subjects: Computer Science - Machine Learning
Abstract: Rehearsal-based approaches are a mainstay of continual learning (CL). They mitigate the catastrophic forgetting problem by maintaining a small fixed-size buffer with a subset of data from past tasks. While most rehearsal-based approaches study how to effectively exploit the knowledge from the buffered past data, little attention is paid to the inter-task relationships with the critical task-specific and task-invariant knowledge. By appropriately leveraging inter-task relationships, we propose a novel CL method named DualHSIC to boost the performance of existing rehearsal-based methods in a simple yet effective way. DualHSIC consists of two complementary components that stem from the so-called Hilbert Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR) lessens the inter-task interference and HSIC Alignment (HA) promotes task-invariant knowledge sharing. Extensive experiments show that DualHSIC can be seamlessly plugged into existing rehearsal-based methods for consistent performance improvements, and also outperforms recent state-of-the-art regularization-enhanced rehearsal methods. Source code will be released., Comment: Accepted at ICML 2023 as a conference paper
Published: 2023

40. SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning

Author: Wang, Zifeng, Xiao, Cao, and Sun, Jimeng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Clinical trials are essential to drug development but time-consuming, costly, and prone to failure. Accurate trial outcome prediction based on historical trial data promises better trial investment decisions and more trial success. Existing trial outcome prediction models were not designed to model the relations among similar trials, capture the progression of features and designs of similar trials, or address the skewness of trial data which causes inferior performance for less common trials. To fill the gap and provide accurate trial outcome prediction, we propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT) that first identifies trial topics to cluster the multi-sourced trial data into relevant trial topics. It then generates trial embeddings and organizes them by topic and time to create clinical trial sequences. With the consideration of each trial sequence as a task, it uses a meta-learning strategy to achieve a point where the model can rapidly adapt to new tasks with minimal updates. In particular, the topic discovery module enables a deeper understanding of the underlying structure of the data, while sequential learning captures the evolution of trial designs and outcomes. This results in predictions that are not only more accurate but also more interpretable, taking into account the temporal patterns and unique characteristics of each trial topic. We demonstrate that SPOT wins over the prior methods by a significant margin on trial outcome benchmark data: with a 21.5% lift on phase I, an 8.9% lift on phase II, and a 5.5% lift on phase III trials in the metric of the area under precision-recall curve (PR-AUC).
Published: 2023

41. Robust Meta Learning for Image based tasks

Author: Jiang, Penghao, Ke, Xin, Wang, ZiFeng, and Li, Chunxi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: A machine learning model that generalizes well should obtain low errors on unseen test examples. Thus, if we learn an optimal model in training data, it could have better generalization performance in testing tasks. However, learning such a model is not possible in standard machine learning frameworks as the distribution of the test data is unknown. To tackle this challenge, we propose a novel robust meta-learning method, which is more robust on image-based testing tasks that are unknown and have distribution shifts relative to the training tasks. Our robust meta-learning method can provide robust optimal models even when data from each distribution are scarce. In experiments, we demonstrate that our algorithm not only has better generalization performance but is also robust to different unknown testing tasks., Comment: IEEE International Conference on Robotics and Automation SRL workshop 2022
Published: 2023

42. Invariant Meta Learning for Out-of-Distribution Generalization

Author: Jiang, Penghao, Xin, Ke, Wang, Zifeng, and Li, Chunxi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Modern deep learning techniques have illustrated their excellent capabilities in many areas, but rely on large training data. Optimization-based meta-learning trains a model on a variety of tasks, such that it can solve new learning tasks using only a small number of training samples. However, these methods assume that training and test data are identically and independently distributed. To overcome this limitation, in this paper, we propose invariant meta learning for out-of-distribution tasks. Specifically, invariant meta learning finds an invariant optimal meta-initialization, and adapts quickly to out-of-distribution tasks with a regularization penalty. Extensive experiments demonstrate the effectiveness of our proposed invariant meta learning on out-of-distribution few-shot tasks., Comment: IEEE Conference on Computer Vision and Pattern Recognition 2022 The Ninth Workshop on Fine-Grained Visual Categorization
Published: 2023

43. SAIF: Sparse Adversarial and Imperceptible Attack Framework

Author: Imtiaz, Tooba, Kohler, Morgan, Miller, Jared, Wang, Zifeng, Sznaier, Mario, Camps, Octavia, and Dy, Jennifer
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
Published: 2022

44. Surface Water

Author: Liu, Junguo, Mao, Ganquan, Zhang, Shuyu, Liu, Xiaomang, Feng, Lian, Wang, Zifeng, Chen, He, Pokhrel, Yadu, Dang, Huy, Wang, Hong, Chen, Deliang, editor, Liu, Junguo, editor, and Tang, Qiuhong, editor
Published: 2024

45. QueryForm: A Simple Zero-shot Form Entity Query Framework

Author: Wang, Zifeng, Zhang, Zizhao, Devlin, Jacob, Lee, Chen-Yu, Su, Guolong, Zhang, Hao, Dy, Jennifer, Perot, Vincent, and Pfister, Tomas
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document types without the need for target-specific training data. QueryForm sets new state-of-the-art average F1 score on both the XFUND (+4.6%~10.1%) and the Payment (+3.2%~9.5%) zero-shot benchmark, with a smaller model size and no additional image input.
Comment: Accepted to Findings of ACL 2023
Published: 2022

46. MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

Author: Wang, Zifeng, Wu, Zhenbang, Agarwal, Dinesh, and Sun, Jimeng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Existing vision-text contrastive learning like CLIP aims to match the paired image and caption embeddings while pushing others apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude below the general images and captions from the internet. Moreover, previous methods encounter many false negatives, i.e., images and reports from separate patients probably carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning thus scaling the usable training data in a combinatorial magnitude with low cost. We also propose to replace the InfoNCE loss with semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We prove that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval. Surprisingly, we observe that with only 20K pre-training data, MedCLIP wins over the state-of-the-art method (using around 200K data). Our code is available at https://github.com/RyanWangZf/MedCLIP.
Comment: EMNLP 2022
Published: 2022

47. PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning

Author: Wang, Zifeng and Sun, Jimeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Accessing longitudinal multimodal Electronic Healthcare Records (EHRs) is challenging due to privacy concerns, which hinders the use of ML for healthcare applications. Synthetic EHRs generation bypasses the need to share sensitive real patient records. However, existing methods generate single-modal EHRs by unconditional generation or by longitudinal inference, which falls short of low flexibility and makes unrealistic EHRs. In this work, we propose to formulate EHRs generation as a text-to-text translation task by language models (LMs), which suffices to highly flexible event imputation during generation. We also design prompt learning to control the generation conditioned by numerical and categorical demographic features. We evaluate synthetic EHRs quality by two perplexity measures accounting for their longitudinal pattern (longitudinal imputation perplexity, lpl) and the connections cross modalities (cross-modality imputation perplexity, mpl). Moreover, we utilize two adversaries: membership and attribute inference attacks for privacy-preserving evaluation. Experiments on MIMIC-III data demonstrate the superiority of our methods on realistic EHRs generation (53.1% decrease of lpl and 45.3% decrease of mpl on average compared to the best baselines) with low privacy risks. Software is available at https://github.com/RyanWangZf/PromptEHR.
Comment: EMNLP 2022
Published: 2022

48. Pruning Adversarially Robust Neural Networks without Adversarial Examples

Author: Jian, Tong, Wang, Zifeng, Wang, Yanzhi, Dy, Jennifer, and Ioannidis, Stratis
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Adversarial pruning compresses models while preserving robustness. Current methods require access to adversarial examples during pruning. This significantly hampers training efficiency. Moreover, as new adversarial attacks and training methods develop at a rapid rate, adversarial pruning methods need to be modified accordingly to keep up. In this work, we propose a novel framework to prune a previously trained robust neural network while maintaining adversarial robustness, without further generating adversarial examples. We leverage concurrent self-distillation and pruning to preserve knowledge in the original model as well as regularizing the pruned model via the Hilbert-Schmidt Information Bottleneck. We comprehensively evaluate our proposed framework and show its superior performance in terms of both adversarial robustness and efficiency when pruning architectures trained on the MNIST, CIFAR-10, and CIFAR-100 datasets against five state-of-the-art attacks. Code is available at https://github.com/neu-spiral/PwoA/.
Comment: Published at ICDM 2022 as a conference paper
Published: 2022

49. SparCL: Sparse Continual Learning on the Edge

Author: Wang, Zifeng, Zhan, Zheng, Gong, Yifan, Yuan, Geng, Niu, Wei, Jian, Tong, Ren, Bin, Ioannidis, Stratis, Wang, Yanzhi, and Dy, Jennifer
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods by at most 23X less training FLOPs, and, surprisingly, further improves the SOTA accuracy by at most 1.7%. SparCL also outperforms competitive baselines obtained from adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
Comment: Published at NeurIPS 2022 as a conference paper
Published: 2022

50. Artificial Intelligence for In Silico Clinical Trials: A Review

Author: Wang, Zifeng, Gao, Chufan, Glass, Lucas M., and Sun, Jimeng
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: A clinical trial is an essential step in drug development, which is often costly and time-consuming. In silico trials are clinical trials conducted digitally through simulation and modeling as an alternative to traditional clinical trials. AI-enabled in silico trials can increase the case group size by creating virtual cohorts as controls. In addition, it also enables automation and optimization of trial design and predicts the trial success rate. This article systematically reviews papers under three main topics: clinical simulation, individualized predictive modeling, and computer-aided trial design. We focus on how machine learning (ML) may be applied in these applications. In particular, we present the machine learning problem formulation and available data sources for each task. We end with discussing the challenges and opportunities of AI for in silico trials in real-world applications.
Published: 2022
