Author: "Zhang Sheng" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

225 results on '"Zhang Sheng"'

1. From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Author: Nori, Harsha, Usuyama, Naoto, King, Nicholas, McKinney, Scott Mayer, Fernandes, Xavier, Zhang, Sheng, and Horvitz, Eric
Subjects: Computer Science - Computation and Language
Abstract: Run-time steering strategies like Medprompt are valuable for guiding large language models (LLMs) to top performance on challenging tasks. Medprompt demonstrates that a general LLM can be focused to deliver state-of-the-art performance on specialized domains like medicine by using a prompt to elicit a run-time strategy involving chain of thought reasoning and ensembling. OpenAI's o1-preview model represents a new paradigm, where a model is designed to do run-time reasoning before generating final responses. We seek to understand the behavior of o1-preview on a diverse set of medical challenge problem benchmarks. Following on the Medprompt study with GPT-4, we systematically evaluate the o1-preview model across various medical benchmarks. Notably, even without prompting techniques, o1-preview largely outperforms the GPT-4 series with Medprompt. We further systematically study the efficacy of classic prompt engineering strategies, as represented by Medprompt, within the new paradigm of reasoning models. We found that few-shot prompting hinders o1's performance, suggesting that in-context learning may no longer be an effective steering approach for reasoning-native models. While ensembling remains viable, it is resource-intensive and requires careful cost-performance optimization. Our cost and accuracy analysis across run-time strategies reveals a Pareto frontier, with GPT-4o representing a more affordable option and o1-preview achieving state-of-the-art performance at higher cost. Although o1-preview offers top performance, GPT-4o with steering strategies like Medprompt retains value in specific contexts. Moreover, we note that the o1-preview model has reached near-saturation on many existing medical benchmarks, underscoring the need for new, challenging benchmarks. We close with reflections on general directions for inference-time computation with LLMs.
Comment: 25 pages
Published: 2024

2. MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

Author: Zhao, Wang, Liu, Jiachen, Zhang, Sheng, Li, Yishu, Chen, Sili, Huang, Sharon X, Liu, Yong-Jin, and Guo, Hengkai
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. Unlike previous robust estimator-based works (which require multiple images or RGB-D input) and learning-based works (which suffer from domain shift), MonoPlane combines the best of two worlds and establishes a plane reconstruction pipeline based on monocular geometric cues, resulting in accurate, robust and scalable 3D plane detection and reconstruction in the wild. Specifically, we first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance. We exploit effective 3D point proximity and model such proximity via a graph within RANSAC to guide the plane fitting from noisy monocular depths, followed by image-level multi-plane joint optimization to improve the consistency among all plane instances. We further design a simple but effective pipeline to extend this single-view solution to sparse-view 3D plane reconstruction. Extensive experiments on a list of datasets demonstrate our superior zero-shot generalizability over baselines, achieving state-of-the-art plane reconstruction performance in a transferring setting. Our code is available at https://github.com/thuzhaowang/MonoPlane
Comment: IROS 2024 (oral)
Published: 2024

3. Lines of Bound States in the Continuum in a Phononic Crystal Slab

Author: Yang, Lin, Zheng, Riyi, Zhang, Sheng, Zhang, Wenshuai, Du, Qiujiao, Peng, Pai, Wang, Ziyu, Ke, Manzhu, Huang, Xueqin, and Liu, Fengming
Subjects: Condensed Matter - Materials Science
Abstract: We demonstrate that bound states in the continuum (BICs) form continuous lines along high-symmetry directions of momentum space in a simple phononic crystal slab. Contrary to common sense, these BICs are symmetry-protected (SP) BICs not only at the center of the Brillouin zone (gamma point) but also off the gamma point. We utilize numerical simulations, a group theory method, and a mode expansion method to comprehensively understand the formation of the BICs. It is revealed that these BICs correspond to phase singularity lines of far-field radiation, and the phase winding number can be defined as a topological index. This makes the BICs topologically protected and robust to any parameter variation that maintains periodicity and rotational symmetry. Finally, the generation of the BIC lines is experimentally demonstrated.
Published: 2024

4. Traversability of Schwarzschild-Anti-de Sitter Wormhole in f(T) gravity

Author: Zhang, Sheng-Fu and Lin, Rui-Hui
Subjects: General Relativity and Quantum Cosmology
Abstract: In this paper we analyze the traversability of static and evolving Schwarzschild-Anti-de Sitter wormholes. The wormhole metric under consideration is not asymptotically flat. Hence one can only embed this metric into the Euclidean space for a limited radius $r_{max}$. For $r>r_{max}$, an exterior vacuum spacetime should be matched to the wormhole spacetime. In the framework of $f(T)$ gravities, we discuss the null energy condition that the matter supporting the wormhole should satisfy and find that a nontrivial form of $f(T)$ is necessary. For the wormholes to be suitable for humans to traverse, we consider the tidal force that a traveler would feel during the trip. This leads to an upper bound on the traveler's velocity. Using the allowed velocity, we estimate the travel time through the wormhole. In the evolving cases, the wormhole should not expand too fast, otherwise the traveler may not be able to arrive at the other side of the wormhole. Besides this, for static wormholes, we briefly discuss the geodesics in the plane $\theta=\pi/2$.
Comment: 15 pages, 2 figures
Published: 2024

5. MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

Author: Codella, Noel C. F., Jin, Ying, Jain, Shrey, Gu, Yu, Lee, Ho Hin, Abacha, Asma Ben, Santamaria-Pang, Alberto, Guyman, Will, Sangani, Naiteek, Zhang, Sheng, Poon, Hoifung, Hyland, Stephanie, Bannur, Shruthi, Alvarez-Valle, Javier, Li, Xue, Garrett, John, McMillan, Alan, Rajguru, Gaurav, Maddi, Madhu, Vijayrania, Nilesh, Bhimai, Rehaan, Mecklenburg, Nick, Jain, Rupal, Holstein, Daniel, Gaur, Naveen, Aski, Vijay, Hwang, Jenq-Neng, Lin, Thomas, Tarapov, Ivan, Lungren, Matthew, and Wei, Mu
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-art (SOTA) or human expert level performance across classification, image-image search, and fine-tuning tasks. Specifically, on public datasets, MedImageInsight achieves SOTA in CT 3D medical image retrieval, as well as SOTA in disease classification and search for chest X-ray, dermatology, and OCT imaging. Furthermore, MedImageInsight achieves human expert performance in bone age estimation (on both public and partner data), as well as AUC above 0.9 in most other domains. When paired with a text decoder, MedImageInsight achieves near SOTA level single image report findings generation with fewer than 10% of the parameters of other models. Compared to fine-tuning GPT-4o with only MIMIC-CXR data for the same task, MedImageInsight outperforms in clinical metrics, but underperforms on lexical metrics where GPT-4o sets a new SOTA. Importantly for regulatory purposes, MedImageInsight can generate ROC curves, adjust sensitivity and specificity based on clinical need, and provide evidence-based decision support through image-image search (which can also enable retrieval augmented generation). In an independent clinical evaluation of image-image search in chest X-ray, MedImageInsight outperformed every other publicly available foundation model evaluated by large margins (over 6 points AUC), and significantly outperformed other models in terms of AI fairness (across age and gender). We hope releasing MedImageInsight will help enhance collective progress in medical imaging AI research and development.
Published: 2024

6. FlexiTex: Enhancing Texture Generation with Visual Guidance

Author: Jiang, DaDong, Yang, Xianghui, Zhao, Zibo, Zhang, Sheng, Yu, Jiaao, Lai, Zeqiang, Yang, Shaoxiong, Guo, Chunchao, Zhou, Xiaobo, and Ke, Zhihui
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recent texture generation methods achieve impressive results due to the powerful generative prior they leverage from large-scale text-to-image diffusion models. However, abstract textual prompts are limited in providing global textural or shape information, which results in the texture generation methods producing blurry or inconsistent patterns. To tackle this, we present FlexiTex, embedding rich information via visual guidance to generate a high-quality texture. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. To further enhance the visual guidance, we introduce a Direction-Aware Adaptation module that automatically designs direction prompts based on different camera poses, avoiding the Janus problem and maintaining semantically global consistency. Benefiting from the visual guidance, FlexiTex produces quantitatively and qualitatively sound results, demonstrating its potential to advance texture generation for real-world applications.
Comment: Project Page: https://flexitex.github.io/FlexiTex/
Published: 2024

7. Machine learning approach for vibronically renormalized electronic band structures

Author: Aryal, Niraj, Zhang, Sheng, Yin, Weiguo, and Chern, Gia-Wei
Subjects: Condensed Matter - Materials Science, Computer Science - Machine Learning
Abstract: We present a machine learning (ML) method for efficient computation of vibrational thermal expectation values of physical properties from first principles. Our approach is based on the non-perturbative frozen phonon formulation, in which a stochastic Monte Carlo algorithm is employed to sample configurations of nuclei in a supercell at finite temperatures based on a first-principles phonon model. A deep-learning neural network is trained to accurately predict physical properties associated with sampled phonon configurations, thus bypassing the time-consuming ab initio calculations. To incorporate the point-group symmetry of the electronic system into the ML model, group-theoretical methods are used to develop a symmetry-invariant descriptor for phonon configurations in the supercell. We apply our ML approach to compute the temperature-dependent electronic energy gap of silicon based on density functional theory (DFT). We show that, with fewer than a hundred DFT calculations for training the neural network model, an order-of-magnitude larger number of samples can be used for the computation of the vibrational thermal expectation values. Our work highlights the promising potential of ML techniques for finite-temperature first-principles electronic structure methods.
Comment: 17 pages, 7 figures
Published: 2024

8. Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning

Author: Cheng, Ke, Wang, Zhi, Hu, Wen, Yang, Tiannuo, Li, Jianguo, and Zhang, Sheng
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: A service-level objective (SLO) is a target performance metric of service that cloud vendors aim to ensure. Delivering optimized SLOs can enhance user satisfaction and improve the competitiveness of cloud vendors. As large language models (LLMs) are gaining increasing popularity across various fields, it is of great significance to optimize SLOs for LLM inference services. In this paper, we observe that adjusting the parameters of LLM inference engines can improve service performance, and the optimal parameter configurations of different services are different. Therefore, we propose SCOOT, an automatic performance tuning system to optimize SLOs for each LLM inference service by tuning the parameters of the inference engine. We first propose a generalized formulation of the tuning problem to handle various objectives and constraints between parameters, and SCOOT exploits the Bayesian optimization (BO) technique to resolve the problem via exploration and exploitation. Moreover, SCOOT adopts a random forest to learn hidden constraints during the tuning process to mitigate invalid exploration. To improve the tuning efficiency, SCOOT utilizes the parallel suggestion to accelerate the tuning process. Extensive experiments demonstrate that SCOOT can significantly outperform existing tuning techniques in SLO optimization while greatly improving the tuning efficiency.
Published: 2024

9. Realization of Conditional Operations through Transition Pathway Engineering

Author: Zhang, Sheng, Duan, Peng, Wang, Yun-Jie, Wang, Tian-Le, Wang, Peng, Zhao, Ren-Ze, Yang, Xiao-Yan, Zhao, Ze-An, Guo, Liang-Liang, Chen, Yong, Zhang, Hai-Feng, Du, Lei, Tao, Hao-Ran, Li, Zhi-Fei, Wu, Yuan, Jia, Zhi-Long, Kong, Wei-Cheng, Chen, Zhao-Yun, Wu, Yu-Chun, and Guo, Guo-Ping
Subjects: Quantum Physics
Abstract: In the NISQ era, achieving large-scale quantum computing demands compact circuits to mitigate decoherence and gate error accumulation. Quantum operations with diverse degrees of freedom hold promise for circuit compression, but conventional approaches encounter challenges in simultaneously adjusting multiple parameters. Here, we propose a transition composite gate (TCG) scheme grounded on state-selective transition path engineering, enabling more expressive conditional operations. We experimentally validate a controlled unitary (CU) gate as an example, with independent and continuous parameters. By adjusting the parameters of the $\rm X^{12}$ gate, we obtain the CU family with a fidelity range of 95.2% to 99.0%, as measured by quantum process tomography (QPT). To demonstrate the capability of circuit compression, we use the TCG scheme to prepare 3-qubit Greenberger-Horne-Zeilinger (GHZ) and W states, with fidelities of 96.77% and 95.72%, respectively. TCG reduces circuit depth by about 40% and 44%, respectively, compared with the use of CZ gates only. Moreover, we show that short-path TCG (SPTCG) can further reduce the time cost of the state-preparation circuit. The TCG scheme exhibits advantages in certain quantum circuits and shows significant potential for large-scale quantum algorithms.
Comment: 21 pages, 12 figures
Published: 2024

10. Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method

Author: Wang, Shiyi, Nan, Yang, Zhang, Sheng, Felder, Federico, Xing, Xiaodan, Fang, Yingying, Del Ser, Javier, Walsh, Simon L F, and Yang, Guang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In pulmonary tracheal segmentation, the scarcity of annotated data is a prevalent issue in medical segmentation. Additionally, Deep Learning (DL) methods face challenges: the opacity of 'black box' models and the need for performance enhancement. Our Human-Computer Interaction (HCI) based models (RS_UNet, LC_UNet, UUNet, and WD_UNet) address these challenges by combining diverse query strategies with various DL models. We train four HCI models and repeat these steps: (1) Query Strategy: The HCI models select samples that provide the most additional representative information when labeled in each iteration and identify unlabeled samples with the greatest predictive disparity using Wasserstein Distance, Least Confidence, Entropy Sampling, and Random Sampling. (2) Central line correction: Selected samples are used for expert correction of system-generated tracheal central lines in each training round. (3) Update training dataset: Experts update the training dataset after each DL model's training epoch, enhancing the trustworthiness and performance of the models. (4) Model training: The HCI model is trained using the updated dataset and an enhanced UNet version. Experimental results confirm the effectiveness of these HCI-based approaches, showing that WD-UNet, LC-UNet, UUNet, and RS-UNet achieve comparable or superior performance to state-of-the-art DL models. Notably, WD-UNet achieves this with only 15%-35% of the training data, reducing physician annotation time by 65%-85%.
Published: 2024
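
Illustrative note (not part of the catalog record): the abstract above names Least Confidence as one of its query strategies. The sketch below shows the generic least-confidence rule on a toy classification pool, assuming each unlabeled sample already has predicted class probabilities. The sample ids, the `pool` dictionary, and the function names are hypothetical, and the paper's own segmentation-specific scoring is not reproduced here.

```python
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_queries(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k sample ids whose predictions are least confident."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical unlabeled pool: sample id -> predicted class probabilities.
pool = {
    "scan_a": [0.95, 0.05],
    "scan_b": [0.55, 0.45],
    "scan_c": [0.70, 0.30],
}

# The two most uncertain samples would be sent to the expert for labeling.
print(select_queries(pool, k=2))
```

Samples with a nearly even probability split rank first, so annotation effort goes to the cases the model is least sure about.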

11. From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

Author: Xu, Nan, Wang, Fei, Zhang, Sheng, Poon, Hoifung, and Chen, Muhao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Motivated by the in-context learning (ICL) capabilities of large language models (LLMs), multimodal LLMs with an additional visual modality also exhibit similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively little work has been done to investigate the principles behind how and why multimodal ICL works. We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. Through perturbations over different modality information, we show that modalities matter differently across tasks in multimodal ICL. Guided by task-specific modality impact, we recommend modality-driven demonstration strategies to boost ICL performance. We also find that models may follow inductive biases from multimodal ICL even if they are rarely seen in or contradict semantic priors from pretraining data. Our principled analysis provides a comprehensive way of understanding the role of demonstrations in multimodal in-context learning, and sheds light on effectively improving multimodal ICL on a wide range of tasks.
Published: 2024

12. Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

Author: Zhang, Sheng, Nan, Yang, Fang, Yingying, Wang, Shiyi, Xing, Xiaodan, Gao, Zhifan, and Yang, Guang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs, e.g., bronchioles & arterioles, are easily lost during the recycled down/up-sample procedure, causing severe discontinuity issues. Motivated by these problems, this paper introduces an effective lung organ segmentation method called the Fuzzy Attention-based Border Rendering (FABR) network. Since fuzzy logic can handle the uncertainty in feature extraction, the fusion of deep networks and fuzzy sets should be a viable solution for better performance. Meanwhile, unlike prior top-tier methods that operate on all regular dense points, our FABR depicts lung organ regions as cube-trees, focusing only on recycle-sampled border vulnerable points, rendering the severely discontinuous, false-negative/positive organ regions with a novel Global-Local Cube-tree Fusion (GLCF) module. Experimental results on four challenging datasets of airway & artery demonstrate that our method achieves significantly favorable performance.
Comment: MICCAI 2024
Published: 2024

13. Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

Author: Cheng, Ke, Hu, Wen, Wang, Zhi, Peng, Hongen, Li, Jianguo, and Zhang, Sheng
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Large language models (LLMs) iteratively generate text token by token, with memory usage increasing with the length of generated token sequences. The unpredictability of generation lengths makes it difficult to estimate the time and memory needed to process requests, posing a challenge for effective request scheduling. Conventional sequence-level scheduling (SLS) serves requests in a first-come first-served (FCFS) manner with static batching where requests with short generation lengths are delayed until those with long ones have finished generation, which hurts computational efficiency. Besides, to avoid out-of-memory (OOM) errors, SLS batches requests with a small batch size, which limits throughput. Recently proposed iteration-level scheduling (ILS) enhances computational efficiency with continuous batching to return completed requests promptly and dynamically add new requests for processing. However, many ILS schedulers limit the number of parallel-processing requests to avoid OOM errors while achieving a fast inference speed, which compromises throughput. Moreover, existing SLS and ILS schedulers fail to balance the workload across multiple deployed LLM instances. To tackle these challenges, we propose slice-level scheduling (SCLS). By splitting the predefined maximal generation length limit into slices and serving batches slice by slice, it provides a precise range of serving time and memory usage for batched requests, laying the foundation for effective scheduling. Experiments confirm that compared with SLS and ILS schedulers, SCLS can improve throughput by up to 315.8% and greatly mitigate load imbalance with the proposed batching and offloading algorithms.
Comment: 13 pages, 22 figures
Published: 2024

14. Super-resolution 3D tomography of vector near-fields in dielectric resonators

Author: Zhu, Bingbing, Cai, Qingnan, Liu, Yaxin, Zhang, Sheng, Liu, Weifeng, He, Qiong, Zhou, Lei, and Tao, Zhensheng
Subjects: Physics - Optics
Abstract: All-dielectric optical resonators, exhibiting exotic near-field distributions upon excitation, have emerged as low-loss, versatile and highly adaptable components in nanophotonic structures for manipulating electromagnetic waves and enhancing light-matter interactions. However, achieving experimental full three-dimensional characterization of near-fields within dielectric materials poses significant challenges. Here, we develop a novel technique using high-order sideband generation to image near-field wave patterns inside dielectric optical resonators. By exploiting the phase-sensitivity of various harmonic orders that enables the detection of near-field distributions at distinct depths, we realize three-dimensional tomographic and super-resolution near-field imaging inside a micrometer-thick silicon anapole resonator. Furthermore, our method offers high-contrast polarization sensitivity and phase-resolving capability, providing comprehensive vectorial near-field information. Our approach can potentially be applied to diverse dielectric metamaterials, and may become a valuable tool for comprehensive characterization of near-field wave phenomena within dielectric materials.
Comment: 26 pages, 4 figures
Published: 2024

15. mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Author: Wang, Fei, Zhou, Wenxuan, Huang, James Y., Xu, Nan, Zhang, Sheng, Poon, Hoifung, and Chen, Muhao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. Moreover, we introduce a reward anchor that forces the reward to be positive for chosen responses, thereby avoiding the decrease in their likelihood -- an intrinsic problem of relative preference optimization. Experiments on two multimodal LLMs of different sizes and three widely used benchmarks demonstrate that mDPO effectively addresses the unconditional preference problem in multimodal preference optimization and significantly improves model performance, particularly in reducing hallucination.
Comment: Accepted to EMNLP 2024 Main Conference. Project website: https://feiwang96.github.io/mDPO
Published: 2024
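
Illustrative note (not part of the catalog record): for readers unfamiliar with the objective the abstract above builds on, the standard (text-only) DPO loss is commonly written as follows. The symbols are the usual ones and are assumptions here, not taken from this record: $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ a frozen reference model, $y_w$ and $y_l$ the chosen and rejected responses to prompt $x$, $\sigma$ the logistic function, and $\beta$ a temperature. The additional image-preference and reward-anchor terms of mDPO are not reproduced here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Because the loss depends only on the difference between the two log-ratio terms, it can decrease even when the likelihood of the chosen response $y_w$ falls, provided the likelihood of $y_l$ falls faster. This is the relative-preference issue the abstract refers to.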

16. MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Author: Wang, Fei, Fu, Xingyu, Huang, James Y., Li, Zekun, Liu, Qin, Liu, Xiaogeng, Ma, Mingyu Derek, Xu, Nan, Zhou, Wenxuan, Zhang, Kai, Yan, Tianyi Lorena, Mo, Wenjie Jacky, Liu, Hsiang-Hui, Lu, Pan, Li, Chunyuan, Xiao, Chaowei, Chang, Kai-Wei, Roth, Dan, Zhang, Sheng, Poon, Hoifung, and Chen, Muhao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a pairwise manner, where each standard instance is paired with an unanswerable variant that has minimal semantic differences, to enable reliable assessment. Evaluated on 20 recent multimodal LLMs, our results reveal that even the best-performing models like GPT-4o and Gemini Pro find it challenging to solve MuirBench, achieving 68.0% and 49.3% in accuracy, respectively. Open-source multimodal LLMs trained on single images can hardly generalize to multi-image questions, hovering below 33.3% in accuracy. These results highlight the importance of MuirBench in encouraging the community to develop multimodal LLMs that can look beyond a single image, suggesting potential pathways for future improvements.
Comment: typos corrected, references added, Project Page: https://muirbench.github.io/
Published: 2024

17. Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers

Author: Chen, Zhao-Yun, Ma, Teng-Yang, Ye, Chuang-Chao, Xu, Liang, Tan, Ming-Yang, Zhuang, Xi-Ning, Xu, Xiao-Fan, Wang, Yun-Jie, Sun, Tai-Ping, Chen, Yong, Du, Lei, Guo, Liang-Liang, Zhang, Hai-Feng, Tao, Hao-Ran, Wang, Tian-Le, Yang, Xiao-Yan, Zhao, Ze-An, Wang, Peng, Zhang, Sheng, Zhang, Chi, Zhao, Ren-Ze, Jia, Zhi-Long, Kong, Wei-Cheng, Dou, Meng-Han, Wang, Jun-Chao, Liu, Huan-Yu, Xue, Cheng, Zhang, Peng-Jun-Yi, Huang, Sheng-Hong, Duan, Peng, Wu, Yu-Chun, and Guo, Guo-Ping
Subjects: Physics - Computational Physics, Quantum Physics
Abstract: Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method, including an iterative method, "Iterative-QLS", that suppresses errors in the quantum linear solver, and a subspace method to scale the solution to a larger size. We implement our method on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than 0.2%, and the unsteady acoustic wave simulation solved a 5043-dimensional matrix. We emphasize the utilization of the quantum-classical hybrid approach in applications of near-term quantum computers. By adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, our method paves the way for practical applications of near-term quantum computers in computational science.
Comment: 31 pages, 10 figures
Published: 2024

18. Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

Author: Cheng, Ke, Hu, Wen, Wang, Zhi, Du, Peng, Li, Jianguo, and Zhang, Sheng
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. First, the generation lengths of requests in a batch vary, and requests with short generation lengths must wait for requests with long generation lengths to finish during the batch serving procedure. Second, requests with longer generation lengths consume more memory during serving. Without knowing the generation lengths of batched requests, the batch size is always set small to avoid the out-of-memory (OOM) error, thus preventing the GPU from being fully utilized. In this paper, we find that a significant number of popular applications in the LMaaS scenario have a positive correlation between the generation length and the length of raw user input. Based on this observation, we propose Magnus, which can accurately predict the request generation length with the user input length, application-level, and user-level semantic features. Accordingly, Magnus can achieve high request throughput by batching requests of similar generation lengths together with adaptive batch sizes. Besides, Magnus can also schedule batches with the highest response ratio next (HRRN) policy to reduce request response time. Experiments conducted on our testbed show that Magnus improves request throughput by up to 234% and reduces response time by up to 89.7% compared to baselines.
Comment: 12 pages, 14 figures
Published: 2024
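
Illustrative note (not part of the catalog record): the abstract above mentions the highest response ratio next (HRRN) policy. The sketch below shows the textbook form of that policy, where a job's response ratio is (waiting time + estimated service time) / estimated service time and the job with the largest ratio runs next. The `Batch` class, its fields, and the numbers are hypothetical; how Magnus itself estimates serving time is not described in this record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    name: str
    arrival: float      # time the batch became ready, in seconds (hypothetical)
    est_service: float  # estimated serving time, in seconds (hypothetical)


def response_ratio(batch: Batch, now: float) -> float:
    """Textbook HRRN ratio: (waiting time + service time) / service time."""
    wait = now - batch.arrival
    return (wait + batch.est_service) / batch.est_service


def pick_next(queue: List[Batch], now: float) -> Batch:
    """Choose the queued batch with the highest response ratio."""
    return max(queue, key=lambda b: response_ratio(b, now))


queue = [
    Batch("long", arrival=0.0, est_service=20.0),
    Batch("short", arrival=6.0, est_service=2.0),
]

# At t=10 the short batch has ratio (4 + 2) / 2 = 3.0, the long one (10 + 20) / 20 = 1.5.
print(pick_next(queue, now=10.0).name)
```

Short jobs gain priority quickly, while the growing waiting term keeps long jobs from being postponed indefinitely.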

19. GLINT-RU: Gated Lightweight Intelligent Recurrent Units for Sequential Recommender Systems

Author: Zhang, Sheng, Wang, Maolin, and Zhao, Xiangyu
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: In the rapidly evolving field of artificial intelligence, transformer-based models have gained significant attention in the context of Sequential Recommender Systems (SRSs), demonstrating remarkable proficiency in capturing user-item interactions. However, such attention-based frameworks result in substantial computational overhead and extended inference time. To address this problem, this paper proposes a novel efficient sequential recommendation framework, GLINT-RU, that leverages a dense selective Gated Recurrent Unit (GRU) module to accelerate inference, a pioneering effort to further exploit the potential of efficient GRU modules in SRSs. The GRU module lies at the heart of GLINT-RU, playing a crucial role in substantially reducing both inference time and GPU memory usage. Through the integration of a dense selective gate, our framework adeptly captures both long-term and short-term item dependencies, enabling the adaptive generation of item scores. GLINT-RU further integrates a mixing block, enriching it with global user-item interaction information to bolster recommendation quality. Moreover, we design a gated Multi-layer Perceptron (MLP) for our framework where the information is deeply filtered. Extensive experiments on three datasets are conducted to highlight the effectiveness and efficiency of GLINT-RU. Our GLINT-RU achieves exceptional inference speed and prediction accuracy, outperforming existing baselines based on Recurrent Neural Network (RNN), Transformer, MLP and State Space Model (SSM). These results establish a new standard in sequential recommendation, highlighting the potential of GLINT-RU as a renewing approach in the realm of recommender systems.
Published: 2024

20. MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Author: Ness, Robert Osazuwa, Matton, Katie, Helm, Hayden, Zhang, Sheng, Bajwa, Junaid, Priebe, Carey E., and Horvitz, Eric
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.7
Abstract: Large language models (LLM) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help the LLM generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong assumptions about patient characteristics presented in the MedQA benchmark. Successful "attacks" modify a benchmark item in ways that would be unlikely to fool a medical expert but nonetheless "trick" the LLM into changing from a correct to an incorrect answer. Further, we present a permutation test technique that can ensure a successful attack is statistically significant. We show how to use performance on a "MedFuzzed" benchmark, as well as individual successful attacks. The methods show promise at providing insights into the ability of an LLM to operate robustly in more realistic settings., Comment: 9 pages, 3 figures, 2 algorithms, appendix
Published: 2024
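
The abstract above mentions a permutation test for checking that a successful attack is statistically significant. The sketch below shows a generic two-sample permutation test on the difference in means. It is not the MedFuzz procedure, whose details the abstract does not give; the function name, the 0/1 correctness encoding, and the example data are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Generic two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: 1 = correct answer, 0 = incorrect answer.
original_runs = [1] * 10   # answers on the unmodified item
attacked_runs = [0] * 10   # answers on the modified item
p_value = permutation_p_value(original_runs, attacked_runs)
print(p_value)
```

A small p-value here indicates that the drop in accuracy is unlikely to arise from random relabelling of the runs alone.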

21. Explore the properties of $\Lambda(1670)$ in the Cabibbo-favored process $\Lambda^+_c \to p K^- \pi^+$ decay

Author: Zhang, Sheng-Chao, Duan, Man-Yu, Lyu, Wen-Tao, Wang, Guan-Ying, Zhu, Jing-Yu, and Wang, En
Subjects: High Energy Physics - Phenomenology
Abstract: Recently, the Belle and LHCb Collaborations have measured the $\Lambda^+_c \to p K^- \pi^+$ decay and reported the $p K^-$ invariant mass distribution, which shows a clear cusp structure around the $\eta \Lambda$ threshold. In this work, we have analyzed this process by considering the triangle mechanism and the $S$-wave pseudoscalar meson-octet baryon interactions within the chiral unitary approach, which dynamically generate the $\Lambda(1670)$. Our results are in good agreement with the Belle measurements, which implies that the cusp structure around $\eta\Lambda$ threshold could be associated with the $\Lambda(1670)$ with the molecular nature.
Published: 2024

22. Kinetics of orbital ordering in cooperative Jahn-Teller models: Machine-learning enabled large-scale simulations

Author: Ghosh, Supriyo, Zhang, Sheng, Cheng, Chen, and Chern, Gia-Wei
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science, Computer Science - Machine Learning
Abstract: We present a scalable machine learning (ML) force-field model for the adiabatic dynamics of cooperative Jahn-Teller (JT) systems. Large scale dynamical simulations of the JT model also shed light on the orbital ordering dynamics in colossal magnetoresistance manganites. The JT effect in these materials describes the distortion of local oxygen octahedra driven by a coupling to the orbital degrees of freedom of $e_g$ electrons. An effective electron-mediated interaction between the local JT modes leads to a structural transition and the emergence of long-range orbital order at low temperatures. Assuming the principle of locality, a deep-learning neural-network model is developed to accurately and efficiently predict the electron-induced forces that drive the dynamical evolution of JT phonons. A group-theoretical method is utilized to develop a descriptor that incorporates the combined orbital and lattice symmetry into the ML model. Large-scale Langevin dynamics simulations, enabled by the ML force-field models, are performed to investigate the coarsening dynamics of the composite JT distortion and orbital order after a thermal quench. The late-stage coarsening of orbital domains exhibits pronounced freezing behaviors which are likely related to the unusual morphology of the domain structures. Our work highlights a promising avenue for multi-scale dynamical modeling of correlated electron systems., Comment: 17 pages, 11 figures
Published: 2024

23. When AI Eats Itself: On the Caveats of AI Autophagy

Author: Xing, Xiaodan, Shi, Fadong, Huang, Jiahao, Wu, Yinzhe, Nan, Yang, Zhang, Sheng, Fang, Yingying, Roberts, Mike, Schönlieb, Carola-Bibiane, Del Ser, Javier, and Yang, Guang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Generative Artificial Intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimise training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimise outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend, known as the AI autophagy phenomenon, suggests a future where generative AI systems may increasingly consume their own outputs without discernment, raising concerns about model performance, reliability, and ethical implications. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? To address these research questions, this study examines the existing literature, delving into the consequences of AI autophagy, analyzing the associated risks, and exploring strategies to mitigate its impact. Our aim is to provide a comprehensive perspective on this phenomenon advocating for a balanced approach that promotes the sustainable development of generative AI technologies in the era of large models.
Published: 2024

24. Evidence of the low-lying baryon $\Sigma^*(1/2^-)$ in the process $\Lambda_c^+\to \eta\pi^+\Lambda$

Author: Lyu, Wen-Tao, Zhang, Sheng-Chao, Wang, Guan-Ying, Wu, Jia-Jun, Wang, En, Geng, Li-Sheng, and Xie, Ju-Jun
Subjects: High Energy Physics - Phenomenology
Abstract: Motivated by the Belle measurements of the process $\Lambda_c^+\to \eta\pi^+\Lambda$, we investigate this process by considering the contributions from the $\Lambda(1670)$, $a_0(980)$, and $\Sigma(1385)$. In addition, we also consider the predicted low-lying baryon $\Sigma^*(1/2^-)$. Our results involving the $\Sigma^*(1/2^-)$ are favored by fitting to the Belle data of the $\eta\Lambda$ and $\pi^+\Lambda$ invariant mass distributions. Furthermore, we predict the $\eta\pi^+$ invariant mass distribution and the angular distribution $d\Gamma/d{\rm cos}\theta$, which are significantly different depending on whether or not the contribution from the $\Sigma^*(1/2^-)$ is considered. Finally, we show that, with the contribution from the $\Sigma^*(1/2^-)$, the calculated Dalitz plot agrees with the Belle measurements. Future precise measurements of the process $\Lambda_c^+\to \eta\pi^+\Lambda$ could shed further light on the existence of the low-lying $\Sigma^*(1/2^-)$., Comment: 11 pages, 14 figures, comments are welcome
Published: 2024
Full Text: View/download PDF

25. Canonical interpretation of the newly observed $J^P =1^+$ structure $X(2085)$

Author: Li, Tian-Ge, Zhang, Sheng-Chao, Wang, Guan-Ying, and Lü, Qi-Fang
Subjects: High Energy Physics - Phenomenology
Abstract: Inspired by the newly observed $X(2085)$ by the BESIII Collaboration, we study the strong decay behaviors of excited axialvector strange mesons within the quark pair creation model. Our results indicate that the $K_1(1793)/K_1(1861)$ can be regarded as the same $K_1(2P)$ state, and the $K_1(1911)$ is assigned as the $K_1(2P^\prime)$ state. Considering the mass, spin-parity, and decay behaviors, we interpret the newly observed $X(2085)$ as the radially excited $K_1(3P)$ state, which mainly decays into the $\rho(1450) K$, $\omega(1420)K$, $\pi K^*(1410)$, $\rho K_1(1270)$, and $\rho K^*(892)$ final states. Also, the width of $K_1(3P^\prime)$ state is predicted to be about 300 MeV, which can be searched for by future experiments. We expect that present calculations can help us to better understand the nature of the $X(2085)$ structure.
Published: 2024

26. Offset Unlearning for Large Language Models

Author: Huang, James Y., Zhou, Wenxuan, Wang, Fei, Morstatter, Fred, Zhang, Sheng, Poon, Hoifung, and Chen, Muhao
Subjects: Computer Science - Computation and Language
Abstract: Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, harmful, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previous unlearning techniques are either not applicable to black-box LLMs due to required access to model internal weights, or violate data protection principles by retaining sensitive data for inference-time correction. We propose $\delta$-unlearning, an offset unlearning framework for black-box LLMs. Instead of tuning the black-box LLM itself, $\delta$-unlearning learns the logit offset needed for unlearning by contrasting the logits from a pair of smaller models. Experiments demonstrate that $\delta$-unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks. $\delta$-unlearning also effectively incorporates different unlearning algorithms, making our approach a versatile solution to adapting various existing unlearning algorithms to black-box LLMs.
Published: 2024
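
The logit-offset idea in the abstract above can be illustrated with a small sketch. The abstract states only that the offset is learned by contrasting the logits of a pair of smaller models; adding that offset to the black-box logits, as done here, is an assumption made for illustration. All function names and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def offset_logits(blackbox, small_base, small_unlearned):
    """Shift black-box logits by the difference between the two small models.

    Assumed combination rule: blackbox + (small_unlearned - small_base).
    """
    return [b + (u - s) for b, s, u in zip(blackbox, small_base, small_unlearned)]


# Hypothetical logits over a three-token vocabulary; token 0 is the content to forget.
blackbox = [2.0, 1.0, 0.5]
small_base = [1.5, 0.5, 0.0]
small_unlearned = [-0.5, 0.5, 0.0]

combined = offset_logits(blackbox, small_base, small_unlearned)
probs = softmax(combined)
print(combined, probs)
```

Only the small model pair needs to be trainable in this arrangement; the black-box model contributes logits and is never updated.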

27. Fast delivery of heralded atom-photon quantum correlation over 12km fiber through multiplexing enhancement

Author: Zhang, Sheng, Shi, Jixuan, Liang, Yibo, Sun, Yuedong, Wu, Yukai, Duan, Luming, and Pu, Yunfei
Subjects: Quantum Physics
Abstract: Distributing quantum entanglement between distant parties is a significant but difficult task in quantum information science, as it can enable numerous applications but suffers from exponential decay in the quantum channel. Quantum repeater is one of the most promising approaches towards this goal. In a quantum repeater protocol, it is essential that the entanglement generation speed within each elementary link is faster than the memory decoherence rate, to enable the scale-up of the quantum repeater by connecting neighboring repeater segments. This stringent requirement has not been implemented over a fiber of metropolitan scale so far. As a step towards this challenging goal, in this work we experimentally realize multiplexing-enhanced generation of heralded atom-photon quantum correlation over a 12km fiber. We excite the memory modes in a multiplexed quantum memory successively to generate 280 pairs of atom-photon quantum correlations with a train of photonic time-bin pulses filling the long fiber. After successful detection of a heralding signal, the excited memory mode can be identified and retrieved into idler photons on demand with either fixed or variable storage time. With the multiplexing enhancement, the heralding rate of atom-photon correlation can reach 1.95kHz, and the ratio between the quantum correlation generation rate to memory decoherence rate can be improved to 0.46 for a fiber length of 12km, which is so far the best for long fiber length (>10km) to our knowledge. This work therefore constitutes an important step towards the realization of a large-scale quantum repeater network., Comment: 13 pages, 10 figures
Published: 2024

28. Coarsening of chiral domains in itinerant electron magnets: A machine learning force field approach

Author: Fan, Yunhao, Zhang, Sheng, and Chern, Gia-Wei
Subjects: Condensed Matter - Strongly Correlated Electrons, Computer Science - Machine Learning
Abstract: Frustrated itinerant magnets often exhibit complex noncollinear or noncoplanar magnetic orders which support topological electronic structures. A canonical example is the anomalous quantum Hall state with a chiral spin order stabilized by electron-spin interactions on a triangular lattice. While a long-range magnetic order cannot survive thermal fluctuations in two dimensions, the chiral order which results from the breaking of a discrete Ising symmetry persists even at finite temperatures. We present a scalable machine learning (ML) framework to model the complex electron-mediated spin-spin interactions that stabilize the chiral magnetic domains in a triangular lattice. Large-scale dynamical simulations, enabled by the ML force-field models, are performed to investigate the coarsening of chiral domains after a thermal quench. While the chiral phase is described by a broken $Z_2$ Ising-type symmetry, we find that the characteristic size of chiral domains increases linearly with time, in stark contrast to the expected Allen-Cahn domain growth law for a non-conserved Ising order parameter field. The linear growth of the chiral domains is attributed to the orientational anisotropy of domain boundaries. Our work also demonstrates the promising potential of ML models for large-scale spin dynamics of itinerant magnets., Comment: 16 pages, 8 figures
Published: 2024
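
For reference, the contrast drawn in the abstract above can be written in terms of the characteristic domain size $L(t)$. The Allen-Cahn exponent of $1/2$ is the standard result for a non-conserved scalar order parameter; the linear law is the behaviour the abstract reports. Prefactors are omitted and the notation $L(t)$ is an assumption of this note.

```latex
L_{\text{Allen-Cahn}}(t) \sim t^{1/2}
\qquad \text{versus} \qquad
L_{\text{chiral}}(t) \sim t
```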

29. Quantum Advantage of One-Way Squeezing in Enhancing Weak-Force Sensing

Author: Wang, Jie, Zhang, Qian, Jiao, Ya-Feng, Zhang, Sheng-Dian, Lu, Tian-Xiang, Li, Zhipeng, Qiu, Cheng-Wei, and Jing, Hui
Subjects: Quantum Physics
Abstract: Cavity optomechanical (COM) sensors, featuring efficient light-motion couplings, have been widely used for ultrasensitive measurements of various physical quantities ranging from displacements to accelerations or weak forces. Previous works, however, have mainly focused on reciprocal COM systems. Here, we propose how to further improve the performance of quantum COM sensors by breaking reciprocal symmetry in the purely quantum regime. Specifically, we consider a spinning COM resonator and show that by selectively driving it in opposite directions, highly nonreciprocal optical squeezing can emerge, which in turn provides an efficient way to surpass the standard quantum limit that otherwise exists in conventional reciprocal devices. Our work confirms that breaking reciprocal symmetry, already achieved in diverse systems well beyond spinning systems, can serve as a new strategy to further enhance the abilities of advanced quantum sensors, for applications ranging from testing fundamental physical laws to practical quantum metrology., Comment: 7 pages, 3 figures
Published: 2024

30. Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Author: Chaves, Juan Manuel Zambrano, Huang, Shih-Cheng, Xu, Yanbo, Xu, Hanwen, Usuyama, Naoto, Zhang, Sheng, Wang, Fei, Xie, Yujia, Khademi, Mahmoud, Yang, Ziyi, Awadalla, Hany, Gong, Julia, Hu, Houdong, Yang, Jianwei, Li, Chunyuan, Gao, Jianfeng, Gu, Yu, Wong, Cliff, Wei, Mu, Naumann, Tristan, Chen, Muhao, Lungren, Matthew P., Chaudhari, Akshay, Yeung-Levy, Serena, Langlotz, Curtis P., Wang, Sheng, and Poon, Hoifung
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LlaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
Published: 2024

31. Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries

Author: Gero, Zelalem, Singh, Chandan, Xie, Yiqing, Zhang, Sheng, Naumann, Tristan, Gao, Jianfeng, and Poon, Hoifung
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Summarizing clinical text is crucial in health decision-support and clinical research. Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation, especially in safety-critical domains such as health. Holistically evaluating text summaries is challenging because they may contain unsubstantiated information. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. It decomposes the evaluation process into a grounded procedure that uses an LLM for relatively simple structuring and scoring tasks, rather than the full task of holistic summary evaluation. Experiments show that AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization. Additionally, AS yields interpretations in the form of a short text span corresponding to each output, which enables efficient human auditing, paving the way towards trustworthy evaluation of clinical information in resource-constrained scenarios. We release our code, prompts, and an open-source benchmark at https://github.com/microsoft/attribute-structuring., Comment: 4 pages
Published: 2024

32. Hyperon semileptonic decays in QCD sum rules

Author: Zhang, Sheng-Qi, Zhang, Xuan-Heng, and Qiao, Cong-Feng
Subjects: High Energy Physics - Phenomenology
Abstract: We investigate the hyperon semileptonic decays within the framework of QCD sum rules. The flavor $ SU(3) $ symmetry breaking effects are analyzed via the relevant form factors and corresponding branching fractions. Employing the $ z $-series parameterization to capture the $ q^2 $ dependence of form factors, we calculate the hyperon semileptonic decay rates and confront them with the recent experimental measurements. Moreover, we calculate as well the non-standard tensor form factors, which involve certain new physics beyond the standard model., Comment: 18 pages, 2 figures
Published: 2024
Full Text: View/download PDF

33. Make it more specific: A novel uncertainty based airway segmentation application on 3D U-Net and its variants

Author: Wang, Shiyi, Nan, Yang, Felder, Federico N, Zhang, Sheng, Walsh, Simon LF, and Yang, Guang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Each medical segmentation task should be considered with a specific AI algorithm based on its scenario so that the most accurate prediction model can be obtained. The most popular algorithms in medical segmentation, 3D U-Net and its variants, can directly implement the task of lung trachea segmentation, but its failure to consider the special tree-like structure of the trachea suggests that there is much room for improvement in its segmentation accuracy. Therefore, a research gap exists because a great amount of state-of-the-art DL algorithms are vanilla 3D U-Net structures, which do not introduce the various performance-enhancing modules that come with special natural image modality in lung airway segmentation. In this paper, we proposed two different network structures Branch-Level U-Net (B-UNet) and Branch-Level CE-UNet (B-CE-UNet) which are based on U-Net structure and compared the prediction results with the same dataset. Specially, both of the two networks add branch loss and central line loss to learn the feature of fine branch endings of the airways. Uncertainty estimation algorithms are also included to attain confident predictions and thereby, increase the overall trustworthiness of our whole model. In addition, predictions of the lung trachea based on the maximum connectivity rate were calculated and extracted during post-processing for segmentation refinement and pruning.
Published: 2024

34. EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems

Author: Zhang, Sheng, Wang, Maolin, Zhao, Yao, Zhuang, Chenyi, Gu, Jinjie, Guo, Ruocheng, Zhao, Xiangyu, Zhang, Zijian, and Yin, Hongzhi
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: In this age where data is abundant, the ability to distill meaningful insights from the sea of information is essential. Our research addresses the computational and resource inefficiencies that current Sequential Recommender Systems (SRSs) suffer from, especially those employing attention-based models like SASRec. These systems are designed for next-item recommendations in various applications, from e-commerce to social networks. However, such systems suffer from substantial computational costs and resource consumption during the inference stage. To tackle these issues, our research proposes a novel method that combines automatic pruning techniques with advanced model architectures. We also explore the potential of resource-constrained Neural Architecture Search (NAS), a technique prevalent in the realm of recommendation systems, to fine-tune models for reduced FLOPs, latency, and energy usage while retaining or even enhancing accuracy. The main contribution of our work is developing the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec). This approach aims to find optimal compact architectures for attention-based SRSs, ensuring accuracy retention. EASRec introduces data-aware gates that leverage historical information from the input data batch to improve the performance of the recommendation network. Additionally, it utilizes a dynamic resource constraint approach, which standardizes the search process and results in more appropriate architectures. The effectiveness of our methodology is validated through exhaustive experiments on three benchmark datasets, which demonstrate EASRec's superiority in SRSs. Our research sets a new standard for future exploration into efficient and accurate recommender systems, signifying a substantial advancement within this swiftly advancing field.
Published: 2024

35. T-Rex: Text-assisted Retrosynthesis Prediction

Author: Liu, Yifeng, Xu, Hanwen, Fang, Tangqi, Xi, Haocheng, Liu, Zixuan, Zhang, Sheng, Poon, Hoifung, and Wang, Sheng
Subjects: Computer Science - Computation and Language
Abstract: As a fundamental task in computational chemistry, retrosynthesis prediction aims to identify a set of reactants to synthesize a target molecule. Existing template-free approaches only consider the graph structures of the target molecule, which often cannot generalize well to rare reaction types and large molecules. Here, we propose T-Rex, a text-assisted retrosynthesis prediction approach that exploits pre-trained text language models, such as ChatGPT, to assist the generation of reactants. T-Rex first exploits ChatGPT to generate a description for the target molecule and rank candidate reaction centers based on both the description and the molecular graph. It then re-ranks these candidates by querying the descriptions for each reactant and examines which group of reactants can best synthesize the target molecule. We observed that T-Rex substantially outperformed graph-based state-of-the-art approaches on two datasets, indicating the effectiveness of considering text information. We further found that T-Rex outperformed the variant that only uses the ChatGPT-based description without the re-ranking step, demonstrating how our framework outperformed a straightforward integration of ChatGPT and graph information. Collectively, we show that text generated by pre-trained language models can substantially improve retrosynthesis prediction, opening up new avenues for exploiting ChatGPT to advance computational chemistry. The code can be found at https://github.com/lauyikfung/T-Rex.
Published: 2024

36. Aircraft Landing Time Prediction with Deep Learning on Trajectory Images

Author: Huang, Liping, Zhang, Sheng, Zhang, Yicheng, Zhang, Yi, and Yin, Yifang
Subjects: Computer Science - Machine Learning
Abstract: Aircraft landing time (ALT) prediction is crucial for air traffic management, especially for arrival aircraft sequencing on the runway. In this study, a trajectory image-based deep learning method is proposed to predict ALTs for the aircraft entering the research airspace that covers the Terminal Maneuvering Area (TMA). Specifically, the trajectories of all airborne arrival aircraft within the temporal capture window are used to generate an image with the target aircraft trajectory labeled as red and all background aircraft trajectories labeled as blue. The trajectory images contain various information, including the aircraft position, speed, heading, relative distances, and arrival traffic flows. It enables us to use state-of-the-art deep convolution neural networks for ALT modeling. We also use real-time runway usage obtained from the trajectory data and the external information such as aircraft types and weather conditions as additional inputs. Moreover, a convolution neural network (CNN) based module is designed for automatic holding-related featurizing, which takes the trajectory images, the leading aircraft holding status, and their time and speed gap at the research airspace boundary as its inputs. Its output is further fed into the final end-to-end ALT prediction. The proposed ALT prediction approach is applied to Singapore Changi Airport (ICAO Code: WSSS) using one-month Automatic Dependent Surveillance-Broadcast (ADS-B) data from November 1 to November 30, 2022. Experimental results show that by integrating the holding featurization, we can reduce the mean absolute error (MAE) from 82.23 seconds to 43.96 seconds, and achieve an average accuracy of 96.1\%, with 79.4\% of the prediction errors being less than 60 seconds., Comment: In 2023 13th SESAR Innovation Days (SIDS2023)
Published: 2024

37. Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

Author: Nan, Yang, Xing, Xiaodan, Wang, Shiyi, Tang, Zeyu, Felder, Federico N, Zhang, Sheng, Ledda, Roberta Eufrasia, Ding, Xiaoliu, Yu, Ruiqi, Liu, Weiping, Shi, Feng, Sun, Tianyang, Cao, Zehong, Zhang, Minghui, Gu, Yun, Zhang, Hanxiao, Gao, Jian, Wang, Pingyu, Tang, Wen, Yu, Pengxin, Kang, Han, Chen, Junqiang, Lu, Xing, Zhang, Boyu, Mamalakis, Michail, Prinzi, Francesco, Carlini, Gianluca, Cuneo, Lisa, Banerjee, Abhirup, Xing, Zhaohu, Zhu, Lei, Mesbah, Zacharia, Jain, Dhruv, Mayet, Tsiry, Yuan, Hongyu, Lyu, Qing, Qayyum, Abdul, Mazher, Moona, Wells, Athol, Walsh, Simon LF, and Yang, Guang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans were publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers., Comment: 19 pages
Published: 2023
Full Text: View/download PDF

38. VASP2KP: kp models and Lande g-factors from ab initio calculations

Author: Zhang, Sheng, Sheng, Haohao, Song, Zhi-Da, Liang, Chenhao, Jiang, Yi, Sun, Song, Wu, Quansheng, Weng, Hongming, Fang, Zhong, Dai, Xi, and Wang, Zhijun
Subjects: Condensed Matter - Materials Science
Abstract: The $k\cdot p$ method is significant in condensed matter physics for the compact and analytical Hamiltonian. In the presence of magnetic field, it is described by the effective Zeeman's coupling Hamiltonian with Land\'e $ g $-factors. Here, we develop an open-source package VASP2KP (including two parts: vasp2mat and mat2kp) to compute $k\cdot p$ parameters and Land\'e $g$-factors directly from the wavefunctions provided by the density functional theory (DFT) as implemented in Vienna ab initio Simulation Package (VASP). First, we develop a VASP patch vasp2mat to compute matrix representations of the generalized momentum operator $ \mathbf{\hat{\pi}}=\mathbf{\hat{p}}+\frac{1}{2mc^2}\left(\mathbf{\hat{s}}\times\nabla V(\mathbf{r})\right) $, spin operator $\mathbf{\hat{s}}$, time reversal operator $\hat{T}$ and crystalline symmetry operators $\hat{R}$ on the DFT wavefunctions. Second, we develop a python code mat2kp to obtain the unitary transformation $U$ that rotates the degenerate DFT basis towards the standard basis, and then automatically compute the $k\cdot p$ parameters and $g$-factors. The theory and the methodology behind VASP2KP are described in detail. The matrix elements of the operators are derived comprehensively and computed correctly within the projector augmented wave method. We apply this package to some materials, e.g., Bi$_2$Se$_3$, Na$_3$Bi, Te, InAs and 1H-TMD monolayers. The obtained effective model's dispersions are in good agreement with the DFT data around the specific wave vector, and the $g$-factors are consistent with experimental data. The VASP2KP package is available at https://github.com/zjwang11/VASP2KP.
Published: 2023

39. Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models

Author: Zhang, Sheng and Li, Hui
Subjects: Computer Science - Software Engineering
Abstract: Code pre-trained language models (CPLMs) have received great attention since they can benefit various tasks that facilitate software development and maintenance. However, CPLMs are trained on massive open-source code, raising concerns about potential data infringement. This paper launches the first study of detecting unauthorized code use in CPLMs, i.e., the Code Membership Inference (CMI) task. We design a framework, Buzzer, for different settings of CMI. Buzzer deploys several inference techniques, including distilling the target CPLM, ensemble inference, and unimodal and bimodal calibration. Extensive experiments show that CMI can be achieved with high accuracy using Buzzer. Hence, Buzzer can serve as a CMI tool and help protect intellectual property rights.
Published: 2023

40. Code Search Debiasing: Improve Search Results beyond Overall Ranking Performance

Author: Zhang, Sheng, Li, Hui, Wang, Yanlin, Wei, Zhao, Xiu, Yong, Wang, Juhong, and Ji, Rongong
Subjects: Computer Science - Computation and Language
Abstract: A code search engine is an essential tool in software development. Many code search methods have sprung up, focusing on the overall ranking performance of code search. In this paper, we study code search from another perspective by analyzing the bias of code search models. Biased code search engines provide a poor user experience, even though they show promising overall performance. Due to different development conventions (e.g., a preference for long queries or abbreviations), some programmers will find the engine useful, while others may find it hard to get desirable search results. To mitigate biases, we develop a general debiasing framework that employs reranking to calibrate search results. It can be easily plugged into existing engines and handle new code search biases discovered in the future. Experiments show that our framework can effectively reduce biases. Meanwhile, the overall ranking performance of code search improves after debiasing.
Comment: 11 pages
Published: 2023

41. Flexible generation of structured terahertz fields via programmable exchange-biased spintronic emitters

Author: Wang, Shunjia, Qin, Wentao, Guan, Tongyang, Liu, Jingyu, Cai, Qingnan, Zhang, Sheng, Zhou, Lei, Zhang, Yan, Wu, Yizheng, and Tao, Zhensheng
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Structured light, particularly in the terahertz frequency range, holds considerable potential for a diverse range of applications. However, the generation and control of structured terahertz radiation pose major challenges. In this work, we demonstrate a novel programmable spintronic emitter that can flexibly generate a variety of structured terahertz waves. This is achieved through the precise and high-resolution programming of the magnetization pattern on the emitter surface, utilizing laser-assisted local field cooling of an exchange-biased ferromagnetic heterostructure. Moreover, we outline a generic design strategy for realizing specific complex structured terahertz fields in the far field. Our device successfully demonstrates the generation of terahertz waves with diverse structured polarization states, including spatially separated circular polarizations, azimuthal or radial polarization states, and a full Poincaré beam. This innovation opens a new avenue for designing and generating structured terahertz radiation, with potential applications in terahertz microscopy, communication, quantum information, and light-matter interactions.
Published: 2023

42. DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation

Author: Xie, Yiqing, Zhang, Sheng, Cheng, Hao, Liu, Pengfei, Gero, Zelalem, Wong, Cliff, Naumann, Tristan, Poon, Hoifung, and Rose, Carolyn
Subjects: Computer Science - Computation and Language
Abstract: Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions.
Comment: ACL Camera Ready Version
Published: 2023

43. Realization of a programmable multi-purpose photonic quantum memory with over-thousand qubit manipulations

Author: Zhang, Sheng, Shi, Jixuan, Cui, Zhaibin, Wang, Ye, Wu, Yukai, Duan, Luming, and Pu, Yunfei
Subjects: Quantum Physics, Computer Science - Emerging Technologies, Physics - Optics
Abstract: Quantum networks can enable various applications such as distributed quantum computing, long-distance quantum communication, and network-based quantum sensing with unprecedented performance. One of the most important building blocks for a quantum network is a photonic quantum memory which serves as the interface between the communication channel and the local functional unit. A programmable quantum memory which can process a large stream of flying qubits and fulfill the requirements of multiple core functions in a quantum network is yet to be realized. Here we report a high-performance quantum memory which can simultaneously store 72 optical qubits carried by 144 spatially separated atomic ensembles and support up to a thousand consecutive write or read operations in a random-access manner, two orders of magnitude larger than the previous record. Due to the built-in programmability, this quantum memory can be adapted on demand for several functions. As example applications, we realize a quantum queue, stack, and buffer which closely resemble the counterpart devices for classical information processing. We further demonstrate the synchronization and reshuffling of 4 entangled pairs of photonic pulses with probabilistic arrival time and arbitrary release order via the memory, which is an essential requirement for the realization of quantum repeaters and efficient routing in quantum networks. Realization of this multi-purpose programmable quantum memory thus constitutes a key enabling building block for future large-scale fully-functional quantum networks.
Comment: 17 pages, 19 figures
Published: 2023

44. Dephasing of Strong-Field-Driven Floquet States Revealed by Time- and Spectrum-Resolved Quantum-Path Interferometry

Author: Liu, Yaxin, Zhu, Bingbing, Jiang, Shicheng, Huang, Shenyang, Luo, Mingyan, Zhang, Sheng, Yan, Hugen, Zhang, Yuanbo, Lu, Ruifeng, and Tao, Zhensheng
Subjects: Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Floquet engineering, while a powerful tool for ultrafast quantum-state manipulation, faces challenges under strong-field conditions, as recent high harmonic generation studies unveil exceptionally short dephasing times. In this study, using time- and spectrum-resolved quantum-path interferometry, we investigate the dephasing mechanisms of terahertz-driven excitons. Our results reveal a dramatic increase in exciton dephasing rate beyond a threshold field strength, indicating exciton dissociation as the primary dephasing mechanism. Importantly, we demonstrate long dephasing times of strong-field-dressed excitons, supporting coherent strong-field manipulation of quantum materials.
Published: 2023

45. Dynamic Multimodal Information Bottleneck for Multimodality Classification

Author: Fang, Yingying, Wu, Shuang, Zhang, Sheng, Huang, Chaoyan, Zeng, Tieyong, Xing, Xiaodan, Walsh, Simon, and Yang, Guang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing features across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noise in the fused feature, and we further introduce a sufficiency loss to prevent dropping of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to maintain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/BII-wushuang/DMIB.
Comment: WACV 2024
Published: 2023

46. BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

Author: Gu, Yu, Yang, Jianwei, Usuyama, Naoto, Li, Chunyuan, Zhang, Sheng, Lungren, Matthew P., Gao, Jianfeng, and Poon, Hoifung
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Rapid progress has been made in instruction-learning for image editing with natural-language instruction, as exemplified by InstructPix2Pix. In biomedicine, such methods can be applied to counterfactual image generation, which helps differentiate causal structure from spurious correlation and facilitates robust image interpretation for disease progression modeling. However, generic image-editing models are ill-suited for the biomedical domain, and counterfactual biomedical image generation is largely underexplored. In this paper, we present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning from multimodal patient journeys. Given a patient with two biomedical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples (prior image, progression description, new image) are then used to train a latent diffusion model for counterfactual biomedical image generation. Given the relative scarcity of image time series data, we introduce a two-stage curriculum that first pretrains the denoising network using the much more abundant single image-report pairs (with dummy prior image), and then continues training using the counterfactual triples. Experiments using the standard MIMIC-CXR dataset demonstrate the promise of our method. In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction image editing and medical image generation such as InstructPix2Pix and RoentGen. To facilitate future study in counterfactual medical generation, we plan to release our instruction-learning code and pretrained models.
Comment: Project page & demo: https://aka.ms/biomedjourney
Published: 2023

47. Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory

Author: Yang, Danni, Ji, Yun, Kou, Zhoubin, Zhong, Xiaoxiong, and Zhang, Sheng
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically utilized in conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we strive to maximize the utility of the task publisher by adaptively adjusting clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, considering client quality, we devise aggregation weights and an access control algorithm to facilitate asynchronous aggregation. Through experiments conducted on the MNIST dataset, the simulation results demonstrate that the test accuracy achieved by our framework is 3.12% and 5.84% higher than that achieved by FedAvg and FedProx without any attacks, respectively. The framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks. Furthermore, aiming for the same target accuracy, our framework demands notably less computation time than both FedAvg and FedProx.
Published: 2023

48. Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

Author: Zhang, Sheng, Naseer, Muzammal, Chen, Guangyi, Shen, Zhiqiang, Khan, Salman, Zhang, Kun, and Khan, Fahad
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. Despite this success, most traditional VLM-based methods are restricted by the assumption of partial source supervision or ideal vocabularies, which rarely hold in the open-world scenario. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning. Our S^3A framework adopts a unique Cluster-Vote-Prompt-Realign (CVPR) algorithm, which iteratively groups unlabeled data to derive structural semantics for pseudo-supervision. Our CVPR process includes iterative clustering on images, voting within each cluster to identify initial class candidates from the vocabulary, generating discriminative prompts with large language models to discern confusing candidates, and realigning images and the vocabulary as structural semantic alignment. Finally, we propose to self-learn the CLIP image encoder with both individual and structural semantic alignment through a teacher-student learning strategy. Our comprehensive experiments across various generic and fine-grained benchmarks demonstrate that the S^3A method offers substantial improvements over existing VLM-based approaches, achieving a more than 15% accuracy improvement over CLIP on average. Our code, models, and prompts are publicly released at https://github.com/sheng-eatamath/S3A.
Comment: AAAI'24
Published: 2023

49. UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Author: Zhou, Wenxuan, Zhang, Sheng, Gu, Yu, Chen, Muhao, and Poon, Hoifung
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have demonstrated remarkable generalizability, such as understanding arbitrary entities and relations. Instruction tuning has proven effective for distilling LLMs into more cost-efficient models such as Alpaca and Vicuna. Yet such student models still trail the original LLMs by large margins in downstream applications. In this paper, we explore targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class such as open information extraction. Using named entity recognition (NER) as a case study, we show how ChatGPT can be distilled into much smaller UniversalNER models for open NER. For evaluation, we assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains such as biomedicine, programming, social media, law, and finance. Without using any direct supervision, UniversalNER attains remarkable NER accuracy across tens of thousands of entity types, outperforming general instruction-tuned models such as Alpaca and Vicuna by over 30 absolute F1 points on average. With a tiny fraction of parameters, UniversalNER not only acquires ChatGPT's capability in recognizing arbitrary entity types, but also outperforms its NER accuracy by 7-9 absolute F1 points on average. Remarkably, UniversalNER even outperforms by a large margin state-of-the-art multi-task instruction-tuned systems such as InstructUIE, which uses supervised NER examples. We also conduct thorough ablation studies to assess the impact of various components in our distillation approach. We release the distillation recipe, data, and UniversalNER models to facilitate future research on targeted distillation.
Comment: Accepted at ICLR 2024. Project page: https://universal-ner.github.io/
Published: 2023

50. Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

Author: Wong, Cliff, Zhang, Sheng, Gu, Yu, Moung, Christine, Abel, Jacob, Usuyama, Naoto, Weerasinghe, Roshanthi, Piening, Brian, Naumann, Tristan, Bifulco, Carlo, and Poon, Hoifung
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of the box, cutting-edge LLMs, such as GPT-4, can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records.
Comment: 24 pages, 5 figures, accepted at Machine Learning for Healthcare (MLHC) 2023
Published: 2023
