Author: "Zhang, Xiaofan" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "Zhang, Xiaofan" (1,511 results)

1. Addressing Architectural Obstacles for Overlay with Stream Network Abstraction

Author: Wang, Chengyue, Zhang, Xiaofan, Cong, Jason, and Hoe, James C.
Subjects: Computer Science - Hardware Architecture
Abstract: Overlay is an effective approach for creating FPGA-based AI accelerators, enabling software-programmable specialized hardware datapaths to flexibly support various DNN operations. Traditional DNN overlays typically base their instruction set design on the von Neumann model but adapt them to be more coarse-grained. These instruction sets control execution at the layer granularity and impose restricted patterns for mapping computation and bandwidth resources. Such constraints cause inefficiencies from the imperfect match between supported execution patterns and diverse DNN layer shapes and types. This work proposes a Reconfigurable Stream Network architecture, a unique ISA abstraction tailored for flexible FPGA overlay execution at low cost, marking it as the first known FPGA design to support dynamic sequential linear layer pipelining. This novel architecture presents a datapath abstraction modeled after a specialized circuit-switched network with stateful functional units (FUs) as nodes and data streaming on edges. Programming a computation corresponds to triggering a network path in this stream-connected datapath. The program can individually control FUs to form paths that exploit both spatial and pipeline parallelism between independent and dependent concurrent computations. We present a proof-of-concept design RSN-XNN on the Versal VCK190. Evaluations show a 22x latency reduction for BERT compared to the state of the art, along with throughput improvements of 3.2x, 2.4x, 2.5x, and 2.8x for BERT, VIT, NCF, and MLP, respectively. RSN-XNN matches the latency of the T4 GPU with the same FP32 performance but only 18% of the memory bandwidth. Compared to the A100 GPU under the same 7nm process node, it achieves 2.1x/4.5x better operating/dynamic energy efficiency in FP32.
Published: 2024

2. Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment

Author: Jiang, Yankai, Lei, Wenhui, Zhang, Xiaofan, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recent advancements in medical vision-language pre-training models have driven significant progress in zero-shot disease recognition. However, transferring image-level knowledge to pixel-level tasks, such as lesion segmentation in 3D CT scans, remains a critical challenge. Due to the complexity and variability of pathological visual characteristics, existing methods struggle to align fine-grained lesion features not encountered during training with disease-related textual representations. In this paper, we present Malenia, a novel multi-scale lesion-level mask-attribute alignment framework, specifically designed for 3D zero-shot lesion segmentation. Malenia improves the compatibility between mask representations and their associated elemental attributes, explicitly linking the visual features of unseen lesions with the extensible knowledge learned from previously seen ones. Furthermore, we design a Cross-Modal Knowledge Injection module to enhance both visual and textual features with mutually beneficial information, effectively guiding the generation of segmentation results. Comprehensive experiments across three datasets and 12 lesion categories validate the superior performance of Malenia. Code will be made publicly available.
Published: 2024

3. MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications

Author: Yu, Yongrui, Gu, Yannian, Zhang, Shaoting, and Zhang, Xiaofan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have achieved significant success in both the natural image and medical image domains, encompassing a wide range of applications. Previous investigations in medical imaging have often been constrained to specific anatomical regions, particular applications, and limited datasets, resulting in isolated diffusion models. This paper introduces MedDiff-FM, a diffusion-based foundation model that addresses a diverse range of medical image tasks. MedDiff-FM leverages 3D CT images from multiple publicly available datasets, covering anatomical regions from head to abdomen, to pre-train a diffusion foundation model, and explores the capabilities of the diffusion foundation model across a variety of application scenarios. The diffusion foundation model handles multi-level image processing both at the image level and patch level, and utilizes position embedding to establish multi-level spatial relationships as well as anatomical structures and region classes to control certain anatomical regions. MedDiff-FM manages several downstream tasks seamlessly, including image denoising, anomaly detection, and image synthesis. MedDiff-FM is also capable of performing lesion generation and lesion inpainting by rapidly fine-tuning the diffusion foundation model using ControlNet with task-specific conditions. Experimental results demonstrate the effectiveness of MedDiff-FM in addressing diverse downstream medical image tasks.
Published: 2024

4. MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling

Author: Zhu, Yakun, Wei, Shaohang, Wang, Xu, Xue, Kui, Zhang, Xiaofan, and Zhang, Shaoting
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Integrating tools into Large Language Models (LLMs) has facilitated their widespread application. Despite this, in specialized downstream task contexts, reliance solely on tools is insufficient to fully address the complexities of the real world. This particularly restricts the effective deployment of LLMs in fields such as medicine. In this paper, we focus on the downstream tasks of medical calculators, which use standardized tests to assess an individual's health status. We introduce MeNTi, a universal agent architecture for LLMs. MeNTi integrates a specialized medical toolkit and employs meta-tool and nested calling mechanisms to enhance LLM tool utilization. Specifically, it achieves flexible tool selection and nested tool calling to address practical issues faced in intricate medical scenarios, including calculator selection, slot filling, and unit conversion. To assess the capabilities of LLMs for quantitative assessment throughout the clinical process of calculator scenarios, we introduce CalcQA. This benchmark requires LLMs to use medical calculators to perform calculations and assess patient health status. CalcQA is constructed by professional physicians and includes 100 case-calculator pairs, complemented by a toolkit of 281 medical tools. The experimental results demonstrate significant performance improvements with our framework. This research paves new directions for applying LLMs in demanding medical scenarios.
Published: 2024

5. Simultaneous Eruption and Shrinkage of Pre-existing Flare Loops during a Subsequent Solar Eruption

Author: Chen, Huadong, Fletcher, Lyndsay, Zhou, Guiping, Cheng, Xin, Wang, Ya, Mulay, Sargam, Zheng, Ruisheng, Ma, Suli, and Zhang, Xiaofan
Subjects: Astrophysics - Solar and Stellar Astrophysics, Physics - Plasma Physics
Abstract: We investigated two consecutive solar eruption events in the solar active region (AR) 12994 at the solar eastern limb on 2022 April 15. We found that the flare loops formed by the first eruption were involved in the second eruption. During the initial stage of the second flare, the middle part of these flare loops (E-loops) erupted outward along with the flux ropes below, while the parts of the flare loops (I-loops1 and I-loops2) on either side of the E-loops first rose and then contracted. Approximately 1 hour after the eruption, the heights of I-loops1 and I-loops2 decreased by 9 Mm and 45 Mm, respectively, compared to before the eruption. Their maximum descent velocities were 30 km/s and 130 km/s, respectively. The differential emission measure (DEM) results indicate that the plasma above I-loops1 and I-loops2 began to be heated about 23 minutes and 44 minutes after the start of the second flare, respectively. Within 20 minutes, the plasma temperature in these regions increased from ~3 MK to 6 MK. We proposed an adiabatic heating mechanism in which magnetic energy is converted into thermal and kinetic energy when the pre-stretched loops contract. Our calculations show that the magnetic energy required to heat the two high-temperature regions is 10^29-10^30 erg, which corresponds to a loss of field strength of 2-3 G.
Comment: The paper has been accepted for publication in the ApJ
Published: 2024

6. DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels

Author: Wei, Linda, Hua, Shengyi, Zhang, Shaoting, and Zhang, Xiaofan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Breast cancer is among the most fatal cancers in women, and early detection is crucial for treatment. HER2 status, a valuable diagnostic marker based on Immunohistochemistry (IHC) staining, is instrumental in determining breast cancer status. The high cost of IHC staining and the ubiquity of Hematoxylin and Eosin (H&E) staining make the conversion from H&E to IHC staining essential. In this article, we propose a destain-restain framework for converting H&E staining to IHC staining, leveraging the characteristic that H&E staining and IHC staining of the same tissue sections share the Hematoxylin channel. We further design loss functions specifically for Hematoxylin and Diaminobenzidine (DAB) channels to generate IHC images exploiting insights from separated staining channels. Beyond the benchmark metrics of the BCI contest, we have developed semantic information metrics for the HER2 level. The experimental results demonstrate that our method outperforms previous open-sourced methods in terms of image intrinsic properties and semantic information.
Published: 2024
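
DeReStainer's destain-restain step relies on splitting an RGB slide image into per-stain channels (Hematoxylin, Eosin, DAB). The abstract does not say how the channels are separated; a common choice, assumed here, is Ruifrok-Johnston color deconvolution in optical-density space, which is also the basis of scikit-image's `rgb2hed`. The pure-Python sketch below uses the published Ruifrok-Johnston stain vectors and an assumed background intensity of 255; the function names are hypothetical and this is not the authors' code.

```python
import math

# Ruifrok-Johnston stain vectors in optical-density (OD) space, one row per stain.
# Assumed here for illustration; labs often calibrate their own vectors.
STAIN_MATRIX = [
    [0.65, 0.70, 0.29],  # Hematoxylin
    [0.07, 0.99, 0.11],  # Eosin
    [0.27, 0.57, 0.78],  # DAB (diaminobenzidine)
]
I0 = 255.0  # assumed background (incident light) intensity


def normalize_rows(m):
    """Scale each stain vector to unit length."""
    return [[v / math.sqrt(sum(x * x for x in row)) for v in row] for row in m]


def invert3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def separate_stains(rgb, stains=STAIN_MATRIX):
    """Map one RGB pixel to per-stain concentrations [H, E, DAB]."""
    inv = invert3(normalize_rows(stains))
    # Beer-Lambert: OD = -log10(I / I0); clamp to 1 to avoid log(0).
    od = [-math.log10(max(v, 1.0) / I0) for v in rgb]
    # OD = c . M with row vectors, so c = OD . M^-1
    return [sum(od[j] * inv[j][k] for j in range(3)) for k in range(3)]


def compose_stains(conc, stains=STAIN_MATRIX):
    """Inverse mapping: per-stain concentrations back to an RGB pixel."""
    m = normalize_rows(stains)
    od = [sum(conc[k] * m[k][j] for k in range(3)) for j in range(3)]
    return [I0 * 10 ** (-v) for v in od]
```

Because H&E and IHC images of the same section share the Hematoxylin channel, channel 0 of `separate_stains` is the quantity the two stainings have in common. A round trip through `compose_stains` and `separate_stains` recovers the input concentrations as long as no RGB value is clamped.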

7. TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading

Author: Wu, Kun, Park, Jeongmin Brian, Zhang, Xiaofan, Hidayetoğlu, Mert, Mailthody, Vikram Sharma, Huang, Sitao, Lumetta, Steven Sam, and Hwu, Wen-mei
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Neural and Evolutionary Computing
Abstract: The growth rate of GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations (the intermediate tensors produced during forward propagation and reused in backward propagation) dominate GPU memory use. To address this challenge, we propose TBA to efficiently offload activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. TBA is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on GPT, BERT, and T5. Results demonstrate that TBA reduces peak activation memory usage by 47%. At the same time, TBA perfectly overlaps the I/O with the computation and incurs negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. We find that TBA achieves better memory savings than layerwise full recomputation while retaining the performance of keeping the activations in memory.
Published: 2024

8. MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

Author: Liu, Mianxin, Ding, Jinru, Xu, Jie, Hu, Weiguo, Li, Xiaoyang, Zhu, Lifeng, Bai, Zhian, Shi, Xiaoming, Wang, Benyou, Song, Haitao, Liu, Pengfei, Zhang, Xiaofan, Wang, Shanshan, Li, Kang, Wang, Haofen, Ruan, Tong, Huang, Xuanjing, Sun, Xin, and Zhang, Shaoting
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Ensuring the general efficacy and goodness for human beings of medical large language models (LLMs) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the currently largest evaluation dataset (300,901 questions) to cover 43 clinical specialties and performs multi-facet evaluation on medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation of questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results largely aligning with medical professionals' perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.
Comment: 25 pages, 4 figures
Published: 2024

9. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens

Author: Fan, Yongqi, Sun, Hongli, Xue, Kui, Zhang, Xiaofan, Zhang, Shaoting, and Ruan, Tong
Subjects: Computer Science - Computation and Language
Abstract: Numerous advanced Large Language Models (LLMs) now support context lengths up to 128K, and some extend to 200K. Some benchmarks in the generic domain have also followed up on evaluating long-context capabilities. In the medical domain, tasks are distinctive due to the unique contexts and need for domain expertise, necessitating further evaluation. However, despite the frequent presence of long texts in medical scenarios, evaluation benchmarks of long-context capabilities for LLMs in this field are still rare. In this paper, we propose MedOdyssey, the first medical long-context benchmark with seven length levels ranging from 4K to 200K tokens. MedOdyssey consists of two primary components: the medical-context "needles in a haystack" task and a series of tasks specific to medical applications, together comprising 10 datasets. The first component includes challenges such as counter-intuitive reasoning and novel (unknown) facts injection to mitigate knowledge leakage and data contamination of LLMs. The second component confronts the challenge of requiring professional medical expertise. In particular, we design the "Maximum Identical Context" principle to improve fairness by guaranteeing that different LLMs observe as many identical contexts as possible. Our experiment evaluates advanced proprietary and open-source LLMs tailored for processing long contexts and presents detailed performance analyses. The results highlight that LLMs still face challenges in this area and that further research is needed. Our code and data are released in the repository: https://github.com/JOHNNY-fans/MedOdyssey.
Published: 2024
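
The "needles in a haystack" task can be pictured as burying one key sentence (the needle) at a chosen depth inside filler text of a fixed length, then asking the model to retrieve it. The sketch below is a hypothetical illustration, not MedOdyssey's construction code: it counts whitespace-separated words as a stand-in for model tokens (real benchmarks count tokens with each model's tokenizer), and `build_haystack` and its parameters are assumed names.

```python
def build_haystack(filler_sentences, needle, target_tokens, depth):
    """
    Assemble at most `target_tokens` words of filler plus `needle`, with the
    needle at relative position `depth` (0.0 = start of context, 1.0 = end).
    Whitespace-separated words stand in for model tokens (an assumption).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    filler = [s for s in filler_sentences if s.split()]  # drop empty sentences
    budget = target_tokens - len(needle.split())
    chosen, used = [], 0
    while filler:
        sentence = filler[len(chosen) % len(filler)]  # cycle through the filler
        n = len(sentence.split())
        if used + n > budget:
            break
        chosen.append(sentence)
        used += n
    # Insert at a sentence boundary so the needle never splits a filler sentence.
    chosen.insert(round(depth * len(chosen)), needle)
    return " ".join(chosen)
```

Sweeping `target_tokens` over the benchmark's length levels and `depth` over several positions gives a grid of contexts in which the same needle must be found.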

10. New Solutions on LLM Acceleration, Optimization, and Application

Author: Huang, Yingbing, Wan, Lily Jiaxin, Ye, Hanchen, Jha, Manvi, Wang, Jinghua, Li, Yuhong, Zhang, Xiaofan, and Chen, Deming
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this paper, we provide a review of recent advancements and research directions aimed at addressing these challenges and enhancing the efficiency of LLM-based systems. We begin by discussing algorithm-level acceleration techniques focused on optimizing LLM inference speed and resource utilization. We also explore LLM-hardware co-design strategies with a vision to improve system efficiency by tailoring hardware architectures to LLM requirements. Further, we delve into LLM-to-accelerator compilation approaches, which involve customizing hardware accelerators for efficient LLM deployment. Finally, as a case study to leverage LLMs for assisting circuit design, we examine LLM-aided design methodologies for an important task: High-Level Synthesis (HLS) functional verification, by creating a new dataset that contains a large number of buggy and bug-free code samples, which can be essential for training LLMs to specialize in HLS verification and debugging. For each aspect mentioned above, we begin with a detailed background study, followed by the presentation of several novel solutions proposed to overcome specific challenges. We then outline future research directions to drive further advancements. Through these efforts, we aim to pave the way for more efficient and scalable deployment of LLMs across a diverse range of applications.
Comment: This is an expanded and more comprehensive study based on our invited DAC-24 paper with the same title and co-authors
Published: 2024

11. SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task

Author: Zhong, Ziije, Zhong, Linqing, Sun, Zhaoze, Jin, Qingyun, Qin, Zengchang, and Zhang, Xiaofan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Integrating Large Language Models (LLMs) with existing Knowledge Graph (KG) databases presents a promising avenue for enhancing LLMs' efficacy and mitigating their "hallucinations". Given that most KGs reside in graph databases accessible solely through specialized query languages (e.g., Cypher), there exists a critical need to bridge the divide between LLMs and KG databases by automating the translation of natural language into Cypher queries (commonly termed the "Text2Cypher" task). Prior efforts tried to bolster LLMs' proficiency in Cypher generation through Supervised Fine-Tuning. However, these explorations are hindered by the lack of annotated datasets of Query-Cypher pairs, resulting from the labor-intensive and domain-specific nature of annotating such datasets. In this study, we propose SyntheT2C, a methodology for constructing a synthetic Query-Cypher pair dataset, comprising two distinct pipelines: (1) LLM-based prompting and (2) template-filling. SyntheT2C facilitates the generation of extensive Query-Cypher pairs with values sampled from an underlying Neo4j graph database. Subsequently, SyntheT2C is applied to two medical databases, culminating in the creation of a synthetic dataset, MedT2C. Comprehensive experiments demonstrate that the MedT2C dataset effectively enhances the performance of backbone LLMs on the Text2Cypher task. Both the SyntheT2C codebase and the MedT2C dataset will be released soon.
Comment: 19 pages, 15 figures, 8 tables
Published: 2024
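
SyntheT2C's template-filling pipeline pairs a natural-language question template with a Cypher template and fills both with values sampled from the graph database. A minimal sketch of that idea follows. The schema (`Drug`, `Disease`, `TREATS`), the templates, and the in-memory value lists standing in for a live Neo4j query are all hypothetical; the abstract does not give the real pipeline's templates.

```python
import random

# Hypothetical entity values standing in for ones sampled from a Neo4j graph.
GRAPH_VALUES = {
    "Disease": ["hypertension", "type 2 diabetes", "asthma"],
    "Drug": ["metformin", "lisinopril", "salbutamol"],
}
SLOT_TO_LABEL = {"disease": "Disease", "drug": "Drug"}

# Hypothetical (question template, Cypher template) pairs sharing named slots.
# Doubled braces are literal braces for str.format (Cypher property maps).
TEMPLATES = [
    ("Which drugs treat {disease}?",
     "MATCH (d:Drug)-[:TREATS]->(x:Disease {{name: '{disease}'}}) RETURN d.name"),
    ("What diseases can {drug} be used for?",
     "MATCH (d:Drug {{name: '{drug}'}})-[:TREATS]->(x:Disease) RETURN x.name"),
]


def cypher_escape(value):
    """Escape a value for use inside a single-quoted Cypher string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def fill_templates(n, seed=0):
    """Generate n (question, Cypher) pairs by filling templates with sampled values."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        question_t, cypher_t = rng.choice(TEMPLATES)
        # Sample a value only for the slots this template actually uses.
        slots = {
            slot: rng.choice(GRAPH_VALUES[label])
            for slot, label in SLOT_TO_LABEL.items()
            if "{" + slot + "}" in question_t
        }
        question = question_t.format(**slots)
        cypher = cypher_t.format(**{k: cypher_escape(v) for k, v in slots.items()})
        pairs.append((question, cypher))
    return pairs
```

Drawing slot values from the database itself helps ensure that each generated query refers to entities that actually exist, so the synthetic pairs stay consistent with the graph they target.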

12. CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation

Author: Huang, Zhongzhen, Jiang, Yankai, Zhang, Rongzhao, Zhang, Shaoting, and Zhang, Xiaofan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing promptable segmentation methods in the medical imaging field primarily consider either textual or visual prompts to segment relevant objects, yet they often fall short when addressing anomalies in medical images, like tumors, which may vary greatly in shape, size, and appearance. Recognizing the complexity of medical scenarios and the limitations of textual or visual prompts, we propose a novel dual-prompt schema that leverages the complementary strengths of visual and textual prompts for segmenting various organs and tumors. Specifically, we introduce CAT, an innovative model that Coordinates Anatomical prompts derived from 3D cropped images with Textual prompts enriched by medical domain knowledge. The model architecture adopts a general query-based design, where prompt queries facilitate segmentation queries for mask prediction. To synergize the two types of prompts within a unified framework, we implement a ShareRefiner, which refines both segmentation and prompt queries while disentangling the two types of prompts. Trained on a consortium of 10 public CT datasets, CAT demonstrates superior performance in multiple segmentation tasks. Further validation on a specialized in-house dataset reveals a remarkable capacity to segment tumors across multiple cancer stages. This approach confirms that coordinating multimodal prompts is a promising avenue for addressing complex scenarios in the medical domain.
Published: 2024

13. ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Author: You, Haoran, Guo, Yipin, Fu, Yichao, Zhou, Wei, Shi, Huihong, Zhang, Xiaofan, Kundu, Souvik, Yazdanbakhsh, Amir, and Lin, Yingyan Celine
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Code and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.
Comment: Accepted by NeurIPS 2024
Published: 2024
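
The reparameterization described in this abstract replaces each weight-activation multiply with additions selected by binary (±1) matrices plus a scaling step. The sketch below illustrates that principle on a single weight group using a simple greedy binary-coding quantizer whose scaling factors are rounded to powers of two, so that scaling reduces to a bit shift in fixed-point hardware (emulated here with `math.ldexp` on floats). This greedy quantizer is an assumed stand-in, not ShiftAddLLM's multi-objective optimization or its automated bit allocation, and the function names are hypothetical.

```python
import math


def binary_code(weights, bits):
    """
    Greedily approximate a weight group as sum_i 2**e_i * b_i, b_i in {-1, +1}^n.
    Each scale is the mean absolute residual rounded to a power of two.
    Returns a list of (exponent, signs) pairs, one per bit.
    """
    residual = list(weights)
    codes = []
    for _ in range(bits):
        mean_abs = sum(abs(r) for r in residual) / len(residual)
        if mean_abs == 0.0:
            break  # residual is already zero
        exp = round(math.log2(mean_abs))
        signs = [1 if r >= 0 else -1 for r in residual]
        residual = [r - math.ldexp(s, exp) for r, s in zip(residual, signs)]
        codes.append((exp, signs))
    return codes


def dequantize(codes, n):
    """Rebuild the approximated weights (used here only for checking)."""
    w = [0.0] * n
    for exp, signs in codes:
        for j, s in enumerate(signs):
            w[j] += math.ldexp(s, exp)
    return w


def shift_add_dot(codes, x):
    """Dot product of the coded weights with x using only adds and power-of-two scaling."""
    total = 0.0
    for exp, signs in codes:
        acc = 0.0
        for s, xj in zip(signs, x):
            acc = acc + xj if s > 0 else acc - xj  # add or subtract: no multiply
        total += math.ldexp(acc, exp)  # times 2**exp: a shift in fixed-point hardware
    return total
```

Rounding each scale to the nearest power of two in log space keeps it within a factor of √2 of the mean absolute residual, which is enough for every added bit to leave the squared reconstruction error no larger than before.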

14. Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation

Author: Zhong, Zijie, Liu, Hanwen, Cui, Xiaoya, Zhang, Xiaofan, and Qin, Zengchang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Integrating information from different reference data sources is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows different conventions. Retrieving from multiple knowledge sources with one fixed strategy usually leads to under-exploitation of information. To mitigate this drawback, inspired by Mixture-of-Experts, we introduce Mix-of-Granularity (MoG), a method that dynamically determines the optimal granularity of a knowledge database based on input queries using a router. The router is efficiently trained with a newly proposed loss function employing soft labels. We further extend MoG to Mix-of-Granularity-Graph (MoGG), where reference documents are pre-processed into graphs, enabling the retrieval of relevant information from distantly situated chunks. Extensive experiments demonstrate that both MoG and MoGG effectively predict optimal granularity levels, significantly enhancing the performance of the RAG system in downstream tasks. The code of both MoG and MoGG will be made public.
Comment: 17 pages, 6 figures and 8 tables
Published: 2024

15. Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

Author: Zhang, Tiantian, Lin, Manxi, Guo, Hongda, Zhang, Xiaofan, Chiu, Ka Fung Peter, Feragen, Aasa, and Dou, Qi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI. Current deep learning-based PI-RADS scoring methods often lack the incorporation of the common PI-RADS clinical guidelines (PICG) utilized by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into the PI-RADS scoring model without additional annotations and network parameters. We design a two-stage fine-tuning process that adapts an MLLM originally trained on natural images to MRI images while effectively integrating the PICG. Specifically, in the first stage, we develop a domain adapter layer tailored for processing 3D MRI inputs and instruct the MLLM to differentiate MRI sequences. In the second stage, we translate PICG into guiding instructions for the model to generate PICG-guided image features. Through such a feature distillation step, we align the scoring network's features with the PICG-guided image features, which enables the model to effectively incorporate the PICG information. We develop our model on a public dataset and evaluate it on an in-house dataset. Experimental results demonstrate that our approach effectively improves the performance of current scoring networks. Code is available at: https://github.com/med-air/PICG2scoring
Published: 2024

16. Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Author: Huang, Zhongzhen, Xue, Kui, Fan, Yongqi, Mu, Linjie, Liu, Ruoyu, Ruan, Tong, Zhang, Shaoting, and Zhang, Xiaofan
Subjects: Computer Science - Computation and Language
Abstract: Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, retrieval-augmented generation (RAG) has been utilized to provide external knowledge to facilitate answer generation. However, applying such models to the medical domain faces several challenges due to the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with a RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates the real-world medication consultation scenario and requires LLMs to answer with retrieved evidence from the medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge posed by this knowledge-intensive task to current LLMs. We further propose a new Distill-Retrieve-Read framework instead of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process utilizes a tool calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. With experimental results, we show that our framework brings notable performance improvements and surpasses previous counterparts in evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.
Published: 2024

17. Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray

Author: Deng, Qiao, Huang, Zhongzhen, Wang, Yunqi, Wang, Zhichuan, Wang, Zhao, Zhang, Xiaofan, Dou, Qi, Hui, Yeung Yu, and Hui, Edward S.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical image and text. Current algorithms that exploit the global and local alignment between medical image and text could, however, be marred by the redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by using a transformer-based grounded knowledge-enhanced module for fine-grained alignment between anatomical region-level visual features and the textual features of medical knowledge. The performance of GK-MVLP is competitive with or exceeds the state of the art on downstream chest X-ray disease classification, disease localization, report generation, and medical visual question-answering tasks. Our results show the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray image and radiology report.
Published: 2024

18. Transnational Higher Education in China: Policies, Practices, and Development in a (Post-)Pandemic Era

Author: Li, Xiaoyuan, Dai, Kun, and Zhang, Xiaofan
Published: 2024

19. CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

Author: Yu, Yongrui, Chen, Hanyu, Zhang, Zitian, Xiao, Qiong, Lei, Wenhui, Dai, Linrui, Fu, Yu, Tan, Hui, Wang, Guan, Gao, Peng, and Zhang, Xiaofan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle with the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node generation and the nnU-Net model for lymph node segmentation to improve the segmentation performance of abdominal lymph nodes by synthesizing diverse, realistic abdominal lymph node data. We propose LN-DDPM, a conditional denoising diffusion probabilistic model (DDPM) for lymph node (LN) generation. LN-DDPM utilizes lymph node masks and anatomical structure masks as model conditions. These conditions work in two conditioning mechanisms, global structure conditioning and local detail conditioning, to distinguish between lymph nodes and their surroundings and better capture lymph node characteristics. The obtained paired abdominal lymph node images and masks are used for the downstream segmentation task. Experimental results on the abdominal lymph node datasets demonstrate that LN-DDPM outperforms other generative methods in abdominal lymph node image synthesis and better assists the downstream abdominal lymph node segmentation task.
Published: 2024

20. PathoTune: Adapting Visual Foundation Model to Pathological Specialists

Author: Lu, Jiaxuan, Yan, Fang, Zhang, Xiaofan, Gao, Yue, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: As natural image understanding moves towards the pretrain-finetune era, research in pathology imaging is concurrently evolving. Despite the predominant focus on pretraining pathological foundation models, how to adapt foundation models to downstream tasks is little explored. For downstream adaptation, we identify two domain gaps: the Foundation-Task Gap and the Task-Instance Gap. To mitigate these gaps, we introduce PathoTune, a framework designed to efficiently adapt pathological or even visual foundation models to pathology-specific tasks via multi-modal prompt tuning. The proposed framework leverages Task-specific Visual Prompts and Task-specific Textual Prompts to identify task-relevant features, along with Instance-specific Visual Prompts for encoding single pathological image features. Results across multiple datasets at both patch level and WSI level demonstrate its superior performance over single-modality prompt tuning approaches. Notably, PathoTune facilitates the direct adaptation of natural visual foundation models to pathological tasks, drastically outperforming pathological foundation models with simple linear probing. The code is available at https://github.com/openmedlab/PathoDuet. Comment: MICCAI 2024
Published: 2024

21. VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification

Author: Zhong, Lanfeng, Liao, Xin, Zhang, Shaoting, Zhang, Xiaofan, and Wang, Guotai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Although deep learning methods have achieved remarkable performance in pathology image classification, they rely heavily on labeled data, demanding extensive human annotation effort. In this study, we present a novel human annotation-free method for pathology image classification by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo labels for the training set are obtained through the zero-shot inference capabilities of the VLM; these labels may be very noisy due to the domain shift between the pre-training data and the target dataset. To address this issue, we introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of the VLM, we obtain feature-based pseudo labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo labels. After rejecting low-quality pseudo labels, we further propose High-confidence Cross Supervision (HCS) to learn from samples with reliable pseudo labels and the remaining unlabeled samples. Experimental results showed that our method obtained accuracies of 87.1% and 95.1% on the HPH and LC25K datasets, respectively, and largely outperformed existing zero-shot classification and noisy label learning methods. The code is available at https://github.com/lanfz2000/VLM-CPL. Comment: Under review
Published: 2024
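The prompt-feature consensus step described in the VLM-CPL abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the VLM-CPL implementation: the entropy-based uncertainty, the nearest-class-centroid assignment (a simplified stand-in for the paper's clustering step), the threshold, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def prompt_pseudo_labels(
    view_probs: Sequence[Sequence[Vector]],
) -> Tuple[List[int], List[float]]:
    """view_probs[i][v][c] is the zero-shot probability of class c for sample i
    under augmented view v. Returns the argmax of the view-averaged probabilities
    and the entropy of that average (assumed here as the uncertainty estimate)."""
    labels: List[int] = []
    uncertainty: List[float] = []
    for views in view_probs:
        n_classes = len(views[0])
        mean = [sum(v[c] for v in views) / len(views) for c in range(n_classes)]
        labels.append(max(range(n_classes), key=lambda c: mean[c]))
        uncertainty.append(-sum(p * math.log(p) for p in mean if p > 0))
    return labels, uncertainty


def feature_pseudo_labels(
    features: Sequence[Vector], prompt_labels: Sequence[int], n_classes: int
) -> List[int]:
    """Assign each sample to the nearest class centroid in feature space, with
    centroids computed from the prompt-based labels (hypothetical simplification
    of the clustering step)."""
    dim = len(features[0])
    centroids: Dict[int, List[float]] = {}
    for c in range(n_classes):
        members = [f for f, y in zip(features, prompt_labels) if y == c]
        if members:
            centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(f, centroids[c])) for f in features]


def consensus_indices(
    prompt_labels: Sequence[int],
    feature_labels: Sequence[int],
    uncertainty: Sequence[float],
    max_uncertainty: float,
) -> List[int]:
    """Keep samples whose two pseudo labels agree and whose prompt-based
    uncertainty does not exceed max_uncertainty."""
    return [
        i
        for i, (p, f, u) in enumerate(zip(prompt_labels, feature_labels, uncertainty))
        if p == f and u <= max_uncertainty
    ]


# Toy usage: 4 samples, 2 augmented views, 2 classes, 2-D features.
view_probs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.2, 0.8], [0.1, 0.9]],
    [[0.7, 0.3], [0.5, 0.5]],
    [[0.7, 0.3], [0.9, 0.1]],
]
features = [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0], [9.0, 9.0]]
labels, unc = prompt_pseudo_labels(view_probs)
feat = feature_pseudo_labels(features, labels, n_classes=2)
print(consensus_indices(labels, feat, unc, max_uncertainty=0.6))  # -> [0, 1]
```

In the toy data, sample 2 is dropped because its averaged prediction is too uncertain, and sample 3 is dropped because its two pseudo labels disagree.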

22. GuideGen: A Text-Guided Framework for Full-torso Anatomy and CT Volume Generation

Author: Dai, Linrui, Zhang, Rongzhao, Yu, Yongrui, and Zhang, Xiaofan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Recently emerging conditional diffusion models seem promising for reducing the labor and expense of building large 3D medical imaging datasets. However, previous studies on 3D CT generation have yet to fully capitalize on semantic and textual conditions, and they have primarily focused on specific organs characterized by a local structure and fixed contrast. In this work, we present GuideGen, a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso (from chest to pelvis) based on free-form text prompts. Our approach includes three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment between CT images, anatomical semantics, and input prompts. To train and evaluate GuideGen, we compile a multi-modality cancer imaging dataset with paired CT and clinical descriptions from 12 public TCIA datasets and one private real-world dataset. Comprehensive evaluations across generation quality, cross-modality alignment, and data usability on multi-organ and tumor segmentation tasks demonstrate GuideGen's superiority over existing CT generation methods. Comment: Submitted to CVPR 2025
Published: 2024

23. Miniature narrow-linewidth 1 μm Laser

Author: Zhang, Xiaofan, Zhang, Fan, Jia, Kunpeng, Liu, Yunfeng, Shi, Haosen, Jiang, Yanyi, Jiang, Xiaoshun, Ma, Longsheng, Liang, Wei, Xie, Zhenda, and Zhu, Shi-ning
Subjects: Physics - Optics
Abstract: Self-injection locking has the potential to narrow the linewidth of lasers in a compact setup. Here, we report a narrow-linewidth laser source near 1 μm based on self-injection locking to a Fabry-Perot (FP) hollow resonator with a high quality factor (Q > 10^8). The measured fundamental linewidth of the laser is 41 Hz, and a coarse tuning range of over 5.5 nm is achieved by changing the driving current of the laser source. Meanwhile, a mode-hop-free fine-tuning range of 373 MHz is achieved by changing the voltage applied to the PZT on the resonator. More importantly, benefiting from the low thermal refractive noise and low thermal expansion of the FP hollow resonator, the beat-note linewidth and the frequency Allan deviation are measured to be 510.3 Hz and 10^-11 (1 s averaging time), respectively, using a fully stabilized frequency comb as a reference. Such a high-performance laser is fully integrated into a palm-sized package (52.3 mL) for field-deployable applications.
Published: 2024
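The frequency Allan deviation quoted in this abstract is a standard stability measure. As a hedged illustration (not the authors' analysis code), the non-overlapping estimator σ_y(τ) = sqrt( Σ_k (ȳ_{k+1} − ȳ_k)² / (2(M − 1)) ), where ȳ_1 … ȳ_M are consecutive averages of fractional-frequency samples over the averaging time τ = m·τ0, can be computed as below. The function name and the assumption of evenly spaced samples at a base interval τ0 are illustrative.

```python
import math
from typing import Sequence


def allan_deviation(y: Sequence[float], m: int) -> float:
    """Non-overlapping Allan deviation of fractional-frequency samples y,
    each taken over a base interval tau0, at averaging time tau = m * tau0.
    Assumes evenly spaced samples with no dead time."""
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average consecutive, non-overlapping blocks of m samples.
    means = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Sum of squared differences between adjacent block averages.
    sq = sum((means[k + 1] - means[k]) ** 2 for k in range(n_blocks - 1))
    return math.sqrt(sq / (2 * (n_blocks - 1)))


# Toy usage: an alternating series has large tau0 instability
# that averages away completely at tau = 2 * tau0.
samples = [1.0, 3.0, 1.0, 3.0]
print(allan_deviation(samples, 1))  # sqrt(2)
print(allan_deviation(samples, 2))  # 0.0
```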

24. Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation

Author: Huang, Zhongzhen, Wei, Linda, Zhang, Shaoting, and Zhang, Xiaofan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Combining images from multiple modalities helps to exploit diverse information in computer vision, especially in the medical domain. As an essential part of clinical diagnosis, multi-modal brain tumor segmentation aims to delineate malignant entities using multiple modalities. Although existing methods have shown remarkable performance in this task, they are limited in the information exchange needed for cross-scale and high-level representation fusion across spatial and modality dimensions. In this paper, we present a novel Modality-Aware and Shift Mixer (MASM) that integrates intra-modality and inter-modality dependencies of multi-modal images for effective and robust brain tumor segmentation. Specifically, we introduce a Modality-Aware module, informed by neuroimaging studies, to model specific modality-pair relationships at low levels, and a Modality-Shift module with specific mosaic patterns to explore the complex relationships across modalities at high levels via self-attention. Experimentally, we outperform previous state-of-the-art approaches on the public Brain Tumor Segmentation (BraTS 2021) dataset. Further qualitative experiments demonstrate the efficacy and robustness of MASM.
Published: 2024

25. OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine

Author: Wang, Xiaosong, Zhang, Xiaofan, Wang, Guotai, He, Junjun, Li, Zhongyu, Zhu, Wentao, Guo, Yi, Dou, Qi, Li, Xiaoxiao, Wang, Dequan, Hong, Liang, Lao, Qicheng, Ruan, Tong, Zhou, Yukun, Li, Yixue, Zhao, Jie, Li, Kang, Sun, Xin, Zhu, Lifeng, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The emerging trend of advancing generalist artificial intelligence, such as GPT-4 and Gemini, has reshaped the landscape of research (in academia and industry) in machine learning and many other areas. However, domain-specific applications of such foundation models (e.g., in medicine) remain unexplored or are at a very early stage. They will require a dedicated set of transfer learning and model adaptation techniques that further expand and inject these models with domain knowledge and data. The development of such technologies could be greatly accelerated if the bundle of data, algorithms, and pre-trained foundation models were gathered together and open-sourced in an organized manner. In this work, we present OpenMEDLab, an open-source platform for multi-modality foundation models. It encapsulates not only solutions from pioneering attempts at prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications but also approaches to building domain-specific foundation models with large-scale multi-modal medical data. Importantly, it opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, etc. Inspiring and competitive results are also demonstrated for each collected approach and model on a variety of benchmarks for downstream tasks. We welcome researchers in the field of medical artificial intelligence to continuously contribute cutting-edge methods and models to OpenMEDLab, which can be accessed via https://github.com/openmedlab. Comment: Technical Report. Visit https://github.com/openmedlab for more details
Published: 2024

26. DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels

Author: Wei, Linda, Hua, Shengyi, Zhang, Shaoting, and Zhang, Xiaofan
Editors: Goos, Gerhard (Series Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members), Mukhopadhyay, Anirban, Oksuz, Ilkay, Engelhardt, Sandy, Merhof, Dorit, and Yuan, Yixuan (Editors)
Published: 2025
Full Text: View/download PDF

27. TGFβ2 mediates oxidative stress–induced epithelial-to-mesenchymal transition of bladder smooth muscle

Author: Geng, Jingwen, Zhang, Xiaofan, Zhang, Yansong, Meng, Xiaojia, Sun, Jinqi, Zhou, Bo, and Ma, Jun
Published: 2024
Full Text: View/download PDF

28. USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis

Author: Jiao, Jing, Zhou, Jin, Li, Xiaokang, Xia, Menghua, Huang, Yi, Huang, Lihong, Wang, Na, Zhang, Xiaofan, Zhou, Shichong, Wang, Yuanyuan, and Guo, Yi
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Inadequate generality across different organs and tasks constrains the application of ultrasound (US) image analysis methods in smart healthcare. Building a universal US foundation model holds the potential to address this issue. Nevertheless, the development of such foundation models encounters intrinsic challenges in US analysis, i.e., insufficient databases, low image quality, and ineffective features. In this paper, we present a universal US foundation model, named USFM, generalized to diverse tasks and organs towards label-efficient US image analysis. First, a large-scale Multi-organ, Multi-center, and Multi-device US database was built, containing over two million US images. Organ-balanced sampling was employed for unbiased learning. Then, USFM was pre-trained on this database with self-supervision. To extract effective features from low-quality US images, we propose a spatial-frequency dual masked image modeling method. A spatial noise addition-recovery approach was designed to learn meaningful US information robustly, while a novel frequency band-stop masking approach was employed to extract complex, implicit grayscale distributions and textural variations. Extensive experiments were conducted on segmentation, classification, and image enhancement tasks across diverse organs and diseases. Comparisons with representative US image analysis models illustrate the universality and effectiveness of USFM. The label efficiency experiments suggest that USFM achieves robust performance with only 20% of the annotations, laying the groundwork for the rapid development of US models in clinical practice. Comment: Submitted to MedIA, 17 pages, 11 figures
Published: 2023

29. PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains

Author: Hua, Shengyi, Yan, Fang, Shen, Tianle, Ma, Lei, and Zhang, Xiaofan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Large amounts of digitized histopathological data open a promising path toward developing pathological foundation models via self-supervised learning methods. Foundation models pretrained with these methods serve as a good basis for downstream tasks. However, the gap between natural and histopathological images hinders the direct application of existing methods. In this work, we present PathoDuet, a series of models pretrained on histopathological images, and a new self-supervised learning framework for histopathology. The framework features a newly introduced pretext token and later task raisers to explicitly utilize certain relations between images, such as multiple magnifications and multiple stains. Based on this, two pretext tasks, cross-scale positioning and cross-stain transferring, are designed to pretrain the model on Hematoxylin and Eosin (H&E) images and to transfer the model to immunohistochemistry (IHC) images, respectively. To validate the efficacy of our models, we evaluate their performance on a wide variety of downstream tasks, including patch-level colorectal cancer subtyping and whole slide image (WSI)-level classification in the H&E field, together with expression level prediction of IHC markers, tumor identification, and slide-level qualitative analysis in the IHC field. The experimental results show the superiority of our models on most tasks and the efficacy of the proposed pretext tasks. The code and models are available at https://github.com/openmedlab/PathoDuet. Comment: Accepted for Medical Image Analysis
Published: 2023

30. ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Author: Jiang, Yankai, Huang, Zhongzhen, Zhang, Rongzhao, Zhang, Xiaofan, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The long-tailed distribution problem in medical image analysis reflects a high prevalence of common conditions and a low prevalence of rare ones, which poses a significant challenge in developing a unified model capable of identifying rare or novel tumor categories not encountered during training. In this paper, we propose a new zero-shot pan-tumor segmentation framework (ZePT) based on query-disentangling and self-prompting to segment unseen tumor categories beyond the training set. ZePT disentangles the object queries into two subsets and trains them in two stages. First, it learns a set of fundamental queries for organ segmentation through an object-aware feature grouping strategy, which gathers organ-level visual features. Then, it refines the other set of advanced queries, which focus on auto-generated visual prompts for unseen tumor segmentation. Moreover, we introduce query-knowledge alignment at the feature level to enhance each query's discriminative representation and generalizability. Extensive experiments on various tumor segmentation tasks demonstrate the superior performance of ZePT, which surpasses previous counterparts and shows promising ability for zero-shot tumor segmentation in real-world settings. Comment: This paper has been accepted by CVPR 2024
Published: 2023

31. SHA-SCP: A UI Element Spatial Hierarchy Aware Smartphone User Click Behavior Prediction Method

Author: Chen, Ling, Peng, Yiyi, Qian, Kai, Shi, Hongyu, and Zhang, Xiaofan
Subjects: Computer Science - Human-Computer Interaction
Abstract: Predicting user click behavior and making relevant recommendations based on the user's historical click behavior are critical to simplifying operations and improving user experience. Modeling UI elements is essential to user click behavior prediction, while the complexity and variety of the UI make it difficult to adequately capture the information of different scales. In addition, the lack of relevant datasets also presents difficulties for such studies. In response to these challenges, we construct a fine-grained smartphone usage behavior dataset containing 3,664,325 clicks of 100 users and propose a UI element spatial hierarchy aware smartphone user click behavior prediction method (SHA-SCP). SHA-SCP builds element groups by clustering the elements according to their spatial positions and uses attention mechanisms to perceive the UI at the element level and the element group level to fully capture the information of different scales. Experiments are conducted on the fine-grained smartphone usage behavior dataset, and the results show that our method outperforms the best baseline by an average of 10.52%, 11.34%, and 10.42% in Top-1 Accuracy, Top-3 Accuracy, and Top-5 Accuracy, respectively.
Published: 2023
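SHA-SCP is evaluated with Top-1, Top-3, and Top-5 Accuracy. A minimal sketch of that metric follows, assuming each sample comes with one predicted score per candidate UI element and a single true target index; the input format and function name are hypothetical, not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], targets: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true target index is among the k
    highest-scoring candidates."""
    hits = 0
    for row, target in zip(scores, targets):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Toy usage: three clicks, three candidate elements each.
scores = [[0.1, 0.6, 0.3], [0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
targets = [1, 1, 0]
for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, targets, k))
```

Top-k accuracy can only grow as k increases, which is why Top-1, Top-3, and Top-5 figures are reported side by side.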

32. AG-CRC: Anatomy-Guided Colorectal Cancer Segmentation in CT with Imperfect Anatomical Knowledge

Author: Zhang, Rongzhao, Bai, Zhian, Yu, Ruoying, Pang, Wenrao, Wang, Lingyun, Zhu, Lifeng, Zhang, Xiaofan, Zhang, Huan, and Hu, Weiguo
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: When delineating lesions from medical images, a human expert can always keep in mind the anatomical structure behind the voxels. However, although high-quality (though not perfect) anatomical information can be retrieved from computed tomography (CT) scans with modern deep learning algorithms, it is still an open problem how these automatically generated organ masks can assist in addressing challenging lesion segmentation tasks, such as the segmentation of colorectal cancer (CRC). In this paper, we develop a novel Anatomy-Guided segmentation framework, namely AG-CRC, that exploits the auto-generated organ masks to aid CRC segmentation from CT. First, we obtain multi-organ segmentation (MOS) masks with existing MOS models (e.g., TotalSegmentator) and further derive a more robust organ of interest (OOI) mask that may cover most of the colon-rectum and CRC voxels. Second, we propose an anatomy-guided training patch sampling strategy by optimizing a heuristic gain function that considers both the proximity of important regions (e.g., the tumor or organs of interest) and sample diversity. Third, we design a novel self-supervised learning scheme inspired by the topology of tubular organs like the colon to further boost model performance. Finally, we employ a masked loss scheme to guide the model to focus solely on the essential learning region. We extensively evaluate the proposed method on two CRC segmentation datasets, where substantial performance improvement (5% to 9% in Dice) is achieved over current state-of-the-art medical image segmentation models, and ablation studies further evidence the efficacy of every proposed component. Comment: Under review
Published: 2023

33. Drosophila TMEM63 and mouse TMEM63A are lysosomal mechanosensory ion channels

Author: Li, Kai, Guo, Yanmeng, Wang, Yayu, Zhu, Ruijun, Chen, Wei, Cheng, Tong, Zhang, Xiaofan, Jia, Yinjun, Liu, Ting, Zhang, Wei, Jan, Lily Yeh, and Jan, Yuh Nung
Subjects: Biochemistry and Cell Biology, Biological Sciences, Neurosciences, Genetics, Neurodegenerative, Rare Diseases, Underpinning research, 1.1 Normal biological development and functioning, Generic health relevance, Medical and Health Sciences, Developmental Biology, Biochemistry and cell biology
Abstract: Cells sense physical forces and convert them into electrical or chemical signals, a process known as mechanotransduction. Whereas extensive studies focus on mechanotransduction at the plasma membrane, little is known about whether and how intracellular organelles sense mechanical force and the physiological functions of organellar mechanosensing. Here we identify the Drosophila TMEM63 (DmTMEM63) ion channel as an intrinsic mechanosensor of the lysosome, a major degradative organelle. Endogenous DmTMEM63 proteins localize to lysosomes, mediate lysosomal mechanosensitivity and modulate lysosomal morphology and function. Tmem63 mutant flies exhibit impaired lysosomal degradation, synaptic loss, progressive motor deficits and early death, with some of these mutant phenotypes recapitulating symptoms of TMEM63-associated human diseases. Importantly, mouse TMEM63A mediates lysosomal mechanosensitivity in Neuro-2a cells, indicative of functional conservation in mammals. Our findings reveal DmTMEM63 channel function in lysosomes and its physiological roles in vivo and provide a molecular basis to explore the mechanosensitive process in subcellular organelles.
Published: 2024

34. AtPRMT3-RPS2B promotes ribosome biogenesis and coordinates growth and cold adaptation trade-off

Author: Wang, Zhen, Zhang, Xiaofan, Liu, Chunyan, Duncan, Susan, Hang, Runlai, Sun, Jing, Luo, Lilan, Ding, Yiliang, and Cao, Xiaofeng
Published: 2024
Full Text: View/download PDF

35. Intraspecific cooperation allows the survival of Staphylococcus aureus staff: a novel strategy for disease relapse

Author: Luo, Hua, Ni, Lijia, Chen, Tongling, Huang, Lisi, Zhang, Xiaofan, Li, Xuexue, Liao, Xiaoyan, Shen, Rui, Luo, Zhaofan, and Xie, Xiaoying
Published: 2024
Full Text: View/download PDF

36. Experimental research on damage characteristics of red sandstone under the combined action of temperature, water and stress

Author: Bao, Xiankai, Qiao, Jianlong, Yu, Chaoyun, Tian, Baolong, Wang, Lingyu, and Zhang, Xiaofan
Published: 2024
Full Text: View/download PDF

37. Artificial intelligence-based assessment of PD-L1 expression in diffuse large B cell lymphoma

Author: Yan, Fang, Da, Qian, Yi, Hongmei, Deng, Shijie, Zhu, Lifeng, Zhou, Mu, Liu, Yingting, Feng, Ming, Wang, Jing, Wang, Xuan, Zhang, Yuxiu, Zhang, Wenjing, Zhang, Xiaofan, Lin, Jingsheng, Zhang, Shaoting, and Wang, Chaofu
Published: 2024
Full Text: View/download PDF

38. Automatic lobe segmentation using attentive cross entropy and end-to-end fissure generation

Author: Su, Qi, Wang, Na, Xie, Jiawen, Chen, Yinan, and Zhang, Xiaofan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatic lung lobe segmentation is of great significance for the diagnosis and treatment of lung diseases, but it remains challenging due to the incompleteness of pulmonary fissures in lung CT images and the large variability of pathological features. Therefore, we propose a new automatic lung lobe segmentation framework in which a task-specific loss function encourages the model to attend to the area around the pulmonary fissures during training. In addition, we introduce an end-to-end pulmonary fissure generation method for the auxiliary pulmonary fissure segmentation task, without any additional network branch. Finally, we propose a registration-based loss function to alleviate the convergence difficulty of the Dice-loss-supervised pulmonary fissure segmentation task. We achieve Dice scores of 97.83% and 94.75% on our private dataset STLB and the public LUNA16 dataset, respectively. Comment: 5 pages, 3 figures, published in 'IEEE International Symposium on Biomedical Imaging (ISBI) 2023'
Published: 2023

39. Efficient Subclass Segmentation in Medical Images

Author: Dai, Linrui, Lei, Wenhui, and Zhang, Xiaofan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: As research interests in medical image analysis become increasingly fine-grained, the cost for extensive annotation also rises. One feasible way to reduce the cost is to annotate with coarse-grained superclass labels while using limited fine-grained annotations as a complement. In this way, fine-grained data learning is assisted by ample coarse annotations. Recent studies in classification tasks have adopted this method to achieve satisfactory results. However, there is a lack of research on efficient learning of fine-grained subclasses in semantic segmentation tasks. In this paper, we propose a novel approach that leverages the hierarchical structure of categories to design network architecture. Meanwhile, a task-driven data generation method is presented to make it easier for the network to recognize different subclass categories. Specifically, we introduce a Prior Concatenation module that enhances confidence in subclass segmentation by concatenating predicted logits from the superclass classifier, a Separate Normalization module that stretches the intra-class distance within the same superclass to facilitate subclass segmentation, and a HierarchicalMix model that generates high-quality pseudo labels for unlabeled samples by fusing only similar superclass regions from labeled and unlabeled images. Our experiments on the BraTS2021 and ACDC datasets demonstrate that our approach achieves comparable accuracy to a model trained with full subclass annotations, with limited subclass annotations and sufficient superclass annotations. Our approach offers a promising solution for efficient fine-grained subclass segmentation in medical images. Our code is publicly available here. Comment: MICCAI 2023 early accept
Published: 2023

40. Synthesis and antiviral property of polysulfate-grafted maleimide-based enediynes

Author: Li, Zhuoyu, Ding, Zhe, Cheng, Haonan, Zhang, Xiaofan, Zhang, Houjun, Wong, Gary, Ding, Yun, Lan, Jiaming, and Hu, Aiguo
Published: 2024
Full Text: View/download PDF

41. 3D-Printed Pea Protein–Based Dysphagia Diet Affected by Different Hydrocolloids

Author: Zhu, Yaolei, Chen, Lei, Zhang, Xiaofan, Meng, Ting, Liu, Zhenbin, Chitrakar, Bimal, and He, Chaojun
Published: 2024
Full Text: View/download PDF

42. MedLSAM: Localize and Segment Anything Model for 3D CT Images

Author: Lei, Wenhui, Wei, Xu, Zhang, Xiaofan, Li, Kang, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in foundation models have shown significant potential in medical image analysis. However, there is still a gap in models specifically designed for medical image localization. To address this, we introduce MedLAM, a 3D medical foundation localization model that accurately identifies any anatomical part within the body using only a few template scans. MedLAM employs two self-supervision tasks: unified anatomical mapping (UAM) and multi-scale similarity (MSS) across a comprehensive dataset of 14,012 CT scans. Furthermore, we developed MedLSAM by integrating MedLAM with the Segment Anything Model (SAM). This innovative framework requires extreme point annotations across three directions on several templates to enable MedLAM to locate the target anatomical structure in the image, with SAM performing the segmentation. It significantly reduces the amount of manual annotation required by SAM in 3D medical imaging scenarios. We conducted extensive experiments on two 3D datasets covering 38 distinct organs. Our findings are twofold: 1) MedLAM can directly localize anatomical structures using just a few template scans, achieving performance comparable to fully supervised models; 2) MedLSAM closely matches the performance of SAM and its specialized medical adaptations with manual prompts, while minimizing the need for extensive point annotations across the entire dataset. Moreover, MedLAM has the potential to be seamlessly integrated with future 3D SAM models, paving the way for enhanced segmentation performance. Our code is public at https://github.com/openmedlab/MedLSAM. Comment: MIA 2024. Code is public at https://github.com/openmedlab/MedLSAM
Published: 2023

43. KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Author: Huang, Zhongzhen, Zhang, Xiaofan, and Zhang, Shaoting
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from an X-ray image, which could relieve radiologists of the heavy burden of report writing. Although various image captioning methods have shown remarkable performance in the natural image field, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information with contextual and clinical knowledge for word prediction. In detail, a U-connection schema between the encoder and decoder is designed to model interactions between different modalities. In addition, a symptom graph and an injected knowledge distiller are developed to assist the report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results prove the advantages of our architecture and the complementary benefits of the injected knowledge.
Published: 2023

44. Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization

Author: Schaefer, Clemens JS, Lambert-Shirzad, Navid, Zhang, Xiaofan, Chou, Chiachen, Jablin, Tom, Li, Jian, Guo, Elfie, Stanton, Caitlin, Joshi, Siddharth, and Wang, Yu Emma
Subjects: Computer Science - Machine Learning
Abstract: Efficiently serving neural network models with low latency is becoming more challenging due to increasing model complexity and parameter count. Model quantization offers a solution that simultaneously reduces memory footprint and compute requirements. However, aggressive quantization may lead to an unacceptable loss in model accuracy owing to differences in sensitivity to numerical imperfection across different layers in the model. To address this challenge, we propose a mixed-precision post-training quantization (PTQ) approach that assigns different numerical precisions to tensors in a network based on their specific needs, for a reduced memory footprint and improved latency while preserving model accuracy. Previous works rely on layer-wise Hessian information to determine numerical precision, but as we demonstrate, Hessian estimation is typically insufficient in determining an effective ordering of layer sensitivities. We address this by augmenting the estimated Hessian with additional information to capture inter-layer dependencies. We demonstrate that this consistently improves PTQ performance along the accuracy-latency Pareto frontier across multiple models. Our method combines second-order information and inter-layer dependencies to guide a bisection search, finding quantization configurations within a user-configurable model accuracy degradation range. We evaluate the effectiveness of our method on the ResNet50, MobileNetV2, and BERT models. Our experiments demonstrate latency reductions, compared to a 16-bit baseline, of 25.48%, 21.69%, and 33.28%, respectively, while maintaining model accuracy within 99.99% of the baseline model.
Published: 2023
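The abstract names "a bisection search" but gives no further detail. As a minimal sketch, assuming the layers have already been ordered from least to most sensitive (the paper derives this ordering from Hessian and inter-layer information, which is not reproduced here) and that accuracy never increases as more layers are lowered, a bisection over how many layers to drop from 16-bit to 8-bit could look like this. The function `bisect_precision_config`, the `evaluate_accuracy` callback, the 8/16-bit choice, and the toy penalties are all hypothetical.

```python
from typing import Callable, Dict, Sequence


def bisect_precision_config(
    layers_by_sensitivity: Sequence[str],
    evaluate_accuracy: Callable[[Dict[str, int]], float],
    min_accuracy: float,
    low_bits: int = 8,
    high_bits: int = 16,
) -> Dict[str, int]:
    """Hypothetical helper, not the authors' implementation.

    Finds the largest k such that lowering the k least-sensitive layers to
    `low_bits` (all others stay at `high_bits`) keeps accuracy >= min_accuracy.
    Assumes accuracy is non-increasing in k, which is what makes bisection valid.
    """

    def config_for(k: int) -> Dict[str, int]:
        # Layers are ordered least-sensitive first, so the first k are lowered.
        return {
            name: (low_bits if i < k else high_bits)
            for i, name in enumerate(layers_by_sensitivity)
        }

    lo, hi = 0, len(layers_by_sensitivity)  # k = 0 is the all-high-precision baseline
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if evaluate_accuracy(config_for(mid)) >= min_accuracy:
            lo = mid  # feasible: try lowering more layers
        else:
            hi = mid - 1  # too aggressive: lower fewer layers
    return config_for(lo)


if __name__ == "__main__":
    # Toy accuracy model: each lowered layer costs a fixed, made-up penalty.
    penalty = {"fc": 0.0001, "conv3": 0.0002, "conv2": 0.0005, "conv1": 0.01}
    order = ["fc", "conv3", "conv2", "conv1"]  # least to most sensitive

    def toy_accuracy(cfg: Dict[str, int]) -> float:
        return 1.0 - sum(penalty[n] for n, bits in cfg.items() if bits == 8)

    print(bisect_precision_config(order, toy_accuracy, min_accuracy=0.999))
    # → {'fc': 8, 'conv3': 8, 'conv2': 8, 'conv1': 16}
```

Bisection needs only about log2(L) accuracy evaluations for L layers, which matters when each evaluation is a full validation pass. If even the all-16-bit baseline misses `min_accuracy`, the sketch simply returns that baseline.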

45. MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Author: Shi, Xiaoming, Liu, Zeming, Wang, Chuan, Leng, Haitao, Xue, Kui, Zhang, Xiaofan, and Zhang, Shaoting
Subjects: Computer Science - Computation and Language
Abstract: Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before medical consultation. However, in many real-world scenarios, due to a lack of medical knowledge, it is usually difficult for patients to determine clear goals with all necessary slots. In this paper, we frame this challenge as how to construct medical consultation dialogue systems that help patients clarify their goals. To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system), with 8,175 dialogues. Furthermore, we build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, termed InsMed, to address this task. Experimental results show the effectiveness of InsMed. Comment: Accepted by the ACL 2023 main conference. The first two authors contributed equally to this work.
Published: 2023

46. Mixed Precision Post Training Quantization of Neural Networks with Sensitivity Guided Search

Author: Schaefer, Clemens JS, Guo, Elfie, Stanton, Caitlin, Zhang, Xiaofan, Jablin, Tom, Lambert-Shirzad, Navid, Li, Jian, Chou, Chiachen, Joshi, Siddharth, and Wang, Yu Emma
Subjects: Computer Science - Machine Learning
Abstract: Serving large-scale machine learning (ML) models efficiently and with low latency has become challenging owing to increasing model size and complexity. Quantizing models can simultaneously reduce memory and compute requirements, facilitating their widespread access. However, for large models, not all layers are equally amenable to the same numerical precision, and aggressive quantization can lead to an unacceptable loss in model accuracy. One approach to prevent this accuracy degradation is mixed-precision quantization, which allows different tensors to be quantized to varying levels of numerical precision, leveraging the capabilities of modern hardware. Such mixed-precision quantization can more effectively allocate numerical precision to different tensors 'as needed' to preserve model accuracy while reducing footprint and compute latency. In this paper, we propose a method to efficiently determine quantization configurations of different tensors in ML models using post-training mixed-precision quantization. We analyze three sensitivity metrics and evaluate them for guiding the configuration search of two algorithms. We evaluate our method on computer vision and natural language processing tasks and demonstrate latency reductions of up to 27.59% and 34.31%, respectively, compared to the baseline 16-bit floating-point model, while guaranteeing no more than 1% accuracy degradation.
Published: 2023
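The abstract does not name its three sensitivity metrics. For illustration only, one common and simple proxy is the mean-squared error a tensor incurs under uniform symmetric quantization at a target bit width; sorting tensors by that error gives an order in which a configuration search could lower precision. This sketch uses NumPy, and the helper names (`quantize_symmetric`, `mse_sensitivity`, `rank_by_sensitivity`) are hypothetical, not taken from the paper.

```python
from typing import Dict, List

import numpy as np


def quantize_symmetric(x: np.ndarray, bits: int) -> np.ndarray:
    """Per-tensor uniform symmetric fake quantization (quantize, then dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return x.copy()  # an all-zero tensor is represented exactly
    scale = max_abs / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale


def mse_sensitivity(tensors: Dict[str, np.ndarray], bits: int) -> Dict[str, float]:
    """Hypothetical proxy metric: mean-squared quantization error per tensor."""
    return {
        name: float(np.mean((t - quantize_symmetric(t, bits)) ** 2))
        for name, t in tensors.items()
    }


def rank_by_sensitivity(tensors: Dict[str, np.ndarray], bits: int) -> List[str]:
    """Tensor names ordered from least to most sensitive under the proxy."""
    scores = mse_sensitivity(tensors, bits)
    return sorted(scores, key=scores.get)
```

Raw MSE is scale-dependent, so a tensor with a wide value range ranks as more sensitive even if the network tolerates that error well; practical metrics often weight the error by its effect on the loss, which is what Hessian-based approaches aim to capture.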

47. Research On Software Testing Method Based on Social Media Risk Control

Author: Zhang, Xiaofan, Luo, Xun, Editor-in-Chief, Almohammedi, Akram A., Series Editor, Chen, Chi-Hua, Series Editor, Guan, Steven, Series Editor, Pamucar, Dragan, Series Editor, Zukarnain, Zuriati Ahmad, editor, Shen, Mouquan, editor, Perumal, Thinagaran, editor, and Zakuan, Norhayati, editor
Published: 2024

48. IOSSAM: Label Efficient Multi-view Prompt-Driven Tooth Segmentation

Author: Huang, Xinrui, He, Dongming, Li, Zhenming, Zhang, Xiaofan, Wang, Xudong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Linguraru, Marius George, editor, Dou, Qi, editor, Feragen, Aasa, editor, Giannarou, Stamatia, editor, Glocker, Ben, editor, Lekadir, Karim, editor, and Schnabel, Julia A., editor
Published: 2024

49. Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Author: Zhang, Xiaofan, Chen, Yao, Hao, Cong, Huang, Sitao, Li, Yuhong, Chen, Deming, Pasricha, Sudeep, editor, and Shafique, Muhammad, editor
Published: 2024

50. Crystal structure of bis[(triaqua-4-iodopyridine-2,6-dicarboxylato-κ³N,O,O″)cobalt(II)] trihydrate, C₁₄H₂₂N₂O₁₇I₂Co₂

Author: Wang Liye, Zhang Xiaofan, and Dong Hongxu
Subjects: 2324122, Physics, QC1-999, Crystallography, QD901-999
Abstract: C₁₄H₂₂N₂O₁₇I₂Co₂, monoclinic, P2₁/c (no. 14), a = 7.13476(14) Å, b = 11.1853(2) Å, c = 32.5727(5) Å, β = 93.5815(16)°, Z = 4, V = 2594.38(8) Å³, Rgt(F) = 0.0447, wRref(F²) = 0.0865, T = 293.0 K.
Published: 2024
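The reported cell parameters can be cross-checked with the standard volume formula for a monoclinic cell (α = γ = 90°), V = abc·sin β. A short Python sketch; the function name `monoclinic_cell_volume` is ours, not from the source.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume of a monoclinic cell, V = a * b * c * sin(beta).

    Cell lengths in angstroms give a volume in cubic angstroms.
    """
    return a * b * c * math.sin(math.radians(beta_deg))


if __name__ == "__main__":
    # Cell parameters from the abstract above, standard uncertainties dropped.
    v = monoclinic_cell_volume(7.13476, 11.1853, 32.5727, 93.5815)
    print(f"V = {v:.2f} cubic angstroms")
```

This gives roughly 2594.37 Å³, which agrees with the reported V = 2594.38(8) Å³ within the stated uncertainty; the small difference comes from rounding in the published cell lengths and angle.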