2,832 results for "Chen Zhihong"
Search Results
2. Rehabilitation Efficacy of Core Stability Training Combined with All-Round Intensive Exercise Training on Children with Spastic Cerebral Palsy
- Author
-
WANG Jing, YUE Ling, WANG Zexi, CHEN Zhihong, SUN Suzhen, MA Guilin, BAI Bing, and CHEN Cuiying
- Subjects
spastic cerebral palsy, core stability training, all-round intensive exercise training, walking parameters, trunk control, degree of spasticity, functional independence, Medicine - Abstract
Objective: To investigate the rehabilitation effect of core stability training combined with all-round intensive exercise training on children with spastic cerebral palsy. Methods: A total of 144 children with spastic cerebral palsy treated at the Children's Hospital of Hebei Province from September 2018 to August 2020 were divided by stratified randomization in a 1∶1∶1 ratio into a core stability group, an intensive exercise group, and a combined group, with 48 cases in each group; the groups received core stability training, all-round intensive exercise training, and core stability training plus all-round intensive exercise training, respectively. The interventions for all three groups lasted 6 months. The clinical efficacy of the three groups was compared, along with walking parameters (step length, step width, and walking speed), trunk control ability, and Gross Motor Function Measure (GMFM), modified Ashworth scale (MAS), and Wee-Functional Independence Measure (WeeFIM) scores before intervention and after 3 and 6 months of intervention. Results: The total effective rate of rehabilitation intervention in the combined group was higher than that in the core stability group and the intensive exercise group (P<0.05). Conclusion: All-round intensive exercise training combined with core stability training can be applied to children with spastic cerebral palsy to further improve their walking function, trunk control ability, and motor function, and to relieve their spasticity.
- Published
- 2023
- Full Text
- View/download PDF
3. RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
- Author
-
Varma, Maya, Delbrouck, Jean-Benoit, Chen, Zhihong, Chaudhari, Akshay, and Langlotz, Curtis
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Fine-tuned vision-language models (VLMs) often capture spurious correlations between image features and textual attributes, resulting in degraded zero-shot performance at test time. Existing approaches for addressing spurious correlations (i) primarily operate at the global image-level rather than intervening directly on fine-grained image features and (ii) are predominantly designed for unimodal settings. In this work, we present RaVL, which takes a fine-grained perspective on VLM robustness by discovering and mitigating spurious correlations using local image features rather than operating at the global image level. Given a fine-tuned VLM, RaVL first discovers spurious correlations by leveraging a region-level clustering approach to identify precise image features contributing to zero-shot classification errors. Then, RaVL mitigates the identified spurious correlation with a novel region-aware loss function that enables the VLM to focus on relevant regions and ignore spurious relationships during fine-tuning. We evaluate RaVL on 654 VLMs with various model architectures, data domains, and learned spurious correlations. Our results show that RaVL accurately discovers (191% improvement over the closest baseline) and mitigates (8.2% improvement on worst-group image classification accuracy) spurious correlations. Qualitative evaluations on general-domain and medical-domain VLMs confirm our findings., Comment: NeurIPS 2024
- Published
- 2024
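For the RaVL entry above, the following is a minimal, hypothetical sketch of the region-aware idea it describes: image-text similarity is computed from local region features, and regions flagged as spurious are down-weighted before pooling. The tensors, shapes, and weighting scheme are assumptions for illustration, not the authors' implementation.
```python
# Hypothetical sketch (not the RaVL code): down-weight regions flagged as spurious
# when pooling region-level image-text similarities during fine-tuning.
import torch
import torch.nn.functional as F

def region_aware_loss(region_feats, text_feats, labels, spurious_mask, temperature=0.07):
    """region_feats: (B, R, D) local region embeddings; text_feats: (C, D) class text
    embeddings; labels: (B,) class indices; spurious_mask: (B, R) with 1.0 marking
    regions identified as spurious by the discovery step."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = torch.einsum("brd,cd->brc", region_feats, text_feats) / temperature  # (B, R, C)
    weights = (1.0 - spurious_mask).unsqueeze(-1)                               # (B, R, 1)
    pooled = (sims * weights).sum(1) / weights.sum(1).clamp(min=1e-6)           # (B, C)
    return F.cross_entropy(pooled, labels)

# Toy usage with random tensors standing in for a fine-tuned VLM's features.
loss = region_aware_loss(torch.randn(4, 9, 32, requires_grad=True), torch.randn(5, 32),
                         torch.randint(0, 5, (4,)), torch.zeros(4, 9))
loss.backward()
```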
4. Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback
- Author
-
Hein, Dennis, Chen, Zhihong, Ostmeier, Sophie, Xu, Justin, Varma, Maya, Reis, Eduardo Pontes, Michalson, Arne Edward, Bluethgen, Christian, Shin, Hyun Joo, Langlotz, Curtis, and Chaudhari, Akshay S
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Radiologists play a crucial role by translating medical images into medical reports. However, the field faces staffing shortages and increasing workloads. While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy. Most current VLMs in radiology rely solely on supervised fine-tuning (SFT). Meanwhile, in the general domain, additional preference fine-tuning has become standard practice. The challenge in radiology lies in the prohibitive cost of obtaining radiologist feedback. We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation. Our method leverages publicly available datasets with an LLM-as-a-Judge mechanism, eliminating the need for additional expert radiologist feedback. We evaluate and benchmark five direct alignment algorithms (DAAs). Our results show up to a 57.4% improvement in average GREEN scores, an LLM-based metric for evaluating CXR reports, and a 9.2% increase in the average across six metrics (domain-specific and general), compared to the SFT baseline. We study reward overoptimization via length exploitation, with reports lengthening by up to 3.2x. To assess a potential alignment tax, we benchmark on six additional diverse tasks, finding no significant degradations. A reader study involving four board-certified radiologists indicates win rates of up to 0.62 over the SFT baseline, while significantly penalizing verbosity. Our analysis provides actionable insights for the development of VLMs in high-stakes fields like radiology.
- Published
- 2024
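As a companion to the preference fine-tuning entry above, here is a hedged sketch of direct preference optimization (DPO), one representative of the direct alignment algorithms benchmarked in that entry. The log-probabilities are placeholders; in the paper's setting they would come from a report-generation VLM and a frozen reference model scoring chosen/rejected reports selected by an LLM judge.
```python
# Minimal DPO loss sketch (standard formulation, not the paper's training code).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Margin by which the policy prefers the chosen report over the rejected one,
    # measured relative to the frozen reference model.
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()

# Dummy sequence log-probabilities for one preference pair.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(float(loss))
```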
5. Overview of the First Shared Task on Clinical Text Generation: RRG24 and 'Discharge Me!'
- Author
-
Xu, Justin, Chen, Zhihong, Johnston, Andrew, Blankemeier, Louis, Varma, Maya, Hom, Jason, Collins, William J., Modi, Ankit, Lloyd, Robert, Hopkins, Benjamin, Langlotz, Curtis, and Delbrouck, Jean-Benoit
- Subjects
Computer Science - Computation and Language - Abstract
Recent developments in natural language generation have tremendous implications for healthcare. For instance, state-of-the-art systems could automate the generation of sections in clinical reports to alleviate physician workload and streamline hospital documentation. To explore these applications, we present a shared task consisting of two subtasks: (1) Radiology Report Generation (RRG24) and (2) Discharge Summary Generation ("Discharge Me!"). RRG24 involves generating the 'Findings' and 'Impression' sections of radiology reports given chest X-rays. "Discharge Me!" involves generating the 'Brief Hospital Course' and 'Discharge Instructions' sections of discharge summaries for patients admitted through the emergency department. "Discharge Me!" submissions were subsequently reviewed by a team of clinicians. Both tasks emphasize the goal of reducing clinician burnout and repetitive workloads by generating documentation. We received 201 submissions from across 8 teams for RRG24, and 211 submissions from across 16 teams for "Discharge Me!"., Comment: ACL Proceedings. BioNLP workshop
- Published
- 2024
- Full Text
- View/download PDF
6. Merlin: A Vision Language Foundation Model for 3D Computed Tomography
- Author
-
Blankemeier, Louis, Cohen, Joseph Paul, Kumar, Ashwin, Van Veen, Dave, Gardezi, Syed Jamal Safdar, Paschali, Magdalini, Chen, Zhihong, Delbrouck, Jean-Benoit, Reis, Eduardo, Truyts, Cesar, Bluethgen, Christian, Jensen, Malte Engmann Kjeldskov, Ostmeier, Sophie, Varma, Maya, Valanarasu, Jeya Maria Jose, Fang, Zhongnan, Huo, Zepeng, Nabulsi, Zaid, Ardila, Diego, Weng, Wei-Hung, Junior, Edson Amaro, Ahuja, Neera, Fries, Jason, Shah, Nigam H., Johnston, Andrew, Boutin, Robert D., Wentland, Andrew, Langlotz, Curtis P., Hom, Jason, Gatidis, Sergios, and Chaudhari, Akshay S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU., Comment: 18 pages, 7 figures
- Published
- 2024
7. CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats
- Author
-
Chambon, Pierre, Delbrouck, Jean-Benoit, Sounack, Thomas, Huang, Shih-Cheng, Chen, Zhihong, Varma, Maya, Truong, Steven QH, Chuong, Chu The, and Langlotz, Curtis P.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning - Abstract
Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus, Comment: 13 pages Updated title
- Published
- 2024
8. GREEN: Generative Radiology Report Evaluation and Error Notation
- Author
-
Ostmeier, Sophie, Xu, Justin, Chen, Zhihong, Varma, Maya, Blankemeier, Louis, Bluethgen, Christian, Michalson, Arne Edward, Moseley, Michael, Langlotz, Curtis, Chaudhari, Akshay S, and Delbrouck, Jean-Benoit
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.
- Published
- 2024
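The GREEN entry above describes a metric that counts matched findings and clinically significant errors with a language model. As a rough illustration only, the aggregation below assumes those counts are already available and turns them into a 0-1 score; the exact formula used by GREEN may differ, so treat this as a hypothetical sketch rather than the official metric.
```python
# Hypothetical aggregation of an error-annotating model's counts into a report score.
def green_like_score(matched_findings: int, significant_errors: int) -> float:
    """Fraction of findings-level judgements that are error-free; 0.0 if nothing matched."""
    total = matched_findings + significant_errors
    return matched_findings / total if total > 0 else 0.0

print(green_like_score(matched_findings=5, significant_errors=1))  # 0.833...
```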
9. Myocardial infarction augments sleep to limit cardiac inflammation and damage
- Author
-
Huynh, Pacific, Hoffmann, Jan D., Gerhardt, Teresa, Kiss, Máté G., Zuraikat, Faris M., Cohen, Oren, Wolfram, Christopher, Yates, Abi G., Leunig, Alexander, Heiser, Merlin, Gaebel, Lena, Gianeselli, Matteo, Goswami, Sukanya, Khamhoung, Annie, Downey, Jeffrey, Yoon, Seonghun, Chen, Zhihong, Roudko, Vladimir, Dawson, Travis, Ferreira da Silva, Joana, Ameral, Natalie J., Morgenroth-Rebin, Jarod, D’Souza, Darwin, Koekkoek, Laura L., Jacob, Walter, Munitz, Jazz, Lee, Donghoon, Fullard, John F., van Leent, Mandy M. T., Roussos, Panos, Kim-Schulze, Seunghee, Shah, Neomi, Kleinstiver, Benjamin P., Swirski, Filip K., Leistner, David, St-Onge, Marie-Pierre, and McAlpine, Cameron S.
- Published
- 2024
- Full Text
- View/download PDF
10. Arsenic trioxide and p97 inhibitor synergize against acute myeloid leukemia by targeting nascent polypeptides and activating the ZAKα–JNK pathway
- Author
-
Xie, Shufeng, Liu, Hui, Zhu, Shouhai, Chen, Zhihong, Wang, Ruiheng, Zhang, Wenjie, Xian, Huajian, Xiang, Rufang, Xia, Xiaoli, Sun, Yong, Long, Jinlan, Wang, Yuanli, Wang, Minghui, Wang, Yixin, Yu, Yaoyifu, Huang, Zixuan, Lu, Chaoqun, Xu, Zhenshu, and Liu, Han
- Published
- 2024
- Full Text
- View/download PDF
11. Large Multimodal Agents: A Survey
- Author
-
Xie, Junlin, Chen, Zhihong, Zhang, Ruifei, Wan, Xiang, and Li, Guanbin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language - Abstract
Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with decision-making and reasoning abilities akin to humans. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereby handling more intricate and nuanced tasks. In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short). First, we introduce the essential components involved in developing LMAs and categorize the current body of research into four distinct types. Subsequently, we review the collaborative frameworks integrating multiple LMAs, enhancing collective efficacy. One of the critical challenges in this field is the diverse evaluation methods used across existing studies, hindering effective comparison among different LMAs. Therefore, we compile these evaluation methodologies and establish a comprehensive framework to bridge the gaps. This framework aims to standardize evaluations, facilitating more meaningful comparisons. Concluding our review, we highlight the extensive applications of LMAs and propose possible future research directions. Our discussion aims to provide valuable insights and guidelines for future research in this rapidly evolving field. An up-to-date resource list is available at https://github.com/jun0wanan/awesome-large-multimodal-agents., Comment: 15 pages, 4 figures
- Published
- 2024
12. ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
- Author
-
Chen, Guiming Hardy, Chen, Shunian, Zhang, Ruifei, Chen, Junying, Wu, Xiangbo, Zhang, Zhiyi, Chen, Zhihong, Li, Jianquan, Wan, Xiang, and Wang, Benyou
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and deployment. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To this end, we propose a comprehensive pipeline for generating a synthetic dataset. The key idea is to leverage strong proprietary models to generate (i) fine-grained image annotations for vision-language alignment and (ii) complex reasoning visual question-answering pairs for visual instruction fine-tuning, yielding 1.3M samples in total. We train a series of lite VLMs on the synthetic dataset and experimental results demonstrate the effectiveness of the proposed scheme, where they achieve competitive performance on 17 benchmarks among 4B LVLMs, and even perform on par with 7B/13B-scale models on various benchmarks. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. We name our dataset \textit{ALLaVA}, and open-source it to the research community for developing better resource-efficient LVLMs for wider usage., Comment: 22 pages
- Published
- 2024
13. CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
- Author
-
Chen, Zhihong, Varma, Maya, Delbrouck, Jean-Benoit, Paschali, Magdalini, Blankemeier, Louis, Van Veen, Dave, Valanarasu, Jeya Maria Jose, Youssef, Alaa, Cohen, Joseph Paul, Reis, Eduardo Pontes, Tsai, Emily B., Johnston, Andrew, Olsen, Cameron, Abraham, Tanishq Mathew, Gatidis, Sergios, Chaudhari, Akshay S., and Langlotz, Curtis
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing \emph{CheXinstruct} - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present \emph{CheXagent} - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce \emph{CheXbench} - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at \url{https://stanford-aimi.github.io/chexagent.html}., Comment: 24 pages, 8 figures
- Published
- 2024
14. Tailoring Amorphous Boron Nitride for High-Performance 2D Electronics
- Author
-
Chen, Cindy Y., Sun, Zheng, Torsi, Riccardo, Wang, Ke, Kachian, Jessica, Liu, Bangzhi, Rayner Jr, Gilbert B., Chen, Zhihong, Appenzeller, Joerg, Lin, Yu-Chuan, and Robinson, Joshua A.
- Subjects
Condensed Matter - Materials Science - Abstract
Two-dimensional (2D) materials have garnered significant attention in recent years due to their atomically thin structure and unique electronic and optoelectronic properties. To harness their full potential for applications in next-generation electronics and photonics, precise control over the dielectric environment surrounding the 2D material is critical. The lack of nucleation sites on 2D surfaces to form thin, uniform dielectric layers often leads to interfacial defects that degrade the device performance, posing a major roadblock in the realization of 2D-based devices. Here, we demonstrate a wafer-scale, low-temperature process (< 250 °C) using atomic layer deposition (ALD) for the synthesis of uniform, conformal amorphous boron nitride (aBN) thin films. ALD deposition temperatures between 125 and 250 °C result in stoichiometric films with high oxidative stability, yielding a dielectric strength of 8.2 MV/cm. Utilizing a seed-free ALD approach, we form uniform aBN dielectric layers on 2D surfaces and fabricate multiple quantum well structures of aBN/MoS2 and aBN-encapsulated double-gated monolayer (ML) MoS2 field-effect transistors to evaluate the impact of aBN dielectric environment on MoS2 optoelectronic and electronic properties. Our work in scalable aBN dielectric integration paves a way towards realizing the theoretical performance of 2D materials for next-generation electronics., Comment: 27 pages, 4 figures
- Published
- 2023
15. MsNet: Multi-stage Learning from Seldom Labeled Data for 3D Tooth Segmentation in Dental Cone Beam Computed Tomography
- Author
-
Kang, Xuewei, Qiu, Bingjiang, Yao, Lisha, Chen, Zhihong, Han, Chu, Liu, Zaiyi, Wang, Yaqi, editor, Chen, Xiaodiao, editor, Qian, Dahong, editor, Ye, Fan, editor, Wang, Shuai, editor, and Zhang, Hongyuan, editor
- Published
- 2025
- Full Text
- View/download PDF
16. Terpyridine-based metallo-cuboctahedron nanomaterials for efficient photocatalytic degradation of persistent organic pollutants
- Author
-
Bai, Qixia, Huang, Yan, Chen, Zhihong, Pan, Yilin, Zhang, Xiaohan, Long, Qingwu, Yang, Qiaoan, Wu, Tun, Xie, Ting-Zheng, Wang, Mingjian, Luo, Hongguang, Hu, Chun, Wang, Pingshan, and Zhang, Zhe
- Published
- 2024
- Full Text
- View/download PDF
17. MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
- Author
-
Ge, Wentao, Chen, Shunian, Chen, Guiming Hardy, Chen, Junying, Chen, Zhihong, Chen, Nuo, Xie, Wenya, Yan, Shuo, Zhu, Chenghao, Lin, Ziyue, Song, Dingjie, Wang, Xidong, Gao, Anningzhe, Zhang, Zhiyi, Li, Jianquan, Wan, Xiang, and Wang, Benyou
- Subjects
Computer Science - Computation and Language - Abstract
Multimodal large language models (MLLMs) have broadened the scope of AI applications. Existing automatic evaluation methodologies for MLLMs are mainly limited in evaluating queries without considering user experiences, inadequately addressing the nuances of creative and associative multimodal tasks. However, the open-ended and subjective nature of such tasks poses a significant challenge to the evaluation methodology, where it is difficult to define the ground-truth answers for them. To this end, in our paper, we propose a new evaluation paradigm for MLLMs, which is evaluating MLLMs with per-sample criteria using potent MLLM as the judge. To validate the feasibility and effectiveness of this paradigm, we design a benchmark, dubbed MLLM-Bench, by curating the evaluation samples across six comprehensive cognitive levels. We benchmark 21 popular MLLMs in a pairwise-comparison fashion, showing diverse performance across models. Moreover, the validity of our benchmark manifests itself in reaching 88.02% agreement with human evaluation. We contend that the proposed paradigm explores the potential of MLLMs as effective evaluation tools with the help of per-sample criteria. See online leaderboard at \url{https://mllm-bench.llmzoo.com}., Comment: 23 pages
- Published
- 2023
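The MLLM-Bench entry above evaluates models pairwise with per-sample criteria and a strong MLLM as the judge. The snippet below sketches what such a judging prompt could look like; the template wording and field names are assumptions for illustration, not the benchmark's released prompts.
```python
# Hypothetical per-sample-criteria pairwise judging prompt.
def build_judge_prompt(question: str, criteria: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are judging two answers to a multimodal question.\n"
        f"Question: {question}\n"
        f"Per-sample criteria: {criteria}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Decide which answer better satisfies the criteria. Reply with 'A', 'B', or 'Tie'."
    )

print(build_judge_prompt("Describe the chart.", "Mentions both axes and the overall trend.",
                         "Sales rise steadily over time.", "It is a bar chart."))
```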
18. Exploiting Low-confidence Pseudo-labels for Source-free Object Detection
- Author
-
Chen, Zhihong, Wang, Zilei, and Zhang, Yixin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Source-free object detection (SFOD) aims to adapt a source-trained detector to an unlabeled target domain without access to the labeled source data. Current SFOD methods utilize a threshold-based pseudo-label approach in the adaptation phase, which is typically limited to high-confidence pseudo-labels and results in a loss of information. To address this issue, we propose a new approach to take full advantage of pseudo-labels by introducing high and low confidence thresholds. Specifically, the pseudo-labels with confidence scores above the high threshold are used conventionally, while those between the low and high thresholds are exploited using the Low-confidence Pseudo-labels Utilization (LPU) module. The LPU module consists of Proposal Soft Training (PST) and Local Spatial Contrastive Learning (LSCL). PST generates soft labels of proposals for soft training, which can mitigate the label mismatch problem. LSCL exploits the local spatial relationship of proposals to improve the model's ability to differentiate between spatially adjacent proposals, thereby optimizing representational features further. Combining the two components overcomes the challenges faced by traditional methods in utilizing low-confidence pseudo-labels. Extensive experiments on five cross-domain object detection benchmarks demonstrate that our proposed method outperforms the previous SFOD methods, achieving state-of-the-art performance.
- Published
- 2023
- Full Text
- View/download PDF
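To make the dual-threshold idea in the source-free object detection entry above concrete, here is a small sketch that splits detector proposals into a conventional high-confidence pool and a low-confidence pool for the LPU-style treatment. The threshold values and tuple layout are assumptions, not the authors' settings.
```python
# Hypothetical dual-threshold split of pseudo-labels from a source-trained detector.
def split_pseudo_labels(proposals, low_thr=0.3, high_thr=0.8):
    """proposals: list of (box, label, score) tuples."""
    high = [p for p in proposals if p[2] >= high_thr]           # used as conventional pseudo-labels
    low = [p for p in proposals if low_thr <= p[2] < high_thr]  # fed to soft/contrastive training
    return high, low

high, low = split_pseudo_labels([((0, 0, 10, 10), "car", 0.9),
                                 ((5, 5, 20, 20), "car", 0.5),
                                 ((8, 8, 12, 12), "car", 0.1)])
print(len(high), len(low))  # 1 1
```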
19. AceGPT, Localizing Large Language Models in Arabic
- Author
-
Huang, Huang, Yu, Fei, Zhu, Jianqing, Sun, Xuening, Cheng, Hao, Song, Dingjie, Chen, Zhihong, Alharthi, Abdulmohsen, An, Bang, He, Juncai, Liu, Ziche, Zhang, Zhiyi, Chen, Junying, Li, Jianquan, Wang, Benyou, Zhang, Lian, Sun, Ruoyu, Wan, Xiang, Li, Haizhou, and Xu, Jinchao
- Subjects
Computer Science - Computation and Language - Abstract
This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic, alongside Reinforcement Learning with AI Feedback (RLAIF) employing a reward model attuned to local culture and values. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities. Comprehensive evaluations reveal that the resulting model, dubbed `AceGPT', sets the state-of-the-art standard for open Arabic LLMs across various benchmarks. Codes, data, and models are in https://github.com/FreedomIntelligence/AceGPT., Comment: Accepted to NAACL main conference. https://github.com/FreedomIntelligence/AceGPT
- Published
- 2023
20. Experimental demonstration of an integrated on-chip p-bit core utilizing stochastic Magnetic Tunnel Junctions and 2D-MoS2 FETs
- Author
-
Daniel, John, Sun, Zheng, Zhang, Xuejian, Tan, Yuanqiu, Dilley, Neil, Chen, Zhihong, and Appenzeller, Joerg
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Computer Science - Emerging Technologies - Abstract
Probabilistic computing is a novel computing scheme that offers a more efficient approach than conventional CMOS-based logic in a variety of applications ranging from optimization to Bayesian inference, and invertible Boolean logic. The probabilistic-bit (or p-bit, the base unit of probabilistic computing) is a naturally fluctuating entity that requires tunable stochasticity; by coupling low-barrier stochastic Magnetic Tunnel Junctions (MTJs) with a transistor circuit, a compact implementation is achieved. In this work, through integrating stochastic MTJs with 2D-MoS$_{2}$ FETs, the first on-chip realization of a key p-bit building block displaying voltage-controllable stochasticity is demonstrated. In addition, supported by circuit simulations, this work provides a careful analysis of the three transistor-one magnetic tunnel junction (3T-1MTJ) p-bit design, evaluating how the characteristics of each component influence the overall p-bit output. This understanding of the interplay between the characteristics of the transistors and the MTJ is vital for the construction of a fully functioning p-bit, making the design rules presented in this article key for future experimental implementations of scaled on-chip p-bit networks.
- Published
- 2023
- Full Text
- View/download PDF
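For the p-bit entry above, the sketch below shows the standard p-bit update rule from the probabilistic-computing literature, in which the output fluctuates between -1 and +1 with a probability set by the input; it illustrates the tunable stochasticity the hardware provides but is not a model of the measured MTJ/MoS2 device.
```python
# Standard software p-bit update rule (illustrative, not a device model).
import math
import random

def p_bit(input_strength: float) -> int:
    # Output is +1 with probability (1 + tanh(input)) / 2, otherwise -1.
    return 1 if math.tanh(input_strength) + random.uniform(-1.0, 1.0) > 0 else -1

samples = [p_bit(0.5) for _ in range(10_000)]
print(sum(s == 1 for s in samples) / len(samples))  # ~ (1 + tanh(0.5)) / 2 ≈ 0.73
```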
21. CMB: A Comprehensive Medical Benchmark in Chinese
- Author
-
Wang, Xidong, Chen, Guiming Hardy, Song, Dingjie, Zhang, Zhiyi, Chen, Zhihong, Xiao, Qingying, Jiang, Feng, Li, Jianquan, Wan, Xiang, Wang, Benyou, and Li, Haizhou
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in \textit{contextual incongruities} to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provides first-hand experience with existing LLMs for medicine and also facilitates the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB., Comment: Accepted to NAACL 2024 Main Conference
- Published
- 2023
22. Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
- Author
-
Chen, Zhihong, Zhang, Ruifei, Song, Yibing, Wan, Xiang, and Li, Guanbin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Visual grounding (VG) aims to establish fine-grained alignment between vision and language. Ideally, it can be a testbed for vision-and-language models to evaluate their understanding of the images and texts and their reasoning abilities over their joint space. However, most existing VG datasets are constructed using simple description texts, which do not require sufficient reasoning over the images and texts. This has been demonstrated in a recent study~\cite{luo2022goes}, where a simple LSTM-based text encoder without pretraining can achieve state-of-the-art performance on mainstream VG datasets. Therefore, in this paper, we propose a novel benchmark of \underline{S}cene \underline{K}nowledge-guided \underline{V}isual \underline{G}rounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge. To perform this task, we propose two approaches to accept the triple-type input, where the former embeds knowledge into the image features before the image-query interaction; the latter leverages linguistic structure to assist in computing the image-text matching. We conduct extensive experiments to analyze the above methods and show that the proposed approaches achieve promising results but still leave room for improvement, including performance and interpretability. The dataset and code are available at \url{https://github.com/zhjohnchan/SK-VG}., Comment: Computer Vision and Natural Language Processing. 21 pages, 14 figures. CVPR-2023
- Published
- 2023
23. Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
- Author
-
Xu, Zunnan, Chen, Zhihong, Zhang, Yong, Song, Yibing, Wan, Xiang, and Li, Guanbin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities. In this paper, we do an investigation of efficient tuning problems on referring image segmentation. We propose a novel adapter called Bridger to facilitate cross-modal information exchange and inject task-specific information into the pre-trained model. We also design a lightweight decoder for image segmentation. Our approach achieves comparable or superior performance with only 1.61\% to 3.38\% backbone parameter updates, evaluated on challenging benchmarks. The code is available at \url{https://github.com/kkakkkka/ETRIS}., Comment: Computer Vision and Natural Language Processing. 14 pages, 8 figures. ICCV-2023
- Published
- 2023
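The entry above introduces a lightweight "Bridger" adapter for exchanging information between frozen vision and language encoders. The module below is a generic cross-modal bottleneck adapter written to illustrate that idea; dimensions, fusion rule, and naming are assumptions and do not reproduce the released ETRIS code.
```python
# Hypothetical cross-modal bottleneck adapter; the frozen backbones are not shown.
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    def __init__(self, vis_dim=768, txt_dim=512, bottleneck=64):
        super().__init__()
        self.v_down = nn.Linear(vis_dim, bottleneck)
        self.t_down = nn.Linear(txt_dim, bottleneck)
        self.v_up = nn.Linear(bottleneck, vis_dim)
        self.t_up = nn.Linear(bottleneck, txt_dim)

    def forward(self, vis_feat, txt_feat):
        v, t = self.v_down(vis_feat), self.t_down(txt_feat)   # project to a shared bottleneck
        v_mix = v + t.mean(dim=1, keepdim=True)               # inject pooled text into vision
        t_mix = t + v.mean(dim=1, keepdim=True)               # inject pooled vision into text
        return vis_feat + self.v_up(v_mix), txt_feat + self.t_up(t_mix)  # residual updates

adapter = CrossModalAdapter()
v_out, t_out = adapter(torch.randn(2, 49, 768), torch.randn(2, 20, 512))
print(v_out.shape, t_out.shape)  # torch.Size([2, 49, 768]) torch.Size([2, 20, 512])
```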
24. On the Difference of BERT-style and CLIP-style Text Encoders
- Author
-
Chen, Zhihong, Chen, Guiming Hardy, Diao, Shizhe, Wan, Xiang, and Wang, Benyou
- Subjects
Computer Science - Computation and Language - Abstract
Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, e.g., BERT, one of the representative models. Recently, contrastive language-image pretraining (CLIP) has also attracted attention, especially its vision models that achieve excellent performance on a broad range of vision tasks. However, few studies are dedicated to studying the text encoders learned by CLIP. In this paper, we analyze the difference between BERT-style and CLIP-style text encoders from three experiments: (i) general text understanding, (ii) vision-centric text understanding, and (iii) text-to-image generation. Experimental analyses show that although CLIP-style text encoders underperform BERT-style ones for general text understanding tasks, they are equipped with a unique ability, i.e., synesthesia, for the cross-modal association, which is more similar to the senses of humans., Comment: Natural Language Processing. 10 pages, 1 figure. Findings of ACL-2023
- Published
- 2023
25. FANet: focus-aware lightweight light field salient object detection network
- Author
-
Fu, Jiamin, Chen, Zhihong, Zhang, Haiwei, Gao, Yuxuan, Xu, Haitao, and Zhang, Hao
- Published
- 2025
- Full Text
- View/download PDF
26. Disorganized chromatin hierarchy and stem cell aging in a male patient of atypical laminopathy-based progeria mandibuloacral dysplasia type A
- Author
-
Jin, Wei, Jiang, Shaoshuai, Liu, Xinyi, He, Yi, Li, Tuo, Ma, Jingchun, Chen, Zhihong, Lu, Xiaomei, Liu, Xinguang, Shou, Weinian, Jin, Guoxiang, Ding, Junjun, and Zhou, Zhongjun
- Published
- 2024
- Full Text
- View/download PDF
27. The role of CISD1 reduction in macrophages in promoting COPD development through M1 polarization and mitochondrial dysfunction
- Author
-
Gao, Jiameng, Dong, Meiyuan, Tian, Weibin, Xia, Junyi, Qian, Yuhao, Jiang, Zhilong, Chen, Zhihong, and Shen, Yao
- Published
- 2024
- Full Text
- View/download PDF
28. Author Correction: Longitudinal single-cell profiling reveals molecular heterogeneity and tumor-immune evolution in refractory mantle cell lymphoma
- Author
-
Zhang, Shaojun, Jiang, Vivian Changying, Han, Guangchun, Hao, Dapeng, Lian, Junwei, Liu, Yang, Cai, Qingsong, Zhang, Rongjia, McIntosh, Joseph, Wang, Ruiping, Dang, Minghao, Dai, Enyu, Wang, Yuanxin, Santos, David, Badillo, Maria, Leeming, Angela, Chen, Zhihong, Hartig, Kimberly, Bigcal, John, Zhou, Jia, Kanagal-Shamanna, Rashmi, Ok, Chi Young, Lee, Hun, Steiner, Raphael E., Zhang, Jianhua, Song, Xingzhi, Nair, Ranjit, Ahmed, Sairah, Rodriquez, Alma, Thirumurthi, Selvi, Jain, Preetesh, Wagner-Bartak, Nicolaus, Hill, Holly, Nomie, Krystle, Flowers, Christopher, Futreal, Andrew, Wang, Linghua, and Wang, Michael
- Published
- 2024
- Full Text
- View/download PDF
29. Research on intelligent monitoring technology for roof damage of traditional Chinese residential buildings based on improved YOLOv8: taking ancient villages in southern Fujian as an example
- Author
-
Qiu, Haochen, Zhang, Jiahao, Zhuo, Lingchen, Xiao, Qi, Chen, Zhihong, and Tian, Hua
- Published
- 2024
- Full Text
- View/download PDF
30. Experimental demonstration of an on-chip p-bit core based on stochastic magnetic tunnel junctions and 2D MoS2 transistors
- Author
-
Daniel, John, Sun, Zheng, Zhang, Xuejian, Tan, Yuanqiu, Dilley, Neil, Chen, Zhihong, and Appenzeller, Joerg
- Published
- 2024
- Full Text
- View/download PDF
31. Tailoring amorphous boron nitride for high-performance two-dimensional electronics
- Author
-
Chen, Cindy Y., Sun, Zheng, Torsi, Riccardo, Wang, Ke, Kachian, Jessica, Liu, Bangzhi, Rayner, Jr, Gilbert B., Chen, Zhihong, Appenzeller, Joerg, Lin, Yu-Chuan, and Robinson, Joshua A.
- Published
- 2024
- Full Text
- View/download PDF
32. Deep learning-aided 3D proxy-bridged region-growing framework for multi-organ segmentation
- Author
-
Chen, Zhihong, Yao, Lisha, Liu, Yue, Han, Xiaorui, Gong, Zhengze, Luo, Jichao, Zhao, Jietong, and Fang, Gang
- Published
- 2024
- Full Text
- View/download PDF
33. SARS-CoV-2 infection increases airway bleeding risk in patients after tracheostomies
- Author
-
Tang, Shupin, Lin, Gongbiao, Wu, Xiaobo, and Chen, Zhihong
- Published
- 2024
- Full Text
- View/download PDF
34. Epstein-Barr virus-positive inflammatory follicular dendritic cell sarcoma with significant granuloma: case report and literature review
- Author
-
Nie, Chenchen, Xie, Xun, Li, Hangyan, Li, Yangcan, Chen, Zhihong, Li, Yanchun, and Li, Zhenfeng
- Published
- 2024
- Full Text
- View/download PDF
35. The HSP90-MYC-CDK9 network drives therapeutic resistance in mantle cell lymphoma
- Author
-
Yan, Fangfang, Jiang, Vivian, Jordan, Alexa, Che, Yuxuan, Liu, Yang, Cai, Qingsong, Xue, Yu, Li, Yijing, McIntosh, Joseph, Chen, Zhihong, Vargas, Jovanny, Nie, Lei, Yao, Yixin, Lee, Heng-Huan, Wang, Wei, Bigcal, JohnNelson R., Badillo, Maria, Meena, Jitendra, Flowers, Christopher, Zhou, Jia, Zhao, Zhongming, Simon, Lukas M., and Wang, Michael
- Published
- 2024
- Full Text
- View/download PDF
36. Improved remote sensing image target detection based on YOLOv7
- Author
-
Xu, Shuanglong, Chen, Zhihong, Zhang, Haiwei, Xue, Lifang, and Su, Huijun
- Published
- 2024
- Full Text
- View/download PDF
37. HuatuoGPT, towards Taming Language Model to Be a Doctor
- Author
-
Zhang, Hongbo, Chen, Junying, Jiang, Feng, Yu, Fei, Chen, Zhihong, Li, Jianquan, Chen, Guiming, Wu, Xiangbo, Zhang, Zhiyi, Xiao, Qingying, Wan, Xiang, Wang, Benyou, and Li, Haizhou
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
In this paper, we present HuatuoGPT, a large language model (LLM) for medical consultation. The core recipe of HuatuoGPT is to leverage both \textit{distilled data from ChatGPT} and \textit{real-world data from doctors} in the supervised fine-tuning stage. The responses of ChatGPT are usually detailed, well-presented and informative, while it cannot perform like a doctor in many aspects, e.g., for integrative diagnosis. We argue that real-world data from doctors would be complementary to distilled data in the sense that the former could tame a distilled language model to perform like doctors. To better leverage the strengths of both data, we train a reward model to align the language model with the merits that both data bring, following an RLAIF (reinforcement learning from AI feedback) fashion. To evaluate and benchmark the models, we propose a comprehensive evaluation scheme (including automatic and manual metrics). Experimental results demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets. It is worth noting that by using additional real-world data and RLAIF, the distilled language model (i.e., HuatuoGPT) outperforms its teacher model ChatGPT in most cases. Our code, data, and models are publicly available at \url{https://github.com/FreedomIntelligence/HuatuoGPT}. The online demo is available at \url{https://www.HuatuoGPT.cn/}.
- Published
- 2023
38. Phoenix: Democratizing ChatGPT across Languages
- Author
-
Chen, Zhihong, Jiang, Feng, Chen, Junying, Wang, Tiannan, Yu, Fei, Chen, Guiming, Zhang, Hongbo, Liang, Juhao, Zhang, Chen, Zhang, Zhiyi, Li, Jianquan, Wan, Xiang, Wang, Benyou, and Li, Haizhou
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
This paper presents our efforts to democratize ChatGPT across languages. We release a large language model "Phoenix", achieving competitive performance among open-source English and Chinese models while excelling in languages with limited resources (covering both Latin and non-Latin languages). We believe this work will be beneficial in making ChatGPT more accessible, especially in countries where people cannot use ChatGPT due to restrictions from OpenAI or local governments. Our data, code, and models are available at https://github.com/FreedomIntelligence/LLMZoo.
- Published
- 2023
39. Correction: Arsenic trioxide and p97 inhibitor synergize against acute myeloid leukemia by targeting nascent polypeptides and activating the ZAKα–JNK pathway
- Author
-
Xie, Shufeng, Liu, Hui, Zhu, Shouhai, Chen, Zhihong, Wang, Ruiheng, Zhang, Wenjie, Xian, Huajian, Xiang, Rufang, Xia, Xiaoli, Sun, Yong, Long, Jinlan, Wang, Yuanli, Wang, Minghui, Wang, Yixin, Yu, Yaoyifu, Huang, Zixuan, Lu, Chaoqun, Xu, Zhenshu, and Liu, Han
- Published
- 2024
- Full Text
- View/download PDF
40. Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
- Author
-
Chen, Zhihong, Diao, Shizhe, Wang, Benyou, Li, Guanbin, and Wan, Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts. Practically, there exist two typical types, \textit{i.e.}, the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used. The former is superior at multi-modal tasks owing to the sufficient interaction between modalities; the latter is good at uni-modal and cross-modal tasks due to the single-modality encoding ability. To take advantage of these two types, we propose an effective yet straightforward scheme named PTUnifier to unify the two types. We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts. By doing so, a single model could serve as a \textit{foundation model} that processes various tasks adopting different input formats (\textit{i.e.}, image-only, text-only, and image-text-pair). Furthermore, we construct a prompt pool (instead of static ones) to improve diversity and scalability. Experimental results show that our approach achieves state-of-the-art results on a broad range of tasks, spanning uni-modal tasks (\textit{i.e.}, image/text classification and text summarization), cross-modal tasks (\textit{i.e.}, image-to-text generation and image-text/text-image retrieval), and multi-modal tasks (\textit{i.e.}, visual question answering), demonstrating the effectiveness of our approach. Note that the adoption of prompts is orthogonal to most existing Med-VLP approaches and could be a beneficial and complementary extension to these approaches., Comment: Work in progress
- Published
- 2023
41. GIPA: A General Information Propagation Algorithm for Graph Learning
- Author
-
Li, Houyi, Chen, Zhihong, Li, Zhao, Zheng, Qinkai, Zhang, Peng, and Zhou, Shuigeng
- Subjects
Computer Science - Machine Learning - Abstract
Graph neural networks (GNNs) have been widely used in graph-structured data computation, showing promising performance in various applications such as node classification, link prediction, and network recommendation. Existing works mainly focus on node-wise correlation when doing weighted aggregation of neighboring nodes based on attention, such as dot product by the dense vectors of two nodes. This may cause conflicting noise in nodes to be propagated when doing information propagation. To solve this problem, we propose a General Information Propagation Algorithm (GIPA in short), which exploits more fine-grained information fusion including bit-wise and feature-wise correlations based on edge features in their propagation. Specifically, the bit-wise correlation calculates the element-wise attention weight through a multi-layer perceptron (MLP) based on the dense representations of two nodes and their edge; The feature-wise correlation is based on the one-hot representations of node attribute features for feature selection. We evaluate the performance of GIPA on the Open Graph Benchmark proteins (OGBN-proteins for short) dataset and the Alipay dataset of Alibaba. Experimental results reveal that GIPA outperforms the state-of-the-art models in terms of prediction accuracy, e.g., GIPA achieves an average ROC-AUC of $0.8901\pm 0.0011$, which is better than that of all the existing methods listed in the OGBN-proteins leaderboard., Comment: Accepted by DASFAA2023. arXiv admin note: substantial text overlap with arXiv:2105.06035
- Published
- 2023
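The GIPA entry above describes bit-wise (element-wise) attention computed by an MLP over node and edge features. The module below is a minimal stand-alone sketch of that idea with assumed dimensions; it is not the DASFAA implementation and omits the feature-wise branch and the surrounding message passing.
```python
# Hypothetical element-wise ("bit-wise") attention gate over propagated messages.
import torch
import torch.nn as nn

class BitwiseAttention(nn.Module):
    def __init__(self, dim, edge_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim + edge_dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, h_src, h_dst, e):
        # One weight per feature dimension, in [0, 1], applied to the source message.
        gate = torch.sigmoid(self.mlp(torch.cat([h_src, h_dst, e], dim=-1)))
        return gate * h_src

att = BitwiseAttention(dim=16, edge_dim=8)
msg = att(torch.randn(32, 16), torch.randn(32, 16), torch.randn(32, 8))
print(msg.shape)  # torch.Size([32, 16])
```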
42. Generalizing Multimodal Variational Methods to Sets
- Author
-
Zhou, Jinzhao, Duan, Yiqun, Chen, Zhihong, Chang, Yu-Cheng, and Lin, Chin-Teng
- Subjects
Computer Science - Artificial Intelligence - Abstract
Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning the co-representation of diverse modalities is still a long-standing endeavor in emerging machine learning applications and research. Previous generative approaches for multimodal input approximate a joint-modality posterior by uni-modality posteriors as product-of-experts (PoE) or mixture-of-experts (MoE). We argue that these approximations lead to a defective bound for the optimization process and loss of semantic connection among modalities. This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space while handling the missing modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization. In public datasets of various domains, the experimental results demonstrate that the proposed method is applicable to order-agnostic cross-modal generation while achieving outstanding performance compared to the state-of-the-art multimodal methods. The source code for our method is available online https://anonymous.4open.science/r/SMVAE-9B3C/., Comment: First Submission
- Published
- 2022
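The set-based multimodal VAE entry above argues for modeling the joint-modality posterior directly rather than as a product or mixture of per-modality posteriors. The sketch below shows one generic, permutation-invariant way to pool whichever modalities are present into a single Gaussian posterior; the architecture and dimensions are assumptions, not the SMVAE implementation.
```python
# Hypothetical permutation-invariant posterior over an arbitrary subset of modalities.
import torch
import torch.nn as nn

class SetPosterior(nn.Module):
    def __init__(self, dim=128, latent=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # per-modality encoder
        self.rho = nn.Linear(dim, 2 * latent)                     # pooled -> (mu, logvar)

    def forward(self, modality_embeddings):
        # `modality_embeddings` is a list of (B, dim) tensors; order does not matter.
        pooled = torch.stack([self.phi(e) for e in modality_embeddings]).mean(dim=0)
        mu, logvar = self.rho(pooled).chunk(2, dim=-1)
        return mu, logvar

q = SetPosterior()
mu, logvar = q([torch.randn(4, 128), torch.randn(4, 128)])  # e.g. image + text present
print(mu.shape, logvar.shape)  # torch.Size([4, 32]) torch.Size([4, 32])
```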
43. AdaptNet: Adaptive Learning from Partially Labeled Data for Abdomen Multi-organ and Tumor Segmentation
- Author
-
Luo, JiChao, Chen, Zhihong, Liu, Wenbin, Liu, Zaiyi, Qiu, Bingjiang, Fang, Gang, Ma, Jun, editor, and Wang, Bo, editor
- Published
- 2024
- Full Text
- View/download PDF
44. Toward expanding the scope of radiology report summarization to multiple anatomies and modalities
- Author
-
Chen, Zhihong, Varma, Maya, Wan, Xiang, Langlotz, Curtis, and Delbrouck, Jean-Benoit
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning - Abstract
Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.
- Published
- 2022
- Full Text
- View/download PDF
45. Improving Radiology Summarization with Radiograph and Anatomy Prompts
- Author
-
Hu, Jinpeng, Chen, Zhihong, Liu, Yang, Wan, Xiang, and Chang, Tsung-Hui
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language - Abstract
The impression is crucial for referring physicians to grasp key information since it is concluded from the findings and reasoning of radiologists. To alleviate the workload of radiologists and reduce repetitive human labor in impression writing, many researchers have focused on automatic impression generation. However, recent works on this task mainly summarize the corresponding findings and pay less attention to the radiology images. In clinical practice, radiographs can provide more detailed and valuable observations to enhance radiologists' impression writing, especially for complicated cases. Besides, each sentence in the findings usually focuses on a single anatomy, so it only needs to be matched to the corresponding anatomical region instead of the whole image, which is beneficial for textual and visual feature alignment. Therefore, we propose a novel anatomy-enhanced multimodal model to promote impression generation. In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics. Then, two separate encoders are applied to extract features from the radiograph and findings. Afterward, we utilize a contrastive learning module to align these two representations at the overall level and use co-attention to fuse them at the sentence level with the help of anatomy-enhanced sentence representations. Finally, the decoder takes the fused information as the input to generate impressions. The experimental results on two benchmark datasets confirm the effectiveness of the proposed method, which achieves state-of-the-art results., Comment: 11 pages, ACL2023 Findings
- Published
- 2022
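The entry above describes rule-based extraction of anatomies that are inserted into each findings sentence as prompts. The snippet below illustrates that kind of rule with a tiny keyword table; the keyword list and tag format are invented for illustration and are not the authors' rules.
```python
# Hypothetical anatomy tagging of findings sentences.
ANATOMY_KEYWORDS = {"lung": "lungs", "heart": "heart", "pleural": "pleura", "bone": "bones"}

def add_anatomy_prompts(findings: str) -> str:
    tagged = []
    for sentence in findings.split(". "):
        hits = {tag for kw, tag in ANATOMY_KEYWORDS.items() if kw in sentence.lower()}
        prefix = f"[{', '.join(sorted(hits))}] " if hits else "[other] "
        tagged.append(prefix + sentence)
    return ". ".join(tagged)

print(add_anatomy_prompts("The lungs are clear. Heart size is normal"))
# [lungs] The lungs are clear. [heart] Heart size is normal
```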
46. Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
- Author
-
Chen, Zhihong, Li, Guanbin, and Wan, Xiang
- Subjects
Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition - Abstract
Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP. Although there exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP by structured medical knowledge from three perspectives. First, considering knowledge can be regarded as the intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text. Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on all downstream tasks. Further analyses explore the effects of different components of our approach and various settings of pre-training., Comment: Natural Language Processing. 10 pages, 3 figures
- Published
- 2022
47. Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
- Author
-
Chen, Zhihong, Du, Yuhao, Hu, Jinpeng, Liu, Yang, Li, Guanbin, Wan, Xiang, and Chang, Tsung-Hui
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Medical vision-and-language pre-training provides a feasible solution to extract effective vision-and-language representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical vision-and-language understanding. In this paper, we propose a self-supervised learning paradigm with multi-modal masked autoencoders (M$^3$AE), which learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts. There are three key designs to make this simple approach work. First, considering the different information densities of vision and language, we adopt different masking ratios for the input image and text, where a considerably larger masking ratio is used for images. Second, we use visual and textual features from different layers to perform the reconstruction to deal with different levels of abstraction in visual and language. Third, we develop different designs for vision and language decoders (i.e., a Transformer for vision and a multi-layer perceptron for language). To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results demonstrate the effectiveness of our approach, where state-of-the-art results are achieved on all downstream tasks. Besides, we conduct further analysis to better verify the effectiveness of different components of our approach and various settings of pre-training. The source code is available at~\url{https://github.com/zhjohnchan/M3AE}., Comment: Natural Language Processing. 11 pages, 3 figures
- Published
- 2022
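The M3AE entry above highlights using a much larger masking ratio for images than for text before reconstruction. The snippet below sketches only that masking step with assumed ratios and shapes; the released M3AE code is the authoritative reference.
```python
# Hypothetical modality-specific masking before multi-modal masked autoencoding.
import torch

def mask_inputs(image_patches, text_tokens, image_ratio=0.75, text_ratio=0.15):
    """image_patches: (B, P, D) patch embeddings; text_tokens: (B, T) token ids."""
    img_mask = torch.rand(image_patches.shape[:2]) < image_ratio  # (B, P) True = masked
    txt_mask = torch.rand(text_tokens.shape) < text_ratio         # (B, T) True = masked
    return img_mask, txt_mask

img_mask, txt_mask = mask_inputs(torch.randn(2, 196, 768), torch.randint(0, 30522, (2, 64)))
print(img_mask.float().mean().item(), txt_mask.float().mean().item())  # ~0.75, ~0.15
```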
48. A novel CD47-blocking peptide fused to pro-apoptotic KLA repeat inhibits lung cancer growth in mice
- Author
-
Pan, Linyue, Hu, Lu, Chen, Mengjie, Song, Yuanlin, Chen, Zhihong, Gu, Yutong, Li, Chun, and Jiang, Zhilong
- Published
- 2023
- Full Text
- View/download PDF
49. Cross-modal Memory Networks for Radiology Report Generation
- Author
-
Chen, Zhihong, Shen, Yaling, Song, Yan, and Wan, Xiang
- Subjects
Computer Science - Computation and Language - Abstract
Medical imaging plays a significant role in the clinical practice of medical diagnosis, where the text reports of the images are essential in understanding them and facilitating later treatments. By generating the reports automatically, it is possible to help lighten the burden of radiologists and significantly promote clinical automation, which already attracts much attention in applying artificial intelligence to the medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the aspect of text generation, with few studies considering the importance of cross-modal mappings and explicitly exploiting such mappings to facilitate radiology report generation. In this paper, we propose cross-modal memory networks (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate the interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, where state-of-the-art performance is achieved on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also prove that our model is able to better align information from radiology images and texts so as to help generate more accurate reports in terms of clinical indicators., Comment: Natural Language Processing. 11 pages, 6 figures. ACL-IJCNLP 2021
- Published
- 2022
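For the cross-modal memory entry above, the function below sketches the core idea of querying one shared memory matrix from both modalities so that their responses live in a common space; the attention form and sizes are assumptions, not the released CMN implementation.
```python
# Hypothetical shared-memory lookup used by both visual and textual features.
import torch
import torch.nn.functional as F

def memory_query(features, memory):
    """features: (N, D) queries from either modality; memory: (M, D) learned shared slots."""
    attn = F.softmax(features @ memory.t() / memory.shape[-1] ** 0.5, dim=-1)  # (N, M)
    return attn @ memory                                                       # (N, D) responses

memory = torch.randn(64, 512)
visual_resp = memory_query(torch.randn(49, 512), memory)  # image patch features
text_resp = memory_query(torch.randn(20, 512), memory)    # report token features
print(visual_resp.shape, text_resp.shape)
```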
50. Graph Enhanced Contrastive Learning for Radiology Findings Summarization
- Author
-
Hu, Jinpeng, Li, Zhuo, Chen, Zhihong, Li, Zhen, Wan, Xiang, and Chang, Tsung-Hui
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
The impression section of a radiology report summarizes the most prominent observation from the findings section and is the most important section for radiologists to communicate to physicians. Summarizing findings is time-consuming and can be prone to error for inexperienced radiologists, and thus automatic impression generation has attracted substantial attention. With the encoder-decoder framework, most previous studies explore incorporating extra knowledge (e.g., static pre-defined clinical ontologies or extra background information). Yet, they encode such knowledge by a separate encoder to treat it as an extra input to their models, which is limited in leveraging their relations with the original findings. To address the limitation, we propose a unified framework for exploiting both extra knowledge and the original findings in an integrated way so that the critical information (i.e., key words and their relations) can be extracted in an appropriate way to facilitate impression generation. In detail, for each input findings, it is encoded by a text encoder, and a graph is constructed through its entities and dependency tree. Then, a graph encoder (e.g., graph neural networks (GNNs)) is adopted to model relation information in the constructed graph. Finally, to emphasize the key words in the findings, contrastive learning is introduced to map positive samples (constructed by masking non-key words) closer and push apart negative ones (constructed by masking key words). The experimental results on OpenI and MIMIC-CXR confirm the effectiveness of our proposed method., Comment: 9 pages, 5 figures, Accepted to ACL 2022 Main Conference
- Published
- 2022
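The entry above ends with a contrastive objective that pulls the findings encoding toward a positive view (non-key words masked) and pushes it away from a negative view (key words masked). The loss below is a generic sketch of that pull/push with dummy embeddings; the temperature and batch handling are assumptions, not the authors' configuration.
```python
# Hypothetical anchor/positive/negative contrastive loss over findings encodings.
import torch
import torch.nn.functional as F

def keyword_contrastive(anchor, positive, negative, temperature=0.1):
    anchor, positive, negative = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    pos = (anchor * positive).sum(-1) / temperature   # similarity to non-key-word-masked view
    neg = (anchor * negative).sum(-1) / temperature   # similarity to key-word-masked view
    return -(pos - torch.logsumexp(torch.stack([pos, neg]), dim=0)).mean()

loss = keyword_contrastive(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```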