Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA

Authors :
Jilei Liu
Hongru Shen
Yichen Yang
Meng Yang
Qiang Zhang
Kexin Chen
Xiangchun Li
Source :
Molecular Oncology, Vol 18, Iss 11, Pp 2755-2769 (2024)
Publication Year :
2024
Publisher :
Wiley, 2024.

Abstract

Early cancer diagnosis from bisulfite‐treated cell‐free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep‐learning‐based approach for early cancer interception and diagnosis (DECIDIA) that achieves accurate cancer diagnosis exclusively from bisulfite‐treated cfDNA sequencing fragments. DECIDIA relies on transformer‐based representation learning of DNA fragments and weakly supervised multiple‐instance learning for classification. We systematically evaluated the performance of DECIDIA for cancer diagnosis and cancer type prediction on a curated dataset of 5389 samples consisting of colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non‐cancer controls (n = 1980). In 10‐fold cross‐validation on the CRC dataset, DECIDIA achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976–0.984) in differentiating cancer patients from cancer‐free controls, outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896–0.924) on the external independent HCC testing set in distinguishing HCC patients from cancer‐free controls, although no HCC data were used in model development. In cancer‐type classification, DECIDIA achieved a micro‐average AUROC of 0.963 (95% CI, 0.960–0.966) and an overall accuracy of 82.8% (95% CI, 81.8–83.9%). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns between cancer and control samples and among different cancer types. Our approach represents a new paradigm towards eliminating the tedious data analytical procedures for liquid biopsy based on the bisulfite‐treated cfDNA methylome.
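
The abstract does not say how fragment-level representations are aggregated into a sample-level prediction. As an illustration only, the sketch below shows attention-based pooling, one common form of weakly supervised multiple-instance learning, in which a sample is treated as a bag of fragment embeddings and only the bag carries a label. The function names, the parameters `V`, `w`, `clf_w`, `clf_b`, and the toy embeddings are all hypothetical; in practice the embeddings would come from a trained transformer encoder and the parameters would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    bag: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool a bag of instance embeddings into one bag embedding.

    Each instance h gets a score w . tanh(V h); a softmax over the
    scores gives attention weights, and the bag embedding is the
    attention-weighted sum of the instances.
    (Illustrative assumption, not the published DECIDIA architecture.)
    """
    scores = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * t for wi, t in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(bag[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, bag)) for j in range(dim)]
    return pooled, weights


def predict_bag(
    bag: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
    clf_w: Sequence[float],
    clf_b: float,
) -> Tuple[float, List[float]]:
    """Return a bag-level probability and the per-instance attention weights."""
    pooled, weights = attention_mil_pool(bag, V, w)
    logit = sum(c * p for c, p in zip(clf_w, pooled)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit)), weights


# Toy bag of three 2-dimensional "fragment embeddings" (made-up values).
toy_bag = [[0.0, 0.0], [0.0, 0.0], [2.0, 2.0]]
toy_V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical attention projection
toy_w = [1.0, 1.0]                # hypothetical attention vector
prob, attn = predict_bag(toy_bag, toy_V, toy_w, clf_w=[1.0, 1.0], clf_b=-1.0)
```

In this toy bag the third instance receives the largest attention weight, so it dominates the pooled embedding. This kind of weighting is what lets a bag-level label be learned without labels for individual fragments.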

Details

Language :
English
ISSN :
1878-0261 and 1574-7891
Volume :
18
Issue :
11
Database :
Directory of Open Access Journals
Journal :
Molecular Oncology
Publication Type :
Academic Journal
Accession number :
edsdoj.67456d0b304a818b85b00963dc4443
Document Type :
article
Full Text :
https://doi.org/10.1002/1878-0261.13745