
TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen

Authors :
Celine Lefebvre
Charles Swanton
Aron Charles Eklund
Mariam Jamal-Hanjani
Cecilia Engel Thomas
Andrea Marion Marquard
Francesco Favero
Zoltan Szallasi
Nicolai Juul Birkbak
Seema Shafi
Charles Ferté
Fabrice Andre
Marcin Krzystanek
Gareth A. Wilson
Center for Biological Sequence Analysis [Lyngby]
Technical University of Denmark [Lyngby] (DTU)
Cancer Research UK Lung Cancer Centre of Excellence [London, United Kingdom]
University College of London [London] (UCL)
Novo Nordisk Foundation Center for Protein Research (CPR)
Faculty of Health and Medical Sciences
University of Copenhagen = Københavns Universitet (KU)
Biomarqueurs prédictifs et nouvelles stratégies moléculaires en thérapeutique anticancéreuse (U981)
Université Paris-Sud - Paris 11 (UP11)-Institut Gustave Roussy (IGR)-Institut National de la Santé et de la Recherche Médicale (INSERM)
Département de médecine oncologique [Gustave Roussy]
Institut Gustave Roussy (IGR)
Cancer Research UK London Research Institute
Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology [Boston, MA, United States] (CHIP@HST)
Harvard Medical School [Boston] (HMS)
This work was supported by the European Commission 7th Framework Programme [HEALTH-2010-F2-259303], the Danish Council for Independent Research [09-073053/FSS], the Breast Cancer Research Foundation [to ZS], the Villum Kann Rasmussen Foundation [to NJB], the Danish Cancer Society [to ACE] and the Novo Nordisk Foundation. The mutation and SCNA data used to develop the classifiers was obtained from the Sanger Institute Catalogue Of Somatic Mutations In Cancer [15] web site, http://cancer.sanger.ac.uk/cosmic.
European Project: 259303,EC:FP7:HEALTH,FP7-HEALTH-2010-two-stage,PREDICT(2011)
Bodescot, Myriam
Predicting individual response and resistance to VEGFR/mTOR pathway therapeutic intervention using biomarkers discovered through tumour functional genomics (PREDICT), EC:FP7:HEALTH, 2011-01-01 to 2014-12-31, grant 259303
Danmarks Tekniske Universitet = Technical University of Denmark (DTU)
University of Copenhagen = Københavns Universitet (UCPH)
Source :
BMC Medical Genomics, BioMed Central, 2015, 8 (1), pp.58. ⟨10.1186/s12920-015-0130-0⟩
Publication Year :
2015

Abstract

Background: A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available.

Methods: We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing.

Results: The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data.

Conclusions: Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.

Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0130-0) contains supplementary material, which is available to authorized users.
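
The second feature set in the abstract, the frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, can be illustrated with a short Python sketch. This record does not describe the authors' exact encoding, so the code assumes the commonly used convention: each substitution is reported relative to the pyrimidine (C or T) reference base, reverse-complementing the context when the reference is a purine, which gives 6 substitution types × 4 upstream bases × 4 downstream bases = 96 classes. The names `substitution_class` and `class_frequencies` and the `A[C>T]G` label format are hypothetical and are not taken from the TumorTracer software.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six substitution types once the reference base is a pyrimidine (assumed convention).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 class labels, written as 5' base, [ref>alt], 3' base, e.g. "A[C>T]G".
CLASSES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def substitution_class(five, ref, alt, three):
    """Map one substitution plus its two flanking bases to one of the 96 classes."""
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        # Reverse-complementing swaps the flanks and complements every base.
        five, ref, alt, three = (
            COMPLEMENT[three],
            COMPLEMENT[ref],
            COMPLEMENT[alt],
            COMPLEMENT[five],
        )
    return f"{five}[{ref}>{alt}]{three}"


def class_frequencies(mutations):
    """Return the fraction of a tumor's substitutions that fall in each class."""
    counts = Counter(substitution_class(*m) for m in mutations)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


# Toy input: (5' base, reference, alternate, 3' base) for each mutation.
toy_tumor = [("A", "C", "T", "G"), ("T", "G", "A", "C"), ("A", "C", "T", "G")]
features = class_frequencies(toy_tumor)
```

The resulting 96-element frequency vector is the kind of per-tumor feature that could be passed, alongside per-gene mutation counts, to a classifier such as a random forest; how the authors combined and selected features is described only at the level of detail given in the abstract.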

Details

ISSN :
17558794
Volume :
8
Database :
OpenAIRE
Journal :
BMC medical genomics
Accession number :
edsair.doi.dedup.....6b47863004cc96ed59f81440c04e1b59
Full Text :
https://doi.org/10.1186/s12920-015-0130-0