1. ifCNV: A novel isolation-forest-based package to detect copy-number variations from various targeted NGS datasets
- Author
Simon Cabello-Aguilar, Julie A. Vendrell, Charles Van Goethem, Mehdi Brousse, Catherine Gozé, Laurent Frantz, and Jérôme Solassol
- Subjects
MT: Bioinformatics, CNV detection, artificial intelligence, machine learning, localization scoring, R open-source package, Therapeutics. Pharmacology, RM1-950
- Abstract
Copy-number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. Both CNV detection from next-generation sequencing (NGS) data and artificial intelligence algorithms have progressed in recent years. However, only a few tools have taken advantage of machine-learning algorithms for CNV detection, and none propose using artificial intelligence to automatically detect probable CNV-positive samples. The most developed approach is to compare the samples of interest against a reference or normal dataset, and it is well known that selecting appropriate normal samples is a challenging task that dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose here ifCNV, new software based on isolation forests that creates its own reference, available in R and Python with customizable parameters. ifCNV combines artificial intelligence using two isolation forests and a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using targeted NGS datasets from diverse origins (capture and amplicon, germline and somatic), and it exhibits high sensitivity, specificity, and accuracy. ifCNV is publicly available open-source software (https://github.com/SimCab-CHU/ifCNV) that allows the detection of CNVs in many clinical situations.
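The idea of letting an isolation forest pick out probable CNV-positive samples, so that the remaining samples can serve as a reference, can be illustrated with a minimal sketch. This is not ifCNV's code or API: the function names, the toy normalized-coverage matrix, and the choice to flag only the single highest-scoring sample are all assumptions made for illustration, and the forest is a small standard-library reimplementation of the generic isolation-forest technique.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node):
    """Depth at which row lands, plus a correction for unsplit leaves."""
    depth = 0
    while "feature" in node:
        branch = "left" if row[node["feature"]] < node["split"] else "right"
        node = node[branch]
        depth += 1
    return depth + avg_path_length(node["size"])


def anomaly_scores(rows, n_trees=200, seed=0):
    """Isolation-forest anomaly score per row; higher means easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(256, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = avg_path_length(sample_size)
    return [
        2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
        for r in rows
    ]


# Hypothetical toy data: normalized coverage of 4 targets in 8 samples.
samples = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
coverage = [
    [1.00, 0.98, 1.02, 0.99],
    [0.97, 1.03, 0.98, 1.01],
    [1.02, 1.00, 1.01, 0.97],
    [0.99, 0.97, 0.99, 1.03],
    [1.01, 1.02, 1.00, 1.00],
    [0.98, 0.99, 1.03, 0.98],
    [1.03, 1.01, 0.97, 1.02],
    [1.00, 1.00, 2.10, 2.05],  # S8: made-up amplification of targets 3 and 4
]

scores = anomaly_scores(coverage)
flagged = samples[max(range(len(scores)), key=scores.__getitem__)]
reference = [s for s in samples if s != flagged]  # remaining samples act as the reference
print("flagged:", flagged)
```

A real analysis would flag samples by a score threshold or contamination rate rather than taking only the top one, and would typically rely on an established implementation such as scikit-learn's `IsolationForest`.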
- Published
- 2022