1. Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines
- Author
Marco Bolis, Silvia Gioiosa, Enrico Garattini, Tiziano Flati, Giovanni Chillemi, Tiziana Castrignanò, Annalisa Massini, and Maddalena Fratelli
- Subjects
Data Analysis, 0301 basic medicine, Computer science, In silico, Health Informatics, Computational biology, Web Browser, Data Note, Genome, Translocation, Genetic, Cell Line, Fusion gene, 03 medical and health sciences, Naive Bayes classifier, Cell Line, Tumor, Putative gene, Databases, Genetic, Data Mining, Humans, Oncogene Fusion, human gene fusions, database, chromosomal rearrangements, Gene Rearrangement, Database, malignant cell lines, NGS, gene fusion detection algorithms, bioinformatics, Genome, Human, Computational Biology, High-Throughput Nucleotide Sequencing, Genomics, Gene rearrangement, Computer Science Applications, 030104 developmental biology, Cancer biomarkers, Gene Fusion
- Abstract
Background Gene fusions derive from chromosomal rearrangements. The resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognoses and, in some cases, they can provide specific drug targets. To date, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive next-generation sequencing dataset for all existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events.
Results In our work, we have extensively reanalyzed 935 paired-end RNA-sequencing experiments downloaded from the Cancer Cell Line Encyclopedia repository, aiming to identify novel putative cell-line specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four gene fusion detection algorithms. The results were further prioritized by a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the predictive software yields a robust set of ∼1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, the computational results have been organized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, which is further integrated with validated data from other well-known repositories. Using the intuitive query forms, users can easily access, navigate, filter, and select the putative gene fusions for further validation and study. They can also find suitable experimental models for a given fusion of interest.
Conclusions We believe that the LiGeA resource can represent not only the first compendium of both known and putative novel gene fusion events in the catalog of all human malignant cell lines, but also a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.
- Published
- 2018