Benchmarking and Testing Machine Learning Approaches with BARRA:CuRDa, a Curated RNA-Seq Database for Cancer Research
- Source :
- Journal of Computational Biology. 28:931-944
- Publication Year :
- 2021
- Publisher :
- Mary Ann Liebert Inc, 2021.
Abstract
- RNA-seq is gradually becoming the dominating technique employed to access the global gene expression in biological samples, allowing more flexible protocols and robust analysis. However, the nature of RNA-seq results imposes new data-handling challenges when it comes to computational analysis. With the increasing employment of machine learning (ML) techniques in biomedical sciences, databases that could provide curated data sets treated with state-of-the-art approaches already adapted to ML protocols become essential for testing new algorithms. In this study, we present the Benchmarking of ARtificial intelligence Research: Curated RNA-seq Database (BARRA:CuRDa). BARRA:CuRDa was built exclusively for cancer research and is composed of 17 handpicked RNA-seq data sets for Homo sapiens that were gathered from the Gene Expression Omnibus, using rigorous filtering criteria. All data sets were individually submitted to sample quality analysis, removal of low-quality bases and artifacts from the experimental process, removal of ribosomal RNA, and estimation of transcript-level abundance. Moreover, all data sets were tested using standard approaches in the field, which allows them to be used as a benchmark for new ML approaches. A feature selection analysis was also performed on each data set to investigate the biological accuracy of basic techniques. Results include genes already related to their specific tumoral tissue, a large amount of long noncoding RNA, and pseudogenes. BARRA:CuRDa is available at http://sbcb.inf.ufrgs.br/barracurda.
- Subjects :
- Database
Process (engineering)
Computer science
business.industry
Pseudogene
Feature selection
RNA-Seq
Benchmarking
computer.software_genre
Machine learning
Field (computer science)
Data set
Computational Mathematics
Computational Theory and Mathematics
Modeling and Simulation
Genetics
Benchmark (computing)
Cancer research
Artificial intelligence
business
Molecular Biology
computer
Details
- ISSN :
- 15578666
- Volume :
- 28
- Database :
- OpenAIRE
- Journal :
- Journal of Computational Biology
- Accession number :
- edsair.doi...........99d88c6ec4197affda54898ece591186