1. PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF
- Author
- Michael F. Ochs, Elana J. Fertig, Michael Considine, Alexander V. Favorov, Shawn Sivy, Daria A. Gaykalova, Sijia Li, Emily Flam, Ronald D.G. McKay, Luigi Marchionni, Genevieve Stein-O’Brien, Thomas Sherman, Theresa Guo, Carlo Colantuoni, Jacob Carey, and Wai Shing Lee
- Subjects
0301 basic medicine, Statistics and Probability, Computer science, Computational biology, computer.software_genre, 01 natural sciences, Biochemistry, Genome, Data-driven, Matrix decomposition, Non-negative matrix factorization, Transcriptome, Bioconductor, 010104 statistics & probability, 03 medical and health sciences, 0302 clinical medicine, Gene expression, Humans, 0101 mathematics, Molecular Biology, Gene, 030304 developmental biology, 0303 health sciences, Sequence Analysis, RNA, Gene Expression Profiling, Univariate, Bayes Theorem, Applications Notes, Computer Science Applications, Computational Mathematics, 030104 developmental biology, Computational Theory and Mathematics, Genetic marker, Biomarker (medicine), Data mining, computer, Algorithms, Biomarkers, Software, 030217 neurology & neurosurgery
- Abstract
Summary: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole-genome Bayesian NMF using the sparse MCMC algorithm CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data.
Availability and Implementation: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license.
Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017