1. SABRE: a method for assessing the stability of gene modules in complex tissues and subject populations
- Author
Virginia Chen, Scott J. Tebbutt, Casey P. Shannon, Bruce M. McManus, Mandeep Takhar, Zsuzsanna Hollander, Don D. Sin, Robert Balshaw, and Raymond T. Ng
- Subjects
0301 basic medicine, Stability criterion, Computer science, Systems biology, Gene regulatory network, Stability (learning theory), Inference, Gene modules, computer.software_genre, Biochemistry, Set (abstract data type), Transcriptome, 03 medical and health sciences, Structural Biology, Gene expression, Humans, Gene Regulatory Networks, Molecular Biology, Gene, WGCNA, Gene Expression Profiling, Systems Biology, Methodology Article, Applied Mathematics, Computational Biology, Reproducibility, Bootstrap, Computer Science Applications, Identification (information), 030104 developmental biology, Data mining, DNA microarray, computer, Algorithms, Software
- Abstract
Background: Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules, from whole-transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how sensitive the outputs of these computational methods are to the input sample set, that is, how stable they are. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically relevant cohort, using two different gene network module discovery algorithms.
Results: The stability of modules increased as sample size increased, and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permuted gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases.
Conclusions: The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1319-8) contains supplementary material, which is available to authorized users.
- Published
- 2017