1. Filtering Genes for Cluster and Network Analysis
- Author
- Joseph Beyene, David Tritchler, and Elena Parkhomenko
- Subjects
Gene regulatory network, Genomics, Saccharomyces cerevisiae, Biology, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, 01 natural sciences, Biochemistry, Set (abstract data type), 010104 statistics & probability, 03 medical and health sciences, Structural Biology, Cluster (physics), Escherichia coli, Cluster Analysis, Computer Simulation, Gene Regulatory Networks, 0101 mathematics, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Interpretability, 0303 health sciences, Principal Component Analysis, Models, Statistical, Models, Genetic, Applied Mathematics, Methodology Article, Filter (signal processing), Quantitative Biology::Genomics, Computer Science Applications, lcsh:Biology (General), Genes, Principal component analysis, lcsh:R858-859.7, Data mining, computer, Algorithms, Network analysis
- Abstract
Background: Prior to cluster analysis or genetic network analysis it is customary to filter, or remove, genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias.
Results: This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks.
Conclusion: The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
- Published
- 2009
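The conventional variance-threshold filter that the abstract describes as the customary baseline (not the SUMCOV method, which this record does not define) might look roughly like the sketch below. The function name `variance_filter`, the dictionary layout, and the example values are hypothetical and chosen only for illustration.

```python
from statistics import variance


def variance_filter(expression, threshold):
    """Keep genes whose sample variance across samples is at least `threshold`.

    `expression` maps a gene name to its list of expression values, one per
    sample. This layout is an assumption made for this sketch.
    """
    kept = {}
    for gene, values in expression.items():
        # statistics.variance computes the sample variance (n - 1 denominator).
        if variance(values) >= threshold:
            kept[gene] = values
    return kept


# Hypothetical toy data: three genes measured in four samples.
example = {
    "geneA": [1.0, 1.0, 1.0, 1.0],  # constant across samples
    "geneB": [0.0, 2.0, 0.0, 2.0],  # varies strongly
    "geneC": [5.0, 5.1, 5.0, 5.1],  # varies only slightly
}

filtered = variance_filter(example, threshold=0.5)
print(sorted(filtered))
```

The threshold here is arbitrary, which matches the abstract's point that such cut-offs are typically chosen without reference to the downstream cluster or network analysis.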