Biclustering via sparse clustering
- Source :
- Biometrics
- Publication Year :
- 2018
Abstract
- In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features which differ between subgroups. For instance, it may be that there is a subgroup of individuals with a certain disease who differ from the rest of the population based on the expression profile for only a subset of genes. Identifying the subgroup of patients and subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster, a submatrix, U, of a larger data matrix, X, such that the features and observations in U differ from those not contained in U. We present a novel two-step method, SC-Biclust, for identifying U. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances, or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulated studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.
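The two-step idea in the abstract can be illustrated with a minimal Python sketch for the mean-difference case. This is not the authors' SC-Biclust implementation. It assumes a sparse 2-means style procedure for step 1 (feature weights soft-thresholded under an L1 bound `s`), treats the smaller cluster as the bicluster rows, and substitutes a simple Welch t-statistic cutoff for step 2. The function names, the parameter `s`, and the cutoff `t_cut` are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_two_means(X, w, n_iter=100):
    """2-means on the rows of X with feature weights w (farthest-point start)."""
    Z = X * np.sqrt(w)  # squared distances in Z are weighted by w
    first = np.argmax(((Z - Z.mean(axis=0)) ** 2).sum(axis=1))
    second = np.argmax(((Z - Z[first]) ** 2).sum(axis=1))
    centers = Z[[first, second]]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.bincount(new_labels, minlength=2).min() == 0:
            break  # degenerate split: keep the previous labels
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
        centers = np.stack([Z[labels == k].mean(axis=0) for k in (0, 1)])
    return labels


def between_cluster_ss(X, labels):
    """Per-feature between-cluster sum of squares (total minus within)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in (0, 1):
        Xk = X[labels == k]
        if len(Xk):
            within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def update_weights(a, s):
    """Soft-threshold a so that ||w||_2 = 1 and ||w||_1 <= s (assumes s >= 1)."""
    a = np.maximum(a, 0.0)

    def normalized(delta):
        w = np.maximum(a - delta, 0.0)
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    w = normalized(0.0)
    if w.sum() <= s:
        return w
    lo, hi = 0.0, a.max()
    for _ in range(50):  # binary search for the threshold
        mid = (lo + hi) / 2
        if normalized(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return normalized(hi)


def sc_biclust_sketch(X, s=3.0, n_outer=10, t_cut=5.0):
    """Illustrative two-step bicluster search; returns boolean row/column masks."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    # Step 1: alternate weighted clustering and weight updates, which aims to
    # increase the weighted sum of between-cluster feature differences.
    for _ in range(n_outer):
        labels = weighted_two_means(X, w)
        w = update_weights(between_cluster_ss(X, labels), s)
    # Assumption: the smaller cluster holds the bicluster observations.
    rows = labels == np.argmin(np.bincount(labels, minlength=2))
    # Step 2 (simplified stand-in): keep features whose Welch t-statistic
    # between the two groups exceeds the hypothetical cutoff t_cut.
    inside, outside = X[rows], X[~rows]
    se = np.sqrt(inside.var(axis=0, ddof=1) / len(inside)
                 + outside.var(axis=0, ddof=1) / len(outside))
    t = (inside.mean(axis=0) - outside.mean(axis=0)) / se
    cols = np.abs(t) > t_cut
    return rows, cols
```

Calling `sc_biclust_sketch(X)` on an observations-by-features array returns two boolean masks, so `X[np.ix_(rows, cols)]` extracts the candidate submatrix U. Biclusters that differ in variance or in more general ways, which the abstract also mentions, would need a different between-cluster measure than the sum of squares used here.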
- Subjects :
- Statistics and Probability
Clustering high-dimensional data
Biometry
Computer science
Normal Distribution
Computational biology
01 natural sciences
General Biochemistry, Genetics and Molecular Biology
Article
Biclustering
010104 statistics & probability
03 medical and health sciences
Cluster Analysis
Humans
Computer Simulation
Disease
0101 mathematics
Cluster analysis
030304 developmental biology
0303 health sciences
Analysis of Variance
Models, Statistical
General Immunology and Microbiology
Applied Mathematics
k-means clustering
General Medicine
Temporomandibular Joint Disorders
Expression (mathematics)
Hierarchical clustering
Feature (computer vision)
Data Interpretation, Statistical
Identification (biology)
General Agricultural and Biological Sciences
Algorithms
Software
Details
- ISSN :
- 15410420
- Volume :
- 76
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Biometrics
- Accession number :
- edsair.doi.dedup.....17ec9e2679acade24f3557961f3ac219