Discriminative feature of cells characterizes cell populations of interest by a small subset of genes
- Source :
- PLoS Computational Biology, Vol 17, Iss 11, p e1009579 (2021), PLoS Computational Biology
- Publication Year :
- 2021
- Publisher :
- Public Library of Science (PLoS), 2021.
Abstract
- Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests are critical for identifying marker genes, and they often involve evaluating differences in the mean expression levels of genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, as sample sizes increase, as in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty, which performs binary classification to discriminate a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. With recent advances in the high-throughput analysis of single cells, data from cell characterization methods such as scRNA-seq can be effectively subjected to discriminative methods.
Author summary
- Statistical methods for detecting differences in individual gene expression are indispensable for understanding cell types. However, conventional statistical methods, such as differentially expressed gene (DEG)-based analysis, have faced difficulties associated with the inflation of P-values because of both the large sample size and the selection bias introduced by exploratory data analysis such as single-cell transcriptomics. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty, which performs binary classification to discriminate a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that it enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement DEG-based methods for interpreting large data sets.
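The abstract describes logistic regression with an adaptive LASSO penalty for binary classification plus variable selection. The following is a minimal sketch of that general technique, not the authors' implementation. Everything specific in it is an assumption made for illustration: the pure-Python proximal gradient solver, the ridge-penalized initial fit used to derive the adaptive weights, the values of `lam`, `gamma`, `ridge` and `step`, and the synthetic "expression" data in which only genes 0 and 1 determine population membership.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|; shrinks v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_logistic(X, y, l1_weights, lam, ridge=0.0, step=0.5, iters=200):
    """Proximal gradient descent for penalized logistic regression.

    Minimizes mean log-loss + (ridge / 2) * sum(b_j^2)
    + lam * sum(l1_weights[j] * |b_j|). The intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        b = [
            soft_threshold(
                b[j] - step * (g[j] / n + ridge * b[j]),
                step * lam * l1_weights[j],
            )
            for j in range(p)
        ]
    return b0, b

def adaptive_lasso_logistic(X, y, lam=0.05, gamma=1.0, eps=1e-6):
    # Step 1 (assumed choice): ridge-penalized fit for initial estimates.
    p = len(X[0])
    _, b_init = fit_logistic(X, y, [0.0] * p, lam=0.0, ridge=0.1)
    # Step 2: per-gene weights; weak initial effects get a heavier L1 penalty.
    weights = [1.0 / (abs(bj) + eps) ** gamma for bj in b_init]
    # Step 3: weighted L1 fit; coefficients shrunk exactly to zero drop out.
    return fit_logistic(X, y, weights, lam=lam)

# Hypothetical synthetic data: 300 "cells", 8 "genes"; only genes 0 and 1
# determine whether a cell belongs to the population of interest.
random.seed(0)
n_cells, n_genes = 300, 8
X = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
y = [1 if random.random() < sigmoid(2.5 * x[0] - 2.5 * x[1]) else 0 for x in X]

b0, b = adaptive_lasso_logistic(X, y)
selected = [j for j, bj in enumerate(b) if bj != 0.0]
print("selected genes:", selected)
print("coefficients:", [round(bj, 3) for bj in b])
```

Genes whose coefficient is shrunk exactly to zero are dropped, so `selected` plays the role of the small subset of defining genes. In practice `lam` would be chosen by cross-validation rather than fixed as it is here.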
- Subjects :
- Critical Care and Emergency Medicine
Cellular differentiation
Gene Expression
Lasso (statistics)
Discriminative model
Animal Cells
Medicine and Health Sciences
Morphogenesis
Feature (machine learning)
Cluster Analysis
Biology (General)
Musculoskeletal System
Trauma Medicine
Ecology
Muscles
Stem Cells
Cell Differentiation
Computational Theory and Mathematics
Binary classification
Modeling and Simulation
Anatomy
Cellular Types
Traumatic Injury
Muscle Regeneration
Algorithms
Research Article
Genetic Markers
Cell type
QH301-705.5
Population
Muscle Tissue
Feature selection
Computational biology
Biology
Research and Analysis Methods
Cellular and Molecular Neuroscience
Genetics
Regeneration
Humans
Computer Simulation
Molecular Biology Techniques
education
Gene
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Biology and Life Sciences
Marker Genes
Cell Biology
Exploratory data analysis
Biological Tissue
Logistic Models
Skeletal Muscles
Musculoskeletal Injury
Organism Development
Developmental Biology
Details
- Language :
- English
- ISSN :
- 15537358
- Volume :
- 17
- Issue :
- 11
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology
- Accession number :
- edsair.doi.dedup.....19766dd64db55c81c6fd2a32573beb7a