Prediction of condition-specific regulatory genes using machine learning
- Source :
- Nucleic Acids Research
- Publication Year :
- 2020
- Publisher :
- Oxford University Press, 2020.
Abstract
- Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
- Subjects :
- 0106 biological sciences
STRESS
AcademicSubjects/SCI00010
05 Environmental Sciences
Arabidopsis
Datasets as Topic
RNA-Seq
01 natural sciences
Genome
Machine Learning
Gene Expression Regulation, Plant
Genes, Regulator
Gene Regulatory Networks
NETWORK
Promoter Regions, Genetic
Regulator gene
0303 health sciences
biology
OPEN CHROMATIN
Chromatin
TRANSCRIPTION FACTORS
Methods Online
Single-Cell Analysis
Life Sciences & Biomedicine
Biochemistry & Molecular Biology
EXPRESSION DATA
Machine learning
CHEMICAL-COMPOSITION
Genes, Plant
03 medical and health sciences
Deep Learning
ROOT
Stress, Physiological
Genetics
TOLERANCE
Gene
Transcription factor
030304 developmental biology
Abiotic stress
Arabidopsis Proteins
Gene Expression Profiling
DNA
06 Biological Sciences
Artificial intelligence
08 Information and Computing Sciences
Transcription Factors
010606 plant biology & botany
Developmental Biology
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research
- Accession number :
- edsair.doi.dedup.....577ebfa86499a83539584d022d2b1589