Gaussian embedding for large-scale gene set analysis
- Source :
- Nature Machine Intelligence
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Gene sets, including protein complexes and signalling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein–protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumours, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a clinical prognostic and predictive subnetwork around neurofilament medium in sarcoma, which we validate in independent cohorts.
- Gene sets can provide valuable information for gaining insight into disease mechanisms and cellular functions. In this paper, the authors use a Gaussian approach to represent gene sets and gene networks in a low-dimensional space, allowing for accurate prediction and decreased computational complexity.
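The abstract's central idea, representing a gene set as a Gaussian distribution over gene embeddings rather than as a single point, can be illustrated with a small sketch. This is not the authors' Set2Gaussian procedure, which derives the embedding from gene proximity in a protein–protein interaction network. The sketch assumes gene embeddings have already been computed by some network embedding method (the toy vectors and gene names `g1`…`g5` are hypothetical). It fits an axis-aligned (diagonal-covariance) Gaussian to a set's members by moment matching, then ranks candidate genes by log-density, a rough analogue of the gene set member identification task. The function names and the variance floor `eps` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_diag_gaussian(vectors: List[Vector], eps: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Moment-match a diagonal Gaussian to member embeddings.

    eps is a hypothetical variance floor so that no dimension has zero variance.
    """
    if not vectors:
        raise ValueError("a gene set needs at least one member embedding")
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + eps for d in range(dim)]
    return mean, var


def log_density(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-probability density of x under an axis-aligned Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def rank_candidates(
    gene_set: Sequence[str],
    embeddings: Dict[str, Vector],
    candidates: Sequence[str],
) -> List[str]:
    """Rank candidate genes by how likely they are under the gene set's Gaussian."""
    mean, var = fit_diag_gaussian([embeddings[g] for g in gene_set])
    scores = {g: log_density(embeddings[g], mean, var) for g in candidates}
    return sorted(candidates, key=scores.__getitem__, reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: g1-g3 form a tight set, g4 lies near it, g5 far away.
    emb = {
        "g1": [0.0, 0.0], "g2": [0.2, 0.0], "g3": [0.0, 0.2],
        "g4": [0.1, 0.1], "g5": [5.0, 5.0],
    }
    print(rank_candidates(["g1", "g2", "g3"], emb, ["g5", "g4"]))  # ['g4', 'g5']
```

Compared with summarising a set by its mean vector alone, the per-dimension variance lets a tightly clustered set assign sharply lower scores to distant genes than a diffuse set would. A full covariance matrix would capture correlations between dimensions at extra cost.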
- Subjects :
- 0301 basic medicine
Biological data
Computational complexity theory
Computer Networks and Communications
Computer science
Gene regulatory network
Scale (descriptive set theory)
Computational biology
Article
Human-Computer Interaction
Set (abstract data type)
03 medical and health sciences
030104 developmental biology
0302 clinical medicine
Artificial Intelligence
Interaction network
Embedding
Computer Vision and Pattern Recognition
Subnetwork
030217 neurology & neurosurgery
Software
Details
- ISSN :
- 2522-5839
- Volume :
- 2
- Database :
- OpenAIRE
- Journal :
- Nature Machine Intelligence
- Accession number :
- edsair.doi.dedup.....d7ec64fd7ec611e8346df4c3422664c7