Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD).

Authors :
Frazer SA
Baghbanzadeh M
Rahnavard A
Crandall KA
Oakley TH
Source :
GigaScience [Gigascience] 2024 Jan 02; Vol. 13.
Publication Year :
2024

Abstract

Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype.

Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites.

Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism's ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.

(© The Author(s) 2024. Published by Oxford University Press GigaScience.)

Details

Language :
English
ISSN :
2047-217X
Volume :
13
Database :
MEDLINE
Journal :
GigaScience
Publication Type :
Academic Journal
Accession number :
39460934
Full Text :
https://doi.org/10.1093/gigascience/giae073