
Do available protein 3D structures reflect human genetic and functional diversity?

Authors :
Jens Meiler
John A. Capra
Gregory Sliwoski
Charles R. Sanders
R. Michael Sivley
Neel Patel
William S. Bush
Publication Year :
2019
Publisher :
Cold Spring Harbor Laboratory, 2019.

Abstract

Genomic databases are substantially biased towards European ancestry populations, and this bias contributes to health disparities. Here, we quantify how well 66,971 experimentally characterized human protein 3D structures represent the diversity of protein sequences observed across the 1000 Genomes Project. More than 85% of available structures fail to match the protein sequence of at least one individual, and on average a structure matches the sequence of 74% of individuals. Nearly 23% of human structures do not match any observed sequence; however, after masking engineered/known mutations, this decreases to ~4%. African ancestry sequences are modestly, but significantly, less likely to be represented by structures (73.5% vs. 74.0%). These differences are mainly driven by the greater genetic diversity of African populations. We identify thousands of variants unrepresented in available structures that influence protein structure and function. Thus, using a single structure as representative of “the wild type” protein will often bias results against many individuals. The diversity of protein sequence and structure must be considered to enable accurate, reproducible, and generalizable conclusions from structural analyses.
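The kind of per-structure match fraction summarized above can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' pipeline. It assumes that each structure's sequence and each individual's protein sequence are already aligned position by position, that each individual contributes a single sequence (diploidy is ignored), and that "masking" means ignoring chosen positions such as engineered mutations. The names `matches` and `fraction_matched` are invented for this example.

```python
from typing import AbstractSet, Sequence


def matches(structure_seq: str, individual_seq: str,
            masked: AbstractSet[int] = frozenset()) -> bool:
    """Return True if the two sequences agree at every unmasked position.

    Hypothetical criterion: assumes the sequences are pre-aligned, so a
    length mismatch is treated as a non-match.
    """
    if len(structure_seq) != len(individual_seq):
        return False
    return all(
        a == b
        for i, (a, b) in enumerate(zip(structure_seq, individual_seq))
        if i not in masked  # skip positions masked as engineered/known mutations
    )


def fraction_matched(structure_seq: str, individual_seqs: Sequence[str],
                     masked: AbstractSet[int] = frozenset()) -> float:
    """Fraction of individuals whose sequence the structure matches."""
    if not individual_seqs:
        raise ValueError("need at least one individual sequence")
    hits = sum(matches(structure_seq, s, masked) for s in individual_seqs)
    return hits / len(individual_seqs)


if __name__ == "__main__":
    structure = "MKTAY"  # toy structure sequence (made up)
    individuals = ["MKTAY", "MKTAY", "MKSAY", "MKTAF"]  # toy cohort (made up)
    print(fraction_matched(structure, individuals))              # 0.5
    print(fraction_matched(structure, individuals, masked={2}))  # 0.75
```

In this toy example, masking position 2 (for instance, an engineered substitution in the crystallized construct) lets the structure match the third individual as well. This mirrors the abstract's observation that masking engineered/known mutations sharply reduces the share of structures that match no observed sequence.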

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....f8ad612305fc971d8a0a4990ec39e470
Full Text :
https://doi.org/10.1101/637744