Discriminating physiological from non-physiological interfaces in structures of protein complexes: a community-wide study

Authors :
Hugo Schweke
Qifang Xu
Gerardo Tauriello
Lorenzo Pantolini
Torsten Schwede
Frédéric Cazals
Alix Lhéritier
Juan Fernandez-Recio
Luis Ángel Rodríguez-Lumbreras
Ora Schueler-Furman
Julia K. Varga
Brian Jiménez-García
Manon F. Réau
Alexandre Bonvin
Castrense Savojardo
Pier-Luigi Martelli
Rita Casadio
Jérôme Tubiana
Haim Wolfson
Romina Oliva
Didier Barradas-Bautista
Tiziana Ricciardelli
Luigi Cavallo
Česlovas Venclovas
Kliment Olechnovič
Raphael Guerois
Jessica Andreani
Juliette Martin
Xiao Wang
Daisuke Kihara
Anthony Marchand
Bruno Correia
Xiaoqin Zou
Sucharita Dey
Roland Dunbrack
Emmanuel Levy
Shoshana Wodak
Publication Year :
2023
Publisher :
Authorea, Inc., 2023.

Abstract

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined predictors were then built: a simple consensus score generated from the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores developed by the different groups. Additionally, AlphaFold2 engines were shown to recall the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the pertinence of our benchmark dataset. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
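
As an illustration of the evaluation idea in the abstract, the following minimal sketch combines several interface scores into a consensus and measures its area under the ROC curve. The abstract does not say how the consensus score was computed, so the averaging of rank-normalized scores used here is an assumption, not the authors' method. The function names (`average_ranks`, `roc_auc`, `consensus_score`) and the toy data are hypothetical, and higher scores are assumed to indicate a physiological interface.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC from the Mann-Whitney U statistic (1 = physiological)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def consensus_score(score_columns: Sequence[Sequence[float]]) -> List[float]:
    """Assumed consensus: mean rank-normalized score per complex."""
    n = len(score_columns[0])
    ranked = [average_ranks(col) for col in score_columns]
    return [sum(col[i] for col in ranked) / (len(ranked) * n) for i in range(n)]


# Hypothetical toy data: six dimers, two scoring functions.
labels = [1, 1, 1, 0, 0, 0]
score_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
score_b = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]

consensus = consensus_score([score_a, score_b])
print("AUC score_a:  ", roc_auc(labels, score_a))
print("AUC score_b:  ", roc_auc(labels, score_b))
print("AUC consensus:", roc_auc(labels, consensus))
```

On this toy input each individual score misranks one physiological/non-physiological pair, while the rank-averaged consensus separates the two classes, which mirrors the abstract's observation that combined scores can outperform individual ones. Rank normalization is used so that scores on different scales contribute equally; other combination rules would be equally plausible.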

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........adaf2d977a6fae24f92d44626aa47f9a
Full Text :
https://doi.org/10.22541/au.167569565.51141128/v1