1. MPI-LIT: a literature-curated dataset of microbial binary protein–protein interactions
- Author
Peter Uetz, Svetlana Shtivelband, Srinivas Ramachandra, Naresh Raviswaran, Björn Titz, Julia Hofmann, Seesandra V. Rajagopala, Kumar C. Sunil, Johannes B. Goll, N.D. Deve Gowda, Stephen M. Blazie, Sharmila S. Mary, Chetan S. Poojari, and Arnab Mukherjee
- Subjects
Statistics and Probability, Context (language use), Computational biology, Biology, computer.software_genre, Biochemistry, Protein–protein interaction, Bacterial protein, Set (abstract data type), Bacterial Proteins, Protein Interaction Mapping, Databases, Protein, Molecular Biology, String database, Supplementary data, Microbial interaction, Computational Biology, Original Papers, Computer Science Applications, Computational Mathematics, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Data mining, Experimental methods, computer, Protein Binding
- Abstract
Prokaryotic protein–protein interactions are underrepresented in currently available databases. Here, we describe a ‘gold standard’ dataset (MPI-LIT) focusing on microbial binary protein–protein interactions and the associated experimental evidence, which we manually curated from 813 abstracts and full texts selected from an initial set of 36 852 abstracts. The MPI-LIT dataset comprises 1237 experimental descriptions that describe a non-redundant set of 746 interactions, of which 659 (88%) are not reported in public databases. To estimate the curation quality, we compared our dataset with a union of microbial interaction data from IntAct, DIP, BIND and MINT. Among common abstracts, we achieve a sensitivity of up to 66% for interactions and 75% for experimental methods. Compared with these other datasets, MPI-LIT has the lowest fraction of interaction experiments per abstract (0.9) and the highest coverage of strains (92) and scientific articles (813). We also compared methods for evaluating functional interactions among proteins (such as genomic context or co-expression) that are implemented in the STRING database. Most of these methods discriminate well between functionally relevant protein interactions (MPI-LIT) and high-throughput data.
Availability: http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-LIT.
Contact: raja@jcvi.org
Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2008