Back to Search Start Over

Bayesian correlated clustering to integrate multiple datasets.

Authors :
Kirk, Paul
Griffin, Jim E.
Savage, Richard S.
Ghahramani, Zoubin
Wild, David L.
Source :
Bioinformatics. Dec2012, Vol. 28 Issue 24, p3290-3297. 8p.
Publication Year :
2012

Abstract

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.Contact: D.L.Wild@warwick.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
13674803
Volume :
28
Issue :
24
Database :
Academic Search Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
84195046
Full Text :
https://doi.org/10.1093/bioinformatics/bts595