dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning.

Authors :
Cao, Han
Zhang, Youcheng
Baumbach, Jan
Burton, Paul R
Dwyer, Dominic
Koutsouleris, Nikolaos
Matschinske, Julian
Marcon, Yannick
Rajan, Sivanesan
Rieg, Thilo
Ryser-Welch, Patricia
Späth, Julian
The COMMITMENT Consortium
Herrmann, Carl
Schwarz, Emanuel
Source :
Bioinformatics; Nov 2022, Vol. 38 Issue 21, p4919-4926, 8p
Publication Year :
2022

Abstract

Motivation: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Because multi-cohort data often cannot be combined in a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility.

Results: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data under actual network latency.

Availability and implementation: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).

Supplementary information: Supplementary data are available at Bioinformatics online.

[ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1367-4803
Volume :
38
Issue :
21
Database :
Complementary Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
159959437
Full Text :
https://doi.org/10.1093/bioinformatics/btac616