SCIM: Universal Single-Cell Matching with Unpaired Feature Sets

Authors :
Stark, Stefan
Ficek, Joanna
Locatello, Francesco
Bonilla Bustillo, Ximena
Chicherova, Natalia
Singer, Franziska
Tumor Profiler Consortium
Aebersold, Rudolf
Beerenwinkel, Niko
Al-Quaddoomi, Faisal S.
Albinus, Jonas
Beisel, Christian
Bertolini, Anne
Davidson, Natalie
Eschbach, Katja
Ferreira, Pedro
Goetze, Sandra
Grob, Linda
Günther, Detlef
Jahn, Katharina
James, Alva R.
Kahles, André
Kuipers, Jack
Lehmann, Kjong-Van
Mena, Julien
Menzel, Ulrike
Milani, Emanuela S.
Pedrioli, Patrick G.A.
Prummer, Michael
Rosano-Gonzalez, Maria L.
Rätsch, Gunnar
Schär, Tobias
Snijder, Berend
Thankam Sreedharan, Vipin
Stekhoven, Daniel J.
Thomas, Tinu M.
Toussaint, Nora C.
Tuncel, Mustafa
van Drogen, Audrey
Wegmann, Rebekka
Wendt, Fabian
Wollscheid, Bernd
Yu, Shuqing
Zimmermann, Marc
Source :
bioRxiv
Publication Year :
2020
Publisher :
Cold Spring Harbor Laboratory, 2020.

Abstract

Motivation: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations by unifying the perspectives afforded by each technology. In most cases, however, profiling technologies consume the cells they measure, so pairwise correspondences between datasets are lost. Because single-cell datasets can be very large, scalable algorithms are needed that can universally match the measurements of a cell in one technology to its corresponding sibling in another technology.

Results: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an auto-encoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset, achieving 93% and 84% cell-matching accuracy, respectively.

Availability: https://github.com/ratschlab/scim
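To illustrate the bipartite matching step described in the abstract, the following is a minimal sketch rather than the SCIM implementation. It assumes that each cell has already been encoded into a shared latent space, and it finds the one-to-one pairing with the smallest total Euclidean distance by exhaustive search. The function name `match_cells`, the variable names, and the 2-D latent coordinates are hypothetical; the abstract does not specify the matching algorithm or cost function that SCIM uses.

```python
from itertools import permutations
from math import dist


def match_cells(latent_a, latent_b):
    """Pair each cell in latent_a with a distinct cell in latent_b.

    Returns (pairs, total_cost) for the assignment that minimises the
    summed Euclidean distance in latent space. This is a brute-force
    search over all assignments, so it is only usable for toy inputs.
    """
    if len(latent_a) > len(latent_b):
        raise ValueError("expects len(latent_a) <= len(latent_b)")
    best_cost = float("inf")
    best_perm = ()
    # Each permutation assigns cell i of latent_a to cell perm[i] of latent_b.
    for perm in permutations(range(len(latent_b)), len(latent_a)):
        cost = sum(dist(latent_a[i], latent_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical latent codes for three cells measured by each technology.
rna_latent = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
cytof_latent = [(5.1, 4.9), (0.1, -0.1), (0.9, 1.2)]

pairs, cost = match_cells(rna_latent, cytof_latent)
print(pairs)  # [(0, 1), (1, 2), (2, 0)]
```

Exhaustive search grows factorially with the number of cells, so it only serves to make the objective concrete. For realistic dataset sizes a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment`, or an approximate scheme, would be needed instead.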

Details

Language :
English
Database :
OpenAIRE
Journal :
bioRxiv
Accession number :
edsair.doi.dedup.....78754237f69618ddc28ff0167a99d5ba