A simple, scalable approach to building a cross-platform transcriptome atlas
- Source :
- PLoS Computational Biology, Vol 16, Iss 9, p e1008219 (2020)
- Publication Year :
- 2020
- Publisher :
- Public Library of Science (PLoS), 2020.
Abstract
- Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproducibility challenge which is faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single-cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data.
The method is simple, scalable and amenable to rapid deployment in other biological systems or computational workflows.
Author summary
- Combining data from many different studies is an attractive way of capturing new aspects of the biology being studied. Biological variance attributable to cell type, cellular niche, origin, disease status or environmental stimuli is the basis of most small-n transcriptome studies. In aggregation, these promise to capture emergent dimensions of a biology that are not possible to view from any individual study. However, biological signal is easily swamped by technical artifact, especially when data is generated on platforms with profoundly different data structures. This is the case when comparing microarray data to RNAseq, or RNAseq to single-cell profiling. Consequently, transcriptome atlases are generally composed of a small number of donors/conditions surveyed using one technology platform. In this paper we present a simple and scalable data integration method that is platform agnostic. We provide a proof-of-principle by constructing an atlas of blood cells that combines many data sets measured on different platforms, and that in combination, recapitulates the known blood hierarchy. The atlas provides a reference to compare external samples to, allowing users to benchmark new derivation or isolation methods. It also provides a reference point for new data types, such as the classification of single cells. The approach allows for FAIR data reuse and robust identification of molecular signatures across multiple studies and experimental conditions.
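The gene selection idea described in the abstract (rescale each sample, then keep genes whose variance is explained little by platform) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from this record: within-sample rank rescaling, a one-way between-platform sum-of-squares ratio as the measure of platform bias, the hypothetical cutoff `max_fraction=0.2`, and all function names.

```python
import numpy as np


def rank_rescale(expr):
    """Replace each sample's values by within-sample ranks scaled to (0, 1].

    expr is a genes x samples array. Ties are broken by position (assumption).
    """
    # argsort applied twice yields 0-based ordinal ranks within each column.
    ranks = expr.argsort(axis=0).argsort(axis=0) + 1
    return ranks / expr.shape[0]


def platform_variance_fraction(expr, platforms):
    """Per gene: between-platform sum of squares divided by total sum of squares."""
    platforms = np.asarray(platforms)
    grand_mean = expr.mean(axis=1, keepdims=True)
    total_ss = ((expr - grand_mean) ** 2).sum(axis=1)
    between_ss = np.zeros(expr.shape[0])
    for p in np.unique(platforms):
        cols = expr[:, platforms == p]
        between_ss += cols.shape[1] * (cols.mean(axis=1) - grand_mean[:, 0]) ** 2
    # Constant genes carry no information; give them fraction 1.0 so they are dropped.
    return np.divide(between_ss, total_ss,
                     out=np.ones_like(between_ss), where=total_ss > 0)


def select_genes(expr, platforms, max_fraction=0.2):
    """Boolean mask of genes with low platform-explained variance (hypothetical cutoff)."""
    return platform_variance_fraction(expr, platforms) <= max_fraction


def pca_scores(expr, n_components=2):
    """Sample coordinates on the leading principal components of a genes x samples array."""
    centered = expr - expr.mean(axis=1, keepdims=True)  # centre each gene
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

In this sketch, a gene whose values track the platform labels is filtered out, while a gene that varies within every platform is retained; the retained genes of the rescaled matrix would then be passed to `pca_scores` to lay out the samples.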
- Subjects :
- 0301 basic medicine
Physiology
Microarrays
Computer science
Gene Expression
computer.software_genre
Monocytes
Transcriptome
White Blood Cells
Mathematical and Statistical Techniques
0302 clinical medicine
Animal Cells
Gene expression
Cross-platform
Medicine and Health Sciences
Cluster Analysis
Profiling (information science)
Lymphocytes
Biology (General)
Data Curation
Principal Component Analysis
0303 health sciences
Ecology
Statistics
Genomics
Body Fluids
Blood
Bioassays and Physiological Analysis
medicine.anatomical_structure
Computational Theory and Mathematics
Modeling and Simulation
Physical Sciences
Scalability
Data mining
Anatomy
Cellular Types
DNA microarray
Transcriptome Analysis
Research Article
Data integration
QH301-705.5
Immune Cells
Immunology
Research and Analysis Methods
Cellular and Molecular Neuroscience
03 medical and health sciences
Annotation
Atlas (anatomy)
Genetics
medicine
Humans
Statistical Methods
Cluster analysis
Molecular Biology
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
Blood Cells
Data curation
Gene Expression Profiling
Biology and Life Sciences
Computational Biology
Cell Biology
Genome Analysis
030104 developmental biology
Workflow
Multivariate Analysis
computer
Mathematics
030217 neurology & neurosurgery
Details
- Language :
- English
- ISSN :
- 15537358
- Volume :
- 16
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology
- Accession number :
- edsair.doi.dedup.....711165da308e157a2764d543650ce467