46 results on '"Kieran R Campbell"'
Search Results
2. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers [version 1; referees: 2 approved]
- Author
- Kieran R Campbell and Christopher Yau
- Subjects
Bioinformatics, Control of Gene Expression, Theory & Simulation, Medicine, Science
- Abstract
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
- Published
- 2017
3. Single-cell decoding of drug induced transcriptomic reprogramming in triple negative breast cancers
- Author
- Farhia Kabeer, Hoa Tran, Mirela Andronescu, Gurdeep Singh, Hakwoo Lee, Sohrab Salehi, Beixi Wang, Justina Biele, Jazmine Brimhall, David Gee, Viviana Cerda, Ciara O’Flanagan, Teresa Algara, Takako Kono, Sean Beatty, Elena Zaikova, Daniel Lai, Eric Lee, Richard Moore, Andrew J. Mungall, IMAXT Consortium, Marc J. Williams, Andrew Roth, Kieran R. Campbell, Sohrab P. Shah, and Samuel Aparicio
- Subjects
PDX, Single-cell RNA sequencing, DLP+ single-cell sequencing, Cisplatin treatment, In-cis/in-trans genes, Sensitive/resistant clones, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: The encoding of cell intrinsic drug resistance states in breast cancer reflects the contributions of genomic and non-genomic variations and requires accurate estimation of clonal fitness from co-measurement of transcriptomic and genomic data. Somatic copy number (CN) variation is the dominant mutational mechanism leading to transcriptional variation and notably contributes to platinum chemotherapy resistance cell states. Here, we deploy time series measurements of triple negative breast cancer (TNBC) single-cell transcriptomes, along with co-measured single-cell CN fitness, identifying genomic and transcriptomic mechanisms in drug-associated transcriptional cell states. Results: We present scRNA-seq data (53,641 filtered cells) from serial passaging TNBC patient-derived xenograft (PDX) experiments spanning 2.5 years, matched with genomic single-cell CN data from the same samples. Our findings reveal distinct clonal responses within TNBC tumors exposed to platinum. Clones with high drug fitness undergo clonal sweeps and show subtle transcriptional reversion, while those with weak fitness exhibit dynamic transcription upon drug withdrawal. Pathway analysis highlights convergence on epithelial-mesenchymal transition and cytokine signaling, associated with resistance. Furthermore, pseudotime analysis demonstrates hysteresis in transcriptional reversion, indicating generation of new intermediate transcriptional states upon platinum exposure. Conclusions: Within a polyclonal tumor, clones with strong genotype-associated fitness under platinum remained fixed, minimizing transcriptional reversion upon drug withdrawal. Conversely, clones with weaker fitness display non-genomic transcriptional plasticity. This suggests CN-associated and CN-independent transcriptional states could both contribute to platinum resistance. The dominance of genomic or non-genomic mechanisms within polyclonal tumors has implications for drug sensitivity, restoration, and re-treatment strategies.
- Published
- 2024
4. Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance
- Author
- Cindy Fang, Alina Selega, and Kieran R. Campbell
- Subjects
Single-cell RNA sequencing (scRNA-seq), Clustering, Benchmarking, Automated machine learning, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? Results: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. Conclusions: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.
- Published
- 2024
5. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference.
- Author
- Kieran R Campbell and Christopher Yau
- Subjects
Biology (General), QH301-705.5
- Abstract
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a 'pseudotime' where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference.
- Published
- 2016
6. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data
- Author
- Michael J. Geuenich, Dae-won Gong, and Kieran R. Campbell
- Subjects
Science
- Abstract
A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data, including a marker-aware version, that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader.
- Published
- 2024
7. Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models.
- Author
- Kaspar Märtens, Kieran R. Campbell, and Christopher Yau
- Published
- 2019
8. The Landscape of COVID-19 Research in the United States: a Cross-sectional Study of Randomized Trials Registered on ClinicalTrials.Gov
- Author
- Michael Fralick, Jason Moggridge, Chana A. Sacks, Michael Dougan, Kieran R Campbell, Crystal M. North, and Molly Wolf
- Subjects
Relative risk reduction, medicine.medical_specialty, Cross-sectional study, Psychological intervention, law.invention, Randomized controlled trial, law, Internal medicine, Internal Medicine, medicine, Clinical endpoint, Humans, Pandemics, Randomized Controlled Trials as Topic, Original Research, SARS-CoV-2, business.industry, COVID-19, Hydroxychloroquine, United States, Clinical trial, Cross-Sectional Studies, Treatment Outcome, Sample size determination, business, medicine.drug
- Abstract
Importance: SARS-CoV-2 has infected over 200 million people worldwide, resulting in more than 4 million deaths. Randomized controlled trials are the single best tool to identify effective treatments against this novel pathogen. Objective: To describe the characteristics of randomized controlled trials of treatments for COVID-19 in the United States launched in the first 9 months of the pandemic. Design, Setting, and Participants: We conducted a cross-sectional study of all completed or actively enrolling randomized, interventional, clinical trials for the treatment of COVID-19 in the United States registered on www.clinicaltrials.gov as of August 10, 2020. We excluded trials of vaccines and other interventions intended to prevent COVID-19. Main Outcomes and Measures: We used descriptive statistics to characterize the clinical trials and the statistical power for the available studies. For the late-phase trials (i.e., phase 3 and 2/3 studies), we compared the geographic distribution of the clinical trials with the geographic distribution of people diagnosed with COVID-19. Results: We identified 200 randomized controlled trials of treatments for people with COVID-19. Across all trials, 87 (43.5%) were single-center, 64 (32.0%) were unblinded, and 80 (40.0%) were sponsored by industry. The most common treatments included monoclonal antibodies (N=46 trials), small molecule immunomodulators (N=28), antiviral medications (N=24 trials), and hydroxychloroquine (N=20 trials). Of the 9 trials completed by August 2020, the median sample size was 450 (IQR 67–1113); of the 191 ongoing trials, the median planned sample size was 150 (IQR 60–400). Of the late-phase trials (N=54), the most common primary outcome was a severity scale (N=23, 42.6%), followed by a composite of mortality and ventilation (N=10, 18.5%), and mortality alone (N=6, 11.1%). Among these late-phase trials, all trials of antivirals, monoclonal antibodies, or chloroquine/hydroxychloroquine had a power of less than 25% to detect a 20% relative risk reduction in mortality. Had the individual trials for a given class of treatments instead formed a single trial, the power to detect that same reduction in mortality would have been greater than 98%. There was large variability in access to trials with the highest number of trials per capita in the Northeast and the lowest in the Midwest. Conclusions and Relevance: A large number of randomized trials were launched early in the pandemic to evaluate treatments for COVID-19. However, many trials were underpowered for important clinical endpoints and substantial geographic disparities were observed, highlighting the importance of improving national clinical trial infrastructure. Supplementary Information: The online version contains supplementary material available at 10.1007/s11606-021-07167-9.
- Published
- 2021
9. The differential impacts of dataset imbalance in single-cell data integration
- Author
- Hassaan Maan, Lin Zhang, Chengxin Yu, Michael Geuenich, Kieran R Campbell, and Bo Wang
- Abstract
Single-cell transcriptomic data measured across distinct samples has led to a surge in computational methods for data integration. Few studies have explicitly examined the common case of cell-type imbalance between datasets to be integrated, and none have characterized its impact on downstream analyses. To address this gap, we developed the Iniquitate pipeline for assessing the stability of single-cell RNA sequencing (scRNA-seq) integration results after perturbing the degree of imbalance between datasets. Through benchmarking 5 state-of-the-art scRNA-seq integration techniques in 1600 perturbed integration scenarios for a multi-sample peripheral blood mononuclear cell (PBMC) dataset, our results indicate that sample imbalance has significant impacts on downstream analyses and the biological interpretation of integration results. We observed significant variation in clustering, cell-type classification, marker gene-based annotation, and query-to-reference mapping in imbalanced settings. Two key factors were found to lead to quantitation differences after scRNA-seq integration: the cell-type imbalance within and between samples (relative cell-type support) and the relatedness of cell-types across samples (minimum cell-type center distance). To account for evaluation gaps in imbalanced contexts, we developed novel clustering metrics robust to sample imbalance, including the balanced Adjusted Rand Index (bARI) and balanced Adjusted Mutual Information (bAMI). Our analysis quantifies biologically-relevant effects of dataset imbalance in integration scenarios and introduces guidelines and novel metrics for integration of disparate datasets. The Iniquitate pipeline and balanced clustering metrics are available at https://github.com/hsmaan/Iniquitate and https://github.com/hsmaan/balanced-clustering, respectively.
- Published
- 2022
10. Radiomics analysis to predict pulmonary nodule malignancy using machine learning approaches
- Author
- Matthew T. Warkentin, Hamad Al-Sawaihey, Stephen Lam, Geoffrey Liu, Brenda Diergaarde, Jian-Min Yuan, David O. Wilson, Martin C. Tammemägi, Sukhinder Atkar-Khattra, Benjamin Grant, Yonathan Brhane, Elham Khodayari-Moez, Kieran R. Campbell, and Rayjean J. Hung
- Abstract
Purpose: Screening with low-dose computed tomography can reduce lung cancer-related mortality. However, most screen-detected pulmonary abnormalities do not develop into cancer and it remains challenging to identify high-risk nodules among those with indeterminate appearance. We aim to develop and validate prediction models to discriminate between benign and malignant pulmonary lesions based on radiological features. Methods: Using four international lung cancer screening studies, we extracted 2,060 radiomic features for each of 16,797 nodules among 6,865 participants. After filtering out redundant and low-quality radiomic features, 642 radiomic and 9 epidemiologic features remained for model development. We used cross-validation and grid search to assess three machine learning models (XGBoost, Random Forest, LASSO) for their ability to accurately predict risk of malignancy for pulmonary nodules. We fit the top-performing ML model in the full training set. We report model performance based on the area under the curve (AUC) and calibration metrics in the held-out test set. Results: The ML models that yielded the best predictive performance in cross-validation were XGBoost and LASSO, and among these models, LASSO had superior model calibration, which we considered to be the optimal model. We fit the final LASSO model based on the optimized hyperparameter from cross-validation. Our radiomics model was both well-calibrated and had a test-set AUC of 0.930 (95% CI: 0.901-0.957) and out-performed the established Brock model (AUC=0.868, 95% CI: 0.847-0.888) for nodule assessment. Conclusion: We developed highly-accurate machine learning models based on radiomic and epidemiologic features from four international lung cancer screening studies that may be suitable for assessing suspicious, but indeterminate, screen-detected pulmonary nodules for risk of malignancy.
- Published
- 2022
11. Loss of apelin blocks the emergence of sprouting angiogenesis in experimental tumors
- Author
- Abul K. Azad, Kieran R. Campbell, Pavel Zhabyeyev, Gavin Y. Oudit, Ronald B. Moore, and Allan G. Murray
- Subjects
Vascular Endothelial Growth Factor A, Neovascularization, Pathologic, Vascular Endothelial Growth Factors, Angiogenesis Inhibitors, Neoplasms, Experimental, Ligands, Biochemistry, Receptors, G-Protein-Coupled, Mice, Receptors, Vascular Endothelial Growth Factor, Neoplasms, Sunitinib, Genetics, Animals, Apelin, Protein Kinase Inhibitors, Molecular Biology, Biotechnology
- Abstract
Angiogenesis inhibitor drugs targeting vascular endothelial growth factor (VEGF) signaling to the endothelial cell (EC) are used to treat various cancer types. However, primary or secondary resistance to therapy is common. Clinical and pre-clinical studies suggest that alternative pro-angiogenic factors are upregulated after VEGF pathway inhibition. Therefore, identification of alternative pro-angiogenic pathway(s) is critical for the development of more effective anti-angiogenic therapy. Here we study the role of apelin as a pro-angiogenic G-protein-coupled receptor ligand in tumor growth and angiogenesis. We found that loss of apelin in mice delayed the primary tumor growth of Lewis lung carcinoma 1 and B16F10 melanoma when combined with the VEGF receptor tyrosine kinase inhibitor, sunitinib. Targeting apelin in combination with sunitinib markedly reduced the tumor vessel density, and decreased microvessel remodeling. Apelin loss reduced angiogenic sprouting and tip cell marker gene expression in comparison to the sunitinib-alone-treated mice. Single-cell RNA sequencing of tumor EC demonstrated that the loss of apelin prevented EC tip cell differentiation. Thus, apelin is a potent pro-angiogenic cue that supports initiation of tumor neovascularization. Together, our data suggest that targeting apelin may be useful as adjuvant therapy in combination with VEGF signaling inhibition to inhibit the growth of advanced tumors.
- Published
- 2022
12. Multi-objective Bayesian Optimization with Heuristic Objectives for Biomedical and Molecular Data Analysis Workflows
- Author
- Alina Selega and Kieran R. Campbell
- Abstract
Many practical applications require optimization of multiple, computationally expensive, and possibly competing objectives that are well-suited for multi-objective Bayesian optimization (MOBO) procedures. However, for many types of biomedical data, measures of data analysis workflow success are often heuristic and therefore it is not known a priori which objectives are useful. Thus, MOBO methods that return the full Pareto front may be suboptimal in these cases. Here we propose a novel MOBO method that adaptively updates the scalarization function using properties of the posterior of a multi-output Gaussian process surrogate function. This approach selects useful objectives based on a flexible set of desirable criteria, allowing the functional form of each objective to guide optimization. We demonstrate the qualitative behaviour of our method on toy data and perform proof-of-concept analyses of single-cell RNA sequencing and highly multiplexed imaging datasets.
- Published
- 2022
13. The Basics of Machine Learning
- Author
- Michael Fralick and Kieran R. Campbell
- Published
- 2022
14. Single cell transcriptomes of normal endometrial derived organoids uncover novel cell type markers and cryptic differentiation of primary tumours
- Author
- Lien Hoang, Stefan Kommoss, Samuel Leung, Kendall Greening, Minh Bui, Aslı D Munzur, Sohrab P. Shah, James Hopkins, Jamie L. P. Lim, Evan W. Gibbard, Christine Chow, Angela S. Cheng, Maya DeGrood, Niki Boyd, David G. Huntsman, Dawn R. Cochrane, Vassilena Sharlandjieva, J Maxwell Douglas, Germain C. Ho, Daniel Lai, David Farnell, Jessica N. McAlpine, Friedrich Kommoss, Andrew Roth, and Kieran R Campbell
- Subjects
0301 basic medicine, Cell type, Cell, Biology, Endometrium, Pathology and Forensic Medicine, Transcriptome, 03 medical and health sciences, 0302 clinical medicine, Biomarkers, Tumor, medicine, Carcinoma, Humans, Sequence Analysis, RNA, Endometrial cancer, Cell Differentiation, medicine.disease, Endometrial Neoplasms, Organoids, 030104 developmental biology, medicine.anatomical_structure, Single cell sequencing, 030220 oncology & carcinogenesis, Cancer research, Immunohistochemistry, Female, Carcinoma, Endometrioid
- Abstract
Endometrial carcinoma, the most common gynaecological cancer, develops from endometrial epithelium which is composed of secretory and ciliated cells. Pathologic classification is unreliable and there is a need for prognostic tools. We used single cell sequencing to study organoid model systems derived from normal endometrial endometrium to discover novel markers specific for endometrial ciliated or secretory cells. A marker of secretory cells (MPST) and several markers of ciliated cells (FAM92B, WDR16, and DYDC2) were validated by immunohistochemistry on organoids and tissue sections. We performed single cell sequencing on endometrial and ovarian tumours and found both secretory-like and ciliated-like tumour cells. We found that ciliated cell markers (DYDC2, CTH, FOXJ1, and p73) and the secretory cell marker MPST were expressed in endometrial tumours and positively correlated with disease-specific and overall survival of endometrial cancer patients. These findings suggest that expression of differentiation markers in tumours correlates with less aggressive disease, as would be expected for tumours that retain differentiation capacity, albeit cryptic in the case of ciliated cells. These markers could be used to improve the risk stratification of endometrial cancer patients, thereby improving their management. We further assessed whether consideration of MPST expression could refine the ProMiSE molecular classification system for endometrial tumours. We found that higher expression levels of MPST could be used to refine stratification of three of the four ProMiSE molecular subgroups, and that any level of MPST expression was able to significantly refine risk stratification of the copy number high subgroup which has the worst prognosis. Taken together, this shows that single cell sequencing of putative cells of origin has the potential to uncover novel biomarkers that could be used to guide management of cancers. 
© 2020 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
- Published
- 2020
15. clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers
- Author
- Samuel Aparicio, Kieran R Campbell, Beixi Wang, Emma Laks, Hossein Farahani, Andrew McPherson, Hans Zahn, David Lai, Sohrab P. Shah, Ciara H. O'Flanagan, Pascale Walters, Justina Biele, Alexandre Bouchard-Côté, Farhia Kabeer, Adi Steif, and Jazmine Brimhall
- Subjects
lcsh:QH426-470, Somatic cell, Cell, Method, Triple Negative Breast Neoplasms, Mice, SCID, Computational biology, Biology, DNA sequencing, 03 medical and health sciences, chemistry.chemical_compound, 0302 clinical medicine, Mice, Inbred NOD, Gene expression, Biomarkers, Tumor, Tumor Cells, Cultured, medicine, Animals, Humans, lcsh:QH301-705.5, 030304 developmental biology, Ovarian Neoplasms, 0303 health sciences, Models, Statistical, High-Throughput Nucleotide Sequencing, RNA, Cancer, medicine.disease, Xenograft Model Antitumor Assays, Human genetics, Clone Cells, Cystadenocarcinoma, Serous, 3. Good health, lcsh:Genetics, medicine.anatomical_structure, chemistry, lcsh:Biology (General), Female, Single-Cell Analysis, Software, 030217 neurology & neurosurgery, DNA
- Abstract
Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone. Electronic supplementary material The online version of this article (10.1186/s13059-019-1645-z) contains supplementary material, which is available to authorized users.
- Published
- 2019
16. Modelling hereditary diffuse gastric cancer initiation using transgenic mouse-derived gastric organoids and single-cell sequencing
- Author
- Kieran R Campbell, Monica Ta, Germain Ho, Tom Brew, Christine Chow, Dawn R. Cochrane, David G. Huntsman, David F. Schaeffer, Howard John Lim, Minh Bui, Parry Guilford, Steve E. Kalloger, Simon Cheung, David Farnell, Amal El-Naggar, Katherine Dixon, Pardeep Kaurah, Tanis D Godwin, and J Maxwell Douglas
- Subjects
0301 basic medicine, Squamous Differentiation, Mice, Transgenic, Germline, Pathology and Forensic Medicine, CDH1, Cancer syndrome, 03 medical and health sciences, Cytokeratin, Mice, 0302 clinical medicine, Stomach Neoplasms, medicine, Animals, Genetic Predisposition to Disease, biology, Cancer, medicine.disease, Cadherins, 3. Good health, Gene Expression Regulation, Neoplastic, Organoids, Disease Models, Animal, 030104 developmental biology, Cell Transformation, Neoplastic, Single cell sequencing, 030220 oncology & carcinogenesis, Cancer research, biology.protein, Hereditary diffuse gastric cancer, Single-Cell Analysis, Transcriptome
- Abstract
Hereditary diffuse gastric cancer (HDGC) is a cancer syndrome caused by germline variants in CDH1, the gene encoding the cell-cell adhesion molecule E-cadherin. Loss of E-cadherin in cancer is associated with cellular dedifferentiation and poor prognosis, but the mechanisms through which CDH1 loss initiates HDGC are not known. Using single-cell RNA sequencing, we explored the transcriptional landscape of a murine organoid model of HDGC to characterize the impact of CDH1 loss in early tumourigenesis. Progenitor populations of stratified squamous and simple columnar epithelium, characteristic of the mouse stomach, showed lineage-specific transcriptional programs. Cdh1 inactivation resulted in shifts along the squamous differentiation trajectory associated with aberrant expression of genes central to gastrointestinal epithelial differentiation. Cytokeratin 7 (CK7), encoded by the differentiation-dependent gene Krt7, was a specific marker for early neoplastic lesions in CDH1 carriers. Our findings suggest that deregulation of developmental transcriptional programs may precede malignancy in HDGC. © 2021 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
- Published
- 2021
17. Automated assignment of cell identity from single-cell multiplexed imaging and proteomic data
- Author
- Jinyu Hou, Hartland W. Jackson, Michael J. Geuenich, Sunyun Lee, Kieran R. Campbell, and Shanza Ayub
- Subjects
Proteomics, Histology, Artificial neural network, Computer science, business.industry, Inference, Statistical model, Cell Biology, Machine learning, computer.software_genre, Multiplexing, Pathology and Forensic Medicine, Robustness (computer science), Scalability, A priori and a posteriori, Cluster Analysis, Artificial intelligence, Neural Networks, Computer, business, Cluster analysis, computer
- Abstract
A major challenge in the analysis of highly multiplexed imaging data is the assignment of cells to a priori known cell types. Existing approaches typically solve this by clustering cells followed by manual annotation. However, these often require several subjective choices and cannot explicitly assign cells to an uncharacterized type. To help address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast inference, allowing for annotations at the million-cell scale in the absence of a previously annotated reference. We apply Astir to over 2.4 million cells from suspension and imaging datasets and demonstrate its scalability, robustness to sample composition, and interpretable uncertainty estimates. We envision deployment of Astir either for a first broad cell type assignment or to accurately annotate cells that may serve as biomarkers in multiple disease contexts. A record of this paper’s transparent peer review process is included in the supplemental information.
- Published
- 2021
18. Single cell fitness landscapes induced by genetic and pharmacologic perturbations in cancer
- Author
- Allen W. Zhang, Justina Biele, Kieran R Campbell, Jerome Ting, Samuel Aparicio, Nicholas Ceglia, Diljot Grewal, Marc J Williams, Richard D. Moore, Takako Kono, Nicole Rusk, Jennifer Pham, Fatemeh Dorri, Sohrab P. Shah, Beixi Wang, Daniel Lai, Andrew J. Mungall, Peter Eirew, Hak Woo Lee, Mirela Andronescu, Andrew McPherson, Marco A. Marra, Alexandre Bouchard-Côté, Teresa Ruiz de Algara, Ciara H. O'Flanagan, Jazmine Brimhall, Farhia Kabeer, So Ra Lee, Sohrab Salehi, Tehmina Masud, and Brian Yu Chieh Cheng
- Subjects
Transcriptome, Genetics, medicine.anatomical_structure, Phylogenetic tree, Fitness landscape, Fitness model, Cell, Genotype, medicine, Biology, Genome, Clonal selection
- Abstract
Tumour fitness landscapes underpin selection in cancer, impacting etiology, evolution and response to treatment. Progress in defining fitness landscapes has been impeded by a lack of timeseries perturbation experiments over realistic intervals at single cell resolution. We studied the nature of clonal dynamics induced by genetic and pharmacologic perturbation with a quantitative fitness model developed to ascribe quantitative selective coefficients to individual cancer clones, enable prediction of clone-specific growth potential, and forecast competitive clonal dynamics over time. We applied the model to serial single cell genome (>60,000 cells) and transcriptome (>58,000 cells) experiments ranging from 10 months to 2.5 years in duration. We found that genetic perturbation of TP53 in epithelial cell lines induces multiple forms of copy number alteration that confer increased fitness to clonal populations with measurable consequences on gene expression. In patient derived xenografts, predicted selective coefficients accurately forecasted clonal competition dynamics, that were validated with timeseries sampling of experimentally engineered mixtures of low and high fitness clones. In cisplatin-treated patient derived xenografts, the fitness landscape was inverted in a time-dependent manner, whereby a drug resistant clone emerged from a phylogenetic lineage of low fitness clones, and high fitness clones were eradicated. Moreover, clonal selection mediated reversible drug response early in the selection process, whereas late dynamics in genomically fixed clones were associated with transcriptional plasticity on a fixed clonal genotype. Together, our findings outline causal mechanisms with implication for interpreting how mutations and multi-faceted drug resistance mechanisms shape the etiology and cellular fitness of human cancers.
- Published
- 2020
- Full Text
- View/download PDF
19. Cancer phylogenetic tree inference at scale from 1000s of single cell genomes
- Author
-
Samuel Aparicio, Kevin Chern, Farhia Kabeer, Marc J Williams, Sohrab P. Shah, Tyler Funnell, Sohrab Salehi, Kieran R Campbell, Alexandre Bouchard-Côté, Daniel Lai, Mirela Andronescu, Nicole Rusk, Andrew Roth, Fatemeh Dorri, and Andrew McPherson
- Subjects
Whole genome sequencing ,symbols.namesake ,Transformation (function) ,Phylogenetic tree ,Computer science ,symbols ,Inference ,Markov chain Monte Carlo ,Computational biology ,Evolutionary dynamics ,Bayesian inference ,Genome - Abstract
A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing the mutational processes that gave rise to them. Existing phylogenetic tree building models do not scale to the tens of thousands of high resolution genomes achievable with current scWGS methods. We constructed a phylogenetic model and associated Bayesian inference procedure, sitka, specifically for scWGS data. The method is based on a novel phylogenetic encoding of copy number (CN) data, the sitka transformation, that simplifies the site dependencies induced by rearrangements while still forming a sound foundation to phylogenetic inference. The sitka transformation allows us to design novel scalable Markov chain Monte Carlo (MCMC) algorithms. Moreover, we introduce a novel point mutation calling method that incorporates the CN data and the underlying phylogenetic tree to overcome the low per-cell coverage of scWGS. We demonstrate our method on three single cell datasets, including a novel PDX series, and analyse the topological properties of the inferred trees. Sitka is freely available at https://github.com/UBC-Stat-ML/sitkatree.git.
- Published
- 2020
20. Computational modelling in single-cell cancer genomics: methods and future directions
- Author
-
Kieran R Campbell and Allen W. Zhang
- Subjects
FOS: Computer and information sciences ,Tumour heterogeneity ,Computer science ,Biophysics ,Genomics ,Statistics - Applications ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Neoplasms ,Humans ,Computer Simulation ,Applications (stat.AP) ,Quantitative Biology - Genomics ,Molecular Biology ,Noisy data ,030304 developmental biology ,Genomics (q-bio.GN) ,0303 health sciences ,Genome ,Computational Biology ,Cell Biology ,Epigenome ,Data science ,3. Good health ,ComputingMethodologies_PATTERNRECOGNITION ,FOS: Biological sciences ,Cell cancer ,Single-Cell Analysis ,030217 neurology & neurosurgery - Abstract
Single-cell technologies have revolutionized biomedical research by enabling scalable measurement of the genome, transcriptome, and proteome of multiple systems at single-cell resolution. Now widely applied to cancer models, these assays offer new insights into tumour heterogeneity, which underlies cancer initiation, progression, and relapse. However, the large quantities of high-dimensional, noisy data produced by single-cell assays can complicate data analysis, obscuring biological signals with technical artefacts. In this review article, we outline the major challenges in analyzing single-cell cancer genomics data and survey the current computational tools available to tackle these. We further outline unsolved problems that we consider major opportunities for future methods development to help interpret the vast quantities of data being generated. Review article; 10 pages, 1 figure, 2 tables.
- Published
- 2020
21. Assigning scRNA-seq data to known and de novo cell types using CellAssign
- Author
-
Sohrab P. Shah, Kieran R Campbell, and Allen Zhang
- Subjects
Cell type ,Computational biology ,Biology - Abstract
Assigning cells to known or de novo cell types is an important step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. This protocol outlines how to use the CellAssign R package to accomplish this.
- Published
- 2019
22. 12 Grand Challenges in Single-Cell Data Science
- Author
-
Tzu-Hao Kuo, Pavel Skums, Boudewijn P. F. Lelieveldt, Jeroen de Ridder, Giacomo Corleone, Niko Beerenwinkel, Kieran R Campbell, Antoine-Emmanuel Saliba, Alexandros Stamatakis, Benjamin J. Raphael, Sohrab P. Shah, Marcel J. T. Reinders, Fabian J. Theis, Johannes Köster, Buys de Barbanson, Alexander Schönhuth, Stephanie C. Hicks, Mark D. Robinson, Alice C. McHardy, Felix Mölder, Samuel Aparicio, Emma M. Keizer, Alexander Zelikovsky, Antonio Cappuccio, Łukasz Rączkowski, Alexey M. Kozlov, Oliver Stegle, Rens Holmer, Maria Florescu, Katharina Jahn, Victor Guryev, Marleen Balvert, Indu Khatri, Amir Niknejad, Huan Yang, Ahmed Mahfouz, Camille Stephan Otto Attolini, Antonios Somarakis, John C. Marioni, Jasmijn A. Baaijens, Thamar Jessurun Lobo, Ewa Szczurek, Jan O. Korbel, Tobias Marschall, Szymon M. Kielbasa, Ion I. Mandoiu, Bas E. Dutilh, Davis J. McCarthy, Catalina A. Vallejos, David Laehnemann, and Luca Pinello
- Subjects
Developmental trajectory ,Tumour heterogeneity ,Computer science ,Phylogenomics ,computer.software_genre ,computer ,Data science ,Data integration ,Grand Challenges - Abstract
The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, has turned single cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.
- Published
- 2019
23. Dissociation of solid tumour tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses
- Author
-
Sohrab P. Shah, Allen W. Zhang, Nicholas Ceglia, Samuel Aparicio, Daniel Lai, Peter Eirew, Justina Biele, Esther Kong, Richard D. Moore, Kieran R Campbell, Cherie Bates, Jamie L. P. Lim, Jenifer Pham, Matt Wiens, Kelly Borkowski, Andrew McPherson, Farhia Kabeer, Andrew J. Mungall, Ciara H. O'Flanagan, James Hopkins, Jessica N. McAlpine, and Brittany Hewitson
- Subjects
Cell type ,lcsh:QH426-470 ,Tumour heterogeneity ,medicine.medical_treatment ,Cell ,Transcriptome ,Mice ,03 medical and health sciences ,Breast cancer ,0302 clinical medicine ,Stress, Physiological ,Ovarian cancer ,Neoplasms ,Gene expression ,MHC class I ,medicine ,Animals ,Humans ,Tissue dissociation ,Single cell ,Collagenases ,Viability assay ,lcsh:QH301-705.5 ,030304 developmental biology ,Tumor microenvironment ,0303 health sciences ,Protease ,biology ,Sequence Analysis, RNA ,Chemistry ,Research ,Quality control ,Genomics ,3. Good health ,Cell biology ,Cold Temperature ,lcsh:Genetics ,medicine.anatomical_structure ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,Collagenase ,biology.protein ,Single-Cell Analysis ,RNA-seq ,Peptide Hydrolases ,medicine.drug - Abstract
Background: Single-cell RNA sequencing (scRNAseq) is a powerful tool for studying complex biological systems, such as tumour heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in primary solid tumour tissues and patient-derived mouse xenografts for scRNAseq, are not well understood. Here, we used low temperature (6°C) protease and collagenase (37°C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNAseq dataset comprising 128,481 cells from patient cancer tissues, patient-derived breast cancer xenografts and cancer cell lines. Results: We observe substantial variation in standard quality control (QC) metrics of cell viability across conditions and tissues. From FACS sorted populations gated for cell viability, we identify a sub-population of dead cells that would pass standard data filtering practices, and quantify the extent to which their transcriptomes differ from live cells. We identify a further subpopulation of transcriptomically “dying” cells that exhibit up-regulation of MHC class I transcripts, in contrast with live and fully dead cells. From the contrast between tissue protease dissociation at 37°C or 6°C, we observe that collagenase digestion results in a stress response. We derive a core gene set of 512 heat shock and stress response genes, including FOS and JUN, induced by collagenase (37°C), which are minimized by dissociation with a cold active protease (6°C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues. We observe that the yield of cancer and non-cancer cell types varies between tissues and dissociation methods. Conclusions: The method and conditions of tumour dissociation influence cell yield and transcriptome state and are both tissue and cell type dependent.
Interpretation of stress pathway expression differences in cancer single cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with identification of such effects in dissociated scRNA-seq experiments.
- Published
- 2019
24. Bayesian statistical learning for big data biology
- Author
-
Kieran R Campbell and Christopher Yau
- Subjects
Bayesian probability ,Big data ,Biophysics ,Review ,010402 general chemistry ,Bayesian inference ,Machine learning ,computer.software_genre ,Bayesian ,01 natural sciences ,Computational biology ,03 medical and health sciences ,Structural Biology ,Statistical modelling ,Probabilistic framework ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,business.industry ,Statistical learning ,Statistical model ,Statistics::Computation ,0104 chemical sciences ,Bayesian statistics ,Artificial intelligence ,business ,computer - Abstract
Bayesian statistical learning provides a coherent probabilistic framework for modelling uncertainty in systems. This review describes the theoretical foundations underlying Bayesian statistics and outlines the computational frameworks for implementing Bayesian inference in practice. We then describe the use of Bayesian learning in single-cell biology for the analysis of high-dimensional, large data sets.
- Published
- 2019
25. Probabilistic cell type assignment of single-cell transcriptomic data reveals spatiotemporal microenvironment dynamics in human cancers
- Author
-
Brittany Hewitson, Lauren Chong, Aoki T, Chan T, Jamie L. P. Lim, Allen W. Zhang, Sohrab P. Shah, Jessica N. McAlpine, Anja Mottok, Ciara H. O'Flanagan, Matt Wiens, Andrew McPherson, Weng Ap, Elizabeth A. Chavez, Wang X, Daniel Lai, Pascale Walters, Kieran R Campbell, Samuel Aparicio, Clémentine Sarkozy, and Christian Steidl
- Subjects
0303 health sciences ,Cell type ,Computer science ,Cell ,Probabilistic logic ,Computational biology ,Phenotype ,Transcriptome ,03 medical and health sciences ,0302 clinical medicine ,medicine.anatomical_structure ,030220 oncology & carcinogenesis ,Cancer cell ,medicine ,Cluster analysis ,Gene ,030304 developmental biology - Abstract
Single-cell RNA sequencing (scRNA-seq) has transformed biomedical research, enabling decomposition of complex tissues into disaggregated, functionally distinct cell types. For many applications, investigators wish to identify cell types with known marker genes. Typically, such cell type assignments are performed through unsupervised clustering followed by manual annotation based on these marker genes, or via "mapping" procedures to existing data. However, the manual interpretation required in the former case scales poorly to large datasets, which are also often prone to batch effects, while existing data for purified cell types must be available for the latter. Furthermore, unsupervised clustering can be error-prone, leading to under- and over-clustering of the cell types of interest. To overcome these issues we present CellAssign, a probabilistic model that leverages prior knowledge of cell type marker genes to annotate scRNA-seq data into pre-defined and de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while simultaneously controlling for batch and patient effects. We demonstrate the analytical advantages of CellAssign through extensive simulations and exemplify real-world utility to profile the spatial dynamics of high-grade serous ovarian cancer and the temporal dynamics of follicular lymphoma. Our analysis reveals subclonal malignant phenotypes and points towards an evolutionary interplay between immune and cancer cell populations with cancer cells escaping immune recognition.
- Published
- 2019
26. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling
- Author
-
Kieran R Campbell, Nicholas Ceglia, Xuehai Wang, Samuel Aparicio, Anja Mottok, Lauren Chong, Jamie L. P. Lim, Clémentine Sarkozy, Christian Steidl, Sohrab P. Shah, Andrew McPherson, Allen W. Zhang, Daniel Lai, Andrew P Weng, Brittany Hewitson, Tim Chan, Elizabeth A. Chavez, Pascale Walters, Tomohiro Aoki, Ciara H. O'Flanagan, Matt Wiens, and Jessica N. McAlpine
- Subjects
Cell type ,Computer science ,Sequence analysis ,Cell ,RNA-Seq ,Computational biology ,Biochemistry ,Article ,03 medical and health sciences ,medicine ,Tumor Microenvironment ,Humans ,Molecular Biology ,Gene ,Lymphoma, Follicular ,030304 developmental biology ,Probability ,0303 health sciences ,Sequence Analysis, RNA ,Gene Expression Profiling ,Probabilistic logic ,RNA ,Cell Biology ,Gene expression profiling ,medicine.anatomical_structure ,Single-Cell Analysis ,Biotechnology - Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled decomposition of complex tissues into functionally distinct cell types. Often, investigators wish to assign cells to cell types, performed through unsupervised clustering followed by manual annotation, or via “mapping” procedures to existing data. However, manual interpretation scales poorly to large datasets, mapping approaches require purified or pre-annotated data, and both are prone to batch effects. To overcome these issues we present CellAssign (www.github.com/irrationone/cellassign), a probabilistic model that leverages prior knowledge of cell type marker genes to annotate scRNA-seq data into pre-defined or de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while controlling for batch and sample effects. We demonstrate the advantages of CellAssign through extensive simulations and analysis of tumor microenvironment composition in high grade serous ovarian cancer and follicular lymphoma.
- Published
- 2019
27. Abstract 4899: Investigating mutation co-operativity in early tumorigenesis of low-grade serous ovarian carcinoma with organoid model system and single-cell RNA sequencing
- Author
-
Germain Ho, Dawn R. Cochrane, Genny Trigo, Joyce Yu Han Zhang, Cindy Shen, Kieran R Campbell, Minh Bui, Winnie Yang, David G. Huntsman, and Clara Salamanca
- Subjects
Cancer Research ,Cell ,RNA ,Model system ,Biology ,medicine.disease_cause ,Serous fluid ,medicine.anatomical_structure ,Oncology ,Ovarian carcinoma ,Mutation (genetic algorithm) ,Organoid ,medicine ,Cancer research ,Carcinogenesis - Abstract
Background: Ovarian cancers are the most common gynecologic malignancies. Low grade serous ovarian carcinoma (LGSOC) is a rare tumor, accounting for ~2000 cases diagnosed every year in North America. Most LGSOCs are characterized by high fatality rates over the long term, with only 20% of women surviving 10 years after diagnosis, due to suboptimal response to current chemotherapies. Understanding the molecular events is crucial for developing better early detection strategies and more informed therapeutic options. LGSOC harbors a relatively stable genome, with common activating mutations in BRAF, KRAS and NRAS. Recently, NRAS mutations (Q61R) were found to co-exist with EIF1AX mutations (G8E) in LGSOC, and the two mutated proteins functionally cooperate. Increasing histological and gene expression evidence suggests that the cell of origin of LGSOC is in the Fallopian tube. Low incidence of this disease means it is poorly understood, and the resulting lack of available models further limits the study of underlying mechanisms. We therefore propose to use organoid cultures. These consist of 3D multicellular units that resemble in vitro a tissue or organ of the body, both structurally and functionally. Objective: to elucidate molecular events underpinning LGSOC, specifically how NRAS (Q61R) and EIF1AX (G8E) mutations co-operate to drive early stages of tumorigenesis, with an organoid system and single-cell RNA sequencing (scRNA-seq) technologies. Method: To reflect genetic background and cell of origin of LGSOC, NRAS Q61R and EIF1AX G8E mutant proteins were overexpressed via lentiviral transduction in organoid cultures of normal human Fallopian tubes. After allowing organoids to establish, 2 weeks after transduction gene expression alterations were resolved with scRNA-seq. Histology of organoids was assessed for histomorphological signs of transformation.
Patient-derived tumor organoids (PDTOs) were also cultured to assess how well our LGSOC-modelling organoids (LMOs) recapitulate the histological features of patient tumours. Result: LMOs showed cytologic signs of transformation such as increased nuclear/cytoplasmic ratio, prominent nucleoli, and cellular pleomorphism. Papillary structures, a major histologic characteristic of LGSOC tumors, were also observed in LMOs. PDTOs showed similar cytological features and organization as LMOs. From scRNA-seq, we identified genes up-regulated in double-mutant compared to single-mutant organoids such as CA125 and TACSTD2. CA125 is one of the earliest identified biomarkers for ovarian cancer and has remained the most useful serum marker despite limited sensitivity and specificity; whereas TACSTD2 overexpression has been found to correlate with a chemo-resistant, aggressive malignant phenotype. Conclusion and future directions: Organoid culture and scRNA-seq are a powerful duo in studying early tumorigenesis events. We established a novel model system of LGSOC by introducing common co-occurring mutations into normal Fallopian tube tissues. Our model recapitulates LGSOC histology to a large extent. Genes upregulated in double mutants included a well-characterized biomarker (CA125) and a potential biomarker or therapeutic target (TACSTD2). Our work will be crucial for developing early detection strategies and targeted treatment options. Citation Format: Joyce Yu Han Zhang, Dawn Cochrane, Kieran Campbell, Minh Bui, Germain Ho, Cindy Shen, Winnie Yang, Clara Salamanca, Genny Trigo, David G. Huntsman. Investigating mutation co-operativity in early tumorigenesis of low-grade serous ovarian carcinoma with organoid model system and single-cell RNA sequencing [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 4899.
- Published
- 2020
28. Abstract B09: Single-cell RNA sequencing of normal endometrial organoids uncovers novel cell-type markers for prognostication of primary tumor samples
- Author
-
Friedrich Kommoss, Evan W. Gibbard, Samantha J. Neilson, Maya DeGrood, Samuel Leung, Kieran R Campbell, David Farnell, James Hopkins, Kendall Greening, Sohrab P. Shah, Germain C. Ho, Minh Bui, Dawn R. Cochrane, Jamie L. P. Lim, Jessica N. McAlpine, Daniel Lai, David G. Huntsman, Angela S. Cheng, and Vassilena Sharlandjieva
- Subjects
Cancer Research ,education.field_of_study ,Cell type ,Tissue microarray ,Population ,Cancer ,Biology ,medicine.disease ,Primary tumor ,Ovarian tumor ,Oncology ,medicine ,Cancer research ,Immunohistochemistry ,education ,Ovarian cancer - Abstract
Endometrial epithelium gives rise to both endometrial and ovarian cancers (of clear-cell and endometrioid subtypes), the latter arising from ectopic endometrium (endometriosis). Endometrial epithelium comprises mainly secretory cells, with a minor ciliated cell population. Due to their scarcity, little is known about the biology or function of endometrial ciliated cells. To understand the biology of endometrial epithelium, and by extension the cancers that arise from it, organoids derived from normal endometrial tissue were cultured. Notch signaling inhibitors were used to induce ciliated cell differentiation. Through single-cell RNA sequencing, distinct secretory and ciliated cell populations were observed, with the ciliated cell population increasing with Notch signaling inhibition. Many novel markers of ciliated cells were observed, but no highly specific markers of secretory cell differentiation. A marker of secretory cells (MST) and several markers of ciliated cells (FAM92B, WDR16 and DYDC2) were validated by immunohistochemistry on organoids and tissue sections. In endometrial tumors, both MST and FAM92B exhibited diffuse staining and were markers of better prognosis. This suggests that tumors expressing differentiation markers have better prognosis, whether it is a marker of secretory or ciliated cells. Interestingly, a small number of endometrial tumors stained positive for DYDC2; however, these tumors exhibited a variable staining pattern with 25-50% tumor cells staining intensely, and the remaining tumor cells not staining at all. A similar variable staining pattern had been observed previously with CTH, another ciliated cell marker. Endometrial and ovarian tumor tissue microarrays were stained with DYDC2, CTH and two ciliated cell markers, FOXJ1 and p73. For all these markers, a subset of tumors displayed a variable staining pattern and for endometrial cancers, the variable staining was a good prognostic indicator. 
Single-cell sequencing of endometrial tumors has been able to capture these two populations of tumor cells. In ovarian tumors, only variable CTH staining was a significant prognostic indicator. Normal endometrial secretory cells are able to differentiate into ciliated cells, and the variable staining pattern suggests that a subset of tumors retains this ability, and these are clinically less aggressive. Using single-cell sequencing technology on normal tissues to guide development of prognostic markers and provide insight into the biology of the tumors arising from these tissues may be useful for many other tumor types. Citation Format: Dawn R. Cochrane, Kieran R. Campbell, Kendall Greening, Germain C. Ho, James Hopkins, Minh Bui, Vassilena Sharlandjieva, Daniel Lai, Maya DeGrood, Evan W. Gibbard, Samuel Leung, Angela S. Cheng, Jamie L.P. Lim, Samantha Neilson, David Farnell, Friedrich Kommoss, Jessica N. McAlpine, Sohrab P. Shah, David G. Huntsman. Single-cell RNA sequencing of normal endometrial organoids uncovers novel cell-type markers for prognostication of primary tumor samples [abstract]. In: Proceedings of the AACR Special Conference on Advances in Ovarian Cancer Research; 2019 Sep 13-16, 2019; Atlanta, GA. Philadelphia (PA): AACR; Clin Cancer Res 2020;26(13_Suppl):Abstract nr B09.
- Published
- 2020
29. Eleven grand challenges in single-cell data science
- Author
-
Catalina A. Vallejos, Samuel Aparicio, Emma M. Keizer, Ion I. Mandoiu, Luca Pinello, Huan Yang, Maria Florescu, Camille Stephan Otto Attolini, Marcel J. T. Reinders, Ewa Szczurek, Ahmed Mahfouz, Alexandros Stamatakis, Jasmijn A. Baaijens, Amir Niknejad, Rens Holmer, Tzu Hao Kuo, Benjamin J. Raphael, Felix Mölder, Alexey M. Kozlov, Giacomo Corleone, Alexander Zelikovsky, Bas E. Dutilh, Alexander Schönhuth, Antoine-Emmanuel Saliba, Davis J. McCarthy, Sohrab P. Shah, Mark D. Robinson, Johannes Köster, David Lähnemann, Pavel Skums, Boudewijn P. F. Lelieveldt, Antonio Cappuccio, Jeroen de Ridder, Niko Beerenwinkel, Antonios Somarakis, Stephanie C. Hicks, Lukasz Raczkowski, Kieran R Campbell, John C. Marioni, Thamar Jessurun Lobo, Marleen Balvert, Oliver Stegle, Katharina Jahn, Indu Khatri, Jan O. Korbel, Tobias Marschall, Alice C. McHardy, Szymon M. Kielbasa, Fabian J. Theis, Victor Guryev, and Buys de Barbanson
- Subjects
RNA-SEQUENCING DATA ,CHROMATIN ACCESSIBILITY ,lcsh:QH426-470 ,Bioinformatics ,Maximum likelihood ,Medizin ,Review ,Biology ,Boom ,Wiskundige en Statistische Methoden - Biometris ,Field (computer science) ,03 medical and health sciences ,Spatial reconstruction ,0302 clinical medicine ,ANALYSIS REVEALS ,Bioinformatica ,Tumours of the digestive tract Radboud Institute for Molecular Life Sciences [Radboudumc 14] ,Life Science ,Animals ,Humans ,RNA-Seq ,MAXIMUM-LIKELIHOOD ,Mathematical and Statistical Methods - Biometris ,lcsh:QH301-705.5 ,030304 developmental biology ,Grand Challenges ,GENE-EXPRESSION ,0303 health sciences ,Extramural ,DATA processing & computer science ,Data Science ,WHOLE-GENOME AMPLIFICATION ,Genomics ,Data science ,Compendium ,lcsh:Genetics ,lcsh:Biology (General) ,WIDE EXPRESSION ,Research questions ,TREE INFERENCE ,ddc:004 ,Single-Cell Analysis ,SPATIAL RECONSTRUCTION ,TUMOR MICROENVIRONMENT ,030217 neurology & neurosurgery - Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
- Published
- 2020
30. clonealign: statistical integration of independent single-cell RNA & DNA-seq from human cancers
- Author
-
Alexandre Bouchard-Côté, Samuel Aparicio, Kieran R Campbell, Hossein Farahani, Ciara H. O'Flanagan, Jazmine Brimhall, Justina Biele, Emma Laks, Beixi Wang, Hans Zahn, Sohrab P. Shah, Adi Steif, Pascale Walters, Farhia Kabeer, David Lai, and Andrew McPherson
- Subjects
0303 health sciences ,education.field_of_study ,Somatic cell ,Population ,Cell ,RNA ,Cancer ,Computational biology ,Biology ,medicine.disease ,03 medical and health sciences ,chemistry.chemical_compound ,0302 clinical medicine ,medicine.anatomical_structure ,chemistry ,030220 oncology & carcinogenesis ,Cancer cell ,Gene expression ,medicine ,education ,DNA ,030304 developmental biology - Abstract
Measuring gene expression of genomically defined tumour clones at single cell resolution would associate functional consequences to somatic alterations, as a prelude to elucidating pathways driving cell population growth, resistance and relapse. In the absence of scalable methods to simultaneously assay DNA and RNA from the same single cell, independent sampling of cell populations for parallel measurement of single cell DNA and single cell RNA must be computationally mapped for genome-transcriptome association. Here we present clonealign, a robust statistical framework to assign gene expression states to cancer clones using single-cell RNA-seq and DNA-seq independently sampled from a heterogeneous cancer cell population. We apply clonealign to triple-negative breast cancer patient derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either DNA-Seq or RNA-Seq alone.
- Published
- 2018
31. Single-Cell Sequencing of iPSC-Dopamine Neurons Reconstructs Disease Progression and Identifies HDAC4 as a Regulator of Parkinson Cell Phenotypes
- Author
-
Charmaine Lang, Kieran R Campbell, Brent J Ryan, Phillippa Carling, Moustafa Attar, Jane Vowles, Olga V Perestenko, Rory Bowden, Fahd Baig, Meike Kasten, Michele T Hu, Sally A Cowley, Caleb Webber, and Richard Wade-Martins
- Subjects
Dopamine ,Dopaminergic Neurons ,Gene Expression Profiling ,Induced Pluripotent Stem Cells ,Parkinson Disease ,Endoplasmic Reticulum Stress ,Histone Deacetylases ,Repressor Proteins ,Phenotype ,Gene Expression Regulation ,Mutation ,Disease Progression ,Glucosylceramidase ,Humans ,Single-Cell Analysis ,Transcriptome - Abstract
Induced pluripotent stem cell (iPSC)-derived dopamine neurons provide an opportunity to model Parkinson's disease (PD), but neuronal cultures are confounded by asynchronous and heterogeneous appearance of disease phenotypes in vitro. Using high-resolution, single-cell transcriptomic analyses of iPSC-derived dopamine neurons carrying the GBA-N370S PD risk variant, we identified a progressive axis of gene expression variation leading to endoplasmic reticulum stress. Pseudotime analysis of genes differentially expressed (DE) along this axis identified the transcriptional repressor histone deacetylase 4 (HDAC4) as an upstream regulator of disease progression. HDAC4 was mislocalized to the nucleus in PD iPSC-derived dopamine neurons and repressed genes early in the disease axis, leading to late deficits in protein homeostasis. Treatment of iPSC-derived dopamine neurons with HDAC4-modulating compounds upregulated genes early in the DE axis and corrected PD-related cellular phenotypes. Our study demonstrates how single-cell transcriptomics can exploit cellular heterogeneity to reveal disease mechanisms and identify therapeutic targets.
- Published
- 2018
32. Uncovering genomic trajectories with heterogeneous genetic and environmental backgrounds across single-cells and populations
- Author
-
Christopher Yau and Kieran R. Campbell
- Subjects
0303 health sciences ,Population level ,Limiting ,Computational biology ,Biology ,computer.software_genre ,Expression (mathematics) ,03 medical and health sciences ,0302 clinical medicine ,030220 oncology & carcinogenesis ,Data mining ,computer ,Temporal information ,030304 developmental biology - Abstract
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of genuine time series data is challenging or prohibitive. Computational techniques have arisen from areas such as single-cell ‘omics and in cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically assume homogeneous genetic and environmental backgrounds, which becomes particularly limiting as datasets grow in size and complexity. As a solution to this we describe a novel statistical framework that learns pseudotime trajectories in the presence of non-homogeneous genetic, phenotypic, or environmental backgrounds. We demonstrate that this enables us to identify interactions between such factors and the underlying genomic trajectory. By applying this model to both single-cell gene expression data and population level cancer studies we show that it uncovers known and novel interaction effects between genetic and environmental factors and the expression of genes in pathways. We provide an R implementation of our method PhenoPath at https://github.com/kieranrcampbell/phenopath
- Published
- 2017
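The idea in the PhenoPath abstract above, an interaction between a background factor and the trajectory, can be illustrated with a small Python sketch. It is not the PhenoPath implementation: it assumes the pseudotime z is already known (the actual method infers it), and the linear form and the names intercept, alpha, c and beta are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def fit_interaction(y, x, z):
    """Ordinary least squares for y = intercept + alpha*x + c*z + beta*x*z.

    y: expression of one gene across samples
    x: a covariate such as a genetic or environmental factor
    z: pseudotime, assumed known here (a simplification)
    A non-zero beta means the covariate changes how the gene
    varies along the trajectory.
    """
    design = np.column_stack([np.ones_like(z), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "alpha", "c", "beta"]
    return {name: float(value) for name, value in zip(names, coef)}
```

On simulated data with a known interaction term, the fitted beta should recover the simulated value; on real data the pseudotime would have to be estimated jointly, which this sketch does not attempt.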
33. Integrated Single Cell Analysis Reveals Cell Cycle and Ontogeny Related Transcriptional Heterogeneity in Hscs
- Author
-
Quin F. Wills, Alba Rodriguez-Meira, Adam J. Mead, Christopher Yau, Christopher A.G. Booth, Benjamin J. Povinelli, Nikolaos Barkas, Sten Eirik W. Jacobsen, and Kieran R. Campbell
- Subjects
0301 basic medicine ,03 medical and health sciences ,Cancer Research ,030104 developmental biology ,Single-cell analysis ,Ontogeny ,Genetics ,Cell Biology ,Hematology ,Cell cycle ,Biology ,Molecular Biology ,Cell biology - Published
- 2018
34. Abstract GMM-030: MODELS AND ANALYTIC TECHNIQUES OF MULLERIAN TISSUE-DERIVED ORGANOIDS
- Author
-
Dawn R. Cochrane, Kieran R Campbell, Forouh Kalantari, Evan W. Gibbard, Kendall Greening, David Huntsman, Germain C. Ho, Basile Tessier-Cloutier, Jessica N. McAlpine, Sohrab P. Shah, Yemin Wang, and Genny Trigo-Gonzalez
- Subjects
Cancer Research ,Oncology ,Organoid ,Computational biology ,Biology - Abstract
INTRODUCTION: Ovarian cancer is the 5th deadliest cancer found in women and is the deadliest involving the gynecological tract. Most epithelial ovarian cancers have extra-ovarian origins and can be stratified into various histotypes: high and low-grade serous (HGS and LGS), endometrioid (ENOC), clear cell (CCOC), and mucinous – each of which is proposed to have distinct precursor lesions. We present organoids as a useful model to study precursor lesions and the process of tumorigenesis in epithelial ovarian carcinomas. Organoids recapitulate the in vivo growth microenvironment and are useful to study the mechanisms of tumorigenesis from healthy cells. We have previously proposed that ENOC arise from the secretory cell lineage, while CCOC originate from the ciliated cell lineage, and organoids are an ideal model to examine in greater depth the impact of mutation on specific cell populations, such as ciliated cells. METHODS: Surgical fallopian tube and endometrial tissues, removed for non-cancer reasons, were cultured in 2D followed by plating into Matrigel. Matrigel cultures were supplemented with media containing stem/progenitor differentiation factors promoting organoid growth. To study the effect of mutations often found in ovarian cancers on organoid growth and development, gene knockouts were produced using CRISPR lentiviruses on cells prior to Matrigel culture. Lentiviral transductions were optimized for organoid formation and for minimizing invasiveness accrued on cells. CRISPR gRNA constructs were validated by Western Blot and qPCR. Organoids containing knockouts of p53, BRCA1 and BRCA2 were used to model precursor lesions of HGS, whereas ARID1A knockouts and an inducible PIK3CA activating mutation were used to model CCOC. To gain further insight into ciliated cells of the endometrium, organoids were treated with the Notch inhibitor DBZ to drive differentiation of cells towards a ciliated cell lineage.
We analyzed organoids by single-cell RNA sequencing (scRNA-seq), immunohistochemistry (IHC), and immunofluorescence staining (IF). Single cells were derived by purifying the organoids from Matrigel followed by a chemical and physical digestion. scRNA-seq was performed utilizing the 10X Genomics Platform and analyzed by in-house bioinformaticians. Bioinformatic analyses included stringent QC to remove low-quality and dead cells, before applying unsupervised learning algorithms like PCA and Gaussian mixture modeling as well as differential expression analysis to understand both how samples relate to each other and cell types discovered within each sample. RESULTS: We successfully recapitulated the histology observed in tissues by growing endometrial and fallopian tube organoids. The notch inhibitor, DBZ forced ciliated cell differentiation, as observed by IHC, IF and scRNA-seq. scRNA-seq clustering of DBZ-treated organoid cultures revealed a possible intermediary state between progenitor and ciliated cells. Initial IHC and IF analyses of CRISPR-mediated organoids reveal successful gene manipulation. CONCLUSIONS: Organoid cultures present as a powerful method for modelling precursor lesions; they can be readily manipulated genetically and with rapid turnaround compared to conventional mouse models. Organoids are also amenable to sequencing at single-cell resolution. The ability to model ovarian cancers with permanent knockouts in human tissue serves as a necessary link between animal models and human therapy. Citation Format: Germain C. Ho, Dawn R. Cochrane, Evan W. Gibbard, Kieran Campbell, Basile Tessier-Cloutier, Kendall Greening, Forouh Kalantari, Genny Trigo-Gonzalez, Yemin Wang, Jessica N. McAlpine, Sohrab P. Shah, David G. Huntsman. MODELS AND ANALYTIC TECHNIQUES OF MULLERIAN TISSUE-DERIVED ORGANOIDS [abstract]. In: Proceedings of the 12th Biennial Ovarian Cancer Research Symposium; Sep 13-15, 2018; Seattle, WA. 
Philadelphia (PA): AACR; Clin Cancer Res 2019;25(22 Suppl):Abstract nr GMM-030.
- Published
- 2019
35. Abstract GMM-020: CELL OF ORIGIN, MUTATION AND MICROENVIRONMENT: MODELING EARLY EVENTS OF ENDOMETRIOSIS ASSOCIATED CANCERS
- Author
-
Kieran R Campbell, C Blake Gilks, Lien N. Hoang, Basile Tessier-Cloutier, David Huntsman, Jessica N. McAlpine, Clara Salamanca, Sohrab P. Shah, Dawn R. Cochrane, Katherine M. Lawrence, Germain Ho, Evan W. Gibbard, Tayyebeh M. Nazeran, Anthony Karnezis, and Angela S. Cheng
- Subjects
Cancer Research ,Oncology ,Cell of origin ,Mutation (genetic algorithm) ,Endometriosis ,medicine ,Cancer research ,Biology ,medicine.disease - Abstract
Both clear cell ovarian carcinoma (CCOC) and endometrioid ovarian carcinoma (ENOC) are associated with ovarian endometriotic cysts, which are believed to be their precursor lesion. However, genomic evidence is lacking which could explain how these two clinically distinct histotypes of ovarian cancer arise from the same precursor lesion. We therefore hypothesized that these cancers arise from distinct cells of origin within endometrial tissue. Global proteomic analysis of ovarian cancer histotypes identified CTH as a marker for CCOC. We further found that CTH is highly expressed in the ciliated cells of endometrium (both ectopic endometrium and endometriosis), and of the fallopian tube, with little expression in the secretory cells. We also find that other ciliated cell markers are expressed in CCOC, whereas endometrial secretory cell markers are expressed in ENOC. We propose a new model of CCOC and ENOC histogenesis wherein ENOC is derived from cells of secretory cell lineage whereas CCOC is derived from cells of ciliated cell lineage. However, it remains unclear how external factors in the endometriotic cyst cooperate with cell of origin and mutation to promote cancer formation. To study normal tissue biology, we are using organoid cultures of normal endometrium. As ciliated cells of the endometrium are rare, and we have a particular interest in determining whether they have other features that may link them to CCOC, we used a Notch inhibitor, DBZ, to force ciliated cell differentiation in the organoids. We observed a dramatic shift in the cellular content with DBZ treatment towards ciliated cells. We performed single cell RNA sequencing (scRNAseq) on these endometrial organoids. In the normal endometrial organoids, cells were predominantly a secretory phenotype, characterized by high ESR1 expression, with a minor ciliated cell population. The ciliated cell population expressed several known ciliated markers (FOXJ1 and DNAH12).
Upon treatment with DBZ, the number of secretory cells decreases dramatically and two populations of cells emerge which have ciliated cell markers. The larger ciliated cell population is similar to the ciliated cells in the untreated organoids. The smaller ciliated cell population in the DBZ treated organoids expresses some ciliated cell markers, but clusters separately from normal ciliated cells. We believe this population may represent an intermediary population, which has not fully differentiated. Interestingly, this population expresses the cytokine IL6, while the normal ciliated cell population does not. This is of note because CCOCs express more IL6 compared to the other histotypes. Therefore, we can speculate that this intermediary ciliated cell population may represent cells from which CCOC arise; however, more testing is needed. In the future, the scRNAseq data from organoids will be compared to CCOC and ENOC tumors to determine whether the tumors resemble more closely one population of normal cells. We will use viral transduction to introduce mutations into the organoid cultures to determine whether specific mutation leads to transformation towards a CCOC or ENOC-like phenotype. These studies will enable us to tease apart the relative contribution of mutation, microenvironment and the cell of origin to promote tumor formation. Citation Format: Dawn R Cochrane, Basile Tessier-Cloutier, Germain Ho, Kieran Campbell, Evan Gibbard, Katherine M Lawrence, Tayyebeh Nazeran, Anthony N. Karnezis, Clara Salamanca, Angela S Cheng, Jessica N McAlpine, Sohrab Shah, Lien N Hoang, C Blake Gilks and David G Huntsman. CELL OF ORIGIN, MUTATION AND MICROENVIRONMENT: MODELING EARLY EVENTS OF ENDOMETRIOSIS ASSOCIATED CANCERS [abstract]. In: Proceedings of the 12th Biennial Ovarian Cancer Research Symposium; Sep 13-15, 2018; Seattle, WA. Philadelphia (PA): AACR; Clin Cancer Res 2019;25(22 Suppl):Abstract nr GMM-020.
- Published
- 2019
36. Abstract 4706: Temperature-dependent transcription artifacts and cell population biases in scRNAseq data are minimized by tissue dissociation at low temperatures
- Author
-
Samuel Aparicio, Kieran R Campbell, Farhia Kabeer, Jamie Lim, Sohrab P. Shah, Ciara H. O'Flanagan, and Allen W. Zhang
- Subjects
Cancer Research ,Tumor microenvironment ,education.field_of_study ,Chemistry ,Cell ,Population ,RNA ,Cell biology ,medicine.anatomical_structure ,Oncology ,Transcription (biology) ,Gene expression ,medicine ,Cytotoxic T cell ,education ,Gene - Abstract
Single cell RNA sequencing (scRNAseq) is a powerful tool, particularly for studying complex biological systems, such as tumor heterogeneity and the tumor microenvironment, which may not be resolved by sequencing of bulk material. Nonetheless, it is not without limitations, which include the technical challenges of generating a high quality single cell suspension. Dissociation of tissue to single cell suspension requires mechanical and enzymatic disruption, and the effect of these methods on gene expression or cellular population bias has not been established. In this study, we examined the effects of enzymatic dissociation on cell population capture and transcriptional changes at single cell resolution in breast and ovarian cancer patient samples, patient-derived breast cancer xenografts and cultured cell lines. scRNAseq data showed that enzymatic dissociation of tissues at 37 °C with collagenase resulted in significant induction of heat shock, stress and immediate response genes, which was conserved across all tissues. This gene expression induction was not observed when tissues were dissociated at 6 °C with a protease derived from a Himalayan glacier soil bacterium. Moreover, dissociation of patient tumors at low temperature enhanced the abundance of rare cell populations, including B-cells, T-Cells and cytotoxic T-cells, which were significantly depleted following dissociation at 37 °C. These biases resulting from standard sample preparation methods could significantly affect biological interpretation of scRNAseq data, and can be minimized by dissociation of tissues at low temperature. Note: This abstract was not presented at the meeting. Citation Format: Ciara H. O'Flanagan, Kieran R. Campbell, Farhia Kabeer, Allen Zhang, Jamie Lim, Sohrab P. Shah, Samuel Aparicio. Temperature-dependent transcription artifacts and cell population biases in scRNAseq data are minimized by tissue dissociation at low temperatures [abstract].
In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 4706.
- Published
- 2019
37. Probabilistic inference of bifurcations in single-cell data using a hierarchical mixture of factor analysers
- Author
-
Christopher Yau and Kieran R. Campbell
- Subjects
0303 health sciences ,Heuristic ,Computer science ,business.industry ,Bayesian probability ,Sampling (statistics) ,Inference ,Markov chain Monte Carlo ,Statistical model ,Probabilistic inference ,computer.software_genre ,Machine learning ,Field (computer science) ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,symbols ,Data mining ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Modelling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analysers. Our model exhibits competitive performance on large datasets despite implementing full MCMC sampling and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process.
- Published
- 2016
38. scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R
- Author
-
Kieran R. Campbell, Davis J. McCarthy, Quin F. Wills, and Aaron T. L. Lun
- Subjects
0303 health sciences ,Downstream (software development) ,Database ,Computer science ,Cell ,Process (computing) ,RNA ,RNA-Seq ,computer.software_genre ,Visualization ,Bioconductor ,03 medical and health sciences ,0302 clinical medicine ,medicine.anatomical_structure ,Gene expression ,medicine ,Data mining ,computer ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts, and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalisation. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalisation and visualisation of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Supplementary information: Supplementary material is available online at bioRxiv accompanying this manuscript, and all materials required to reproduce the results presented in this paper are available at dx.doi.org/10.5281/zenodo.60139.
- Published
- 2016
39. switchde: inference of switch-like differential expression along single-cell trajectories
- Author
-
Kieran R. Campbell and Christopher Yau
- Subjects
Models, Statistical ,Models, Genetic ,Sequence Analysis, RNA ,Gene Expression Profiling ,Gene Expression ,Single-Cell Analysis ,Applications Notes ,Software - Abstract
Motivation: Pseudotime analyses of single-cell RNA-seq data have become increasingly common. Typically, a latent trajectory corresponding to a biological process of interest—such as differentiation or cell cycle—is discovered. However, relatively little attention has been paid to modelling the differential expression of genes along such trajectories. Results: We present switchde, a statistical framework and accompanying R package for identifying switch-like differential expression of genes along pseudotemporal trajectories. Our method includes fast model fitting that provides interpretable parameter estimates corresponding to how quickly a gene is up or down regulated as well as where in the trajectory such regulation occurs. It also reports a P-value in favour of rejecting a constant-expression model for switch-like differential expression and optionally models the zero-inflation prevalent in single-cell data. Availability and Implementation: The R package switchde is available through the Bioconductor project at https://bioconductor.org/packages/switchde. Contact: kieran.campbell@sjc.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2016
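The switch-like model described in the switchde abstract above can be sketched in Python. This is not the switchde package (which is in R): it assumes a plain sigmoid mean with illustrative parameter names mu0 (half the peak expression), k (how quickly the gene is regulated; negative means down-regulation) and t0 (where along the trajectory the switch occurs). The grid search is a stand-in for the package's own fitting, and no P-value or zero-inflation handling is attempted.

```python
import numpy as np


def sigmoid_mean(t, mu0, k, t0):
    """Switch-like mean expression along pseudotime t (assumed form)."""
    return 2.0 * mu0 / (1.0 + np.exp(-k * (t - t0)))


def fit_switch(t, y, k_grid, t0_grid):
    """Least-squares fit by grid search over (k, t0).

    For fixed k and t0 the model is linear in mu0, so mu0 has a
    closed-form least-squares solution.
    """
    best = None
    for k in k_grid:
        for t0 in t0_grid:
            basis = 2.0 / (1.0 + np.exp(-k * (t - t0)))
            mu0 = float(basis @ y) / float(basis @ basis)
            rss = float(np.sum((y - mu0 * basis) ** 2))
            if best is None or rss < best["rss"]:
                best = {"rss": rss, "mu0": mu0, "k": float(k), "t0": float(t0)}
    return best


def constant_rss(y):
    """Residual sum of squares of the constant-expression model."""
    return float(np.sum((y - np.mean(y)) ** 2))
```

Comparing the residual sum of squares returned by fit_switch with constant_rss gives a rough sense of whether a switch describes the gene better than constant expression; a formal test would need a likelihood and a reference distribution, which are left out here.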
40. A descriptive marker gene approach to single-cell pseudotime inference
- Author
-
Kieran R. Campbell and Christopher Yau
- Subjects
Statistics and Probability ,Cell type ,Computer science ,Bayesian probability ,Cell ,Inference ,Gene Expression ,Computational biology ,computer.software_genre ,Biochemistry ,Marker gene ,Transcriptome ,Bayes' theorem ,03 medical and health sciences ,0302 clinical medicine ,Single-cell analysis ,Gene expression ,medicine ,Transient (computer programming) ,Trajectory learning ,Molecular Biology ,Gene ,030304 developmental biology ,0303 health sciences ,business.industry ,Gene Expression Profiling ,030302 biochemistry & molecular biology ,Computational Biology ,Bayes Theorem ,Pattern recognition ,Original Papers ,Computer Science Applications ,Computational Mathematics ,medicine.anatomical_structure ,Computational Theory and Mathematics ,Trajectory ,Data mining ,Artificial intelligence ,Single-Cell Analysis ,business ,computer ,Algorithms ,Software ,030217 neurology & neurosurgery - Abstract
Motivation Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results Here we introduce an orthogonal Bayesian approach termed ‘Ouija’ that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify ‘metastable’ states—discrete cell types along the continuous trajectories—that recapitulate known cell types. Availability and implementation An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2016
41. Unravelling Cell Cycle and Ontogeny Transcriptional Heterogeneity in Hematopoietic Stem Cells through Integrated Single Cell RNA-Seq
- Author
-
Christopher A.G. Booth, Christopher Yau, Benjamin J. Povinelli, Quin F. Wills, Nikolaos Barkas, Adam J. Mead, Kieran R. Campbell, and Sten Eirik W. Jacobsen
- Subjects
Immunology ,Cell ,RNA ,RNA-Seq ,Cell Biology ,Hematology ,Biology ,Cell cycle ,Biochemistry ,Cell biology ,Transplantation ,Haematopoiesis ,medicine.anatomical_structure ,Cell culture ,medicine ,Stem cell - Abstract
During fetal development, hematopoietic stem cells (HSCs) undergo a remarkable expansion through a combination of rapid proliferation and high rates of self-renewal. In contrast, adult HSCs are characterized by long-term quiescence. Understanding of the molecular mechanisms underlying these ontogeny-dependent differences in cell cycle and self-renewal is hampered by marked heterogeneity within the HSC compartment, making it difficult to distinguish overlapping signatures in bulk transcriptional data. Advances in single cell genomics provide a new opportunity to tease apart different sources of gene expression heterogeneity, including those relating to cell cycle and self-renewal capability. To address these questions, and improve resolution for cell-cycle annotation of individual HSCs, we developed an integrated single cell (sc)RNA-seq and live cell-cycle staining technique using Hoechst 33342 (DNA) and Pyronin-y (RNA) based FACS index sorting, followed by smart-seq2 based scRNA-seq. We validated our approach on 4 hematopoietic cell lines from mouse and human, using these data as a training set to apply a novel integrated pseudotime package that orders single cells by stage of cell cycle rather than developmental trajectory. By this approach we detected non-canonical cell cycle genes not apparent through bulk sorting of distinct cell cycle phases, and not previously annotated in published cell cycle gene sets (sc-pseudotime genes = 665, FDR < 0.05; non-annotated cell cycle = 487; non-bulk detected = 570). We then applied our technique to analyze primary mouse HSCs from 3 developmental time points (e15.5 fetal liver (FL), 2 week old bone marrow, and 6 week old adult bone marrow (ABM), n >1,500 single cells). 
Our cell cycle based integrated pseudotime analysis revealed distinct cell cycle signatures for FL, ABM, and common cell cycle related transcripts across distinct developmental time points that have not previously been described, including 555 unique cell cycle genes in FL; 401 unique cell cycle genes in ABM, and 93 novel cell cycle genes in common to both developmental time points e.g. Pclaf, Zfp367 and including long non-coding RNAs e.g. Lockd (FDR < 0.001). Our dataset uniquely allowed us to explore ontogeny related molecular signatures without the overriding effect of confounding cell cycle associated gene expression by directly comparing non-mitotic cells from FL and ABM groups. We identified 404 differentially expressed transcripts (FDR 2FC), including genes of unknown HSC function (e.g., FL: Lgals1, Gmfg; ABM: Zfp36l1, Rgs1). Single cell qPCR confirmed aberrant expression of 26/29 (89.7%) selected ontogeny candidate genes. Furthermore, hallmark gene set enrichment analysis revealed upregulation of oxidative phosphorylation, MYC targets, and E2F targets in FL; and TNFA signaling, Hypoxia, and TGF-beta signaling, among others, in ABM (FDR 1.5). We then functionally reversed ABM quiescence through in vivo 5-FU treatment, and performed our single cell RNA-seq analysis on HSCs both 2 and 6 days post injection. This allowed us to identify genes and pathways associated with selective resistance of HSCs to chemotherapy, including upregulation of the hallmark gene sets for unfolded protein response, fatty acid metabolism, and MTOR signaling (FDR < 0.01, NES > 1.5 for each set). To functionally validate novel ontogeny related genes we utilized genetic mouse models for two unexplored ABM related genes, Zfp36L1, an RNA-binding zinc finger protein, and Rgs1 a regulator of G-protein coupled signaling. We transplanted Cre-ERT2 conditionally floxed Zfp36L1 bone marrow with CD45.1 competitor control and induced deletion by tamoxifen four weeks post transplant. 
Compared to the initial four-week post transplant time point we observed a significant reduction in chimerism from Zfp36L1 deleted bone marrow compared to Cre-ERT2 control (p < .05). Competitive transplantation of Rgs1 -/- and WT bone marrow at a 1:1 ratio with CD45.1 competitors resulted in significantly reduced myeloid chimerism at 16 weeks post transplant. Secondary transplant and single cell cycle molecular analysis of these mice are ongoing together with functional validation of a number of other candidate genes. Our results demonstrate the utility of single cell analysis to discover novel HSC regulators providing a unique dataset for further studies investigating regulators of HSC function. Disclosures Mead: BMS: Honoraria; Pfizer: Honoraria; Novartis: Honoraria, Research Funding, Speakers Bureau.
- Published
- 2017
42. Laplacian eigenmaps and principal curves for high resolution pseudotemporal ordering of single-cell RNA-seq profiles
- Author
-
Kieran R. Campbell, Chris P. Ponting, and Caleb Webber
- Subjects
Computer science ,Cell ,Principal curves ,High resolution ,RNA-Seq ,Computational biology ,computer.software_genre ,Transcriptome ,medicine.anatomical_structure ,medicine ,Embedding ,Data mining ,Laplace operator ,computer - Abstract
Advances in RNA-seq technologies provide unprecedented insight into the variability and heterogeneity of gene expression at the single-cell level. However, such data offers only a snapshot of the transcriptome, whereas it is often the progression of cells through dynamic biological processes that is of interest. As a result, one outstanding challenge is to infer such progressions by ordering gene expression from single cell data alone, known as the cell ordering problem. Here, we introduce a new method that constructs a low-dimensional non-linear embedding of the data using laplacian eigenmaps before assigning each cell a pseudotime using principal curves. We characterise why on a theoretical level our method is more robust to the high levels of noise typical of single-cell RNA-seq data before demonstrating its utility on two existing datasets of differentiating cells.
- Published
- 2015
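The embedding step named in the abstract above, Laplacian eigenmaps, can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: the k-nearest-neighbour graph, the unweighted edges and the default n_neighbors value are assumptions, and the principal-curve step is omitted. For a simple unbranched progression, the first embedding coordinate can serve as a rough stand-in ordering.

```python
import numpy as np


def laplacian_eigenmap(X, n_neighbors=5, n_components=2):
    """Embed the rows of X (cells x features) with Laplacian eigenmaps.

    Assumes the k-nearest-neighbour graph is connected.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    # Symmetric, unweighted k-nearest-neighbour adjacency matrix.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Symmetric normalised graph Laplacian I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; skip the trivial first vector.
    _, vecs = np.linalg.eigh(L)
    # Rescale to solutions of the generalised problem (D - W) v = lambda D v.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]


def rank_correlation(a, b):
    """Spearman-style correlation via ranks (no tie handling)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])
```

The sign of each embedding coordinate is arbitrary, so an ordering derived from it may run in either direction along the process.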
43. Bayesian Gaussian Process Latent Variable Models for pseudotime inference in single-cell RNA-seq data
- Author
-
Christopher Yau and Kieran R. Campbell
- Subjects
0303 health sciences ,Bayesian probability ,Inference ,Genomics ,Computational biology ,Latent variable ,Bayesian inference ,Bioinformatics ,03 medical and health sciences ,symbols.namesake ,Identification (information) ,0302 clinical medicine ,symbols ,Point estimation ,Gaussian process ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Single-cell genomics has revolutionised modern biology while requiring the development of advanced computational and statistical methods. Advances have been made in uncovering gene expression heterogeneity, discovering new cell types and novel identification of genes and transcription factors involved in cellular processes. One such approach to the analysis is to construct pseudotime orderings of cells as they progress through a particular biological process, such as cell-cycle or differentiation. These methods assign a score - known as the pseudotime - to each cell as a surrogate measure of progression. However, all published methods to date are purely algorithmic and lack any way to give uncertainty to the pseudotime assigned to a cell. Here we present a method that combines Gaussian Process Latent Variable Models (GP-LVM) with a recently published electroGP prior to perform Bayesian inference on the pseudotimes. We go on to show that the posterior variability in these pseudotimes leads to nontrivial uncertainty in the pseudo-temporal ordering of the cells and that pseudotimes should not be thought of as point estimates.
- Published
- 2015
44. Eleven grand challenges in single-cell data science
- Author
-
David Lähnemann, Johannes Köster, Ewa Szczurek, Davis J. McCarthy, Stephanie C. Hicks, Mark D. Robinson, Catalina A. Vallejos, Kieran R. Campbell, Niko Beerenwinkel, Ahmed Mahfouz, Luca Pinello, Pavel Skums, Alexandros Stamatakis, Camille Stephan-Otto Attolini, Samuel Aparicio, Jasmijn Baaijens, Marleen Balvert, Buys de Barbanson, Antonio Cappuccio, Giacomo Corleone, Bas E. Dutilh, Maria Florescu, Victor Guryev, Rens Holmer, Katharina Jahn, Thamar Jessurun Lobo, Emma M. Keizer, Indu Khatri, Szymon M. Kielbasa, Jan O. Korbel, Alexey M. Kozlov, Tzu-Hao Kuo, Boudewijn P.F. Lelieveldt, Ion I. Mandoiu, John C. Marioni, Tobias Marschall, Felix Mölder, Amir Niknejad, Alicja Rączkowska, Marcel Reinders, Jeroen de Ridder, Antoine-Emmanuel Saliba, Antonios Somarakis, Oliver Stegle, Fabian J. Theis, Huan Yang, Alex Zelikovsky, Alice C. McHardy, Benjamin J. Raphael, Sohrab P. Shah, and Alexander Schönhuth
- Subjects
Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Abstract The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands—or even millions—of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
- Published
- 2020
45. Dissociation of solid tumor tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses
- Author
-
Ciara H. O’Flanagan, Kieran R. Campbell, Allen W. Zhang, Farhia Kabeer, Jamie L. P. Lim, Justina Biele, Peter Eirew, Daniel Lai, Andrew McPherson, Esther Kong, Cherie Bates, Kelly Borkowski, Matt Wiens, Brittany Hewitson, James Hopkins, Jenifer Pham, Nicholas Ceglia, Richard Moore, Andrew J. Mungall, Jessica N. McAlpine, The CRUK IMAXT Grand Challenge Team, Sohrab P. Shah, and Samuel Aparicio
- Subjects
Single cell ,RNA-seq ,Tissue dissociation ,Gene expression ,Quality control ,Breast cancer ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Abstract Background Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying complex biological systems, such as tumor heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in primary solid tumor tissues and patient-derived mouse xenografts for scRNA-seq are not well understood. Results We use low temperature (6 °C) protease and collagenase (37 °C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNA-seq dataset comprising 155,165 cells from patient cancer tissues, patient-derived breast cancer xenografts, and cancer cell lines. We observe substantial variation in standard quality control metrics of cell viability across conditions and tissues. From the contrast between tissue protease dissociation at 37 °C or 6 °C, we observe that collagenase digestion results in a stress response. We derive a core gene set of 512 heat shock and stress response genes, including FOS and JUN, induced by collagenase (37 °C), which are minimized by dissociation with a cold active protease (6 °C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues. Conclusions The method and conditions of tumor dissociation influence cell yield and transcriptome state and are both tissue- and cell-type dependent. Interpretation of stress pathway expression differences in cancer single-cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with the identification of such effects in dissociated scRNA-seq experiments.
- Published
- 2019
46. clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers
- Author
-
Kieran R. Campbell, Adi Steif, Emma Laks, Hans Zahn, Daniel Lai, Andrew McPherson, Hossein Farahani, Farhia Kabeer, Ciara O’Flanagan, Justina Biele, Jazmine Brimhall, Beixi Wang, Pascale Walters, IMAXT Consortium, Alexandre Bouchard-Côté, Samuel Aparicio, and Sohrab P. Shah
- Subjects
Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Abstract Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.
- Published
- 2019