Back to Search Start Over

Quartets Enable Statistically Consistent Estimation of Cell Lineage Trees Under an Unbiased Error and Missingness Model (Abstract)

Authors :
Yunheng Han and Erin K. Molloy
Han, Yunheng
Molloy, Erin K.
Yunheng Han and Erin K. Molloy
Han, Yunheng
Molloy, Erin K.
Publication Year :
2023

Abstract

Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells [Lim et al., 2020]. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally [Jahn et al., 2016; Schwartz and Schäffer, 2017]. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) and triplets (three-leaf, rooted phylogenetic trees), in light of these barriers. Quartets and triplets have long been used as the building blocks for reconstructing the evolutionary history of species [Wilkinson et al., 2005]. The reason triplet-based methods (e.g., MP-EST [Liu et al., 2010]) and quartet-based methods (e.g., ASTRAL [Mirarab et al., 2014]) have garnered such success in species phylogenetics is their good statistical properties under the Multi-Species Coalescent (MSC) model [Pamilo and Nei, 1988; Rannala and Yang, 2003]; see Allman et al. (2011) and Degnan (2006) for identifiability results under the MSC model for quartets and triplets, respectively. Inspired by these efforts, we study the utility of quartets and triplets for estimating cell lineage trees under a popular tumor phylogenetics model [Jahn et al., 2016; Ross and Markowetz, 2016; Wu, 2019; Kizilkale et al., 2022] with two phases. First, mutations arise on a (highly unresolved) cell lineage tree according to the infinite sites model, and second, errors (false positives and false negatives) and missing values are introduced to the resulting mutation data in an unbiased fashion, mimicking data produced by single-cell sequencing protocols. This infinite sites plus unbiased error and missingness (IS+UEM) model generates mutations (rather than gene genealog

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1402194073
Document Type :
Electronic Resource
Full Text :
https://doi.org/10.4230.LIPIcs.WABI.2023.8