Development and Benchmarking of Imputation Methods for Microbiome and Single-cell Sequencing Data
- Publication Year :
- 2021
Abstract
- Next-generation sequencing (NGS) has revolutionized biomedical research and has broad impact and applications. Since its advent around 15 years ago, this highly scalable DNA sequencing technology has generated large amounts of biological data with new features and brought new challenges to data analysis. For example, researchers use RNA sequencing (RNA-seq) to quantify gene expression levels more accurately. However, NGS technology involves many processing steps and sources of technical variation when measuring expression values in biological samples. In other words, the NGS data that researchers observe could be biased by the randomness and constraints of the technology. This dissertation focuses mainly on microbiome sequencing data and single-cell RNA-seq (scRNA-seq) data. Both are highly sparse count data in matrix form. The zeros could be either biological or non-biological, and the high sparsity of the data has brought challenges to data analysis. The missing data imputation problem has been studied in statistics and social science, because survey data often contain non-responses to some survey questions, and those unanswered questions are marked as "NA" or missing values in the data. Imputation methods provide a sophisticated guess for the missing values; the purpose is to avoid discarding collected samples and to make it easier to apply state-of-the-art statistical methods. In machine learning, the famous Netflix data challenge on film recommendation systems also falls into the missing data imputation category. Netflix wants to predict how much users would like movies they have not watched. The scores these users would potentially give to the unwatched films are regarded as missing values in the data. The NGS data imputation problem differs from the previous two cases in that the missing values in NGS data are not as well defined. 
The zeros in the NGS data could either come from t
Details
- Database :
- OAIster
- Notes :
- application/pdf, English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1367505453
- Document Type :
- Electronic Resource