Determination of a Screening Metric for High Diversity DNA Libraries
- Source :
- PLoS ONE, Vol 11, Iss 12, p e0167088 (2016)
- Publication Year :
- 2016
- Publisher :
- Public Library of Science (PLoS), 2016.
Abstract
- The fields of antibody engineering, enzyme optimization and pathway construction rely increasingly on screening complex variant DNA libraries. These highly diverse libraries allow researchers to sample a larger sequence space and therefore to identify proteins with significantly improved activity more rapidly. The current state of the art in synthetic biology allows for libraries with billions of variants, pushing the limits of researchers' ability to qualify libraries for screening by measuring the traditional quality metrics of fidelity and diversity of variants. Instead, when screening variant libraries, researchers typically use a generic, and often insufficient, oversampling rate based on a common rule of thumb. We have developed methods to calculate a library-specific oversampling metric, based on fidelity, diversity, and representation of variants, which tells researchers, before they screen the library, how much oversampling is required to ensure that the desired fraction of variant molecules will be sampled. To derive this oversampling metric, we developed a novel alignment tool that efficiently measures frequency counts at individual nucleotide variant positions using next-generation sequencing data. We then applied a method based on the "coupon collector" problem from probability theory to construct a curve of upper-bound estimates of the sample size required for any desired variant coverage. The calculated oversampling metric will guide researchers to maximize their efficiency in using highly variant libraries.
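The coupon-collector idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tool or their upper-bound derivation: it simply assumes each draw picks variant i independently with probability p_i and computes the expected fraction of distinct variants seen after n draws. The function names `expected_coverage` and `draws_for_coverage`, and the example frequencies, are hypothetical.

```python
def expected_coverage(probs, n):
    """Expected fraction of distinct variants seen at least once in n draws.

    Assumes independent draws where variant i is picked with probability
    probs[i] (an illustrative model, not the paper's exact method).
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs) / len(probs)


def draws_for_coverage(probs, target):
    """Smallest n whose expected coverage reaches `target` (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Double the sample size until the target is met ...
    hi = 1
    while expected_coverage(probs, hi) < target:
        hi *= 2
    # ... then bisect between the last failing size and the first passing one.
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_coverage(probs, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical libraries of 1000 variants each.
n_variants = 1000
uniform = [1.0 / n_variants] * n_variants
# Half the variants over-represented, half under-represented (sums to 1).
skewed = [1.5 / n_variants] * 500 + [0.5 / n_variants] * 500

n_uniform = draws_for_coverage(uniform, 0.95)
n_skewed = draws_for_coverage(skewed, 0.95)
print("uniform oversampling:", n_uniform / n_variants)
print("skewed oversampling: ", n_skewed / n_variants)
```

For a perfectly uniform library, 95% expected coverage needs roughly 3x oversampling (about ln 20 draws per variant). The skewed library needs more, which is why a rule of thumb can undersample a library with uneven representation.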
- Subjects :
- 0301 basic medicine
Computer science
lcsh:Medicine
Bioinformatics
computer.software_genre
Biochemistry
Synthetic biology
chemistry.chemical_compound
Oversampling
Nucleotide
DNA libraries
DNA sequencing
lcsh:Science
chemistry.chemical_classification
Numerical Analysis
Multidisciplinary
High-Throughput Nucleotide Sequencing
Genomics
Nucleic acids
Physical Sciences
Metric (mathematics)
Probability distribution
Data mining
Sequence Analysis
Transcriptome Analysis
Research Article
Next-Generation Sequencing
Nucleotide sequencing
Nucleotide Sequencing
Library Screening
Research and Analysis Methods
03 medical and health sciences
Genetics
Humans
Fraction (mathematics)
Molecular Biology Techniques
Sequencing Techniques
Molecular Biology
Gene Library
Probability
Molecular Biology Assays and Analysis Techniques
Sequence Assembly Tools
lcsh:R
Biology and Life Sciences
Computational Biology
Genetic Variation
DNA
Construct (python library)
Models, Theoretical
Genome Analysis
Probability Theory
Probability Distribution
030104 developmental biology
chemistry
lcsh:Q
Numerical Integration
Sequence Alignment
computer
Mathematics
Details
- ISSN :
- 1932-6203
- Volume :
- 11
- Database :
- OpenAIRE
- Journal :
- PLOS ONE
- Accession number :
- edsair.doi.dedup.....fece12e1bd641c313c11ad902d253abb
- Full Text :
- https://doi.org/10.1371/journal.pone.0167088