1. Pan-genome Storage and Analysis Techniques
- Author
Zekic, Tina, Holley, Guillaume, Stoye, Jens, Setubal, João C., and Stadler, Peter
- Subjects
0301 basic medicine, Comparative genomics, 03 medical and health sciences, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, 030104 developmental biology, Computer science, Pan-genome, Gene, Genome, Data science, DNA sequencing, Variety (cybernetics)
- Abstract
Computational pan-genome analysis has emerged from the rapid increase of available genome sequencing data. Starting from a microbial pan-genome, the concept has spread to a variety of species, such as plants or viruses. Characterizing a pan-genome provides insights into intra-species evolution, functions, and diversity. However, researchers face challenges such as processing and maintaining large datasets while providing accurate and efficient analysis approaches. Comparative genomics methods are required for detecting conserved and unique regions across a set of genomes. This chapter gives an overview of tools available for indexing pan-genomes, identifying the sub-regions of a pan-genome, and offering a variety of downstream analysis methods. These tools are categorized into two groups, gene-based and sequence-based, according to the pan-genome identification method. We highlight the differences between the tools, along with their advantages and disadvantages, and provide information about the general workflow, methodology of pan-genome identification, covered functionalities, usability, and availability of the tools.
- Published
2018