1. VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank
- Author
- Alexander L. Greninger, Michelle J. Lin, Ryan C. Shean, Graham D. Stoddard, and Negar Makhsous
- Subjects
Virus sequence, Data submission, viruses, Computational biology, Mumps virus, Genome, Viral, Biology, Dengue virus, lcsh:Computer applications to medicine. Medical informatics, medicine.disease_cause, Biochemistry, Genome, Virus, GenBank, 03 medical and health sciences, Annotation, 0302 clinical medicine, Structural Biology, medicine, Humans, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, VAPiD, Whole genome sequencing, 0303 health sciences, Applied Mathematics, Genome project, Genomics, Computer Science Applications, lcsh:Biology (General), NCBI, 030220 oncology & carcinogenesis, lcsh:R858-859.7, Viral annotation, Viral genomics, Databases, Nucleic Acid, Software
- Abstract
With sequencing technologies becoming cheaper and easier to use, more groups are able to obtain whole genome sequences of viruses of public health and scientific importance. Submission of genomic data to NCBI GenBank is a requirement prior to publication and plays a critical role in making scientific data publicly available. GenBank currently has automatic prokaryotic and eukaryotic genome annotation pipelines but has no viral annotation pipeline beyond influenza virus. Annotation and submission of viral genome sequences is a non-trivial task, especially for groups that do not routinely interact with GenBank for data submissions. We present Viral Annotation Pipeline and iDentification (VAPiD), a portable and lightweight command-line tool for annotation and GenBank deposition of viral genomes. VAPiD supports annotation of nearly all unsegmented viral genomes. The pipeline has been validated on human immunodeficiency virus, human parainfluenza virus 1–4, human metapneumovirus, human coronaviruses (229E/OC43/NL63/HKU1/SARS/MERS), human enteroviruses/rhinoviruses, measles virus, mumps virus, hepatitis A–E virus, Chikungunya virus, dengue virus, and West Nile virus, as well as the human polyomaviruses BK/JC/MCV, human adenoviruses, and human papillomaviruses. The program can handle individual or batch submissions of different viruses to GenBank and correctly annotates multiple viruses, including those that contain ribosomal slippage or RNA editing, without prior knowledge of the virus to be annotated. VAPiD is programmed in Python and is compatible with Windows, Linux, and Mac OS systems. We have created a portable, lightweight, user-friendly, internet-enabled, open-source, command-line genome annotation and submission package to facilitate virus genome submissions to NCBI GenBank. Instructions for downloading and installing VAPiD can be found at https://github.com/rcs333/VAPiD.
- Published
- 2018