1. GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files
- Author
- Norhan Yasser and Ahmed Karam
- Subjects
Untranslated region; Computer science; Big data; Health Informatics; Genomics; Computational biology; Genome; DNA sequencing; General Biochemistry, Genetics and Molecular Biology; Exon; 03 medical and health sciences; Annotation; Intergenic region; Humans; Sequence Ontology; Gene; 030304 developmental biology; Whole genome sequencing; 0303 health sciences; Information retrieval; Genome, Human; 030302 biochemistry & molecular biology; Intron; Molecular Sequence Annotation; Genome project; Gene Annotation; Python (programming language); File format; Computer Science Applications; Software
- Abstract
Manipulating and analyzing publicly available genomic datasets has become a daily task in bioinformatics and genomics laboratories. The release of data from several genome sequencing projects has prompted bioinformaticians to develop automated scripts and pipelines for analyzing genomic datasets, in particular gene annotation pipelines. Non-developers need fully featured programs for handling genome annotation files, and genomic data analysis can be accelerated by reducing genome annotation and sequence files to specific features. To extract genome features from GTF or GFF3 files precisely, the GAD script (https://github.com/bio-projects/GAD) provides a simple graphical user interface that can be run by all Python versions installed on different operating systems. GAD contains entry widgets that can analyze multiple genome sequence and annotation files with a single click. Its functions extract genome features such as upstream genes, downstream genes, intergenic regions, genes, transcripts, exons, introns, coding sequences, five prime untranslated regions, three prime untranslated regions, and other ambiguous Sequence Ontology terms. GAD writes the results in several file formats, such as BED, GTF/GFF3, and FASTA, which are supported by other bioinformatics programs. The script could be incorporated into various pipelines in genomics laboratories to accelerate data analysis.
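The feature-based division described in the abstract can be illustrated with a minimal sketch. This is not GAD's own code: the function names `split_by_feature` and `to_bed` are hypothetical, and the sketch assumes tab-separated, nine-column GTF/GFF3 records whose third column holds the feature type. It only groups records by feature and converts them to BED lines; it does not derive introns, intergenic regions, or FASTA output.

```python
from collections import defaultdict


def split_by_feature(lines):
    """Group GTF/GFF3 records by the feature type in column 3 (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed nine-column annotation record
        groups[fields[2]].append(fields)
    return dict(groups)


def to_bed(fields):
    """Convert one GTF/GFF3 record to a six-column BED line (hypothetical helper).

    GTF/GFF3 coordinates are 1-based and inclusive, while BED is 0-based and
    half-open, so only the start coordinate is shifted by one.
    """
    seqid, _source, feature, start, end, score, strand = fields[:7]
    # Assumption: a missing score (".") is written as 0 in the BED output.
    bed_score = "0" if score == "." else score
    return "\t".join([seqid, str(int(start) - 1), end, feature, bed_score, strand])


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##gff-version 3",
        "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1",
        "chr1\tdemo\texon\t100\t200\t.\t+\t.\tParent=gene1",
        "chr1\tdemo\texon\t300\t500\t.\t+\t.\tParent=gene1",
    ]
    for feature, records in split_by_feature(example).items():
        for record in records:
            print(to_bed(record))
```

Writing each group returned by `split_by_feature` to its own file would give one feature-based file per feature type, which is the general idea the abstract describes.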
- Published
- 2020