
Development of CAbase and an Exon Analysis Pipeline for Visual Assessment of Predicted Genes for the Carbonic Anhydrases

Authors :
Isokangas, Lydia
BioMediTech
University of Tampere
Publication Year :
2016

Abstract

Background and Aims

Humans can quickly recognize and evaluate visual patterns, so this thesis aims to apply that ability to analysing the conservation of carbonic anhydrase (CA) proteins. This was done by creating two pipelines:
* one to create a publicly available, specialized database, named CAbase, to serve the CA research community; and
* one to create a visual display of the aligned exons of the cDNA transcripts contained in CAbase, with indicators showing the positions of the start and stop codons and the locations of the predicted signal and mitochondrial targeting peptides. This pipeline was named Exon_Analysis.

CAs are ubiquitous enzymes that catalyse the reversible hydration of carbon dioxide to bicarbonate and a proton. As a result of gene duplication events, CAs exist in at least 16, and possibly 17, different isoforms.

Methods

The pipelines were built with freely available tools, including Python, MySQL and bioinformatics tools such as Clustal Omega, PRANK, BLAST and Pal2Nal. Data for CAbase were extracted from Ensembl, NCBI, UniProt, RCSB PDB, UniGene and FlyBase, and predictions calculated with SignalP and TargetP were also included. CAbase is hosted on Amazon Web Services (AWS) and can be accessed from any computer with Internet access and a MySQL client installed. Exon_Analysis draws a scaled schematic of the exons in a multiple sequence alignment (MSA) of the cDNA transcripts for a CA isoform, where the MSA is produced with PRANK. The exons and the other indicators (the start and stop codons and the signal and mitochondrial targeting peptides) are drawn in different colours at their scaled positions. This makes it possible to see how conserved the exons are within the coding regions, and how well the start and stop codons and the peptides align, for each CA isoform.

Results

CAbase is now publicly available to anyone. However, it is still somewhat user-unfriendly because users must be familiar with SQL.
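Because CAbase is queried directly in SQL, a short sketch may help show what that involves. The example below uses Python's built-in `sqlite3` module as a stand-in for the hosted MySQL server, and the table name `ca_entries`, its columns and the rows are hypothetical toy data, not CAbase's actual schema or records. Since every row in a CA-specific database is already a CA, the query can go straight to the isoform and feature of interest without first filtering out unrelated proteins.

```python
import sqlite3

# Hypothetical table layout for illustration; not CAbase's actual MySQL schema.
SCHEMA = """
CREATE TABLE ca_entries (
    accession   TEXT PRIMARY KEY,
    isoform     TEXT NOT NULL,
    species     TEXT NOT NULL,
    signalp_hit INTEGER NOT NULL
)
"""

# Toy rows with placeholder accessions; signalp_hit = 1 marks a predicted signal peptide.
TOY_ROWS = [
    ("acc_001", "CA IX", "Homo sapiens", 1),
    ("acc_002", "CA IX", "Mus musculus", 1),
    ("acc_003", "CA II", "Homo sapiens", 0),
    ("acc_004", "CA VI", "Homo sapiens", 1),
]


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database standing in for the hosted MySQL server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO ca_entries VALUES (?, ?, ?, ?)", TOY_ROWS)
    conn.commit()
    return conn


def with_signal_peptide(conn: sqlite3.Connection, isoform: str) -> list:
    """Accessions of one isoform that have a predicted signal peptide.

    Every row is already a CA, so no extra filter for unrelated proteins is needed.
    """
    cur = conn.execute(
        "SELECT accession FROM ca_entries "
        "WHERE isoform = ? AND signalp_hit = 1 ORDER BY accession",
        (isoform,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_toy_db()
    print(with_signal_peptide(conn, "CA IX"))  # ['acc_001', 'acc_002']
```

Against a real MySQL server, a DB-API driver such as mysql-connector-python would be used instead of `sqlite3`, with `%s` placeholders in place of `?`; the SQL itself would look much the same.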
CAbase supplied the transcript data used by Exon_Analysis. This pipeline has enabled quality control of the predicted genes one exon at a time. The exon MSA schematics have revealed evolutionary events such as the conservation of short exons in CAs VI, IX, XII, XIV, X and XI. Both tools could be adapted for use with other proteins.

Conclusion

CAbase is a specialized database for CA researchers that can continue to be adapted to their needs. It allows researchers to write targeted queries without having to filter out unwanted proteins. Exon_Analysis creates an exon MSA schematic that visualises the relationship between the exons and other features of interest, which facilitates quality control of predicted genes one exon at a time.
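A central step in drawing a scaled exon MSA schematic is converting exon boundaries, which are known in ungapped transcript coordinates, into column positions in the gapped alignment. The sketch below shows one way to do this in Python. The function names, the `-` gap character and the input format (one aligned cDNA string plus a list of consecutive exon lengths) are assumptions for illustration, not Exon_Analysis's actual code. Locating the start codon by scanning for the first ATG is also a simplification; in practice the coding-sequence boundaries would more likely come from the source annotation.

```python
from typing import List, Tuple

GAP = "-"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ungapped_to_columns(aligned: str) -> List[int]:
    """Map each ungapped nucleotide index to its 0-based alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != GAP]


def exon_columns(aligned: str, exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (first_col, last_col) in the alignment for each exon.

    exon_lengths are consecutive exon sizes in nucleotides of the ungapped
    transcript; their sum must equal the number of non-gap characters.
    """
    cols = ungapped_to_columns(aligned)
    if sum(exon_lengths) != len(cols):
        raise ValueError("exon lengths do not match the ungapped sequence length")
    spans, start = [], 0
    for length in exon_lengths:
        end = start + length - 1
        spans.append((cols[start], cols[end]))
        start += length
    return spans


def render_track(aligned: str, exon_lengths: List[int]) -> str:
    """One-line text track: exons alternate 'A'/'B'; alignment gaps stay '-'."""
    track = [GAP] * len(aligned)
    for i, (first, last) in enumerate(exon_columns(aligned, exon_lengths)):
        symbol = "A" if i % 2 == 0 else "B"
        for col in range(first, last + 1):
            if aligned[col] != GAP:
                track[col] = symbol
    return "".join(track)


def start_stop_columns(aligned: str) -> Tuple[int, int]:
    """Columns of the first ATG and the first in-frame stop after it (-1 if absent)."""
    cols = ungapped_to_columns(aligned)
    seq = aligned.replace(GAP, "").upper()
    start = seq.find("ATG")
    if start < 0:
        return -1, -1
    # Step codon by codon from the codon after ATG; stop before an incomplete codon.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return cols[start], cols[i]
    return cols[start], -1


if __name__ == "__main__":
    aln = "--CATGG-CTTAA"   # toy aligned cDNA, not real CA data
    exons = [4, 6]          # toy exon lengths in nucleotides
    print(exon_columns(aln, exons))   # [(2, 5), (6, 12)]
    print(render_track(aln, exons))   # --AAAAB-BBBBB
    print(start_stop_columns(aln))    # (3, 10)
```

Repeating this for every transcript of an isoform gives one track per alignment row. Because all tracks share the alignment's column axis, exons that occupy the same columns line up vertically, which is the visual cue the schematic relies on; a plotting library would replace the text track to produce a coloured, scaled figure.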

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.od......4853..16748e99f3b20bd324d7853aff811cec