De novo sequencing, assembly and annotation of the Agapornis roseicollis genome to identify variants for the development of genetic screening tests
- Author
Van der Zwan, Henriëtte; Van der Sluis, Rencia (Supervisor); Visser, Carina
- Abstract
PhD (Biochemistry), North-West University, Potchefstroom Campus, 2019

The genus Agapornis consists of nine small African parrots commonly called lovebirds. They occur naturally across mainland Africa (eight species) and Madagascar (one species), and are also kept as pet birds around the world. Eight of these species have been domesticated, five of which are commonly found in breeding systems. Wild populations are under pressure from poaching and illegal export for the pet trade. Trade has consequently been restricted in some countries because of declining numbers, but poaching, trapping and illegal export remain a problem for some populations. Breeders select birds mainly on plumage colour variation. Thirty colour variations are known amongst these species, and many of them can be combined. Very little research has been done on the molecular genetic mechanisms that control parrot plumage pigmentation. Parrots have a unique pigment, psittacofulvin, that is found only in this group and is believed to be under genetic rather than dietary control. Breeders have determined the inheritance patterns of these colour variations through test matings, and most are inherited as Mendelian traits. Even so, the genes and polymorphisms underlying these traits have not yet been identified. Breeders therefore rely on pedigree data to predict the colour genotypes of offspring with wild-type colouration. Despite this reliance on pedigrees, no molecular parentage verification panel is available for lovebirds. The avian parentage verification panels that do exist were developed for a range of other bird species and are all based on microsatellite markers. In animal breeding, microsatellite markers are increasingly being replaced by single nucleotide polymorphisms (SNPs).
One limitation to developing a parentage verification panel for this genus is the lack of a reference genome from which SNPs can be identified. The genome of A. roseicollis was therefore sequenced de novo, assembled and annotated for this purpose. Sequencing was performed at 100x coverage on the Illumina HiSeq 2000 platform. Three shotgun libraries with insert sizes of 300 bp, 550 bp and 750 bp, and two long-insert (jumping) paired-end libraries of 3 kbp and 8 kbp, were constructed. The de novo assembly was performed with SOAPdenovo v2.04 using a k-mer length of 69. The assembled genome is 1.1 Gbp in size, with contig and scaffold N50 lengths of 5 45 bp and 108 514 bp, respectively, and a G/C content of 43%. Genome annotation identified 15 045 coding gene sequences and 999 non-coding gene sequences. The assembly compared well with previously assembled avian genomes, such as those of the budgerigar (Melopsittacus undulatus), scarlet macaw (Ara macao) and Puerto Rican parrot (Amazona vittata), in terms of genome size, number of annotated genes, and scaffold and contig N50 lengths. More eukaryotic core genes were detected in the lovebird assembly than in the budgerigar, Puerto Rican parrot and scarlet macaw assemblies, suggesting that the assembly is accurate and complete. The genomes of both parents of the reference individual were then sequenced, and the data were used to identify candidate SNPs across the genome for a parentage verification panel. Sequencing was performed at 30x coverage on the Illumina HiSeq 2000 platform using two shotgun libraries with insert sizes of 300 bp and 550 bp. The parents' reads were mapped against the reference genome of their chick, and variants were called with the Genome Analysis Toolkit (GATK).
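The coverage and N50 figures above follow from simple arithmetic, sketched here in Python under stated assumptions. Coverage is taken as total sequenced bases divided by genome size, so 100x over a 1.1 Gbp genome implies roughly 110 Gbp of sequence (assuming the 1.1 Gbp assembly size is the denominator). N50 is the length of the shortest sequence in the smallest set of longest contigs or scaffolds that together contain at least half of the assembled bases. The function names `n50` and `coverage` and the scaffold lengths are hypothetical toy values, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (contigs or scaffolds).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length


def coverage(total_sequenced_bases, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_sequenced_bases / genome_size


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [100, 80, 50, 40, 30]
print(n50(toy_scaffolds))  # total is 300; 100 + 80 = 180 >= 150, so N50 = 80

# About 110 Gbp of reads over a 1.1 Gbp genome corresponds to 100x.
print(coverage(110e9, 1.1e9))
```

Because N50 depends only on sorted lengths, the same function applies to contigs and scaffolds; the scaffold N50 is larger simply because scaffolding joins contigs across gaps.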
More than 2 million raw variants were called for the mother and 1.60 million for the father. The parents' genotypes were combined to identify SNPs shared by the two birds, and unwanted variants such as insertions and deletions (indels) were discarded, leaving a callset of 1.66 million SNPs. These SNPs were filtered using the parameters recommended by GATK, and 103 287 SNPs passed the criteria that were set. Two of the parameters applied were QUAL (a Phred-scaled score reflecting the probability that a variant call is a false positive) and QD (the QUAL score normalized by sequencing depth). True variants were expected to fall within a QD range of 11.5 to 12.5, and a higher QUAL score indicates greater confidence that a variant is real. All SNPs within this QD range were therefore retained and ranked by QUAL score. One SNP per scaffold was selected from this set, and the top 480 SNPs were included in the parentage verification panel. A population of 960 lovebirds from seven species, including the reference individual and its father, was genotyped at these 480 SNPs on the QuantStudio 12K Flex platform. A first panel of 262 SNPs was constructed from the SNPs at which the father's genotype amplified, and his genotypes were used as a reference. This panel was reduced to SNPs with minor allele frequency (MAF) and observed heterozygosity (HO) values greater than 0.1, giving a second panel of 195 SNPs. A third panel, filtered on the same parameters but with MAF and HO thresholds of 0.3, contained 40 SNPs. The exclusion power of all three panels was assessed in 43 lovebird families with known pedigrees. The 195-SNP panel provided the greatest exclusion power relative to the number of SNPs applied and was proposed as the lovebird parentage verification panel.

National Research Foundation (NRF); Technology and Human Resources for Industry Programme (THRIP); Technology Innovation Agency (TIA); Doctoral
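The SNP selection and panel evaluation steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the thesis: `Variant`, `select_panel`, `maf_and_ho`, `filter_by_diversity`, `exclusion_probability` and `combined_exclusion` are hypothetical names; genotypes are assumed to be coded as alternate-allele counts (0, 1, 2); and exclusion power is assumed to follow the standard biallelic formula for the case where one parent's genotype is known, pq(1 - pq) per locus, combined across independent loci as 1 - Π(1 - PE_i). The thesis may have used different software or formulas for these calculations.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Variant:
    """Minimal stand-in for one VCF record (hypothetical structure)."""
    scaffold: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled QUAL score
    qd: float    # QUAL normalized by depth (GATK QD annotation)


def is_snp(v):
    """True for single-base substitutions; indels and multi-base variants fail."""
    return len(v.ref) == 1 and len(v.alt) == 1


def select_panel(variants, qd_min=11.5, qd_max=12.5, panel_size=480):
    """Keep SNPs inside the QD window, rank by QUAL, keep one SNP per scaffold."""
    candidates = [v for v in variants if is_snp(v) and qd_min <= v.qd <= qd_max]
    candidates.sort(key=lambda v: v.qual, reverse=True)
    best_per_scaffold = {}
    for v in candidates:
        # Candidates are in descending QUAL order, so the first SNP seen on a
        # scaffold is that scaffold's highest-QUAL SNP.
        best_per_scaffold.setdefault(v.scaffold, v)
    # Dicts preserve insertion order, so the values are still ranked by QUAL.
    return list(best_per_scaffold.values())[:panel_size]


def maf_and_ho(genotypes):
    """Minor allele frequency and observed heterozygosity for one SNP.

    genotypes: alternate-allele count per bird (0, 1 or 2), or None for a failed call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0.0
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    ho = sum(1 for g in called if g == 1) / len(called)
    return maf, ho


def filter_by_diversity(snp_genotypes, threshold):
    """Keep SNP IDs whose MAF and HO both exceed the threshold (e.g. 0.1 or 0.3)."""
    kept = []
    for snp_id, genotypes in snp_genotypes.items():
        maf, ho = maf_and_ho(genotypes)
        if maf > threshold and ho > threshold:
            kept.append(snp_id)
    return kept


def exclusion_probability(p):
    """Exclusion probability of a biallelic locus with allele frequency p,
    assuming one parent's genotype is known: pq(1 - pq)."""
    q = 1 - p
    return p * q * (1 - p * q)


def combined_exclusion(allele_freqs):
    """Combined exclusion probability over independent loci: 1 - prod(1 - PE_i)."""
    return 1 - prod(1 - exclusion_probability(p) for p in allele_freqs)


# Toy records (hypothetical values, not data from the thesis).
variants = [
    Variant("scf1", 100, "A", "G", qual=900.0, qd=12.0),
    Variant("scf1", 500, "C", "T", qual=950.0, qd=11.8),
    Variant("scf2", 42, "G", "A", qual=700.0, qd=12.4),
    Variant("scf3", 7, "T", "TA", qual=990.0, qd=12.0),  # indel: discarded
    Variant("scf4", 9, "A", "C", qual=999.0, qd=20.0),   # outside QD window
]
panel = select_panel(variants, panel_size=2)
print([(v.scaffold, v.pos) for v in panel])  # [('scf1', 500), ('scf2', 42)]
print(combined_exclusion([0.5, 0.5]))        # 0.33984375
```

Under this formula a single SNP can exclude at most 0.1875 (at p = 0.5), which is why SNP panels need many more loci than microsatellite panels, and why filtering on MAF and HO favours the most informative markers. Taking one SNP per scaffold is a simple way to reduce the chance that panel loci are tightly linked, which the multiplication in `combined_exclusion` assumes they are not.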
- Published
- 2019