Discovery of non-reference processed pseudogenes in the Swedish population

Authors :
Esmee Ten Berk de Boer
Kristine Bilgrav Saether
Jesper Eisfeldt
Source :
Frontiers in Genetics, Vol 14 (2023)
Publication Year :
2023
Publisher :
Frontiers Media S.A., 2023.

Abstract

The vast majority of the human genome is non-coding. There is a diversity of non-coding features, some of which have functional importance. Although the non-coding regions constitute the majority of the genome, they remain understudied, and for a long time, these regions have been referred to as junk DNA. Pseudogenes are one of these features. A pseudogene is a non-functional copy of a protein-coding gene. Pseudogenes may arise through a variety of genetic mechanisms. Processed pseudogenes are formed through reverse transcription of mRNA by LINE elements, after which the cDNA is integrated into the genome. Processed pseudogenes are known to be variable across populations; however, the variability and distribution remain unknown. Herein, we apply a custom-designed processed pseudogene pipeline to the whole genome sequencing data of 3,500 individuals: 2,500 individuals from the 1000 Genomes dataset, as well as 1,000 Swedish individuals. Through these analyses, we discover over 3,000 pseudogenes missing from the GRCh38 reference. Utilising our pipeline, we position 74% of the detected processed pseudogenes, allowing for analyses of their formation. Notably, we find that common structural variant callers, such as Delly, classify the processed pseudogenes as deletion events, which are later predicted to be truncating variants. By compiling lists of non-reference processed pseudogenes and their frequencies, we find great variability among pseudogenes, indicating that non-reference processed pseudogenes may be useful for DNA testing and as population-specific markers. In summary, our findings highlight a great diversity of processed pseudogenes, show that processed pseudogenes are actively formed in the human genome, and suggest that our pipeline may be used to reduce false-positive structural variant calls caused by the misalignment and subsequent misclassification of non-reference processed pseudogenes.

Details

Language :
English
ISSN :
1664-8021
Volume :
14
Database :
Directory of Open Access Journals
Journal :
Frontiers in Genetics
Publication Type :
Academic Journal
Accession number :
edsdoj.467856195694919b7ff33425f69a148
Document Type :
article
Full Text :
https://doi.org/10.3389/fgene.2023.1176626