1. YHP: Y-chromosome Haplogroup Predictor for predicting male lineages based on Y-STRs.
- Author
Song, Mengyuan; Zhou, Yuxiang; Zhao, Chenxi; Song, Feng; Hou, Yiping
- Subjects
*Y chromosome, *DATA analysis, *SINGLE nucleotide polymorphisms, *ALLELES, *EAST Asians
- Abstract
The human Y chromosome reflects the evolutionary history of males. Tracing male lineages through the Y chromosome is of great use in evolutionary, forensic, and anthropological studies. Identifying the male lineage based on the specific distribution of Y haplogroups narrows the scope of an investigation and has been used in forensic scenarios. However, although existing software aids familial searching by using Y-STRs (Y-chromosome short tandem repeats) to predict Y-SNP (Y-chromosome single nucleotide polymorphism) haplogroups, it often lacks resolution. In this study, we developed YHP (Y Haplogroup Predictor), a novel software tool that offers high-resolution haplogroup inference without requiring extensive Y-SNP sequencing. Using a random forest algorithm trained on existing datasets (219 haplogroups, 4064 samples in total), YHP predicts haplogroups with an accuracy of 0.923 at the highest haplogroup resolution. YHP, available on GitHub (https://github.com/cissy123/YHP-Y-Haplogroup-Predictor -), facilitates high-resolution haplogroup prediction, haplotype mismatch analysis, and haplotype similarity comparison. Notably, it is particularly effective in East Asian populations, benefiting from training data from eight distinct East Asian ethnic populations. Moreover, additional training sets can be integrated seamlessly, extending its utility to diverse populations.
• A prediction accuracy of 0.923 at the highest haplogroup resolution was achieved.
• The significance of the 27 Y-STRs used was systematically ranked.
• The "Match&Count" and "Similarity" functions provide comprehensive information.
• Predictions can be made for any population for which training data exist.
[ABSTRACT FROM AUTHOR]
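The abstract mentions haplotype mismatch analysis and similarity comparison without defining them. As a rough illustration only, the sketch below compares two Y-STR haplotypes locus by locus. It is not YHP's "Match&Count" or "Similarity" implementation. The dictionary representation, the helper names `count_mismatches` and `similarity`, the choice to compare only loci typed in both haplotypes, and the example allele values are all hypothetical assumptions.

```python
from typing import Dict, Tuple

# Hypothetical representation: locus name -> allele (repeat count).
# Floats allow intermediate alleles such as 17.2; multi-copy loci
# (e.g. DYS385a/b) would be stored as separate keys in this sketch.
Haplotype = Dict[str, float]


def count_mismatches(query: Haplotype, reference: Haplotype) -> Tuple[int, int]:
    """Return (mismatched loci, compared loci), using only loci typed in both."""
    shared = set(query) & set(reference)
    mismatches = sum(1 for locus in shared if query[locus] != reference[locus])
    return mismatches, len(shared)


def similarity(query: Haplotype, reference: Haplotype) -> float:
    """Hypothetical similarity score: fraction of compared loci whose alleles match."""
    mismatches, compared = count_mismatches(query, reference)
    if compared == 0:
        raise ValueError("haplotypes share no typed loci")
    return 1 - mismatches / compared


# Illustrative (made-up) haplotypes; the reference has one extra locus.
query_hap = {"DYS19": 15, "DYS389I": 13, "DYS390": 23, "DYS391": 10}
ref_hap = {"DYS19": 15, "DYS389I": 14, "DYS390": 23, "DYS391": 10, "DYS437": 14}

print(count_mismatches(query_hap, ref_hap))  # (1, 4)
print(similarity(query_hap, ref_hap))        # 0.75
```

Restricting the comparison to shared loci is one way to compare kits that type different marker sets. A real tool might instead penalize missing loci or weight multi-step differences.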
- Published
- 2024