Using simulated microhaplotype genotyping data to evaluate the value of machine learning algorithms for inferring DNA mixture contributor numbers.

Authors :: Wang, Haoyu
Zhu, Qiang
Huang, Yuguo
Cao, Yueyan
Hu, Yuhan
Wei, Yifan
Wang, Yuting
Hou, Tingyun
Shan, Tiantian
Dai, Xuan
Zhang, Xiaokang
Wang, Yufang
Zhang, Ji
Source :: Forensic Science International: Genetics; Mar2024, Vol. 69, pN.PAG-N.PAG, 1p
Publication Year :: 2024
Abstract: Inferring the number of contributors (NoC) is a crucial step in interpreting DNA mixtures, as it directly affects the accuracy of the likelihood ratio calculation and the assessment of evidence strength. However, obtaining the correct NoC in complex DNA mixtures remains challenging due to the high degree of allele sharing and dropout. This study aimed to analyze the impact of allele sharing and dropout on NoC inference in complex DNA mixtures when using microhaplotypes (MH). The effectiveness and value of highly polymorphic MH for NoC inference in complex DNA mixtures were evaluated by comparing the performance of three NoC inference methods: the maximum allele count (MAC) method, the maximum likelihood estimation (MLE) method, and the random forest classification (RFC) algorithm. In this study, we selected the top 100 most polymorphic MH from the Southern Han Chinese (CHS) population and simulated over 40 million complex DNA mixture profiles with the NoC ranging from 2 to 8. These profiles involve unrelated individuals (RM type) and related pairs of individuals, including parent-offspring pairs (PO type), full-sibling pairs (FS type), and second-degree kinship pairs (SE type). Our results showed how the number of detected alleles in DNA mixture profiles varied with the markers' polymorphism, the involvement of kinship, the NoC, and the dropout settings. Across different types of DNA mixtures, the MAC and MLE methods performed best in the RM type, followed by the SE, FS, and PO types, while RFC models showed the best performance in the PO type, followed by the RM, SE, and FS types. The recall of all three methods for NoC inference decreased as the NoC and dropout levels increased. Furthermore, the MLE method performed better at low NoC, whereas RFC models excelled at high NoC and/or high dropout levels, regardless of the availability of a priori information about related pairs of individuals in DNA mixtures. However, the RFC models that used this a priori information and were trained specifically on each type of DNA mixture profile outperformed the RFC_ALL model, which did not use such information. Finally, we provided recommendations for model building when applying machine learning algorithms to NoC inference.
• Evaluated the potential of MHs for NoC inference in complex DNA mixtures based on over 40 million simulated profiles.
• Determined the effects of genetic marker polymorphism, relatedness, NoC, and dropout on the number of alleles detected.
• Clarified in which complex samples the MAC method, MLE method, and RFC model perform best in NoC inference.
• Provided recommendations for building machine learning models based on large-scale data for NoC inference.
[ABSTRACT FROM AUTHOR]
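Note on the MAC method named in the abstract: as commonly defined, it takes the largest number of distinct alleles seen at any single marker and divides by two (rounding up), since a diploid contributor can carry at most two alleles per marker. The sketch below illustrates that general rule only; it is not the authors' implementation, and the function name, locus names, and allele labels are hypothetical.

```python
from math import ceil


def mac_min_contributors(profile):
    """Return the MAC lower bound on the number of contributors.

    profile: dict mapping a locus name to the alleles detected there.
    Assumes diploid contributors (at most two alleles per locus each).
    """
    # Largest count of distinct alleles observed at any one locus.
    max_alleles = max((len(set(alleles)) for alleles in profile.values()), default=0)
    # Each contributor explains at most two of those alleles.
    return ceil(max_alleles / 2)


# Hypothetical mixture profile with three microhaplotype loci.
profile = {
    "MH01": ["A", "B", "C"],
    "MH02": ["A", "B", "C", "D", "E"],
    "MH03": ["A", "B"],
}
print(mac_min_contributors(profile))
```

Because the result is only a lower bound, allele sharing among contributors and allele dropout both push it below the true NoC, which is consistent with the abstract's finding that recall falls as NoC and dropout increase.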

Subjects :: MACHINE learning
MAXIMUM likelihood statistics
MIXTURES
CHINESE people
GENETIC polymorphisms
MICROSATELLITE repeats

Details

Language :: English
ISSN :: 18724973
Volume :: 69
Database :: Supplemental Index
Journal :: Forensic Science International: Genetics
Publication Type :: Academic Journal
Accession number :: 175026374
Full Text :: https://doi.org/10.1016/j.fsigen.2024.103008
