Machine learning in prediction of genetic risk of nonsyndromic oral clefts in the Brazilian population

Authors :
Ricardo D. Coletta
Carolina de Oliveira Silva
Lucimara Teixeira das Neves
Hercílio Martelli-Júnior
Renato Assis Machado
Source :
Repositório Institucional da USP (Biblioteca Digital da Produção Intelectual), Universidade de São Paulo (USP), instacron:USP
Publication Year :
2020
Publisher :
Springer Science and Business Media LLC, 2020.

Abstract

Genetic variants in multiple genes and loci have been associated with the risk of nonsyndromic cleft lip with or without cleft palate (NSCL ± P). However, estimating risk remains a challenge because most of these variants are population-specific, which makes the underlying genetic risk difficult to identify. Herein, we examined the use of machine learning methods on previously reported single nucleotide polymorphisms (SNPs) to predict the risk of NSCL ± P in the Brazilian population. Random forest and neural network methods were applied to 72 SNPs in a case-control sample composed of 722 NSCL ± P cases and 866 controls to discriminate NSCL ± P risk. SNP-SNP interactions and the functionally annotated biological processes associated with the identified NSCL ± P risk genes were also examined. Supervised random forest decision trees revealed high importance scores for the SNPs rs11717284 and rs1875735 in FGF12, rs41268753 in GRHL3, rs2236225 in MTHFD1, rs2274976 in MTHFR, rs2235371 and rs642961 in IRF6, rs17085106 in RHPN2, rs28372960 in TCOF1, rs7078160 in VAX1, rs10762573 and rs2131960 in VCL, and rs227731 in 17q22, with an accuracy of 99% and an error rate of approximately 3% to predict the risk of NSCL ± P. Those same 13 SNPs were considered the most important for the neural network to effectively predict NSCL ± P risk, with an overall accuracy of 94%. A multivariate regression model revealed significant interactions among all SNPs, with the exception of those in FGF12 and MTHFD1. The most significant biological processes for the selected genes were those involved in tissue and epithelium development; neural tube closure; and metabolism of methionine, folate, and homocysteine. Our results provide novel clues for studies of the genetic mechanisms of NSCL ± P and point to a machine learning model composed of 13 SNPs that is capable of predicting NSCL ± P risk. Although validation is necessary, this genetic panel can be useful in the near future to assist in NSCL ± P genetic counseling.
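
The importance ranking described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the data are simulated, the function names (`simulate`, `best_stump`, `stump_forest_importance`) are hypothetical, and the model is a deliberately simplified stand-in for a random forest, namely a bagged ensemble of depth-one trees (stumps) with random feature subsets. It only shows how a mean-decrease-in-impurity score can single out a risk SNP among genotypes coded as 0, 1, or 2 copies of the minor allele.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 case-control labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, features):
    """Return (impurity_decrease, feature) for the best single split."""
    parent = gini(y)
    best = None
    for f in features:
        for thr in (0, 1):  # genotypes <= thr go left, the rest go right
            left = [yi for xi, yi in zip(X, y) if xi[f] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[f] > thr]
            if not left or not right:
                continue  # split does not separate the sample
            w = len(left) / len(y)
            gain = parent - w * gini(left) - (1.0 - w) * gini(right)
            if best is None or gain > best[0]:
                best = (gain, f)
    return best


def stump_forest_importance(X, y, n_trees=200, seed=0):
    """Normalized impurity-decrease importance from bagged stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, int(p ** 0.5))  # features tried per tree
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        stump = best_stump(Xb, yb, rng.sample(range(p), k))
        if stump is not None:
            importance[stump[1]] += stump[0]
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def simulate(n=400, p=9, maf=0.3, seed=1):
    """Simulated genotypes; only SNP 0 affects case status (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
        risk = 0.2 + 0.3 * g[0]  # risk rises with each minor allele of SNP 0
        X.append(g)
        y.append(int(rng.random() < risk))
    return X, y


X, y = simulate()
importance = stump_forest_importance(X, y)
ranking = sorted(range(len(importance)), key=lambda f: -importance[f])
print("SNP indices ranked by importance:", ranking)
```

A full analysis would use deeper trees, held-out data for accuracy estimates, and real genotypes; the sketch only conveys why SNPs with a genuine effect accumulate higher importance scores than uninformative ones.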

Details

ISSN :
1436-3771 and 1432-6981
Volume :
25
Database :
OpenAIRE
Journal :
Clinical Oral Investigations
Accession number :
edsair.doi.dedup.....41ca8e9faf03d3757839ed277cacedbb
Full Text :
https://doi.org/10.1007/s00784-020-03433-y