1. Classification of rheumatoid arthritis status with candidate gene and genome-wide single-nucleotide polymorphisms using random forests
- Author
Kaushal Desai, Zhaohui Cai, Yan V. Sun, Ansar Jawaid, Sharon L.R. Kardia, Huiying Yang, Richard Leff, and Rachael Lawrance
- Subjects
Candidate gene; business.industry; lcsh:R; lcsh:Medicine; Single-nucleotide polymorphism; General Medicine; Computational biology; medicine.disease; Bioinformatics; Genome; General Biochemistry, Genetics and Molecular Biology; Random forest; Proceedings; Polymorphism (computer science); Rheumatoid arthritis; medicine; SNP; lcsh:Q; lcsh:Science; business; Gene
- Abstract
Using the North American Rheumatoid Arthritis Consortium (NARAC) candidate gene and genome-wide single-nucleotide polymorphism (SNP) data sets, we applied regression methods and tree-based random forests to identify genetic associations with rheumatoid arthritis (RA) and to predict RA disease status. Several genes were consistently identified as weakly associated with RA without a significant interaction or combinatorial effect with other candidate genes. Using random forests, the tested candidate gene SNPs were not sufficient to predict RA patients and normal subjects with high accuracy. However, using the top 500 SNPs, ranked by the importance score, from the genome-wide linkage panel of 5742 SNPs, we were able to accurately predict RA patients and normal subjects with sensitivity of approximately 90% and specificity of approximately 80%, which was confirmed by five-fold cross-validation. However, in a complete training-testing framework, replication of genetic predictors was less satisfactory; thus, further evaluation of existing methodology and development of new methods are warranted.
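The sensitivity and specificity figures quoted in the abstract, and the five-fold splitting used to confirm them, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `sensitivity_specificity` and `k_fold_indices` are hypothetical, the labels are toy values rather than NARAC data, and the coding 1 = RA case, 0 = control is an assumption.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); assumes 1 = RA case, 0 = control.

    Assumes at least one case and one control are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases called cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls called controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls called cases
    return tp / (tp + fn), tn / (tn + fp)


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs.

    Samples are dealt round-robin into folds; each fold serves once as the
    test set while the remaining folds form the training set.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


# Toy example: 10 cases (9 predicted correctly), 5 controls (4 predicted correctly).
y_true = [1] * 10 + [0] * 5
y_pred = [1] * 9 + [0] + [0] * 4 + [1]
sens, spec = sensitivity_specificity(y_true, y_pred)
splits = k_fold_indices(len(y_true), k=5)
```

In a complete training-testing evaluation, any SNP ranking would be recomputed on each training split only, so that the held-out samples play no part in choosing the predictors.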