Alternative strategies for selecting subsets of predicting SNPs by LASSO-LARS procedure
- Source :
- BMC Proceedings
- Publisher :
- Springer Nature
Abstract
- Background The least absolute shrinkage and selection operator (LASSO) can be used to predict SNP effects. This operator has the desirable feature of including in the model only a subset of explanatory SNPs, which can be useful in both QTL detection and GWS studies. LASSO solutions can be obtained with the least angle regression (LARS) algorithm. The main difficulty with this procedure is defining the best constraint (t), i.e. the upper bound on the sum of the absolute values of the SNP effects, which roughly corresponds to the number of SNPs to be selected. Usai et al. (2009) addressed this problem with a cross-validation approach and defined t as the average number of selected SNPs over all replications. However, in small populations, such an estimator could underestimate t. Here we propose two alternative ways to define t and compare them with the "classical" one.
Methods The first (strategy 1) was based on 1,000 cross-validations carried out by randomly splitting the reference population (2,000 individuals with performance records) into two halves. The value of t was the number of SNPs that occurred in more than 5% of replications. The second (strategy 2), which did not use cross-validation, was based on minimizing a Cp-type selection criterion that depends on the number of selected SNPs and the expected residual variance.
Results The subsets of selected SNPs contained 46, 189 and 64 SNPs for the classical approach, strategy 1 and strategy 2, respectively. The classical approach and strategy 2 gave similar results and indicated quite clearly the regions where QTL with additive effects were located. Strategy 1 confirmed these regions but added further positions, which made the picture less clear. Correlations between the GEBVs estimated with the three strategies and the TBVs in progeny without phenotypes were 0.9237, 0.9000 and 0.9240 for the classical approach, strategy 1 and strategy 2, respectively.
Conclusions This suggests that the Cp-type selection criterion is a valid alternative to cross-validation for defining the best constraint when selecting subsets of predicting SNPs by the LASSO-LARS procedure.
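The three rules for choosing t can be illustrated with a minimal sketch, under several assumptions not taken from the paper. For strategy 2 it uses one common form of a Cp-type criterion, Cp = RSS / σ² − n + 2k, with k the number of non-zero SNP effects and σ² (the expected residual variance) supplied by the caller. It swaps LARS for plain cyclic coordinate descent over a grid of penalties, so it reproduces LASSO solutions only at the grid points rather than at every breakpoint LARS visits. For the classical rule and strategy 1 it shows only the final counting step, given the SNP sets selected in each replicate. All function names (`lasso_cd`, `cp_select`, `classical_t`, `strategy1_t`) are hypothetical, and the data are simulated, not the paper's.

```python
from collections import Counter

import numpy as np


def lasso_cd(X, y, lam, beta=None, tol=1e-8, max_sweeps=1000):
    """LASSO by cyclic coordinate descent (an assumed stand-in for LARS).

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 for centred X and y.
    Assumes no monomorphic SNP columns (each column has non-zero variance).
    """
    n, p = X.shape
    b = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # current residual
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old  # partial-residual correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def cp_select(X, y, sigma2, n_lambdas=30, ratio=0.01):
    """Strategy-2-style choice: minimise Cp = RSS / sigma2 - n + 2k along a LASSO path."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest penalty giving an all-zero model
    beta = np.zeros(p)
    best = None
    for lam in lam_max * np.geomspace(1.0, ratio, n_lambdas):
        beta = lasso_cd(X, y, lam, beta)  # warm start from the previous penalty
        k = int(np.count_nonzero(beta))  # number of selected SNPs
        rss = float(np.sum((y - X @ beta) ** 2))
        cp = rss / sigma2 - n + 2 * k
        if best is None or cp < best[0]:
            best = (cp, lam, beta.copy(), k)
    return best


def classical_t(selected_sets):
    """Classical rule: average number of SNPs selected per replicate."""
    return sum(len(s) for s in selected_sets) / len(selected_sets)


def strategy1_t(selected_sets, threshold=0.05):
    """Strategy 1: keep SNPs selected in more than `threshold` of the replicates."""
    counts = Counter(snp for s in selected_sets for snp in s)
    kept = sorted(snp for snp, c in counts.items() if c / len(selected_sets) > threshold)
    return len(kept), kept


# Toy data (simulated, not from the paper): 200 individuals, 50 SNPs, 3 causal.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.5, size=(200, 50)).astype(float)  # genotypes coded 0/1/2
X -= X.mean(axis=0)
y = X[:, [3, 17, 40]] @ np.array([2.0, -1.5, 1.0]) + rng.normal(0.0, 0.5, 200)
y -= y.mean()
cp_min, lam_best, beta_best, k_best = cp_select(X, y, sigma2=0.25)
print("Cp-selected SNPs:", np.flatnonzero(beta_best).tolist())

# Hypothetical selected sets from 100 half-split replicates.
reps = [{0, 1} if i < 10 else {0, 2} if i < 13 else {0} for i in range(100)]
print(classical_t(reps), strategy1_t(reps))
```

In the toy replicate sets, SNP 2 appears in only 3% of replicates and is dropped by the 5% threshold, while SNP 1 (10%) is kept. Strategy 2 needs no replicates at all, only a value for σ²; how that variance is obtained when SNPs outnumber individuals is left open here.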
- Subjects :
Least-angle regression
Estimator
General Medicine
Absolute value (algebra)
Residual
Upper and lower bounds
General Biochemistry, Genetics and Molecular Biology
Constraint (information theory)
Proceedings
Lasso (statistics)
Statistics
Feature (machine learning)
Medicine
Data mining
Details
- Language :
- English
- ISSN :
- 1753-6561
- Volume :
- 6
- Issue :
- Suppl 2
- Database :
- OpenAIRE
- Journal :
- BMC Proceedings
- Accession number :
- edsair.doi.dedup.....1ded3b5def64f8e3480ce2b247be9edc
- Full Text :
- https://doi.org/10.1186/1753-6561-6-s2-s9