Efficient Tuning Parameter Selection By Cross-Validated Score In High Dimensional Models
- Publication Year :
- 2016
- Publisher :
- Zenodo, 2016.
Abstract
- Because DNA microarray data have a relatively small sample size compared to the number of genes, high dimensional models are often employed. In such models, selection of the tuning parameter (or penalty parameter) is one of the crucial parts of the modeling. Cross-validation is one of the most common methods for tuning parameter selection; it selects the parameter value with the smallest cross-validated score. However, selecting a single 'optimal' value for the parameter can be very unstable due to sampling variation, since the sample sizes of microarray data are often small. Our approach is to first choose multiple candidate values of the tuning parameter and then average the candidates with weights that depend on their performance. The additional step of estimating the weights and averaging the candidates rarely increases the computational cost, while it can considerably improve on traditional cross-validation. Using real and simulated data sets, we show that the value selected by the suggested methods often leads to more stable parameter selection as well as improved detection of significant genetic variables compared to traditional cross-validation.
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....01ef0b663d7ab677095bfdf95161463f
- Full Text :
- https://doi.org/10.5281/zenodo.1111682