Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung
- Author
Netzer, M., Seger, M., Visvanathan, M., Lushington, G. H., Pfeifer, B., and Christian Baumgartner
- Subjects
microarrays, lung cancer, data mining, feature selection
- Abstract
Lung cancer is the leading cause of cancer-related death in both men and women. Identifying cancer-associated genes and their related pathways is essential, because it opens important possibilities for preventing many types of cancer. In this work, two filter approaches, namely information gain and the biomarker identifier (BMI), are used to identify genes that distinguish between different types of small-cell and non-small-cell lung cancer. A new method for determining the BMI thresholds is proposed, which uses k-means clustering to prioritize genes as primary, secondary, or tertiary. Sets of key genes were identified that occur in several pathways. The modified BMI proved well suited to microarray data, and BMI is therefore proposed as a powerful tool for finding new, previously undiscovered cancer-related genes.
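Information gain scores a gene by how much knowing its expression level reduces the entropy of the cancer-type labels. The following is a minimal sketch, not the authors' implementation. It assumes that continuous expression values are discretized by the single best binary split, the way C4.5 handles continuous attributes, because the discretization actually used is not stated here. The function names, gene names, and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Best information gain of one gene over all binary splits of its values.

    Assumption: expression is discretized at a single threshold, tried at the
    midpoints between consecutive distinct sorted values (C4.5-style).
    Returns (gain, threshold); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        cond = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if base - cond > best_gain:
            best_gain, best_thr = base - cond, (lo + hi) / 2
    return best_gain, best_thr


# Hypothetical expression values for two genes across six samples.
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 3.3, 2.9],
    "GENE_B": [2.0, 1.0, 3.0, 2.1, 1.1, 2.9],
}
labels = ["SCLC"] * 3 + ["NSCLC"] * 3

# Rank genes by information gain, highest first.
ranking = sorted(expression, key=lambda g: information_gain(expression[g], labels)[0], reverse=True)
print(ranking)
```

In this toy data, GENE_A separates the two hypothetical classes perfectly, so its gain equals the full label entropy of one bit. GENE_B is ranked below it.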
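The k-means step for deriving score thresholds can be sketched as follows. This illustration rests on several assumptions:

- The BMI formula itself is not reproduced here, so the code clusters an arbitrary vector of per-gene filter scores.
- k = 3, one cluster per tier (primary, secondary, tertiary).
- Centroids use a deterministic quantile initialization.
- The cut-off between two tiers is the midpoint of their centroids.

`kmeans_1d`, `tier_thresholds`, and `prioritize` are hypothetical names.

```python
def kmeans_1d(scores, k=3, max_iter=100):
    """Lloyd's k-means on one-dimensional scores; returns centroids in ascending order.

    Assumption: centroids start at evenly spaced quantiles of the sorted
    scores, which makes the result deterministic.
    """
    xs = sorted(scores)
    n = len(xs)
    centroids = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable: converged
        centroids = updated
    return sorted(centroids)


def tier_thresholds(scores, k=3):
    """Cut-offs between tiers (highest first): midpoints of adjacent centroids."""
    c = kmeans_1d(scores, k)
    return sorted(((c[j] + c[j + 1]) / 2 for j in range(k - 1)), reverse=True)


def prioritize(gene_scores, k=3):
    """Assign tier 1 (primary), 2 (secondary), ... by counting cut-offs above each score."""
    cuts = tier_thresholds(list(gene_scores.values()), k)
    return {g: 1 + sum(s < t for t in cuts) for g, s in gene_scores.items()}


# Hypothetical per-gene filter scores (e.g. BMI or information gain values).
scores = {"G1": 9.5, "G2": 9.1, "G3": 5.2, "G4": 4.8, "G5": 1.0, "G6": 0.7, "G7": 0.9}
print(prioritize(scores))  # 1 = primary, 2 = secondary, 3 = tertiary
```

In one dimension, the midpoint between adjacent centroids is exactly where nearest-centroid assignment switches. These cut-offs therefore reproduce the cluster memberships while yielding explicit score thresholds that can be applied to new genes.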