48 results for "Medical datasets"
Search Results
2. A novel MissForest-based missing values imputation approach with recursive feature elimination in medical applications.
- Author
-
Hu, Ya-Han, Wu, Ruei-Yan, Lin, Yen-Cheng, and Lin, Ting-Yin
- Subjects
-
*STANDARD deviations, *MISSING data (Statistics), *FEATURE selection, *K-nearest neighbor classification, *STATISTICAL significance
- Abstract
Background: Missing values in datasets present significant challenges for data analysis, particularly in the medical field where data accuracy is crucial for patient diagnosis and treatment. Although MissForest (MF) has demonstrated efficacy in imputation research and recursive feature elimination (RFE) has proven effective in feature selection, the potential for enhancing MF through RFE integration remains unexplored. Methods: This study introduces a novel imputation method, "recursive feature elimination-MissForest" (RFE-MF), designed to enhance imputation quality by reducing the impact of irrelevant features. A comparative analysis is conducted between RFE-MF and four classical imputation methods: mean/mode, k-nearest neighbors (kNN), multiple imputation by chained equations (MICE), and MF. The comparison is carried out across ten medical datasets containing both numerical and mixed data types. Different missing data rates, ranging from 10 to 50%, are evaluated under the missing completely at random (MCAR) mechanism. The performance of each method is assessed using two evaluation metrics: normalized root mean squared error (NRMSE) and predictive fidelity criterion (PFC). Additionally, paired samples t-tests are employed to analyze the statistical significance of differences among the outcomes. Results: The findings indicate that RFE-MF demonstrates superior performance across the majority of datasets when compared to four classical imputation methods (mean/mode, kNN, MICE, and MF). Notably, RFE-MF consistently outperforms the original MF, irrespective of variable type (numerical or categorical). Mean/mode imputation exhibits consistent performance across various scenarios. Conversely, the efficacy of kNN imputation fluctuates in relation to varying missing data rates. 
Conclusion: This study demonstrates that RFE-MF holds promise as an effective imputation method for medical datasets, providing a novel approach to addressing missing data challenges in medical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
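The evaluation protocol described in the abstract of record 2 (remove values under MCAR, impute, then score the imputed cells) can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, NRMSE is assumed to be normalized by the variance of the true values at the masked cells, PFC is assumed to be the share of masked categorical cells imputed incorrectly, and column-mean imputation stands in for the baseline method.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """Boolean mask, True where a value is removed.
    Each cell is dropped independently with probability `rate` (MCAR)."""
    return rng.random(shape) < rate

def mean_impute(x, mask):
    """Baseline: replace masked cells with the column mean of the observed cells."""
    x_obs = np.where(mask, np.nan, x)
    col_means = np.nanmean(x_obs, axis=0)
    return np.where(mask, col_means, x)

def nrmse(x_true, x_imp, mask):
    """Assumed normalization: RMSE over masked cells divided by the
    standard deviation of the true values at those cells."""
    diff = x_true[mask] - x_imp[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(x_true[mask])))

def pfc(c_true, c_imp, mask):
    """Assumed definition: fraction of masked categorical cells imputed incorrectly."""
    return float(np.mean(c_true[mask] != c_imp[mask]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 4))          # synthetic numerical data, illustration only
    for rate in (0.1, 0.3, 0.5):           # missing rates in the 10-50% range
        mask = mcar_mask(x.shape, rate, rng)
        print(rate, nrmse(x, mean_impute(x, mask), mask))
```

Lower values are better for both scores; an imputer that reproduces every masked cell exactly would score 0 on each.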
3. Handling imbalanced medical datasets: review of a decade of research.
- Author
-
Salmi, Mabrouka, Atif, Dalia, Oliva, Diego, Abraham, Ajith, and Ventura, Sebastian
- Abstract
Machine learning and medical diagnostic studies often struggle with the issue of class imbalance in medical datasets, complicating accurate disease prediction and undermining diagnostic tools. Despite ongoing research efforts, specific characteristics of medical data frequently remain overlooked. This article comprehensively reviews advances in addressing imbalanced medical datasets over the past decade, offering a novel classification of approaches into preprocessing, learning levels, and combined techniques. We present a detailed evaluation of the medical datasets and metrics used, synthesizing the outcomes of previous research to reflect on the effectiveness of the methodologies despite methodological constraints. Our review identifies key research trends and offers speculative insights and research trajectories to enhance diagnostic performance. Additionally, we establish a consensus on best practices to mitigate persistent methodological issues, assisting the development of generalizable, reliable, and consistent results in medical diagnostics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Recent Advances in Large Language Models for Healthcare.
- Author
-
Nassiri, Khalid and Akhloufi, Moulay A.
- Subjects
-
*LANGUAGE models, *MEDICAL care, *MEDICAL practice, *MEDICAL databases, *MEDICAL records
- Abstract
Recent advances in the field of large language models (LLMs) underline their high potential for applications in a variety of sectors. Their use in healthcare, in particular, holds out promising prospects for improving medical practices. As we highlight in this paper, LLMs have demonstrated remarkable capabilities in language understanding and generation that could indeed be put to good use in the medical field. We also present the main architectures of these models, such as GPT, Bloom, or LLaMA, composed of billions of parameters. We then examine recent trends in the medical datasets used to train these models. We classify them according to different criteria, such as size, source, or subject (patient records, scientific articles, etc.). We mention that LLMs could help improve patient care, accelerate medical research, and optimize the efficiency of healthcare systems such as assisted diagnosis. We also highlight several technical and ethical issues that need to be resolved before LLMs can be used extensively in the medical field. Consequently, we propose a discussion of the capabilities offered by new generations of linguistic models and their limitations when deployed in a domain such as healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Cardiovascular Disease Prediction Using Machine Learning
- Author
-
Sesha Talpa Sai, P. H. V., Chaitanya Lahari, M. L. R., Shagin, C. M., Vineeth, C. A., Vijayakumar, P., Naveen Kumar, G. S., Bhaumik, Amiya, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Bhateja, Vikrant, editor, Tang, Jinshan, editor, Polkowski, Zdzislaw, editor, Simic, Milan, editor, and Chakravarthy, V. V. S. S. S., editor
- Published
- 2024
6. A Method to Transform Datasets into Knowledge Graphs
- Author
-
Bravo, Maricela, Barbosa, José L., Sánchez-Martínez, Leonardo D., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
7. Demographic Bias in Medical Datasets for Clinical AI
- Author
-
Bojana Velichkovska, Sandra Petrushevska, Bisera Runcheva, and Marija Kalendar
- Subjects
artificial intelligence, machine learning, gender bias, age bias, demographic bias, medical datasets, Electronic computers. Computer science, QA75.5-76.95, Technology
- Abstract
Numerous studies have detailed instances of demographic bias in medical data and artificial intelligence (AI) systems used in medical settings. Moreover, these studies have also shown how these biases can significantly impact the access to and quality of care, as well as quality of life for patients belonging to certain under-represented groups. These groups are then being marginalised because of stigma based on demographic information such as race, gender, age, ability, and so on. Since the performance of AI models is highly dependent on the quality of data used to train the algorithms, it is a necessary precaution to analyse any potential bias inadvertently existent in the data, in order to mitigate the consequences of using biased data in creating medical AI systems. For that reason, we propose a machine learning (ML) analysis which receives patient biosignals as input information and analyses them for two types of demographic bias, namely gender and age bias. The analysis is performed using several ML algorithms (Logistic Regression, Decision Trees, Random Forest, and XGBoost). The trained models are evaluated with a holdout technique and by observing the confusion matrices and the classification reports. The results show that the models are capable of detecting bias in data. This makes the proposed approach one way to identify bias in data, especially throughout the process of building AI-based medical systems. Consequently, the proposed pipeline can be used as a mitigation technique for bias analysis in data.
- Published
- 2023
8. An Empirical Comparison of Classification Machine Learning Models Using Medical Datasets
- Author
-
Saketha Rama, B. V., Suryanarayana, G., Ansari, Mohd Dilshad, Begum, Ruqqaiya, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Kumar, Amit, editor, Gunjan, Vinit Kumar, editor, Hu, Yu-Chen, editor, and Senatore, Sabrina, editor
- Published
- 2023
9. A Novel Probabilistic Approach Based on Trigonometric Function: Model, Theory with Practical Applications.
- Author
-
Odhah, Omalsad Hamood, Alshanbari, Huda M., Ahmad, Zubair, Khan, Faridoon, and El-Bagoury, Abd Al-Aziz Hosni
- Subjects
-
*TRIGONOMETRIC functions, *MONTE Carlo method, *MAXIMUM likelihood statistics, *WEIBULL distribution, *COTANGENT function
- Abstract
Proposing new families of probability models for data modeling in applied sectors is a prominent research topic. This paper also proposes a new method based on the trigonometric function to derive the updated form of the existing probability models. The proposed family is called the cotangent trigonometric-G family of distributions. Based on the cotangent trigonometric-G method, a new version of the Weibull model, namely, the cotangent trigonometric Weibull distribution, is studied. Certain mathematical properties of the cotangent trigonometric-G family are derived. The estimators of the cotangent trigonometric-G distributions are obtained via the maximum likelihood method. The Monte Carlo simulation study is conducted to assess the performances of the estimators. Finally, two applications from the health sector are considered to illustrate the cotangent trigonometric-G method. Based on seven evaluating criteria, it is observed that the cotangent trigonometric-G significantly improves the fitting power of the existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
10. A review on Deep Learning approaches for low-dose Computed Tomography restoration.
- Author
-
Kulathilake, K. A. Saneera Hemantha, Abdullah, Nor Aniza, Sabri, Aznul Qalid Md, and Lai, Khin Wee
- Subjects
COMPUTED tomography, DEEP learning, RADIATION exposure, GENERATIVE adversarial networks, SIGNAL-to-noise ratio, DIAGNOSTIC imaging
- Abstract
Computed Tomography (CT) is a widely used medical image modality in clinical medicine, because it produces excellent visualizations of fine structural details of the human body. In clinical procedures, it is desirable to acquire CT scans by minimizing the X-ray flux to prevent patients from being exposed to high radiation. However, these Low-Dose CT (LDCT) scanning protocols compromise the signal-to-noise ratio of the CT images because of noise and artifacts over the image space. Thus, various restoration methods have been published over the past 3 decades to produce high-quality CT images from these LDCT images. More recently, as opposed to conventional LDCT restoration methods, Deep Learning (DL)-based LDCT restoration approaches have been rather common due to their characteristics of being data-driven, high-performance, and fast execution. Thus, this study aims to elaborate on the role of DL techniques in LDCT restoration and critically review the applications of DL-based approaches for LDCT restoration. To achieve this aim, different aspects of DL-based LDCT restoration applications were analyzed. These include DL architectures, performance gains, functional requirements, and the diversity of objective functions. The outcome of the study highlights the existing limitations and future directions for DL-based LDCT restoration. To the best of our knowledge, there have been no previous reviews that specifically address this topic. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. Federated Learning Framework for IID and Non-IID datasets of Medical Images
- Author
-
Kavitha Srinivasan, Sainath Prasanna, Rohit Midha, and Shraddhaa Mohan
- Subjects
Federated Learning, Federated Learning framework, Classification task, Object detection task, Medical datasets, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Advances have been made in the field of Machine Learning showing that it is an effective tool that can be used for solving real world problems. This success is hugely attributed to the availability of accessible data which is not the case for many fields such as healthcare, a primary reason being the issue of privacy. Federated Learning (FL) is a technique that can be used to overcome the limitation of availability of data at a central location and allows for training machine learning models on private data or data that cannot be directly accessed. It allows the use of data to be decoupled from the governance (or control) over data. In this paper, we present an easy-to-use framework that provides a complete pipeline to let researchers and end users train any model on image data from various sources in a federated manner. We also show a comparison in results between models trained in a federated fashion and models trained in a centralized fashion for Independent and Identically Distributed (IID) and non IID datasets. The Intracranial Brain Hemorrhage dataset and the Pneumonia Detection dataset provided by the Radiological Society of North America (RSNA) are used for validating the FL framework and comparative analysis.
- Published
- 2023
12. Early diagnosis of diabetes mellitus using data mining and machine learning techniques.
- Author
-
Deepa, K. and Ranjeeth Kumar, C.
- Subjects
-
*DATA mining, *MACHINE learning, *DIAGNOSIS of diabetes, *DIABETES, *EARLY diagnosis, *SUPERVISED learning, *HEBBIAN memory, *BIOTECHNOLOGY
- Abstract
The remarkable developments in biotechnology as well as the health sciences have resulted in the production of an enormous amount of data, including high-throughput screening genomics information and clinical information obtained through extensive electronic health records (EHRs). The application of data mining and machine learning techniques in the biosciences is today more vital than ever to achieving this objective as attempts are made to intelligently translate all readily available data into knowledge. Diabetes mellitus (DM), a group of metabolic disorders, is well known to have a serious detrimental effect on population lives all over the world. Large-scale research into all aspects of diabetes (detection, etiopathophysiology, therapy, etc.) has resulted in the production of enormous amounts of data. The goal of the current study is to conduct a thorough examination of the use of machine learning, data mining methods and tools in the field of diabetes research, with the first classification making an appearance to be the most popular. These applications relate to a) Statistical model and Diagnosis, b) Diabetic Complications, c) Multiple genes Background and Environment, and e) Free Healthcare and Management. Numerous machine learning algorithms were applied. 85% of the methods used were supervised learning approaches, whereas 15% were unsupervised ones, including association rules. Support vector machines (SVM) were the most successful and widely used algorithm. In terms of data type, medical datasets were predominantly used. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. An Improved Binary Quantum-based Avian Navigation Optimizer Algorithm to Select Effective Feature Subset from Medical Data: A COVID-19 Case Study
- Author
-
Fatahi, Ali, Nadimi-Shahraki, Mohammad H., and Zamani, Hoda
- Published
- 2024
14. Binary Approaches of Quantum-Based Avian Navigation Optimizer to Select Effective Features from High-Dimensional Medical Data.
- Author
-
Nadimi-Shahraki, Mohammad H., Fatahi, Ali, Zamani, Hoda, and Mirjalili, Seyedali
- Subjects
-
*FEATURE selection, *METAHEURISTIC algorithms, *TRANSFER functions, *LYMPHANGIOGRAPHY, *SWARM intelligence, *PROSTATE tumors, *SCALABILITY
- Abstract
Many metaheuristic approaches have been developed to select effective features from different medical datasets in a feasible time. However, most of them cannot scale well to large medical datasets, where they fail to maximize the classification accuracy and simultaneously minimize the number of selected features. Therefore, this paper is devoted to developing an efficient binary version of the quantum-based avian navigation optimizer algorithm (QANA) named BQANA, utilizing the scalability of the QANA to effectively select the optimal feature subset from high-dimensional medical datasets using two different approaches. In the first approach, several binary versions of the QANA are developed using S-shaped, V-shaped, U-shaped, Z-shaped, and quadratic transfer functions to map the continuous solutions of the canonical QANA to binary ones. In the second approach, the QANA is mapped to binary space by converting each variable to 0 or 1 using a threshold. To evaluate the proposed algorithm, first, all binary versions of the QANA are assessed on different medical datasets with varied feature sizes, including Pima, HeartEW, Lymphography, SPECT Heart, PenglungEW, Parkinson, Colon, SRBCT, Leukemia, and Prostate tumor. The results show that the BQANA developed by the second approach is superior to other binary versions of the QANA to find the optimal feature subset from the medical datasets. Then, the BQANA was compared with nine well-known binary metaheuristic algorithms, and the results were statistically assessed using the Friedman test. The experimental and statistical results demonstrate that the proposed BQANA has merit for feature selection from medical datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
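The two binarization routes named in the abstract of record 14 (transfer functions versus a fixed threshold) can be illustrated with a short sketch. The record does not give the exact functions or the threshold used in the paper, so everything here is an assumption: the S-shaped function is the logistic sigmoid, the V-shaped function is |tanh(x)|, the threshold defaults to 0.5, and all function names are hypothetical.

```python
import numpy as np

def s_shaped(x):
    """Logistic sigmoid: maps a continuous position to a probability of bit = 1."""
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    """|tanh(x)|: maps a continuous position to a probability of flipping the bit."""
    return np.abs(np.tanh(x))

def binarize_s(x, rng):
    """S-shaped rule: set each bit to 1 with probability s_shaped(x)."""
    return (rng.random(x.shape) < s_shaped(x)).astype(int)

def binarize_v(x, prev_bits, rng):
    """V-shaped rule: flip the previous bit with probability v_shaped(x)."""
    flip = rng.random(x.shape) < v_shaped(x)
    return np.where(flip, 1 - prev_bits, prev_bits)

def binarize_threshold(x, threshold=0.5):
    """Threshold rule: bit = 1 where the continuous value exceeds the threshold."""
    return (x > threshold).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    position = rng.uniform(0.0, 1.0, size=8)   # one continuous candidate solution
    selected = binarize_threshold(position)    # 1 = feature kept, 0 = feature dropped
    print(position.round(2), selected)
```

In a feature-selection setting, each bit of the resulting vector marks whether the corresponding feature is passed to the classifier.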
15. A review on Deep Learning approaches for low-dose Computed Tomography restoration
- Author
-
K. A. Saneera Hemantha Kulathilake, Nor Aniza Abdullah, Aznul Qalid Md Sabri, and Khin Wee Lai
- Subjects
Deep Learning, Generative adversarial networks, Optimization, Medical datasets, Structure preservation, Denoising, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Computed Tomography (CT) is a widely used medical image modality in clinical medicine, because it produces excellent visualizations of fine structural details of the human body. In clinical procedures, it is desirable to acquire CT scans by minimizing the X-ray flux to prevent patients from being exposed to high radiation. However, these Low-Dose CT (LDCT) scanning protocols compromise the signal-to-noise ratio of the CT images because of noise and artifacts over the image space. Thus, various restoration methods have been published over the past 3 decades to produce high-quality CT images from these LDCT images. More recently, as opposed to conventional LDCT restoration methods, Deep Learning (DL)-based LDCT restoration approaches have been rather common due to their characteristics of being data-driven, high-performance, and fast execution. Thus, this study aims to elaborate on the role of DL techniques in LDCT restoration and critically review the applications of DL-based approaches for LDCT restoration. To achieve this aim, different aspects of DL-based LDCT restoration applications were analyzed. These include DL architectures, performance gains, functional requirements, and the diversity of objective functions. The outcome of the study highlights the existing limitations and future directions for DL-based LDCT restoration. To the best of our knowledge, there have been no previous reviews that specifically address this topic.
- Published
- 2021
16. Diabetes Disease Prediction and it’s Time Prediction.
- Author
-
M., Ayush, Rama, K. S., Preetham, Jatin, Ramakrishna, Amith, and S., Rajini
- Subjects
K-nearest neighbor classification, DIABETES, FORECASTING
- Abstract
Diabetes is a serious medical issue at present. Detection of diabetes plays an important role. This project mainly deals with prediction of diabetes and its time prediction. Several IEEE papers are referred to in order to select the best algorithm that can be used for diabetes prediction. In this project we use the KNN algorithm for diabetes prediction. Diabetes time prediction can be said to be a new concept that we implemented using the KNN algorithm, where it predicts when a patient may suffer from diabetes within a range of months. Diabetes and its time prediction are done using a set of parameters that are input to the KNN algorithm. This project is mainly for physicians, to make their work easy. The parameters of the patient are uploaded to the database by the receptionist. The admin of the project creates accounts for doctors and the receptionist. The doctors can use the present parameters to predict diabetes and its time prediction and recommend the treatment that the patient has to undergo. The patient can only view the treatment recommended for them. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Advanced extreme learning machine‐based ensemble classification scheme with enhanced data perturbation for human DNA sequences.
- Author
-
Janakiraman, Sengathir and Deva Priya, Maruthakutty
- Subjects
-
*DNA sequencing, *HUMAN DNA, *MACHINE learning, *CLASSIFICATION, *DATA protection
- Abstract
The dramatic growth in machine learning has brought in significant features and quantified non‐linear associations in the data derived from sensitive medical datasets. The data should be preserved without influencing the associated classifications by applying a robust, effective and reliable data perturbation technique before enforcing ensemble classification. In this paper, an Integrated Condensation Scheme imposed Privacy Preserving Rotation‐based Data Perturbation and Ensemble Classification (ICS‐PPR‐DPEC) is proposed for ensuring privacy of such sensitive data. Condensation Algorithm‐based Data Perturbation is used for constructing homogenous groups determined from the distance between tuples. It also generates a rotation matrix for conducting perturbation that ensures higher data sensitivity protection before it is sent for classification. Advanced Extreme Learning Machine‐based Ensemble Classification Scheme includes kernel, norm‐optimized and regularized Extreme Learning Machine (ELM)‐based classifiers for attaining predominant classification accuracy in identifying human DNA sequences. This approach facilitates classification by constructing ensembles which are trained through randomly resampled ELM classifiers. It includes an objective function that systematically improves the accuracy and diversity among resulting ensembles. The experimental results of the proposed ICS‐PPR‐DPEC are found to be excellent in terms of classification Accuracy, Precision, Recall, and Kappa statistic when compared to the benchmarked techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2021
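The rotation-based perturbation mentioned in the abstract of record 17 relies on a general property: multiplying the data by a rotation matrix changes every attribute value but preserves pairwise Euclidean distances, so distance-based classifiers behave the same on the perturbed data. The sketch below shows only that property. It is not the ICS-PPR-DPEC scheme; the condensation grouping and the ensemble classifier are omitted, and the helper names and the QR-based way of drawing the matrix are assumptions.

```python
import numpy as np

def random_rotation(d, rng):
    """Draw a d x d rotation matrix (orthogonal, determinant +1)
    from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    q = q * np.sign(np.diag(r))      # remove the sign ambiguity of QR
    if np.linalg.det(q) < 0:         # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def perturb(x, rotation):
    """Rotate every record (row) of x in feature space."""
    return x @ rotation

def pairwise_distances(x):
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    records = rng.normal(size=(5, 3))            # synthetic stand-in for sensitive records
    released = perturb(records, random_rotation(3, rng))
    # Values differ, distances between records do not.
    print(np.allclose(pairwise_distances(records), pairwise_distances(released)))
```

A single global rotation is the simplest case; the abstract indicates that the paper applies perturbation after forming homogeneous groups, which this sketch does not attempt.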
18. The Feature Selection Effect on Missing Value Imputation of Medical Datasets.
- Author
-
Liu, Chia-Hui, Tsai, Chih-Fong, Sue, Kuen-Liang, and Huang, Min-Wei
- Subjects
STATISTICAL learning, PATTERN recognition systems, MACHINE learning, DECISION trees, INFORMATION modeling, FEATURE selection, MEDICAL databases
- Abstract
In practice, many medical domain datasets are incomplete, containing a proportion of incomplete data with missing attribute values. Missing value imputation can be performed to solve the problem of incomplete datasets. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and then the relevant statistical and machine learning techniques are employed to produce estimations to replace the missing values. Since the collected dataset usually contains a certain number of feature dimensions, it is useful to perform feature selection for better pattern recognition. Therefore, the aim of this paper is to examine the effect of performing feature selection on missing value imputation of medical datasets. Experiments are carried out on five different medical domain datasets containing various feature dimensions. In addition, three different types of feature selection methods and imputation techniques are employed for comparison. The results show that combining feature selection and imputation is a better choice for many medical datasets. However, the feature selection algorithm should be carefully chosen in order to produce the best result. Particularly, the genetic algorithm and information gain models are suitable for lower dimensional datasets, whereas the decision tree model is a better choice for higher dimensional datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2020
19. B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets
- Author
-
Mohammad H. Nadimi-Shahraki, Mahdis Banaie-Dezfouli, Hoda Zamani, Shokooh Taghian, and Seyedali Mirjalili
- Subjects
optimization, binary metaheuristic algorithms, swarm intelligence algorithms, feature selection, medical datasets, transfer function, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Advancements in medical technology have created numerous large datasets including many features. Usually, all captured features are not necessary, and there are redundant and irrelevant features, which reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms are used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of the B-MFO and comparative algorithms were assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate a superior performance of B-MFO in solving the feature selection problem for different medical datasets compared to other comparative algorithms.
- Published
- 2021
20. Classification of Diseases Using Machine Learning Algorithms: A Comparative Study
- Author
-
Marco-Antonio Moreno-Ibarra, Yenny Villuendas-Rey, Miltiadis D. Lytras, Cornelio Yáñez-Márquez, and Julio-César Salgado-Ramírez
- Subjects
meta-learning, supervised classifiers, medical datasets, data complexity, Mathematics, QA1-939
- Abstract
Machine learning in the medical area has become a very important requirement. The healthcare professional needs useful tools to diagnose medical illnesses. Classifiers are important to provide tools that can be useful to the health professional for this purpose. However, questions arise: which classifier to use? What metrics are appropriate to measure the performance of the classifier? How to determine a good distribution of the data so that the classifier does not bias the medical patterns to be classified in a particular class? Then the most important question: does a classifier perform well for a particular disease? This paper will present some answers to the questions mentioned above, making use of classification algorithms widely used in machine learning research with datasets relating to medical illnesses under the supervised learning scheme. In addition to state-of-the-art algorithms in pattern classification, we introduce a novelty: the use of meta-learning to determine, a priori, which classifier would be ideal for a specific dataset. The results obtained show numerically and statistically that there are reliable classifiers to suggest medical diagnoses. In addition, we provide some insights about the expected performance of classifiers for such a task.
- Published
- 2021
21. An Algorithm for Clustering and Classification of Medical Datasets Using k-Means and Radial Basis Function Neural Networks
- Author
-
Madhu, G.
- Published
- 2016
22. Classification and Feature Selection Method for Medical Datasets by Brain Storm Optimization Algorithm and Support Vector Machine.
- Author
-
Tuba, Eva, Strumberger, Ivana, Bezdan, Timea, Bacanin, Nebojsa, and Tuba, Milan
- Subjects
SUPPORT vector machines, FEATURE selection, MATHEMATICAL optimization, BRAINSTORMING, COMPUTER software, BRAIN-computer interfaces, MEDICAL databases
- Abstract
Medicine is one of the sciences where the development of computer science enables a lot of improvements. Usage of computers in medicine increases the accuracy and speeds up the processes of data analysis and setting diagnoses. Nowadays, numerous computer aided diagnostic systems exist and machine learning algorithms have a significant role in them. Faster and more accurate systems are necessary. A common machine learning task that is part of computer aided diagnostic systems and different medical data analytic software packages is classification. In order to obtain better classification accuracy, it is important to choose the feature set and proper parameters for the classification model. Medical datasets often have large feature sets where many features are correlated with others, so it is important to reduce the feature set. In this paper we propose an adjusted brain storm optimization algorithm for feature selection in medical datasets. Classification was done by a support vector machine whose parameters are also optimized by the brain storm optimization algorithm. The proposed method is tested on standard publicly available medical datasets and compared to other state-of-the-art methods. By analyzing the obtained results, it was shown that the proposed method achieves higher accuracy and reduces the number of features needed. [ABSTRACT FROM AUTHOR]
- Published
- 2019
23. Classification of medical datasets using back propagation neural network powered by genetic-based features elector.
- Author
-
Lafta, Hussein Attya, Hasan, Zainab Falah, and Ayoob, Noor Kadhim
- Subjects
MEDICAL coding, BACK propagation, COMPUTER engineering, COMPUTER systems, DATA mining, MYOCARDIAL infarction
- Abstract
Classification is one of the most indispensable domains in data mining and machine learning. It has a good reputation in computer-based disease diagnosis, where progress in smart computing technologies can be invested in diagnosing various diseases based on real patient data documented in databases. The paper introduces a methodology for diagnosing a set of diseases, covering two types of cancer (breast and lung), two diabetes datasets, and heart attack. A back propagation neural network plays the role of classifier. Its performance is enhanced by a genetic algorithm, which provides the classifier with the optimal features to raise the classification rate as high as possible. The results show that the system is highly efficient in dealing with databases that differ from each other in size, number of features, and nature of the data, with the classification rate reaching 100% in most datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
24. The Feature Selection Effect on Missing Value Imputation of Medical Datasets
- Author
-
Chia-Hui Liu, Chih-Fong Tsai, Kuen-Liang Sue, and Min-Wei Huang
- Subjects
missing values ,imputation ,feature selection ,data mining ,medical datasets ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
In practice, many medical domain datasets are incomplete, containing a proportion of incomplete data with missing attribute values. Missing value imputation can be performed to solve the problem of incomplete datasets. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and then the relevant statistical and machine learning techniques are employed to produce estimations to replace the missing values. Since the collected dataset usually contains a certain number of feature dimensions, it is useful to perform feature selection for better pattern recognition. Therefore, the aim of this paper is to examine the effect of performing feature selection on missing value imputation of medical datasets. Experiments are carried out on five different medical domain datasets containing various feature dimensions. In addition, three different types of feature selection methods and imputation techniques are employed for comparison. The results show that combining feature selection and imputation is a better choice for many medical datasets. However, the feature selection algorithm should be carefully chosen in order to produce the best result. Particularly, the genetic algorithm and information gain models are suitable for lower dimensional datasets, whereas the decision tree model is a better choice for higher dimensional datasets.
- Published
- 2020
- Full Text
- View/download PDF
25. A review on Deep Learning approaches for low-dose Computed Tomography restoration
- Author
-
Nor Aniza Abdullah, Aznul Qalid Md Sabri, Khin Wee Lai, and K. A. Saneera Hemantha Kulathilake
- Subjects
Optimization ,Generative adversarial networks ,Computer science ,High radiation ,Computed tomography ,02 engineering and technology ,Machine learning ,computer.software_genre ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Deep Learning ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Denoising ,Modality (human–computer interaction) ,medicine.diagnostic_test ,business.industry ,Deep learning ,Low dose ,General Medicine ,Medical datasets ,Structure preservation ,020201 artificial intelligence & image processing ,Original Article ,Artificial intelligence ,business ,computer - Abstract
Computed Tomography (CT) is a widely used medical imaging modality in clinical medicine because it produces excellent visualizations of fine structural details of the human body. In clinical procedures, it is desirable to acquire CT scans while minimizing the X-ray flux to prevent patients from being exposed to high radiation. However, these Low-Dose CT (LDCT) scanning protocols compromise the signal-to-noise ratio of the CT images because of noise and artifacts over the image space. Thus, various restoration methods have been published over the past 3 decades to produce high-quality CT images from these LDCT images. More recently, as opposed to conventional LDCT restoration methods, Deep Learning (DL)-based LDCT restoration approaches have become rather common because they are data-driven, high-performing, and fast to execute. Thus, this study aims to elaborate on the role of DL techniques in LDCT restoration and critically review the applications of DL-based approaches for LDCT restoration. To achieve this aim, different aspects of DL-based LDCT restoration applications were analyzed. These include DL architectures, performance gains, functional requirements, and the diversity of objective functions. The outcome of the study highlights the existing limitations and future directions for DL-based LDCT restoration. To the best of our knowledge, there have been no previous reviews that specifically address this topic.
- Published
- 2021
26. Development of Rheumatoid Arthritis Classification from Electronic Image Sensor Using Ensemble Method
- Author
-
Ho Sharon, Irraivan Elamvazuthi, Cheng-Kai Lu, S. Parasuraman, and Elango Natarajan
- Subjects
wearable sensor ,image sensor ,machine learning ,medical datasets ,ensemble method ,classification ,Chemical technology ,TP1-1185 - Abstract
Rheumatoid arthritis (RA) is an autoimmune illness that impacts the musculoskeletal system by causing chronic, inflammatory, and systemic effects. The disease often becomes progressive, reduces physical function, and causes suffering, fatigue, and articular damage. Over a long period of time, RA harms the bone and cartilage of the joints and weakens the joints’ muscles and tendons, eventually causing joint destruction. Sensors such as accelerometers, wearable sensors, and thermal infrared cameras are widely used to gather data for RA. In this paper, the classification of medical disorders based on RA and orthopaedic datasets using ensemble methods is discussed. The RA dataset was gathered from the analysis of white blood cell classification using features extracted from images of lymphocytes acquired from a digital microscope with an electronic image sensor. The orthopaedic dataset is a benchmark dataset for this study, as it poses a similar classification problem with several numerical features. Three ensemble algorithms (bagging, AdaBoost, and random subspace) were used in the study. These ensemble classifiers use k-NN (k-nearest neighbours) and random forest (RF) as base learners. Classification is assessed using holdout and 10-fold cross-validation. The assessment was based on a set of performance measures: precision, recall, F-measure, and the receiver operating characteristic (ROC) curve. Performance was also measured by comparing the overall classification accuracy of the different ensemble classifiers and the base learners. Overall, for Dataset 1, the random subspace classifier with k-NN shows the best results, with an overall accuracy of 97.50%, and for Dataset 2, bagging-RF shows the highest overall accuracy of 94.84% among the ensemble classifiers. The findings indicate that combining the base classifiers with ensemble classifiers substantially improves their efficiency.
- Published
- 2019
- Full Text
- View/download PDF
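The abstract of entry 26 names precision, recall, and F-measure as its performance measures. As a minimal sketch of how these are commonly computed for a binary problem (the function name `precision_recall_f1` and the label encoding are assumptions for illustration, not taken from the paper):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class.

    Hypothetical helper for illustration; labels can be any comparable values.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators by reporting 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With two true positives, one false positive, and one false negative, all three measures come out to 2/3, which is a quick sanity check on the definitions.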
27. Knowledge Discovery and Diseases Prediction: A Comparative Study of Machine Learning Techniques.
- Author
-
Nilashi, Mehrbakhsh, Ahmadi, Hossein, Shahmoradi, Leila, Mardani, Abbas, Ibrahim, Othman, and Yadegaridehkordi, Elaheh
- Subjects
MACHINE learning ,DISEASES ,MEDICAL databases ,BIG data ,SUPPORT vector machines ,NEURAL circuitry - Abstract
The use of medical datasets has attracted the attention of researchers worldwide. Data mining techniques have been widely used in developing decision support systems for disease classification from medical datasets. In this paper, we propose a predictive method for disease prediction using machine learning techniques. The proposed method is developed through clustering, noise removal, and supervised machine learning techniques. Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Neural Network (NN), Adaptive Network-Based Fuzzy Inference System (ANFIS), Support Vector Regression (SVR) and Classification and Regression Trees (CART) are used for the disease prediction task. We also use Principal Component Analysis (PCA) for dimensionality reduction and to address multi-collinearity problems in the experimental datasets. We test our proposed method on several public medical datasets. Experimental results on the Wisconsin Diagnostic Breast Cancer, StatLog, Cleveland and Parkinson's telemonitoring datasets show that the proposed method has potential to be used as a decision support system in healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2017
28. A hybrid filter-wrapper feature selection using Fuzzy KNN based on Bonferroni mean for medical datasets classification: A COVID-19 case study.
- Author
-
Vommi, Amukta Malyada and Battula, Tirumala Krishna
- Subjects
- *
FEATURE selection , *COVID-19 pandemic , *MEDICAL coding , *MACHINE learning , *COMPUTER vision , *COMPUTER engineering , *HUMAN facial recognition software , *FACE perception - Abstract
Several feature selection methods have been developed to extract the optimal features from a dataset in medical dataset classification. Creating an efficient technique has become a challenge because of high dimensionality, noise, and redundant information. In this paper, we propose a hybrid filter-wrapper approach for feature selection. An ensemble of filter methods, ReliefF and Fuzzy Entropy (RFE), is developed, and the union of the top-n features from each method is considered. The Equilibrium Optimizer (EO) technique is combined with Opposition Based Learning (OBL), a Cauchy Mutation operator and a novel search strategy to enhance its capabilities. The OBL strategy improves the diversity of the population in the search space. The Cauchy Mutation operator enhances its ability to evade local optima during the search, and the novel search strategy improves the exploration capability of the algorithm. This enhanced form of EO is integrated with eight time-varying S- and V-shaped transfer functions to convert the solutions into binary form, yielding the Binary Enhanced Equilibrium Optimizer (BEE). The features from the ensemble are given as input to the Binary Enhanced Equilibrium Optimizer to extract the essential features. Fuzzy KNN based on Bonferroni mean is used as the learning algorithm. Twenty-two benchmark datasets and four microarray datasets are used to test the algorithm's efficiency. This method is also applied to a COVID-19 case study. The results demonstrate the superiority of the proposed approach, RFE-BEE, over several other state-of-the-art algorithms in terms of fitness values, accuracy, precision, sensitivity, and F-measure. RFE-BEE can be applied to various biomedical, computer vision and engineering applications such as electromyography pattern recognition, COVID-19 diagnosis, face recognition and fault diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
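Entry 28 mentions S- and V-shaped transfer functions for converting continuous solutions into binary feature masks. The sketch below shows the basic, non-time-varying forms that are commonly used in binary metaheuristics (a sigmoid for the S-shape and |tanh| for the V-shape). The paper's eight time-varying variants are not reproduced here, and the function names are assumptions for illustration only.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x):
    """V-shaped transfer function: probability grows with the magnitude of x."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set each bit to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: flip the current bit with probability v_shaped(x)."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


# Usage: a bit value of 1 means "keep this feature" in the candidate subset.
rng = random.Random(0)
mask = binarize_s([2.5, -0.3, 0.0, -4.0], rng)
```

The design difference is that the S-shaped rule assigns the bit directly, while the V-shaped rule keeps the previous bit unless the continuous value is large in magnitude.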
29. The distance function effect on k-nearest neighbor classification for medical datasets.
- Author
-
Hu, Li-Yu, Huang, Min-Wei, Ke, Shih-Wen, and Tsai, Chih-Fong
- Subjects
- *
K-nearest neighbor classification , *SUPERVISED learning , *MACHINE learning , *SUPPORT vector machines , *MEDICAL care - Abstract
Introduction: K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Case description: Because the Euclidean distance function is the most widely used distance metric in k-NN, no study has examined the classification performance of k-NN with different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions (Euclidean, cosine, Chi square, and Minkowski) are used individually during k-NN classification. Discussion and evaluation: The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, the cosine and Euclidean (and Minkowski) distance functions perform the worst over the mixed type of datasets. Conclusions: In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, k-NN based on the Chi square distance function performs the best. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
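Entry 29 compares four distance functions inside k-NN. A minimal sketch of how a distance function can be swapped into a k-NN classifier is given below. The chi-square form used here is one common variant for non-negative features and may differ from the exact definition in the paper; the cosine function assumes non-zero vectors; all function names are illustrative assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p=3):
    # p = 2 reduces to the Euclidean distance, p = 1 to the Manhattan distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    # Assumes neither vector is all zeros (otherwise the norm product is 0).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def chi_square(a, b):
    # Assumed variant for non-negative features; terms with x + y == 0 are skipped.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def knn_predict(train, query, k, dist):
    """train is a list of (feature_vector, label) pairs; dist is any function above."""
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Passing `dist=chi_square` or `dist=cosine_distance` to `knn_predict` changes only the neighbour ranking, which is the single factor the study varies.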
30. B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets
- Author
-
Hoda Zamani, Seyedali Mirjalili, Mohammad H. Nadimi-Shahraki, Mahdis Banaie-Dezfouli, and Shokooh Taghian
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Binary number ,Feature selection ,Pattern recognition ,QA75.5-76.95 ,medical datasets ,Transfer function ,Human-Computer Interaction ,feature selection ,binary metaheuristic algorithms ,Friedman test ,Metaheuristic algorithms ,Electronic computers. Computer science ,Convergence (routing) ,Scalability ,Moth flame optimization ,transfer function ,Artificial intelligence ,swarm intelligence algorithms ,business ,optimization - Abstract
Advancements in medical technology have created numerous large datasets including many features. Usually, not all captured features are necessary, and there are redundant and irrelevant features, which reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms are used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of the B-MFO and comparative algorithms was assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate a superior performance of B-MFO in solving the feature selection problem for different medical datasets compared to other comparative algorithms.
- Published
- 2021
31. Collaborative learning based on associative models: Application to pattern classification in medical datasets.
- Author
-
Aldape-Pérez, Mario, Yáñez-Márquez, Cornelio, Camacho-Nieto, Oscar, López-Yáñez, Itzamá, and Argüelles-Cruz, Amadeo-José
- Subjects
- *
ALGORITHMS , *EXPERTISE , *INTERPROFESSIONAL relations , *MEDICAL informatics , *MEDICAL personnel , *SOCIAL networks , *CLINICAL competence , *DATA analysis , *COMPUTER-aided diagnosis , *EDUCATION - Abstract
This paper addresses social networking and collaborative learning in the medical domain by focusing on two main objectives: the first concerns social networking between computer science experts and postgraduate students, while the second concerns collaborative learning between medical experts and less experienced physicians. The tasks of algorithm testing and performance evaluation were assigned to computer science postgraduate students. They made extensive use of social networking in order to implement associative models to perform pattern classification tasks in medical datasets and share performance results. Associative memories have a number of properties, including a rapid, compute-efficient best match and intrinsic noise tolerance, that make them ideal for diagnostic hypothesis-generation processes in the medical domain. Using supervised machine learning algorithms allows less experienced physicians to compare their diagnostic results between workgroups and verify whether their knowledge is consistent with the results delivered by computational tools. Throughout the experimental phase the proposed algorithm is applied to help diagnose diseases; in particular, it is applied to the diagnosis of five different problems in the medical field. The performance of the proposed model is validated by comparing the classification accuracy of DAM against the performance achieved by twenty other well-known algorithms. Experimental results have shown that DAM achieved the best performance in three of the five pattern classification problems in the medical field. Similarly, it should be noted that our proposal achieved the best classification accuracy averaged over all datasets. Experimental results confirm that the proposed algorithm can be a valuable tool for promoting collaborative learning among less experienced physicians. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
32. A hybrid model for classification of medical data set based on factor analysis and extreme learning machine: FA + ELM.
- Author
-
Kaya, Yılmaz and Kuncan, Fatma
- Subjects
FACTOR analysis ,MEDICAL coding ,DATABASES ,LYMPHANGIOGRAPHY ,DIAGNOSIS ,MACHINE learning - Abstract
• A hybrid model based on Factor Analysis (FA) and Extreme Learning Machine (ELM) was proposed in this study for diagnosing breast cancer, Lymphography, and erythemato-squamous diseases.
• The best success rate achieved by classifying the DERM dataset directly using ELM was 100%, and the highest success rate achieved after preprocessing with FA did not change. However, the average success rate achieved after preprocessing the DERM dataset with FA increased from 96.39% to 96.94%.
• The highest success rate achieved by classifying the LYMP dataset directly using ELM was 90.00 %, while the result obtained using FA + ELM was 93.33 %. FA increased the average success rate from 84.50 % to 85.10 %.
• The best success rate achieved for the Wisconsin breast cancer data set using ELM and FA + ELM was 99.27 %. However, FA increased the average success rate from 97.10 % to 97.25 %.
• In all datasets, higher success rates were obtained using FA despite decreasing the dimension (size) of attributes.
• As a result, important conclusions were obtained in classifying medical data using the hybrid model based on factor analysis and the extreme learning machine proposed in this study. The proposed method will be helpful in the decision-making stage in medical diagnosis systems.
Data mining techniques such as classification, clustering, and prediction are used extensively for medical diagnosis in epidemiological fields. A hybrid model based on Factor Analysis (FA) and Extreme Learning Machine (ELM) was proposed in this study for diagnosing breast cancer, Lymphography, and erythemato-squamous diseases. The proposed hybrid model consists of two stages. Firstly, FA was used for preprocessing the medical dataset, and then the factors obtained using FA were used as input features for the ELM model. Dermatology, Lymphography, and Wisconsin Breast Cancer real datasets obtained from the UCI machine learning database were used to test the proposed model. Average success rates of 96.39 % and 96.94 % were obtained after classifying the dermatology dataset with the ELM and FA + ELM models, respectively. While the success rate obtained by classifying the lymphography data set using ELM is 84.50 %, the result obtained with FA + ELM is 85.10 %. Success rates of 97.10 % and 97.25 % are achieved for Wisconsin Breast Cancer (WBC) using ELM and FA + ELM, respectively. As a result, it was observed that preprocessing of the data increased the average classification success in the three different medical datasets used for the classification problem. It is considered that the proposed hybrid model will be helpful for the decision-making stage in medical diagnosis systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Applications of dynamic feature selection and clustering methods to medical diagnosis.
- Author
-
Ershadi, Mohammad Mahdi and Seifi, Abbas
- Subjects
FEATURE selection ,DEEP learning ,DIAGNOSIS ,PARTICLE swarm optimization ,PRINCIPAL components analysis ,MACHINE learning - Abstract
Machine learning methods are commonly used for disease and cancer diagnosis. The model performance can be improved via feature selection, feature reduction, and clustering methods. Although these supplementary techniques have certain advantages, they cannot necessarily guarantee better performance. The objective of this study is to improve the performance of classification methods used for medical diagnosis of various diseases. We propose a dynamic feature selection method based on the merits of both principal component analysis and Wrapper feature selection methods. It is a novel multi-objective feature selection method based on a customized genetic algorithm that is guided by eigenvalues of the features and feedback from the outputs of various classifiers. To reinforce classification learning and further enhance performance, we utilize a dynamic selection of three clustering methods: K-means, fuzzy c-means, and particle swarm optimization. We also investigated the performance of two deep learning classifiers on the proposed methods. To show the impact of combining the proposed methods, we analyze the results of applying 12 machine learning and two deep learning classifiers to 30 imbalanced medical datasets. According to our extensive computational experiments and the statistical tests, the proposed dynamic feature selection and clustering methods perform significantly better than existing methods. The proposed methods not only improve the average of performance measures by 5% but also are more accurate than the best performing classification methods in the literature used for the same datasets.
• A dynamic feature selection method called GA-Eig-RBF is proposed in this paper.
• We use a dynamic clustering selection based on K-means, fuzzy c-means, and PSO algorithm to reinforce classification learning.
• Thirty imbalanced medical datasets from the UCI repository have been used in this study.
• RBF had the best average performance among all 12 machine learning and two deep learning classifiers used in this research.
• The proposed methods outperform Principal Component Analysis (PCA) feature reduction and Wrapper feature selection methods for classification. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets
- Author
-
Al-Shamaa, Zina Z. R., Kurnaz, Sefer, Duru, Adil Deniz, Peppa, Nadia, Mirnezami, Alex H., and Hamady, Zaed Z. R.
- Subjects
SELECTION ,Article Subject ,QH301-705.5 ,Computer science ,Biomedical Engineering ,Medicine (miscellaneous) ,Bioengineering ,02 engineering and technology ,Minority class ,Machine learning ,computer.software_genre ,Measure (mathematics) ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Hellinger ,Sensitivity (control systems) ,Biology (General) ,Hellinger distance ,SMOTE ,030304 developmental biology ,0303 health sciences ,business.industry ,Medical Datasets ,ALGORITHMS ,Baseline model ,Classification ,Class (biology) ,Majority class ,Undersampling ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,TP248.13-248.65 ,Biotechnology ,Research Article - Abstract
Mirnezami, Alexander/0000-0002-6199-8332 WOS:000594274800001 PubMed: 33204304 Imbalanced class distribution in medical datasets is a challenging problem that hinders correct disease classification. It emerges when the number of healthy class instances is much larger than the number of disease class instances. To solve this problem, we propose undersampling the healthy class instances to improve disease class classification. This model is named Hellinger Distance Undersampling (HDUS). It employs the Hellinger distance to measure the resemblance between a majority class instance and its neighbouring minority class instances to separate classes effectively and boost the discrimination power for each class. An extensive experiment has been conducted on four imbalanced medical datasets using three classifiers to compare HDUS with a baseline model and three state-of-the-art undersampling models. The outcomes show that HDUS can perform better than the other models in terms of sensitivity, F1 measure, and balanced accuracy.
- Published
- 2020
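Entry 34 builds on the Hellinger distance. For reference, a minimal sketch of the standard Hellinger distance between two discrete probability distributions follows. How HDUS turns individual instances into distributions is not stated in the abstract, so this block only illustrates the distance itself, and the function name is an assumption.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Both inputs are assumed to be non-negative and to sum to 1.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)
```

Because the distance is bounded, values computed for different pairs can be compared directly without further normalization.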
35. GeFeS : A generalized wrapper feature selection approach for optimizing classification performance
- Author
-
Sahebi, Golnaz, Movahedi, Parisa, Ebrahimi, Masoumeh, Pahikkala, Tapio, Plosila, Juha, and Tenhunen, Hannu
- Abstract
In this paper, we propose a generalized wrapper-based feature selection, called GeFeS, which is based on a parallel new intelligent genetic algorithm (GA). The proposed GeFeS works properly under different numerical dataset dimensions and sizes, carefully tries to avoid overfitting and significantly enhances classification accuracy. To make the GA more accurate, robust and intelligent, we have proposed a new operator for features weighting, improved the mutation and crossover operators, and integrated nested cross-validation into the GA process to properly validate the learning model. The k-nearest neighbor (kNN) classifier is utilized to evaluate the goodness of selected features. We have evaluated the efficiency of GeFeS on various datasets selected from the UCI machine learning repository. The performance is compared with state-of-the-art classification and feature selection methods. The results demonstrate that GeFeS can significantly generalize the proposed multi-population intelligent genetic algorithm under different sizes of two-class and multi-class datasets. We have achieved the average classification accuracy of 95.83%, 97.62%, 99.02%, 98.51%, and 94.28% while reducing the number of features from 56 to 28, 34 to 18, 279 to 135, 30 to 16, and 19 to 9 under lung cancer, dermatology, arrhythmia, WDBC, and hepatitis, respectively.
- Published
- 2020
- Full Text
- View/download PDF
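Entry 35 integrates nested cross-validation into a wrapper feature selection process. The sketch below shows only the index bookkeeping of nested cross-validation: an outer loop that holds out test folds and an inner loop that splits the remaining data for model or feature-subset selection. It is not the GeFeS procedure itself; the function names are hypothetical, and shuffling and stratification are omitted for brevity.

```python
def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds (no shuffling in this sketch)."""
    return [list(range(i, n, k)) for i in range(k)]


def nested_cv_splits(n, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits) for n samples.

    inner_splits is a list of (inner_train, inner_val) index lists drawn
    only from outer_train, so the outer test fold never influences selection.
    """
    for outer_test in k_fold_indices(n, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in range(n) if i not in held_out]

        inner_splits = []
        for fold in k_fold_indices(len(outer_train), inner_k):
            inner_val = [outer_train[j] for j in fold]
            val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, inner_val))

        yield outer_train, outer_test, inner_splits
```

In a wrapper setting, candidate feature subsets would be scored on the inner splits, and the chosen subset would then be evaluated once on the corresponding outer test fold.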
36. Incorporating expert knowledge when learning Bayesian network structure: A medical case study
- Author
-
Julia Flores, M., Nicholson, Ann E., Brunskill, Andrew, Korb, Kevin B., and Mascaro, Steven
- Subjects
- *
BAYESIAN analysis , *TECHNOLOGICAL innovations , *MEDICAL technology , *MEDICAL informatics , *COMPUTERS in medicine , *MACHINE learning , *COMPUTER networks - Abstract
Abstract: Objectives: Bayesian networks (BNs) are rapidly becoming a leading technology in applied Artificial Intelligence, with many applications in medicine. Both automated learning of BNs and expert elicitation have been used to build these networks, but the potentially more useful combination of these two methods remains underexplored. In this paper we examine a number of approaches to their combination when learning structure and present new techniques for assessing their results. Methods and materials: Using public-domain medical data, we run an automated causal discovery system, CaMML, which allows the incorporation of multiple kinds of prior expert knowledge into its search, to test and compare unbiased discovery with discovery biased with different kinds of expert opinion. We use adjacency matrices enhanced with numerical and colour labels to assist with the interpretation of the results. We present an algorithm for generating a single BN from a set of learned BNs that incorporates user preferences regarding complexity vs completeness. These techniques are presented as part of the first detailed workflow for hybrid structure learning within the broader knowledge engineering process. Results: The detailed knowledge engineering workflow is shown to be useful for structuring a complex iterative BN development process. The adjacency matrices make it clear that for our medical case study using the IOWA dataset, the simplest kind of prior information (partially sorting variables into tiers) was more effective in aiding model discovery than either using no prior information or using more sophisticated and detailed expert priors. The method for generating a single BN captures relationships that would be overlooked by other approaches in the literature. Conclusion: Hybrid causal learning of BNs is an important emerging technology. 
We present methods for incorporating it into the knowledge engineering process, including visualisation and analysis of the learned networks. [Copyright Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
37. B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets.
- Author
-
Nadimi-Shahraki, Mohammad H., Banaie-Dezfouli, Mahdis, Zamani, Hoda, Taghian, Shokooh, and Mirjalili, Seyedali
- Subjects
FEATURE selection ,METAHEURISTIC algorithms ,COMPARATIVE psychology ,TRANSFER functions ,SWARM intelligence ,MATHEMATICAL optimization ,MULTICASTING (Computer networks) - Abstract
Advancements in medical technology have created numerous large datasets including many features. Usually, not all captured features are necessary, and there are redundant and irrelevant features, which reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms are used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of the B-MFO and comparative algorithms was assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate a superior performance of B-MFO in solving the feature selection problem for different medical datasets compared to other comparative algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
38. Classification of Diseases Using Machine Learning Algorithms: A Comparative Study.
- Author
-
Moreno-Ibarra, Marco-Antonio, Villuendas-Rey, Yenny, Lytras, Miltiadis D., Yáñez-Márquez, Cornelio, and Salgado-Ramírez, Julio-César
- Subjects
NOSOLOGY ,MACHINE learning ,MEDICAL personnel ,CLASSIFICATION algorithms ,DIAGNOSIS - Abstract
Machine learning in the medical area has become a very important requirement. The healthcare professional needs useful tools to diagnose medical illnesses, and classifiers can provide such tools. However, questions arise: Which classifier should be used? What metrics are appropriate to measure the performance of the classifier? How can a good distribution of the data be determined so that the classifier does not bias the medical patterns toward a particular class? Then the most important question: does a classifier perform well for a particular disease? This paper presents some answers to these questions, making use of classification algorithms widely used in machine learning research with datasets relating to medical illnesses under the supervised learning scheme. In addition to state-of-the-art algorithms in pattern classification, we introduce a novelty: the use of meta-learning to determine, a priori, which classifier would be ideal for a specific dataset. The results obtained show numerically and statistically that there are reliable classifiers to suggest medical diagnoses. In addition, we provide some insights about the expected performance of classifiers for such a task. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. Healthcare Records 2
- Author
-
Grgic, Hrvoje, Prevolsek, Kristijan, Raguz, Ana, Rizzieri, Jordan, Gross, Hunter, and Pandza, Filip
- Subjects
health care facilities, manpower, and services ,health services administration ,education ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Web application for medical diagnostics ,medical datasets ,telemedicine ,human activities - Abstract
Due to the continuous development of multimedia, there is a growing demand for multimedia applications in diverse application areas, such as medical imaging and telemedicine. The task is to build a web application that will enable medical specialists (radiologists, surgeons, etc.) to seek consultations/opinions from their colleagues from other departments/locations/institutions, regarding specific diagnostic procedure results.
- Published
- 2018
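40. Meta-heuristic algorithms for improving the artificial neural network for medical data classification [Tıbbi veri sınıflandırması için yapay sinir ağını geliştirmek için meta-heuristik algoritmalar]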
- Author
-
Gburı, Ihsan Salman Jasım Al, Uçan, Osman Nuri, and Shaker, Khalid
- Subjects
Optimization ,Medical Datasets ,Data Mining ,Classification ,ANN ,Metaheuristic Algorithms - Abstract
YÖK Thesis No: 520910 (doctoral thesis). The tremendous growth of computer hardware technologies and their ability to process huge amounts of complex data has motivated researchers to tackle complicated data mining challenges and problems. Medical dataset classification represents one of the most crucial and complicated problems faced by researchers in the field of artificial intelligence and data mining. The different diseases and the various ways of diagnosis using multiple tests have produced large amounts of complex medical data. Moreover, the huge number of patient records in clinical centers, hospitals, and other health institutions has generated the need for advanced and accurate medical mining applications to help doctors and therapists investigate cases regardless of whether patients are in critical condition or require remote follow-up. This thesis focuses on the hybridization of the artificial neural network (ANN) and metaheuristic algorithms to enhance the accuracy of a classification model for the overlapping fields of medical data mining. The key problems associated with medical diagnoses involve the identification of highly accurate classification models. The contributions of this thesis revolve around two important classification problems highlighted in the related literature. For the first strategy, the relation between the ANN structure and the optimization algorithm is established. For the second strategy, the tradeoff between diversification and intensification is investigated as part of the search for the global optimal solution. The first chapter gives background, and the second chapter surveys the literature on approaches applied to the problem. The third chapter discusses the effect of metaheuristic iteration on ANN structure. The novelty of the proposed work is shown through improving the impact of metaheuristic algorithm iteration on ANN structure. The ANN is enhanced using three separate metaheuristic algorithms: particle swarm optimization (PSO), genetic algorithm (GA), and fireworks algorithm (FW). The proposed models are tested on five standard medical benchmarks and one big-data medical dataset. The proposed study is successfully implemented, and remarkable results are obtained. Furthermore, the no-free-lunch theorem (NFLT) is verified in the study's context, that is, no algorithm is universal for all problem domains. The fourth chapter investigates the tradeoff between exploration and exploitation when obtaining the global optimal solution, which represents the best accuracy of medical data classification. The number of hidden layers and the number of neurons in each layer can both affect ANN learning. Thus, the ANN used in this thesis involves the selection of a complex structure that can achieve highly accurate results; consequently, metaheuristic algorithm efficiency can be guaranteed when searching for the global optimum. Two metaheuristic algorithms, differential evolution (DE) and simulated annealing (SA), are combined to formulate a new and improved algorithm, DESA, for the considered problem domain. However, selecting the two most accurate algorithms is not mandatory; instead, empirical tests can be performed for convenience. The originality of the proposed method lies in combining DE as an evolutionary metaheuristic algorithm with SA as a trajectory algorithm to balance exploration and exploitation, exploring the search space widely for global solutions while intensively exploiting local solutions. DESA is compared with two evolutionary algorithms, GA and DE, and with two trajectory algorithms, SA and tabu search (TS). The proposed DESA method is implemented successfully, and better results are obtained.
- Published
- 2018
41. Healthcare Records 1
- Author
-
Gertz, Benjamin, Bach, Leo, Merabet, Youssef, Smith, Conner, Buto, Casey, and Zalenkovic, Marko
- Subjects
health care facilities, manpower, and services ,health services administration ,education ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Web application for medical diagnostics ,medical datasets ,telemedicine ,human activities - Abstract
Due to the continuous development of multimedia, there is a growing demand for multimedia applications in diverse application areas, such as medical imaging and telemedicine. The task is to build a web application that will enable medical specialists (radiologists, surgeons, etc.) to seek consultations/opinions from their colleagues from other departments/locations/institutions, regarding specific diagnostic procedure results.
- Published
- 2018
42. Healthcare Records 3
- Author
-
Horvat, Dino, Landeka, Ivan, Clark, Kira, Hoover, Matthew, Tillotson, Jenna, and Dragicevic, Luka
- Subjects
health care facilities, manpower, and services ,health services administration ,education ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Web application for medical diagnostics ,medical datasets ,telemedicine ,human activities - Abstract
Due to the continuous development of multimedia, there is a growing demand for multimedia applications in diverse application areas, such as medical imaging and telemedicine. The task is to build a web application that will enable medical specialists (radiologists, surgeons, etc.) to seek consultations/opinions from their colleagues from other departments/locations/institutions, regarding specific diagnostic procedure results.
- Published
- 2018
43. Healthcare Records 4
- Author
-
Labriola, Antonio, Stadtlander, Ryan, Atwell, Matthew, Kunej, Daniela, Hakstok, Ante, and Lin, Katherine
- Subjects
health care facilities, manpower, and services ,health services administration ,education ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Web application for medical diagnostics ,medical datasets ,telemedicine ,human activities - Abstract
Due to the continuous development of multimedia, there is a growing demand for multimedia applications in diverse application areas, such as medical imaging and telemedicine. The task is to build a web application that will enable medical specialists (radiologists, surgeons, etc.) to seek consultations/opinions from their colleagues from other departments/locations/institutions, regarding specific diagnostic procedure results.
- Published
- 2018
44. Classification of medical datasets using back propagation neural network powered by genetic-based features elector
- Author
-
Hussein Attya Lafta, Zainab Falah Hasan, and Noor Kadhim Ayoob
- Subjects
General Computer Science ,Artificial neural network ,Computer science ,Neural networks classification ,Machine learning ,Medical datasets ,Back propagation neural network ,ComputingMethodologies_PATTERNRECOGNITION ,Genetic algorithm ,Artificial intelligence ,Electrical and Electronic Engineering ,Classifier (UML) - Abstract
Classification is one of the most indispensable domains in data mining and machine learning. It has a good reputation in computer-aided disease diagnosis, where advances in smart computing technologies can be applied to diagnosing various diseases based on real patient data documented in databases. The paper introduces a methodology for diagnosing a set of diseases, including two types of cancer (breast and lung), two datasets for diabetes, and heart attack. A back propagation neural network plays the role of classifier. The performance of the neural network is enhanced by a genetic algorithm, which provides the classifier with the optimal features to raise the classification rate as high as possible. The results show that the system is highly efficient in dealing with databases that differ from each other in size, number of features, and nature of the data: the classification rate reached 100% on most datasets.
- Published
- 2019
- Full Text
- View/download PDF
45. GeFeS: A generalized wrapper feature selection approach for optimizing classification performance.
- Author
-
Sahebi G, Movahedi P, Ebrahimi M, Pahikkala T, Plosila J, and Tenhunen H
- Subjects
- Arrhythmias, Cardiac, Humans, Algorithms, Machine Learning
- Abstract
In this paper, we propose a generalized wrapper-based feature selection approach, called GeFeS, which is based on a new parallel intelligent genetic algorithm (GA). The proposed GeFeS works properly under different numerical dataset dimensions and sizes, carefully tries to avoid overfitting, and significantly enhances classification accuracy. To make the GA more accurate, robust, and intelligent, we have proposed a new operator for feature weighting, improved the mutation and crossover operators, and integrated nested cross-validation into the GA process to properly validate the learning model. The k-nearest neighbor (kNN) classifier is utilized to evaluate the goodness of selected features. We have evaluated the efficiency of GeFeS on various datasets selected from the UCI machine learning repository. The performance is compared with state-of-the-art classification and feature selection methods. The results demonstrate that GeFeS can significantly generalize the proposed multi-population intelligent genetic algorithm under different sizes of two-class and multi-class datasets. We have achieved average classification accuracies of 95.83%, 97.62%, 99.02%, 98.51%, and 94.28% while reducing the number of features from 56 to 28, 34 to 18, 279 to 135, 30 to 16, and 19 to 9 on the lung cancer, dermatology, arrhythmia, WDBC, and hepatitis datasets, respectively. (Copyright © 2020 Elsevier Ltd. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
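Note on result 45: a wrapper method scores each candidate feature subset by training and validating a classifier on it. The sketch below illustrates that idea with a leave-one-out 1-nearest-neighbor score on a tiny made-up dataset. It is a simplification under stated assumptions: GeFeS itself uses kNN inside nested cross-validation and a genetic search, neither of which is reproduced here, and the `fitness` weighting with `alpha` is a hypothetical choice, not taken from the paper.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to masked features."""
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # hold out the sample being classified
            dist = sum((xi[j] - xk[j]) ** 2 for j in selected)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Hypothetical wrapper fitness: reward accuracy, lightly reward small subsets."""
    acc = loo_1nn_accuracy(X, y, mask)
    ratio = sum(mask) / len(mask)
    return alpha * acc + (1 - alpha) * (1 - ratio)


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -2.9], [1.1, 5.1], [0.9, 4.1]]
y = [0, 0, 0, 1, 1, 1]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, round(fitness(X, y, mask), 3))
```

A search procedure such as a GA would call `fitness` on each candidate mask and keep the best-scoring ones.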
46. Development of Rheumatoid Arthritis Classification from Electronic Image Sensor Using Ensemble Method.
- Author
-
Sharon, Ho, Elamvazuthi, Irraivan, Lu, Cheng-Kai, Parasuraman, S., and Natarajan, Elango
- Subjects
DIGITAL images ,IMAGE sensors ,RHEUMATOID arthritis ,LEUCOCYTES ,RECEIVER operating characteristic curves ,THERMAL tolerance (Physiology) ,MUSCULOSKELETAL system - Abstract
Rheumatoid arthritis (RA) is an autoimmune illness that impacts the musculoskeletal system by causing chronic, inflammatory, and systemic effects. The disease often becomes progressive, reduces physical function, and causes suffering, fatigue, and articular damage. Over a long period of time, RA harms the bone and cartilage of the joints and weakens the joints' muscles and tendons, eventually causing joint destruction. Sensors such as accelerometers, wearable sensors, and thermal infrared cameras are widely used to gather data for RA. In this paper, the classification of medical disorders based on RA and orthopaedic datasets using ensemble methods is discussed. The RA dataset was gathered from the analysis of white blood cell classification using features extracted from images of lymphocytes acquired from a digital microscope with an electronic image sensor. The orthopaedic dataset is a benchmark dataset for this study, as it poses a similar classification problem with several numerical features. Three ensemble algorithms (bagging, AdaBoost, and random subspace) were used in the study. These ensemble classifiers use k-NN (k-nearest neighbours) and random forest (RF) as base learners. The classification is assessed using holdout and 10-fold cross-validation evaluation methods. The assessment was based on a set of performance measures: precision, recall, F-measure, and the receiver operating characteristic (ROC) curve. The performance was also measured by comparing the overall classification accuracy of the different ensemble classifiers and the base learners. Overall, it was found that for Dataset 1, the random subspace classifier with k-NN shows the best results, with an overall accuracy rate of 97.50%, and for Dataset 2, bagging-RF shows the highest overall accuracy rate, 94.84%, among the ensemble classifiers. The findings indicate that the ensemble classifiers substantially improved the performance of the base classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
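Note on result 46: the abstract lists precision, recall, and F-measure among its performance measures. As a reminder of how these are computed for a binary problem, here is a minimal sketch using the standard definitions; the labels and predictions are made up for illustration and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Made-up labels: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Zero denominators are mapped to 0.0 here by convention; other conventions exist.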
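47. Investigation of the effect of data pre-processing techniques in biomedical signals on classification accuracy in medical diagnosis [Biyomedikal sinyallerde veri ön-işleme tekniklerinin medikal teşhiste sınıflama doğruluğuna etkisinin incelenmesi]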
- Author
-
Polat, Kemal, Güneş, Salih, Elektrik-Elektronik Mühendisliği Anabilim Dalı, and Enstitüler, Fen Bilimleri Enstitüsü, Elektrik Elektronik Mühendisliği Ana Bilim Dalı
- Subjects
Data weighting ,Data processing ,Hybrid systems ,Feature selection ,Classification ,Electrical and Electronics Engineering ,Medical datasets - Abstract
In this PhD thesis, data weighting and feature selection methods are proposed and used to increase classification performance on biomedical datasets. Several factors decrease classification performance on biomedical datasets, including noise, outliers, and non-linearly separable data distributions. Various data pre-processing methods are used to increase classification performance on datasets affected by these factors. In biomedical datasets, the dataset produced after feature extraction can be high-dimensional or may contain irrelevant or redundant features. Such features decrease classification performance and increase the computational cost of the classifier. In the conducted studies, higher generalization ability and lower computational complexity are achieved with feature selection and dimensionality reduction algorithms. In this thesis, principal component analysis, a feature selection algorithm based on information gain, and the kernel F-score feature selection method are used as feature selection and dimensionality reduction algorithms. Among these, the information-gain-based algorithm and the kernel F-score method are emphasized for feature selection, and principal component analysis for dimensionality reduction. As data weighting methods, fuzzy weighted pre-processing, k-NN (k-nearest neighbor) based weighted pre-processing, generalized discriminant analysis, and similarity-based weighted pre-processing are used to improve classifier performance on biomedical datasets. Of these, the data weighting methods proposed in the thesis are fuzzy weighted pre-processing, k-NN based weighted pre-processing, and similarity-based weighted pre-processing. The biomedical datasets used are heart disease, heart disease with SPECT (Single Photon Emission Computed Tomography) images, E. coli promoter gene sequences, atherosclerosis with Doppler signals, optic nerve disease with VEP (Visual Evoked Potential) signals, and macular disease with PERG (Pattern Electroretinography) signals. The heart disease, SPECT heart disease, and E. coli promoter gene sequence datasets are taken from the UCI (University of California, Irvine) machine learning database. The atherosclerosis, optic nerve disease, and macular disease datasets were provided by Prof. Dr. Sadık Kara of Fatih University and the biomedical engineering team at Erciyes University. To evaluate the data weighting and feature selection methods, they are used in hybrid form with classification algorithms. The classification algorithms used are ANFIS (Adaptive Network Based Fuzzy Inference System), the C4.5 decision tree, AIRS (Artificial Immune Recognition System), Fuzzy-AIRS (AIRS with a fuzzy resource allocation mechanism), and artificial neural networks. In classifying the biomedical datasets, the k-NN based weighting method was the best data weighting method. Among feature selection methods, principal component analysis was superior to the others. Twelve new hybrid systems were created by combining feature selection methods, data weighting methods, and classification algorithms, and these were applied to the six medical datasets used in this thesis. The best hybrid system in terms of computation time and classification performance was chosen for each medical dataset.
- Published
- 2008
48. Automatic Visualization Pipeline Formation For Medical Datasets On Grid Computing Environment
- Author
-
Aboamama Atahar Ahmed, Muhammad Shafie Abd Latiff, Kamalrulnizam Abu Bakar, and Zainul Ahmad Rajion
- Subjects
VTK ,Globus toolkit ,Grid computing ,thin clients ,visualization techniques ,Visualization ,Medical datasets - Abstract
Distance visualization of large datasets often relies on remote viewing and zooming of stored static images. However, the continuous increase in the size of datasets and visualization operations leads to insufficient performance on traditional desktop computers. Additionally, visualization techniques such as isosurface extraction depend on the available resources of the running machine and the size of the datasets. Moreover, the continuous demand for computing power and the continuous increase in dataset size result in an urgent need for a grid computing infrastructure. However, some issues arise in current grids, such as client machines whose resources are insufficient to process large datasets. On top of that, different output devices and different network bandwidths between the visualization pipeline components often result in output suitable for one machine but not for another. In this paper we investigate how grid services could be used to support remote visualization of large datasets and to break the constraint of physical co-location of the resources by applying grid computing technologies. We show our grid-enabled architecture for remote interactive visualization of large medical datasets (circa 5 million polygons) on clients with modest resources.
- Published
- 2007
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library