202 results on '"Random subspace"'
Search Results
2. Characterization of the Freshness of Pork by Near-Infrared Spectroscopy (NIRS) and Ensemble Learning.
- Author
-
Tan, Cheng, Chen, Hui, Zeng, Miao, and Xue, Zhi
- Subjects
-
*PERISHABLE foods, *INDEPENDENT sets, *CHEMOMETRICS, *PORK, *SPECTROMETRY
- Abstract
Pork is a perishable food and often needs to be stored in the refrigerator to maintain its quality as much as possible. Traditional methods for discriminating fresh and refrigerated pork are subjective, time-consuming, or destructive. The feasibility of using near-infrared (NIR) spectroscopy combined with chemometrics was explored to discriminate fresh and refrigerated pork. A total of 104 samples including 40 fresh and 64 refrigerated samples were first prepared and split into the training and test sets. Both partial least squares (PLS) and a subspace-based ensemble algorithm were used to establish classifiers. Also, both the number of learners and the size of subspace were optimized for ensemble modeling. On the independent test set, three measures, that is, the sensitivity, specificity, and total accuracy of the ensemble classifier were 95%, 93.8%, and 94.2%, respectively, each of which is superior to that of the PLS classifier. In addition, the influence of training set composition on classifier performance was also studied, indicating that ensemble modeling is robust. The results show that the NIR spectroscopy coupled with such an ensemble model can serve as a potential tool of discriminating fresh and refrigerated pork. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Multi-label Random Subspace Ensemble Classification.
- Author
-
Bi, Fan, Zhu, Jianan, and Feng, Yang
- Subjects
-
*ARTIFICIAL neural networks, *CLASSIFICATION algorithms, *K-nearest neighbor classification, *ALGORITHMS, *CLASSIFICATION
- Abstract
In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors and, finally, aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. We show the proposed algorithms compared favorably with the state-of-the-art classification algorithm including random forest and deep neural network, via extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Application of Big Data Technology in Internet Financial Risk Control.
- Author
-
Jingjing Chen, Guixian Tian, and Jia Wang
- Subjects
INFORMATION technology, DATA structures, FINANCIAL risk, LOSS control, SOFT sets
- Abstract
Big data technique is now a prevalent concentration for specialist and academics due to the development of information technology, cloud computing, and Internet of Things technologies. A fiscal risk controlling model using MSHDS-RS, was innovatively proposed to deal with the current situation of unreasonable design of features in risk controlling technique. This model's innovation is that the model utilizes a normalized sparse approach for optimizing feature fusion after drawing loan customer information sources' hard and soft features, thereby forming integrated features. Then, the feature subset derived from probability sampling is trained as a base classifier, and the results of the base classifier are fused and optimized using evidence reasoning rules. MSHDS-RS's accuracy improvement rate was about 3.0% and 3.6% higher than existing PMB-RS methods', respectively, by observing MSHDS-RS's operating results in different feature sets with soft and integrated feature indicators. Therefore, the proposed optimization fusion method is reliable and feasible. This research contributes to the control of internet financial risks and has certain value in making effective decisions on loan platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Novel Ensemble Models Based on the Split‐Point Sampling and Node Attribute Subsampling Classifier for Groundwater Potential Mapping.
- Author
-
Wang, Zhengtao, Le, TienDuy, Tian, Kunjun, Phong, Tran Van, Bien, Tran Xuan, and Pham, Binh Thai
- Subjects
-
*MACHINE learning, *GROUNDWATER, *WATER supply, *WATERSHEDS, *RAINFALL
- Abstract
Groundwater potential maps are crucial tools for effectively managing water resources, particularly in agriculturally focused countries such as Vietnam. However, creating these maps is a challenging task that requires reliable data and methods. In this study, we integrated the Split‐Point Sampling and Node Attribute Subsampling Classifier (SPAARC) with the Bagging (B), MultiBoostAB (MBAB), and Random Subspace (RSS) ensemble learning techniques and developed three ensemble models: B‐SPAARC, MBAB‐SPAARC, and RSS‐SPAARC. We selected 13 geoenvironmental factors based on their availability, relevance, and association with groundwater potential in the Sesan River basin of Vietnam. We assessed the models' performance using various metrics such as area under the curve (AUC), accuracy, sensitivity, specificity, and RMSE. The findings indicated that the ensemble models performed better than the single SPAARC model in mapping groundwater potential. The MBAB‐SPAARC model demonstrated the highest accuracy with an AUC value of 0.891, followed by B‐SPAARC (AUC = 0.844), RSS‐SPAARC (AUC = 0.871), and the single SPAARC (AUC = 0.853) models. The results also highlighted that elevation, rainfall, land use/cover, and altitude were the most significant factors for mapping groundwater potential in the Sesan River basin. The innovative ensemble models and reliable potential maps developed in this study assist water resource managers in planning water usage based on the benefits and costs for various users and in devising sustainable strategies for using, protecting, and managing groundwater. Key Points: Groundwater potential mapping was carried out at the Central Highlands of Vietnam. Hybrid Machine learning Models based on Naïve Bayes Tree were developed and used. Results showed that the proposal models are potential and accurate tools. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Random Subspace Sampling for Classification with Missing Data.
- Author
-
Cao, Yun-Hao and Wu, Jian-Xin
- Subjects
STATISTICAL sampling, CLASSIFICATION, SAMPLING methods, MISSING data (Statistics)
- Abstract
Many real-world datasets suffer from the unavoidable issue of missing values, and therefore classification with missing data has to be carefully handled since inadequate treatment of missing values will cause large errors. In this paper, we propose a random subspace sampling method, RSS, by sampling missing items from the corresponding feature histogram distributions in random subspaces, which is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on fixed imputed datasets. Instead, we design a dynamic training strategy where the filled values change dynamically by resampling during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy where we combine the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes our estimations more robust and accurate. The effectiveness of the proposed RSS method is well validated by experimental studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Learning from high dimensional data based on weighted feature importance in decision tree ensembles.
- Author
-
Pour, Nayiri Galestian and Shemehsavar, Soudabeh
- Subjects
-
*DECISION trees, *RANDOM forest algorithms, *IMAGE recognition (Computer vision), *COMPUTATIONAL biology, *MACHINE learning, *DATA analysis
- Abstract
Learning from high dimensional data has been utilized in various applications such as computational biology, image classification, and finance. Most classical machine learning algorithms fail to give accurate predictions in high dimensional settings due to the enormous feature space. In this article, we present a novel ensemble of classification trees based on weighted random subspaces that aims to adjust the distribution of selection probabilities. In the proposed algorithm base classifiers are built on random feature subspaces in which the probability that influential features will be selected for the next subspace, is updated by incorporating grouping information based on previous classifiers through a weighting function. As an interpretation tool, we show that variable importance measures computed by the new method can identify influential features efficiently. We provide theoretical reasoning for the different elements of the proposed method, and we evaluate the usefulness of the new method based on simulation studies and real data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Integrating Support Vector Machines with Different Ensemble Learners for Improving Streamflow Simulation in an Ungauged Watershed.
- Author
-
Takai Eddine, Yahi, Nadir, Marouf, Sabah, Sehtal, and Jaafari, Abolfazl
- Subjects
SUPPORT vector machines, STANDARD deviations, MEDITERRANEAN climate, WATERSHEDS
- Abstract
Streamflow simulation, particularly in ungauged watersheds, poses a significant challenge in surface water hydrology. The estimation of natural river and streamflow has been a research focus in recent years, with numerous strategies proposed. Hybrid ensemble soft computing models have proven their effectiveness in predicting flow rates. This study proposes a modeling approach that integrates a support vector machine (SVM) with several ensemble learning techniques, such as Bagging, Dagging, Random subspace, and Rotation Forest, to predict flow rates in natural rivers of a Mediterranean climate in Algeria. The gauging data of the hydrometric station "Amont des gorges" were used, and the following quantitative parameters were considered: flow, velocity, depth, width, and hydraulic radius. The proposed models were evaluated based on Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), and correlation coefficient (R). Our results indicated that the ensemble models outperformed the standalone SVM model. More specifically, the SVM-Dagging model performed the best, with RMSE = 6.58, NSE = 0.76 and R = 0.96, followed by SVM-Bagging (RMSE = 6.83, NSE = 0.75, and R = 0.96), SVM-RF (RMSE = 6.95, NSE = 0.74, and R = 0.95), SVM-RSS (RMSE = 8.34, NSE = 0.62, and R = 0.93), and the standalone SVM models (RMSE = 7.71, NSE = 0.68, and R = 0.88), respectively. These findings suggest that the proposed ensemble models are valuable tools for accurately forecasting stream and river flows, aiding planners and decision-makers. Accurate prediction of flow rates in natural rivers can enhance water resource planning, optimize resource allocation, and improve water management practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Modeling Flood Susceptible Areas Using Deep Learning Techniques with Random Subspace: A Case Study of the Mae Chan Basin in Thailand.
- Author
-
Surachai Chantee and Theeraya Mayakul
- Subjects
DEEP learning, ARTIFICIAL neural networks, MACHINE learning, GEOGRAPHIC information systems, WILCOXON signed-rank test, FEATURE selection, LAND cover
- Abstract
Flooding is a recurring global issue that leads to substantial loss of life and property damage. A crucial tool in managing and mitigating the impact of flooding is using flood hazard maps, which help identify high-risk areas and enable effective planning and management. This study presents a study on developing a predictive model to identify flood-prone areas in the Mae Chan Basin of Thailand using machine learning techniques, precisely the random sub-space ensemble method combined with a deep neural network (RS-DNN) and Nadam optimizer. The model was trained using 11 geographic information system (GIS) layers, including rainfall, elevation, slope, distance from the river, soil group, NDVI, road density, curvature, land use, flow accumulation, geology, and flood inventory data. Feature selection was carried out using the Gain Ratio method. The model was validated using accuracy, precision, ROC, and AUC metrics. Using the Wilcoxon signed-rank test, the effectiveness was compared to other machine learning algorithms, including random tree and support vector machines. The results showed that the RS-DNN model achieved a higher classification accuracy of 97% in both the training and testing datasets, compared to random tree (93%) and SVM (82%). The model's performance was also validated by its high AUC value of (0.99), compared to a random tree (0.93) and SVM (0.82) at a significance level of 0.05. In conclusion, the RS-DNN model is a highly accurate tool for identifying flood-prone areas, aiding in effective flood management and planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. A Comparative Study of Genetic Algorithm-Based Ensemble Models and Knowledge-Based Models for Wildfire Susceptibility Mapping.
- Author
-
Al-Shabeeb, Abdel Rahman, Hamdan, Ibraheem, Meimandi Parizi, Sedigheh, Al-Fugara, A'kif, Odat, Sana'a, Elkhrachy, Ismail, Hu, Tongxin, and Sammen, Saad Sh.
- Abstract
Wildfire susceptibility mapping (WSM) plays a crucial role in identifying areas with heightened vulnerability to forest fires, allowing for proactive measures in fire prevention, management, and resource allocation, ultimately leading to more effective fire control and mitigation strategies. This paper describes our undertaking to develop and compare the performance of two knowledge-based models, namely the analytic hierarchy process (AHP) and the technique for order performance by similarity to ideal solution (TOPSIS), as well as two novel genetic algorithm (GA)-based ensemble data-driven models: boosting and random subspace. The objective was to map susceptibility to forest fires in the Northern Mazar District in Jordan. The ensemble models were constructed using four well-known classifiers: decision tree (DT), support vector machine (SVM), k-nearest neighbors (kNN), and naive Bayes (NB) algorithms. This study utilized seventy forest fire locations and twelve influential factors to build and evaluate the models. To identify the optimal features for constructing the data-driven models, a GA-based wrapper method and four machine learning models were applied. During the validation phase, the area under the receiver operating characteristic curve (AUROCC) values for the single SVM, single NB, single DT, single kNN, GA-based boosting, GA-based random subspace, FR-AHP, and AHP-TOPSIS models were found to be 85.3%, 85.9%, 73.8%, 88.7%, 95.0%, 95.0%, 74.0%, and 65.4% respectively. The results indicated that the GA-based ensemble models outperformed both the single machine learning models and the knowledge-based techniques in terms of performance. The developed models in this study can be effectively utilized in various management and decision-making processes aimed at mitigating forest fire risks and enhancing fire control strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Forecasting Long-Series Daily Reference Evapotranspiration Based on Best Subset Regression and Machine Learning in Egypt.
- Author
-
Elbeltagi, Ahmed, Srivastava, Aman, Al-Saeedi, Abdullah Hassan, Raza, Ali, Abd-Elaty, Ismail, and El-Rawy, Mustafa
- Subjects
EVAPOTRANSPIRATION, WATER management, AGRICULTURAL water supply, MACHINE learning, TREE pruning, HYDROLOGIC cycle
- Abstract
The estimation of reference evapotranspiration (ETo), a crucial step in the hydrologic cycle, is essential for system design and management, including the balancing, planning, and scheduling of agricultural water supply and water resources. When climates vary from arid to semi-arid, and there are problems with a lack of meteorological data and a lack of future information on ETo, as is the case in Egypt, it is more important to estimate ETo precisely. To address this, the current study aimed to model ETo for Egypt's most important agricultural governorates (Al Buhayrah, Alexandria, Ismailiyah, and Minufiyah) using four machine learning (ML) algorithms: linear regression (LR), random subspace (RSS), additive regression (AR), and reduced error pruning tree (REPTree). The Climate Forecast System Reanalysis (CFSR) of the National Centers for Environmental Prediction (NCEP) was used to gather daily climate data variables from 1979 to 2014. The datasets were split into two sections: the training phase, i.e., 1979–2006, and the testing phase, i.e., 2007–2014. Maximum temperature (Tmax), minimum temperature (Tmin), and solar radiation (SR) were found to be the three input variables that had the most influence on the outcome of subset regression and sensitivity analysis. A comparative analysis of ML models revealed that REPTree outperformed competitors by achieving the best values for various performance matrices during the training and testing phases. The study's novelty lies in the use of REPTree to estimate and predict ETo, as this algorithm has not been commonly used for this purpose. Given the sparse attempts to use this model for such research, the remarkable accuracy of the REPTree model in predicting ETo highlighted the rarity of this study. In order to combat the effects of aridity through better water resource management, the study also cautions Egypt's authorities to concentrate their policymaking on climate adaptation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria.
- Author
-
Achite, Mohammed, Elshaboury, Nehal, Jehanzaib, Muhammad, Vishwakarma, Dinesh Kumar, Pham, Quoc Bao, Anh, Duong Tran, Abdelkader, Eslam Mohammed, and Elbeltagi, Ahmed
- Subjects
WATER management, DESERTIFICATION, DROUGHT forecasting, MACHINE performance, MACHINE learning, SUPPORT vector machines, SOIL degradation
- Abstract
Water resources, land and soil degradation, desertification, agricultural productivity, and food security are all adversely influenced by drought. The prediction of meteorological droughts using the standardized precipitation index (SPI) is crucial for water resource management. The modeling results for SPI at 3, 6, 9, and 12 months are based on five types of machine learning: support vector machine (SVM), additive regression, bagging, random subspace, and random forest. After training, testing, and cross-validation at five folds on sub-basin 1, the results concluded that SVM is the most effective model for predicting SPI for different months (3, 6, 9, and 12). Then, SVM, as the best model, was applied on sub-basin 2 for predicting SPI at different timescales and it achieved satisfactory outcomes. Its performance was validated on sub-basin 2 and satisfactory results were achieved. The suggested model performed better than the other models for estimating drought at sub-basins during the testing phase. The suggested model could be used to predict meteorological drought on several timescales, choose remedial measures for research basin, and assist in the management of sustainable water resources. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. Credal-Decision-Tree-Based Ensembles for Spatial Prediction of Landslides.
- Author
-
Gui, Jingyun, Pérez-Rey, Ignacio, Yao, Miao, Zhao, Fasuo, and Chen, Wei
- Subjects
LANDSLIDES, LANDSLIDE prediction, LANDSLIDE hazard analysis, RECEIVER operating characteristic curves, DECISION trees
- Abstract
Spatial landslide susceptibility assessment is a fundamental part of landslide risk management and land-use planning. The main objective of this study is to apply the Credal Decision Tree (CDT), adaptive boosting Credal Decision Tree (AdaCDT), and random subspace Credal Decision Tree (RSCDT) models to construct landslide susceptibility maps in Zhashui County, China. The observed 169 historical landslides were classified into two groups: 70% (118 landslides) for training and 30% (51 landslides) for validation. To compare and validate the performance of the three models, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were utilized. Specifically, the success rates of the CDT model, AdaCDT model, and RSCDT model were 0.788, 0.821, and 0.847, respectively, while the corresponding prediction rates were 0.771, 0.802, and 0.861, respectively. In sum, the two ensemble models can effectively improve the performance accuracy of an individual CDT model, and the RSCDT model was proven to be superior to the other two models. Therefore, ensemble models are capable of being novel and promising approaches for the spatial prediction and zonation of a certain region's landslide susceptibility. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Enhancing trustworthiness among iot network nodes with ensemble deep learning-based cyber attack detection.
- Author
-
Malathi, Dr S and Begum, S. Razool
- Subjects
-
*COMPUTER network traffic, *CONVOLUTIONAL neural networks, *TELECOMMUNICATION, *CYBER physical systems, *EXPERT systems
- Abstract
A lot of machine learning methods and expert systems are used in network intrusion detection automation. When different industrial control systems merge with the Internet of Things (IoT) environment, they become vulnerable to cyber-attacks in critical infrastructure situations requiring communication technologies. Conventional machine learning techniques used for network anomaly detection are ineffective due to the substantial amount of network traffic within important Cyber-Physical Systems (CPSs). In this manuscript, Cyberattack Identification Through Ensemble Deep Learning in an IoT environment is proposed. Initially, the input network traffic data are taken from the IoT-23 dataset. Then the network traffic data are preprocessed using Z-Score normalization to reduce any irrelevant or erroneous data from the input dataset. Then the relevant features are selected using the Gorilla Troops Optimization (GTO) algorithm. Afterwards, the selected features are fed into the ensemble classification model based on Random Space (RS), Random Tree (RT), Extreme Gradient Boosting (XGBoost), and Graph Convolutional Neural Network (GCNN). Among the several ensembling techniques, GCNN can achieve better performance. Python is used to accomplish the suggested technique. The performance of the proposed GCNN method provides 12.09%, 4.34%, and 3.21% higher accuracy than the other models like RS, RT and XGBoost respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Groundwater potentiality mapping using ensemble machine learning algorithms for sustainable groundwater management
- Author
-
Showmitra Kumar Sarkar, Swapan Talukdar, Atiqur Rahman, Shahfahad, and Sujit Kumar Roy
- Subjects
Groundwater potentiality, Data mining, GIS, Remote sensing, Random subspace, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Purpose – The present study aims to construct ensemble machine learning (EML) algorithms for groundwater potentiality mapping (GPM) in the Teesta River basin of Bangladesh, including random forest (RF) and random subspace (RSS). Design/methodology/approach – The RF and RSS models have been implemented for integrating 14 selected groundwater condition parametres with groundwater inventories for generating GPMs. The GPM were then validated using the empirical and bionormal receiver operating characteristics (ROC) curve. Findings – The very high (831–1200 km2) and high groundwater potential areas (521–680 km2) were predicted using EML algorithms. The RSS (AUC-0.892) model outperformed RF model based on ROC's area under curve (AUC). Originality/value – Two new EML models have been constructed for GPM. These findings will aid in proposing sustainable water resource management plans.
- Published
- 2022
- Full Text
- View/download PDF
16. Drought indicator analysis and forecasting using data driven models: case study in Jaisalmer, India.
- Author
-
Elbeltagi, Ahmed, Kumar, Manish, Kushwaha, N. L., Pande, Chaitanya B., Ditthakit, Pakorn, Vishwakarma, Dinesh Kumar, and Subeesh, A.
- Subjects
-
*DROUGHT management, *DROUGHT forecasting, *DROUGHTS, *WEATHER forecasting, *IRRIGATION scheduling, *DATA modeling, *TREE pruning, *RANDOM forest algorithms
- Abstract
Agricultural droughts are a prime concern for economies worldwide as they negatively impact the productivity of rain-fed crops, employment, and income per capita. In this study, Standard Precipitation Index (SPI) has been used to evaluate different drought indices for Rajasthan of India. In agricultural, hydrological, and meteorological applications such as irrigation scheduling, crop simulation, water budgeting, reservoir operations, and weather forecasting, the accurate estimation of the drought indices such as the Standardized Precipitation Index (SPI) plays an important role. Thus, the present study was conducted to examine the feasibility and effectiveness of the Random Subspace (RSS) model and its hybridization with the M5 Pruning tree (M5P), Random Forest (RF), and Random Tree (RT) to estimate the SPI at 3, 6, and 12 droughts during 2000–2019. Performances of RSS and hybridized algorithms were assessed and compared using performance indicators (i.e., MAE, RMSE, RAE, RRSE, and R²) and various graphical interpretations. Results indicated that the RSS-M5P provided the most accurate SPI prediction (MAE = 0.497, RMSE = 0.682, RAE = 81.88, RRSE = 87.22, and R² = 0.507 for SPI-3; MAE = 0.452, RMSE = 0.717, RAE = 69.76, RRSE = 85.24, and R² = 0.402 for SPI-6; and MAE = 0.294, RMSE = 0.377, RAE = 55.79, RRSE = 59.57, and R² = 0.783 for SPI-12) compared to the RSS alone, RSS-RF, and RSS-RT models for studying the drought situation in Jaisalmer, Rajasthan. The M5P algorithms have improved the performance of the RSS structure. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Evaluation of Data-driven Hybrid Machine Learning Algorithms for Modelling Daily Reference Evapotranspiration.
- Author
-
Kushwaha, Nand Lal, Rajput, Jitendra, Sena, D.R., Elbeltagi, Ahmed, Singh, D.K., and Mani, Indra
- Abstract
Copyright of Atmosphere -- Ocean (Taylor & Francis Ltd) is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
- Full Text
- View/download PDF
18. Breast cancer prediction from microRNA profiling using random subspace ensemble of LDA classifiers via Bayesian optimization.
- Author
-
Sharma, Sudhir Kumar, Vijayakumar, K., Kadam, Vinod J., and Williamson, Sheldon
- Subjects
FISHER discriminant analysis, BREAST cancer, NON-coding RNA, MICRORNA, TUMOR classification, CANCER diagnosis
- Abstract
Breast cancer rates are rising. It also remains the second principal reason for cancer-related mortality in females, and the mortality rate is also drastically rising. In recent years, MicroRNAs (miRNAs) have emerged to have a large potential as biomarkers because of their effective roles in human disease diagnosis (including breast cancer). miRNAs are small (short), regulatory, and evolutionarily conserved non-coding RNAs (ncRNAs) molecules (with a length of about 22 nucleotides) that are present in all eukaryotic cells. There are many studies available in the literature that focus on recent circulating miRNAs research, their relationships to human diseases, their role as a potential biomarker, etc. Therefore, in this study we used three key techniques for classification of breast cancer using miRNAs features: Linear Discriminant Analysis (LDA), Random Subspace Ensemble (RSE) and Bayesian Hyperparameter optimization (BHO). Linear Discriminant Analysis (LDA) is a simple but most practical and computationally attractive classification approach. Random Subspace Ensemble (RSE) is capable of producing a robust ensemble for classification. Some previous research showed applications of Bayesian optimization in many engineering optimization problems. Notably, it is a recently applied for hyperparameter tuning in various ensemble classifiers. Therefore, the potential application of the RSE of LDA classifiers (LDA as a base classifier) with BHO method to boost the predicting accuracy of breast cancer diagnosis using miRNAs profiling dataset, has been studied in this study. A publicly available dataset of serum miRNA profiles obtained from the GEO dataset (accession code GSE106817) has been applied for validation. A variety of output measurements were employed to determine the performances and efficiencies of the proposed model and other classifiers. The proposed approach exhibited successful overall performance. 
The results were directly compared with the individual LDA classifier and other established state-of-the-art classifiers. The outcomes point out that the approach is superior in terms of different efficiency indicators to the LDA and all established state-of-the-art models used in the study. Study simulations, outcomes, and mathematical investigations have illustrated that the technique presented is a practical and advantageous model for the classification of breast cancer from miRNA profiling. This model may usefully be employed in other cancer classifications from miRNA profiling. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Landslide Susceptibility Modeling Using Remote Sensing Data and Random SubSpace-Based Functional Tree Classifier.
- Author
-
Peng, Tao, Chen, Yunzhi, and Chen, Wei
- Subjects
-
*LANDSLIDES, *LANDSLIDE hazard analysis, *NORMALIZED difference vegetation index, *REMOTE sensing, *RECEIVER operating characteristic curves, *REGRESSION trees
- Abstract
In this study, a random subspace-based function tree (RSFT) was developed for landslide susceptibility modeling, and by comparing with a bagging-based function tree (BFT), classification regression tree (CART), and Naïve-Bayes tree (NBTree) Classifier, to judge the performance difference between the hybrid model and the single models. In the first step, according to the characteristics of the geological environment and previous literature, 12 landslide conditioning factors were selected, including aspect, slope, profile curvature, plan curvature, elevation, topographic wetness index (TWI), lithology, and normalized difference vegetation index (NDVI), land use, soil, distance to river and distance to the road. Secondly, 328 historical landslides were randomly divided into a training group and a validation group in a ratio of 70/30, and the important analysis of landslide points and conditional factors was carried out using the functional tree (FT) model. In the third step, all data are loaded into FT, RSFT, BFT, CART, and NBTree models for the generation of landslide susceptibility maps (LSM). Comparisons were made by the area under the receiver operating characteristic curve (AUC) to determine efficiency and effectiveness. According to the verification results, the five models selected this time all perform reasonably, but the RSFT model has the highest prediction rate (AUC = 0.838), which is better than the other three single machine learning models. The results of this study also demonstrated that the hybrid model generally improves the predictive power of the benchmark landslide susceptibility models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. An automated clinical decision support system for predicting cardiovascular disease using ensemble learning approach.
- Subjects
CLINICAL decision support systems ,CARDIOVASCULAR diseases ,HEART disease diagnosis ,OUTLIER detection ,DISEASE risk factors ,K-nearest neighbor classification - Abstract
With the vast advancements in the medical domain, earlier prediction of disease plays a substantial role in enhancing healthcare quality and assists in making better decisions during emergencies. Most of the existing research concentrates on modeling an automated prediction model for heart disease and the risk factors. Nevertheless, accurate classification is a vital challenge in heart disease diagnosis where the managing of high‐dimensional data increases the execution time of existing classifiers. In this paper, a new ensemble model has been proposed with the aid of random subspace and K‐nearest neighbor (RSS‐KNN) scheme for earlier prediction of heart disease. Primarily, the proposed scheme implements an isolation‐based outlier removal mechanism to eradicate the noise and outliers in the distributed data. Subsequently, the essential features are identified using RSS by varying the testing and training errors in the evaluation phase. The extracted features are then fed into KNN for the accurate classification of heart disease. Finally, an enhanced squirrel optimizer has been employed in the proposed scheme to obtain the global results which balance the exploration as well as exploitation issues and eliminate the over‐fitting problems. The simulation results show that the accuracy (without features) of the proposed ensemble RSS‐KNN scheme in the UCI ML dataset is 97.65%, accuracy (with features) is 98.56%, and specificity is 98.10% when compared with existing state‐of‐the‐art classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
21. Ensemble machine learning models based on Reduced Error Pruning Tree for prediction of rainfall-induced landslides
- Author
-
Binh Thai Pham, Abolfazl Jaafari, Trung Nguyen-Thoi, Tran Van Phong, Huu Duy Nguyen, Neelima Satyam, Md Masroor, Sufia Rehman, Haroon Sajjad, Mehebub Sahana, Hiep Van Le, and Indra Prakash
- Subjects
machine learning ,ensemble modeling ,bagging ,decorate ,random subspace ,Mathematical geography. Cartography ,GA1-1776 - Abstract
In this paper, we developed highly accurate ensemble machine learning models integrating Reduced Error Pruning Tree (REPT) as a base classifier with the Bagging (B), Decorate (D), and Random Subspace (RSS) ensemble learning techniques for spatial prediction of rainfall-induced landslides in the Uttarkashi district, located in the Himalayan range, India. To do so, a total of 103 historical landslide events were linked to twelve conditioning factors for generating training and validation datasets. Root Mean Square Error (RMSE) and Area Under the receiver operating characteristic Curve (AUC) were used to evaluate the training and validation performances of the models. The results showed that the single REPT model and its derived ensembles provided a satisfactory accuracy for the prediction of landslides. The D-REPT model with RMSE = 0.351 and AUC = 0.907 was identified as the most accurate model, followed by RSS-REPT (RMSE = 0.353 and AUC = 0.898), B-REPT (RMSE = 0.396 and AUC = 0.876), and the single REPT model (RMSE = 0.398 and AUC = 0.836), respectively. The prominent ensemble models proposed and verified in this study provide engineers and modelers with insights for development of more advanced predictive models for different landslide-susceptible areas around the world.
- Published
- 2021
22. Application of an ensemble learning model based on random subspace and a J48 decision tree for landslide susceptibility mapping: a case study for Qingchuan, Sichuan, China.
- Author
-
Li, Yangchun, Lin, Feikai, Luo, Xiangang, Zhu, Shuang, Li, Jiang, Xu, Zhanya, Liu, Xiuwei, Luo, Shungen, Huo, Guangjie, Peng, Liangsheng, and Feng, Haiping
- Subjects
LANDSLIDE hazard analysis ,LANDSLIDES ,DECISION trees ,ARTIFICIAL neural networks - Abstract
Landslides are a serious natural hazard in the world. A map of landslide susceptibility can help to effectively reduce losses. In this paper, a hybrid ensemble technique based on random subspace (RS) and a J48 decision tree named RS–J48T was proposed for landslide susceptibility mapping. This model could enhance the effect of a single classifier significantly and solve the problem of overfitting. Qingchuan County, Sichuan (China) was taken as a study area. A geospatial database which consisted of 640 landslide locations and 12 factors was constructed for this study. The J48 decision tree, artificial neural network (ANN), and other ensemble techniques like AdaBoost and Bagging, were selected for comparison. Receiver operating curves and some statistical indices were used for model validation. The results showed that the RS–J48T model had the better fitting capability (AUC = 0.875), and the best prediction capability (AUC = 0.769) compared to other models. Overall, the novel hybrid model could be a promising way for generating landslide susceptibility maps for other prone areas. [ABSTRACT FROM AUTHOR]
- Published
- 2022
23. An Efficient Classification of MRI Brain Images
- Author
-
Muhammad Assam, Hira Kanwal, Umar Farooq, Said Khalid Shah, Arif Mehmood, and Gyu Sang Choi
- Subjects
Color moments (CMs) ,feed forward artificial neural network (FF-ANN) ,random subspace ,random forest ,baysnet ,principle component analysis (PCA) ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The unprecedented improvements in computing capabilities and the introduction of advanced techniques for the analysis, interpretation, processing, and visualization of images have greatly diversified the domain of medical sciences and resulted in the field of medical imaging. Magnetic Resonance Imaging (MRI), an advanced imaging technique, is capable of producing high quality images of the human body including the brain for diagnosis purposes. This paper proposes a simple but efficient solution for the classification of MRI brain images into normal images and abnormal images containing disorders and injuries. It uses images with brain tumor, acute stroke and Alzheimer's disease, besides normal images, from the public dataset developed by Harvard Medical School, for evaluation purposes. The proposed model is a four-step process, in which the steps are named: 1) Pre-processing, 2) Features Extraction, 3) Features Reduction, and 4) Classification. Median filter, being one of the best algorithms, is used for the removal of noise such as salt and pepper, and unwanted components such as scalp and skull, in the pre-processing step. During this stage, the images are converted from gray scale to colored images for further processing. In the second step, it uses the Discrete Wavelet Transform (DWT) technique to extract different features from the images. In the third stage, Color Moments (CMs) are used to reduce the number of features and get an optimal set of characteristics. Images with the optimal set of features are passed to different classifiers for the classification of images. The Feed Forward - ANN (FF-ANN), an individual classifier, which was given a 65% to 35% split ratio for training and testing, and hybrid classifiers called Random Subspace with Random Forest (RSwithRF) and Random Subspace with Bayesian Network (RSwithBN), which used the 10-fold cross validation technique, resulted in 95.83%, 97.14% and 95.71% accurate classification, in corresponding order. These promising results show that the proposed method is robust and efficient in comparison with existing classification methods in terms of accuracy with a smaller number of optimal features.
- Published
- 2021
24. Rotation forest of random subspace models.
- Author
-
Alexandropoulos, Stamatios-Aggelos N., Aridas, Christos K., Kotsiantis, Sotiris B., Gravvanis, George A., and Vrahatis, Michael N.
- Subjects
ROTATIONAL motion ,DATA mining - Abstract
During the last decade, a variety of ensembles methods has been developed. All known and widely used methods of this category produce and combine different learners utilizing the same algorithm as the basic classifiers. In the present study, we use two well-known approaches, namely, Rotation Forest and Random Subspace, in order to increase the effectiveness of a single learning algorithm. We have conducted experiments with other well-known ensemble methods, with 25 sub-classifiers, in order to test the proposed model. The experimental study that we have conducted is based on 35 various datasets. According to the Friedman test, the Rotation Forest of Random Subspace C4.5 (RFRS C4.5) and the PART (RFRS PART) algorithms exhibit the best scores in our resulting ranking. Our results have shown that the proposed method exhibits competitive performance and better accuracy in most of the cases. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. Genetically Optimized Ensemble Classifiers for Multiclass Student Performance Prediction.
- Author
-
Begum, Safira and Padmannavar, Sunita S.
- Subjects
DATA mining ,SCHOOL dropouts ,MACHINE learning ,GENETIC algorithms ,GENETIC models - Abstract
The knowledge obtained from data can be useful for the improvement of education systems, giving rise to a research space called Educational Data Mining (EDM). EDM covers the development of methods to explore information collected from educational environments, allowing to understand students more effectively and adequately, providing better educational benefits to them. Machine learning (ML) technologies are growing considerably in recent years. The field of data mining in education provides researchers and educators with metrics of success, failure, dropout, and more, allowing students to guess. The main reason for dropping out of school is not studying. Several researchers have proposed various educational data mining techniques to predict student performance and analyzed the techniques found in educational datasets. This paper proposes a student predictive model with the use of ensemble classifiers. Initially data is pre-processed and an analysis of the correlation between the entrance attributes was carried out to identify the existence of possible redundancies between them, resulting from a very high positive correlation. The filtered attribute is trained and tested with Boosting, Bagging and Random subspace classifiers. Further to improve the accuracy of predictive model genetic algorithm is applied on three classifiers. Genetic Algorithm is an approach used to find optimized solution to search problems and it intend to increase the probability of solving the problem. The process of optimization involves selection of the best option from the available set of options to achieve the desired goal. Selection is done such that the efficiency can be maximized and error can be minimized. An analysis of the correlation between the entrance attributes was carried out to identify the existence of possible redundancies between them, resulting from a very high positive correlation. 
There is significant improvement in classifier accuracy when tested on the mathematics and Portuguese data, i.e., 3% and 11%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
26. Digital Watermark Extraction Using RS-KNN and RS-LDA with LWT and Statistical Features
- Author
-
Jaiswal, Sushma and Pandey, Manoj Kumar
- Published
- 2023
27. Modeling groundwater potential using novel GIS-based machine-learning ensemble techniques
- Author
-
Alireza Arabameri, Subodh Chandra Pal, Fatemeh Rezaie, Omid Asadi Nalivan, Indrajit Chowdhuri, Asish Saha, Saro Lee, and Hossein Moayedi
- Subjects
Groundwater ,Machine learning ,Random subspace ,Ensemble models ,RS-GIS ,Iran ,Physical geography ,GB3-5030 ,Geology ,QE1-996.5 - Abstract
Study region: The present study has been carried out in the Tabriz River basin (5397 km²) in north-western Iran. Elevations vary from 1274 to 3678 m above sea level, and slope angles range from 0 to 150.9 %. The average annual minimum and maximum temperatures are 2 °C and 12 °C, respectively. The average annual rainfall ranges from 243 to 641 mm, and the northern and southern parts of the basin receive the highest amounts. Study focus: In this study, we mapped the groundwater potential (GWP) with a new hybrid model combining random subspace (RS) with the multilayer perceptron (MLP), naïve Bayes tree (NBTree), and classification and regression tree (CART) algorithms. A total of 205 spring locations were collected by integrating field surveys with data from Iran Water Resources Management, and divided 70:30 into training and validation sets. Fourteen groundwater conditioning factors (GWCFs) were used as independent model inputs. Statistics such as receiver operating characteristic (ROC) and five others were used to evaluate the performance of the models. New hydrological insights for the region: The results show that all models performed well for GWP mapping (AUC > 0.8). The hybrid MLP-RS model achieved high validation scores (AUC = 0.935). Analysis of the relative importance of the GWCFs revealed that slope, elevation, TRI and HAND are the most important predictors of groundwater presence. This study demonstrates that hybrid ensemble models can support sustainable management of groundwater resources.
- Published
- 2021
28. Landslide susceptibility mapping using an ensemble model of Bagging scheme and random subspace–based naïve Bayes tree in Zigui County of the Three Gorges Reservoir Area, China.
- Author
-
Hu, Xudong, Huang, Cheng, Mei, Hongbo, and Zhang, Han
- Subjects
- *
LANDSLIDES , *LANDSLIDE hazard analysis , *LANDSLIDE prediction , *GORGES , *SUPPORT vector machines , *PEARSON correlation (Statistics) , *RANDOM forest algorithms - Abstract
A novel machine learning ensemble model that is a hybridization of Bagging and random subspace–based naïve Bayes tree (RSNBtree), named BRSNBtree, was used to prepare a landslide susceptibility map for Zigui County of the Three Gorges Reservoir Area, China. The proposed method is implemented by using the Bagging scheme to integrate the base-level RSNBtree model. To predict landslide susceptibility for the study area, a spatial database consisting of 807 landslides and 11 conditioning factors has been prepared. Evaluation of conditioning factors was conducted using the Pearson correlation coefficient and Relief-F method. The results indicate that all factors except the topographic wetness index can be accepted as modeling inputs. Particularly, the distance to rivers is the most important factor in landslide susceptibility prediction. The performance of landslide models was evaluated using statistical indices and areas under the receiver operating characteristic curve (AUC). The support vector machines (SVM) and random forest (RF) were adopted for the comparison with our methods. Results show that the BRSNBtree (AUC = 0.968) achieves the highest prediction performance, which successfully refines the RSNBtree (AUC = 0.938) and outperforms the RF (AUC = 0.949) and SVM (AUC = 0.895). Therefore, the proposed BRSNBtree presents advantages in targeting landslide susceptible areas and provides a promising method for landslide susceptibility assessment. The developed susceptibility maps could facilitate effective landslide risk management for this landslide-prone area. [ABSTRACT FROM AUTHOR]
- Published
- 2021
29. Ensemble machine learning models based on Reduced Error Pruning Tree for prediction of rainfall-induced landslides.
- Author
-
Pham, Binh Thai, Jaafari, Abolfazl, Nguyen-Thoi, Trung, Van Phong, Tran, Nguyen, Huu Duy, Satyam, Neelima, Masroor, Md, Rehman, Sufia, Sajjad, Haroon, Sahana, Mehebub, Van Le, Hiep, and Prakash, Indra
- Subjects
LANDSLIDE prediction ,TREE pruning ,MACHINE learning ,LANDSLIDE hazard analysis ,RECEIVER operating characteristic curves ,LANDSLIDES ,STANDARD deviations - Abstract
In this paper, we developed highly accurate ensemble machine learning models integrating Reduced Error Pruning Tree (REPT) as a base classifier with the Bagging (B), Decorate (D), and Random Subspace (RSS) ensemble learning techniques for spatial prediction of rainfall-induced landslides in the Uttarkashi district, located in the Himalayan range, India. To do so, a total of 103 historical landslide events were linked to twelve conditioning factors for generating training and validation datasets. Root Mean Square Error (RMSE) and Area Under the receiver operating characteristic Curve (AUC) were used to evaluate the training and validation performances of the models. The results showed that the single REPT model and its derived ensembles provided a satisfactory accuracy for the prediction of landslides. The D-REPT model with RMSE = 0.351 and AUC = 0.907 was identified as the most accurate model, followed by RSS-REPT (RMSE = 0.353 and AUC = 0.898), B-REPT (RMSE = 0.396 and AUC = 0.876), and the single REPT model (RMSE = 0.398 and AUC = 0.836), respectively. The prominent ensemble models proposed and verified in this study provide engineers and modelers with insights for development of more advanced predictive models for different landslide-susceptible areas around the world. [ABSTRACT FROM AUTHOR]
- Published
- 2021
30. Anti-cross validation technique for constructing and boosting random subspace neural network ensembles for hyperspectral image classification.
- Author
-
Eeti, Laxmi Narayana and Buddhiraju, Krishna Mohan
- Subjects
- *
FEATURE selection , *DATA mining , *CLASSIFICATION - Abstract
Achieving high classification accuracy is vital in reliable information extraction from images. Single classifiers and existing ensemble methods suffer from data dimensionality, insufficient ground truth information and lack in defining optimal feature selection. This article presents a novel idea for constructing component classifiers that boost random subspace ensemble method in improving its classification performance. It is achieved through sub-optimal training of component classifiers through interference in training process during validation error evaluation. The new approach allows to enforce different class errors among component classifiers, besides improving individual class accuracy. This article demonstrates effectiveness of the anti-cross validation approach using three classical hyperspectral Image (HSI) datasets with significant improvement in classification accuracies from 3 to 10% with the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
31. Random Subspace Ensemble Learning for Functional Near-Infrared Spectroscopy Brain-Computer Interfaces
- Author
-
Jaeyoung Shin
- Subjects
brain-computer interface ,ensemble learning ,functional near-infrared spectroscopy ,linear discriminant analysis ,random subspace ,support vector machine ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
The feasibility of the random subspace ensemble learning method was explored to improve the performance of functional near-infrared spectroscopy-based brain-computer interfaces (fNIRS-BCIs). Feature vectors have been constructed using the temporal characteristics of concentration changes in fNIRS chromophores such as mean, slope, and variance to implement fNIRS-BCIs systems. The mean and slope, which are the most popular features in fNIRS-BCIs, were adopted. Linear support vector machine and linear discriminant analysis were employed, respectively, as a single strong learner and multiple weak learners. All features in every channel and available time window were employed to train the strong learner, and the feature subsets were selected at random to train multiple weak learners. It was determined that random subspace ensemble learning is beneficial to enhance the performance of fNIRS-BCIs.
- Published
- 2020
32. Monthly suspended sediment load prediction using artificial intelligence: testing of a new random subspace method.
- Author
-
Nhu, Viet-Ha, Khosravi, Khabat, Cooper, James R., Karimi, Mahshid, Kisi, Ozgur, Pham, Binh Thai, and Lyu, Zongjie
- Subjects
- *
SUSPENDED sediments , *INTELLIGENCE tests , *FORECASTING , *ARTIFICIAL intelligence , *RADIAL basis functions - Abstract
The predictive capability of a new artificial intelligence method, random subspace (RS), for the prediction of suspended sediment load in rivers was compared with commonly used methods: random forest (RF) and two support vector machine (SVM) models using a radial basis function kernel (SVM-RBF) and a normalized polynomial kernel (SVM-NPK). Using river discharge, rainfall and river stage data from the Haraz River, Iran, the results revealed: (a) the RS model provided a superior predictive accuracy (NSE = 0.83) to SVM-RBF (NSE = 0.80), SVM-NPK (NSE = 0.78) and RF (NSE = 0.68), corresponding to very good, good, satisfactory and unsatisfactory accuracies in load prediction; (b) the RBF kernel outperformed the NPK kernel; (c) the predictive capability was most sensitive to gamma and epsilon in SVM models, maximum depth of a tree and the number of features in RF models, classifier type, number of trees and subspace size in RS models; and (d) suspended sediment loads were most closely correlated with river discharge (PCC = 0.76). Overall, the results show that RS models have great potential in data poor watersheds, such as that studied here, to produce strong predictions of suspended load based on monthly records of river discharge, rainfall depth and river stage alone. [ABSTRACT FROM AUTHOR]
- Published
- 2020
33. Random Subspace Ensemble Learning for Functional Near-Infrared Spectroscopy Brain-Computer Interfaces.
- Author
-
Shin, Jaeyoung
- Subjects
BRAIN-computer interfaces ,FISHER discriminant analysis ,SUPPORT vector machines ,SPECTROMETRY - Abstract
The feasibility of the random subspace ensemble learning method was explored to improve the performance of functional near-infrared spectroscopy-based brain-computer interfaces (fNIRS-BCIs). Feature vectors have been constructed using the temporal characteristics of concentration changes in fNIRS chromophores such as mean, slope, and variance to implement fNIRS-BCIs systems. The mean and slope, which are the most popular features in fNIRS-BCIs, were adopted. Linear support vector machine and linear discriminant analysis were employed, respectively, as a single strong learner and multiple weak learners. All features in every channel and available time window were employed to train the strong learner, and the feature subsets were selected at random to train multiple weak learners. It was determined that random subspace ensemble learning is beneficial to enhance the performance of fNIRS-BCIs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
34. A random subspace based conic functions ensemble classifier.
- Author
-
ÇİMEN, Emre
- Subjects
- *
CLASSIFICATION algorithms , *HIGH-dimensional model representation , *ALGORITHMS , *SUPPORT vector machines , *LINEAR programming , *CONIC sections - Abstract
Classifiers overfit when the ratio of data dimensionality to the number of samples in a dataset is high. This problem makes a classification model unreliable. When overfitting occurs, one can achieve high accuracy in training; however, test accuracy is significantly lower than training accuracy. The random subspace method is a practical approach to overcome the overfitting problem. In random subspace methods, the classification algorithm selects a random subset of the features and trains a classifier function with the selected features. The classification algorithm repeats the process multiple times, and eventually obtains an ensemble of classifier functions. Conic function based classifiers achieve high performance in the literature; however, these classifiers cannot overcome the overfitting problem when the ratio of data dimensionality to the number of samples is high. The proposed method fills this gap in the literature on conic function classifiers. In this study, we combine the random subspace method and a novel conic function based classifier algorithm. We present the computational results by comparing the new approach with a wide range of models in the literature. The proposed method achieves better results than the previous implementations of conic function based classifiers and can compete with the other well-known methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
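The random subspace procedure summarized in the abstract of entry 34 (pick a random subset of features, train a classifier on it, repeat, then combine the resulting classifiers) can be sketched as follows. This is a minimal illustration only, not the conic-function method of that paper: the nearest-centroid base classifier, the majority-vote aggregation, the toy data, and all function and parameter names (`fit_random_subspace`, `n_learners`, `subspace_size`, and so on) are assumptions made for the example.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Train a nearest-centroid classifier restricted to the given feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features, row):
    """Return the label whose centroid is closest (squared Euclidean) in the subspace."""
    def dist(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(sorted(centroids), key=dist)


def fit_random_subspace(X, y, n_learners=25, subspace_size=2, seed=0):
    """Train n_learners base classifiers, each on its own random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        # Sample feature indices without replacement for this learner.
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_random_subspace(ensemble, row):
    """Aggregate the base classifiers by majority vote (an assumed combiner)."""
    votes = Counter(predict_centroids(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 samples, 4 features, two well-separated classes.
X = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
     [9, 9, 9, 9], [10, 9, 10, 9], [9, 10, 9, 10]]
y = ["a", "a", "a", "b", "b", "b"]

ensemble = fit_random_subspace(X, y, n_learners=25, subspace_size=2)
label_low = predict_random_subspace(ensemble, [0.5, 0.5, 0.5, 0.5])
label_high = predict_random_subspace(ensemble, [9.5, 9.5, 9.5, 9.5])
```

The number of learners and the subspace size are the two quantities that several abstracts in this listing report tuning; here they are exposed as plain arguments so they can be varied.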
35. Soft Computing Ensemble Models Based on Logistic Regression for Groundwater Potential Mapping.
- Author
-
Nguyen, Phong Tung, Ha, Duong Hai, Avand, Mohammadtaghi, Jaafari, Abolfazl, Nguyen, Huu Duy, Al-Ansari, Nadhir, Van Phong, Tran, Sharma, Rohit, Kumar, Raghvendra, Le, Hiep Van, Ho, Lanh Si, Prakash, Indra, and Pham, Binh Thai
- Subjects
SOFT computing ,LOGISTIC regression analysis ,STANDARD deviations ,RECEIVER operating characteristic curves ,GROUNDWATER management - Abstract
Groundwater potential maps are one of the most important tools for the management of groundwater storage resources. In this study, we proposed four ensemble soft computing models based on logistic regression (LR) combined with the dagging (DLR), bagging (BLR), random subspace (RSSLR), and cascade generalization (CGLR) ensemble techniques for groundwater potential mapping in Dak Lak Province, Vietnam. A suite of well yield data and twelve geo-environmental factors (aspect, elevation, slope, curvature, Sediment Transport Index, Topographic Wetness Index, flow direction, rainfall, river density, soil, land use, and geology) were used for generating the training and validation datasets required for the building and validation of the models. Based on the area under the receiver operating characteristic curve (AUC) and several other validation methods (negative predictive value, positive predictive value, root mean square error, accuracy, sensitivity, specificity, and Kappa), it was revealed that all four ensemble learning techniques were successful in enhancing the validation performance of the base LR model. The ensemble DLR model (AUC = 0.77) was the most successful model in identifying the groundwater potential zones in the study area, followed by the RSSLR (AUC = 0.744), BLR (AUC = 0.735), CGLR (AUC = 0.715), and single LR model (AUC = 0.71), respectively. The models developed in this study and the resulting potential maps can assist decision-makers in the development of effective adaptive groundwater management plans. [ABSTRACT FROM AUTHOR]
- Published
- 2020
36. Data Intelligence Model and Meta-Heuristic Algorithms-Based Pan Evaporation Modelling in Two Different Agro-Climatic Zones: A Case Study from Northern India
- Author
-
Nand Lal Kushwaha, Jitendra Rajput, Ahmed Elbeltagi, Ashraf Y. Elnaggar, Dipaka Ranjan Sena, Dinesh Kumar Vishwakarma, Indra Mani, and Enas E. Hussein
- Subjects
support vector machine ,random tree ,random subspace ,sensitivity analysis ,Meteorology. Climatology ,QC851-999 - Abstract
Precise quantification of evaporation has a vital role in effective crop modelling, irrigation scheduling, and agricultural water management. In recent years, the data-driven models using meta-heuristics algorithms have attracted the attention of researchers worldwide. In this investigation, we have examined the performance of models employing four meta-heuristic algorithms, namely, support vector machine (SVM), random tree (RT), reduced error pruning tree (REPTree), and random subspace (RSS) for simulating daily pan evaporation (EPd) at two different locations in north India representing semi-arid climate (New Delhi) and sub-humid climate (Ludhiana). The most suitable combinations of meteorological input variables as covariates to estimate EPd were ascertained through the subset regression technique followed by sensitivity analyses. The statistical indicators such as root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), Willmott index (WI), and correlation coefficient (r) followed by graphical interpretations, were utilized for model evaluation. The SVM algorithm successfully performed in reconstructing the EPd time series with acceptable statistical criteria (i.e., NSE = 0.937, 0.795; WI = 0.984, 0.943; r = 0.968, 0.902; MAE = 0.055, 0.993 mm/day; and RMSE = 0.092, 1.317 mm/day) compared with the other applied algorithms during the testing phase at the New Delhi and Ludhiana stations, respectively. This study also demonstrated and discussed the potential of meta-heuristic algorithms for producing reasonable estimates of daily evaporation using minimal meteorological input variables with applicability of the best candidate model vetted in two diverse agro-climatic settings.
- Published
- 2021
37. An evolutionary classifier for steel surface defects with small sample set
- Author
-
Mang Xiao, Mingming Jiang, Guangyao Li, Li Xie, and Li Yi
- Subjects
Surface defect ,Support vector machine ,Random subspace ,Bayes classifier ,Electronics ,TK7800-8360 - Abstract
Nowadays, surface defect detection systems for steel strip have replaced traditional artificial inspection systems, and automatic defect detection systems offer good performance when the sample set is large and the model is stable. However, the trained model does not work well when a new production line is initiated with different equipment, processes, or detection devices. These variables make just tiny changes to the real-world model but have a significant impact on the classification result. To overcome these problems, we propose an evolutionary classifier with a Bayes kernel (BYEC) that can be adjusted with a small sample set to better adapt the model for a new production line. First, abundant features were introduced to cover detailed information about the defects. Second, we constructed a series of support vector machines (SVMs) with a random subspace of the features. Then, a Bayes classifier was trained as an evolutionary kernel fused with the results from the sub-SVM to form an integrated classifier. Finally, we proposed a method to adjust the Bayes evolutionary kernel with a small sample set. We compared the performance of this method to various algorithms; experimental results demonstrate that the proposed method can be adjusted with a small sample set to fit the changed model. Experimental evaluations were conducted to demonstrate the robustness, low requirement for samples, and adaptiveness of the proposed method.
- Published
- 2017
38. Classification via semi-supervised multi-random subspace sparse representation.
- Author
-
Zhao, Zhuang, Bai, Lianfa, Zhang, Yi, and Han, Jing
- Abstract
In this paper, we combine the random subspace and multi-view together and obtain a novel approach named semi-supervised multi-random subspace sparse representation (SSM-RSSR). In the proposed SSM-RSSR, firstly, we use subspace sparse representation to obtain the graph to characterize the distribution of samples in each subspace. Then, we fuse these graphs in the viewpoint of multi-view through an alternating optimization method and obtain the optimal coefficients of all random subspaces. Finally, we train a linear classifier under the framework of manifold regularization (MR) to obtain the final classified results. Through fusing the random subspaces, the proposed SSM-RSSR can obtain better and more stable results in a wider range of the dimension of random subspace and the number of random subspaces. Extensive experimental results on the several UCI datasets and face image datasets have demonstrated the effectiveness of the proposed SSM-RSSR. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
39. Deep Learning Ensemble for Hyperspectral Image Classification.
- Author
-
Chen, Yushi, Wang, Ying, Gu, Yanfeng, He, Xin, Ghamisi, Pedram, and Jia, Xiuping
- Abstract
Deep learning models, especially deep convolutional neural networks (CNNs), have been intensively investigated for hyperspectral image (HSI) classification due to their powerful feature extraction ability. Likewise, ensemble-based learning systems have demonstrated high potential to effectively perform supervised classification. In order to boost the performance of deep learning-based HSI classification, a deep learning ensemble framework is proposed here, which is loosely based on the integration of a deep learning model and random subspace-based ensemble learning. Specifically, two deep learning ensemble-based classification methods (i.e., CNN ensemble and deep residual network ensemble) are proposed. CNNs or deep residual networks are used as individual classifiers, and random subspaces diversify the ensemble system in a simple yet effective manner. Moreover, to further improve the classification accuracy, transfer learning is investigated in this study to transfer the learnt weights from one individual classifier to another (i.e., CNNs). This mechanism speeds up the learning stage. Experimental results with widely used hyperspectral datasets indicate that the proposed deep learning ensemble system provides competitive results compared with state-of-the-art methods in terms of classification accuracy. The combination of deep learning and ensemble learning provides significant potential for reliable HSI classification. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
40. Enhanced Predictive Models for Construction Costs: A Case Study of Turkish Mass Housing Sector.
- Author
-
Ugur, Latif Onur, Kanit, Recep, Erdal, Hamit, Namli, Ersin, Erdal, Halil Ibrahim, Baykan, Umut Naci, and Erdal, Mursel
- Subjects
DOMESTIC architecture ,HOUSE construction ,CONSTRUCTION costs ,CONSTRUCTION projects ,MACHINE learning ,ECONOMIC competition - Abstract
Cost analysis of a construction project is one of the most vital problems in planning. Due to its nature, the construction sector is an area of strong competition, and estimation work is of vital importance. In recent years the Turkish Republic has started a serious urban regeneration movement in parallel with its economic development. This study is based on the drawings and quantities of 63 detached multi-story reinforced concrete housing unit projects of the Housing Development Administration (TOKI) and the Turkey Residential Building Cooperative Union (TURKKONUT). TOKI is a public company whose project designs have been applied to 282 separate projects and are being applied to a further 266. TURKKONUT, on the other hand, is a union of 1347 private building cooperatives and has completed 200,000 residential buildings. The main objective of this study is to improve the estimation accuracy of individual machine learning techniques, namely multi-layer perceptron and classification and regression trees, and to compare the performance of two machine learning meta-algorithms (i.e., bagging and random subspace) on a real-world construction cost estimation problem. The study shows that the estimation accuracy of the ensemble models is better than that of their individual base learners, and that ensemble models may improve individual machine learning models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
41. Random subspace-based ensemble modeling for near-infrared spectral diagnosis of colorectal cancer.
- Author
-
Chen, Hui, Lin, Zan, and Tan, Chao
- Subjects
- *
NEAR infrared spectroscopy , *NEAR infrared radiation , *COLON cancer , *GASTROINTESTINAL cancer , *DIAGNOSIS , *TISSUES - Abstract
The feasibility of using near-infrared (NIR) spectroscopy coupled with a classifier ensemble for improving the diagnosis of colorectal cancer was explored. A total of 157 NIR spectra from patients were recorded and partitioned into a training set and a test set. Four algorithms, i.e., Adaboost.M1, Totalboost, and LPboost using decision trees as weak learners, together with the random subspace method (RSM) using linear discriminant classifiers (LDA) as weak learners, were used to construct diagnostic models. Key parameters such as the size of the ensemble, i.e., the number of weak learners, and the size of each subspace in RSM were optimized. The results indicated that, in terms of generalization ability, the RSM-based classifier outperforms all other classifiers with only 40 members of 30 features each. Model population analysis (MPA) was carried out on the basis of 200 different training sets. The average sensitivity and specificity of the RSM classifier were 97.4% and 95.6%, respectively. This indicates that the NIR technique combined with the RSM algorithm can serve as a potential means for automatic identification of colorectal tissues. Highlights: • Subspace ensemble is used as the basic tool of NIR spectral diagnosis. • Model population analysis was used for robustness analysis. • Such a method has potential for objective clinical diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
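As a rough illustration of the random subspace method described in entry 41, the sketch below trains each ensemble member on a randomly chosen subset of feature indices and combines the members by majority vote. It is a minimal sketch, not the authors' implementation: a nearest-centroid rule stands in for the LDA weak learner, and the function names (fit_rsm, predict_rsm), the synthetic data, and the seeds are assumptions made for illustration. Only the defaults of 40 members with 30 features each come from the abstract.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Per-class mean vectors restricted to the chosen feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Assign the class whose centroid is nearest within the subspace."""
    def dist(center):
        return sum((row[f] - m) ** 2 for f, m in zip(features, center))
    return min(centroids, key=lambda c: dist(centroids[c]))


def fit_rsm(X, y, n_members=40, subspace_size=30, seed=0):
    """Train n_members weak learners, each on its own random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_rsm(ensemble, row):
    """Majority vote over the members' predictions."""
    votes = Counter(
        predict_centroid(centroids, features, row)
        for features, centroids in ensemble
    )
    return votes.most_common(1)[0][0]


# Synthetic stand-in for spectra: 20 samples per class, 100 features each
# (hypothetical data, not taken from the study).
demo_rng = random.Random(1)
y = [label for label in (0, 1) for _ in range(20)]
X = [[demo_rng.gauss(label, 1.0) for _ in range(100)] for label in y]
ensemble = fit_rsm(X, y)
print(predict_rsm(ensemble, [1.0] * 100))
```

The ensemble size and subspace size are the two parameters the abstract reports tuning; in this sketch they are plain keyword arguments that could be searched over a grid.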
42. Geographical origin identification of ginseng using near-infrared spectroscopy coupled with subspace-based ensemble classifiers.
- Author
-
Chen, Hui, Tan, Chao, and Lin, Zan
- Subjects
- *
NEAR infrared spectroscopy , *GINSENG , *HERBAL medicine , *TRADITIONAL medicine , *SENSITIVITY & specificity (Statistics) , *PREDICTION models - Abstract
Highlights: • Rapid identification of geographical origin of ginseng. • Random subspace ensemble used as the modeling tool. • Simple strategy produces good results and can be generalized. Abstract: Ginseng is a well-known traditional herbal medicine, and the ginseng available on the market may not actually have been produced in the place claimed. Traditional methods of identifying the geographical origin of ginseng are subjective, time-consuming, or destructive, so a more efficient approach is desirable. The feasibility of combining near-infrared (NIR) spectroscopy with ensemble learning for discriminating ginseng-producing areas was explored. A total of 270 samples were collected and evenly partitioned into training and test sets. A random subspace ensemble (RSE) that uses a linear discriminant classifier (LDA) as the weak learner (abbreviated RSE-LDA) was used to construct predictive models. Two parameters, the size of the subspace and the number of learners in the ensemble, were optimized. The classic partial least squares (PLS) algorithm was applied to build the reference model. The sensitivity, specificity, and total accuracy were 97.8 %, 100 %, and 99.3 % for the final RSE-LDA model and 93.3 %, 96.7 %, and 95.6 % for the PLS model. To study the impact of training set composition on the results, the samples were randomly divided 200 times and the algorithm was run repeatedly to statistically analyze the sensitivity and specificity on the test set; similar results were obtained. The effect of training set size was also investigated. The results indicate that the combination of NIR spectroscopy with the RSE algorithm is a potential tool for discriminating the origin of ginseng. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
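The three measures reported in entry 42 follow from confusion-matrix counts, and a short worked example may make the arithmetic concrete. The counts below are hypothetical: the abstract gives only the percentages and the even 270-sample split (135 test samples), not the class composition or the confusion matrix. They were picked as one 45-positive / 90-negative breakdown that happens to reproduce the reported figures, and should not be read as the study's actual results.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, total accuracy) as fractions."""
    sensitivity = tp / (tp + fn)   # share of positives correctly identified
    specificity = tn / (tn + fp)   # share of negatives correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a 135-sample test set (45 positive, 90 negative).
rse_lda = classification_rates(tp=44, fn=1, tn=90, fp=0)
pls = classification_rates(tp=42, fn=3, tn=87, fp=3)

print([round(100 * r, 1) for r in rse_lda])  # [97.8, 100.0, 99.3]
print([round(100 * r, 1) for r in pls])      # [93.3, 96.7, 95.6]
```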
43. Forecasting Long-Series Daily Reference Evapotranspiration Based on Best Subset Regression and Machine Learning in Egypt
- Author
-
Ahmed Elbeltagi, Aman Srivastava, Abdullah Hassan Al-Saeedi, Ali Raza, Ismail Abd-Elaty, and Mustafa El-Rawy
- Subjects
reference evapotranspiration ,machine learning algorithms ,linear regression ,random subspace ,additive regression ,reduced error pruning tree ,water resources management ,climate-resilient pathways ,Geography, Planning and Development ,Aquatic Science ,Biochemistry ,Water Science and Technology - Abstract
The estimation of reference evapotranspiration (ETo), a crucial step in the hydrologic cycle, is essential for system design and management, including the balancing, planning, and scheduling of agricultural water supply and water resources. When climates vary from arid to semi-arid, and meteorological data and future information on ETo are lacking, as is the case in Egypt, it is all the more important to estimate ETo precisely. To address this, the current study aimed to model ETo for Egypt’s most important agricultural governorates (Al Buhayrah, Alexandria, Ismailiyah, and Minufiyah) using four machine learning (ML) algorithms: linear regression (LR), random subspace (RSS), additive regression (AR), and reduced error pruning tree (REPTree). The Climate Forecast System Reanalysis (CFSR) of the National Centers for Environmental Prediction (NCEP) was used to gather daily climate data variables from 1979 to 2014. The datasets were split into two sections: the training phase, i.e., 1979–2006, and the testing phase, i.e., 2007–2014. Maximum temperature (Tmax), minimum temperature (Tmin), and solar radiation (SR) were found to be the three input variables that had the most influence on the outcome of subset regression and sensitivity analysis. A comparative analysis of the ML models revealed that REPTree outperformed its competitors by achieving the best values for various performance metrics during the training and testing phases. The study’s novelty lies in the use of REPTree to estimate and predict ETo, as this algorithm has not been commonly used for this purpose. Given the sparse attempts to use this model for such research, the remarkable accuracy of the REPTree model in predicting ETo underscores the novelty of this study. In order to combat the effects of aridity through better water resource management, the study also urges Egypt’s authorities to concentrate their policymaking on climate adaptation.
- Published
- 2023
- Full Text
- View/download PDF
44. Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria
- Author
-
Mohammed Achite, Nehal Elshaboury, Muhammad Jehanzaib, Dinesh Vishwakarma, Quoc Pham, Duong Anh, Eslam Abdelkader, and Ahmed Elbeltagi
- Subjects
Geography, Planning and Development ,meteorological drought ,semi-arid regions ,support vector machine ,additive regression ,bagging ,random subspace ,random forest ,Aquatic Science ,Biochemistry ,Water Science and Technology - Abstract
Water resources, land and soil degradation, desertification, agricultural productivity, and food security are all adversely influenced by drought. The prediction of meteorological droughts using the standardized precipitation index (SPI) is crucial for water resource management. SPI at timescales of 3, 6, 9, and 12 months was modeled with five machine learning methods: support vector machine (SVM), additive regression, bagging, random subspace, and random forest. After training, testing, and five-fold cross-validation on sub-basin 1, SVM was found to be the most effective model for predicting SPI at all four timescales. SVM, as the best model, was then applied to sub-basin 2 to predict SPI at different timescales, where its performance was validated with satisfactory results. The suggested model performed better than the other models for estimating drought at the sub-basins during the testing phase. It could be used to predict meteorological drought on several timescales, choose remedial measures for the research basin, and assist in the sustainable management of water resources.
- Published
- 2023
- Full Text
- View/download PDF
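Entry 44 mentions cross-validation at five folds without saying how the folds were formed. The sketch below shows one common way to generate the index sets, using contiguous, unshuffled folds (an assumption here, chosen because the data are time series). The function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(12, k=5):
    print(len(train_idx), test_idx)
```

Each sample lands in exactly one test fold, so every model is scored on data it was not trained on.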
45. Credal-Decision-Tree-Based Ensembles for Spatial Prediction of Landslides
- Author
-
Jingyun Gui, Ignacio Pérez-Rey, Miao Yao, Fasuo Zhao, and Wei Chen
- Subjects
landslide ,machine learning ,Credal Decision Tree ,Geography, Planning and Development ,AdaBoost ,random subspace ,Aquatic Science ,Biochemistry ,Water Science and Technology - Abstract
Spatial landslide susceptibility assessment is a fundamental part of landslide risk management and land-use planning. The main objective of this study is to apply the Credal Decision Tree (CDT), adaptive boosting Credal Decision Tree (AdaCDT), and random subspace Credal Decision Tree (RSCDT) models to construct landslide susceptibility maps in Zhashui County, China. The 169 observed historical landslides were split into two groups: 70% (118 landslides) for training and 30% (51 landslides) for validation. To compare and validate the performance of the three models, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were utilized. Specifically, the success rates of the CDT, AdaCDT, and RSCDT models were 0.788, 0.821, and 0.847, respectively, while the corresponding prediction rates were 0.771, 0.802, and 0.861, respectively. In sum, the two ensemble models effectively improve on the accuracy of an individual CDT model, and the RSCDT model proved superior to the other two models. Therefore, ensemble models are promising approaches for the spatial prediction and zonation of a region’s landslide susceptibility.
- Published
- 2023
- Full Text
- View/download PDF
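The AUC values quoted in entry 45 can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition, with ties counted as one half. The scores and labels are made up for illustration, and the quadratic loop is written for clarity, not speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up susceptibility scores: 1 = landslide cell, 0 = non-landslide cell.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 0.889
```

Here 8 of the 9 positive/negative pairs are ordered correctly, giving 8/9. Computing the same quantity on the training split and on the held-out split would correspond to what the abstract appears to call the success rate and the prediction rate.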
46. Soft Computing Ensemble Models Based on Logistic Regression for Groundwater Potential Mapping
- Author
-
Phong Tung Nguyen, Duong Hai Ha, Mohammadtaghi Avand, Abolfazl Jaafari, Huu Duy Nguyen, Nadhir Al-Ansari, Tran Van Phong, Rohit Sharma, Raghvendra Kumar, Hiep Van Le, Lanh Si Ho, Indra Prakash, and Binh Thai Pham
- Subjects
machine learning ,ensemble modeling ,dagging ,bagging ,random subspace ,cascade generalization ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Groundwater potential maps are one of the most important tools for the management of groundwater storage resources. In this study, we proposed four ensemble soft computing models based on logistic regression (LR) combined with the dagging (DLR), bagging (BLR), random subspace (RSSLR), and cascade generalization (CGLR) ensemble techniques for groundwater potential mapping in Dak Lak Province, Vietnam. A suite of well yield data and twelve geo-environmental factors (aspect, elevation, slope, curvature, Sediment Transport Index, Topographic Wetness Index, flow direction, rainfall, river density, soil, land use, and geology) were used for generating the training and validation datasets required for the building and validation of the models. Based on the area under the receiver operating characteristic curve (AUC) and several other validation methods (negative predictive value, positive predictive value, root mean square error, accuracy, sensitivity, specificity, and Kappa), it was revealed that all four ensemble learning techniques were successful in enhancing the validation performance of the base LR model. The ensemble DLR model (AUC = 0.77) was the most successful model in identifying the groundwater potential zones in the study area, followed by the RSSLR (AUC = 0.744), BLR (AUC = 0.735), CGLR (AUC = 0.715), and single LR model (AUC = 0.71), respectively. The models developed in this study and the resulting potential maps can assist decision-makers in the development of effective adaptive groundwater management plans.
- Published
- 2020
- Full Text
- View/download PDF
47. A New Modeling Approach for Spatial Prediction of Flash Flood with Biogeography Optimized CHAID Tree Ensemble and Remote Sensing Data
- Author
-
Viet-Nghia Nguyen, Peyman Yariyan, Mahdis Amiri, An Dang Tran, Tien Dat Pham, Minh Phuong Do, Phuong Thao Thi Ngo, Viet-Ha Nhu, Nguyen Quoc Long, and Dieu Tien Bui
- Subjects
flash flood modeling ,sentinel-1 ,random subspace ,decision tree ,machine learning ,Vietnam ,Science - Abstract
Flash floods induced by torrential rainfall are considered one of the most dangerous natural hazards, due to their sudden occurrence and high magnitudes, which may cause huge damage to people and property. This study proposed a novel modeling approach for spatial prediction of flash floods based on the tree intelligence-based CHAID (Chi-square Automatic Interaction Detector) random subspace, optimized by biogeography-based optimization (the CHAID-RS-BBO model), using remote sensing and geospatial data. In this proposed approach, a forest of tree intelligence was constructed through the random subspace ensemble, and then swarm intelligence was employed to train and optimize the model. The Luc Yen district, located in the northwest mountainous area of Vietnam, was selected as a case study. For this purpose, a flood inventory map with 1866 polygons for the district was prepared based on Sentinel-1 synthetic aperture radar (SAR) imagery and field surveys with handheld GPS. Then, a geospatial database with ten influencing variables (land use/land cover, soil type, lithology, river density, rainfall, topographic wetness index, elevation, slope, curvature, and aspect) was prepared. Using the inventory map and the ten explanatory variables, the CHAID-RS-BBO model was trained and verified. Various statistical metrics were used to assess the prediction capability of the proposed model. The results show that the proposed CHAID-RS-BBO model yielded the highest predictive performance, with an overall accuracy of 90% in predicting flash floods, and outperformed the benchmarks (i.e., the CHAID, J48-DT, logistic regression, and multilayer perceptron neural network (MLP-NN) models). We conclude that the proposed method can accurately predict the spatial distribution of flash floods in tropical storm areas.
- Published
- 2020
- Full Text
- View/download PDF
48. A Comparison of Extended Space Forests for Classifier Ensembles on Short Turkish Texts.
- Author
-
Kilimci, Zeynep Hilal and Omurca, Sevinç İlhan
- Subjects
- *
CLASSIFICATION , *RANDOM forest algorithms , *NAIVE Bayes classification , *BOOTSTRAP aggregation (Algorithms) , *FEATURE selection , *TURKISH language - Abstract
The proliferation of text documents available on different digital platforms has drawn the attention of many researchers to text classification problems. In order to boost classification success, they focus on document parsing, tokenization, stop-word removal, stemming, the representation of documents in convenient forms and weights, the selection of features and classifiers, and the training and testing phases. In this study, we focus on classifier ensembles with extended space forests, enriching the feature space with various feature selection techniques in the text categorization domain. For this purpose, the original feature space is enhanced with random combinations of features and with significant features that have high classification capacity, using gain ratio as the feature selection technique. Then, the training procedure is carried out with the well-known naïve Bayes classification algorithm and various ensemble algorithms such as Bagging, Random Subspace, and Random Forest. A wide range of comparative experiments is conducted on short Turkish texts gathered from the Turkish National News Agency to demonstrate the contribution of our work. The experimental results show that the enhanced space forest variants achieve better classification accuracy than the original feature spaces on Turkish texts. [ABSTRACT FROM AUTHOR]
- Published
- 2017
49. A novel ensemble method for k-nearest neighbor.
- Author
-
Zhang, Youqiang, Cao, Guo, Wang, Bisheng, and Li, Xuesong
- Subjects
- *
PERTURBATION theory , *K-nearest neighbor classification , *ALGORITHMS , *DEMPSTER-Shafer theory , *IMAGE processing - Abstract
Highlights: • We proposed a weighted heterogeneous distance metric (WHDM). • We presented a kNN algorithm based on WHDM and Dempster–Shafer theory. • We proposed a multimodal perturbation method (RRSB) for kNN ensembles. • The effectiveness of our algorithms was shown on multiple UCI data sets and a KDD data set. Abstract: In this paper, to address the issue that ensembling k-nearest neighbor (kNN) classifiers with resampling approaches cannot generate component classifiers with large diversity, we consider ensembling kNN through a multimodal perturbation-based method. Since kNN is sensitive to the input attributes, we propose a weighted heterogeneous distance metric (WHDM). By using WHDM and evidence theory, a progressive kNN classifier is developed. Based on the progressive kNN, the random subspace method, attribute reduction, and Bagging, a novel algorithm termed RRSB (reduced random subspace-based Bagging) is proposed for constructing ensemble classifiers, which can increase the diversity of the component classifiers without damaging their accuracy. In detail, RRSB perturbs the learning parameter with the weighted heterogeneous distance metric, the input space with random subspaces and attribute reduction, the training data with Bagging, and the output target of the k neighbors with evidence theory. In the experimental stage, the value of k, the different perturbations in RRSB, and the ensemble size are analyzed. In addition, RRSB is compared with other multimodal perturbation-based ensemble algorithms on multiple UCI data sets and a KDD data set. The experimental results demonstrate the effectiveness of RRSB for kNN ensembling. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
50. Probabilistic semi-supervised random subspace sparse representation for classification.
- Author
-
Zhao, Zhuang, Bai, Lianfa, Zhang, Yi, and Han, Jing
- Subjects
SUBSPACES (Mathematics) ,SUBSPACE identification (Mathematics) ,PROBABILITY theory ,DISTRIBUTION (Probability theory) ,SPARSE approximations - Abstract
In this paper, we present a novel approach for classification named Probabilistic Semi-supervised Random Subspace Sparse Representation (P-RSSR). In many random subspace-based methods, all features have the same probability of being selected to compose the random subspace. However, in the real world, especially in images, some regions or features are important for classification and some are not. In the proposed P-RSSR, we first calculate the distribution probability of the image and determine which features are selected to compose the random subspace. Then, we use Sparse Representation (SR) to construct graphs that characterize the distribution of samples in the random subspaces, and train classifiers under the framework of Manifold Regularization (MR) in these random subspaces. Finally, we fuse the results from all random subspaces and obtain the classification results through majority vote. Experimental results on face image datasets demonstrate the effectiveness of the proposed P-RSSR. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
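Entry 50 contrasts uniform feature selection with selection driven by a per-feature probability. The sketch below shows one simple way to draw a subspace of distinct feature indices in which features with larger weights are more likely to be picked. It assumes the weights are already given; the abstract does not say how P-RSSR derives them, and the function name sample_subspace and the example weights are hypothetical.

```python
import random


def sample_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favouring features with larger weight."""
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        current = [weights[i] for i in candidates]
        # random.Random.choices performs one weighted draw with replacement;
        # removing the pick afterwards keeps the selected indices distinct.
        pick = rng.choices(candidates, weights=current, k=1)[0]
        candidates.remove(pick)
        chosen.append(pick)
    return chosen


# Hypothetical importance weights for six features; zero-weight features
# can never be selected.
weights = [0, 0, 1, 1, 1, 0]
print(sorted(sample_subspace(weights, 3, random.Random(0))))  # [2, 3, 4]
```

With all weights equal, this reduces to the uniform sampling used by the plain random subspace method. The function raises an error if more features are requested than have non-zero weight.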