105 results on '"leave-one-out cross validation"'
Search Results
2. A method for identifying completion fluid contamination types based on a BP neural network [一种基于BP神经网络的完井液污染类型识别方法].
- Author
-
程鑫, 张太亮, 杨兰平, 阳清正, and 白毅
- Subjects
- *K-means clustering, *COMPUTER simulation, *FLUIDS
- Abstract
Objective: To solve the problem that the contamination type of a completion fluid cannot be effectively identified after it is contaminated by brine and residual acid during drilling of the target layer. Methods: The contamination of the completion fluid by brine and residual acid at different mass fractions was measured, and the labels of the data samples with different contamination degrees were revised by the K-means clustering algorithm. Different BP neural network models were trained according to the difficulty of obtaining data sample features and the number of hidden layers, and the classification accuracy of the models was tested by the leave-one-out cross validation method. Results: The more features the data samples possess, the higher the classification accuracy of the trained BP neural network, while more hidden layers lower the classification accuracy. A BP neural network model with one hidden layer was subsequently established with data samples that contain four kinds of features, "rheology + aging + filtration loss + well name". The average classification accuracy reached 93.18%. Conclusions: The BP neural network model trained on rheology and filtration loss features can be quickly deployed at oil-testing sites to solve the problem of failing to identify the type of completion fluid contamination due to the lack of special equipment in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
3. Estimation of the nonparametric mean and covariance functions for multivariate longitudinal and sparse functional data.
- Author
-
Xu, Tengteng and Zhang, Riquan
- Subjects
- *NONPARAMETRIC estimation, *STATISTICAL correlation, *MEASUREMENT errors
- Abstract
Estimation of the mean and covariance functions is very important for analyzing multivariate longitudinal and sparse functional data. We define a new covariance function that considers not only the correlation of different observed responses for the same biomarker but also the correlation between different biomarkers. Full quasi-likelihood and the kernel method are used to approximate the mean and covariance functions, and covariance decomposition is used to decompose the covariance functions into correlation and variance functions. We use the full quasi-likelihood to estimate the measurement error variance λ and an iterative algorithm to update the multivariate mean and covariance functions until convergence. A Gaussian kernel and leave-one-out cross-validation are used to select the bandwidth h. Finally, we give theoretical properties of the unknown functions and prove their convergence. Simulation and application results show the effectiveness of our proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
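The abstract above selects a bandwidth h by leave-one-out cross-validation with a Gaussian kernel. As a rough illustration only, the sketch below applies that idea to a plain Nadaraya-Watson smoother on one-dimensional data; it is not the authors' quasi-likelihood estimator, and the function names `loocv_score` and `select_bandwidth`, the toy data, and the candidate grid are all assumptions made for this example.

```python
import numpy as np

def loocv_score(x, y, h):
    """Mean squared leave-one-out error of a Gaussian-kernel smoother with bandwidth h."""
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d ** 2)      # Gaussian kernel weights between all pairs
    np.fill_diagonal(K, 0.0)       # leave each point out of its own fit
    y_hat = (K @ y) / K.sum(axis=1)
    return float(np.mean((y - y_hat) ** 2))

def select_bandwidth(x, y, grid):
    """Return the candidate bandwidth with the smallest LOOCV score."""
    scores = [loocv_score(x, y, h) for h in grid]
    return grid[int(np.argmin(scores))]

# Toy data (hypothetical): a noise-free sine curve on [0, 1].
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
grid = [0.02, 0.05, 0.1, 0.5, 5.0]
h_best = select_bandwidth(x, y, grid)
```

Zeroing the diagonal of the weight matrix gives all n leave-one-out fits in one matrix product, so no explicit loop over held-out points is needed. Very small bandwidths on widely spaced data can make a row of weights sum to zero, which this sketch does not guard against.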
4. Estimation of spatial-functional based-line logit model for multivariate longitudinal data.
- Author
-
Xu, Tengteng, Zhang, Riquan, and Zhang, Xiuzhen
- Subjects
- *PANEL analysis, *LOGISTIC regression analysis, *AIR quality, *DATA analysis
- Abstract
In this paper, a novel method is proposed to analyze multivariate longitudinal data that contain spatial location information. The method has the advantage of analyzing the relationship between curves at neighboring time points and observing the relationship between locations. We introduce a spatial covariance function and use functional PCA to estimate the unknown parameter functions. A detailed solution process and theoretical properties are presented. We estimate the unknown parameters with the gradient descent method and select the principal components with leave-one-out cross-validation. Furthermore, compared with four other methods, the proposed method shows better classification performance in simulation studies and in an air quality data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Choosing the best value of shape parameter in radial basis functions by Leave-P-Out Cross Validation.
- Author
-
Yaghouti, Mohammad Reza and Farshadmoghadam, Farnaz
- Subjects
- RADIAL basis functions, ALGORITHMS, EQUATIONS, POLYNOMIALS, PROBLEM solving
- Abstract
The radial basis function (RBF) meshless method has high accuracy for the interpolation of scattered data in high dimensions. Most RBFs depend on a parameter, called the shape parameter, which plays a significant role in determining the accuracy of the RBF method. In this paper, we present three algorithms for choosing the optimal value of the shape parameter. These are based on Rippa's theory of removing data points from the data set and on results obtained by Fasshauer and Zhang for the iterative approximate moving least squares (AMLS) method. Numerical experiments confirm stable solutions with high accuracy compared to other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
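The abstract above builds on Rippa's idea of removing data points to choose an RBF shape parameter. For the leave-one-out case, a well-known identity gives every held-out error from a single solve: with interpolation matrix A and coefficients c = A⁻¹f, the error at point k is c_k / (A⁻¹)_kk. The sketch below checks that identity against the naive loop. The Gaussian basis, the one-dimensional nodes, and all function names are assumptions for illustration; the paper's three leave-p-out algorithms are not reproduced here.

```python
import numpy as np

def gaussian_rbf_matrix(x, centers, eps):
    """Matrix of Gaussian RBF values exp(-(eps * r)^2) between points and centers."""
    r = np.abs(x[:, None] - centers[None, :])
    return np.exp(-(eps * r) ** 2)

def rippa_loo_errors(x, f, eps):
    """All leave-one-out errors from one inverse: e_k = c_k / (A^-1)_kk."""
    A_inv = np.linalg.inv(gaussian_rbf_matrix(x, x, eps))
    c = A_inv @ f
    return c / np.diag(A_inv)

def naive_loo_errors(x, f, eps):
    """Reference version: refit the interpolant n times, once per held-out point."""
    n = len(x)
    errs = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k
        c = np.linalg.solve(gaussian_rbf_matrix(x[keep], x[keep], eps), f[keep])
        pred = gaussian_rbf_matrix(x[k:k + 1], x[keep], eps) @ c
        errs[k] = f[k] - pred[0]
    return errs

# Toy problem (hypothetical): 8 equispaced nodes and a small grid of shape parameters.
x = np.linspace(0.0, 1.0, 8)
f = np.sin(2 * np.pi * x)
eps_grid = [4.0, 6.0, 8.0]
eps_best = min(eps_grid, key=lambda e: np.linalg.norm(rippa_loo_errors(x, f, e)))
```

Gaussian RBF matrices become badly conditioned for small shape parameters or many nodes, so an explicit inverse is only reasonable on small, well-separated examples like this one.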
6. Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction
- Author
-
Cheng, Hao, Garrick, Dorian J, and Fernando, Rohan L
- Subjects
- Genetics, Human Genome, 2.5 Research design and methodologies (aetiology), Aetiology, Generic health relevance, Leave-one-out cross validation, GBLUP, Animal Production, Agricultural Biotechnology
- Abstract
Background: A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Methods: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. Results: The efficient leave-one-out cross validation strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations. Conclusions: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
- Published
- 2017
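The abstract above contrasts naive leave-one-out cross validation, which refits the model n times, with strategies that need little more than one analysis. A minimal sketch of the general idea, assuming a plain ridge regression with a fixed penalty and no intercept rather than the authors' GBLUP model: the held-out residual for observation i equals the ordinary residual divided by 1 − h_ii, where h_ii is the i-th diagonal element of the hat matrix. The function names, penalty value, and simulated data are hypothetical.

```python
import numpy as np

def loo_residuals_naive(X, y, lam):
    """Refit ridge regression n times, each time predicting the held-out row."""
    n, p = X.shape
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xk, yk = X[keep], y[keep]
        beta = np.linalg.solve(Xk.T @ Xk + lam * np.eye(p), Xk.T @ yk)
        res[i] = y[i] - X[i] @ beta
    return res

def loo_residuals_fast(X, y, lam):
    """Same residuals from one fit: e_i / (1 - h_ii) with H = X (X'X + lam I)^-1 X'."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))

# Simulated data (hypothetical sizes, far smaller than those in the abstract).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=30)
naive = loo_residuals_naive(X, y, lam=2.0)
fast = loo_residuals_fast(X, y, lam=2.0)
```

The two functions agree to floating-point precision, but the fast version does one linear solve instead of n. The shortcut is exact only when the penalty is held fixed across folds; re-estimating variance components per fold would break the identity.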
7. Multi-Objective Optimization of Three-Column Semi-Submersible Platforms Based on Surrogate Models
- Author
-
QIU Wenzhen, SONG Xingyu, and ZHANG Xinshu
- Subjects
- multi-objective particle swarm optimization (mopso), surrogate model, leave-one-out cross validation, radial basis function, Engineering (General). Civil engineering (General), TA1-2040, Chemical engineering, TP155-156, Naval architecture. Shipbuilding. Marine engineering, VM1-989
- Abstract
In the initial design stage of a semi-submersible platform, the main particulars of the platform are the key factor affecting the hydrodynamic performance and construction cost. Therefore, multi-objective optimization of the main particulars of the semi-submersible platform is of great engineering significance. First, the design variables of each platform and sample database are determined by design of experiments. Then, the hydrodynamic performances of the semi-submersible platform are analyzed by using the panel method and Morison’s equation. The distribution of probes for estimating the wave elevations on the calm water surface is arranged, and the airgap can be computed. Based on the database obtained by numerical simulation, the surrogate models based on radial basis function (RBF) are established. Next, the formal parameters in RBF are obtained by using the leave-one-out cross validation method. The surrogate model can greatly improve the optimization efficiency. Finally, by using the multi-objective particle swarm optimization (MOPSO) method, taking safety and economy of offshore platforms as two optimization objectives, and taking platform stability, airgap and horizontal motion performance as constraints, the optimization program for the semi-submersible platform can be obtained. Through the detailed analyses of the optimization program for the semi-submersible platform, the most efficient design strategy for the three-column semi-submersible platform is proposed.
- Published
- 2021
- Full Text
- View/download PDF
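8. Recognition of composite power quality disturbances based on the adaptive incomplete S-transform and the LOO-KELM algorithm [基于自适应不完全S变换与LOO-KELM算法的复合电能质量扰动识别].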
- Author
-
伊慧娟, 高云鹏, 朱彦卿, 黄 瑞, and 黄 纯
- Abstract
Copyright of Electric Power Automation Equipment / Dianli Zidonghua Shebei is the property of Electric Power Automation Equipment Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
- Full Text
- View/download PDF
9. Maps, trends, and temperature sensitivities—phenological information from and for decreasing numbers of volunteer observers.
- Author
-
Yuan, Ye, Härer, Stefan, Ottenheym, Tobias, Misra, Gourav, Lüpke, Alissa, Estrella, Nicole, and Menzel, Annette
- Subjects
- *PLANT phenology, *METEOROLOGICAL services, *VOLUNTEERS, *TEMPERATURE, *VOLUNTEER service
- Abstract
Phenology serves as a major indicator of ongoing climate change. Long-term phenological observations are critically important for tracking and communicating these changes. The phenological observation network across Germany is operated by the National Meteorological Service with a major contribution from volunteering activities. However, the number of observers has strongly decreased for the last decades, possibly resulting in increasing uncertainties when extracting reliable phenological information from map interpolation. We studied uncertainties in interpolated maps from decreasing phenological records, by comparing long-term trends based on grid-based interpolated and station-wise observed time series, as well as their correlations with temperature. Interpolated maps in spring were characterized by the largest spatial variabilities across Bavaria, Germany, with respective lowest interpolated uncertainties. Long-term phenological trends for both interpolations and observations exhibited mean advances of −0.2 to −0.3 days year−1 for spring and summer, while late autumn and winter showed a delay of around 0.1 days year−1. Throughout the year, temperature sensitivities were consistently stronger for interpolated time series than observations. Such a better representation of regional phenology by interpolation was equally supported by satellite-derived phenological indices. Nevertheless, simulation of observer numbers indicated that a decline to less than 40% leads to a strong decrease in interpolation accuracy. To better understand the risk of declining phenological observations and to motivate volunteer observers, a Shiny app is proposed to visualize spatial and temporal phenological patterns across Bavaria and their links to climate change–induced temperature changes. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. ALSBMF: Predicting lncRNA-Disease Associations by Alternating Least Squares Based on Matrix Factorization
- Author
-
Wen Zhu, Kaimei Huang, Xiaofang Xiao, Bo Liao, Yuhua Yao, and Fang-Xiang Wu
- Subjects
- Alternating least squares, disease similarity, lncRNA similarity, leave-one-out cross validation, matrix factorization, ROC curve, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In recent years, it has become increasingly clear that long non-coding RNAs (lncRNAs) are able to regulate their target genes at multiple levels, including the transcriptional and translational levels, and play key regulatory roles in many important biological processes, such as cell differentiation and chromatin remodeling. Inferring potential lncRNA-disease associations is essential to reveal the secrets behind diseases, develop novel drugs, and optimize personalized treatments. However, biological experiments to validate lncRNA-disease associations are very time-consuming and costly. Thus, it is critical to develop effective computational models. In this study, we propose a method that uses alternating least squares based on matrix factorization to predict lncRNA-disease associations, referred to as ALSBMF. ALSBMF first decomposes the known lncRNA-disease correlation matrix into two characteristic matrices, then defines the optimization function using disease semantic similarity, lncRNA functional similarity, and known lncRNA-disease associations, and solves for two optimal feature matrices by the least squares method. The two optimal feature matrices are finally multiplied to reconstruct the scoring matrix, filling the missing values of the original matrix to predict lncRNA-disease associations. Compared to existing methods, ALSBMF has the same advantages as BPLLDA. It does not require negative samples and can predict associations related to novel lncRNAs or novel diseases. In addition, this study performs leave-one-out cross-validation (LOOCV) and five-fold cross-validation to evaluate the prediction performance of ALSBMF. The AUCs are 0.9501 and 0.9215, respectively, which are better than those of the existing methods. Furthermore, colon cancer, kidney cancer, and liver cancer are selected as case studies. The top three predicted lncRNAs related to colon cancer, kidney cancer, and liver cancer were validated in the latest LncRNADisease database and related literature.
To test the ability of ALSBMF to predict novel disease-associated lncRNAs and new lncRNA-associated diseases, all known associations of diseases and lncRNAs were eliminated, and the top five predicted lncRNAs related to breast cancer and nasopharyngeal carcinoma and the top five predicted cancers related to the lncRNAs H19 and MALAT1 were validated in PubMed and dbSNP.
- Published
- 2020
- Full Text
- View/download PDF
11. A Comparison of Model Validation Techniques for Audio-Visual Speech Recognition
- Author
-
Seong, Thum Wei, Ibrahim, Mohd Zamri, Arshad, Nurul Wahidah Binti, Mulvaney, D. J., Angrisani, Leopoldo, Series editor, Arteaga, Marco, Series editor, Chakraborty, Samarjit, Series editor, Chen, Jiming, Series editor, Chen, Tan Kay, Series editor, Dillmann, Ruediger, Series editor, Duan, Haibin, Series editor, Ferrari, Gianluigi, Series editor, Ferre, Manuel, Series editor, Hirche, Sandra, Series editor, Jabbari, Faryar, Series editor, Kacprzyk, Janusz, Series editor, Khamis, Alaa, Series editor, Kroeger, Torsten, Series editor, Ming, Tan Cher, Series editor, Minker, Wolfgang, Series editor, Misra, Pradeep, Series editor, Möller, Sebastian, Series editor, Mukhopadhyay, Subhas Chandra, Series editor, Ning, Cun-Zheng, Series editor, Nishida, Toyoaki, Series editor, Panigrahi, Bijaya Ketan, Series editor, Pascucci, Federica, Series editor, Samad, Tariq, Series editor, Seng, Gan Woon, Series editor, Veiga, Germano, Series editor, Wu, Haitao, Series editor, Zhang, Junjie James, Series editor, Kim, Kuinam J., editor, Kim, Hyuncheol, editor, and Baek, Nakhoon, editor
- Published
- 2018
- Full Text
- View/download PDF
12. The Optimum Number of Latent Variables
- Author
-
Olivieri, Alejandro C.
- Published
- 2018
- Full Text
- View/download PDF
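13. Stand biomass models for Korean pine plantations in Heilongjiang Province based on different additivity methods [基于不同可加性方法的黑龙江省红松人工林林分生物量模型].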
- Author
-
辛士冬, 严云仙, and 姜立春
- Abstract
Copyright of Chinese Journal of Applied Ecology / Yingyong Shengtai Xuebao is the property of Chinese Journal of Applied Ecology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
- Full Text
- View/download PDF
14. Least squares support vector machines with fast leave-one-out AUC optimization on imbalanced prostate cancer data.
- Author
-
Wang, Guanjin, Teoh, Jeremy Yuen-Chun, Lu, Jie, and Choi, Kup-Sze
- Abstract
Quite often, the available pre-biopsy data for early prostate cancer detection are imbalanced. When least squares support vector machines (LS-SVMs) are applied to such scenarios, it becomes naturally desirable to introduce the well-known AUC performance index into the LS-SVM framework to avoid bias towards majority classes. However, this may result in high computational complexity for the minimal leave-one-out error. In this paper, by introducing the parameter λ, a generalized area under the ROC curve (AUC) performance index $R_{\mathrm{AUCLS}}$ is developed to theoretically guarantee that $R_{\mathrm{AUCLS}}$ linearly depends on the classical AUC performance index $R_{\mathrm{AUC}}$. Based on both $R_{\mathrm{AUCLS}}$ and the classical LS-SVM, a new AUC-based least squares support vector machine called AUC-LS-SVMs is proposed for directly and effectively classifying imbalanced prostate cancer data. The distinctive advantage of the proposed classifier AUC-LS-SVMs is that it can achieve the minimal leave-one-out error by quickly optimizing the parameter λ in $R_{\mathrm{AUCLS}}$ using the proposed fast leave-one-out cross validation (LOOCV) strategy. The proposed classifier is first evaluated using generic public datasets. Further experiments are then conducted on a real-world prostate cancer dataset to demonstrate the efficacy of our proposed classifier for early prostate cancer detection. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
15. Classifying dynamic motor imagery with the locals-balanced extreme learning machine.
- Author
-
Zhang, Qizhong, Bai, Junda, Liu, Yang, and Zhou, Yizhi
- Subjects
- MOTOR imagery (Cognition), MACHINE learning, LARGE-scale brain networks, BRAIN-computer interfaces, REGULARIZATION parameter, MOTORS
- Abstract
• Classification of hand movements based on dynamic motor imagery experiments.
• A new binarization method for the functional brain network obtained through feature fusion is proposed.
• To address two defects of the extreme learning machine, a locals-balanced extreme learning machine is proposed.
• An improved version of leave-one-out cross-validation is proposed.
Dynamic motor imagery (dMI) provides additional benefits compared to traditional motor imagery (MI) in the training and neurorehabilitation fields. The objective of this work is to develop a brain-computer interface (BCI) for dMI electroencephalography (EEG). We propose a novel method that combines a synchronization likelihood (SL) based functional brain network (FBN) and a modified extreme learning machine (ELM) to interpret EEG. The proposed method 1) uses the L1-norm and an adaptive threshold instead of the L2-norm and an empirical threshold in the SL method; 2) improves the procedure of FBN calculation (namely, SL matrix binarization) by combining threshold and MST methods; 3) identifies two defects of the standard ELM in fusion and proposes the locals-balanced ELM (LBELM); 4) employs a special version of the leave-one-out cross validation (LOO) approach that improves the computational complexity of determining the optimal threshold for binarization, the most effective fusion of two complementary features, and the optimal regularization parameters for the regularized LBELM. Compared with the empirical threshold, the recognition rate with the optimal threshold increases by 1.39–11.11 %. Similarly, compared with the regularized ELM, the recognition rate of LBELM increases by 2.78–9.73 %. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction
- Author
-
Hao Cheng, Dorian J. Garrick, and Rohan L. Fernando
- Subjects
- Leave-one-out cross validation, GBLUP, Animal culture, SF1-1100, Veterinary medicine, SF600-1100
- Abstract
Background: A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Methods: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. Results: The efficient leave-one-out cross validation strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations. Conclusions: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
- Published
- 2017
- Full Text
- View/download PDF
17. A Bidirectional Label Propagation Based Computational Model for Potential Microbe-Disease Association Prediction
- Author
-
Lei Wang, Yuqi Wang, Hao Li, Xiang Feng, Dawei Yuan, and Jialiang Yang
- Subjects
- microbe-disease association, bidirectional label propagation, leave-one-out cross validation, 5-fold cross validation, COPD, Microbiology, QR1-502
- Abstract
A growing number of clinical observations have indicated that microbes are involved in a variety of important human diseases. In-depth investigation of correlations between microbes and diseases will greatly benefit the prevention, early diagnosis, and prognosis of diseases. Hence, in this paper, based on known microbe-disease associations, a prediction model called NBLPIHMDA is proposed to infer potential microbe-disease associations. Specifically, two kinds of networks, the disease similarity network and the microbe similarity network, were first constructed based on the Gaussian interaction profile kernel similarity. Bidirectional label propagation was then applied on these two networks to predict potential microbe-disease associations. We applied NBLPIHMDA to the Human Microbe-Disease Association Database (HMDAD) and compared it with three other recently published methods, LRLSHMDA, BiRWMP, and KATZHMDA, based on leave-one-out cross validation and 5-fold cross validation, respectively. The areas under the receiver operating characteristic curve (AUCs) achieved by NBLPIHMDA were 0.8777 and 0.8958 ± 0.0027, respectively, outperforming the compared methods. In addition, in case studies of asthma, colorectal carcinoma, and chronic obstructive pulmonary disease, 10, 10, and 8 of the top 10 predicted microbes, respectively, were confirmed by published evidence, which further demonstrates that NBLPIHMDA is promising for predicting novel associations between diseases and microbes.
- Published
- 2019
- Full Text
- View/download PDF
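The abstract above constructs its similarity networks from the Gaussian interaction profile (GIP) kernel. A minimal sketch of that kernel follows, assuming the commonly used definition in which each row of a binary association matrix is an entity's interaction profile and the bandwidth is normalized by the mean squared profile norm. Whether NBLPIHMDA uses exactly this normalization is an assumption, and `gip_kernel`, `gamma_prime`, and the toy matrix are hypothetical names and data.

```python
import numpy as np

def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of association matrix A.

    K[i, j] = exp(-gamma * ||A[i] - A[j]||^2), with
    gamma = gamma_prime / mean_i ||A[i]||^2 (assumed normalization).
    """
    sq_norms = np.sum(A ** 2, axis=1)
    gamma = gamma_prime / sq_norms.mean()
    # Squared Euclidean distances between all pairs of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Toy association matrix (hypothetical): 3 diseases (rows) by 3 microbes (columns).
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
K_disease = gip_kernel(A)      # disease-disease similarity
K_microbe = gip_kernel(A.T)    # microbe-microbe similarity
```

Diseases 0 and 1 share an identical profile, so their similarity is exactly 1. An all-zero association matrix would give a zero mean norm and a division by zero, which this sketch does not handle.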
18. Inversion of aboveground biomass of Chinese pine plantations based on GF-2 [基于 GF-2 的油松人工林地上生物量反演].
- Author
-
苟睿坤, 陈佳琦, 段高辉, 杨瑞, 卜元坤, 赵君, and 赵鹏祥
- Abstract
Copyright of Chinese Journal of Applied Ecology / Yingyong Shengtai Xuebao is the property of Chinese Journal of Applied Ecology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2019
- Full Text
- View/download PDF
19. Predicting the Number of Nearest Neighbor for kNN Classifier.
- Author
-
Yanying Li, Youlong Yang, Jinxing Che, and Long Zhang
- Subjects
- K-nearest neighbor classification, COMPETITION (Psychology), NEIGHBORHOODS
- Abstract
The k nearest neighbor (kNN) rule is known for its simplicity, effectiveness, intuitiveness, and competitive classification performance. Selecting the parameter k with the highest classification accuracy is crucial for kNN. There is no doubt that leave-one-out cross validation (LOO-CV) is the best method for this task because it is almost unbiased. However, it is too time-consuming to be used in practice, especially for large data sets. In this paper, we propose a new algorithm for selecting an optimal neighborhood size k. We found that the LOO-CV classification accuracy is approximately concave in the parameter k, and we propose a search method to pick the optimal value of k. An empirical study conducted on 8 standard databases from the UCI repository shows that the new strategy can find the optimal k in significantly less time than the LOO-CV method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
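The abstract above treats exhaustive leave-one-out cross validation over k as the baseline that its search method speeds up. The sketch below shows only that baseline, assuming Euclidean distance, majority voting, and non-negative integer class labels; it does not implement the authors' concavity-based search, and `loo_knn_accuracy`, the toy clusters, and the candidate list are hypothetical.

```python
import numpy as np

def loo_knn_accuracy(X, y, k):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier (Euclidean distance)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                 # a point may not vote for itself
    neighbors = np.argsort(D, axis=1)[:, :k]    # indices of the k nearest other points
    votes = y[neighbors]                        # labels of those neighbors, shape (n, k)
    pred = np.array([np.bincount(row).argmax() for row in votes])
    return float(np.mean(pred == y))

# Toy data (hypothetical): two well-separated clusters of five points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
              [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

candidates = [1, 3, 5]
accuracies = {k: loo_knn_accuracy(X, y, k) for k in candidates}
best_k = max(candidates, key=accuracies.get)
```

The distance matrix is computed once and reused for every candidate k, so the dominant cost is the O(n²) pairwise distances, which is the expense the abstract refers to for large data sets.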
20. BPLLDA: Predicting lncRNA-Disease Associations Based on Simple Paths With Limited Lengths in a Heterogeneous Network
- Author
-
Xiaofang Xiao, Wen Zhu, Bo Liao, Junlin Xu, Changlong Gu, Binbin Ji, Yuhua Yao, Lihong Peng, and Jialiang Yang
- Subjects
- disease similarity, lncRNA similarity, path with limited length, Gaussian interaction profile kernel similarity, leave-one-out cross validation, ROC curve, Genetics, QH426-470
- Abstract
In recent years, it has been increasingly clear that long noncoding RNAs (lncRNAs) play critical roles in many biological processes associated with human diseases. Inferring potential lncRNA-disease associations is essential to reveal the secrets behind diseases, develop novel drugs, and optimize personalized treatments. However, biological experiments to validate lncRNA-disease associations are very time-consuming and costly. Thus, it is critical to develop effective computational models. In this study, we have proposed a method called BPLLDA to predict lncRNA-disease associations based on paths of fixed lengths in a heterogeneous lncRNA-disease association network. Specifically, BPLLDA first constructs a heterogeneous lncRNA-disease network by integrating the lncRNA-disease association network, the lncRNA functional similarity network, and the disease semantic similarity network. It then infers the probability of an lncRNA-disease association based on paths connecting them and their lengths in the network. Compared to existing methods, BPLLDA has a few advantages, including not demanding negative samples and the ability to predict associations related to novel lncRNAs or novel diseases. BPLLDA was applied to a canonical lncRNA-disease association database called LncRNADisease, together with two popular methods LRLSLDA and GrwLDA. The leave-one-out cross-validation areas under the receiver operating characteristic curve of BPLLDA are 0.87117, 0.82403, and 0.78528, respectively, for predicting overall associations, associations related to novel lncRNAs, and associations related to novel diseases, higher than those of the two compared methods. In addition, cervical cancer, glioma, and non-small-cell lung cancer were selected as case studies, for which the predicted top five lncRNA-disease associations were verified by recently published literature. 
In summary, BPLLDA exhibits good performances in predicting novel lncRNA-disease associations and associations related to novel lncRNAs and diseases. It may contribute to the understanding of lncRNA-associated diseases like certain cancers.
- Published
- 2018
- Full Text
- View/download PDF
21. Prediction of Low Birth Weight in Infants via Artificial Intelligence Techniques without Using Sonographic Measurements
- Author
-
Mahtab Farahbakhsh, Hamid Reza Marateb, Marjan Mansourian, and Masoomeh Goodarzi-Khoigani
- Subjects
- Computer-aided medical diagnosis, Leave-one-out cross validation, Low birth weight, Sequential feature selection, Medicine, Medicine (General), R5-920
- Abstract
Background: Birth weight is probably the most important factor affecting neonatal mortality and morbidity. Compared with normal-weight infants, low-birth-weight (LBW) infants may be more at risk for many health problems. The prediction of low birth weight is important as it may cause mental and physical health problems in childhood and adulthood. We assessed a computer-aided diagnosis system to classify infants into low or normal birth weight categories. Methods: In the present study, the association between low birth weight and the intake of about 40 types of macro- and micronutrients during the first (1st Tr.), second (2nd Tr.), and third (3rd Tr.) trimesters was assessed based on demographic and reproductive characteristics, physical activity, and nutrient intake in pregnant women. The dataset used in this study contained 526 pregnant women with 95 input features. The classifiers used were k-Nearest Neighbors (kNN), Probabilistic Neural Network (PNN), and two Adaptive Neuro-Fuzzy Classifiers (ANFC-SCG: Scaled Conjugate Gradient, ANFC-LHs: Linguistic Hedges). Also, sequential feature selection (FS) was applied to the low birth weight risk factors to reduce the feature space. Findings: The accuracies of the classifiers kNN, PNN, ANFC-SCG, and ANFC-LHs were 48%, 50%, 50%, and 50% without feature selection and 93%, 83%, 80%, and 83% with feature selection, respectively. Conclusion: Among the tested classifiers, the statistical power and type I error (α) of the best configuration (FS-kNN; k = 3) were 96% and 0.10 in the leave-one-out validation framework, showing that the proposed diagnosis system is clinically reliable. Also, the use of leave-one-out cross-validation guarded against Type III error.
- Published
- 2015
22. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) – A statistical learning approach
- Author
-
R. Jegadeeshwaran and V. Sugumaran
- Subjects
- Decision tree, Statistical features, CSCA, Attribute evaluator, Leave-one-out cross validation, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system impacts the vehicle's motion and can have catastrophic effects on vehicle and passenger safety. The brake system thus plays a vital role in an automobile, and condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of the brake system, the vibration signals were acquired using a piezoelectric transducer. Statistical parameters were extracted from the vibration signals. The best feature set for classification was identified using an attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of this artificial intelligence technique is compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.
- Published
- 2015
- Full Text
- View/download PDF
23. Determination of endometrial carcinoma with gene expression based on optimized Elman neural network.
- Author
-
Hu, Hongping, Wang, Haiyan, Bai, Yanping, and Liu, Maoxing
- Subjects
- *
ENDOMETRIAL cancer , *GENE expression , *NEURAL circuitry , *SIGNAL-to-noise ratio , *PARTICLE swarm optimization - Abstract
Endometrial carcinoma is a life-threatening disease that causes serious damage to women's health. This paper discusses the classification of 87 endometrial samples with gene expressions as cancerous or cancer-free. Every sample has 5 indicators. For every indicator, the genes with missing data are deleted and the signal-to-noise ratios (SNRs) are calculated to filter out irrelevant genes. Principal component analysis (PCA) is then applied to the resulting samples to reduce the dimensions. Finally, 10 random samples are selected as the testing samples for classification. Thus the classification accuracy rate is given for every indicator. Since the cancer is related to the 5 indicators, the combination of the 5 indicators is used to classify 87 new endometrial samples as cancerous or cancer-free. We process these new samples in the same way, deleting the missing data, filtering the irrelevant genes with SNRs, and reducing the dimensions with PCA, and obtain the new data. In the proposed method, the particle swarm algorithm (PSO) and the grey wolf optimizer (GWO) are combined to optimize the parameters of the Elman recurrent neural network (ERNN), written as PSOGWO-ERNN. The results show that PSOGWO-ERNN is superior to the single ERNN and to ERNN optimized by PSO or GWO alone (PSO-ERNN or GWO-ERNN), and the classification accuracy rate of PSOGWO-ERNN reaches 88.8506%. The results also show that neural networks optimized by some swarm intelligence algorithms are more useful for classification. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
24. BPLLDA: Predicting lncRNA-Disease Associations Based on Simple Paths With Limited Lengths in a Heterogeneous Network.
- Author
-
Xiao, Xiaofang, Zhu, Wen, Liao, Bo, Xu, Junlin, Gu, Changlong, Ji, Binbin, Yao, Yuhua, Peng, Lihong, and Yang, Jialiang
- Abstract
In recent years, it has been increasingly clear that long noncoding RNAs (lncRNAs) play critical roles in many biological processes associated with human diseases. Inferring potential lncRNA-disease associations is essential to reveal the secrets behind diseases, develop novel drugs, and optimize personalized treatments. However, biological experiments to validate lncRNA-disease associations are very time-consuming and costly. Thus, it is critical to develop effective computational models. In this study, we have proposed a method called BPLLDA to predict lncRNA-disease associations based on paths of fixed lengths in a heterogeneous lncRNA-disease association network. Specifically, BPLLDA first constructs a heterogeneous lncRNA-disease network by integrating the lncRNA-disease association network, the lncRNA functional similarity network, and the disease semantic similarity network. It then infers the probability of an lncRNA-disease association based on paths connecting them and their lengths in the network. Compared to existing methods, BPLLDA has a few advantages, including not demanding negative samples and the ability to predict associations related to novel lncRNAs or novel diseases. BPLLDA was applied to a canonical lncRNA-disease association database called LncRNADisease, together with two popular methods LRLSLDA and GrwLDA. The leave-one-out cross-validation areas under the receiver operating characteristic curve of BPLLDA are 0.87117, 0.82403, and 0.78528, respectively, for predicting overall associations, associations related to novel lncRNAs, and associations related to novel diseases, higher than those of the two compared methods. In addition, cervical cancer, glioma, and non-small-cell lung cancer were selected as case studies, for which the predicted top five lncRNA-disease associations were verified by recently published literature. 
In summary, BPLLDA exhibits good performances in predicting novel lncRNA-disease associations and associations related to novel lncRNAs and diseases. It may contribute to the understanding of lncRNA-associated diseases like certain cancers. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Evaluation of multinomial logistic regression models for predicting causative pathogens of food poisoning cases.
- Author
-
INOUE, Hideya, SUZUKI, Tomoyuki, HYODO, Masashi, and MIYAKE, Masami
- Subjects
FOOD poisoning ,PATHOGENIC microorganisms ,LOGISTIC regression analysis ,EPIDEMIOLOGICAL models - Abstract
In cases of food poisoning, it is important for food sanitation inspectors to determine the causative pathogen as early as possible and take necessary measures to minimize outbreaks. Interviews are usually conducted to obtain epidemiological information to aid in the rapid determination of the cause. However, the current method of determining the causative pathogen has the disadvantage of being reliant upon the experience and knowledge of food sanitation inspectors. Here, we analyzed 529 infectious food poisoning incidents reported in five municipalities in the Kinki region to develop a tool for evaluation using a multinomial logistic regression model, which can predict the causative pathogen based on the patients' epidemiological information. This tool predicts the most probable cause of the incident by generating a list of pathogens with the highest probability. As a result of leave-one-out cross validation, the agreement ratio with the actual pathogen was 86.4%, and this ratio increased to 97.5% when the agreement was judged by including the true pathogen within the top three pathogens with the highest probability. In cases where the difference of probability between the first and second candidate pathogen was ≥50%, the agreement ratio increased to 94.2%. Using this tool, it is possible to accurately estimate the causative pathogen at an early stage based on patient information, and this will further help narrow the target of investigations to identify causative agent, thereby leading to a prompt identification, which can prevent the spread of food poisoning. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
26. Leave-Two-Out Cross Validation to optimal shape parameter in radial basis functions.
- Author
-
Azarboni, Habibe Ramezannezhad, Keyanpour, Mohammad, and Yaghouti, Mohammadreza
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *BUSINESS analytics , *THEORY of knowledge , *STATISTICS - Abstract
Determination of the shape parameter plays a major role in the accuracy of the radial basis function method. In this paper, we present a new method called Leave-Two-Out Cross Validation to determine the best shape parameter. In the proposed method, by deleting two data points from the data set, a new formula is derived for determining the error value, and then the optimal shape parameter is determined. Numerical results show that the method is more accurate than other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
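As a rough illustration of how cross validation can pick a radial basis function shape parameter (entry 26), the sketch below uses brute-force leave-one-out, not the leave-two-out formula derived in the paper, which the abstract does not give. The Gaussian kernel, the sample function, the candidate shape values, and all function names are assumptions chosen for the example.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def gaussian_rbf(r, shape):
    """Gaussian radial basis function with shape parameter `shape`."""
    return math.exp(-(shape * r) ** 2)

def rbf_interpolate(xs, ys, shape, x_new):
    """Fit a 1D RBF interpolant through (xs, ys) and evaluate it at x_new."""
    matrix = [[gaussian_rbf(abs(xi - xj), shape) for xj in xs] for xi in xs]
    coeffs = solve(matrix, ys)
    return sum(c * gaussian_rbf(abs(x_new - xj), shape) for c, xj in zip(coeffs, xs))

def loocv_error(xs, ys, shape):
    """Largest absolute error when each node is predicted from all the others."""
    worst = 0.0
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        predicted = rbf_interpolate(xs_rest, ys_rest, shape, xs[i])
        worst = max(worst, abs(predicted - ys[i]))
    return worst

# Hypothetical test problem: samples of sin(x) on [0, 3].
xs = [0.5 * i for i in range(7)]
ys = [math.sin(x) for x in xs]
shapes = [1.0, 2.0, 4.0, 8.0]

errors = {s: loocv_error(xs, ys, s) for s in shapes}
best_shape = min(errors, key=errors.get)
```

The brute-force loop refits the interpolant once per left-out node, which is only practical for small data sets; closed-form error formulas such as the one the paper proposes exist to avoid that cost.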
27. Evaluation of the Radar QPE and Rain Gauge Data Merging Methods in Northern China
- Author
-
Qingtai Qiu, Jia Liu, Jiyang Tian, Yufei Jiao, Chuanzhe Li, Wei Wang, and Fuliang Yu
- Subjects
weather radar quantitative precipitation estimation ,rain gauge ,radar-rain gauge merging ,leave-one-out cross validation ,verification ,Science - Abstract
Radar-rain gauge merging methods have been widely used to produce high-quality precipitation with fine spatial resolution by combining the advantages of the rain gauge observation and the radar quantitative precipitation estimation (QPE). Different merging methods imply a specific choice on the treatment of radar and rain gauge data. In order to improve their applicability, significant studies have focused on evaluating the performances of the merging methods. In this study, a categorization of the radar-rain gauge merging methods was proposed as: (1) Radar bias adjustment category, (2) radar-rain gauge integration category, and (3) rain gauge interpolation category for a total of six commonly used merging methods, i.e., mean field bias (MFB), regression inverse distance weighting (RIDW), collocated co-kriging (CCok), fast Bayesian regression kriging (FBRK), regression kriging (RK), and kriging with external drift (KED). Eight different storm events were chosen from semi-humid and semi-arid areas of Northern China to test the performance of the six methods. Based on the leave-one-out cross validation (LOOCV), it was concluded that the integration category always performs the best, the bias adjustment category performs the worst, and the interpolation category ranks between them. The quality of the merging products can be a function of the merging method that is affected by both the quality of radar QPE and the ability of the rain gauge to capture small-scale rainfall features. In order to further evaluate the applicability of the merging products, they were then used as the input to a rainfall-runoff model, the Hybrid-Hebei model, for flood forecasting. It is revealed that a higher quality of the merging products indicates a better agreement between the observed and the simulated runoff.
- Published
- 2020
- Full Text
- View/download PDF
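The leave-one-out evaluation of gauge interpolation described in entry 27 can be sketched as follows. This is a minimal illustration using plain inverse distance weighting rather than any of the six merging methods in the study; the gauge coordinates, rainfall values, and function names are hypothetical.

```python
import math

def idw_predict(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations."""
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def loocv_rmse(stations, power=2.0):
    """Leave each gauge out in turn, predict it from the others, and return the RMSE."""
    squared_errors = []
    for i, (x, y, observed) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        predicted = idw_predict((x, y), others, power)
        squared_errors.append((predicted - observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical gauge data: (x_km, y_km, rainfall_mm)
gauges = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 12.0),
    (0.0, 1.0, 11.0),
    (1.0, 1.0, 15.0),
    (0.5, 0.5, 12.5),
]

# Compare candidate distance exponents by their leave-one-out RMSE.
rmse_by_power = {p: loocv_rmse(gauges, p) for p in (1.0, 2.0, 3.0)}
best_power = min(rmse_by_power, key=rmse_by_power.get)
```

Because every gauge serves once as the held-out observation, the same loop can rank competing interpolation or merging methods by swapping out `idw_predict`.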
28. An improved unsupervised learning approach for potential human microRNA–disease association inference using cluster knowledge
- Author
-
Rajapandy, Manoov and Anbarasu, Anand
- Published
- 2021
- Full Text
- View/download PDF
29. Online Extreme Learning Machine with Hybrid Sampling Strategy for Sequential Imbalanced Data.
- Author
-
Mao, Wentao, Jiang, Mengxue, Wang, Jinwan, and Li, Yuan
- Abstract
In real applications of cognitive computation, data with imbalanced classes are often collected sequentially. In this situation, some current machine learning algorithms, e.g., support vector machine, will obtain weak classification performance, especially on the minority class. To solve this problem, a new hybrid sampling online extreme learning machine (ELM) on sequential imbalanced data is proposed in this paper. The key idea is keeping the majority and minority classes balanced with similar sequential distribution characteristic of the original data. This method includes two stages. At the offline stage, we introduce the principal curve to build confidence regions of minority and majority classes respectively. Based on these two confidence zones, over-sampling of the minority class and under-sampling of the majority class are both conducted to generate new synthetic samples, and then the initial ELM model is established. At the online stage, we first choose the most valuable ones from the synthetic samples of the majority class in terms of sample importance. Afterwards, a new online fast leave-one-out cross validation (LOO CV) algorithm utilizing Cholesky decomposition is proposed to determine whether or not to update the ELM network weight at the online stage. We also prove theoretically that the proposed method has an upper bound of information loss. Experimental results on seven UCI datasets and one real-world air pollutant forecasting dataset show that, compared with ELM, OS-ELM, meta-cognitive OS-ELM, and OSELM with SMOTE strategy, the proposed method can simultaneously improve the classification performance of minority and majority classes in terms of accuracy, G-mean value, and ROC curve. In conclusion, the proposed hybrid sampling online extreme learning machine can be effectively applied to the sequential data imbalance problem with better generalization performance and numerical stability. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
30. An enhance excavation equipments classification algorithm based on acoustic spectrum dynamic feature.
- Author
-
Cao, Jiuwen, Huang, Wuhao, Zhao, Tuo, Wang, Jianzhong, and Wang, Ruirong
- Abstract
Underground pipeline network surveillance systems have attracted increasing attention recently due to severe breakages caused by external excavation equipment in mainland China. In this paper, we study an excavation equipment classification algorithm based on acoustic signal processing and machine learning algorithms. A cross-layer microphone array with four elements is designed to collect the acoustic database of representative excavation equipment on real construction sites. The generalized sidelobe canceller algorithm is employed for background noise reduction. The improved spectrum dynamic feature extraction algorithm is then implemented for the construction of the benchmark acoustic feature database of excavation equipment. To perform classification and background noise identification, the single hidden layer feedforward neural network is employed as the classifier. An improved algorithm based on the popular extreme learning machine (ELM) is proposed for classifier learning. The leave-one-out cross validation strategy is adopted for the regularization parameter optimization in ELM. Comprehensive experiments are conducted to test the effectiveness of the proposed algorithm. Comparisons with state-of-the-art classifiers and the Mel-frequency cepstral coefficient acoustic features are also provided to demonstrate the superiority of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
31. On the determination of locating the source points of the MFS using effective condition number.
- Author
-
Chen, C.S., Noorizadegan, Amir, Young, D.L., and Chen, Chuin-Shan
- Subjects
- *
GEOMETRIC shapes , *HEISENBERG uncertainty principle - Abstract
The method of fundamental solutions (MFS) is a highly accurate numerical method for solving homogeneous problems, subject to a proper selection of the source locations. In this work, we choose the effective condition number as a tool for determining a good source location for the MFS that leads to highly accurate results with low computational cost. Three approaches for the location of the fictitious source points are considered. An efficient algorithm for the evaluation of the effective condition number is proposed. We also compare the proposed method with the well-known LOOCV (leave-one-out cross validation) and show the advantages and shortcomings of each approach. Five numerical examples with different geometric shapes of the domain for both harmonic and non-harmonic boundary conditions in 2D and 3D are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
32. Sample-Based Attribute Selective A$n$DE for Large Data.
- Author
-
Chen, Shenglei, Martinez, Ana M., Webb, Geoffrey I., and Wang, Limin
- Subjects
- *
BIG data , *COMPUTER memory management , *CLASSIFICATION algorithms , *INFORMATION technology , *BAYESIAN analysis - Abstract
More and more applications have come with large data sets in the past decade. However, existing algorithms cannot guarantee to scale well on large data. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data, by varying the value of $n$ (number of super parents). Hence, AnDE is especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It needs one more pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
33. Assessing genomic prediction accuracy for Holstein sires using bootstrap aggregation sampling and leave-one-out cross validation.
- Author
-
Mikshowsky, Ashley A., Gianola, Daniel, and Weigel, Kent A.
- Subjects
- *
HOLSTEIN-Friesian cattle , *PREGNANCY , *DAIRY cattle genetics , *SOMATIC cells , *STANDARD deviations , *GENETICS - Abstract
Since the introduction of genome-enabled prediction for dairy cattle in 2009, genomic selection has markedly changed many aspects of the dairy genetics industry and enhanced the rate of response to selection for most economically important traits. Young dairy bulls are genotyped to obtain their genomic predicted transmitting ability (GPTA) and reliability (REL) values. These GPTA are a main factor in most purchasing, marketing, and culling decisions until bulls reach 5 yr of age and their milk-recorded offspring become available. At that time, daughter yield deviations (DYD) can be compared with the GPTA computed several years earlier. For most bulls, the DYD align well with the initial predictions. However, for some bulls, the difference between DYD and corresponding GPTA is quite large, and published REL are of limited value in identifying such bulls. A method of bootstrap aggregation sampling (bagging) using genomic BLUP (GBLUP) was applied to predict the GPTA of 2,963, 2,963, and 2,803 young Holstein bulls for protein yield, somatic cell score, and daughter pregnancy rate (DPR), respectively. For each trait, 50 bootstrap samples from a reference population comprising 2011 DYD of 8,610, 8,405, and 7,945 older Holstein bulls were used. Leave-one-out cross validation was also performed to assess prediction accuracy when removing specific bulls from the reference population. The main objectives of this study were (1) to assess the extent to which current REL values and alternative measures of variability, such as the bootstrap standard deviation (SD) of predictions, could detect bulls whose daughter performance deviates significantly from early genomic predictions, and (2) to identify factors associated with the reference population that inform about inaccurate genomic predictions. The SD of bootstrap predictions was a mildly useful metric for identifying bulls whose future daughter performance may deviate significantly from early GPTA for protein and DPR. 
Leave-one-out cross validation allowed us to identify groups of reference population bulls that were influential on other reference population bulls for protein yield and observe their effects on predictions of testing set bulls, as a whole and individually. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
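34. 基于机器视觉的牛肉结缔组织特征和嫩度关系研究 [Study of the relationship between beef connective tissue features and tenderness based on machine vision].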
- Author
-
陈士进, 丁冬, 李泊, 沈明霞, and 林盛业
- Abstract
[Objectives] Tenderness is the primary indicator of meat quality. It influences the consumption and commercial value of beef. Finding suitable indicators of tenderness and predicting tenderness with a rapid, non-destructive, and objective method has always been a research focus. [Methods] In this paper, the area of connective tissue between the muscles was segmented using computer vision technology and image processing methods to extract features. Statistical methods were then used to find the relationship between the characteristic parameters and the cooked-beef shear force value. Combined with ratings by a trained panel, a beef tenderness model was established by stepwise multiple linear regression to predict and grade cooked-beef tenderness. [Results] The connective tissue feature data for 70 sample images were used to train and test the tenderness model in a rotational leave-one-out scheme. The discrimination coefficient of the model, R², was 0.857, and RSEC was 6.453. Through cross validation, the beef was classified into tender, medium and tough groups with 88.57% classification accuracy. [Conclusions] Experimental results showed that image features of connective tissue between muscles were important indicators of beef tenderness. The hardware and software, which were able to predict beef tenderness levels quickly and non-destructively, had good practical value and guiding significance. [ABSTRACT FROM AUTHOR]
- Published
- 2016
35. Extreme learning machine and adaptive sparse representation for image classification.
- Author
-
Cao, Jiuwen, Zhang, Kai, Luo, Minxia, Yin, Chun, and Lai, Xiaoping
- Subjects
- *
MACHINE learning , *IMAGE processing , *ROBUST control , *COMPUTATIONAL complexity , *HYBRID systems - Abstract
Recent research has shown the speed advantage of extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in the area of image classification. Those two methods, however, have their respective drawbacks, e.g., in general, ELM is known to be less robust to noise while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. In order to unify such mutual complementarity and thus further enhance the classification performance, we propose an efficient hybrid classifier to exploit the advantages of ELM and SRC in this paper. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion about the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified or not. If the output is reliable, the classification will be performed by ELM; otherwise the query image will be fed to SRC. Meanwhile, in the stage of SRC, a sub-dictionary that is adaptive to the query image instead of the entire dictionary is extracted via the ELM output. The computational burden of SRC thus can be reduced. Extensive experiments on handwritten digit classification, landmark recognition and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
36. FCMDAP: using miRNA family and cluster information to improve the prediction accuracy of disease related miRNAs
- Author
-
Xiaoying Li, Yaping Lin, Changlong Gu, and Jialiang Yang
- Subjects
Association score ,Computer science ,Systems biology ,Nearest neighbor recommendation algorithm ,0206 medical engineering ,02 engineering and technology ,Computational biology ,Disease ,Disease cluster ,Cross-validation ,Similarity (network science) ,Structural Biology ,microRNA ,Humans ,Gene Regulatory Networks ,Disease-related miRNA ,Leave-one-out cross validation ,Molecular Biology ,lcsh:QH301-705.5 ,Research ,Applied Mathematics ,Computational Biology ,Mutual information ,miRNA family information ,Computer Science Applications ,MicroRNAs ,lcsh:Biology (General) ,miRNA cluster information ,Modeling and Simulation ,Colorectal Neoplasms ,020602 bioinformatics - Abstract
Background Biological experiments have confirmed the association between miRNAs and various diseases. However, such experiments are costly and time consuming. Computational methods help select potential disease-related miRNAs to improve the efficiency of biological experiments. Methods In this work, we develop a novel method using multiple types of data to calculate miRNA and disease similarity based on mutual information, and add miRNA family and cluster information to predict human disease-related miRNAs (FCMDAP). This method not only depends on known miRNA-disease associations but also accurately measures miRNA and disease similarity and resolves the problem of overestimation. FCMDAP uses the k most similar neighbor recommendation algorithm to predict the association score between miRNA and disease. Information about miRNA clusters is also used to improve prediction accuracy. Result FCMDAP achieves an average AUC of 0.9165 based on leave-one-out cross validation. Case studies on colorectal, lung, and pancreatic neoplasms confirm 100%, 98% and 96% of the top 50 predicted miRNAs, respectively. FCMDAP also exhibits satisfactory performance in predicting diseases without any related miRNAs and miRNAs without any related diseases. Conclusions In this study, we present a computational method FCMDAP to improve the prediction accuracy of disease related miRNAs. FCMDAP could be an effective tool for further biological experiments. Electronic supplementary material The online version of this article (10.1186/s12918-019-0696-9) contains supplementary material, which is available to authorized users.
- Published
- 2019
- Full Text
- View/download PDF
37. Regularized online sequential extreme learning machine with adaptive regulation factor for time-varying nonlinear system.
- Author
-
Lu, XinJiang, Zhou, Chuang, Huang, MingHui, and Lv, WenBing
- Subjects
- *
ADAPTIVE computing systems , *MACHINE learning , *TIME-varying systems , *NONLINEAR systems , *PROBLEM solving - Abstract
In order to model time-varying nonlinear systems more accurately, we propose a regularized online sequential extreme learning machine with adaptive regulation factor (ROSELM-ARF). The construction of a new objective function allows for the online updating of both the model coefficient and the regulation factor, while negating the influence of the cumulative error. This differs from the traditional regularized online sequential extreme learning machine (ReOS-ELM), which only updates the model coefficient. A two-step solving method is developed and applied to determine the optimal parameters, where the optimal regulation factor is derived using the proposed fast and online leave-one-out cross validation (FOLOO) method. The computational performance can be drastically improved by using the proposed FOLOO method as compared to using the existing leave-one-out cross validation (LOO) method. The proposed method is applied to the modeling of two practical cases in order to demonstrate its effectiveness. The experimental results indicate that the proposed method provides a more accurate model than several conventional modeling methods, while also improving the computational performance. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
38. Multi-label classification using stacked spectral kernel discriminant analysis.
- Author
-
Tahir, Muhammad Atif, Kittler, Josef, and Bouridane, Ahmed
- Subjects
- *
DISCRIMINANT analysis , *KERNEL operating systems , *REGRESSION analysis , *GENERALIZATION , *NEURAL computers - Abstract
Multi-label classification is a challenging research problem due to the fact that each example may belong to a varying number of classes. This problem can be further aggravated by high dimensionality and complex correlation among labels. In this paper, a discriminant approach to multi-label classification is proposed using the concept of stacking and spectral regression based kernel discriminant analysis (SSRKDA). For effective stacked generalisation, a novel fast implementation of the leave-one-out cross-validation for SSRKDA is also presented in this paper. The proposed system is validated on several multi-label databases. The results indicate a significant boost in performance when SSRKDA is compared to other multi-label classification techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
39. Alveolar bone-loss area localization in periodontitis radiographs based on threshold segmentation with a hybrid feature fused of intensity and the H-value of fractional Brownian motion model.
- Author
-
Lin, P.L., Huang, P.W., Huang, P.Y., and Hsu, H.C.
- Subjects
- *
PERIODONTITIS , *ALVEOLAR process , *RADIOGRAPHY , *IMAGE segmentation , *BROWNIAN motion , *DISEASE progression , *DIAGNOSIS - Abstract
Background and objective Periodontitis involves progressive loss of alveolar bone around the teeth. Hence, automatic alveolar bone-loss (ABL) measurement in periapical radiographs can assist dentists in diagnosing this disease. In this paper, we propose an effective method for ABL area localization and denote it as ABLIfBm. Method ABLIfBm is a threshold segmentation method that uses a hybrid feature fusing both intensity and texture measured by the H-value of the fractional Brownian motion (fBm) model, where the H-value is the Hurst coefficient in the expectation function of an fBm curve (intensity change) and is directly related to the value of the fractal dimension. Adopting a leave-one-out cross validation training and testing mechanism, ABLIfBm trains weights for both features using a Bayesian classifier and transforms the radiograph image into a feature image obtained from a weighted average of both features. Finally, by Otsu's thresholding, it segments the feature image into normal and bone-loss regions. Results Experimental results on 31 periodontitis radiograph images in terms of mean true positive fraction and false positive fraction are about 92.5% and 14.0%, respectively, where the ground truth is provided by a dentist. The results also demonstrate that ABLIfBm outperforms (a) the threshold segmentation method using either feature alone or a weighted average of the same two features but with weights trained differently; (b) a level set segmentation method presented earlier in the literature; and (c) segmentation methods based on Bayesian, K-NN, or SVM classifiers using the same two features. Conclusion Our results suggest that the proposed method can effectively localize alveolar bone-loss areas in periodontitis radiograph images and hence would be useful for dentists in evaluating the degree of bone loss for periodontitis patients. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
40. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation.
- Author
-
Wong, Tzu-Tsung
- Subjects
- *
PERFORMANCE evaluation , *ALGORITHMS , *DATA mining , *INDEPENDENCE (Mathematics) , *STATISTICAL sampling - Abstract
Classification is an essential task for predicting the class values of new instances. Both k-fold and leave-one-out cross validation are very popular for evaluating the performance of classification algorithms. Much of the data mining literature introduces the operations for these two kinds of cross validation and the statistical methods that can be used to analyze the resulting accuracies of algorithms, but these descriptions are generally not all consistent. Analysts can therefore be confused when performing a cross validation procedure. In this paper, the independence assumptions in cross validation are introduced, and the circumstances that satisfy the assumptions are also addressed. The independence assumptions are then used to derive the sampling distributions of the point estimators for k-fold and leave-one-out cross validation. The cross validation procedure needed to obtain such sampling distributions is discussed to provide new insights in evaluating the performance of classification algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
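To make the two procedures compared in entry 40 concrete, the following sketch computes a pooled accuracy estimate under k-fold cross validation, with leave-one-out obtained as the special case where k equals the number of instances. The 1-nearest-neighbor classifier, the interleaved fold assignment, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def nearest_neighbor_label(query, train):
    """Return the label of the training point closest to `query` (1-NN)."""
    best = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return best[1]

def cross_validated_accuracy(data, k):
    """Pooled accuracy over k folds; k == len(data) gives leave-one-out."""
    n = len(data)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved fold assignment
        train = [data[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            features, label = data[i]
            if nearest_neighbor_label(features, train) == label:
                correct += 1
    return correct / n

# Hypothetical two-class data with well-separated clusters.
data = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"), ((0.3, 0.2), "a"),
    ((2.0, 2.1), "b"), ((2.2, 2.0), "b"), ((2.1, 2.3), "b"), ((2.3, 2.2), "b"),
]

four_fold_accuracy = cross_validated_accuracy(data, k=4)
loo_accuracy = cross_validated_accuracy(data, k=len(data))
```

In leave-one-out each fold contains a single instance, so every fold accuracy is either 0 or 1, whereas k-fold yields one accuracy per fold that can be averaged. That structural difference is the reason the two estimators are usually analyzed separately.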
41. In vivo Raman spectroscopy of human uterine cervix: exploring the utility of vagina as an internal control.
- Author
-
Shaikh, Rubina, Dora, Tapas Kumar, Chopra, Supriya, Maheshwari, Amita, K., Deodhar Kedar, Bharat, Rekhi, and Krishna, C. Murali
- Subjects
- *
RAMAN spectroscopy , *CERVIX uteri , *VAGINA , *TUMORS , *CERVICAL cancer - Abstract
In vivo Raman spectroscopy is being projected as a new, noninvasive method for cervical cancer diagnosis. In most of the reported studies, normal areas in the cancerous cervix were used as control. However, in the Indian subcontinent, the majority of cervical cancers are detected at advanced stages, leaving no normal sites for acquiring control spectra. Moreover, vagina and ectocervix are reported to have similar biochemical composition. Thus, in the present study, we have evaluated the feasibility of classifying normal and cancerous conditions in the Indian population and we have also explored the utility of the vagina as an internal control. A total of 228 normal and 181 tumor in vivo Raman spectra were acquired from 93 subjects under clinical supervision. The spectral features in normal conditions suggest the presence of collagen, while DNA and noncollagenous proteins were abundant in tumors. Principal-component linear discriminant analysis (PC-LDA) yielded 97% classification efficiency between normal and tumor groups. An analysis of a normal cervix and vaginal controls of cancerous and noncancerous subjects suggests similar spectral features between these groups. PC-LDA of tumor, normal cervix, and vaginal controls further support the utility of the vagina as an internal control. Overall, findings of the study corroborate with earlier studies and facilitate objective, noninvasive, and rapid Raman spectroscopic-based screening/diagnosis of cervical cancers. [ABSTRACT FROM AUTHOR]
- Published
- 2014
42. Maps, trends, and temperature sensitivities—phenological information from and for decreasing numbers of volunteer observers
- Author
-
Tobias Ottenheym, Gourav Misra, Annette Menzel, Alissa Lüpke, Stefan Härer, Nicole Estrella, and Ye Yuan
- Subjects
0106 biological sciences ,Volunteers ,Atmospheric Science ,010504 meteorology & atmospheric sciences ,Health, Toxicology and Mutagenesis ,Climate Change ,Climate change ,Phenological season ,Citizen science ,010603 evolutionary biology ,01 natural sciences ,Map interpolation ,Cross-validation ,Meteorology ,Inverse distance weighting ,Humans ,Leave-one-out cross validation ,Multiple linear regression ,0105 earth and related environmental sciences ,Series (stratigraphy) ,Original Paper ,Ecology ,Phenology ,Temperature ,ddc ,Climatology ,Environmental science ,Seasons ,Interpolation - Abstract
Phenology serves as a major indicator of ongoing climate change. Long-term phenological observations are critically important for tracking and communicating these changes. The phenological observation network across Germany is operated by the National Meteorological Service with a major contribution from volunteering activities. However, the number of observers has strongly decreased for the last decades, possibly resulting in increasing uncertainties when extracting reliable phenological information from map interpolation. We studied uncertainties in interpolated maps from decreasing phenological records, by comparing long-term trends based on grid-based interpolated and station-wise observed time series, as well as their correlations with temperature. Interpolated maps in spring were characterized by the largest spatial variabilities across Bavaria, Germany, with respective lowest interpolated uncertainties. Long-term phenological trends for both interpolations and observations exhibited mean advances of −0.2 to −0.3 days year⁻¹ for spring and summer, while late autumn and winter showed a delay of around 0.1 days year⁻¹. Throughout the year, temperature sensitivities were consistently stronger for interpolated time series than observations. Such a better representation of regional phenology by interpolation was equally supported by satellite-derived phenological indices. Nevertheless, simulation of observer numbers indicated that a decline to less than 40% leads to a strong decrease in interpolation accuracy. To better understand the risk of declining phenological observations and to motivate volunteer observers, a Shiny app is proposed to visualize spatial and temporal phenological patterns across Bavaria and their links to climate change–induced temperature changes. Supplementary Information: The online version contains supplementary material available at 10.1007/s00484-021-02110-3.
- Published
- 2020
43. Partial least squares and random sample consensus in outlier detection
- Author
-
Peng, Jiangtao, Peng, Silong, and Hu, Yong
- Subjects
- *
LEAST squares , *RANDOM variables , *OUTLIERS (Statistics) , *COMPARATIVE studies , *CLINICAL drug trials , *DATA analysis , *ALGORITHMS , *MATHEMATICAL analysis - Abstract
Abstract: A novel outlier detection method in partial least squares based on random sample consensus is proposed. The proposed algorithm repeatedly generates partial least squares solutions estimated from random samples and then tests each solution for the support from the complete dataset for consistency. A comparative study of the proposed method and leave-one-out cross validation in outlier detection on simulated data and near-infrared data of pharmaceutical tablets is presented. In addition, a comparison between the proposed method and PLS, RSIMPLS, PRM is provided. The obtained results demonstrate that the proposed method is highly efficient. [Copyright © Elsevier]
- Published
- 2012
44. Fast automatic two-stage nonlinear model identification based on the extreme learning machine
- Author
-
Deng, Jing, Li, Kang, and Irwin, George W.
- Subjects
- *
MACHINE learning , *NONLINEAR statistical models , *NONLINEAR theories , *MATHEMATICAL models , *PARAMETER estimation , *LEAST squares , *RADIAL basis functions - Abstract
Abstract: It is convenient and effective to solve nonlinear problems with a model that has a linear-in-the-parameters (LITP) structure. However, the nonlinear parameters (e.g. the width of the Gaussian function) of each model term need to be pre-determined either from expert experience or through exhaustive search. An alternative approach is to optimize them by a gradient-based technique (e.g. Newton's method). Unfortunately, all of these methods still need a lot of computations. Recently, the extreme learning machine (ELM) has shown its advantages in terms of fast learning from data, but the sparsity of the constructed model cannot be guaranteed. This paper proposes a novel algorithm for automatic construction of a nonlinear system model based on the extreme learning machine. This is achieved by effectively integrating the ELM and leave-one-out (LOO) cross validation with our two-stage stepwise construction procedure. The main objective is to improve the compactness and generalization capability of the model constructed by the ELM method. Numerical analysis shows that the proposed algorithm only involves about half of the computation of the orthogonal least squares (OLS) based method. Simulation examples are included to confirm the efficacy and superiority of the proposed technique. [Copyright © Elsevier]
- Published
- 2011
45. Estimating and predicting bark thickness for seven conifer species in the Acadian Region of North America using a mixed-effects modeling approach: comparison of model forms and subsampling strategies.
- Author
-
Li, Rongxia and Weiskittel, Aaron
- Abstract
In many situations, information on stem diameters inside bark (dib) is more desirable than on diameters outside bark (dob). However, obtaining dib measurements is usually expensive, time-consuming, and prone to significant measurement errors when done on standing trees. Many bark thickness equations have been proposed to estimate the dibs of standing trees. In this study, we compared several commonly used bark thickness equations for seven conifer species in the Acadian Region of North America. Mixed-effects modeling techniques were employed to fit linear and non-linear bark thickness equations. We found the equation proposed by Cao and Pepper (South J Appl Forestry 10:220-224, 1986; Eq. 5) performed significantly better than other equations for most of our study species. The Cao and Pepper (South J Appl Forestry 10:220-224, 1986) equation is a function of dob, relative height in the stem, tree height, and the ratio of dib to dob at breast height. The mean absolute bias was found to be reduced up to 74% compared with using a fixed ratio approach employed in the widely used Northeastern variant of the Forest Vegetation Simulator (FVS-NE) growth and yield model. Leave-one-out cross validation was further performed to determine the location of suitable prior measurements in the prediction process for three of the most well-behaved equations. Results show that no unified prior measurement can provide best predictive abilities across all species as the choice of prior dib measurements depends on both species and bark thickness equations. [ABSTRACT FROM AUTHOR]
- Published
- 2011
46. Classification of foreign fibers in cotton lint using machine vision and multi-class support vector machine
- Author
-
Li, Daoliang, Yang, Wenzhu, and Wang, Sile
- Subjects
- *
COMPUTER vision , *AUTOMATIC classification , *PLANT fibers , *IMAGE processing , *PLANT classification , *COTTON , *ALGORITHMS - Abstract
Abstract: Automatic classification of foreign fibers in cotton lint using machine vision is still a challenge due to the various colors and shapes of the foreign fibers. This paper presents a novel classification method based on multi-class support vector machine (MSVM) which aims at accurate and fast classification of the foreign fibers. Firstly, live images were acquired by a machine vision system and then processed using image processing algorithms. Then the color features, shape features and texture features of each foreign fiber object were extracted and feature vectors were composed. Afterwards, three kinds of multi-class support vector machines were constructed, i.e., one-against-all decision-tree based MSVM, one-against-one voting based MSVM and one-against-one directed acyclic graph MSVM. At last, with the extracted feature vectors as input, the MSVMs were tested using leave-one-out cross validation. The results indicate that both the one-against-one voting based MSVM and the one-against-one directed acyclic graph MSVM can satisfy the accuracy requirement of the classification of foreign fibers, with mean accuracies of 93.57% and 92.34%, respectively. The one-against-all decision-tree based MSVM only obtains a mean accuracy of 79.25%, which cannot meet the accuracy requirement. In classification speed, the one-against-one directed acyclic graph MSVM is the fastest and better suited to online classification. [ABSTRACT FROM AUTHOR]
- Published
- 2010
47. Particle Swarm Optimization Aided Orthogonal Forward Regression for Unified Data Modeling.
- Author
-
Sheng Chen, Xia Hong, and Harris, Chris J.
- Subjects
REGRESSION analysis ,ALGORITHMS ,DATA modeling ,SWARM intelligence ,MATHEMATICAL optimization ,EVOLUTIONARY computation - Abstract
We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least square algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications. [ABSTRACT FROM AUTHOR]
- Published
- 2010
48. Verification of the geological origin of bottled mineral water using artificial neural networks
- Author
-
Grošelj, Neva, van der Veer, Grishja, Tušar, Marjan, Vračko, Marjan, and Novič, Marjana
- Subjects
- *
BOTTLED water , *COMPOSITION of water , *ARTIFICIAL neural networks , *MINERAL water bottles , *COST effectiveness , *WATER sampling , *PREDICTION models - Abstract
Abstract: As a first step towards objective and cost-efficient verification of the geographical origin of commercially sold mineral water, we determined to what extent the chemical composition of mineral water can be linked to the geology of the local water source. For this purpose, a dataset consisting of 145 European mineral water samples from a known geology was analysed using counter-propagation artificial neural networks (CP-ANNs) with a supervised learning algorithm. The models were tested for recall ability (RA) and validated with a leave-one-out cross validation (LOO-CV). The optimal model shows 85% and 65% correct predictions on RA and on LOO-CV, respectively, indicating substantial success in correctly predicting the geology of the mineral water samples. Results further show that using the proper lithological classification scheme largely determines the success of the prediction, whereas inclusion of the calculated saturation indices of different solutes as additional variables in the data appeared to have negligible effect on the predictive power of the model. [Copyright © Elsevier]
- Published
- 2010
49. Comparison between two types of Artificial Neural Networks used for validation of pharmaceutical processes
- Author
-
Behzadi, Sharareh Salar, Prakasvudhisarn, Chakguy, Klocker, Johanna, Wolschann, Peter, and Viernstein, Helmut
- Subjects
- *
ARTIFICIAL neural networks , *COMPARATIVE studies , *PHARMACEUTICAL technology , *GRANULATION , *CHEMICAL processes , *BAYESIAN analysis , *FEEDFORWARD control systems - Abstract
Abstract: Two types of Artificial Neural Networks (ANNs), a Multi-Layer Perceptron (MLP) and a Generalized Regression Neural Network (GRNN), have been used for the validation of a fluid bed granulation process. The training capacity and the accuracy of these two types of networks were compared. The variations of the ratio of binder solution to feed material, product bed temperature, atomizing air pressure, binder spray rate, air velocity and batch size were taken as input variables for training the MLP and GRNN. The properties of size, size distribution, flow rate, angle of repose and Hausner's ratio of granules produced were measured and used as output variables. Qualitatively, the two networks gave comparable results, as both pointed out the importance of the binder spray rate and the atomizing air pressure to the granulation process. However, the averaged absolute error of the MLP was higher than the averaged absolute error of the GRNN. Furthermore, the correlation coefficients between the experimentally determined and the calculated output values, the corresponding prediction accuracy for the different granule properties as well as the overall prediction accuracy using GRNN were better than using MLP. In conclusion, the comparison of two different networks (MLP, a so-called feed-forward back-propagation network and GRNN, a so-called Bayesian Neural Network) showed the higher capacity of the latter for validation of such granulation processes. [Copyright © Elsevier]
- Published
- 2009
50. A Least-squares Approach to Direct Importance Estimation.
- Author
-
Kanamori, Takafumi, Hido, Shohei, and Sugiyama, Masashi
- Subjects
- *
LEAST squares , *ESTIMATION theory , *DENSITY functionals , *PROBABILITY theory , *OUTLIERS (Statistics) , *MODEL validation - Abstract
We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2009