1,469 results on '"elastic net"'
Search Results
2. A novel comparison of shrinkage methods based on multi criteria decision making in case of multicollinearity.
- Author
-
Kılıçoğlu, Șevval and Yerlikaya-Özkurt, Fatma
- Abstract
Data analysis is very important in many fields of science. The most preferred method in data analysis is linear regression due to its simplicity of interpretation and ease of application. One of the assumptions accepted while obtaining a linear regression is that there is no correlation between the independent variables in the model, that is, the absence of multicollinearity. As a result of multicollinearity, the variance of the parameter estimates will be high and this reduces the accuracy and reliability of the linear models. Shrinkage methods aim to handle the multicollinearity problem by minimizing the variance of the estimators in the linear model. Ridge Regression, Lasso, and Elastic-Net methods are applied to different simulated data sets with different characteristics and also to real-world data sets. Based on performance results, the methods are compared using the multi-criteria decision-making method TOPSIS, and the order of preference is determined for each data set. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. A novel IoT-integrated ensemble learning approach for indoor air quality enhancement.
- Author
-
Kareem Abed Alzabali, Saja, Bastam, Mostafa, and Ataie, Ehsan
- Subjects
- *MACHINE learning, *INDOOR air quality, *AIR quality monitoring, *STANDARD deviations, *PARTICULATE matter, *ATMOSPHERIC carbon dioxide, *LIQUEFIED petroleum gas
- Abstract
In indoor environments, air quality significantly impacts human health and well-being, with carbon monoxide (CO) posing a particular hazard due to its colorless and odorless nature and potential to cause severe health issues. Integrating the Internet of Things and remote sensing technologies has revolutionized data monitoring, collection, and evaluation, especially within the context of 'smart' homes. This study leverages these technologies to enhance indoor air quality monitoring. By collecting data on key indoor atmospheric quality indicators—carbon dioxide (CO2), methane (CH4), alcohol, liquefied petroleum gas (LPG), particulate matter (PM1 and PM2.5), humidity, and temperature—the study aims to predict indoor carbon monoxide levels. A custom dataset was compiled from August to October, consisting of 61,710 observations recorded at one-minute intervals. The methodology employs a stacking ensemble approach, integrating multiple machine learning models to boost prediction accuracy and reliability. In the stacking ensemble, six distinct models are employed: Random Forest, Multi-Layer Perceptron, Lasso, Elastic Net, XGBoost, and Support Vector Regression. Each model is individually trained and fine-tuned using the Grid Search method to optimize parameter combinations. These optimized models are then combined in the stacking ensemble, which achieves a Mean Squared Error (MSE) of 0.0140, a Root Mean Squared Error (RMSE) of 0.1185, and a Mean Absolute Error (MAE) of 0.0291. The results demonstrate that the proposed system significantly enhances the precision of CO prediction, underscoring its critical role in air quality surveillance within smart environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
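Note on the error metrics in record 3: the abstract reports MSE, RMSE, and MAE for the stacking ensemble. The sketch below is only an illustration of how these three metrics are commonly defined and how the reported values relate to one another; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy values, not data from the study.
y_true = [0.0, 0.5, 1.0, 1.5]
y_pred = [0.1, 0.4, 1.0, 1.9]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))

# Consistency check on the figures quoted in the abstract:
# RMSE squared should match MSE up to the rounding used in reporting.
reported_mse, reported_rmse, reported_mae = 0.0140, 0.1185, 0.0291
print(round(reported_rmse ** 2, 4) == reported_mse)  # RMSE^2 rounds to the reported MSE
print(reported_mae <= reported_rmse)                 # MAE never exceeds RMSE
```

Under these standard definitions the reported RMSE and MSE agree to four decimal places, and the MAE being well below the RMSE suggests that a small number of larger errors dominate the squared-error metrics.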
4. Laser-Induced Breakdown Spectroscopic Steel Classification Method Using Mixed Feature Selection and Lime.
- Author
-
Lin, Xiaomei, Duan, Xinyang, Lin, Jingjun, Huang, Yutao, Yang, Jiangfei, Zhang, Zhuojia, and Dong, Yanjie
- Subjects
- *LASER-induced breakdown spectroscopy, *FEATURE selection, *SUPPORT vector machines, *STEEL, *CLASSIFICATION
- Abstract
Laser-induced breakdown spectroscopy (LIBS) technology faces the challenge of redundant or irrelevant features when dealing with high-dimensional data of steel. To enhance the accuracy and interpretability of multivariate classification, this study introduces an innovative hybrid feature selection (FS) method that skillfully combines the filtering characteristics of the select percentile (SP) algorithm with the embedded advantages of the elastic net (EN) algorithm. Under this framework, the support vector machine (SVM) algorithm was applied for classification, demonstrating outstanding performance with an accuracy, precision, and F1 score of 0.9888, 0.9895, and 0.9889 on the test set, respectively. To address the 'black box' nature of the SVM algorithm, this paper further introduces the local interpretable model-agnostic explanations (LIME) method. LIME allows for the visualization of the importance of each variable, thereby enhancing the interpretability and credibility of the model. Overall, the model and methods proposed in this study show significant effectiveness in eliminating redundant or irrelevant features and in precise classification, effectively solving most of the challenges faced by LIBS in steel classification issues. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Regularization and variable selection with triple shrinkage in linear regression: a generalization of lasso.
- Author
-
Genç, Murat and Özkale, M. Revan
- Subjects
- *DATA analysis, *GENERALIZATION, *ALGORITHMS, *MULTICOLLINEARITY
- Abstract
We propose a new shrinkage and variable selection method in linear regression, which is based on triple shrinkage on the regression coefficients. The new estimation method contains the ridge, lasso and elastic net as special cases. The term based on the shrunken estimator in the new method can provide estimates with a smaller length depending on the size of a new tuning parameter compared to the elastic net, maintaining the variable selection feature in the case of multicollinearity. The new estimator has the property of the grouping effect similar to that of the elastic net. The well-known coordinate descent algorithm is used to compute the coefficient path of the new estimator, efficiently. We conduct real data analysis and simulation studies to compare the new estimator with several methods including the lasso and elastic net. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
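Note on record 5: the abstract states that ridge, lasso, and elastic net are special cases of the proposed estimator and that coordinate descent is used to compute it. The sketch below is not the authors' triple-shrinkage algorithm; it is a minimal coordinate-descent solver for the ordinary elastic net only, assuming a glmnet-style objective (1/(2n))·||y − Xb||² + lam·[alpha·||b||₁ + (1 − alpha)/2·||b||₂²], no intercept, and no all-zero columns. All names (`soft_threshold`, `elastic_net_cd`, `lam`, `alpha`) are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the elastic net (assumed glmnet-style scaling).

    X is a list of rows, y a list of responses. alpha=1 gives the lasso,
    alpha=0 gives ridge, and values in between give the elastic net.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual
            # (the residual with coordinate j's own contribution added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical orthogonal toy design; least squares would give (2.0, 0.1).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

print("lasso      ", elastic_net_cd(X, y, lam=0.5, alpha=1.0))
print("ridge      ", elastic_net_cd(X, y, lam=1.0, alpha=0.0))
print("elastic net", elastic_net_cd(X, y, lam=1.0, alpha=0.5))
```

On this orthogonal design each coefficient has the closed form S(b_ols, lam·alpha) / (1 + lam·(1 − alpha)), so the lasso sets the small coefficient exactly to zero, ridge only shrinks both coefficients, and the elastic net does both.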
6. Influence Line Identification Method Based on VMD Combined with Improved Wavelet Threshold Denoising.
- Author
-
Zhu, Jinsong and Zhou, Shuai
- Subjects
- *LIVE loads, *NUMERICAL calculations, *MATHEMATICAL models, *NOISE, *SPEED
- Abstract
The influence line of bridge can reflect the performance of the bridge under the moving vehicle load. It has a wide range of applications in structural damage detection, performance evaluation, model correction and bridge weigh-in-motion. Fast moving vehicles will cause the dynamic effect of the bridge, resulting in the change of the bridge response curve and the difficulty of identifying the influence line. In this paper, an influence line identification method based on variational mode decomposition (VMD) combined with improved wavelet threshold denoising is proposed. Firstly, the mathematical model of influence line identification is established based on vehicle load matrix and bridge response. Then, VMD combined with improved wavelet threshold denoising method is used to eliminate dynamic fluctuation effect and noise. Finally, the elastic net penalty term is introduced in the influence line identification problem, and the B-spline basis function is used to reconstruct the influence line. In order to verify the effectiveness of the proposed method, a vehicle-bridge coupling model is established for numerical calculation, and the identification of bridge displacement influence lines under different conditions such as vehicle speed, noise level, road roughness and vehicle weight is investigated. The results show that the proposed method can effectively eliminate the dynamic fluctuation components and noise of the bridge response. In the case of good road roughness, the influence line can still be accurately identified when the vehicle speed reaches 15 m/s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Comparative evaluation of statistical and machine learning models for weather-driven wheat yield forecasting across different districts of Punjab.
- Author
-
Gill, Kulwinder Kaur, Bhatt, Kavita, Akansha, Setiya, Parul, Sandhu, Sandeep Singh, and Kaur, Baljeet
- Subjects
- ARTIFICIAL neural networks, AGRICULTURAL forecasts, MACHINE learning, CROP yields, STANDARD deviations
- Abstract
Predicting crop yields before harvest is important for making and carrying out policies about food safety, transportation costs, import-export, storage, and selling of agricultural goods. The weather is a key factor in crop growth and its development. Therefore, models that include meteorological variables can predict reliable forecasts for crop output; however, selecting the appropriate model for use in agricultural production forecasting can be challenging. This study investigates the development of wheat yield prediction models using various multivariate analysis techniques and weather indices derived from meteorological data collected over 22 years in Punjab, India. Five different modeling approaches, including stepwise multiple linear regression (SMLR), LASSO, elastic net (ELNET), artificial neural network (ANN), and ridge regression, were employed and compared for their effectiveness in predicting wheat yield. The models were calibrated using data from 17 years (2000–01 to 2016–17) and validated using data from the subsequent 5 years (2017–18 to 2021–22). Evaluation metrics such as R², root mean square error (RMSE), normalized root mean square error (NRMSE), mean biased error (MBE), and modeling efficiency (EF) were utilized to assess model performance. The results indicate varying degrees of performance across districts and modeling techniques. ANN demonstrated the highest performance during both calibration and validation periods, followed closely by LASSO and ELNET. However, certain districts showed discrepancies in model fit, with some models performing better than others depending on the specific district. Overall, ANN emerged as the most reliable approach for wheat yield prediction in Punjab followed by ELNET and LASSO, offering valuable insights for agricultural planning and management. This comprehensive analysis provides valuable contributions to the field of crop yield prediction, enhancing understanding of the complex interactions between weather variables and agricultural outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Least angle regression, relaxed lasso, and elastic net for algebraic multigrid of systems of elliptic partial differential equations.
- Author
-
Lee, Barry
- Subjects
- *ELLIPTIC differential equations, *PARSIMONIOUS models, *DEGREES of freedom, *INTERPOLATION, *ABILITY grouping (Education), *MULTIGRID methods (Numerical analysis), *PETRI nets
- Abstract
In a sequence of papers, the author examined several statistical affinity measures for selecting the coarse degrees of freedom (CDOFs) or coarse nodes (Cnodes) in algebraic multigrid (AMG) for systems of elliptic partial differential equations (PDEs). These measures were applied to a set of relaxed vectors that exposes the problematic error components. Once the CDOFs are determined using any one of these measures, the interpolation operator is constructed in a bootstrap AMG (BAMG) procedure. However, in a recent paper of Kahl and Rottmann, the statistical least angle regression (LARS) method was utilized in the coarsening procedure and shown to be promising in the CDOF selection. This method is generally used in the statistics community to select the most relevant variables in constructing a parsimonious model for a very complicated and high‐dimensional model or data set (i.e., variable selection for a "reduced" model). As pointed out by Kahl and Rottmann, the LARS procedure has the ability to detect group relations between variables, which can be more useful than binary relations that are derived from strength‐of‐connection, or affinity measures, between pairs of variables. Moreover, by using an updated Cholesky factorization approach in the regression computation, the LARS procedure can be performed efficiently even when the original set of variables is large; and due to the LARS formulation itself (i.e., its $\ell_1$‐norm constraint), sparse interpolation operators can be generated. In this article, we extend the LARS coarsening approach to systems of PDEs. Furthermore, we incorporate some modifications to the LARS approach based on the so‐called elastic net and relaxed lasso methods, which are well known and thoroughly analyzed in the statistics community for ameliorating several major issues with LARS as a variable selection procedure. 
We note that the original LARS coarsening approach may have addressed some of these issues in similar or other ways but due to the limited details provided there, it is difficult to determine the extent of their similarities. Incorporating these modifications (or effecting them in similar ways) leads to improved robustness in the LARS coarsening procedure, and numerical experiments indicate that the changes lead to faster convergence in the multigrid method. Moreover, the relaxed lasso modification permits an indirect BAMG (iBAMG) extension to the interpolation operator. This iBAMG extension applied in an intra‐ or inter‐variable interpolation setting (i.e., nodal‐based coarsening), as well as in variable‐based coarsening, which will not preserve the nodal structure of a finest‐level discretization on the lower levels of the multilevel hierarchy, will be examined. For the variable‐based coarsening, because of the parsimonious feature of LARS, the performance is reasonably good when applied to systems of PDEs albeit at a substantial additional cost over a nodal‐based procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Predicting Biochemical and Physiological Parameters: Deep Learning from IgG Glycome Composition.
- Author
-
Vujić, Ana, Klasić, Marija, Lauc, Gordan, Polašek, Ozren, Zoldoš, Vlatka, and Vojta, Aleksandar
- Subjects
- *MACHINE learning, *DEEP learning, *IMMUNOGLOBULIN G, *GLUCOSE metabolism, *INDIVIDUALIZED medicine
- Abstract
In immunoglobulin G (IgG), N-glycosylation plays a pivotal role in structure and function. It is often altered in different diseases, suggesting that it could be a promising health biomarker. Studies indicate that IgG glycosylation not only associates with various diseases but also has predictive capabilities. Additionally, changes in IgG glycosylation correlate with physiological and biochemical traits known to reflect overall health state. This study aimed to investigate the power of IgG glycans to predict physiological and biochemical parameters. We developed two models using IgG N-glycan data as an input: a regression model using elastic net and a machine learning model using deep learning. Data were obtained from the Korčula and Vis cohorts. The Korčula cohort data were used to train both models, while the Vis cohort was used exclusively for validation. Our results demonstrated that IgG glycome composition effectively predicts several biochemical and physiological parameters, especially those related to lipid and glucose metabolism and cardiovascular events. Both models performed similarly on the Korčula cohort; however, the deep learning model showed a higher potential for generalization when validated on the Vis cohort. This study reinforces the idea that IgG glycosylation reflects individuals' health state and brings us one step closer to implementing glycan-based diagnostics in personalized medicine. Additionally, it shows that the predictive power of IgG glycans can be used for imputing missing covariate data in deep learning frameworks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Improving random forest algorithm by selecting appropriate penalized method.
- Author
-
Farhadi, Zari, Bevrani, Hossein, and Feizi-Derakhshi, Mohammad-Reza
- Subjects
- *RANDOM forest algorithms, *MONTE Carlo method, *MACHINE learning
- Abstract
This article improves the random forest algorithm by selecting the most appropriate penalized regression methods, and attempts to improve the post-selection boosting random forest (PBRF) algorithm using elastic net regression. The proposed method with the highest efficiency is called Reducing and Aggregating Random Forest Trees by Elastic Net (RARTEN). The introduced method consists of three steps. In the first step, the random forest algorithm is used as a predictor. In the second step, Elastic Net, as a penalized regression method, is applied to reduce the number of trees and improve the random forest and PBRF. In the last step, the selected trees are aggregated. The results obtained from the real data and the Monte Carlo simulation are evaluated using various statistical performance criteria. The simulation study shows that RARTEN, with 7%, 5%, and 8.5% reductions in the linear, nonlinear, and noise models, respectively, improves the accuracy of the traditional random forest and of the method proposed by Wang. In addition, this method has a significant reduction compared to other penalized regression methods. Moreover, the real data results show that the proposed method in our study with a reduction of almost 16% confirms the validity of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. The spike-and-slab lasso and scalable algorithm to accommodate multinomial outcomes in variable selection problems.
- Author
-
Leach, Justin M., Yi, Nengjun, and Aban, Inmaculada
- Subjects
- *DISTRIBUTION (Probability theory), *PARAMETER estimation, *GENERALIZATION, *ALGORITHMS, *MIXTURES, *EXPECTATION-maximization algorithms
- Abstract
Spike-and-slab prior distributions are used to impose variable selection in Bayesian regression-style problems with many possible predictors. These priors are a mixture of two zero-centered distributions with differing variances, resulting in different shrinkage levels on parameter estimates based on whether they are relevant to the outcome. The spike-and-slab lasso assigns mixtures of double exponential distributions as priors for the parameters. This framework was initially developed for linear models, later developed for generalized linear models, and shown to perform well in scenarios requiring sparse solutions. Standard formulations of generalized linear models cannot immediately accommodate categorical outcomes with > 2 categories, i.e. multinomial outcomes, and require modifications to model specification and parameter estimation. Such modifications are relatively straightforward in a Classical setting but require additional theoretical and computational considerations in Bayesian settings, which can depend on the choice of prior distributions for the parameters of interest. While previous developments of the spike-and-slab lasso focused on continuous, count, and/or binary outcomes, we generalize the spike-and-slab lasso to accommodate multinomial outcomes, developing both the theoretical basis for the model and an expectation-maximization algorithm to fit the model. To our knowledge, this is the first generalization of the spike-and-slab lasso to allow for multinomial outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Analyzing the Factors that Affect the Production Volume of Sea-Caught Stingrays using Elastic Net Regression.
- Author
-
Hermawan, Raihan Ariq, Ohyver, Margaretha, and Fitrianah, Devi
- Subjects
- PRODUCTION quantity, LEAST squares, CORAL reefs & islands, STINGRAYS, CORALS
- Abstract
Elastic Net is a regularization method that can be applied to select independent variables (variable selection) and overcome multicollinearity issues. These two issues frequently occur when modeling using linear regression, as is the case with the production volume of sea-caught stingrays in Indonesia. There are numerous factors that may influence the production volume of stingrays, some of which may also have a high correlation between them. Therefore, the purpose of this study is to identify the factors that influence the production volume of sea-caught stingrays in Indonesia in 2021 using Elastic Net. After the analysis was carried out, the result showed the factors that have important roles in the production volume of sea-caught stingrays are the number of powered motor fishing vessel and the number of fishery household without vessel from Multiple Linear Regression using the Ordinary Least Square method and Elastic Net models. Other factors that may be considered, based on the Elastic Net model (α = 0.50 & λ= 0.1125699), are the production volume of sea-caught shrimps, the area of coral reef, the fishery household with non-powered vessel, and the non-powered fishing vessel. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
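Note on the tuning parameters in record 12: the abstract quotes α = 0.50 and λ = 0.1125699 but does not state the parameterization. Assuming the widely used glmnet-style convention (an assumption, not something the abstract confirms), the elastic net estimate would be

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr]
```

Under that assumption, the quoted values give an $\ell_1$ weight of $\lambda\alpha = 0.1125699 \times 0.5 \approx 0.0563$ and a squared-$\ell_2$ weight of $\lambda(1-\alpha)/2 \approx 0.0281$. Setting $\alpha = 1$ would recover the lasso and $\alpha = 0$ ridge regression.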
13. Versatile Descent Algorithms for Group Regularization and Variable Selection in Generalized Linear Models.
- Author
-
Helwig, Nathaniel E.
- Subjects
- *POISSON distribution, *ORTHOGONALIZATION, *ALGORITHMS, *BINOMIAL distribution, *FISHER information
- Abstract
This article proposes an adaptively bounded gradient descent (ABGD) algorithm for group elastic net penalized regression. Unlike previously proposed algorithms, the proposed algorithm adaptively bounds the Fisher information matrix, which results in a flexible and stable computational framework. In particular, the proposed algorithm (i) does not require orthogonalization of the predictors, and (ii) can be easily applied to any combination of exponential family response distribution and link function. The proposed algorithm is implemented in the grpnet R package (available from CRAN), which implements the approach for common response distributions (Gaussian, binomial, and Poisson), as well as several response distributions not previously considered in the group penalization literature (i.e., multinomial, negative binomial, gamma, and inverse Gaussian). Simulated and real data examples demonstrate that the proposed algorithm is as or more efficient than existing methods for Gaussian, binomial, and Poisson distributions. Furthermore, using two genomic examples, I demonstrate how the proposed algorithm can be applied to high-dimensional multinomial regression problems with grouped predictors. R code to reproduce the results is included as supplementary materials. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Gestational exposure to organochlorine compounds and metals and infant birth weight: effect modification by maternal hardships.
- Author
-
Hu, Janice M. Y., Arbuckle, Tye E., Janssen, Patricia A., Lanphear, Bruce P., Alampi, Joshua D., Braun, Joseph M., MacFarlane, Amanda J., Chen, Aimin, and McCandless, Lawrence C.
- Subjects
- *BIRTH weight, *WEIGHT in infancy, *ORGANOCHLORINE compounds, *TOXIC substance exposure, *METAL compounds, *FOLIC acid, *FETAL growth disorders
- Abstract
Background: Gestational exposure to toxic environmental chemicals and maternal social hardships are individually associated with impaired fetal growth, but it is unclear whether the effects of environmental chemical exposure on infant birth weight are modified by maternal hardships. Methods: We used data from the Maternal-Infant Research on Environmental Chemicals (MIREC) Study, a pan-Canadian cohort of 1982 pregnant females enrolled between 2008 and 2011. We quantified eleven environmental chemical concentrations from two chemical classes – six organochlorine compounds (OCs) and five metals – that were detected in ≥ 70% of blood samples collected during the first trimester. We examined fetal growth using birth weight adjusted for gestational age and assessed nine maternal hardships by questionnaire. Each maternal hardship variable was dichotomized to indicate whether the females experienced the hardship. In our analysis, we used elastic net to select the environmental chemicals, maternal hardships, and 2-way interactions between maternal hardships and environmental chemicals that were most predictive of birth weight. Next, we obtained effect estimates using multiple linear regression, and plotted the relationships by hardship status for visual interpretation. Results: Elastic net selected trans-nonachlor, lead, low educational status, racially minoritized background, and low supplemental folic acid intake. All were inversely associated with birth weight. Elastic net also selected interaction terms. Among those with increasing environmental chemical exposures and reported hardships, we observed stronger negative associations and a few positive associations. For example, every two-fold increase in lead concentrations was more strongly associated with reduced infant birth weight among participants with low educational status (β = -100 g (g); 95% confidence interval (CI): -215, 16), than those with higher educational status (β = -34 g; 95% CI: -63, -3). 
In contrast, every two-fold increase in mercury concentrations was associated with slightly higher birth weight among participants with low educational status (β = 23 g; 95% CI: -25, 71) compared to those with higher educational status (β = -9 g; 95% CI: -24, 6). Conclusions: Our findings suggest that maternal hardships can modify the associations of gestational exposure to some OCs and metals with infant birth weight. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Regularization for electricity price forecasting.
- Author
-
Uniejewski, Bartosz
- Abstract
The most commonly used form of regularization typically involves defining the penalty function as a l¹ or l² norm. However, numerous alternative approaches remain untested in practical applications. In this study, we apply ten different penalty functions to predict electricity prices and evaluate their performance under two different model structures and in two distinct electricity markets. The study reveals that LQ and elastic net consistently produce more accurate forecasts compared to other regularization types. In particular, they were the only types of penalty functions that consistently produced more accurate forecasts than the most commonly used LASSO. Furthermore, the results suggest that cross-validation outperforms Bayesian information criteria for parameter optimization, and performs as well as models with ex-post parameter selection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Machine Learning Models for Salary Prediction in Peruvian Teachers of Regular Basic Education
- Author
-
José, Tinoco Ramos, Jhoset, Yupanqui Arellano, Soria, Juan J., Saboya, Nemias, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Silhavy, Radek, editor, and Silhavy, Petr, editor
- Published
- 2024
- Full Text
- View/download PDF
17. Dimensions Related to NCD in Developing Countries During Working Age Using Ridge, Lasso, and Elastic Net Regressions
- Author
-
Arturo, Domínguez-Miranda Sergio, Rodriguez-Aguilar, Roman, Chlamtac, Imrich, Series Editor, Marmolejo-Saucedo, José Antonio, editor, De La Mota, Idalia Flores, editor, Rodriguez-Aguilar, Roman, editor, Marmolejo-Saucedo, Liliana, editor, Rodriguez-Aguilar, Miriam, editor, Litvinchev, Igor, editor, Vasant, Pandian, editor, and Kose, Utku, editor
- Published
- 2024
- Full Text
- View/download PDF
18. Regression Models for Estimating the Stress Concentration Factor of Rectangular Plates
- Author
-
Monares, J. Alfredo Ramírez, Juárez, Rogelio Florencia, Kacprzyk, Janusz, Series Editor, Jain, Lakhmi C., Series Editor, Pedrycz, Witold, editor, Rivera, Gilberto, editor, Fernández, Eduardo, editor, and Meschino, Gustavo Javier, editor
- Published
- 2024
- Full Text
- View/download PDF
19. Elastic Filter Prune in Deep Neural Networks Using Modified Weighted Hybrid Criterion
- Author
-
Hu, Wei, Han, Yi, Liu, Fang, Hu, Mingce, Li, Xingyuan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cao, Cungeng, editor, Chen, Huajun, editor, Zhao, Liang, editor, Arshad, Junaid, editor, Asyhari, Taufiq, editor, and Wang, Yonghao, editor
- Published
- 2024
- Full Text
- View/download PDF
20. Crop Yield Prediction Using Stacking Ensemble Model
- Author
-
Rao, D. Srinivasa, Chaganti, Surya Sai Sameera, Chelikani, Santhi Saranya, Nandamuri, Yaswant Venkat, Nippun, Puvvula Venkat, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Tiwari, Ritu, editor, Saraswat, Mukesh, editor, and Pavone, Mario, editor
- Published
- 2024
- Full Text
- View/download PDF
21. Investigating Variable Selection Techniques Under Missing Data: A Simulation Study
- Author
-
Bain, Catherine, Shi, Dingjing, Wiberg, Marie, Kim, Jee-Seon, Hwang, Heungsun, editor, Wu, Hao, editor, and Sweet, Tracy, editor
- Published
- 2024
- Full Text
- View/download PDF
22. Does Contract Farming Improve Farmers’ Income? The Case of Pineapple Farmers in Nong Khai and Loei, Thailand
- Author
-
Teetranont, Teerawut, Tarkhamtham, Payap, Kacprzyk, Janusz, Series Editor, Kreinovich, Vladik, editor, Sriboonchitta, Songsak, editor, and Yamaka, Woraphon, editor
- Published
- 2024
- Full Text
- View/download PDF
23. EleMi: A Robust Method to Infer Soil Ecological Networks with Better Community Structure
- Author
-
Chen, Nan, Bucur, Doina, Botta, Federico, editor, Macedo, Mariana, editor, Barbosa, Hugo, editor, and Menezes, Ronaldo, editor
- Published
- 2024
- Full Text
- View/download PDF
24. Bayesian Elastic-Net and Fused Lasso for Semiparametric Structural Equation Models: An Application in Understanding the Relationship Between Alcohol Morbidity and Other Substance Abuse Factors Among American Youth
- Author
-
Wang, Zhenyu, Chakraborty, Sounak, and Wood, Phillip
- Published
- 2024
- Full Text
- View/download PDF
25. State-size and economic growth: evidence from state reorganization in India
- Author
-
Vaibhav, Vikash and Ramaswamy, K. V.
- Published
- 2024
- Full Text
- View/download PDF
26. Effect of Monetary Policy Decisions and Announcements on the Price of Cryptocurrencies: An Elastic-Net With Arima Residuals Approach
- Author
-
Peciulis Tomas and Vasiliauskaite Asta
- Subjects
- cryptocurrency, monetary policy, elastic net, arima, e42, e52, e58, g02, g15, Business, HF5001-6182, Economics as a science, HB71-74
- Abstract
This study analysed the three cryptocurrencies with the largest market capitalization: Bitcoin, Ether (cryptocurrency built upon the Ethereum project's blockchain technology), and Binance coin, which account for 60% of the total cryptocurrency market capitalization. The purpose of this research was to measure the impact of monetary policy on the price of these cryptocurrencies using an adjusted R squared.
- Published
- 2024
- Full Text
- View/download PDF
27. Robust and sparse logistic regression.
- Author
-
Cornilly, Dries, Tubex, Lise, Van Aelst, Stefan, and Verdonck, Tim
- Abstract
Logistic regression is one of the most popular statistical techniques for solving (binary) classification problems in various applications (e.g. credit scoring, cancer detection, ad click predictions and churn classification). Typically, the maximum likelihood estimator is used, which is very sensitive to outlying observations. In this paper, we propose a robust and sparse logistic regression estimator where robustness is achieved by means of the γ -divergence. An elastic net penalty ensures sparsity in the regression coefficients such that the model is more stable and interpretable. We show that the influence function is bounded and demonstrate its robustness properties in simulations. The good performance of the proposed estimator is also illustrated in an empirical application that deals with classifying the type of fuel used by cars. [ABSTRACT FROM AUTHOR]
- Published
- 2024
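As a reading aid for the elastic net penalty mentioned in this abstract, the block below writes out the standard form of the penalty. The notation (the coefficient vector \beta, the strength \lambda, the mixing weight \alpha, and the generic loss L) is an assumption made here for illustration and is not taken from the paper, whose γ-divergence loss is not reproduced.

```latex
% Elastic net penalty in its common (glmnet-style) parameterization.
% \lambda >= 0 sets the overall strength; \alpha in [0, 1] mixes the
% lasso term (\alpha = 1) and the ridge term (\alpha = 0).
\[
  P_{\lambda,\alpha}(\beta)
    = \lambda \left( \alpha \,\lVert \beta \rVert_1
      + \frac{1-\alpha}{2} \,\lVert \beta \rVert_2^2 \right),
  \qquad
  \hat{\beta} = \arg\min_{\beta} \; L(\beta) + P_{\lambda,\alpha}(\beta).
\]
```

The L1 term can set coefficients exactly to zero, which is the source of sparsity; the L2 term stabilizes the estimate when predictors are correlated.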
28. Automated Bayesian variable selection methods for binary regression models with missing covariate data.
- Author
-
Bergrab, Michael and Aßmann, Christian
- Abstract
Data collection and the availability of large data sets have increased over the last decades. In both statistical and machine learning frameworks, two methodological issues typically arise when performing regression analysis on large data sets. First, variable selection is crucial in regression modeling, as it helps to identify an appropriate model with respect to the considered set of conditioning variables. Second, especially in the context of survey data, handling of missing values, which occur even with state-of-the-art data collection and processing methods, is important for estimation. Within this paper, we provide a Bayesian approach based on a spike-and-slab prior for the regression coefficients, which allows for simultaneous handling of variable selection and estimation in combination with handling of missing values in covariate data. The paper also discusses the implementation of the approach using Markov chain Monte Carlo techniques and provides results for simulated data sets and an empirical illustration based on data from the German National Educational Panel Study. The suggested Bayesian approach is compared to other statistical and machine learning frameworks such as Lasso, ridge regression, and Elastic net, and is shown to perform well in terms of estimation performance and variable selection accuracy. The simulation results demonstrate that ignoring the handling of missing values in data sets can lead to biased selection results. Overall, the proposed Bayesian method offers a holistic, flexible, and powerful framework for variable selection in the presence of missing covariate data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Affinity-based elastic net intuitionistic fuzzy twin support vector machines.
- Author
-
Li, Zhishen and Zhang, Peiai
- Abstract
Intuitionistic fuzzy twin support vector machines (IFTWSVM) combined the concept of intuitionistic fuzzy sets with twin support vector machines (TWSVM) and showed excellent performance in classification. However, the existing intuitionistic fuzzy number schemes based on the single center and the local neighborhood of the sample struggle to accurately reflect the location information of the sample, and the L1-norm penalty of the slack variable is not well defined from a geometric point of view. In view of the above deficiencies, we design a novel intuitionistic fuzzy number scheme, adopt the elastic net to penalize the slack variables, and propose Affinity-Based Elastic Net Intuitionistic Fuzzy Twin Support Vector Machines (AENIFTWSVM). The model calculates the affinity of different classes of samples according to the Support Vector Data Description (SVDD) model in the kernel space and considers the contribution of samples to the two classes; the obtained affinity can be used to identify noise information. Experimental results on benchmark datasets and a handwritten digit dataset show that the proposed model outperforms some existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Estimation of a treatment effect based on a modified covariates method with L0 norm.
- Author
-
Tanioka, Kensuke, Okuda, Kaoru, Hiwa, Satoru, and Hiroyasu, Tomoyuki
- Abstract
In randomized clinical trials, we consider the situation in which the new treatment proves inadequate compared to the control treatment. However, it is unknown whether the new treatment is ineffective for all patients or effective for only a subgroup of patients with specific characteristics. If such a subgroup exists and can be detected, the patients can receive effective therapy. To detect subgroups, we need to estimate treatment effects. To achieve this, various treatment effect estimation methods have been proposed based on the sparse regression method. However, these methods are affected by noise. Therefore, we propose new treatment effect estimation approaches based on the modified covariate method, one using lasso regression and the other ridge regression, using the L0 norm. The proposed approach was evaluated through numerical simulation and real data examples. The proposed method performed almost the same as existing methods in the numerical simulations but was effective in the real data example. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Multivariate realized volatility: an analysis via shrinkage methods for Brazilian market data.
- Author
-
Vieir, Leonardo Ieracitano, Laurini, Márcio Poletti, Loperfido, Nicola, and Ismail, Mohd Tahir
- Subjects
COVARIANCE matrices ,GARCH model ,CAPITAL assets pricing model ,TIME series analysis - Abstract
Introduction: Realized volatility analysis of assets in the Brazilian market within a multivariate framework is the focus of this study. Despite the success of volatility models in univariate scenarios, challenges arise due to increasing dimensionality of covariance matrices and lower asset liquidity in emerging markets. Methods: In this study, we utilize intraday stock trading data from the Brazilian Market to compute daily covariance matrices using various specifications. To mitigate dimensionality issues in covariance matrix estimation, we implement penalization restrictions on coefficients through regressions with shrinkage techniques using Ridge, LASSO, or Elastic Net estimators. These techniques are employed to capture the dynamics of covariance matrices. Results: Comparison of covariance construction models is performed using the Model Confidence Set (MCS) algorithm, which selects the best models based on their predictive performance. The findings indicate that the method used for estimating the covariance matrix significantly impacts the selection of the best models. Additionally, it is observed that more liquid sectors demonstrate greater intra-sectoral dynamics. Discussion: While the results benefit from shrinkage techniques, the high correlation between assets presents challenges in capturing stock or sector idiosyncrasies. This suggests the need for further exploration and refinement of methods to better capture the complexities of volatility dynamics in emerging markets like Brazil. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. Validation and Implementation of Customer Classification System using Machine Learning.
- Author
-
Yoon, Hyemin, Kim, HyunJin, and Kim, Sangjin
- Subjects
- *
MACHINE learning , *FINANCIAL institutions , *SUPPORT vector machines - Abstract
For years, we have maintained a customer grade system that is applied, through customer segmentation, to customers with excellent performance. Currently, financial institutions that operate a customer grade system provide similar services based on score calculation criteria, but those criteria vary from one financial institution to another. In this study, we create a machine learning prediction model using existing items and added items that are based on the current customer grade of our bank; the purpose is to find an optimal model that considers the adequacy of existing variables and the validity of additional variables through comparison between models. Using Lasso, Elastic net, Multinomial Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine, we propose that the best model be found and gradually applied to customer grade calculation criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Short-term photovoltaic power prediction with dynamic combination learning based on k-sums segmented clustering.
- Author
-
吴家葆, 曾国辉, and 张振华
- Abstract
At present, it is difficult for the prediction accuracy of a single model to remain optimal under power fluctuation. To improve the operating stability of grid-connected systems and the energy-saving dispatching of the power grid, this study proposes a dynamic learning combination short-term power prediction method based on k-sums hierarchical clustering. The weather types are divided into sunny day A1, cloudy day A2, and rainy day B through segmentation clustering using the k-sums algorithm. A TCN (Temporal Convolutional Network) is used to extract the time series characteristics of the data, and this extraction module is fused with a GRU (Gated Recurrent Unit) to build a GRU structure that is sensitive to time series characteristics. After dynamically combining the improved GRU structure with an SVM (Support Vector Machine), the Elastic Net algorithm is adopted to output the best weight values and obtain the final prediction value. The power data of photovoltaic power generation and the corresponding meteorological data of a region in Jiangsu are used to verify the proposed method. The results show that the MAE (Mean Absolute Error) of the dynamic combination learning model is 1.888 and the RMSE (Root Mean Squared Error) is 2.403. [ABSTRACT FROM AUTHOR]
- Published
- 2024
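The two error measures reported in this abstract, MAE and RMSE, can be sketched as follows. This is a minimal illustration of the standard definitions; the function names and the toy numbers are assumptions made here, not data or code from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy values (hypothetical, for illustration only).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 3.0, 5.0]
print(mae(observed, predicted), rmse(observed, predicted))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and is more sensitive to occasional large misses.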
34. Identifying factors related to mortality of hospitalized COVID-19 patients using machine learning methods
- Author
-
Farzaneh Hamidi, Hadi Hamishehkar, Pedram Pirmad Azari Markid, and Parvin Sarbakhsh
- Subjects
COVID-19 ,Mortality ,Machine learning ,LASSO ,Elastic net ,Artificial neural network ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
Background: The COVID-19 pandemic has had a profound impact globally, presenting significant social and economic challenges. This study aims to explore the factors affecting mortality among hospitalized COVID-19 patients and construct a machine learning-based model to predict the risk of mortality. Methods: The study examined COVID-19 patients admitted to Imam Reza Hospital in Tabriz, Iran, between March 2020 and November 2021. The Elastic Net method was employed to identify and rank features associated with mortality risk. Subsequently, an artificial neural network (ANN) model was developed based on these features to predict mortality risk. The performance of the model was evaluated by receiver operating characteristic (ROC) curve analysis. Results: The study included 706 patients with 96 features, of which 26 were identified as crucial predictors of mortality. The ANN model, utilizing 20 of these features, achieved an area under the ROC curve (AUC) of 98.8%, effectively stratifying patients by mortality risk. Conclusion: The developed model offers accurate and rapid mortality risk predictions for COVID-19 patients, enhancing the responsiveness of healthcare systems to high-risk individuals.
- Published
- 2024
35. Tissue of origin prediction for cancer of unknown primary using a targeted methylation sequencing panel
- Author
-
Miaomiao Sun, Bo Xu, Chao Chen, Youjie Zhu, Xiaomo Li, and Kuisheng Chen
- Subjects
Cancer of unknown primary ,Methylation classifier ,Tissue of origin ,Machine learning ,Random forest ,Elastic net ,Medicine ,Genetics ,QH426-470 - Abstract
Rationale: Cancer of unknown primary (CUP) is a group of rare malignancies with poor prognosis and unidentifiable tissue-of-origin. Distinct DNA methylation patterns in different tissues and cancer types enable the identification of the tissue of origin in CUP patients, which could help risk assessment and guide site-directed therapy. Methods: Using genome-wide DNA methylation profile datasets from The Cancer Genome Atlas (TCGA) and machine learning methods, we developed a 200-CpG methylation feature classifier for CUP tissue of origin prediction (MFCUP). MFCUP was further validated with publicly available methylation array data of 2977 specimens and targeted methylation sequencing of 78 Formalin‐fixed paraffin‐embedded (FFPE) samples from a single center. Results: MFCUP achieved an accuracy of 97.2% in a validation cohort (n = 5923) representing 25 cancer types. When applied to an Infinium 450 K array dataset (n = 1052) and an Infinium EPIC (850 K) array dataset (n = 1925), MFCUP achieved an overall accuracy of 93.4% and 84.8%, respectively. Based on MFCUP, we established a targeted bisulfite sequencing panel and validated it with FFPE sections from 78 patients of 20 cancer types. This methylation sequencing panel correctly identified tissue of origin in 88.5% (69/78) of samples. We also found that the methylation levels of specific CpGs can distinguish one cancer type from others, indicating their potential as biomarkers for cancer diagnosis and screening. Conclusion: Our methylation-based cancer classifier and targeted methylation sequencing panel can predict tissue of origin in diverse cancer types with high accuracy.
- Published
- 2024
36. ERDeR: The Combination of Statistical Shrinkage Methods and Ensemble Approaches to Improve the Performance of Deep Regression
- Author
-
Zari Farhadi, Mohammad-Reza Feizi-Derakhshi, Hossein Bevrani, Wonjoon Kim, and Muhammad Fazal Ijaz
- Subjects
Deep learning ,convolutional neural network ,shrinkage methods ,ensemble learning ,LASSO ,elastic net ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Ensembling is a powerful technique to obtain the most accurate results. In some cases, a large number of learners in ensemble learning increases both the computational load during the test phase and the error rate. To solve this problem, in this paper we propose an Ensemble of Reduced Deep Regression (ERDeR) model, which is a combination of Deep Regressions (DRs), shrinkage methods, and ensemble approaches. The framework of the proposed model contains three phases. The first phase includes base regressions in which parallel DRs are used as learners. The role of these DRs is to extract features of input data and make predictions. In the second phase, to automatically reduce and select the most suitable DRs, shrinkage methods such as Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net (EN) are employed. These models are compared with the non-shrinkage model. The last phase is the ensemble phase, which consists of three different ensemble methods, namely Multi-Layer Perceptron (MLP), Weighted Average (WA), and Simple Average (SA). These ensemble methods are used to aggregate the remaining learners from previous steps. Finally, the proposed model is applied to Monte Carlo simulation data and three real datasets including Boston House Price, Real Estate Valuation and Gold Price per Ounce. The results show that after applying the shrinkage methods the error rate is significantly reduced and the model accuracy is increased. Accordingly, combining shrinkage methods and ensemble approaches not only decreased the computational load during the test phase, but also increased the model accuracy.
- Published
- 2024
37. Performance of machine‐learning approach for prediction of pre‐eclampsia in a middle‐income country.
- Author
-
Torres‐Torres, J., Villafan‐Bernal, J. R., Martinez‐Portilla, R. J., Hidalgo‐Carrera, J. A., Estrada‐Gutierrez, G., Adalid‐Martinez‐Cisneros, R., Rojas‐Zepeda, L., Acevedo‐Gallegos, S., Camarena‐Cabrera, D. M., Cruz‐Martínez, M. Y., and Espino‐y‐Sosa, S.
- Subjects
- *
MACHINE learning , *PLACENTAL growth factor , *PREECLAMPSIA , *MIDDLE-income countries , *PREGNANCY complications - Abstract
Objective: Pre‐eclampsia (PE) is a serious complication of pregnancy associated with maternal and fetal morbidity and mortality. As current prediction models have limitations and may not be applicable in resource‐limited settings, we aimed to develop a machine‐learning (ML) algorithm that offers a potential solution for developing accurate and efficient first‐trimester prediction of PE. Methods: We conducted a prospective cohort study in Mexico City, Mexico to develop a first‐trimester prediction model for preterm PE (pPE) using ML. Maternal characteristics and locally derived multiples of the median (MoM) values for mean arterial pressure, uterine artery pulsatility index and serum placental growth factor were used for variable selection. The dataset was split into training, validation and test sets. An elastic‐net method was employed for predictor selection, and model performance was evaluated using area under the receiver‐operating‐characteristics curve (AUC) and detection rates (DR) at 10% false‐positive rates (FPR). Results: The final analysis included 3050 pregnant women, of whom 124 (4.07%) developed PE. The ML model showed good performance, with AUCs of 0.897, 0.963 and 0.778 for pPE, early‐onset PE (ePE) and any type of PE (all‐PE), respectively. The DRs at 10% FPR were 76.5%, 88.2% and 50.1% for pPE, ePE and all‐PE, respectively. Conclusions: Our ML model demonstrated high accuracy in predicting pPE and ePE using first‐trimester maternal characteristics and locally derived MoM. The model may provide an efficient and accessible tool for early prediction of PE, facilitating timely intervention and improved maternal and fetal outcome. © 2023 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
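The two evaluation measures named in this abstract, the area under the ROC curve (AUC) and the detection rate (DR) at a 10% false-positive rate, can be sketched as below. This is a minimal illustration assuming binary labels (1 = case, 0 = non-case) and higher scores for higher predicted risk; the function names and the toy data are hypothetical and are not the study's code or results.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) operating points from sweeping the score threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Tied scores are processed together so they share one point.
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def detection_rate_at_fpr(points, max_fpr=0.10):
    """Highest TPR among operating points whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy data (hypothetical): 4 cases and 10 non-cases.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risk = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.50,
        0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.10]
curve = roc_points(labels, risk)
area = auc(curve)
dr = detection_rate_at_fpr(curve, max_fpr=0.10)
print(area, dr)
```

Reporting DR at a fixed FPR, as the abstract does, describes a single clinically chosen operating point, whereas AUC summarizes the whole curve.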
38. Feature selection of ground motion intensity measures for data‐driven surrogate modeling of structures.
- Author
-
Ding, Jia‐Yi and Feng, De‐Cheng
- Subjects
GROUND motion ,MACHINE learning ,EARTHQUAKE intensity ,FEATURE selection ,REINFORCED concrete ,CONDITIONAL probability - Abstract
In the probabilistic seismic performance assessment of structures, intensity measures (IMs) represent seismic characteristics and variations. Traditional fragility analysis method based on the assumption of linear regression requires selecting an optimal IM as input variable. By introducing machine learning (ML) techniques, nonparametric fragility analysis theoretically allows for considering all potential IMs as inputs. Nevertheless, to reduce input dimensionality and improve training efficiency, the feature selection of IMs remains imperative. This paper proposes a method to select optimal ground motion IMs for data‐driven surrogate modeling of structures. Specifically, the elastic net algorithm is employed to select the optimal multiple IMs based on the coefficient of determination and regression coefficient, differing from the efficiency and practicality emphasized in the traditional method. Using the optimal multiple IMs as input variables, several ML techniques are employed to construct surrogate models for seismic damage assessment of structures, thereby developing fragility functions, that is, the conditional probability of exceeding a damaged state given seismic intensity. A 3‐span, 6‐storey, reinforced concrete frame is utilized to illustrate the proposed methodology. The predictive performance of all ML models with the optimal multiple IMs outperforms that of the models with the commonly used IM (e.g., peak ground acceleration, PGA) as sole input and all candidate IMs as inputs. Additionally, the surrogate models with the optimal multiple IMs enable a more comprehensive seismic fragility modeling of structures under two or more IMs simultaneously, such as the fragility surface under spectral acceleration at 1.0s (Sa‐1.0s) and velocity spectrum intensity (VSI). [ABSTRACT FROM AUTHOR]
- Published
- 2024
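The elastic net feature selection described in this abstract can be illustrated with a small coordinate-descent sketch. It assumes centered, comparably scaled features and no intercept, and it minimizes (1/(2n))·||y − Xβ||² + λ(α||β||₁ + (1−α)/2·||β||₂²). The function names, parameter values, and toy data are assumptions made for illustration; this is not the authors' implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the elastic net (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = (soft_threshold(rho, lam * alpha)
                        / (col_sq[j] + lam * (1 - alpha)))
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data (hypothetical): the response depends strongly on feature 0,
# very weakly on feature 1, and not at all on feature 2.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -1.95, 1.95, -2.05]
coefficients = elastic_net(X, y, lam=0.2, alpha=0.5)
selected = [j for j, b in enumerate(coefficients) if b != 0.0]
print(coefficients, selected)
```

Features whose coefficients are driven exactly to zero are dropped; the remaining ones form the selected subset, which could then be passed to a downstream model.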
39. Integrating High-Resolution Mass Spectral Data, Bioassays and Computational Models to Annotate Bioactives in Botanical Extracts: Case Study Analysis of C. asiatica Extract Associates Dicaffeoylquinic Acids with Protection against Amyloid-β Toxicity.
- Author
-
Alcázar Magaña, Armando, Vaswani, Ashish, Brown, Kevin S., Jiang, Yuan, Alam, Md Nure, Caruso, Maya, Lak, Parnian, Cheong, Paul, Gray, Nora E., Quinn, Joseph F., Soumyanath, Amala, Stevens, Jan F., and Maier, Claudia S.
- Subjects
- *
CENTELLA asiatica , *CHEMICAL fingerprinting , *ALZHEIMER'S disease , *CYTOTOXINS , *EXTRACTS , *AMYLOID beta-protein - Abstract
Rapid screening of botanical extracts for the discovery of bioactive natural products was performed using a fractionation approach in conjunction with flow-injection high-resolution mass spectrometry for obtaining chemical fingerprints of each fraction, enabling the correlation of the relative abundance of molecular features (representing individual phytochemicals) with the read-outs of bioassays. We applied this strategy for discovering and identifying constituents of Centella asiatica (C. asiatica) that protect against Aβ cytotoxicity in vitro. C. asiatica has been associated with improving mental health and cognitive function, with potential use in Alzheimer's disease. Human neuroblastoma MC65 cells were exposed to subfractions of an aqueous extract of C. asiatica to evaluate the protective benefit derived from these subfractions against amyloid β-cytotoxicity. The % viability score of the cells exposed to each subfraction was used in conjunction with the intensity of the molecular features in two computational models, namely Elastic Net and selectivity ratio, to determine the relationship of the peak intensity of molecular features with % viability. Finally, the correlation of mass spectral features with MC65 protection and their abundance in different sub-fractions were visualized using GNPS molecular networking. Both computational methods unequivocally identified dicaffeoylquinic acids as providing strong protection against Aβ-toxicity in MC65 cells, in agreement with the protective effects observed for these compounds in previous preclinical model studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Tissue of origin prediction for cancer of unknown primary using a targeted methylation sequencing panel.
- Author
-
Sun, Miaomiao, Xu, Bo, Chen, Chao, Zhu, Youjie, Li, Xiaomo, and Chen, Kuisheng
- Subjects
- *
CANCER of unknown primary origin , *METHYLATION , *METHYLGUANINE , *DNA methylation - Abstract
Rationale: Cancer of unknown primary (CUP) is a group of rare malignancies with poor prognosis and unidentifiable tissue-of-origin. Distinct DNA methylation patterns in different tissues and cancer types enable the identification of the tissue of origin in CUP patients, which could help risk assessment and guide site-directed therapy. Methods: Using genome-wide DNA methylation profile datasets from The Cancer Genome Atlas (TCGA) and machine learning methods, we developed a 200-CpG methylation feature classifier for CUP tissue of origin prediction (MFCUP). MFCUP was further validated with publicly available methylation array data of 2977 specimens and targeted methylation sequencing of 78 Formalin‐fixed paraffin‐embedded (FFPE) samples from a single center. Results: MFCUP achieved an accuracy of 97.2% in a validation cohort (n = 5923) representing 25 cancer types. When applied to an Infinium 450 K array dataset (n = 1052) and an Infinium EPIC (850 K) array dataset (n = 1925), MFCUP achieved an overall accuracy of 93.4% and 84.8%, respectively. Based on MFCUP, we established a targeted bisulfite sequencing panel and validated it with FFPE sections from 78 patients of 20 cancer types. This methylation sequencing panel correctly identified tissue of origin in 88.5% (69/78) of samples. We also found that the methylation levels of specific CpGs can distinguish one cancer type from others, indicating their potential as biomarkers for cancer diagnosis and screening. Conclusion: Our methylation-based cancer classifier and targeted methylation sequencing panel can predict tissue of origin in diverse cancer types with high accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Prenatal Exposure to Opioids and Neurodevelopmental Disorders in Children: A Bayesian Mediation Analysis.
- Author
-
Wang, Shuang, Puggioni, Gavino, Wu, Jing, Meador, Kimford J, Caffrey, Aisling, Wyss, Richard, Slaughter, Jonathan L, Suzuki, Etsuji, Ward, Kristina E, Lewkowitz, Adam K, and Wen, Xuerong
- Subjects
- *
HUMAN abnormalities , *RETROSPECTIVE studies , *PRENATAL exposure delayed effects , *PREGNANCY outcomes , *CHILD psychopathology , *PREGNANCY complications , *RESEARCH funding , *FACTOR analysis , *DESCRIPTIVE statistics , *OPIOID analgesics , *DATA analysis software , *LONGITUDINAL method - Abstract
This study explores natural direct and joint natural indirect effects (JNIE) of prenatal opioid exposure on neurodevelopmental disorders (NDDs) in children mediated through pregnancy complications, major and minor congenital malformations, and adverse neonatal outcomes, using Medicaid claims linked to vital statistics in Rhode Island, United States, 2008–2018. A Bayesian mediation analysis with elastic net shrinkage prior was developed to estimate mean time to NDD diagnosis ratio using posterior mean and 95% credible intervals (CrIs) from Markov chain Monte Carlo algorithms. Simulation studies showed desirable model performance. Of 11,176 eligible pregnancies, 332 had ≥2 dispensations of prescription opioids anytime during pregnancy, including 200 (1.8%) having ≥1 dispensation in the first trimester (T1), 169 (1.5%) in the second (T2), and 153 (1.4%) in the third (T3). A significant JNIE of opioid exposure was observed in each trimester (T1, JNIE = 0.97, 95% CrI: 0.95, 0.99; T2, JNIE = 0.97, 95% CrI: 0.95, 0.99; T3, JNIE = 0.96, 95% CrI: 0.94, 0.99). The proportion of JNIE in each trimester was 17.9% (T1), 22.4% (T2), and 56.3% (T3). In conclusion, adverse pregnancy and birth outcomes jointly mediated the association between prenatal opioid exposure and accelerated time to NDD diagnosis. The proportion of JNIE increased as the timing of opioid exposure approached delivery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. Development of machine learning-based models for describing processes in a continuous solar-driven biomass gasifier.
- Author
-
Tasneem, Shadma, Ageeli, Abeer Ali, Alamier, Waleed M., Hasan, Nazim, and Goodarzi, Marjan
- Subjects
- *
BIOMASS gasification , *BIOMASS energy , *MACHINE learning , *BIOMASS , *CONTINUOUS processing , *SOLAR energy - Abstract
The synergy of two renewable and efficient sources in producing clean fuels, i.e., solar energy and biomass, can result in high efficiency. In this regard, developing syngas production systems based on solar biomass gasification has attracted much attention. However, experimental setups on solar-driven gasifier processes are costly and time-intensive. In such a situation, an accurate and low-cost alternative is to develop data-driven machine learning (ML) models to predict the processes involved in solar-driven biomass gasifiers. In the present study, several ML models, including random forest (RF), RANdom SAmple Consensus (RANSAC), stochastic gradient descent (SGD), automatic relevance determination (ARD) regression, and elastic net linear (ENL) regression, were developed to accurately predict the various processes of a continuous solar-driven biomass gasifier, including CO production rate, H2 production rate, carbon feeding rate, solar power input, thermochemical reactor efficiency, solar-to-fuel energy conversion efficiency, solar energy input, and carbon consumption rate. Using efficient ML methods, eight formulas and eight models for H2 production rate, CO production rate, carbon feeding rate, carbon consumption rate, solar energy input, solar power input, thermochemical reactor efficiency, and solar-to-fuel energy conversion efficiency are built in this study. The linear form yields the best R-squared (R2) values for all formulas and models: the best R2 values are between 0.998 and 0.999 for the formulas, obtained with the elastic net and the ARD regression, and between 0.998 and 0.999 for the models, obtained with the RF and the RANSAC regressor. The R2 values of the formulas for H2 production rate, CO production rate, carbon consumption rate, solar energy input, solar power input, thermochemical reactor efficiency, and solar-to-fuel energy conversion efficiency are, respectively, 0.998, 0.998, 0.999, 0.999, 0.999, 0.996, and 0.998 with the elastic net. For temperature of the carbon feeding rate, this value is 0.999 with the ARD regressor. • Synergy of solar energy and biomass in gasification for high-efficiency clean fuel production. • Data-driven machine learning models developed to predict processes in solar-driven biomass gasifiers. • Several ML models are used to predict CO and H2 production rate, carbon feeding rate, and more. • Linear forms achieved the best R-squared values for all formulas and models. • Elastic net and ARD regression had the best R-squared values for formulas. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier
- Author
-
Xin Liu, Bao Zhu, Xia-Wei Dai, Zhi-Ao Xu, Rui Li, Yuting Qian, Ya-Ping Lu, Wenqing Zhang, Yong Liu, and Junnian Zheng
- Subjects
Lysine glutarylation ,Post-translational modification ,GBDT ,Elastic Net ,NearMiss-3 ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs), which plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of the Kglu site is important for elucidating protein molecular function. Due to the time-consuming and expensive limitations of traditional biological experiments, computational-based Kglu site prediction research is gaining more and more attention. Results: In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation as well as 90.11% and 96.75% on the independent test dataset, respectively. Conclusion: GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite .
- Published
- 2023
44. Cytomegalovirus infection disrupts the influence of short-chain fatty acid producers on Treg/Th17 balance
- Author
-
Chin, Ning, Narayan, Nicole R, Méndez-Lagares, Gema, Ardeshir, Amir, Chang, WL William, Deere, Jesse D, Fontaine, Justin H, Chen, Connie, Kieu, Hung T, Lu, Wenze, Barry, Peter A, Sparger, Ellen E, and Hartigan-O’Connor, Dennis J
- Subjects
Microbiology ,Biological Sciences ,Infectious Diseases ,Microbiome ,Nutrition ,1.1 Normal biological development and functioning ,2.1 Biological and endogenous factors ,Inflammatory and immune system ,Infection ,Animals ,CD8-Positive T-Lymphocytes ,Cytokines ,Cytomegalovirus ,Cytomegalovirus Infections ,Fatty Acids ,Volatile ,Macaca mulatta ,Mice ,T-Lymphocytes ,Regulatory ,Host-microbe interactions ,Cytomegalovirus infection ,Immunophenotype ,Elastic net ,Rhesus macaque ,16S analysis ,Ecology ,Medical Microbiology ,Evolutionary biology - Abstract
Background: Both the gut microbiota and chronic viral infections have profound effects on host immunity, but interactions between these influences have been only superficially explored. Cytomegalovirus (CMV), for example, infects approximately 80% of people globally and drives significant changes in immune cells. Similarly, certain gut-resident bacteria affect T-cell development in mice and nonhuman primates. It is unknown if changes imposed by CMV on the intestinal microbiome contribute to immunologic effects of the infection. Results: We show that rhesus cytomegalovirus (RhCMV) infection is associated with specific differences in gut microbiota composition, including decreased abundance of Firmicutes, and that the extent of microbial change was associated with immunologic changes including the proliferation, differentiation, and cytokine production of CD8+ T cells. Furthermore, RhCMV infection disrupted the relationship between short-chain fatty acid producers and Treg/Th17 balance observed in seronegative animals, showing that some immunologic effects of CMV are due to disruption of previously existing host-microbe relationships. Conclusions: Gut microbes have an important influence on health and disease. Diet is known to shape the microbiota, but the influence of concomitant chronic viral infections is unclear. We found that CMV influences gut microbiota composition to an extent that is correlated with immunologic changes in the host. Additionally, pre-existing correlations between immunophenotypes and gut microbes can be subverted by CMV infection. Immunologic effects of CMV infection on the host may therefore be mediated by two different mechanisms involving gut microbiota.
- Published
- 2022
45. Variable Selection Methods-Based Analysis of Macroeconomic Factors for an Enhanced GDP Forecasting: A Case Study of Thailand
- Author
-
Tansuchat, Roengchai, Rakpho, Pichayakone, Klinlampu, Chaiwat, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huynh, Van-Nam, editor, Le, Bac, editor, Honda, Katsuhiro, editor, Inuiguchi, Masahiro, editor, and Kohda, Youji, editor
- Published
- 2023
- Full Text
- View/download PDF
46. Regularization
- Author
-
Emmert-Streib, Frank, Moutari, Salissou, and Dehmer, Matthias
- Published
- 2023
- Full Text
- View/download PDF
47. Predicting Colour Reflectance with Gradient Boosting and Deep Learning
- Author
-
Akanuma, Asei, Stamate, Daniel, Bishop, J. Mark, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Maglogiannis, Ilias, editor, Iliadis, Lazaros, editor, MacIntyre, John, editor, and Dominguez, Manuel, editor
- Published
- 2023
- Full Text
- View/download PDF
48. Using Bayesian networks with tabu algorithm to explore factors related to chronic kidney disease with mental illness: A cross-sectional study
- Author
-
Xiaoli Yuan, Wenzhu Song, Yaheng Li, Qili Wang, Jianbo Qing, Wenqiang Zhi, Huimin Han, Zhiqi Qin, Hao Gong, Guohua Hou, and Yafeng Li
- Subjects
bayesian networks ,chronic kidney disease with mental illness (kdmi) ,tabu algorithm ,related factors ,model construction ,elastic net ,propensity score matching ,Biotechnology ,TP248.13-248.65 ,Mathematics ,QA1-939 - Abstract
While Bayesian networks (BNs) offer a promising approach to examining factors related to many diseases, little attention has been paid to chronic kidney disease with mental illness (KDMI) using BNs. This study aimed to explore the complex network relationships between KDMI and its related factors and to apply Bayesian reasoning to KDMI, providing a scientific reference for its prevention and treatment. Data were downloaded from the online open database of CHARLS 2018, a population-based longitudinal survey. Missing values were first imputed using Random Forest, followed by propensity score matching (PSM) for class balancing with respect to KDMI. Elastic Net was then employed for variable selection from 18 variables. Afterwards, the remaining variables were included in BN model construction. Structural learning of the BN was achieved using the tabu algorithm, and parameter learning was conducted using maximum likelihood estimation. After PSM, 427 non-KDMI cases and 427 KDMI cases were included in this study. Elastic Net identified 11 variables significantly associated with KDMI. The BN model comprised 12 nodes and 24 directed edges. The results suggested that diabetes, physical activity, education level, sleep duration, social activity, self-reported health, and assets were directly related factors for KDMI, whereas sex, age, residence, and Internet access represented indirect factors for KDMI. The BN model not only allows for the exploration of complex network relationships between related factors and KDMI, but could also enable KDMI risk prediction through Bayesian reasoning. This study suggests that the BN model holds great promise for risk factor detection in KDMI.
- Published
- 2023
- Full Text
- View/download PDF
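Note on result 48: the abstract uses Elastic Net to select variables before building the Bayesian network but does not say how it was implemented. The sketch below is only an illustration of the general idea, not the authors' code. It fits an elastic net by coordinate descent on the usual penalized least-squares objective and keeps the variables whose coefficients are nonzero. The data, the penalty settings (`lam`, `alpha`), and all function names are assumptions made up for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    No intercept is fitted, so X and y are assumed to be centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X b, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual (b_j removed).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical toy data: three centred predictors, only the first matters much.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -2.1, -1.9]

beta = elastic_net(X, y, lam=0.5, alpha=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy setting only the first predictor survives the penalty; weakly related predictors are shrunk exactly to zero, which is what makes the method usable as a variable filter.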
49. Predicting depression risk in early adolescence via multimodal brain imaging
- Author
-
Zeus Gracia-Tabuenca, Elise B. Barbeau, Yu Xia, and Xiaoqian Chai
- Subjects
Depression risk ,Parental depression ,Adolescence ,Multi-modal MRI ,Multi-site ,Elastic net ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Neurology. Diseases of the nervous system ,RC346-429 - Abstract
Depression is an incapacitating psychiatric disorder with increased risk through adolescence. Among other factors, children with a family history of depression have a significantly higher risk of developing depression. Early identification of pre-adolescent children who are at risk of depression is crucial for early intervention and prevention. In this study, we used a large longitudinal sample from the Adolescent Brain Cognitive Development (ABCD) Study (2658 participants after imaging quality control, aged 9–10 years at baseline) and applied advanced machine learning methods to predict depression risk at the two-year follow-up from the baseline assessment, using a comprehensive set of multimodal neuroimaging features derived from structural MRI, diffusion tensor imaging, and task and rest functional MRI. Prediction performance was assessed with rigorous leave-one-site-out cross-validation. Our results demonstrate that all brain features had prediction scores significantly better than expected by chance, with brain features from rest-fMRI showing the best classification performance in the high-risk group of participants with parental history of depression (N = 625). Specifically, rest-fMRI features, which came from functional connectomes, showed significantly better classification performance than other brain features. This finding highlights the key role of the interacting elements of the connectome in capturing more individual variability in psychopathology compared to measures of single brain regions. Our study contributes to the effort of identifying biological risks of depression in early adolescence in population-based samples.
- Published
- 2024
- Full Text
- View/download PDF
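Note on result 49: leave-one-site-out cross-validation, named in the abstract, holds out every participant from one acquisition site per fold and trains on all remaining sites. The sketch below shows only how such folds can be formed; it is an illustration under assumptions, not the study's pipeline. The site labels and the function name `leave_one_site_out` are hypothetical.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), one fold per distinct site."""
    by_site = defaultdict(list)
    for i, site in enumerate(sites):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = sorted(
            i for site, idx in by_site.items() if site != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Hypothetical site labels for six participants (not taken from the study).
sites = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no site contributes to both training and testing within a fold, scanner- or site-specific effects cannot leak into the performance estimate.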
50. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier.
- Author
-
Liu, Xin, Zhu, Bao, Dai, Xia-Wei, Xu, Zhi-Ao, Li, Rui, Qian, Yuting, Lu, Ya-Ping, Zhang, Wenqing, Liu, Yong, and Zheng, Junnian
- Subjects
- *
PREDICTION models , *POST-translational modification , *CELL physiology , *AMINO acid sequence , *LYSINE - Abstract
Background: Lysine glutarylation (Kglu) is one of the most important Post-translational modifications (PTMs), which plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of the Kglu site is important for elucidating protein molecular function. Due to the time-consuming and expensive limitations of traditional biological experiments, computational-based Kglu site prediction research is gaining more and more attention. Results: In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73%, and 98.14% on five-fold cross-validation as well as 90.11%, and 96.75% on the independent test dataset, respectively. Conclusion: GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF