6,895 results for "LASSO"
Search Results
2. Stock Open Price Prediction of Software Companies in the BSE SENSEX 50 Index
- Author
Sonar, Chhaya, Al Hammadi, Ahmed M., Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Weber, Gerhard-Wilhelm, editor, Martinez Trinidad, Jose Francisco, editor, Sheng, Michael, editor, Ramachand, Raghavendra, editor, Kharb, Latika, editor, and Chahal, Deepak, editor
- Published
- 2025
3. NCAPH drives breast cancer progression and identifies a gene signature that predicts luminal A tumour recurrence.
- Author
Mendiburu-Eliçabe, Marina, García-Sancha, Natalia, Corchado-Cobos, Roberto, Martínez-López, Angélica, Chang, Hang, Hua Mao, Jian, Blanco-Gómez, Adrián, García-Casas, Ana, Castellanos-Martín, Andrés, Salvador, Nélida, Jiménez-Navas, Alejandro, Pérez-Baena, Manuel, Sánchez-Martín, Manuel, Abad-Hernández, María, Carmen, Sofía, Claros-Ampuero, Juncal, Cruz-Hernández, Juan, Rodríguez-Sánchez, César, García-Cenador, María, García-Criado, Francisco, Vicente, Rodrigo, Castillo-Lluva, Sonia, and Pérez-Losada, Jesús
- Subjects
LASSO; NCAPH; breast cancer; genetic signature; luminal A subtype; prognosis; relapse-free survival; Humans; Mice; Animals; Female; Breast Neoplasms; Neoplasm Recurrence, Local; Gene Expression Profiling; Prognosis; Mice, Transgenic; Nuclear Proteins; Cell Cycle Proteins
- Abstract
BACKGROUND: Luminal A tumours generally have a favourable prognosis but possess the highest 10-year recurrence risk among breast cancers. Additionally, a quarter of recurrences occur within 5 years of diagnosis. Identifying such patients is crucial, as long-term relapsers could benefit from extended hormone therapy, while early relapsers might require more aggressive treatment. METHODS: We explored the role of non-SMC condensin I complex subunit H (NCAPH) in luminal A breast cancer pathogenesis, both in vitro and in vivo, aiming to identify an intratumoural gene expression signature, with a focus on elevated NCAPH levels, as a potential marker for unfavourable progression. Our analysis included transgenic mouse models overexpressing NCAPH and a genetically diverse mouse cohort generated by backcrossing. A least absolute shrinkage and selection operator (LASSO) multivariate regression analysis was performed on transcripts associated with elevated intratumoural NCAPH levels. RESULTS: We found that NCAPH contributes to adverse luminal A breast cancer progression. The intratumoural gene expression signature associated with elevated NCAPH levels emerged as a potential risk identifier. Transgenic mice overexpressing NCAPH developed breast tumours with extended latency, and in Mouse Mammary Tumor Virus (MMTV)-NCAPHErbB2 double-transgenic mice, luminal tumours showed increased aggressiveness. High intratumoural Ncaph levels correlated with worse breast cancer outcome and subpar chemotherapy response. A 10-gene risk score, termed Gene Signature for Luminal A 10 (GSLA10), was derived from the LASSO analysis and correlated with adverse luminal A breast cancer progression. CONCLUSIONS: The GSLA10 signature outperformed the Oncotype DX signature in discerning tumours with unfavourable outcomes, previously categorised as luminal A by Prediction Analysis of Microarray 50 (PAM50) across three independent human cohorts. 
This new signature holds promise for identifying luminal A tumour patients with adverse prognosis, aiding in the development of personalised treatment strategies to significantly improve patient outcomes.
- Published
- 2024
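The abstract does not say which solver, penalty value, or tuning scheme produced GSLA10. As a generic illustration only, the sketch below fits a LASSO by cyclic coordinate descent in plain Python and then turns the coefficients into a linear risk score. The toy data, the `lam` value, and all function names are assumptions for illustration, not the authors' pipeline.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: list[list[float]], y: list[float],
                             lam: float, n_iter: int = 100) -> list[float]:
    """Minimise (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes y and every column of X are already centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            col_sq = sum(c * c for c in col) / n
            if col_sq == 0.0:
                continue  # constant column: leave its coefficient at zero
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta  # keep the residual in sync with b
                b[j] = new_bj
    return b


def risk_score(expression: list[float], coefs: list[float]) -> float:
    """Linear risk score: expression values weighted by the LASSO coefficients."""
    return sum(c * x for c, x in zip(coefs, expression))


# Toy data: feature 0 carries the signal; feature 1 is weak and is shrunk to exactly zero.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
coefs = lasso_coordinate_descent(X, y, lam=0.1)
print(coefs)  # approximately [1.8, 0.0]
print(risk_score([1.0, 2.0], coefs))
```

The exact zero for the weak feature is what makes LASSO usable for picking a short gene list; in practice `lam` would be chosen by cross-validation rather than fixed by hand.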
4. A novel IoT-integrated ensemble learning approach for indoor air quality enhancement.
- Author
Kareem Abed Alzabali, Saja, Bastam, Mostafa, and Ataie, Ehsan
- Subjects
- *
MACHINE learning , *INDOOR air quality , *AIR quality monitoring , *STANDARD deviations , *PARTICULATE matter , *ATMOSPHERIC carbon dioxide , *LIQUEFIED petroleum gas - Abstract
In indoor environments, air quality significantly impacts human health and well-being, with carbon monoxide (CO) posing a particular hazard due to its colorless and odorless nature and potential to cause severe health issues. Integrating the Internet of Things and remote sensing technologies has revolutionized data monitoring, collection, and evaluation, especially within the context of 'smart' homes. This study leverages these technologies to enhance indoor air quality monitoring. By collecting data on key indoor atmospheric quality indicators—carbon dioxide (CO2), methane (CH4), alcohol, liquefied petroleum gas (LPG), particulate matter (PM1 and PM2.5), humidity, and temperature—the study aims to predict indoor carbon monoxide levels. A custom dataset was compiled from August to October, consisting of 61,710 observations recorded at one-minute intervals. The methodology employs a stacking ensemble approach, integrating multiple machine learning models to boost prediction accuracy and reliability. In the stacking ensemble, six distinct models are employed: Random Forest, Multi-Layer Perceptron, Lasso, Elastic Net, XGBoost, and Support Vector Regression. Each model is individually trained and fine-tuned using the Grid Search method to optimize parameter combinations. These optimized models are then combined in the stacking ensemble, which achieves a Mean Squared Error (MSE) of 0.0140, a Root Mean Squared Error (RMSE) of 0.1185, and a Mean Absolute Error (MAE) of 0.0291. The results demonstrate that the proposed system significantly enhances the precision of CO prediction, underscoring its critical role in air quality surveillance within smart environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
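The stacking result above is reported as MSE, RMSE, and MAE. A minimal sketch of the three metrics in plain Python follows; it does not reproduce the stacking ensemble itself, and the example values are toy numbers, not the study's CO readings.

```python
import math


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error: square root of the MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: less sensitive to a few large errors than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because RMSE is the square root of MSE, the two reported figures can be cross-checked: 0.1185² ≈ 0.0140, so they agree up to rounding.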
5. Targeted learning with an undersmoothed LASSO propensity score model for large-scale covariate adjustment in health-care database studies.
- Author
Wyss, Richard, van der Laan, Mark, Gruber, Susan, Shi, Xu, Lee, Hana, Dutcher, Sarah K, Nelson, Jennifer C, Toh, Sengwee, Russo, Massimiliano, Wang, Shirley V, Desai, Rishi J, and Lin, Kueiyu Joshua
- Abstract
Least absolute shrinkage and selection operator (LASSO) regression is widely used for large-scale propensity score (PS) estimation in health-care database studies. In these settings, previous work has shown that undersmoothing (overfitting) LASSO PS models can improve confounding control, but it can also cause problems of nonoverlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale LASSO PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed LASSO PS models, the use of cross-fitting was important for avoiding nonoverlap in covariate distributions and reducing bias in causal estimates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
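The abstract credits cross-fitting with avoiding nonoverlap when the LASSO PS model is undersmoothed. The sketch below shows only the cross-fitting mechanics: each unit's propensity score comes from a model fitted on the other folds. The collaborative targeted learning step is not shown, and `fit_treatment_rate` is a hypothetical placeholder standing in for a LASSO PS model.

```python
import random
from typing import Callable, Sequence

# A fit function takes training covariates and treatments and returns a scoring function.
FitFn = Callable[[list, list], Callable[[Sequence[float]], float]]


def cross_fit_scores(X: list, t: list, fit: FitFn, k: int = 5, seed: int = 0) -> list[float]:
    """Out-of-fold scores: each unit is scored by a model fitted without its own fold (k >= 2)."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    scores = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train], [t[i] for i in train])
        for i in fold:
            scores[i] = model(X[i])  # never scored by a model that saw unit i
    return scores


def fit_treatment_rate(X_train: list, t_train: list) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for a LASSO PS model: ignores covariates, returns the training treatment rate."""
    rate = sum(t_train) / len(t_train)
    return lambda x: rate


X = [[0.0], [0.0], [0.0], [0.0]]
t = [1, 1, 0, 0]
ps = cross_fit_scores(X, t, fit_treatment_rate, k=4)
print(ps)
```

Even with this trivial model, treated units get 1/3 and untreated units 2/3, which shows that a unit's own treatment never enters its score; with a flexible, undersmoothed model, that separation is what keeps in-sample overfitting out of the estimated scores.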
6. Might expert knowledge improve econometric real estate mass appraisal?
- Author
Doszyń, Mariusz
- Subjects
VALUATION of real property; ECONOMETRIC models; LEAST squares; REAL property; PRIOR learning
- Abstract
The article examines whether expert knowledge improves the estimation results of real estate mass appraisal models. Six econometric models were compared: OLS, mixed, the Bayesian model, the Inequality Restricted Least Squares (IRLS) model, ridge and LASSO regression (with regularization). In three of the models (mixed, Bayesian, and IRLS) prior knowledge was applied. In mixed and Bayesian models priors took the form of intervals for model parameters. In IRLS, restrictions in the form of inequalities were applied. In the empirical example mass appraisal models were applied in the valuation of undeveloped land for residential purposes. Models with prior knowledge turned out to be the best with regard to the consistency of estimates with theory. Also, prediction accuracy was better for models with prior knowledge. In the case of low-quality data, expert knowledge might significantly improve estimation results of real estate mass appraisal econometric models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Development of a multivariate model predictive of post-adrenalectomy renal function decline in patients with primary aldosteronism: a large-cohort single-center study.
- Author
Lin, Wenhao, Zhao, Juping, Fang, Chen, He, Wei, Huang, Xin, Sun, Fukang, and Dai, Jun
- Subjects
- *
GLOMERULAR filtration rate , *KIDNEY physiology , *REGRESSION analysis , *LINEAR statistical models , *EPIDERMAL growth factor receptors - Abstract
Purpose: To develop a multivariate linear model for predicting long-term (> 3 months) post-adrenalectomy renal function decline in patients with primary aldosteronism (PA). The model aims to help identify patients who may experience a significant decline in renal function after surgery. Methods: We retrospectively analyzed the clinical data of 357 patients who were diagnosed with PA and underwent adrenalectomy between September 2012 and February 2023. LASSO and multivariate linear regression analyses were used to identify significant risk factors for model construction. The models were further internally validated using the bootstrap method. Results: Age (P < 0.001), plasma aldosterone concentration (PAC) measured in the upright position (PACU, P = 0.066), PAC measured after saline infusion (PACafterNS, P = 0.010), preoperative blood adrenocorticotropic hormone level (ACTH, P = 0.048), preoperative estimated glomerular filtration rate (eGFR, P < 0.001), and immediate postoperative eGFR (P < 0.001) were included in the final multivariate model predictive of post-adrenalectomy renal function decline, and the coefficients were adjusted by internal validation. The final model is: predicted postoperative long-term (> 3 months) eGFR decline = -70.010 + 0.416 × age + 6.343 × lg PACU + 4.802 × lg ACTH + 7.424 × lg PACafterNS + 0.637 × preoperative eGFR - 0.438 × immediate postoperative eGFR. The predicted values are highly related to the observed values (adjusted R = 0.63). Conclusion: The linear model incorporating perioperative clinical variables can accurately predict long-term (> 3 months) post-adrenalectomy renal function decline. [ABSTRACT FROM AUTHOR]
- Published
- 2024
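The published equation in the Results can be evaluated directly. This sketch assumes that `lg` denotes the base-10 logarithm and that inputs are in the study's units, which the abstract does not state. The function name and the example inputs are illustrative, not patient values.

```python
import math


def predicted_egfr_decline(age: float, pacu: float, acth: float, pac_after_ns: float,
                           preop_egfr: float, immediate_postop_egfr: float) -> float:
    """Predicted long-term (> 3 months) post-adrenalectomy eGFR decline.

    Coefficients are copied from the abstract; 'lg' is assumed to mean log10,
    so PACU, ACTH and PACafterNS must be positive.
    """
    return (-70.010
            + 0.416 * age
            + 6.343 * math.log10(pacu)
            + 4.802 * math.log10(acth)
            + 7.424 * math.log10(pac_after_ns)
            + 0.637 * preop_egfr
            - 0.438 * immediate_postop_egfr)


# Illustrative inputs chosen so the log terms vanish (log10(1) = 0).
print(round(predicted_egfr_decline(50, 1.0, 1.0, 1.0, 100.0, 100.0), 2))  # -29.31
```

Each tenfold increase in PACU, ACTH, or PACafterNS adds its coefficient (6.343, 4.802, or 7.424) to the predicted decline, which is how the log-scale terms are meant to be read.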
8. Association between visceral lipid accumulation indicators and gallstones: a cross-sectional study based on NHANES 2017–2020.
- Author
Wu, Weigen, Pei, Yuchen, Wang, Junlong, Liang, Qizhi, and Chen, Wei
- Subjects
- *
RECEIVER operating characteristic curves , *CURVE fitting , *GALLSTONES , *WAIST circumference , *LOGISTIC regression analysis - Abstract
Background: Obesity is a major contributing factor to the formation of gallstones. As early identification typically results in improved outcomes, we explored the relationship between visceral lipid accumulation indicators and the occurrence of gallstones. Methods: This cross-sectional study involved 3,224 adults. The researchers employed multivariable logistic regression, smoothed curve fitting (SCF), threshold effects analysis, and subgroup analysis to examine the relationship between gallstones and the metabolic score for visceral fat (METS-VF), waist circumference (WC), lipid accumulation product (LAP), and visceral adiposity index (VAI). A Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis was used to identify key factors, which were then used to construct a nomogram model. The diagnostic efficacy of this model in detecting gallstones was then determined using receiver operating characteristic curves. Results: Visceral lipid accumulation indicators were strongly linked to the likelihood of having gallstones. Specific saturation effects for METS-VF, WC, LAP, and VAI and gallstones were determined using SCF. The inflection points for these effects were found to be 8.565, 108.400, 18.056, and 1.071, respectively. Subgroup analyses showed that associations remained consistent in most subgroups. The nomogram model, which was developed using critical features identified by LASSO regression, demonstrated excellent discriminatory ability, as indicated by an area under the curve value of 0.725. Conclusions: This study showed that increases in METS-VF, WC, LAP, and VAI are linked to an increased prevalence of gallstones. The nomogram model, designed with critical parameters identified using LASSO regression, exhibits a strong association with the presence of gallstones. [ABSTRACT FROM AUTHOR]
- Published
- 2024
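The nomogram's discrimination is summarised above as an area under the ROC curve of 0.725. A minimal sketch of how such an AUC can be computed, assuming binary outcome labels and a continuous risk score: the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney form). The scores below are toy values, not NHANES data.

```python
def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy risk scores for four participants (1 = gallstones present, 0 = absent).
print(auc_mann_whitney([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; a rank-based formula gives the same value faster on a cohort of several thousand.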
9. Development and validation of a dynamic nomogram for high care dependency during the hospital-family transition periods in older stroke patients.
- Author
Li, Fangyan, Zhang, Lei, Zhang, Ruilei, Liu, Yaoyao, Zhang, Tinglin, Su, Lin, and Geng, Huanhuan
- Subjects
OLDER patients; STROKE patients; WEBSITES; RECEIVER operating characteristic curves; BURDEN of care; DEPENDENCY (Psychology)
- Abstract
Background: This research aimed to develop and validate a dynamic nomogram for predicting the risk of high care dependency during the hospital-family transition period in older stroke patients. Methods: A total of 309 older stroke patients in the hospital-family transition period who were treated in the Department of Neurology outpatient clinics of three general hospitals in Jinzhou, Liaoning Province, from June to December 2023 were selected as the training set. The patients were investigated with the General Patient Information Questionnaire, the Care Dependency Scale (CDS), the Tilburg Frailty Inventory (TFI), the Hamilton Anxiety Rating Scale (HAMA), the Hamilton Depression Rating Scale-17 (HAMD-17), and the Mini Nutrition Assessment Short Form (MNA-SF). Lasso-logistic regression analysis was used to screen the risk factors for high care dependency in older stroke patients during the hospital-family transition period, and a dynamic nomogram model was constructed. The model was published as a web page built with Shiny apps. The bootstrap method was used for internal validation, with 1,000 repetitions. The model's predictive efficacy was assessed using the calibration plot, decision curve analysis (DCA), and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. A total of 133 older stroke patients in the hospital-family transition period who visited the Neurology outpatient departments of three general hospitals in Jinzhou from January to March 2024 were selected as the validation set for external validation of the model. Results: A dynamic nomogram was built from history of stroke, chronic disease, falls in the past 6 months, depression, malnutrition, and frailty. The AUC was 0.830 (95% CI: 0.784–0.875) in the training set and 0.833 (95% CI: 0.766–0.900) in the validation set. 
The calibration curve was close to the ideal curve, and DCA results confirmed that the nomogram performed well in terms of clinical applicability. Conclusion: The online dynamic nomogram constructed in this study has good specificity, sensitivity, and clinical practicability, which can be applied to senior stroke patients as a prediction and assessment tool for high care dependency. It is of great significance to guide the development of early intervention strategies, optimize resource allocation, and reduce the care burden on families and society. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. From Diabetes to Dementia: Identifying Key Genes in the Progression of Cognitive Impairment.
- Author
Cao, Zhaoming, Du, Yage, Xu, Guangyi, Zhu, He, Ma, Yinchao, Wang, Ziyuan, Wang, Shaoying, and Lu, Yanhui
- Subjects
- *
TYPE 2 diabetes , *GENE expression , *MILD cognitive impairment , *RECEIVER operating characteristic curves , *GENE regulatory networks - Abstract
Objectives: To provide a basis for further research on the molecular mechanisms underlying type 2 diabetes-associated mild cognitive impairment (DCI) by using two bioinformatics methods to screen key genes involved in the progression of mild cognitive impairment (MCI) and type 2 diabetes. Methods: RNA sequencing data, expression profiles, and clinical sample information for the MCI and normal-cognition groups in GSE63060, which contains 160 MCI samples and 104 normal samples, were downloaded from the GEO database. Hub genes were identified using weighted gene co-expression network analysis (WGCNA). Protein–protein interaction (PPI) analysis, combined with least absolute shrinkage and selection operator (LASSO) and receiver operating characteristic (ROC) curve analyses, was used to verify the genes. Moreover, RNA sequencing and clinical characteristic data for GSE166502, comprising 13 type 2 diabetes samples and 13 normal controls, were downloaded from the GEO database, and the association between the screened genes and type 2 diabetes was verified by differential expression and ROC curve analyses. In addition, we collected clinical biopsies to validate the results. Results: WGCNA identified 10 modules, six of which were correlated with MCI. Six hub genes associated with MCI (TOMM7, SNRPG, COX7C, UQCRQ, RPL31, and RPS24) were identified using the LASSO algorithm. ROC curve analysis of the integrated GEO data revealed COX7C, SNRPG, TOMM7, and RPS24 as key genes in the progression of type 2 diabetes. Conclusions: COX7C, SNRPG, TOMM7, and RPS24 are involved in MCI and type 2 diabetes progression. Therefore, the molecular mechanisms of these four genes in the development of type 2 diabetes-associated MCI should be studied. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Identification of Key Immune and Cell Cycle Modules and Prognostic Genes for Glioma Patients through Transcriptome Analysis.
- Author
Guo, Kaimin, Yang, Jinna, Jiang, Ruonan, Ren, Xiaxia, Liu, Peng, Wang, Wenjia, Zhou, Shuiping, Wang, Xiaoguang, Ma, Li, and Hu, Yunhui
- Subjects
- *
CELL cycle regulation , *DISEASE risk factors , *BRAIN tumors , *ORGANELLE formation , *GENE regulatory networks - Abstract
Background: Gliomas, the most prevalent type of primary brain tumor, stand out as one of the most aggressive and lethal types of human cancer. Methods & Results: To uncover potential prognostic markers, we employed weighted correlation network analysis (WGCNA) on the Chinese Glioma Genome Atlas (CGGA) 693 dataset and revealed four modules significantly associated with glioma clinical traits, primarily involved in immune function, cell cycle regulation, and ribosome biogenesis. Using the least absolute shrinkage and selection operator (LASSO) regression algorithm, we identified 11 key genes and developed a prognostic risk score model, which showed accurate prognostic prediction in the CGGA 325 dataset. More importantly, we also validated the model in 12 glioma patients with overall survival (OS) ranging from 4 to 132 months using mRNA sequencing and immunohistochemical analysis. The analysis of immune infiltration revealed that patients with high risk scores exhibit heightened immune infiltration, particularly by immunosuppressive cells, along with increased expression of immune checkpoints. Furthermore, we explored potentially effective drugs targeting the 11 key genes in gliomas using the library of integrated network-based cellular signatures (LINCS) L1000 database and found that, in vitro, both torin-1 and clofarabine exhibit promising anti-glioma activity and an inhibitory effect on the cell cycle, a significant pathway enriched in the identified glioma modules. Conclusions: Our study provides valuable insights into the molecular mechanisms of gliomas and identifies potential therapeutic targets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Regularization in machine learning models for MVT Pb-Zn prospectivity mapping: applying lasso and elastic-net algorithms.
- Author
Hajihosseinlou, Mahsa, Maghsoudi, Abbas, and Ghezelbash, Reza
- Subjects
- *
MACHINE learning , *SEARCH algorithms , *TERM limits (Public office) , *PREDICTION models , *ALGORITHMS - Abstract
The current research employed the least absolute shrinkage and selection operator (Lasso) and Elastic-net algorithms to examine their potential use in MVT Pb-Zn prospectivity modeling. During training, both Elastic-net and Lasso add a penalty term to the loss function. Because this penalty term constrains the feature coefficients, the model is encouraged to prioritize the most informative features and penalize the less relevant ones. The geological, geochemical, tectonic, and alteration dataset came from the Varcheh district in western Iran. We applied stratified 5-fold cross-validation to train and evaluate the models, ensuring consistent and comprehensive performance evaluation across different data subsets. This method improved data utilization and provided more reliable performance estimates by averaging metrics over multiple folds, thereby enhancing the model's generalization assessment. The hyperparameters were tuned using random search, which quickly finds near-optimal solutions. Our investigation revealed that Elastic-net exhibited superior prediction accuracy and model robustness compared to Lasso. The combination of L1 and L2 regularization in Elastic-net offers a more adaptable technique than Lasso, which uses only L1 regularization. This enables Elastic-net to handle correlated predictors successfully. [ABSTRACT FROM AUTHOR]
- Published
- 2024
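The study above relies on stratified 5-fold cross-validation. A minimal sketch of how stratified folds can be built, assuming class labels such as deposit / non-deposit cells: each class is shuffled and dealt round-robin across folds so every fold keeps roughly the overall class ratio. The function name and the 10/40 toy labels are assumptions, and the Lasso / Elastic-net fitting and random search that would run on each fold are not shown.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split indices into k folds whose class proportions match the full dataset as closely as possible."""
    by_class: dict = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal this class round-robin, continuing where the previous class stopped,
        # so that fold sizes also stay balanced.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


labels = [1] * 10 + [0] * 40  # toy labels: 10 positive vs 40 negative cells
folds = stratified_kfold(labels, k=5)
for f, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    n_pos = sum(labels[i] for i in test_idx)
    print(f"fold {f}: train={len(train_idx)} test={len(test_idx)} positives in test={n_pos}")
```

Each fold here holds 2 of the 10 positives, so every evaluation sees the same 20% positive rate as the full data; the per-fold metrics would then be averaged, as the abstract describes.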
13. Development of a predictive model for patients with bone metastases referred to palliative radiotherapy: Secondary analysis of a multicenter study (the PRAIS trial).
- Author
Rossi, Romina, Medici, Federica, Habberstad, Ragnhild, Klepstad, Pal, Cilla, Savino, Dall'Agata, Monia, Kaasa, Stein, Caraceni, Augusto Tommaso, Morganti, Alessio Giuseppe, and Maltoni, Marco
- Subjects
- *
LEUKOCYTE count , *LYMPHOCYTE count , *BODY mass index , *BONE metastasis , *RECEIVER operating characteristic curves , *CANCER pain - Abstract
Background: The decision to administer palliative radiotherapy (RT) to patients with bone metastases (BMs), as well as the selection of treatment protocols (dose, fractionation), requires an accurate assessment of survival expectancy. In this study, we aimed to develop three predictive models (PMs) to estimate short‐, intermediate‐, and long‐term overall survival (OS) for patients in this clinical setting. Materials and Methods: This study constitutes a sub‐analysis of the PRAIS trial, a longitudinal observational study collecting data from patients referred to participating centers to receive palliative RT for cancer‐induced bone pain. Our analysis encompassed 567 patients from the PRAIS trial database. The primary objectives were to ascertain the correlation of clinical and laboratory parameters with OS rates at three distinct time points (short: 3 weeks; intermediate: 24 weeks; prolonged: 52 weeks) and to construct PMs for prognosis. We employed machine learning techniques, comprising the following steps: (i) identification of reliable prognostic variables and training; (ii) validation and testing of the model using the selected variables. The selection of variables was accomplished using the LASSO method (Least Absolute Shrinkage and Selection Operator). The model performance was assessed using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Results: Our analysis demonstrated a significant impact of clinical parameters (primary tumor site, presence of non‐bone metastases, steroids and opioid intake, food intake, and body mass index) and laboratory parameters (interleukin 8 [IL‐8], chloride levels, C‐reactive protein, white blood cell count, and lymphocyte count) on OS. Notably, different factors were associated with the different times for OS, with only IL‐8 included in both the short‐ and long‐term OS models. The AUC values for ROC curves for 3‐week, 24‐week, and 52‐week OS were 0.901, 0.767, and 0.806, respectively. 
Conclusions: We successfully developed three PMs for OS based on easily accessible clinical and laboratory parameters for patients referred to palliative RT for painful BMs. While our findings are promising, it is important to recognize that this was an exploratory trial. The implementation of these tools into clinical practice warrants further investigation and confirmation through subsequent studies with separate databases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Optimization of drug solubility inside the supercritical CO2 system via numerical simulation based on artificial intelligence approach.
- Author
Li, Meixiuli, Jiang, Wenyan, Zhao, Shuang, Huang, Kai, and Liu, Dongxiu
- Subjects
- *
MACHINE learning , *REGRESSION analysis , *ARTIFICIAL intelligence , *MACHINE performance , *BARNACLES , *SUPERCRITICAL carbon dioxide - Abstract
In this research paper, we explored the predictive capabilities of three models, Polynomial Regression (PR), Extreme Gradient Boosting (XGB), and LASSO, to estimate the density of supercritical carbon dioxide (SC-CO2) and the solubility of niflumic acid as functions of the input variables temperature and pressure. The hyperparameters of these models were optimized using the innovative Barnacles Mating Optimizer (BMO) algorithm. For SC-CO2 density estimation, PR exhibits remarkable accuracy, with an R² of 0.99207 for data fitting. XGB performs well with an R² of 0.92673, while the LASSO model shows good predictive ability with an R² of 0.81917. Furthermore, we assess the models' performance in predicting the solubility of niflumic acid. PR exhibits excellent predictive capability with an R² of 0.96949. XGB also delivers strong performance, with an R² of 0.92961. LASSO performs well, achieving an R² of 0.82094. The results indicate promising performance of the machine learning models and optimizer in estimating drug solubility in supercritical CO2 as a solvent for the pharmaceutical industry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Least angle regression, relaxed lasso, and elastic net for algebraic multigrid of systems of elliptic partial differential equations.
- Author
Lee, Barry
- Subjects
- *
ELLIPTIC differential equations , *PARSIMONIOUS models , *DEGREES of freedom , *INTERPOLATION , *ABILITY grouping (Education) , *MULTIGRID methods (Numerical analysis) , *PETRI nets - Abstract
In a sequence of papers, the author examined several statistical affinity measures for selecting the coarse degrees of freedom (CDOFs) or coarse nodes (Cnodes) in algebraic multigrid (AMG) for systems of elliptic partial differential equations (PDEs). These measures were applied to a set of relaxed vectors that exposes the problematic error components. Once the CDOFs are determined using any one of these measures, the interpolation operator is constructed in a bootstrap AMG (BAMG) procedure. However, in a recent paper by Kahl and Rottmann, the statistical least angle regression (LARS) method was utilized in the coarsening procedure and shown to be promising for CDOF selection. This method is generally used in the statistics community to select the most relevant variables in constructing a parsimonious model for a very complicated and high‐dimensional model or data set (i.e., variable selection for a "reduced" model). As pointed out by Kahl and Rottmann, the LARS procedure has the ability to detect group relations between variables, which can be more useful than binary relations that are derived from strength‐of‐connection, or affinity measures, between pairs of variables. Moreover, by using an updated Cholesky factorization approach in the regression computation, the LARS procedure can be performed efficiently even when the original set of variables is large; and due to the LARS formulation itself (i.e., its ℓ1‐norm constraint), sparse interpolation operators can be generated. In this article, we extend the LARS coarsening approach to systems of PDEs. Furthermore, we incorporate some modifications to the LARS approach based on the so‐called elastic net and relaxed lasso methods, which are well known and thoroughly analyzed in the statistics community for ameliorating several major issues with LARS as a variable selection procedure. 
We note that the original LARS coarsening approach may have addressed some of these issues in similar or other ways but due to the limited details provided there, it is difficult to determine the extent of their similarities. Incorporating these modifications (or effecting them in similar ways) leads to improved robustness in the LARS coarsening procedure, and numerical experiments indicate that the changes lead to faster convergence in the multigrid method. Moreover, the relaxed lasso modification permits an indirect BAMG (iBAMG) extension to the interpolation operator. This iBAMG extension applied in an intra‐ or inter‐variable interpolation setting (i.e., nodal‐based coarsening), as well as in variable‐based coarsening, which will not preserve the nodal structure of a finest‐level discretization on the lower levels of the multilevel hierarchy, will be examined. For the variable‐based coarsening, because of the parsimonious feature of LARS, the performance is reasonably good when applied to systems of PDEs albeit at a substantial additional cost over a nodal‐based procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. A Flexible Adaptive Lasso Cox Frailty Model Based on the Full Likelihood.
- Author
Hohberg, Maike and Groll, Andreas
- Abstract
In this work, a method to regularize Cox frailty models is proposed that accommodates time‐varying covariates and time‐varying coefficients and is based on the full likelihood instead of the partial likelihood. A particular advantage of this framework is that the baseline hazard can be explicitly modeled in a smooth, semiparametric way, for example, via P‐splines. Regularization for variable selection is performed via a lasso penalty and via group lasso for categorical variables while a second penalty regularizes wiggliness of smooth estimates of time‐varying coefficients and the baseline hazard. Additionally, adaptive weights are included to stabilize the estimation. The method is implemented in the R function coxlasso, which is now integrated into the package PenCoxFrail, and will be compared to other packages for regularized Cox regression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Post‐Estimation Shrinkage in Full and Selected Linear Regression Models in Low‐Dimensional Data Revisited.
- Author
Kipruto, Edwin and Sauerbrei, Willi
- Abstract
The fit of a regression model to new data is often worse than its fit to the development data because of overfitting. Analysts use variable selection techniques to develop parsimonious regression models, which may introduce bias into regression estimates. Shrinkage methods have been proposed to mitigate overfitting and reduce bias in estimates. Post‐estimation shrinkage is an alternative to penalized methods. This study evaluates the effectiveness of post‐estimation shrinkage in improving the prediction performance of full and selected models. Through a simulation study, results were compared with ordinary least squares (OLS) and ridge in full models, and best subset selection (BSS) and lasso in selected models. We focused on prediction errors and the number of selected variables. Additionally, we proposed a modified version of the parameter‐wise shrinkage (PWS) approach named non‐negative PWS (NPWS) to address weaknesses of PWS. Results showed that no method was superior in all scenarios. In full models, NPWS outperformed global shrinkage, whereas PWS was inferior to OLS. In low correlation with moderate‐to‐high signal‐to‐noise ratio (SNR), NPWS outperformed ridge, but ridge performed best in small sample sizes, high correlation, and low SNR. In selected models, all post‐estimation shrinkage methods performed similarly, with global shrinkage slightly inferior. Lasso outperformed BSS and post‐estimation shrinkage in small sample sizes, low SNR, and high correlation but was inferior when the opposite was true. Our study suggests that, with sufficient information, NPWS is more effective than global shrinkage in improving prediction accuracy of models. However, in high correlation, small sample sizes, and low SNR, penalized methods generally outperform post‐estimation shrinkage methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
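Global post-estimation shrinkage multiplies every fitted OLS slope by one factor c. The minimal NumPy sketch below estimates c as the cross-validated calibration slope, obtained by regressing held-out responses on held-out linear predictors. That is one common estimator and an assumption here: the authors' exact estimator and their PWS/NPWS variants are not reproduced, and all function names are hypothetical. Values of c below 1 indicate that the OLS slopes are too large for new data.

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def global_shrinkage_factor(X, y, n_folds=5, seed=0):
    """Cross-validated calibration slope used as one global shrinkage factor c.

    In each fold, OLS is fitted on the remaining folds and the centred linear
    predictor is computed for the held-out rows; c is the no-intercept slope of
    the centred held-out responses on those predictors.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    lp_all, resp_all = [], []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        _, slopes = ols_fit(X[train_idx], y[train_idx])
        x_bar, y_bar = X[train_idx].mean(axis=0), y[train_idx].mean()
        lp_all.append((X[test_idx] - x_bar) @ slopes)
        resp_all.append(y[test_idx] - y_bar)
    lp, resp = np.concatenate(lp_all), np.concatenate(resp_all)
    return float(lp @ resp / (lp @ lp))

def shrunken_ols(X, y, c):
    """Multiply all OLS slopes by c and refit the intercept so the fit passes through the means."""
    _, slopes = ols_fit(X, y)
    shrunk = c * slopes
    return y.mean() - X.mean(axis=0) @ shrunk, shrunk

# Hypothetical simulated data: 2 true predictors out of 8, noisy response.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=60)
c = global_shrinkage_factor(X, y)
intercept, slopes = shrunken_ols(X, y, c)
print(f"global shrinkage factor c = {c:.3f}")
```

Parameter-wise shrinkage would replace the single c with one factor per coefficient. NPWS, as the name suggests, additionally constrains those factors to be non-negative.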
18. Methods for multi-omic data integration in cancer research.
- Author
-
Hernández-Lemus, Enrique and Ochoa, Soledad
- Subjects
BIOLOGICAL systems ,TRANSCRIPTOMES ,MULTIOMICS ,MACHINE learning ,STATISTICAL models ,DATA integration - Abstract
Multi-omics data integration is a term that refers to the process of combining and analyzing data from different omic experimental sources, such as genomics, transcriptomics, methylation assays, and microRNA sequencing, among others. Such data integration approaches have the potential to provide a more comprehensive functional understanding of biological systems and have numerous applications in areas such as disease diagnosis, prognosis and therapy. However, quantitative integration of multi-omic data is a complex task that requires the use of highly specialized methods and approaches. Here, we discuss a number of data integration methods that have been developed with multi-omics data in view, including statistical methods, machine learning approaches, and network-based approaches. We also discuss the challenges and limitations of such methods and provide examples of their applications in the literature. Overall, this review aims to provide an overview of the current state of the field and highlight potential directions for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Dynamic Realized Minimum Variance Portfolio Models.
- Author
-
Kim, Donggyu and Oh, Minseog
- Subjects
MATRIX inversion ,DYNAMIC models ,FORECASTING - Abstract
This article introduces a dynamic minimum variance portfolio (MVP) model using nonlinear volatility dynamic models, based on high-frequency financial data. Specifically, we impose an autoregressive dynamic structure on MVP processes, which helps capture the MVP dynamics directly. To evaluate the dynamic MVP model, we estimate the inverse volatility matrix using the constrained ℓ1-minimization for inverse matrix estimation (CLIME) and calculate daily realized non-normalized MVP weights. Based on the realized non-normalized MVP weight estimator, we propose the dynamic MVP model, which we call the dynamic realized minimum variance portfolio (DR-MVP) model. To estimate the large number of parameters, we employ the least absolute shrinkage and selection operator (LASSO), predict the future MVP, and establish its asymptotic properties. Using high-frequency trading data, we apply the proposed method to MVP prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
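Given an estimate Ω of the inverse covariance (precision) matrix, the normalized minimum variance weights are Ω1 / (1ᵀΩ1). This sketch assumes that the "non-normalized" weights mean Ω1. For simplicity it inverts a hypothetical covariance matrix directly instead of estimating Ω with CLIME, and the function name is illustrative.

```python
import numpy as np

def mvp_weights(precision):
    """Minimum variance portfolio weights from a precision (inverse covariance) matrix.

    Returns (non_normalized, normalized): non_normalized = Omega @ 1 and
    normalized = non_normalized / (1' Omega 1), so the normalized weights sum to 1.
    """
    ones = np.ones(precision.shape[0])
    raw = precision @ ones
    return raw, raw / (ones @ raw)

# Hypothetical daily covariance of three assets (not data from the article).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
raw, w = mvp_weights(np.linalg.inv(cov))
print(w.round(3), round(float(w.sum()), 6))
```

The minimum achievable portfolio variance is 1 / (1ᵀΩ1). A sparse and stable estimate of Ω is therefore what matters when the number of assets is large.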
20. Exploring core symptoms of alcohol withdrawal syndrome in alcohol use disorder patients: a network analysis approach.
- Author
-
Guanghui Shen, Yu-Hsin Chen, Yuyu Wu, Huang Jiahui, Juan Fang, Tang Jiayi, Kang Yimin, Wei Wang, Yanlong Liu, Fan Wang, and Li Chen
- Subjects
ALCOHOLISM ,ALCOHOL withdrawal syndrome ,DRUG withdrawal symptoms ,PATHOLOGICAL psychology ,MENTAL illness ,HOSTILITY - Abstract
Background: Understanding the interplay among the psychopathological symptoms of alcohol withdrawal syndrome (AWS) in alcohol use disorder (AUD) patients may improve the effectiveness of relapse interventions for AUD. The network theory of mental disorders holds that mental disorders persist not because of a common underlying dysfunction but because of sustained feedback loops between symptoms, which would explain the persistence of AWS and the high relapse rate of AUD. The current study aims to establish a network of AWS, identify its core symptoms, and find the bridges between symptoms that could serve as intervention targets to relieve AWS and break the self-maintaining cycle of AUD. Methods: Graphical lasso networks were constructed using the psychological symptoms of 553 AUD patients. Global network structure, centrality indices, clustering coefficients, and bridge symptoms were used to identify the core symptoms of the AWS network and the transmission pathways between different symptom clusters. Results: (1) AWS constitutes a stable symptom network with a stability coefficient (CS) of 0.21-0.75. (2) Anger (Strength = 1.52) and hostility (Strength = 0.84) emerged as the core symptoms in the AWS network, with the highest centrality and low clustering coefficients. (3) Hostility mediates between aggression and anxiety, and anger mediates between aggression and impulsivity in the AWS network. Conclusions: Anger and hostility may be the best intervention targets for research on and treatment of AWS. Hostility and anxiety, and anger and impulsiveness, are independent but related dimensions, suggesting that different neurobiological bases may be involved in withdrawal symptoms that play a similar role in the withdrawal syndrome. [ABSTRACT FROM AUTHOR]
- Published
- 2024
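In a graphical lasso network, the edges are the partial correlations implied by the sparse precision matrix Θ, ρ_ij = −θ_ij / √(θ_ii θ_jj). A node's strength is the sum of the absolute weights of its edges. This NumPy sketch assumes Θ has already been estimated by a graphical lasso routine. The 4-node matrix is hypothetical and is not the study's data.

```python
import numpy as np

def partial_correlations(precision):
    """Edge weights of a Gaussian graphical model: -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)  # no self-loops
    return pcor

def strength(pcor):
    """Strength centrality: sum of absolute edge weights attached to each node."""
    return np.abs(pcor).sum(axis=1)

# Hypothetical sparse precision matrix; zeros are edges removed by the lasso penalty.
theta = np.array([[ 2.0, -0.8,  0.0,  0.0],
                  [-0.8,  2.0, -0.5,  0.0],
                  [ 0.0, -0.5,  1.5,  0.3],
                  [ 0.0,  0.0,  0.3,  1.0]])
pcor = partial_correlations(theta)
for i, s in enumerate(strength(pcor), start=1):
    print(f"node_{i}: strength = {s:.3f}")
```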
21. The interplay of depressive symptoms and self-efficacy in adolescents: a network analysis approach.
- Author
-
Xiang Li, Bizhen Xia, Guanghui Shen, Renjie Dong, Su Xu, and Lingkai Yang
- Subjects
MENTAL depression ,BIVARIATE analysis ,SELF-efficacy ,MENTAL health ,SOCIAL context - Abstract
Background: Self-efficacy is a critical psychological construct representing an individual's belief in their ability to control their motivation, behavior, and social environment. In adolescents, self-efficacy plays a crucial role in mental health, particularly concerning depressive symptoms. Despite substantial research, the complex interplay between self-efficacy and depressive symptoms in adolescents remains incompletely understood. Aims: The aim of this study is to investigate the complex interrelationships between self-efficacy and depressive symptoms in adolescents using psychological network analysis. Methods: The cross-sectional study involved 3,654 adolescents. Self-efficacy was assessed using the General Self-Efficacy Scale (GSES), and depressive symptoms were measured with the Patient Health Questionnaire-9 (PHQ-9). Network analysis, incorporating the least absolute shrinkage and selection operator (LASSO) technique and centrality analysis, was used to construct and compare self-efficacy networks between the depressive-symptom and healthy control groups. Results: Of the 3,654 participants, 560 (15.32%) met criteria for moderate to severe depressive symptoms (PHQ-9 scores ≥ 10). Among those with depressive symptoms, 373 (66.61%) had moderate, 126 (22.50%) had moderate-severe, and 61 (10.89%) had severe symptoms. Bivariate correlation analyses revealed a significant negative correlation between depressive symptoms and self-efficacy (r = -0.41, p < 0.001). The network analysis showed significant differences in self-efficacy networks between adolescents with and without depressive symptoms (global strength: S = 0.25, p < 0.05). Depressed participants showed a network with reduced global strength, suggesting diminished interconnectedness among self-efficacy items. Specific connections within the self-efficacy network were altered in the presence of depressive symptoms.
Bridge analysis revealed that effort-based problem-solving (bridge strength = 0.13) and suicidal ideation (bridge strength = 0.09) were the key bridge nodes. Conclusion: Adolescent depressive symptoms significantly impact the self-efficacy network, resulting in diminished integration of self-efficacy and highlighting the complex interplay between self-efficacy and depressive symptoms. These findings challenge the traditional unidimensional view of self-efficacy and emphasize the need for tailored interventions focusing on unique self-efficacy profiles in adolescents with depressive symptoms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
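Bridge strength, as used in bridge-centrality analyses, sums for each node the absolute edge weights that connect it to nodes in *other* communities. In this study those communities are the self-efficacy items and the depressive-symptom items. A sketch with a hypothetical 4-node weight matrix and community labels:

```python
import numpy as np

def bridge_strength(weights, communities):
    """Sum of |edge weights| from each node to nodes outside its own community."""
    communities = np.asarray(communities)
    other = communities[:, None] != communities[None, :]  # True where two nodes differ in community
    return (np.abs(weights) * other).sum(axis=1)

# Hypothetical regularized partial-correlation network: two self-efficacy items
# (community "SE") and two depressive-symptom items (community "DEP").
W = np.array([[0.00, 0.30, 0.05, 0.00],
              [0.30, 0.00, 0.10, 0.02],
              [0.05, 0.10, 0.00, 0.25],
              [0.00, 0.02, 0.25, 0.00]])
labels = ["SE", "SE", "DEP", "DEP"]
print(bridge_strength(W, labels))
```

Within-community edges, such as the 0.30 link between the two self-efficacy items, contribute to ordinary strength but not to bridge strength.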
22. From bioinformatics to clinical applications: a novel prognostic model of cuproptosis-related genes based on single-cell RNA sequencing data in hepatocellular carcinoma.
- Author
-
Wang, Yong, Zang, Fenglin, Shao, Bing, Gao, Yanan, Yang, Haicui, Guo, Yuhong, Ding, Tingting, and Sun, Baocun
- Subjects
RECEIVER operating characteristic curves ,RNA sequencing ,CANCER prognosis ,PROGNOSTIC models ,DOWNLOADING - Abstract
Objective and methods: To ascertain the connection between cuproptosis-related genes (CRGs) and the prognosis of hepatocellular carcinoma (HCC), single-cell RNA sequencing (scRNA-seq) and RNA sequencing (RNA-seq) data were downloaded from the GEO and TCGA databases. The differentially expressed CRGs (DE-CRGs) were defined as the overlap of three gene sets: differentially expressed genes (DEGs) between HCC patients and normal controls (NCs) in the scRNA-seq data, DE-CRGs between high- and low-CRG-activity cells, and DEGs between HCC patients and NCs in the TCGA data. Results: Thirty-three DE-CRGs in HCC were identified. A prognostic model (PM) was created employing six survival-related genes (SRGs) (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) via univariate Cox regression analysis and LASSO. The predictive ability of the model was validated via a nomogram and receiver operating characteristic curves. Tumor immune dysfunction and exclusion analysis was used to examine the influence of the PM on immunological heterogeneity. Macrophage M0 levels were significantly different between the high-risk group (HRG) and the low-risk group (LRG), and a higher macrophage level was linked to a more unfavorable prognosis. The drug sensitivity data indicated a substantial difference in the half-maximal inhibitory concentrations of idarubicin and rapamycin between the HRG and the LRG. The model was verified using public datasets and our own cohort at both the protein and mRNA levels. Conclusion: A PM using 6 SRGs (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) was developed via bioinformatics research. This model might provide a fresh perspective for assessing and managing HCC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
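The DE-CRG filtering step keeps only genes that appear in all three lists: scRNA-seq DEGs (HCC vs NC), CRGs differentially expressed between high- and low-CRG-activity cells, and TCGA DEGs. A plain set intersection expresses this. The gene symbols below are placeholders, not the study's results.

```python
def overlap(*gene_lists):
    """Genes present in every input list (exact, case-sensitive symbols), returned sorted."""
    sets = [set(genes) for genes in gene_lists]
    return sorted(set.intersection(*sets))

# Placeholder gene symbols for illustration only.
sc_degs = ["G1", "G2", "G3", "G4"]        # scRNA-seq DEGs, HCC vs NC
activity_degs = ["G1", "G2", "G5"]        # DE between high/low CRG-activity cells
tcga_degs = ["G2", "G1", "G4", "G6"]      # TCGA DEGs, HCC vs NC
print(overlap(sc_degs, activity_degs, tcga_degs))
```

In practice gene symbols should be harmonized (aliases, case) before intersecting, because the comparison here is exact string matching.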
23. A useful mTORC1 signaling-related RiskScore model for the personalized treatment of osteosarcoma patients by using the bulk RNA-seq analysis.
- Author
-
Chen, Hongxia, Wang, Wei, Chang, Shichuan, Huang, Xiaoping, and Wang, Ning
- Subjects
RECEIVER operating characteristic curves ,GENE regulatory networks ,REGULATION of growth ,CELLULAR control mechanisms ,MEDIAN (Mathematics) - Abstract
Aims: This research developed a prognostic model for osteosarcoma (OS) patients based on the Mechanistic Target of Rapamycin Complex 1 (mTORC1) signature. Background: The mTORC1 signaling pathway plays a critical role in the maintenance of cellular homeostasis and in tumorigenesis and tumor development through the regulation of cell growth, metabolism, and autophagy. However, the mechanism of action of this signaling pathway in osteosarcoma remains unclear. Objective: The TARGET-OS and GSE39058 datasets and 200 mTORC1 genes were collected. Methods: The mTORC1 signaling-related genes were obtained from the Molecular Signatures Database (MSigDB), and the single-sample gene set enrichment analysis (ssGSEA) algorithm was used to calculate the mTORC1 score. Then, WGCNA was performed to identify the mTORC1-correlated gene module, and univariate/multivariate and lasso Cox regression analyses were conducted to build the RiskScore model. Immune infiltration analysis was performed using the ssGSEA method, the ESTIMATE tool, and the MCP-Count algorithm. Kaplan-Meier (KM) survival and receiver operating characteristic (ROC) curve analyses were performed using the survival and timeROC packages. Results: The mTORC1 score and WGCNA with β = 5 identified the mTORC1-positively-correlated skyblue2 module, which included 67 genes that are also associated with the metabolism and hypoxia pathways. By further narrowing the candidate genes and calculating the regression coefficients, we developed a useful and reliable RiskScore model that classifies patients in the training and validation sets into high- and low-risk groups based on the median RiskScore, an independent and robust prognostic factor. High-risk patients had a significantly poorer prognosis, lower immune infiltration levels of multiple immune cell types, and a greater tendency toward cancer metastasis. Finally, a nomogram model incorporating the metastasis features and RiskScore showed excellent prediction accuracy and clinical practicability.
Conclusion: We developed a useful and reliable risk prognosis model based on the mTORC1 signaling signature. [ABSTRACT FROM AUTHOR]
- Published
- 2024
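A lasso-Cox RiskScore is the linear predictor Σᵢ βᵢ xᵢ over the retained genes, and patients are dichotomized at the median score. This sketch uses hypothetical coefficients and expression values, because the abstract lists neither. Two details are assumptions: scores equal to the median go to the low-risk group, and the training median is reused as the cutoff for a validation set.

```python
import numpy as np

def risk_score(expr, coefs):
    """RiskScore = sum_i coef_i * expr_i per patient (rows = patients, columns = genes)."""
    return np.asarray(expr, dtype=float) @ np.asarray(coefs, dtype=float)

def median_split(scores, cutoff=None):
    """Label patients 'high' if score > cutoff, else 'low'.

    By default the cutoff is the median of `scores`; pass the training-set
    median to classify a validation set with the same threshold.
    """
    scores = np.asarray(scores, dtype=float)
    if cutoff is None:
        cutoff = float(np.median(scores))
    return np.where(scores > cutoff, "high", "low"), cutoff

# Hypothetical: 3 genes retained by the lasso-Cox model, 5 training patients.
coefs = [0.42, -0.31, 0.18]
train_expr = np.array([[1.2, 0.5, 2.0],
                       [0.3, 1.1, 0.4],
                       [2.1, 0.2, 1.5],
                       [0.8, 0.9, 0.7],
                       [1.5, 1.4, 1.1]])
scores = risk_score(train_expr, coefs)
groups, cut = median_split(scores)
print(groups, round(cut, 3))
```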
24. Post-selection inference in regression models for group testing data.
- Author
-
Shen, Qinyan, Gregory, Karl, and Huang, Xianzheng
- Subjects
MAXIMUM likelihood statistics ,REGRESSION analysis ,CONFIDENCE intervals - Abstract
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
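LASSO-penalized maximum likelihood for logistic regression can be computed by proximal gradient descent (ISTA): take a gradient step on the mean negative log-likelihood, then soft-threshold the slopes. This is a sketch for *fully observed* binary responses only. It does not reproduce the article's EM step for error-prone group-testing outcomes or its polyhedral-lemma inference. The step size 1/L uses the standard Lipschitz bound L = ‖X‖₂² / (4n).

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(X, y, lam, n_iter=5000):
    """Minimize (1/n) * NLL(b0, b) + lam * ||b||_1 by ISTA; the intercept is unpenalized.

    X: (n, p) covariates; y: (n,) labels in {0, 1}.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(design, 2) ** 2  # 1 / Lipschitz constant of the gradient
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ coef)))
        grad = design.T @ (prob - y) / n
        coef = coef - step * grad
        coef[1:] = soft_threshold(coef[1:], step * lam)  # penalize slopes only
    return coef[0], coef[1:]

# Hypothetical simulated data: 2 active covariates out of 6.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_b = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_b)))).astype(float)
b0, b = lasso_logistic(X, y, lam=0.02)
print("selected covariates:", np.flatnonzero(b))
```

The naive approach the article criticizes would compute standard Wald intervals for the selected covariates on these same data, ignoring the selection step.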
25. Constructing a predictive model based on peripheral blood signs to differentiate infectious mononucleosis from chronic active EBV infection.
- Author
-
Jin hua Yuan, Chong jie Pang, and Shuang long Yuan
- Subjects
MONONUCLEOSIS ,LOGISTIC regression analysis ,LYMPHOCYTE count ,REGRESSION analysis ,SYMPTOMS - Abstract
Objective: To develop a prediction model based on peripheral blood signs to distinguish between infectious mononucleosis (IM) and chronic active EBV (CAEBV) infection. Methods: Retrospective data were collected for 60 patients with IM (IM group) and 20 patients with CAEBV infection (CAEBV group) who were hospitalized and diagnosed at the General Hospital of Tianjin Medical University between December 2018 and September 2022. Univariate and LASSO (least absolute shrinkage and selection operator) logistic regression analyses were used. Results: Univariate analyses revealed that IM and CAEBV-infected patients displayed overlapping clinical manifestations, such as fever, sore throat, enlarged lymph nodes, and enlargement of the liver and spleen, and that CAEBV-infected patients had more severe inflammatory responses in peripheral blood. Nine biomarkers (HGB, lymphocyte count, percentage of lymphocytes, ALB, fibrinogen, CRP, IFN-, IL-6, and EBV-DNA load) were subsequently selected by LASSO logistic regression modeling to construct the discriminatory model. Conclusions: Our investigation offers a solid foundation for diagnosing IM and CAEBV infection using the LASSO logistic regression model, based on the significance and availability of peripheral blood indicators. Patients with CAEBV infection require early medical attention. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Improving random forest algorithm by selecting appropriate penalized method.
- Author
-
Farhadi, Zari, Bevrani, Hossein, and Feizi-Derakhshi, Mohammad-Reza
- Subjects
RANDOM forest algorithms ,MONTE Carlo method ,MACHINE learning - Abstract
This article improves the random forest algorithm by selecting the most appropriate penalized regression method and attempts to improve the post-selection boosting random forest (PBRF) algorithm using elastic net regression. The proposed method with the highest efficiency is called Reducing and Aggregating Random Forest Trees by Elastic Net (RARTEN). The method consists of three steps. In the first step, the random forest algorithm is used as a predictor. In the second step, the elastic net, as a penalized regression method, is applied to reduce the number of trees and improve the random forest and PBRF. In the last step, the selected trees are aggregated. The results on real data and Monte Carlo simulations are evaluated using various statistical performance criteria. The simulation study shows that RARTEN, with 7%, 5%, and 8.5% reductions in the linear, nonlinear, and noise models, respectively, improves the accuracy of the traditional random forest and of the method proposed by Wang. In addition, this method achieves a significant reduction compared with other penalized regression methods. Moreover, the real-data results, with a reduction of almost 16%, confirm the validity of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
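The second RARTEN step can be read as a regression of the response on the individual trees' predictions under an elastic net penalty. Trees whose coefficients are shrunk to zero are dropped. Below is a NumPy coordinate-descent sketch of the elastic net using the glmnet-style objective. The per-tree prediction matrix is synthetic and stands in for a trained forest. The tuning values and the weighted aggregation at the end are assumptions, not the article's settings.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha=0.5, n_iter=500):
    """Coordinate descent for
    (1/(2n)) * ||y - b0 - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||^2).
    X and y are centred internally so the intercept separates out.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += Xc[:, j] * b[j]            # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= Xc[:, j] * b[j]
    return y_mean - x_mean @ b, b

# Hypothetical: predictions of 20 trees (columns) on 100 samples.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
tree_preds = y[:, None] + rng.normal(scale=1.0, size=(100, 20))
b0, w = elastic_net(tree_preds, y, lam=0.1, alpha=0.9)
kept = np.flatnonzero(w)
aggregated = b0 + tree_preds[:, kept] @ w[kept]  # one possible aggregation (assumption)
print(f"kept {kept.size} of 20 trees")
```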
27. Sparse Fuzzy C-Means Clustering with Lasso Penalty.
- Author
-
Parveen, Shazia and Yang, Miin-Shen
- Subjects
FUZZY measure theory ,ACQUISITION of data ,SOCIAL media ,ALGORITHMS - Abstract
Clustering is a technique for grouping data into homogeneous structures according to similarity or dissimilarity measures between objects. In clustering, the fuzzy c-means (FCM) algorithm is the best-known and most commonly used method; it is a fuzzy extension of k-means and has been widely used in various fields. Although FCM is a good clustering algorithm, it treats all feature components of the data points as equally important and has drawbacks in handling high-dimensional data. The rapid development of social media and data acquisition techniques has led to advanced methods of collecting and processing larger, complex, and high-dimensional data. However, in high-dimensional data, many of the dimensions are typically immaterial or irrelevant. To make the features sparse, the Lasso penalty can be applied to the feature weights. One solution for FCM with sparsity is sparse FCM (S-FCM) clustering. In this paper, we propose a new S-FCM, called S-FCM-Lasso, which is based on the Lasso penalty. With the proposed S-FCM-Lasso, irrelevant features can be shrunk to exactly zero, so that unnecessary features are assigned zero weights. Based on various clustering performance measures, we compare S-FCM-Lasso with S-FCM and other existing sparse clustering algorithms on several numerical and real-life datasets. The comparisons and experimental results demonstrate that, in terms of these performance measures, the proposed S-FCM-Lasso performs better than S-FCM and existing sparse clustering algorithms. This validates the efficiency and usefulness of the proposed S-FCM-Lasso algorithm for high-dimensional datasets with sparsity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
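For reference, plain FCM alternates two closed-form updates. The centres are v_k = Σᵢ u_ikᵐ xᵢ / Σᵢ u_ikᵐ and the memberships are u_ik = 1 / Σⱼ (d_ik / d_ij)^(2/(m−1)). The NumPy sketch below implements only this unweighted baseline that S-FCM methods extend. The feature weights and Lasso penalty of S-FCM-Lasso are not included, and the fuzzifier m = 2 is an assumed default.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0, eps=1e-12):
    """Plain fuzzy c-means. Returns (memberships of shape (n, c), centres of shape (c, d))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)               # each row of memberships sums to 1
    for _ in range(n_iter):
        Um = U ** m
        centres = Um.T @ X / Um.sum(axis=0)[:, None]
        # squared Euclidean distance of every point to every centre (eps avoids division by 0)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2) + eps
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), written with squared distances
        ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
    return U, centres

# Hypothetical data: two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(30, 2)), rng.normal(5, 0.3, size=(30, 2))])
U, C = fcm(X, 2)
print(C.round(2))
```

Every feature enters the distance d_ik with equal weight. This equal weighting is the limitation that feature-weighted, Lasso-penalized variants address.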
28. Can a Transparent Machine Learning Algorithm Predict Better than Its Black Box Counterparts? A Benchmarking Study Using 110 Data Sets.
- Author
-
Peterson, Ryan A., McGrath, Max, and Cavanaugh, Joseph E.
- Subjects
FEATURE selection ,RANDOM forest algorithms ,SUPPORT vector machines ,DATABASES ,PREDICTION models - Abstract
We developed a novel machine learning (ML) algorithm with the goal of producing transparent models (i.e., understandable by humans) while also flexibly accounting for nonlinearity and interactions. Our method is based on ranked sparsity, and it allows for flexibility and user control in varying the shade of the opacity of black box machine learning methods. The main tenet of ranked sparsity is that an algorithm should be more skeptical of higher-order polynomials and interactions a priori compared to main effects, and hence, the inclusion of these more complex terms should require a higher level of evidence. In this work, we put our new ranked sparsity algorithm (as implemented in the open source R package, sparseR) to the test in a predictive model "bakeoff" (i.e., a benchmarking study of ML algorithms applied "out of the box", that is, with no special tuning). Algorithms were trained on a large set of simulated and real-world data sets from the Penn Machine Learning Benchmarks database, addressing both regression and binary classification problems. We evaluated the extent to which our human-centered algorithm can attain predictive accuracy that rivals popular black box approaches such as neural networks, random forests, and support vector machines, while also producing more interpretable models. Using out-of-bag error as a meta-outcome, we describe the properties of data sets in which human-centered approaches can perform as well as or better than black box approaches. We found that interpretable approaches predicted optimally or within 5% of the optimal method in most real-world data sets. We provide a more in-depth comparison of the performances of random forests to interpretable methods for several case studies, including exemplars in which algorithms performed similarly, and several cases when interpretable methods underperformed. 
This work provides a strong rationale for including human-centered transparent algorithms such as ours in predictive modeling applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Learning classifiers in clustered data: BCI pattern recognition model for EEG-based human emotion recognition.
- Author
-
Kheirabadi, Raoufeh and Omranpour, Hesam
- Subjects
PATTERN recognition systems ,HILBERT-Huang transform ,EMOTION recognition ,EMOTIONS ,FEATURE selection ,DEEP learning - Abstract
Evidence suggests that human emotions can be detected using electroencephalography (EEG) brain signals. Because of their large size, raw EEG recordings may not initially perform well in classification; for this reason, various feature selection methods are used to improve classification performance. EEG signals are complex and non-stationary. This article uses the Empirical Mode Decomposition (EMD) method, one of the most successful methods for analyzing these signals in recent years. In the proposed model, the EEG signals are first decomposed by EMD into a number of Intrinsic Mode Functions (IMFs), and the statistical properties of the IMFs are then extracted. To improve the performance of the proposed model, an effective subset of the features, transformed using the RBF kernel, is selected by Least Absolute Shrinkage and Selection Operator (LASSO) feature selection. The data are then clustered, and finally, each cluster is classified with decision tree, random forest, and KNN classifiers. The purpose of clustering is to increase classification accuracy, which is achieved by having each cluster focus on a limited number of classes. The experiment was performed on the DEAP dataset. The results show that the proposed model achieved 99.17% accuracy, outperforming recent approaches such as deep learning. In recent years, with the development of BCI systems, the demand for recognizing emotions from EEG has increased. We provide an efficient, high-accuracy method for classifying clustered data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Statistical modelling of height growth in urban forestry plantations.
- Author
-
Mallick, Swayam and Pattanaik, Akshya
- Subjects
SUBSET selection ,URBAN forestry ,STATISTICAL learning ,LEAST squares ,VEGETATION dynamics ,PARTIAL least squares regression - Abstract
Urban plantation dynamics under different topographical and climatic conditions in Odisha were evaluated using linear model selection and regularisation techniques. The main objective was to evaluate how, and to what extent, urban plantations respond to various climatic and edaphic conditions. The relationship between vegetation growth and climatic and soil parameters was studied using four statistical learning tools (subset selection, ridge regression, lasso, and partial least squares regression), and their performance was compared with that of a multiple regression model. The test MSEs of the subset selection, ridge regression, lasso, and partial least squares regression models were 16,261.54, 12,245.11, 16,263.79, and 14,317.21, respectively. The results showed that the statistical learning methods (subset selection, lasso, ridge regression, and partial least squares regression) were more accurate than multiple linear regression. From the results, it can be concluded that temperature shows a stronger correlation with the growth parameters. Precipitation also plays a vital role in vegetation dynamics. Soil parameters show a positive correlation with growth. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Bioremediation of turquoise blue by Mangifera indica — particle swarm optimization and kinetic modeling.
- Author
-
Deshannavar, Umesh B., Sivaprakash, Baskaran, Rajamohan, Natarajan, Katageri, Basavaraj G., Gadagi, Amith H., Hegde, Prasad G., Kadapure, Santosh A., Sutar, Mayur, Karanth, Madhura, and Naykar, Tejashwini
- Abstract
Treatment of dye-contaminated wastewater by conventional physico-chemical methods is less attractive for various reasons, and bioremediation using natural plant-based sorbents has gained attention because of its eco-friendliness and economic advantages. In this research, a natural adsorbent derived from Mangifera indica was employed to remove the dye turquoise blue. A central composite design was implemented to investigate the factors influencing dye adsorption onto the activated Mangifera indica shell and their interactions. Lasso and Ridge machine learning models were employed to optimize adsorption; the Ridge model accurately predicts the experimental data. Furthermore, constrained nonlinear particle swarm optimization was employed to maximize the dye removal efficiency. The maximum dye removal using this adsorbent was 87.44% under optimum operating conditions. The adsorption isotherm was best fitted by the Redlich–Peterson model at all temperatures investigated. The parameter values computed with the MATLAB cftool toolbox are kRP = 2.4420 1/g, aRP = 0.0474 1/mg^β, and bRP = 1.2360 at the optimum temperature of 50 °C, with an R² of 0.9740. The kinetic studies revealed that the sorption kinetics fit Lagergren's pseudo-first-order model (k1 = 0.1047 1/min and q1 = 11.230 mg/g), with an R² of 0.9980. [ABSTRACT FROM AUTHOR]
- Published
- 2024
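Both fitted models can be evaluated directly from the reported parameters. The Redlich–Peterson isotherm is q_e = kRP·C_e / (1 + aRP·C_eᵝ) with β = bRP, and Lagergren's pseudo-first-order model is q_t = q1·(1 − e^(−k1·t)). In this sketch the input concentrations and times are arbitrary, and results carry whatever units the reported parameters imply. The mapping of the abstract's symbols onto the standard Redlich–Peterson form is an assumption.

```python
import math

# Parameters as reported in the abstract (Redlich-Peterson at 50 °C; pseudo-first-order kinetics).
K_RP, A_RP, BETA = 2.4420, 0.0474, 1.2360
K1, Q1 = 0.1047, 11.230  # k1 in 1/min, q1 in mg/g

def redlich_peterson(ce):
    """Equilibrium uptake q_e = kRP * Ce / (1 + aRP * Ce**beta)."""
    return K_RP * ce / (1.0 + A_RP * ce ** BETA)

def pseudo_first_order(t):
    """Uptake at time t (min): q_t = q1 * (1 - exp(-k1 * t))."""
    return Q1 * (1.0 - math.exp(-K1 * t))

for t in (0, 10, 30, 60):
    print(f"t = {t:>2} min: q_t = {pseudo_first_order(t):.3f} mg/g")
```

Half of q1 is reached at t = ln 2 / k1, which is about 6.6 min with the reported k1.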
32. Hybrid Approach for Streamflow Prediction: LASSO-Hampel Filter Integration with Support Vector Machines, Artificial Neural Networks, and Autoregressive Distributed Lag Models.
- Author
-
Shabbir, Maha, Chand, Sohail, Iqbal, Farhat, and Kisi, Ozgur
- Subjects
ARTIFICIAL neural networks ,STANDARD deviations ,SUPPORT vector machines ,STREAM measurements ,WATER levels - Abstract
The generation of streamflow is linked to various factors, such as water level, rainfall intensity, and meteorological variables. In this study, we have developed a new hybrid approach (named LASSO-HF-SAA) by integrating the least absolute shrinkage and selection operator (LASSO) and the Hampel filter (HF) with three data-driven models, i.e., support vector machine (SVM), artificial neural network (ANN), and autoregressive distributed lag (ARDL). First, LASSO selects the meteorological variables important for daily streamflow prediction. Next, the HF detects and corrects outliers in the variables to handle the randomness and noise of the data. Third, the HF-corrected data are fed to the SVM, ANN, and ARDL models to obtain the predictions of the proposed LASSO-HF-SVM, LASSO-HF-ANN, and LASSO-HF-ARDL models. The performance of these models is checked using performance indices and the Diebold-Mariano (DM) test. The proposed hybrid approach is illustrated on streamflow data from the Kabul River (Nowshera station) in Pakistan. Based on the Nash-Sutcliffe efficiency (NSE), the prediction accuracy of the LASSO-HF-SVM hybrid model (NSE = 0.52) is better than that of the SVM (NSE = 0.43), HF-SVM (NSE = 0.49), and LASSO-SVM (NSE = 0.47) models in the testing phase. Similar findings hold for the proposed LASSO-HF-ARDL and LASSO-HF-ANN hybrid models. Overall, the suggested LASSO-HF-ARDL hybrid model outperformed all other models in the study. The root mean squared error (RMSE) and NSE of the proposed LASSO-HF-ARDL model on the test data are 443.5 m³/s and 0.68, respectively. The DM test confirms that the prediction accuracy of the proposed hybrid models is better than that of the respective single, HF-based, and LASSO-based versions of the SVM, ANN, and ARDL models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
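The Hampel filter flags a point as an outlier when it lies more than n_sigma scaled MADs from the median of a centred moving window, and replaces it with that median. NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². The NumPy sketch below assumes a window half-width of 3 and a threshold of 3, because the abstract does not give the study's settings. The flow series is synthetic.

```python
import numpy as np

def hampel(x, half_window=3, n_sigma=3.0):
    """Replace outliers by the local median; returns (filtered, outlier_mask).

    A point is an outlier if |x_i - median| > n_sigma * 1.4826 * MAD over the
    window x[i - half_window : i + half_window + 1] (truncated at the ends).
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    mask = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        lo, hi = max(0, i - half_window), min(x.size, i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))  # scaled to estimate a Gaussian sigma
        if abs(x[i] - med) > n_sigma * mad:
            out[i] = med
            mask[i] = True
    return out, mask

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches predicting the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

# Synthetic daily flow (m³/s) with one spike.
flow = np.array([410.0, 415.0, 420.0, 5000.0, 425.0, 430.0, 428.0, 433.0])
clean, flagged = hampel(flow)
print(np.flatnonzero(flagged), clean[3])
```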
33. Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations.
- Author
-
Han, Hyemin
- Subjects
TEACHER researchers ,RESEARCH personnel ,QUANTITATIVE research ,PREDICTION models ,EXPERTISE - Abstract
Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity, indicating robustness against overfitting, across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed how to use the data-driven model exploration methods appropriately across different situations from laypeople's perspectives. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. Screening the Best Risk Model and Susceptibility SNPs for Chronic Obstructive Pulmonary Disease (COPD) Based on Machine Learning Algorithms
- Author
-
Yang Z, Zheng Y, Zhang L, Zhao J, Xu W, Wu H, Xie T, and Ding Y
- Subjects
copd ,lasso ,machine learning ,predictive model ,snp ,Diseases of the respiratory system ,RC705-779 - Abstract
Zehua Yang,* Yamei Zheng,* Lei Zhang, Jie Zhao, Wenya Xu, Haihong Wu, Tian Xie, Yipeng Ding. Department of Respiratory and Critical Care Medicine, Hainan Affiliated Hospital of Hainan Medical University, Hainan General Hospital, Haikou, Hainan, 570311, People’s Republic of China. *These authors contributed equally to this work. Correspondence: Yipeng Ding; Tian Xie, Department of Respiratory and Critical Care Medicine, Hainan Affiliated Hospital of Hainan Medical University, Hainan General Hospital, 19 Xiuhua Road, Xiuying District, Haikou, Hainan, 570311, People’s Republic of China, Tel +86-18976335858, Email yipengding2024@163.com; hpphxietian@163.com
Background and Purpose: Chronic obstructive pulmonary disease (COPD) is a common and progressive disease influenced by both genetic and environmental factors, and genetic factors are important determinants of COPD. This study focuses on screening the best predictive models for assessing COPD-associated SNPs and then using the best model to predict potential risk factors for COPD. Methods: Healthy subjects (n=290) and COPD patients (n=233) were included in this study, and the Agena MassARRAY platform was used to genotype the subjects for SNPs. The selected loci were first screened by logistic regression analysis; on this basis, key SNPs were further screened by LASSO regression, the RFE algorithm, and the random forest algorithm, and ROC curves were plotted to assess the discriminative performance of the models and identify the best prediction model. Finally, the best prediction model was used to predict risk factors for COPD. Results: Univariate logistic regression analysis screened 44 candidate SNPs from 146 SNPs. These 44 SNPs were then screened or feature-ranked using the LASSO model, the RFE-Caret, RFE-Lda, RFE-lr, RFE-nb, RFE-rf, and RFE-treebag algorithms, and the random forest model, which yielded areas under the ROC curve of 0.809, 0.769, 0.798, 0.743, 0.686, 0.766, 0.743, and 0.719, respectively. The LASSO model was therefore selected as the best model, a nomogram was constructed for the 25 SNPs it retained, and rs12479210 was found to be a possible risk factor for COPD. Conclusion: The LASSO model is the best predictive model for COPD, and rs12479210 may be a potential risk locus for COPD. Keywords: COPD, LASSO, machine learning, predictive model, SNP
- Published
- 2024
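The model comparison above ranks the candidate methods by the area under the ROC curve. As a minimal sketch (not the authors' code), AUC can be computed as the probability that a randomly chosen COPD case receives a higher score than a randomly chosen control, with ties counted as one half. The `reported` dictionary simply reuses the AUC values listed in the abstract; the function names are illustrative.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case (label 1) scores higher
    than a random control (label 0); tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(aucs: Mapping[str, float]) -> str:
    """Name of the model with the largest AUC (the first one wins on ties)."""
    return max(aucs, key=aucs.get)


# AUC values as reported in the abstract above, in the order listed there.
reported = {
    "LASSO": 0.809, "RFE-Caret": 0.769, "RFE-Lda": 0.798, "RFE-lr": 0.743,
    "RFE-nb": 0.686, "RFE-rf": 0.766, "RFE-treebag": 0.743, "random forest": 0.719,
}
print(best_model(reported))  # LASSO
```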
35. Association between visceral lipid accumulation indicators and gallstones: a cross-sectional study based on NHANES 2017–2020
- Author
-
Weigen Wu, Yuchen Pei, Junlong Wang, Qizhi Liang, and Wei Chen
- Subjects
Gallstones ,Lipid accumulation ,Cross-sectional study ,NHANES ,LASSO ,Nutritional diseases. Deficiency diseases ,RC620-627 - Abstract
Abstract Background Obesity is a major contributing factor to the formation of gallstones. As early identification typically results in improved outcomes, we explored the relationship between visceral lipid accumulation indicators and the occurrence of gallstones. Methods This cross-sectional study involved 3,224 adults. The researchers employed multivariable logistic regression, smoothed curve fitting (SCF), threshold effects analysis, and subgroup analysis to examine the relationship between gallstones and the metabolic score for visceral fat (METS-VF), waist circumference (WC), lipid accumulation product (LAP), and visceral adiposity index (VAI). A Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis was used to identify key factors, which were then used to construct a nomogram model. The diagnostic efficacy of this model in detecting gallstones was determined using receiver operating characteristic curves. Results Visceral lipid accumulation indicators were strongly linked to the likelihood of having gallstones. SCF revealed saturation effects of METS-VF, WC, LAP, and VAI on gallstones, with inflection points at 8.565, 108.400, 18.056, and 1.071, respectively. Subgroup analyses showed that the associations remained consistent in most subgroups. The nomogram model, developed using the critical features identified by LASSO regression, demonstrated excellent discriminatory ability, with an area under the curve of 0.725. Conclusions This study shows that increases in METS-VF, WC, LAP, and VAI are linked to a higher prevalence of gallstones. The nomogram model, built on critical parameters identified using LASSO regression, exhibits a strong association with the presence of gallstones.
- Published
- 2024
- Full Text
- View/download PDF
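The threshold-effect analysis above locates inflection points where an association levels off. One common way to find such a point is a two-piece linear fit, y = b0 + b1·x + b2·max(0, x − k), with the knot k chosen by grid search to minimize the residual sum of squares. The sketch below assumes a continuous outcome and ordinary least squares for simplicity; the study's outcome (gallstones) is binary and its exact threshold model is not described, so treat this only as an illustration of the knot search on toy data.

```python
from __future__ import annotations

from typing import Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_sse(rows: list[list[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))


def find_inflection(x: Sequence[float], y: Sequence[float]) -> float | None:
    """Grid-search the knot k of y = b0 + b1*x + b2*max(0, x - k) over the
    interior distinct x values and return the k with the smallest SSE."""
    best_k, best_sse = None, float("inf")
    for k in sorted(set(x))[1:-1]:
        rows = [[1.0, xi, max(0.0, xi - k)] for xi in x]
        try:
            sse = ols_sse(rows, y)
        except ValueError:  # degenerate design for this knot; skip it
            continue
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


# Toy data (not from the study): y rises with x and then saturates at x = 4.
x = list(range(11))
y = [min(v, 4) for v in x]
print(find_inflection(x, y))  # 4
```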
36. Development and validation of a dynamic nomogram for high care dependency during the hospital-family transition periods in older stroke patients
- Author
-
Fangyan Li, Lei Zhang, Ruilei Zhang, Yaoyao Liu, Tinglin Zhang, Lin Su, and Huanhuan Geng
- Subjects
High care dependency ,Nomogram ,Stroke ,Hospital-family transition period ,Lasso ,Geriatrics ,RC952-954.6 - Abstract
Abstract Background This research aimed to develop and validate a dynamic nomogram for predicting the risk of high care dependency during the hospital-family transition period in older stroke patients. Methods A total of 309 older stroke patients in the hospital-family transition period who were treated in the Department of Neurology outpatient clinics of three general hospitals in Jinzhou, Liaoning Province, from June to December 2023 were selected as the training set. The patients were assessed with the General Patient Information Questionnaire, the Care Dependency Scale (CDS), the Tilburg Frailty Inventory (TFI), the Hamilton Anxiety Rating Scale (HAMA), the Hamilton Depression Rating Scale-17 (HAMD-17), and the Mini Nutrition Assessment Short Form (MNA-SF). Lasso-logistic regression analysis was used to screen the risk factors for high care dependency in older stroke patients during the hospital-family transition period, and a dynamic nomogram model was constructed. The model was deployed as a web page based on Shiny apps. Bootstrap resampling with 1000 repetitions was used for internal validation. The model’s predictive efficacy was assessed using the calibration plot, decision curve analysis (DCA), and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. A total of 133 older stroke patients in the hospital-family transition period who visited the Neurology outpatient departments of three general hospitals in Jinzhou from January to March 2024 were selected as the validation set for external validation of the model. Results A dynamic nomogram was built based on history of stroke, chronic disease, falls in the past 6 months, depression, malnutrition, and frailty. The AUC of the training set was 0.830 (95% CI: 0.784–0.875), and that of the validation set was 0.833 (95% CI: 0.766–0.900).
The calibration curve was close to the ideal curve, and the DCA results confirmed that the nomogram performed well in terms of clinical applicability. Conclusion The online dynamic nomogram constructed in this study has good specificity, sensitivity, and clinical practicability, and can be applied to older stroke patients as a prediction and assessment tool for high care dependency. It can help guide the development of early intervention strategies, optimize resource allocation, and reduce the care burden on families and society.
- Published
- 2024
- Full Text
- View/download PDF
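The abstract reports AUCs with 95% confidence intervals and 1000 bootstrap repetitions for internal validation, but it does not say exactly how the bootstrap was used (for example, whether optimism correction was applied). The sketch below therefore shows only one related use, a percentile bootstrap interval for the AUC; the function names and inputs are hypothetical.

```python
from __future__ import annotations

import random
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case (1) outscores a random control (0); ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC.

    Patients are resampled with replacement; a resample that lacks either
    class is redrawn, because the AUC is undefined for it."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Fixing the seed makes the interval reproducible; the percentile method is the simplest choice, and bias-corrected variants exist.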
37. Optimization of drug solubility inside the supercritical CO2 system via numerical simulation based on artificial intelligence approach
- Author
-
Meixiuli Li, Wenyan Jiang, Shuang Zhao, Kai Huang, and Dongxiu Liu
- Subjects
Polynomial regression ,Extreme Gradient Boosting ,LASSO ,Drug solubility ,Supercritical CO2 ,Medicine ,Science - Abstract
Abstract In this research paper, we explored the predictive capabilities of three models, Polynomial Regression (PR), Extreme Gradient Boosting (XGB), and LASSO, for estimating the density of supercritical carbon dioxide (SC-CO2) and the solubility of niflumic acid as functions of the input variables temperature and pressure. The hyperparameters of these models were optimized using the innovative Barnacles Mating Optimizer (BMO) algorithm. For SC-CO2 density estimation, PR exhibits remarkable accuracy, with an R² of 0.99207 for data fitting. XGB performs admirably with an R² of 0.92673, while the LASSO model demonstrates good predictive ability with an R² of 0.81917. We also assessed the models’ performance in predicting the solubility of niflumic acid. PR exhibits excellent predictive capability with an R² of 0.96949, XGB also performs strongly with an R² of 0.92961, and LASSO performs well with an R² of 0.82094. The results indicate promising performance of the machine learning models and optimizer in estimating drug solubility in supercritical CO2 as a solvent for the pharmaceutical industry.
- Published
- 2024
- Full Text
- View/download PDF
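Polynomial regression here means fitting an ordinary linear model on polynomial terms of the two inputs. A minimal sketch, assuming a full second-degree basis in temperature T and pressure P (the abstract does not give the degree) and least squares via the normal equations; the BMO hyperparameter search is not shown, and any data passed in would be the user's own.

```python
from __future__ import annotations


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly2_features(t: float, p: float) -> list[float]:
    """Full second-degree basis in two inputs: 1, T, P, T^2, T*P, P^2."""
    return [1.0, t, p, t * t, t * p, p * p]


def fit_poly2(ts: list[float], ps: list[float], ys: list[float]) -> list[float]:
    """Least-squares coefficients for the quadratic basis (normal equations)."""
    rows = [poly2_features(t, p) for t, p in zip(ts, ps)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: list[float], t: float, p: float) -> float:
    """Evaluate the fitted polynomial at (T, P)."""
    return sum(b * f for b, f in zip(beta, poly2_features(t, p)))


def r_squared(ys: list[float], preds: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

With real temperature and pressure values the normal equations become poorly conditioned, so in practice the inputs would be scaled first or a QR-based least-squares solver used.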
38. Computed tomography-based radiomics nomogram for prediction of lympho-vascular and perineural invasion in esophageal squamous cell cancer patients: a retrospective cohort study
- Author
-
Bin Tang, Fan Wu, Lin Peng, Xuefeng Leng, Yongtao Han, Qifeng Wang, Junxiang Wu, and Lucia Clara Orlandini
- Subjects
Esophageal squamous cell cancer ,Lympho-vascular invasion ,Perineural invasion ,Contrast-enhanced CT ,Radiomic ,LASSO ,Medical physics. Medical radiology. Nuclear medicine ,R895-920 ,Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,RC254-282 - Abstract
Abstract Purpose Lympho-vascular invasion (LVI) and perineural invasion (PNI) have been established as prognostic factors in various types of cancers. The preoperative prediction of LVI and PNI has the potential to guide personalized medicine strategies for patients with esophageal squamous cell cancer (ESCC). This study investigates whether radiomics features derived from preoperative contrast-enhanced CT could predict LVI and PNI in ESCC patients. Methods and materials A retrospective cohort of 544 ESCC patients who underwent esophagectomy were included in this study. Preoperative contrast-enhanced CT images, pathological results of PNI and LVI, and clinical characteristics were collected. For each patient, the gross tumor volume (GTV-T) and lymph nodes volume (GTV-N) were delineated and four categories of radiomics features (first-order, shape, textural and wavelet) were extracted from GTV-T and GTV-N. The Mann–Whitney U test was used to select significant features associated with LVI and PNI in turn. Subsequently, radiomics signatures for LVI and PNI were constructed using LASSO regression with ten-fold cross-validation. Significant clinical characteristics were combined with radiomics signature to develop two nomogram models for predicting LVI and PNI, respectively. The area under the curve (AUC) and calibration curve were used to evaluate the predictive performance of the models. Results The radiomics signature for LVI prediction consisted of 28 features, while the PNI radiomics signature comprised 14 features. The AUCs of the LVI radiomics signature were 0.77 and 0.74 in the training and validation groups, respectively, while the AUCs of the PNI radiomics signature were 0.69 and 0.68 in the training and validation groups. 
The nomograms incorporating radiomics signatures and significant clinical characteristics such as age, gender, thrombin time and D-Dimer showed improved predictive performance for both LVI (AUC: 0.82 and 0.80 in the training and validation groups) and PNI (AUC: 0.75 and 0.72 in the training and validation groups) compared with the radiomics signatures alone. Conclusion The radiomics features extracted from preoperative contrast-enhanced CT of the gross tumor and lymph nodes demonstrated potential for predicting LVI and PNI in ESCC patients. Furthermore, incorporating clinical characteristics added value, resulting in improved predictive performance.
- Published
- 2024
- Full Text
- View/download PDF
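The feature-selection pipeline above first keeps features that differ between groups by the Mann–Whitney U test and only then applies LASSO with ten-fold cross-validation. A minimal sketch of the first step, assuming a two-sided test with the large-sample normal approximation (midranks for ties, no tie or continuity correction) and hypothetical feature names; for real analyses, `scipy.stats.mannwhitneyu` provides exact and corrected variants.

```python
from __future__ import annotations

import math


def mann_whitney_u(a: list[float], b: list[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test: returns (U for sample a, p-value).
    Uses midranks for ties and the normal approximation without tie or
    continuity correction, so it suits moderately large samples only."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def prefilter(features: dict[str, tuple[list[float], list[float]]],
              alpha: float = 0.05) -> list[str]:
    """Names of features whose values differ between the two groups at level alpha."""
    return sorted(name for name, (g1, g2) in features.items()
                  if mann_whitney_u(g1, g2)[1] < alpha)
```

Only the features returned by `prefilter` would then be passed to the LASSO step.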
39. From bioinformatics to clinical applications: a novel prognostic model of cuproptosis-related genes based on single-cell RNA sequencing data in hepatocellular carcinoma
- Author
-
Yong Wang, Fenglin Zang, Bing Shao, Yanan Gao, Haicui Yang, Yuhong Guo, Tingting Ding, and Baocun Sun
- Subjects
Cuproptosis-related genes ,scRNA-seq ,Prognostic model ,LASSO ,Immunologic diseases. Allergy ,RC581-607 - Abstract
Abstract Objective and methods To ascertain the connection between cuproptosis-related genes (CRGs) and the prognosis of hepatocellular carcinoma (HCC) using single-cell RNA sequencing (scRNA-seq) and RNA sequencing (RNA-seq) data, relevant data were downloaded from the GEO and TCGA databases. The differentially expressed CRGs (DE-CRGs) were identified as the overlap of three sets: differentially expressed genes (DEGs) between HCC patients and normal controls (NCs) in the scRNA-seq database, DE-CRGs between high- and low-CRG-activity cells, and DEGs between HCC patients and NCs in the TCGA database. Results Thirty-three DE-CRGs in HCC were identified. A prognostic model (PM) based on six survival-related genes (SRGs) (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) was constructed using univariate Cox regression analysis and LASSO. The predictive ability of the model was validated using a nomogram and receiver operating characteristic curves. Tumor immune dysfunction and exclusion (TIDE) analysis was used to examine the influence of the PM on immunological heterogeneity. Macrophage M0 levels differed significantly between the high-risk group (HRG) and the low-risk group (LRG), and higher macrophage levels were linked to a less favorable prognosis. The drug sensitivity data indicated substantial differences in the half-maximal inhibitory concentrations (IC50) of idarubicin and rapamycin between the HRG and the LRG. The model was verified using public datasets and our cohort at both the protein and mRNA levels. Conclusion A PM using 6 SRGs (NDRG2, CYB5A, SOX4, MYC, TM4SF1, and IFI27) was developed through bioinformatics research. This model might provide a fresh perspective for assessing and managing HCC.
- Published
- 2024
- Full Text
- View/download PDF
40. A useful mTORC1 signaling-related RiskScore model for the personalized treatment of osteosarcoma patients by using the bulk RNA-seq analysis
- Author
-
Hongxia Chen, Wei Wang, Shichuan Chang, Xiaoping Huang, and Ning Wang
- Subjects
Osteosarcoma (OS) ,MTORC1 signaling signature ,RiskScore ,Tumor microenvironment ,Weighted Gene Co-expression Network Analysis (WGCNA) ,Lasso ,Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,RC254-282 - Abstract
Abstract Aims This research developed a prognostic model for osteosarcoma (OS) patients based on the Mechanistic Target of Rapamycin Complex 1 (mTORC1) signature. Background The mTORC1 signaling pathway plays a critical role in maintaining cellular homeostasis and in tumorigenesis and tumor development by regulating cell growth, metabolism and autophagy. However, the mechanism of action of this pathway in OS remains unclear. Objective The TARGET-OS and GSE39058 datasets and 200 mTORC1 genes were collected. Methods The mTORC1 signaling-related genes were obtained from the Molecular Signatures Database (MSigDB), and the single-sample gene set enrichment analysis (ssGSEA) algorithm was used to calculate the mTORC1 score. WGCNA was then performed to identify the mTORC1-correlated gene module, and univariate/multivariate and lasso Cox regression analyses were conducted to build the RiskScore model. Immune infiltration analysis was performed using the ssGSEA method, the ESTIMATE tool and the MCP-Count algorithm. KM survival and Receiver Operating Characteristic (ROC) curve analyses were performed using the survival and timeROC packages. Results Using the mTORC1 score and WGCNA with β = 5, we identified the skyblue2 module, which was positively correlated with mTORC1 and included 67 genes that are also associated with metabolism and hypoxia pathways. After further narrowing the candidate genes and calculating the regression coefficients, we developed a useful and reliable RiskScore model that classifies patients in the training and validation sets into high- and low-risk groups based on the median RiskScore; RiskScore was an independent and robust prognostic factor. High-risk patients had a significantly poorer prognosis, lower immune infiltration of multiple immune cell types, and a greater tendency toward metastasis. Finally, a nomogram model incorporating metastasis features and RiskScore showed excellent prediction accuracy and clinical practicability.
Conclusion We developed a useful and reliable risk prognosis model based on the mTORC1 signaling signature.
- Published
- 2024
- Full Text
- View/download PDF
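A RiskScore of this kind is typically a weighted sum of signature-gene expression values, with weights taken from the Cox regression coefficients, and patients are split into risk groups at the median. The sketch below assumes that form. The gene names and coefficients are hypothetical, as are the convention that a score equal to the cutoff is labeled low-risk and the option of applying the training-set median to a validation set, which the abstract does not specify.

```python
from __future__ import annotations

from statistics import median


def risk_scores(expr: dict[str, dict[str, float]],
                coefs: dict[str, float]) -> dict[str, float]:
    """RiskScore per patient: sum over signature genes of coef * expression."""
    return {pid: sum(c * genes[g] for g, c in coefs.items())
            for pid, genes in expr.items()}


def median_split(scores: dict[str, float],
                 cutoff: float | None = None) -> dict[str, str]:
    """Label patients 'high' if their score exceeds the cutoff, else 'low'.
    By default the cutoff is the median of the supplied scores; pass the
    training-set median explicitly to reuse it on a validation set."""
    cut = median(scores.values()) if cutoff is None else cutoff
    return {pid: "high" if s > cut else "low" for pid, s in scores.items()}
```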
41. Drivers of the next-minute Bitcoin price using sparse regressions
- Author
-
Gurrib, Ikhlaas, Kamalov, Firuz, Starkova, Olga, Elshareif, Elgilani Eltahir, and Contu, Davide
- Published
- 2024
- Full Text
- View/download PDF
42. Skew Index: a machine learning forecasting approach.
- Author
-
Vanegas, Esteban and Mora-Valencia, Andrés
- Abstract
The Skew Index originated in response to the Black Monday Crisis in 1987 to provide investors and regulators with a tool to gauge turbulence in the financial markets. Understanding and forecasting the Skew Index is crucial for anticipating market downturns and managing financial risk. This paper presented key descriptive statistics of the Skew Index, a topic not extensively covered in existing literature. Furthermore, we utilized a range of Deep Learning models—Dense, LSTM, GRU, CNN, and Hybrid-CNN—in both stand-alone configurations and with external variables to forecast the Index with daily data from April 1, 1997, to November 30, 2023. LASSO regression analysis was applied to select the most predictive exogenous variables for the index forecast. Our findings indicated that the Dense model provided an effective forecast for stand-alone models, while the CNN-LSTM model offered a superior forecast compared to other deep learning models when external variables were included. This research is novel in its application of neural network architectures to forecast the daily levels of the Skew Index, contributing to the field by providing a robust framework for financial market risk assessment and forecasting. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
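Using LASSO to select exogenous predictors for a daily forecast requires that every candidate enters the design matrix only through past values; otherwise the selection could exploit information not available at forecast time. A minimal sketch of building such a lagged design, assuming the same fixed number of daily lags for every series (the abstract does not state the lag structure) and hypothetical series names:

```python
from __future__ import annotations


def lagged_design(target: list[float], exog: dict[str, list[float]], lags: int):
    """Build (names, X, y) so that row t predicts target[t] from the values at
    t-1, ..., t-lags of the target and of every exogenous series."""
    names = [f"target_lag{k}" for k in range(1, lags + 1)]
    for name in exog:
        names += [f"{name}_lag{k}" for k in range(1, lags + 1)]
    X, y = [], []
    for t in range(lags, len(target)):
        row = [target[t - k] for k in range(1, lags + 1)]
        for series in exog.values():
            row += [series[t - k] for k in range(1, lags + 1)]
        X.append(row)
        y.append(target[t])
    return names, X, y
```

The resulting columns can be standardized and passed to a LASSO fit; each nonzero coefficient then points to a specific variable and lag.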
43. Screening biomarkers for spinal cord injury using weighted gene co-expression network analysis and machine learning.
- Author
-
Xiaolu Li, Ye Yang, Senming Xu, Yuchang Gui, Jianmin Chen, and Jianwen Xu
- Published
- 2024
- Full Text
- View/download PDF
44. Exploring the factors that influence academic stress among elementary school students using a LASSO penalty regression model.
- Author
-
Kim, JiYoon, An, Saebuyl, and Hong, Sehee
- Abstract
The main objective of the study was to explore the main predictors of academic stress among fourth-grade elementary school students using data from the Panel Study on Korean Children and a least absolute shrinkage and selection operator (LASSO) regularized penalty regression model. The study examined 280 explanatory variables using the LASSO model and, after preprocessing the data, finally selected 21 variables. Of these, the study identified children's persistence, peer attachment, bullying, parental achievement pressure, and subjective socioeconomic status as significant predictors, consistent with previous studies. However, several variables newly explored in the study were also significant, including children's overall happiness, frequency of using slang, preference for mathematics, school life preference, daily time spent on homework and study, average monthly cost of private education, parents' participation in school events, marital conflict, and residential area. Finally, we presented the significance and implications of the results for decreasing academic stress among fourth-grade elementary school children. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
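LASSO performs variable selection because its ℓ1 penalty sets some coefficients exactly to zero, which is how a large pool of candidate variables can shrink to a smaller retained set. The sketch below is a minimal pure-Python coordinate-descent solver for the objective (1/2n)‖y − Xβ‖² + λ‖β‖₁, assuming standardized predictors and a centered outcome; it is not the software the authors used, and λ would normally be chosen by cross-validation.

```python
from __future__ import annotations

import math


def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(X: list[list[float]]) -> list[list[float]]:
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) or 1.0
           for j in range(p)]  # constant columns keep sd 1 and become all zeros
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in X]


def lasso_cd(X: list[list[float]], y: list[float], lam: float,
             n_iter: int = 1000, tol: float = 1e-10) -> list[float]:
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes the columns of X are standardized and y is centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b for b = 0
    col_sq = [sum(r[j] ** 2 for r in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (b_j added back)
            rho = sum(r[j] * e for r, e in zip(X, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def selected(beta: list[float], names: list[str]) -> list[str]:
    """Predictors kept by LASSO: those with a nonzero coefficient."""
    return [name for name, b in zip(names, beta) if b != 0.0]
```

Larger λ values zero out more coefficients; with λ = 0 the solver reduces to ordinary least squares.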
45. Prediction of Taxi-in Time and Analysis of Influencing Factors for Arrival Flights at Airport with a Decentralised Terminal Layout
- Author
-
Xiaowei TANG, Mengfan YE, Shengrun ZHANG, and Kurt FUELLHART
- Subjects
air transportation ,taxi-in time prediction ,terminal layout ,airport surface movement ,lasso ,gbrt ,airport management ,Transportation engineering ,TA1001-1280 - Abstract
Accurately predicting taxi-in times for arrival flights is crucial for efficient allocation of ground handling resources, which affects departure punctuality. This study investigates terminal layout characteristics, specifically decentralised layouts, to predict and analyse arrival flight taxi-in times. We develop a surface traffic flow calculation method that considers both arrival and departure flights and eliminates fixed thresholds. We introduce runway-crossing operations for decentralised airports, creating new prediction variables. We consider factors such as runway, aircraft type, airline, taxi distance, and time period. A Gradient Boosting Regression Tree predicts taxi-in times, while Lasso analyses the impact of each factor. Our approach yields highly accurate predictions for decentralised airports, with the surface traffic flow and runway-crossing variables significantly influencing taxi-in times. This research informs managers of airports with decentralised layouts, enabling tailored management strategies.
- Published
- 2024
- Full Text
- View/download PDF
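The surface traffic flow variable described above depends on how many other aircraft are moving on the airport surface when a flight lands. The abstract does not give the paper's calculation, so the following is a hypothetical definition only: count the arrivals and departures whose surface-movement interval covers the target flight's start time. The data class and its field names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Movement:
    flight: str
    kind: str     # "arr" or "dep"
    start: float  # minutes; start of the surface movement
    end: float    # minutes; end of the surface movement


def surface_traffic(target: Movement, movements: list[Movement]) -> tuple[int, int]:
    """Count arrivals and departures (other than the target) that are moving on
    the surface at the target's start time; intervals are half-open [start, end)."""
    active = [m for m in movements
              if m.flight != target.flight and m.start <= target.start < m.end]
    arrivals = sum(1 for m in active if m.kind == "arr")
    departures = sum(1 for m in active if m.kind == "dep")
    return arrivals, departures
```

Keeping arrivals and departures as separate counts lets a downstream model weight the two kinds of traffic differently.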
46. Development and experimental validation of hypoxia-related gene signatures for osteosarcoma diagnosis and prognosis based on WGCNA and machine learning
- Author
-
Bo Wen, Jian Chen, Tianqi Ding, Zhiyou Mao, Rong Jin, Yirui Wang, Meiqin Shi, Lixun Zhao, Asang Yang, Xianyun Qin, and Xuewei Chen
- Subjects
Osteosarcoma ,Hypoxia ,Diagnosis ,Prognosis ,WGCNA ,LASSO ,Medicine ,Science - Abstract
Abstract Osteosarcoma (OS) is the most common primary malignant tumour of the bone and has high mortality. Here, we comprehensively analysed hypoxia signalling in OS and constructed novel hypoxia-related gene signatures for OS prediction and prognosis. This study employed Gene Set Enrichment Analysis (GSEA), Weighted correlation network analysis (WGCNA) and Least absolute shrinkage and selection operator (LASSO) analyses to identify Stanniocalcin 2 (STC2) and Transmembrane Protein 45A (TMEM45A) as diagnostic biomarkers, which were further assessed by Receiver Operating Characteristic (ROC) curves, decision curve analysis (DCA), and calibration curves in the training and test datasets. Univariate and multivariate Cox regression analyses were used to construct the prognostic model, in which STC2 and metastasis were combined to form the OS risk model. The nomogram, risk score, Kaplan–Meier plot, ROC, DCA, and calibration curve results confirmed the excellent performance of the prognostic model. The expression levels of STC2 and TMEM45A were validated in external datasets and cell lines. In immune cell infiltration analysis, cancer-associated fibroblasts (CAFs) were significantly higher in the low-risk group, and the immune infiltration of CAFs was negatively associated with the expression of STC2 (P
- Published
- 2024
- Full Text
- View/download PDF
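Kaplan–Meier curves like those used above estimate survival by multiplying, at each distinct event time, the fraction of at-risk patients who did not have the event. A minimal sketch, assuming event indicators coded 1 for an observed event and 0 for censoring, with patients censored at an event time still counted in that time's risk set; in practice an established package such as R's `survival` would be used.

```python
from __future__ import annotations


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate: (time, S(time)) at each distinct event time.
    events[i] is 1 if the event was observed at times[i], 0 if censored.
    Subjects censored at an event time remain in that time's risk set."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


def median_survival(curve: list[tuple[float, float]]) -> float | None:
    """First event time at which estimated survival is 0.5 or lower (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```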
47. Development of a Predictive Nomogram for Intra-Hospital Mortality in Acute Ischemic Stroke Patients Using LASSO Regression
- Author
-
Zhou L, Wu Y, Wang J, Wu H, Tan Y, Chen X, Song X, Ren Y, and Yang Q
- Subjects
ischemic stroke ,nomogram ,predictors ,lasso ,intra-hospital mortality. ,Geriatrics ,RC952-954.6 - Abstract
Li Zhou,1,* Youlin Wu,1,2,* Jiani Wang,1 Haiyun Wu,1 Yongjun Tan,1 Xia Chen,1,3 Xiaosong Song,1,4 Yu Ren,1 Qin Yang1. 1Department of Neurology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China; 2Department of Neurology, Chongzhou People’s Hospital, Sichuan, People’s Republic of China; 3Department of Neurology, the Seventh People’s Hospital of Chongqing, Chongqing, People’s Republic of China; 4Department of Neurology, the Ninth People’s Hospital of Chongqing, Chongqing, People’s Republic of China. *These authors contributed equally to this work. Correspondence: Qin Yang, Department of Neurology, the First Affiliated Hospital of Chongqing Medical University, 1 Youyi Road, Yuzhong District, Chongqing, 400016, People’s Republic of China, Tel +86-023-89012008, Fax +86-023-68811487, Email xyqh200@126.com
Background and Purpose: Ischemic stroke is a leading cause of mortality and disability globally, necessitating accurate prediction of intra-hospital mortality (IHM) for improved patient care. This study aimed to develop a practical nomogram for personalized IHM risk prediction in ischemic stroke patients. Methods: A retrospective study of 422 ischemic stroke patients (April 2020 - December 2021) from Chongqing Medical University’s First Affiliated Hospital was conducted, with patients divided into training (n=295) and validation (n=127) groups. Data on demographics, comorbidities, stroke risk factors, and lab results were collected. Stroke severity was assessed using NIHSS, and stroke types were classified by TOAST criteria.
Least absolute shrinkage and selection operator (LASSO) regression was employed for predictor selection and nomogram construction, with evaluation through ROC curves, calibration curves, and decision curve analysis. Results: LASSO regression and multivariate logistic regression identified four independent IHM predictors: age, admission NIHSS score, chronic obstructive pulmonary disease (COPD) diagnosis, and white blood cell count (WBC). A highly accurate nomogram based on these variables exhibited excellent predictive performance, with AUCs of 0.958 (training) and 0.962 (validation), sensitivities of 93.2% and 95.7%, and specificities of 93.1% and 90.9%, respectively. Calibration curves and decision curve analysis validated its clinical applicability. Conclusion: Age, admission NIHSS score, COPD history, and WBC were identified as independent IHM predictors in ischemic stroke patients. The developed nomogram demonstrated high predictive accuracy and practical utility for mortality risk estimation. External validation and prospective studies are warranted for further confirmation of its clinical efficacy. Keywords: ischemic stroke, nomogram, predictors, lasso, intra-hospital mortality
- Published
- 2024
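The sensitivities and specificities reported above depend on the probability cutoff applied to the nomogram, which the abstract does not state. One common choice is the cutoff that maximizes Youden's J = sensitivity + specificity − 1; the sketch below assumes that rule and a "predict positive when probability ≥ cutoff" convention, with illustrative function names.

```python
from __future__ import annotations


def sens_spec(labels: list[int], probs: list[float], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when predicting positive for prob >= cutoff.
    Assumes both classes are present in labels."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(labels: list[int], probs: list[float]) -> float:
    """Observed probability that maximizes Youden's J = sens + spec - 1
    (the lowest such cutoff is returned on ties)."""
    def youden_j(c: float) -> float:
        sens, spec = sens_spec(labels, probs, c)
        return sens + spec - 1.0
    return max(sorted(set(probs)), key=youden_j)
```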
48. Construction and validation of prediction model for diabetic retinopathy
- Author
-
Chen Xingyue, Cai Weiqin, Wang Suzhen, An Hongqing, and Qi Leitao
- Subjects
diabetic retinopathy ,lasso ,nomogram ,roc curve ,calibration curve ,dca curve ,Ophthalmology ,RE1-994 - Abstract
AIM: To analyze and screen the factors influencing retinopathy in diabetic patients, and to establish and validate a nomogram prediction model. METHODS: A total of 1,252 patients from the Diabetes Complications Early Warning Dataset of the National Population Health Data Archive (PHDA) from January 2013 to January 2021 were selected and randomly divided into a modeling group (n=941) and a validation group (n=311). Univariate analysis, LASSO regression and logistic regression analysis were used to screen out the influencing factors of diabetic retinopathy, and a nomogram prediction model was established. The receiver operating characteristic curve, Hosmer-Lemeshow test and calibration curve were used to evaluate the model. The clinical benefit was evaluated by decision curve analysis (DCA). RESULTS: Age, hypertension, nephropathy, systolic blood pressure (SBP), glycated hemoglobin (HbA1c), high-density lipoprotein cholesterol (HDL-C), and blood urea (BU) were the influencing factors of diabetic retinopathy. The area under the curve (AUC) of the modeling group was 0.792 (95% CI: 0.763-0.821), and the AUC of the validation group was 0.769 (95% CI: 0.716-0.822). The Hosmer-Lemeshow goodness-of-fit test and calibration curve suggested good agreement between predicted and observed values (modeling group: χ2=14.520, P=0.069; validation group: χ2=14.400, P=0.072). The DCA results showed that the model provided a net clinical benefit over threshold probability ranges of 0.09-0.89 in the modeling group and 0.07-0.84 in the validation group. CONCLUSION: This study constructed a risk prediction model including age, hypertension, nephropathy, SBP, HbA1c, HDL-C, and BU. The model has good discrimination and calibration, and can be used to predict the risk of diabetic retinopathy in patients with diabetes.
- Published
- 2024
- Full Text
- View/download PDF
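Decision curve analysis compares a model's net benefit with the treat-all and treat-none strategies across threshold probabilities p_t, where net benefit = TP/n − (FP/n)·p_t/(1 − p_t) and treating no one has net benefit 0. A minimal sketch of that formula follows; the helper that lists thresholds at which the model beats both reference strategies is an assumption about how reported threshold ranges can be read, not the authors' procedure.

```python
from __future__ import annotations


def net_benefit(labels: list[int], probs: list[float], pt: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= pt."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels: list[int], pt: float) -> float:
    """Net benefit of treating everyone (treating no one has net benefit 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def useful_thresholds(labels: list[int], probs: list[float],
                      thresholds: list[float]) -> list[float]:
    """Thresholds at which the model beats both treat-all and treat-none."""
    return [pt for pt in thresholds
            if net_benefit(labels, probs, pt) > max(0.0, net_benefit_treat_all(labels, pt))]
```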
49. Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations
- Author
-
Hyemin Han
- Subjects
data-driven analysis ,model exploration ,variable selection ,Bayesian Model Averaging ,regularized regression ,LASSO ,Statistics ,HA1-4737 - Abstract
Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity indicating robustness against overfitting across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed appropriately using the data-driven model exploration methods across different situations from laypeople’s perspectives.
- Published
- 2024
- Full Text
- View/download PDF
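Cross-validity, as used above, measures how well a model chosen on some data predicts held-out data. The sketch below applies k-fold cross-validation to pick the LASSO penalty from a small grid, which is roughly the idea behind R's `cv.glmnet` but not its implementation. It assumes predictors on comparable scales (it centers within each training fold but does not rescale), uses a compact coordinate-descent solver, and all names are illustrative.

```python
from __future__ import annotations

import random


def soft_threshold(z: float, t: float) -> float:
    """sign(z) * max(|z| - t, 0)"""
    return z - t if z > t else z + t if z < -t else 0.0


def lasso_fit(X: list[list[float]], y: list[float], lam: float,
              n_iter: int = 500, tol: float = 1e-10) -> tuple[float, list[float]]:
    """Coordinate descent for (1/2n)||y - b0 - X b||^2 + lam*||b||_1.
    X and y are centered on the training data; the intercept b0 is recovered at the end."""
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[r[j] - xm[j] for j in range(p)] for r in X]
    resid = [v - ym for v in y]
    sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) / n + sq[j] * beta[j]
            step = soft_threshold(rho, lam) / sq[j] - beta[j]
            if step:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] += step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = ym - sum(b * m for b, m in zip(beta, xm))
    return b0, beta


def cv_mse(X: list[list[float]], y: list[float], lam: float,
           k: int = 5, seed: int = 0) -> float:
    """Mean squared prediction error over k held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse = 0.0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = b0 + sum(b * v for b, v in zip(beta, X[i]))
            sse += (y[i] - pred) ** 2
    return sse / len(y)


def choose_lambda(X: list[list[float]], y: list[float], lambdas: list[float],
                  k: int = 5, seed: int = 0) -> float:
    """Penalty with the lowest cross-validated error (same folds for every value)."""
    return min(lambdas, key=lambda lam: cv_mse(X, y, lam, k, seed))
```

Reusing one seed keeps the folds identical across penalty values, so differences in error reflect the penalty rather than the split.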
50. Targeted co-expression networks for the study of traits
- Author
-
A. Gómez-Pascual, G. Rocamora-Pérez, L. Ibanez, and J. A. Botía
- Subjects
Co-expression ,Genes ,LASSO ,Trait ,WGCNA ,Medicine ,Science - Abstract
Abstract Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used approach for the generation of gene co-expression networks. However, networks generated with this tool usually create large modules with a large set of functional annotations hard to decipher. We have developed TGCN, a new method to create Targeted Gene Co-expression Networks. This method identifies the transcripts that best predict the trait of interest based on gene expression using a refinement of the LASSO regression. Then, it builds the co-expression modules around those transcripts. Algorithm properties were characterized using the expression of 13 brain regions from the Genotype-Tissue Expression project. When comparing our method with WGCNA, TGCN networks lead to more precise modules that have more specific and yet rich biological meaning. Then, we illustrate its applicability by creating an APP-TGCN on The Religious Orders Study and Memory and Aging Project dataset, aiming to identify the molecular pathways specifically associated with APP role in Alzheimer’s disease. Main biological findings were further validated in two independent cohorts. In conclusion, we provide a new framework that serves to create targeted networks that are smaller, biologically relevant and useful in high throughput hypothesis driven research. The TGCN R package is available on Github: https://github.com/aliciagp/TGCN .
- Published
- 2024
- Full Text
- View/download PDF
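TGCN first selects trait-predictive transcripts with a LASSO refinement and then builds co-expression modules around them. The abstract does not describe the module-building rule, so the sketch below uses a hypothetical one: each remaining gene joins the module of the seed transcript with which its absolute Pearson correlation is highest, provided that correlation reaches a threshold. This is not the TGCN package's algorithm, and the threshold value is an assumption.

```python
from __future__ import annotations

import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0.0 or sb == 0.0 else cov / (sa * sb)


def build_modules(expr: dict[str, list[float]], seeds: list[str],
                  min_abs_r: float = 0.7) -> dict[str, list[str]]:
    """Hypothetical rule: each non-seed gene joins the module of the seed it is
    most strongly (absolutely) correlated with, if |r| >= min_abs_r."""
    modules = {s: [s] for s in seeds}
    for gene, values in expr.items():
        if gene in modules:
            continue
        r = {s: abs(pearson(values, expr[s])) for s in seeds}
        best = max(r, key=r.get)
        if r[best] >= min_abs_r:
            modules[best].append(gene)
    return modules
```

Genes that reach the threshold for no seed are left out, which keeps modules small and focused on the selected transcripts.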
Catalog
Discovery Service for Jio Institute Digital Library