115 results on '"Tsamardinos, I."'
Search Results
2. Common MicroRNAs in Pre-diagnostic Serum Associated with Lung Cancer in Two Cohorts up to Eight Years Before Diagnosis: A HUNT Study
- Author
-
Røe, O. D., Fotopoulos, I., Nguyen, O. T. D., Nøst, T. H., Markaki, M., Lagani, V., Mjelle, R., Sandanger, T. M., Sætrom, P., and Tsamardinos, I.
- Published
- 2022
3. EP01.01-009 Common MicroRNAs in Pre-diagnostic Serum Associated with Lung Cancer in Two Cohorts up to Eight Years Before Diagnosis: A HUNT Study
- Author
-
Røe, O.D., primary, Fotopoulos, I., additional, Nguyen, O.T.D., additional, Nøst, T.H., additional, Markaki, M., additional, Lagani, V., additional, Mjelle, R., additional, Sandanger, T.M., additional, Sætrom, P., additional, and Tsamardinos, I., additional
- Published
- 2022
4. MA11.04 Genetic Variants of HUNT Lung Cancer Model Improve Lung Cancer Risk Assessment Over Clinical Models
- Author
-
Nguyen, O.T.D., primary, Fotopoulos, I., additional, Nøst, T.H., additional, Markaki, M., additional, Lagani, V., additional, Tsamardinos, I., additional, and Røe, O.D., additional
- Published
- 2022
5. 1565P Improving lung cancer screening selection: Hunt Lung Cancer Model (HUNT LCM) versus PLCOm2012, early results after two rounds of screening in a prospective screening pilot study in Norway (TIDL)
- Author
-
Roe, O.D., Fotopoulos, I., Nguyen, O.T.D., Tsamardinos, I., Lagani, V., Strand, T.E., and Ashraf, H.
- Published
- 2024
6. 1193P Ethical lung cancer screening selection? Comparison of the European 4 - In The Long Run (4ITLR) criteria with the 2021 USPSTF (US), and HUNT lung cancer risk model in a large Norwegian prospective population-based study
- Author
-
Roe, O.D., Nguyen, O.T.D., Fotopoulos, I., Tsamardinos, I., and Lagani, V.
- Published
- 2024
7. Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning
- Author
-
Karaglani, M., Panagopoulou, M., Cheimonidi, C., Tsamardinos, I., Maltezos, E., Papanas, N., Papazoglou, D., Mastorakos, G., and Chatzaki, E.
- Subjects
endocrine system diseases, nutritional and metabolic diseases
- Abstract
Background: The need for minimally invasive biomarkers for the early diagnosis of type 2 diabetes (T2DM) prior to the clinical onset and monitoring of β-pancreatic cell loss is emerging. Here, we focused on studying circulating cell-free DNA (ccfDNA) as a liquid biopsy biomaterial for accurate diagnosis/monitoring of T2DM. Methods: ccfDNA levels were directly quantified in sera from 96 T2DM patients and 71 healthy individuals via fluorometry, and then fragment DNA size profiling was performed by capillary electrophoresis. Following this, ccfDNA methylation levels of five β-cell-related genes were measured via qPCR. Data were analyzed by automated machine learning to build classifying predictive models. Results: ccfDNA levels were found to be similar between groups but indicative of apoptosis in T2DM. INS (Insulin), IAPP (Islet Amyloid Polypeptide-Amylin), GCK (Glucokinase), and KCNJ11 (Potassium Inwardly Rectifying Channel Subfamily J member 11) levels differed significantly between groups. AutoML analysis delivered biosignatures including GCK, IAPP and KCNJ11 methylation, with the highest ever reported discriminating performance of T2DM from healthy individuals (AUC 0.927). Conclusions: Our data unravel the value of ccfDNA as a minimally invasive biomaterial carrying important clinical information for T2DM. Upon prospective clinical evaluation, the built biosignature can be disruptive for T2DM clinical management. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
- Published
- 2022
8. Information-Preserving Techniques Improve Chemosensitivity Prediction of Tumours Based on Expression Profiles
- Author
-
Christodoulou, E. G., primary, Røe, O. D., additional, Folarin, A., additional, and Tsamardinos, I., additional
- Published
- 2011
9. Circulating cell-free DNA in breast cancer: size profiling, levels, and methylation patterns lead to prognostic and predictive classifiers
- Author
-
Panagopoulou, M., Karaglani, M., Balgkouranidou, I., Biziota, E., Koukaki, T., Karamitrousis, E., Nena, E., Tsamardinos, I., Kolios, G., Lianidou, E., Kakolyris, S., and Chatzaki, E.
- Abstract
Blood circulating cell-free DNA (ccfDNA) is a suggested biosource of valuable clinical information for cancer, meeting the need for a minimally-invasive advancement in the route of precision medicine. In this paper, we evaluated the prognostic and predictive potential of ccfDNA parameters in early and advanced breast cancer. Groups consisted of 150 and 16 breast cancer patients under adjuvant and neoadjuvant therapy respectively, 34 patients with metastatic disease and 35 healthy volunteers. Direct quantification of ccfDNA in plasma revealed elevated concentrations correlated to the incidence of death, shorter PFS, and non-response to pharmacotherapy in the metastatic but not in the other groups. The methylation status of a panel of cancer-related genes chosen based on previous expression and epigenetic data (KLK10, SOX17, WNT5A, MSH2, GATA3) was assessed by quantitative methylation-specific PCR. All but the GATA3 gene was more frequently methylated in all the patient groups than in healthy individuals (all p < 0.05). The methylation of WNT5A was statistically significantly correlated to greater tumor size and poor prognosis characteristics and in advanced stage disease with shorter OS. In the metastatic group, also SOX17 methylation was significantly correlated to the incidence of death, shorter PFS, and OS. KLK10 methylation was significantly correlated to unfavorable clinicopathological characteristics and relapse, whereas in the adjuvant group to shorter DFI. Methylation of at least 3 or 4 genes was significantly correlated to shorter OS and no pharmacotherapy response, respectively. Classification analysis by a fully automated, machine learning software produced a single-parametric linear model using ccfDNA plasma concentration values, with great discriminating power to predict response to chemotherapy (AUC 0.803, 95% CI [0.606, 1.000]) in the metastatic group. 
Two more multi-parametric signatures were produced for the metastatic group, predicting survival and disease outcome. Finally, a multiple logistic regression model was constructed, discriminating between patient groups and healthy individuals. Overall, ccfDNA emerged as a highly potent predictive classifier in metastatic breast cancer. Upon prospective clinical evaluation, all the signatures produced could aid accurate prognosis. © 2019, Springer Nature Limited.
- Published
- 2019
10. P1.11-13 Mass Spectrometry Proteomics Analysis Discovers Biomarkers in Serum Months to Years Before Non-Small Cell Lung Cancer: The HUNT Study
- Author
-
Nguyen, O.T., primary, Markaki, M., additional, Sharma, A., additional, Chatzipantsiou, C., additional, Lagani, V., additional, Tsamardinos, I., additional, and Røe, O., additional
- Published
- 2019
11. Prothrombotic and Endothelial Inflammatory Markers in Greek Patients with Type 2 Diabetes Compared to Non-Diabetics
- Author
-
Tsamardinos I, Siomos K, Koygioylis M, Kerkentzes K, Christina-Maria Trakatelli, and Papadakis E
- Subjects
medicine.medical_specialty, Creatinine, Homocysteine, business.industry, Insulin, medicine.medical_treatment, Renal function, Type 2 diabetes, Fibrinogen, medicine.disease, Thrombomodulin, Gastroenterology, chemistry.chemical_compound, chemistry, Diabetes mellitus, Internal medicine, Immunology, medicine, business, medicine.drug
- Abstract
Objective: To evaluate specific factors of coagulation and endothelial inflammatory markers, namely thrombomodulin, soluble receptor of protein C (sEPCR), factor VIII, plasminogen activator inhibitor 1, von Willebrand factor, fibrinogen, fibrinogen dimers (d-dimers), high-sensitivity C-reactive protein and homocysteine, in a subset of Greek subjects with and without Type 2 (T2) Diabetes. Design: 84 subjects, of whom 44 were patients with T2 diabetes, were included in the randomized comparative prospective cross-sectional study. The subjects were split into a T2 diabetics group and a group of healthy controls of similar age, anthropometric profiles and gender distribution. Results: A total of 47 variables and biomarkers, together with indicators for metabolic profiles, clinical history, detailed anthropometric profiles and traditional risk factors, were evaluated. Dipeptidyl peptidase-4 (DPP4), insulin, use of sulfonylurea, high HbA1c and glucose levels were clearly statistically differentiated between the two groups, while no other biomarkers, including the new potential indicators, were found to be different. High values of thrombomodulin and homocysteine were correlated with a rise in creatinine and thus seem to affect renal function in the diabetic patients group, while in the non-diabetics group the correlations are different, with sEPCR having a relatively strong negative correlation with renal function as measured with the Modification of Diet in Renal Disease equation, in agreement with the latest international findings. Conclusions: The presence of T2 diabetes in conjunction with age clearly correlates with problems in renal function; thrombomodulin and homocysteine could serve as indicators for renal damage in diabetics but not in healthy individuals. sEPCR, on the other hand, could be a potential generic indicator for renal damage. 
Thrombomodulin and sEPCR, as prothrombotic agents, did not show any indication that they can be utilised as markers for the prevention and/or treatment of thrombotic complications in diabetic patients.
- Published
- 2016
12. Scoring and Searching over Bayesian Networks with Causal and Associative Priors
- Author
-
Giorgos Borboudakis and Tsamardinos, I.
- Subjects
FOS: Computer and information sciences, Computer Science - Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
A significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among others. In addition, a novel search-operator is proposed to take advantage of such prior knowledge. Experiments show that using path beliefs improves the learning of the skeleton, as well as the edge directions in the network. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)
- Published
- 2014
13. HEARTFAID Decision Support System Prototype
- Author
-
Colantonio S., Martinelli M., Moroni D., Salvetti O., Chiarugi F., Tsamardinos I., Candelieri A., Conforti D., Lagani V., and Gamberger D.
- Subjects
Decision Support System, Prototype
- Abstract
This deliverable summarizes the activities carried out, within Task 5.4 "Implementation of the Decision Support System", for developing the HEARTFAID CDSS Prototype, following the design that was performed in the previous years of the Project, within Task 5.3, and summarized in Deliverable D15 "Functional Specifications of Data Processing and Decision Support Services" (DEL D15, 2007). Such a design was based on a deep investigation of CHF stakeholders' needs and expectations which resulted in a detailed list of CDSS functional requirements; while an accurate analysis of the methodological and technological foundations resulted in the detailed definition of the CDSS functional specifications. The CDSS was then devised by integrating, in functionally advanced settings: (i) deductive knowledge, elicited from guidelines and medical experts; (ii) inductive knowledge, extracted by data mining techniques applied to significant sets of data; (iii) computational methods for the analysis and interpretation of diagnostic data (i.e., ECG signals and Echocardiographic images). This has required the collection of some results of the activities carried out in other Tasks or Work Packages (WP) of the project. In particular, the deductive knowledge base has been formalized in WP4 and a precise description of its content can be found in Deliverable D22 "Ontologies and Knowledge Representation" (DEL D22, 2007). Inductive models have resulted from the activity of knowledge discovery and are described more precisely in Deliverable D29. Methods and algorithms for processing and analysing diagnostic ECG and Echo images have been developed in Task 5.2 and detailed in Deliverable D30 "Models and Methods for Signals and Images Processing" (DEL D30, 2008). 
This document mainly focuses on the implementation of each component of the CDSS architecture and their integration into the system, and on the integration of the CDSS itself within the platform (integration that is realized mainly by means of the HEARTFAID Web Portal). For the implementation activity, Semantic Web Technologies (SWT) have been used as the most advanced tools for formalizing, re-using and sharing medical knowledge, and reasoning on it; while a service oriented approach has been adopted for the integration and easy access to a number of functionalities. Moreover, realistic clinical scenarios have been carefully defined in cooperation with clinical partners for assuring the realism and effectiveness of the developed functionalities of the system. In more detail, the document is organized as follows. In the first introductory chapter, a recall of the significance of decision support in the management of CHF is reported along with a brief description of the main issues to be faced when developing a decision support system. Afterwards, in the second chapter, the HEARTFAID CDSS architecture, as was defined in the previous Task 5.3 of Work Package 5, is described and motivated. Also, the integration of the system within the HEARTFAID platform is summarized for assuring a global view of the system functioning. In Chapter 3, attention is focused on the implementation details, hence each component of the CDSS architecture is considered and its implementation described in detail along with the problems faced and the solutions adopted. Particular attention is devoted to the ontological Knowledge Base, the computational reasoning methods included in the Model Base, as well as the algorithms defined and developed for processing ECG signals and Echocardiographic images. Finally, the implementation of the Meta Level of the system is described and discussed. 
In the fourth chapter, the integration within the Platform through the Middleware is discussed along with the issues faced and the technological choices made. The implementation of the showcase related to the clinical management of a HF patient is reported in Chapter 5. Two appendices close the document.
- Published
- 2008
14. Chemosensitivity Prediction of Tumours based on Expression, miRNA, and Proteomics Data
- Author
-
Tsamardinos, I., primary, Borboudakis, G., additional, Christodoulou, E. G., additional, and Røe, O. D., additional
15. HITON: A Novel Markov Blanket Algorithm for Optimal Variable Selection
- Author
-
Aliferis, C.F., Tsamardinos, I., and Statnikov, A.
- Subjects
Humans; Classification; Decision Support Systems, Clinical; Article; Algorithms; Markov Chains
- Abstract
We introduce a novel, sound, sample-efficient, and highly-scalable algorithm for variable selection for classification, regression and prediction called HITON. The algorithm works by inducing the Markov Blanket of the variable to be classified or predicted. A wide variety of biomedical tasks with different characteristics were used for an empirical evaluation. Namely, (i) bioactivity prediction for drug discovery, (ii) clinical diagnosis of arrhythmias, (iii) bibliographic text categorization, (iv) lung cancer diagnosis from gene expression array data, and (v) proteomics-based prostate cancer detection. State-of-the-art algorithms for each domain were selected for baseline comparison. Results: (1) HITON reduces the number of variables in the prediction models by three orders of magnitude relative to the original variable set while improving or maintaining accuracy. (2) HITON outperforms the baseline algorithms by selecting more than two orders-of-magnitude smaller variable sets than the baselines, in the selected tasks and datasets.
- Published
- 2003
16. Chemosensitivity Prediction of Tumours Based on Expression, miRNA, and Proteomics Data
- Author
-
Tsamardinos, I., primary, Borboudakis, G., additional, Christodoulou, E. G., additional, and Røe, O. D., additional
- Published
- 2012
17. Morphological classification of heartbeats using similarity features and a two-phase decision tree
- Author
-
Chiarugi, F., primary, Emmanouilidou, D., additional, Tsamardinos, I., additional, and Tollis, I.G., additional
- Published
- 2008
18. Predicting the occurrence of acute hypotensive episodes: The PhysioNet Challenge.
- Author
-
Chiarugi, F., Karatzanis, I., Sakkalis, V., Tsamardinos, I., Dermitzaki, T., Foukarakis, M., and Vrouchos, G.
- Published
- 2009
19. Identifying Markov blankets with decision tree induction.
- Author
-
Frey, L., Fisher, D., Tsamardinos, I., Aliferis, C.F., and Statnikov, A.
- Published
- 2003
20. A scheme for integrating e-services in establishing virtual enterprises.
- Author
-
Berfield, A., Chrysanthis, P.K., Tsamardinos, I., Pollack, M.E., and Banerjee, S.
- Published
- 2002
21. A scheme for integrating e-services in establishing virtual enterprises
- Author
-
Berfield, A., primary, Chrysanthis, P.K., additional, Tsamardinos, I., additional, Pollack, M.E., additional, and Banerjee, S., additional
22. Identifying Markov blankets with decision tree induction
- Author
-
Frey, L., primary, Fisher, D., additional, Tsamardinos, I., additional, Aliferis, C.F., additional, and Statnikov, A., additional
23. A vision and strategy for the virtual physiological human: 2012 update
- Author
-
Peter Hunter, John Skår, Peter V. Coveney, Karl A. Stroetmann, Vanessa Diaz, Ioannis Tsamardinos, Jesper Tegnér, Bernard de Bono, Marco Viceconti, Johannes H. G. M. van Beek, Miriam Mendes, S. Randall Thomas, Nour Shublaq, Patricia V. Lawford, John Fenner, Alfio Quarteroni, Ioannis G. Tollis, Peter J. Harris, Alejandro F. Frangi, Peter Kohl, Keith McCormack, Tara Chapman, Stig W. Omholt, and Rod Hose
- Subjects
Computer science; Biomedical Engineering; Biophysics; Bioengineering; Business model; Biochemistry; Biomaterials; SDG 17 - Partnerships for the Goals; Health care; Network of excellence; Innovation; business.industry; CellML; Virtual Physiological Human; Open source software; Articles; virtual physiological human, multiscale modelling, physiome, systems biology, computational physiology; Data science; Engineering management; Physiome; Information and Communications Technology; SDG 9 - Industry, Innovation, and Infrastructure; business; Biotechnology
- Abstract
European funding under Framework 7 (FP7) for the virtual physiological human (VPH) project has been in place now for 5 years. The VPH Network of Excellence (NoE) has been set up to help develop common standards, open source software, freely accessible data and model repositories, and various training and dissemination activities for the project. It is also working to coordinate the many clinically targeted projects that have been funded under the FP7 calls. An initial vision for the VPH was defined by the FP6 STEP project in 2006. In 2010, we wrote an assessment of the accomplishments of the first two years of the VPH in which we considered the biomedical science, healthcare and information and communications technology challenges facing the project (Hunter et al. 2010 Phil. Trans. R. Soc. A 368, 2595–2614 (doi:10.1098/rsta.2010.0048)). We proposed that a not-for-profit professional umbrella organization, the VPH Institute, should be established as a means of sustaining the VPH vision beyond the time-frame of the NoE. Here, we update and extend this assessment and in particular address the following issues raised in response to Hunter et al.: (i) a vision for the VPH updated in the light of progress made so far, (ii) biomedical science and healthcare challenges that the VPH initiative can address while also providing innovation opportunities for the European industry, and (iii) external changes needed in regulatory policy and business models to realize the full potential that the VPH has to offer to industry, clinics and society generally.
- Published
- 2014
24. The HUNT lung-SNP model: genetic variants plus clinical variables improve lung cancer risk assessment over clinical models.
- Author
-
Nguyen OTD, Fotopoulos I, Nøst TH, Markaki M, Lagani V, Tsamardinos I, and Røe OD
- Subjects
- Humans, Male, Female, Risk Assessment methods, Middle Aged, Prospective Studies, Aged, Norway epidemiology, Genetic Predisposition to Disease, Adult, Lung Neoplasms genetics, Lung Neoplasms epidemiology, Polymorphism, Single Nucleotide
- Abstract
Purpose: The HUNT Lung Cancer Model (HUNT LCM) predicts individualized 6-year lung cancer (LC) risk among individuals who ever smoked cigarettes with high precision based on eight clinical variables. Can the performance be improved by adding genetic information? Methods: A polygenic model was developed in the prospective Norwegian HUNT2 study with clinical and genotype data of individuals who ever smoked cigarettes (n = 30749, median follow up 15.26 years) where 160 LC were diagnosed within six years. It included the variables of the original HUNT LCM plus 22 single nucleotide polymorphisms (SNPs) highly associated with LC. External validation was performed in the prospective Norwegian Tromsø Study (n = 2663). Results: The novel HUNT Lung-SNP model significantly improved risk ranking of individuals over the HUNT LCM in both HUNT2 (p < 0.001) and Tromsø (p < 0.05) cohorts. Furthermore, detection rate (number of participants selected to detect one LC case) was significantly better for the HUNT Lung-SNP vs. HUNT LCM in both cohorts (42 vs. 48, p = 0.003 and 11 vs. 14, p = 0.025, respectively) as well as versus the NLST, NELSON and 2021 USPSTF criteria. The area under the receiver operating characteristic curve (AUC) was higher for the HUNT Lung-SNP in both cohorts, but significant only in HUNT2 (AUC 0.875 vs. 0.844, p < 0.001). However, the integrated discrimination improvement index (IDI) indicates a significant improvement of LC risk stratification by the HUNT Lung-SNP in both cohorts (IDI 0.019, p < 0.001 (HUNT2) and 0.013, p < 0.001 (Tromsø)). Conclusion: The HUNT Lung-SNP model could have a clinical impact on LC screening and has the potential to replace the HUNT LCM as well as the NLST, NELSON and 2021 USPSTF criteria in a screening setting. However, the model should be further validated in other populations and evaluated in a prospective trial setting. (© 2024. The Author(s).)
- Published
- 2024
25. Promising microRNAs in pre-diagnostic serum associated with lung cancer up to eight years before diagnosis: a HUNT study.
- Author
-
Fotopoulos I, Nguyen OTD, Nøst TH, Markaki M, Lagani V, Mjelle R, Sandanger TM, Sætrom P, Tsamardinos I, and Røe OD
- Subjects
- Humans, Female, Male, Middle Aged, Prospective Studies, Aged, Case-Control Studies, Smoking blood, Smoking adverse effects, Adult, Lung Neoplasms blood, Lung Neoplasms genetics, Lung Neoplasms diagnosis, Biomarkers, Tumor blood, Biomarkers, Tumor genetics, MicroRNAs blood, MicroRNAs genetics, Early Detection of Cancer methods
- Abstract
Introduction: Blood biomarkers for early detection of lung cancer (LC) are in demand. There are few studies of the full microRNome in serum of asymptomatic subjects that later develop LC. Here we searched for novel microRNA biomarkers in blood from non-cancer, ever-smoker populations up to eight years before diagnosis. Methods: Serum samples from 98,737 subjects from two prospective population studies, HUNT2 and HUNT3, were considered initially. Inclusion criteria for cases were: ever-smokers; no known cancer at study entrance; 0-8 years from blood sampling to LC diagnosis. Each future LC case had one control matched to sex, age at study entrance, pack-years, smoking cessation time, and similar HUNT Lung Cancer Model risk score. A total of 240 and 72 serum samples were included in the discovery (HUNT2) and validation (HUNT3) datasets, respectively, and analysed by next-generation sequencing. The validated serum microRNAs were also tested in two pre-diagnostic plasma datasets from the prospective population studies NOWAC (n = 266) and NSHDS (n = 258). A new model adding clinical variables was also developed and validated. Results: Fifteen unique microRNAs were discovered and validated in the pre-diagnostic serum datasets when all cases were contrasted against all controls, all with AUC > 0.60. In combination as a 15-microRNAs signature, the AUC reached 0.708 (discovery) and 0.703 (validation). A non-small cell lung cancer signature of six microRNAs showed AUC 0.777 (discovery) and 0.806 (validation). Combined with clinical variables of the HUNT Lung Cancer Model (age, gender, pack-years, daily cough parts of the year, hours of indoor smoke exposure, quit time in years, number of cigarettes daily, body mass index (BMI)) the AUC reached 0.790 (discovery) and 0.833 (validation). These results could not be validated in the plasma samples. Conclusion: There were a few significantly differentially expressed microRNAs in serum up to eight years before diagnosis. 
These promising microRNAs alone, in concert, or combined with clinical variables have the potential to serve as early diagnostic LC biomarkers. Plasma is not suitable for this analysis. Further validation in larger prospective serum datasets is needed. (© 2024. The Author(s).)
- Published
- 2024
26. Out-of-Sample Tuning for Causal Discovery.
- Author
-
Biza K, Tsamardinos I, and Triantafillou S
- Abstract
Causal discovery is continually being enriched with new algorithms for learning causal graphical probabilistic models. Each one of them requires a set of hyperparameters, creating a great number of combinations. Given that the true graph is unknown and the learning task is unsupervised, the challenge to a practitioner is how to tune these choices. We propose out-of-sample causal tuning (OCT) that aims to select an optimal combination. The method treats a causal model as a set of predictive models and uses out-of-sample protocols for supervised methods. This approach can handle general settings like latent confounders and nonlinear relationships. The method uses an information-theoretic approach to be able to generalize to mixed data types and a penalty for dense graphs to penalize for complexity. To evaluate OCT, we introduce a causal-based simulation method to create datasets that mimic the properties of real-world problems. We evaluate OCT against two other tuning approaches, based on stability and in-sample fitting. We show that OCT performs well in many experimental settings and it is an effective tuning method for causal discovery.
- Published
- 2024
27. Improving Lung Cancer Screening Selection: The HUNT Lung Cancer Risk Model for Ever-Smokers Versus the NELSON and 2021 United States Preventive Services Task Force Criteria in the Cohort of Norway: A Population-Based Prospective Study.
- Author
-
Nguyen OTD, Fotopoulos I, Markaki M, Tsamardinos I, Lagani V, and Røe OD
- Abstract
Background: Improving the method for selecting participants for lung cancer (LC) screening is an urgent need. Here, we compared the performance of the Helseundersøkelsen i Nord-Trøndelag (HUNT) Lung Cancer Model (HUNT LCM) versus the Dutch-Belgian lung cancer screening trial (Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON)) and 2021 United States Preventive Services Task Force (USPSTF) criteria regarding LC risk prediction and efficiency. Methods: We used linked data from 10 Norwegian prospective population-based cohorts, Cohort of Norway. The study included 44,831 ever-smokers, of which 686 (1.5%) patients developed LC; the median follow-up time was 11.6 years (0.01-20.8 years). Results: Within 6 years, 222 (0.5%) individuals developed LC. The NELSON and 2021 USPSTF criteria predicted 37.4% and 59.5% of the LC cases, respectively. By considering the same number of individuals as the NELSON and 2021 USPSTF criteria selected, the HUNT LCM increased the LC prediction rate by 41.0% and 12.1%, respectively. The HUNT LCM significantly increased sensitivity (p < 0.001 and p = 0.028), and reduced the number needed to predict one LC case (29 versus 40, p < 0.001 and 36 versus 40, p = 0.02), respectively. Applying the HUNT LCM 6-year 0.98% risk score as a cutoff (14.0% of ever-smokers) predicted 70.7% of all LC, increasing the LC prediction rate by 89.2% and 18.9% versus the NELSON and 2021 USPSTF, respectively (both p < 0.001). Conclusions: The HUNT LCM was significantly more efficient than the NELSON and 2021 USPSTF criteria, improving the prediction of LC diagnosis, and may be used as a validated clinical tool for screening selection. Competing Interests: The authors declare no conflict of interest. (© 2024 The Authors.)
- Published
- 2024
28. XPF interacts with TOP2B for R-loop processing and DNA looping on actively transcribed genes.
- Author
-
Chatzinikolaou G, Stratigi K, Siametis A, Goulielmaki E, Akalestou-Clocher A, Tsamardinos I, Topalis P, Austin C, Bouwman BAM, Crosetto N, Altmüller J, and Garinis GA
- Subjects
- CCCTC-Binding Factor genetics, CCCTC-Binding Factor metabolism, Chromosomes, DNA Repair, Chromatin, DNA-Binding Proteins genetics, DNA-Binding Proteins metabolism, R-Loop Structures
- Abstract
Co-transcriptional RNA-DNA hybrids can not only cause DNA damage threatening genome integrity but also regulate gene activity in a mechanism that remains unclear. Here, we show that the nucleotide excision repair factor XPF interacts with the insulator binding protein CTCF and the cohesin subunits SMC1A and SMC3, leading to R-loop-dependent DNA looping upon transcription activation. To facilitate R-loop processing, XPF interacts and recruits with TOP2B on active gene promoters, leading to double-strand break accumulation and the activation of a DNA damage response. Abrogation of TOP2B leads to the diminished recruitment of XPF, CTCF, and the cohesin subunits to promoters of actively transcribed genes and R-loops and the concurrent impairment of CTCF-mediated DNA looping. Together, our findings disclose an essential role for XPF with TOP2B and the CTCF/cohesin complex in R-loop processing for transcription activation with important ramifications for DNA repair-deficient syndromes associated with transcription-associated DNA damage.
- Published
- 2023
- Full Text
- View/download PDF
29. Correction: Prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study.
- Author
-
Rounis K, Makrakis D, Papadaki C, Monastirioti A, Vamvakas L, Kalbakis K, Gourlia K, Xanthopoulos I, Tsamardinos I, Mavroudis D, and Agelaki S
- Abstract
[This corrects the article DOI: 10.1371/journal.pone.0252537.]., (Copyright: © 2023 Rounis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2023
- Full Text
- View/download PDF
30. Corrigendum to "A characteristic cerebellar biosignature for bipolar disorder, identified with fully automatic machine learning" [IBRO Neurosci Rep. 15 (2023) 77-89].
- Author
-
Thomaidis GV, Papadimitriou K, Michos S, Chartampilas E, and Tsamardinos I
- Abstract
[This corrects the article DOI: 10.1016/j.ibneur.2023.06.008.]., (© 2023 The Authors.)
- Published
- 2023
- Full Text
- View/download PDF
31. Automated machine learning for genome wide association studies.
- Author
-
Lakiotaki K, Papadovasilakis Z, Lagani V, Fafalios S, Charonyktakis P, Tsagris M, and Tsamardinos I
- Subjects
- Humans, Phenotype, Computer Simulation, Machine Learning, Genome-Wide Association Study, Polymorphism, Single Nucleotide
- Abstract
Motivation: Genome-wide association studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice., Results: We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance compared to standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures., Availability and Implementation: Code for this study is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/., (© The Author(s) 2023. Published by Oxford University Press.)
- Published
- 2023
- Full Text
- View/download PDF
32. A characteristic cerebellar biosignature for bipolar disorder, identified with fully automatic machine learning.
- Author
-
Thomaidis GV, Papadimitriou K, Michos S, Chartampilas E, and Tsamardinos I
- Abstract
Background: Transcriptomic profile differences between patients with bipolar disorder and healthy controls can be identified using machine learning and can provide information about the potential role of the cerebellum in the pathogenesis of bipolar disorder. With this aim, user-friendly, fully automated machine learning algorithms can achieve extremely high classification scores and disease-related predictive biosignature identification, in short time frames and scaled down to small datasets., Method: A fully automated machine learning platform, based on the most suitable algorithm selection and relevant set of hyper-parameter values, was applied on a preprocessed transcriptomics dataset, in order to produce a model for biosignature selection and to classify subjects into groups of patients and controls. The parent GEO datasets were originally produced from the cerebellar and parietal lobe tissue of deceased bipolar patients and healthy controls, using Affymetrix Human Gene 1.0 ST Array., Results: Patients and controls were classified into two separate groups, with no close-to-the-boundary cases, and this classification was based on the cerebellar transcriptomic biosignature of 25 features (genes), with Area Under Curve 0.929 and Average Precision 0.955. The biosignature includes both genes previously connected to bipolar disorder, depression, psychosis or epilepsy, and genes not previously linked with any psychiatric disease. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed participation of 4 identified features in 6 pathways which have also been associated with bipolar disorder., Conclusion: Automated machine learning (AutoML) managed to accurately identify 25 genes that can jointly - in a multivariate fashion - separate bipolar patients from healthy controls with high predictive power. The discovered features lead to new biological insights.
Machine Learning (ML) analysis considers the features in combination (in contrast to standard differential expression analysis), removing both irrelevant and redundant markers, and thus focusing on biological interpretation., Competing Interests: None., (© 2023 The Authors.)
- Published
- 2023
- Full Text
- View/download PDF
33. Learning biologically-interpretable latent representations for gene expression data: Pathway Activity Score Learning Algorithm.
- Author
-
Karagiannaki I, Gourlia K, Lagani V, Pantazis Y, and Tsamardinos I
- Abstract
Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high dimensional data). However, lower-dimensional representations that retain the useful biological information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways (genesets in general) and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL's latent space has a fairly straightforward biological interpretation. PASL is shown to outperform in predictive performance the state-of-the-art method (PLIER) on two collections of breast cancer and leukemia gene expression datasets. PASL is also trained on a large corpus of 50000 gene expression samples to construct a universal dictionary of features across different tissues and pathologies. The dictionary is validated on 35643 held-out samples for reconstruction error. It is then applied on 165 held-out datasets spanning a diverse range of diseases. The AutoML tool JADBio is employed to show that the predictive information in the PASL-created feature space is retained after the transformation. The code is available at https://github.com/mensxmachina/PASL., Competing Interests: Conflict of interest: I.T. and V.L. are affiliated with the JADBio (Gnosis Data Analysis) company., (© The Author(s) 2022.)
- Published
- 2023
- Full Text
- View/download PDF
34. Don't lose samples to estimation.
- Author
-
Tsamardinos I
- Abstract
In a typical predictive modeling task, we are asked to produce a final predictive model to employ operationally for predictions, as well as an estimate of its out-of-sample predictive performance. Typically, analysts hold out a portion of the available data, called a Test set, to estimate the model predictive performance on unseen (out-of-sample) records, thus "losing these samples to estimation." However, this practice is unacceptable when the total sample size is low. To avoid losing data to estimation, we need a shift in our perspective: we do not estimate the performance of a specific model instance; we estimate the performance of the pipeline that produces the model. This pipeline is applied on all available samples to produce the final model; no samples are lost to estimation. An estimate of its performance is provided by training the same pipeline on subsets of the samples. When multiple pipelines are tried, additional considerations that correct for the "winner's curse" need to be in place., Competing Interests: Ioannis Tsamardinos is co-founder and CEO of JADBio Gnosis DA S.A., (© 2022 The Author.)
- Published
- 2022
- Full Text
- View/download PDF
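The idea in the abstract of record 34 (estimate the performance of the pipeline rather than of one held-out model, then train the final model on every sample) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the nearest-centroid `train_pipeline`, the synthetic one-feature data, and plain k-fold cross-validation as the resampling scheme. It is not the author's implementation, and it does not include the "winner's curse" correction that the abstract says is needed when several pipelines are compared.

```python
import random


def train_pipeline(rows):
    """Hypothetical pipeline: fit a nearest-centroid classifier on (x, y) rows."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    # The returned model predicts the label whose centroid is closest to x.
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def cross_validated_accuracy(rows, k=5, seed=0):
    """Estimate the pipeline's out-of-sample accuracy by training it on subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        # Train the same pipeline on every fold except the held-out one.
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train_pipeline(training)
        correct += sum(model(x) == y for x, y in held_out)
    return correct / len(rows)


# Synthetic two-class data with a single feature (illustrative only).
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(30)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(30)]

# The performance estimate comes from models trained on subsets of the data...
estimate = cross_validated_accuracy(data)
# ...while the final, operational model is trained on all 60 samples.
final_model = train_pipeline(data)
```

In this sketch no sample is set aside permanently: each one is used for testing exactly once and also contributes to the final model, so `estimate` describes the pipeline that produced `final_model`.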
35. A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity.
- Author
-
Bowler S, Papoutsoglou G, Karanikas A, Tsamardinos I, Corley MJ, and Ndhlovu LC
- Subjects
- Humans, DNA Methylation, Pandemics, Machine Learning, Severity of Illness Index, COVID-19 diagnosis, COVID-19 genetics
- Abstract
Since the onset of the COVID-19 pandemic, increasing cases with variable outcomes continue globally because of variants and despite vaccines and therapies. There is a need to identify at-risk individuals early that would benefit from timely medical interventions. DNA methylation provides an opportunity to identify an epigenetic signature of individuals at increased risk. We utilized machine learning to identify DNA methylation signatures of COVID-19 disease from data available through NCBI Gene Expression Omnibus. A training cohort of 460 individuals (164 COVID-19-infected and 296 non-infected) and an external validation dataset of 128 individuals (102 COVID-19-infected and 26 non-COVID-associated pneumonia) were reanalyzed. Data was processed using ChAMP and beta values were logit transformed. The JADBio AutoML platform was leveraged to identify a methylation signature associated with severe COVID-19 disease. We identified a random forest classification model from 4 unique methylation sites with the power to discern individuals with severe COVID-19 disease. The average area under the curve of receiver operator characteristic (AUC-ROC) of the model was 0.933 and the average area under the precision-recall curve (AUC-PRC) was 0.965. When applied to our external validation, this model produced an AUC-ROC of 0.898 and an AUC-PRC of 0.864. These results further our understanding of the utility of DNA methylation in COVID-19 disease pathology and serve as a platform to inform future COVID-19 related studies., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
36. Corrigendum to "A validated clinical risk prediction model for lung cancer in smokers of all ages and exposure types: A HUNT study" [EBioMedicine 31 (2018) 36-46].
- Author
-
Markaki M, Tsamardinos I, Langhammer A, Lagani V, Hveem K, and Røe OD
- Published
- 2022
- Full Text
- View/download PDF
37. Just Add Data: automated predictive modeling for knowledge discovery and feature selection.
- Author
-
Tsamardinos I, Charonyktakis P, Papoutsoglou G, Borboudakis G, Lakiotaki K, Zenklusen JC, Juhl H, Chatzaki E, and Lagani V
- Abstract
Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to predictive and diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome or phenotype of interest. It also returns a palette of useful information for interpretation, clinical use of the models, and decision making. JADBio is qualitatively and quantitatively compared against Hyper-Parameter Optimization Machine Learning libraries. Results show that in typical omics dataset analysis, JADBio manages to identify signatures comprising just a handful of features while maintaining competitive predictive performance and accurate out-of-sample performance estimation., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
38. The Essentials of Multiomics.
- Author
-
Marshall JL, Peshkin BN, Yoshino T, Vowinckel J, Danielsen HE, Melino G, Tsamardinos I, Haudenschild C, Kerr DJ, Sampaio C, Rha SY, FitzGerald KT, Holland EC, Gallagher D, Garcia-Foncillas J, and Juhl H
- Subjects
- Genomics, Humans, Medical Oncology, Proteomics, Artificial Intelligence, Neoplasms genetics, Neoplasms therapy
- Abstract
Within the last decade, the science of molecular testing has evolved from single gene and single protein analysis to broad molecular profiling as a standard of care, quickly transitioning from research to practice. Terms such as genomics, transcriptomics, proteomics, circulating omics, and artificial intelligence are now commonplace, and this rapid evolution has left us with a significant knowledge gap within the medical community. In this paper, we attempt to bridge that gap and prepare the physician in oncology for multiomics, a group of technologies that have gone from looming on the horizon to become a clinical reality. The era of multiomics is here, and we must prepare ourselves for this exciting new age of cancer medicine., (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2022
- Full Text
- View/download PDF
39. Tissue-Specific Methylation Biosignatures for Monitoring Diseases: An In Silico Approach.
- Author
-
Karaglani M, Panagopoulou M, Baltsavia I, Apalaki P, Theodosiou T, Iliopoulos I, Tsamardinos I, and Chatzaki E
- Subjects
- DNA Methylation, Epigenome, Female, Humans, Breast Neoplasms metabolism, Osteoarthritis metabolism
- Abstract
Tissue-specific gene methylation events are key to the pathogenesis of several diseases and can be utilized for diagnosis and monitoring. Here, we established an in silico pipeline to analyze high-throughput methylome datasets to identify specific methylation fingerprints in three pathological entities of major burden, i.e., breast cancer (BrCa), osteoarthritis (OA) and diabetes mellitus (DM). Differential methylation analysis was conducted to compare tissues/cells related to the pathology and different types of healthy tissues, revealing Differentially Methylated Genes (DMGs). Highly performing and low feature number biosignatures were built with automated machine learning, including: (1) a five-gene biosignature discriminating BrCa tissue from healthy tissues (AUC 0.987 and precision 0.987), (2) three equivalent OA cartilage-specific biosignatures containing four genes each (AUC 0.978 and precision 0.986) and (3) a four-gene pancreatic β-cell-specific biosignature (AUC 0.984 and precision 0.995). Next, the BrCa biosignature was validated using an independent ccfDNA dataset showing an AUC and precision of 1.000, verifying the biosignature's applicability in liquid biopsy. Functional and protein interaction prediction analysis revealed that most DMGs identified are involved in pathways known to be related to the studied diseases or pointed to new ones. Overall, our data-driven approach contributes to the maximum exploitation of high-throughput methylome readings, helping to establish specific disease profiles to be applied in clinical practice and to understand human pathology.
- Published
- 2022
- Full Text
- View/download PDF
40. The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data.
- Author
-
Tsagris M, Papadovasilakis Z, Lakiotaki K, and Tsamardinos I
- Subjects
- Case-Control Studies, Gene Expression, Logistic Models, Algorithms, Computational Biology
- Abstract
Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly-scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with, or outperforms, LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, it is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
- Published
- 2022
- Full Text
- View/download PDF
41. Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning.
- Author
-
Karaglani M, Panagopoulou M, Cheimonidi C, Tsamardinos I, Maltezos E, Papanas N, Papazoglou D, Mastorakos G, and Chatzaki E
- Abstract
Background: The need for minimally invasive biomarkers for the early diagnosis of type 2 diabetes (T2DM) prior to the clinical onset and monitoring of β-pancreatic cell loss is emerging. Here, we focused on studying circulating cell-free DNA (ccfDNA) as a liquid biopsy biomaterial for accurate diagnosis/monitoring of T2DM., Methods: ccfDNA levels were directly quantified in sera from 96 T2DM patients and 71 healthy individuals via fluorometry, and then fragment DNA size profiling was performed by capillary electrophoresis. Following this, ccfDNA methylation levels of five β-cell-related genes were measured via qPCR. Data were analyzed by automated machine learning to build classifying predictive models., Results: ccfDNA levels were found to be similar between groups but indicative of apoptosis in T2DM. INS (Insulin), IAPP (Islet Amyloid Polypeptide-Amylin), GCK (Glucokinase), and KCNJ11 (Potassium Inwardly Rectifying Channel Subfamily J member 11) levels differed significantly between groups. AutoML analysis delivered biosignatures including GCK, IAPP and KCNJ11 methylation, with the highest ever reported discriminating performance of T2DM from healthy individuals (AUC 0.927)., Conclusions: Our data unravel the value of ccfDNA as a minimally invasive biomaterial carrying important clinical information for T2DM. Upon prospective clinical evaluation, the built biosignature can be disruptive for T2DM clinical management.
- Published
- 2022
- Full Text
- View/download PDF
42. Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets.
- Author
-
Papoutsoglou G, Karaglani M, Lagani V, Thomson N, Røe OD, Tsamardinos I, and Chatzaki E
- Subjects
- Biomarkers blood, COVID-19 genetics, COVID-19 pathology, Computer Simulation, Databases, Factual, Databases, Genetic, Databases, Protein, Gene Expression Profiling, Humans, Immunity, Innate genetics, Interferon-gamma blood, Metabolomics, Prognosis, Proteomics, ROC Curve, SARS-CoV-2 genetics, Severity of Illness Index, Signal Transduction genetics, Signal Transduction immunology, Software, COVID-19 diagnosis, COVID-19 metabolism, Immunity, Innate immunology, Machine Learning, SARS-CoV-2 metabolism
- Abstract
COVID-19 outbreak brings intense pressure on healthcare systems, with an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available high throughput COVID-19 datasets, including proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed. Analysis of a combined proteomic and metabolomic dataset led to 10 equivalent signatures of two features each, with AUC 0.840 (CI 0.723-0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with AUC 0.914 (CI 0.865-0.955) in identifying COVID-19 patients from those with a different acute respiratory illness. Another transcriptomic dataset led to two equivalent signatures of nine features each, with AUC 0.967 (CI 0.899-0.996) in identifying COVID-19 patients from virus-free individuals. Signature predictive performance remained high upon validation. Multiple new features emerged and pathway analysis revealed biological relevance by implication in Viral mRNA Translation, Interferon gamma signaling and Innate Immune System pathways. In conclusion, AutoML analysis led to multiple biosignatures of high predictive performance, with reduced features and large choice of alternative predictors. These favorable characteristics are eminent for development of cost-effective assays to contribute to better disease management., (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
43. Forecasting military mental health in a complete sample of Danish military personnel deployed between 1992-2013.
- Author
-
Nissen LR, Tsamardinos I, Eskelund K, Gradus JL, Andersen SB, and Karstoft KI
- Subjects
- Afghan Campaign 2001-, Denmark epidemiology, Humans, Logistic Models, Mental Health, Risk Factors, Mental Disorders diagnosis, Mental Disorders epidemiology, Military Personnel, Stress Disorders, Post-Traumatic
- Abstract
Objective: Mental health problems (MHP) are a relatively common consequence of deployment to war zones. Early identification of those at risk of post-deployment MHP would improve prevention efforts. However, screening instruments based on linear models have not been successful. Machine learning (ML) has shown promise for providing the methodological frame for better prognostic models., Methods: The study population was all Danish military personnel deployed for the first time between January 1, 1992 and December 31, 2013. From extensive registry data, 21 pre- or at-deployment predictors comprising early adversity, social, clinical and demographic variables were used to predict psychiatric contacts (psychiatric diagnosis and/or use of psychotropic medicine) occurring within 6.5 years after homecoming. Four supervised ML methods (penalized logistic regression, random forests, support vector machines and gradient boosting machines) were compared in ability to classify those with high risk of post-deployment MHP and those without., Results: Of 27594 subjects, 2175 (8%) had a psychiatric contact. All four ML methods applied had performances well above chance (Area under the Receiver-operating Curve 0.62-0.68). Positive predictive value for the best model was 0.16. A range of pre-deployment factors were found to be predictive of post-deployment psychiatric contacts., Conclusions: ML methods can be useful in early identification of soldiers with high risk of MHP in the years following their first deployment. However, performances were modest and positive predictive values were low, limiting the applicability of the models for pre-deployment screening. Future studies should include neurobiological data and deployment experiences to increase accuracy of the models., (Copyright © 2021. Published by Elsevier B.V.)
- Published
- 2021
- Full Text
- View/download PDF
44. Prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study.
- Author
-
Rounis K, Makrakis D, Papadaki C, Monastirioti A, Vamvakas L, Kalbakis K, Gourlia K, Xanthopoulos I, Tsamardinos I, Mavroudis D, and Agelaki S
- Subjects
- Adult, Aged, Aged, 80 and over, Carcinoma, Non-Small-Cell Lung mortality, Female, Follow-Up Studies, Humans, Kaplan-Meier Estimate, Lung Neoplasms mortality, Machine Learning, Male, Middle Aged, Prognosis, Progression-Free Survival, Prospective Studies, Anti-Bacterial Agents adverse effects, B7-H1 Antigen antagonists & inhibitors, Bone Neoplasms secondary, Carcinoma, Non-Small-Cell Lung drug therapy, Carcinoma, Non-Small-Cell Lung pathology, Immune Checkpoint Inhibitors administration & dosage, Immunotherapy methods, Liver Neoplasms secondary, Lung Neoplasms drug therapy, Lung Neoplasms pathology, Programmed Cell Death 1 Receptor antagonists & inhibitors
- Abstract
Objective: We prospectively recorded clinical and laboratory parameters from patients with metastatic non-small cell lung cancer (NSCLC) treated with 2nd line PD-1/PD-L1 inhibitors in order to address their effect on treatment outcomes., Materials and Methods: Clinicopathological information (age, performance status, smoking, body mass index, histology, organs with metastases), use and duration of proton pump inhibitors, steroids and antibiotics (ATB) and laboratory values [neutrophil/lymphocyte ratio, LDH, albumin] were prospectively collected. Steroid administration was defined as the use of > 10 mg prednisone equivalent for ≥ 10 days. Prolonged ATB administration was defined as ATB ≥ 14 days 30 days before or within the first 3 months of treatment. JADBio, a machine learning pipeline, was applied for further multivariate analysis., Results: Data from 66 pts with non-oncogenic driven metastatic NSCLC were analyzed; 15.2% experienced partial response (PR), 34.8% stable disease (SD) and 50% progressive disease (PD). Median overall survival (OS) was 6.77 months. ATB administration did not affect patient OS [HR = 1.35 (CI: 0.761-2.406, p = 0.304)], however, prolonged ATBs [HR = 2.95 (CI: 1.62-5.36, p = 0.0001)] and the presence of bone metastases [HR = 1.89 (CI: 1.02-3.51, p = 0.049)] independently predicted for shorter survival. Prolonged ATB administration, bone metastases, liver metastases and BMI < 25 kg/m2 were selected by JADBio as the important features that were associated with increased probability of developing disease progression as response to treatment.
The resulting algorithm that was created was able to predict the probability of disease stabilization (PR or SD) in a single individual with an AUC = 0.806 [95% CI:0.714-0.889]., Conclusions: Our results demonstrate an adverse effect of prolonged ATBs on response and survival and underscore their importance along with the presence of bone metastases, liver metastases and low BMI in the individual prediction of outcomes in patients treated with immunotherapy., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2021
- Full Text
- View/download PDF
45. Deciphering the Methylation Landscape in Breast Cancer: Diagnostic and Prognostic Biosignatures through Automated Machine Learning.
- Author
-
Panagopoulou M, Karaglani M, Manolopoulos VG, Iliopoulos I, Tsamardinos I, and Chatzaki E
- Abstract
DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis in BrCa whole methylome datasets, analyzed using the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built: (1) A 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982-1.000)). (2) A 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921-1.000)). (3) Six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920-1.000)). Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches revealed significant clarifying knowledge for the contribution of gene methylation events in breast carcinogenesis.
- Published
- 2021
- Full Text
- View/download PDF
46. STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline.
- Author
-
Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, and Gomez-Cabrero D
- Abstract
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming it to be as generic as possible for multi-omics analysis, combining available multi-omic analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While in several studies, we have previously combined those integrative tools, here, we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both, the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (and beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package., Competing Interests: VL and IT were employed by Gnosis Data Analysis P.C., Greece.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest., (Copyright © 2021 Planell, Lagani, Sebastian-Leon, van der Kloet, Ewing, Karathanasis, Urdangarin, Arozarena, Jagodic, Tsamardinos, Tarazona, Conesa, Tegner and Gomez-Cabrero.)
- Published
- 2021
- Full Text
- View/download PDF
47. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment.
- Author
-
Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, and Truu J
- Abstract
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on a learning-to-rank approach. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2021 Marcos-Zambrano, Karaduzovic-Hadziabdic, Loncar Turukalo, Przymus, Trajkovik, Aasmets, Berland, Gruca, Hasic, Hron, Klammsteiner, Kolev, Lahti, Lopes, Moreno, Naskinova, Org, Paciência, Papoutsoglou, Shigdel, Stres, Vilne, Yousef, Zdravevski, Tsamardinos, Carrillo de Santa Pau, Claesson, Moreno-Indias and Truu.)
- Published
- 2021
- Full Text
- View/download PDF
48. Extending greedy feature selection algorithms to multiple solutions.
- Author
-
Borboudakis G and Tsamardinos I
- Abstract
Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance. (© The Author(s) 2021.)
- Published
- 2021
- Full Text
- View/download PDF
49. Accurate Blood-Based Diagnostic Biosignatures for Alzheimer's Disease via Automated Machine Learning.
- Author
-
Karaglani M, Gourlia K, Tsamardinos I, and Chatzaki E
- Abstract
Alzheimer's disease (AD) is the most common form of neurodegenerative dementia, and its timely diagnosis remains a major challenge in biomarker discovery. In the present study, we analyzed publicly available high-throughput, low-sample -omics datasets from studies in AD blood using the AutoML technology Just Add Data Bio (JADBIO) to construct accurate predictive models for use as diagnostic biosignatures. Considering data from AD patients and age- and sex-matched cognitively healthy individuals, we produced three best-performing diagnostic biosignatures specific for the presence of AD: (A) a 506-feature transcriptomic dataset from 48 AD and 22 controls led to a miRNA-based biosignature via Support Vector Machines with three miRNA predictors (AUC 0.975 (0.906, 1.000)); (B) a 38,327-feature transcriptomic dataset from 134 AD and 100 controls led to six mRNA-based statistically equivalent signatures via Classification Random Forests with 25 mRNA predictors (AUC 0.846 (0.778, 0.905)); and (C) a 9483-feature proteomic dataset from 25 AD and 37 controls led to a protein-based biosignature via Ridge Logistic Regression with seven protein predictors (AUC 0.921 (0.849, 0.972)). These performance metrics were also validated through the JADBIO pipeline, confirming stability. In conclusion, using the automated machine learning tool JADBIO, we produced accurate predictive biosignatures extrapolating available low-sample -omics data. These results offer options for minimally invasive blood-based diagnostic tests for AD, awaiting clinical validation based on respective laboratory assays. They also highlight the value of AutoML in biomarker discovery.
- Published
- 2020
- Full Text
- View/download PDF
50. Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning.
- Author
-
Karstoft KI, Tsamardinos I, Eskelund K, Andersen SB, and Nissen LR
- Abstract
Background: Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening with the aim of identifying those at risk for PTSD in the years following deployment will help deliver interventions to those in need but has so far proved unsuccessful. Objective: This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. Methods: Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). Results: Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. Conclusions: Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions. (©Karen-Inge Karstoft, Ioannis Tsamardinos, Kasper Eskelund, Søren Bo Andersen, Lars Ravnborg Nissen. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 22.07.2020.)
- Published
- 2020
- Full Text
- View/download PDF