45 results for "Hauschild, AC"
Search Results
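2. Alles besser mit KI? Ein Vergleich von ML-Methoden und klassischen Regressionsmodellen zur Vorhersage poststationärer Ereignisse [Everything better with AI? A comparison of ML methods and classical regression models for predicting post-discharge events]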
- Author
Pollmann T, Weller L, Starke P, Metsch J, Maurer M, Ritter Z, Hauschild AC, Kretzler M, and Grobe T
- Published
- 2024
3. Framework for Federated Artificial Intelligence for the Optimization of Pancreatic Cancer Treatment
- Author
Park Y, Hügel J, Beyer N, Rheinländer S, Chereda H, Fricke L, Middeke M, Reichert M, Buchholz M, Lauth M, Schneider G, Hessmann E, Beißbarth T, Sax U, and Hauschild AC
- Published
- 2023
4. Data Exploration for Cancer Patients based on Clinical and Genomic Similarities
- Author
Schneider J, Hügel J, Sax U, and Hauschild AC
- Published
- 2023
5. Analyzing a Deep Learning Model for 12-Lead ECG Classification with Explainable AI
- Author
Bender T, Beinecke J, Hauschild AC, Krefting D, and Spicher N
- Subjects
- ddc: 610, explainable artificial intelligence, Medicine and health, deep learning, atrial fibrillation, left bundle branch block, electrocardiogram
- Abstract
Introduction: Currently, an increasing number of algorithms for biosignal classification are being developed, with deep neural networks (DNNs) accounting for a significant percentage [ref:1]. Contrary to traditional signal processing methods using handcrafted features, DNNs provide a data-driven [for the full text, please go to the above-mentioned URL]
- Published
- 2022
6. Multivariate machine learning can improve detection of prostate cancer recurrence on electronic health records
- Author
Beinecke J, Anders P, Schurrat T, Heider D, Luster M, Librizzi D, and Hauschild AC
- Published
- 2022
7. Potential Applications of Transfer Learning in Limited Biomedical Data
- Author
Park Y, Hauschild AC, and Heider D
- Published
- 2022
8. CODEX: COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations.
- Author
Schrod S, Zacharias HU, Beißbarth T, Hauschild AC, and Altenbuchinger M
- Subjects
- Humans, Cell Line, Tumor, High-Throughput Screening Assays methods, Neoplasms metabolism, Computational Biology methods, Software, Antineoplastic Agents pharmacology, Deep Learning, Computer Simulation
- Abstract
Motivation: High-throughput screens (HTS) provide a powerful tool to decipher the causal effects of chemical and genetic perturbations on cancer cell lines. Their ability to evaluate a wide spectrum of interventions, from single drugs to intricate drug combinations and CRISPR-interference, has established them as an invaluable resource for the development of novel therapeutic approaches. Nevertheless, the combinatorial complexity of potential interventions makes a comprehensive exploration intractable. Hence, prioritizing interventions for further experimental investigation becomes of utmost importance. Results: We propose CODEX (COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations) as a general framework for the causal modeling of HTS data, linking perturbations to their downstream consequences. CODEX relies on a stringent causal modeling strategy based on counterfactual reasoning. As such, CODEX predicts drug-specific cellular responses, comprising cell survival and molecular alterations, and facilitates the in silico exploration of drug combinations. This is achieved for both bulk and single-cell HTS. We further show that CODEX provides a rationale to explore complex genetic modifications from CRISPR-interference in silico in single cells. Availability and Implementation: Our implementation of CODEX is publicly available at https://github.com/sschrod/CODEX. All data used in this article are publicly available. (© The Author(s) 2024. Published by Oxford University Press.)
- Published
- 2024
9. The effect of data transformation on low-dimensional integration of single-cell RNA-seq.
- Author
Park Y and Hauschild AC
- Subjects
- Humans, Algorithms, Cluster Analysis, Neural Networks, Computer, RNA-Seq methods, Single-Cell Gene Expression Analysis methods
- Abstract
Background: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. Results: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. Conclusions: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets. (© 2024. The Author(s).)
- Published
- 2024
10. CLARUS: An interactive explainable AI platform for manual counterfactuals in graph neural networks.
- Author
Metsch JM, Saranti A, Angerschmid A, Pfeifer B, Klemt V, Holzinger A, and Hauschild AC
- Subjects
- Humans, Artificial Intelligence, Neural Networks, Computer, Algorithms, Tolnaftate, Decision Support Systems, Clinical, Physicians
- Abstract
Background: Lack of trust in artificial intelligence (AI) models in medicine is still the key obstacle to the use of AI in clinical decision support systems (CDSS). Although AI models already perform excellently in systems medicine, their black-box nature means that patient-specific decisions are incomprehensible to the physician. Explainable AI (XAI) algorithms aim to "explain" to a human domain expert which input features influenced a specific recommendation. However, in the clinical domain, these explanations must lead to some degree of causal understanding by a clinician. Results: We developed the CLARUS platform, aiming to promote human understanding of graph neural network (GNN) predictions. CLARUS enables the visualisation of patient-specific networks, as well as relevance values for genes and interactions, computed by XAI methods such as GNNExplainer. This enables domain experts to gain deeper insights into the network and, more importantly, to interactively alter the patient-specific network based on the acquired understanding and initiate re-prediction or retraining. This interactivity allows us to ask manual counterfactual questions and analyse the effects on the GNN prediction. Conclusion: We present the first interactive XAI platform prototype, CLARUS, that allows not only the evaluation of specific human counterfactual questions based on user-defined alterations of patient networks and a re-prediction of the clinical outcome but also a retraining of the entire GNN after changing the underlying graph structures. The platform is currently hosted by the GWDG on https://rshiny.gwdg.de/apps/clarus/. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.)
- Published
- 2024
11. A primer on the use of machine learning to distil knowledge from data in biological psychiatry.
- Author
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, and Glatt SJ
- Subjects
- Humans, Psychiatry methods, Biomedical Research methods, Machine Learning, Biological Psychiatry methods
- Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers. (© 2023. The Author(s), under exclusive licence to Springer Nature Limited.)
- Published
- 2024
12. Species-agnostic transfer learning for cross-species transcriptomics data integration without gene orthology.
- Author
Park Y, Muttray NP, and Hauschild AC
- Subjects
- Mice, Humans, Animals, Gene Expression Profiling, Species Specificity, Machine Learning, Algorithms, Zebrafish genetics
- Abstract
Model organisms such as mice and zebrafish play a crucial role in biomedical research, as novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-ID conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar Gene Ontology (GO) terms among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, while utilizing the entire gene sets. (© The Author(s) 2024. Published by Oxford University Press.)
- Published
- 2024
13. Ensemble-GNN: federated ensemble learning with graph neural networks for disease module discovery and classification.
- Author
Pfeifer B, Chereda H, Martin R, Saranti A, Clemens S, Hauschild AC, Beißbarth T, Holzinger A, and Heider D
- Subjects
- Humans, Neural Networks, Computer, Protein Interaction Maps, Software, DNA Methylation, Machine Learning
- Abstract
Summary: Federated learning enables collaboration in medicine, where data are scattered across multiple centers, without the need to aggregate the data in a central cloud. While, in general, machine learning models can be applied to a wide range of data types, graph neural networks (GNNs) are specifically developed for graphs, which are very common in the biomedical domain. For instance, a patient can be represented by a protein-protein interaction (PPI) network where the nodes contain the patient-specific omics features. Here, we present our Ensemble-GNN software package, which can be used to deploy federated, ensemble-based GNNs in Python. Ensemble-GNN allows users to quickly build predictive models utilizing PPI networks consisting of various node features such as gene expression and/or DNA methylation. As an example, we show the results from a public dataset of 981 patients and 8469 genes from The Cancer Genome Atlas (TCGA). Availability and Implementation: The source code is available at https://github.com/pievos101/Ensemble-GNN, and the data at Zenodo (DOI: 10.5281/zenodo.8305122). (© The Author(s) 2023. Published by Oxford University Press.)
- Published
- 2023
14. The FeatureCloud Platform for Federated Learning in Biomedicine: Unified Approach.
- Author
Matschinske J, Späth J, Bakhtiari M, Probul N, Kazemi Majdabadi MM, Nasirigerdeh R, Torkzadehmahani R, Hartebrodt A, Orban BA, Fejér SJ, Zolotareva O, Das S, Baumbach L, Pauling JK, Tomašević O, Bihari B, Bloice M, Donner NC, Fdhila W, Frisch T, Hauschild AC, Heider D, Holzinger A, Hötzendorfer W, Hospes J, Kacprowski T, Kastelitz M, List M, Mayer R, Moga M, Müller H, Pustozerova A, Röttger R, Saak CC, Saranti A, Schmidt HHHW, Tschohl C, Wenke NK, and Baumbach J
- Subjects
- Humans, Health Occupations, Software, Computer Communication Networks, Privacy, Artificial Intelligence, Algorithms
- Abstract
Background: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. Implementing FL, however, is time-consuming and requires advanced programming skills and complex technical infrastructures. Objective: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (e.g., researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. Methods: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime. Results: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. Conclusions: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond. (©Julian Matschinske, Julian Späth, Mohammad Bakhtiari, Niklas Probul, Mohammad Mahdi Kazemi Majdabadi, Reza Nasirigerdeh, Reihaneh Torkzadehmahani, Anne Hartebrodt, Balazs-Attila Orban, Sándor-József Fejér, Olga Zolotareva, Supratim Das, Linda Baumbach, Josch K Pauling, Olivera Tomašević, Béla Bihari, Marcus Bloice, Nina C Donner, Walid Fdhila, Tobias Frisch, Anne-Christin Hauschild, Dominik Heider, Andreas Holzinger, Walter Hötzendorfer, Jan Hospes, Tim Kacprowski, Markus Kastelitz, Markus List, Rudolf Mayer, Mónika Moga, Heimo Müller, Anastasia Pustozerova, Richard Röttger, Christina C Saak, Anna Saranti, Harald H H W Schmidt, Christof Tschohl, Nina K Wenke, Jan Baumbach. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 12.07.2023.)
- Published
- 2023
15. Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria.
- Author
Bender T, Beinecke JM, Krefting D, Muller C, Dathe H, Seidler T, Spicher N, and Hauschild AC
- Abstract
Despite their remarkable performance, deep neural networks have not yet been adopted in clinical practice, which is considered to be partially due to their lack of explainability. In this work, we apply explainable attribution methods to a pre-trained deep neural network for abnormality classification in 12-lead electrocardiography to open this "black box" and understand the relationship between model prediction and learned features. We classify data from two public databases (CPSC 2018, PTB-XL) and the attribution methods assign a "relevance score" to each sample of the classified signals. This allows analyzing what the network learned during training, for which we propose quantitative methods: average relevance scores over a) classes, b) leads, and c) average beats. The analyses of relevance scores for atrial fibrillation and left bundle branch block compared to healthy controls show that their mean values a) increase with higher classification probability and correspond to false classifications when around zero, and b) correspond to clinical recommendations regarding which lead to consider. Furthermore, c) visible P-waves and concordant T-waves result in clearly negative relevance scores in atrial fibrillation and left bundle branch block classification, respectively. Results are similar across both databases despite differences in study population and hardware. In summary, our analysis suggests that the DNN learned features similar to cardiology textbook knowledge.
- Published
- 2023
16. MirDIP 5.2: tissue context annotation and novel microRNA curation.
- Author
Hauschild AC, Pastrello C, Ekaputeri GKA, Bethune-Waddell D, Abovsky M, Ahmed Z, Kotlyar M, Lu R, and Jurisica I
- Subjects
- Humans, Algorithms, Databases, Nucleic Acid, Epistasis, Genetic, Molecular Sequence Annotation, Data Curation, MicroRNAs genetics, MicroRNAs metabolism
- Abstract
MirDIP is a well-established database that aggregates human microRNA-gene interactions from multiple databases to increase coverage, reduce bias, and improve usability by providing an integrated score proportional to the probability of the interaction occurring. In version 5.2, we removed eight outdated resources, added a new resource (miRNATIP), and ran five prediction algorithms for miRBase and mirGeneDB. In total, mirDIP 5.2 includes 46 364 047 predictions for 27 936 genes and 2734 microRNAs, making it the first database to provide interactions using data from mirGeneDB. Moreover, we curated and integrated 32 497 novel microRNAs from 14 publications to accelerate the use of these novel data. In this release, we also extend the content and functionality of mirDIP by associating contexts with microRNAs, genes, and microRNA-gene interactions. We collected and processed microRNA and gene expression data from 20 resources and acquired information on 330 tissue and disease contexts for 2657 microRNAs, 27 576 genes and 123 651 910 gene-microRNA-tissue interactions. Finally, we improved the usability of mirDIP by enabling the user to search the database using precursor IDs, and we integrated miRAnno, a network-based tool for identifying pathways linked to specific microRNAs. We also provide a mirDIP API to facilitate access to its integrated predictions. Updated mirDIP is available at https://ophid.utoronto.ca/mirDIP. (© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2023
17. Guideline for software life cycle in health informatics.
- Author
Hauschild AC, Martin R, Holst SC, Wienbeck J, and Heider D
- Abstract
The long-lasting trend of medical informatics is to adapt novel technologies in the medical context. In particular, incorporating artificial intelligence to support clinical decision-making can significantly improve monitoring, diagnostics, and prognostics for the patient's and medic's sake. However, obstacles hinder a timely technology transfer from research to the clinic. Due to the pressure for novelty in the research context, projects rarely implement quality standards. Here, we propose a guideline for academic software life cycle processes tailored to the needs and capabilities of research organizations. While the complete implementation of a software life cycle according to commercial standards is not feasible in scientific work, we propose a subset of elements that we are convinced will provide a significant benefit while keeping the effort within a feasible range. Ultimately, the emerging quality checks for academic software development can pave the way for an accelerated deployment of academic advances in clinical practice. Competing Interests: The authors declare no competing interests. (© 2022 The Author(s).)
- Published
- 2022
18. Editorial: Computational systems biomedicine.
- Author
Batra R, Baloni P, Alcaraz N, Hauschild AC, and Cervera A
- Abstract
Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Published
- 2022
19. Federated Random Forests can improve local performance of predictive models for various healthcare applications.
- Author
Hauschild AC, Lemanczyk M, Matschinske J, Frisch T, Zolotareva O, Holzinger A, Baumbach J, and Heider D
- Subjects
- Machine Learning, Precision Medicine, Delivery of Health Care, Random Forest, Privacy
- Abstract
Motivation: Limited data access has hindered the field of precision medicine from exploring its full potential, e.g. concerning machine learning and privacy and data protection rules. Our study evaluates the efficacy of federated Random Forests (FRF) models, focusing particularly on the heterogeneity within and between datasets. We addressed three common challenges: (i) number of parties, (ii) sizes of datasets and (iii) imbalanced phenotypes, evaluated on five biomedical datasets. Results: The FRF outperformed the average local models and performed comparably to the data-centralized models trained on the entire data. With an increasing number of models and decreasing dataset size, the performance of local models decreases drastically. The performance of the FRF, however, does not decrease significantly. When combining datasets of different sizes, the FRF vastly improve compared to the average local models. We demonstrate that the FRF remain more robust and outperform the local models by analyzing different class imbalances. Our results support the conclusion that FRF overcome boundaries of clinical research and enable collaborations across institutes without violating privacy or legal regulations. Clinicians benefit from a vast collection of unbiased data aggregated from different geographic locations, demographics and other varying factors. They can build more generalizable models to make better clinical decisions, which will have relevance especially for patients in rural areas and with rare or geographically uncommon diseases, enabling personalized treatment. In combination with secure multi-party computation, federated learning has the power to revolutionize clinical practice by increasing the accuracy and robustness of healthcare AI and thus paving the way for precision medicine. Availability and Implementation: The implementation of the federated random forests can be found at https://featurecloud.ai/. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2022
20. Evaluation of machine learning strategies for imaging confirmed prostate cancer recurrence prediction on electronic health records.
- Author
Beinecke JM, Anders P, Schurrat T, Heider D, Luster M, Librizzi D, and Hauschild AC
- Abstract
Background: The main screening parameter to monitor prostate cancer recurrence (PCR) after primary treatment is the serum concentration of prostate-specific antigen (PSA). In recent years, Ga-68-PSMA PET/CT has become an important method for additional diagnostics in patients with biochemical recurrence. Purpose: While Ga-68-PSMA PET/CT performs better, it is an expensive, invasive, and time-consuming examination. Therefore, in this study, we aim to employ modern multivariate Machine Learning (ML) methods on electronic health records (EHR) of prostate cancer patients to improve the prediction of imaging confirmed PCR (IPCR). Methods: We retrospectively analyzed the clinical information of 272 patients who were examined using Ga-68-PSMA PET/CT. The PSA values ranged from 0 ng/mL to 2270.38 ng/mL, with a median of 1.79 ng/mL. We performed a descriptive analysis using Logistic Regression. Additionally, we evaluated the predictive performance of Logistic Regression, Support Vector Machine, Gradient Boosting, and Random Forest. Finally, we assessed the importance of all features using Ensemble Feature Selection (EFS). Results: The descriptive analysis found significant associations between IPCR and logarithmic PSA values as well as between IPCR and performed hormonal therapy. Our models were able to predict IPCR with an AUC score of 0.78 ± 0.13 (mean ± standard deviation) and a sensitivity of 0.997 ± 0.01. Features such as PSA, PSA doubling time, PSA velocity, hormonal therapy, radiation treatment, and injected activity show high importance for IPCR prediction using EFS. Conclusion: This study demonstrates the potential of incorporating a multitude of parameters into multivariate ML models to improve identification of non-recurring patients compared to the current focus on the main screening parameter (PSA). We showed that ML models are able to predict IPCR, detectable by Ga-68-PSMA PET/CT, and thereby pave the way for optimized early imaging and treatment. (Copyright © 2022 Elsevier Ltd. All rights reserved.)
- Published
- 2022
21. Fractal construction of constrained code words for DNA storage systems.
- Author
Löchel HF, Welzel M, Hattab G, Hauschild AC, and Heider D
- Subjects
- Computational Biology methods, DNA, Fractals
- Abstract
The use of complex biological molecules to solve computational problems is an emerging field at the interface between biology and computer science. There are two main categories in which biological molecules, especially DNA, are investigated as alternatives to silicon-based computer technologies. One is to use DNA as a storage medium, and the other is to use DNA for computing. Both strategies come with certain constraints. In the current study, we present a novel approach derived from chaos game representation for DNA to generate DNA code words that fulfill user-defined constraints, namely GC content, homopolymers, and undesired motifs, and thus, can be used to build codes for reliable DNA storage systems. (© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2022
22. Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning.
- Author
Ren Y, Chakraborty T, Doijad S, Falgenhauer L, Falgenhauer J, Goesmann A, Hauschild AC, Schwengers O, and Heider D
- Subjects
- Animals, Humans, Ciprofloxacin, Machine Learning, Genomics, Bacteria genetics, Anti-Bacterial Agents pharmacology, Drug Resistance, Bacterial genetics
- Abstract
Motivation: Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are thus urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low throughput and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on genomic data of the bacteria. However, a comparison of different machine learning methods for the prediction of AMR based on different encodings and whole-genome sequencing data without prior knowledge remains to be done. Results: In this study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) for the prediction of AMR for the antibiotics ciprofloxacin, cefotaxime, ceftazidime and gentamicin. We demonstrate that these models can effectively predict AMR with label encoding, one-hot encoding and frequency matrix chaos game representation (FCGR encoding) on whole-genome sequencing data. We trained these models on a large AMR dataset and evaluated them on an independent public dataset. Generally, RFs and CNNs perform better than LR and SVM, with AUCs up to 0.96. Furthermore, we were able to identify mutations that are associated with AMR for each antibiotic. Availability and Implementation: Source code for data preparation and model training is provided on GitHub (https://github.com/YunxiaoRen/ML-iAMR). Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2021. Published by Oxford University Press.)
- Published
- 2022
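The three encodings named in the abstract above can be sketched for a single nucleotide string. This is a minimal illustration under stated assumptions and not the code in the ML-iAMR repository. Several choices here are hypothetical: the integer labels used for label encoding, the corner assigned to each base in the FCGR square, and skipping k-mers that contain ambiguous bases.

```python
# Hypothetical integer labels; 0 marks ambiguous bases such as N.
LABELS = {"A": 1, "C": 2, "G": 3, "T": 4}
# Hypothetical (row bit, column bit) corner for each base in the chaos game square.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def label_encode(sequence: str) -> list[int]:
    """Label encoding: map each base to one integer."""
    return [LABELS.get(base, 0) for base in sequence.upper()]


def one_hot(sequence: str) -> list[list[int]]:
    """One-hot encoding: map each base to a 4-element indicator vector ordered A, C, G, T."""
    return [[int(base == letter) for letter in "ACGT"] for base in sequence.upper()]


def fcgr(sequence: str, k: int) -> list[list[int]]:
    """Frequency chaos game representation: counts of every k-mer on a 2^k x 2^k grid."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers that contain ambiguous bases
        row = col = 0
        # In the chaos game each base halves the distance to its corner, so the
        # last base of the k-mer fixes the coarsest (most significant) bit.
        for depth, base in enumerate(kmer):
            row_bit, col_bit = CORNERS[base]
            row |= row_bit << depth
            col |= col_bit << depth
        grid[row][col] += 1
    return grid
```

For example, `fcgr("ACGTTGCA", 2)` returns a 4 x 4 count matrix. A CNN can treat it like a one-channel image, and its size stays the same however long the input sequence is.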
23. Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing.
- Author
-
Park Y, Hauschild AC, and Heider D
- Abstract
Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, small sample sizes (especially in rare-disease clinical research), technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with the large-scale public datasets TCGA (The Cancer Genome Atlas) and GTEx, and demonstrate their potential as pre-training datasets in other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk-cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data., (© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)
- Published
- 2021
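The few-shot setting described above trains and evaluates on "episodes": small labelled support sets from a handful of classes, plus query examples to classify. The sketch below shows one such episode with a prototype (nearest class mean) classifier. This is a generic illustration that we swapped in; it is not the authors' meta-transfer learning model. In particular, raw feature vectors stand in for the embedding that would be learned by pre-training on TCGA/GTEx, and the toy data are hypothetical.

```python
import random
from statistics import fmean


def sample_episode(data, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from a {label: [feature vectors]} mapping."""
    classes = rng.sample(sorted(data), n_way)
    support, query = {}, []
    for label in classes:
        examples = rng.sample(data[label], k_shot + n_query)
        support[label] = examples[:k_shot]  # K labelled examples per class
        query.extend((x, label) for x in examples[k_shot:])  # held-out examples to classify
    return support, query


def prototypes(support):
    """Average the K support vectors of each class into one prototype."""
    return {label: [fmean(column) for column in zip(*vectors)] for label, vectors in support.items()}


def classify(x, protos):
    """Predict the class whose prototype is closest in squared Euclidean distance."""
    return min(protos, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, protos[label])))


# Hypothetical toy data: two well-separated classes with three 2-D samples each.
toy = {
    "class_a": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    "class_b": [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
}
support, query = sample_episode(toy, n_way=2, k_shot=2, n_query=1, rng=random.Random(0))
protos = prototypes(support)
predictions = [(classify(x, protos), label) for x, label in query]
```

In a meta-learning setup, many such episodes would be drawn to train the embedding. Only the classification step of a single episode is shown here.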
24. Fostering reproducibility, reusability, and technology transfer in health informatics.
- Author
-
Hauschild AC, Eick L, Wienbeck J, and Heider D
- Abstract
Computational methods can transform healthcare. In particular, health informatics with artificial intelligence has shown tremendous potential when applied in various fields of medical research and has opened a new era for precision medicine. The development of reusable biomedical software for research or clinical practice is time-consuming and requires rigorous compliance with quality requirements as defined by international standards. However, research projects rarely implement such measures, hindering reproducibility, reusability, and smooth technology transfer to the research community or to manufacturers. Here, we present a guideline for quality management systems (QMS) for academic organizations that incorporates the essential components while confining the requirements to an easily manageable effort. It provides a starting point for implementing, with little effort, a QMS tailored to specific needs, and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability. Ultimately, the emerging standardized workflows can pave the way for accelerated deployment in clinical practice., Competing Interests: The authors declare no competing interests., (© 2021 The Author(s).)
- Published
- 2021
25. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence.
- Author
-
Park Y, Heider D, and Hauschild AC
- Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
- Published
- 2021
26. A large-scale comparative study on peptide encodings for biomedical classification.
- Author
-
Spänig S, Mohsen S, Hattab G, Hauschild AC, and Heider D
- Abstract
Owing to the great variety of distinct peptide encodings, working on a biomedical classification task at hand is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. Because a general guideline is lacking in the literature, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that no encoding is superior across all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work enables researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards., (© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)
- Published
- 2021
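The abstract above concerns encodings that turn peptides into numerical input. Two widely used sequence-based encodings are sketched below: amino acid composition (AAC) and dipeptide composition (DPC). They were chosen here only as examples. The sketch does not reproduce the study's 48 encoding groups or their parameterizations, and it silently drops non-standard residues, which is an assumption.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(peptide: str) -> list[float]:
    """Amino acid composition: relative frequency of each standard residue (20 features)."""
    residues = [aa for aa in peptide.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def dpc(peptide: str) -> list[float]:
    """Dipeptide composition: relative frequency of each adjacent residue pair (400 features)."""
    seq = peptide.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs in which both residues are standard amino acids.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

Both functions map peptides of any length to vectors of fixed length (20 and 400 features). A standard classifier requires fixed-length input of this kind.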
27. Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.
- Author
-
Hufsky F, Lamkiewicz K, Almeida A, Aouacheria A, Arighi C, Bateman A, Baumbach J, Beerenwinkel N, Brandt C, Cacciabue M, Chuguransky S, Drechsel O, Finn RD, Fritz A, Fuchs S, Hattab G, Hauschild AC, Heider D, Hoffmann M, Hölzer M, Hoops S, Kaderali L, Kalvari I, von Kleist M, Kmiecinski R, Kühnert D, Lasso G, Libin P, List M, Löchel HF, Martin MJ, Martin R, Matschinske J, McHardy AC, Mendes P, Mistry J, Navratil V, Nawrocki EP, O'Toole ÁN, Ontiveros-Palacios N, Petrov AI, Rangel-Pineros G, Redaschi N, Reimering S, Reinert K, Reyes A, Richardson L, Robertson DL, Sadegh S, Singer JB, Theys K, Upton C, Welzel M, Williams L, and Marz M
- Subjects
- Biomedical Research, COVID-19 epidemiology, COVID-19 virology, Genome, Viral, Humans, Pandemics, SARS-CoV-2 genetics, COVID-19 prevention & control, Computational Biology, SARS-CoV-2 isolation & purification
- Abstract
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2021
28. Genome-wide analysis suggests the importance of vascular processes and neuroinflammation in late-life antidepressant response.
- Author
-
Marshe VS, Maciukiewicz M, Hauschild AC, Islam F, Qin L, Tiwari AK, Sibille E, Blumberger DM, Karp JF, Flint AJ, Turecki G, Lam RW, Milev RV, Frey BN, Rotzinger S, Foster JA, Kennedy SH, Kennedy JL, Mulsant BH, Reynolds CF 3rd, Lenze EJ, and Müller DJ
- Subjects
- Aged, Antidepressive Agents therapeutic use, Humans, Ion Channels, Male, Multifactorial Inheritance, Venlafaxine Hydrochloride therapeutic use, Depressive Disorder, Major drug therapy, Depressive Disorder, Major genetics, Genome-Wide Association Study
- Abstract
Antidepressant outcomes in older adults with depression are poor, possibly because of comorbidities such as cerebrovascular disease. Therefore, we leveraged multiple genome-wide approaches to understand the genetic architecture of antidepressant response. Our sample included 307 older adults (≥60 years) with current major depression, treated with venlafaxine extended-release for 12 weeks. A standard genome-wide association study (GWAS) was conducted for post-treatment remission status, followed by in silico biological characterization of associated genes, as well as polygenic risk scoring for depression, neurodegenerative and cerebrovascular disease. The top-associated variants for remission status and percentage symptom improvement were PIEZO1 rs12597726 (OR = 0.33 [0.21, 0.51], p = 1.42 × 10⁻⁶) and intergenic rs6916777 (Beta = 14.03 [8.47, 19.59], p = 1.25 × 10⁻⁶), respectively. Pathway analysis revealed significant contributions from genes involved in the ubiquitin-proteasome system, which regulates intracellular protein degradation and has implications for inflammation as well as atherosclerotic cardiovascular disease (n = 25 of 190 genes, p = 8.03 × 10⁻⁶, FDR-corrected p = 0.01). Given the polygenicity of complex outcomes such as antidepressant response, we also explored 11 polygenic risk scores associated with risk for Alzheimer's disease and stroke. Of the 11 scores, risk for cardioembolic stroke was the second-best predictor of non-remission, after male sex (Accuracy = 0.70 [0.59, 0.79], Sensitivity = 0.72, Specificity = 0.67; p = 2.45 × 10⁻⁴). Although our findings did not reach genome-wide significance, they point to previously implicated mechanisms and support the roles of vascular and inflammatory pathways in late-life depression (LLD). Overall, significant enrichment of genes involved in protein degradation pathways that may be impaired, as well as the predictive capacity of risk for cardioembolic stroke, support a link between late-life depression remission and risk for vascular dysfunction.
- Published
- 2021
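The abstract above does not say how its 11 polygenic risk scores were built. A common construction is an additive score: each variant's GWAS effect size (log odds ratio) multiplied by the individual's effect-allele dosage (0, 1 or 2), summed over variants. The sketch below shows that construction with a simple p-value threshold and no LD clumping. Skipping genotypes that are missing is an assumption. It is not presented as the study's procedure, and the variant IDs in the usage are hypothetical.

```python
import math


def beta_from_odds_ratio(odds_ratio: float) -> float:
    """Convert a GWAS odds ratio into a log-odds effect size (weight)."""
    return math.log(odds_ratio)


def polygenic_risk_score(dosages, weights, pvalues=None, p_threshold=1.0):
    """Additive PRS: sum of weight x effect-allele dosage (0, 1 or 2) over variants.

    Variants missing from `dosages` are skipped; variants whose GWAS p-value exceeds
    `p_threshold` are excluded (a simple thresholding step, without LD clumping).
    """
    score = 0.0
    for variant, weight in weights.items():
        if variant not in dosages:
            continue
        if pvalues is not None and pvalues[variant] > p_threshold:
            continue
        score += weight * dosages[variant]
    return score
```

As a usage example with hypothetical variants, `polygenic_risk_score({"rs1": 2, "rs2": 1}, {"rs1": beta_from_odds_ratio(1.5), "rs2": -0.2})` adds two copies of the log-odds weight for `rs1` to one copy of the weight for `rs2`.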
29. Interleukin-6 Gene Expression Changes after a 4-Week Intake of a Multispecies Probiotic in Major Depressive Disorder-Preliminary Results of the PROVIT Study.
- Author
-
Reiter A, Bengesser SA, Hauschild AC, Birkl-Töglhofer AM, Fellendorf FT, Platzer M, Färber T, Seidl M, Mendel LM, Unterweger R, Lenger M, Mörkl S, Dalkner N, Birner A, Queissner R, Hamm C, Maget A, Pilz R, Kohlhammer-Dohr A, Wagner-Skacel J, Kreuzer K, Schöggl H, Amberger-Otti D, Lahousen T, Leitner-Afschar B, Haybäck J, Kapfhammer HP, and Reininghaus E
- Subjects
- Adult, Affect drug effects, Austria, Cognition drug effects, Depressive Disorder, Major psychology, Female, Gastrointestinal Microbiome drug effects, Gene Expression genetics, Humans, Male, Depressive Disorder, Major blood, Depressive Disorder, Major genetics, Gene Expression drug effects, Interleukin-6 blood, Interleukin-6 genetics, Probiotics pharmacology
- Abstract
Major depressive disorder (MDD) is a prevalent disease in which one third of sufferers do not respond to antidepressants. Probiotics have the potential to be well-tolerated and cost-efficient treatment options. However, the molecular pathways of their effects have not yet been fully elucidated. Based on previous literature, we assume that probiotics can positively influence inflammatory mechanisms. We aimed to analyze the effects of probiotics on the expression of inflammation-related genes as part of the randomized, placebo-controlled, multispecies probiotics PROVIT study in Graz, Austria. Fasting blood of 61 inpatients with MDD was collected before and after four weeks of probiotic or placebo intake. We analyzed the effects on gene expression of tumor necrosis factor (TNF), nuclear factor kappa B subunit 1 (NFKB1) and interleukin-6 (IL-6). For IL-6, we found no significant main effect of group (F(1,44) = 1.33, p = ns) or time (F(1,44) = 0.00, p = ns), but the group × time interaction was significant (F(1,44) = 5.67, p < 0.05). The intervention group showed decreasing IL-6 gene expression levels, whereas the placebo group showed increasing IL-6 gene expression levels. Probiotics could be a useful additional treatment in MDD due to their anti-inflammatory effects. Results of the current study are promising, but further studies are required to investigate the beneficial effects of probiotic interventions in depressed individuals.
- Published
- 2020
30. CORDITE: The Curated CORona Drug InTERactions Database for SARS-CoV-2.
- Author
-
Martin R, Löchel HF, Welzel M, Hattab G, Hauschild AC, and Heider D
- Abstract
Since the outbreak in 2019, researchers have been trying to find effective drugs against the SARS-CoV-2 virus through de novo drug design and drug repurposing. The former approach is very time-consuming and requires extensive testing in humans, whereas drug repurposing is more promising, as the drugs have already been tested for side effects, etc. At present, there is no clinically effective treatment for COVID-19, but there is a huge amount of data from studies that analyze potential drugs. We developed CORDITE to efficiently combine state-of-the-art knowledge on potential drugs and make it accessible to scientists and clinicians. The web interface also provides access to an easy-to-use API that enables wide use by other software and applications, e.g., for meta-analysis, design of new clinical studies, or simple literature search. CORDITE is currently empowering many scientists across all continents and accelerating research in the knowledge domains of virology and drug design., Competing Interests: Declaration of Interests The authors declare no competing interests., (Copyright © 2020 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2020
31. Urinary proteomics links keratan sulfate degradation and lysosomal enzymes to early type 1 diabetes.
- Author
-
Van JAD, Clotet-Freixas S, Hauschild AC, Batruch I, Jurisica I, Elia Y, Mahmud FH, Sochett E, Diamandis EP, Scholey JW, and Konvalinka A
- Subjects
- Adolescent, Adult, Child, Diabetes Mellitus, Type 1 genetics, Diabetes Mellitus, Type 1 metabolism, Diabetes Mellitus, Type 1 pathology, Extracellular Matrix Proteins urine, Female, Humans, Keratan Sulfate genetics, Kidney metabolism, Kidney pathology, Lysosomes metabolism, Lysosomes pathology, Male, Mass Spectrometry, Proteinuria metabolism, Proteinuria urine, Proteome genetics, Proteome metabolism, Young Adult, Diabetes Mellitus, Type 1 urine, Keratan Sulfate metabolism, Proteinuria genetics, Proteomics
- Abstract
Diabetes is the leading cause of end-stage renal disease worldwide. Our understanding of the early kidney response to chronic hyperglycemia remains incomplete. To address this, we first investigated the urinary proteomes of otherwise healthy youths with and without type 1 diabetes and subsequently examined the enriched pathways that might be dysregulated in early disease using systems biology approaches. This cross-sectional study included two separate cohorts for the discovery (N = 30) and internal validation (N = 30) of differentially excreted proteins. Discovery proteomics was performed on a Q Exactive Plus hybrid quadrupole-orbitrap mass spectrometer. We then searched the pathDIP, KEGG, and Reactome databases to identify enriched pathways in early diabetes; the Integrated Interactions Database to retrieve protein-protein interaction data; and the PubMed database to compare fold changes of our signature proteins with those published in similarly designed studies. Proteins were selected for internal validation based on pathway enrichment and availability of commercial enzyme-linked immunosorbent assay kits. Of the 2451 proteins identified, 576 were quantified in all samples from the discovery cohort; 34 comprised the urinary signature for early diabetes after Benjamini-Hochberg adjustment (Q < 0.05). The top pathways associated with this signature included lysosome, glycosaminoglycan degradation, and innate immune system (Q < 0.01). Notably, all enzymes involved in keratan sulfate degradation were significantly elevated in urines from youths with diabetes (|fold change| > 1.6). Increased urinary excretion of monocyte differentiation antigen CD14, hexosaminidase A, and lumican was also observed in the validation cohort (P < 0.05). Twenty-one proteins from our signature have been reported elsewhere as potential mediators of early diabetes. 
In this study, we identified a urinary proteomic signature for early type 1 diabetes, of which lysosomal enzymes were major constituents. Our findings highlight novel pathways such as keratan sulfate degradation in the early kidney response to hyperglycemia., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2020
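The abstract above selects its 34-protein signature after Benjamini-Hochberg adjustment (Q < 0.05). The standard BH step-up procedure is sketched below. Each sorted p-value is scaled by m / rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone. The p-values in the usage are hypothetical, and the sketch does not reproduce the study's fold-change filter.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest; the running minimum keeps q monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Indices whose BH-adjusted p-value is below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < alpha]
```

With hypothetical p-values `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. The 0.03 entry is raised to 0.04 because of the monotonicity step.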
32. GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.
- Author
-
Maciukiewicz M, Marshe VS, Hauschild AC, Foster JA, Rotzinger S, Kennedy JL, Kennedy SH, Müller DJ, and Geraci J
- Subjects
- Adult, Duloxetine Hydrochloride administration & dosage, Female, Humans, Male, Middle Aged, Outcome Assessment, Health Care standards, Polymorphism, Single Nucleotide, Prognosis, Sensitivity and Specificity, Serotonin and Noradrenaline Reuptake Inhibitors administration & dosage, Depressive Disorder, Major drug therapy, Depressive Disorder, Major genetics, Duloxetine Hydrochloride pharmacology, Genome-Wide Association Study, Outcome Assessment, Health Care methods, Serotonin and Noradrenaline Reuptake Inhibitors pharmacology, Support Vector Machine
- Abstract
Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients who received treatment with duloxetine for up to 8 weeks were categorized as "responders" based on a MADRS change >50% from baseline, or as "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regard to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, while CRT achieved 0.51 for all three metrics. The best-performing SVM fold was characterized by an accuracy = 0.66 (p = .071), a sensitivity = 0.70 and a specificity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction., (Copyright © 2017. Published by Elsevier Ltd.)
- Published
- 2018
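The outcome definitions and evaluation metrics in the abstract above can be made concrete. In this sketch, "MADRS change >50% from baseline" is read as a relative reduction of more than half, and the positive class (e.g., remission) is passed as `True`. Both are assumptions. The genome-wide logistic regression, LASSO and CRT/SVM steps of the study are not shown.

```python
def is_responder(baseline: float, endpoint: float) -> bool:
    """Responder: MADRS score reduced by more than 50% relative to baseline."""
    return (baseline - endpoint) / baseline > 0.5


def is_remitter(endpoint: float) -> bool:
    """Remitter: MADRS score of 10 or lower at end point."""
    return endpoint <= 10


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    nan = float("nan")  # returned when a metric's denominator is zero
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }
```

A model can show "promising sensitivity" and only moderate specificity at the same time, as the abstract reports. The two rates are computed from different subsets of patients: true remitters for sensitivity and true non-remitters for specificity.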
33. mirDIP 4.1-integrative database of human microRNA target predictions.
- Author
-
Tokar T, Pastrello C, Rossos AEM, Abovsky M, Hauschild AC, Tsay M, Lu R, and Jurisica I
- Subjects
- Humans, RNA, Messenger chemistry, Databases, Genetic, MicroRNAs metabolism, RNA, Messenger metabolism
- Abstract
MicroRNAs are important regulators of gene expression, acting by binding to the messenger RNA transcripts of the genes they regulate. Even with modern high-throughput technologies, it is laborious and expensive to detect all possible microRNA targets. For this reason, several computational microRNA-target prediction tools have been developed, each with its own strengths and limitations. Integration of different tools has been a successful approach to minimize the shortcomings of individual databases. Here, we present mirDIP v4.1, providing nearly 152 million human microRNA-target predictions collected across 30 different resources. We also introduce an integrative score, which was statistically inferred from the obtained predictions and assigned to each unique microRNA-target interaction to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not accumulate prediction bias toward biological processes or pathways. mirDIP v4.1 is freely available at http://ophid.utoronto.ca/mirDIP/., (© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2018
34. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis.
- Author
-
Barbosa E, Röttger R, Hauschild AC, de Castro Soares S, Böcker S, Azevedo V, and Baumbach J
- Subjects
- Conserved Sequence genetics, Evolution, Molecular, Machine Learning, Acclimatization genetics, Bacteria genetics, Genome, Bacterial genetics, Genomic Islands genetics, Genomics methods
- Abstract
Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features that might influence bacterial adaptation to a specific niche, we introduce LifeStyle-Specific-Islands (LiSSI). LiSSI combines evolutionary sequence analysis with statistical learning (Random Forest with feature selection, model tuning and robustness analysis). In summary, our strategy aims to identify conserved consecutive homology sequences (islands) in genomes and to identify the most discriminant islands for each lifestyle.
- Published
- 2017
35. Carotta: Revealing Hidden Confounder Markers in Metabolic Breath Profiles.
- Author
-
Hauschild AC, Frisch T, Baumbach JI, and Baumbach J
- Abstract
Computational breath analysis is a growing research area aiming at identifying volatile organic compounds (VOCs) in human breath to assist medical diagnostics of the next generation. While inexpensive and non-invasive bioanalytical technologies for metabolite detection in exhaled air and bacterial/fungal vapor exist and the first studies on the power of supervised machine learning methods for profiling of the resulting data were conducted, we lack methods to extract hidden data features emerging from confounding factors. Here, we present Carotta, a new cluster analysis framework dedicated to uncovering such hidden substructures by sophisticated unsupervised statistical learning methods. We study the power of transitivity clustering and hierarchical clustering to identify groups of VOCs with similar expression behavior over most patient breath samples and/or groups of patients with a similar VOC intensity pattern. This enables the discovery of dependencies between metabolites. On the one hand, this allows us to eliminate the effect of potential confounding factors hindering disease classification, such as smoking. On the other hand, we may also identify VOCs associated with disease subtypes or concomitant diseases. Carotta is an open source software with an intuitive graphical user interface promoting data handling, analysis and visualization. The back-end is designed to be modular, allowing for easy extensions with plugins in the future, such as new clustering methods and statistics. It does not require much prior knowledge or technical skills to operate. We demonstrate its power and applicability by means of one artificial dataset. We also apply Carotta exemplarily to a real-world example dataset on chronic obstructive pulmonary disease (COPD). 
While the artificial data are utilized as a proof of concept, we will demonstrate how Carotta finds candidate markers in our real dataset associated with confounders rather than the primary disease (COPD) and bronchial carcinoma (BC). Carotta is publicly available at http://carotta.compbio.sdu.dk [1].
- Published
- 2015
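The abstract above uses clustering to find groups of VOCs that behave similarly across patient samples. A minimal sketch of one of the two methods it names, hierarchical clustering, is given below. It uses single linkage cut at a fixed height, with 1 − Pearson r as the distance between two VOC intensity profiles. The linkage, distance and cut height are hypothetical choices, transitivity clustering is not shown, and this is not Carotta's implementation. VOC names in the usage are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def single_linkage_clusters(profiles, max_distance):
    """Flat clusters from a single-linkage dendrogram cut at max_distance.

    The distance between two VOCs is 1 - Pearson r of their intensities across
    samples. Cutting single linkage at a height equals taking connected components
    of the graph that links every pair within that distance, found here with union-find.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if 1 - pearson(profiles[a], profiles[b]) <= max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

A cluster of VOCs found this way is only a candidate. Whether it reflects a confounder such as smoking, rather than the disease, still has to be checked against patient metadata.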
36. Volatile organic compounds during inflammation and sepsis in rats: a potential breath test using ion-mobility spectrometry.
- Author
-
Fink T, Wolf A, Maurer F, Albrecht FW, Heim N, Wolf B, Hauschild AC, Bödeker B, Baumbach JI, Volk T, Sessler DI, and Kreuer S
- Subjects
- Animals, Disease Models, Animal, Exhalation, Inflammation diagnosis, Ions, Male, Rats, Rats, Sprague-Dawley, Sepsis diagnosis, Shock, Hemorrhagic metabolism, Breath Tests methods, Inflammation metabolism, Sepsis metabolism, Spectrum Analysis methods, Volatile Organic Compounds metabolism
- Abstract
Background: Multicapillary column ion-mobility spectrometry (MCC-IMS) may identify volatile components in exhaled gas. The authors therefore used MCC-IMS to evaluate exhaled gas in a rat model of sepsis, inflammation, and hemorrhagic shock., Methods: Male Sprague-Dawley rats were anesthetized and ventilated via tracheostomy for 10 h or until death. Sepsis was induced by cecal ligation and incision in 10 rats; a sham operation was performed in 10 others. In 10 other rats, endotoxemia was induced by intravenous administration of 10 mg/kg lipopolysaccharide. In a final 10 rats, hemorrhagic shock was induced to a mean arterial pressure of 35 ± 5 mmHg. Exhaled gas was analyzed with MCC-IMS, and volatile compounds were identified using the BS-MCC/IMS-analytes database (Version 1209; B&S Analytik, Dortmund, Germany)., Results: All sham animals survived the observation period, whereas mean survival time was 7.9 h in the septic animals, 9.1 h in endotoxemic animals, and 2.5 h in animals with hemorrhagic shock. Volatile compounds showed statistically significant differences in septic and endotoxemic rats compared with sham rats for 3-pentanone and acetone. Endotoxemic rats differed significantly from sham rats for 1-propanol, butanal, acetophenone, 1,2-butanediol, and 2-hexanone. Statistically significant differences were observed between septic and endotoxemic rats for butanal, 3-pentanone, and 2-hexanone. In the rats with hemorrhagic shock, 2-hexanone differed from all other groups., Conclusions: Exhaled organic compounds differed significantly among septic, endotoxemic (inflammation), and sham rats. MCC-IMS of exhaled breath deserves additional study as a noninvasive approach for distinguishing sepsis from inflammation.
- Published
- 2015
37. On the limits of computational functional genomics for bacterial lifestyle prediction.
- Author
-
Barbosa E, Röttger R, Hauschild AC, Azevedo V, and Baumbach J
- Subjects
- Humans, Computational Biology methods, Genome, Bacterial genetics, Genomics methods
- Abstract
We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogenic bacteria (NPs). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum and exclusively human pathogens cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) OPs have no intrinsic genomic characteristic compared with pathogens, as small mutations are likely to play a more dominant role in surviving the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs also cannot be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might as well target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited., (© The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2014
38. Classification of breast cancer subtypes by combining gene expression and DNA methylation data.
- Author
-
List M, Hauschild AC, Tan Q, Kruse TA, Mollenhauer J, Baumbach J, and Batra R
- Subjects
- Algorithms, Artificial Intelligence, Breast Neoplasms genetics, Breast Neoplasms metabolism, Computational Biology methods, Epigenesis, Genetic, Female, Gene Expression, Humans, Oligonucleotide Array Sequence Analysis methods, Prognosis, Reproducibility of Results, Software, Breast Neoplasms classification, DNA Methylation, Gene Expression Profiling methods, Gene Expression Regulation, Neoplastic
- Abstract
Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a greatly improved classification model, since the resulting model will reflect differences not only on the transcriptomic but also on the epigenetic level. We compared random-forest-derived classification models based on gene expression and methylation data alone to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification errors of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.
- Published
- 2014
39. Current breathomics--a review on data pre-processing techniques and machine learning in metabolomics breath analysis
- Author
-
Smolinska A, Hauschild AC, Fijten RR, Dallinga JW, Baumbach J, and van Schooten FJ
- Subjects
- Breath Tests instrumentation, Humans, Multivariate Analysis, Reference Standards, Artificial Intelligence, Breath Tests methods, Electronic Data Processing, Metabolomics
- Abstract
We define breathomics as the metabolomics study of exhaled air. It is a strongly emerging metabolomics research field that mainly focuses on health-related volatile organic compounds (VOCs). Since the amount of these compounds varies with health status, breathomics holds great promise for delivering non-invasive diagnostic tools. Thus, the main aim of breathomics is to find patterns of VOCs related to abnormal (for instance, inflammatory) metabolic processes occurring in the human body. Recently, analytical methods for measuring VOCs in exhaled air with high resolution and high throughput have been extensively developed. Yet the application of machine learning methods for fingerprinting VOC profiles in breathomics is still in its infancy. Therefore, in this paper, we describe the current state of the art in data pre-processing and multivariate analysis of breathomics data. We start with detailed pre-processing pipelines for breathomics data obtained from gas chromatography-mass spectrometry and from an ion-mobility spectrometer coupled to multi-capillary columns. The outcome of data pre-processing is a matrix containing the relative abundances of a set of VOCs for a group of patients under different conditions (e.g. disease stage, treatment). Independently of the analytical method used, the most important question, 'which VOCs are discriminatory?', remains the same. Answers can be given by several modern machine learning techniques (multivariate statistics), which are therefore the focus of this paper. We demonstrate the advantages as well as the drawbacks of such techniques. We aim to help the community understand how to profit from a particular method. In parallel, we hope to make the community aware of existing data fusion methods, which have not yet been explored in breathomics.
- Published
- 2014
40. On the importance of statistics in breath analysis--hope or curse?
- Author
-
Eckel SP, Baumbach J, and Hauschild AC
- Subjects
- Exhalation physiology, Humans, Nitric Oxide analysis, Principal Component Analysis, Breath Tests methods, Statistics as Topic
- Published
- 2014
41. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches
- Author
-
Hauschild AC, Kopczynski D, D'Addario M, Baumbach JI, Rahmann S, and Baumbach J
- Abstract
Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established, inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, several steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, for instance healthy or not. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work, we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compared the performance of the four peak calling methods with respect to two distinct criteria. First, we utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that classification accuracy is almost equally good for all methods: the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using PME as the peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.
- Published
- 2013
42. An integrative clinical database and diagnostics platform for biomarker identification and analysis in ion mobility spectra of human exhaled air
- Author
-
Schneider T, Hauschild AC, Baumbach JI, and Baumbach J
- Subjects
- Decision Trees, Humans, Ions, Software, Time Factors, Air analysis, Biomarkers analysis, Breath Tests methods, Databases as Topic, Exhalation, Spectrum Analysis methods
- Abstract
Over the last decade, the evaluation of odors and vapors in human breath has gained more and more attention, particularly in the diagnostics of pulmonary diseases. Ion mobility spectrometry coupled with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in air. It is a comparatively inexpensive, non-invasive, high-throughput method that is able to handle the moisture that comes with human exhaled air and allows VOCs to be characterized at very low concentrations. To identify discriminating compounds as biomarkers, it is necessary to have a clear understanding of the detailed composition of human breath. Therefore, in addition to clinical studies, there is a need for a flexible and comprehensive centralized data repository capable of gathering all kinds of related information. Moreover, there is a demand for automated data integration and semi-automated data analysis, in particular given the rapid data accumulation arising from the high-throughput nature of the MCC/IMS technology. Here, we present a comprehensive database application and analysis platform that combines metabolic maps with heterogeneous biomedical data in a well-structured manner. The design of the database is based on a hybrid of the entity-attribute-value (EAV) model and the EAV-CR model, which incorporates the concepts of classes and relationships. Additionally, it offers an intuitive user interface that provides easy and quick access to the platform’s functionality: automated data integration and integrity validation, versioning and roll-back strategy, data retrieval, as well as semi-automatic data mining and machine learning capabilities. The platform will support MCC/IMS-based biomarker identification and validation. The software, schemata, data sets and further information are publicly available at http://imsdb.mpi-inf.mpg.de.
- Published
- 2013
43. Computational methods for metabolomic data analysis of ion mobility spectrometry data--reviewing the state of the art
- Author
-
Hauschild AC, Schneider T, Pauling J, Rupp K, Jang M, Baumbach JI, and Baumbach J
- Abstract
Ion mobility spectrometry combined with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs). MCC/IMS can be used, for example, to scan human exhaled air, bacterial colonies or cell lines, thereby gaining information about human health status or infection threats. We may further study the metabolic response of living cells to external perturbations. The instrument is comparatively cheap, robust and easy to use in everyday practice. However, the potential of the MCC/IMS methodology depends on the successful application of computational approaches for analyzing the huge amount of emerging data. Here, we review the state of the art and highlight existing challenges. First, we address methods for raw data handling, data storage and visualization. Afterwards, we introduce de-noising, peak picking and other pre-processing approaches. We discuss statistical methods for analyzing correlations between peaks and diseases or medical treatment. Finally, we study up-to-date machine learning techniques for identifying robust biomarker molecules that allow patients to be classified into healthy and diseased groups. We conclude that MCC/IMS coupled with sophisticated computational methods has the potential to successfully address a broad range of biomedical questions. While most of the data pre-processing steps can be solved satisfactorily, some computational challenges in statistical learning and model validation remain.
- Published
- 2012
44. Integrated statistical learning of metabolic ion mobility spectrometry profiles for pulmonary disease identification
- Author
-
Hauschild AC, Baumbach JI, and Baumbach J
- Subjects
- Bronchial Neoplasms complications, Bronchial Neoplasms metabolism, Case-Control Studies, Humans, Ions, Pulmonary Disease, Chronic Obstructive classification, Pulmonary Disease, Chronic Obstructive complications, Support Vector Machine, Mass Spectrometry methods, Metabolome, Models, Statistical, Pulmonary Disease, Chronic Obstructive diagnosis, Pulmonary Disease, Chronic Obstructive metabolism
- Abstract
Exhaled air carries information on human health status. Ion mobility spectrometry combined with a multi-capillary column (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in human breath. This technique is relatively inexpensive, robust and easy to use in everyday practice. However, the potential of this methodology depends on the successful application of computational approaches for finding relevant VOCs and for classifying patients into disease-specific profile groups based on the detected VOCs. We developed an integrated state-of-the-art system using sophisticated statistical learning techniques for VOC-based feature selection and supervised classification into patient groups. We analyzed breath data from 84 volunteers, each suffering either from chronic obstructive pulmonary disease (COPD) or from both COPD and bronchial carcinoma (COPD + BC), as well as from 35 healthy volunteers forming a control group (CG). We standardized and integrated several statistical learning methods to provide a broad overview of their potential for distinguishing the patient groups. We found strong potential for separating the MCC/IMS chromatograms of healthy controls and COPD patients (best accuracy, COPD vs CG: 94%). However, the impact of bronchial carcinoma on COPD/no-COPD classification performance requires further examination (best accuracy, CG vs COPD vs COPD + BC: 79%). We also extracted 20 high-scoring VOCs that allowed COPD patients to be differentiated from healthy controls. We conclude that these statistical learning methods generally achieve high accuracy when applied to well-structured medical MCC/IMS data.
- Published
- 2012
45. Robust modelling, measurement and analysis of human and animal metabolic systems
- Author
-
van Beek JH, Hauschild AC, Hettling H, and Binsl TW
- Subjects
- Animals, Humans, Myocardium metabolism, Phosphates metabolism, Metabolism, Models, Biological
- Abstract
Modelling human and animal metabolism is impeded by the lack of accurate quantitative parameters and the large number of biochemical reactions. This problem may be tackled by: (i) study of modules of the network independently; (ii) ensemble simulations to explore many plausible parameter combinations; (iii) analysis of 'sloppy' parameter behaviour, revealing interdependent parameter combinations with little influence; (iv) multiscale analysis that combines molecular and whole network data; and (v) measuring metabolic flux (rate of flow) in vivo via stable isotope labelling. For the latter method, carbon transition networks were modelled with systems of ordinary differential equations, but we show that coloured Petri nets provide a more intuitive graphical approach. Analysis of parameter sensitivities shows that only a few parameter combinations have a large effect on predictions. Model analysis of high-energy phosphate transport indicates that membrane permeability, inaccurately known at the organellar level, can be well determined from whole-organ responses. Ensemble simulations that take into account the imprecision of measured molecular parameters contradict the popular hypothesis that high-energy phosphate transport in heart muscle is mostly by phosphocreatine. Combining modular, multiscale, ensemble and sloppy modelling approaches with in vivo flux measurements may prove indispensable for the modelling of the large human metabolic system.
- Published
- 2009
Discovery Service for Jio Institute Digital Library