41 results for "Berthold, MR"
Search Results
2. Explorative Data Analysis: from machine learning to discovery support systems
- Author
-
Berthold MR
- Subjects
Chemistry, QD1-999
- Published
- 2009
- Full Text
- View/download PDF
3. Conditional density estimation using fuzzy GARCH models
- Author
-
Almeida, Rui Jorge, Basturk, Nalan, Uzay Kaymak, Da Costa Sousa, Joao Miguel, Kruse, R., Berthold, MR, Moewes, C., Gil, Ma, Grzegorzewski, P., Hryniewicz, O., Econometrics, and Information Systems IE&IS
- Subjects
Computer science, Simple (abstract algebra), Skewness, Autoregressive conditional heteroskedasticity, Econometrics, Density estimation, Time series, Conditional variance, Fuzzy logic, Algorithm, Interpretation (model theory)
- Abstract
Time series data exhibit complex behavior, including non-linearity and path dependency. This paper proposes a flexible fuzzy GARCH model that can capture different properties of the data, such as skewness, fat tails and multimodality, within a single model. Furthermore, the linguistic interpretation of the proposed model provides additional information and a simple understanding of the underlying process. The model's performance is illustrated using two simulated data examples.
- Published
- 2013
- Full Text
- View/download PDF
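For context on the fuzzy GARCH entry above: the paper generalises the classical GARCH(1,1) recursion, sigma2_t = w + a * eps_{t-1}^2 + b * sigma2_{t-1}. The sketch below shows only that non-fuzzy baseline, with purely illustrative parameter values; the fuzzy-rule extension described in the abstract is not reproduced here.

```python
import random

def garch_simulate(n, w=0.1, a=0.1, b=0.8, seed=0):
    """Simulate n returns from a plain GARCH(1,1) process:
    sigma2_t = w + a * eps_{t-1}**2 + b * sigma2_{t-1}."""
    rng = random.Random(seed)
    sigma2 = w / (1.0 - a - b)       # start at the unconditional variance
    eps, path = 0.0, []
    for _ in range(n):
        sigma2 = w + a * eps ** 2 + b * sigma2
        eps = rng.gauss(0.0, sigma2 ** 0.5)
        path.append(eps)
    return path
```

With a + b close to 1, the simulated returns show the volatility clustering and fat tails that the fuzzy model is designed to capture.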
4. Regional spatial analysis combining fuzzy clustering and non-parametric correlation
- Author
-
Tutmez, Bulent, Uzay Kaymak, Kruse, R., Berthold, MR, Moewes, C., Gil, Ma, Grzegorzewski, P., Hryniewicz, O., and Information Systems IE&IS
- Subjects
Fuzzy clustering, business.industry, Correlation clustering, Nonparametric statistics, FLAME clustering, Pattern recognition, Artificial intelligence, business, Projection (set theory), Cluster analysis, Spatial analysis, Rank correlation
- Abstract
This study considers regional analysis from a limited number of data points, an important practical problem in disciplines such as the geosciences and environmental science, for evaluating spatial data. Fuzzy clustering is combined with non-parametric statistical analysis: the partitioning performance of fuzzy clustering on different types of spatial systems is examined, and from this a regional projection approach is constructed. The results show that the combination produces reliable results and also opens possibilities for future work.
- Published
- 2013
- Full Text
- View/download PDF
5. Diagnostic findings in stapes revision surgery-a retrospective of 26 years.
- Author
-
Schimanski G, Schimanski E, and Berthold MR
- Published
- 2011
- Full Text
- View/download PDF
6. Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams
- Author
-
Laura Isabel Galindez Olascoaga, Nimish Shah, Guy Van den Broeck, Marian Verhelst, Wannes Meert, Berthold, MR, Feelders, A, and Krempl, G
- Subjects
Structure (mathematical logic), Relation (database), business.industry, Computer science, Probabilistic logic, 02 engineering and technology, Missing data, Machine learning, computer.software_genre, Class (biology), Discriminative model, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Probability distribution, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
- Abstract
Methods that learn the structure of Probabilistic Sentential Decision Diagrams (PSDDs) from data have achieved state-of-the-art performance in tractable learning tasks. These methods learn PSDDs incrementally by optimizing the likelihood of the induced probability distribution given the available data, and are thus robust against missing values, a relevant trait for addressing the challenges of embedded applications, such as failing sensors and resource constraints. However, PSDDs are outperformed by discriminatively trained models in classification tasks. In this work, we introduce D-LearnPSDD, a learner that improves the classification performance of the LearnPSDD algorithm by introducing a discriminative bias that encodes the conditional relation between the class and feature variables. ispartof: pages:184-196 ispartof: Advances in Intelligent Data Analysis XVIII vol:12080 pages:184-196 ispartof: International Symposium on Intelligent Data Analysis location:Konstanz, Germany (took place online due to pandemic) date:27 Apr - 29 Apr 2020 status: Published online
- Published
- 2020
- Full Text
- View/download PDF
7. Gibbs Sampling Subjectively Interesting Tiles
- Author
-
Tijl De Bie, Jefrey Lijffijt, Anes Bendimerad, Céline Robardet, Marc Plantevit, Berthold, MR, Feelders, A, Krempl, G, Data Mining and Machine Learning (DM2L), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Internet Technology and Data Science Lab (IDLab), and Universiteit Antwerpen [Antwerpen]-Universiteit Gent = Ghent University [Belgium] (UGENT)
- Subjects
Technology and Engineering, Computer science, 02 engineering and technology, KNOWLEDGE DISCOVERY, computer.software_genre, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Interpretation (model theory), Local pattern, Set (abstract data type), symbols.namesake, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Gibbs sampling, Pattern mining, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Pattern sampling, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Subjective interestingness, business.industry, Trawling, Usability, Mathematics and Statistics, Large set (Ramsey theory), symbols, 020201 artificial intelligence & image processing, Data mining, Computational problem, business, computer
- Abstract
The local pattern mining literature has long struggled with the so-called pattern explosion problem: the size of the set of patterns found exceeds the size of the original data. This causes computational problems (enumerating a large set of patterns inevitably takes a substantial amount of time) as well as problems of interpretation and usability (trawling through a large set of patterns is often impractical). Two complementary research lines aim to address this problem. The first develops better measures of interestingness, in order to reduce the number of uninteresting patterns that are returned [6, 10]. The second avoids an exhaustive enumeration of all 'interesting' patterns (where interestingness is quantified in a more traditional way, e.g. frequency) by directly sampling from this set such that more 'interesting' patterns are sampled with higher probability [2]. Unfortunately, the first research line does not reduce computational cost, while the second may miss the most interesting patterns. In this paper, we combine the best of both worlds for mining interesting tiles [8] from binary databases. Specifically, we propose a new pattern sampling approach based on Gibbs sampling, where the probability of sampling a pattern is proportional to its subjective interestingness [6], an interestingness measure reported to better represent true interestingness. The experimental evaluation confirms the theory, but also reveals an important weakness of the proposed approach, which we speculate is shared with any other pattern sampling approach. We thus conclude with a broader discussion of this issue and a forward look.
- Published
- 2020
- Full Text
- View/download PDF
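The core idea in the entry above, sampling patterns with probability proportional to an interestingness measure, can be illustrated with a generic Gibbs sampler over a binary column-membership vector. This is an illustrative sketch only: the `interestingness` function is a stand-in, not the paper's subjective-interestingness measure, and real tiles involve rows as well as columns.

```python
import random, math

def gibbs_sample(n_cols, interestingness, sweeps=200, seed=0):
    """Draw one binary pattern x with P(x) proportional to
    exp(interestingness(x)), by resampling each coordinate in turn
    from its exact conditional distribution (Gibbs sampling)."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            x[j] = True
            p1 = math.exp(interestingness(x))   # unnormalised weight, x_j = 1
            x[j] = False
            p0 = math.exp(interestingness(x))   # unnormalised weight, x_j = 0
            x[j] = rng.random() < p1 / (p0 + p1)
    return x
```

Unlike exhaustive enumeration, the sampler touches only one coordinate's conditional per step, which is what makes the approach attractive for large pattern spaces.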
8. Conjunction, Disjunction and Iterated Conditioning of Conditional Events
- Author
-
Giuseppe Sanfilippo, Angelo Gilio, Kruse, R, Berthold, MR, Moewes, C, Gil, MA, Grzegorzewski, P, Hryniewicz, O, Gilio, A, and Sanfilippo, G
- Subjects
Theoretical computer science, Settore MAT/06 - Probabilita' E Statistica Matematica, Computer science, Probabilistic logic, Coherence (philosophical gambling strategy), Conditional events, conditional random quantities, conjunction, disjunction, iterated conditionals, Conjunction (grammar), Set (abstract data type), Regular conditional probability, disjunction, conditional events, conjunction, conditional random quantities, iterated conditionals, Iterated function, Representation (mathematics), Settore SECS-S/01 - Statistica, Mathematical economics, Event (probability theory)
- Abstract
Starting from a recent paper by S. Kaufmann, we introduce a notion of conjunction of two conditional events and then we analyze it in the setting of coherence. We give a representation of the conjoined conditional and we show that this new object is a conditional random quantity, whose set of possible values normally contains the probabilities assessed for the two conditional events. We examine some cases of logical dependencies, where the conjunction is a conditional event; moreover, we give the lower and upper bounds on the conjunction. We also examine an apparent paradox concerning stochastic independence which can actually be explained in terms of uncorrelation. We briefly introduce the notions of disjunction and iterated conditioning and we show that the usual probabilistic properties still hold.
- Published
- 2013
9. InfraWatch: Data management of large systems for monitoring infrastructural performance
- Author
-
Knobbe, A., Hendrik Blockeel, Koopman, A., Calders, T., Obladen, B., Bosnia, C., Galenkamp, H., Koenders, E., Kok, J., Cohen, P. R., Adams, N. M., Berthold, M. R., Cohen, PR, Adams, NM, and Berthold, MR
- Subjects
Computer science, business.industry, Data management, data analysis, data mining, Public domain, Computer security, computer.software_genre, Bridge (nautical), Construction engineering, Visualization, Weather station, business, computer, Wireless sensor network
- Abstract
This paper introduces a new project, InfraWatch, that demonstrates the many challenges that a large, complex data analysis application poses in terms of data capture, management, analysis and reporting. The project is concerned with the intelligent monitoring and analysis of large infrastructural projects in the public domain, such as public roads, highways, tunnels and bridges. As a demonstrator, the project includes the detailed measurement of traffic and weather load on one of the largest highway bridges in the Netherlands. As part of a recent renovation and reinforcement effort, the bridge has been equipped with a substantial sensor network, which has been producing large amounts of sensor data for more than a year. The bridge is currently equipped with a multitude of vibration and stress sensors, a video camera and a weather station. We propose this bridge as a challenging environment for intelligent data analysis research. In this paper we outline the reasons for monitoring infrastructural assets through sensors and the scientific challenges in, for example, data management and analysis, and we present a visualization tool for the data coming from the bridge. We think that the bridge can serve as a means to promote research and education in intelligent data analysis. ispartof: pages:91-102 ispartof: Lecture Notes in Computer Science vol:6065 pages:91-102 ispartof: Intelligent Data Analysis (IDA) location:Tucson, Arizona, USA date:19 May - 21 May 2010 status: published
- Published
- 2010
10. Empirical asymmetric selective transfer in multi-objective decision trees
- Author
-
Hendrik Blockeel, Beau Piccart, Jan Struyf, Boulicaut, JF, Berthold, MR, and Horvath, T
- Subjects
Computer science, business.industry, Decision tree learning, Decision tree, Multi-task learning, computer.software_genre, Machine learning, Variable (computer science), Inductive transfer, Selective transfer, Data mining, Artificial intelligence, machine learning, inductive transfer, decision tree, business, Cluster analysis, computer
- Abstract
We consider learning tasks where multiple target variables need to be predicted. Two approaches have been used in this setting: (a) build a separate single-target model for each target variable, and (b) build a multi-target model that predicts all targets simultaneously; the latter may exploit potential dependencies among the targets. For a given target, either (a) or (b) can yield the most accurate model. This shows that exploiting information available in other targets may be beneficial as well as detrimental to accuracy. This raises the question whether it is possible to find, for a given target (we call this the main target), the best subset of the other targets (the support targets) that, when combined with the main target in a multi-target model, results in the most accurate model for the main target. We propose Empirical Asymmetric Selective Transfer (EAST), a generally applicable algorithm that approximates such a subset. Applied to decision trees, EAST outperforms single-target decision trees, multi-target decision trees, and multi-target decision trees with target clustering. ispartof: pages:64-75 ispartof: Lecture Notes in Computer Science vol:5255 pages:64-75 ispartof: International conference on Discovery Science location:Budapest date:13 Oct - 16 Oct 2008 status: published
- Published
- 2008
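The EAST idea in the entry above, finding the subset of support targets that most helps a given main target, can be approximated by a greedy forward search. This is an illustrative sketch, not the paper's exact procedure: `train_eval(main, supports)` is a placeholder for fitting a multi-target model and returning a cross-validated error estimate for the main target.

```python
def east(main, candidates, train_eval):
    """Greedily grow the support-target set as long as adding one more
    candidate target lowers the main target's estimated error."""
    supports = set()
    best_err = train_eval(main, supports)      # single-target baseline
    improved = True
    while improved and candidates - supports:
        improved = False
        for t in candidates - supports:
            err = train_eval(main, supports | {t})
            if err < best_err:
                best_err, best_t, improved = err, t, True
        if improved:
            supports.add(best_t)
    return supports, best_err
```

The search is asymmetric by construction: targets that help the main target are kept, while targets that hurt it (negative transfer) are simply never added.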
11. Ensemble-trees: Leveraging ensemble power inside decision trees
- Author
-
Albrecht Zimmermann, Boulicaut, JF, Berthold, MR, and Horvath, T
- Subjects
Statistical classification, Computer science, business.industry, Decision tree, Stability (learning theory), Artificial intelligence, Variance (accounting), Data mining, business, Machine learning, computer.software_genre, computer, Power (physics)
- Abstract
Decision trees are among the most effective and interpretable classification algorithms, while ensemble techniques have been proven to alleviate problems of over-fitting and variance. On the other hand, decision trees tend to be unstable under small changes in the data, whereas an ensemble of trees is challenging to interpret. We propose the technique of Ensemble-Trees, which uses ensembles of rules within the test nodes to reduce over-fitting and variance effects. Validating the technique experimentally, we find that it improves performance compared to ensembles of pruned trees, but also that it does less to reduce structural instability than could be expected. ispartof: pages:76-87 ispartof: Lecture Notes in Computer Science vol:5255 pages:76-87 ispartof: Discovery Science location:Budapest date:13 Oct - 16 Oct 2008 status: published
- Published
- 2008
12. A comparison between neural network methods for learning aggregate functions
- Author
-
Hendrik Blockeel, Werner Uwents, Boulicaut, JF, Berthold, MR, and Horvath, T
- Subjects
Tensor product network, Artificial neural network, Neural Networks, business.industry, Computer science, Time delay neural network, Competitive learning, Deep learning, Feed forward, Machine learning, computer.software_genre, Catastrophic interference, Machine Learning, Probabilistic neural network, Recurrent neural network, Artificial intelligence, Types of artificial neural networks, Stochastic neural network, business, computer, Nervous system network models
- Abstract
In various application domains, data can be represented as bags of vectors instead of single vectors. Learning aggregate functions from such bags is a challenging problem. In this paper, a number of simple neural network approaches and a combined approach based on cascade-correlation are examined for handling this kind of data. Adapted feedforward networks, recurrent networks and networks with special aggregation units integrated in the network can all be used to construct networks capable of learning aggregate functions. A combination of these three approaches is possible by using cascade-correlation, creating a method that automatically chooses the best of these options. Results on artificial and multi-instance data sets are reported, allowing a comparison between the different approaches. ispartof: pages:88-99 ispartof: Lecture notes in computer science vol:5255 pages:88-99 ispartof: International Conference on Discovery Science location:Budapest date:13 Oct - 16 Oct 2008 status: published
- Published
- 2008
13. Active learning for high throughput screening
- Author
-
Jan Ramon, Kurt De Grave, Luc De Raedt, Boulicaut, JF, Berthold, MR, and Horvath, T
- Subjects
Optimization, Active learning (machine learning), business.industry, Computer science, QSAR, media_common.quotation_subject, Active Learning, Machine learning, computer.software_genre, Task (project management), symbols.namesake, Chemical compounds, Active learning, symbols, Artificial intelligence, HPC-KUL, Function (engineering), business, Set (psychology), Gaussian process, computer, media_common
- Abstract
An important task in many scientific and engineering disciplines is to set up experiments with the goal of finding the best instances (substances, compositions, designs) as evaluated on an unknown target function using limited resources. We study this problem using machine learning principles, and introduce the novel task of active k-optimization. The problem consists of approximating the k best instances with regard to an unknown function; the learner is active, that is, it can present a limited number of instances to an oracle to obtain their target values. We also develop an algorithm based on Gaussian processes for tackling active k-optimization, and evaluate it on a challenging set of tasks related to structure-activity relationship prediction. This paper received the Carl Smith Student Award. ispartof: pages:185-196 ispartof: Lecture Notes in Computer Science vol:5255 pages:185-196 ispartof: Discovery Science location:Budapest, Hungary date:13 Oct - 16 Oct 2008 status: published
- Published
- 2008
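The active k-optimization setting in the entry above can be sketched with a Gaussian-process surrogate and an upper-confidence acquisition rule. This is a minimal illustration, not the paper's algorithm: the RBF kernel width, noise level, UCB trade-off `beta`, and endpoint seeding are all assumptions made for the example.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(xq, x, y, noise=1e-4):
    """GP posterior mean and variance at query points xq, given data (x, y)."""
    k = rbf(x, x) + noise * np.eye(len(x))
    ks = rbf(xq, x)
    mu = ks @ np.linalg.solve(k, y)
    var = np.diag(rbf(xq, xq) - ks @ np.linalg.solve(k, ks.T))
    return mu, np.maximum(var, 0.0)

def active_k_opt(f, pool, budget, k=3, beta=2.0):
    """Spend `budget` oracle queries, then return the k best queried instances."""
    queried = [0, len(pool) - 1]              # seed with the two endpoints
    for _ in range(budget - 2):
        x, y = pool[queried], f(pool[queried])
        mu, var = gp_posterior(pool, x, y)
        ucb = mu + beta * np.sqrt(var)
        ucb[queried] = -np.inf                # never re-query an instance
        queried.append(int(np.argmax(ucb)))
    y = f(pool[queried])
    top = np.argsort(y)[-k:]                  # indices of the k best queried
    return pool[queried][top]
```

The acquisition balances exploiting the current GP mean against exploring high-variance regions, which is what lets a small query budget home in on the top-k instances.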
14. An Efficiently Computable Graph-Based Metric for the Classification of Small Molecules
- Author
-
Hendrik Blockeel, Jan Ramon, Leander Schietgat, Maurice Bruynooghe, Boulicaut, JF, Berthold, MR, and Horvath, T
- Subjects
Discrete mathematics, Matching (graph theory), Computer science, Subgraph isomorphism problem, chemoinformatics, Hamiltonian path, Graph, Metric dimension, symbols.namesake, machine learning, Chordal graph, Outerplanar graph, Partial k-tree, symbols, Induced subgraph isomorphism problem, Graph operations, Time complexity, MathematicsofComputing_DISCRETEMATHEMATICS
- Abstract
In machine learning, there has been increased interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for the representation of molecules, many operations on graphs are NP-complete. Representing the molecules as outerplanar graphs, a subclass of general graphs, and using the block-and-bridge preserving subgraph isomorphism, we define a metric and present an algorithm for computing it in polynomial time. We evaluate this metric, and more generally the block-and-bridge preserving matching operator, on a large dataset of molecules, obtaining favorable results. ispartof: pages:197-209 ispartof: Lecture Notes in Computer Science vol:5255 pages:197-209 ispartof: International Conference on Discovery Science location:Budapest, Hungary date:13 Oct - 16 Oct 2008 status: published
- Published
- 2008
- Full Text
- View/download PDF
15. Traffic sign recognition using discriminative local features
- Author
-
Andrzej Ruta, Yongmin Li, Xiaohui Liu, Berthold, MR, Shawe-Taylor, J, and Lavrač, N
- Subjects
Discriminative model, business.industry, Traffic sign recognition, Word error rate, Pattern recognition, Feature selection, Artificial intelligence, AdaBoost, Representation (mathematics), business, Distance transform, Mathematics, Sign (mathematics)
- Abstract
Real-time road sign recognition has been of great interest for many years. The problem is often addressed in a two-stage procedure involving detection and classification. In this paper a novel approach to sign representation and classification is proposed. Many previous studies focused on deriving a set of discriminative features from a large amount of training data using global feature selection techniques, e.g. Principal Component Analysis or AdaBoost. In our method we have chosen a simple yet robust image representation built on top of the Colour Distance Transform (CDT). Based on this representation, we introduce a feature selection algorithm which captures a variable-size set of local image regions ensuring maximum dissimilarity between each individual sign and all other signs. Experiments have shown that the discriminative local features extracted from the template sign images enable simple minimum-distance classification with an error rate not exceeding 7%.
- Published
- 2007
16. Improved robustness in time series analysis of gene expression data by polynomial model based clustering
- Author
-
Christine A. Orengo, Nigel Martin, Allan Tucker, Xiaohui Liu, Stephen Swift, Michael Hirsch, Paul Kellam, Berthold, M.R., Glen, R.C., Fischer, I., Berthold, MR, Glen, R, and Fischer, I
- Subjects
Clustering high-dimensional data, Fuzzy clustering, Data stream clustering, csis, CURE data clustering algorithm, Computer science, Correlation clustering, Canopy clustering algorithm, Data mining, Missing data, computer.software_genre, Cluster analysis, computer
- Abstract
Microarray experiments produce large data sets that often contain noise and considerable missing data. Typical clustering methods, such as hierarchical clustering or partitional algorithms, can be adversely affected by such data. This paper introduces a method to overcome these problems by modelling the time series data with polynomials and using these models to cluster the data. Similarity measures for polynomials are given that comply with commonly used standard measures. The polynomial model based clustering is compared with standard clustering methods under different conditions and applied to a real gene expression data set. It shows significantly better results as noise and missing data increase.
- Published
- 2006
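The polynomial modelling step in the entry above can be sketched in a few lines: fit a low-degree polynomial to each time series, skipping missing values, and cluster the coefficient vectors instead of the raw noisy samples. The degree, the NaN encoding of missing data, and the choice of coefficient space as the clustering input are illustrative assumptions, not the paper's exact similarity measure.

```python
import numpy as np

def poly_features(series, t, degree=2):
    """Fit one polynomial per time series, ignoring missing (NaN) samples,
    and return the coefficient vectors; a standard clusterer (e.g. k-means)
    can then run on these coefficients instead of the raw measurements."""
    feats = []
    for y in series:
        mask = ~np.isnan(y)                       # drop missing observations
        feats.append(np.polyfit(t[mask], y[mask], degree))
    return np.array(feats)
```

Because the fit is a least-squares smoother, moderate noise and a few missing time points perturb the coefficients far less than they perturb point-wise distances between raw series.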
17. Biochemical Pathway Analysis via Signature Mining
- Author
-
Stephen Swift, Eleftherios Panteris, Annette M. Payne, Xiaohui Lui, and Berthold, MR
- Subjects
Microarray, Microarray analysis techniques, Computer science, Computational biology, Pathway analysis, computer.software_genre, Signature (logic), Metabolic pathway, Simulated annealing, Gene expression, Data mining, DNA microarray, Gene, Hill climbing, computer, Curse of dimensionality
- Abstract
Biology has been revolutionised by microarrays, and bioinformatics is now a powerful tool in the hands of biologists. Gene expression analysis has been at the centre of attention over the last few years, mostly in the form of algorithms that explore cluster relationships and dynamic interactions between gene variables, and programs that display the multidimensional microarray data in formats that make biological sense. In this paper we propose a simple yet effective approach to biochemical pathway analysis based on biological knowledge. The approach, built on the concept of a signature and on heuristic search methods such as hill climbing and simulated annealing, selects a subset of genes for each pathway that fully describes the behaviour of the pathway under a given experimental condition, in a bid to reduce the dimensionality of microarray data and make the analysis more biologically relevant.
- Published
- 2005
- Full Text
- View/download PDF
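The hill-climbing search mentioned in the entry above can be sketched as a subset search with single-gene flips. This is a generic illustration: the `score` function is a stand-in for the paper's pathway-signature quality measure, and the starting subset size is an arbitrary choice.

```python
import random

def hill_climb_signature(genes, score, iters=2000, seed=0):
    """Greedy hill climbing over gene subsets: repeatedly flip one gene in or
    out of the current subset and keep the change whenever the signature
    score improves."""
    rng = random.Random(seed)
    best = frozenset(rng.sample(genes, max(1, len(genes) // 4)))
    best_s = score(best)
    for _ in range(iters):
        g = rng.choice(genes)
        neighbour = best ^ {g}            # flip gene g in or out of the subset
        if neighbour and score(neighbour) > best_s:
            best, best_s = neighbour, score(neighbour)
    return best, best_s
```

Simulated annealing, also cited in the abstract, differs only in occasionally accepting score-decreasing flips to escape local optima.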
18. Widening for MDL-based Retail Signature Discovery
- Author
-
Matthijs van Leeuwen, Clément Gautrais, Peggy Cellier, Alexandre Termier, Catholic University of Leuven - Katholieke Universiteit Leuven (KU Leuven), Semantics, Logics, Information Systems for Data-User Interaction ( SemLIS), GESTION DES DONNÉES ET DE LA CONNAISSANCE (IRISA-D7), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes 1 (UR1), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Leiden Institute of Advanced Computer Science [Leiden] (LIACS), Universiteit Leiden [Leiden], Large Scale Collaborative Data Mining (LACODAM), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-GESTION DES DONNÉES ET DE LA CONNAISSANCE (IRISA-D7), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Berthold, MR, Feelders, A, 
Krempl, G, Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Universiteit Leiden, and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique)
- Subjects
Data stream, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Computer science, Minimum description length, 02 engineering and technology, Space (commercial competition), computer.software_genre, Signature (logic), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Repetitive behavior, Set (abstract data type), 020204 information systems, Encoding (memory), Signature discovery, 0202 electrical engineering, electronic engineering, information engineering, Beam search, 020201 artificial intelligence & image processing, Data mining, Widening, computer, ComputingMilieux_MISCELLANEOUS
- Abstract
Signature patterns have been introduced to model repetitive behavior, e.g., customers repeatedly buying the same set of products in consecutive time periods. A disadvantage of existing approaches to signature discovery, however, is that the required number of occurrences of a signature needs to be chosen manually. To address this limitation, we formalize the problem of selecting the best signature using the minimum description length (MDL) principle. To this end, we propose an encoding for signature models and for any data stream given such a signature model. As finding the MDL-optimal solution is infeasible, we propose a novel algorithm that is an instance of widening, i.e., a diversified beam search that heuristically explores promising parts of the search space. Finally, we demonstrate the effectiveness of the problem formalization and the algorithm on a real-world retail dataset, and show that our approach yields relevant signatures.
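Widening, as described in the entry above, is a beam search whose beam-selection step is diversified. The skeleton below is a hedged generic sketch: `refine` and `score` are placeholders for the paper's signature refinement operator and MDL score (lower is better), and the default `select` policy reduces to plain beam search, with the diversification strategy left as a pluggable parameter.

```python
def widened_search(start, refine, score, width=3, depth=5, select=None):
    """Beam search skeleton for widening: expand every state in the beam,
    choose the next beam with `select` (widening's diversity policy), and
    keep the best-scoring state seen so far."""
    select = select or (lambda cands: sorted(cands, key=score)[:width])
    beam, best = [start], start
    for _ in range(depth):
        candidates = {c for s in beam for c in refine(s)}
        if not candidates:
            break
        beam = select(candidates)
        best = min(beam + [best], key=score)
    return best
```

A diversity-aware `select`, e.g. keeping at most one candidate per region of the model space, is what distinguishes widening from the greedy top-`width` default shown here.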
19. A DNA Polymerase Variant Senses the Epigenetic Marker 5-Methylcytosine by Increased Misincorporation.
- Author
-
Henkel M, Fillbrunn A, Marchand V, Raghunathan G, Berthold MR, Motorin Y, and Marx A
- Subjects
- Epigenesis, Genetic, Thermus enzymology, Humans, DNA metabolism, DNA chemistry, 5-Methylcytosine metabolism, 5-Methylcytosine chemistry, DNA-Directed DNA Polymerase metabolism, DNA-Directed DNA Polymerase chemistry, DNA Methylation
- Abstract
Dysregulation of DNA methylation is associated with human disease, particularly cancer, and the assessment of aberrant methylation patterns holds great promise for clinical diagnostics. However, DNA polymerases do not effectively discriminate between processing 5-methylcytosine (5 mC) and unmethylated cytosine, resulting in the silencing of methylation information during amplification or sequencing. As a result, current detection methods require multi-step DNA conversion treatments or careful analysis of sequencing data to decipher individual 5 mC bases. To overcome these challenges, we propose a novel DNA polymerase-mediated 5 mC detection approach. Here, we describe the engineering of a thermostable DNA polymerase variant derived from Thermus aquaticus with altered fidelity towards 5 mC. Using a screening-based evolutionary approach, we have identified a DNA polymerase that exhibits increased misincorporation towards 5 mC during DNA synthesis. This DNA polymerase generates mutation signatures at methylated CpG sites, allowing direct detection of 5 mC by reading an increased error rate after sequencing without prior treatment of the sample DNA., (© 2024 The Authors. Angewandte Chemie International Edition published by Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
20. SciJava Ops: an improved algorithms framework for Fiji and beyond.
- Author
-
Selzer GJ, Rueden CT, Hiner MC, Evans EL, Kolb D, Wiedenmann M, Birkhold C, Buchholz TO, Helfrich S, Northan B, Walter A, Schindelin J, Pietzsch T, Saalfeld S, Berthold MR, and Eliceiri KW
- Abstract
Decades of iteration on scientific imaging hardware and software have yielded an explosion not only in the size, complexity, and heterogeneity of image datasets but also in the tooling used to analyze this data. This wealth of image analysis tools, spanning different programming languages, frameworks, and data structures, is itself a problem for data analysts who must adapt to new technologies and integrate established routines to solve increasingly complex problems. While many "bridge" layers exist to unify pairs of popular tools, there exists a need for a general solution to unify new and existing toolkits. The SciJava Ops library presented here addresses this need through two novel principles. Algorithm implementations are declared as plugins called Ops, providing a uniform interface regardless of the toolkit they came from. Users express their needs declaratively to the Op environment, which can then find and adapt available Ops on demand. By using these principles instead of direct function calls, users can write streamlined workflows while avoiding the translation boilerplate of bridge layers. Developers can easily extend SciJava Ops to introduce new libraries and more efficient, specialized algorithm implementations, even immediately benefitting existing workflows. We provide several use cases showing both user and developer benefits, as well as benchmarking data to quantify the negligible impact on overall analysis performance. We have initially deployed SciJava Ops on the Fiji platform; however, it would be suitable for integration with additional analysis platforms in the future., Competing Interests: Authors DK, MW, CB, SH, AW, and MB were employed by KNIME GmbH. Author JS was employed by Microsoft Corporation. Author BN was employed by True North Intelligent Algorithms. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision., (Copyright © 2024 Selzer, Rueden, Hiner, Evans, Kolb, Wiedenmann, Birkhold, Buchholz, Helfrich, Northan, Walter, Schindelin, Pietzsch, Saalfeld, Berthold and Eliceiri.)
- Published
- 2024
- Full Text
- View/download PDF
21. Integration of the ImageJ Ecosystem in the KNIME Analytics Platform.
- Author
-
Dietz C, Rueden CT, Helfrich S, Dobson ETA, Horn M, Eglinger J, Evans EL 3rd, McLean DT, Novitskaya T, Ricke WA, Sherer NM, Zijlstra A, Berthold MR, and Eliceiri KW
- Abstract
Open-source software tools are often used for analysis of scientific image data due to their flexibility and transparency in dealing with rapidly evolving imaging technologies. The complex nature of image analysis problems frequently requires many tools to be used in conjunction, including image processing and analysis, data processing, machine learning and deep learning, statistical analysis of the results, visualization, correlation to heterogeneous but related data, and more. However, the development, and therefore application, of these computational tools is impeded by a lack of integration across platforms. Integration of tools goes beyond convenience, as it is impractical for one tool to anticipate and accommodate the current and future needs of every user. This problem is emphasized in the field of bioimage analysis, where various rapidly emerging methods are quickly being adopted by researchers. ImageJ is a popular open-source image analysis platform, with contributions from a global community resulting in hundreds of specialized routines for a wide array of scientific tasks. ImageJ's strength lies in its accessibility and extensibility, allowing researchers to easily improve the software to solve their image analysis tasks. However, ImageJ is not designed for development of complex end-to-end image analysis workflows. Scientists are often forced to create highly specialized and hard-to-reproduce scripts to orchestrate individual software fragments and cover the entire life-cycle of an analysis of an image dataset. KNIME Analytics Platform, a user-friendly data integration, analysis, and exploration workflow system, was designed to handle huge amounts of heterogeneous data in a platform-agnostic computing environment and has been successful in meeting complex end-to-end demands in several communities, such as cheminformatics and mass spectrometry.
Similar needs within the bioimage analysis community led to the creation of the KNIME Image Processing extension which integrates ImageJ into KNIME Analytics Platform, enabling researchers to develop reproducible and scalable workflows, integrating a diverse range of analysis tools. Here we present how users and developers alike can leverage the ImageJ ecosystem via the KNIME Image Processing extension to provide robust and extensible image analysis within KNIME workflows. We illustrate the benefits of this integration with examples, as well as representative scientific use cases., Competing Interests: CD, SH and MRB have a financial interest in KNIME GmbH, the company developing and supporting KNIME Analytics Platform. All other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Published
- 2020
- Full Text
- View/download PDF
22. Development of a neural rosette formation assay (RoFA) to identify neurodevelopmental toxicants and to characterize their transcriptome disturbances.
- Author
-
Dreser N, Madjar K, Holzer AK, Kapitza M, Scholz C, Kranaster P, Gutbier S, Klima S, Kolb D, Dietz C, Trefzer T, Meisig J, van Thriel C, Henry M, Berthold MR, Blüthgen N, Sachinidis A, Rahnenführer J, Hengstler JG, Waldmann T, and Leist M
- Subjects
- Cell Differentiation drug effects, Gene Expression Regulation drug effects, Humans, Neural Stem Cells cytology, Neural Stem Cells physiology, Oligonucleotide Array Sequence Analysis, Time Factors, Neural Stem Cells drug effects, Neurodevelopmental Disorders chemically induced, Neurotoxins pharmacology, Rosette Formation methods, Toxicity Tests methods
- Abstract
The first in vitro tests for developmental toxicity made use of rodent cells. Newer teratology tests, e.g. developed during the ESNATS project, use human cells and measure mechanistic endpoints (such as transcriptome changes). However, the toxicological implications of mechanistic parameters are hard to judge without functional/morphological endpoints. To address this issue, we developed a new version of the human stem cell-based test STOP-tox(UKN). For this purpose, the capacity of the cells to self-organize into neural rosettes was assessed as functional endpoint: pluripotent stem cells were allowed to differentiate into neuroepithelial cells for 6 days in the presence or absence of toxicants. Then, both transcriptome changes were measured (standard STOP-tox(UKN)) and cells were allowed to form rosettes. After optimization of staining methods, an imaging algorithm for rosette quantification was implemented and used for an automated rosette formation assay (RoFA). Neural tube toxicants (like valproic acid), which are known to disturb human development at stages when rosette-forming cells are present, were used as positive controls. Established toxicants led to distinctly different tissue organization and differentiation stages. RoFA outcome and transcript changes largely correlated concerning (1) the concentration dependence, (2) the time dependence, and (3) the set of positive hits identified amongst 24 potential toxicants. Using such comparative data, a prediction model for the RoFA was developed. The comparative analysis was also used to identify gene dysregulations that are particularly predictive for disturbed rosette formation. This 'RoFA predictor gene set' may be used for a simplified and less costly setup of the STOP-tox(UKN) assay.
- Published
- 2020
- Full Text
- View/download PDF
23. Whither systems medicine?
- Author
-
Apweiler R, Beissbarth T, Berthold MR, Blüthgen N, Burmeister Y, Dammann O, Deutsch A, Feuerhake F, Franke A, Hasenauer J, Hoffmann S, Höfer T, Jansen PL, Kaderali L, Klingmüller U, Koch I, Kohlbacher O, Kuepfer L, Lammert F, Maier D, Pfeifer N, Radde N, Rehm M, Roeder I, Saez-Rodriguez J, Sax U, Schmeck B, Schuppert A, Seilheimer B, Theis FJ, Vera J, and Wolkenhauer O
- Subjects
- Decision Support Systems, Clinical, Humans, Translational Research, Biomedical, Biomedical Research, Systems Analysis
- Abstract
New technologies to generate, store and retrieve medical and research data are inducing a rapid change in clinical and translational research and health care. Systems medicine is the interdisciplinary approach wherein physicians and clinical investigators team up with experts from biology, biostatistics, informatics, mathematics and computational modeling to develop methods to use new and stored data to the benefit of the patient. Here we provide a critical assessment of the opportunities and challenges arising from systems approaches in medicine and, from this, derive a definition of what systems medicine entails. Based on our analysis of current developments in medicine and healthcare and associated research needs, we emphasize the role of systems medicine as a multilevel and multidisciplinary methodological framework for informed data acquisition and interdisciplinary data analysis to extract previously inaccessible knowledge for the benefit of patients.
- Published
- 2018
- Full Text
- View/download PDF
24. Automated workflows for modelling chemical fate, kinetics and toxicity.
- Author
-
Sala Benito JV, Paini A, Richarz AN, Meinl T, Berthold MR, Cronin MTD, and Worth AP
- Subjects
- Automation, Cell Line, Cell Survival, Computer Simulation, Humans, Risk Assessment, Models, Biological, Software
- Abstract
Automation is universal in today's society, from operating equipment such as machinery in factory processes to self-parking automobile systems. While these examples show the efficiency and effectiveness of automated mechanical processes, automated procedures that support the chemical risk assessment process are still in their infancy. Future human safety assessments will rely increasingly on the use of automated models, such as physiologically based kinetic (PBK) and dynamic models and the virtual cell based assay (VCBA). These biologically-based models will be coupled with chemistry-based prediction models that also automate the generation of key input parameters such as physicochemical properties. The development of automated software tools is an important step in harmonising and expediting the chemical safety assessment process. In this study, we illustrate how the KNIME Analytics Platform can be used to provide a user-friendly graphical interface for these biokinetic models, such as PBK models and the VCBA, which simulate the fate of chemicals in vivo within the body and in in vitro test systems, respectively., (Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.)
- Published
- 2017
- Full Text
- View/download PDF
25. KNIME for reproducible cross-domain analysis of life science data.
- Author
-
Fillbrunn A, Dietz C, Pfeuffer J, Rahn R, Landrum GA, and Berthold MR
- Subjects
- Biological Science Disciplines, High-Throughput Nucleotide Sequencing, Image Processing, Computer-Assisted, Mass Spectrometry, Computational Biology, Software
- Abstract
Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next generation sequencing, or image processing. Passing data between those tools often involves complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to be platform dependent; they also tend to grow as the experiment progresses and are seldom well documented, a fact that hinders the reproducibility of the experiment. Workflow systems such as KNIME Analytics Platform aim to solve these problems by providing a platform for connecting tools graphically and guaranteeing the same results on different operating systems. As open-source software, KNIME allows scientists and programmers to provide their own extensions to the scientific community. In this review paper we present selected extensions from the life sciences that simplify data exploration, analysis, and visualization and are interoperable due to KNIME's unified data model. Additionally, we name other workflow systems that are commonly used in the life sciences and highlight their similarities and differences to KNIME., (Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2017
- Full Text
- View/download PDF
26. A KNIME-Based Analysis of the Zebrafish Photomotor Response Clusters the Phenotypes of 14 Classes of Neuroactive Molecules.
- Author
-
Copmans D, Meinl T, Dietz C, van Leeuwen M, Ortmann J, Berthold MR, and de Witte PA
- Subjects
- Animals, Ligands, Neurotransmitter Agents therapeutic use, Phenotype, Small Molecule Libraries therapeutic use, Zebrafish embryology, Zebrafish physiology, Drug Discovery methods, High-Throughput Screening Assays methods, Neurotransmitter Agents isolation & purification, Small Molecule Libraries isolation & purification
- Abstract
Recently, the photomotor response (PMR) of zebrafish embryos was reported as a robust behavior that is useful for high-throughput neuroactive drug discovery and mechanism prediction. Given the complexity of the PMR, there is a need for rapid and easy analysis of the behavioral data. In this study, we developed an automated analysis workflow using the KNIME Analytics Platform and made it freely accessible. This workflow allows us to simultaneously calculate a behavioral fingerprint for all analyzed compounds and to further process the data. To further characterize the potential of PMR for mechanism prediction, we performed PMR analysis of 767 neuroactive compounds covering 14 different receptor classes using the KNIME workflow. We observed a true positive rate of 25% and a false negative rate of 75% in our screening conditions. Among the true positives, all receptor classes were represented, thereby confirming the utility of the PMR assay to identify a broad range of neuroactive molecules. By hierarchical clustering of the behavioral fingerprints, different phenotypical clusters were observed that suggest the utility of PMR for mechanism prediction for adrenergics, dopaminergics, serotonergics, metabotropic glutamatergics, opioids, and ion channel ligands., (© 2015 Society for Laboratory Automation and Screening.)
- Published
- 2016
- Full Text
- View/download PDF
27. KNIME for Open-Source Bioimage Analysis: A Tutorial.
- Author
-
Dietz C and Berthold MR
- Subjects
- Animals, Humans, Image Processing, Computer-Assisted methods, Microscopy, Fluorescence instrumentation, Workflow, Algorithms, Image Processing, Computer-Assisted statistics & numerical data, Microscopy, Fluorescence methods, Software
- Abstract
The open analytics platform KNIME is a modular environment that enables easy visual assembly and interactive execution of workflows. KNIME is already widely used in various areas of research, for instance in cheminformatics or classical data analysis. In this tutorial the KNIME Image Processing Extension is introduced, which adds the capability to process and analyse large numbers of images. In combination with other KNIME extensions, KNIME Image Processing opens up new possibilities for inter-domain analysis of image data in an understandable and reproducible way.
- Published
- 2016
- Full Text
- View/download PDF
28. MARK-AGE data management: Cleaning, exploration and visualization of data.
- Author
-
Baur J, Moreno-Villanueva M, Kötter T, Sindlinger T, Bürkle A, Berthold MR, and Junk M
- Subjects
- Biomarkers metabolism, Female, Humans, Male, Aging metabolism, Database Management Systems, Databases, Factual
- Abstract
Databases are organized collections of data and are necessary to investigate a wide spectrum of research questions. When evaluating data, analysts should be aware of possible data quality problems that can compromise the validity of results. Data cleaning is therefore an essential part of the data management process, dealing with the identification and correction of errors in order to improve data quality. In our cross-sectional study, biomarkers of ageing as well as analytical, anthropometric and demographic data from about 3000 volunteers have been collected in the MARK-AGE database. Although several preventive strategies were applied before data entry, errors such as miscoding, missing values, and batch problems could not be avoided completely. Such errors can result in misleading information and affect the validity of the performed data analysis. Here we present an overview of the methods we applied for dealing with errors in the MARK-AGE database. In particular, we describe our strategies for the detection of missing values, outliers and batch effects and explain how they can be handled to improve data quality. Finally we report on the tools used for data exploration and data sharing between MARK-AGE collaborators., (Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
29. The MARK-AGE phenotypic database: Structure and strategy.
- Author
-
Moreno-Villanueva M, Kötter T, Sindlinger T, Baur J, Oehlke S, Bürkle A, and Berthold MR
- Subjects
- Biomarkers blood, Biomarkers urine, Confidentiality, Female, Humans, Male, Surveys and Questionnaires, Aging blood, Aging urine, Databases, Factual, Information Storage and Retrieval
- Abstract
In the context of the MARK-AGE study, anthropometric, clinical and social data as well as samples of venous blood, buccal mucosal cells and urine were systematically collected from 3337 volunteers. Information from about 500 standardised questions and about 500 analysed biomarkers needed to be documented per individual. On the one hand, handling such a vast amount of data necessitates the use of appropriate informatics tools and the establishment of a database. On the other hand, personal information obtained from subjects in such studies must, of course, be kept confidential, and therefore the investigators must ensure that the subjects' anonymity is maintained. This confidentiality obligation implies a well-designed and secure system for data storage. In order to fulfil the demands of the MARK-AGE study, we established a phenotypic database for storing information on the study subjects, using a doubly coded system., (Copyright © 2015. Published by Elsevier Ireland Ltd.)
- Published
- 2015
- Full Text
- View/download PDF
30. The MARK-AGE extended database: data integration and pre-processing.
- Author
-
Baur J, Kötter T, Moreno-Villanueva M, Sindlinger T, Berthold MR, Bürkle A, and Junk M
- Subjects
- Confidentiality, Female, Humans, Male, Aging physiology, Databases, Factual, Information Storage and Retrieval
- Abstract
MARK-AGE is a recently completed European population study in which bioanalytical and anthropometric data were collected from human subjects on a large scale. To facilitate data analysis and mathematical modelling, an extended database had to be constructed, integrating the data sources that were part of the project. This step involved the checking, transformation and documentation of data. The success of downstream analysis depends mainly on the preparation and quality of the integrated data. Here, we present the pre-processing steps applied to the MARK-AGE data to ensure high quality and reliability in the MARK-AGE Extended Database. Various kinds of obstacles that arose during the project are highlighted and solutions are presented., (Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
31. Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME.
- Author
-
Nicola G, Berthold MR, Hedrick MP, and Gilson MK
- Subjects
- Animals, Humans, Protein Binding, Drug Discovery, Drug Interactions, Knowledge Bases, Pharmaceutical Preparations, Pharmacokinetics, Proteins genetics, Proteins metabolism
- Abstract
Today's large, public databases of protein-small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org., (© The Author(s) 2015. Published by Oxford University Press.)
- Published
- 2015
- Full Text
- View/download PDF
32. Grouping of histone deacetylase inhibitors and other toxicants disturbing neural crest migration by transcriptional profiling.
- Author
-
Dreser N, Zimmer B, Dietz C, Sügis E, Pallocca G, Nyffeler J, Meisig J, Blüthgen N, Berthold MR, Waldmann T, and Leist M
- Subjects
- Cell Line, Transformed, Discriminant Analysis, Gene Expression Profiling, Green Fluorescent Proteins genetics, Green Fluorescent Proteins metabolism, Human Embryonic Stem Cells, Humans, Hydroxamic Acids pharmacology, Oligonucleotide Array Sequence Analysis, Time Factors, Toxicity Tests, Transfection, Up-Regulation drug effects, Vorinostat, Cell Movement drug effects, Hazardous Substances pharmacology, Histone Deacetylase Inhibitors pharmacology, Neural Crest drug effects, Transcription, Genetic drug effects
- Abstract
Functional assays, such as the "migration inhibition of neural crest cells" (MINC) developmental toxicity test, can identify toxicants without requiring knowledge of their mode of action (MoA). Here, we were interested in whether (i) inhibition of migration by structurally diverse toxicants resulted in a unified signature of transcriptional changes; (ii) statistically-identified transcript patterns would inform compound grouping even though individual genes were little regulated; and (iii) analysis of a small group of biologically-relevant transcripts would allow the grouping of compounds according to their MoA. We analyzed transcripts of 35 'migration genes' after treatment with 16 migration-inhibiting toxicants. Clustering, principal component analysis and correlation analyses of the data showed that mechanistically related compounds (e.g. histone deacetylase inhibitors (HDACi), PCBs) triggered similar transcriptional changes, but groups of structurally diverse toxicants largely differed in their transcriptional effects. Linear discriminant analysis (LDA) confirmed the specific clustering of HDACi across multiple separate experiments. The similarity of the signatures of the HDACi trichostatin A and suberoylanilide hydroxamic acid to that of valproic acid (VPA) suggested that the latter compound acts as an HDACi when impairing neural crest migration. In conclusion, the data suggest that (i) a given functional effect (e.g. inhibition of migration) can be associated with highly diverse signatures of transcript changes; and (ii) statistically significant grouping of mechanistically-related compounds can be achieved on the basis of few genes with small expression changes. Thus, incorporation of mechanistic markers in functional in vitro tests may support read-across procedures, also for structurally unrelated compounds., (Copyright © 2015 Elsevier Inc. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
33. Workflows for automated downstream data analysis and visualization in large-scale computational mass spectrometry.
- Author
-
Aiche S, Sachsenberg T, Kenar E, Walzer M, Wiswedel B, Kristl T, Boyles M, Duschl A, Huber CG, Berthold MR, Reinert K, and Kohlbacher O
- Subjects
- Computer Graphics, Data Interpretation, Statistical, Humans, Metabolomics, Proteomics, Tandem Mass Spectrometry, Workflow, Software
- Abstract
MS-based proteomics and metabolomics are rapidly evolving research fields driven by the development of novel instruments, experimental approaches, and analysis methods. Monolithic analysis tools perform well on single tasks but lack the flexibility to cope with the constantly changing requirements and experimental setups. Workflow systems, which combine small processing tools into complex analysis pipelines, allow custom-tailored and flexible data-processing workflows that can be published or shared with collaborators. In this article, we present the integration of established tools for computational MS from the open-source software framework OpenMS into the workflow engine Konstanz Information Miner (KNIME) for the analysis of large datasets and production of high-quality visualizations. We provide example workflows to demonstrate combined data processing and visualization for three diverse tasks in computational MS: isobaric mass tag based quantitation in complex experimental setups, label-free quantitation and identification of metabolites, and quality control for proteomics experiments., (© 2015 The Authors. PROTEOMICS published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2015
- Full Text
- View/download PDF
34. From transient transcriptome responses to disturbed neurodevelopment: role of histone acetylation and methylation as epigenetic switch between reversible and irreversible drug effects.
- Author
-
Balmer NV, Klima S, Rempel E, Ivanova VN, Kolde R, Weng MK, Meganathan K, Henry M, Sachinidis A, Berthold MR, Hengstler JG, Rahnenführer J, Waldmann T, and Leist M
- Subjects
- Acetylation drug effects, Cell Differentiation drug effects, Epigenesis, Genetic, Eye Proteins genetics, Gene Expression Regulation drug effects, Histone Deacetylase Inhibitors administration & dosage, Histone Deacetylase Inhibitors toxicity, Homeodomain Proteins genetics, Humans, Hydroxamic Acids administration & dosage, Methylation drug effects, PAX6 Transcription Factor, Paired Box Transcription Factors genetics, Repressor Proteins genetics, Time Factors, Transcriptome, Valproic Acid administration & dosage, Embryonic Stem Cells cytology, Histones metabolism, Hydroxamic Acids toxicity, Valproic Acid toxicity
- Abstract
The superordinate principles governing the transcriptome response of differentiating cells exposed to drugs are still unclear. Often, it is assumed that toxicogenomics data reflect the immediate mode of action (MoA) of drugs. Alternatively, transcriptome changes could describe altered differentiation states as an indirect consequence of drug exposure. Here, we used the developmental toxicants valproate and trichostatin A to address this question. Neurally differentiating human embryonic stem cells were treated for 6 days. Histone acetylation (the primary MoA) increased quickly and returned to baseline after 48 h. Histone H3 lysine methylation at the promoter of the neurodevelopmental regulators PAX6 or OTX2 was increasingly altered over time. Methylation changes remained persistent and correlated with neurodevelopmental defects and with effects on PAX6 gene expression, even when the drug was washed out after 3-4 days. We hypothesized that drug exposures altering only acetylation would lead to reversible transcriptome changes (indicating the MoA), and that challenges altering methylation would lead to irreversible developmental disturbances. Data from pulse-chase experiments corroborated this assumption. Short drug treatment triggered reversible transcriptome changes; longer exposure disrupted neurodevelopment. The disturbed differentiation was reflected by an altered transcriptome pattern, and the observed changes were similar when the drug was washed out during the last 48 h. We conclude that transcriptome data after prolonged chemical stress of differentiating cells mainly reflect the altered developmental stage of the model system and not the drug MoA. We suggest that brief exposures, followed by immediate analysis, are more suitable for obtaining information on immediate drug responses and the MoA of toxicity.
- Published
- 2014
- Full Text
- View/download PDF
35. Looking over the rim: algorithms for cheminformatics from computer scientists.
- Author
-
Meinl T, Wiswedel B, and Berthold MR
- Published
- 2014
- Full Text
- View/download PDF
36. Post-transcriptional Boolean computation by combining aptazymes controlling mRNA translation initiation and tRNA activation.
- Author
-
Klauser B, Saragliadis A, Ausländer S, Wieland M, Berthold MR, and Hartig JS
- Subjects
- Escherichia coli genetics, Escherichia coli metabolism, Nucleic Acid Conformation, Protein Biosynthesis genetics, RNA, Catalytic metabolism, RNA, Messenger genetics, RNA, Transfer genetics
- Abstract
In cellular systems, environmental and metabolic signals are integrated for the conditional control of gene expression. On the other hand, artificial manipulation of gene expression is of high interest for metabolic and genetic engineering. Especially the reprogramming of gene expression patterns to orchestrate cellular responses in a predictable fashion is considered to be of great importance. Here we introduce a highly modular RNA-based system for performing Boolean logic computation at a post-transcriptional level in Escherichia coli. We have previously shown that artificial riboswitches can be constructed by utilizing ligand-dependent Hammerhead ribozymes (aptazymes). Employing RNA self-cleavage as the expression-platform mechanism of an artificial riboswitch has the advantage that it can be applied to control several classes of RNAs such as mRNAs, tRNAs, and rRNAs. Due to the highly modular and orthogonal nature of these switches, it is possible to combine aptazyme regulation of activating a suppressor tRNA with the regulation of mRNA translation initiation. The different RNA classes can be controlled individually by using distinct aptamers for individual RNA switches. Boolean logic devices are assembled by combining such switches in order to act on the expression of a single mRNA. To demonstrate the high modularity, a series of two-input Boolean logic operators was constructed. For this purpose, we expanded our aptazyme toolbox with switches showing novel behaviours with respect to the small-molecule triggers thiamine pyrophosphate (TPP) and theophylline. Individual switches were then combined to yield AND, NOR, and ANDNOT gates. This study demonstrates that post-transcriptional aptazyme-based switches represent versatile tools for engineering advanced genetic devices and circuits without the need for regulatory protein cofactors.
- Published
- 2012
- Full Text
- View/download PDF
37. Biological imaging software tools.
- Author
-
Eliceiri KW, Berthold MR, Goldberg IG, Ibáñez L, Manjunath BS, Martone ME, Murphy RF, Peng H, Plant AL, Roysam B, Stuurman N, Swedlow JR, Tomancak P, and Carpenter AE
- Subjects
- Equipment Design, Software Design, Computational Biology instrumentation, Computational Biology methods, Image Processing, Computer-Assisted instrumentation, Image Processing, Computer-Assisted methods, Information Storage and Retrieval methods, Software
- Abstract
Few technologies are more widespread in modern biological laboratories than imaging. Recent advances in optical technologies and instrumentation are providing hitherto unimagined capabilities. Almost all these advances have required the development of software to enable the acquisition, management, analysis and visualization of the imaging data. We review each computational step that biologists encounter when dealing with digital images, the inherent challenges and the overall status of available software for bioimage informatics, focusing on open-source options.
- Published
- 2012
- Full Text
- View/download PDF
38. Maximum-score diversity selection for early drug discovery.
- Author
-
Meinl T, Ostermann C, and Berthold MR
- Subjects
- Algorithms, Cyclin-Dependent Kinase 2 metabolism, Inhibitory Concentration 50, Data Mining methods, Drug Discovery methods
- Abstract
Diversity selection is a common task in early drug discovery. One drawback of current approaches is that usually only structural diversity is taken into account; activity information is therefore ignored. In this article, we present a modified version of diversity selection, which we term Maximum-Score Diversity Selection, that additionally takes the estimated or predicted activities of the molecules into account. We show that finding an optimal solution to this problem is computationally very expensive (it is NP-hard), and therefore heuristic approaches are needed. After a discussion of existing approaches, we present our new method, which is computationally far more efficient but at the same time produces comparable results. We conclude by validating these theoretical differences on several data sets.
- Published
- 2011
- Full Text
- View/download PDF
39. The coming of age of artificial intelligence in medicine.
- Author
-
Patel VL, Shortliffe EH, Stefanelli M, Szolovits P, Berthold MR, Bellazzi R, and Abu-Hanna A
- Subjects
- Humans, Artificial Intelligence, Biomedical Research trends, Medical Informatics Applications
- Abstract
This paper is based on a panel discussion held at the Artificial Intelligence in Medicine Europe (AIME) conference in Amsterdam, The Netherlands, in July 2007. It had been more than 15 years since Edward Shortliffe gave a talk at AIME in which he characterized artificial intelligence (AI) in medicine as being in its "adolescence" (Shortliffe EH. The adolescence of AI in medicine: will the field come of age in the '90s? Artificial Intelligence in Medicine 1993;5:93-106). In this article, the discussants reflect on medical AI research during the subsequent years and characterize the maturity and influence that has been achieved to date. Participants focus on their personal areas of expertise, ranging from clinical decision-making, reasoning under uncertainty, and knowledge representation to systems integration, translational bioinformatics, and cognitive issues in both the modeling of expertise and the creation of acceptable systems.
- Published
- 2009
- Full Text
- View/download PDF
40. Knowledge-based and data-driven models in arrhythmia fuzzy classification.
- Author
-
Silipo R, Vergassola R, Zong W, and Berthold MR
- Subjects
- Arrhythmias, Cardiac diagnosis, Databases, Factual, Humans, Pattern Recognition, Automated, Signal Processing, Computer-Assisted, Arrhythmias, Cardiac classification, Artificial Intelligence, Decision Making, Computer-Assisted, Electrocardiography, Fuzzy Logic
- Abstract
Objectives: Fuzzy rules automatically derived from a set of training examples quite often produce better classification results than fuzzy rules translated from medical knowledge. This study investigates the difference in domain representation between a knowledge-based and a data-driven fuzzy system applied to an electrocardiography classification problem. Methods: For a three-class electrocardiographic arrhythmia classification task, a set of fifteen fuzzy rules is derived from medical expertise on the basis of twelve electrocardiographic measures. A second set of fuzzy rules is automatically constructed from thirty-nine records of the MIT-BIH database. The performances of the two classifiers on thirteen different records are comparable and, to a certain extent, complementary. The two fuzzy models are then analyzed using the concept of information gain to estimate the impact of each ECG measure on each fuzzy decision process. Results: Both systems rely on the beat prematurity degree and the QRS complex width, and neglect the P wave existence and the ST segment features. The PR interval is not well characterized across the fuzzy medical rules, while it plays an important role in the data-driven fuzzy system. The T wave area shows a higher information gain in the knowledge-based decision process and is little exploited by the data-driven system. Conclusions: The main differences between a human-designed and a data-driven ECG arrhythmia classifier concern the PR interval and the T wave.
- Published
- 2001
41. Input features' impact on fuzzy decision processes.
- Author
-
Silipo R and Berthold MR
- Abstract
Many real-world applications have very high dimensionality and require very complex decision borders. In such cases, the number of fuzzy rules can proliferate, and the easy interpretability of fuzzy models can progressively disappear. An important part of model interpretation lies in evaluating the effectiveness of the input features in the decision process. In this paper, we present a method that quantifies the discriminative power of the input features in a fuzzy model. The separability among all the rules of the fuzzy model produces a measure of the information available in the system. This measure is calculated to characterize the system before and after each input feature is used for classification; the resulting information gain quantifies the discriminative power of that feature. Comparing the information gains of the different input features can yield better insight into the selected fuzzy classification strategy, even in very high-dimensional cases, and can lead to a reduction of the input space dimension. Several artificial and real-world data analysis scenarios are reported as examples to illustrate the characteristics and potential of the proposed method.
- Published
- 2000
- Full Text
- View/download PDF
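The information-gain idea underlying the two fuzzy-model abstracts above can be shown in its generic, entropy-based form: the gain of a feature is the drop in class-label entropy once the feature's value is known. This is only a minimal sketch of the general concept; the papers compute gain from rule separability in a fuzzy model, which this code does not reproduce, and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain from a discrete feature: H(labels) - H(labels | feature).

    Generic sketch of information gain; not the rule-separability
    measure used in the cited articles.
    """
    base = entropy(labels)
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        # Weighted entropy of the label subset where the feature equals v.
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return base - conditional
```

A feature that perfectly separates the classes attains the full label entropy as its gain, while an irrelevant feature scores zero; ranking features by this quantity is the spirit of the feature-impact analysis described in the abstracts.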