Author: "Boulesteix, Anne‐Laure" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Boulesteix, Anne‐Laure"' showing total 657 results

1. Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

Author: Schulz-Kümpel, Hannah, Fischer, Sebastian, Nagler, Thomas, Boulesteix, Anne-Laure, Bischl, Bernd, and Hornung, Roman
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct the first large-scale study comparing CIs for the generalization error - empirically evaluating 13 different methods on a total of 18 tabular regression and classification problems, using four different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we are able to identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.
Published: 2024

2. On the handling of method failure in comparison studies

Author: Wünsch, Milena, Herrmann, Moritz, Noltenius, Elisa, Mohr, Mattia, Morris, Tim P., and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology
Abstract: Comparison studies in methodological research are intended to compare methods in an evidence-based manner, offering guidance to data analysts to select a suitable method for their application. To provide trustworthy evidence, they must be carefully designed, implemented, and reported, especially given the many decisions made in planning and running. A common challenge in comparison studies is to handle the "failure" of one or more methods to produce a result for some (real or simulated) data sets, such that their performances cannot be measured in those instances. Despite an increasing emphasis on this topic in recent literature (focusing on non-convergence as a common manifestation), there is little guidance on proper handling and interpretation, and reporting of the chosen approach is often neglected. This paper aims to fill this gap and provides practical guidance for handling method failure in comparison studies. In particular, we show that the popular approaches of discarding data sets yielding failure (either for all or the failing methods only) and imputing are inappropriate in most cases. We also discuss how method failure in published comparison studies -- in various contexts from classical statistics and predictive modeling -- may manifest differently, but is often caused by a complex interplay of several aspects. Building on this, we provide recommendations derived from realistic considerations on suitable fallbacks when encountering method failure, hence avoiding the need for discarding data sets or imputation. Finally, we illustrate our recommendations and the dangers of inadequate handling of method failure through two illustrative comparison studies.
Published: 2024

3. Position: Why We Must Rethink Empirical Research in Machine Learning

Author: Herrmann, Moritz, Lange, F. Julian D., Eggensperger, Katharina, Casalicchio, Giuseppe, Wever, Marcel, Feurer, Matthias, Rügamer, David, Hüllermeier, Eyke, Boulesteix, Anne-Laure, and Bischl, Bernd
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical machine learning research is fashioned as confirmatory research while it should rather be considered exploratory.
Comment: 20 pages, accepted for publication at ICML 2024, camera-ready version
Published: 2024

4. Understanding overfitting in random forest for probability estimation: a visualization and simulation study

Author: Barreñada, Lasai, Dhiman, Paula, Timmerman, Dirk, Boulesteix, Anne-Laure, and Van Calster, Ben
Subjects: Statistics - Methodology, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real world case studies and (2) a simulation study. For the case studies, risk estimates were visualised using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true c-statistic and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 were simulated and RF models trained with minimum node size 2 or 20 using ranger package, resulting in 192 scenarios in total. The visualizations suggested that the model learned spikes of probability around events in the training set. A cluster of events created a bigger peak, isolated events local peaks. In the simulation study, median training c-statistics were between 0.97 and 1 unless there were 4 or 16 binary predictors with minimum node size 20. Median test c-statistics were higher with higher events per variable, higher minimum node size, and binary predictors. Median training slopes were always above 1, and were not correlated with median test slopes across scenarios (correlation -0.11). Median test slopes were higher with higher true c-statistic, higher minimum node size, and higher sample size. Random forests learn local probability peaks that often yield near perfect training c-statistics without strongly affecting c-statistics on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
Comment: 20 pages, 8 figures
Published: 2024

5. To tweak or not to tweak. How exploiting flexibilities in gene set analysis leads to over-optimism

Author: Wünsch, Milena, Sauer, Christina, Herrmann, Moritz, Hinske, Ludwig Christian, and Boulesteix, Anne-Laure
Subjects: Statistics - Applications
Abstract: Gene set analysis, a popular approach for analysing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the 'right' choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a 'trial-and-error' approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of 'cherry-picking' and cause an optimistic bias, rendering the results non-replicable on independent data. After this problem has attracted a lot of attention in the context of classical hypothesis testing, we now aim to raise awareness of such over-optimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for over-optimism is particularly high for a group of methods frequently used despite being commonly criticised. We conclude by providing practical recommendations to counter over-optimism in research findings in gene set analysis and beyond.
Published: 2024

6. Addressing researcher degrees of freedom through minP adjustment

Author: Mandl, Maximilian M, Becker-Pennrich, Andrea S, Hinske, Ludwig C, Hoffmann, Sabine, and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology
Abstract: When different researchers study the same research question using the same dataset they may obtain different and potentially even conflicting results. This is because there is often substantial flexibility in researchers' analytical choices, an issue also referred to as "researcher degrees of freedom". Combined with selective reporting of the smallest p-value or largest effect, researcher degrees of freedom may lead to an increased rate of false positive and overoptimistic results. In this paper, we address this issue by formalizing the multiplicity of analysis strategies as a multiple testing problem. As the test statistics of different analysis strategies are usually highly dependent, a naive approach such as the Bonferroni correction is inappropriate because it leads to an unacceptable loss of power. Instead, we propose using the "minP" adjustment method, which takes potential test dependencies into account and approximates the underlying null distribution of the minimal p-value through a permutation-based procedure. This procedure is known to achieve more power than simpler approaches while ensuring a weak control of the family-wise error rate. We illustrate our approach for addressing researcher degrees of freedom by applying it to a study on the impact of perioperative paO2 on post-operative complications after neurosurgery. A total of 48 analysis strategies are considered and adjusted using the minP procedure. This approach allows to selectively report the result of the analysis strategy yielding the most convincing evidence, while controlling the type 1 error -- and thus the risk of publishing false positive results that may not be replicable.
Published: 2024

7. Evaluating machine learning models in non-standard settings: An overview and new findings

Author: Hornung, Roman, Nalenz, Malte, Schneider, Lennart, Bender, Andreas, Bothmann, Ludwig, Bischl, Bernd, Augustin, Thomas, and Boulesteix, Anne-Laure
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Applications, Statistics - Computation, Statistics - Methodology
Abstract: Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
Published: 2023

8. From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis

Author: Wünsch, Milena, Sauer, Christina, Callahan, Patrick, Hinske, Ludwig Christian, and Boulesteix, Anne-Laure
Subjects: Statistics - Applications
Abstract: Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In the last years, a multitude of methods and corresponding tools have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the procedure of preprocessing, from knowing which steps are required to selecting a corresponding approach from the plethora of valid options to create the accepted input object (data preprocessing uncertainty), with clear guidance again being scarce. Here, we provide a practical guide through all steps required to conduct gene set analysis, beginning with a concise overview of a selection of established methods, including GSEA and DAVID. We thereby lay a special focus on reviewing and explaining the necessary preprocessing steps for each method under consideration (e.g. the necessity of a transformation of the RNA-Seq data) - an essential aspect that is typically paid only limited attention to in both existing reviews and applications. To raise awareness of the spectrum of uncertainties, our review is accompanied by an extensive overview of the literature on valid approaches for each step and illustrative R code demonstrating the complex analysis pipelines. It ends with a discussion and recommendations to both users and developers to ensure that the results of gene set analysis are, despite the above-mentioned uncertainties, replicable and transparent.
Comment: 52 pages, 4 figures
Published: 2023

9. Addressing researcher degrees of freedom through minP adjustment

Author: Mandl, Maximilian M., Becker-Pennrich, Andrea S., Hinske, Ludwig C., Hoffmann, Sabine, and Boulesteix, Anne-Laure
Published: 2024

10. Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study

Author: Hornung, Roman, Ludwigs, Frederik, Hagenberg, Jonas, and Boulesteix, Anne-Laure
Subjects: Quantitative Biology - Genomics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Applications, Statistics - Computation
Abstract: As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data lend themselves to being used as covariates in automatic outcome prediction because each omics type may contribute unique information, possibly improving predictions compared to using only one omics data type. Frequently, however, in the training data and the data to which automatic prediction rules should be applied, the test data, the different omics data types are not available for all patients. We refer to this type of data as block-wise missing multi-omics data. First, we provide a literature review on existing prediction methods applicable to such data. Subsequently, using a collection of 13 publicly available multi-omics data sets, we compare the predictive performances of several of these approaches for different block-wise missingness patterns. Finally, we discuss the results of this empirical comparison study and draw some tentative conclusions.
Published: 2023

11. Improving Software Engineering in Biostatistics: Challenges and Opportunities

Author: Bové, Daniel Sabanés, Seibold, Heidi, Boulesteix, Anne-Laure, Manitz, Juliane, Gasparini, Alessandro, Günhan, Burak K., Boix, Oliver, Schüler, Armin, Fillinger, Sven, Nahnsen, Sven, Jacob, Anna E., and Jaki, Thomas
Subjects: Statistics - Computation
Abstract: Programming is ubiquitous in applied biostatistics; adopting software engineering skills will help biostatisticians do a better job. To explain this, we start by highlighting key challenges for software development and application in biostatistics. Silos between different statistician roles, projects, departments, and organizations lead to the development of duplicate and suboptimal code. Building on top of open-source software requires critical appraisal and risk-based assessment of the used modules. Code that is written needs to be readable to ensure reliable software. The software needs to be easily understandable for the user, as well as developed within testing frameworks to ensure that long term maintenance of the software is feasible. Finally, the reproducibility of research results is hindered by manual analysis workflows and uncontrolled code development. We next describe how the awareness of the importance and application of good software engineering practices and strategies can help address these challenges. The foundation is a better education in basic software engineering skills in schools, universities, and during the work life. Dedicated software engineering teams within academic institutions and companies can be a key factor for the establishment of good software engineering practices and catalyze improvements across research projects. Providing attractive career paths is important for the retainment of talents. Readily available tools can improve the reproducibility of statistical analyses and their use can be exercised in community events. [...]
Published: 2023

12. Phases of methodological research in biostatistics - building the evidence base for new methods

Author: Heinze, Georg, Boulesteix, Anne-Laure, Kammer, Michael, Morris, Tim P., and White, Ian R.
Subjects: Statistics - Methodology, 62A01 (Primary)
Abstract: Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of clinical research in drug development, we define four phases of methodological research. These four phases cover (I) providing logical reasoning and proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not. We provide basic definitions of the four phases but acknowledge that more work is needed to facilitate unambiguous classification of studies into phases. Methodological developments that have undergone all four proposed phases are still rare, but we give two examples with references. Our concept rebalances the emphasis to studies in phase III and IV, i.e., carefully planned methods comparison studies and studies that explore the empirical properties of existing methods in a wider range of problems.
Comment: 14 pages
Published: 2022

13. Explaining the optimistic performance evaluation of newly proposed methods: a cross-design validation experiment

Author: Nießl, Christina, Hoffmann, Sabine, Ullmann, Theresa, and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology
Abstract: The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then re-evaluate each method based on the study design (i.e., data sets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multi-omic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different data sets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the non-neutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application.
Published: 2022

14. Temporary mechanical circulatory support in infarct-related cardiogenic shock: an individual patient data meta-analysis of randomised trials with 6-month follow-up

Author: Burkhoff, Daniel, Cohen, Howard, Brunckhorst, Corinna, O'Neill, William W., Thiele, Holger, Sick, Peter, Boudriot, Enno, Diederich, Klaus-Werner, Hambrecht, Rainer, Niebauer, Josef, Schuler, Gerhard, Seyfarth, Melchior, Sibbing, Dirk, Bauer, Iris, Fröhlich, Georg, Bott-Flügel, Lorenz, Byrne, Robert, Dirschinger, Josef, Kastrati, Adnan, Schömig, Albert, Ouweneel, Dagmar M., Eriksen, Erlend, Sjauw, Krischan D., van Dongen, Ivo M., Hirsch, Alexander, Packer, Erik J.S, Vis, M. Marie, Wykrzykowska, Joanna J., Koch, Karel T., Baan, Jan, De Winter, Robbert J, Piek, Jan J., Lagrand, Wim K, de Mol, Bas A.J.M, Tijssen, Jan G.P., Henriques, José P.S., Møller, Jacob E., Engstrøm, Thomas, Jensen, Lisette Okkels, Eiskjær, Hans, Mangner, Norman, Polzin, Amin, Schulze, P. Christian, Skurk, Carsten, Nordbeck, Peter, Clemmensen, Peter, Panoulas, Vasileios, Zimmer, Sebastian, Schäfer, Andreas, Werner, Nikos, Frydland, Martin, Holmvang, Lene, Kjærgaard, Jesper, Sørensen, Rikke, Lønborg, Jacob, Greve, Matias, Christiansen, Evald H., Linke, Axel, Woitek, Felix J., Westenfeld, Ralf, Moebius-Winkler, Sven, Wachtell, Kristian, Ravn, Hanne Berg, Lassen, Jens Flensted, Boesgaard, Søren, Gerke, Oke, Hassager, Christian, Lackermair, Korbinian, Brunner, Stefan, Orban, Mathias, Peterss, Sven, Orban, Martin, Theiss, Hans D., Huber, Bruno C., Juchem, Gerd, Born, Frank, Boulesteix, Anne-Laure, Bauer, Axel, Pichlmaier, Maximilian, Hausleiter, Jörg, Massberg, Steffen, Hagl, Christian, Guenther, Sabina P.W., Ostadal, Petr, Rokyta, Richard, Karasek, Jiri, Kruger, Andreas, Vondrakova, Dagmar, Janotka, Marek, Naar, Jan, Smalcova, Jana, Hubatova, Marketa, Hromadk, Milan, Volovar, Stefan, Seyfrydova, Miroslava, Jarkovsky, Jiri, Svoboda, Michal, Linhart, Ales, Belohlavek, Jan, Banning, Amerjeet S., Sabaté, Manel, Gracey, Jay, López-Sobrino, Teresa, Bogaerts, Kris, Adriaenssens, Tom, Berry, Colin, Erglis, Andrejs, Haine, Steven, Myrmel, Truls, Patel, Sameer, Buera, Irene, Sionis, Alessandro, Vilalta, Victoria, Yusuff, Hakeem, Vrints, Christiaan, Adlam, David, Flather, Marcus, Gershlick, Anthony H., Thiele, Holger, Zeymer, Uwe, Akin, Ibrahim, Behnes, Michael, Rassaf, Tienush, Mahabadi, Amir-Abbas, Lehmann, Ralf, Eitel, Ingo, Graf, Tobias, Seidler, Tim, Schuster, Andreas, Duerschmied, Daniel, Hennersdorf, Marcus, Fichtlscherer, Stephan, Voigt, Ingo, John, Stefan, Ewen, Sebastian, Tigges, Eike, Nordbeck, Peter Steffen, Bruch, Leonhard, Jung, Christian, Franz, Jutta, Lauten, Philipp, Goslar, Tomaz, Feistritzer, Hans-Josef, Pöss, Janine, Kirchhof, Eva, Ouarrak, Taoufik, Schneider, Steffen, Desch, Steffen, Freund, Anne, Møller, Jacob E, Henriques, Jose P S, Bogerd, Margriet, Hochadel, Matthias, and Schulze, P Christian
Published: 2024

15. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Author: Rahnenführer, Jörg, De Bin, Riccardo, Benner, Axel, Ambrogi, Federico, Lusa, Lara, Boulesteix, Anne-Laure, Migliavacca, Eugenia, Binder, Harald, Michiels, Stefan, Sauerbrei, Willi, and McShane, Lisa
Published: 2023

16. Reproduzierbare und replizierbare Forschung [Reproducible and replicable research]

Author: Hoffmann, Sabine, Scheipl, Fabian, and Boulesteix, Anne-Laure
Published: 2023

17. Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Author: Bischl, Bernd, Binder, Martin, Lang, Michel, Pielok, Tobias, Richter, Jakob, Coors, Stefan, Thomas, Janek, Ullmann, Theresa, Becker, Marc, Boulesteix, Anne-Laure, Deng, Difan, and Lindauer, Marius
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate concepts from this work as supplementary files.
Published: 2021

18. Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results

Author: Nießl, Christina, Herrmann, Moritz, Wiedemann, Chiara, Casalicchio, Giuseppe, and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology
Abstract: In recent years, the need for neutral benchmark studies that focus on the comparison of methods from computational sciences has been increasingly recognised by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, certain amounts of flexibility always exist. This includes the choice of data sets and performance measures, the handling of missing performance values and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g. the selective reporting of results or the post-hoc modification of design or analysis components) to fit their expectations or hopes. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over-optimistic conclusions. This issue should be considered by computational researchers when designing and analysing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.
Comment: 39 pages
Published: 2021

19. Validation of cluster analysis results on validation data: A systematic framework

Author: Ullmann, Theresa, Hennig, Christian, and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology
Abstract: Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic structured framework for validating clustering results on validation data that includes most existing validation approaches. In particular, we review classical validation techniques such as internal and external validation, stability analysis, hypothesis testing, and visual validation, and show how they can be interpreted in terms of our framework. We precisely define and formalise different types of validation of clustering results on a validation dataset and explain how each type can be implemented in practice. Furthermore, we give examples of how clustering studies from the applied literature that used a validation dataset can be classified into the framework.
Comment: 32 pages, 1 figure
Published: 2021

20. Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study

Author: Ullmann, Theresa, Beer, Anna, Hünemörder, Maximilian, Seidl, Thomas, and Boulesteix, Anne-Laure
Published: 2023

21. Large-scale benchmark study of survival prediction methods using multi-omics data

Author: Herrmann, Moritz, Probst, Philipp, Hornung, Roman, Jurinovic, Vindi, and Boulesteix, Anne-Laure
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Applications, Statistics - Methodology
Abstract: Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables (often in addition to classical clinical variables), are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions by means of a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets from the database "The Cancer Genome Atlas", containing from 35 to 1,000 observations and from 60,000 to 100,000 variables. The considered outcome was the (censored) survival time. Twelve methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier-score served as performance metrics. The results show that, although multi-omics data can improve the prediction performance, this is not generally the case. Only the method block forest slightly outperformed the Cox model on average over all datasets. Taking into account the multi-omics structure improves the predictive performance and protects variables in low-dimensional groups - especially clinical variables - from not being included in the model. All analyses are reproducible using freely available R code.
Comment: 23 pages, 6 tables, 3 figures
Published: 2020
Full Text: View/download PDF

22. Stereotactic radiosurgery versus whole-brain radiotherapy in patients with 4–10 brain metastases: A nonrandomized controlled trial

Author: Bodensohn, Raphael, Kaempfel, Anna-Lena, Boulesteix, Anne-Laure, Orzelek, Anna Maria, Corradini, Stefanie, Fleischmann, Daniel Felix, Forbrig, Robert, Garny, Sylvia, Hadi, Indrawati, Hofmaier, Jan, Minniti, Giuseppe, Mansmann, Ulrich, Pazos Escudero, Montserrat, Thon, Niklas, Belka, Claus, and Niyazi, Maximilian
Published: 2023
Full Text: View/download PDF

23. Essential guidelines for computational method benchmarking

Author: Weber, Lukas M., Saelens, Wouter, Cannoodt, Robrecht, Soneson, Charlotte, Hapfelmeier, Alexander, Gardner, Paul P., Boulesteix, Anne-Laure, Saeys, Yvan, and Robinson, Mark D.
Subjects: Quantitative Biology - Quantitative Methods, Statistics - Applications
Abstract: In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.
Comment: Minor updates
Published: 2018

24. Use of Resampling Procedures to Investigate Issues of Model Building and Its Stability

Author: Sauerbrei, Willi, Boulesteix, Anne-Laure, George, Stephen, Section editor, Piantadosi, Steven, editor, and Meinert, Curtis L., editor
Published: 2022
Full Text: View/download PDF

25. Benchmarking in cluster analysis: A white paper

Author: Van Mechelen, Iven, Boulesteix, Anne-Laure, Dangl, Rainer, Dean, Nema, Guyon, Isabelle, Hennig, Christian, Leisch, Friedrich, and Steinley, Douglas
Subjects: Statistics - Other Statistics, 62H30
Abstract: Note: A revised version of this is now published. Please cite and read (it's open access): Van Mechelen, I., Boulesteix, A.-L., Dangl, R., Dean, N., Hennig, C., Leisch, F., Steinley, D., Warrens, M. J. (2023). A white paper on good research practices in benchmarking: The case of cluster analysis. WIREs Data Mining and Knowledge Discovery, e1511. https://doi.org/10.1002/widm.1511 To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.
Published: 2018
Full Text: View/download PDF

26. Hyperparameters and Tuning Strategies for Random Forest

Author: Probst, Philipp, Wright, Marvin, and Boulesteix, Anne-Laure
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.
Comment: 19 pages, 2 figures
Published: 2018
Full Text: View/download PDF

27. Tunability: Importance of Hyperparameters of Machine Learning Algorithms

Author: Probst, Philipp, Bischl, Bernd, and Boulesteix, Anne-Laure
Subjects: Statistics - Machine Learning
Abstract: Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.
Comment: 22 pages, 10 tables, 8 figures
Published: 2018

28. Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects

Author: Hornung, Roman and Boulesteix, Anne-Laure
Published: 2022
Full Text: View/download PDF

29. Effective methods to enhance medical students’ cardioversion and transcutaneous cardiac pacing skills retention - a prospective controlled study

Author: Kowalski, Christian, Boulesteix, Anne-Laure, and Harendza, Sigrid
Published: 2022
Full Text: View/download PDF

30. Planning preclinical confirmatory multicenter trials to strengthen translation from basic to clinical research – a multi-stakeholder workshop report

Author: Drude, Natascha Ingrid, Martinez-Gamboa, Lorena, Danziger, Meggie, Collazo, Anja, Kniffert, Silke, Wiebach, Janine, Nilsonne, Gustav, Konietschke, Frank, Piper, Sophie K., Pawel, Samuel, Micheloud, Charlotte, Held, Leonhard, Frommlet, Florian, Segelcke, Daniel, Pogatzki-Zahn, Esther M., Voelkl, Bernhard, Friede, Tim, Brunner, Edgar, Dempfle, Astrid, Haller, Bernhard, Jung, Marie Juliane, Riecken, Lars Björn, Kuhn, Hans-Georg, Tenbusch, Matthias, Higuita, Lina Maria Serna, Remarque, Edmond J., Grüninger-Egli, Servan Luciano, Manske, Katrin, Kobold, Sebastian, Rivalan, Marion, Wedekind, Lisa, Wilcke, Juliane C., Boulesteix, Anne-Laure, Meinhardt, Marcus W., Spanagel, Rainer, Hettmer, Simone, von Lüttichau, Irene, Regina, Carla, Dirnagl, Ulrich, and Toelch, Ulf
Published: 2022
Full Text: View/download PDF

31. COMPANION: development of a patient-centred complexity and casemix classification for adult palliative care patients based on needs and resource use – a protocol for a cross-sectional multi-centre study

Author: Hodiamont, Farina, Schatz, Caroline, Gesell, Daniela, Leidl, Reiner, Boulesteix, Anne-Laure, Nauck, Friedemann, Wikert, Julia, Jansky, Maximiliane, Kranz, Steven, and Bausewein, Claudia
Published: 2022
Full Text: View/download PDF

32. To tune or not to tune the number of trees in random forest?

Author: Probst, Philipp and Boulesteix, Anne-Laure
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again for increasing number of trees. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting it to a computationally feasible large number, depending on convergence properties of the desired performance measure.
Comment: 20 pages, 4 figures
Published: 2017

33. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

Author: Collins, Gary S, primary, Moons, Karel G M, additional, Dhiman, Paula, additional, Riley, Richard D, additional, Beam, Andrew L, additional, Van Calster, Ben, additional, Ghassemi, Marzyeh, additional, Liu, Xiaoxuan, additional, Reitsma, Johannes B, additional, van Smeden, Maarten, additional, Boulesteix, Anne-Laure, additional, Camaradou, Jennifer Catherine, additional, Celi, Leo Anthony, additional, Denaxas, Spiros, additional, Denniston, Alastair K, additional, Glocker, Ben, additional, Golub, Robert M, additional, Harvey, Hugh, additional, Heinze, Georg, additional, Hoffman, Michael M, additional, Kengne, André Pascal, additional, Lam, Emily, additional, Lee, Naomi, additional, Loder, Elizabeth W, additional, Maier-Hein, Lena, additional, Mateen, Bilal A, additional, McCradden, Melissa D, additional, Oakden-Rayner, Lauren, additional, Ordish, Johan, additional, Parnell, Richard, additional, Rose, Sherri, additional, Singh, Karandeep, additional, Wynants, Laure, additional, and Logullo, Patricia, additional
Published: 2024
Full Text: View/download PDF

34. To adjust or not to adjust: it is not the tests performed that count, but how they are reported and interpreted

Author: Boulesteix, Anne-Laure, primary and Hoffmann, Sabine, additional
Published: 2024
Full Text: View/download PDF

35. Raising awareness of uncertain choices in empirical data analysis: A teaching concept toward replicable research practices

Author: Mandl, Maximilian M., primary, Hoffmann, Sabine, additional, Bieringer, Sebastian, additional, Jacob, Anna E., additional, Kraft, Marie, additional, Lemster, Simon, additional, and Boulesteix, Anne-Laure, additional
Published: 2024
Full Text: View/download PDF

36. Modelling Individual Response to Treatment and Its Uncertainty: A Review of Statistical Methods and Challenges for Future Research

Author: Mansmann, Ulrich, Boulesteix, Anne-Laure, Bokulich, Alisa, Series Editor, Renn, Jürgen, Series Editor, Massimi, Michela, Series Editor, Divarci, Lindy, Managing Editor, Arabatzis, Theodore, Editorial Board Member, Douglas, Heather E., Editorial Board Member, Gayon, Jean, Editorial Board Member, Glick, Thomas F., Editorial Board Member, Goenner, Hubert, Editorial Board Member, Heilbron, John, Editorial Board Member, Kormos-Buchwald, Diana, Editorial Board Member, Lehner, Christoph, Editorial Board Member, McLaughlin, Peter, Editorial Board Member, Nieto-Galan, Agustí, Editorial Board Member, Ordine, Nuccio, Editorial Board Member, Schweber, Sylvan S., Editorial Board Member, Simões, Ana, Editorial Board Member, Stachel, John J., Editorial Board Member, Zhang, Baichun, Editorial Board Member, LaCaze, Adam, editor, and Osimani, Barbara, editor
Published: 2020
Full Text: View/download PDF

37. Outcome of patients treated with extracorporeal life support in cardiogenic shock complicating acute myocardial infarction: 1-year result from the ECLS-Shock study

Author: Lackermair, Korbinian, Brunner, Stefan, Orban, Mathias, Peterss, Sven, Orban, Martin, Theiss, Hans D., Huber, Bruno C., Juchem, Gerd, Born, Frank, Boulesteix, Anne-Laure, Bauer, Axel, Pichlmaier, Maximilian, Hausleiter, Jörg, Massberg, Steffen, Hagl, Christian, and Guenther, Sabina P. W.
Published: 2021
Full Text: View/download PDF

38. Temporary mechanical circulatory support in infarct-related cardiogenic shock: an individual patient data meta-analysis of randomised trials with 6-month follow-up

Author: Thiele, Holger, Møller, Jacob E, Henriques, Jose P S, Bogerd, Margriet, Seyfarth, Melchior, Burkhoff, Daniel, Ostadal, Petr, Rokyta, Richard, Belohlavek, Jan, Massberg, Steffen, Flather, Marcus, Hochadel, Matthias, Schneider, Steffen, Desch, Steffen, Freund, Anne, Eiskjær, Hans, Mangner, Norman, Pöss, Janine, Polzin, Amin, Schulze, P Christian, Skurk, Carsten, Zeymer, Uwe, Hassager, Christian, Burkhoff, Daniel, Cohen, Howard, Brunckhorst, Corinna, O'Neill, William W., Thiele, Holger Thiele, Sick, Peter, Boudriot, Enno, Diederich, Klaus-Werner, Hambrecht, Rainer, Niebauer, Josef, Schuler, Gerhard, Seyfarth, Melchior, Sibbing, Dirk, Bauer, Iris, Fröhlich, Georg, Bott-Flügel, Lorenz, Byrne, Robert, Dirschinger, Josef, Kastrati, Adnan, Schömig, Albert, Ouweneel, Dagmar M., Eriksen, Erlend, Sjauw, Krischan D., van Dongen, Ivo M., Hirsch, Alexander, Packer, Erik J.S, Vis, M. Marie, Wykrzykowska, Joanna J., Koch, Karel T., Baan, Jan, De Winter, Robbert J, Piek, Jan J., Lagrand, Wim K, de Mol, Bas A.J.M, Tijssen, Jan G.P., Henriques, José P.S., Møller, Jacob E., Engstrøm, Thomas, Jensen, Lisette Okkels, Eiskjær, Hans, Mangner, Norman, Polzin, Amin, Schulze, P. Christian, Skurk, Carsten, Nordbeck, Peter, Clemmensen, Peter, Panoulas, Vasileios, Zimmer, Sebastian, Schäfer, Andreas, Werner, Nikos, Frydland, Martin, Holmvang, Lene, Kjærgaard, Jesper, Sørensen, Rikke, Lønborg, Jacob, Greve, Matias, Christiansen, Evald H., Linke, Axel, Woitek, Felix J., Westenfeld, Ralf, Moebius-Winkler, Sven, Wachtell, Kristian, Ravn, Hanne Berg, Lassen, Jens Flensted, Boesgaard, Søren, Gerke, Oke, Hassager, Christian, Lackermair, Korbinian, Brunner, Stefan, Orban, Mathias, Peterss, Sven, Orban, Martin, Theiss, Hans D., Huber, Bruno C., Juchem, Gerd, Born, Frank, Boulesteix, Anne-Laure, Bauer, Axel, Pichlmaier, Maximilian, Hausleiter, Jörg, Massberg, Steffen, Hagl, Christian, Guenther, Sabina P.W., Ostadal, Petr, Rokyta, Richard, Karasek, Jiri, Kruger, Andreas, Vondrakova, Dagmar, Janotka, Marek, Naar, Jan, Smalcova, Jana, Hubatova, Marketa, Hromadk, Milan, Volovar, Stefan, Seyfrydova, Miroslava, Jarkovsky, Jiri, Svoboda, Michal, Linhart, Ales, Belohlavek, Jan, Banning, Amerjeet S., Sabaté, Manel, Orban, Martin, Gracey, Jay, López-Sobrino, Teresa, Massberg, Steffen, Kastrati, Adnan, Bogaerts, Kris, Adriaenssens, Tom, Berry, Colin, Erglis, Andrejs, Haine, Steven, Myrmel, Truls, Patel, Sameer, Buera, Irene, Sionis, Alessandro, Vilalta, Victoria, Yusuff, Hakeem, Vrints, Christiaan, Adlam, David, Flather, Marcus, Gershlick, Anthony H., Thiele, Holger, Zeymer, Uwe, Akin, Ibrahim, Behnes, Michael, Rassaf, Tienush, Mahabadi, Amir-Abbas, Lehmann, Ralf, Eitel, Ingo, Graf, Tobias, Seidler, Tim, Schuster, Andreas, Skurk, Carsten, Duerschmied, Daniel, Clemmensen, Peter, Hennersdorf, Marcus, Fichtlscherer, Stephan, Voigt, Ingo, Seyfarth, Melchior, John, Stefan, Ewen, Sebastian, Linke, Axel, Tigges, Eike, Nordbeck, Peter Steffen, Bruch, Leonhard, Jung, Christian, Franz, Jutta, Lauten, Philipp, Goslar, Tomaz, Feistritzer, Hans-Josef, Pöss, Janine, Kirchhof, Eva, Ouarrak, Taoufik, Schneider, Steffen, Desch, Steffen, and Freund, Anne
Abstract: Percutaneous active mechanical circulatory support (MCS) devices are being increasingly used in the treatment of acute myocardial infarction-related cardiogenic shock (AMICS) despite conflicting evidence regarding their effect on mortality. We aimed to ascertain the effect of early routine active percutaneous MCS versus control treatment on 6-month all-cause mortality in patients with AMICS.
Published: 2024
Full Text: View/download PDF

39. Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning

Author: Ellenbach, Nicole, Boulesteix, Anne-Laure, Bischl, Bernd, Unger, Kristian, and Hornung, Roman
Published: 2021
Full Text: View/download PDF

40. Special issue: Artificial intelligence in genomics

Author: Boulesteix, Anne-Laure and Wright, Marvin
Published: 2022
Full Text: View/download PDF

41. On the asymptotic behaviour of the variance estimator of a [formula omitted]-statistic

Author: Fuchs, Mathias, Hornung, Roman, Boulesteix, Anne-Laure, and De Bin, Riccardo
Published: 2020
Full Text: View/download PDF

42. Editorial for the special collection “Towards neutral comparison studies in methodological research”

Author: Boulesteix, Anne‐Laure, primary, Baillie, Mark, additional, Edelmann, Dominic, additional, Held, Leonhard, additional, Morris, Tim P., additional, and Sauerbrei, Willi, additional
Published: 2024
Full Text: View/download PDF

43. A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study

Author: Dunias, Zoë S., primary, Van Calster, Ben, additional, Timmerman, Dirk, additional, Boulesteix, Anne‐Laure, additional, and van Smeden, Maarten, additional
Published: 2024
Full Text: View/download PDF

44. From RNA sequencing measurements to the final results: A practical guide to navigating the choices and uncertainties of gene set analysis

Author: Wünsch, Milena, primary, Sauer, Christina, additional, Callahan, Patrick, additional, Hinske, Ludwig Christian, additional, and Boulesteix, Anne‐Laure, additional
Published: 2024
Full Text: View/download PDF

45. On the optimistic performance evaluation of newly introduced bioinformatic methods

Author: Buchka, Stefan, Hapfelmeier, Alexander, Gardner, Paul P., Wilson, Rory, and Boulesteix, Anne-Laure
Published: 2021
Full Text: View/download PDF

46. Statistical learning approaches in the genetic epidemiology of complex diseases

Author: Boulesteix, Anne-Laure, Wright, Marvin N., Hoffmann, Sabine, and König, Inke R.
Published: 2020
Full Text: View/download PDF

47. A U-statistic estimator for the variance of resampling-based error estimators

Author: Fuchs, Mathias, Hornung, Roman, De Bin, Riccardo, and Boulesteix, Anne-Laure
Subjects: Mathematics - Statistics Theory, Statistics - Machine Learning, Primary: 62 G 09, secondary: 62 G 10, 62 H 15, 62 E 20
Abstract: We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is an unbiased estimator for this minimal variance if the total sample size is at least the double learning set size plus two. In this case, we exhibit such an estimator which is another U-statistic. It enjoys, again, various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithms under weak non-degeneracy assumptions.
Comment: 15 pages, no figures
Published: 2013

48. Immune function testing in sepsis patients receiving sodium selenite

Author: Guo, Anne, Srinath, Jyotsna, Feuerecker, Matthias, Crucian, Brian, Briegel, Josef, Boulesteix, Anne-Laure, Kaufmann, Ines, and Choukèr, Alexander
Published: 2019
Full Text: View/download PDF

49. A Plea for Neutral Comparison Studies in Computational Sciences

Author: Boulesteix, Anne-Laure and Eugster, Manuel J. A.
Subjects: Statistics - Computation, Computer Science - Computer Vision and Pattern Recognition, Statistics - Methodology, Statistics - Machine Learning
Abstract: In a context where most published articles are devoted to the development of "new methods", comparison studies are generally appreciated by readers but surprisingly given poor consideration by many scientific journals. In connection with recent articles on over-optimism and epistemology published in Bioinformatics, this letter stresses the importance of neutral comparison studies for the objective evaluation of existing methods and the establishment of standards by drawing parallels with clinical research.
Published: 2012
Full Text: View/download PDF

50. Regularized estimation of large-scale gene association networks using graphical Gaussian models

Author: Kraemer, Nicole, Schaefer, Juliane, and Boulesteix, Anne-Laure
Subjects: Statistics - Methodology, Statistics - Applications, Statistics - Computation
Abstract: Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. In this article, we investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models. This framework includes various existing methods as well as two new approaches based on ridge regression and adaptive lasso, respectively. These methods are extensively compared both qualitatively and quantitatively within a simulation study and through an application to six diverse real data sets. In addition, all proposed algorithms are implemented in the R package "parcor", available from the R repository CRAN.
Comment: added additional experiments
Published: 2009
Full Text: View/download PDF
