9 results on '"van de Wiel, Mark A."'
Search Results
2. Fast Marginal Likelihood Estimation of Penalties for Group-Adaptive Elastic Net.
- Author
- van Nee, Mirrelijn M., van de Brug, Tim, and van de Wiel, Mark A.
- Subjects
ASYMPTOTIC normality, GENOMICS
- Abstract
Elastic net penalization is widely used in high-dimensional prediction and variable selection settings. Auxiliary information on the variables, for example, groups of variables, is often available. Group-adaptive elastic net penalization exploits this information to potentially improve performance by estimating group penalties, thereby penalizing important groups of variables less than other groups. Estimating these group penalties is, however, hard due to the high dimension of the data. Existing methods are computationally expensive or not generic in the type of response. Here we present a fast method for estimation of group-adaptive elastic net penalties for generalized linear models. We first derive a low-dimensional representation of the Taylor approximation of the marginal likelihood for group-adaptive ridge penalties, to efficiently estimate these penalties. Then we show by using asymptotic normality of the linear predictors that this marginal likelihood approximates that of elastic net models. The ridge group penalties are then transformed to elastic net group penalties by matching the ridge prior variance to the elastic net prior variance as function of the group penalties. The method allows for overlapping groups and unpenalized variables, and is easily extended to other penalties. For a model-based simulation study and two cancer genomics applications we demonstrate a substantially decreased computation time and improved or matching performance compared to other methods. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2023
3. Dynamics of methylated cell-free DNA in the urine of non-small cell lung cancer patients.
- Author
- Bach, Sander, Wever, Birgit M.M., van de Wiel, Mark A., Veltman, Joris D., Hashemi, Sayed M.S., Kazemier, Geert, Bahce, Idris, and Steenbergen, Renske D.M.
- Subjects
CELL-free DNA, NON-small-cell lung carcinoma, CIRCULATING tumor DNA, URODYNAMICS, CANCER patients, INTRACLASS correlation
- Abstract
High levels of methylated DNA in urine represent an emerging biomarker for non-small cell lung cancer (NSCLC) detection and are the subject of ongoing research. This study aimed to investigate the circadian variation of urinary cell-free DNA (cfDNA) abundance and methylation levels of cancer-associated genes in NSCLC patients. In this prospective study of 23 metastatic NSCLC patients with active disease, patients were asked to collect six urine samples during the morning, afternoon, and evening of two subsequent days. Urinary cfDNA concentrations and methylation levels of CDO1, SOX17, and TAC1 were measured at each time point. Circadian variation and between- and within-subject variability were assessed using linear mixed models. Variability was estimated using the Intraclass Correlation Coefficient (ICC), representing reproducibility. No clear circadian patterns could be recognized for cfDNA concentrations or methylation levels across the different sampling time points. Significantly lower cfDNA concentrations were found in males (p=0.034). For cfDNA levels, the between- and within-subject variability were comparable, rendering an ICC of 0.49. For the methylation markers, ICCs varied considerably, ranging from 0.14 to 0.74. Test reproducibility could be improved by collecting multiple samples per patient. In conclusion, there is no preferred collection time for NSCLC detection in urine using methylation markers, but single measurements should be interpreted carefully, and serial sampling may increase test performance. This study contributes to the limited understanding of cfDNA dynamics in urine and the continued interest in urine-based liquid biopsies for cancer diagnostics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
4. Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models.
- Author
- Veerman, Jurre R., Leday, Gwenaël G. R., and van de Wiel, Mark A.
- Subjects
HERITABILITY, MULTICOLLINEARITY, WEIGHT gain, REGRESSION analysis, GENETICS
- Abstract
For high-dimensional linear regression models, we review and compare several estimators of variances τ² and σ² of the random slopes and errors, respectively. These variances relate directly to ridge regression penalty λ and heritability index h², often used in genetics. Several estimators of these, either based on cross-validation (CV) or maximum marginal likelihood (MML), are also discussed. The comparisons include several cases of the high-dimensional covariate matrix such as multi-collinear covariates and data-derived ones. Moreover, we study robustness against model misspecifications such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented. First, to the high-dimensional linear mixed effects model, with REML as an alternative to MML. Second, to the conjugate Bayesian setting, shown to be a good alternative. Third, and most prominently, to generalized linear models for which we derive a computationally efficient MML estimator by re-writing the marginal likelihood as an n-dimensional integral. For Poisson and Binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ as compared to CV. Software is provided to enable reproduction of all results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
5. Fast Cross-validation for Multi-penalty High-dimensional Ridge Regression.
- Author
- van de Wiel, Mark A., van Nee, Mirrelijn M., and Rauschenberger, Armin
- Subjects
- *MULTICOLLINEARITY, *LEAST squares, *MAGNITUDE (Mathematics), *GENOMICS
- Abstract
High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. A Test for Partial Differential Expression.
- Author
- VAN WIERINGEN, Wessel N., VAN DE WIEL, Mark A., and VAN DER VAART, Aad W.
- Subjects
- *NONPARAMETRIC statistics, *CANCER genes, *ALGORITHMS, *MEDICAL statistics, *CANCER, *STATISTICS
- Abstract
Even in a single-tissue type cancer is often a collection of different diseases, each with its own genetic mechanism. Consequently, a gene may be expressed in some but not all of the tissues in a sample. Differentially expressed genes are commonly detected by methods that test for a shift in location that ignore the possibility of heterogeneous expression. This article proposes a two-sample test statistic designed to detect shifts that occur in only a part of the sample (partial shifts). The statistic is based on the mixing proportion in a nonparametric mixture and minimizes a weighted distance function. The test is shown to be asymptotically distribution free and consistent, and an efficient permutation-based algorithm for estimating the p value is discussed. A simulation study shows that the test is indeed more powerful than the two-sample t test and the Cramér-von Mises test for detecting partial shifts and is competitive for whole-sample shifts. The use of the test is illustrated on real-life cancer datasets, where the test is able to find genes with clear heterogeneous expression associated with reported subtypes of the cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2008
7. A Lego System for Conditional Inference.
- Author
- Hothorn, Torsten, Hornik, Kurt, van de Wiel, Mark A., and Zeileis, Achim
- Subjects
ASYMPTOTIC distribution, DATA analysis, STATISTICAL hypothesis testing, MATHEMATICAL statistics, DISTRIBUTION (Probability theory)
- Abstract
Conditioning on the observed data is an important and flexible design principle for statistical test procedures. Although generally applicable, permutation tests currently in use are limited to the treatment of special cases, such as contingency tables or K-sample problems. A new theoretical framework for permutation tests opens up the way to a unified and generalized view. This article argues that the transfer of such a theory to practical data analysis has important implications in many applications and requires tools that enable the data analyst to compute on the theoretical concepts as closely as possible. We reanalyze four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the "classical" test procedures. [ABSTRACT FROM AUTHOR]
- Published
- 2006
8. The null distribution of Kendall's rank correlation statistic in the presence of ties.
- Author
- Van De Wiel, Mark A.
- Subjects
- *STATISTICAL correlation, *MATHEMATICAL statistics, *REGRESSION analysis, *MONTE Carlo method, *MATHEMATICAL models, *NUMERICAL analysis
- Abstract
We present new techniques for computing exact p-values of Kendall's rank correlation statistic when ties are present. An explicit formula for the probability-generating function of this statistic in the case of ties in one ranking is derived. This allows for computation of exact p-values for sample sizes up to 100, regardless of the tie structure. When ties are present in both rankings, one has to fall back on enumeration methods. We discuss how a large part of this enumeration can be avoided. The exact results for ties in one ranking are used to develop an efficient Monte Carlo simulation algorithm for approximating p-values in case of ties in both rankings. Finally, we shortly discuss computer implementations of our methods, which are available on the internet. [ABSTRACT FROM AUTHOR]
- Published
- 2005
9. Exact Distributions of Multiple Comparisons Rank Statistics.
- Author
- van de Wiel, Mark A.
- Subjects
- *ALGORITHMS, *FOUNDATIONS of arithmetic, *MATHEMATICAL functions, *DIFFERENTIAL equations, *MATHEMATICAL analysis, *MATHEMATICAL optimization
- Abstract
Computation of exact p values of multiple comparisons rank statistics is often a very time-consuming task. In some cases approximations are available, but they are often unsatisfactory when samples are small or the number of ties is large. Some existing tables of exact critical values contain errors and are limited to small samples without ties and to Wilcoxon scores. Therefore, this article proposes a new algorithm to compute exact p values of generalized Steel statistics for one-way classification. This recursive algorithm builds up a generating function that represents the null distribution of the statistic. Several techniques are used to reduce computing time considerably. An improvement of an existing algorithm is given for computing exact p values, which is valid for a smaller class of statistics than the recursion. However, for this class of statistics, this algorithm outperforms the recursion. How to compute tight bounds on the p value when exact computation is too time-consuming is discussed. Also discussed is when to use which method (exact, bounds, normal approximation, or simulation) for computation of p values or critical values. [ABSTRACT FROM AUTHOR]
- Published
- 2002