1. Biomarker identification by interpretable maximum mean discrepancy.
- Author
- Adamer, Michael F; Brüningk, Sarah C; Chen, Dexiong; Borgwardt, Karsten
- Subjects
- STATISTICAL hypothesis testing, FEATURE selection, UNIVARIATE analysis, GENE expression, SAMPLE size (Statistics)
- Abstract
Motivation: In many biomedical applications, we are confronted with paired groups of samples, such as treated versus control. The aim is to detect discriminating features, i.e. biomarkers, based on high-dimensional (omics-) data. This problem can be phrased more generally as a two-sample problem requiring statistical significance testing to establish differences, and interpretations to identify distinguishing features. The multivariate maximum mean discrepancy (MMD) test quantifies group-level differences, whereas statistically significantly associated features are usually found by univariate feature selection. Currently, few general-purpose methods simultaneously perform multivariate feature selection and two-sample testing.

Results: We introduce a sparse, interpretable, and optimized MMD test (SpInOpt-MMD) that enables two-sample testing and feature selection in the same experiment. SpInOpt-MMD is a versatile method and we demonstrate its application to a variety of synthetic and real-world data types including images, gene expression measurements, and text data. SpInOpt-MMD is effective in identifying relevant features in small sample sizes and outperforms other feature selection methods such as SHapley Additive exPlanations and univariate association analysis in several experiments.

Availability and implementation: The code and links to our public data are available at https://github.com/BorgwardtLab/spinoptmmd. [ABSTRACT FROM AUTHOR]
- Published
- 2024
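The abstract describes an MMD two-sample test combined with feature selection. The following is a minimal sketch of the general idea only, not the authors' SpInOpt-MMD implementation: it computes a biased squared-MMD estimate with a Gaussian kernel in which each feature is scaled by a non-negative weight, and obtains a p-value by permutation. The function names, the `weights` and `bandwidth` parameters, and the choice of a permutation test are all assumptions made for illustration; the weights here are supplied by the caller rather than optimized.

```python
import numpy as np


def weighted_gaussian_kernel(A, B, weights, bandwidth=1.0):
    """Gaussian kernel after scaling each feature by its weight (hypothetical helper)."""
    Aw = A * weights
    Bw = B * weights
    # Pairwise squared Euclidean distances between rows of Aw and Bw.
    sq = (Aw ** 2).sum(axis=1)[:, None] + (Bw ** 2).sum(axis=1)[None, :] - 2.0 * Aw @ Bw.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2_biased(X, Y, weights, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples X and Y."""
    Kxx = weighted_gaussian_kernel(X, X, weights, bandwidth)
    Kyy = weighted_gaussian_kernel(Y, Y, weights, bandwidth)
    Kxy = weighted_gaussian_kernel(X, Y, weights, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, weights, bandwidth=1.0, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for the two-sample problem."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(X, Y, weights, bandwidth)
    Z = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        stat = mmd2_biased(Z[idx[:n]], Z[idx[n:]], weights, bandwidth)
        exceed += int(stat >= observed)
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

In this sketch, a weight of zero removes a feature from the test entirely, so a sparse weight vector that still yields a significant test points to the features that distinguish the two groups. How such weights are learned is the subject of the paper and is not reproduced here.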