1. Universal Feature Selection for Simultaneous Interpretability of Multitask Datasets
- Author
Raymond, Matt; Saldinger, Jacob Charles; Elvati, Paolo; Scott, Clayton; and Violi, Angela
- Subjects
Computer Science - Machine Learning
- Abstract
Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, which limits their applicability to large datasets, or they make restrictive assumptions about feature-property relationships, which hinders their ability to capture complex interactions. BoUTS, a general and scalable feature selection algorithm, overcomes these limitations by identifying both universal features, which are relevant to all datasets, and task-specific features, which are predictive for specific subsets. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to that of specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets and suggest deep connections among seemingly disparate chemical datasets. We expect these results to have important repercussions for manually guided inverse problems. Beyond its current application, BoUTS holds immense potential for elucidating data-poor systems by leveraging information from similar data-rich systems. BoUTS represents a significant leap in cross-domain feature selection, potentially leading to advancements in various scientific fields.
- Comment
Main text: 14 pages, 3 figures, 1 table; Supporting Information (SI): 7 pages, 1 figure, 4 tables, 3 algorithms
- Published
- 2024