5 results on "Kui-Yu Lin"
Search Results
2. Causality-based Feature Selection: Methods and Evaluations.
- Author
- KUI YU, XIANJIE GUO, LIN LIU, JIUYONG LI, HAO WANG, ZHAOLONG LING, and XINDONG WU
- Subjects
- FEATURE selection, EVALUATION methodology, MACHINE learning, PREDICTION models
- Abstract
- Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. It has been shown that knowledge about the causal relationships between features and the class variable has potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to make it easy to compare new methods with existing ones, we develop the first open-source package, called CausalFS, which consists of most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms with both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research. [ABSTRACT FROM AUTHOR]
- Published
- 2021
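To make the idea of causality-based (Markov-blanket-style) feature selection concrete, here is a minimal sketch of the classic grow/shrink (IAMB-style) search that several of the algorithms surveyed in this paper build on. This is not the CausalFS package API; the function names `fisher_z_pvalue` and `iamb`, the Fisher's-z partial-correlation test (a linear-Gaussian assumption), and the 0.05 threshold are illustrative choices only.

```python
# Sketch of Markov-blanket-based feature selection with a conditional
# independence (CI) test; assumes roughly linear-Gaussian data.
import numpy as np
from scipy import stats

def fisher_z_pvalue(data, x, y, z):
    """p-value for the CI test X independent of Y given Z, via partial correlation."""
    idx = [x, y] + list(z)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)                           # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])    # partial correlation of x, y given z
    r = np.clip(r, -0.999999, 0.999999)
    n = data.shape[0]
    stat = np.sqrt(n - len(z) - 3) * 0.5 * np.log((1 + r) / (1 - r))   # Fisher's z
    return 2 * (1 - stats.norm.cdf(abs(stat)))

def iamb(data, target, alpha=0.05):
    """IAMB-style grow/shrink search for the Markov blanket of column `target`."""
    mb, changed = [], True
    while changed:                                        # growing phase
        changed = False
        rest = [v for v in range(data.shape[1]) if v != target and v not in mb]
        if not rest:
            break
        pvals = [fisher_z_pvalue(data, target, v, mb) for v in rest]
        best = int(np.argmin(pvals))
        if pvals[best] < alpha:                           # add the strongest dependent candidate
            mb.append(rest[best])
            changed = True
    for v in list(mb):                                    # shrinking phase: drop false positives
        others = [u for u in mb if u != v]
        if fisher_z_pvalue(data, target, v, others) >= alpha:
            mb.remove(v)
    return sorted(mb)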
3. Multi-Source Causal Feature Selection.
- Author
- Yu, Kui, Liu, Lin, Li, Jiuyong, Ding, Wei, and Le, Thuc Duy
- Subjects
- FEATURE selection, INVARIANT sets, ALGORITHMS, GENE expression, PREDICTION models
- Abstract
- Causal feature selection has attracted much attention in recent years, as the causal features selected imply the causal mechanism related to the class attribute, leading to more reliable prediction models built using them. Currently, there is a need to develop multi-source feature selection methods, since in many applications data for studying the same problem has been collected from various sources, such as multiple gene expression datasets obtained from different experiments for studying the causes of the same disease. However, state-of-the-art causal feature selection methods generally tackle a single dataset, and a direct application of these methods to multiple datasets will produce unreliable results, as the datasets may have different distributions. To address these challenges, by utilizing the concept of causal invariance from causal inference, we first formulate the problem of causal feature selection with multiple datasets as a search for an invariant set across the datasets, then give the upper and lower bounds of the invariant set, and finally propose a new Multi-source Causal Feature Selection algorithm, MCFS. Using synthetic and real-world datasets and 16 feature selection methods, extensive experiments have validated the effectiveness of MCFS. [ABSTRACT FROM AUTHOR]
- Published
- 2020
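The MCFS algorithm itself is not reproduced here, but the causal-invariance idea it builds on can be sketched with a simple check: a candidate feature set is kept only if, after regressing the target on it with pooled data, the residuals look the same in every source dataset. The linear model, the ANOVA/Levene homogeneity tests, and the function name `is_invariant` are simplifying assumptions for illustration, not the paper's procedure.

```python
# Sketch of a causal-invariance check across multiple source datasets.
import numpy as np
from scipy import stats

def is_invariant(datasets, feature_idx, target_idx, alpha=0.05):
    """True if residuals of target ~ features are homogeneous across all datasets."""
    X = np.vstack([d[:, feature_idx] for d in datasets])
    y = np.concatenate([d[:, target_idx] for d in datasets])
    X1 = np.column_stack([np.ones(len(y)), X])            # pooled least-squares fit
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    groups, start = [], 0                                  # split residuals back by source
    for d in datasets:
        groups.append(residuals[start:start + len(d)])
        start += len(d)
    mean_p = stats.f_oneway(*groups).pvalue                # equal residual means across sources?
    var_p = stats.levene(*groups).pvalue                   # equal residual variances across sources?
    return min(mean_p, var_p) >= alpha
```

A feature set that fails this check varies with the data source and is unlikely to reflect an invariant causal mechanism; MCFS additionally bounds the search space using the upper and lower bounds of the invariant set described in the abstract.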
4. Learning Markov Blankets From Multiple Interventional Data Sets.
- Author
- Yu, Kui, Liu, Lin, and Li, Jiuyong
- Subjects
- DATA distribution, CAUSAL models, MACHINE learning, GLOBAL method of teaching
- Abstract
- Learning Markov blankets (MBs) plays an important role in many machine learning tasks, such as causal Bayesian network structure learning, feature selection, and domain adaptation. Since the variables included in the MB of a target variable of interest have causal relationships with the target, the MB can serve as the basis for learning the global structure of a causal Bayesian network or as a reliable and robust feature set for classification, both within the same domain and across domains. In this article, we study the problem of learning the MB of a target variable from multiple interventional data sets. Data sets obtained from interventional experiments contain richer causal information than passively observed (observational) data for MB discovery. However, almost all existing MB discovery methods are designed for learning MBs from a single observational data set. To learn MBs from multiple interventional data sets, we face two challenges: 1) unknown intervention variables and 2) nonidentical data distributions. To address these challenges, we theoretically analyze: 1) under what conditions we can find the correct MB of a target variable and 2) under what conditions we can identify the causes of the target variable by discovering its MB. Based on this theoretical analysis, we propose a new algorithm for learning MBs from multiple interventional data sets and present the conditions/assumptions that ensure the correctness of the algorithm. To the best of our knowledge, this article is the first to present a theoretical analysis of the conditions for MB discovery from multiple interventional data sets and an algorithm for finding MBs in relation to those conditions. Using benchmark Bayesian networks and real-world data sets, the experiments have validated the effectiveness and efficiency of the algorithm proposed in this article. [ABSTRACT FROM AUTHOR]
- Published
- 2020
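As a toy illustration of why interventional data carries richer causal information than observational data (not the paper's algorithm), the sketch below simulates a three-variable linear SEM X1 -> X2 -> X3 and applies a hard intervention do(X2): the X1-X2 association disappears while the X2 -> X3 mechanism is preserved. The coefficients and sample sizes are arbitrary.

```python
# Toy linear SEM with and without a hard intervention on X2.
import numpy as np

def simulate(n, intervene_on_x2=False, seed=0):
    """Simulate X1 -> X2 -> X3; optionally apply the hard intervention do(X2)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    if intervene_on_x2:
        x2 = rng.normal(size=n)                   # do(X2): the edge X1 -> X2 is cut
    else:
        x2 = 0.8 * x1 + 0.3 * rng.normal(size=n)
    x3 = 0.7 * x2 + 0.3 * rng.normal(size=n)
    return x1, x2, x3

x1, x2, x3 = simulate(5000)
print(round(np.corrcoef(x1, x2)[0, 1], 2))        # strong X1-X2 association (observational)
x1, x2, x3 = simulate(5000, intervene_on_x2=True, seed=1)
print(round(np.corrcoef(x1, x2)[0, 1], 2))        # ~0: the intervention severs X1 -> X2
print(round(np.corrcoef(x2, x3)[0, 1], 2))        # the X2 -> X3 mechanism is preserved
```

Comparing dependence patterns across such interventional regimes is the extra signal that MB discovery from multiple interventional data sets can exploit, even when the intervened variables are unknown.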
5. Mining Markov Blankets Without Causal Sufficiency.
- Author
- Yu, Kui, Liu, Lin, Li, Jiuyong, and Chen, Huanhuan
- Subjects
- ARTIFICIAL neural networks, MARKOV processes, MACHINE learning
- Abstract
- Markov blankets (MBs) in Bayesian networks (BNs) play an important role in both local causal discovery and large-scale BN structure learning. Almost all existing MB discovery algorithms are designed under the assumption of causal sufficiency, which states that there are no latent common causes for two or more of the observed variables in the data. However, latent common causes are ubiquitous in many applications, and hence this assumption is often violated in practice. Thus, developing algorithms for discovering MBs without assuming causal sufficiency is of practical significance and is crucial for causal structure learning from real-world data. In this paper, we focus on addressing this problem. Specifically, we adopt the maximal ancestral graph (MAG) model to represent latent common causes and to define MBs without assuming causal sufficiency. Then, we propose an effective and efficient algorithm to discover the MB of a target variable in an MAG. Using benchmark and real-world data sets, the experiments validate the algorithm proposed in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2018
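A minimal simulation (illustrative only, not the MAG-based algorithm from the paper) of why dropping causal sufficiency matters: a hidden common cause L of X and Y makes them dependent, and conditioning on the remaining observed variable W does not remove that dependence, so a method that assumes all common causes are observed would be misled. Variable names and coefficients are made up for the example.

```python
# Latent common cause L of X and Y; W is an observed child of X; L is never observed.
import numpy as np

rng = np.random.default_rng(1)
n = 20000
L = rng.normal(size=n)                  # latent common cause
X = 0.9 * L + 0.4 * rng.normal(size=n)
Y = 0.9 * L + 0.4 * rng.normal(size=n)
W = 0.8 * X + 0.4 * rng.normal(size=n)  # observed child of X

def partial_corr(a, b, c):
    """Correlation of a and b after residualizing both on c (simple linear adjustment)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print("corr(X, Y)     =", round(np.corrcoef(X, Y)[0, 1], 3))   # dependence induced purely by L
print("corr(X, Y | W) =", round(partial_corr(X, Y, W), 3))      # still clearly nonzero:
# no conditioning set drawn from the observed variables explains away the latent confounding,
# which is the situation the MAG representation is designed to handle.
```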