13 results on '"Trong, Dung Nguyen"'
Search Results
2. [Untitled]
- Author
- Hiroshi Shimodaira, Masayuki Kimura, Tu Bao Ho, and Trong Dung Nguyen
- Subjects
- Computer science, Rule induction, Model selection, Conceptual clustering, Decision tree, Machine learning, Visualization, Knowledge extraction, Artificial Intelligence, Data mining, Artificial intelligence, View model, Selection (genetic algorithm)
- Abstract
The process of knowledge discovery in databases consists of several steps that are iterative and interactive. In each application, to go through this process the user has to exploit different algorithms and their settings that usually yield multiple models. Model selection, that is, the selection of appropriate models or algorithms to achieve such models, requires meta-knowledge of algorithm/model and model performance metrics. Therefore, model selection is usually a difficult task for the user. We believe that simplifying the process of model selection for the user is crucial to the success of real-life knowledge discovery activities. As opposed to most related work that aims to automate model selection, in our view model selection is a semiautomatic process, requiring an effective collaboration between the user and the discovery system. For such a collaboration, our solution is to give the user the ability to try various alternatives and to compare competing models quantitatively by performance metrics, and qualitatively by effective visualization. This paper presents our research on model selection and visualization in the development of a knowledge discovery system called D2MS. The paper addresses the motivation of model selection in knowledge discovery and related work, gives an overview of D2MS, and describes its solution to model selection and visualization. It then presents the usefulness of D2MS model selection in two case studies of discovering medical knowledge in hospital data—on meningitis and stomach cancer—using three data mining methods of decision trees, conceptual clustering, and rule induction.
- Published
- 2003
3. VISUALIZATION SUPPORT FOR USER-CENTERED MODEL SELECTION IN KNOWLEDGE DISCOVERY AND DATA MINING
- Author
- Tu Bao Ho, Saori Kawasaki, DucDung Nguyen, and Trong Dung Nguyen
- Subjects
- Computer science, Service discovery, Open Knowledge Base Connectivity, Data science, Visualization, Business process discovery, Knowledge-based systems, Knowledge extraction, Artificial Intelligence, Domain knowledge, Software mining, Data mining
- Abstract
The problem of model selection in knowledge discovery and data mining (the selection of appropriate discovered patterns/models or algorithms to achieve such patterns/models) is generally a difficult task for the user, as it requires meta-knowledge on algorithms/models and model performance metrics. Viewing knowledge discovery as a human-centered process that requires an effective collaboration between the user and the discovery system, our work aims to make model selection in knowledge discovery easier and more effective. For such a collaboration, our solution is to give the user the ability to easily try various alternatives and to compare competing models quantitatively and qualitatively. The basic idea of our solution is to integrate data and knowledge visualization with the knowledge discovery process in order to support the participation of the user. We introduce the knowledge discovery system D2MS, in which several visualization techniques for data and knowledge are developed and integrated into the steps of the knowledge discovery process. The visualizers in D2MS greatly help the user gain better insight into each step of the knowledge discovery process, as well as the relationship between data and discovered knowledge in the whole process.
- Published
- 2001
4. Combining Temporal Abstraction and Data Mining Methods in Medical Data Analysis
- Author
- Trong Dung Nguyen, Si Quang Le, Tu Bao Ho, and Saori Kawasaki
- Subjects
- Association rule learning, Computer science, Data mining, Missing data, Knowledge acquisition, Bottleneck, Expert system, Task (project management), Abstraction (linguistics), Domain (software engineering)
- Abstract
Medicine has been a traditional domain for artificial intelligence (AI) research and application. The focus on expert systems (ES) in medicine in the early days of AI has shifted to intelligent data analysis (IDA) in medicine, especially through machine learning and data mining techniques [Kononenko 01], [Lavrac et al. 97], [Cios 01]. At least two reasons for this trend are the knowledge acquisition bottleneck and the explosive growth of medical databases. Intelligent data analysis in medicine has its own features because of the characteristics of medical data. These characteristics include incompleteness (missing values), incorrectness (noise in data), sparseness (few and/or non-representable patient records available), and inexactness (inappropriate selection of parameters for a given task). Moreover, medical databases are characterized by the particular constraints and difficulties of the privacy-sensitive, heterogeneous, but voluminous data of medicine [Cios and Moore 02].
- Published
- 2005
5. Mining hepatitis data with temporal abstraction
- Author
- Hideto Yokoi, Trong Dung Nguyen, Si Quang Le, Tu Bao Ho, Saori Kawasaki, Dung Duc Nguyen, and Katsuhiko Takabayashi
- Subjects
- Hepatitis, Thesaurus (information retrieval), Information retrieval, Computer science, Data stream mining, Data mining, Temporal database, Abstraction (linguistics)
- Abstract
The hepatitis temporal database collected at Chiba University Hospital between 1982 and 2001 was recently offered as a challenge for KDD research. The database is large: each patient corresponds to 983 tests represented as sequences of irregular timestamp points with different lengths. This paper presents a temporal abstraction approach to mining knowledge from this hepatitis database. Exploiting hepatitis background knowledge and data analysis, we introduce new notions and methods for abstracting short-term changed and long-term changed tests. The abstracted data allow us to apply different machine learning methods for finding knowledge, part of which is considered new and interesting by medical doctors.
- Published
- 2003
6. Discovery of Trends and States in Irregular Medical Temporal Data
- Author
- Saori Kawasaki, Trong Dung Nguyen, and Tu Bao Ho
- Subjects
- Data abstraction, Knowledge extraction, Computer science, Interval (mathematics), Data mining, Abstraction (linguistics), Temporal database, Domain (software engineering)
- Abstract
Temporal abstraction is known as a powerful approach to data abstraction that converts temporal data into intervals with abstracted values, including trends and states. Most temporal abstraction methods, however, have been developed for regular temporal data, and they cannot be used when temporal data are collected irregularly. In this paper we introduce a temporal abstraction approach to irregular temporal data, inspired by a real-life application of a large database in the hepatitis domain.
- Published
- 2003
7. Visualization support for user-centered model selection in knowledge discovery in databases
- Author
- Tu Bao Ho and Trong Dung Nguyen
- Subjects
- Database, Computer science, Process (engineering), Model selection, Machine learning, Visualization, Data visualization, Knowledge extraction, Data pre-processing, Artificial intelligence, View model, Selection (genetic algorithm)
- Abstract
The process of knowledge discovery in databases inherently consists of several steps that are necessarily iterative and interactive. In each application, to go through this process the user has to exploit different algorithms and their settings that usually yield different discovered models. The selection of appropriate discovered models, or of algorithms to achieve such models, referred to as model selection, requires meta-knowledge on algorithms/models and model performance metrics, and is generally a difficult task for the user. Taking account of this difficulty, we consider that the ease of model selection is crucial to the success of real-life knowledge discovery activities. Unlike most related work, which aims at automatic model selection, in our view model selection should be a semiautomatic task requiring an effective collaboration between the user and the discovery system. For such a collaboration, our solution is to give the user the ability to easily try various alternatives and to compare competing models quantitatively by performance metrics, and qualitatively by effective visualization. This paper presents our research on such model selection and visualization in the development of a knowledge discovery system called D2MS.
- Published
- 2002
8. Visualization method and tool for interactive learning of large decision trees
- Author
- Tu Bao Ho and Trong Dung Nguyen
- Subjects
- Incremental decision tree, Computer science, Process (engineering), Decision tree learning, ID3 algorithm, Decision tree, Machine learning, Data science, Interactive Learning, Visualization, Tree (data structure), Artificial intelligence
- Abstract
When learning from large datasets, decision tree induction programs often produce very large trees. How to efficiently visualize trees in the learning process, particularly large trees, is still an open question and currently requires efficient tools. This paper presents a visualization method and tool for interactive learning of large decision trees that includes a new visualization technique called T2.5D (which stands for Trees 2.5 Dimensions). After a brief discussion of requirements for tree visualizers and related work, the paper focuses on techniques developed for two issues: (1) how to efficiently visualize large decision trees; and (2) how to visualize decision trees in the learning process.
- Published
- 2002
9. A User-Centered Visual Approach to Data Mining
- Author
- DucDung Nguyen, Tu Bao Ho, and Trong Dung Nguyen
- Subjects
- Computer science, Process (engineering), Model selection, Visualization, Active participation, Visual approach, Text mining, Knowledge extraction, Key (cryptography), Data mining
- Abstract
We present a human-centered approach to model selection in machine learning and data mining that emphasizes and facilitates the active participation of the user in the knowledge discovery process, with quantitative and qualitative evaluation of patterns/models. The key idea is that model selection should result from combining a quantitative evaluation of model characteristics and performance metrics with a qualitative evaluation of patterns/models by the user. We develop data mining methods integrated with visualization tools in the user-centered visual system D2MS (Data Mining with Model Selection). Finally, we present a case study of D2MS in mining stomach cancer data.
- Published
- 2002
10. A Scalable Algorithm for Rule Post-pruning of Large Decision Trees
- Author
- Hiroshi Shimodaira, Tu Bao Ho, and Trong Dung Nguyen
- Subjects
- Incremental decision tree, Computer science, Decision tree learning, ID3 algorithm, Decision tree, Machine learning, Set (abstract data type), Tree (data structure), Tree structure, Knowledge extraction, Pruning (decision trees), Artificial intelligence, Data mining
- Abstract
Decision tree learning has become a popular and practical method in data mining because of its high predictive accuracy and ease of use. However, a set of if-then rules generated from large trees may be preferred in many cases for at least three reasons: (i) large decision trees are difficult to understand, as we may not see their hierarchical structure or may get lost in navigating them; (ii) the tree structure may cause individual subconcepts to be fragmented (this is sometimes known as the "replicated subtree" problem); (iii) it is easier to combine newly discovered rules with existing knowledge in a given domain. To fulfill that need, the popular decision tree learning system C4.5 applies a rule post-pruning algorithm to transform a decision tree into a rule set. However, because it uses a global optimization strategy, C4.5rules runs extremely slowly on large datasets. On the other hand, rule post-pruning algorithms that learn a set of rules by the separate-and-conquer strategy, such as CN2, IREP, or RIPPER, can scale to large datasets, but they suffer from the crucial problem of overpruning and often do not achieve accuracy as high as that of C4.5. This paper proposes a scalable algorithm for rule post-pruning of large decision trees that employs incremental pruning with improvements in order to overcome the overpruning problem. Experiments show that the new algorithm can produce rule sets that are as accurate as those generated by C4.5 and that it is scalable to large datasets.
- Published
- 2001
11. Interactive Visualization in Mining Large Decision Trees
- Author
- Trong Dung Nguyen, Hiroshi Shimodaira, and Tu Bao Ho
- Subjects
- Computer science, Information processing, Decision tree, Knowledge acquisition, Field (computer science), Information visualization, Tree (data structure), Data visualization, Data mining, Interactive visualization
- Abstract
This paper presents a tree visualizer that combines several techniques from the field of information visualization to efficiently handle large decision trees in an interactive mining system.
- Published
- 2000
12. Induction of Decision Trees Based on the Rough Set Theory
- Author
- Masayuki Kimura, Trong Dung Nguyen, and Tu Bao Ho
- Subjects
- Dependency (UML), Computer science, Dominance-based rough set approach, Decision tree, Feature selection, Machine learning, Measure (mathematics), Data set, Relevance (information retrieval), Rough set, Artificial intelligence
- Abstract
This paper has two objectives. The first is the introduction of a new measure (R-measure) of dependency between groups of attributes in a data set, inspired by the notion of attribute dependency in rough set theory. The second is the application of this measure to the problem of attribute selection in decision tree induction, and an experimental comparative evaluation of decision tree systems using R-measure and other attribute selection measures, most of which are widely used in machine learning: gain-ratio, gini-index, d_N distance, relevance, and χ².
- Published
- 1998
13. Interactive visualisation for predictive modelling with decision tree induction
- Author
- Tu Bao Ho and Trong Dung Nguyen
- Subjects
- Decision support system, Computer science, Model selection, Decision tree, Machine learning, Interactive Learning, Visualization, Information system, Artificial intelligence, Data mining, Interactive visualization, Selection (genetic algorithm)
- Abstract
In this paper we describe CABRO, a system for decision tree induction (DTI) that contributes to the combination of machine learning, visualisation and model selection techniques. We first discuss some issues in data mining and briefly introduce R-measure for the attribute selection problem in DTI. We then present the DTI interactive visualisation system CABRO, based on R-measure and a combination of several DTI techniques, in which we focus on solutions to two problems: (1) support for understanding of large decision trees, and (2) support for interactive learning and model selection.
- Published
- 1998
Discovery Service for Jio Institute Digital Library