Descriptor: "RANDOM forest algorithms" / Language: Thai / Topic: acquisition of data - Searchworks@Jio Institute Digital Library Search Results

Author: วสวัตติ์ อินทร์แปลง and จารี ทองคำ
Subjects: *RANDOM forest algorithms, *ACQUISITION of data, *SUPPORT vector machines, *ELECTRONIC data processing, *DATA analysis, *NEAREST neighbor analysis (Statistics)
Abstract: Text mining is one of the most effective processes for analyzing textual data. Text mining techniques can currently be classified in many ways. This research aims to identify the most effective of five techniques: Naïve Bayes, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), C4.5, and Random Forest. The dataset comprised 3,798 messages, all written by viewers. The messages were divided into two classes, positive and negative. Notably, only adverbs and slang words were selected as the core features for separating positive from negative messages. Analysis of the data revealed that the two classes were imbalanced, so SMOTE (Synthetic Minority Over-sampling Technique) was applied as a filter to oversample the minority class. 10-fold cross-validation was used to split the data into training and test sets, and precision, recall, and accuracy served as the criteria for selecting the most effective model. The results showed that K-Nearest Neighbor classified the messages best, with a precision of 99.75%, a recall of 100%, and an accuracy of 99.87%. [ABSTRACT FROM AUTHOR]
Published: 2020
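The evaluation pipeline summarized in the abstract (SMOTE oversampling, 10-fold cross-validation, a K-Nearest Neighbor classifier, and precision/recall/accuracy) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The numeric feature vectors stand in for the adverb and slang features. Euclidean distance, k = 3 neighbours, stratified folds, and the choice of which class counts as the `target` for precision and recall are all assumptions. The helper names (`smote`, `oversample`, `knn_predict`, `stratified_folds`, `evaluate`) are hypothetical.

```python
import random
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(minority, n_new, k=5, rng=random):
    """Generate n_new synthetic samples by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(X, y, k=5, rng=random):
    """Oversample every smaller class with SMOTE up to the largest class size."""
    counts = Counter(y)
    n_major = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        if n < n_major:
            minority = [x for x, lab in zip(X, y) if lab == label]
            new = smote(minority, n_major - n, k, rng)
            X_out += new
            y_out += [label] * len(new)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training samples nearest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda t: euclidean(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stratified_folds(y, n_folds=10, rng=random):
    """Split sample indices into n_folds folds that preserve class proportions."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def evaluate(X, y, target, n_folds=10, k=3, seed=0):
    """Cross-validated precision, recall and accuracy for the `target` class."""
    rng = random.Random(seed)
    confusion = Counter()  # keys: (actual == target, predicted == target)
    for test_idx in stratified_folds(y, n_folds, rng):
        test = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test]
        # SMOTE is applied to the training portion of the fold only.
        X_tr, y_tr = oversample([X[i] for i in train_idx],
                                [y[i] for i in train_idx], rng=rng)
        for i in test_idx:  # the test fold keeps only real, unmodified samples
            pred = knn_predict(X_tr, y_tr, X[i], k)
            confusion[(y[i] == target, pred == target)] += 1
    tp, fp = confusion[(True, True)], confusion[(False, True)]
    fn, tn = confusion[(True, False)], confusion[(False, False)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data, NOT the study's 3,798 messages: each vector
    # stands in for two feature scores (e.g. adverb- and slang-based).
    X = [[rng.gauss(3, 1), rng.gauss(1, 1)] for _ in range(80)]
    X += [[rng.gauss(1, 1), rng.gauss(3, 1)] for _ in range(20)]
    y = ["positive"] * 80 + ["negative"] * 20
    p, r, a = evaluate(X, y, target="negative")
    print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
```

In this sketch, SMOTE runs only inside each training fold, so no synthetic point derived from a test message leaks into training. The abstract does not state at which stage the authors applied the SMOTE filter.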
