51. An attribute extending method to improve learning performance for small datasets
- Author
- Liang Sian Lin, Yu Chun Chiang, Der-Chiang Li, and Hung Yu Chen
- Subjects
0209 industrial biotechnology ,Computer science ,business.industry ,Cognitive Neuroscience ,Sample (statistics) ,02 engineering and technology ,Extension (predicate logic) ,Space (commercial competition) ,Machine learning ,computer.software_genre ,Fuzzy logic ,Representativeness heuristic ,Computer Science Applications ,020901 industrial engineering & automation ,Null (SQL) ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer
- Abstract
A small dataset often makes it difficult to build a reliable learning model, so some researchers have proposed virtual sample generation (VSG) methods, which add artificial samples to a small dataset to increase its size. However, for some datasets the distributional assumptions behind VSG methods may be unclear, and when the data have only a few attributes such approaches may not work effectively. Other researchers have therefore proposed attribute extension methods, which generate new attributes to map the data into a higher-dimensional space. Unfortunately, the resulting dataset may become sparse, with many null or zero values in the extended attributes, and a large number of such attributes reduces the representativeness of instances for the learning model. Therefore, based on fuzzy theories, this paper proposes a novel sample extending attribute (SEA) method that extends a suitable number of attributes to improve small-dataset learning. To verify the validity of the SEA method, the paper builds predictive models with support vector regression (SVR) and back-propagation neural networks (BPNN) on two real cases and two public datasets, and uses the paired t-test to examine the statistical significance of the improvement. The experimental results show that the proposed SEA method can effectively improve the learning accuracy of small datasets.
- Published
- 2018
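
The abstract mentions a paired t-test for checking whether the improvement is statistically significant. The following is a minimal sketch of that test only, not of the SEA method itself. The function name `paired_t_statistic` and the error values are hypothetical and chosen purely for illustration; they are not results from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_before, errors_after):
    """Return (t, degrees_of_freedom) for two paired samples.

    A positive t means errors_before is larger on average,
    i.e. the second condition reduced the error.
    """
    if len(errors_before) != len(errors_after):
        raise ValueError("paired samples must have equal length")
    n = len(errors_before)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-pair differences, as the paired test requires.
    diffs = [b - a for b, a in zip(errors_before, errors_after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run prediction errors (illustrative numbers only):
# the same model trained on the original and on the extended attributes.
errors_original = [0.31, 0.28, 0.35, 0.30, 0.33]
errors_extended = [0.27, 0.26, 0.30, 0.29, 0.28]

t_stat, dof = paired_t_statistic(errors_original, errors_extended)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The statistic would then be compared with the critical value of Student's t distribution for the given degrees of freedom. The sketch uses only the standard library so that it stays self-contained; a statistics package would normally also report the p-value.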