Pseudo support vector domain description to train large-size and continuously growing datasets
- Source :
- Knowledge and Information Systems. 63:2671-2692
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
- Abstract :
- Support vector domain description (SVDD) is a data description method inspired by the support vector machine (SVM). This classifier describes a set of data points with a minimum-volume sphere that encloses the majority of them. The boundary of this sphere is used to classify new samples. SVDD has been successfully applied to many challenging classification problems and has shown good generalization capability. However, this classifier still has some major weaknesses. This paper focuses on two of them. The first concerns the large amount of memory and computational time that SVDD requires in the training step. This problem manifests most strongly with large datasets and can hinder or prevent the use of SVDD. This paper presents an approximate solution to this problem that makes it possible to apply SVDD to large-scale datasets. This new version is based on a divide-and-conquer strategy and proceeds in two steps: it begins by dividing the whole dataset into random subsets, each of which can be described efficiently with a small sphere using SVDD. Then, it applies a new algorithm that finds the smallest sphere enclosing the minimal spheres built in the previous step. The second weak point of standard SVDD concerns its static learning process: the classifier must be re-trained on the whole dataset each time new training samples become available. This paper proposes a new dynamic approach that trains SVDD only on the new samples and combines the resulting minimal sphere with the previous one(s) to construct the smallest sphere that encloses all the samples. Like SVDD, the proposed approach can be extended to non-linear classification by using kernel functions. Experimental results on artificial and real datasets validate the performance of the approach.
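The two-step scheme and the incremental update described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the record does not give its details, so the per-subset SVDD is replaced here by a plain centroid-centred bounding sphere, and the "smallest sphere enclosing the spheres" step is replaced by a simple Badoiu-Clarkson-style heuristic that only approximates the minimum. All function names (`bounding_sphere`, `enclose_spheres`, `pseudo_description`) are hypothetical.

```python
import math
import random


def bounding_sphere(points):
    """Centroid-centred sphere enclosing all points.

    Assumption: a cheap stand-in for training SVDD on one subset.
    """
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def enclose_spheres(spheres, iterations=1000):
    """Approximate the smallest sphere enclosing all given spheres.

    Heuristic: repeatedly move the centre towards the farthest surface
    point among all spheres, with a step size that shrinks as 1/(k+1).
    The returned sphere always encloses every input sphere; its radius
    is only approximately minimal.
    """
    center = list(spheres[0][0])
    for k in range(1, iterations + 1):
        far_c, far_r = max(spheres, key=lambda s: math.dist(center, s[0]) + s[1])
        d = math.dist(center, far_c)
        if d > 0:
            # Farthest point of that sphere as seen from the current centre.
            target = [f + far_r * (f - c) / d for c, f in zip(center, far_c)]
        else:
            target = list(far_c)
        center = [c + (t - c) / (k + 1) for c, t in zip(center, target)]
    radius = max(math.dist(center, c) + r for c, r in spheres)
    return center, radius


def pseudo_description(points, n_subsets, seed=0):
    """Step 1: random split and one sphere per subset. Step 2: enclose them."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    spheres = [bounding_sphere(s) for s in subsets if s]
    return spheres, enclose_spheres(spheres)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    spheres, (center, radius) = pseudo_description(data, n_subsets=5)

    # Incremental update: describe only the new batch, then re-enclose.
    new_batch = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
    spheres.append(bounding_sphere(new_batch))
    center, radius = enclose_spheres(spheres)
    print(all(math.dist(center, p) <= radius + 1e-9 for p in data + new_batch))
```

In this sketch the incremental update never revisits the old samples: only the stored spheres and the sphere of the new batch are passed to the enclosing step, which mirrors the dynamic approach outlined in the abstract.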
- Subjects :
- Generalization
business.industry
Computer science
Boundary (topology)
Pattern recognition
Domain (software engineering)
Human-Computer Interaction
Set (abstract data type)
Support vector machine
Data point
Artificial Intelligence
Hardware and Architecture
Classifier (linguistics)
Point (geometry)
Artificial intelligence
business
Software
Information Systems
Details
- ISSN :
- 0219-3116 and 0219-1377
- Volume :
- 63
- Database :
- OpenAIRE
- Journal :
- Knowledge and Information Systems
- Accession number :
- edsair.doi...........cd32d9b6a897d4b6cb9137037dab87a3