Pseudo support vector domain description to train large-size and continuously growing datasets

Authors :: Mohamed El Boujnouni
Source :: Knowledge and Information Systems. 63:2671-2692
Publication Year :: 2021
Publisher :: Springer Science and Business Media LLC, 2021.
Abstract: Support vector domain description (SVDD) is a data description method inspired by the support vector machine (SVM). This classifier describes a set of data points with a minimal-volume sphere that encloses the majority of them. The boundary of this sphere is used to classify new samples. SVDD has been successfully applied to many challenging classification problems and has shown good generalization capability. However, this classifier still has some major weaknesses, and this paper focuses on two of them. The first is the large amount of memory and computational time that SVDD requires in the training step. This problem is most severe with large datasets and can hinder or prevent the use of SVDD. This paper presents an approximate solution that makes it possible to apply SVDD to large-scale datasets. The new version is based on a divide-and-conquer strategy and proceeds in two steps: it first divides the whole dataset into random subsets, each of which can be described efficiently with a small sphere using SVDD; it then applies a new algorithm that finds the smallest sphere enclosing the minimal spheres built in the previous step. The second weakness of standard SVDD is its static learning process: the classifier must be retrained on the whole dataset each time new training samples become available. This paper proposes a dynamic approach that trains SVDD only on the new samples and merges the resulting minimal sphere with the previous one(s) to construct the smallest sphere that encloses all the samples. Like SVDD, the proposed approach can be extended to non-linear classification by using kernel functions. Experimental results on artificial and real datasets validate the performance of the approach.
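
The two-step scheme and the incremental update described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: it handles only the linear case (no kernel), it encloses every point rather than "the majority", and it replaces both the SVDD solver and the paper's sphere-merging algorithm (which the abstract does not specify) with a simple Badoiu-Clarkson-style iteration. The function names `enclosing_ball`, `enclose_balls`, `pseudo_svdd`, and `add_batch` are hypothetical.

```python
import numpy as np

def enclosing_ball(points, iters=200):
    """Approximate smallest ball around points (Badoiu-Clarkson iteration).
    Assumption: stands in for a hard-margin linear SVDD fit."""
    c = points[0].astype(float)
    for k in range(1, iters + 1):
        # Step toward the point currently farthest from the center.
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (far - c) / (k + 1)
    # Taking the max distance as radius guarantees every point is enclosed.
    return c, np.linalg.norm(points - c, axis=1).max()

def enclose_balls(centers, radii, iters=200):
    """Approximate smallest ball enclosing a set of balls.
    Assumption: a swapped-in heuristic, not the paper's algorithm."""
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    c = centers[0].copy()
    for k in range(1, iters + 1):
        d = np.linalg.norm(centers - c, axis=1)
        i = np.argmax(d + radii)  # ball reaching farthest from c
        if d[i] > 0:
            # Farthest point of ball i as seen from the current center.
            far = centers[i] + radii[i] * (centers[i] - c) / d[i]
            c += (far - c) / (k + 1)
    d = np.linalg.norm(centers - c, axis=1)
    # Radius covers each ball's far side, so all balls are enclosed.
    return c, (d + radii).max()

def pseudo_svdd(X, n_subsets=4, seed=0):
    """Step 1: describe random subsets; step 2: enclose the small balls."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_subsets)
    centers, radii = zip(*(enclosing_ball(X[idx]) for idx in parts))
    return enclose_balls(centers, radii)

def add_batch(center, radius, X_new):
    """Incremental update: fit only the new samples, then merge balls."""
    c_new, r_new = enclosing_ball(X_new)
    return enclose_balls([center, c_new], [radius, r_new])
```

In this sketch a call such as `pseudo_svdd(X, n_subsets=5)` returns a center and radius for the whole dataset, and `add_batch(center, radius, X_new)` updates them without revisiting the earlier samples. The resulting sphere always contains all the data, but it is generally larger than the exact minimal sphere, which reflects the approximate nature of the approach mentioned in the abstract.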

Subjects :: Generalization
business.industry
Computer science
Boundary (topology)
Pattern recognition
Domain (software engineering)
Human-Computer Interaction
Set (abstract data type)
Support vector machine
Data point
Artificial Intelligence
Hardware and Architecture
Classifier (linguistics)
Point (geometry)
Artificial intelligence
business
Software
Information Systems

Details

ISSN :: 0219-3116 and 0219-1377
Volume :: 63
Database :: OpenAIRE
Journal :: Knowledge and Information Systems
Accession number :: edsair.doi...........cd32d9b6a897d4b6cb9137037dab87a3
