Theoretical derivation of interval principal component analysis.

Authors :
Girão Serrão, Rodrigo
Oliveira, M. Rosário
Oliveira, Lina
Source :
Information Sciences. Apr 2023, Vol. 621, pp. 227-247. 21 pp.
Publication Year :
2023

Abstract

• We develop a mathematical framework that allows for the definition of interval principal component analysis.
• Interval-valued principal components are defined as linear combinations of the original interval-valued variables of maximum symbolic variance.
• The projection of the original symbolic observations on the reduced space spanned by the first principal components allows for the use of interval principal component analysis as a dimensionality reduction method.
• We develop an outlier-detection method based on the first interval-valued principal components.
• We explore real world data from the telecommunications sector, to detect Internet redirection attacks.

Principal Component Analysis is a well-known method that can be used for dimensionality reduction, a useful technique in the Big Data era. There have been a series of proposed adaptations of the Principal Component Analysis method for interval-valued symbolic data, all of which have the downside of having intermediate steps that deal with conventional data. In this work, we put forward an interval Principal Component Analysis that only deals with symbolic data by developing a theoretical framework that allows for the definition of symbolic principal components. This framework provides the mathematical tools needed to use the symbolic principal components to transform the original data in a way that is mathematically coherent with the remainder of the framework and defines the principal components as solutions to maximisation problems, similarly to what is done in conventional Principal Component Analysis. After the theoretical foundations are laid down, we explore real-world data from the telecommunications sector, in an attempt to detect Internet redirection attacks in real-time. In particular, we use our symbolic method to improve and simplify an anomaly detection method that has been proposed in the literature for conventional data. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0020-0255
Volume :
621
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
161726844
Full Text :
https://doi.org/10.1016/j.ins.2022.11.093