742,519 results for "Outlier"
Search Results
102. The Outlier : The Unfinished Presidency of Jimmy Carter
- Author
- Kai Bird
- Subjects
- Presidents--United States--Biography
- Abstract
“Important... [a] landmark presidential biography... Bird is able to build a persuasive case that the Carter presidency deserves this new look.” (The New York Times Book Review) An essential re-evaluation of the complex triumphs and tragedies of Jimmy Carter's presidential legacy, from the expert biographer and Pulitzer Prize–winning co-author of American Prometheus. Four decades after Ronald Reagan's landslide win in 1980, Jimmy Carter's one-term presidency is often labeled a failure; indeed, many Americans view Carter as the only ex-president to have used the White House as a stepping-stone to greater achievements. But in retrospect the Carter political odyssey is a rich and human story, marked by both formidable accomplishments and painful political adversity. In this deeply researched, brilliantly written account, Pulitzer Prize–winning biographer Kai Bird deftly unfolds the Carter saga as a tragic tipping point in American history. As president, Carter was not merely an outsider; he was an outlier. He was the only president in a century to grow up in the heart of the Deep South, and his born-again Christianity made him the most openly religious president in memory. This outlier brought to the White House a rare mix of humility, candor, and unnerving self-confidence that neither Washington nor America was ready to embrace. Decades before today's public reckoning with the vast gulf between America's ethos and its actions, Carter looked out on a nation torn by race and demoralized by Watergate and Vietnam and prescribed a radical self-examination from which voters recoiled. The cost of his unshakable belief in doing the right thing would be losing his re-election bid and witnessing the ascendance of Reagan. In these remarkable pages, Bird traces the arc of Carter's administration, from his aggressive domestic agenda to his controversial foreign policy record, taking readers inside the Oval Office and through Carter's battles with both a political establishment and a Washington press corps that proved as adversarial as any foreign power. Bird shows how issues still hotly debated today, from national health care to growing inequality and racism to the Israeli-Palestinian conflict, burned at the heart of Carter's America, and consumed a president who found a moral duty in solving them. Drawing on interviews with Carter and members of his administration and recently declassified documents, Bird delivers a profound, clear-eyed evaluation of a leader whose legacy has been deeply misunderstood. The Outlier is the definitive account of an enigmatic presidency, both as it really happened and as it is remembered in the American consciousness.
- Published
- 2021
103. Adulteration detection of edible oil by one‐class classification and outlier detection
- Author
- Xinjing Dou, Fengqin Tu, Li Yu, Yong Yang, Fei Ma, Xuefang Wang, Du Wang, Liangxiao Zhang, Xiaoming Jiang, and Peiwu Li
- Subjects
- edible oil, food adulteration, market surveillance, one‐class classification, outlier detection, Nutrition. Foods and food supply, TX341-641, Food processing and manufacture, TP368-456
- Abstract
Edible oil adulteration is a widely practiced phenomenon. However, the traditional discriminant methods fail to detect oil adulteration involving more than one adulterant. Recently, one‐class classifiers were built for food or oil authentication. Unfortunately, as it is hard to determine the application domain of the one‐class classifier, high prediction error was obtained for real samples in market surveillance. In this study, a new method was developed based on one‐class classification and outlier detection for edible oil adulteration detection in market surveillance. The model population was constructed using Monte Carlo sampling of unidentified inspected samples to select the plateau region exhibiting the highest accumulated absolute centered residual (ACR) values. Subsequently, the number of models in the plateau region was validated against the theoretical number calculated by the classical probability model. The models in the plateau region with the highest cumulative accumulated ACR values were used to identify adulterated oils. Furthermore, cross‐validation was conducted by comparing identification results from two different Monte Carlo sampling ratios to ensure the accuracy of our method. Both single adulteration and multiple adulteration of peanut oils were prepared to validate our method. Moreover, this method was used to detect adulteration of sesame oils, which have already been identified by the markers in our previous study. The validation results of three datasets indicated that this method could effectively identify adulterated samples and therefore provide a novel solution for inspecting potential adulteration in practice.
- Published
- 2024
104. ASOD: an adaptive stream outlier detection method using online strategy
- Author
- Zhichao Hu, Xiangzhan Yu, Likun Liu, Yu Zhang, and Haining Yu
- Subjects
- Outlier detection, Anomaly detection, Stream data, Online learning, Machine learning, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
In the current era of information technology, blockchain is widely used in various fields, and monitoring the security and status of blockchain systems is of great concern. Online anomaly detection for real-time stream data plays a vital role in monitoring strategies for finding abnormal events and states of a blockchain system. However, owing to the demanding requirements of real-time, online scenarios, online anomaly detection faces many problems, such as limited training data, distribution drift, and limited update frequency. In this paper, we propose an adaptive stream outlier detection method (ASOD) to overcome these limitations. It first designs a K-nearest neighbor Gaussian mixture model (KNN-GMM) and utilizes an online learning strategy, so it is suitable for online scenarios and does not rely on large training data. The K-nearest neighbor optimization limits the influence of new data locally rather than globally, thus improving the stability. Then, ASOD applies the mechanism of dynamic maintenance of Gaussian components and the strategy of dynamic context control to achieve self-adaptation to the distribution drift. Finally, ASOD adopts a dimensionless distance metric based on Mahalanobis distance and proposes an automatic threshold method to accomplish anomaly detection. In addition, the KNN-GMM provides the life cycle and the anomaly index for continuous tracking and analysis, which facilitates cause analysis, further interpretation, and traceability. The experimental results show that ASOD achieves near-optimal F1 and recall on the NAB dataset with an improvement of 6% and 20.3% over the average, compared to baselines with sufficient training data. ASOD has the lowest F1 variance among the five best methods, indicating that it is effective and stable for online anomaly detection on stream data.
- Published
- 2024
105. Rule-based outlier detection of AI-generated anatomy segmentations
- Author
- Krishnaswamy, Deepa, Thiriveedhi, Vamsi Krishna, Ciausu, Cosmin, Clunie, David, Pieper, Steve, Kikinis, Ron, and Fedorov, Andrey
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
There is a dire need for medical imaging datasets with accompanying annotations to perform downstream patient analysis. However, generating these annotations manually is difficult because the process is time-consuming and clinical conventions vary. Artificial intelligence has been adopted in the field as a potential method to annotate these large datasets; however, a lack of expert annotations or ground truth can inhibit the adoption of these annotations. We recently made a dataset publicly available including annotations and extracted features of up to 104 organs for the National Lung Screening Trial using the TotalSegmentator method. However, the released dataset does not include expert-derived annotations or an assessment of the accuracy of the segmentations, limiting its usefulness. We propose the development of heuristics to assess the quality of the segmentations, providing methods to measure the consistency of the annotations and a comparison of results to the literature. We make our code and related materials publicly available at https://github.com/ImagingDataCommons/CloudSegmentatorResults and interactive tools at https://huggingface.co/spaces/ImagingDataCommons/CloudSegmentatorResults.
- Published
- 2024
106. Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
- Author
- Wagner, Dominik, Baumann, Ilja, Riedhammer, Korbinian, and Bocklet, Tobias
- Subjects
- Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weights and activation tensors, known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that these outliers are also present when transformer-based models are trained to perform automatic speech recognition, necessitating mitigation strategies for PTQ. We show that outliers can be reduced by a recently proposed gating mechanism in the attention blocks of the student model, enabling effective 8-bit quantization, and lower word error rates compared to student models without the gating mechanism in place. Comment: Accepted at Interspeech 2024
- Published
- 2024
107. Outlier detection in maritime environments using AIS data and deep recurrent architectures
- Author
- Maganaris, Constantine, Protopapadakis, Eftychios, and Doulamis, Nikolaos
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, 68T10
- Abstract
A methodology based on deep recurrent models for maritime surveillance, over publicly available Automatic Identification System (AIS) data, is presented in this paper. The setup employs a deep Recurrent Neural Network (RNN)-based model for encoding and reconstructing the observed ships' motion patterns. Our approach is based on a thresholding mechanism over the calculated errors between observed and reconstructed motion patterns of maritime vessels. Specifically, a deep-learning framework, i.e., an encoder-decoder architecture, is trained using the observed motion patterns, enabling the models to learn and predict the expected trajectory, which is then compared to the actual one. Our models, particularly the bidirectional GRU with recurrent dropouts, showcased superior performance in capturing the temporal dynamics of maritime data, illustrating the potential of deep learning to enhance maritime surveillance capabilities. Our work lays a solid foundation for future research in this domain, highlighting a path toward improved maritime safety through the innovative application of technology. Comment: Presented in PETRA '24 The PErvasive Technologies Related to Assistive Environments Conference June 26--28, 2024 Crete, Greece
- Published
- 2024
108. A novel robust meta-analysis model using the $t$ distribution for outlier accommodation and detection
- Author
- Wang, Yue, Zhao, Jianhua, Jiang, Fen, Shi, Lei, and Pan, Jianxin
- Subjects
- Statistics - Methodology, Statistics - Machine Learning, 62P10, I.2.6
- Abstract
The random-effects meta-analysis model is an important tool for integrating results from multiple independent studies. However, the standard model is based on the assumption of normal distributions for both random effects and within-study errors, making it susceptible to outlying studies. Although robust modeling using the $t$ distribution is an appealing idea, the existing work, which explores the use of the $t$ distribution only for random effects, involves complicated numerical integration and numerical optimization. In this paper, a novel robust meta-analysis model using the $t$ distribution is proposed ($t$Meta). The novelty is that the marginal distribution of the effect size in $t$Meta follows the $t$ distribution, enabling $t$Meta to simultaneously accommodate and detect outlying studies in a simple and adaptive manner. A simple and fast EM-type algorithm is developed for maximum likelihood estimation. Due to the mathematical tractability of the $t$ distribution, $t$Meta avoids numerical integration and allows for efficient optimization. Experiments on real data demonstrate that $t$Meta compares favorably with related competitors in situations involving mild outliers. Moreover, in the presence of gross outliers, while related competitors may fail, $t$Meta continues to perform consistently and robustly. Comment: 15 pages, 7 figures
- Published
- 2024
109. Outlier detection in spatial error models using modified thresholding-based iterative procedure for outlier detection approach
- Author
- Jiaxin Cai, Weiwei Hu, Yuhui Yang, Hong Yan, and Fangyao Chen
- Subjects
- Outliers, Iterative procedure for outlier detection, Mean-shift outlier model, Spatial error model, Robust estimation, Medicine (General), R5-920
- Abstract
Background: Outliers, data points that significantly deviate from the norm, can have a substantial impact on statistical inference and provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data has diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. Currently, limited research exists on robust spatial outlier detection methods designed specifically under the spatial error model (SEM) structure. Method: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which utilizes a mean-shift vector to identify outliers within the SEM. Our method enables effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries. Results: Simulation results showed that the masking and JD (Joint Detection) indicators of our Spatial-Θ-IPOD method outperformed several commonly used methods, even in high-dimensional scenarios, demonstrating stable performance. Conversely, the Θ-IPOD method proved to be ineffective in detecting outliers when spatial correlation was present. Moreover, our model successfully provided reliable coefficient estimation alongside outlier detection. The proposed method consistently outperformed other models (both robust and non-robust) in most cases. In the empirical study, our proposed model successfully detected outliers and provided valuable insights in the modeling process. Conclusions: Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers for SEM while providing robust coefficient estimates. Notably, our approach showcases its relative superiority even in the presence of high leverage points. By successfully identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.
- Published
- 2024
110. Application of multivariate binary logistic regression grouped outlier statistics and geospatial logistic model to identify villages having unusual health-seeking habits for childhood malaria in Malawi
- Author
- Gracious A. Hamuza, Emmanuel Singogo, and Tsirizani M. Kaombe
- Subjects
- Childhood malaria, Malawi malaria indicator survey data, Caregiver treatment-seeking habit, Mixed-effects logistic regression diagnostics, GeoSpatial statistics, Outlier traditional authorities, Arctic medicine. Tropical medicine, RC955-962, Infectious and parasitic diseases, RC109-216
- Abstract
Background: Early diagnosis and prompt treatment of malaria in young children are crucial for preventing the serious stages of the disease. If delayed treatment-seeking habits are observed in certain areas, targeted campaigns and interventions can be implemented to improve the situation. Methods: This study applied multivariate binary logistic regression model diagnostics and geospatial logistic model to identify traditional authorities in Malawi where caregivers have unusual health-seeking behaviour for childhood malaria. The data from the 2021 Malawi Malaria Indicator Survey were analysed using R software version 4.3.0 for regressions and STATA version 17 for data cleaning. Results: Both models showed significant variability in treatment-seeking habits of caregivers between villages. The mixed-effects logit model residual identified Vuso Jere, Kampingo Sibande, Ngabu, and Dzoole as outliers in the model. Despite characteristics that promote late reporting of malaria at clinics, most mothers in these traditional authorities sought treatment within twenty-four hours of the onset of malaria symptoms in their children. On the other hand, the geospatial logit model showed that late seeking of malaria treatment was prevalent in most areas of the country, except a few traditional authorities such as Mwakaboko, Mwenemisuku, Mwabulambya, Mmbelwa, Mwadzama, Zulu, Amidu, Kasisi, and Mabuka. Conclusions: These findings suggest that using a combination of multivariate regression model residuals and geospatial statistics can help in identifying communities with distinct treatment-seeking patterns for childhood malaria within a population. Health policymakers could benefit from consulting traditional authorities who demonstrated early reporting for care in this study. This could help in understanding the best practices followed by mothers in those areas which can be replicated in regions where seeking care is delayed.
- Published
- 2024
111. A novel hybrid approach based on outlier and error correction methods to predict river discharge using meteorological variables
- Author
- Shabbir, Maha, Chand, Sohail, and Iqbal, Farhat
- Published
- 2024
112. Automatic Outlier Rectification via Optimal Transport
- Author
- Blanchet, Jose, Li, Jiajin, Pelger, Markus, and Zanotti, Greg
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Methodology
- Abstract
In this paper, we propose a novel conceptual framework to detect outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step to utilize the optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function we introduced in this paper is the key to making our estimator effectively identify the outlier during the optimization process. We demonstrate the effectiveness of our approach over conventional approaches in simulations and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
- Published
- 2024
113. DTOR: Decision Tree Outlier Regressor to explain anomalies
- Author
- Crupi, Riccardo, Regoli, Daniele, Sabatino, Alessandro Damiano, Marano, Immacolata, Brinis, Massimiliano, Albertazzi, Luca, Cirillo, Andrea, and Cosentini, Andrea Claudio
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
Explaining why outliers occur and the mechanism behind their occurrence can be extremely important in a variety of domains. Malfunctions, frauds, and threats, in addition to being correctly identified, often need a valid explanation so that effective countermeasures can be taken. The ever more widespread use of sophisticated Machine Learning approaches to identify anomalies makes such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating anomaly scores generated by an anomaly detection model. This is accomplished by first applying a Decision Tree Regressor, which computes the estimation score, and then extracting the relative path associated with the data point score. Our results demonstrate the robustness of DTOR even in datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate comparable performance to Anchors in outlier explanation tasks, with reduced execution time.
- Published
- 2024
114. Robust covariance estimation and explainable outlier detection for matrix-valued data
- Author
- Mayrhofer, Marcus, Radojičić, Una, and Filzmoser, Peter
- Subjects
- Statistics - Methodology, Statistics - Computation
- Abstract
This work introduces the Matrix Minimum Covariance Determinant (MMCD) method, a novel robust location and covariance estimation procedure designed for data that are naturally represented in the form of a matrix. Unlike standard robust multivariate estimators, which would only be applicable after a vectorization of the matrix-variate samples leading to high-dimensional datasets, the MMCD estimators account for the matrix-variate data structure and consistently estimate the mean matrix, as well as the rowwise and columnwise covariance matrices in the class of matrix-variate elliptical distributions. Additionally, we show that the MMCD estimators are matrix affine equivariant and achieve a higher breakdown point than the maximal achievable one by any multivariate, affine equivariant location/covariance estimator when applied to the vectorized data. An efficient algorithm with convergence guarantees is proposed and implemented. As a result, robust Mahalanobis distances based on MMCD estimators offer a reliable tool for outlier detection. Additionally, we extend the concept of Shapley values for outlier explanation to the matrix-variate setting, enabling the decomposition of the squared Mahalanobis distances into contributions of the rows, columns, or individual cells of matrix-valued observations. Notably, both the theoretical guarantees and simulations show that the MMCD estimators outperform robust estimators based on vectorized observations, offering better computational efficiency and improved robustness. Moreover, real-world data examples demonstrate the practical relevance of the MMCD estimators and the resulting robust Shapley values.
- Published
- 2024
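As a rough illustration of the distance-based flagging mentioned in the abstract of entry 114, the sketch below computes squared Mahalanobis distances in two dimensions. It is not the MMCD estimator: the location and covariance are simply passed in (the values here are hypothetical), the function name `squared_mahalanobis_2d` is invented for this example, and the 97.5% chi-square cutoff is a common convention assumed for illustration only.

```python
import math


def squared_mahalanobis_2d(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point x from (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Hypothetical location and scatter estimates (illustrative values only).
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))

# With 2 degrees of freedom the chi-square quantile has the closed form
# -2 * ln(1 - p); p = 0.975 is an assumed, conventional choice.
cutoff = -2.0 * math.log(1.0 - 0.975)

points = [(2.0, 0.0), (0.0, 2.0), (0.0, 3.0)]
distances = [squared_mahalanobis_2d(p, mean, cov) for p in points]
flags = [d2 > cutoff for d2 in distances]
```

The points (2, 0) and (0, 2) are equally far from the center in Euclidean terms, yet their Mahalanobis distances differ because the assumed variance is larger along the first axis. A robust estimator would change only how `mean` and `cov` are obtained, not how the distance is computed.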
115. Outlier detection by ensembling uncertainty with negative objectness
- Author
- Delić, Anja, Grcić, Matej, and Šegvić, Siniša
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Outlier detection is an essential capability in safety-critical applications of supervised visual recognition. Most of the existing methods deliver best results by encouraging standard closed-set models to produce low-confidence predictions in negative training data. However, that approach conflates prediction uncertainty with recognition of the negative class. We therefore reconsider direct prediction of K+1 logits that correspond to K groundtruth classes and one outlier class. This setup allows us to formulate a novel anomaly score as an ensemble of in-distribution uncertainty and the posterior of the outlier class which we term negative objectness. Now outliers can be independently detected due to i) high prediction uncertainty or ii) similarity with negative data. We embed our method into a dense prediction architecture with mask-level recognition over K+2 classes. The training procedure encourages the novel K+2-th class to learn negative objectness at pasted negative instances. Our models outperform the current state of the art on standard benchmarks for image-wide and pixel-level outlier detection with and without training on real negative data. Comment: Accepted to BMVC 2024
- Published
- 2024
116. Robust mass lumping and outlier removal strategies in isogeometric analysis
- Author
- Voet, Yannis, Sande, Espen, and Buffa, Annalisa
- Subjects
- Mathematics - Numerical Analysis, 65M60, 65F15
- Abstract
Mass lumping techniques are commonly employed in explicit time integration schemes for problems in structural dynamics and both avoid solving costly linear systems with the consistent mass matrix and increase the critical time step. In isogeometric analysis, the critical time step is constrained by so-called "outlier" frequencies, representing the inaccurate high frequency part of the spectrum. Removing or dampening these high frequencies is paramount for fast explicit solution techniques. In this work, we propose robust mass lumping and outlier removal techniques for nontrivial geometries, including multipatch and trimmed geometries. Our lumping strategies provably do not deteriorate (and often improve) the CFL condition of the original problem and are combined with deflation techniques to remove persistent outlier frequencies. Numerical experiments reveal the advantages of the method, especially for simulations covering large time spans where they may halve the number of iterations with little or no effect on the numerical solution. Comment: 31 pages, 16 figures. Submitted manuscript
- Published
- 2024
117. Efficient Generation of Hidden Outliers for Improved Outlier Detection
- Author
- Cribeiro-Ramallo, Jose, Arzamasov, Vadim, and Böhm, Klemens
- Subjects
- Computer Science - Machine Learning
- Abstract
Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier generation method that creates realistic outliers mimicking said property. To do so, BISECT employs a novel proposition introduced in this article stating how to efficiently generate said realistic outliers. Our method has better guarantees and complexity than the current methodology for recreating 'multiple views'. We use the synthetic outliers generated by BISECT to effectively enhance outlier detection in diverse datasets, for multiple use cases. For instance, oversampling with BISECT reduced the error by up to 3 times when compared with the baselines. Comment: Preprint. Full paper is scheduled to appear in TKDD; Updated results in table 4
- Published
- 2024
118. High-dimensional Outlier Detection via Stability
- Author
- Heng, Qiang, Shen, Hui, and Lange, Kenneth
- Subjects
- Statistics - Methodology, Statistics - Computation
- Abstract
The Minimum Covariance Determinant (MCD) method is a widely adopted tool for robust estimation and outlier detection. In this paper, we introduce a new framework for model selection in MCD with spectral embedding based on the notion of stability. Our best subset algorithm leverages principal component analysis for dimension reduction, statistical depths for effective initialization, and concentration steps for subset refinement. Subsequently, we construct a bootstrap procedure to estimate the instability of the best subset algorithm. The parameter combination exhibiting minimal instability proves ideal for the purposes of high-dimensional outlier detection, while the instability path offers insights into the inlier/outlier structure. We rigorously benchmark the proposed framework against existing MCD variants and illustrate its practical utility on two spectra data sets and a cancer genomics data set.
- Published
- 2024
119. Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis
- Author
- Anderberg, Alastair, Bailey, James, Campello, Ricardo J. G. B., Houle, Michael E., Marques, Henrique O., Radovanović, Miloš, and Zimek, Arthur
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, 68T99 (Primary) 62G07, 62G32, 62H30 (Secondary)
- Abstract
We present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimation of LID values in a theoretically-justified way. Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN. Comment: 13 pages, 3 figures. Extended version of a paper accepted for publication at the SIAM International Conference on Data Mining (SDM24)
- Published
- 2024
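For readers unfamiliar with the kNN baseline named in the abstract of entry 119, here is a minimal sketch of one common form of it: each point is scored by the distance to its k-th nearest neighbor, and larger scores suggest outliers. This is not DAO and not the paper's experimental setup; the function name `knn_outlier_scores`, the toy data, and the choice k = 2 are all assumptions made for illustration.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data: four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)

# Index of the point with the largest score, i.e. the most outlying one.
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

This brute-force version compares every pair of points, so it is only practical for small datasets; it is meant to show the scoring rule, not an efficient implementation.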
120. On the use of the M-quantiles for outlier detection in multivariate data
- Author
- Chakroborty, Sajal, Iyer, Ram, and Trindade, A. Alexandre
- Subjects
- Mathematics - Statistics Theory, Mathematics - Optimization and Control
- Abstract
Defining a successful notion of a multivariate quantile has been an open problem for more than half a century, motivating a plethora of possible solutions. Of these, the approach of [8] and [25] leading to M-quantiles, is very appealing for its mathematical elegance combining elements of convex analysis and probability theory. The key idea is the description of a convex function (the K-function) whose gradient (the K-transform) is in one-to-one correspondence between all of R^d and the unit ball in R^d. By analogy with the d=1 case where the K-transform is a cumulative distribution function-like object (an M-distribution), the fact that its inverse is guaranteed to exist lends itself naturally to providing the basis for the definition of a quantile function for all d>=1. Over the past twenty years the resulting M-quantiles have seen applications in a variety of fields, primarily for the purpose of detecting outliers in multidimensional spaces. In this article we prove that for odd d>=3, it is not the gradient but a poly-Laplacian of the K-function that is (almost everywhere) proportional to the density function. For d even one cannot establish a differential equation connecting the K-function with the density. These results show that usage of the K-transform for outlier detection in higher odd-dimensions is in principle flawed, as the K-transform does not originate from inversion of a true M-distribution. We demonstrate these conclusions in two dimensions through examples from non-standard asymmetric distributions. Our examples illustrate a feature of the K-transform whereby regions in the domain with higher density map to larger volumes in the co-domain, thereby producing a magnification effect that moves inliers closer to the boundary of the co-domain than outliers. This feature obviously disrupts any outlier detection mechanism that relies on the inverse K-transform.
- Published
- 2024
121. Outlier Ranking in Large-Scale Public Health Streams
- Author
-
Joshi, Ananya, Townes, Tina, Gormley, Nolan, Neureiter, Luke, Rosenfeld, Roni, and Wilder, Bryan
- Subjects
- Computer Science - Artificial Intelligence
- Abstract
Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tied outliers returned by univariate outlier detection methods applied to large-scale public health data streams. To help experts distinguish the most important outliers from these thousands of tied outliers, we propose a new task for algorithms to rank the outputs of any univariate method applied to each of many streams. Our novel algorithm for this task, which leverages hierarchical networks and extreme value analysis, performed the best across traditional outlier detection metrics in a human-expert evaluation using public health data streams. Most importantly, experts have used our open-source Python implementation since April 2023 and report identifying outliers worth investigating 9.1x faster than their prior baseline. Other organizations can readily adapt this implementation to create rankings from the outputs of their tailored univariate methods across large-scale streams. Comment: 6 figures, 8 pages
- Published
- 2024
122. Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process Mixtures
- Author
-
Kim, Dongwook, Park, Juyeon, Chung, Hee Cheol, and Jeong, Seonghyun
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
Probabilistic mixture models are recognized as effective tools for unsupervised outlier detection owing to their interpretability and global characteristics. Among these, Dirichlet process mixture models stand out as a strong alternative to conventional finite mixture models for both clustering and outlier detection tasks. Unlike finite mixture models, Dirichlet process mixtures are infinite mixture models that automatically determine the number of mixture components based on the data. Despite their advantages, the adoption of Dirichlet process mixture models for unsupervised outlier detection has been limited by challenges related to computational inefficiency and sensitivity to outliers in the construction of outlier detectors. Additionally, Dirichlet process Gaussian mixtures struggle to effectively model non-Gaussian data with discrete or binary features. To address these challenges, we propose a novel outlier detection method that utilizes ensembles of Dirichlet process Gaussian mixtures. This unsupervised algorithm employs random subspace and subsampling ensembles to ensure efficient computation and improve the robustness of the outlier detector. The ensemble approach further improves the suitability of the proposed method for detecting outliers in non-Gaussian data. Furthermore, our method uses variational inference for Dirichlet process mixtures, which ensures both efficient and rapid computation. Empirical analyses using benchmark datasets demonstrate that our method outperforms existing approaches in unsupervised outlier detection.
- Published
- 2024
123. A hybrid dimensionality reduction method for outlier detection in high-dimensional data
- Author
-
Meng, Guanglei, Wang, Biao, Wu, Yanming, Zhou, Mingzhe, and Meng, Tiankuo
- Published
- 2023
- Full Text
- View/download PDF
124. A double-weighted outlier detection algorithm considering the neighborhood orientation distribution of data objects
- Author
-
Gao, Qiang, Gao, Qin-Qin, Xiong, Zhong-Yang, Zhang, Yu-Fang, Wang, Yu-Qin, and Zhang, Min
- Published
- 2023
- Full Text
- View/download PDF
125. Density kernel depth for outlier detection in functional data
- Author
-
Hernández, Nicolás, Muñoz, Alberto, and Martos, Gabriel
- Published
- 2023
- Full Text
- View/download PDF
126. Flight data outlier detection by constrained LSTM-autoencoder
- Author
-
Gao, Long, Xu, Congan, Wang, Fengqin, Wu, Junfeng, and Su, Hang
- Published
- 2023
- Full Text
- View/download PDF
127. An Ensemble Model Based on Combining BayesDel and Revel Scores Indicates Outstanding Performance: Importance of Outlier Detection and Comparison of Models.
- Author
-
Alay, Mustafa Tarık
- Subjects
- *MACHINE learning, *OUTLIER detection, *STATISTICAL accuracy, *STATISTICAL models, *DATABASES
- Abstract
Objective: Our objective is to create an effective ensemble tool that can accurately predict MEFV gene variants and determine the threshold value for pathogenicity based on the optimal distribution. Methods: First, we extracted a dataset from the Infevers database [https://infevers.umai-montpellier.fr/web/search.php?n=1]. Second, we merged the variant classification into 2 categories: likely benign and likely pathogenic. Third, we implemented our high-sensitivity model to obtain disease-causing variants. In the fourth step, we implemented curve estimation analysis to determine which curve fit our variant distribution. After the curve estimation analysis, we applied receiver operating characteristic (ROC) curve analysis to find suitable in silico tool models for logistic regression. Repeated outlier detection analysis was performed in the fifth step until no outliers were detected. Ensemble tree-based machine-learning models were used to test a statistical model in the final step. Results: When outliers were removed, the Revel and BayesDel algorithms both had much higher ROC AUC scores (0.982 [0.967-0.998], P < .001 for the combined model; 0.982 [0.967-0.998], P < .001 for Revel; and 0.933 [0.889-0.977], P < .001 for BayesDel). AdaBoost was the most accurate machine learning model, with a ROC AUC score of 0.982. Conclusion: Our study revealed that the implementation of outlier and anomaly detection techniques can enhance the accuracy of statistical models and yield more precise outcomes in machine learning datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
128. A Context-Sensitive, Outlier-Based Static Analysis to Find Kernel Race Conditions
- Author
-
Dossche, Niels, Abrath, Bert, and Coppens, Bart
- Subjects
- Computer Science - Software Engineering, Computer Science - Cryptography and Security
- Abstract
Race conditions are a class of bugs in software where concurrent accesses to shared resources are not protected from each other. Consequences of race conditions include privilege escalation, denial of service, and memory corruption which can potentially lead to arbitrary code execution. However, in large code bases the exact rules as to which fields should be accessed under which locks are not always clear. We propose a novel static technique that infers rules for how field accesses should be locked, and then checks the code against these rules. Traditional static analysers for detecting race conditions are based on lockset analysis. Instead, we propose an outlier-based technique enhanced with a context-sensitive mechanism that scales well. We have implemented this analysis in LLIF, and evaluated it to find incorrectly protected field accesses in Linux v5.14.11. We thoroughly evaluate its ability to find race conditions, and study the causes for false positive reports. In addition, we reported a subset of the issues and submitted patches. The maintainers confirmed 24 bugs.
- Published
- 2024
129. Outlier Robust Multivariate Polynomial Regression
- Author
-
Arora, Vipul, Bhattacharyya, Arnab, Boban, Mathews, Guruswami, Venkatesan, and Kelman, Esty
- Subjects
- Computer Science - Data Structures and Algorithms, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$, if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
- Published
- 2024
130. QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
- Author
-
Chen, Jiun-Man, Chao, Yu-Hsuan, Wang, Yu-Jie, Shieh, Ming-Der, Hsu, Chih-Chung, and Lin, Wei-Fen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
Transformer-based models have gained widespread popularity in both the computer vision (CV) and natural language processing (NLP) fields. However, significant challenges arise during post-training linear quantization, leading to noticeable reductions in inference accuracy. Our study focuses on uncovering the underlying causes of these accuracy drops and proposing a quantization-friendly fine-tuning method, QuantTune. Firstly, our analysis revealed that, on average, 65% of quantization errors result from the precision loss incurred by the dynamic range amplification effect of outliers across the target Transformer-based models. Secondly, QuantTune adjusts weights based on the deviation of outlier activations and effectively constrains the dynamic ranges of the problematic activations. As a result, it successfully mitigates the negative impact of outliers on the inference accuracy of quantized models. Lastly, QuantTune can be seamlessly integrated into the back-propagation pass in the fine-tuning process without requiring extra complexity in inference software and hardware design. Our approach showcases significant improvements in post-training quantization across a range of Transformer-based models, including ViT, Bert-base, and OPT. QuantTune reduces accuracy drops by 12.09% at 8-bit quantization and 33.8% at 7-bit compared to top calibration methods, outperforming state-of-the-art solutions by over 18.84% across ViT models.
- Published
- 2024
131. Testing Outlier Detection Algorithms for Identifying Early-Stage Solute Clusters in Atom Probe Tomography
- Author
-
Stroud, R S., Al-Saffar, A., Carter, M., Moody, M P., Pedrazzini, S., and Wenman, M R.
- Subjects
- Condensed Matter - Materials Science
- Abstract
Atom probe tomography is commonly used to study solute clustering and precipitation in materials. However, standard techniques, such as density-based spatial clustering of applications with noise (DBSCAN), perform poorly on small clusters of fewer than 25 atoms. This is a fundamental limitation of density-based clustering techniques due to the usage of Nmin, an arbitrary lower limit placed on cluster sizes. Therefore, this paper treats atom probe clustering as an outlier detection problem: the KNN, LOF, and LUNAR algorithms were tested against a simulated dataset and compared to the standard method for a range of cluster sizes. The decision score output of the algorithms was then automatically thresholded by the Karcher mean to remove human bias. Each of the major models tested outperforms DBSCAN for cluster sizes of fewer than 25 atoms but underperforms for sizes greater than 30 atoms. However, the new combined kNN and DBSCAN method presented was able to perform well at all cluster sizes. The combined kNN and DBSCAN method is presented as a possible new standard approach to identifying solute clusters in atom probe tomography.
- Published
- 2024
132. Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks
- Author
-
Niu, Muqun, Ren, Yuan, Li, Boyu, and Ding, Chenchen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
- Abstract
Lightweight design of Convolutional Neural Networks (CNNs) requires co-design efforts in the model architectures and compression techniques. As a novel design paradigm that separates training and inference, a structural re-parameterized (SR) network such as the representative RepVGG revitalizes the simple VGG-like network with a high accuracy comparable to advanced and often more complicated networks. However, the merging process in SR networks introduces outliers into weights, making their distribution distinct from conventional networks and thus heightening difficulties in quantization. To address this, we propose an operator-level improvement for training called Outlier Aware Batch Normalization (OABN). Additionally, to meet the demands of limited bitwidths while maintaining the inference accuracy, we develop a clustering-based non-uniform quantization framework for Quantization-Aware Training (QAT) named ClusterQAT. Integrating OABN with ClusterQAT, the quantized performance of RepVGG is largely enhanced, particularly when the bitwidth falls below 8. Comment: 8 pages, 8 figures
- Published
- 2024
133. Outlier Accommodation for GNSS Precise Point Positioning using Risk-Averse State Estimation
- Author
-
Hu, Wang, Uwineza, Jean-Bernard, and Farrell, Jay A.
- Subjects
- Electrical Engineering and Systems Science - Systems and Control
- Abstract
Reliable and precise absolute positioning is necessary in the realm of Connected Automated Vehicles (CAV). Global Navigation Satellite Systems (GNSS) provides the foundation for absolute positioning. Recently enhanced Precise Point Positioning (PPP) technology now offers corrections for GNSS on a global scale, with the potential to achieve accuracy suitable for real-time CAV applications. However, in obstructed sky conditions, GNSS signals are often affected by outliers; therefore, addressing outliers is crucial. In GNSS applications, there are many more measurements available than are required to meet the specification. Therefore, selecting measurements to avoid outliers is of interest. The recently developed Risk-Averse Performance-Specified (RAPS) state estimation optimally selects measurements to minimize outlier risk while meeting a positive semi-definite constraint on performance; at present, the existing solution methods are not suitable for real-time computation and have not been demonstrated using challenging real-world data or in Real-time PPP (RT-PPP) applications. This article makes contributions in a few directions. First, it uses a diagonal performance specification, which reduces computational costs relative to the positive semi-definite constraint. Second, this article considers GNSS RT-PPP applications. Third, the experiments use real-world GNSS data collected in challenging environments. The RT-PPP experimental results show that among the compared methods: all achieve comparable performance in open-sky conditions, and all exceed the Society of Automotive Engineers (SAE) specification; however, in challenging environments, the diagonal RAPS approach shows improvement of 6-19% over traditional methods. Throughout, RAPS achieves the lowest estimation risk. Comment: 7 pages, 2 figures, accepted by 2024 American Control Conference
- Published
- 2024
134. Voronoi Diagram-based USBL Outlier Rejection for AUV Localization
- Author
-
Hyeonmin Sim and Hangil Joe
- Subjects
- usbl, outlier detection, voronoi diagram, acoustic sensors, Ocean engineering, TC1501-1800
- Abstract
USBL systems are essential for providing accurate positions of autonomous underwater vehicles (AUVs). On the other hand, the accuracy can be degraded by outliers because of the environmental conditions. A failure to address these outliers can significantly impact the reliability of underwater localization and navigation systems. This paper proposes a novel outlier rejection algorithm for AUV localization using Voronoi diagrams and query point calculation. The Voronoi diagram divides data space into Voronoi cells that center on ultra-short baseline (USBL) data, and the calculated query point determines if the corresponding USBL data is an inlier. This study conducted experiments acquiring GPS and USBL data simultaneously and optimized the algorithm empirically based on the acquired data. In addition, the proposed method was applied to a sensor fusion algorithm to verify its effectiveness, resulting in improved pose estimations. The proposed method can be applied to various sensor fusion algorithms as a preprocess and could be used for outlier rejection for other 2D-based location sensors.
- Published
- 2024
- Full Text
- View/download PDF
135. Correlation-based outlier detection for ships’ in-service datasets
- Author
-
Prateek Gupta, Adil Rasheed, and Sverre Steen
- Subjects
- Outlier detection, Ship in-service data, PCA, Autoencoders, Non-linear transformations, Latent variables, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
With the advent of big data, it has become increasingly difficult to obtain high-quality data. Solutions are required to remove undesired outlier samples from massively large datasets. Ship operators rely on high-frequency in-service datasets recorded onboard the ships for monitoring the performance of their fleet. The large in-service datasets are known to be highly unbalanced, making it difficult to adopt ordinary outlier detection techniques, as they would also result in the removal of rare but quite valuable data samples. Thus, the current work proposes to establish a correlation-based outlier detection scheme for ships’ in-service datasets using two well-known dimensionality reduction methods, namely, Principal Component Analysis (PCA) and Autoencoders. The correlation-based approach detects samples which do not fit the prominent correlations present in the dataset and avoids misidentifying the rare but correlation-following samples in the sparse regions of data domain. The study also attempts to provide the physical meaning of the latent variables obtained using PCA. The effectiveness of the proposed methodology is proven using an actual dataset recorded onboard a ship.
- Published
- 2024
- Full Text
- View/download PDF
136. VOD: Vision-Based Building Energy Data Outlier Detection
- Author
-
Jinzhao Tian, Tianya Zhao, Zhuorui Li, Tian Li, Haipei Bie, and Vivian Loftness
- Subjects
- AI-driven, deep learning, outlier detection, load shape, building energy, Computer engineering. Computer hardware, TK7885-7895
- Abstract
Outlier detection plays a critical role in building operation optimization and data quality maintenance. However, existing methods often struggle with the complexity and variability of building energy data, leading to poorly generalized and explainable results. To address the gap, this study introduces a novel Vision-based Outlier Detection (VOD) approach, leveraging computer vision models to spot outliers in the building energy records. The models are trained to identify outliers by analyzing the load shapes in 2D time series plots derived from the energy data. The VOD approach is tested on four years of workday time-series electricity consumption data from 290 commercial buildings in the United States. Two distinct models are developed for different usage purposes, namely a classification model for broad-level outlier detection and an object detection model for the demands of precise pinpointing of outliers. The classification model is also interpreted via Grad-CAM to enhance its usage reliability. The classification model achieves an F1 score of 0.88, and the object detection model achieves an Average Precision (AP) of 0.84. VOD is a very efficient path to identifying energy consumption outliers in building operations, paving the way for the enhancement of building energy data quality, operation efficiency, and energy savings.
- Published
- 2024
- Full Text
- View/download PDF
137. Optimasi Algoritma K-Nearest Neighbors Berdasarkan Perbandingan Analisis Outlier (Berbasis Jarak, Kepadatan, LOF)
- Author
-
Fitri Ayuning Tyas, Mahda Nurayuni, and Hidayatur Rakhmawati
- Subjects
- k-nearest neighbors, outlier, density, distance, LOF, Friedman test, Nemenyi test, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
The current growth of data affects data analysis in many fields, such as astronomy, business, medicine, education, and finance. Collected and stored data contain extreme values, that is, observations that differ from most other observations. These extreme values are called outliers. Outliers in some data often carry important information, so they need to be examined in order to decide whether to remove or use them before data mining is applied. Outlier detection can be performed as data preprocessing using outlier analysis techniques. Widely applied outlier analysis techniques include distance-based methods, density-based methods, and the local outlier factor (LOF) method. K-nearest neighbors (KNN) is a data mining algorithm that is highly sensitive to outliers because its operation depends on the value of k. Therefore, proper handling is needed when KNN works on datasets containing outliers. An experimental method was chosen to apply the proposed method, with the aim of optimizing the KNN algorithm based on a comparison of outlier analyses (KNN-distance, KNN-density, and KNN-LOF). The results show that KNN-density was superior three times: on Wisconsin Breast Cancer with a mean accuracy of 99.34% at k=3 and k=5; on Glass with a mean accuracy of 85.25% at k=7; and on Lymphography with a mean accuracy of 85.45% at k=5. Furthermore, the Friedman and Nemenyi tests also showed a significant difference between KNN-density and KNN-LOF.
- Published
- 2024
- Full Text
- View/download PDF
138. A novel outlier detection method based on Bayesian change point analysis and Hampel identifier for GNSS coordinate time series
- Author
-
Hüseyin Pehlivan
- Subjects
- GNSS data, Outlier, Hampel identifier, Bayesian change point, Time series, Telecommunication, TK5101-6720, Electronics, TK7800-8360
- Abstract
The identification and removal of outliers in time series are important problems in numerous fields. In this paper, a novel method (BCP-HI) is proposed to enhance the accuracy of outlier detection in GNSS coordinate time series by combining Bayesian change point (BCP) analysis and the Hampel identifier (HI). By using BCP, change points (cps) in the time series are identified, and the time series is thereby divided into subsegments that have the properties of a normal distribution. In each of these separated segments, outliers are detected using HI. Each data element identified as an outlier is corrected by a median filter of window size (w) to obtain the corrected signal. The BCP-HI method was tested on both simulated and real GNSS coordinate time series. Outliers from three different synthetic test datasets with different sampling frequencies and outlier amplitudes were detected with approximately 98% accuracy after processing. After this process, the Signal-to-Noise Ratio (SNR) increased from 0.0084 to 10.8714 dB and the Root Mean Square (RMS) decreased from 24 to 23 mm. Similarly, for real GNSS data, approximately 98% accuracy was achieved, with an increase in SNR from 0.0003 to 4.4082 dB and a decrease in RMS from 7.6 to 6.6 mm. In addition, the output signals after BCP-HI were examined graphically using Lomb–Scargle periodograms, and it was observed that clearer power spectrum distributions emerged. When the input and output signals were examined using the Kolmogorov–Smirnov (KS) test, they were found to be statistically similar. These results indicate that the BCP-HI algorithm effectively removes outliers, enhances processing accuracy and reliability, and improves signal quality.
- Published
- 2024
- Full Text
- View/download PDF
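Note on entry 138: the Hampel identifier step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BCP-HI implementation: the Bayesian change point segmentation is omitted, and the function name `hampel_filter`, the `half_window` and `n_sigmas` parameters, and their defaults are assumptions chosen for illustration. Each point is compared with the median of a sliding window, and points that deviate by more than `n_sigmas` scaled median absolute deviations (MAD) are flagged and replaced by the window median.

```python
from statistics import median


def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Flag outliers with a sliding-window Hampel identifier.

    Returns (cleaned, outlier_idx): a copy of the series in which flagged
    points are replaced by the local window median, and the flagged indices.
    All parameter names and defaults are illustrative assumptions.
    """
    k = 1.4826  # scales the MAD to the standard deviation for Gaussian data
    cleaned = list(values)
    outlier_idx = []
    for i in range(len(values)):
        # Window is truncated at the edges of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        if abs(values[i] - med) > n_sigmas * k * mad:
            outlier_idx.append(i)
            cleaned[i] = med
    return cleaned, outlier_idx


# Hypothetical series with one obvious spike at index 4.
series = [1.0, 1.1, 0.9, 1.0, 9.0, 1.05, 0.95, 1.0, 1.1]
cleaned, flagged = hampel_filter(series)
print(flagged)
```

In a segmented variant like the one the abstract describes, a function of this kind would presumably be applied to each subsegment separately rather than to the whole series.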
139. Outlier Detection: Techniques and Applications : A Data Mining Perspective
- Author
-
N. N. R. Ranga Suri, Narasimha Murty M, and G. Athithan
- Subjects
- Outliers (Statistics), Data mining
- Abstract
This book, drawing on recent literature, highlights several methodologies for the detection of outliers and explains how to apply them to solve several interesting real-life problems. The detection of objects that deviate from the norm in a data set is an essential task in data mining due to its significance in many contemporary applications. More specifically, the detection of fraud in e-commerce transactions and discovering anomalies in network data have become prominent tasks, given recent developments in the field of information and communication technologies and security. Accordingly, the book sheds light on specific state-of-the-art algorithmic approaches such as the community-based analysis of networks and characterization of temporal outliers present in dynamic networks. It offers a valuable resource for young researchers working in data mining, helping them understand the technical depth of the outlier detection problem and devise innovative solutions to address related challenges.
- Published
- 2019
140. Manifold-based denoising, outlier detection, and dimension reduction algorithm for high-dimensional data
- Author
-
Zhao, Guanghua, Yang, Tao, and Fu, Dongmei
- Published
- 2023
- Full Text
- View/download PDF
141. A Comparison of Fixed and Random Effect Models by the Number of Research in the Meta-Analysis Studies with and without an Outlier
- Author
-
Demir, Seda and Doguyurt, Mehmet Fatih
- Abstract
The purpose of this research was to compare the performances of the Fixed Effect Model (FEM) and the Random Effects Model (REM) in meta-analysis studies conducted through 5, 10, 20 and 40 studies with an outlier and 4, 9, 19 and 39 studies without an outlier in terms of estimated common effect size, confidence interval coverage rate and heterogeneity measures. In this descriptive study, a real data set consisting of different studies examining teachers' emotional burnout in terms of gender was used, and a total of 72 meta-analyses were performed with the R program. The results indicated that REM was more advantageous when compared to FEM for the meta-analysis of data sets with an outlier. On the other hand, without an outlier, it was determined that the common effect size was generally estimated to be similar for all methods. Moreover, the increase in the number of studies included in the meta-analysis reduced the effect of the outlier on the effect size estimation and decreased the heterogeneity. When the confidence interval coverage accuracy rates of the meta-analysis methods were examined, it was concluded that the confidence intervals included the estimated effect sizes in all data sets and all methods. The findings of the current study showed that the methods used in meta-analysis studies with 20 or more studies were less affected by the outlier in the estimated common effect size.
- Published
- 2022
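Note on entry 141: the contrast between the two models can be sketched numerically. The abstract does not say which estimators the authors used (their analyses were run in R), so the sketch below assumes inverse-variance weighting for the fixed effect model and the DerSimonian-Laird estimator of the between-study variance for the random effects model. The function names and the example data are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate and its variance (fixed effect model)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total


def random_effects_dl(effects, variances):
    """Pooled estimate, its variance, and tau^2 (DerSimonian-Laird, assumed here)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    return pooled, 1.0 / sum(star), tau2


# Hypothetical effect sizes: four similar studies plus one precise outlying study.
effects = [0.10, 0.20, 0.15, 0.12, 2.00]
variances = [0.04, 0.04, 0.04, 0.04, 0.01]
print(fixed_effect(effects, variances))
print(random_effects_dl(effects, variances))
```

With these made-up inputs the outlying study dominates the fixed-effect weights, while the random effects weights are more even because the estimated between-study variance is added to every study's variance.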
142. A data-adaptive method for outlier detection from functional data
- Author
-
Lakra, Arjun, Banerjee, Buddhananda, and Laha, Arnab Kumar
- Published
- 2024
- Full Text
- View/download PDF
143. Optimization-Based Risk-Averse Outlier Accommodation With Linear Performance Constraints: Real-Time Computation and Constraint Feasibility in CAV State Estimation
- Author
-
Hu, Wang
- Subjects
- Optimization, State Estimation, Navigation, GNSS
- Abstract
Connected and Autonomous Vehicles (CAV) require positioning that is consistently reliable and accurate. This is achieved through the choice of sensors and the real-time selection of high-quality measurements. Global Navigation Satellite Systems (GNSS) are the foundation to achieve accurate absolute positioning. GNSS Common-mode Errors (CME) mitigation can be realized with the Differential GNSS (DGNSS) approach and Precise Point Positioning (PPP) techniques. With the evolution of the International GNSS Service (IGS) Multi-GNSS Experiment (MGEX), Real-time PPP (RT-PPP) corrections for multi-GNSS have only recently become accessible. GNSS measurements are prone to outliers. This results in an inherent performance versus risk trade-off in CAV state estimation applications. Recently proposed Risk-Averse Performance Specified (RAPS) methods address this trade-off by optimally selecting a subset of measurements to minimize risk while achieving a target performance. The existing RAPS literature presents cases where the performance specification is stated for the full information matrix. However, those methods are not computationally efficient as required for real-time and do not address situations where that specification is infeasible. This dissertation focuses on the Diagonal Performance-Specified RAPS (DiagRAPS) formulation. This dissertation begins with a review of GNSS measurement models and real-time CME mitigation techniques, such as DGNSS, PPP, and Virtual Network DGNSS (VN-DGNSS). It then develops the theory of DiagRAPS for both binary and non-binary measurement selection variables. Algorithms suitable for real-time applications are proposed within Linear Programming (LP) and Mixed-Integer Linear Programming (ILP) optimization frameworks, achieving polynomial time complexity. The convergence and computation costs of these algorithms are discussed. For binary DiagRAPS, a novel convex reformulation is derived, leading to a globally optimal solution that can be solved using existing tools. Additionally, a soft constraint optimization approach is proposed for situations when the specified performance is infeasible. Finally, this dissertation evaluates DiagRAPS state estimation approaches using real-world multi-GNSS data from challenging environments for both DGNSS and RT-PPP applications. The results reveal that the locally optimal approach achieves state estimation performance comparable to the global solution. Both binary and non-binary DiagRAPS outperform traditional methods. Notably, the non-binary approach yielded the lowest computation cost and the best overall performance.
- Published
- 2024
144. Outlier Detection Using a GPU-Based Parallel Algorithm: Quantum Clustering.
- Author
-
Liu, Ding, Wang, Zhe, and Li, Hui
- Subjects
- *OUTLIER detection, *CENTROID, *PARALLEL algorithms
- Abstract
We introduce a novel hypothesis in the field of outlier detection, suggesting that normal data tend to be distributed in regions where the density changes smoothly or is less pronounced, whereas abnormal data often exhibit distribution in areas characterized by abrupt changes in data density. Relying on this hypothesis, we develop a novel density-based unsupervised outlier detection method, referred to as Quantum Clustering (QC). This approach addresses the processing of unlabeled data and employs a potential function to identify the centroids of clusters and outliers effectively. Experimental results demonstrate that the potential function can accurately detect hidden outliers within data points. Furthermore, by adjusting the parameter σ, QC enables the identification of more subtle outliers. Additionally, our method is evaluated on several benchmarks from diverse research areas, affirming its broad applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
145. SEAOP: a statistical ensemble approach for outlier detection in quantitative proteomics data.
- Author
-
Huang, Jinze, Zhao, Yang, Meng, Bo, Lu, Ao, Wei, Yaoguang, Dong, Lianhua, Fang, Xiang, An, Dong, and Dai, Xinhua
- Subjects
- *OUTLIER detection, *STATISTICAL ensembles, *PROTEOMICS, *DATA structures, *QUALITY control, *CHI-squared test
- Abstract
Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers. Therefore, we introduced SEAOP, a Python toolbox that utilizes an ensemble mechanism by integrating multi-round data management and a statistics-based decision pipeline with multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated as confirmed outliers via a chi-square test, adhering to a 95% confidence level, to ensure the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy, specifically designed to intuitively and effectively display the distribution of both outlier and non-outlier samples. Optimal hyperparameter models of SEAOP for outlier detection were identified by using a gradient-simulated standard dataset and Mann–Kendall trend test. The performance of the SEAOP toolbox was evaluated using three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
146. Outlier identification and adjustment for time series.
- Author
-
Fröhlich, Markus
- Subjects
- *OUTLIER detection, *ALGORITHMS
- Abstract
Identification and replacement of erroneous data is of fundamental importance for the quality of statistical surveys. If statistical units are continuously sampled over an extended period, time series methods can facilitate this task. Numerous outlier identification and replacement procedures are available for this purpose, such as the RegARIMA approaches within the seasonal adjustment procedures in X13-Arima or Tramo/Seats. These algorithms can be used to identify different types of outliers, such as additive outliers, level shifts or transitory changes. In this paper, an alternative outlier identification procedure is proposed which is based on a nonlinear model estimated with support vector regressions. The focus of this procedure is on the identification of additive outliers and on the applicability for short time series with less than 3 years of observations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
147. QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
- Author
-
Ashkboos, Saleh, Mohtashami, Amirkeivan, Croci, Maximilian L., Li, Bo, Jaggi, Martin, Alistarh, Dan, Hoefler, Torsten, and Hensman, James
- Subjects
- Computer Science - Machine Learning
- Abstract
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism, and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our quantized LLaMa2-70B model has losses of at most 0.29 WikiText-2 perplexity and retains 99% of the zero-shot performance. Code is available at: https://github.com/spcl/QuaRot. Comment: 19 pages, 6 figures
- Published
- 2024
148. Hyperbolic Metric Learning for Visual Outlier Detection
- Author
-
Gonzalez-Jimenez, Alvaro, Lionetti, Simone, Bazazian, Dena, Gottfrois, Philippe, Gröger, Fabian, Pouly, Marc, and Navarini, Alexander
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Out-Of-Distribution (OOD) detection is critical to deploy deep learning models in safety-critical applications. However, the inherent hierarchical concept structure of visual data, which is instrumental to OOD detection, is often poorly captured by conventional methods based on Euclidean geometry. This work proposes a metric framework that leverages the strengths of Hyperbolic geometry for OOD detection. Inspired by previous works that refine the decision boundary for OOD data with synthetic outliers, we extend this method to Hyperbolic space. Interestingly, we find that synthetic outliers do not benefit OOD detection in Hyperbolic space as they do in Euclidean space. Furthermore, we explore the relationship between OOD detection performance and Hyperbolic embedding dimension, addressing practical concerns in resource-constrained environments. Extensive experiments show that our framework improves the FPR95 for OOD detection from 22% to 15% and from 49% to 28% on CIFAR-10 and CIFAR-100, respectively, compared to Euclidean methods.
- Published
- 2024
149. Outlier-Detection for Reactive Machine Learned Potential Energy Surfaces
- Author
-
Vazquez-Salazar, Luis Itza, Käser, Silvan, and Meuwly, Markus
- Subjects
- Physics - Chemical Physics, Computer Science - Machine Learning
- Abstract
Uncertainty quantification (UQ) to detect samples with large expected errors (outliers) is applied to reactive molecular potential energy surfaces (PESs). Three methods - Ensembles, Deep Evidential Regression (DER), and Gaussian Mixture Models (GMM) - were applied to the H-transfer reaction between syn-Criegee and vinyl hydroxyperoxide. The results indicate that ensemble models provide the best results for detecting outliers, followed by GMM. For example, from a pool of 1000 structures with the largest uncertainty, the detection quality for outliers is ~90% and ~50%, respectively, if 25 or 1000 structures with large errors are sought. On the contrary, the limitations of the statistical assumptions of DER greatly impacted its prediction capabilities. Finally, a structure-based indicator was found to be correlated with large average error, which may help to rapidly classify new structures into those that provide an advantage for refining the neural network.
- Published
- 2024
150. OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning
- Author
-
Huang, Wei-Cheng, Chen, Chun-Fu, and Hsu, Hsiang
- Subjects
- Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent works have shown that by using large pre-trained models along with learnable prompts, rehearsal-free methods for class-incremental learning (CIL) settings can achieve superior performance to prominent rehearsal-based ones. Rehearsal-free CIL methods struggle with distinguishing classes from different tasks, as those are not trained together. In this work, we propose a regularization method based on virtual outliers to tighten decision boundaries of the classifier, such that confusion of classes among different tasks is mitigated. Recent prompt-based methods often require a pool of task-specific prompts, in order to prevent overwriting knowledge of previous tasks with that of the new task, leading to extra computation in querying and composing an appropriate prompt from the pool. This additional cost can be eliminated, without sacrificing accuracy, as we reveal in the paper. We illustrate that a simplified prompt-based method can achieve results comparable to previous state-of-the-art (SOTA) methods equipped with a prompt pool, using far fewer learnable parameters and lower inference cost. Our regularization method has demonstrated its compatibility with different prompt-based methods, boosting those previous SOTA rehearsal-free CIL methods' accuracy on the ImageNet-R and CIFAR-100 benchmarks. Our source code is available at https://github.com/jpmorganchase/ovor. Comment: Accepted by ICLR 2024
- Published
- 2024