29,034 results for "Outlier"
Search Results
2. Comparative Analysis of Generic Outlier Detection Techniques
- Author
-
Vasudev, Kini T., Manohara Pai, M. M., Pai, Radhika M., Guru, D. S., editor, Kumar, N. Vinay, editor, and Javed, Mohammed, editor
- Published
- 2024
- Full Text
- View/download PDF
3. Abnormal Transaction Node Detection on Bitcoin
- Author
-
Zhang, Yuhang, Lu, Yanjing, Li, Mian, Zhang, Yonghong, editor, Qi, Lianyong, editor, Liu, Qi, editor, Yin, Guangqiang, editor, and Liu, Xiaodong, editor
- Published
- 2024
- Full Text
- View/download PDF
4. Combined Processing of Outlier and Multipath in GNSS Precise Point Positioning
- Author
-
Yuan, Haijun, He, Xiufeng, Zhang, Zhetao, Yang, Changfeng, editor, and Xie, Jun, editor
- Published
- 2024
- Full Text
- View/download PDF
5. Detection of outliers in survey–weighted linear regression.
- Author
-
Kumar, Raju, Biswas, Ankur, Singh, Deepak, and Ahmad, Tauqueer
- Abstract
Regression diagnostics help identify influential data points in a model. Detecting outliers in complex survey design data involving stratification, clustering, and unequal probability sampling is difficult due to the presence of masking, where one outlier makes it hard to detect others. A masking factor for survey-weighted linear regression is developed and applied to the Household Consumer Expenditure dataset from the 68th round of the National Sample Survey Organization survey of India. Regression parameters are calculated before and after detection and removal of outliers. The standard error of the regression parameters for survey-weighted least squares models is reduced by 2% for the intercept, 5% for the variable "meat" (X5), 4% for "served processed food" (X9), and 4% for "packaged processed food" (X10). After removal, the regression coefficient of "served processed food" (X9) becomes statistically significant; inference for the other variables is unchanged. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. A dynamic density-based clustering method based on K-nearest neighbor.
- Author
-
Sorkhi, Mahshid Asghari, Akbari, Ebrahim, Rabbani, Mohsen, and Motameni, Homayun
- Subjects
K-nearest neighbor classification, BIG data - Abstract
Many density-based clustering algorithms already proposed in the literature are capable of finding clusters with different shapes, sizes, and densities, and they detect noise points well. However, many of these methods require static input parameters that must be defined by the user. Since it is difficult for users to determine these parameters in large data sets, setting them properly plays a decisive role in obtaining a suitable clustering. A challenge in this domain is therefore how to reduce the number of input parameters, thereby reducing the errors caused by user involvement. To handle this challenge, a dynamic density-based clustering (DDBC) method is proposed in this paper, which needs the smallest number of user-set parameters since many of them are determined automatically. This method can distinguish close clusters with different densities in a dynamic manner. Additionally, it can detect outliers and noise before starting the clustering process without scanning these points. Several real and artificial data sets were used to examine the efficiency of the proposed method, and its outcomes were compared to those of other algorithms in this domain. The comparative results confirmed the acceptable performance of DDBC and its higher accuracy in clustering tasks. [ABSTRACT FROM AUTHOR]
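The k-nearest-neighbour density estimate at the heart of such methods can be sketched as follows; this is an illustrative fragment under assumed data and an assumed cutoff rule, not the DDBC algorithm itself:

```python
import numpy as np

def knn_density(X, k=3):
    """Density score per point: inverse of the mean distance to its k
    nearest neighbours (self excluded). Low scores suggest noise/outliers."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)        # exclude self-distance
    knn = np.sort(dist, axis=1)[:, :k]    # k smallest distances per row
    return 1.0 / knn.mean(axis=1)

# Two tight clusters plus one far-away noise point.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]])
scores = knn_density(X, k=2)
noise = scores < 0.5 * np.median(scores)  # crude data-driven cutoff
```

Flagging such low-density points before clustering mirrors the idea of removing noise without a user-chosen density threshold.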
- Published
- 2024
- Full Text
- View/download PDF
7. Sensitivity analysis with iterative outlier detection for systematic reviews and meta‐analyses.
- Author
-
Meng, Zhuo, Wang, Jingshen, Lin, Lifeng, and Wu, Chong
- Subjects
OUTLIER detection, SENSITIVITY analysis, HETEROGENEITY, DECISION making - Abstract
Meta‐analysis is a widely used tool for synthesizing results from multiple studies. The collected studies are deemed heterogeneous when they do not share a common underlying effect size; thus, the factors attributable to the heterogeneity need to be carefully considered. A critical problem in meta‐analyses and systematic reviews is that outlying studies are frequently included, which can lead to invalid conclusions and affect the robustness of decision‐making. Outliers may be caused by several factors such as study selection criteria, low study quality, small‐study effects, and so on. Although outlier detection is well‐studied in the statistical community, limited attention has been paid to meta‐analysis. The conventional outlier detection method in meta‐analysis is based on a leave‐one‐study‐out procedure. However, when calculating a potentially outlying study's deviation, other outliers could substantially impact its result. This article proposes an iterative method to detect potential outliers, which reduces such an impact that could confound the detection. Furthermore, we adopt bagging to provide valid inference for sensitivity analyses of excluding outliers. Based on simulation studies, the proposed iterative method yields smaller bias and heterogeneity after performing a sensitivity analysis to remove the identified outliers. It also provides higher accuracy on outlier detection. Two case studies are used to illustrate the proposed method's real‐world performance. [ABSTRACT FROM AUTHOR]
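The leave-one-study-out deviation that the iterative procedure builds on can be sketched as below; this is a simplified fixed-effect version with made-up data, and it omits the heterogeneity re-estimation and bagging steps of the actual proposal:

```python
import numpy as np

def iterative_outlier_scan(y, v, z_crit=3.0):
    """Iteratively flag outlying studies in a fixed-effect meta-analysis.

    y: study effect sizes; v: their within-study variances. Each round,
    every remaining study is compared with the pooled estimate of the
    *other* remaining studies, and the most extreme study is removed if
    its standardized deviation exceeds z_crit."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    keep = np.ones(len(y), dtype=bool)
    outliers = []
    while keep.sum() > 2:
        idx = np.flatnonzero(keep)
        z = np.empty(len(idx))
        for k, i in enumerate(idx):
            rest = idx[idx != i]
            w = 1.0 / v[rest]
            mu = np.sum(w * y[rest]) / np.sum(w)    # pooled, study i left out
            se = np.sqrt(v[i] + 1.0 / np.sum(w))    # SE of the deviation
            z[k] = (y[i] - mu) / se
        j = np.argmax(np.abs(z))
        if np.abs(z[j]) <= z_crit:
            break
        outliers.append(int(idx[j]))
        keep[idx[j]] = False
    return outliers

# Nine concordant studies plus one extreme effect at index 9.
effects = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.13, 0.07, 0.11, 1.50]
variances = [0.01] * 10
flagged = iterative_outlier_scan(effects, variances)
```

Removing flagged studies one at a time, rather than all at once from a single leave-one-out pass, is what limits the masking effect the abstract describes.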
- Published
- 2024
- Full Text
- View/download PDF
8. occTest: An integrated approach for quality control of species occurrence data.
- Author
-
Serra‐Diaz, Josep M., Borderieux, Jeremy, Maitner, Brian, Boonman, Coline C. F., Park, Daniel, Guo, Wen‐Yong, Callebaut, Arnaud, Enquist, Brian J., Svenning, Jens‐C., and Merow, Cory
- Abstract
Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses. We introduce an R package, occTest, that synthesizes a growing open-source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms to identify potential problems with species occurrence records by employing a hierarchical organization of multiple tests. The workflow has a hierarchical structure organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks grouping different testTypes (e.g. environmental outlier detection), which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four different testBlocks characterize potential problems in geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user-defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods associated with each test to explore consensus among data cleaning methods. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, it provides a hierarchical structure to incorporate future tests yet to be developed. occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom-built rules. As a result, occTest can better assess each record's appropriateness for its intended application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. A novel outlier detection method based on Bayesian change point analysis and Hampel identifier for GNSS coordinate time series.
- Author
-
Pehlivan, Hüseyin
- Subjects
OUTLIER detection, CHANGE-point problems, GLOBAL Positioning System, TIME series analysis, ROOT-mean-squares, GAUSSIAN distribution, SIGNAL-to-noise ratio - Abstract
The identification and removal of outliers in time series are important problems in numerous fields. In this paper, a novel method (BCP-HI) is proposed to enhance the accuracy of outlier detection in GNSS coordinate time series by combining Bayesian change point (BCP) analysis and the Hampel identifier (HI). Using BCP, change points (cps) in the time series are identified, dividing the series into subsegments that have the properties of a normal distribution. In each of these segments, outliers are detected using HI. Each data element identified as an outlier is corrected by a median filter of window size (w) to obtain the corrected signal. The BCP-HI method was tested on both simulated and real GNSS coordinate time series. Outliers in three synthetic test datasets with different sampling frequencies and outlier amplitudes were detected with approximately 98% accuracy; after processing, the signal-to-noise ratio (SNR) increased from 0.0084 to 10.8714 dB and the root mean square (RMS) decreased from 24 to 23 mm. Similarly, for real GNSS data, approximately 98% accuracy was achieved, with an increase in SNR from 0.0003 to 4.4082 dB and a decrease in RMS from 7.6 to 6.6 mm. In addition, the output signals after BCP-HI were examined graphically using Lomb–Scargle periodograms, and clearer power spectrum distributions emerged. When the input and output signals were compared using the Kolmogorov–Smirnov (KS) test, they were found to be statistically similar. These results indicate that the BCP-HI algorithm effectively removes outliers, enhances processing accuracy and reliability, and improves signal quality. [ABSTRACT FROM AUTHOR]
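The Hampel-identifier step can be illustrated with a minimal sketch; the window size, threshold, and series are arbitrary example values, and the BCP segmentation stage is omitted:

```python
import numpy as np

def hampel_filter(x, window=7, n_sigmas=3.0):
    """Flag and correct outliers in a 1-D series with a Hampel identifier.

    A point is an outlier when it deviates from its rolling-window median
    by more than n_sigmas robust standard deviations (1.4826 * MAD)."""
    x = np.asarray(x, dtype=float)
    corrected = x.copy()
    outliers = np.zeros(len(x), dtype=bool)
    half = window // 2
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        med = np.median(x[lo:hi])
        sigma = 1.4826 * np.median(np.abs(x[lo:hi] - med))
        if sigma > 0 and abs(x[i] - med) > n_sigmas * sigma:
            outliers[i] = True
            corrected[i] = med   # replace the outlier with the window median
    return outliers, corrected

# A slowly drifting series with one spike: only the spike should be flagged.
series = np.array([1.0, 1.1, 1.2, 9.0, 1.3, 1.4, 1.5])
flags, clean = hampel_filter(series, window=5)
```

On this toy series only the spike is flagged and replaced by its window median, which is the correction rule described in the abstract.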
- Published
- 2024
- Full Text
- View/download PDF
10. High-dimensional robust inference for censored linear models.
- Author
-
Huang, Jiayu and Wu, Yuanshan
- Abstract
Due to its direct statistical interpretation, censored linear regression offers a valuable complement to Cox proportional hazards regression in survival analysis. We propose a rank-based high-dimensional inference for censored linear regression without imposing any moment condition on the model error. We develop a theory of the high-dimensional U-statistic, circumvent challenges stemming from the non-smoothness of the loss function, and establish the convergence rate of the regularized estimator and the asymptotic normality of the resulting de-biased estimator, as well as the consistency of the asymptotic variance estimation. As censoring can be viewed as a form of trimming, it strengthens the robustness of the rank-based high-dimensional inference, particularly for heavy-tailed model errors or outliers present in the response. We evaluate the finite-sample performance of the proposed method via extensive simulation studies and demonstrate its utility by applying it to a subcohort study from The Cancer Genome Atlas (TCGA). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Performance of T2-based PCA mix control chart with KDE control limit for monitoring variable and attribute characteristics.
- Author
-
Ahsan, Muhammad, Mashuri, Muhammad, Prastyo, Dedy Dwi, and Lee, Muhammad Hisyam
- Subjects
QUALITY control charts, PROBABILITY density function, FALSE alarms - Abstract
In this work, a detailed performance evaluation of the mixed multivariate T2 control chart based on PCA mix is explored. The control limit of the proposed control chart is calculated using the kernel density approach. Through simulation studies, the proposed chart's performance is assessed in terms of its capacity to identify outliers and process shifts. When 30% more outliers are included in the data, the proposed chart provides a consistent accuracy rate for identifying mixed outliers. For a balanced percentage of attribute qualities, misdetection happens because of the high false alarm rate. For unbalanced attribute qualities and excessive proportions, the masking effect is the key issue. The proposed chart shows improved performance in identifying shifts in the process. [ABSTRACT FROM AUTHOR]
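The mechanics of a T2 statistic with a KDE-estimated control limit can be sketched as follows; this uses purely numeric in-control data rather than the paper's PCA-mix construction, and the sample sizes and alpha are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Phase-I (in-control) observations and their Hotelling T2 statistics.
X = rng.standard_normal((500, 4))
mu = X.mean(axis=0)
Sinv = np.linalg.inv(np.cov(X, rowvar=False))
t2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)

# KDE-based control limit: the (1 - alpha) quantile of the estimated T2
# density, approximated here by drawing from the fitted KDE.
kde = gaussian_kde(t2)
draws = kde.resample(100_000, seed=2).ravel()
ucl = np.quantile(draws, 0.9973)

# A new observation shifted in the first variable signals when its T2
# statistic exceeds the KDE control limit.
x_new = mu + np.array([5.0, 0.0, 0.0, 0.0])
t2_new = (x_new - mu) @ Sinv @ (x_new - mu)
```

Estimating the limit from the T2 density itself, rather than from a chi-square or F reference, is what lets such charts cope with the non-standard statistics that mixed variable/attribute data produce.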
- Published
- 2024
- Full Text
- View/download PDF
12. Sequential data analysis and outlier prediction using hybrid seagull optimized neural network and extreme value analysis.
- Author
-
Swaroop, Chigurupati Ravi and Raja, K.
- Abstract
With the rapid development of deep learning approaches, outlier detection is essential and used in a variety of applications. It is challenging to predict the boundaries between the original and abnormal regions due to the unlabelled dataset. Complexity also arises from the noise in real-time data and the tendency of newer data to be included in the dataset. These challenges are resolved with a novel hybrid approach, a seagull-optimized Convolutional Neural Network with Extreme Value Theory (CNN-EVT), by measuring the deviation of the dataset from the original distribution. The deviation and the probability of extreme values are estimated for the univariate data streams and peak threshold. The proposed approach is simulated in a MATLAB environment and evaluated with performance metrics such as prediction accuracy, specificity, sensitivity, G-mean, Matthews correlation coefficient (MCC), mean square error (MSE), and K2. The proposed approach achieves an accuracy of 95.6% with a mean square error of 0.049. Comparison with existing state-of-the-art approaches shows that the proposed approach outperforms them on these performance metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. NODSTAC: Novel Outlier Detection Technique Based on Spatial, Temporal and Attribute Correlations on IoT Bigdata.
- Author
-
Brahmam, M Veera and Gopikrishnan, S
- Abstract
An outlier in the Internet of Things is an immediate change in data induced by a significant difference in the environment (event) or a sensor malfunction (error). Outliers in the data cause the decision-maker to make incorrect judgments in data analysis, so it is essential to detect outliers in any discipline. Outlier detection is thus a crucial task for improving sensor data quality and ensuring accuracy, reliability, and robustness. In this research, a novel outlier detection technique based on spatial, temporal, and attribute correlations is proposed to detect outliers (both errors and events). A correlation measure in the temporal correlation algorithm determines outliers, and the spatial correlation algorithm classifies whether the outliers are events or errors. Optimal clusters are used to improve network lifetime, and malicious nodes are also detected based on spatial-temporal and attribute correlations within these clusters. The experimental results show that the proposed method outperforms other models in terms of accuracy against the percentage of outliers infused and detection rate against the false alarm rate. [ABSTRACT FROM AUTHOR]
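The detect-then-classify logic (temporal check first, spatial vote second) might look like the toy sketch below; the thresholds, the all-nodes neighbourhood, and the fabricated readings are assumptions, and the paper's clustering and attribute-correlation steps are omitted:

```python
import numpy as np

def detect_and_classify(readings, z_thresh=3.0, vote=0.5):
    """readings: (n_nodes, n_times) sensor matrix.
    Temporal step: a reading is anomalous when its robust z-score against
    the node's own history exceeds z_thresh.
    Spatial step: an anomaly is an 'event' when more than `vote` of the
    other nodes are anomalous at the same instant, otherwise an 'error'.
    Simplified: every node is treated as every other node's neighbour."""
    R = np.asarray(readings, dtype=float)
    med = np.median(R, axis=1, keepdims=True)
    mad = np.median(np.abs(R - med), axis=1, keepdims=True)
    z = np.abs(R - med) / (1.4826 * mad)          # robust z per reading
    anom = z > z_thresh
    labels = np.full(R.shape, "", dtype=object)
    for n, t in zip(*np.nonzero(anom)):
        others = np.delete(anom[:, t], n)
        labels[n, t] = "event" if others.mean() > vote else "error"
    return labels

# Four nodes over twelve instants: a shared small jitter, one single-node
# spike (a sensor error) and one network-wide spike (a real event).
jitter = np.array([0.0, 0.1, -0.1, 0.05, -0.05, 0.0,
                   0.1, -0.1, 0.05, -0.05, 0.0, 0.1])
R = 25.0 + np.tile(jitter, (4, 1))
R[0, 5] += 15.0    # only node 0 spikes  -> error
R[:, 10] += 15.0   # all nodes spike     -> event
labels = detect_and_classify(R)
```

The spatial vote is what separates a genuine environmental event (neighbours agree) from a faulty sensor (neighbours disagree).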
- Published
- 2024
- Full Text
- View/download PDF
14. Typical Yet Unlikely and Normally Abnormal: The Intuition Behind High-Dimensional Statistics.
- Author
-
Vowels, Matthew J.
- Abstract
Normality, in the colloquial sense, has historically been considered an aspirational trait, synonymous with ideality. The arithmetic average and, by extension, statistics including linear regression coefficients, have often been used to characterize normality, and are often used as a way to summarize samples and identify outliers. We provide intuition behind the behavior of such statistics in high dimensions, and demonstrate that even for datasets with a relatively low number of dimensions, data start to exhibit a number of peculiarities which become severe as the number of dimensions increases. Whilst our main goal is to familiarize researchers with these peculiarities, we also show that normality can be better characterized with 'typicality', an information theoretic concept relating to entropy. An application of typicality to both synthetic and real-world data concerning political values reveals that in multi-dimensional space, to be 'normal' is actually to be atypical. We briefly explore the ramifications for outlier detection, demonstrating how typicality, in contrast with the popular Mahalanobis distance, represents a viable method for outlier detection. [ABSTRACT FROM AUTHOR]
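The effect is easy to reproduce numerically. In the sketch below (standard Gaussian data; dimension and sample size chosen arbitrarily), the squared Mahalanobis distance of essentially every sample concentrates near the dimension d, so the distribution's mode at distance 0, the "most normal" point, is maximally atypical:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                                # number of dimensions
X = rng.standard_normal((10_000, d))   # standard Gaussian sample

# With identity covariance, the squared Mahalanobis distance from the
# mean is just the squared norm; it is chi-square with d degrees of
# freedom and concentrates around d.
D2 = (X ** 2).sum(axis=1)
mean_D2, min_D2 = D2.mean(), D2.min()

# Typicality view: -log p(x) sits close to the Gaussian entropy exactly
# when D2 is close to d, so the mode (D2 = 0) is maximally atypical even
# though Mahalanobis distance ranks it as the most central point.
```

No sample lands anywhere near the mode: the smallest observed D2 is far above zero, which is the "normal is atypical" phenomenon the abstract describes.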
- Published
- 2024
- Full Text
- View/download PDF
15. Reliability for Zeghdoudi distribution with an outlier, fuzzy reliability and application.
- Author
-
Belhamra, Thara, Zeghdoudi, Halim, and Raman, Vinoth
- Subjects
MAXIMUM likelihood statistics, NEWTON-Raphson method, RANDOM variables, PARAMETER estimation, COMPUTER simulation - Abstract
This study focuses on estimating reliability P[Y
- Published
- 2024
- Full Text
- View/download PDF
16. A novel outlier detection method based on Bayesian change point analysis and Hampel identifier for GNSS coordinate time series
- Author
-
Hüseyin Pehlivan
- Subjects
GNSS data, Outlier, Hampel identifier, Bayesian change point, Time series, Telecommunication, TK5101-6720, Electronics, TK7800-8360 - Abstract
Abstract The identification and removal of outliers in time series are important problems in numerous fields. In this paper, a novel method (BCP-HI) is proposed to enhance the accuracy of outlier detection in GNSS coordinate time series by combining Bayesian change point (BCP) analysis and the Hampel identifier (HI). Using BCP, change points (cps) in the time series are identified, dividing the series into subsegments that have the properties of a normal distribution. In each of these segments, outliers are detected using HI. Each data element identified as an outlier is corrected by a median filter of window size (w) to obtain the corrected signal. The BCP-HI method was tested on both simulated and real GNSS coordinate time series. Outliers in three synthetic test datasets with different sampling frequencies and outlier amplitudes were detected with approximately 98% accuracy; after processing, the signal-to-noise ratio (SNR) increased from 0.0084 to 10.8714 dB and the root mean square (RMS) decreased from 24 to 23 mm. Similarly, for real GNSS data, approximately 98% accuracy was achieved, with an increase in SNR from 0.0003 to 4.4082 dB and a decrease in RMS from 7.6 to 6.6 mm. In addition, the output signals after BCP-HI were examined graphically using Lomb–Scargle periodograms, and clearer power spectrum distributions emerged. When the input and output signals were compared using the Kolmogorov–Smirnov (KS) test, they were found to be statistically similar. These results indicate that the BCP-HI algorithm effectively removes outliers, enhances processing accuracy and reliability, and improves signal quality.
- Published
- 2024
- Full Text
- View/download PDF
17. Performance of T 2-based PCA mix control chart with KDE control limit for monitoring variable and attribute characteristics
- Author
-
Muhammad Ahsan, Muhammad Mashuri, Dedy Dwi Prastyo, and Muhammad Hisyam Lee
- Subjects
Hotelling's T2, Kernel density estimation, Mixed quality characteristics, Outlier, PCA mix, Medicine, Science - Abstract
Abstract In this work, a detailed performance evaluation of the mixed multivariate T2 control chart based on PCA mix is explored. The control limit of the proposed control chart is calculated using the kernel density approach. Through simulation studies, the proposed chart's performance is assessed in terms of its capacity to identify outliers and process shifts. When 30% more outliers are included in the data, the proposed chart provides a consistent accuracy rate for identifying mixed outliers. For a balanced percentage of attribute qualities, misdetection happens because of the high false alarm rate. For unbalanced attribute qualities and excessive proportions, the masking effect is the key issue. The proposed chart shows improved performance in identifying shifts in the process.
- Published
- 2024
- Full Text
- View/download PDF
18. Outlier Detection Performance of a Modified Z-Score Method in Time-Series RSS Observation With Hybrid Scale Estimators
- Author
-
Abdulmalik Shehu Yaro, Filip Maly, Pavel Prazak, and Karel Maly
- Subjects
Average, hybrid scale estimator, MAD, maximum, mZ-score, outlier, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
The modified Z-score (mZ-score) method has been used to detect outliers in time-series received signal strength (RSS) observations. Its performance depends on the scale estimator used, and each estimator has advantages and disadvantages over the others. One approach to developing a scale estimator that combines the advantages of two or more scale estimators is scale estimator hybridization. In this paper, the outlier detection performance of an mZ-score method with different hybridization approaches for the Sn and median absolute deviation (MAD) scale estimators is determined and analysed. Three different hybrid scale estimators are identified, namely weighted, maximum, and average hybrid scale estimators. The performance of the mZ-score method using the three hybrid scale estimators is determined using three experimentally generated and publicly available time-series RSS datasets. Based on the simulation results, the weighted hybrid scale estimator gives the best outlier detection performance amongst the three. When compared to the mean-shift-based outlier detection (MOD) technique, the k-means clustering-based technique, and the density-based spatial clustering (DBSCAN) technique, the mZ-score method with the weighted hybrid scale estimator performs better, with little or no false alarm and false negative detections.
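One hybridization (the "maximum" variant, with an "average" option) can be sketched roughly as follows; the RSS values and the 3.5 cutoff are illustrative, and the simplified Sn and the hybrid rule here are assumptions rather than the authors' exact scheme:

```python
import numpy as np

def mad_sigma(x):
    """MAD scaled to be consistent for the normal sigma (x 1.4826)."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def sn_sigma(x):
    """Rousseeuw-Croux Sn (simplified): 1.1926 * med_i( med_j |x_i - x_j| )."""
    diffs = np.abs(x[:, None] - x[None, :])
    return 1.1926 * np.median(np.median(diffs, axis=1))

def mz_outliers(x, hybrid="max", cutoff=3.5):
    """Modified Z-score with a hybrid scale estimator.

    Note 0.6745*(x - med)/MAD equals (x - med)/(1.4826*MAD), so working
    on the consistent sigma scale keeps the classic 3.5 cutoff meaningful
    for both estimators."""
    x = np.asarray(x, dtype=float)
    s_mad, s_sn = mad_sigma(x), sn_sigma(x)
    scale = {"max": max(s_mad, s_sn),
             "avg": 0.5 * (s_mad + s_sn)}[hybrid]
    return np.abs(x - np.median(x)) / scale > cutoff

# A time series of RSS readings (dBm) with one deep fade.
rss = np.array([-62.0, -61.0, -63.0, -62.0, -90.0, -61.0, -62.0, -63.0])
flags = mz_outliers(rss, hybrid="max")
```

Taking the larger (or averaged) of the two scale estimates guards against either estimator collapsing on near-constant RSS windows.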
- Published
- 2024
- Full Text
- View/download PDF
19. A Survey on How College Students in a Statistical Literacy Course Apply Statistics Terms to People
- Author
-
Lawrence M. Lesser and Martin Santos
- Subjects
Anonymous, Average, Language, Lexical ambiguity, Outlier, Random, Probabilities. Mathematical statistics, QA273-280, Special aspects of education, LC8-6691 - Abstract
Abstract An anonymous survey was given to n = 73 students in an asynchronous online statistical literacy course at a mid-sized Hispanic Serving Institution. Informed by teaching experience, literature on lexical ambiguity, and everyday usage of statistics words and phrases, the first author designed the survey to yield insight into how students view phrases such as "average person," "random person," and "outlier person," and to explore possible connections or patterns with such phrases. Findings suggest that students view phrases such as "random person" in a way much further from standard usage than they do phrases such as "anonymous person." Considerations such as diversity, agency, and variable identification are identified as possibly affecting usage and meriting further investigation. Recommendations for teaching and future research are provided. Supplementary materials for this article are available online.
- Published
- 2024
- Full Text
- View/download PDF
20. Low Velocity Impact Monitoring of Composite Tubes Based on FBG Sensors.
- Author
-
Huan, Shengsheng, Lu, Linjiao, Shen, Tao, and Du, Jianke
- Subjects
FIBER Bragg gratings, FIBROUS composites, PIPE flow, DISCRETE wavelet transforms, IMPACT (Mechanics), CARBON composites, DETECTORS - Abstract
Carbon fiber reinforced composites (CFRP) are susceptible to hidden damage from low velocity external impacts during their service life. To ensure the proper monitoring of the state of the composites, it is crucial to predict the location of an impact event. In this paper, fiber Bragg grating (FBG) sensors are affixed to the surface of a carbon fiber composite tube, and an optical sensing interrogator is used to capture the central wavelength shift of the FBG sensors due to low-velocity impacts. A discrete wavelet transform is used for noise reduction in the response signals. Then, the differences in the captured response signals of the FBG sensors at different locations of the impact were analyzed. Moreover, two methods were implemented to predict the location of low-velocity impacts, according to the differences in the captured response signals. The BP neural network-based method utilized three data sets to train the neural network, resulting in an average localization error of 20.68 mm. In contrast, the method based on error outliers selected a specific data set as the reference dataset, achieving an average localization error of 13.98 mm. The comparison of the predicted results shows that the latter approach has a higher predictive accuracy and does not require a significant amount of data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Low-Flow Identification in Flood Frequency Analysis: A Case Study for Eastern Australia.
- Author
-
Rima, Laura, Haddad, Khaled, and Rahman, Ataur
- Subjects
DISTRIBUTION (Probability theory), STREAM-gauging stations, FLOOD damage, RUNOFF, FLOODS - Abstract
Design flood estimation is an essential step in many water engineering design tasks such as the planning and design of infrastructure to reduce flood damage. Flood frequency analysis (FFA) is widely used in estimating design floods when the at-site flood data length is adequate. One of the problems in FFA with an annual maxima (AM) modeling approach is deciding how to handle smaller discharge values (outliers) in the selected AM flood series at a given station. The objective of this paper is to explore how the practice of censoring (which involves adjusting for smaller discharge values in FFA) affects flood quantile estimates in FFA. In this regard, two commonly used probability distributions, log-Pearson type 3 (LP3) and generalized extreme value distribution (GEV), are used. The multiple Grubbs and Beck (MGB) test is used to identify low-flow outliers in the selected AM flood series at 582 Australian stream gauging stations. It is found that censoring is required for 71% of the selected stations in using the MGB test with the LP3 distribution. The differences in flood quantile estimates between LP3 (with MGB test and censoring) and GEV distribution (without censoring) increase as the return period reduces. A modest correlation is found (for South Australian catchments) between censoring and the selected catchment characteristics (correlation coefficient: 0.43), with statistically significant associations for the mean annual rainfall and catchment shape factor. The findings of this study will be useful to practicing hydrologists in Australia and other countries to estimate design floods using AM flood data by FFA. Moreover, it may assist in updating Australian Rainfall and Runoff (national guide). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. A Study of Outliers in GNSS Clock Products.
- Author
-
Maciuk, Kamil, Varna, Inese, and Krzykowska-Piotrowska, Karolina
- Subjects
GLOBAL Positioning System, ORBITS (Astronomy), NATURAL satellites - Abstract
Time is an extremely important element in the field of GNSS positioning. In precise positioning with a single-centimetre accuracy, satellite clock corrections are used. In this article, the longest available data set of satellite clock corrections of four GNSS systems from 2014 to 2021 was analysed. This study covers the determination of the quality (outliers number and magnitude), availability, stability, and determination of the specificity and nature of the clock correction for each satellite system. One problem with the two newest satellite systems (Galileo and BeiDou) is the lack of availability of satellite signals in the early years of the analysis. These data were available only in the later years of the period covered by the analysis, as most of the satellites have only been in orbit since 2018–2019. Interestingly, the percentage of outlying observations was highest in Galileo and lowest in BeiDou. Phase and frequency plots showed a significant number of outlying observations. On the other hand, after eliminating outlying observations, each system showed a characteristic graph waveform. The most consistent and stable satellite clock corrections are provided by the GPS and GLONASS systems. The main problems discussed in this paper are the determination of the number and magnitude of outliers in clock products of four GNSS systems (GPS, GLONASS, Galileo, Beidou) and the study on the long-term stability of GNSS clocks analysis, which covers the years 2014–2021. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Robust Liu‐type estimator based on GM estimator.
- Author
-
Işılar, Melike and Bulut, Y. Murat
- Subjects
- *
MULTICOLLINEARITY , *MONTE Carlo method , *REGRESSION analysis , *PROBLEM solving , *INDEPENDENT variables - Abstract
Ordinary Least Squares Estimator (OLSE) is widely used to estimate parameters in regression analysis. In practice, the assumptions of regression analysis are often not met. The most common problems that break these assumptions are outliers and multicollinearity. As a result of these problems, OLSE loses efficiency. Therefore, alternative estimators to OLSE have been proposed: robust estimators are often used to address the outlier problem, and biased estimators are often used to address the multicollinearity problem. These problems do not always occur individually in real-world datasets, so robust biased estimators have been proposed as simultaneous solutions to both. The aim of this study is to propose the Liu-type Generalized M Estimator as an alternative to the robust biased estimators available in the literature, in order to obtain more efficient results. This estimator gives effective results in the presence of outliers and multicollinearity in both dependent and independent variables. The proposed estimator is theoretically compared with other estimators available in the literature. In addition, a Monte Carlo simulation study and a real-data example are presented to compare the performance of the estimator with existing estimators. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. A data-adaptive method for outlier detection from functional data.
- Author
-
Lakra, Arjun, Banerjee, Buddhananda, and Laha, Arnab Kumar
- Abstract
Outliers present in a data set can severely impact statistical analysis and lead to erroneous conclusions. Hence, outlier identification is an important task before analysis of the data is undertaken. Because outliers differ from the rest of the observations in a data set, they may also contain valuable information, which can be recovered by carefully examining the identified outliers. While several methods of outlier identification exist for univariate and multivariate data, not many methods exist for functional data. In sequential identification of outliers from a set of functional data, the estimation of the covariance operator is affected by the outliers still present in the data. This degrades the performance of these methods as the proportion of outliers in the data set increases. In this paper we propose a new outlier detection algorithm that uses an adaptive, data-driven approach to dimension selection. The proposed method is seen to be more efficient than the existing method in an extensive simulation exercise. Three illustrations with real-life environmental data sets are also reported. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Methodology for Introducing Corrections in Statistical Studies Based on Control Charts.
- Author
-
Sidnyaev, N. I. and Battulga, E.
- Abstract
It is shown how changes in conditions during an experiment can go unnoticed and how anomalous measurements arise that can lead to incorrect values containing gross errors. Distortion of measurements may be a consequence of the incorrect operation of recording devices; if a fault is detected, such values should be discarded. The problems of uncertainty in the input data of calculations using classical methods are outlined. The influence of the deviation of external impacts from nominal values, the variability of emission intensity, and the nonlinear nature of the influence of external factors on the probability of an event are studied. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Separation of the Linear and Nonlinear Covariates in the Sparse Semi-Parametric Regression Model in the Presence of Outliers †.
- Author
-
Amini, Morteza, Roozbeh, Mahdi, and Mohamed, Nur Anisah
- Subjects
- *
REGRESSION analysis , *INDEPENDENT variables , *MONTE Carlo method , *REAL property sales & prices , *RESEARCH personnel - Abstract
Determining the predictor variables that have a non-linear effect, as well as those that have a linear effect, on the response variable is crucial in additive semi-parametric models. This issue has been extensively investigated in the area of semi-parametric linear additive models, and various separation methods have been proposed. A common issue that might affect both estimation and separation results is the existence of outliers among the observations. To reduce this sensitivity to extreme observations, robust estimating approaches are frequently applied. We propose a robust method for simultaneously identifying the linear and nonlinear components of a semi-parametric linear additive model, even in the presence of outliers in the observations. Additionally, this model is sparse in that it may be used to determine which explanatory variables are ineffective by giving accurate zero estimates for their coefficients. To assess the effectiveness of the proposed method, a comprehensive Monte Carlo simulation study is conducted, along with an application to the Boston property prices dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. An Application of Robust Principal Component Analysis Methods for Anomaly Detection.
- Author
-
BAĞCI GENEL, Kübra and ÇELİK, Halit Eray
- Subjects
- *
PRINCIPAL components analysis , *ANOMALY detection (Computer security) , *ROBUST statistics , *INVARIANT subspaces , *LITHIUM-ion batteries - Abstract
Ensuring a secure network environment is crucial, especially with the increasing number of threats and attacks on digital systems. Implementing effective security measures, such as anomaly detection, can help detect abnormal traffic patterns. Several statistical and machine learning approaches are used to detect network anomalies, including robust statistical methods, which can accurately distinguish abnormal traffic patterns from normal traffic. In this study, a robust Principal Component Analysis (PCA) method called ROBPCA, known for its extensive use in the chemometrics and genetics literature, is utilized for detecting network anomalies and compared with another robust PCA method called PCAGRID. The anomaly detection performance of these methods is evaluated by injecting synthetic traffic volume into a well-known traffic matrix. According to the application results, when the normal subspace is contaminated with large anomalies, the ROBPCA method provides much better performance in detecting anomalies. [ABSTRACT FROM AUTHOR]
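As background for the comparison above, PCA-based detectors flag points whose component in the residual (non-principal) subspace is large. A minimal two-dimensional sketch of this residual-subspace idea, using classical non-robust PCA (precisely the variant that ROBPCA and PCAGRID robustify), might look like:

```python
import math


def pca_residuals_2d(points):
    """Residual distance of each 2-D point from the first principal
    axis; large residuals indicate anomalies.  Classical (non-robust)
    PCA sketch: the axis itself can be pulled by large anomalies."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # angle of the major axis of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # residual = component orthogonal to the principal axis
    return [abs(-(p[0] - mx) * uy + (p[1] - my) * ux) for p in points]
```

In the network setting, the same idea is applied to a traffic matrix: traffic projected onto the residual subspace yields the anomaly signal.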
- Published
- 2024
- Full Text
- View/download PDF
28. A Survey on How College Students in a Statistical Literacy Course Apply Statistics Terms to People.
- Author
-
Lesser, Lawrence M. and Santos, Martin
- Subjects
- *
STATISTICAL literacy , *COLLEGE students , *ANONYMOUS persons , *TEACHING experience , *STATISTICS , *AMBIGUITY - Abstract
An anonymous survey was given to n = 73 students in an asynchronous online statistical literacy course at a mid-sized Hispanic Serving Institution. Informed by teaching experience, literature on lexical ambiguity, and everyday usage of statistics words and phrases, the first author designed the survey to yield insight into how students view phrases such as "average person," "random person," and "outlier person," and to explore possible connections or patterns with such phrases. Findings suggest that students view phrases such as "random person" in a way much further from standard usage than they do phrases such as "anonymous person." Considerations such as diversity, agency, and variable identification are identified as possibly affecting usage and meriting further investigation. Recommendations for teaching and future research are provided. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. "I'm Not a Refugee Girl, Call Me Bella": Professional Refugee Women, Agency, Recognition, and Emancipation.
- Author
-
Groutsis, Dimitria, Collins, Jock, and Reid, Carol
- Subjects
WOMEN refugees ,BUSINESSWOMEN ,REFUGEES ,LIBERTY ,SYRIAN refugees ,LABOR market ,AGENCY theory ,HUMAN capital - Abstract
The notion of refugees as a viable source of labor to address skill shortages in the destination country's labor market has rarely been the dominant discourse on refugee entrants. Bella's lived experience as a professional woman who arrived as a Syrian conflict refugee to Australia in 2017 presents an outlier in refugee research and challenges conventional scholarly wisdom and public discourse. A combination of human capital and a purposeful use of networks, supported by her desire for recognition and a deep sense of self-worth, allowed her to navigate the formalized and structured Australian business landscape. Accordingly, she was able to overcome the stigma of being a refugee: less worthy of employment status in a position representative of her overseas skills and qualifications. In drawing on an outlier methodology and critical theory, we develop a more nuanced understanding of the agency of skilled and qualified refugee women, drawing attention to lessons for business, which typically takes a "one size fits all" approach to labor integration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Innovative Inter Quartile Range-based Outlier Detection and Removal Technique for Teaching Staff Performance Feedback Analysis.
- Author
-
Magar, Vikas, Ruikar, Darshan, Bhoite, Sachine, and Mente, Rajivkumar
- Subjects
TEACHING methods ,EMPLOYEE reviews ,DATA distribution ,OUTLIERS (Statistics) ,OUTLIER detection ,COLLEGE teachers - Abstract
The teaching-learning process plays an important role in education. To improve this process, valuable and timely feedback is taken from the students. That feedback can be used for two main purposes: one is to improve the process according to student expectations, and another is to evaluate the teaching faculty's performance for the sake of appraisal. However, some students can give unfair and biased feedback, which may produce an adverse effect on appraisal. To remove these anomalies, generated by favoritism and bias in the feedback, an innovative interquartile range (IQR) based outlier detection and removal technique is implemented in this article. The proposed technique removes the outliers based on the IQR and precisely selects the central tendency (mean or median) of the feedback data distribution based on skewness. The identified central tendency is then used to compute the appraisal indicator. To conduct the experiments, feedback data is collected from the students via a questionnaire prepared by expert academicians; it contains twenty-one questions divided into five categories. The experimental results confirm that the proposed IQR-based outlier detection and removal technique removes the outliers from the data set and improves the performance analysis, which in turn is helpful for teaching faculty performance appraisal. [ABSTRACT FROM AUTHOR]
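The IQR step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1.5×IQR fences and the skewness cutoff of 0.5 on Pearson's second coefficient are conventional assumptions.

```python
import statistics


def iqr_appraisal(scores, skew_threshold=0.5):
    """Remove IQR outliers, then pick mean or median by skewness.

    Hypothetical sketch: fences at 1.5*IQR; the central tendency is
    the median when the trimmed data are clearly skewed, else mean.
    """
    s = sorted(scores)
    q1, q2, q3 = statistics.quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [x for x in s if lo <= x <= hi]
    mean = statistics.mean(kept)
    med = statistics.median(kept)
    sd = statistics.stdev(kept) if len(kept) > 1 else 0.0
    # Pearson's second skewness coefficient: 3 * (mean - median) / sd
    skew = 3 * (mean - med) / sd if sd else 0.0
    indicator = med if abs(skew) > skew_threshold else mean
    return kept, indicator
```

The returned indicator would then feed the appraisal computation in place of a raw mean that biased ratings could distort.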
- Published
- 2024
- Full Text
- View/download PDF
31. Outlier robust inference in the instrumental variable model with applications to causal effects.
- Author
-
Klooster, Jens and Zhelonkin, Mikhail
- Subjects
CAUSAL models ,CHI-square distribution - Abstract
Summary: The Anderson‐Rubin (AR) test is an important method that allows for reliable inference in the instrumental variable model when the instruments are weak. Yet, the robustness properties of this test have not been formally studied. As it turns out, the AR test is not robust to outliers, so we show how to construct an outlier robust alternative, the robust AR test. We investigate the robustness properties of the robust AR test and show that the robust AR statistic asymptotically follows a chi‐square distribution. The theoretical results are illustrated by a simulation study. Finally, we apply the robust AR test to three case studies affected by different types of outliers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Some enhancements to DeepKriging.
- Author
-
Lin, Ding‐Chih, Huang, Hsin‐Cheng, and Tzeng, ShengLi
- Subjects
- *
PARTICULATE matter , *SET functions , *DATA structures , *SPLINES , *FEATURE selection , *OUTLIER detection - Abstract
With an increased volume of spatial data, conventional spatial prediction methods have encountered significant challenges in handling complex data structures while maintaining statistical and computational efficiency. Recently, a spatial neural network method called DeepKriging has been proposed, which utilizes a set of basis functions as an embedded input to capture spatial information. In this study, we enhance this approach by using an ordered set of multi‐resolution thin plate spline basis functions, which offers ease of implementation and alleviates the challenges associated with basis function allocation, particularly when the data locations are highly irregular. The proposed method requires only the selection of the number of basis functions based on a validation dataset. In addition, we propose a robust version of DeepKriging, which is resistant to outliers. Several simulation experiments are conducted to show the advantages of our method over conventional statistical methods and the original DeepKriging approach. Finally, we apply the proposed method to a PM2.5 dataset in Taiwan. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
33. Estimation of loss on ignition values of the magnesite minerals using robust multiple regression.
- Author
-
Akıska, Sinan, Akıska, Elif, and Güney, Yeşim
- Subjects
- *
MAGNESITE , *INDUSTRIAL minerals , *MINERALS , *ANALYTICAL geochemistry , *MAGNESIUM compounds , *MULTIPLE regression analysis , *PARAMETER estimation - Abstract
Magnesite is an ore used in the production of a wide variety of industrial minerals and compounds and of magnesium metal, as well as its alloys. The main components of magnesite are MgO and CO2. However, magnesite is not generally observed in nature as pure and contains certain amounts of SiO2, CaO, and Fe2O3. The loss on ignition (LOI) value of magnesite minerals largely depends on the CO2 content. This study aims to develop a model that estimates the LOI values of magnesite minerals by using SiO2, MgO, CaO, and Fe2O3tot data obtained from geochemical analysis. Measurement of the LOI can provide important information not only for calculating the amount of volatiles in magnesite, but also about the acquisition of the element magnesium. With the estimation of LOI values, samples will not need to be subjected to LOI analysis. The data used in this study were compiled from the literature, and information about the study areas, deposit types, analysis methods, and the laboratory/device names is presented as supplementary material. A multivariate linear regression model was applied to represent the relationship between the LOI and these major oxides. A global independent data set comprising 170 observations was used to validate the proposed model. The preliminary analysis of the data is discussed in detail to improve the quality of the analysis. In the first step, the OLS estimation method is implemented to estimate the unknown parameters. Then, the model assumptions are tested. In the presence of outliers, violation of the assumptions can make the OLS method unreliable; in such cases, obtaining reliable estimates depends on strong estimators, such as robust estimators, which are resistant to outliers. Robust M-estimation methods can be used as effective tools for this purpose. For this reason, in the final step, we consider a robust M-estimation method based on the Huber, Tukey, and Hampel objective functions.
The results of the OLS and robust regression methods, including parameter estimates, standard errors, residual standard error, and weighted R², are presented as a comparison. The validity of the models and the importance of each explanatory variable to the relationship are also investigated. According to the residual standard error (RSE), which measures the standard deviation of the residuals in a regression model, and the weighted R² (R_W²) values, the robust M-estimation method based on all three considered objective functions produces more accurate results than the conventional OLS method. In particular, the robust estimation method based on the Tukey objective function has the smallest RSE and largest R_W² among the others. It is observed that deleting a few outlying points has a big impact on the regression results. In this study, a multi-linear relationship between the main oxide values and the LOI was determined. As a result, an estimation model with 91% accuracy and an RSE value of 0.283 was proposed. In addition, some relationships were established between the outliers and the determination of the ranges of major oxide values in the magnesite mineral. Our results suggest that robust M-estimation can provide efficient and stable estimates when analyzing geochemical data that may contain outliers. The application of multivariate regression analysis to estimate the LOI by using SiO2, MgO, CaO, and Fe2O3tot has been confirmed as a new approach, with promising results. We also recommend that this study be improved by using more data and considering different magnesite deposit types. [ABSTRACT FROM AUTHOR]
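For readers unfamiliar with robust M-estimation, the core idea (reweighting observations so that large residuals count less) can be sketched for a single predictor. This is a generic Huber IRLS illustration, not the authors' multivariate code; the tuning constant c = 1.345 is the conventional choice for ~95% efficiency under normal errors.

```python
def huber_irls(x, y, c=1.345, n_iter=50):
    """Huber M-estimation of y = a + b*x by iteratively reweighted
    least squares, with a MAD-based robust residual scale."""
    n = len(x)
    w = [1.0] * n
    a = b = 0.0
    for _ in range(n_iter):
        # closed-form weighted least squares for intercept a, slope b
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * sxx - sx * sx
        a = (sxx * sy - sx * sxy) / det
        b = (sw * sxy - sx * sy) / det
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        # robust scale: 1.4826 * median absolute deviation
        # (simple order-statistic median keeps the sketch short)
        med = sorted(r)[n // 2]
        mad = sorted(abs(ri - med) for ri in r)[n // 2]
        s = max(1.4826 * mad, 1e-12)
        # Huber weights: 1 inside c*s, downweighted outside
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
    return a, b
```

With one gross outlier among otherwise clean points, the fit stays close to the inlier line, while OLS would be pulled far off.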
- Published
- 2023
- Full Text
- View/download PDF
34. Data Pre-processing Issues in Medical Data Classification.
- Author
-
Tuppad, Ashwini and Patil, Shantala Devi
- Subjects
MEDICAL coding ,BIG data ,MISSING data (Statistics) ,ACCESS to information - Abstract
With the digitalization of data and the rise of the World Wide Web, access to information has become very easy and affordable. In particular, the Web and the Internet have boosted research activities by facilitating access to large, publicly available medical datasets under open access schemes. These developments have resulted in explosive amounts of data being generated that vary in volume, variety, and velocity, and are thus referred to as big data. The availability of such medical big data has catalyzed research in medical predictive analytics. However, the true value of such data can be derived only after subjecting it to careful processing and analysis before drawing inferences from it. Publicly available medical datasets contain noise in the form of missing values, outliers, and data inconsistencies, which may affect the results or outcomes negatively. Pre-processing of such data is essential to eliminate noisy elements and refine the data to be suitable for further analysis and processing. This paper highlights the need for data pre-processing and explains the data pre-processing pipeline with the various underlying stages constituting it. It also presents a comparative analysis of various data pre-processing techniques for handling missing values and outliers in a dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. Machine Learning Method for Changepoint Detection in Short Time Series Data.
- Author
-
Smejkalová, Veronika, Šomplák, Radovan, Rosecký, Martin, and Šramková, Kristína
- Subjects
TIME series analysis ,OUTLIER detection ,MACHINE learning ,WASTE management ,SIMPLE machines ,DATA analysis - Abstract
Analysis of data is crucial in waste management to improve effective planning from both short- and long-term perspectives. Real-world data often present anomalies, but in the waste management sector, anomaly detection is seldom performed. The main goal and contribution of this paper is a complex machine learning framework for changepoint detection in a large number of short time series from waste management. In such a case, it is not possible to use only an expert-based approach, due to the time-consuming nature of that process and its subjectivity. The proposed framework consists of two steps: (1) outliers are detected via an outlier test on trend-adjusted data, and (2) changepoints are identified by comparing linear model parameters. To use the proposed method, a sufficient number of expert assessments of the presence of anomalies in the time series is required. The proposed framework is demonstrated on waste management data from the Czech Republic. It is observed that certain waste categories in specific regions frequently exhibit changepoints. On the micro-regional level, approximately 31.1% of time series contain at least one outlier and 16.4% exhibit changepoints. Certain groups of waste are more prone to the occurrence of anomalies. The results indicate that even in the case of aggregated data, anomalies are not rare, and their presence should always be checked. [ABSTRACT FROM AUTHOR]
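The linear-model comparison in step (2) can be illustrated with a toy scan over candidate split points. This is a simplified stand-in (assumed names, slope comparison only, no significance test) for the framework's actual comparison of linear model parameters:

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def best_changepoint(y, min_seg=3):
    """Return (index, slope_gap) for the split that maximises the
    difference between the slopes fitted left and right of it."""
    x = list(range(len(y)))
    best_t, best_gap = None, 0.0
    for t in range(min_seg, len(y) - min_seg + 1):
        gap = abs(fit_slope(x[:t], y[:t]) - fit_slope(x[t:], y[t:]))
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap
```

A real implementation would also test whether the slope gap is statistically meaningful before declaring a changepoint, which is where the expert-labelled series come in.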
- Published
- 2023
- Full Text
- View/download PDF
36. Adaptive local neighborhood information based efficient fuzzy clustering approach
- Author
-
Wu, Ziheng, Zhao, Yuan, Li, Cong, and Zhou, Fang
- Published
- 2024
- Full Text
- View/download PDF
37. Progression of clock DBD changes over time
- Author
-
Kamil Maciuk, Inese Varna, and Jacek Kudrys
- Subjects
gps ,satellite ,clock ,jump ,outlier ,dbd ,reference clock ,Technology - Abstract
Day-boundary discontinuity (DBD) is an effect present in precise GNSS satellite orbit and clock products, originating from the method used for orbit and clock determination. Non-Gaussian measurement noise and data processing in 24 h batches are responsible for DBDs. In the case of the clock product, a DBD is a time jump at the boundary epochs of two adjacent batches of processed data, and its magnitude might reach a couple of ns. This article presents a DBD analysis of the four GNSS (Global Navigation Satellite System) constellations in terms of change over an 8-year period. For each of the 118 satellites available in this period, the yearly value of the DBD was analysed, including its standard deviation and the frequency of outliers. Results show that the smallest DBDs appear in the GPS system and the biggest in the BeiDou space segment. Moreover, the change in DBDs over time is clearly seen at the beginning of the analysed period, when the magnitude and number of the DBDs were larger than for the current, newest clock products.
- Published
- 2023
- Full Text
- View/download PDF
38. New Interval Improved Fuzzy Partitions Fuzzy C-Means Clustering Algorithms under Different Distance Measures for Symbolic Interval Data Analysis.
- Author
-
Chang, Sheng-Chieh, Chuang, Wei-Ching, and Jeng, Jin-Tsong
- Subjects
INTERVAL analysis ,DATA analysis ,INTERVAL measurement ,EUCLIDEAN distance ,EUCLIDEAN algorithm ,RESEARCH personnel ,MATHEMATICAL notation - Abstract
Symbolic interval data analysis (SIDA) has been successfully applied in a wide range of fields, including finance, engineering, and environmental science, making it a valuable tool for incorporating the uncertainty and imprecision often present in real-world data. This paper proposes the interval improved fuzzy partitions fuzzy c-means (IIFPFCM) clustering algorithm, designed for fast convergence and combined independently with the Euclidean distance and the city block distance. Both proposed methods converge faster than the traditional interval fuzzy c-means (IFCM) clustering method in SIDA. Moreover, the division of symbolic interval data into large and small groups is a known problem, and the proposed methods also outperform the traditional interval fuzzy c-means clustering method on it. In addition, the traditional IFCM clustering method is affected by outliers; this paper therefore also applies the IIFPFCM algorithm to handle outliers from the perspective of interval distance measurement. Experimental comparative analysis shows that the proposed IIFPFCM clustering algorithm with the city block distance measure is suitable for dealing with SIDA with outliers. Finally, nine symbolic interval datasets are assessed in the experiments; the statistical results on convergence and efficiency reveal that the proposed algorithm performs better. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
39. A novel hybrid robust tapering approach for nonlinear regression in the presence of autocorrelation and outliers.
- Author
-
Kucuk, Serenay and Asikgil, Baris
- Subjects
- *
NONLINEAR regression , *LEAST squares , *MONTE Carlo method , *PARAMETER estimation , *AUTOCORRELATION (Statistics) , *OUTLIER detection - Abstract
Nonlinear models are commonly used for analyzing real-life data in fields such as medicine, engineering, and economics. To draw efficient inferences about parameter estimates and statistical results in nonlinear regression, the assumptions on the error term must be satisfied. Ordinary least squares and some modified least squares methods fail to give efficient parameter estimates when autocorrelation and outliers occur together in nonlinear regression. In this study, a novel hybrid robust tapering approach, called robust modified two-stage least squares, is proposed to overcome these problems and obtain more efficient parameter estimates in nonlinear regression. Two numerical examples and a comprehensive Monte Carlo simulation study are given in order to examine the performance of robust modified two-stage least squares. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. Learning-based correspondence classifier with self-attention hierarchical network.
- Author
-
Chu, Mingfan, Ma, Yong, Mei, Xiaoguang, Huang, Jun, and Fan, Fan
- Subjects
POSE estimation (Computer vision) ,IMAGE registration ,REMOTE sensing - Abstract
Finding valid correspondences is of considerable significance to image matching, which has been regarded as the key to numerous vision-based tasks. Current methods usually struggle on sets with a high proportion of outliers. To address this problem, given a set of putative correspondences in two images, this paper proposes a novel framework (named SAH-Net) to remove outliers and recover the camera pose through the essential matrix using an end-to-end network. The proposed SAH-Net is hierarchical with a multi-scale structure, which consists of a correspondence level and a cluster level. First, the correspondence level takes advantage of two-view geometry to learn correspondence features. Next, in order to integrate structural information of the scene, correspondences are pooled via a self-attention method. Additionally, SAH-Net applies a spatial correlation operation after the clustering, separating features into segments and learning the spatial characteristics of clustered nodes. Finally, the clusters, integrated with spatial information, are recovered to the original scale via a learned upsampling operation. Extensive experiments are conducted on remote sensing image registration, general image matching (outdoor and indoor image datasets, respectively), and loop closure detection, which demonstrate the superiority of SAH-Net in mismatch removal and relative pose estimation compared to other state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. Similarity Distribution Density: An Optimized Approach to Outlier Detection.
- Author
-
Quan, Li, Gong, Tao, and Jiang, Kaida
- Subjects
OUTLIER detection ,SUPERVISED learning ,MISSING data (Statistics) ,DENSITY ,DATA analysis ,PROBLEM solving - Abstract
When dealing with uncertain data, traditional model construction methods often ignore or filter out noisy data to improve model performance. However, this simple approach can lead to insufficient data utilization, model bias, reduced detection ability, and decreased robustness of detection models. Outliers can be considered data that are inconsistent with other patterns at specific moments; they are not always negative data, so their emergence is not always bad. In the process of data analysis, outliers play a crucial role in sample vector recognition, missing value processing, and model stability verification. In addition, unsupervised models have very high computation costs when recognizing outliers, especially non-parametric unsupervised models. To solve the above problems, we used a semi-supervised learning process with similarity as a negative selection criterion to propose a local density verification detection model (Vd-LOD). This model establishes similarity pseudo-labels for multi-label and multi-type samples, verifies the accuracy of outlier values based on local outlier factors, and increases the detector's sensitivity to outliers. The experimental results show that, under different parameter settings with varying outlier quantities, Vd-LOD achieves an approximate 6% improvement in average detection accuracy over other detection models, at the cost of a significant increase in average time consumption caused by verifying the presence of relationships. [ABSTRACT FROM AUTHOR]
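The local outlier factor (LOF) score that Vd-LOD builds on compares each point's local density with that of its neighbours. A brute-force sketch of plain LOF (not Vd-LOD itself; k is an assumed parameter, and the O(n²) distance matrix limits this to small data sets) is:

```python
import math


def lof_scores(points, k=3):
    """Local Outlier Factor, brute force.

    Scores near 1 indicate inliers; scores well above 1 indicate
    points whose local density is much lower than their neighbours'.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (self excluded)
    knn = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: dist[i][j])
        order.remove(i)
        knn.append(order[:k])
    # k-distance: distance to the k-th nearest neighbour
    kdist = [dist[i][knn[i][-1]] for i in range(n)]
    # local reachability density
    lrd = []
    for i in range(n):
        reach = sum(max(kdist[j], dist[i][j]) for j in knn[i])
        lrd.append(k / reach)
    # LOF: average neighbour density relative to own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]
```

Vd-LOD's contribution, per the abstract, is the semi-supervised verification layered on top of such scores, not the score itself.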
- Published
- 2023
- Full Text
- View/download PDF
42. Robust logistic regression with shift parameter estimation.
- Author
-
Shin, Bokyoung and Lee, Seokho
- Subjects
- *
PARAMETER estimation , *LOGISTIC regression analysis - Abstract
We investigate a shift parameter approach to logistic regression for robust classification. The shift parameter moves the margin to the minimum of the loss function. For robust estimation, margin-based logistic regression requires its own version of the thresholding-type estimate, which differs from that of residual-based regression. We discuss shift parameter estimation suitable for robust classification and propose some penalty functions producing such shift parameter estimates. Compared to existing robust logistic regression methods requiring non-convex optimization or label transition modelling, our proposal is implemented as a simple alternating optimization: the classifier is obtained as a solution of conventional logistic regression with an offset, and the shift parameter is individually estimated in closed form. We discuss some robust properties of the method and demonstrate its performance in linear and nonlinear classification with synthetic and real-world examples. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
43. Reference clock impact on GNSS clock outliers.
- Author
-
Maciuk, Kamil, Nistor, Sorin, Brusak, Ivan, Lewińska, Paulina, and Kudrys, Jacek
- Subjects
- *
GLOBAL Positioning System , *ORBIT determination , *OUTLIER detection - Abstract
With the advent of the Global Navigation Satellite System (GNSS), precise and highly accurate orbit and clock products have become crucial in processing GNSS data. Clocks in GNSS observations form the basis of positioning; their high quality and stability enable high accuracy and reliability of the obtained results. The clock modelling algorithms are continuously improved; thus, the accuracy of the clock products is evolving. At present, 8 Analysis Centers (ACs) contribute to the International GNSS Service final clock products. These products are based on GNSS observations from a network of reference stations, where for a given day one of the reference station clocks serves as the reference clock. In this paper, the authors determined, for the first time, the impact of the reference clock on the quality of the clock product, especially on outliers. For this purpose, the multi-GNSS final clock products provided by the Center for Orbit Determination in Europe (CODE) for the period 2014–2021 (GPS weeks 1773–2190, 2921 days) were analysed. Applying the Median Absolute Deviation (MAD) algorithm for outlier detection shows that the Passive Hydrogen Maser (PHM) clocks installed on board the Galileo satellites have the lowest level of noise, whereas the Block IIR GPS satellite launched in 1999 appears to have the highest. Furthermore, the GNSS station OHIE3, when used as a reference clock, generates an increase in the level of noise, especially noticeable on the G09 and E03 satellites. [ABSTRACT FROM AUTHOR]
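The MAD screen named above generalises to any clock-correction series: compute a modified z-score from the median and the median absolute deviation, and flag epochs whose score is large. A minimal sketch, where the 0.6745 consistency factor and the 3.5 cutoff are the conventional Iglewicz-Hoaglin choices (assumptions here, not necessarily this paper's settings):

```python
import statistics


def mad_outliers(values, cutoff=3.5):
    """Flag outliers via the modified z-score 0.6745*(x - med)/MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        return []  # degenerate series: no spread around the median
    return [x for x in values if abs(0.6745 * (x - med) / mad) > cutoff]
```

Unlike a mean/standard-deviation rule, the median-based statistics are barely moved by the outliers they are meant to find, which is why MAD screening suits noisy clock products.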
- Published
- 2023
- Full Text
- View/download PDF
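The MAD rule used in the clock analysis above is simple to state; a generic sketch (the threshold `k=3` and the example residuals are illustrative choices, not taken from the paper):

```python
import numpy as np

def mad_outliers(x, k=3.0):
    """Flag outliers with the Median Absolute Deviation rule.
    The factor 1.4826 makes MAD consistent with the standard
    deviation under a normal distribution; k=3 is a common cutoff."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    return np.abs(x - med) > k * mad

resid = np.array([0.1, -0.2, 0.05, 0.0, 12.0, -0.1])
print(mad_outliers(resid))  # only the 12.0 residual is flagged
```

Because both the center and the scale are medians, a single wild clock residual barely moves the threshold, unlike a mean/standard-deviation rule.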
44. New robust ridge estimators for the linear regression model with outliers.
- Author
-
Majid, Abdul, Amin, Muhammad, Aslam, Muhammad, and Ahmad, Shakeel
- Subjects
- *
REGRESSION analysis , *OUTLIER detection , *LEAST squares , *MONTE Carlo method - Abstract
The ridge regression estimator (RRE) is a widely used estimation method for the multiple linear regression model when the explanatory variables are correlated. The situation becomes problematic for the RRE when the data set contains outliers in the y-direction; using the RRE in their presence may adversely affect the parameter estimates. To address this issue, robust ridge estimators based on the M-estimator are available in the literature, which are less sensitive to outliers. It is well known that the selection of the ridge parameter k is crucial when using the RRE, and the same holds for robust ridge estimators. This study proposes robust ridge estimators for the ridge parameter k. The performance of the proposed estimators is evaluated through Monte Carlo simulations and a real application, with the mean squared error (MSE) as the performance criterion. Results show better performance of the proposed robust ridge estimators compared to the RRE, least squares, and M-estimation methods. For the modified ridge M-estimator, different ridge parameters were found to be better under different conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
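A generic robust ridge fit can be sketched as M-estimation via iteratively reweighted least squares (IRLS). This illustrates the general idea only; it does not implement the paper's proposed estimators of the ridge parameter k, and the Huber tuning constant `c=1.345` is an assumed, conventional choice.

```python
import numpy as np

def huber_ridge(X, y, k=1.0, c=1.345, n_iter=30):
    """Robust ridge regression via IRLS with Huber weights.
    k is the ridge parameter, c the Huber tuning constant."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)  # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust scale
        u = r / max(s, 1e-12)
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X + k * np.eye(p), X.T @ W @ y)
    return beta
```

With a large y-direction outlier, the downweighting keeps the fit near the bulk of the data, whereas the plain RRE is pulled toward the outlier.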
45. Outlier detection in a repeated measure design.
- Author
-
Emenike, Ifesinachi Chinagorom
- Subjects
- *
REPEATED measures design , *OUTLIER detection , *GAUSSIAN distribution , *SAMPLE size (Statistics) , *STATISTICAL sampling , *EMPIRICAL research - Abstract
This paper considered the problem of outlier detection in a repeated measure design (RMD) using the Estimate Distance and the Liu and Weng's residual methods. The comparative performance of the two methods in identifying outliers in an RMD was evaluated through empirical and simulation studies. For the empirical study, two types of outliers (outliers I and II) were randomly introduced, one at a time, into a real‐life dataset from physiological research. The results revealed that the Liu and Weng's residual and Estimate Distance tests correctly detected outlier I when randomly introduced into all the subjects. However, the Liu and Weng's residual test was only able to correctly detect outlier II in subjects 1, 4 and 6, while the Estimate Distance test could not detect outlier II in any of the subjects. In the simulation study, random samples of size 105, 120, 135, and 150 with corresponding numbers of subjects (k = 7, 8, 9, 10), respectively, were generated from a multivariate normal distribution, and the two types of outliers were randomly introduced into the simulated datasets. The results indicated that both methods correctly detected outlier I for all the sample sizes considered, but the Liu and Weng's residual test outperformed the Estimate Distance test in detecting outlier II. Thus, the Liu and Weng's residual test proved to be more powerful than the Estimate Distance test in identifying outliers in an RMD. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
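Distance-based outlier screening of subject profiles, of the general kind compared above, can be sketched with a Mahalanobis distance check. This is explicitly NOT the paper's Estimate Distance statistic (which is not specified in the abstract), only a common generic stand-in, and the cutoff of 3 is an assumption.

```python
import numpy as np

def mahalanobis_flags(Y, cutoff=3.0):
    """Generic distance-based screen for a subjects-by-occasions matrix Y:
    compute each subject's Mahalanobis distance from the mean profile
    and flag the large ones. pinv guards against a singular covariance
    when there are few subjects."""
    mu = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)
    Sinv = np.linalg.pinv(S)
    d = np.sqrt(np.einsum("ij,jk,ik->i", Y - mu, Sinv, Y - mu))
    return d > cutoff
```

A known weakness, consistent with the masking behavior reported above, is that a gross outlier inflates the covariance estimate and can partially hide itself.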
46. A Study of Least Absolute Deviation Fuzzy Transform.
- Author
-
Min, Hee-Jun and Jung, Hye-Young
- Subjects
SIGNAL-to-noise ratio ,IMAGE processing ,IMAGE compression ,SOFT computing ,IMAGE reconstruction ,IMAGE denoising - Abstract
Fuzzy transform (FT) is a soft computing method with many successful applications. Least-squares fuzzy transform (LS-FT), combining the L 2 -norm and FT, was proposed by Patane in 2011, but it can be severely affected by the presence of outliers. To solve this problem, we propose least absolute deviation fuzzy transform (LAD-FT), combining the L 1 -norm and FT, and verify its robustness to outliers through experiments on various functions. In the process, we found that the solution of LAD-FT for a function of one variable cannot be directly extended to a function of two variables; this paper is the first attempt to prove this. We also propose a novel algorithm for applying LAD-FT to a function of two variables. Since FT is already known as a useful tool for various image processing problems, we validate and compare the performance of FT, LS-FT, and LAD-FT from three main perspectives: image reconstruction, image denoising, and outlier robustness. Experiments are conducted on images of various sizes and compression rates, and peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used to measure the difference between two images. Results show that LAD-FT is robust to outliers, FT is superior in image reconstruction and image denoising, and SSIM performs better than PSNR. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
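Of the two image-quality measures used above, PSNR has a one-line definition; a minimal sketch (SSIM is more involved and omitted here, and the 8-bit peak of 255 is an assumption):

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two images;
    higher means the images are closer."""
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```

Because PSNR is a pixelwise mean-squared-error measure, a few grossly wrong pixels dominate it, which is one reason a structural measure such as SSIM can rank reconstructions differently.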
47. The asymptotic distribution of robust maximum likelihood estimator with Huber function for the mixed spatial autoregressive model with outliers.
- Author
-
Yang, Zhen, Luan, Yihui, and Jiang, Jiming
- Subjects
- *
ASYMPTOTIC distribution , *MAXIMUM likelihood statistics , *AUTOREGRESSIVE models , *INFERENTIAL statistics , *HOME prices , *PARAMETER estimation - Abstract
There is a wide range of outliers in spatial data, and these potential outliers can greatly affect parameter estimation and the corresponding statistical inference. Relying on the framework of maximum likelihood estimation (MLE), we investigate the asymptotic distribution of the robust ML estimator under mixed spatial autoregressive models with outliers and compare it with that of the ML estimator. Furthermore, based on the asymptotic theoretical results, we construct confidence intervals for the robust MLE and the MLE. Similar to the results for the MLE, we construct second-order-corrected robust confidence intervals using parametric and semi-parametric bootstrap methods. Monte Carlo simulation studies show that the robust estimator with the Huber loss function is more accurate and outperforms the MLE in most sample settings when the data are contaminated by outliers. The use of the method is then demonstrated in the analysis of the Neighborhood Crimes Data and the Boston Housing Price Data. The results further support the eligibility of the robust method in practical situations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
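The Huber function referenced in the title is quadratic near zero and linear in the tails, which is what bounds the influence of outlying observations on the likelihood. A minimal sketch; `c = 1.345` is the conventional 95%-efficiency tuning constant, assumed here rather than taken from the paper:

```python
import numpy as np

def huber_loss(r, c=1.345):
    """Huber function: 0.5*r^2 for |r| <= c, c*|r| - 0.5*c^2 beyond,
    so the derivative (the influence) is capped at c."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)
```

At r = 10, for example, the squared-error loss would be 50 while the Huber loss is only about 12.5, which is why a single gross outlier cannot dominate the robust estimating equations.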
48. Detection of Abnormal Drinking Water Consumption and Development of Prediction Models with Machine Learning Methods.
- Author
-
Güney, İsmail and Selvi, İhsan Hakan
- Abstract
In this study, it is assumed that household consumption of an essential need such as drinking water follows a certain pattern, alongside irregular consumption driven by various factors. A growing population, limited drinking water resources, and developing infrastructure and technology have increased the demand for drinking and utility water. Alternative water sources are being sought to meet this demand, but it is foreseen that the demand can be met by not wasting existing water and using it more efficiently. Using machine learning (ML) methods, a sub-branch of artificial intelligence (AI), drinking water consumption data from past periods were analyzed, and models of ordinary and unusual consumption behavior were extracted. It is envisaged that, by detecting abnormal consumption and informing subscribers about it, household consumption can be kept within the normal range. Although the amount of data collected, recorded, and processed in today's IT world has increased significantly, exact analysis is known to be difficult in terms of time and cost. In this study, the subscriber, meter, consumption, bill, and payment data of 8,224 residential subscribers in the province of Kayseri, whose water meter index had been read for more than 160 periods between 2006 and 2022 (first 6 months), were taken into account. The data were combined on a spatial subscriber basis, yielding a 41-feature dataset, which was reduced to 24 features by data preprocessing. Six sub-datasets were obtained using the information gain (IG), gain ratio (GR), symmetric uncertainty coefficient (SU), Pearson correlation coefficient (r), f-score, and random forest (RF) feature selection methods; a 7th sub-dataset was obtained from the intersection of the features selected in the other sub-datasets.
In all datasets, abnormal and normal drinking water consumption was identified using 7 different ML anomaly analysis methods: Tukey outlier labeling (TOL), isolation forest (IF), z-score, copula-based outlier detection (COPOD), median absolute deviation (MAD), local outlier factor (LOF), and elliptic envelope (EE). The drinking water consumption data, unsupervised at the beginning of the study, were thus labeled into 4 different classes, making the dataset supervised. Using this supervised dataset, consumption class prediction models were developed with 7 different ML methods: decision trees (DT), Gaussian naive Bayes (NB), k-nearest neighbors (KNN), logistic regression (LJR), multilayer perceptron neural network (MLP-NN), RF, and gradient boosting (GB). As a result of the study, it was shown that abnormal drinking water consumption can be detected by ML methods, and that the necessary policies can be created for more efficient use of water without waste, together with measures to that end. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
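The z-score detector, the simplest of the seven unsupervised methods the study combines, can be sketched as follows; the threshold of 3 is an illustrative convention, not a value stated in the abstract:

```python
import numpy as np

def zscore_flags(x, thresh=3.0):
    """Label values whose standardized distance from the mean
    exceeds the threshold as abnormal."""
    x = np.asarray(x, float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > thresh
```

Note that the mean and standard deviation are themselves inflated by outliers, which is precisely why the study also applies robust detectors such as MAD and Tukey labeling alongside the z-score.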
49. Application of state-of-the-art machine learning algorithms for slope stability prediction by handling outliers of the dataset.
- Author
-
Demir, Selçuk and Sahin, Emrehan Kutlug
- Subjects
- *
SLOPE stability , *MACHINE learning , *OUTLIER detection , *K-nearest neighbor classification , *SUPPORT vector machines , *RANDOM forest algorithms , *DECISION trees , *ROCK slopes - Abstract
This paper addresses the prediction of slope stability with machine learning (ML) applications. Five well-known and popular ML algorithms, namely neural network (NNet), decision tree (DT), support vector machine (SVM), k-nearest neighbor (kNN), and random forest (RF), are used to demonstrate the effectiveness of ML for binary classification of slope stability based on a case history dataset containing outliers. This study also evaluates the winsorization method used to treat outliers in the dataset, outlining the effect of outliers on the models' prediction performance. To this end, the performance of all the generated ML models is assessed and compared for both unwinsorized (i.e., raw) and winsorized datasets using performance metrics (Recall, Precision, Accuracy, and F1-Score) obtained from the confusion matrix. The experimental outputs showed that winsorization enhanced prediction performance: all ML models built with winsorized datasets outperformed the unwinsorized ones. The RF model achieves the best prediction performance, especially with the winsorized dataset. Moreover, SVM is found to be the most sensitive algorithm to outliers among those applied, while kNN is the least sensitive. Results showed that the accuracy improvement reaches nearly 20% for the SVM model, followed by 18% for DT, 11% for NNet, 10% for RF, and 4% for kNN. Furthermore, the results of the study reveal not only the performance of ML algorithms on the slope stability problem but also how the handling of outliers in a dataset affects the models' prediction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
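Winsorization, the outlier treatment evaluated above, clips extreme values to chosen percentiles rather than dropping the observations. A minimal sketch; the 5th/95th-percentile limits are illustrative, as the abstract does not state the limits used:

```python
import numpy as np

def winsorize(x, lower=5, upper=95):
    """Clip values outside the given percentiles, keeping the
    sample size unchanged while capping extreme values."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)
```

Because no rows are removed, the class balance and sample size of the slope-stability dataset are preserved; only the leverage of the extreme values is reduced.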
50. An Outlier Detection Study of Ozone in Kolkata India by the Classical Statistics, Statistical Process Control and Functional Data Analysis.
- Author
-
Ahmad, Mohammad, Cheng, Weihu, and Zhao, Xu
- Abstract
Air pollution is prevalent throughout the entire world due to the release of various gases such as NOx, PM, SO2, tropospheric ozone (O3), etc. Ground-level ozone is the predominant component of smog and is the product of the interplay between sunlight and emissions. Destructive impacts on public health may still occur in cities with noticeably clean air, where ozone levels hardly ever exceed safe limits. Therefore, detecting small variations in air quality and regulating air contamination are challenging problems. The study employs various techniques to observe and assess strategies for detecting and eliminating outliers in ozone emissions from pollution episodes. This helps to characterize the sources and exceedance values and enhances the value of the monitoring data. In this study, the data have some missing observations. The method of imputation, the classical statistical technique, the statistical process control (SPC) technique, functional data analysis (FDA), and functional process control help to fill in the data and detect outliers, trend deviations, and changes in ozone concentration at ground level. A comparison study is carried out using three techniques: classical analysis, SPC, and FDA. The results show that the statistical process control and functional data methods outperformed the classical technique in detecting outliers, and show how this methodology can enable an additional, comprehensive way of defining air pollution and water pollution control measures. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
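The classical SPC baseline compared in the study is typified by the Shewhart individuals chart: control limits are estimated from an in-control (phase I) series, and new observations outside them are flagged. A minimal sketch with made-up ozone readings; the 3-sigma rule is the textbook convention, not a detail from the paper:

```python
import numpy as np

def shewhart_limits(x, k=3.0):
    """Shewhart individuals chart: return (LCL, UCL) = mean +/- k*sigma
    estimated from an in-control reference series x."""
    x = np.asarray(x, float)
    mu, sd = x.mean(), x.std(ddof=1)
    return mu - k * sd, mu + k * sd

baseline = np.array([30, 32, 31, 29, 33, 30, 31, 28, 32, 30], float)
lcl, ucl = shewhart_limits(baseline)
new = np.array([31, 90, 29], float)
print(new[(new < lcl) | (new > ucl)])  # flags the 90 reading
```

Estimating the limits from a clean baseline matters: if the outlier itself enters the mean and standard deviation, it inflates the limits and can mask itself, one of the weaknesses of the classical technique relative to the robust and functional methods discussed above.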