6,389 results for "dimension reduction"
Search Results
152. Data-driven dimension reduction for high-dimensional random vibration systems with cubic nonlinearity
- Author
-
Tian, Yanping, Jin, Xiaoling, Zhu, Guangyu, Hu, Yanchao, Wang, Yong, and Huang, Zhilong
- Published
- 2024
- Full Text
- View/download PDF
153. Functional Principal Component Analysis for Multiple Variables on Different Riemannian Manifolds
- Author
-
Wang, Haixu and Cao, Jiguo
- Published
- 2024
- Full Text
- View/download PDF
154. An application on forecasting for stock market prices: hybrid of some metaheuristic algorithms with multivariate adaptive regression splines
- Author
-
Sabancı, Dilek, Kılıçarslan, Serhat, and Adem, Kemal
- Published
- 2023
- Full Text
- View/download PDF
155. Reduced order modeling of blood perfusion in parametric multipatch liver lobules
- Author
-
Siddiqui, Ahsan Ali, Jessen, Etienne, Stoter, Stein K. F., Néron, David, and Schillinger, Dominik
- Published
- 2024
- Full Text
- View/download PDF
156. GraphPCA: a fast and interpretable dimension reduction algorithm for spatial transcriptomics data
- Author
-
Yang, Jiyuan, Wang, Lu, Liu, Lin, and Zheng, Xiaoqi
- Published
- 2024
- Full Text
- View/download PDF
157. Learning manifolds from non-stationary streams
- Author
-
Mahapatra, Suchismit and Chandola, Varun
- Published
- 2024
- Full Text
- View/download PDF
158. Dimension reduction and outlier detection of 3-D shapes derived from multi-organ CT images
- Author
-
Selle, Michael, Kircher, Magdalena, Schwennen, Cornelia, Visscher, Christian, and Jung, Klaus
- Published
- 2024
- Full Text
- View/download PDF
159. Deep learning assisted XRF spectra classification
- Author
-
Andric, Velibor, Kvascev, Goran, Cvetanovic, Milos, Stojanovic, Sasa, Bacanin, Nebojsa, and Gajic-Kvascev, Maja
- Published
- 2024
- Full Text
- View/download PDF
160. PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes
- Author
-
Chen, Lei, Zhang, Chenyu, and Xu, Jing
- Published
- 2024
- Full Text
- View/download PDF
161. Estimation of place-based vulnerability scores for HIV viral non-suppression: an application leveraging data from a cohort of people with histories of using drugs
- Author
-
Nguyen, Trang Quynh, Roberts Lavigne, Laken C., Brantner, Carly Lupton, Kirk, Gregory D., Mehta, Shruti H., and Linton, Sabriya L.
- Published
- 2024
- Full Text
- View/download PDF
162. Band selection using hybridization of particle swarm optimization and crow search algorithm for hyperspectral data classification.
- Author
-
Giri, Ram Nivas, Janghel, Rekh Ram, and Pandey, Saroj Kumar
- Abstract
A hyperspectral image (HSI) contains numerous spectral bands, providing better differentiation of ground objects. Although the data from HSI are very rich in information, their processing presents some difficulties in terms of computational effort and reduction of information redundancy. These difficulties stem mainly from the fact that the HSI consists of a large number of bands along with some redundant bands. Band selection (BS) is used to select a subset of bands to reduce processing costs and eliminate spectral redundancy. BS methods based on a metaheuristic approach have become popular in recent years. However, most BS methods based on a metaheuristic approach can get stuck in a local optimum and converge slowly due to a lack of balance between exploration and exploitation. In this paper, three BS methods are proposed for HSI data. The first method applies the Crow Search Algorithm (CSA) for BS. The other two proposed methods, HPSOCSA_SP and HPSOCSA_SLP, are based on the hybridization of Particle Swarm Optimization (PSO) and CSA. The purpose of these hybridizations is to balance exploration and exploitation in the search for an optimal band selection and fast convergence. In both hybridization techniques, PSO and CSA exchange informative data at each iteration. HPSOCSA_SP splits the population into two equal parts: PSO is applied to one part and CSA to the other. HPSOCSA_SLP selects half of the top-performing members based on fitness; PSO and CSA are then applied to the selected population sequentially. Our proposed models underwent rigorous testing on four HSI datasets and showed superiority over other metaheuristic techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
163. A Fast and Efficient Approach to Strength Prediction for Carbon/Epoxy Composites with Resin-Missing Defects.
- Author
-
Li, Hongfeng, Li, Feng, and Zhu, Lingxue
- Subjects
- *
CHEBYSHEV polynomials , *FINITE element method , *STATISTICAL errors , *PREDICTION models , *FORECASTING - Abstract
A novel method is proposed to quickly predict the tensile strength of carbon/epoxy composites with resin-missing defects. The univariate Chebyshev prediction model (UCPM) was developed using the dimension reduction method and Chebyshev polynomials. To enhance the computational efficiency and reduce the manual modeling workload, a parameterization script for the finite element model was established using Python during the model construction process. To validate the model, specimens with different defect sizes were prepared using the vacuum-assisted resin infusion (VARI) process, the mechanical properties of the specimens were tested, and the model predictions were compared with the experimental results. Additionally, the impact of the order (second to ninth) on the predictive accuracy of the UCPM was examined, and the performance of the model was evaluated using statistical errors. The results demonstrate that the prediction model has a high prediction accuracy, with a maximum prediction error of 5.20% compared to the experimental results. A low order results in underfitting, while increasing the order can improve the prediction accuracy of the UCPM. However, if the order is too high, overfitting may occur, leading to a decrease in the prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
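The UCPM entry above hinges on the order of the Chebyshev fit: too low an order underfits, too high an order overfits. The following minimal sketch, which is not the authors' model, illustrates only that trade-off with NumPy's Chebyshev routines on made-up defect-size and strength values; every number in it is a placeholder.

```python
import numpy as np

# Hypothetical defect sizes (mm) and tensile strengths (MPa); purely illustrative values.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 12)
y = 800.0 - 15.0 * x + 0.8 * x**2 + rng.normal(0.0, 5.0, x.size)

# Fit Chebyshev surrogates of increasing order and compare the training error:
# low orders underfit, while very high orders start to chase the noise (overfit).
for order in (2, 5, 9):
    coeffs = np.polynomial.chebyshev.chebfit(x, y, deg=order)
    y_hat = np.polynomial.chebyshev.chebval(x, coeffs)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    print(f"order {order}: training RMSE = {rmse:.2f} MPa")
```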
164. An ensemble approach to determine the number of latent dimensions and assess its reliability.
- Author
-
Neishabouri, Asana and Desmarais, Michel C.
- Abstract
Determining the number of latent dimensions (LD) of a data set is a ubiquitous problem, for which numerous methods have been developed. We compare some of the most effective ones on synthetic data, which allows proper evaluation given that the true number of LD is known. Results show that their performance is sensitive to data set attributes such as sparsity, number of observations in relation to number of features, and underlying feature distributions. Results also show this sensitivity is different across methods. This observation brings us to devise an ensemble technique to combine LD estimates from multiple methods and achieve an estimate that is more reliable than any single method. We also demonstrate that the variance of the estimates across the single methods is a good indicator of the expected loss of the ensemble-based LD estimate. This observation leads, in turn, to deriving a method for the assessment of the reliability of the estimate. Finally, we discuss the practical implications of the findings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
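As a rough illustration of the combining idea in the entry above (not the authors' ensemble, which draws on a different pool of estimators), the sketch below computes three simple latent-dimension estimates from PCA eigenvalues, takes their median as the ensemble estimate, and uses their spread as a crude reliability indicator; the data and cutoffs are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with 3 true latent dimensions plus noise (illustrative only).
rng = np.random.default_rng(1)
n, p, k_true = 300, 20, 3
X = rng.normal(size=(n, k_true)) @ rng.normal(size=(k_true, p)) + 0.3 * rng.normal(size=(n, p))

eigvals = PCA().fit(X).explained_variance_

# Three simple single-method estimates of the number of latent dimensions.
kaiser = int(np.sum(eigvals > eigvals.mean()))              # eigenvalues above their mean
ratio = int(np.argmax(eigvals[:-1] / eigvals[1:]) + 1)      # largest eigenvalue-ratio gap
var90 = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.90) + 1)  # 90% variance

estimates = np.array([kaiser, ratio, var90])
ensemble = int(np.round(np.median(estimates)))              # combined (ensemble) estimate
spread = estimates.var()                                    # larger spread -> less reliable
print("single-method estimates:", estimates, "| ensemble:", ensemble, "| spread:", spread)
```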
165. Novel Machine Learning Approach for DDoS Cloud Detection: Bayesian-Based CNN and Data Fusion Enhancements.
- Author
-
AlSaleh, Ibtihal, Al-Samawi, Aida, and Nissirat, Liyth
- Subjects
- *
CONVOLUTIONAL neural networks , *INFORMATION technology , *MULTISENSOR data fusion , *MACHINE learning , *DENIAL of service attacks , *CLOUD computing - Abstract
Cloud computing has revolutionized the information technology landscape, offering businesses the flexibility to adapt to diverse business models without the need for costly on-site servers and network infrastructure. A recent survey reveals that 95% of enterprises have already embraced cloud technology, with 79% of their workloads migrating to cloud environments. However, the deployment of cloud technology introduces significant cybersecurity risks, including network security vulnerabilities, data access control challenges, and the ever-looming threat of cyber-attacks such as Distributed Denial of Service (DDoS) attacks, which pose substantial risks to both cloud and network security. While Intrusion Detection Systems (IDS) have traditionally been employed for DDoS attack detection, prior studies have been constrained by various limitations. In response to these challenges, we present an innovative machine learning approach for DDoS cloud detection, known as the Bayesian-based Convolutional Neural Network (BaysCNN) model. Leveraging the CICDDoS2019 dataset, which encompasses 88 features, we employ Principal Component Analysis (PCA) for dimensionality reduction. Our BaysCNN model comprises 19 layers of analysis, forming the basis for training and validation. Our experimental findings conclusively demonstrate that the BaysCNN model significantly enhances the accuracy of DDoS cloud detection, achieving an impressive average accuracy rate of 99.66% across 13 multi-class attacks. To further elevate the model's performance, we introduce the Data Fusion BaysFusCNN approach, encompassing 27 layers. By leveraging Bayesian methods to estimate uncertainties and integrating features from multiple sources, this approach attains an even higher average accuracy of 99.79% across the same 13 multi-class attacks. Our proposed methodology not only offers valuable insights for the development of robust machine learning-based intrusion detection systems but also enhances the reliability and scalability of IDS in cloud computing environments. This empowers organizations to proactively mitigate security risks and fortify their defenses against malicious cyber-attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
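The preprocessing step named in the abstract above (PCA on the 88 CICDDoS2019 flow features before the CNN) can be sketched in a few lines. The snippet below uses random placeholder data rather than the actual dataset, and the 95% variance cutoff is an assumption, not a detail taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder for the 88 CICDDoS2019 flow features; real data would be loaded from CSV files.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 88))

# Standardize, then keep enough principal components to explain 95% of the variance,
# mirroring the usual "PCA before a deep classifier" preprocessing step.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)
print(X.shape, "->", X_reduced.shape)   # the reduced matrix is what would feed the CNN
```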
166. FAST METRIC EMBEDDING INTO THE HAMMING CUBE.
- Author
-
DIRKSEN, SJOERD, MENDELSON, SHAHAR, and STOLLENWERK, ALEXANDER
- Subjects
- *
CIRCULANT matrices , *CUBES - Abstract
We consider the problem of embedding a subset of $\mathbb{R}^n$ into a low-dimensional Hamming cube in an almost isometric way. We construct a simple, data-oblivious, and computationally efficient map that achieves this task with high probability. We first apply a specific structured random matrix, which we call the double circulant matrix; this matrix requires only linear storage, and matrix-vector multiplication with it can be performed in near-linear time. We then binarize each vector by comparing each of its entries to a random threshold, selected uniformly at random from a well-chosen interval. We estimate the number of bits required for this encoding scheme in terms of two natural geometric complexity parameters of the set: its Euclidean covering numbers and its localized Gaussian complexity. The estimate we derive turns out to be the best that one can hope for, up to logarithmic terms. The key to the proof is a phenomenon of independent interest: we show that the double circulant matrix mimics the behavior of the Gaussian matrix in two important ways. First, it maps an arbitrary set in $\mathbb{R}^n$ into a set of well-spread vectors. Second, it yields a fast near-isometric embedding of any finite subset of $\ell_2^n$ into $\ell_1^m$. This embedding achieves the same dimension reduction as the Gaussian matrix in near-linear time, under a condition on the number of points to be embedded that is optimal up to logarithmic factors. This improves a well-known construction due to Ailon and Chazelle. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
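The binarization step described above (multiply by a structured random matrix, then compare every coordinate to a random threshold) is easy to prototype. The sketch below substitutes a plain Gaussian matrix for the paper's double circulant matrix, and the scaling and threshold interval are arbitrary illustrative choices rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 128, 512                               # ambient dimension and number of output bits
X = rng.normal(size=(50, n))                  # points to embed
A = rng.normal(size=(m, n)) / np.sqrt(n)      # Gaussian stand-in for the double circulant matrix
lam = 4.0                                     # half-width of the threshold interval (assumed)
t = rng.uniform(-lam, lam, size=m)            # one random threshold per output coordinate

# Binarize: each coordinate of Ax is compared to its threshold, giving a Hamming-cube point.
B = (X @ A.T > t).astype(np.uint8)

# The fraction of differing bits is, up to a scale factor, an estimate of Euclidean distance.
print(np.mean(B[0] != B[1]), np.linalg.norm(X[0] - X[1]))
```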
167. Asymptotic distribution of one‐component partial least squares regression estimators in high dimensions.
- Author
-
Basa, Jerónimo, Cook, R. Dennis, Forzani, Liliana, and Marcos, Miguel
- Subjects
- *
ASYMPTOTIC distribution , *LEAST squares , *REGRESSION analysis , *PARTIAL least squares regression , *GAUSSIAN distribution , *SAMPLE size (Statistics) - Abstract
In a one‐component partial least squares fit of a linear regression model, we find the asymptotic normal distribution, as the sample size and number of predictors approach infinity, of a user‐selected univariate linear combination of the coefficient estimator and give corresponding asymptotic confidence and prediction intervals. Simulation studies and an analysis of a dopamine dataset are used to support our theoretical asymptotic results and their practical application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
168. Accuracy Improvement of Breast Tumor Detection based on Dimension Reduction in the Spatial and Edge Features and Edge Structure in the Image.
- Author
-
Seyed Abolghasemi, Samaneh Sadat, Emadi, Mehran, and Karimi, Mohsen
- Subjects
- *
BREAST , *BREAST tumors , *FEATURE extraction , *ULTRASONIC imaging , *IMAGE processing , *PRINCIPAL components analysis - Abstract
Ultrasound imaging is an effective method for examining breast-related problems and diseases in women. The contrast of these images is generally very weak; nevertheless, tumor tissue and calcium grains are evident in them. Methods based on image processing are widely used in breast tumor diagnosis and classification. In this article, a pattern-recognition-based method is presented to detect the type of tumor. GLCM-based, Gabor, and texture features are extracted from the target area. The feature set is then reduced with the help of dimension reduction methods based on principal component analysis. Finally, with the help of an improved Ada KKN classification with ELM, the samples are grouped into three categories. Evaluation criteria such as accuracy (98.81%), sensitivity (91.51%), and specificity (94.54%) show the superiority of the proposed method compared to other similar methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
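A minimal sketch of the feature pipeline in the abstract above (grey-level co-occurrence texture features followed by PCA-based reduction) is given below, assuming scikit-image and scikit-learn are available. The ultrasound patches are random placeholders, the component count is arbitrary, and the Gabor features and the Ada KKN/ELM classifier are omitted.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA

# Placeholder 8-bit ROI patches; real patches would come from segmented tumor regions.
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(40, 64, 64), dtype=np.uint8)

def glcm_features(patch):
    """Texture features from a grey-level co-occurrence matrix (one distance, four angles)."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

F = np.vstack([glcm_features(p) for p in patches])   # raw texture feature matrix
F_reduced = PCA(n_components=5).fit_transform(F)     # PCA reduction before classification
print(F.shape, "->", F_reduced.shape)
```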
169. Dynamic Principal Component Analysis in High Dimensions.
- Author
-
Hu, Xiaoyu and Yao, Fang
- Subjects
- *
PRINCIPAL components analysis , *TIKHONOV regularization , *OPTIMIZATION algorithms , *COVARIANCE matrices , *REGULARIZATION parameter , *EIGENVECTORS , *STOCHASTIC processes - Abstract
Principal component analysis is a versatile tool to reduce dimensionality which has wide applications in statistics and machine learning. It is particularly useful for modeling data in high-dimensional scenarios where the number of variables p is comparable to, or much larger than, the sample size n. Despite an extensive literature on this topic, researchers have focused on modeling static principal eigenvectors, which are not suitable for stochastic processes that are dynamic in nature. To characterize the change in the entire course of high-dimensional data collection, we propose a unified framework to directly estimate dynamic eigenvectors of covariance matrices. Specifically, we formulate an optimization problem by combining the local linear smoothing and regularization penalty together with the orthogonality constraint, which can be effectively solved by manifold optimization algorithms. We show that our method is suitable for high-dimensional data observed under both common and irregular designs, and theoretical properties of the estimators are investigated under $\ell_q$ (0 ≤ q ≤ 1) sparsity. Extensive experiments demonstrate the effectiveness of the proposed method in both simulated and real data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
170. Nonparametric Estimation of Repeated Densities with Heterogeneous Sample Sizes.
- Author
-
Qiu, Jiaming, Dai, Xiongtao, and Zhu, Zhengyuan
- Subjects
- *
SAMPLE size (Statistics) , *ELECTRONIC health records , *FUNCTIONAL analysis , *DENSITY - Abstract
We consider the estimation of densities in multiple subpopulations, where the available sample size in each subpopulation greatly varies. This problem occurs in epidemiology, for example, where different diseases may share a similar pathogenic mechanism but differ in their prevalence. Without specifying a parametric form, our proposed method pools information from the population and estimates the density in each subpopulation in a data-driven fashion. Drawing from functional data analysis, low-dimensional approximating density families in the form of exponential families are constructed from the principal modes of variation in the log-densities. Subpopulation densities are subsequently fitted in the approximating families based on likelihood principles and shrinkage. The approximating families increase in their flexibility as the number of components increases and can approximate arbitrary infinite-dimensional densities. We also derive convergence results of the density estimates formed with discrete observations. The proposed methods are shown to be interpretable and efficient in simulation studies as well as applications to electronic medical record and rainfall data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
171. Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
- Author
-
Christensen, Alexander P., Garrido, Luis Eduardo, Guerra-Peña, Kiero, and Golino, Hudson
- Subjects
- *
MONTE Carlo method , *PSYCHOMETRICS , *FACTOR analysis , *ALGORITHMS - Abstract
Identifying the correct number of factors in multivariate data is fundamental to psychological measurement. Factor analysis has a long tradition in the field, but it has been challenged recently by exploratory graph analysis (EGA), an approach based on network psychometrics. EGA first estimates a network and then applies the Walktrap community detection algorithm. Simulation studies have demonstrated that EGA recovers the number of communities corresponding to the number of factors in simulated data with accuracy comparable to or better than that of factor analytic methods. Despite EGA's effectiveness, there has yet to be an investigation into whether other sparsity induction methods or community detection algorithms could achieve equivalent or better performance. Furthermore, unidimensional structures are fundamental to psychological measurement, yet they have been sparsely studied in simulations using community detection algorithms. In the present study, we performed a Monte Carlo simulation using the zero-order correlation matrix, GLASSO, and two variants of non-regularized partial correlation sparsity induction methods with several community detection algorithms. We examined the performance of these method–algorithm combinations in both continuous and polytomous data across a variety of conditions. The results indicate that the Fast-greedy, Louvain, and Walktrap algorithms paired with the GLASSO method were consistently among the most accurate and least-biased overall. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
172. Sparse multiway canonical correlation analysis for multimodal stroke recovery data.
- Author
-
Das, Subham, West, Franklin D., and Park, Cheolwoo
- Abstract
Conventional canonical correlation analysis (CCA) measures the association between two datasets and identifies relevant contributors. However, it encounters issues with execution and interpretation when the sample size is smaller than the number of variables or there are more than two datasets. Our motivating example is a stroke‐related clinical study on pigs. The data are multimodal and consist of measurements taken at multiple time points and have many more variables than observations. This study aims to uncover important biomarkers and stroke recovery patterns based on physiological changes. To address the issues in the data, we develop two sparse CCA methods for multiple datasets. Various simulated examples are used to illustrate and contrast the performance of the proposed methods with that of the existing methods. In analyzing the pig stroke data, we apply the proposed sparse CCA methods along with dimension reduction techniques, interpret the recovery patterns, and identify influential variables in recovery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
173. DR-LSTM: Dimension reduction based deep learning approach to predict stock price.
- Author
-
Ah-ram Lee, Jae Youn Ahn, Ji Eun Choi, and Kyongwon Kim
- Subjects
DIMENSION reduction (Statistics), DEEP learning, STOCK prices - Abstract
In recent decades, increasing research attention has been directed toward predicting the price of stocks in financial markets using deep learning methods. For instance, the recurrent neural network (RNN) is known to be competitive for datasets with time-series data. Long short-term memory (LSTM) further improves RNN by providing an alternative approach to the gradient loss problem. LSTM has its own advantage in predictive accuracy by retaining memory for a longer time. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance the forecasting performance and refer to this as a dimension reduction based LSTM (DR-LSTM) approach. For a supervised dimension reduction method, we use methods such as sliced inverse regression (SIR), sparse SIR, and kernel SIR. Furthermore, principal component analysis (PCA), sparse PCA, and kernel PCA are used as unsupervised dimension reduction methods. Using datasets of real stock market indices (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study on predictive accuracy between six DR-LSTM methods and time series modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
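The DR-LSTM idea above combines a dimension reduction step with an LSTM forecaster. The sketch below, assuming PyTorch and scikit-learn, wires PCA (one of the paper's unsupervised choices) to a one-layer LSTM on synthetic placeholder series; the window length, latent dimension, and network sizes are assumptions, and no training loop is shown. SIR or kernel PCA could be dropped in place of PCA to obtain the other variants mentioned in the abstract.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Placeholder market features (e.g., lagged returns, volumes); real data would be an index history.
rng = np.random.default_rng(0)
T, p, k = 500, 30, 4
X = rng.normal(size=(T, p)).astype(np.float32)
y = rng.normal(size=(T, 1)).astype(np.float32)

# Unsupervised dimension reduction first, then feed short windows of the reduced series to an LSTM.
Z = PCA(n_components=k).fit_transform(X).astype(np.float32)

window = 20
seqs = torch.from_numpy(np.stack([Z[t - window:t] for t in range(window, T)]))  # (samples, window, k)
targets = torch.from_numpy(y[window:])

lstm = torch.nn.LSTM(input_size=k, hidden_size=16, batch_first=True)
head = torch.nn.Linear(16, 1)
out, _ = lstm(seqs)
pred = head(out[:, -1, :])          # predict the next step from the last hidden state
print(pred.shape, targets.shape)
```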
174. Non-linear Feature Selection Based on Convolution Neural Networks with Sparse Regularization.
- Author
-
Wu, Wen-Bin, Chen, Si-Bao, Ding, Chris, and Luo, Bin
- Abstract
The efficacy of feature selection methods in dimensionality reduction and enhancing the performance of learning algorithms has been well documented. Traditional feature selection algorithms often grapple with delineating non-linear relationships between features and responses. While deep neural networks excel in capturing such non-linearities, their inherent "black-box" nature detracts from their interpretability. Furthermore, the complexity of deep network architectures can give rise to prolonged training durations and the challenge of vanishing gradients. This study aims to refine network structures, hasten network training, and bolster model interpretability without forfeiting accuracy. This paper delves into a sparse-weighted feature selection approach grounded in convolutional neural networks, termed the low-dimensional sparse-weighted feature selection network (LSWFSNet). LSWFSNet integrates a convolutional selection kernel between the input and convolutional layers, facilitating weighted convolutional calculations on input data while imposing sparse constraints on the selection kernel. Features with significant weights in this kernel are earmarked for subsequent operations in the LSWFSNet computational domain, while those with negligible weights are eschewed to diminish model intricacy. By streamlining the network's input data, LSWFSNet refines the post-convolution feature maps, thus simplifying its structure. Acknowledging the intrinsic interconnections within the data, our study amalgamates diverse sparse constraints into a cohesive objective function. This ensures the convolutional kernel's sparsity while acknowledging the structural dynamics of the data. Notably, the foundational convolutional network in this method can be substituted with any deep convolutional network, contingent upon suitable adjustments to the convolutional selection kernel in relation to input data dimensions. The LSWFSNet model was tested on human emotion electroencephalography (EEG) datasets curated by Shanghai Jiao Tong University. When various sparse constraint methodologies were employed, the convolutional kernel manifested sparsity. Regions in the convolutional selection kernel with non-zero weights were identified as having strong correlations with emotional responses. The empirical outcomes not only resonate with extant neuroscience insights but also supersede the baseline network in accuracy metrics. LSWFSNet's applicability extends to pivotal tasks like keypoint recognition, be it the extraction of salient pixels in facial detection models or the isolation of target attributes in object detection frameworks. This study's significance is anchored in the amalgamation of sparse constraint techniques with deep convolutional networks, supplanting traditional fully connected networks. This fusion amplifies model interpretability and broadens its applicability, notably in image processing arenas. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
175. An empirical contribution towards measuring Sustainability-oriented Entrepreneurial Intentions: A Study of Indian Youth.
- Author
-
Srivastava, Mayuri, Shivani, Shradha, and Dutta, Sraboni
- Subjects
INTENTION ,LITERATURE reviews ,PRINCIPAL components analysis - Abstract
A rigorous exploration of the available literature outlined a theoretical and empirical gap related to the identification of the key antecedents of Sustainability-oriented Entrepreneurial Intentions and the availability of a comprehensive scale for measurement of Sustainability-oriented Entrepreneurial Intentions of an individual. Therefore, this study aimed to identify the antecedents of Sustainability-oriented Entrepreneurial Intentions from an exhaustive review of the literature and then use the antecedents to propose a scale for measuring Sustainability-oriented Entrepreneurial Intentions. Significant findings from the available literature were collated, data was collected through a structured survey of youth in India, and appropriate statistical procedures for Dimension Reduction using Principal Component Analysis on Statistical Package for Social Sciences v28 were applied. Finally, the internal reliability and face validity of the proposed scale were also tested with responses obtained from experts. A comprehensive 31-item measurement scale of Sustainability-oriented Entrepreneurial Intentions was structured based upon the Dimension Reduction results. Aligned with the Sustainable Development Goals adopted by the United Nations in 2015, practitioners and researchers have advanced the need to promote a new perspective of entrepreneurship: Sustainable Entrepreneurship. Given that it is empirically well established that intentions lead to behaviour, it is imperative to study Sustainability-oriented Entrepreneurial Intentions in order to promote Sustainable Entrepreneurship Behaviour, especially among youth. The findings can help policymakers and educationists design strategies to expand the adoption of Sustainable Entrepreneurship in the population by strengthening the identified antecedents. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
176. Black-Box Acceleration of Monotone Convex Program Solvers.
- Author
-
London, Palma, Vardi, Shai, Eghbali, Reza, and Wierman, Adam
- Subjects
CONVEX programming ,LINEAR programming ,RESEARCH personnel ,MACHINE learning ,JOB titles ,POSTDOCTORAL programs - Abstract
When and where was the study conducted: This work was done in 2018, 2019 and 2020, when Palma London was a PhD student at Caltech and Shai Vardi was a postdoc at Caltech. This work was also done in part while Palma London was visiting Purdue University, and while Reza Eghbali was a postdoctoral fellow at the Simons Institute for the Theory of Computing. Adam Wierman is a professor at Caltech. Article Summary and Talking Points: Please describe the primary purpose/findings of your article in 3 sentences or less. This paper presents a framework for accelerating (speeding up) existing convex program solvers. Across engineering disciplines, a fundamental bottleneck is the availability of fast, efficient, accurate solvers. We present an acceleration method that speeds up linear programming solvers such as Gurobi and convex program solvers such as the Splitting Conic Solver by two orders of magnitude. Please include 3-5 short bullet points of "Need to Know" items regarding this research and your findings. - Optimization problems arise in many engineering and science disciplines, and developing efficient optimization solvers is key to future innovation. - We speed up the linear programming solver Gurobi by two orders of magnitude. - This work applies to optimization problems with monotone objective functions and packing constraints, which is a common problem formulation across many disciplines. Please identify 2 pull quotes from your article that best capture the novelty and impact of your research. "We propose a framework for accelerating exact and approximate convex programming solvers for packing linear programming problems and a family of convex programming problems with linear constraints. Analytically, we provide worst-case guarantees on the run time and the quality of the solution produced. Numerically, we demonstrate that our framework speeds up Gurobi and the Splitting Conic Solver by two orders of magnitude, while maintaining a near-optimal solution." "Our focus in this paper is on a class of packing problems for which data is either very costly or hard to obtain. In these situations, the number of data points available is much smaller than the number of variables. In a machine-learning setting, this regime is increasingly prevalent because it is often advantageous to consider larger and larger feature spaces, while not necessarily obtaining proportionally more data." Article Implications - Please describe in 5 sentences or less the innovative takeaway(s) of your research. This framework applies to optimization problems with monotone objective functions and packing constraints, which is a common problem formulation across many disciplines, including machine learning, inference, and resource allocation. Providing fast solvers for these problems is crucial. We exploit characteristics of the problem structure and leverage statistical properties of the problem constraints to allow us to speed up optimization solvers. We present worst-case guarantees on run-time, and empirically demonstrate speedups of two orders of magnitude. - Please describe in 5 sentences or less why your findings would be of interest to the general public. Many problems in engineering, science, math, and machine learning involve solving an optimization problem. Fast, efficient optimization solvers are key to future innovation in science and engineering. This work presents a tool to accelerate existing convex solvers, and thus can also be applied to future solvers. As the size of datasets grows, it is even more crucial to have fast solvers. - Who would be the most impacted by your research (i.e., by industry, job title, consumer category). Our work impacts machine-learning researchers and optimization researchers, in industry or academia. This paper presents a black-box framework for accelerating packing optimization solvers. Our method applies to packing linear programming problems and a family of convex programming problems with linear constraints. The framework is designed for high-dimensional problems, for which the number of variables n is much larger than the number of measurements m. Given an (m × n) problem, we construct a smaller (m × ϵn) problem, whose solution we use to find an approximation to the optimal solution. Our framework can accelerate both exact and approximate solvers. If the solver being accelerated produces an α-approximation, then we produce a (1 − ϵ)/α²-approximation of the optimal solution to the original problem. We present worst-case guarantees on run time and empirically demonstrate speedups of two orders of magnitude. Funding: Financial support from the National Science Foundation [Grants AitF-1637598, CNS-151894, and CPS-154471] and the Linde Institute is gratefully acknowledged. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
177. Coexistence mechanism of ecological specialists and generalists based on a network dimension reduction method.
- Author
-
Duan, Dongli, Hang, Jiale, Wu, Chengxing, Bai, Xue, Rong, Yisheng, and Hou, Gege
- Subjects
- *
COEXISTENCE of species , *HABITATS , *ECOSYSTEMS - Abstract
As an ecological strategy for species coexistence, some species adapt to a wide range of habitats, while others specialize in particular environments. Such 'generalists' and 'specialists' achieve normal ecological balance through a complex network of interactions between species. However, the role of these interactions in maintaining the coexistence of generalist and specialist species has not been elucidated within a general theoretical framework. Here, we analyze the ecological mechanism for the coexistence of specialist and generalist species in a class of mutualistic and competitive interaction ecosystems based on the network dimension reduction method. We find that ecological specialists and generalists can be identified based on the number of their respective interactions. We also find, using real‐world empirical network simulations, that the removal of ecological generalists can lead to the collapse of local ecosystems, which is rarely observed with the loss of ecological specialists. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
178. Support vector machine in ultrahigh-dimensional feature space.
- Author
-
Kazemi, Mohammad
- Subjects
- *
SUPPORT vector machines , *FEATURE selection , *DATA mining , *AUTOMATIC classification - Abstract
Classification and feature selection play an important role in knowledge discovery in high-dimensional data. Although the penalized Support Vector Machine (SVM) is among the most powerful methods for classification and automatic feature selection in high-dimensional feature space, it is not directly applicable to ultrahigh-dimensional cases, wherein the number of features far exceeds the sample size. In this paper, we suggest an efficient two-step method for simultaneous classification and identification of important features in the setting of ultrahigh-dimensional models. Specifically, we first develop an independence screening procedure to reduce the dimensionality of the feature space to a moderate scale, and then penalized support vector machine is applied to the dimension-reduced feature space to further select important features and estimate the coefficients, via a (penalized) model fit. Implementation of the suggested two-step method is not limited by the dimensionality of the models and entails much less computational cost. Numerical examples and a real data analysis are used to demonstrate the finite sample performance of our proposal. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
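The two-step recipe in the abstract above (marginal screening down to a moderate number of features, then a penalized SVM on the survivors) can be sketched as follows; the screening statistic, cutoff, and penalty level are assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, p = 100, 5000                       # sample size far smaller than the feature count
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                         # only five truly informative features
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

# Step 1: independence screening -- rank features by absolute marginal correlation with the
# label and keep a moderate number of them (n - 1 is a common sure-screening cutoff).
Xc, yc = X - X.mean(axis=0), y - y.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
keep = np.argsort(corr)[-(n - 1):]

# Step 2: an L1-penalized linear SVM on the screened features for final selection and classification.
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X[:, keep], y)
selected = keep[np.flatnonzero(svm.coef_.ravel())]
print("kept after screening:", keep.size, "| selected by the SVM:", selected.size)
```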
179. Artificial intelligence for classification and regression tree based feature selection method for network intrusion detection system in various telecommunication technologies.
- Author
-
Kumar, Neeraj and Kumar, Upendra
- Subjects
- *
ARTIFICIAL intelligence , *REGRESSION trees , *TELECOMMUNICATION systems , *TIME complexity , *MACHINE learning , *FEATURE selection , *PYTHON programming language - Abstract
Nowadays, secure data communication over computer networks is a major issue, and feature reduction plays a vital role in securing a network through early detection of intrusions. It not only has a deep impact on the performance of existing Intrusion Detection System (IDS) algorithms but also affects their computational complexity. Although many feature reduction techniques have been offered by researchers, each with its own strengths and weaknesses, several flaws remain. Manipulating the same dataset for different classifiers and selecting different numbers of features for attack detection is not only computationally costly but also time consuming. The experiments were carried out using the Python library Scikit-Learn on the "Kddcup99" dataset from the UCI machine learning repository as a test bed. In this article, a classification and regression trees (CART) based feature selection algorithm is proposed that offers an optimal set of features. This optimal feature set is then passed to various classifiers for training and testing to establish a network intrusion detection system (NIDS). We compared the performance accuracy of various existing machine learning (ML) based classification algorithms and obtained higher accuracy with lower computational cost. The proposed algorithm achieves favourable time complexity and accuracy in the design of the IDS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
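To make the CART-based selection step above concrete, here is a minimal scikit-learn sketch on random stand-in data: a decision tree supplies impurity-based importances, features above the mean importance are kept, and a downstream classifier is trained on the reduced set. The importance threshold and the choice of logistic regression as the downstream model are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder for KDD Cup 99-style flow features; the real dataset would be loaded instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 41))
y = rng.integers(0, 2, size=2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# CART-style feature selection: keep features whose impurity-based importance
# exceeds the mean importance, then pass the reduced set to any downstream classifier.
cart = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
mask = cart.feature_importances_ > cart.feature_importances_.mean()

clf = LogisticRegression(max_iter=1000).fit(X_tr[:, mask], y_tr)
print("selected features:", int(mask.sum()), "| test accuracy:", clf.score(X_te[:, mask], y_te))
```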
180. Elegant robustification of sparse partial least squares by robustness-inducing transformations.
- Author
-
Serneels, Sven, Insolia, Luca, and Verdonck, Tim
- Subjects
- *
LEAST squares , *MONTE Carlo method , *GLASS analysis , *ANALYTICAL chemistry - Abstract
Robust alternatives exist for many statistical estimators. State-of-the-art robust methods are fine-tuned to optimize the balance between statistical efficiency and robustness. The resulting estimators may, however, require computationally intensive iterative procedures. Recently, several robustness-inducing transformations (RIT) have been introduced. By merely applying such transformations as a preprocessing step, a computationally very fast robust estimator can be constructed. Building upon the example of sparse partial least squares (SPLS), this work shows that such an approach can lead to performance close to the computationally more intensive methods. This article proves that the resulting estimator is robust by showing that it has a bounded influence function. To establish the latter, this article is the first to formulate SPLS at the population level and, therefrom, to derive (classical) SPLS's influence function. It also shows that the breakdown point of the resulting regression coefficients can approach 50% when properly tuned. Extensive Monte Carlo simulations highlight the advantages of the new method, which performs comparably and at times even better than existing robust methods based on M-estimation, yet at a significantly lower computational burden. Two application studies related to the cancer cell panel of the National Cancer Institute and the chemical analysis of archaeological glass vessels further support the applicability of the proposed robustness-inducing transformations, combined with SPLS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
181. Multi-surrogate framework with an adaptive selection mechanism for production optimization.
- Author
-
Jia-Lin Wang, Li-Ming Zhang, Kai Zhang, Jian Wang, Jian-Ping Zhou, Wen-Feng Peng, Fa-Liang Yin, Chao Zhong, Xia Yan, Pi-Yang Liu, Hua-Qing Zhang, Yong-Fei Yang, and Hai Sun
- Abstract
Data-driven surrogate models that assist with efficient evolutionary algorithms to find the optimal development scheme have been widely used to solve reservoir production optimization problems. However, existing research suggests that the effectiveness of a surrogate model can vary depending on the complexity of the design problem. A surrogate model that has demonstrated success in one scenario may not perform as well in others. In the absence of prior knowledge, finding a promising surrogate model that performs well for an unknown reservoir is challenging. Moreover, the optimization process often relies on a single evolutionary algorithm, which can yield varying results across different cases. To address these limitations, this paper introduces a novel approach called the multi-surrogate framework with an adaptive selection mechanism (MSFASM) to tackle production optimization problems. MSFASM consists of two stages. In the first stage, a reduced-dimensional broad learning system (BLS) is used to adaptively select the evolutionary algorithm with the best performance during the current optimization period. In the second stage, the multi-objective algorithm, non-dominated sorting genetic algorithm II (NSGA-II), is used as an optimizer to find a set of Pareto solutions with good performance on multiple surrogate models. A novel optimal point criterion is utilized in this stage to select the Pareto solutions, thereby obtaining the desired development schemes without increasing the computational load of the numerical simulator. The two stages are combined using sequential transfer learning. From the two most important perspectives of an evolutionary algorithm and a surrogate model, the proposed method improves adaptability to optimization problems of various reservoir types. To verify the effectiveness of the proposed method, four 100-dimensional benchmark functions and two reservoir models are tested, and the results are compared with those obtained by six other surrogate-model-based methods. The results demonstrate that our approach can obtain the maximum net present value (NPV) of the target production optimization problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
182. Evidence for the factor structure of formal thought disorder: A systematic review.
- Author
-
Zamperoni, Georgia, Tan, Eric J., Rossell, Susan L., Meyer, Denny, and Sumner, Philip J.
- Subjects
- *
FACTOR structure , *FACTOR analysis , *PSYCHOSES - Abstract
Disorganised speech, or formal thought disorder (FTD), is considered one of the core features of psychosis, yet its factor structure remains debated. This systematic review aimed to identify the core dimensions of FTD. In line with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA), a systematic review was conducted on the FTD factor analytic literature. Sixteen studies were identified from PsycINFO, PubMed and Web of Science between October 1971 and January 2023. Across the 39 factor analyses investigated, findings demonstrated the prominence of a three-factor structure. Broad agreement was found for two factors within the three-factor model, which were typically referred to as disorganisation and negative, with the exact nature of the third dimension requiring further clarification. The quality assessment revealed some methodological challenges relating to the assessment of FTD and the conducted factor analyses. Future research should clarify the exact nature of the third dimension across different patient groups and methodologies to determine whether a consistent transdiagnostic concept of FTD can be elucidated. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
183. Driving mode analysis—How uncertain functional inputs propagate to an output.
- Author
-
Vander Wiel, Scott A., Grosskopf, Michael J., Michaud, Isaac J., and Neudecker, Denise
- Subjects
- *
DATA visualization , *NEUTRONS , *FAST neutrons - Abstract
Driving mode analysis elucidates how correlated features of uncertain functional inputs jointly propagate to produce uncertainty in the output of a computation. Uncertain input functions are decomposed into three terms: the mean functions, a zero-mean driving mode, and zero-mean residual. The random driving mode varies along a single direction, having fixed functional shape and random scale. It is uncorrelated with the residual, and under linear error propagation, it produces an output variance equal to that of the full input uncertainty. Finally, the driving mode best represents how input uncertainties propagate to the output because it minimizes expected squared Mahalanobis distance amongst competitors. These characteristics recommend interpretation of the driving mode as the single-degree-of-freedom component of input uncertainty that drives output uncertainty. We derive the functional driving mode, show its superiority to other seemingly sensible definitions, and demonstrate the utility of driving mode analysis in an application. The application is the simulation of neutron transport in criticality experiments. The uncertain input functions are nuclear data that describe how 239Pu reacts to bombardment by neutrons. Visualization of the driving mode helps scientists understand what aspects of correlated functional uncertainty have effects that either reinforce or cancel one another in propagating to the output of the simulation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
184. A study on ship hull form transformation using convolutional autoencoder.
- Author
-
Seo, Jeongbeom, Kim, Dayeon, and Lee, Inwon
- Subjects
ARTIFICIAL neural networks ,CONVOLUTIONAL neural networks ,TANKERS ,PARTICLE swarm optimization ,NAVAL architecture ,MACHINE learning - Abstract
The optimal ship hull form in contemporary design practice primarily consists of three parts: hull form modification, performance prediction, and optimization. Hull form modification is a crucial step to affect optimization efficiency because the baseline hull form is varied to search for performance improvements. The conventional hull form modification methods mainly rely on human decisions and intervention. As a direct expression of the three-dimensional hull form, the lines are not appropriate for machine learning techniques. This is because they do not explicitly express a meaningful performance metric despite their relatively large data dimension. To solve this problem and develop a novel machine-based hull form design technique, an autoencoder, which is a dimensional reduction technique based on an artificial neural network, was created in this study. Specifically, a convolutional autoencoder was designed; firstly, a convolutional neural network (CNN) preprocessor was used to effectively train the offsets, which are the half-width coordinate values on the hull surface, to extract feature maps. Secondly, the stacked encoder compressed the feature maps into an optimal lower dimensional-latent vector. Finally, a transposed convolution layer restored the dimension of the lines. In this study, 21 250 hull forms belonging to three different ship types of containership, LNG carrier, and tanker, were used as training data. To describe the hull form in more detail, each was divided into several zones, which were then input into the CNN preprocessor separately. After the training, a low-dimensional manifold consisting of the components of the latent vector was derived to represent the distinctive hull form features of the three ship types considered. The autoencoder technique was then combined with another novel approach of the surrogate model to form an objective function neural network. Further combination with the deterministic particle swarm optimization method led to a successful hull form optimization example. In summary, the present convolutional autoencoder has demonstrated its significance within the machine learning-based design process for ship hull forms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
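The convolutional autoencoder described above compresses hull-offset grids into a small latent vector and reconstructs them. A minimal PyTorch sketch of that encode-compress-decode structure is shown below; the grid size, channel counts, and 8-dimensional latent space are assumptions, and the zone-wise CNN preprocessor, the surrogate model, and the particle swarm optimization stage are omitted.

```python
import torch
import torch.nn as nn

# Offsets (half-width hull coordinates) treated as a 1-channel "image"; sizes are assumptions.
x = torch.randn(32, 1, 64, 64)

encoder = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),    # 64x64 -> 32x32
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 8),          # 8-dimensional latent vector describing the hull form
)
decoder = nn.Sequential(
    nn.Linear(8, 16 * 16 * 16), nn.ReLU(),
    nn.Unflatten(1, (16, 16, 16)),
    nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
    nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1),              # 32 -> 64
)

latent = encoder(x)
recon = decoder(latent)
loss = nn.functional.mse_loss(recon, x)   # reconstruction objective used to train both parts
print(latent.shape, recon.shape, float(loss))
```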
185. A data-adaptive dimension reduction for functional data via penalized low-rank approximation.
- Author
-
Park, Yeonjoo, Oh, Hee-Seok, and Lim, Yaeji
- Abstract
We introduce a data-adaptive nonparametric dimension reduction tool to obtain a low-dimensional approximation of functional data contaminated by erratic measurement errors following symmetric or asymmetric distributions. We propose to apply robust submatrix completion techniques to matrices consisting of coefficients of basis functions calculated by projecting the observed trajectories onto a given orthogonal basis set. In this process, we use a composite asymmetric Huber loss function to accommodate domain-specific erratic behaviors in a data-adaptive manner. We further incorporate the L1 penalty to regularize the smoothness of latent factor curves. The proposed method can also be applied to partially observed functional data, where each trajectory contains individual-specific missing segments. Moreover, since our method does not require estimating the covariance operator, the extension to any dimensional functional data observed over a continuum is straightforward. We demonstrate the empirical performance of the proposed method in estimating the lower-dimensional space and reconstructing trajectories through simulation studies. We then apply the proposed method to two real datasets, one-dimensional Advanced Metering Infrastructure (AMI) data in South Korea and two-dimensional max precipitation spatial data collected in North America and South America. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
186. High-dimensional local polynomial regression with variable selection and dimension reduction.
- Author
-
Cheung, Kin Yap and Lee, Stephen M. S.
- Abstract
Variable selection and dimension reduction have been considered in nonparametric regression for improving the precision of estimation, via the formulation of a semiparametric multiple index model. However, most existing methods are ill-equipped to cope with a high-dimensional setting where the number of variables may grow exponentially fast with sample size. We propose a new procedure for simultaneous variable selection and dimension reduction in high-dimensional nonparametric regression problems. It consists essentially of penalised local polynomial regression, with the bandwidth matrix regularised to facilitate variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Unlike most existing methods, the proposed procedure does not require explicit bandwidth selection or an additional step of dimension determination using techniques like cross-validation or principal components. Empirical performance of the procedure is illustrated with both simulated and real data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
187. ε-Isometric Dimension Reduction for Incompressible Subsets of ℓp.
- Author
-
Eskenazis, Alexandros
- Subjects
- *
NORMED rings , *LINEAR operators , *PROBABILITY measures , *UNIT ball (Mathematics) , *EMPIRICAL research - Abstract
Fix $p \in [1, \infty)$, $K \in (0, \infty)$, and a probability measure $\mu$. We prove that for every $n \in \mathbb{N}$, $\varepsilon \in (0,1)$, and $x_1, \ldots, x_n \in L_p(\mu)$ with $\big\| \max_{i \in \{1,\ldots,n\}} |x_i| \big\|_{L_p(\mu)} \le K$, there exist $d \le \frac{32 e^2 (2K)^{2p} \log n}{\varepsilon^2}$ and vectors $y_1, \ldots, y_n \in \ell_p^d$ such that for all $i, j \in \{1, \ldots, n\}$, $\|x_i - x_j\|_{L_p(\mu)}^p - \varepsilon \le \|y_i - y_j\|_{\ell_p^d}^p \le \|x_i - x_j\|_{L_p(\mu)}^p + \varepsilon$. Moreover, the argument implies the existence of a greedy algorithm which outputs $\{y_i\}_{i=1}^n$ after receiving $\{x_i\}_{i=1}^n$ as input. The proof relies on a derandomized version of Maurey's empirical method (1981) combined with a combinatorial idea of Ball (1990) and a suitable change of measure. Motivated by the above embedding, we introduce the notion of $\varepsilon$-isometric dimension reduction of the unit ball $B_E$ of a normed space $(E, \|\cdot\|_E)$ and we prove that $B_{\ell_p}$ does not admit $\varepsilon$-isometric dimension reduction by linear operators for any value of $p \neq 2$. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
188. Model-Based Tensor Low-Rank Clustering.
- Author
-
Li, Junge and Mai, Qing
- Subjects
- *
PARAMETER estimation , *PARSIMONIOUS models , *EXPECTATION-maximization algorithms - Abstract
Tensors have become prevalent in business applications and scientific studies. It is of great interest to analyze and understand the heterogeneity in tensor-variate observations. We propose a novel tensor low-rank mixture model (TLMM) to conduct efficient estimation and clustering on tensors. The model combines the Tucker low-rank structure in mean contrasts and the separable covariance structure to achieve parsimonious and interpretable modeling. To implement efficient computation under this model, we develop a low-rank enhanced expectation-maximization (LEEM) algorithm. The pseudo E-step and the pseudo M-step are carefully designed to incorporate variable selection and efficient parameter estimation. Numerical results in extensive experiments demonstrate the encouraging performance of the proposed method compared to popular vector and tensor methods. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
189. Statistical Significance of Clustering with Multidimensional Scaling.
- Author
-
Shen, Hui, Bhamidi, Shankar, and Liu, Yufeng
- Subjects
- *
MULTIDIMENSIONAL scaling , *STATISTICAL significance , *PARAMETER estimation , *RESEARCH personnel - Abstract
Clustering is a fundamental tool for exploratory data analysis. One central problem in clustering is deciding if the clusters discovered by clustering methods are reliable as opposed to being artifacts of natural sampling variation. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimension, low-sample size data. Despite its successful application to many scientific problems, there are cases where the original SigClust may not work well. Furthermore, for specific applications, researchers may not have access to the original data and only have the dissimilarity matrix. In this case, clustering is still a valuable exploratory tool, but the original SigClust is not applicable. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS). The underlying idea behind MDS-based SigClust is that one can achieve low-dimensional representations of the original data via MDS using only the dissimilarity matrix and then apply SigClust on the low-dimensional MDS space. The proposed MDS-based SigClust can circumvent the challenge of parameter estimation of the original method in high-dimensional spaces while keeping the essential clustering structure in the MDS space. Both simulations and real data applications demonstrate that the proposed method works remarkably well for assessing the statistical significance of clustering. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
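The core idea above (embed a dissimilarity matrix with MDS, then assess clusters in the low-dimensional space) can be sketched with scikit-learn. The snippet below stops at clustering the MDS coordinates and does not implement SigClust's Gaussian-null significance test; the dissimilarities are built from synthetic placeholder points.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist, squareform

# Suppose only a dissimilarity matrix D is available (here built from synthetic points).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(3, 1, (30, 50))])
D = squareform(pdist(X))

# Embed the dissimilarities into a low-dimensional Euclidean space with metric MDS ...
Z = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)

# ... then evaluate clustering there (plain 2-means here; SigClust's significance test
# would be applied to Z rather than to the unavailable raw data).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(Z.shape, np.bincount(labels))
```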
190. Optimal population‐specific HLA imputation with dimension reduction.
- Author
-
Douillard, Venceslas, dos Santos Brito Silva, Nayane, Bourguiba‐Hachemi, Sonia, Naslavsky, Michel S., Scliar, Marilia O., Duarte, Yeda A. O., Zatz, Mayana, Passos‐Bueno, Maria Rita, Limou, Sophie, Gourraud, Pierre‐Antoine, Launay, Élise, Castelli, Erick C., and Vince, Nicolas
- Subjects
- *
GENOME-wide association studies , *DISEASE susceptibility , *GENOMICS , *GENOMES , *GENETIC polymorphisms - Abstract
Human genomics has quickly evolved, powering genome‐wide association studies (GWASs). SNP‐based GWASs cannot capture the intense polymorphism of HLA genes, highly associated with disease susceptibility. There are methods to statistically impute HLA genotypes from SNP‐genotypes data, but lack of diversity in reference panels hinders their performance. We evaluated the accuracy of the 1000 Genomes data as a reference panel for imputing HLA from admixed individuals of African and European ancestries, focusing on (a) the full dataset, (b) 10 replications from 6 populations, and (c) 19 conditions for the custom reference panels. The full dataset outperformed smaller models, with a good F1‐score of 0.66 for HLA‐B. However, custom models outperformed the multiethnic or population models of similar size (F1‐scores up to 0.53, against up to 0.42). We demonstrated the importance of using genetically specific models for imputing populations, which are currently underrepresented in public datasets, opening the door to HLA imputation for every genetic population. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
191. Dimensional reduction and emergence of defects in the Oseen-Frank model for nematic liquid crystals.
- Author
-
Canevari, Giacomo and Segatti, Antonio
- Abstract
In this paper we discuss the behavior of the Oseen-Frank model for nematic liquid crystals in the limit of vanishing thickness. More precisely, in a thin slab $ \Omega\times (0, h) $ with $ \Omega\subset \mathbb{R}^2 $ and $ h>0 $ we consider the one-constant approximation of the Oseen-Frank model for nematic liquid crystals. We impose Dirichlet boundary conditions on the lateral boundary and weak anchoring conditions on the top and bottom faces of the cylinder $ \Omega\times (0, h) $. The Dirichlet datum has the form $ (g, 0) $, where $ g\colon\partial\Omega\to \mathbb{S}^1 $ has non-zero winding number. Under appropriate conditions on the scaling, in the limit as $ h\to 0 $ we obtain a behavior that is similar to the one observed in the asymptotic analysis (see [7]) of the two-dimensional Ginzburg-Landau functional. More precisely, we rigorously prove the emergence of a finite number of defect points in $ \Omega $ having topological charges that sum to the degree of the boundary datum. Moreover, the position of these points is governed by a Renormalized Energy, as in the seminal results of Bethuel, Brezis and Hélein [7]. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
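For readers unfamiliar with the setting in the record above, the following display gives a schematic form of the one-constant Oseen-Frank energy on the thin slab with weak anchoring on its top and bottom faces; the precise anchoring potential $W$ and the scaling of the coefficient $\beta_h$ with $h$ are assumptions standing in for the paper's exact regime.

```latex
% Schematic one-constant energy on the slab; W penalizes deviation of n from the
% preferred alignment on the top and bottom faces.
\[
  E_h(n) \;=\; \frac{1}{2}\int_{\Omega\times(0,h)} |\nabla n|^2 \,\mathrm{d}x
  \;+\; \beta_h \int_{\Omega\times\{0,h\}} W(n)\,\mathrm{d}\sigma ,
  \qquad n\colon \Omega\times(0,h)\to\mathbb{S}^2 .
\]
```

The Dirichlet datum $(g,0)$ with non-zero winding number then forces, as $h\to 0$, the finite set of charged defect points described in the abstract.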
192. Human Activity Discovery With Automatic Multi-Objective Particle Swarm Optimization Clustering With Gaussian Mutation and Game Theory.
- Author
-
Hadikhani, Parham, Lai, Daphne Teck Ching, and Ong, Wee-Hong
- Published
- 2024
- Full Text
- View/download PDF
193. The vector error correction index model: representation, estimation and identification.
- Author
-
Cubadda, Gianluca and Mazzali, Marco
- Subjects
VECTOR error-correction models, AUTOREGRESSIVE models, TIME series analysis - Abstract
This paper extends the multivariate index autoregressive model to the case of cointegrated time series of order (1,1). In this new model, the vector error correction index model (VECIM), the first differences of the series are driven by a small number of linear combinations of the variables, namely the indexes. When the indexes are significantly fewer than the variables, the VECIM achieves a substantial dimension reduction with respect to the vector error correction model. We show that the VECIM allows one to decompose the reduced-form errors into sets of common and uncommon shocks, and that the former can be further decomposed into permanent and transitory shocks. Moreover, we offer a switching algorithm for optimal estimation of the VECIM. Finally, we document the practical value of the proposed approach by both simulations and an empirical application, where we search for the shocks that drive aggregate fluctuations at different frequency bands in the US. [ABSTRACT FROM AUTHOR] (The baseline VECM that the VECIM restricts is recalled after this record.)
- Published
- 2024
- Full Text
- View/download PDF
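As a point of reference for the record above, the display below recalls the standard vector error correction model (VECM) that the VECIM restricts; the index restriction is stated only schematically, since the paper's exact parameterization and identification conditions are not reproduced here.

```latex
% Standard VECM for an n-dimensional cointegrated system; the VECIM additionally
% requires the dynamics to act through q << n linear combinations (indexes).
\[
  \Delta y_t \;=\; \alpha\,\beta' y_{t-1} \;+\; \sum_{i=1}^{p-1} \Gamma_i\, \Delta y_{t-i} \;+\; \varepsilon_t ,
  \qquad y_t \in \mathbb{R}^n .
\]
```

Constraining the right-hand side to load on only $q\ll n$ indexes is what delivers the dimension reduction relative to the unrestricted VECM.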
194. Robust Hashing via Global and Local Invariant Features for Image Copy Detection.
- Author
-
XIAOPING LIANG, ZHENJUN TANG, ZHIXIN LI, MENGZHU YU, HANYUN ZHANG, and XIANQUAN ZHANG
- Subjects
PRINCIPAL components analysis, FOURIER transforms - Abstract
Robust hashing is a powerful technique for processing large-scale images. Currently, many reported image hashing schemes do not balance discrimination and robustness well, and thus they cannot efficiently detect image copies, especially copies with multiple distortions. To address this, we exploit global and local invariant features to develop a novel robust hashing scheme for image copy detection. A critical contribution is the global feature calculation via a gray-level co-occurrence moment learned from the saliency map determined by the phase spectrum of the quaternion Fourier transform, which can significantly enhance discrimination without reducing robustness. Another essential contribution is the local invariant feature computation via Kernel Principal Component Analysis (KPCA) and vector distances. As KPCA can maintain the geometric relationships within an image, the local invariant features learned with KPCA and vector distances can guarantee discrimination and compactness. Moreover, the global and local invariant features are encrypted to ensure security. Finally, the hash is produced via ordinal measures of the encrypted features to keep the hash short. Numerous experiments are conducted to show the efficiency of our scheme. Compared with some well-known hashing schemes, our scheme demonstrates preferable discrimination and robustness. Experiments on detecting image copies with multiple distortions further illustrate the effectiveness of our scheme. [ABSTRACT FROM AUTHOR] (A hedged sketch of the KPCA-based local-feature step appears after this record.)
- Published
- 2024
- Full Text
- View/download PDF
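The record above combines a global saliency-based feature with local features obtained via Kernel PCA and vector distances. Below is a minimal sketch of the local-feature idea only, using scikit-learn's KernelPCA on per-block pixel vectors; the block size, kernel choice, and distance summary are illustrative assumptions, and the quaternion Fourier saliency, encryption, and ordinal-measure steps are not reproduced.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def local_kpca_features(image, block=32, n_components=4):
    """image: 2-D grayscale array; returns a compact, geometry-aware local descriptor."""
    h, w = image.shape
    patches = [image[i:i + block, j:j + block].ravel()
               for i in range(0, h - block + 1, block)
               for j in range(0, w - block + 1, block)]
    X = np.asarray(patches, dtype=float)
    Z = KernelPCA(n_components=n_components, kernel="rbf").fit_transform(X)
    # Distances of each projected block to the mean projection give a short summary;
    # the published scheme additionally encrypts the features and takes ordinal measures.
    return np.linalg.norm(Z - Z.mean(axis=0), axis=1)
```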
195. Text mining analysis on PubMed database
- Author
-
Lalami, Seyedeh Z. Rezaei
- Subjects
unstructured text data, co-occurrence model, co-occurrence matrix, co-occurrence network, dimension reduction, search engine, R Elastic Map, clustered data, thesis - Abstract
Due to the increasing amount of unstructured text data, information retrieval from large volumes of data has become highly important. Applying most algorithms, such as classification and clustering, is challenging because of the high dimensionality of text data. This study investigates a novel co-occurrence model of text data to help reduce the dimension of the data set. We present a graph-based text mining approach for discovering similar documents in a scientific corpus and use it in a search engine built into an R Shiny web application. The Biological Scientific Corpus (BSC) is a collection of 764,213 PubMed-indexed English abstracts of research papers and proceedings papers, chosen to reflect the widest range of abstracts of scientific works published in 2012. Analysis of the co-occurrence matrix helps us understand how the words are interconnected. Applying a community detection method, we discovered hubs and strong communities in the co-occurrence network and used them to reduce the dimension of the network. After dimension reduction, we produced meaningful clusters of the data set. To check whether the clustering is sensible, we investigated the distribution of the papers' authors over the clusters, and the results were satisfactory. Finally, we used a hierarchical approach to develop a search engine on the data set that accepts a query from a user and responds with a set of retrieved documents. The main advantage of this search engine is its ability to take long texts, such as abstracts, as a query. Another part of this work reproduces the well-known Elastic Map algorithm in R as an open resource for data visualization. We used the R Elastic Map package we developed to present a zoomable and rotatable visualization of a map fitted to clustered data in two- and three-dimensional space. (A hedged sketch of the co-occurrence pipeline appears after this record.)
- Published
- 2022
- Full Text
- View/download PDF
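Below is a minimal sketch of the co-occurrence pipeline described in the record above, using NetworkX rather than the thesis code: build a word co-occurrence network from abstracts, detect communities, and keep hub terms as a reduced vocabulary. The tokenization, count threshold, and hub criterion are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

def cooccurrence_graph(abstracts, min_count=2):
    """Build a weighted word co-occurrence graph from a list of abstract strings."""
    counts = Counter()
    for text in abstracts:
        tokens = sorted(set(text.lower().split()))   # crude tokenization, for illustration
        counts.update(combinations(tokens, 2))
    G = nx.Graph()
    G.add_weighted_edges_from((a, b, c) for (a, b), c in counts.items() if c >= min_count)
    return G

def reduced_vocabulary(G, top_k_per_community=10):
    """Detect communities and keep the strongest hub terms from each one."""
    communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
    vocab = []
    for comm in communities:
        hubs = sorted(comm, key=lambda w: G.degree(w, weight="weight"), reverse=True)
        vocab.extend(hubs[:top_k_per_community])     # hub terms; the long tail is dropped
    return vocab
```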
196. Private Query Release via the Johnson-Lindenstrauss Transform
- Author
-
Aleksandar Nikolov
- Subjects
differential privacy, dimension reduction, Johnson-Lindenstrauss, query release, K-norm mechanism, Technology, Social Sciences - Abstract
We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower-dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. We then answer the projected queries using a simple noise-adding mechanism and lift the answers back to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst-case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload. (A hedged sketch of the project-perturb-lift idea appears after this record.)
- Published
- 2024
- Full Text
- View/download PDF
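Below is a minimal sketch of the project-perturb-lift idea from the record above: project the vector of true query answers with a random Johnson-Lindenstrauss matrix, add Gaussian noise, and lift back to the original dimension. The noise calibration shown is the standard Gaussian-mechanism bound rather than the paper's sharper, workload-dependent analysis, and lifting by the transpose is an illustrative simplification.

```python
import numpy as np

def jl_query_release(true_answers, l2_sensitivity, epsilon, delta, proj_dim, seed=None):
    """Project, privatize, and lift a vector of k statistical query answers."""
    rng = np.random.default_rng(seed)
    k = len(true_answers)
    P = rng.normal(0.0, 1.0 / np.sqrt(proj_dim), size=(proj_dim, k))  # JL projection
    projected = P @ np.asarray(true_answers, dtype=float)
    # Standard Gaussian-mechanism noise scale; a JL map approximately preserves
    # the l2 sensitivity of the answer vector.
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = projected + rng.normal(0.0, sigma, size=proj_dim)
    return P.T @ noisy                                                # lift back to k answers
```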
197. BOF steelmaking endpoint carbon content and temperature soft sensor model based on supervised weighted local structure preserving projection
- Author
-
Su YunKe, Liu Hui, Chen FuGang, Liu JianXun, Li Heng, and Xue XiaoJun
- Subjects
bof steelmaking endpoint, soft sensor, just-in-time, dimension reduction, Technology, Chemical technology, TP1-1185, Chemicals: Manufacture, use, etc., TP200-248 - Abstract
Endpoint control stands as a pivotal determinant of steel quality. However, the data derived from the BOF steelmaking process are high-dimensional, with intricate nonlinear relationships between variables and diverse working conditions. Traditional dimension reduction methods do not fully use the non-local structural information of the manifold. To address these challenges, the article introduces a novel approach termed supervised weighting-based local structure preserving projection. This method dynamically includes label information using sparse representation and constructs weighted submanifolds to mitigate the influence of irrelevant labels. Subsequently, trend matching is employed to establish datasets with the same distribution for the submanifold. The global and local initial neighborhood maps are then constructed, extracting non-local relations from the submanifold by analyzing manifold curvature. This process eliminates interference from non-nearest-neighbor points on the manifold while preserving the local geometric structure and allows the neighborhood parameter to adapt. The proposed method enhances the adaptability of the model to changing working conditions and improves overall performance. The carbon content prediction maintains an error range within ±0.02%, achieving an accuracy rate of 82.50%; the temperature prediction maintains an error range within ±10°C, achieving an accuracy rate of 79.00%. (A hedged sketch of the underlying locality preserving projection appears after this record.)
- Published
- 2024
- Full Text
- View/download PDF
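The method in the record above extends locality preserving projections with supervised weighting. Below is a minimal sketch of plain, unsupervised LPP only (not the paper's supervised weighted variant); the neighborhood size, heat-kernel bandwidth, and regularization term are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=2, k=10, sigma=1.0):
    """Plain locality preserving projection; returns an (n_features, n_components) matrix."""
    dist = kneighbors_graph(X, k, mode="distance", include_self=False).toarray()
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2)) * (dist > 0)  # heat-kernel weights on kNN edges
    W = np.maximum(W, W.T)                                    # symmetrize the neighborhood graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                                 # graph Laplacian
    A, B = X.T @ L @ X, X.T @ D @ X
    # Smallest generalized eigenvectors of  X'LX a = lambda X'DX a  span the projection.
    _, vecs = eigh(A, B + 1e-8 * np.eye(B.shape[0]))
    return vecs[:, :n_components]

# Usage sketch: Z = X @ lpp(X, n_components=5); the carbon-content and temperature
# regressors would then be fitted on Z.
```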
198. Effect of network structure on the accuracy of resilience dimension reduction
- Author
-
Min Liu, Qiang Guo, and Jianguo Liu
- Subjects
system's resilience, dimension reduction, network structure, social network, assortativity, clustering coefficient, Physics, QC1-999 - Abstract
Dimension reduction is an effective method for analyzing a system's resilience. In this paper, we investigate the effect of network structure on the accuracy of resilience dimension reduction. First, we introduce the resilience dimension reduction method and define an indicator for evaluating its accuracy. Then, by adjusting node connections, preferential connection mechanisms, and connection probabilities, we generate artificial networks, small-world networks, and social networks with tunable assortativity coefficients, average clustering coefficients, and modularities, respectively. Experimental results for gene regulatory dynamics show that network structures with positive assortativity, large clustering coefficients, and significant community structure enhance the accuracy of resilience dimension reduction. The results indicate that optimizing network structure can enhance the accuracy of resilience dimension reduction, which is of great significance for system resilience analysis and provides a new perspective and theoretical basis for selecting dimension reduction methods. (The standard one-dimensional reduction assumed here is recalled after this record.)
- Published
- 2024
- Full Text
- View/download PDF
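The resilience dimension reduction evaluated in the record above builds on collapsing networked dynamics onto a single effective variable; the display below recalls that reduction in its standard form (an assumption about the specific variant used), with $A$ the adjacency matrix and $\mathbf{1}$ the all-ones vector. The accuracy question studied in the paper is how well the reduced trajectory tracks the full $N$-dimensional dynamics for different network structures.

```latex
% One-dimensional reduction of networked dynamics (standard form, assumed here).
\[
  \frac{\mathrm{d}x_i}{\mathrm{d}t} = F(x_i) + \sum_{j=1}^{N} A_{ij}\, G(x_i, x_j)
  \;\;\longrightarrow\;\;
  \frac{\mathrm{d}x_{\mathrm{eff}}}{\mathrm{d}t} = F(x_{\mathrm{eff}}) + \beta_{\mathrm{eff}}\, G(x_{\mathrm{eff}}, x_{\mathrm{eff}}),
\]
\[
  x_{\mathrm{eff}} = \frac{\mathbf{1}^{\top} A\, x}{\mathbf{1}^{\top} A\, \mathbf{1}},
  \qquad
  \beta_{\mathrm{eff}} = \frac{\mathbf{1}^{\top} A\, A\, \mathbf{1}}{\mathbf{1}^{\top} A\, \mathbf{1}} .
\]
```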
199. Application of Dimension Reduction Methods for Stress Detection
- Author
-
Erhan Bergil
- Subjects
feature selection, dimension reduction, stress detection, Engineering (General). Civil engineering (General), TA1-2040 - Abstract
Effective detection of stress plays an important role in combating it, and this is the main motivation for research that identifies and evaluates different psychological conditions. Different monitoring signals are used to identify individuals' stress states in daily life. Electroencephalogram (EEG) signals are the main component used to detect stress and depression, but their long-term acquisition partially interrupts daily life and affects it negatively. Researchers are therefore trying to develop wearable technologies that eliminate this disadvantage. In this study, stress states are detected using other sensors, without EEG signals. The performance of three classification methods is compared over feature spaces of different dimensions, and the effects of feature selection and dimension reduction methods on system performance are analyzed. During dimension reduction, the Minimum Redundancy Maximum Relevance (MRMR), ANOVA, Chi-squared, ReliefF, Kruskal-Wallis (KW), and Principal Component Analysis (PCA) methods are implemented. Support Vector Machines (SVM), Linear Discriminant Analysis (LDA), and k-Nearest Neighbor (k-NN) methods are used as classifiers. The best performance, 96.2% accuracy, is achieved in a 15-dimensional feature space by using the LDA and PCA methods together. (A hedged sketch of this PCA-plus-LDA combination appears after this record.)
- Published
- 2023
- Full Text
- View/download PDF
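Below is a minimal sketch of the best-performing combination reported in the record above, PCA followed by an LDA classifier, using scikit-learn; the 15-component setting matches the reported dimension, while the scaler, the cross-validation scheme, and the variable names are illustrative assumptions.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def stress_pipeline(n_components=15):
    """Standardize, reduce to n_components with PCA, then classify with LDA."""
    return make_pipeline(StandardScaler(),
                         PCA(n_components=n_components),
                         LinearDiscriminantAnalysis())

# Usage sketch (X: sensor-derived feature matrix, y: stress / no-stress labels):
# scores = cross_val_score(stress_pipeline(), X, y, cv=10)
# print(scores.mean())
```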
200. Auto-UFSTool: An Automatic Unsupervised Feature Selection Toolbox for MATLAB
- Author
-
Farhad Abedinzadeh Torghabeh, Yeganeh Modaresnia, and Seyyed Abed Hosseini
- Subjects
unsupervised feature selection, matlab, automatic toolbox, dimension reduction, unsupervised learning, Information technology, T58.5-58.64, Computer software, QA76.75-76.765 - Abstract
Finding and selecting relevant features without class labels using Unsupervised Feature Selection (UFS) approaches has recently become necessary in many areas of data analysis. Although several open-source toolboxes provide feature selection techniques that reduce redundant features, data dimensionality, and computation costs, these approaches require programming knowledge, which limits their popularity, and they have not adequately addressed unlabeled real-world data. The Automatic UFS Toolbox (Auto-UFSTool) for MATLAB, proposed in this study, is a user-friendly and fully automatic toolbox that utilizes several UFS approaches from the most recent research. It is a collection of 25 robust UFS approaches, most of which were developed within the last five years. A clear and systematic comparison of competing methods is therefore feasible without requiring a single line of code. Even users without any previous programming experience can use the implementation through the Graphical User Interface (GUI). The toolbox also provides the opportunity to evaluate the feature selection results and to generate graphs that facilitate the comparison of subsets of varying sizes. It is freely accessible in the MATLAB File Exchange repository and includes scripts and source code for each technique. The link to this toolbox is freely available to the general public on: bit.ly/AutoUFSTool
- Published
- 2023
- Full Text
- View/download PDF