6,389 results on '"dimension reduction"'
Search Results
2. Practical Aspects in Machine Learning
- Author
-
Gupta, Pramod, Sehgal, Naresh Kumar, and Acken, John M.
- Published
- 2025
- Full Text
- View/download PDF
3. Structure-adaptive canonical correlation analysis for microbiome multi-omics data.
- Author
-
Deng, Linsui, Tang, Yanlin, Zhang, Xianyang, and Chen, Jun
- Abstract
Sparse canonical correlation analysis (sCCA) has been a useful approach for integrating different high-dimensional datasets by finding a subset of correlated features that explain the most correlation in the data. In the context of microbiome studies, investigators are always interested in knowing how the microbiome interacts with the host at different molecular levels such as the genome, methylome, transcriptome, metabolome and proteome. sCCA provides a simple approach for exploiting the correlation structure among multiple omics data and finding a set of correlated omics features, which could contribute to understanding the host-microbiome interaction. However, existing sCCA methods do not address compositionality, and their application to microbiome data is thus not optimal. This paper proposes a new sCCA framework for integrating microbiome data with other high-dimensional omics data, accounting for the compositional nature of microbiome sequencing data. It also allows integrating prior structure information such as the grouping structure among bacterial taxa by imposing a "soft" constraint on the coefficients through varying penalization strength. As a result, the method provides significant improvement when the structure is informative while maintaining robustness against a misspecified structure. Through extensive simulation studies and real data analysis, we demonstrate the superiority of the proposed framework over the state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
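As a companion to the entry above (structure-adaptive CCA for microbiome multi-omics), here is a minimal, hypothetical sketch of the compositional-data handling step: a centered log-ratio (CLR) transform of microbiome counts followed by ordinary CCA from scikit-learn as a stand-in for the sparse, structure-adaptive CCA proposed in the paper. All data, sizes, and settings below are illustrative, not taken from the study.

```python
# Sketch only: CLR transform + plain CCA as a stand-in for structure-adaptive sCCA.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(60, 40)) + 1      # hypothetical microbiome counts (pseudo-count avoids log(0))
metabolome = rng.normal(size=(60, 25))            # hypothetical second omics block

# Centered log-ratio transform addresses the compositional nature of the counts
rel = counts / counts.sum(axis=1, keepdims=True)
clr = np.log(rel) - np.log(rel).mean(axis=1, keepdims=True)

cca = CCA(n_components=2).fit(clr, metabolome)
U, V = cca.transform(clr, metabolome)
print("first canonical correlation:", np.corrcoef(U[:, 0], V[:, 0])[0, 1])
```

A sparse, structure-aware variant would additionally penalize the canonical weights; the CLR step is the part that deals with compositionality.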
4. A two-stage framework for credit scoring based on feature augmentation and dimension reduction.
- Author
-
Deng, Xuanjie and Wang, Siyang
- Abstract
Credit scoring is an important area for financial risk management, where a small improvement in prediction performance will save significant losses for financial institutions. This article proposes a two-stage framework for credit scoring based on feature augmentation and dimension reduction to improve model performance. In this framework, the log marginal density ratio transformation is employed to provide more prominent features for credit scoring, fully utilizing the information in the original data. For the dimension reduction stage, we apply three classifiers, namely L1-penalized logistic regression (which adds the sum of the absolute values of the parameters to the criterion function), extreme gradient boosting with an L1 regularization term, and k-nearest neighbors with sequential forward selection, to two different credit data sets. The results indicate that this two-stage framework can improve the performance of credit scoring models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
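To make one piece of the entry above concrete, here is a minimal sketch of L1-penalized logistic regression acting as a sparse classifier and implicit feature selector, on synthetic credit-like data. The feature-augmentation stage and the actual credit data sets from the paper are not reproduced; all values below are hypothetical.

```python
# Sketch: L1-penalized logistic regression as a sparse classifier / implicit feature selector.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)   # imbalanced, credit-like labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
print("non-zero coefficients:", int((clf.coef_ != 0).sum()), "of", X.shape[1])
```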
5. An improved optimization method combining particle swarm optimization and dimension reduction kriging surrogate model for high-dimensional optimization problems.
- Author
-
Li, Junxiang, Han, Ben, Chen, Jianqiao, and Wu, Zijun
- Subjects
PARTICLE swarm optimization, PRINCIPAL components analysis, KRIGING, ALGORITHMS
- Abstract
An improved optimization method is proposed which combines the particle swarm optimization (PSO) algorithm with the dimension reduction kriging surrogate model (DK), namely PSO + DK. In this method, a surrogate model is dynamically constructed and updated between the lower dimensional principal components and the response, instead of between the high-dimensional design variables and the response as in traditional methods. The key point of DK is that the principal components analysis (PCA) is integrated with the active learning kriging surrogate model (AK). Since PCA can reduce the dimension effectively, DK makes surrogate model construction possible for complex high-dimensional optimization problems. Numerical examples and four classical engineering problems are presented to validate the effectiveness of the proposed method. The results show that the proposed method can decrease the computational cost significantly while guaranteeing the precision compared with PSO and PSO + AK for high-dimensional optimization problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
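The core of the DK surrogate in the entry above is fitting a kriging (Gaussian-process) model between low-dimensional principal components and the response, rather than the full design variables. A rough sketch of that idea, using scikit-learn in place of a dedicated kriging/PSO toolbox and an arbitrary made-up objective function:

```python
# Sketch: surrogate built on PCA scores instead of the raw high-dimensional design variables.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_response(X):                      # hypothetical high-dimensional objective
    return np.sum((X - 0.3) ** 2, axis=1)

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 50))                 # 50 design variables, 200 sampled designs
y = expensive_response(X)

pca = PCA(n_components=5).fit(X)                # reduce the design space
Z = pca.transform(X)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True).fit(Z, y)
X_new = rng.uniform(size=(5, 50))
print(gp.predict(pca.transform(X_new)))         # cheap surrogate predictions for new designs
```

In the paper, a particle swarm optimizer would query such a surrogate instead of the expensive model and update it adaptively; the snippet only shows the PCA-plus-kriging construction.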
6. Reduced-dimension Bayesian optimization for model calibration of transient vapor compression cycles.
- Author
-
Ma, Jiacheng, Kim, Donghun, and Braun, James E.
- Subjects
VAPOR compression cycle, DYNAMIC models, CALIBRATION
- Abstract
Development and calibration of first-principles dynamic models of vapor compression cycles (VCCs) is of critical importance for applications that include control design and fault detection and diagnostics. Nevertheless, the inherent complexity of models that are represented by large systems of differential–algebraic equations leads to significant challenges for model calibration processes that utilize classical gradient-based methods. Bayesian optimization (BO) is a sample-efficient and gradient-free approach using a probabilistic surrogate model and optimal search over a feasible parameter space. Despite the benefits of BO in reducing computational costs, challenges remain in dealing with a high-dimensional calibration task resulting from a large set of parameters that have significant impacts on system behavior and need to be calibrated simultaneously. This paper presents a reduced-dimension BO framework for calibrating transient VCC models in which the calibration space is projected to a low-dimensional subspace, accelerating convergence of the solution algorithm and consequently reducing the number of transient simulations. The proposed approach was demonstrated via two case studies associated with different VCC applications, in each of which 10 parameters were calibrated using laboratory measurements. The reduced-dimension BO framework required only 1/8th of the iterations needed by a standard BO method operating on the high-dimensional calibration parameters to reach converged solutions, and yielded comparable accuracy. Furthermore, both calibrated models revealed significant accuracy improvements compared to uncalibrated models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Randomized low‐rank approximation of parameter‐dependent matrices.
- Author
-
Kressner, Daniel and Lam, Hei Yin
- Subjects
NUMERICAL solutions for linear algebra, RANDOM matrices, COMPUTATIONAL statistics, ERROR probability, APPROXIMATION error
- Abstract
This work considers the low-rank approximation of a matrix A(t) depending on a parameter t in a compact set D ⊂ ℝ^d. Application areas that give rise to such problems include computational statistics and dynamical systems. Randomized algorithms are an increasingly popular approach for performing low-rank approximation and they usually proceed by multiplying the matrix with random dimension reduction matrices (DRMs). Applying such algorithms directly to A(t) would involve different, independent DRMs for every t, which is not only expensive but also leads to inherently non-smooth approximations. In this work, we propose to use constant DRMs, that is, A(t) is multiplied with the same DRM for every t. The resulting parameter-dependent extensions of two popular randomized algorithms, the randomized singular value decomposition and the generalized Nyström method, are computationally attractive, especially when A(t) admits an affine linear decomposition with respect to t. We perform a probabilistic analysis for both algorithms, deriving bounds on the expected value as well as failure probabilities for the approximation error when using Gaussian random DRMs. Both the theoretical results and numerical experiments show that the use of constant DRMs does not impair their effectiveness; our methods reliably return quasi-best low-rank approximations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
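The key idea in the entry above is reusing the same dimension reduction matrix (DRM) for every parameter value t. Below is a small numpy sketch of a standard randomized truncated SVD with one fixed Gaussian sketching matrix applied to A(t) for several t; the affine matrix family and the sizes are made up for illustration.

```python
# Sketch: randomized SVD of a parameter-dependent matrix with a constant Gaussian DRM.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 300, 200, 10
L0, R0 = rng.normal(size=(m, 5)), rng.normal(size=(5, n))
L1, R1 = rng.normal(size=(m, 5)), rng.normal(size=(5, n))

def A(t):
    return L0 @ R0 + t * (L1 @ R1)              # hypothetical affine family, rank <= 10

Omega = rng.normal(size=(n, k + 5))             # ONE Gaussian DRM reused for every t

for t in (0.0, 0.5, 1.0):
    Y = A(t) @ Omega                            # sketch the range of A(t)
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis of the sketch
    U_small, s, Vt = np.linalg.svd(Q.T @ A(t), full_matrices=False)
    U = Q @ U_small                             # A(t) ~= U @ diag(s) @ Vt
    err = np.linalg.norm(A(t) - U @ np.diag(s) @ Vt) / np.linalg.norm(A(t))
    print(f"t={t}: relative approximation error {err:.2e}")
```

Because Omega is fixed, the resulting approximations vary smoothly with t, which is the motivation discussed in the abstract.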
8. A principal-weighted penalized regression model and its application in economic modeling.
- Author
-
Sun, Mingwei and Xu, Murong
- Subjects
PRINCIPAL components analysis, REGRESSION analysis, ECONOMIC models, ECONOMIC statistics, COMPARATIVE studies
- Abstract
This paper introduces a novel Principal-Weighted Penalized (PWP) regression model, designed for dimensionality reduction in large datasets without sacrificing essential information. This new model retains the favorable features of the principal component analysis (PCA) technique and penalized regression models. It weighs the variables in a large data set based on their contributions to principal components identified by PCA, enhancing its capacity to uncover crucial hidden variables. The PWP model also efficiently performs variable selection and estimates regression coefficients through regularization. An application of the proposed model on high-dimensional economic data is studied. The results of comparative studies in simulations and a real example in economic modeling demonstrate its superior fitting and predictive abilities. The resulting model excels in accuracy and interpretability, outperforming existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
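The exact PWP weighting scheme of the entry above is not reproduced here. The sketch below only illustrates the generic idea of scaling predictors by a PCA-derived importance score before fitting a penalized (Lasso) regression; the importance formula, the data, and all settings are my own illustrative choices, not the authors' model.

```python
# Sketch: PCA-derived variable weights feeding a penalized (Lasso) regression.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=60, n_informative=10, noise=5.0, random_state=0)
Xs = StandardScaler().fit_transform(X)

pca = PCA(n_components=10).fit(Xs)
# crude importance score: variance-weighted squared loadings of each variable (illustrative only)
importance = (pca.explained_variance_ratio_[:, None] * pca.components_ ** 2).sum(axis=0)
Xw = Xs * importance                            # up-weight variables that drive the leading PCs

lasso = LassoCV(cv=5).fit(Xw, y)
print("selected variables:", int((lasso.coef_ != 0).sum()), "of", X.shape[1])
```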
9. Scale invariant and efficient estimation for groupwise scaled envelope model.
- Author
-
Zhang, Jing and Huang, Zhensheng
- Abstract
Motivated by different groups containing different group information under the heteroscedastic error structure, we propose the groupwise scaled envelope model, which is invariant to scale changes and permits distinct regression coefficients and heteroscedastic error structures across groups. It retains the potential of the scaled envelope methods to keep the scale invariant and allows for both different regression coefficients and different error structures for diverse groups. Further, we derive the maximum likelihood estimators and establish their theoretical properties, including parameter identifiability, asymptotic distribution and consistency of the groupwise scaled envelope estimator. Lastly, simulation studies and a real-data example demonstrate the advantages of the groupwise scaled envelope estimators, including a comparison with the standard model estimators, groupwise envelope estimators, scaled envelope estimators and separate scaled envelope estimators. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Probability of entering an orthant by correlated fractional Brownian motion with drift: exact asymptotics.
- Author
-
Dȩbicki, Krzysztof, Ji, Lanpeng, and Novikov, Svyatoslav
- Subjects
QUADRATIC programming, LARGE deviations (Mathematics), PROBABILITY theory
- Abstract
For {B_H(t) = (B_{H,1}(t), ..., B_{H,d}(t))^⊤, t ≥ 0}, where {B_{H,i}(t), t ≥ 0}, 1 ≤ i ≤ d, are mutually independent fractional Brownian motions, we obtain the exact asymptotics of P(∃ t ≥ 0 : A B_H(t) - μt > νu) as u → ∞, where A is a non-singular d × d matrix and μ = (μ_1, ..., μ_d)^⊤ ∈ ℝ^d, ν = (ν_1, ..., ν_d)^⊤ ∈ ℝ^d are such that μ_i > 0 and ν_i > 0 for some 1 ≤ i ≤ d. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Variety Evasive Subspace Families.
- Author
-
Guo, Zeyu
- Abstract
We introduce the problem of constructing explicit variety evasive subspace families. Given a family F of subvarieties of a projective or affine space, a collection H of projective or affine k-subspaces is (F, ϵ)-evasive if for every V ∈ F, all but at most an ϵ-fraction of W ∈ H intersect every irreducible component of V with (at most) the expected dimension. The problem of constructing such an explicit subspace family generalizes both deterministic black-box polynomial identity testing (PIT) and the problem of constructing explicit (weak) lossless rank condensers. Using Chow forms, we construct explicit k-subspace families of polynomial size that are evasive for all varieties of bounded degree in a projective or affine n-space. As one application, we obtain a complete derandomization of Noether's normalization lemma for varieties of low degree in a projective or affine n-space. In another application, we obtain a simple polynomial-time black-box PIT algorithm for depth-4 arithmetic circuits with bounded top fan-in and bottom fan-in that are not in the Sylvester–Gallai configuration, improving and simplifying a result of Gupta (ECCC TR 14-130). As a complement of our explicit construction, we prove a tight lower bound for the size of k-subspace families that are evasive for degree-d varieties in a projective n-space. When n - k = n^{Ω(1)}, the lower bound is superpolynomial unless d is bounded. The proof uses a dimension counting argument on Chow varieties that parametrize projective subvarieties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Shrinkage for extreme partial least-squares.
- Author
-
Arbel, Julyan, Girard, Stéphane, and Lorenzo, Hadrien
- Abstract
This work focuses on dimension-reduction techniques for modelling conditional extreme values. Specifically, we investigate the idea that extreme values of a response variable can be explained by nonlinear functions derived from linear projections of an input random vector. In this context, the estimation of projection directions is examined, as approached by the extreme partial least squares (EPLS) method—an adaptation of the original partial least squares (PLS) method tailored to the extreme-value framework. Further, a novel interpretation of EPLS directions as maximum likelihood estimators is introduced, utilizing the von Mises–Fisher distribution applied to hyperballs. The dimension reduction process is enhanced through the Bayesian paradigm, enabling the incorporation of prior information into the projection direction estimation. The maximum a posteriori estimator is derived in two specific cases, elucidating it as a regularization or shrinkage of the EPLS estimator. We also establish its asymptotic behavior as the sample size approaches infinity. A simulation data study is conducted in order to assess the practical utility of our proposed method. This clearly demonstrates its effectiveness even in moderate data problems within high-dimensional settings. Furthermore, we provide an illustrative example of the method’s applicability using French farm income data, highlighting its efficacy in real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Deconvolution closure for mesoscopic continuum models of particle systems.
- Author
-
Panchenko, Alexander, Barannyk, Lyudmyla L., and Cooper, Kevin
- Subjects
LINEAR momentum, GRANULAR materials, OPERATOR equations, PARTICLE dynamics, HEAT flux
- Abstract
We present a framework for derivation of closed‐form continuum equations governing mesoscale dynamics of large particle systems. Balance equations for spatial averages such as density, linear momentum, and energy were previously derived by a number of authors. These equations are exact, but are not in closed form because the stress and the heat flux cannot be evaluated without the knowledge of particle positions and velocities. Recently, we proposed a method for approximating exact fluxes by true constitutive equations, that is, using nonlocal operators acting only on the average density and velocity. In the paper, constitutive operators are obtained by using filtered regularization methods from the theory of ill‐posed problems. We also formulate conditions on fluctuation statistics which permit approximating these operators by local equations. The performance of the method is tested numerically using Fermi–Pasta–Ulam particle chains with two different potentials: the classical Lennard–Jones and the purely repulsive potential used in granular materials modeling. The initial conditions incorporate velocity fluctuations on scales that are smaller than the size of the averaging window. Simulation results show good agreement between the exact stress and its closed‐form approximation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Addressing overfitting in classification models for transport mode choice prediction: a practical application in the Aburrá Valley, Colombia.
- Author
-
Salazar-Serna, Kathleen, Barona, Sergio A., García, Isabel C., Cadavid, Lorena, and Franco, Carlos J.
- Subjects
K-nearest neighbor classification, DATA distribution, RANDOM forest algorithms, MACHINE learning, DATA reduction, CHOICE of transportation, DIMENSION reduction (Statistics)
- Abstract
Overfitting poses a significant limitation in mode choice prediction using classification models, often worsened by the proliferation of features from encoding categorical variables. While dimensionality reduction techniques are widely utilized, their effects on travel-mode choice models’ performance have yet to be comparatively studied. This research compares the impact of dimensionality reduction methods (PCA, CATPCA, FAMD, LDA) on the performance of multinomial models and various supervised learning classifiers (XGBoost, Random Forest, Naive Bayes, K-Nearest Neighbors, Multinomial Logit) for predicting travel mode choice. Utilizing survey data from the Aburrá Valley in Colombia, we detail the process of analyzing derived dimensions and selecting optimal models for both overall and class-specific predictions. Results indicate that dimension reduction enhances predictive power, particularly for less common transport modes, providing a strategy to address class imbalance without modifying data distribution. This methodology deepens understanding of travel behavior, offering valuable insights for modelers and policymakers in developing regions with similar characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
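A compact sketch of the kind of comparison carried out in the entry above: a dimension-reduction step (PCA here) chained with a supervised classifier (random forest here) in a scikit-learn pipeline, evaluated by cross-validation. Synthetic multi-class data stand in for the Aburrá Valley survey data, and the specific methods compared in the paper (CATPCA, FAMD, LDA, XGBoost, and so on) are not all shown.

```python
# Sketch: dimension reduction + classifier pipeline, as compared in the mode-choice study.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=3000, n_features=50, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

pipe = Pipeline([("reduce", PCA(n_components=10)),
                 ("clf", RandomForestClassifier(n_estimators=200, random_state=0))])
scores = cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy")
print("balanced accuracy:", scores.mean().round(3))
```

Swapping the "reduce" and "clf" steps for other reducers and classifiers reproduces the grid of combinations the study evaluates.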
15. Functional projection K-means.
- Author
-
Rocci, Roberto and Gattone, Stefano A.
- Subjects
STATISTICAL smoothing, LEAST squares, DATA reduction, FUNCTIONAL analysis, DATA analysis, CENTROID
- Abstract
A new technique for simultaneous clustering and dimensionality reduction of functional data is proposed. The observations are projected into a low-dimensional subspace and clustered by means of a functional K-means. The subspace and the partition are estimated simultaneously by minimizing the within deviance in the reduced space. This allows us to find new dimensions with a very low within deviance, which should correspond to a high level of discriminant power. However, in some cases, the total deviance explained by the new dimensions is so low as to make the subspace, and therefore the partition identified in it, insignificant. To overcome this drawback, we add to the loss a penalty equal to the negative total deviance in the reduced space. In this way, subspaces with a low deviance are avoided. We show how several existing methods are particular cases of our proposal simply by varying the weight of the penalty. The estimation is improved by adding a regularization term to the loss in order to take into account the functional nature of the data by smoothing the centroids. In contrast to existing literature, which largely considers the smoothing as a pre-processing step, in our proposal regularization is integrated with the identification of both subspace and cluster partition. An alternating least squares algorithm is introduced to compute model parameter estimates. The effectiveness of our proposal is demonstrated through its application to both real and simulated data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Statistical Inference for Counting Processes Under Shape Heterogeneity.
- Author
-
Sheng, Ying and Sun, Yifei
- Subjects
INFERENTIAL statistics, PARAMETER estimation, HETEROGENEITY, COUNTING, HOSPITAL care
- Abstract
Proportional rate models are among the most popular methods for analyzing recurrent event data. Although providing a straightforward rate-ratio interpretation of covariate effects, the proportional rate assumption implies that covariates do not modify the shape of the rate function. When the proportionality assumption fails to hold, we propose to characterize covariate effects on the rate function through two types of parameters: the shape parameters and the size parameters. The former allows the covariates to flexibly affect the shape of the rate function, and the latter retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation, followed by an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root-n convergence rate. Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Evaporation-driven tear film thinning and breakup in two space dimensions.
- Author
-
Chen, Qinying, Driscoll, Tobin A., and Braun, R. J.
- Abstract
Evaporation profiles have a strong effect on tear film thinning and breakup (TBU), a key factor in dry eye disease (DED). In experiments, TBU is typically seen to occur in patterns that locally can be circular (spot), linear (streak), or intermediate to those states. We investigate a two-dimensional (2D) model of localized TBU using a Fourier spectral collocation method to observe how the evaporation distribution affects the resulting dynamics of tear film thickness and osmolarity, among other variables. We find that the dynamics are not simply an addition of individual 1D solutions of independent TBU events, and we show how the TBU quantities of interest vary continuously between spots and streaks in the shape of the evaporation distribution. We also find a significant speedup by using a proper orthogonal decomposition to reduce the dimension of the numerical system. The speedup will be especially useful for future applications of the model to inverse problems, allowing the clinical observation at scale of quantities that are thought to be important to DED but not directly measurable in vivo within TBU locales. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
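The entry above reports a significant speedup from proper orthogonal decomposition (POD) of the numerical system. In its simplest form POD is an SVD of a snapshot matrix; a generic numpy sketch follows, using synthetic snapshots rather than the tear-film model itself.

```python
# Sketch: proper orthogonal decomposition (POD) of a snapshot matrix via the SVD.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 400)
snapshots = np.stack(
    [np.sin(2 * np.pi * k * x) * np.exp(-0.1 * k) + 0.01 * rng.normal(size=x.size)
     for k in range(1, 41)], axis=1)                        # 400 grid points x 40 time snapshots

U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.999)) + 1                 # modes capturing 99.9% of the energy
print("POD modes retained:", r)

reduced = U[:, :r].T @ snapshots                            # reduced coordinates of each snapshot
reconstruction = U[:, :r] @ reduced
print("relative error:", np.linalg.norm(snapshots - reconstruction) / np.linalg.norm(snapshots))
```

Evolving only the r reduced coordinates, instead of the full grid, is what yields the reported speedup in model-reduction settings.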
18. Disentangling dynamic and stochastic modes in multivariate time series.
- Author
-
Uhl, Christian, Stiehl, Annika, Weeger, Nicolas, Schlarb, Markus, and Hüper, Knut
- Subjects
NONLINEAR differential equations, LINEAR differential equations, BLIND source separation, ORDINARY differential equations, PRINCIPAL components analysis
- Abstract
A signal decomposition is presented that disentangles the deterministic and stochastic components of a multivariate time series. The dynamical component analysis (DyCA) algorithm is based on the assumption that an unknown set of ordinary differential equations (ODEs) describes the dynamics of the deterministic part of the signal. The algorithm is thoroughly derived and accompanied by a link to the GitHub repository containing the algorithm. The method was applied to both simulated and real-world data sets and compared to the results of principal component analysis (PCA), independent component analysis (ICA), and dynamic mode decomposition (DMD). The results demonstrate that DyCA is capable of separating the deterministic and stochastic components of the signal. Furthermore, the algorithm is able to estimate the number of linear and non-linear differential equations and to extract the corresponding amplitudes. The results demonstrate that DyCA is an effective tool for signal decomposition and dimension reduction of multivariate time series. In this regard, DyCA outperforms PCA and ICA and is on par with or slightly superior to the DMD algorithm in terms of performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Identifying relevant features of CSE-CIC-IDS2018 dataset for the development of an intrusion detection system.
- Author
-
Göcs, László and Johanyák, Zsolt Csaba
- Subjects
INFORMATION technology, COMPUTER network traffic, FEATURE selection, CLASSIFICATION algorithms, PYTHONS
- Abstract
Intrusion detection systems (IDSs) are essential elements of IT systems. Their key component is a classification module that continuously evaluates some features of the network traffic and identifies possible threats. Its efficiency is greatly affected by the right selection of the features to be monitored. Therefore, the identification of a minimal set of features that are necessary to safely distinguish malicious traffic from benign traffic is indispensable in the course of the development of an IDS. This paper presents the preprocessing and feature selection workflow as well as its results in the case of the CSE-CIC-IDS2018 on AWS dataset, focusing on five attack types. To identify the relevant features, six feature selection methods were applied, and the final ranking of the features was elaborated based on their average score. Next, several subsets of the features were formed based on different ranking threshold values, and each subset was tried with five classification algorithms to determine the optimal feature set for each attack type. During the evaluation, four widely used metrics were taken into consideration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
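The entry above ranks features by averaging the scores of several selection methods and then thresholds the ranking. A toy sketch of that average-rank idea with two scikit-learn scoring functions is shown below; the CSE-CIC-IDS2018 data and the six specific methods used in the paper are not reproduced.

```python
# Sketch: combine two univariate feature-scoring methods by average rank, then keep the top-k.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=1500, n_features=30, n_informative=6, random_state=0)

scores = [f_classif(X, y)[0], mutual_info_classif(X, y, random_state=0)]
ranks = np.mean([np.argsort(np.argsort(-s)) for s in scores], axis=0)  # 0 = best, averaged

k = 10
selected = np.argsort(ranks)[:k]
print("selected feature indices:", np.sort(selected))
```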
20. BFAST: joint dimension reduction and spatial clustering with Bayesian factor analysis for zero-inflated spatial transcriptomics data.
- Author
-
Xu, Yang, Lv, Dian, Zou, Xuanxuan, Wu, Liang, Xu, Xun, and Zhao, Xin
- Subjects
TRANSCRIPTOMES, BAYESIAN analysis, GENE expression profiling, FACTOR analysis, PHENOTYPES
- Abstract
The development of spatially resolved transcriptomics (ST) technologies has made it possible to measure gene expression profiles coupled with cellular spatial context and assist biologists in comprehensively characterizing cellular phenotype heterogeneity and tissue microenvironment. Spatial clustering is vital for biological downstream analysis. However, high noise and dropout events make clustering spatial transcriptomics data challenging, and effective algorithms are lacking. Here we develop a novel method, jointly performing dimension reduction and spatial clustering with Bayesian Factor Analysis for zero-inflated Spatial Transcriptomics data (BFAST). BFAST has showcased exceptional performance on simulation data and real spatial transcriptomics datasets, as shown by benchmarking against currently available methods. It effectively extracts more biologically informative low-dimensional features compared to traditional dimensionality reduction approaches, thereby enhancing the accuracy and precision of clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Reconstruction of blood flow velocity with deep learning information fusion from spectral ct projections and vessel geometry.
- Author
-
Huang, Shusong, Sigovan, Monica, and Sixou, Bruno
- Subjects
FLOW velocity, BLOOD flow, UNSTEADY flow, INVERSE problems, HEMODYNAMICS, DEEP learning
- Abstract
In this work, we investigate a new deep learning reconstruction method of blood flow velocity within deformed vessels from contrast enhanced X-ray projections and vessel geometry. The principle of the method is to perform linear or nonlinear dimension reductions on the Radon projections and on the mesh of the vessel. These low dimensional projections are then fused to obtain the velocity field in the vessel. The accuracy of the reconstruction method is demonstrated using various neural network architectures with realistic unsteady blood flows. The approach leverages the vessel geometry information and outperforms the simple PCA-net. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Optimal stock investment strategy using prediction models.
- Author
-
Jimin Kim and Jongwoo Song
- Abstract
Stock price prediction has traditionally been known as a challenging task. However, recent advancements in machine learning and deep learning models have spurred extensive research in predicting stock returns. This study applies these predictive models to U.S. stock data to forecast stock returns and develop investment strategies based on these forecasts. Additionally, the performance of the model-based investment strategy was compared with that of a widely recognized method, market capitalization-weighted investing. The results indicate that, overall, market capitalization-weighted investing outperformed model-based investing. However, the highest returns were observed in the model-based strategy. It was also found that model-based investing exhibits higher volatility in returns, with significant disparities between years of high and low returns. While investing through machine learning methodologies may be attractive to investors seeking high risk and high return, market capitalization-weighted investing is likely more suitable for those desiring stable returns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. CP factor model for dynamic tensors.
- Author
-
Han, Yuefeng, Yang, Dan, Zhang, Cun-Hui, and Chen, Rong
- Subjects
ORTHOGRAPHIC projection, TIME series analysis, STATISTICAL errors, DYNAMIC models
- Abstract
Observations in various applications are frequently represented as a time series of multidimensional arrays, called tensor time series, preserving the inherent multidimensional structure. In this paper, we present a factor model approach, in a form similar to tensor CANDECOMP/PARAFAC (CP) decomposition, to the analysis of high-dimensional dynamic tensor time series. As the loading vectors are uniquely defined but not necessarily orthogonal, it is significantly different from the existing tensor factor models based on Tucker-type tensor decomposition. The model structure allows for a set of uncorrelated one-dimensional latent dynamic factor processes, making it much more convenient to study the underlying dynamics of the time series. A new high-order projection estimator is proposed for such a factor model, utilizing the special structure and the idea of the higher order orthogonal iteration procedures commonly used in Tucker-type tensor factor models and general tensor CP decomposition procedures. Theoretical investigation provides statistical error bounds for the proposed methods, which shows the significant advantage of utilizing the special model structure. A simulation study is conducted to further demonstrate the finite sample properties of the estimators. A real data application is used to illustrate the model and its interpretations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Spherical random projection.
- Author
-
Kang, Seungwoo and Oh, Hee-Seok
- Subjects
SPHERICAL projection, DATA mapping
- Abstract
We propose a new method for dimension reduction of high-dimensional spherical data based on the nonlinear projection of sphere-valued data to a randomly chosen subsphere. The proposed method, spherical random projection, leads to a probabilistic lower-dimensional mapping of spherical data into a subsphere of the original. In this paper, we investigate some properties of spherical random projection, including expectation preservation and distance concentration, from which we derive an analogue of the Johnson–Lindenstrauss Lemma for spherical random projection. Clustering model selection is discussed as an application of spherical random projection, and numerical experiments are conducted using real and simulated data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
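A small numpy sketch of the basic operation in the entry above: sphere-valued data are pushed through a random linear projection and re-normalized, giving points on a lower-dimensional subsphere. This is a naive reading of the construction for illustration, not the authors' exact definition, and all sizes below are arbitrary.

```python
# Sketch: project points on S^(d-1) to a random lower-dimensional subsphere.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 50, 5
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # data on the unit sphere S^(d-1)

G = rng.normal(size=(d, k)) / np.sqrt(k)               # random projection matrix
Y = X @ G
Y /= np.linalg.norm(Y, axis=1, keepdims=True)          # renormalize: points on S^(k-1)

# rough check of distance concentration (Johnson-Lindenstrauss-type behaviour)
i, j = rng.integers(0, n, size=(2, 1000))
orig = np.linalg.norm(X[i] - X[j], axis=1)
proj = np.linalg.norm(Y[i] - Y[j], axis=1)
print("median distance ratio:", np.median(proj / np.maximum(orig, 1e-12)))
```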
25. On the Modeling and Prediction of High-Dimensional Functional Time Series.
- Author
-
Chang, Jinyuan, Fang, Qin, Qiao, Xinghao, and Yao, Qiwei
- Subjects
TIME series analysis, EIGENANALYSIS, PREDICTION models, FORECASTING, PERMUTATIONS
- Abstract
We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series p is large in relation to the length of the time series n. Our first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation for the original high-dimensional functional time series, and the transformed curve series can be segmented into several groups such that any two subseries from any two different groups are uncorrelated both contemporaneously and serially. Consequently, in our second step those groups are handled separately without information loss on the overall linear dynamic structure. The second step is devoted to establishing a finite-dimensional dynamical structure for all the transformed functional time series within each group. Furthermore, the finite-dimensional structure is represented by that of a vector time series. Modeling and forecasting for the original high-dimensional functional time series are realized via those for the vector time series in all the groups. We investigate the theoretical properties of our proposed methods, and illustrate the finite-sample performance through both extensive simulation and two real datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. A note on switching eigenvalues under small perturbations.
- Author
-
Masioti, Marina, Li-Wai-Suen, Connie S. N., Prendergast, Luke A., and Shaker, Amanda
- Subjects
PRINCIPAL components analysis, SYMMETRIC matrices, EIGENVECTORS, RESEARCH personnel, EIGENVALUES
- Abstract
Sensitivity of eigenvectors and eigenvalues of symmetric matrix estimates to the removal of a single observation have been well documented in the literature. However, a complicating factor can exist in that the rank of the eigenvalues may change due to the removal of an observation, and with that so too does the perceived importance of the corresponding eigenvector. We refer to this problem as "switching of eigenvalues". Since there is not enough information in the new eigenvalues, post observation removal, to indicate that this has happened, how do we know that this switching has occurred? In this article, we show that approximations to the eigenvalues can be used to help determine when switching may have occurred. We then discuss possible actions researchers can take based on this knowledge, for example making better choices when it comes to deciding how many principal components should be retained and adjustments to approximate influence diagnostics that perform poorly when switching has occurred. Our results are easily applied to any eigenvalue problem involving symmetric matrix estimators. We highlight our approach with application to two real data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Factor IV Estimation in Conditional Moment Models with an Application to Inflation Dynamics*.
- Author
-
Antoine, Bertille and Sun, Xiaolin
- Abstract
In a conditional moment model, we develop a new integrated conditional moment (ICM) estimator which directly exploits factor-based conditional moment restrictions without having to first parametrize, or estimate such restrictions. We focus on a time series framework where the large number of available instruments and associated lags is driven by a relatively small number of unobserved factors. We build on the ICM principle originally proposed by Bierens (1982) and combine it with information reduction methods to handle the large number of potential instruments which may exceed the sample size. Under the maintained validity of the true factors, but not that of observed instruments, and standard regularity assumptions, our estimator is consistent, asymptotically normally distributed, and easy to compute. In our simulation studies, we document its reliability and power in cases where the underlying relationship between the endogenous variables and the instruments may be heterogeneous, non-linear, or even unstable over time. Our estimation of the New Keynesian Phillips curve with U.S. data reveals that forward- and backward-looking behaviors are quantitatively equally as important, while the driver's role is nil. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. An enriched approach to combining high-dimensional genomic and low-dimensional phenotypic data.
- Author
-
Cabrera, Javier, Emir, Birol, Cheng, Ge, Duan, Yajie, Alemayehu, Demissie, and Cherkas, Yauheniya
- Subjects
RANDOM forest algorithms, INDIVIDUALIZED medicine, PHENOTYPES
- Abstract
We describe an approach for combining and analyzing high-dimensional genomic and low-dimensional phenotypic data. The approach leverages a scheme of weights applied to the variables instead of observations and, hence, permits incorporation of the information provided by the low dimensional data source. It can also be incorporated into commonly used downstream techniques, such as random forest or penalized regression. Finally, the simulated lupus studies involving genetic and clinical data are used to illustrate the overall idea and show that the proposed enriched penalized method can select significant genetic variables while keeping several important clinical variables in the final model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Mapper–Type Algorithms for Complex Data and Relations.
- Author
-
Dłotko, Paweł, Gurnari, Davide, and Sazdanovic, Radmila
- Subjects
KNOT theory, POINT cloud, DATA analysis, DATA reduction, DATA visualization
- Abstract
Mapper and Ball Mapper are Topological Data Analysis tools used for exploring high dimensional point clouds and visualizing scalar–valued functions on those point clouds. Inspired by open questions in knot theory, new features are added to Ball Mapper that enable encoding of the structure, internal relations and symmetries of the point cloud. Moreover, the strengths of Mapper and Ball Mapper constructions are combined to create a tool for comparing high dimensional data descriptors of a single dataset. This new hybrid algorithm, Mapper on Ball Mapper, is applicable to high dimensional lens functions. As a proof of concept we include applications to knot and game theory. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Analyzing the Influence of Telematics-Based Pricing Strategies on Traditional Rating Factors in Auto Insurance Rate Regulation.
- Author
-
Xie, Shengkun
- Subjects
INSURANCE companies, AUTOMOBILE insurance, INSURANCE rates, PRINCIPAL components analysis, ACTUARIAL risk
- Abstract
This study examines how telematics variables such as annual percentage driven, total miles driven, and driving patterns influence the distributional behaviour of conventional rating factors when incorporated into predictive models for capturing auto insurance risk in rate regulation. To effectively manage the complexity inherent in telematics data, we advocate for the adoption of non-negative sparse principal component analysis (NSPCA) as a structured approach for data dimensionality reduction. By emphasizing sparsity and non-negativity constraints, NSPCA enhances the interpretability and predictive power of models concerning both loss severity and claim counts. This methodological innovation aims to advance statistical analyses within insurance pricing frameworks, ensuring the robustness of predictive models and providing insights crucial for rate regulation strategies specific to the auto insurance sector. Results show that, to enhance auto insurance risk pricing models, it is essential to address data dimension reduction challenges when integrating telematics data variables. Our findings underscore that integrating telematics variables into predictive models maintains the integrity of risk relativity estimates associated with traditional policy variables. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Effective quasistatic evolution models for perfectly plastic plates with periodic microstructure: The limiting regimes.
- Author
-
Bužančić, Marin, Davoli, Elisa, and Velčić, Igor
- Subjects
MICROSTRUCTURE, PLASTICS
- Abstract
We identify effective models for thin, linearly elastic and perfectly plastic plates exhibiting a microstructure resulting from the periodic alternation of two elastoplastic phases. We study here both the case in which the thickness of the plate converges to zero on a much faster scale than the periodicity parameter and the opposite scenario in which homogenization occurs on a much finer scale than dimension reduction. After performing a static analysis of the problem, we show convergence of the corresponding quasistatic evolutions. The methodology relies on two-scale convergence and periodic unfolding, combined with suitable measure-disintegration results and evolutionary Γ-convergence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Asymptotic analysis of single-slip crystal plasticity in the limit of vanishing thickness and rigid elasticity.
- Author
-
Engl, Dominik, Krömer, Stefan, and Kružík, Martin
- Subjects
ELASTOPLASTICITY, ENERGY industries, ELASTICITY, CRYSTALS, PLASTICS
- Abstract
We perform via Γ-convergence a 2d-1d dimension reduction analysis of a single-slip elastoplastic body in large deformations. Rigid plastic and elastoplastic regimes are considered. In particular, we show that limit deformations can essentially freely bend even if subjected to the most restrictive constraints corresponding to the elastically rigid single-slip regime. The primary challenge arises in the upper bound where the differential constraints render any bending without incurring an additional energy cost particularly difficult. We overcome this obstacle with suitable non-smooth constructions and prove that a Lavrentiev phenomenon occurs if we artificially restrict our model to smooth deformations. This issue is absent if the differential constraints are appropriately softened. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. How to visualize high‐dimensional data.
- Author
-
Mrowka, Ralf and Schmauder, Ralf
- Subjects
SCIENTIFIC literature, PRINCIPAL components analysis, REGIONAL development, DATA structures, BLOOD pressure measurement
- Abstract
This article discusses the visualization of high-dimensional data in the field of physiology. The authors emphasize the importance of clarifying the axes and variables represented in diagrams to ensure accurate interpretation. They explain that traditional methods like principal component analysis (PCA) may not be sufficient for high-dimensional data and introduce nonlinear techniques like t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). These methods allow for the visualization of complex data and have been widely used in various fields, including neurophysiology, immunology, cancer research, and infectious diseases. The authors caution that the interpretation of these plots requires careful consideration due to the nonlinear transformations involved. They also mention ongoing efforts to improve these methods. Overall, the article highlights the need for clear explanations of high-dimensional plots in presentations and acknowledges the interdisciplinary nature of physiology and the rapid development of methods in the field. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
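For the entry above, here is a minimal scikit-learn example of one of the nonlinear techniques it discusses, t-SNE, applied to the bundled digits dataset; UMAP would be used analogously through the separate umap-learn package, which is not shown here.

```python
# Sketch: nonlinear 2-D embedding of high-dimensional data with t-SNE.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                    # 64-dimensional digit images
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of the digits data")
plt.show()
```

As the article cautions, axis positions and inter-cluster distances in such plots are not directly interpretable because of the nonlinear transformation.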
34. Joint modeling of an outcome variable and integrated omics datasets using GLM-PO2PLS.
- Author
-
Gu, Zhujie, Uh, Hae-Won, Houwing-Duistermaat, Jeanine, and el Bouhaddani, Said
- Subjects
MAXIMUM likelihood statistics, ASYMPTOTIC distribution, EXPECTATION-maximization algorithms, DATA integration, DOWN syndrome
- Abstract
In many studies of human diseases, multiple omics datasets are measured. Typically, these omics datasets are studied one by one with the disease, thus the relationship between omics is overlooked. Modeling the joint part of multiple omics and its association to the outcome disease will provide insights into the complex molecular base of the disease. Several dimension reduction methods which jointly model multiple omics and two-stage approaches that model the omics and outcome in separate steps are available. Holistic one-stage models for both omics and outcome are lacking. In this article, we propose a novel one-stage method that jointly models an outcome variable with omics. We establish the model identifiability and develop EM algorithms to obtain maximum likelihood estimators of the parameters for normally and Bernoulli distributed outcomes. Test statistics are proposed to infer the association between the outcome and omics, and their asymptotic distributions are derived. Extensive simulation studies are conducted to evaluate the proposed model. The method is illustrated by modeling Down syndrome as outcome and methylation and glycomics as omics datasets. Here we show that our model provides more insight by jointly considering methylation and glycomics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Empirical Bayes linked matrix decomposition.
- Author
-
Lock, Eric F.
- Subjects
MATRIX decomposition, MISSING data (Statistics), GENE expression, DATA integration, BREAST cancer
- Abstract
Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular "omics" technologies may capture different feature sets (e.g., corresponding to rows in a matrix) and/or different sample populations (corresponding to columns). This has motivated a large body of work on integrative matrix factorization approaches that identify and decompose low-dimensional signal that is shared across multiple matrices or specific to a given matrix. We propose an empirical variational Bayesian approach to this problem that has several advantages over existing techniques, including the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for "blockwise" imputation (in which an entire row or column is missing) in various linked matrix contexts. Extensive simulations show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal, accurately decomposing shared and specific signals, and accurately imputing missing data. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Variable selection for both outcomes and predictors: sparse multivariate principal covariates regression.
- Author
-
Park, Soogeun, Ceulemans, Eva, and Van Deun, Katrijn
- Subjects
INDEPENDENT variables, PARAMETERS (Statistics), LEAST squares, REGRESSION analysis, PREDICTION models
- Abstract
Datasets comprising large sets of both predictor and outcome variables are becoming more widely used in research. In addition to the well-known problems of model complexity and predictor variable selection, predictive modelling with such large data also presents a relatively novel and under-studied challenge of outcome variable selection. Certain outcome variables in the data may not be adequately predicted by the given sets of predictors. In this paper, we propose the method of Sparse Multivariate Principal Covariates Regression that addresses these issues altogether by expanding the Principal Covariates Regression model to incorporate sparsity penalties on both predictor and outcome variables. Our method is one of the first to perform variable selection for both predictors and outcomes simultaneously. Moreover, by relying on summary variables that explain the variance in both predictor and outcome variables, the method offers a sparse and succinct model representation of the data. In a simulation study, the method performed better than methods with similar aims, such as sparse Partial Least Squares, at prediction of the outcome variables and recovery of the population parameters. Lastly, we applied the method to an empirical dataset to illustrate its application in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. CLUSTERDC: A New Density-Based Clustering Algorithm and its Application in a Geological Material Characterization Workflow.
- Author
-
Meyrieux, Maximilien, Hmoud, Samer, van Geffen, Pim, and Kaeter, David
- Subjects
PROBABILITY density function, ORE deposits, WASTE products, HIERARCHICAL clustering (Cluster analysis), MINING corporations
- Abstract
The ore and waste materials extracted from a mineral deposit during the mining process can have significant variations in their physical and chemical characteristics. The current approaches to geological material characterization are often subjective and usually involve a significant human workload, as there is no optimized, well-defined, and robust methodology to perform this task. This paper proposes a robust, data-driven workflow for geological material characterization. The methodology involves selecting relevant features as a starting point to discriminate between material types. The workflow then employs a robust, state-of-the-art nonlinear dimension reduction (DR) algorithm when the dataset is multidimensional to obtain a two-dimensional embedding. From this two-dimensional embedding, a kernel density estimation (KDE) function is derived. Subsequently, a new clustering algorithm, named ClusterDC, is employed to generate clusters from the KDE function, accurately reflecting geological material types while achieving scalable clustering performance on large drillhole datasets. ClusterDC is a density-based clustering algorithm capable of delineating and ranking high-density zones corresponding to clusters of data samples from a two-dimensional KDE function. The algorithm reduces subjectivity by automatically determining optimal cluster numbers and minimizing reliance on hyperparameters. It also offers hierarchical and flexible clustering, allowing users to group or split clusters, optimally reassign data samples, and identify cluster core points as well as potential outliers. Two case studies were carried out to test the algorithm and demonstrate its application to geochemical drill-core assay data. The results of these case studies demonstrate that the application of ClusterDC in the presented workflow supports the characterization of geological material types based on multi-element geochemistry and thus has the potential to help mining companies optimize downstream processes and mitigate technical risks by improving their understanding of their orebodies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
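The entry above clusters a two-dimensional embedding through the modes of a kernel density estimate. ClusterDC itself is not reproduced here; the sketch below uses mean-shift clustering, a standard KDE mode-seeking method from scikit-learn, as a conceptual stand-in, with synthetic 2-D embedding coordinates replacing the geochemical data.

```python
# Sketch: density-based clustering of a 2-D embedding via KDE mode seeking (mean shift).
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

emb, _ = make_blobs(n_samples=800, centers=4, cluster_std=0.8, random_state=0)  # stand-in 2-D embedding

bw = estimate_bandwidth(emb, quantile=0.1, random_state=0)
labels = MeanShift(bandwidth=bw).fit_predict(emb)
print("clusters found:", len(np.unique(labels)))
```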
38. Robustifying and simplifying high-dimensional regression with applications to yearly stock return and telematics data.
- Author
-
Marchese, Malvina, Martínez-Miranda, María Dolores, Nielsen, Jens Perch, and Scholz, Michael
- Subjects
RATE of return on stocks, INSURANCE companies, TELEMATICS, PREDICTION models, FINANCIAL services industry
- Abstract
The availability of many variables with predictive power makes their selection in a regression context difficult. This study considers robust and understandable low-dimensional estimators as building blocks to improve overall predictive power by optimally combining these building blocks. Our new algorithm is based on generalized cross-validation and builds a predictive model step-by-step from a simple mean to more complex predictive combinations. Empirical applications to annual financial returns and actuarial telematics data show its usefulness in the financial and insurance industries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Locally sparse and robust partial least squares in scalar-on-function regression.
- Author
-
Gurer, Sude, Shang, Han Lin, Mandal, Abhijit, and Beyaztas, Ufuk
- Abstract
We present a novel approach for estimating a scalar-on-function regression model, leveraging a functional partial least squares methodology. Our proposed method involves computing the functional partial least squares components through sparse partial robust M regression, facilitating robust and locally sparse estimations of the regression coefficient function. This strategy delivers a robust decomposition for the functional predictor and regression coefficient functions. After the decomposition, model parameters are estimated using a weighted loss function, incorporating robustness through iterative reweighting of the partial least squares components. The robust decomposition feature of our proposed method enables the robust estimation of model parameters in the scalar-on-function regression model, ensuring reliable predictions in the presence of outliers and leverage points. Moreover, it accurately identifies zero and nonzero sub-regions where the slope function is estimated, even in the presence of outliers and leverage points. We assess our proposed method’s estimation and predictive performance through a series of Monte Carlo experiments and an empirical dataset—that is, data collected in relation to oriented strand board. Compared to existing methods, our proposed method performs favorably. Notably, our robust procedure exhibits superior performance in the presence of outliers while maintaining competitiveness in their absence. Our method has been implemented in the robsfplsr package. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Feature Vector Effectiveness Evaluation for Pattern Selection in Computational Lithography.
- Author
-
Feng, Yaobin, Liu, Jiamin, Jiang, Hao, and Liu, Shiyuan
- Subjects
FAST Fourier transforms, LITHOGRAPHY, KEY performance indicators (Management), CALIBRATION
- Abstract
Pattern selection is crucial for optimizing the calibration process of optical proximity correction (OPC) models in computational lithography. However, it remains a challenge to achieve a balance between representative coverage and computational efficiency. This work presents a comprehensive evaluation of the feature vectors' (FVs') effectiveness in pattern selection for OPC model calibration, leveraging key performance indicators (KPIs) based on Kullback–Leibler divergence and distance ranking. Through the construction of autoencoder-based FVs and fast Fourier transform (FFT)-based FVs, we compare their efficacy in capturing critical pattern features. Validation experimental results indicate that autoencoder-based FVs, particularly augmented with the lithography domain knowledge, outperform FFT-based counterparts in identifying anomalies and enhancing lithography model performance. These results also underscore the importance of adaptive pattern representation methods in calibrating the OPC model with evolving complexities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. GraphPCA: a fast and interpretable dimension reduction algorithm for spatial transcriptomics data
- Author
-
Jiyuan Yang, Lu Wang, Lin Liu, and Xiaoqi Zheng
- Subjects
Spatial transcriptomics ,Dimension reduction ,PCA ,Spatial domain detection ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Abstract The rapid advancement of spatial transcriptomics technologies has revolutionized our understanding of cell heterogeneity and intricate spatial structures within tissues and organs. However, the high dimensionality and noise in spatial transcriptomic data present significant challenges for downstream data analyses. Here, we develop GraphPCA, an interpretable and quasi-linear dimension reduction algorithm that leverages the strengths of graphical regularization and principal component analysis. Comprehensive evaluations on simulated and multi-resolution spatial transcriptomic datasets generated from various platforms demonstrate the capacity of GraphPCA to enhance downstream analysis tasks including spatial domain detection, denoising, and trajectory inference compared to other state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
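GraphPCA couples graph regularization with principal component analysis. The sketch below shows a simplified stand-in, assuming Laplacian smoothing of spot-level expression over a spatial k-nearest-neighbor graph followed by ordinary PCA; the neighbor count, smoothing weight, and simulated counts are illustrative, and this is not the published GraphPCA algorithm.

```python
# Simplified graph-regularized PCA sketch for spatial data:
# smooth spots toward their spatial neighbors, then run ordinary PCA.
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_spots, n_genes = 500, 200
coords = rng.uniform(0, 10, size=(n_spots, 2))           # spatial locations
X = np.log1p(rng.poisson(2.0, size=(n_spots, n_genes)).astype(float))
X -= X.mean(axis=0)

# Spatial kNN graph and its (symmetrized) Laplacian.
A = kneighbors_graph(coords, n_neighbors=6, mode="connectivity")
A = 0.5 * (A + A.T)
L = laplacian(A).toarray()

# Graph smoothing: Z solves (I + lam * L) Z = X, pulling neighbors together.
lam = 1.0
Z = np.linalg.solve(np.eye(n_spots) + lam * L, X)

embedding = PCA(n_components=20).fit_transform(Z)
print("embedding shape:", embedding.shape)
```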
42. Robustifying and simplifying high-dimensional regression with applications to yearly stock return and telematics data
- Author
-
Malvina Marchese, María Dolores Martínez-Miranda, Jens Perch Nielsen, and Michael Scholz
- Subjects
Forecasting ,Non-linear prediction ,Stock returns ,Dimension reduction ,Telematics ,Public finance ,K4430-4675 ,Finance ,HG1-9999 - Abstract
Abstract The availability of many variables with predictive power makes their selection in a regression context difficult. This study considers robust and understandable low-dimensional estimators as building blocks to improve overall predictive power by optimally combining these building blocks. Our new algorithm is based on generalized cross-validation and builds a predictive model step-by-step from a simple mean to more complex predictive combinations. Empirical applications to annual financial returns and actuarial telematics data show its usefulness in the financial and insurance industries.
- Published
- 2024
- Full Text
- View/download PDF
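The record above grows a predictive model step by step, scored by generalized cross-validation, starting from a simple mean. The sketch below illustrates that general idea with a generic GCV-guided forward selection over simulated covariates, using the standard GCV formula for an ordinary least squares fit; the data and the one-at-a-time selection rule are illustrative assumptions, not the authors' algorithm.

```python
# Generic GCV-guided forward selection: start from the mean, add one predictor
# at a time while the generalized cross-validation score keeps improving.
import numpy as np

def gcv_score(y, X_cols):
    """GCV = (RSS / n) / (1 - p / n)^2 for an OLS fit on the chosen columns."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + X_cols)      # intercept = "simple mean"
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    p = X.shape[1]
    return (rss / n) / (1.0 - p / n) ** 2

rng = np.random.default_rng(7)
n, k = 300, 12
X = rng.normal(size=(n, k))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=1.0, size=n)

selected, best = [], gcv_score(y, [])
improved = True
while improved:
    improved = False
    for j in range(k):
        if j in selected:
            continue
        score = gcv_score(y, [X[:, i] for i in selected + [j]])
        if score < best:
            best, best_j, improved = score, j, True
    if improved:
        selected.append(best_j)

print("selected predictors:", selected, "final GCV:", round(best, 3))
```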
43. Corporate Bond Market Distress.
- Author
-
Boyarchenko, Nina, Crump, Richard K., Kovner, Anna, and Shachar, Or
- Subjects
CORPORATE bonds ,CAPITAL market ,STATISTICAL hypothesis testing ,CREDIT ,DIMENSION reduction (Statistics) - Abstract
We link bond market functioning to future economic activity through a new measure, the Corporate Bond Market Distress Index (CMDI). The CMDI coalesces metrics from primary and secondary markets in real time, offering a unified measure to capture access to debt capital markets. The index correctly identifies periods of distress and predicts future realizations of commonly used measures of market functioning, while the converse is not the case. We show that disruptions in access to corporate bond markets have an economically material, statistically significant impact on the real economy, even after controlling for standard predictors including credit spreads. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. The evaluation of relationships between milk composition traits and breeds with categorical principal component analysis in Akkaraman and Awasi sheep
- Author
-
Cak, Bahattin, Keskin, Siddik, and Aydemir, Gokhan
- Published
- 2024
- Full Text
- View/download PDF
45. Clustering and unconstrained ordination with Dirichlet process mixture models
- Author
-
Christian Stratton, Andrew Hoegh, Thomas J. Rodhouse, Jennifer L. Green, Katharine M. Banner, and Kathryn M. Irvine
- Subjects
clustering ,dimension reduction ,Dirichlet process mixture models ,hierarchical Bayesian models ,latent variable models ,ordination ,Ecology ,QH540-549.5 ,Evolution ,QH359-425 - Abstract
Abstract Assessment of similarity in species composition or abundance across sampled locations is a common goal in multi‐species monitoring programs. Existing ordination techniques provide a framework for clustering sample locations based on species composition by projecting high‐dimensional community data into a low‐dimensional, latent ecological gradient representing species composition. However, these techniques require specification of the number of distinct ecological communities present in the latent space, which can be difficult to determine in advance. We develop an ordination model capable of simultaneous clustering and ordination that allows for estimation of the number of clusters present in the latent ecological gradient. This model draws latent coordinates for each sample location from a Dirichlet process mixture model, affording researchers probabilistic statements about the number of clusters present in the latent ecological gradient. The model is compared to existing methods for simultaneous clustering and ordination via simulation and applied to two empirical datasets; JAGS code to fit the proposed model is provided in an appendix. The first dataset concerns presence‐absence records of fish in the Doubs river in eastern France and the second dataset describes presence‐absence records of plant species in Craters of the Moon National Monument and Preserve (CRMO) in Idaho, USA. Results from both analyses align with existing ecological gradients at each location. Development of the Dirichlet process ordination model provides wildlife managers with data‐driven inferences about the number of distinct communities present across monitored locations, allowing for more cost‐effective monitoring and reliable decision‐making for conservation management.
- Published
- 2024
- Full Text
- View/download PDF
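The record above infers the number of ecological communities by drawing latent ordination coordinates from a Dirichlet process mixture. The sketch below is a simplified two-stage stand-in: PCA ordination of simulated presence-absence data followed by a truncated Dirichlet-process Gaussian mixture from scikit-learn. The simulated communities and the truncation level are assumptions, and this is not the authors' joint hierarchical Bayesian model (which is fitted in JAGS).

```python
# Two-stage stand-in: ordinate sites, then cluster the ordination scores with a
# (truncated) Dirichlet-process Gaussian mixture so the number of clusters is
# inferred from the data rather than fixed in advance.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
n_sites, n_species, true_groups = 120, 40, 3

# Simulated presence-absence data: three latent communities with different
# species occurrence probabilities.
group = rng.integers(true_groups, size=n_sites)
probs = rng.uniform(0.05, 0.9, size=(true_groups, n_species))
Y = (rng.random((n_sites, n_species)) < probs[group]).astype(float)

scores = PCA(n_components=2).fit_transform(Y)   # latent "ecological gradient"

dpgmm = BayesianGaussianMixture(
    n_components=10,                            # truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(scores)

labels = dpgmm.predict(scores)
print("clusters actually used:", len(np.unique(labels)), "of", dpgmm.n_components)
```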
46. Confirmatory Factor Analysis to Reduce the Knowledge and Economic Dimensions of the Behavior of Cerebral Palsy Parents
- Author
-
Al Um Aniswatun Khasanah, Sri Yuliana, Dhofirul Fadhil Dzil Ikrom, Sangidatus Sholiha, and Wardhani Utami Dewi
- Subjects
children cerebral palsy ,confirmatory factor analysis ,dimension reduction ,parental behavior. ,Mathematics ,QA1-939 - Abstract
This research is important for understanding the behavior of parents of children with Cerebral Palsy (CP) by reducing the complexity of the knowledge and economic dimensions, providing a structured approach to identifying factors that influence the psychology of children with CP. The study aimed to reduce the socio-economic and knowledge dimensions of the behavior of parents of children with CP using confirmatory factor analysis (CFA). A quantitative approach with a cross-sectional design was used, and purposive sampling selected 200 parents of children with CP from various backgrounds. The instrument was a 28-item questionnaire distributed online. Data analysis was carried out with CFA, including the determinant test, Kaiser-Meyer-Olkin (KMO) test, Bartlett test, Principal Components Analysis (PCA), and grouping of variables based on the identified factors. Research in Lampung showed a slightly higher number of boys with CP, with the majority aged 1-5 years. Factor analysis identified three main dimensions: parental knowledge about CP, family economic situation, and parental behavior; better knowledge and a stable economic situation were positively correlated with better parental behavior in caring for children with CP. Many parents experience high levels of stress due to the physical, emotional, and financial burden of caring for a child with CP. In conclusion, the 28 statement items were reduced to three dimensions (knowledge, economics, and behavior), each significantly related to the behavior of parents of children with CP. These findings suggest interventions to increase parental knowledge about CP and family economic stability in order to improve parental participation in child care.
- Published
- 2024
- Full Text
- View/download PDF
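The record above checks sampling adequacy with KMO and Bartlett tests before reducing 28 questionnaire items to three factors. The sketch below runs that generic pipeline on simulated Likert-style responses with the third-party factor_analyzer package (assumed installed); the item count and the three-factor choice mirror the abstract, but the data and all settings are purely illustrative.

```python
# Sketch of the usual questionnaire reduction pipeline: adequacy checks
# (KMO, Bartlett) followed by extraction of a fixed number of factors.
# Requires the third-party `factor_analyzer` package.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

rng = np.random.default_rng(42)
n_resp, n_items, n_factors = 200, 28, 3

# Simulated Likert-style responses driven by three latent factors.
latent = rng.normal(size=(n_resp, n_factors))
loadings = rng.uniform(0.4, 0.9, size=(n_factors, n_items))
raw = latent @ loadings + rng.normal(scale=0.7, size=(n_resp, n_items))
items = pd.DataFrame(np.clip(np.round(raw + 3), 1, 5),
                     columns=[f"item_{i+1}" for i in range(n_items)])

chi2, p_value = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_total = calculate_kmo(items)
print(f"Bartlett p-value: {p_value:.3g}, overall KMO: {kmo_total:.2f}")

fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
fa.fit(items)
print("proportion of variance per factor:", fa.get_factor_variance()[1].round(2))
```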
47. Hilbert-curve assisted structure embedding method
- Author
-
Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, and Alexander G. Godfrey
- Subjects
Chemical space embedding ,Clustering ,Hilbert-curve ,Scaffold-Keys ,HCASE ,Dimension reduction ,Information technology ,T58.5-58.64 ,Chemistry ,QD1-999 - Abstract
Abstract Motivation Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the "landscape" on the map is prone to "rearrangement" when embedding different sets of compounds. Results In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of "reference scaffolds". These scaffolds are sorted according to the medicinal chemistry inspired Scaffold-Keys algorithm found in prior art. Next, the ordered scaffolds are mapped to a line which is folded into a higher dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. The embedding of a compound happens by locating its most similar reference scaffold in the pseudo-Hilbert-Curve and assuming the respective position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. Subjects of embeddings were compounds of the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database. Scientific contribution The novelty of the HCASE method lies in generating robust and intuitive chemical space embeddings that are reflective of a medicinal chemist's reasoning, and the precedential use of a space-filling (Hilbert) curve in the process. Availability https://github.com/ncats/hcase
- Published
- 2024
- Full Text
- View/download PDF
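HCASE sorts reference scaffolds with the Scaffold-Keys logic, folds the sorted line into 2D with a pseudo-Hilbert curve, and places each compound at the cell of its most similar reference scaffold. The sketch below covers only the folding step, using the classic Hilbert-curve distance-to-coordinate conversion; the placeholder scaffold names and grid order are assumptions, and this is not the implementation at https://github.com/ncats/hcase.

```python
# Fold an ordered list of reference scaffolds onto a 2D pseudo-Hilbert curve:
# scaffold rank along the sort order -> (x, y) cell on the map.
def hilbert_d2xy(order, d):
    """Classic Hilbert-curve conversion from 1D distance d to 2D (x, y)."""
    x = y = 0
    s, t = 1, d
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

# Placeholder scaffolds already sorted by some medicinal-chemistry key
# (in HCASE this ordering comes from the Scaffold-Keys algorithm).
reference_scaffolds = [f"scaffold_{i:03d}" for i in range(64)]

order = 3                                # 2^3 x 2^3 = 64 grid cells
positions = {scaf: hilbert_d2xy(order, rank)
             for rank, scaf in enumerate(reference_scaffolds)}

# A compound would be embedded at the cell of its most similar reference
# scaffold; here a lookup by name stands in for the similarity search.
print("scaffold_010 sits at grid cell", positions["scaffold_010"])
```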
48. Co-Active Subspace Methods for the Joint Analysis of Adjacent Computer Models.
- Author
-
Rumsey, Kellin N., Hardy, Zachary K., Ahrens, Cory, and Vander Wiel, Scott
- Subjects
- *
COMPUTER simulation , *RATE setting , *GENERALIZATION , *PHYSICS - Abstract
Active subspace (AS) methods are a valuable tool for understanding the relationship between the inputs and outputs of a physics simulation. In this article, an elegant generalization of the traditional AS method is developed to assess the co-activity of two computer models. This generalization, which we refer to as a Co-Active Subspace (Co-AS) Method, allows for the joint analysis of two or more computer models, enabling thorough exploration of the alignment (or non-alignment) of the respective gradient spaces. We define co-active directions, co-sensitivity indices, and a scalar “concordance” metric (and complementary “discordance” pseudo-metric), and we demonstrate that these are powerful tools for understanding the behavior of a class of computer models, especially when used to supplement traditional AS analysis. Details for efficient estimation of the Co-AS and an accompanying R package (concordance) are provided. Practical application is demonstrated through analyzing a set of simulated rate stick experiments for PBX 9501, a high explosive, offering insights into complex model dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
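The record above builds on active subspace methods, which summarize a simulator through the eigenvectors of the expected outer product of its gradient. The sketch below shows the standard single-model active subspace estimate for two toy functions, plus a crude cosine-style alignment between their leading directions; that alignment number is only an illustrative stand-in for comparing gradient spaces, not the paper's concordance metric, and the accompanying concordance R package is not used.

```python
# Standard active-subspace estimate for two toy simulators, plus a crude
# check of how well their leading gradient directions line up.
import numpy as np

def active_subspace(grad_fn, dim, n_samples=2000, k=1, rng=None):
    """Leading eigenvectors of C = E[grad f grad f^T], estimated by Monte Carlo."""
    if rng is None:
        rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    G = np.array([grad_fn(x) for x in X])
    C = G.T @ G / n_samples
    eigvals, eigvecs = np.linalg.eigh(C)
    idx = np.argsort(eigvals)[::-1]                 # descending eigenvalues
    return eigvecs[:, idx[:k]], eigvals[idx]

dim = 5
a = np.array([1.0, 0.8, 0.1, 0.0, 0.0])
b = np.array([0.9, 0.7, 0.0, 0.1, 0.0])

grad_f = lambda x: a * np.cos(a @ x)                # gradient of f(x) = sin(a . x)
grad_g = lambda x: 2.0 * (b @ x) * b                # gradient of g(x) = (b . x)^2

W_f, _ = active_subspace(grad_f, dim, k=1)
W_g, _ = active_subspace(grad_g, dim, k=1)

# Crude alignment of the two one-dimensional active subspaces, in [0, 1].
alignment = abs((W_f.T @ W_g).item())
print("leading-direction alignment:", round(alignment, 3))
```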
49. A Comprehensive Comparative Study of Handcrafted Descriptors in Face and Palmprint Recognition.
- Author
-
Karanwal, Shekhar and Arora, Nitin
- Subjects
- *
PALMPRINT recognition , *FEATURE extraction , *IMAGE recognition (Computer vision) , *COMPACTING , *NEIGHBORHOODS - Abstract
For different applications, various handcrafted descriptors have been reported in the literature. Their results are satisfactory for the applications for which they were proposed, and comparative studies in the literature discuss these handcrafted descriptors. The main drawback noticed in these studies is that each is restricted to a single application. This work fills that gap by providing a comparative study of 10 handcrafted descriptors across two different applications: face recognition (FR) and palmprint recognition (PR). The 10 descriptors analyzed are local binary pattern (LBP), horizontal elliptical LBP (HELBP), VELBP, robust LBP (RLBP), local phase quantization (LPQ), multiscale block zigzag LBP (MB-ZZLBP), neighborhood mean LBP (NM-LBP), directional threshold LBP (DT-LBP), median robust extended LBP based on neighborhood intensity (MRELBP-NI) and radial difference LBP (RD-LBP). Global feature extraction is performed for all 10 descriptors, with PCA and SVMs used for compaction and matching. Experiments are conducted on ORL, GT, IITD-TP and TP; the first two are face datasets and the latter two are palmprint datasets. On the face datasets, the descriptor attaining the best recognition accuracy is DT-LBP, and on the palmprint datasets it is MB-ZZLBP, which surpasses the accuracy of the other compared methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
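The record above evaluates LBP-family descriptors with global feature extraction, PCA compaction, and SVM matching. The sketch below wires together one representative member of that family, a uniform LBP histogram from scikit-image, with PCA and a linear SVM on synthetic two-class patches; the datasets named in the abstract are not used, and the patch generator and histogram settings are assumptions.

```python
# One representative pipeline from the descriptor family compared in the study:
# LBP histograms as the global descriptor, PCA for compaction, SVM for matching.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def lbp_histogram(img, P=8, R=1.0):
    """Uniform LBP code histogram used as a global texture descriptor."""
    img8 = np.uint8(255 * (img - img.min()) / (np.ptp(img) + 1e-12))
    codes = local_binary_pattern(img8, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(5)
n_per_class, size = 60, 64
xx, yy = np.meshgrid(np.arange(size), np.arange(size))

# Synthetic two-class patches: smooth blobs vs striped textures.
class_a = [np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / 400.0)
           + 0.1 * rng.normal(size=(size, size))
           for c in rng.uniform(20, 44, n_per_class)]
class_b = [np.sin(xx / p) + 0.1 * rng.normal(size=(size, size))
           for p in rng.uniform(2, 6, n_per_class)]

X = np.array([lbp_histogram(img) for img in class_a + class_b])
y = np.array([0] * n_per_class + [1] * n_per_class)

clf = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="linear"))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```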
50. TWO-SCALE FINITE ELEMENT APPROXIMATION OF A HOMOGENIZED PLATE MODEL.
- Author
-
RUMPF, MARTIN, SIMON, STEFAN, and SMOCH, CHRISTOPH
- Subjects
- *
ELASTIC deformation , *PARTIAL differential equations , *QUADRATIC forms , *TRIANGLES , *ASYMPTOTIC homogenization - Abstract
This paper studies the discretization of a homogenization and dimension reduction model for the elastic deformation of microstructured thin plates proposed by Hornung, Neukamm, and Velčić [Calc. Var. Partial Differential Equations, 51 (2014), pp. 677-699]. Thereby, a nonlinear bending energy is based on a homogenized quadratic form which acts on the second fundamental form associated with the elastic deformation. Convergence is proved for a multi-affine finite element discretization of the involved three-dimensional microscopic cell problems and a discrete Kirchhoff triangle discretization of the two-dimensional isometry-constrained macroscopic problem. Finally, the convergence properties are numerically verified in selected test cases and qualitatively compared with deformation experiments for microstructured sheets of paper. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
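The record above discretizes an isometry-constrained bending energy whose integrand is a homogenized quadratic form of the second fundamental form. As a schematic reminder of that structure only (notation chosen here for illustration, not the exact functional of the cited paper), the macroscopic problem has the following shape:

```latex
% Schematic macroscopic problem: minimize a homogenized quadratic form Q_hom of
% the second fundamental form II_y over isometric deformations y of the plate
% mid-surface omega (illustrative notation, not the paper's exact functional).
\min_{y}\; E[y] \;=\; \int_{\omega} Q_{\mathrm{hom}}\big(\mathrm{II}_y(x)\big)\,\mathrm{d}x,
\qquad \text{subject to } (\nabla y)^{\top}\nabla y = I_{2} \ \text{a.e. in } \omega,
\quad \text{where } (\mathrm{II}_y)_{ij} = \partial_i\partial_j y \cdot \nu_y
\ \text{and } \nu_y = \partial_1 y \wedge \partial_2 y .
```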