Search Results (981 results)
2. Semiparametric accelerated failure time models under unspecified random effect distributions.
- Author
- Seo, Byungtae and Ha, Il Do
- Subjects
- RANDOM effects model; GAUSSIAN distribution
- Abstract
Accelerated failure time (AFT) models with random effects, a useful alternative to frailty models, have been widely used for analyzing clustered (or correlated) time-to-event data. In the AFT model, the distribution of the unobserved random effect is conventionally assumed to be parametric, often normal. Although a misspecified random-effect distribution is known to have little effect on regression parameter estimates, in some cases the impact of such misspecification is not negligible, and the problem can become worse when the focus extends to quantities associated with the random effects. In this paper, we propose a semiparametric maximum likelihood approach in which the random-effect distribution under the AFT model is left unspecified. We provide a feasible algorithm to estimate the random-effect distribution as well as the model parameters. Comprehensive simulation studies demonstrate the effectiveness of the proposed method across a range of random-effect distribution types (discrete or continuous) and under heavy censoring, and real-world data examples further illustrate its efficacy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Heterogeneous quantile regression for longitudinal data with subgroup structures.
- Author
- Hou, Zhaohan and Wang, Lei
- Subjects
- QUANTILE regression; PANEL analysis; GENERALIZED estimating equations; REGRESSION analysis; STATISTICAL learning; PARAMETER estimation
- Abstract
Subgroup analysis for modeling longitudinal data with heterogeneity across individuals has drawn attention in modern statistical learning. In this paper, we focus on the heterogeneous quantile regression model and propose to achieve variable selection, heterogeneous subgrouping, and parameter estimation simultaneously by using smoothed generalized estimating equations in conjunction with the multi-directional separation penalty. The proposed method allows individuals to be divided into multiple subgroups for different heterogeneous covariates, so that estimation efficiency can be gained by incorporating the individual correlation structure and sharing information within subgroups. A data-driven procedure based on a modified BIC is applied to estimate the number of subgroups. Theoretical properties of the oracle estimator given the true underlying subpopulation information are first provided, and the proposed estimator is then shown to be equivalent to the oracle estimator under some conditions. The finite-sample performance of the proposed estimators is studied through simulations, and an application to an AIDS dataset is also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
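As an illustrative aside for readers new to this literature: every quantile regression method, including the subgroup models above, is built on the check (pinball) loss. The minimal sketch below is not the authors' smoothed-GEE procedure; the function names and the grid-search estimator are purely illustrative.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def fit_quantile_location(y, tau):
    """Estimate the tau-th quantile of y by minimizing the total check loss
    over the observed data points (a crude but transparent grid search)."""
    grid = np.sort(y)
    losses = [check_loss(y - g, tau).sum() for g in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
y = rng.normal(size=1001)
med = fit_quantile_location(y, 0.5)  # at tau = 0.5 the minimizer is the sample median
```

At tau = 0.5 the check loss reduces to half the absolute loss, so the minimizer over the data points coincides with the sample median; other tau values tilt the loss and pick out other quantiles.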
4. Integrated subgroup identification from multi-source data.
- Author
- Shao, Lihui, Wu, Jiaqi, Zhang, Weiping, and Chen, Yu
- Abstract
Subgroup identification is crucial in dealing with heterogeneous populations and has wide applications in areas such as clinical trials and market segmentation. With the prevalence of multi-source data, there is a practical need to identify subgroups based on multi-source data. This paper proposes a working-independence pseudo-loglikelihood and integrates the parameters of each source into a pairwise fusion penalty for simultaneous parameter estimation and subgroup identification. To implement the proposed method, an alternating direction method of multipliers (ADMM) algorithm is derived. Furthermore, the weak oracle properties of parameter estimation are established, illustrating that the latent subgroups can be consistently identified. Finally, numerical simulations and an analysis of a randomized trial on reduced nicotine standards for cigarettes are conducted to evaluate the performance of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. On bootstrap consistency of MAVE for single index models.
- Author
- Zhang, Hong-Fan, Huang, Lei, and Liu, Lian-Lian
- Subjects
- STATISTICAL bootstrapping; MINIMUM variance estimation; ASYMPTOTIC distribution; CONFORMANCE testing
- Abstract
This paper concerns the bootstrap consistency of the minimum average variance estimation (MAVE) method for the single index model. It shows that the conditional wild bootstrap estimator of the index parameter shares the same asymptotic covariance as the original MAVE estimator; thus, the asymptotic distribution can be accurately estimated by the proposed wild bootstrap method. As an application, the paper proposes a conditional Wald-type test for the index parameter. Simulations show that the conditional bootstrap based test is more powerful than the test based on the traditional plug-in covariance estimator. A real data analysis is also provided to demonstrate the effectiveness of the bootstrap method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
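The wild bootstrap idea underlying this paper can be sketched in a much simpler setting than MAVE: ordinary least squares with heteroscedastic errors. The sketch below resamples responses by multiplying fitted residuals with Rademacher weights; it is not the paper's conditional bootstrap for single index models, and the function name and settings are illustrative.

```python
import numpy as np

def wild_bootstrap_se(X, y, B=500, seed=0):
    """Wild bootstrap standard errors for OLS coefficients.

    Resamples y*_i = fitted_i + resid_i * w_i with Rademacher weights w_i,
    which preserves the conditional heteroscedasticity of the residuals.
    """
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    boot = np.empty((B, X.shape[1]))
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher multipliers
        boot[b], *_ = np.linalg.lstsq(X, fitted + resid * w, rcond=None)
    return boot.std(axis=0)

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
se = wild_bootstrap_se(X, y)
```

Because each bootstrap draw only flips residual signs, the scheme is valid without assuming identically distributed errors, which is the same motivation the paper exploits for MAVE.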
6. A novel MM algorithm and the mode-sharing method in Bayesian computation for the analysis of general incomplete categorical data.
- Author
- Tian, Guo-Liang, Liu, Yin, Tang, Man-Lai, and Li, Tao
- Subjects
- CATEGORIES (Mathematics); BAYESIAN analysis; MAXIMUM likelihood statistics; GIBBS sampling; ALGORITHMS
- Abstract
Incomplete categorical data often occur in fields such as biomedicine, epidemiology, psychology, and sports. In this paper, we first introduce a novel minorization–maximization (MM) algorithm to calculate the maximum likelihood estimates (MLEs) of parameters and the posterior modes for the analysis of general incomplete categorical data. Although the data augmentation (DA) algorithm and Gibbs sampling, the stochastic counterparts of the expectation–maximization (EM) and ECM algorithms, are well developed, little work has so far been done on creating stochastic versions of existing MM algorithms. This is the first paper to propose a mode-sharing method in Bayesian computation for general incomplete categorical data, developing a new acceptance–rejection (AR) algorithm aided by the proposed MM algorithm. The key idea is to construct a class of envelope densities indexed by a working parameter and to identify a specific envelope density that can overcome the four drawbacks associated with the traditional AR algorithm. The proposed mode-sharing based AR algorithm has three significant characteristics: (I) it automatically establishes a family of envelope densities {g_λ(·) : λ ∈ S_λ} indexed by a working parameter λ, where each member of the family shares its mode with the posterior density; (II) with a one-dimensional grid search over the finite interval S_λ, it identifies an optimal working parameter λ_opt by maximizing the theoretical acceptance probability, yielding a best easy-to-sample envelope density g_{λ_opt}(·) that is more dispersive than the posterior density; (III) it obtains the optimal envelope constant c_opt either by using the mode-sharing theorem (so that high-dimensional optimization is completely avoided) or by using the proposed MM algorithm again. Finally, a toy model and three real data sets are used to illustrate the proposed methodologies. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
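The classical acceptance–rejection mechanism that this paper refines can be illustrated with a textbook example: sampling N(0,1) under a Laplace envelope. Both densities share their mode at 0, loosely echoing the paper's mode-sharing idea, but this sketch is not the authors' algorithm and the envelope constant here is the standard textbook one.

```python
import numpy as np

def ar_sample_normal(n, seed=0):
    """Sample N(0,1) by acceptance-rejection with a Laplace(0,1) envelope.

    Both densities have their mode at 0; the optimal envelope constant is
    c = sqrt(2e/pi), attained where the density ratio peaks (|x| = 1).
    """
    rng = np.random.default_rng(seed)
    c = np.sqrt(2 * np.e / np.pi)
    out = []
    while len(out) < n:
        u = rng.uniform()
        x = rng.laplace(0.0, 1.0)                    # proposal from the envelope g
        f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # target density
        g = 0.5 * np.exp(-abs(x))                    # envelope density
        if u <= f / (c * g):                         # accept with probability f/(c g)
            out.append(x)
    return np.array(out)

draws = ar_sample_normal(5000)
```

The expected acceptance rate is 1/c ≈ 0.76; the paper's contribution is, in effect, a principled way of choosing an envelope family and constant so that this rate stays high for posteriors arising from incomplete categorical data.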
7. HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties.
- Author
- Wang, Cheng, Chen, Haozhe, and Jiang, Binyan
- Subjects
- ALGORITHMS; QUADRATIC differentials
- Abstract
This paper investigates the efficient solution of penalized quadratic regressions in high-dimensional settings. A novel and efficient algorithm for ridge-penalized quadratic regression is proposed, leveraging the matrix structures of the regression with interactions. Additionally, an alternating direction method of multipliers (ADMM) framework is developed for penalized quadratic regression with general penalties, including both single and hybrid penalty functions. The approach simplifies the calculations to basic matrix-based operations, making it appealing in terms of both memory storage and computational complexity for solving penalized quadratic regressions in high-dimensional settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
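The problem HiQR accelerates can be stated concretely: ridge regression on a design expanded with all pairwise interactions. The naive closed-form baseline below (explicit feature expansion plus a linear solve) is exactly what becomes infeasible in high dimensions; it is not the HiQR algorithm itself, and all names are illustrative.

```python
import numpy as np

def quadratic_features(X):
    """Expand X to [X, all pairwise products x_j * x_k for j <= k]."""
    n, p = X.shape
    cross = [X[:, j] * X[:, k] for j in range(p) for k in range(j, p)]
    return np.column_stack([X] + cross)

def ridge_fit(Z, y, lam):
    """Closed-form ridge solution (Z'Z + lam I)^{-1} Z'y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=100)
Z = quadratic_features(X)           # 4 main effects + 10 quadratic terms
beta = ridge_fit(Z, y, lam=1e-3)    # beta[0] ~ 1, interaction (1,2) ~ 2
```

With p covariates the expanded design has p + p(p+1)/2 columns, so both its storage and the O(d^3) solve blow up quadratically in p; avoiding that explicit expansion via matrix structure is the point of the paper.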
8. Two-sample test of stochastic block models.
- Author
- Wu, Qianyong and Hu, Jiang
- Subjects
- STOCHASTIC models; ASYMPTOTIC distribution
- Abstract
In this paper, we consider the problem of two-sample testing of large networks with community structures. A test statistic is proposed based on the maximum entry of the difference between the two adjacency matrices. The asymptotic null distribution is derived, and an asymptotic power guarantee against the alternative hypothesis is provided. Simulations and real data examples show that the proposed test statistic performs well. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
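A max-entry statistic of this flavor can be sketched for the simplest setting: two samples of independent networks drawn from edge-probability matrices, comparing standardized entrywise differences of the mean adjacency matrices. This is a simplified stand-in, not the authors' statistic or its calibration; all names and the sampling setup are illustrative.

```python
import numpy as np

def max_entry_stat(sample1, sample2):
    """Max absolute standardized entry of the difference of mean adjacency matrices.

    sample1, sample2: arrays of shape (m, n, n), each stacking m adjacency matrices.
    """
    m1, m2 = len(sample1), len(sample2)
    p1, p2 = sample1.mean(axis=0), sample2.mean(axis=0)
    var = p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2     # plug-in Bernoulli variances
    z = np.abs(p1 - p2) / np.sqrt(var + 1e-12)
    iu = np.triu_indices(z.shape[1], k=1)             # off-diagonal entries only
    return z[iu].max()

def draw(P, m, rng):
    """Draw m symmetric adjacency matrices with edge probabilities P."""
    U = rng.uniform(size=(m, *P.shape))
    A = np.triu((U < P).astype(float), 1)
    return A + A.transpose(0, 2, 1)                   # symmetric, zero diagonal

rng = np.random.default_rng(3)
P = np.full((20, 20), 0.3)
same = max_entry_stat(draw(P, 50, rng), draw(P, 50, rng))  # null: both samples from P
```

Under the null the statistic behaves like the maximum of many approximately standard normal entries, which is why the paper's null distribution is of extreme-value type.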
9. Multi-block alternating direction method of multipliers for ultrahigh dimensional quantile fused regression.
- Author
- Wu, Xiaofei, Ming, Hao, Zhang, Zhimin, and Cui, Zhenyu
- Subjects
- QUANTILE regression; REGRESSION analysis
- Abstract
In this paper, we consider a quantile fused LASSO regression model that combines the quantile regression loss with the fused LASSO penalty. Intuitively, this model offers robustness to outliers, thanks to the quantile regression loss, while effectively recovering sparse and blockwise coefficients through the fused LASSO penalty. To adapt the proposed method to ultrahigh dimensional datasets, we introduce an iterative algorithm based on the multi-block alternating direction method of multipliers (ADMM). Moreover, we demonstrate the global convergence of the algorithm and derive comparable convergence rates. Importantly, our ADMM algorithm can easily be applied to solve various existing fused LASSO models. In terms of theoretical analysis, we establish that the quantile fused LASSO can achieve near-oracle properties with a practical penalty parameter and, additionally, possesses a sure screening property under a wide class of error distributions. The numerical experimental results support our claims, showing that the quantile fused LASSO outperforms existing fused regression models in robustness, particularly under heavy-tailed distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
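The ADMM splitting at the heart of such fused-penalty solvers can be sketched on the simplest relative of this model: the least-squares 1D fused-lasso signal approximator (total-variation denoising). This is not the authors' multi-block quantile algorithm; the loss, penalty layout, and parameter values below are illustrative.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fused_lasso_admm(y, lam, rho=1.0, iters=500):
    """ADMM for: minimize 0.5*||y - b||^2 + lam * sum_i |b_{i+1} - b_i|."""
    n = len(y)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # (n-1) x n first-difference matrix
    A = np.eye(n) + rho * D.T @ D              # fixed system matrix for the b-update
    z = np.zeros(n - 1)                        # auxiliary variable z ~ D b
    u = np.zeros(n - 1)                        # scaled dual variable
    for _ in range(iters):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))   # quadratic b-update
        z = soft(D @ b + u, lam / rho)                    # prox of the l1 fusion term
        u = u + D @ b - z                                 # dual ascent
    return b

y = np.concatenate([np.zeros(20), 3.0 * np.ones(20)])     # piecewise-constant signal
b = fused_lasso_admm(y, lam=0.5)
```

The split makes each subproblem cheap: the b-update is a fixed linear solve and the z-update is a closed-form soft threshold, which is exactly the structure the paper's multi-block variant scales to ultrahigh dimensions.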
10. Simultaneous confidence region of an embedded one-dimensional curve in multi-dimensional space.
- Author
- Yamazoe, Hiroya and Naito, Kanta
- Subjects
- CONFIDENCE regions (Mathematics)
- Abstract
This paper focuses on the simultaneous confidence region of a one-dimensional curve embedded in multi-dimensional space. Local linear regression is applied component-wise to each variable in multi-dimensional data, which yields an estimator of the one-dimensional curve. A simultaneous confidence region of the curve is proposed based on this estimator and theoretical results for the estimator and the region are developed under some reasonable assumptions. Practically efficient algorithms to determine the thickness of the region are also addressed. The effectiveness of the region is investigated through simulation studies and applications to artificial and real datasets, which reveal that the proposed simultaneous confidence region works well. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Variable selection for high-dimensional incomplete data.
- Author
- Liang, Lixing, Zhuang, Yipeng, and Yu, Philip L.H.
- Subjects
- MISSING data (Statistics); REGRESSION analysis; MULTICOLLINEARITY; ASIAN Americans; AT-risk behavior; AMERICAN studies
- Abstract
Regression analysis is often affected by high dimensionality, severe multicollinearity, and a large proportion of missing data. These problems may mask important relationships and even lead to biased conclusions. This paper proposes a novel, computationally efficient method that integrates data imputation and variable selection to address these issues. More specifically, the proposed method incorporates a new multiple imputation algorithm based on matrix completion (Multiple Accelerated Inexact Soft-Impute), a more stable and accurate new randomized lasso method (Hybrid Random Lasso), and a consistent way to integrate a variable selection method with multiple imputation. Compared to existing methodologies, the proposed approach offers greater accuracy and consistency through mechanisms that enhance robustness against different missing data patterns and sampling variations. The method is applied to analyze the Asian American minority subgroup in the 2017 National Youth Risk Behavior Survey, where key risk factors related to suicidal intention among Asian Americans are studied. Through simulations and real data analyses in various regression and classification settings, the proposed method demonstrates enhanced accuracy, consistency, and efficiency in both variable selection and prediction.
• Proposed MultiAIS-hrlasso, an efficient variable selection algorithm for high-dimensional incomplete data.
• MultiAIS offered better consistency under challenging missing patterns.
• The hrlasso showed accuracy in variable selection and prediction, particularly under high dimensionality and multicollinearity.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Nonparametric augmented probability weighting with sparsity.
- Author
- He, Xin, Mao, Xiaojun, and Wang, Zhonglei
- Subjects
- MACHINE learning; SAMPLE size (Statistics); CENTRAL limit theorem; FEATURE selection; PROBABILITY theory; KERNEL (Mathematics)
- Abstract
Nonresponse frequently arises in practice, and simply ignoring it may lead to erroneous inference. Moreover, the number of collected covariates may grow with the sample size in modern statistics, so parametric imputation or propensity score weighting usually leads to estimation inefficiency and introduces large variability when sparsity is not taken into account. In this paper, we propose a nonparametric imputation method with sparsity to estimate the finite population mean, where an efficient kernel-based method in the reproducing kernel Hilbert space is employed for estimation and sparse learning. Moreover, an augmented inverse probability weighting framework is adopted to achieve a central limit theorem for the proposed estimator under regularity conditions. The performance of the proposed method is also supported by several simulated examples and one real-life analysis.
• A kernel-based nonparametric sparse learning algorithm is conducted for feature selection.
• A nonparametric augmented inverse probability weighting framework is proposed.
• Theoretical properties of the proposed estimators are investigated.
• The proposed estimator outperforms its competitors numerically.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
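The augmented inverse probability weighting (AIPW) form adopted in this paper can be written down in a few lines for the population-mean problem. The sketch below uses simple parametric stand-ins for the paper's kernel-based nonparametric fits; the function names and the simulated models are illustrative only.

```python
import numpy as np

def aipw_mean(y, r, prob, pred):
    """AIPW estimator of the population mean of y under nonresponse.

    y:    outcomes (meaningful only where r == 1)
    r:    response indicator (1 observed, 0 missing)
    prob: estimated response probabilities pi_i
    pred: predicted outcomes m_i from an outcome model
    Doubly robust: consistent if either prob or pred is correctly specified.
    """
    return np.mean(pred + r * (y - pred) / prob)

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)          # true population mean of y is 2
pi = 1 / (1 + np.exp(-(0.5 + x)))         # true response probabilities
r = (rng.uniform(size=n) < pi).astype(float)
m = 2.0 + x                               # a correctly specified outcome model
est = aipw_mean(y, r, pi, m)
```

The augmentation term r*(y - pred)/prob corrects the plug-in prediction mean, which is what yields the central limit theorem the paper establishes with nonparametric components plugged in.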
13. Bayesian boundary trend filtering.
- Author
- Onizuka, Takahiro, Iwashige, Fumiya, and Hashimoto, Shintaro
- Subjects
- GIBBS sampling; GAUSSIAN distribution; BOLTZMANN factor; DATA augmentation; CLIMATOLOGY; FILTERS & filtration
- Abstract
Estimating boundary curves has many applications in fields such as economics, climate science, and medicine. Bayesian trend filtering has been developed as a locally adaptive smoothing method to estimate the non-stationary trend of data. This paper develops a Bayesian trend filtering approach for estimating the boundary trend. To this end, a truncated multivariate normal working likelihood and global-local shrinkage priors based on scale mixtures of normal distributions are introduced. In particular, the well-known horseshoe prior on the differences leads to locally adaptive shrinkage estimation of the boundary trend. However, the full conditional distributions of the Gibbs sampler involve a high-dimensional truncated multivariate normal distribution. To overcome this sampling difficulty, an approximation of the truncated multivariate normal distribution is employed. Using this approximation, the proposed models lead to an efficient Gibbs sampling algorithm via Pólya-Gamma data augmentation. The proposed method is also extended by considering a nearly isotonic constraint. The performance of the proposed method is illustrated through numerical experiments and real data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Distributed debiased estimation of high-dimensional partially linear models with jumps.
- Author
- Zhao, Yan-Yong, Zhang, Yuchun, Liu, Yuan, and Ismail, Noriszura
- Subjects
- JUMP processes; NONPARAMETRIC estimation; PARAMETER estimation
- Abstract
In this paper, we focus on the estimation of both the parameter vector and the nonparametric component in a high-dimensional partially linear model with jumps, within a divide-and-conquer framework. We find that a three-stage estimation procedure works well in this setting. Applying the lasso penalty and projected spline approximation, a profiled estimator for the linear part and a projected spline estimator for the nonparametric part are first obtained on each local machine. In the second stage, an efficient jump detection algorithm is developed to obtain a new knot sequence; based on this, the nonparametric function is re-estimated on each local machine after plugging in the linear part estimate, and the aggregated estimate of the nonparametric function is computed by pooling these local estimates. In the third stage, debiased lasso estimators are averaged to obtain a distributed debiased estimator of the linear part after plugging in the aggregated estimate of the nonparametric function. Asymptotic properties of the resulting estimators are established under mild assumptions. Simulations are conducted to illustrate the empirical performance of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Laplace approximated quasi-likelihood method for heteroscedastic survival data.
- Author
- Yu, Lili and Zhao, Yichuan
- Subjects
- HETEROSCEDASTICITY; LAPLACE distribution; LEAST squares; CENSORING (Statistics); ASYMPTOTIC distribution; ESTIMATION bias; SPLINES
- Abstract
The classical accelerated failure time model is the major linear model for right-censored survival data. It requires the survival data to exhibit homoscedasticity of variance and excludes the heteroscedastic survival data often seen in practical applications. The least squares method for the classical accelerated failure time model has been extended to accommodate heteroscedasticity in survival data. However, the estimating equations are discrete, and hence they are time-consuming to solve and may not be feasible for large datasets. This paper proposes a Laplace approximated quasi-likelihood method with a continuous estimating equation. It utilizes the Laplace approximation to approximate the survival function in the quasi-likelihood, in which the variance function is approximated by a spline function. The paper then derives the asymptotic distribution of the Laplace approximated estimator, its estimation bias, and the formula for confidence interval estimation for the parameter of interest. The finite-sample performance of the proposed approach is evaluated through simulation studies, and real data examples are provided for illustration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. ICS for multivariate outlier detection with application to quality control.
- Author
- Archimbaud, Aurore, Nordhausen, Klaus, and Ruiz-Gazen, Anne
- Subjects
- MULTIVARIATE analysis; QUALITY control; AVIONICS; AEROSPACE computing; AEROSPACE databases
- Abstract
In high-reliability-standards fields such as automotive, avionics or aerospace, the detection of anomalies is crucial. An efficient methodology for automatically detecting multivariate outliers is introduced. It takes advantage of the remarkable properties of the Invariant Coordinate Selection (ICS) method, which leads to an affine invariant coordinate system in which the Euclidean distance corresponds to a Mahalanobis Distance (MD) in the original coordinates. The limitations of MD are highlighted using theoretical arguments in a context where the dimension of the data is large. Owing to the resulting dimension reduction, ICS is expected to improve the power of outlier detection rules such as MD-based criteria. The paper includes practical guidelines for using ICS in the context of a small proportion of outliers. The use of the regular covariance matrix and the so-called matrix of fourth moments as the scatter pair is recommended. This choice combines simplicity of implementation with the possibility of deriving theoretical results. The selection of relevant invariant components through parallel analysis and normality tests is addressed. A simulation study confirms the good properties of the proposal and provides a comparison with Principal Component Analysis and MD. The performance of the proposal is also evaluated on two real data sets using a user-friendly R package accompanying the paper.
Highlights
• Detecting automatically multivariate outliers in high-reliability-standards fields.
• Combining the advantages of Mahalanobis distance and Principal Component Analysis.
• Simple and efficient procedure in the context of a small proportion of outliers.
• Reducing the number of false positives compared to competitors.
• An R package available: ICSOutlier.
[ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
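The Mahalanobis-distance baseline that ICS improves upon is easy to state in code: flag observations whose squared distance from the sample mean, in the metric of the sample covariance, exceeds a chi-square cutoff. This is the classical MD rule, not ICS; the data, shift size, and cutoff below are illustrative.

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distances of the rows of X from the sample mean."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    Sinv = np.linalg.inv(S)
    C = X - mu
    return np.einsum('ij,jk,ik->i', C, Sinv, C)   # row-wise quadratic forms

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
X[:5] += 8.0                       # plant five clear outliers in the first rows
d2 = mahalanobis_d2(X)
cutoff = 9.348                     # chi-square(3) quantile at 0.975 (hard-coded)
flagged = np.flatnonzero(d2 > cutoff)
```

Because both the mean and covariance here are the classical non-robust estimates, a larger contamination fraction would mask the outliers; that masking in high dimension is one of the MD limitations the paper analyzes.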
17. Variational Bayesian inference for bipartite mixed-membership stochastic block model with applications to collaborative filtering.
- Author
- Liu, Jie, Ye, Zifeng, Chen, Kun, and Zhang, Panpan
- Subjects
- BAYESIAN field theory; STOCHASTIC models; BIPARTITE graphs; FILTERS & filtration; RECOMMENDER systems; EXPONENTIAL families (Statistics)
- Abstract
A network-based method applied to collaborative filtering in recommender systems is introduced in this paper. Specifically, a novel mixed-membership stochastic block model with a conjugate prior from the exponential family is proposed for bipartite networks. The analytical expression of the model is derived, and a variational Bayesian algorithm that is computationally feasible for approximating the intractable posterior distributions is presented. Extensive simulations show that the proposed model provides more accurate inference than competing methods in the presence of outliers. The proposed model is also applied to a MovieLens dataset as a real data application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Significance test for semiparametric conditional average treatment effects and other structural functions.
- Author
- Zhou, Niwen, Guo, Xu, and Zhu, Lixing
- Subjects
- STATISTICAL hypothesis testing; CONDITIONAL expectations; AIDS treatment; NONPARAMETRIC estimation; NULL hypothesis; DATA analysis
- Abstract
The paper investigates a hypothesis testing problem concerning the potential additional contributions of other covariates to the structural function, given the known covariates. The structural function is the conditional expectation given covariates in which the response may depend on unknown nuisance functions; it includes classic regression functions and the conditional average treatment effects as illustrative instances. Based on Neyman's orthogonality condition, the proposed distance-based test exhibits the quasi-oracle property in the sense that the nuisance function asymptotically does not influence the limiting distributions of the test statistic under both the null and the alternatives. The novel test can effectively detect local alternatives distinct from the null at the fastest possible rate in hypothesis testing, which is particularly noteworthy given the involvement of nonparametric estimation of the conditional expectation. Numerical studies are conducted to examine the performance of the test. In the real data analysis, the proposed tests are applied to identify significant explanatory covariates associated with AIDS treatment effects, yielding noteworthy insights. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Multiclass Laplacian support vector machine with functional analysis of variance decomposition.
- Author
- Park, Beomjin and Park, Changyi
- Subjects
- SUPPORT vector machines; FUNCTIONAL analysis; ANALYSIS of variance; SUPERVISED learning
- Abstract
In classification problems, acquiring a sufficient amount of labeled samples sometimes proves expensive and time-consuming, while unlabeled samples are relatively easier to obtain. The Laplacian Support Vector Machine (LapSVM) is one of the successful methods that learn better classification functions by incorporating unlabeled samples. However, since LapSVM was originally designed for binary classification, it cannot be applied directly to the multiclass classification problems commonly encountered in practice. We therefore derive an extension of LapSVM to multiclass classification problems using an appropriate multiclass formulation. Another problem with LapSVM is that irrelevant variables easily degrade classification performance: they can increase the variance of predicted values and make the model difficult to interpret. Therefore, this paper also proposes the multiclass LapSVM with functional analysis of variance decomposition to identify relevant variables. Through comprehensive simulations and real-world datasets, we demonstrate the efficiency and improved classification performance of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. Communication-efficient estimation of quantile matrix regression for massive datasets.
- Author
- Yang, Yaohong, Wang, Lei, Liu, Jiamin, Li, Rui, and Lian, Heng
- Subjects
- QUANTILE regression; DATA structures; AIR quality; NUCLEAR matrix
- Abstract
In modern scientific applications, more and more data sets contain natural matrix predictors, and traditional regression methods are not directly applicable. Matrix regression has been adapted to such data structures and has received increasing attention in recent years. In this paper, we consider estimation of the conditional quantile in high-dimensional regularized matrix regression with a nuclear norm penalty and establish the convergence rate of the estimator. In order to construct a quantile matrix regression estimator in the distributed setting or for massive data sets, we propose a regularized communication-efficient surrogate loss (CSL) function. The proposed CSL method only requires the worker machines to compute gradients based on local data, while the central machine solves a regularized estimation problem. We prove that the estimation error of the proposed CSL method matches the error bound of the centralized method that analyzes the entire data set. An alternating direction method of multipliers algorithm is developed to efficiently obtain the distributed CSL estimator. The finite-sample performance of the proposed estimator is studied through simulations and an application to the Beijing Air Quality data set. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Online regularized matrix regression with streaming data.
- Author
- Yang, Yaohong, Zhao, Weihua, and Wang, Lei
- Subjects
- VECTOR data; THRESHOLDING algorithms; AIR quality; MATRICES (Mathematics)
- Abstract
As extensions of vector data with ultrahigh dimensionality and complex structures, matrix data are fast emerging in a large variety of scientific applications. In this paper, we consider matrix regression with streaming data and propose two-stage online regularized estimators with nuclear norm (NN) and adaptive nuclear norm (ANN) penalties, respectively. In the first stage, an equivalent form of the offline matrix regression loss function using the current raw data and summary statistics from historical data is established. In the second stage, a gradient descent algorithm and soft-thresholding methods are applied iteratively to obtain the proposed online NN and ANN estimators. We establish the asymptotic properties of the resulting online regularized estimators and show the rank selection consistency of the online ANN estimator. The finite-sample performance of the proposed estimators is studied through simulations and an application to the Beijing Air Quality data set.
• We consider matrix regression with streaming data.
• We establish the asymptotic properties and rank selection consistency.
• The finite-sample performance of the proposed estimators is studied through simulations.
• A real example on the Beijing Air Quality data set is provided to show the performance of the proposed estimators.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
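The soft-thresholding step used by nuclear-norm methods like this one has a clean closed form: the proximal operator of the nuclear norm soft-thresholds the singular values, which is also how rank selection happens. The sketch below shows only this building block, not the paper's online two-stage estimator; the example matrix is illustrative.

```python
import numpy as np

def svd_soft_threshold(M, tau):
    """Proximal operator of the nuclear norm: soft-threshold the singular values.

    Returns the shrunken matrix and the number of surviving singular values
    (the selected rank).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt, int((s_shrunk > 0).sum())

# Rank-1 example: outer(u, v) has one nonzero singular value ||u||*||v|| = sqrt(90).
B = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
est, rank = svd_soft_threshold(B, tau=0.5)
```

Singular values below tau are set exactly to zero, so a well-chosen threshold recovers the true rank, which is the mechanism behind the rank selection consistency shown for the adaptive estimator.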
22. Estimation of projection pursuit regression via alternating linearization.
- Author
- Tan, Xin, Zhan, Haoran, and Qin, Xu
- Subjects
- STATISTICAL models; DATA modeling; DATA analysis; STATISTICS
- Abstract
The projection pursuit regression (PPR) has played an important role in statistical modeling. It can be used both as a data model for statistical interpretation and as an algorithmic model for approximating general non-parametric regressions. Existing estimation methods for PPR usually involve complicated minimization in order to achieve the desired efficiency under general settings. This paper presents an algorithm, referred to as aPPR hereafter, that alternately linearizes the estimation loss function and is easy to implement. The asymptotic theory is established for both the PPR data model and the algorithmic model. The numerical performance of aPPR in model estimation and model interpretation is demonstrated through simulations and real data analysis.
• We propose an algorithm to estimate projection pursuit regression (PPR) that is easy to implement and performs better than existing algorithms.
• Asymptotic theory is established for the PPR data model under both fixed and diverging numbers of predictors.
• We also investigate the estimation efficiency of using PPR to approximate a general regression function.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Analysis of binary longitudinal data with time-varying effects.
- Author
- Jeong, Seonghyun, Park, Minjae, and Park, Taeyoung
- Subjects
- TIME-varying systems; MATHEMATICAL variables; REGRESSION analysis; SMOOTHNESS of functions; SOCIOECONOMIC factors
- Abstract
This paper considers the analysis of longitudinal data where a binary response variable is observed repeatedly for each subject over time. In analyzing such data, regression coefficients are commonly assumed constant over time, which may not properly account for the time-varying effects of some subject characteristics on a sequence of binary outcomes. This paper proposes a Bayesian method for the analysis of binary longitudinal data with time-varying regression coefficients and random effects that accounts for nonlinear subject-specific effects over time as well as between-subject variation. The proposed method facilitates posterior computation via the method of partial collapse and accommodates spatially inhomogeneous smoothness of nonparametric functions without overfitting via a basis search technique. The proposed method is illustrated with a simulation study and binary longitudinal data from the German socioeconomic panel study. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
24. Conditional absolute mean calibration for partial linear multiplicative distortion measurement errors models.
- Author
-
Zhang, Jun, Lin, Bingqing, and Feng, Zhenghui
- Subjects
- *
ERRORS-in-variables models , *CHI-squared test , *CHI-square distribution , *MEASUREMENT errors , *NULL hypothesis , *ASYMPTOTIC distribution , *LEAST squares - Abstract
In this paper, we consider partial linear regression models when all the variables are measured with multiplicative distortion measurement errors. To eliminate the effect caused by the distortion, we propose the conditional absolute mean calibration, which avoids using the nonzero-expectation conditions imposed on the variables. With these calibrated variables, a profile least squares estimator is obtained, together with normal-approximation-based and empirical-likelihood-based confidence intervals. For hypothesis testing on the parameters, a restricted estimator under the null hypothesis and a test statistic are proposed. A smoothly clipped absolute deviation penalty is employed to select the relevant variables. The resulting penalized estimators are shown to be asymptotically normal and to have the oracle property. Lastly, a score-type test statistic is proposed for checking the validity of partial linear models, and we derive its asymptotic distribution. The quadratic form of the scaled test statistic has an asymptotic chi-squared distribution under the null hypothesis and follows a noncentral chi-squared distribution under local alternatives that converge to the null hypothesis at a parametric rate. Simulation studies demonstrate the performance of the proposed procedure, and a real example is analyzed to illustrate its practical usage. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
25. Model-based clustering of censored data via mixtures of factor analyzers.
- Author
-
Wang, Wan-Lun, Castro, Luis M., Lachos, Victor H., and Lin, Tsung-I
- Subjects
- *
CENSORING (Statistics) , *MAXIMUM likelihood statistics , *CONDITIONAL expectations - Abstract
Mixtures of factor analyzers (MFA) provide a promising tool for modeling and clustering high-dimensional data that contain an overwhelmingly large number of attributes measured on individuals arising from a heterogeneous population. Due to the restrictions of experimental apparatus, measurements can be limited to some lower and/or upper detection bounds, and thus the data are possibly censored. In this paper, we extend the MFA to accommodate censored data; the new model is called the MFA with censoring (MFAC). A computationally feasible alternating expectation conditional maximization (AECM) algorithm is developed to carry out maximum likelihood estimation of the MFAC model. Practical issues related to model-based clustering and recovery of censored data are also discussed. Simulation studies are conducted to examine the effect of censoring on classification, estimation and cluster validation. We also present an application of the proposed approach to two real data examples in which a certain number of left-censored observations are present. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
26. Estimation and test of jump discontinuities in varying coefficient models with empirical applications.
- Author
-
Zhao, Yan-Yong and Lin, Jin-Guan
- Subjects
- *
SMOOTHING (Numerical analysis) , *JUMPING , *NONPARAMETRIC estimation - Abstract
Varying coefficient models are very important tools to explore the hidden structure between the response and its predictors. This paper focuses on estimating and diagnosing jump discontinuities in coefficient functions. A nonparametric procedure is proposed to estimate jump discontinuities based on the Nadaraya–Watson kernel smoothing and least-squares fitting, and asymptotic properties of resulting estimators are derived. Then, a jump size-based test statistic is developed for checking whether the estimated jump discontinuities are true. A computationally feasible approximation is derived for critical values of its limiting null distribution. Monte Carlo simulations are conducted to assess the finite sample performance of the proposed methodologies, and an empirical example is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
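The jump-estimation idea in this entry can be illustrated with a toy sketch (not the authors' exact procedure: the paper treats coefficient functions in varying coefficient models, while this example detects a jump in a univariate mean function). One-sided Nadaraya–Watson fits are compared on either side of each candidate point; the triangular kernel, bandwidth, grid and simulated model are all assumptions of the sketch:

```python
import numpy as np

def one_sided_nw(x, y, t, h, side):
    # Nadaraya-Watson fit using only points within h on one side of t,
    # with triangular kernel weights
    mask = (x > t - h) & (x < t) if side == "left" else (x > t) & (x < t + h)
    if not mask.any():
        return np.nan
    w = 1.0 - np.abs(x[mask] - t) / h
    return np.sum(w * y[mask]) / np.sum(w)

def estimate_jump(x, y, grid, h):
    # estimated jump size at t = right-limit fit minus left-limit fit;
    # the location estimate maximizes the absolute size over the grid
    sizes = np.array([one_sided_nw(x, y, t, h, "right")
                      - one_sided_nw(x, y, t, h, "left") for t in grid])
    k = int(np.nanargmax(np.abs(sizes)))
    return grid[k], sizes[k]

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2000)
y = np.sin(2 * x) + (x > 0.5) + rng.normal(0, 0.1, x.size)  # jump of 1 at 0.5
loc, size = estimate_jump(x, y, np.linspace(0.1, 0.9, 81), h=0.08)
print(loc, size)
```

The recovered location and size are close to the true values (0.5 and 1); the paper's test then asks whether such an estimated jump is statistically distinguishable from zero.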
27. The isotonic regression approach for an instrumental variable estimation of the potential outcome distributions for compliers.
- Author
-
Choi, Byeong Yeob and Lee, Jae Won
- Subjects
- *
ISOTONIC regression , *CUMULATIVE distribution function , *QUANTILE regression , *LEAST squares , *INSTRUMENTAL variables (Statistics) ,POTENTIAL distribution - Abstract
This paper discusses an instrumental variable estimation of the potential outcome distributions for compliers. The existing nonparametric estimators have a limitation in that they can yield improper cumulative distribution functions that violate the non-decreasing property. Using the least squares representation of the standard nonparametric estimators, a simple isotonic regression approach is developed. A nonparametric bootstrap method is proposed to estimate the variances of the isotonic regression estimators. A simulation study demonstrates that the isotonic regression estimators provide proper and more efficient cumulative distribution functions, with much smaller standard errors than those of the standard nonparametric estimators when the proportion of compliers is small. The methods are illustrated with a study estimating the distributional causal effect of veteran status on future earnings. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
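The monotonicity repair described in this entry can be sketched with the pool-adjacent-violators algorithm (PAVA), which computes the isotonic least-squares fit. The raw CDF values below are hypothetical, and the paper's estimators additionally involve the instrumental-variable construction and a bootstrap for variances:

```python
import numpy as np

def pava(y):
    # pool-adjacent-violators algorithm: least-squares fit under the
    # constraint that fitted values are non-decreasing
    out = []  # blocks of [mean, count]
    for v in y:
        out.append([float(v), 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, c2 = out.pop()
            m1, c1 = out.pop()
            out.append([(c1 * m1 + c2 * m2) / (c1 + c2), c1 + c2])
    return np.concatenate([[m] * c for m, c in out])

# a hypothetical raw CDF estimate with monotonicity violations; isotonize,
# then clip into [0, 1] to obtain a proper CDF
raw = np.array([0.05, 0.20, 0.15, 0.40, 0.38, 0.70, 1.02])
cdf = np.clip(pava(raw), 0.0, 1.0)
print(cdf)
```

Violating pairs are pooled to their (weighted) mean, so the output is non-decreasing and, after clipping, a proper CDF.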
28. Estimating population size of heterogeneous populations with large data sets and a large number of parameters.
- Author
-
Li, Haoqi, Lin, Huazhen, Yip, Paul S.F., and Li, Yuan
- Subjects
- *
BIG data , *ASYMPTOTIC distribution , *REGRESSION analysis , *LINEAR statistical models , *POPULATION , *DRUG abusers - Abstract
A generalized partial linear regression model is proposed to estimate population size at a specific time from multiple lists of a time-varying and heterogeneous population. The challenge is that we have millions of records and hundreds of parameters over a long period of time. This presents difficulties for data analysis, mainly due to limitations of computer memory and issues of computational convergence and feasibility. In this paper, an analytical methodology is proposed for modeling a large data set with a large number of parameters. The basic idea is to apply the maximum likelihood estimator to the data observed at each time separately, and then combine these results via weighted averages so that the final estimator coincides with the maximum likelihood estimator of the whole data set (full MLE). The asymptotic distribution and inference of the proposed estimators are derived. Simulation studies show that the proposed procedure gives exactly the same performance as the full MLE, but remains computationally feasible when the full MLE is not and has much lower computational cost when both methods work. The proposed method is applied to estimate the number of drug abusers in Hong Kong over the period 1977–2014. • A generalized partial linear model is proposed for time-varying populations. • We propose an analytical methodology suitable for modeling large data sets. • Our method remains computationally feasible in settings where the full MLE is not. • The proposed method is applied to estimating the drug-abuser population of Hong Kong. • We develop an efficient and user-friendly R package called COWA. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
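The combine-by-weighted-averages idea in this entry can be sketched in a toy setting: per-batch MLEs of a common mean are merged by inverse-variance weights, closely matching the full-data MLE without ever holding all records in memory. The normal model and batch sizes are assumptions of this sketch, not the paper's generalized partial linear model:

```python
import numpy as np

rng = np.random.default_rng(1)
batches = [rng.normal(2.0, 1.0, size=n) for n in (300, 500, 200)]

# MLE of the common mean within each batch, plus its estimated variance
est = np.array([b.mean() for b in batches])
var = np.array([b.var() / b.size for b in batches])

# inverse-variance weighted average of the per-batch estimates
w = (1.0 / var) / np.sum(1.0 / var)
combined = float(np.sum(w * est))

full = float(np.concatenate(batches).mean())  # full-data MLE, for comparison
print(combined, full)
```

Only the per-batch estimates and their variances need to be stored, which is the memory saving the paper exploits at scale.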
29. Weighted covariance matrix estimation.
- Author
-
Yang, Guangren, Liu, Yiming, and Pan, Guangming
- Subjects
- *
DATA analysis - Abstract
The paper proposes a cross-validated linear shrinkage estimator for population covariance matrices. We also propose a novel weighted estimator based on thresholding and shrinkage methods for high-dimensional data sets, applicable to a wide range of covariance structures. Some theoretical results for the cross-validated shrinkage and weighted covariance estimation methods are also developed. The finite-sample performance of the proposed methods is illustrated through extensive simulations and real data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
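A minimal sketch of cross-validated linear shrinkage (the paper's weighted thresholding-plus-shrinkage estimator is more elaborate): the sample covariance is shrunk toward a scaled identity, and the intensity is picked by held-out Frobenius error. The fold count, grid and simulated design are assumptions of the sketch:

```python
import numpy as np

def shrink_cov(S, lam):
    # linear shrinkage of a sample covariance toward a scaled identity
    p = S.shape[0]
    return (1 - lam) * S + lam * (np.trace(S) / p) * np.eye(p)

def cv_shrinkage(X, grid, k=5):
    # choose the shrinkage intensity minimizing held-out Frobenius error
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    errs = []
    for lam in grid:
        e = 0.0
        for f in folds:
            train = np.delete(np.arange(n), f)
            S_tr = np.cov(X[train], rowvar=False, bias=True)
            S_te = np.cov(X[f], rowvar=False, bias=True)
            e += np.linalg.norm(shrink_cov(S_tr, lam) - S_te, "fro") ** 2
        errs.append(e)
    return float(grid[int(np.argmin(errs))])

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 30))      # n comparable to p: shrinkage pays off
lam = cv_shrinkage(X, np.linspace(0, 1, 21))
print(lam)                         # heavy shrinkage: the truth here is identity
```

Because the true covariance equals the shrinkage target in this toy design, the cross-validated intensity comes out large; with a structured truth it would settle at an intermediate value.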
30. Estimation for biased partial linear single index models.
- Author
-
Lu, Jun, Zhu, Xuehu, Lin, Lu, and Zhu, Lixing
- Subjects
- *
CONDITIONAL expectations , *DATA analysis , *REGRESSION analysis , *MATHEMATICAL models , *LEAST squares , *STATISTICAL bias - Abstract
In this paper, we propose a novel method to consistently estimate, at the root-n rate, the coefficient parameters in a biased partial linear single-index model whose error term does not have zero conditional expectation. To achieve this, we first transform the model into a pro forma linear model and then introduce an artificial variable into a linear bias-correction model. Based on the bias-correction model, the parameters can then be consistently estimated by the linear least squares method. Both numerical studies and real data analyses are conducted to show the effectiveness of the proposed estimation procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
31. Efficient inference for nonlinear state space models: An automatic sample size selection rule.
- Author
-
Cheng, Jing and Chan, Ngai Hang
- Subjects
- *
MARKOV chain Monte Carlo , *MAXIMUM likelihood statistics , *NONLINEAR estimation , *NONLINEAR statistical models , *MONTE Carlo method - Abstract
This paper studies the maximum likelihood estimation of nonlinear state space models. A particle Markov chain Monte Carlo method is introduced to implement the Monte Carlo expectation maximization algorithm for more accurate and robust estimation. Under this framework, an automated sample size selection criterion is constructed via renewal theory. This criterion increases the sample size when the relative likelihood indicates that the parameter estimates are close to each other. The proposed methodology is applied to the stochastic volatility model and another nonlinear state space model for illustration, where the results show better estimation performance. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
32. Estimation for single-index models via martingale difference divergence.
- Author
-
Liu, Jicai, Xu, Peirong, and Lian, Heng
- Subjects
- *
MONTE Carlo method , *ASYMPTOTIC normality , *MARTINGALES (Mathematics) , *SMOOTHNESS of functions - Abstract
In this paper, we focus on the estimation of the index coefficients in single-index models and develop a new procedure based on martingale difference divergence. Since the proposed procedure can capture automatically the conditional mean dependence of the response variable on the covariates, it does not involve smoothing techniques or require the commonly used assumptions in the literature of single-index models, such as smooth link functions and at least one continuous covariate. Under some mild conditions, we establish the asymptotic normality of the estimators. We assess the finite sample performance of the proposed procedure by Monte Carlo simulation studies. We further illustrate the proposed method through empirical analyses of a real dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
33. Maximum penalized likelihood estimation of additive hazards models with partly interval censoring.
- Author
-
Li, Jinqing and Ma, Jun
- Subjects
- *
MAXIMUM likelihood statistics , *CENSORING (Statistics) , *PROPORTIONAL hazards models , *CONSTRAINED optimization - Abstract
Existing likelihood methods for the additive hazards model with interval censored survival data are limited and often ignore the non-negative constraints on hazards. This paper proposes a maximum penalized likelihood method to fit additive hazards models with interval censoring. Our method firstly models the baseline hazard using a finite number of non-negative basis functions, and then regression coefficients and baseline hazard are estimated simultaneously by maximizing a penalized log-likelihood function, where a penalty function is introduced to regularize the baseline hazard estimate. In the estimation procedure, non-negative constraints are imposed on both the baseline hazard and the hazard of each subject. A primal–dual interior-point algorithm is applied to solve the constrained optimization problem. Asymptotic properties are obtained and a simulation study is conducted for assessment of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
34. A graph Laplacian prior for Bayesian variable selection and grouping.
- Author
-
Chakraborty, Sounak and Lozano, Aurelie C.
- Subjects
- *
SUBSET selection , *LAPLACIAN matrices , *EXPECTATION-maximization algorithms , *STATISTICAL models , *ARABIDOPSIS thaliana - Abstract
Variable selection, or subset selection, plays a fundamental role in modern statistical modeling. In many applications, interactions exist between the selected variables, and statistical modeling of such dependence structure is of great importance. In this paper, the focus is on cases in which some correlated predictors have similar effects on the response and can be grouped into predictive clusters. A graph Laplacian prior (GL-prior) is introduced within the Bayesian framework, whose maximum a posteriori (MAP) estimate simultaneously allows for variable selection, coefficient estimation and predictive group identification. The connections between the GL-prior and existing regularized regression methods are established accordingly. For computation, an EM-based algorithm is proposed, where an efficient augmented Lagrangian approach is utilized for the maximization step. The performance of the proposed approach is examined through simulation studies, followed by a microarray data analysis concerning the plant Arabidopsis thaliana. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
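For a Gaussian likelihood with a Gaussian graph-Laplacian prior, the MAP estimate has a closed form, which conveys the grouping effect this entry describes; the toy graph, penalty weight and tiny ridge jitter `eps` are assumptions of the sketch, while the paper's GL-prior and EM algorithm additionally deliver variable selection and group identification:

```python
import numpy as np

def gl_map(X, y, adj, lam, eps=1e-6):
    # MAP under Gaussian errors and a graph-Laplacian Gaussian prior:
    # argmin ||y - X b||^2 + lam * b' L b, with L = D - A (eps is a tiny
    # ridge jitter, an assumption of this sketch, since L alone is singular)
    p = X.shape[1]
    L = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.solve(X.T @ X + lam * L + eps * np.eye(p), X.T @ y)

rng = np.random.default_rng(3)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 1.0, 0.0, 0.0])   # predictors 0 and 1 share an effect
y = X @ beta_true + rng.normal(0, 0.5, n)

adj = np.zeros((p, p))
adj[0, 1] = adj[1, 0] = 1.0                  # graph: coefficients 0 and 1 similar
beta_hat = gl_map(X, y, adj, lam=500.0)
print(beta_hat)                              # first two coefficients fused near 1
```

The Laplacian penalty equals lam * (b0 - b1)^2 for this graph, so the two connected coefficients are pulled toward a common value while the others are left essentially unpenalized.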
35. Alternating direction method of multipliers for nonconvex fused regression problems.
- Author
-
Xiu, Xianchao, Liu, Wanquan, Li, Ling, and Kong, Lingchen
- Subjects
- *
IMAGE processing , *SIGNAL processing , *TECHNOLOGY convergence , *REGRESSION analysis , *MULTIPLIERS (Mathematical analysis) - Abstract
It is well known that the fused least absolute shrinkage and selection operator (FLASSO) plays an important role in signal and image processing. Recently, nonconvex penalties have been extensively investigated due to their success in sparse learning. In this paper, a novel nonconvex fused regression model, which integrates FLASSO with a nonconvex penalty, is proposed. An alternating direction method of multipliers (ADMM) approach is developed and shown to be very efficient, owing to the fact that each derived subproblem has a closed-form solution. In addition, convergence is discussed and proved mathematically, leading to a fast and convergent algorithm. Extensive numerical experiments show that the proposed nonconvex fused regression outperforms the state-of-the-art FLASSO approach. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
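The ADMM structure this entry describes, with a closed-form solution to each subproblem, can be sketched on the convex FLASSO special case (the paper's nonconvex penalty would change only the shrinkage step). The piecewise-constant test signal, penalty level and step size rho are assumptions of the sketch:

```python
import numpy as np

def soft(v, t):
    # soft-thresholding: the closed-form proximal step for the l1 penalty
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fused_admm(y, lam, rho=1.0, iters=500):
    # ADMM for the convex fused signal approximator:
    # min_b 0.5*||y - b||^2 + lam * sum_i |b[i+1] - b[i]|
    n = len(y)
    D = np.diff(np.eye(n), axis=0)            # first-difference operator
    A = np.eye(n) + rho * D.T @ D
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    for _ in range(iters):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))  # quadratic subproblem
        z = soft(D @ b + u, lam / rho)                   # shrinkage subproblem
        u += D @ b - z                                   # dual ascent
    return b

rng = np.random.default_rng(4)
truth = np.r_[np.zeros(40), np.full(40, 2.0)]            # piecewise-constant signal
y = truth + rng.normal(0, 0.3, truth.size)
b = fused_admm(y, lam=2.0)
print(b[0], b[-1])
```

Both subproblems are closed-form (a linear solve and a soft-threshold), which is what makes each ADMM iteration cheap; a nonconvex penalty would replace `soft` with the corresponding proximal map.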
36. Marginalized models for longitudinal count data.
- Author
-
Lee, Keunbaik and Joo, Yongsung
- Subjects
- *
GENERALIZED estimating equations , *EPILEPSY - Abstract
In this paper, we propose two marginalized models for longitudinal count data. The first marginalized model has a Markovian structure to account for the serial correlation of longitudinal outcomes. We also propose a second marginalized model with a Markovian structure for serial correlation as well as random effects for both overdispersion and long-term dependence. These models permit likelihood-based estimation, and inference remains valid under ignorability, which distinguishes them from generalized estimating equation (GEE) approaches. Fisher-scoring and quasi-Newton algorithms are developed for estimation purposes. Monte Carlo studies show that the proposed models perform well in reducing the bias of marginal mean parameters when the dependence model is misspecified. The models are used to draw inferences from a previously analyzed trial on epileptic seizures. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
37. Online renewable smooth quantile regression.
- Author
-
Sun, Xiaofei, Wang, Hongwei, Cai, Chao, Yao, Mei, and Wang, Kangning
- Subjects
- *
QUANTILE regression , *ASYMPTOTIC normality , *ONLINE algorithms - Abstract
This paper concerns quantile regression for streaming data, where large amounts of data arrive batch by batch. Limited memory and the non-smoothness of the quantile regression loss both pose challenges for computation and theoretical development. To address these challenges, we first introduce a convex smooth quantile loss, which is infinitely differentiable and converges to the quantile loss uniformly. Then an online renewable framework is proposed, in which the quantile regression estimator is renewed with the current data batch and summary statistics of historical data. In theory, the estimation consistency and asymptotic normality of the renewable estimator are established without any restriction on the total number of data batches, which leads to the oracle property and gives a theoretical guarantee that the new method is adaptive to streaming data sets that arrive perpetually. Numerical experiments on both synthetic and real data verify the theoretical results and illustrate the good performance of the new method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
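A minimal sketch of the renewable idea for a scalar quantile: the check loss is smoothed (here with a logistic kernel, an assumption of this sketch; the paper's construction may differ), and each arriving batch updates the estimate using only the accumulated Hessian of historical batches, never the raw historical data:

```python
import numpy as np

def smooth_score(theta, x, tau, h):
    # gradient and Hessian of a smoothed check loss; the logistic kernel
    # used for smoothing is an assumption of this sketch
    s = 1.0 / (1.0 + np.exp(-(theta - x) / h))
    return s.mean() - tau, (s * (1.0 - s)).mean() / h

def renew(theta_old, J_hist, batch, tau, h, steps=30):
    # renewable update: solve J_hist*(theta - theta_old) + n*U_batch(theta) = 0
    # by Newton steps; historical batches enter only through J_hist
    theta, n = theta_old, len(batch)
    for _ in range(steps):
        U, J = smooth_score(theta, batch, tau, h)
        theta -= (J_hist * (theta - theta_old) + n * U) / (J_hist + n * J)
    _, J = smooth_score(theta, batch, tau, h)
    return theta, J_hist + n * J

rng = np.random.default_rng(5)
tau, h = 0.9, 0.05
theta, J_hist = 0.0, 0.0
for _ in range(50):            # batches arrive one at a time and are discarded
    theta, J_hist = renew(theta, J_hist, rng.normal(size=500), tau, h)
print(theta)                   # near the N(0,1) 0.9-quantile, about 1.28
```

Memory use is constant in the number of batches: only the current estimate and the scalar accumulated Hessian are carried forward, which is the point of the renewable framework.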
38. Small area estimation of general finite-population parameters based on grouped data.
- Author
-
Kawakubo, Yuki and Kobayashi, Genya
- Subjects
- *
EXPECTATION-maximization algorithms , *GIBBS sampling , *INCOME , *GINI coefficient , *REGRESSION analysis - Abstract
This paper proposes a new model-based approach to small area estimation of general finite-population parameters based on grouped data or frequency data, often available from sample surveys. Grouped data contain information on the frequencies of some pre-specified groups in each area, for example, the numbers of households in given income classes. Thus, grouped data provide more detailed insight into small areas than area-level aggregated data. A direct application of the widely used small area methods, such as the Fay–Herriot model for area-level data and the nested error regression model for unit-level data, is not appropriate since they are not designed for grouped data. Our novel method adopts the multinomial likelihood function for the grouped data. In order to connect the group probabilities of the multinomial likelihood and the auxiliary variables within the framework of small area estimation, we introduce unobserved unit-level quantities of interest, which follow a linear mixed model with random intercepts and dispersions after some transformation. The probabilities that a unit belongs to the groups can then be derived and are used to construct the likelihood function for the grouped data given the random effects. The unknown model parameters (hyperparameters) are estimated by a newly developed Monte Carlo EM algorithm using efficient importance sampling. The empirical best predictors (empirical Bayes estimates) of the small area parameters are calculated by a simple Gibbs sampling algorithm. The numerical performance of the proposed method is illustrated through model-based and design-based simulations. In an application to city-level grouped income data for Japan, we complete the patchy maps of the Gini coefficient as well as mean income across the country. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
39. Weighted least squares model averaging for accelerated failure time models.
- Author
-
Dong, Qingkai, Liu, Binxia, and Zhao, Hui
- Subjects
- *
LEAST squares , *BILIARY liver cirrhosis , *MALVACEAE - Abstract
This paper proposes a new model averaging method for accelerated failure time models with right-censored data. A weighted least squares procedure is used to estimate the parameters of the candidate models. In this procedure, the candidate models are not required to be nested, and the weights selected by the Mallows criterion are not limited to be discrete, making the proposed method very flexible and general. The asymptotic optimality of the proposed method is proved under some mild conditions. In particular, it is shown that the optimality remains valid even when the variances of the error terms are estimated and the feasible weighted least squares estimators are averaged. Simulation studies show that the proposed method has better prediction performance than many popular model selection or model averaging methods when all candidate models are misspecified. Finally, an application to primary biliary cirrhosis is provided. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
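Mallows-type model averaging can be sketched in an uncensored linear-model toy example (the weighted least squares handling of right censoring in this entry is omitted): with two candidate OLS fits, the weight minimizes residual sum of squares plus a complexity penalty. The simulated design, grid search and variance estimate are assumptions of the sketch:

```python
import numpy as np

def mallows_weight(y, f1, f2, k1, k2, sigma2, grid=101):
    # Mallows-type criterion for averaging two candidate fits:
    # C(w) = ||y - w*f1 - (1-w)*f2||^2 + 2*sigma2*(w*k1 + (1-w)*k2)
    ws = np.linspace(0.0, 1.0, grid)
    crit = [np.sum((y - w * f1 - (1 - w) * f2) ** 2)
            + 2.0 * sigma2 * (w * k1 + (1 - w) * k2) for w in ws]
    return float(ws[int(np.argmin(crit))])

rng = np.random.default_rng(6)
n = 300
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, 0.5, 0.25]) + rng.normal(0, 1.0, n)

def ols_pred(Xs):
    # least-squares predictions of a candidate model
    return Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]

f1, f2 = ols_pred(X[:, :1]), ols_pred(X)     # underfit model vs full model
sigma2 = np.sum((y - f2) ** 2) / (n - 3)     # error variance from the full fit
w = mallows_weight(y, f1, f2, 1, 3, sigma2)
print(w)                                     # small weight on the underfit model
```

Because the small model omits real signal, its fit penalty dominates its parsimony bonus, and the criterion places nearly all weight on the full model; note the weight is continuous on [0, 1], not restricted to {0, 1}.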
40. Estimation of mean residual life based on ranked set sampling.
- Author
-
Zamanzade, Elham, Parvardeh, Afshin, and Asadi, Majid
- Subjects
- *
STOCHASTIC orders , *EXTREME value theory , *MONTE Carlo method - Abstract
The mean residual life (MRL) of a nonnegative random variable X plays an important role in various disciplines such as reliability, survival analysis, and extreme value theory. This paper deals with the problem of estimating the MRL under a ranked set sampling (RSS) design. An RSS-based estimator of the MRL is proposed and its properties are investigated. For finite sample sizes, a Monte Carlo simulation study is carried out to show that the resulting estimator is more efficient than its counterpart under the simple random sampling (SRS) design. It is proved that the proposed estimator asymptotically follows a Gaussian process and that its asymptotic variance is no larger than that of its counterpart under the SRS design, regardless of the quality of ranking. Different methods of constructing a confidence interval for the MRL under the RSS and SRS designs are then discussed. It is observed that while neither the RSS- nor the SRS-based confidence intervals control the nominal confidence level perfectly, the RSS-based intervals are generally shorter than those under the SRS scheme. Finally, a potential application in the context of medical studies is presented for illustration purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
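The RSS design and an empirical MRL estimator can be sketched as follows, assuming perfect rankings (the paper also covers imperfect ranking); for the memoryless Exp(1) distribution the true MRL is constant at 1, which makes the output easy to check:

```python
import numpy as np

def rss_sample(rng, draw, k, cycles):
    # balanced ranked set sampling with perfect rankings (an assumption of
    # this sketch): per cycle, for each rank r, draw k units and keep the
    # r-th smallest
    out = []
    for _ in range(cycles):
        for r in range(k):
            out.append(np.sort(draw(k))[r])
    return np.array(out)

def mrl(sample, t):
    # empirical mean residual life at t: mean of X - t among sampled X > t
    tail = sample[sample > t]
    return float(tail.mean() - t)

rng = np.random.default_rng(7)
draw = lambda size: rng.exponential(1.0, size)   # Exp(1): true MRL is 1 at any t
srs = draw(3000)                                 # simple random sample
rss = rss_sample(rng, draw, k=3, cycles=1000)    # same effective size, 3000
print(mrl(srs, 0.5), mrl(rss, 0.5))              # both near 1
```

Pooled balanced-RSS observations have the parent marginal distribution, so both estimators are consistent; the efficiency gain the paper proves shows up as a smaller sampling variance for the RSS version.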
41. Regression with stagewise minimization on risk function.
- Author
-
Yoshida, Takuma and Naito, Kanta
- Subjects
- *
REGRESSION analysis , *DIVERGENCE theorem , *ASYMPTOTIC theory in estimation theory , *ESTIMATION theory , *ITERATIVE methods (Mathematics) - Abstract
This paper studies curve estimation based on empirical risk minimization. The estimator is composed as a convex combination of words (learners) in a dictionary. A word is selected in each step of the proposed stagewise algorithm, which minimizes a certain divergence measure. A non-asymptotic error bound for the estimator is developed, and it is shown that the error bound becomes sharp as the number of iterations of the algorithm increases. A simulation study and a real data example confirm the performance of the estimator. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
42. Regression adjustment for treatment effect with multicollinearity in high dimensions.
- Author
-
Yue, Lili, Li, Gaorong, Lian, Heng, and Wan, Xiang
- Subjects
- *
MULTICOLLINEARITY , *STATISTICAL correlation , *ESTIMATION theory , *REGRESSION analysis , *BREAST cancer patients - Abstract
Randomized experiments are an important tool for studying the Average Treatment Effect (ATE). This paper considers regression adjustment estimation of the Sample Average Treatment Effect (SATE) in the high-dimensional case, where the multicollinearity problem is often encountered and needs to be properly handled. Many existing regression adjustment methods fail to achieve satisfactory performance. To solve this issue, an Elastic-net adjusted estimator of the SATE is proposed under the Rubin causal model of randomized experiments with multicollinearity in high dimensions. The asymptotic properties of the proposed SATE estimator are shown under some regularity conditions, and the asymptotic variance is proved to be no greater than that of the unadjusted estimator. Furthermore, Neyman-type conservative estimators of the asymptotic variance are proposed, which yield tighter confidence intervals than both the unadjusted and the Lasso-based adjusted estimators. Simulation studies show that the Elastic-net adjusted method addresses the collinearity problem better than existing methods. The advantages of the proposed method are also shown in an analysis of a dataset of HER2 breast cancer patients. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
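A simplified sketch of regression adjustment under multicollinearity: arm-specific penalized fits correct the difference in means for chance covariate imbalance. Ridge regression stands in for the Elastic-net here (an assumption to keep the sketch dependency-free), and the data-generating design with duplicated columns is hypothetical:

```python
import numpy as np

def ridge(X, y, lam):
    # ridge stands in for the Elastic-net in this sketch; it likewise
    # remains well-defined under exact collinearity
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def adjusted_ate(X, y, t, lam=10.0):
    # regression-adjusted SATE: arm-specific fits evaluated at the overall
    # covariate mean remove chance imbalance between the two arms
    Xc = X - X.mean(axis=0)
    b1 = ridge(Xc[t == 1], y[t == 1], lam)
    b0 = ridge(Xc[t == 0], y[t == 0], lam)
    adj1 = y[t == 1].mean() - Xc[t == 1].mean(axis=0) @ b1
    adj0 = y[t == 0].mean() - Xc[t == 0].mean(axis=0) @ b0
    return float(adj1 - adj0)

rng = np.random.default_rng(8)
n = 400
Z = rng.normal(size=(n, 2))
X = np.hstack([Z, Z, rng.normal(size=(n, 6))])   # duplicated, collinear columns
t = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)]).astype(int)
y = 2.0 * t + Z @ np.array([1.0, -1.0]) + rng.normal(0, 1.0, n)
print(adjusted_ate(X, y, t))                     # near the true effect, 2.0
```

Ordinary least squares would face a singular design here because of the duplicated columns; the penalty splits the shared effect across them, which is the collinearity-handling behavior the entry is about.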
43. Detecting structural breaks in realized volatility.
- Author
-
Song, Junmo and Baek, Changryong
- Subjects
- *
GAUSSIAN distribution , *MAXIMUM likelihood statistics , *STATISTICS , *PUBLIC debts , *CONTINUOUS distributions - Abstract
This paper considers the detection of structural changes in realized volatility based on HAR–GARCH models. For this, we propose a quasi-likelihood-based score test for parameter changes in HAR–GARCH models. We derive the limiting null distribution of the score test by first introducing the quasi-maximum likelihood estimator for the HAR–GARCH model and establishing its asymptotic properties. The proposed test statistic is shown to converge weakly to a function of the Brownian bridge under the null of no structural change. Our simulation study shows reasonable sizes and powers of the test, even for non-Gaussian innovations. A real data application to S&P 500 realized volatility over the last 12 years detects change points that coincide with three waves of financial crisis, namely the US housing, European sovereign debt, and emerging-market crises. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
44. Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection.
- Author
-
Wang, Kangning, Li, Shaomin, Sun, Xiaofei, and Lin, Lu
- Subjects
- *
REGRESSION analysis , *INFERENTIAL statistics , *GENERALIZED estimating equations , *MAXIMUM likelihood statistics , *MATHEMATICAL variables - Abstract
Modal regression is a good alternative to mean regression because of its robustness and high inference efficiency. This paper is concerned with modal-regression-based statistical inference for semivarying coefficient models with longitudinal data, including modal regression generalized estimating equations, a modal regression empirical likelihood inference procedure for the parametric component, and smooth-threshold modal regression generalized estimating equations for variable selection. These methods can incorporate the correlation structure of the longitudinal data and inherit the robustness and efficiency advantages of modal regression by choosing an appropriate data-adaptive tuning parameter. Under mild conditions, the large-sample theoretical properties are established. Simulation studies and a real data analysis are also included to illustrate the finite-sample performance. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
45. A class of semiparametric transformation cure models for interval-censored failure time data.
- Author
-
Li, Shuwei, Hu, Tao, Zhao, Xingqiu, and Sun, Jianguo
- Subjects
- *
PARAMETER estimation , *MATHEMATICAL transformations , *FAILURE time data analysis , *ASYMPTOTIC expansions , *REGRESSION analysis - Abstract
This paper discusses regression analysis of interval-censored failure time data with a cured subgroup under a general class of semiparametric transformation cure models. For inference, a novel and stable expectation maximization (EM) algorithm with the use of Poisson variables is developed to overcome the difficulty in maximizing the observed data likelihood function, which has a complex form. The asymptotic properties of the resulting estimators are established and, in particular, the estimators of the regression parameters are shown to be semiparametrically efficient. Numerical results obtained from a simulation study indicate that the proposed approach works well in practical situations. An application to a set of data on children's mortality is also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
46. Feature screening in ultrahigh-dimensional partially linear models with missing responses at random.
- Author
-
Tang, Niansheng, Xia, Linli, and Yan, Xiaodong
- Subjects
- *
FEATURE selection , *RANDOM variables , *LINEAR systems , *KERNEL functions , *ANALYSIS of covariance - Abstract
This paper proposes a new feature screening procedure in ultrahigh-dimensional partially linear models with missing responses at random for longitudinal data based on the profile marginal kernel-assisted estimating equations imputation technique. The proposed feature screening procedure has three key merits. First, it is computationally efficient, and can be used to screen significant covariates in the presence of missing responses. Second, it does not require estimating respondent probability and is robust to the misspecification of respondent probability models. Third, the univariate kernel smoothing method is adopted to estimate nonparametric functions, and is employed to impute estimating equations with missing responses at random, which avoids the well-known "curse of dimensionality". The ranking consistency property and the sure screening property are shown under some regularity conditions. Simulation studies are conducted to investigate the finite sample performance of the proposed screening procedure. An example is used to illustrate the proposed procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
47. Construction of multiple decrement tables under generalized fractional age assumptions.
- Author
-
Lee, Hangsuck, Ahn, Jae Youn, and Ko, Bangwon
- Subjects
- *
FRACTIONAL calculus , *GENERALIZATION , *MATHEMATICAL constants , *MATHEMATICAL functions , *ERROR analysis in mathematics - Abstract
In this paper, we develop a consistent methodology for constructing multiple decrement tables under generalized fractional age assumptions. Assuming that decrements have a common distribution at fractional ages, we derive conversion formulas to split or merge given multiple decrement tables in order to obtain a new multiple decrement table of interest. The assumptions that we consider are quite general, covering a wide range of fractional age assumptions including the uniform distribution of decrements and the constant forces of decrement. Our proposed approaches allow us to obtain multiple decrement tables directly, without the need for the associated single rates of decrement. They also enable us to avoid the potential inconsistency arising under the uniform distribution assumptions and the unnaturalness arising from the constant forces assumption. In addition, by operating in a more general framework, they deepen our understanding of the classical results under the uniform distribution assumptions. Although our methodology is based on a common distribution function assumption, knowing the specific form of the function is unnecessary, since our conversion formulas do not depend upon it. Finally, numerical examples are illustrated in which we investigate the main factors of the errors induced by the discrepancy between the true and assumed distributions. The numerical results show that the relative errors under our approaches are practically negligible for moderate ranges of multiple decrement probabilities. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
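The kind of conversion the abstract above generalizes can be seen in its classical constant-forces special case: given independent single-decrement rates, the combined survival factorizes and each cause's share of the total decrement is proportional to its force. A minimal sketch of that standard actuarial identity (the paper's generalized conversion formulas are not reproduced here):

```python
import math

def merge_constant_forces(single_rates):
    """Combine independent associated single-decrement rates q'_j into
    multiple-decrement probabilities q^(j), assuming constant forces of
    decrement over the year -- the classical special case of the
    fractional-age assumptions generalized in the paper."""
    p_total = 1.0
    for q in single_rates:
        p_total *= (1.0 - q)         # survival of all decrements jointly
    q_total = 1.0 - p_total          # total multiple-decrement probability
    log_p = math.log(p_total)
    # each cause's share is proportional to its (constant) force mu_j
    return [q_total * math.log(1.0 - q) / log_p for q in single_rates]
```

By construction the cause-specific probabilities sum to the total decrement probability.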
48. Modified spatial scan statistics using a restricted likelihood ratio for ordinal outcome data.
- Author
-
Lee, Myeonggyun and Jung, Inkyung
- Subjects
- *
LIKELIHOOD ratio tests , *CLUSTER analysis (Statistics) , *STATISTICS , *SIMULATION methods & models , *POISSON distribution - Abstract
Spatial scan statistics are widely used to detect geographical disease clusters for different types of data. It has been pointed out that the Poisson-based spatial scan statistic tends to detect overly large clusters by absorbing insignificant neighbors with non-elevated risks, and we suspect that the spatial scan statistic for ordinal data may exhibit a similar undesirable phenomenon. In this paper, we propose applying a restricted likelihood ratio to spatial scan statistics for ordinal outcome data to circumvent this over-detection. Through a simulation study, we demonstrate not only that the original spatial scan statistics exhibit the over-detection phenomenon but also that our proposed methods perform comparably to or better than the original methods. We illustrate the proposed methods using a real data set from the 2014 Health Screening Program of Korea, with diagnosis results of normal, caution, suspected disease, and diagnosed with disease as an ordinal outcome. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
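For context on the likelihood ratio the abstract above restricts further, the Poisson-based scan statistic it references scores each candidate window by a log-likelihood ratio that is already restricted to elevated-risk windows. A sketch of that standard Kulldorff-style score (the paper's additional restriction for ordinal outcomes is not reproduced here):

```python
import math

def poisson_scan_llr(c_in, e_in, c_tot, e_tot):
    """Poisson log-likelihood ratio for a candidate cluster window:
    c_in observed vs. e_in expected cases inside the window, c_tot and
    e_tot overall.  Returns 0 unless the inside risk exceeds the
    outside risk (the usual high-risk restriction)."""
    c_out, e_out = c_tot - c_in, e_tot - e_in
    if c_in / e_in <= c_out / e_out:
        return 0.0                   # not an elevated-risk window
    llr = 0.0
    if c_in > 0:
        llr += c_in * math.log(c_in / e_in)
    if c_out > 0:
        llr += c_out * math.log(c_out / e_out)
    return llr
```

The window maximizing this score over all candidates is the most likely cluster; its significance is assessed by Monte Carlo replication.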
49. Bayesian analysis for mixture of latent variable hidden Markov models with multivariate longitudinal data.
- Author
-
Xia, Ye-Mao and Tang, Nian-Sheng
- Subjects
- *
HIDDEN Markov models , *LATENT variables , *BAYESIAN analysis , *FINITE mixture models (Statistics) , *MATRICES (Mathematics) - Abstract
Latent variable hidden Markov models (LVHMMs) are important statistical tools for exploring possible heterogeneity in data and for explaining the pattern of subjects moving from one group to another over time. Classic subject- and/or time-homogeneous assumptions on the transition matrices of the transition model, as well as on the emission distribution of the observed process, may be inappropriate for interpreting heterogeneity at the subject level. To this end, a general extension of the LVHMM is proposed to address the heterogeneity of multivariate longitudinal data at both the subject level and the occasion level. The main modeling strategy is that the observed time sequences are first grouped into different clusters, and then within each cluster the observed sequences are formulated via a latent variable hidden Markov model. The local heterogeneity at the occasion level is characterized by the distribution of the latent states, while the global heterogeneity at the subject level is identified through a finite mixture model. Compared to existing methods, a key appeal of the proposal is its capacity to accommodate non-homogeneous patterns of state sequences and emission distributions across subjects simultaneously. As a result, the proposal provides a comprehensive framework for exploring various kinds of relevance among multivariate longitudinal data. Within the Bayesian paradigm, the Markov chain Monte Carlo (MCMC) method is used to implement posterior analysis: a Gibbs sampler draws observations from the related full conditionals, and posterior inferences are carried out based on these simulated observations. Empirical results, including simulation studies and a real example, are used to illustrate the proposed methodology. Highlights • This paper proposes a general extension of the latent variable hidden Markov model to address the heterogeneity of multivariate longitudinal data at both the subject level and the occasion level.
• The local heterogeneity at the occasion level is characterized by the distribution of the latent states, while the global heterogeneity at the subject level is identified through a finite mixture model. • A Bayesian procedure coupled with a Gibbs sampler is developed to carry out posterior inference. • Empirical results including simulation studies and a real example are presented to illustrate the proposed methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
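The hidden Markov machinery underlying the model in the abstract above rests on the standard forward recursion for a discrete-state chain. A minimal building-block sketch only — a single homogeneous HMM likelihood, nothing like the paper's mixture of latent-variable HMMs with Gibbs sampling:

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Forward algorithm: likelihood of an observation sequence under a
    discrete HMM with initial distribution pi, transition matrix A and
    emission matrix B (rows = states, columns = symbols)."""
    alpha = pi * B[:, obs[0]]        # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
    return alpha.sum()               # total probability of the sequence
```

In a Bayesian mixture extension, quantities like this likelihood enter the full conditionals from which a Gibbs sampler draws cluster labels and state paths.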
50. Robust estimation and confidence interval in meta-regression models.
- Author
-
Yu, Dalei, Ding, Chang, He, Na, Wang, Ruiwu, Zhou, Xiaohua, and Shi, Lei
- Subjects
- *
ROBUST statistics , *ESTIMATION theory , *ASYMPTOTES , *REGRESSION analysis , *CONFIDENCE intervals - Abstract
Meta-analysis provides a quantitative method for combining results from independent studies of the same treatment. However, existing estimation methods are sensitive to the presence of outliers in the data. In this paper we study robust estimation of the parameters in meta-regression, including the between-study variance and the regression parameters. Huber's rho function and Tukey's biweight function are adopted to derive formulae for robust maximum likelihood (ML) estimators, and the corresponding algorithms are developed. The asymptotic confidence interval and a second-order-corrected confidence interval are investigated. Extensive simulation studies assess the performance of the proposed methodology, and the results show that the robust estimators are promising and outperform the conventional ML and restricted maximum likelihood estimators when outliers exist in the dataset. The proposed methods are applied in three case studies, and the results further support the applicability of our methods in practical situations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
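The Huber-type downweighting mentioned in the abstract above can be illustrated on the simplest possible target, a location estimate. This is a toy analogue only — iteratively reweighted least squares with Huber weights, omitting the meta-regression structure and the between-study variance entirely:

```python
import numpy as np

def huber_mean(y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least
    squares.  Observations with standardized residuals beyond c are
    downweighted, so a single outlier cannot drag the estimate far."""
    mu = np.median(y)
    scale = np.median(np.abs(y - mu)) / 0.6745  # MAD-based scale
    if scale == 0:
        scale = 1.0
    for _ in range(max_iter):
        r = (y - mu) / scale
        w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))  # Huber weights
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

The same weighting idea, applied to study-level residuals, is what makes the robust ML estimators resistant to outlying studies.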