282 results on '"Cattaneo, Matias D."'
Search Results
2. Randomization Inference for Before-and-After Studies with Multiple Units: An Application to a Criminal Procedure Reform in Uruguay
- Author
-
Cattaneo, Matias D., Diaz, Carlos, and Titiunik, Rocio
- Subjects
Statistics - Methodology, Statistics - Applications
- Abstract
We study the immediate impact of a new code of criminal procedure on crime. In November 2017, Uruguay switched from an inquisitorial system (where a single judge leads the investigation and decides the appropriate punishment for a particular crime) to an adversarial system (where the investigation is now led by prosecutors and the judge plays an overseeing role). To analyze the short-term effects of this reform, we develop a randomization-based approach for before-and-after studies with multiple units. Our framework avoids parametric time series assumptions and eliminates extrapolation by basing statistical inferences on finite-sample methods that rely only on the time periods closest to the time of the policy intervention. A key identification assumption underlying our method is that there would have been no time trends in the absence of the intervention, which is most plausible in a small window around the time of the reform. We also discuss several falsification methods to assess the plausibility of this assumption. Using our proposed inferential approach, we find statistically significant short-term causal effects of the crime reform. Our unbiased estimate shows an average increase of approximately 25 police reports per day in the week following the implementation of the new adversarial system in Montevideo, representing an 8 percent increase compared to the previous week under the old system.
- Published
- 2024
3. Uniform Estimation and Inference for Nonparametric Partitioning-Based M-Estimators
- Author
-
Cattaneo, Matias D., Feng, Yingjie, and Shigida, Boris
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics
- Abstract
This paper presents uniform estimation and inference theory for a large class of nonparametric partitioning-based M-estimators. The main theoretical results include: (i) uniform consistency for convex and non-convex objective functions; (ii) optimal uniform Bahadur representations; (iii) optimal uniform (and mean square) convergence rates; (iv) valid strong approximations and feasible uniform inference methods; and (v) extensions to functional transformations of underlying estimators. Uniformity is established over both the evaluation point of the nonparametric functional parameter and a Euclidean parameter indexing the class of loss functions. The results also account explicitly for the smoothness degree of the loss function (if any), and allow for a possibly non-identity (inverse) link function. We illustrate the main theoretical and methodological results with four substantive applications: quantile regression, distribution regression, $L_p$ regression, and logistic regression; many other possibly non-smooth, nonlinear, generalized, robust M-estimation settings are covered by our theoretical results. We provide detailed comparisons with the existing literature and demonstrate substantive improvements: we achieve the best (in some cases optimal) known results under improved (in some cases minimal) requirements in terms of regularity conditions and side rate restrictions. The supplemental appendix reports other technical results that may be of independent interest.
- Published
- 2024
4. Nonlinear Binscatter Methods
- Author
-
Cattaneo, Matias D., Crump, Richard K., Farrell, Max H., and Feng, Yingjie
- Subjects
Statistics - Methodology, Economics - Econometrics, Mathematics - Statistics Theory
- Abstract
Binned scatter plots are a powerful statistical tool for empirical work in the social, behavioral, and biomedical sciences. Available methods rely on a quantile-based partitioning estimator of the conditional mean regression function to primarily construct flexible yet interpretable visualization methods, but they can also be used to estimate treatment effects, assess uncertainty, and test substantive domain-specific hypotheses. This paper introduces novel binscatter methods based on nonlinear, possibly nonsmooth M-estimation methods, covering generalized linear, robust, and quantile regression models. We provide a host of theoretical results and practical tools for local constant estimation along with piecewise polynomial and spline approximations, including (i) optimal tuning parameter (number of bins) selection, (ii) confidence bands, and (iii) formal statistical tests regarding functional form or shape restrictions. Our main results rely on novel strong approximations for general partitioning-based estimators covering random, data-driven partitions, which may be of independent interest. We demonstrate our methods with an empirical application studying the relation between the percentage of individuals without health insurance and per capita income at the zip-code level. We provide general-purpose software packages implementing our methods in Python, R, and Stata.
- Published
- 2024
5. Strong Approximations for Empirical Processes Indexed by Lipschitz Functions
- Author
-
Cattaneo, Matias D. and Yu, Ruiqi Rae
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics, Mathematics - Probability, Statistics - Methodology
- Abstract
This paper presents new uniform Gaussian strong approximations for empirical processes indexed by classes of functions based on $d$-variate random vectors ($d\geq1$). First, a uniform Gaussian strong approximation is established for general empirical processes indexed by possibly Lipschitz functions, improving on previous results in the literature. In the setting considered by Rio (1994), and if the function class is Lipschitzian, our result improves the approximation rate $n^{-1/(2d)}$ to $n^{-1/\max\{d,2\}}$, up to a $\operatorname{polylog}(n)$ term, where $n$ denotes the sample size. Remarkably, we establish a valid uniform Gaussian strong approximation at the rate $n^{-1/2}\log n$ for $d=2$, which was previously known to be valid only for univariate ($d=1$) empirical processes via the celebrated Hungarian construction (Komlós et al., 1975). Second, a uniform Gaussian strong approximation is established for multiplicative separable empirical processes indexed by possibly Lipschitz functions, which addresses some outstanding problems in the literature (Chernozhukov et al., 2014, Section 3). Finally, two other uniform Gaussian strong approximation results are presented when the function class is a sequence of Haar basis based on quasi-uniform partitions. Applications to nonparametric density and regression estimation are discussed.
- Published
- 2024
6. Protocols for Observational Studies: An Application to Regression Discontinuity Designs
- Author
-
Cattaneo, Matias D. and Titiunik, Rocio
- Subjects
Statistics - Methodology
- Abstract
In his 2022 IMS Medallion Lecture delivered at the Joint Statistical Meetings, Prof. Dylan S. Small eloquently advocated for the use of protocols in observational studies. We discuss his proposal and, inspired by his ideas, we develop a protocol for the regression discontinuity design.
- Published
- 2024
7. On Rosenbaum's Rank-based Matching Estimator
- Author
-
Cattaneo, Matias D., Han, Fang, and Lin, Zhexiao
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics
- Abstract
In two influential contributions, Rosenbaum (2005, 2020) advocated for using the distances between component-wise ranks, instead of the original data values, to measure covariate similarity when constructing matching estimators of average treatment effects. While the intuitive benefits of using covariate ranks for matching estimation are apparent, there is no theoretical understanding of such procedures in the literature. We fill this gap by demonstrating that Rosenbaum's rank-based matching estimator, when coupled with a regression adjustment, enjoys the properties of double robustness and semiparametric efficiency without the need to enforce restrictive covariate moment assumptions. Our theoretical findings further emphasize the statistical virtues of employing ranks for estimation and inference, more broadly aligning with the insights put forth by Peter Bickel in his 2004 Rietz lecture (Bickel, 2004). Comment: Assumption 4.1 is slightly weakened in this version.
- Published
- 2023
8. Inference with Mondrian Random Forests
- Author
-
Cattaneo, Matias D., Klusowski, Jason M., and Underwood, William G.
- Subjects
Mathematics - Statistics Theory, Statistics - Methodology, Statistics - Machine Learning, 62G08 (Primary), 62G05, 62G20 (Secondary)
- Abstract
Random forests are popular methods for regression and classification analysis, and many different variants have been proposed in recent years. One interesting example is the Mondrian random forest, in which the underlying constituent trees are constructed via a Mondrian process. We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator. By combining these results with a carefully crafted debiasing approach and an accurate variance estimator, we present valid statistical inference methods for the unknown regression function. These methods come with explicitly characterized error bounds in terms of the sample size, tree complexity parameter, and number of trees in the forest, and include coverage error rates for feasible confidence interval estimators. Our novel debiasing procedure for the Mondrian random forest also allows it to achieve the minimax-optimal point estimation convergence rate in mean squared error for multivariate $\beta$-Hölder regression functions, for all $\beta > 0$, provided that the underlying tuning parameters are chosen appropriately. Efficient and implementable algorithms are devised for both batch and online learning settings, and we carefully study the computational complexity of different Mondrian random forest implementations. Finally, simulations with synthetic data validate our theory and methodology, demonstrating their excellent finite-sample properties. Comment: 64 pages, 1 figure, 6 tables.
- Published
- 2023
9. On the Implicit Bias of Adam
- Author
-
Cattaneo, Matias D., Klusowski, Jason M., and Shigida, Boris
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control, Statistics - Computation, Statistics - Machine Learning
- Abstract
In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, conversely, impede its reduction (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.
- Published
- 2023
10. Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)
- Author
-
Cattaneo, Matias D., Ma, Xinwei, and Masatlioglu, Yusufcan
- Subjects
Economics, Applied Economics, Mathematical Sciences, Commerce, Management, Tourism and Services, Econometrics
- Published
- 2023
11. Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)
- Author
-
Cattaneo, Matias D., Ma, Xinwei, and Masatlioglu, Yusufcan
- Subjects
Economics - Theoretical Economics, Economics - Econometrics, Statistics - Methodology
- Abstract
Barseghyan and Molinari (2023) give sufficient conditions for semi-nonparametric point identification of parameters of interest in a mixture model of decision-making under risk, allowing for unobserved heterogeneity in utility functions and limited consideration. A key assumption in the model is that the heterogeneity of risk preferences is unobservable but context-independent. In this comment, we build on their insights and present identification results in a setting where the risk preferences are allowed to be context-dependent.
- Published
- 2023
12. Bootstrap-Assisted Inference for Generalized Grenander-type Estimators
- Author
-
Cattaneo, Matias D., Jansson, Michael, and Nagasawa, Kenichi
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics, Statistics - Methodology
- Abstract
Westling and Carone (2020) proposed a framework for studying the large sample distributional properties of generalized Grenander-type estimators, a versatile class of nonparametric estimators of monotone functions. The limiting distribution of those estimators is representable as the left derivative of the greatest convex minorant of a Gaussian process whose monomial mean can be of unknown order (when the degree of flatness of the function of interest is unknown). The standard nonparametric bootstrap is unable to consistently approximate the large sample distribution of the generalized Grenander-type estimators even if the monomial order of the mean is known, making statistical inference a challenging endeavour in applications. To address this inferential problem, we present a bootstrap-assisted inference procedure for generalized Grenander-type estimators. The procedure relies on a carefully crafted, yet automatic, transformation of the estimator. Moreover, our proposed method can be made "flatness robust" in the sense that it can be made adaptive to the (possibly unknown) degree of flatness of the function of interest. The method requires only the consistent estimation of a single scalar quantity, for which we propose an automatic procedure based on numerical derivative estimation and the generalized jackknife. Under random sampling, our inference method can be implemented using a computationally attractive exchangeable bootstrap procedure. We illustrate our methods with examples and we also provide a small simulation study. The development of formal results is made possible by some technical results that may be of independent interest.
- Published
- 2023
13. A Guide to Regression Discontinuity Designs in Medical Applications
- Author
-
Cattaneo, Matias D., Keele, Luke, and Titiunik, Rocio
- Subjects
Statistics - Methodology, Economics - Econometrics, Statistics - Applications
- Abstract
We present a practical guide for the analysis of regression discontinuity (RD) designs in biomedical contexts. We begin by introducing key concepts, assumptions, and estimands within both the continuity-based framework and the local randomization framework. We then discuss modern estimation and inference methods within both frameworks, including approaches for bandwidth or local neighborhood selection, optimal treatment effect point estimation, and robust bias-corrected inference methods for uncertainty quantification. We also overview empirical falsification tests that can be used to support key assumptions. Our discussion focuses on two particular features that are relevant in biomedical research: (i) fuzzy RD designs, which often arise when therapeutic treatments are based on clinical guidelines but patients with scores near the cutoff are treated contrary to the assignment rule; and (ii) RD designs with discrete scores, which are ubiquitous in biomedical applications. We illustrate our discussion with three empirical applications: the effect of CD4 guidelines for anti-retroviral therapy on retention of HIV patients in South Africa, the effect of genetic guidelines for chemotherapy on breast cancer recurrence in the United States, and the effects of age-based patient cost-sharing on healthcare utilization in Taiwan. We provide replication materials employing publicly available statistical software in Python, R and Stata, offering researchers all necessary tools to conduct an RD analysis.
- Published
- 2023
14. A Practical Introduction to Regression Discontinuity Designs: Extensions
- Author
-
Cattaneo, Matias D., Idrobo, Nicolas, and Titiunik, Rocio
- Subjects
Statistics - Methodology, Economics - Econometrics, Statistics - Applications, Statistics - Computation
- Abstract
This monograph, together with its accompanying first part Cattaneo, Idrobo and Titiunik (2020), collects and expands the instructional materials we prepared for more than 50 short courses and workshops on Regression Discontinuity (RD) methodology that we taught between 2014 and 2023. In this second monograph, we discuss several topics in RD methodology that build on and extend the analysis of RD designs introduced in Cattaneo, Idrobo and Titiunik (2020). Our first goal is to present an alternative RD conceptual framework based on local randomization ideas. This methodological approach can be useful in RD designs with discretely-valued scores, and can also be used more broadly as a complement to the continuity-based approach in other settings. Then, employing both continuity-based and local randomization approaches, we extend the canonical Sharp RD design in multiple directions: fuzzy RD designs, RD designs with discrete scores, and multi-dimensional RD designs. The goal of our two-part monograph is purposely practical and hence we focus on the empirical analysis of RD designs.
- Published
- 2023
15. Higher-order Refinements of Small Bandwidth Asymptotics for Density-Weighted Average Derivative Estimators
- Author
-
Cattaneo, Matias D., Farrell, Max H., Jansson, Michael, and Masini, Ricardo
- Subjects
Economics - Econometrics, Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
The density weighted average derivative (DWAD) of a regression function is a canonical parameter of interest in economics. Classical first-order large sample distribution theory for kernel-based DWAD estimators relies on tuning parameter restrictions and model assumptions that imply an asymptotic linear representation of the point estimator. These conditions can be restrictive, and the resulting distributional approximation may not be representative of the actual sampling distribution of the statistic of interest. In particular, the approximation is not robust to bandwidth choice. Small bandwidth asymptotics offers an alternative, more general distributional approximation for kernel-based DWAD estimators that allows for, but does not require, asymptotic linearity. The resulting inference procedures based on small bandwidth asymptotics were found to exhibit superior finite sample performance in simulations, but no formal theory justifying that empirical success is available in the literature. Employing Edgeworth expansions, this paper shows that small bandwidth asymptotic approximations lead to inference procedures with higher-order distributional properties that are demonstrably superior to those of procedures based on asymptotic linear approximations.
- Published
- 2022
16. On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation
- Author
-
Cattaneo, Matias D., Klusowski, Jason M., and Tian, Peter M.
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory
- Abstract
Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference are conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm with non-vanishing probability, even with pruning. Instead, the convergence may be arbitrarily slow or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered.
- Published
- 2022
17. Convergence Rates of Oblique Regression Trees for Flexible Function Libraries
- Author
-
Cattaneo, Matias D., Chandak, Rajita, and Klusowski, Jason M.
- Subjects
Mathematics - Statistics Theory, Statistics - Methodology
- Abstract
We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While oblique trees have garnered significant attention from the computer science and optimization communities since the mid-80s, the advantages they offer over their axis-aligned counterparts remain only empirically justified, and explanations for their success are largely based on heuristics. Filling this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline to compare and contrast decision trees with other less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade off interpretability with accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve predictive accuracy similar to that of neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools in the literature. Our results rely on (arguably surprising) connections between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms), which may be of independent theoretical interest. Using our theory and methods, we also study oblique random forests.
- Published
- 2022
18. Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption
- Author
-
Cattaneo, Matias D., Feng, Yingjie, Palomba, Filippo, and Titiunik, Rocio
- Subjects
Economics - Econometrics, Statistics - Applications, Statistics - Computation, Statistics - Methodology
- Abstract
We propose principled prediction intervals to quantify the uncertainty of a large class of synthetic control predictions (or estimators) in settings with staggered treatment adoption, offering precise non-asymptotic coverage probability guarantees. From a methodological perspective, we provide a detailed discussion of different causal quantities to be predicted, which we call causal predictands, allowing for multiple treated units with treatment adoption at possibly different points in time. From a theoretical perspective, our uncertainty quantification methods improve on prior literature by (i) covering a large class of causal predictands in staggered adoption settings, (ii) allowing for synthetic control methods with possibly nonlinear constraints, (iii) proposing scalable robust conic optimization methods and principled data-driven tuning parameter selection, and (iv) offering valid uniform inference across post-treatment periods. We illustrate our methodology with an empirical application studying the effects of economic liberalization on real GDP per capita for Sub-Saharan African countries. Companion general-purpose software packages are provided in Python, R, and Stata.
- Published
- 2022
19. Yurinskii's Coupling for Martingales
- Author
-
Cattaneo, Matias D., Masini, Ricardo P., and Underwood, William G.
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics, Statistics - Methodology, 62E20, 62G20, 60G42
- Abstract
Yurinskii's coupling is a popular theoretical tool for non-asymptotic distributional analysis in mathematical statistics and applied probability, offering a Gaussian strong approximation with an explicit error bound under easily verifiable conditions. Originally stated in $\ell^2$-norm for sums of independent random vectors, it has recently been extended both to the $\ell^p$-norm, for $1 \leq p \leq \infty$, and to vector-valued martingales in $\ell^2$-norm, under some strong conditions. We present as our main result a Yurinskii coupling for approximate martingales in $\ell^p$-norm, under substantially weaker conditions than those previously imposed. Our formulation further allows for the coupling variable to follow a more general Gaussian mixture distribution, and we provide a novel third-order coupling method which gives tighter approximations in certain settings. We specialize our main result to mixingales, martingales, and independent data, and derive uniform Gaussian mixture strong approximations for martingale empirical processes. Applications to nonparametric partitioning-based and local polynomial regression procedures are provided, alongside central limit theorems for high-dimensional martingale vectors. Comment: 57 pages, 1 figure.
- Published
- 2022
20. Beta-Sorted Portfolios
- Author
-
Cattaneo, Matias D., Crump, Richard K., and Wang, Weining
- Subjects
Economics - Econometrics
- Abstract
Beta-sorted portfolios -- portfolios comprised of assets with similar covariation to selected risk factors -- are a popular tool in empirical finance to analyze models of (conditional) expected returns. Despite their widespread use, little is known of their statistical properties in contrast to comparable procedures such as two-pass regressions. We formally investigate the properties of beta-sorted portfolio returns by casting the procedure as a two-step nonparametric estimator with a nonparametric first step and a beta-adaptive portfolio construction. Our framework rationalizes the well-known estimation algorithm with precise economic and statistical assumptions on the general data generating process and characterizes its key features. We study beta-sorted portfolios both for a single cross-section and for aggregation over time (e.g., the grand mean), offering conditions that ensure consistency and asymptotic normality along with new uniform inference procedures allowing for uncertainty quantification and testing of various relevant hypotheses in financial applications. We also highlight some limitations of current empirical practices and discuss what inferences can and cannot be drawn from returns to beta-sorted portfolios for either a single cross-section or across the whole sample. Finally, we illustrate the functionality of our new procedures in an empirical application.
- Published
- 2022
21. A Practical Introduction to Regression Discontinuity Designs: Extensions
- Author
-
Cattaneo, Matias D., Idrobo, Nicolas, and Titiunik, Rocío
- Published
- 2024
22. lpcde: Estimation and Inference for Local Polynomial Conditional Density Estimators
- Author
-
Cattaneo, Matias D., Chandak, Rajita, Jansson, Michael, and Ma, Xinwei
- Subjects
Statistics - Computation, Statistics - Applications, Statistics - Methodology
- Abstract
This paper discusses the R package lpcde, which stands for local polynomial conditional density estimation. It implements the kernel-based local polynomial smoothing methods introduced in Cattaneo, Chandak, Jansson, and Ma (2024) for statistical estimation and inference of conditional distributions, densities, and derivatives thereof. The package offers mean square error optimal bandwidth selection and associated point estimators, as well as uncertainty quantification based on robust bias correction both pointwise (e.g., confidence intervals) and uniformly (e.g., confidence bands) over evaluation points. The methods implemented are boundary adaptive whenever the data is compactly supported. The package also implements regularized conditional density estimation methods, ensuring the resulting density estimate is non-negative and integrates to one. We contrast the functionalities of lpcde with existing R packages for conditional density estimation, and showcase its main features using simulated data.
- Published
- 2022
23. Boundary Adaptive Local Polynomial Conditional Density Estimators
- Author
-
Cattaneo, Matias D., Chandak, Rajita, Jansson, Michael, and Ma, Xinwei
- Subjects
Mathematics - Statistics Theory, Economics - Econometrics, Statistics - Methodology
- Abstract
We begin by introducing a class of conditional density estimators based on local polynomial techniques. The estimators are boundary adaptive and easy to implement. We then study the (pointwise and) uniform statistical properties of the estimators, offering characterizations of both probability concentration and distributional approximation. In particular, we establish uniform convergence rates in probability and valid Gaussian distributional approximations for the Studentized t-statistic process. We also discuss implementation issues such as consistent estimation of the covariance function for the Gaussian approximation, optimal integrated mean squared error bandwidth selection, and valid robust bias-corrected inference. We illustrate the applicability of our results by constructing valid confidence bands and hypothesis tests for both parametric specification and shape constraints, explicitly characterizing their approximation errors. A companion R software package implementing our main results is provided.
- Published
- 2022
24. scpi: Uncertainty Quantification for Synthetic Control Methods
- Author
-
Cattaneo, Matias D., Feng, Yingjie, Palomba, Filippo, and Titiunik, Rocio
- Subjects
Statistics - Methodology, Economics - Econometrics, Statistics - Applications, Statistics - Computation
- Abstract
The synthetic control method offers a way to quantify the effect of an intervention using weighted averages of untreated units to approximate the counterfactual outcome that the treated unit(s) would have experienced in the absence of the intervention. This method is useful for program evaluation and causal inference in observational studies. We introduce the software package scpi for prediction and inference using synthetic controls, implemented in Python, R, and Stata. For point estimation or prediction of treatment effects, the package offers an array of (possibly penalized) approaches leveraging the latest optimization methods. For uncertainty quantification, the package offers the prediction interval methods introduced by Cattaneo, Feng and Titiunik (2021) and Cattaneo, Feng, Palomba and Titiunik (2022). The paper includes numerical illustrations and a comparison with other synthetic control software.
- Published
- 2022
25. Uniform Inference for Kernel Density Estimators with Dyadic Data
- Author
-
Cattaneo, Matias D., Feng, Yingjie, and Underwood, William G.
- Subjects
Mathematics - Statistics Theory ,Statistics - Methodology ,62G05, 62G07, 62M99 (Primary) 91D30, 90B15 (Secondary) - Abstract
Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized $t$-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. We showcase the broad applicability of our results by developing novel counterfactual density estimation and inference methodology for dyadic data, which can be used for causal inference and program evaluation. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property making our analysis somewhat delicate. Nonetheless our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss inference procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world trade data, for which we make comparisons between observed and counterfactual trade distributions in different years. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest. Comment: Article: 23 pages, 3 figures. Supplemental appendix: 72 pages, 3 figures.
- Published
- 2022
26. Attention Overload
- Author
-
Cattaneo, Matias D., Cheung, Paul, Ma, Xinwei, and Masatlioglu, Yusufcan
- Subjects
Economics - Theoretical Economics ,Economics - Econometrics - Abstract
We introduce an Attention Overload Model that captures the idea that alternatives compete for the decision maker's attention, and hence the attention that each alternative receives decreases as the choice problem becomes larger. Using this nonparametric restriction on the random attention formation, we show that a fruitful revealed preference theory can be developed and provide testable implications on the observed choice behavior that can be used to (point or partially) identify the decision maker's preference and attention frequency. We then enhance our attention overload model to accommodate heterogeneous preferences. Due to the nonparametric nature of our identifying assumption, we must discipline the amount of heterogeneity in the choice model: we propose the idea of List-based Attention Overload, where alternatives are presented to the decision makers as a list that correlates with both heterogeneous preferences and random attention. We show that preference and attention frequencies are (point or partially) identifiable under nonparametric assumptions on the list and attention formation mechanisms, even when the true underlying list is unknown to the researcher. Building on our identification results, for both preference and attention frequencies, we develop econometric methods for estimation and inference that are valid in settings with a large number of alternatives and choice problems, a distinctive feature of the economic environment we consider. We provide a software package in R implementing our empirical methods, and illustrate them in a simulation study.
- Published
- 2021
27. Covariate Adjustment in Regression Discontinuity Designs
- Author
-
Cattaneo, Matias D., Keele, Luke, and Titiunik, Rocio
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
The Regression Discontinuity (RD) design is a widely used non-experimental method for causal inference and program evaluation. While its canonical formulation only requires a score and an outcome variable, it is common in empirical work to encounter RD analyses where additional variables are used for adjustment. This practice has led to misconceptions about the role of covariate adjustment in RD analysis, from both methodological and empirical perspectives. In this chapter, we review the different roles of covariate adjustment in RD designs, and offer methodological guidance for its correct use.
- Published
- 2021
28. Regression Discontinuity Designs
- Author
-
Cattaneo, Matias D. and Titiunik, Rocio
- Subjects
Economics - Econometrics ,Statistics - Applications ,Statistics - Methodology - Abstract
The Regression Discontinuity (RD) design is one of the most widely used non-experimental methods for causal inference and program evaluation. Over the last two decades, statistical and econometric methods for RD analysis have expanded and matured, and there is now a large number of methodological results for RD identification, estimation, inference, and validation. We offer a curated review of this methodological literature organized around the two most popular frameworks for the analysis and interpretation of RD designs: the continuity framework and the local randomization framework. For each framework, we discuss three main topics: (i) designs and parameters, which focuses on different types of RD settings and treatment effects of interest; (ii) estimation and inference, which presents the most popular methods based on local polynomial regression and analysis of experiments, as well as refinements, extensions, and alternatives; and (iii) validation and falsification, which summarizes an array of mostly empirical approaches to support the validity of RD designs in practice.
- Published
- 2021
29. Local regression distribution estimators
- Author
-
Cattaneo, Matias D., Jansson, Michael, and Ma, Xinwei
- Published
- 2024
30. Local Regression Distribution Estimators
- Author
-
Cattaneo, Matias D., Jansson, Michael, and Ma, Xinwei
- Subjects
Economics - Econometrics ,Mathematics - Statistics Theory ,Statistics - Methodology - Abstract
This paper investigates the large sample properties of local regression distribution estimators, which include a class of boundary adaptive density estimators as a prime example. First, we establish a pointwise Gaussian large sample distributional approximation in a unified way, allowing for both boundary and interior evaluation points simultaneously. Using this result, we study the asymptotic efficiency of the estimators, and show that a carefully crafted minimum distance implementation based on "redundant" regressors can lead to efficiency gains. Second, we establish uniform linearizations and strong approximations for the estimators, and employ these results to construct valid confidence bands. Third, we develop extensions to weighted distributions with estimated weights and to local $L^{2}$ least squares estimation. Finally, we illustrate our methods with two applications in program evaluation: counterfactual density testing, and IV specification and heterogeneity density analysis. Companion software packages in Stata and R are available.
- Published
- 2020
31. Local regression distribution estimators
- Author
-
Cattaneo, Matias D, Jansson, Michael, and Ma, Xinwei
- Subjects
Statistics ,Applied Economics ,Econometrics
- Published
- 2021
32. Analysis of Regression Discontinuity Designs with Multiple Cutoffs or Multiple Scores
- Author
-
Cattaneo, Matias D., Titiunik, Rocio, and Vazquez-Bare, Gonzalo
- Subjects
Statistics - Computation ,Economics - Econometrics - Abstract
We introduce the Stata (and R) package rdmulti, which includes three commands (rdmc, rdmcplot, rdms) for analyzing Regression Discontinuity (RD) designs with multiple cutoffs or multiple scores. The command rdmc applies to non-cumulative and cumulative multi-cutoff RD settings. It calculates pooled and cutoff-specific RD treatment effects, and provides robust bias-corrected inference procedures. Post-estimation and inference are allowed. The command rdmcplot offers RD plots for multi-cutoff settings. Finally, the command rdms concerns multi-score settings, covering in particular cumulative cutoffs and contexts with two running variables. It also calculates pooled and cutoff-specific RD treatment effects, provides robust bias-corrected inference procedures, and allows for post-estimation and inference. These commands employ the Stata (and R) package rdrobust for plotting, estimation, and inference. Companion R functions with the same syntax and capabilities are provided.
- Published
- 2019
33. Prediction Intervals for Synthetic Control Methods
- Author
-
Cattaneo, Matias D., Feng, Yingjie, and Titiunik, Rocio
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
Uncertainty quantification is a fundamental problem in the analysis and interpretation of synthetic control (SC) methods. We develop conditional prediction intervals in the SC framework, and provide conditions under which these intervals offer finite-sample probability guarantees. Our method allows for covariate adjustment and non-stationary data. The construction begins by noting that the statistical uncertainty of the SC prediction is governed by two distinct sources of randomness: one coming from the construction of the (likely misspecified) SC weights in the pre-treatment period, and the other coming from the unobservable stochastic error in the post-treatment period when the treatment effect is analyzed. Accordingly, our proposed prediction intervals are constructed taking into account both sources of randomness. For implementation, we propose a simulation-based approach along with finite-sample-based probability bound arguments, naturally leading to principled sensitivity analysis methods. We illustrate the numerical performance of our methods using empirical applications and a small simulation study. Python, R and Stata software packages implementing our methodology are available.
- Published
- 2019
34. A Practical Introduction to Regression Discontinuity Designs: Foundations
- Author
-
Cattaneo, Matias D., Idrobo, Nicolas, and Titiunik, Rocio
- Subjects
Statistics - Methodology ,Economics - Econometrics ,Statistics - Applications ,Statistics - Computation - Abstract
In this Element and its accompanying Element, Matias D. Cattaneo, Nicolas Idrobo, and Rocio Titiunik provide an accessible and practical guide for the analysis and interpretation of Regression Discontinuity (RD) designs that encourages the use of a common set of practices and facilitates the accumulation of RD-based empirical evidence. In this Element, the authors discuss the foundations of the canonical Sharp RD design, which has the following features: (i) the score is continuously distributed and has only one dimension, (ii) there is only one cutoff, and (iii) compliance with the treatment assignment is perfect. In the accompanying Element, the authors discuss practical and conceptual extensions to the basic RD setup.
- Published
- 2019
35. lpdensity: Local Polynomial Density Estimation and Inference
- Author
-
Cattaneo, Matias D., Jansson, Michael, and Ma, Xinwei
- Subjects
Statistics - Computation ,Economics - Econometrics ,Statistics - Applications - Abstract
Density estimation and inference methods are widely used in empirical work. When the underlying distribution has compact support, conventional kernel-based density estimators are no longer consistent near or at the boundary because of their well-known boundary bias. Alternative smoothing methods are available to handle boundary points in density estimation, but they all require additional tuning parameter choices or other typically ad hoc modifications depending on the evaluation point and/or approach considered. This article discusses the R and Stata package lpdensity implementing a novel local polynomial density estimator proposed and studied in Cattaneo, Jansson, and Ma (2020, 2021), which is boundary adaptive and involves only one tuning parameter. The methods implemented also cover local polynomial estimation of the cumulative distribution function and density derivatives. In addition to point estimation and graphical procedures, the package offers consistent variance estimators, mean squared error optimal bandwidth selection, robust bias-corrected inference, and confidence bands construction, among other features. A comparison with other density estimation packages available in R using a Monte Carlo experiment is provided.
- Published
- 2019
36. The Regression Discontinuity Design
- Author
-
Cattaneo, Matias D., Titiunik, Rocio, and Vazquez-Bare, Gonzalo
- Subjects
Economics - Econometrics ,Statistics - Applications ,Statistics - Methodology - Abstract
This handbook chapter gives an introduction to the sharp regression discontinuity design, covering identification, estimation, inference, and falsification methods.
- Published
- 2019
37. nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference
- Author
-
Calonico, Sebastian, Cattaneo, Matias D., and Farrell, Max H.
- Subjects
Statistics - Computation ,Economics - Econometrics ,Statistics - Methodology - Abstract
Nonparametric kernel density and local polynomial regression estimators are very popular in Statistics, Economics, and many other disciplines. They are routinely employed in applied work, either as part of the main empirical analysis or as a preliminary ingredient entering some other estimation or inference procedure. This article describes the main methodological and numerical features of the software package nprobust, which offers an array of estimation and inference procedures for nonparametric kernel-based density and local polynomial regression methods, implemented in both the R and Stata statistical platforms. The package includes not only classical bandwidth selection, estimation, and inference methods (Wand and Jones, 1995; Fan and Gijbels, 1996), but also other recent developments in the statistics and econometrics literatures such as robust bias-corrected inference and coverage error optimal bandwidth selection (Calonico, Cattaneo and Farrell, 2018, 2019). Furthermore, this article also proposes a simple way of estimating optimal bandwidths in practice that always delivers the optimal mean square error convergence rate regardless of the specific evaluation point, that is, no matter whether it is implemented at a boundary or interior point. Numerical performance is illustrated using an empirical application and simulated data, where a detailed numerical comparison with other R packages is given.
- Published
- 2019
38. lspartition: Partitioning-Based Least Squares Regression
- Author
-
Cattaneo, Matias D., Farrell, Max H., and Feng, Yingjie
- Subjects
Statistics - Computation ,Economics - Econometrics ,Statistics - Methodology - Abstract
Nonparametric partitioning-based least squares regression is an important tool in empirical work. Common examples include regressions based on splines, wavelets, and piecewise polynomials. This article discusses the main methodological and numerical features of the R software package lspartition, which implements results for partitioning-based least squares (series) regression estimation and inference from Cattaneo and Farrell (2013) and Cattaneo, Farrell, and Feng (2019). These results cover the multivariate regression function as well as its derivatives. First, the package provides data-driven methods to choose the number of partition knots optimally, according to integrated mean squared error, yielding optimal point estimation. Second, robust bias correction is implemented to combine this point estimator with valid inference. Third, the package provides estimates and inference for the unknown function both pointwise and uniformly in the conditioning variables. In particular, valid confidence bands are provided. Finally, an extension to two-sample analysis is developed, which can be used in treatment-control comparisons and related problems.
- Published
- 2019
39. Average Density Estimators: Efficiency and Bootstrap Consistency
- Author
-
Cattaneo, Matias D. and Jansson, Michael
- Subjects
Economics - Econometrics ,Mathematics - Statistics Theory - Abstract
This paper highlights a tension between semiparametric efficiency and bootstrap consistency in the context of a canonical semiparametric estimation problem, namely the problem of estimating the average density. It is shown that although simple plug-in estimators suffer from bias problems preventing them from achieving semiparametric efficiency under minimal smoothness conditions, the nonparametric bootstrap automatically corrects for this bias and that, as a result, these seemingly inferior estimators achieve bootstrap consistency under minimal smoothness conditions. In contrast, several "debiased" estimators that achieve semiparametric efficiency under minimal smoothness conditions do not achieve bootstrap consistency under those same conditions.
- Published
- 2019
40. Binscatter Regressions
- Author
-
Cattaneo, Matias D., Crump, Richard K., Farrell, Max H., and Feng, Yingjie
- Subjects
Economics - Econometrics ,Statistics - Computation - Abstract
We introduce the package Binsreg, which implements the binscatter methods developed by Cattaneo, Crump, Farrell, and Feng (2024b,a). The package includes seven commands: binsreg, binslogit, binsprobit, binsqreg, binstest, binspwc, and binsregselect. The first four commands implement binscatter plotting, point estimation, and uncertainty quantification (confidence intervals and confidence bands) for least squares linear binscatter regression (binsreg) and for nonlinear binscatter regression (binslogit for Logit regression, binsprobit for Probit regression, and binsqreg for quantile regression). The next two commands focus on pointwise and uniform inference: binstest implements hypothesis testing procedures for parametric specifications and for nonparametric shape restrictions of the unknown regression function, while binspwc implements multi-group pairwise statistical comparisons. Finally, the command binsregselect implements data-driven number of bins selectors. The commands offer binned scatter plots, and allow for covariate adjustment, weighting, clustering, and multi-sample analysis, which is useful when studying treatment effect heterogeneity in randomized and observational studies, among many other features.
- Published
- 2019
41. On Binscatter
- Author
-
Cattaneo, Matias D., Crump, Richard K., Farrell, Max H., and Feng, Yingjie
- Subjects
Economics - Econometrics ,Statistics - Methodology ,Statistics - Machine Learning - Abstract
Binscatter is a popular method for visualizing bivariate relationships and conducting informal specification testing. We study the properties of this method formally and develop enhanced visualization and econometric binscatter tools. These include estimating conditional means with optimal binning and quantifying uncertainty. We also highlight a methodological problem related to covariate adjustment that can yield incorrect conclusions. We revisit two applications using our methodology and find substantially different results relative to those obtained using prior informal binscatter methods. General purpose software in Python, R, and Stata is provided. Our technical work is of independent interest for the nonparametric partition-based estimation literature.
- Published
- 2019
42. Simple Local Polynomial Density Estimators
- Author
-
Cattaneo, Matias D, Jansson, Michael, and Ma, Xinwei
- Subjects
Density estimation ,Local polynomial methods ,Manipulation test ,Regression discontinuity ,Statistics ,Econometrics ,Demography ,Statistics & Probability - Abstract
This article introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require prebinning or any other transformation of the data. We study the main asymptotic properties of the estimator, and use these results to provide principled estimation, inference, and bandwidth selection methods. As a substantive application of our results, we develop a novel discontinuity in density testing procedure, an important problem in regression discontinuity designs and other program evaluation settings. An illustrative empirical application is given. Two companion Stata and R software packages are provided.
- Published
- 2020
43. A Random Attention Model
- Author
-
Cattaneo, Matias D, Ma, Xinwei, Masatlioglu, Yusufcan, and Suleymanov, Elchin
- Subjects
Economics ,Commerce ,Management ,Tourism and Services
- Published
- 2020
44. Bootstrap‐Based Inference for Cube Root Asymptotics
- Author
-
Cattaneo, Matias D, Jansson, Michael, and Nagasawa, Kenichi
- Subjects
Economics ,Econometrics ,Cube root asymptotics ,bootstrapping ,maximum score ,empirical risk minimization ,Economic Theory ,Applied Economics ,Applied economics ,Economic theory - Abstract
This paper proposes a valid bootstrap-based distributional approximation for M-estimators exhibiting a Chernoff (1964)-type limiting distribution. For estimators of this kind, the standard nonparametric bootstrap is inconsistent. The method proposed herein is based on the nonparametric bootstrap, but restores consistency by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy-to-implement resampling method for inference that is conceptually distinct from other available distributional approximations. We illustrate the applicability of our results with four examples in econometrics and machine learning.
- Published
- 2020
45. Covariate Adjustment in Regression Discontinuity Designs
- Author
-
Cattaneo, Matias D., primary, Keele, Luke, additional, and Titiunik, Rocío, additional
- Published
- 2023
46. Simple Local Polynomial Density Estimators
- Author
-
Cattaneo, Matias D., Jansson, Michael, and Ma, Xinwei
- Subjects
Economics - Econometrics ,Statistics - Methodology - Abstract
This paper introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require pre-binning or any other transformation of the data. We study the main asymptotic properties of the estimator, and use these results to provide principled estimation, inference, and bandwidth selection methods. As a substantive application of our results, we develop a novel discontinuity in density testing procedure, an important problem in regression discontinuity designs and other program evaluation settings. An illustrative empirical application is given. Two companion Stata and R software packages are provided.
- Published
- 2018
47. Regression Discontinuity Designs Using Covariates
- Author
-
Calonico, Sebastian, Cattaneo, Matias D., Farrell, Max H., and Titiunik, Rocio
- Subjects
Economics - Econometrics ,Statistics - Methodology - Abstract
We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additive separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and characterize the potential for estimation and inference improvements. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.
- Published
- 2018
48. Characteristic-Sorted Portfolios: Estimation and Inference
- Author
-
Cattaneo, Matias D., Crump, Richard K., Farrell, Max H., and Schaumburg, Ernst
- Subjects
Economics - Econometrics ,Economics - General Economics ,Statistics - Methodology - Abstract
Portfolio sorting is ubiquitous in the empirical finance literature, where it has been widely used to identify pricing anomalies. Despite its popularity, little attention has been paid to the statistical properties of the procedure. We develop a general framework for portfolio sorting by casting it as a nonparametric estimator. We present valid asymptotic inference methods and a valid mean square error expansion of the estimator leading to an optimal choice for the number of portfolios. In practical settings, the optimal choice may be much larger than the standard choices of 5 or 10. To illustrate the relevance of our results, we revisit the size and momentum anomalies.
- Published
- 2018
49. Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs
- Author
-
Calonico, Sebastian, Cattaneo, Matias D., and Farrell, Max H.
- Subjects
Economics - Econometrics ,Statistics - Methodology - Abstract
Modern empirical work in Regression Discontinuity (RD) designs often employs local polynomial estimation and inference with a mean square error (MSE) optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment effect estimator, but is by construction invalid for inference. Robust bias corrected (RBC) inference methods are valid when using the MSE-optimal bandwidth, but we show they yield suboptimal confidence intervals in terms of coverage error. We establish valid coverage error expansions for RBC confidence interval estimators and use these results to propose new inference-optimal bandwidth choices for forming these intervals. We find that the standard MSE-optimal bandwidth for the RD point estimator is too large when the goal is to construct RBC confidence intervals with the smallest coverage error. We further optimize the constant terms behind the coverage error to derive new optimal choices for the auxiliary bandwidth required for RBC inference. Our expansions also establish that RBC inference yields higher-order refinements (relative to traditional undersmoothing) in the context of RD designs. Our main results cover sharp and sharp kink RD designs under conditional heteroskedasticity, and we discuss extensions to fuzzy and other RD designs, clustered sampling, and pre-intervention covariate adjustment. The theoretical findings are illustrated with a Monte Carlo experiment and an empirical application, and the main methodological results are available in R and Stata packages.
- Published
- 2018
50. Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs
- Author
-
Cattaneo, Matias D., Keele, Luke, Titiunik, Rocio, and Vazquez-Bare, Gonzalo
- Subjects
Economics - Econometrics ,Statistics - Applications ,Statistics - Methodology - Abstract
In non-experimental settings, the Regression Discontinuity (RD) design is one of the most credible identification strategies for program evaluation and causal inference. However, RD treatment effect estimands are necessarily local, making statistical methods for the extrapolation of these effects a key area for development. We introduce a new method for extrapolation of RD effects that relies on the presence of multiple cutoffs, and is therefore design-based. Our approach employs an easy-to-interpret identifying assumption that mimics the idea of "common trends" in difference-in-differences designs. We illustrate our methods with data on a subsidized loan program on post-education attendance in Colombia, and offer new evidence on program effects for students with test scores away from the cutoff that determined program eligibility.
- Published
- 2018