31 results for "randomized experiment"
Search Results
2. The LOOP Estimator: Adjusting for Covariates in Randomized Experiments
- Author
Edward Wu and Johann A. Gagnon-Bartsch
- Subjects
Randomized experiment, Estimator, Randomized controlled trial, Statistical analyses, Causal inference, Statistics, Covariate
- Abstract
Background: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be used to analyze the data. Typically, these analyses will involve adjusting for small imbalances in baseline covariates. However, this poses a dilemma, as adjusting for too many covariates can hurt precision more than it helps, and it is often unclear which covariates are predictive of outcome prior to conducting the experiment. Objectives: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data. Results: In this article, we propose the “leave-one-out potential outcomes” estimator. We leave out each observation and then impute that observation’s treatment and control potential outcomes using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman–Rubin model, generally performs at least as well as the unadjusted estimator, and the experimental randomization largely justifies the statistical assumptions made.
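The leave-one-out mechanics described in this abstract can be sketched in a few lines. The sketch below is a simplified imputation variant on simulated data (all variable names and numbers are hypothetical); the published LOOP estimator applies a particular weighting to the imputed potential outcomes to guarantee unbiasedness under the Neyman–Rubin model, which this simplified version does not reproduce.

```python
# Simplified leave-one-out imputation on simulated data (hypothetical setup).
# The published LOOP estimator weights these imputations to remain unbiased;
# this sketch only illustrates the leave-one-out mechanics.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))                    # baseline covariates
t = rng.integers(0, 2, size=n)                 # Bernoulli(1/2) assignment
y = X[:, 0] + 0.5 * t + rng.normal(size=n)     # true effect = 0.5

effects = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                   # leave observation i out
    m1 = RandomForestRegressor(n_estimators=50, random_state=0)
    m0 = RandomForestRegressor(n_estimators=50, random_state=0)
    m1.fit(X[keep & (t == 1)], y[keep & (t == 1)])
    m0.fit(X[keep & (t == 0)], y[keep & (t == 0)])
    # Impute the held-out unit's unobserved potential outcome.
    y1 = y[i] if t[i] == 1 else m1.predict(X[i:i + 1])[0]
    y0 = y[i] if t[i] == 0 else m0.predict(X[i:i + 1])[0]
    effects[i] = y1 - y0

print("estimated average treatment effect:", effects.mean())
```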
- Published
- 2018
3. Using Bayesian Correspondence Criteria to Compare Results From a Randomized Experiment and a Quasi-Experiment Allowing Self-Selection
- Author
William R. Shadish, M. H. Clark, and David Rindskopf
- Subjects
Randomized experiment, Bayesian probability, Bayes Theorem, Empirical Research, Bias, Evaluation Studies as Topic, Research Design, Propensity score matching, Statistics, Treatment effect, Propensity Score, Quasi-experiment, Randomized Controlled Trials as Topic, Mathematics
- Abstract
Background: Randomized experiments yield unbiased estimates of treatment effect, but such experiments are not always feasible. So researchers have searched for conditions under which randomized and nonrandomized experiments can yield the same answer. This search requires well-justified and informative correspondence criteria, that is, criteria by which we can judge if the results from an appropriately adjusted nonrandomized experiment well-approximate results from randomized experiments. Past criteria have relied exclusively on frequentist statistics, using criteria such as whether results agree in sign or statistical significance or whether results differ significantly from each other. Objectives: In this article, we show how Bayesian correspondence criteria offer more varied, nuanced, and informative answers than those from frequentist approaches. Research design: We describe the conceptual bases of Bayesian correspondence criteria and then illustrate many possibilities using an example that compares results from a randomized experiment to results from a parallel nonequivalent comparison group experiment in which participants could choose their condition. Results: Results suggest that, in this case, the quasi-experiment reasonably approximated the randomized experiment. Conclusions: We conclude with a discussion of the advantages (computation of relevant quantities, interpretation, and estimation of quantities of interest for policy), disadvantages, and limitations of Bayesian correspondence criteria. We believe that in most circumstances, the advantages of Bayesian approaches far outweigh the disadvantages.
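One of the simpler correspondence criteria this abstract alludes to can be illustrated with normal posteriors. The sketch below assumes flat priors, normal likelihoods, and hypothetical effect estimates, standard errors, and tolerance; the article develops a much richer set of criteria than this.

```python
# Minimal sketch of a Bayesian correspondence criterion: the posterior
# probability that the quasi-experimental effect lies within a tolerance
# delta of the randomized benchmark. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
b_rct, se_rct = 0.30, 0.08   # benchmark estimate from the randomized experiment
b_qe, se_qe = 0.24, 0.10     # adjusted quasi-experimental estimate
delta = 0.10                 # tolerance for "close enough" correspondence

# With flat priors and normal likelihoods, the posteriors are normal.
draws_rct = rng.normal(b_rct, se_rct, size=100_000)
draws_qe = rng.normal(b_qe, se_qe, size=100_000)
diff = draws_qe - draws_rct

print("P(effects agree in sign):", np.mean(np.sign(draws_qe) == np.sign(draws_rct)))
print("P(|difference| < delta):", np.mean(np.abs(diff) < delta))
```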
- Published
- 2018
4. What Can Be Learned From Empirical Evaluations of Nonexperimental Methods?
- Author
Kylie L. Anglin, Peter M. Steiner, and Vivian C. Wong
- Subjects
Program evaluation, Randomized experiment, Target population, Causal inference, Econometrics
- Abstract
Given the widespread use of nonexperimental (NE) methods for assessing program impacts, there is a strong need to know whether NE approaches yield causally valid results in field settings. In within-study comparison (WSC) designs, the researcher compares treatment effects from an NE with those obtained from a randomized experiment that shares the same target population. The goal is to assess whether the stringent assumptions required for NE methods are likely to be met in practice. This essay provides an overview of recent efforts to empirically evaluate NE method performance in field settings. We discuss a brief history of the design, highlighting methodological innovations along the way. We also describe papers that are included in this two-volume special issue on WSC approaches and suggest future areas for consideration in the design, implementation, and analysis of WSCs.
- Published
- 2018
5. Comparative Regression Discontinuity: A Stress Test With Small Samples
- Author
Thomas D. Cook, Yang Tang, M. H. Clark, and Yasemin Kisbu-Sakarya
- Subjects
Randomized experiment, Reproducibility of Results, Statistical power, Research Design, Sample size determination, Data Interpretation (Statistical), Causal inference, Linear regression, Statistics, Covariate, Regression discontinuity design, Humans, Regression Analysis, Cutoff, Mathematics
- Abstract
Compared to the randomized experiment (RE), the regression discontinuity design (RDD) has three main limitations: (1) In expectation, its results are unbiased only at the treatment cutoff and not for the entire study population; (2) it is less efficient than the RE and so requires more cases for the same statistical power; and (3) it requires correctly specifying the functional form that relates the assignment and outcome variables. One way to overcome these limitations might be to add a no-treatment functional form to the basic RDD and include it in the outcome analysis as a comparison function rather than as a covariate, in order to increase power. Doing this creates a comparative regression discontinuity design (CRD). The CRD has three untreated regression lines. Two are in the untreated segment of the RDD—the usual RDD line and the added untreated comparison function—while the third is in the treated RDD segment. Also observed is the treated regression line in the treated segment. Recent studies comparing RE, RDD, and CRD causal estimates have found that CRD is more precise than RDD and produces valid causal estimates both at the treatment cutoff and along the rest of the assignment variable. The present study seeks to replicate these results, but with considerably smaller sample sizes. The power difference between RDD and CRD is replicated, but the bias results are not, either at the treatment cutoff or away from it. We conclude that CRD without large samples can be dangerous.
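For readers unfamiliar with the base design, the sketch below illustrates limitation (3) with a minimal sharp RDD fit on simulated data: the analyst must choose the functional form linking the assignment variable to the outcome. The CRD extension, which adds the untreated comparison function, is not shown; all numbers are hypothetical.

```python
# Minimal sharp RDD sketch on simulated data (hypothetical numbers).
# CRD would augment this model with an untreated comparison function.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, cutoff, true_effect = 500, 0.0, 0.4
a = rng.uniform(-1, 1, size=n)               # assignment variable
treat = (a >= cutoff).astype(float)          # sharp assignment rule
y = 1.0 + 0.8 * a + true_effect * treat + rng.normal(0, 0.5, size=n)

ac = a - cutoff                              # center at the cutoff
X = sm.add_constant(np.column_stack([treat, ac, treat * ac]))
fit = sm.OLS(y, X).fit()
print("effect at cutoff:", fit.params[1], "SE:", fit.bse[1])
```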
- Published
- 2018
6. Can Propensity Score Analysis Approximate Randomized Experiments Using Pretest and Demographic Information in Pre-K Intervention Research?
- Author
Mark W. Lipsey and Nianbo Dong
- Subjects
Male, Randomized experiment, Covariate, Humans, Propensity Score, Demography, Randomized Controlled Trials as Topic, United States, Ignorability, Evaluation Studies as Topic, Child (Preschool), Intervention research, Propensity score matching, Female
- Abstract
Background: It is unclear whether propensity score analysis (PSA) based on pretest and demographic covariates will meet the ignorability assumption for replicating the results of randomized experiments. Purpose: This study applies within-study comparisons to assess whether pre-Kindergarten (pre-K) treatment effects on achievement outcomes estimated using PSA based on a pretest and demographic covariates can approximate those found in a randomized experiment. Methods: Data—Four studies with samples of pre-K children each provided data on two math achievement outcome measures with baseline pretests and child demographic variables that included race, gender, age, language spoken at home, and mother’s highest education. Research Design and Data Analysis—A randomized study of a pre-K math curriculum provided benchmark estimates of effects on achievement measures. Comparison samples from other pre-K studies were then substituted for the original randomized control and the effects were reestimated using PSA. The correspondence was evaluated using multiple criteria. Results and Conclusions: The effect estimates using PSA were in the same direction as the benchmark estimates, had similar but not identical statistical significance, and did not differ from the benchmarks at statistically significant levels. However, the magnitude of the effect sizes differed and displayed both absolute and relative bias larger than required to show statistical equivalence with formal tests, but those results were not definitive because of the limited statistical power. We conclude that treatment effect estimates based on a single pretest and demographic covariates in PSA correspond to those from a randomized experiment on the most general criteria for equivalence.
- Published
- 2018
7. The Efficacy of the Rio Hondo DUI Court: A 2-Year Field Experiment.
- Author
MacDonald, John M., Morral, Andrew R., Raymond, Barbara, and Eibner, Christine
- Subjects
DRUNK driving, COURTS, CRIMINAL procedure, FIELD research, CORRECTIONAL institutions, SOCIAL sciences fieldwork, DRINKING & traffic accidents, LIFE change events
- Abstract
This study reports results from an evaluation of the experimental Rio Hondo driving under the influence (DUI) court of Los Angeles County, California. A total of 284 research participants who were randomly assigned to a DUI court or a traditional criminal court were assessed at baseline and at 24-month follow-up through interviews and official record checks. The interviews assessed the impact of the DUI court on self-reported drunk driving behavior, the completion of treatment, time spent in jail, alcohol use, and stressful life events. Official record checks assessed the impact of the DUI court on subsequent arrests for driving under the influence and other drinking-related behaviors. Few differences on any outcomes were observed between participants in the experimental DUI court and those assigned to the traditional court. The results suggest that the DUI court model had little additional therapeutic or public safety benefit over the traditional court process. The implication of these findings for the popularity of specialized courts for treating social problems is discussed.
- Published
- 2007
8. THE BALTIMORE CITY DRUG TREATMENT COURT: 3-Year Self-Report Outcome Study.
- Author
Gottfredson, Denise C., Kearley, Brook W., Najaka, Stacy S., and Rocha, Carlos M.
- Subjects
DRUG abuse treatment, SUBSTANCE abuse treatment, DRUGS & crime, SOCIAL interaction, MENTAL health
- Abstract
This study reports results from interviews with 157 research participants who were interviewed 3 years after randomization into treatment and control conditions in the evaluation of the Baltimore City Drug Treatment Court. The interviews asked about crime, substance use, welfare, employment, education, mental and physical health, and family and social relationships. Program participants reported less crime and substance use than did controls. Few differences between groups were observed on other outcomes, although treatment cases were less likely than controls to be on the welfare rolls at the time of the interview. Effects differed substantially according to the originating court.
- Published
- 2005
9. A Design-Based Approach to Improve External Validity in Welfare Policy Evaluations
- Author
Elizabeth Tipton and Laura R. Peck
- Subjects
Management science, Randomized experiment, Applied psychology, Psychological intervention, Reproducibility of Results, Public Policy, External validity, Evaluation Studies as Topic, Research Design, Job training, Cluster Analysis, Psychology, Welfare, Social Welfare
- Abstract
Background: Large-scale randomized experiments are important for determining how policy interventions change average outcomes. Researchers have begun developing methods to improve the external validity of these experiments. One new approach is a balanced sampling method for site selection, which does not require random sampling and takes into account the practicalities of site recruitment including high nonresponse. Method: The goal of balanced sampling is to develop a strategic sample selection plan that results in a sample that is compositionally similar to a well-defined inference population. To do so, a population frame is created and then divided into strata, which “focuses” recruiters on specific subpopulations. Units within these strata are then ranked, thus identifying “replacements” similar to sites that can be recruited when the ideal site refuses to participate in the experiment. Result: In this article, we consider how a balanced sample strategic site selection method might be implemented in a welfare policy evaluation. Conclusion: We find that simply developing a population frame can be challenging, with three possible and reasonable options arising in the welfare policy arena. Using relevant study-specific contextual variables, we craft a recruitment plan that considers nonresponse.
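The stratify-then-rank logic described under Method can be sketched as follows. The population frame, variables, and quartile-based strata below are hypothetical stand-ins, not the study's actual frame or stratification scheme.

```python
# Sketch of balanced-sampling recruitment logic: stratify a population frame
# of sites, then rank sites within each stratum by distance to the stratum
# average so recruiters have an ordered list of "replacements."
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
frame = pd.DataFrame({
    "site": range(40),
    "caseload": rng.normal(1000, 300, 40),    # hypothetical frame variables
    "urbanicity": rng.uniform(0, 1, 40),
})
# Stratify by caseload quartile (a stand-in for substantive strata).
frame["stratum"] = pd.qcut(frame["caseload"], 4, labels=False)

cols = ["caseload", "urbanicity"]
z = (frame[cols] - frame[cols].mean()) / frame[cols].std()   # standardize
centers = z.groupby(frame["stratum"]).transform("mean")
frame["distance"] = np.sqrt(((z - centers) ** 2).sum(axis=1))

# Within each stratum, recruit in order of typicality; later rows serve
# as replacements when an earlier site refuses to participate.
plan = frame.sort_values(["stratum", "distance"])
print(plan.groupby("stratum").head(3))
```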
- Published
- 2016
10. Designs of Empirical Evaluations of Nonexperimental Methods in Field Settings
- Author
Peter M. Steiner and Vivian C. Wong
- Subjects
Research design, Randomized experiment, Empirical Research, Replication (statistics), Randomized Controlled Trials as Topic, Benchmarking, Observational study, Quasi-experiment
- Abstract
Over the last three decades, a research design has emerged to evaluate the performance of nonexperimental (NE) designs and design features in field settings. It is called the within-study comparison (WSC) approach or the design replication study. In the traditional WSC design, treatment effects from a randomized experiment are compared to those produced by an NE approach that shares the same target population. The nonexperiment may be a quasi-experimental design, such as a regression-discontinuity or an interrupted time-series design, or an observational study approach that includes matching methods, standard regression adjustments, and difference-in-differences methods. The goals of the WSC are to determine whether the nonexperiment can replicate results from a randomized experiment (which provides the causal benchmark estimate), and the contexts and conditions under which these methods work in practice. This article presents a coherent theory of the design and implementation of WSCs for evaluating NE methods. It introduces and identifies the multiple purposes of WSCs, required design components, common threats to validity, design variants, and causal estimands of interest in WSCs. It highlights two general approaches for empirical evaluations of methods in field settings, WSC designs with independent and dependent benchmark and NE arms. This article highlights advantages and disadvantages for each approach, and conditions and contexts under which each approach is optimal for addressing methodological questions.
- Published
- 2018
11. Comments on 'Covariance Adjustments for the Analysis of Randomized Field Experiments'
- Author
Winston Lin
- Subjects
Regression adjustment, Randomized experiment, Data Interpretation (Statistical), Humans, Sociology, Randomized Controlled Trials as Topic
- Abstract
Richard Berk, Emil Pitkin, Lawrence Brown, Andreas Buja, Edward George, and Linda Zhao (2014) have written a valuable Evaluation Review paper on regression adjustment in randomized experiments. I’ve long been a fan of Berk’s critical writings on regression and meta-analysis (Berk 2004, 2007; Berk and Freedman 2003), and I recently recommended Berk, Brown, et al.’s (2014) helpful Sociological Methods and Research paper to a colleague teaching a course on regression. Also, I am grateful to Berk for sending me kind and constructively critical comments on Lin (2013) after it went to press. In my reply to his e-mail, I shared an informal essay (Lin 2012a, 2012b) discussing regression adjustment in practice. We are in agreement on many points.
- Published
- 2014
12. Reference values of within-district intraclass correlations of academic achievement by district characteristics: results from a meta-analysis of district-specific values
- Author
Larry V. Hedges and Eric Hedberg
- Subjects
Schools, Intraclass correlation, Randomized experiment, Academic achievement, United States, Test (assessment), Randomized controlled trial, Reference Values, Meta-analysis, Statistics, Econometrics, Educational Status, Humans, Completely randomized design, Randomized Controlled Trials as Topic
- Abstract
Background: Randomized experiments are often considered the strongest designs to study the impact of educational interventions. Perhaps the most prevalent class of designs used in large-scale education experiments is the cluster randomized design in which entire schools are assigned to treatments. In cluster randomized trials that assign schools to treatments within a set of school districts, the statistical power of the test for treatment effects depends on the within-district school-level intraclass correlation (ICC). Hedges and Hedberg (2014) recently computed within-district ICC values in 11 states using three-level models (students in schools in districts) that pooled results across all the districts within each state. Although values from these analyses are useful when working with a representative sample of districts, they may be misleading for other samples of districts because the magnitude of ICCs appears to be related to district size. To plan studies with small or nonrepresentative samples of districts, better information is needed about the relation of within-district school-level ICCs to district size. Objective: Our objective is to explore the relation between district size and within-district ICCs to provide reference values for math and reading achievement for Grades 3–8 by district size, poverty level, and urbanicity level. These values are not derived from pooling across all districts within a state as in previous work but are based on the direct calculation of within-district school-level ICCs for each school district. Research Design: We use mixed models to estimate over 7,000 district-specific ICCs for math and reading achievement in 11 states and for Grades 3–8. We then perform a random effects meta-analysis on the estimated within-district ICCs. Our analysis is performed by grade and subject for different strata designated by district size (number of schools), urbanicity, and poverty rates.
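A single district-specific ICC of the kind meta-analyzed here can be estimated with a random-intercept model, taking the ratio of between-school variance to total variance. The sketch below uses simulated scores for one hypothetical district; it is an illustration, not the authors' estimation code.

```python
# Sketch: within-district school-level ICC from a random-intercept model,
# ICC = between-school variance / (between + within). Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
schools, students = 20, 50
u = rng.normal(0, 0.5, schools)               # school random effects
df = pd.DataFrame({"school": np.repeat(np.arange(schools), students)})
df["score"] = u[df["school"]] + rng.normal(0, 1.0, len(df))

m = smf.mixedlm("score ~ 1", df, groups=df["school"]).fit()
between = float(m.cov_re.iloc[0, 0])          # school-level variance
within = m.scale                              # student-level variance
print("ICC:", between / (between + within))   # expect about 0.2 here
```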
- Published
- 2014
13. The Baltimore City Drug Treatment Court
- Author
Stacy S. Najaka, Brook W. Kearley, Carlos M. Rocha, and Denise C. Gottfredson
- Subjects
Adult, Male, Program evaluation, Time Factors, Randomization, Substance-Related Disorders, Randomized experiment, Academic achievement, Interviews as Topic, Drug treatment, Outcome Assessment (Health Care), Humans, Psychiatry, Judicial Role, Health Surveys, Mental health, Treatment Outcome, Baltimore, Drug and Narcotic Control, Female, Substance Abuse Treatment Centers, Psychology, Welfare
- Abstract
This study reports results from interviews with 157 research participants who were interviewed 3 years after randomization into treatment and control conditions in the evaluation of the Baltimore City Drug Treatment Court. The interviews asked about crime, substance use, welfare, employment, education, mental and physical health, and family and social relationships. Program participants reported less crime and substance use than did controls. Few differences between groups were observed on other outcomes, although treatment cases were less likely than controls to be on the welfare rolls at the time of the interview. Effects differed substantially according to the originating court.
- Published
- 2005
14. Evaluation of Treatment Programs for Persons with Severe Mental Illness
- Author
W. Dean Klinkenberg, Robert J. Calsyn, Michael L. Trusty, David A. Kenny, Gary A. Morse, and Joel P. Winter
- Subjects
Adult, Halfway Houses, Male, Mental Health Services, Program evaluation, Randomized experiment, Assertive community treatment, Risk Assessment, Severity of Illness Index, Residential Facilities, Randomized controlled trial, Humans, Longitudinal Studies, Program Development, Psychiatry, Aged, Psychiatric Status Rating Scales, Models (Statistical), Mental Disorders, Middle Aged, Moderation, Community Mental Health Services, United States, Clinical trial, Ill-Housed Persons, Female, Psychology, Case Management, Clinical psychology
- Abstract
This study evaluated several statistical models for estimating treatment effects in a randomized, longitudinal experiment comparing assertive community treatment (ACT) versus brokered case management (BCM). In addition, mediator and moderator analyses were conducted. The ACT clients had better housing and psychiatric symptom outcomes than BCM clients. Case management housing assistance and financial assistance partially mediated housing outcomes. No reliable mediators were found for psychiatric symptoms, and no reliable moderators were found for either housing or psychiatric symptoms. The study also made several important methodological advances in the analysis of longitudinal data in randomized experiments.
- Published
- 2004
15. A Short History of Randomized Experiments in Criminology
- Author
David P. Farrington
- Subjects
Program evaluation, Research design, Randomized experiment, Criminology, Humans, Sociology, Justice (ethics), Social science, Randomized Controlled Trials as Topic, Mass media, Historical Article, History (20th Century), United Kingdom, United States
- Abstract
This article discusses advantages of randomized experiments and key issues raised in the following articles. The main concern is the growth and decline in the use of randomized experiments by the California Youth Authority, the U.S. National Institute of Justice, and the British Home Office, although other experiments are also discussed. It is concluded that feast and famine periods are influenced by key individuals. It is recommended that policy makers, practitioners, funders, the mass media, and the general public need better education in research quality so that they can tell the difference between good and poor evaluation studies. They might then demand better evaluations using randomized experiments.
- Published
- 2003
16. Intraclass Correlations and Covariate Outcome Correlations for Planning Two- and Three-Level Cluster-Randomized Experiments in Education
- Author
Eric Hedberg and Larry V. Hedges
- Subjects
Correlation, Multivariate analysis, Randomized experiment, Intraclass correlation, Multilevel model, Statistics, Covariate, Econometrics, Statistical power, Mathematics
- Abstract
Background: Cluster-randomized experiments that assign intact groups such as schools or school districts to treatment conditions are increasingly common in educational research. Such experiments are inherently multilevel designs whose sensitivity (statistical power and precision of estimates) depends on the variance decomposition across levels. This variance decomposition is usually summarized by the intraclass correlation (ICC) structure and, if covariates are used, the effectiveness of the covariates in explaining variation at each level of the design. Objectives: This article provides a compilation of school- and district-level ICC values of academic achievement and related covariate effectiveness based on state longitudinal data systems. These values are designed to be used for planning group-randomized experiments in education. The use of these values to compute statistical power and plan two- and three-level group-randomized experiments is illustrated. Research Design: We fit several hierarchical linear models to state data by grade and subject to estimate ICCs and covariate effectiveness. The total sample size is over 4.8 million students. We then compare our average of state estimates with the national work by Hedges and Hedberg.
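The planning use described in the abstract, turning an ICC and covariate effectiveness values into statistical power, can be sketched with the standard formula for a standardized effect in a two-level cluster-randomized design. The function below uses a normal approximation and hypothetical inputs; it illustrates the computation rather than reproducing the authors' exact procedure.

```python
# Sketch: power for a two-level cluster-randomized design (clusters split
# evenly between arms), using an ICC and covariate R-squared values.
from math import sqrt
from scipy.stats import norm

def crt_power(delta, J, n, icc, r2_between=0.0, r2_within=0.0, alpha=0.05):
    """Approximate power for standardized effect delta with J clusters of
    n students each, half assigned to treatment and half to control."""
    var = (4.0 / J) * (icc * (1 - r2_between) + (1 - icc) * (1 - r2_within) / n)
    z = delta / sqrt(var)
    return norm.cdf(z - norm.ppf(1 - alpha / 2))

# A covariate explaining 50% of cluster-level variance buys substantial power.
print(crt_power(delta=0.25, J=40, n=60, icc=0.20))
print(crt_power(delta=0.25, J=40, n=60, icc=0.20, r2_between=0.5))
```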
- Published
- 2014
17. Covariance adjustments for the analysis of randomized field experiments
- Author
Linda Zhao, Edward I. George, Lawrence D. Brown, Andreas Buja, Richard A. Berk, and Emil Pitkin
- Subjects
Models (Statistical), Randomized experiment, Covariance, Causality, Bias, Data Interpretation (Statistical), Covariate, Linear regression, Statistics, Econometrics, Linear Models, Humans, Regression Analysis, Treatment effect, Mathematics, Randomized Controlled Trials as Topic
- Abstract
Background: It has become common practice to analyze randomized experiments using linear regression with covariates. Improved precision of treatment effect estimates is the usual motivation. In a series of important articles, David Freedman showed that this approach can be badly flawed. Recent work by Winston Lin offers partial remedies, but important problems remain. Results: In this article, we address those problems through a reformulation of the Neyman causal model. We provide a practical estimator and valid standard errors for the average treatment effect. Proper generalizations to well-defined populations can follow. Conclusion: In most applications, the use of covariates to improve precision is not worth the trouble.
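For context, one widely used covariate-adjusted estimator in this literature is Lin's (2013) regression of the outcome on treatment, mean-centered covariates, and their interactions, with heteroskedasticity-robust standard errors, sketched below on simulated data. This is background for the debate the abstract describes, not necessarily the reformulated estimator this article proposes.

```python
# Sketch of Lin-style regression adjustment: treatment, centered covariate,
# and their interaction, with robust (sandwich) SEs. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)                      # baseline covariate
t = rng.integers(0, 2, size=n)              # randomized treatment
y = 1.0 + 0.5 * t + 0.8 * x + rng.normal(size=n)

xc = x - x.mean()                           # center the covariate
X = sm.add_constant(np.column_stack([t, xc, t * xc]))
fit = sm.OLS(y, X).fit(cov_type="HC2")      # heteroskedasticity-robust SEs
print("ATE estimate:", fit.params[1], "robust SE:", fit.bse[1])
```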
- Published
- 2014
18. Using Group Mean Centering for Computing Adjusted Means by Site in a Randomized Experimental Design
- Author
Michael N. Mitchell and Alisa C. Lewin
- Subjects
Adult, Employment, Male, Adolescent, Randomized experiment, Pooling, Mothers, California, Grand mean, Meta-Analysis as Topic, Covariate, Statistics, Econometrics, Humans, Multicenter Studies as Topic, Child, Randomized Controlled Trials as Topic, Analysis of covariance, Analysis of Variance, Infant, Rehabilitation (Vocational), Middle Aged, Mean centering, Research Design, Child (Preschool), Data Interpretation (Statistical), Female, Social Welfare
- Abstract
When analyzing data from a randomized experiment that is replicated across multiple sites and includes covariates, the covariates can adjust for differences from either the grand mean or the group (site) mean. The analysis strategy determines the reference point. Pooling the sites and using a standard analysis of covariance (ANCOVA) adjusts for differences around the grand mean, whereas analyzing each site separately adjusts for differences around each group (site) mean. This article demonstrates that group mean centering permits pooling data from multiple sites into a single analysis while still using the group mean as a reference point for evaluating the covariate.
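The group mean centering strategy can be sketched directly. The example below simulates a multisite experiment, centers the covariate within each site, and pools the sites in one model; variable names and data are hypothetical.

```python
# Sketch of group (site) mean centering: center the covariate within each
# site, then pool all sites into one ANCOVA so the covariate adjustment is
# made around each site's own mean rather than the grand mean.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
sites, n = 4, 100
df = pd.DataFrame({
    "site": np.repeat(np.arange(sites), n),
    "t": rng.integers(0, 2, sites * n),                # randomized within site
    "x": rng.normal(np.repeat([0, 1, 2, 3], n), 1.0),  # site means differ
})
df["y"] = 0.5 * df["t"] + 0.8 * df["x"] + rng.normal(0, 1, len(df))

# Center x around its site mean rather than the grand mean.
df["x_c"] = df["x"] - df.groupby("site")["x"].transform("mean")
fit = smf.ols("y ~ t + x_c + C(site)", data=df).fit()
print(fit.params[["t", "x_c"]])
```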
- Published
- 1999
19. Comparison of a Randomized and Two Quasi-Experimental Designs in a Single Outcome Evaluation
- Author
James L. Carroll, Stephen G. West, Shenghwa Hsiung, Leona S. Aiken, and David E. Schwalm
- Subjects
Randomized experiment, Design of experiments, Population, Regression discontinuity design, Mathematics education, Remedial education, Completely randomized design
- Abstract
education.field_of_study ,Computer science ,Randomized experiment ,Design of experiments ,05 social sciences ,Population ,050401 social sciences methods ,050301 education ,General Social Sciences ,Context (language use) ,Outcome (probability) ,0504 sociology ,Arts and Humanities (miscellaneous) ,Regression discontinuity design ,Mathematics education ,Remedial education ,education ,0503 education ,Completely randomized design - Abstract
The authors assessed the impact of three designs (randomized experiment, nonequivalent control group design, regression discontinuity design) on estimates of effect size of a university-level freshman remedial writing program. Designs were implemented within the same context, same time frame, and with the same population. The 375 freshman participants were either randomly assigned or self-selected into specific evaluation groups, according to design protocols. The three designs led to highly similar effect size estimates of the impact of a semester of remedial writing on writing outcomes following standard freshman composition. Specific design features contributed to the convergence of effect size estimates across designs.
- Published
- 1998
20. Locking-in effects due to early interventions? An evaluation of a multidisciplinary screening program for avoiding long-term sickness
- Author
Per Johansson and Erica Lindahl
- Subjects
Adult, Male, Matching (statistics), Social Work, Randomized experiment, Psychological intervention, Holistic Health, Physical medicine and rehabilitation, Return to Work, Medicine, Humans, Mass Screening, Proportional Hazards Models, Retrospective Studies, Sweden, Primary Health Care, Health Services Research, Occupational Diseases, Treatment Outcome, Sick Leave, Observational study, Female, Risk assessment
- Abstract
Objective: In this article, we estimate the effect of a multidisciplinary collaboration program on the length of sickness absence. The intention with the program was to avoid long-term sickness absence by providing an early and holistic evaluation of the sick-listed individuals' conditions. The target group was individuals who were at risk of becoming long-term sick. The eligibility criteria were mainly based on register information that we have access to. Methods: Using this register information, we estimate different Cox regression models and apply a nonparametric matching estimator. We have also conducted a small randomized experiment. Results: The result from the randomized experiment is not statistically significant, but the point estimate provides the same result as was found in the observational study: The program prolongs rather than shortens the sickness absence spell. That is, the average sickness absence spell is prolonged by about 3 months. Conclusions: Our main explanation for this discouraging result is that the team focuses too much on rehabilitation rather than on encouraging the sick-listed individual to return to work.
- Published
- 2012
21. Introduction: rethinking child care research
- Author
Jeffrey S. Morrow and Douglas J. Besharov
- Subjects
Program evaluation, Early childhood education, Educational measurement, Quality Assurance (Health Care), Randomized experiment, Child Behavior, Social Environment, Developmental psychology, Child Development, Health care, Outcome Assessment (Health Care), Early Intervention (Educational), Medicine, Humans, Infant, Child Day Care Centers, United States, Child (Preschool), Educational Status
- Abstract
This introduction summarizes the articles in this collection. It describes how the articles address one or more of the key elements of the child care research model: (a) selecting and measuring the independent variables to determine the characteristics (“qualities”) of the child care environment (and, in some studies, the characteristics of parents and family), (b) selecting and measuring the dependent variables to determine the child's physical and developmental status after a period of time in a particular child care arrangement (usually a school year) compared with that of children in other arrangements (or simply the same child before spending time in the arrangement), (c) establishing causal links between the independent and dependent variables that are either assumed in randomized experiments or estimated through statistical controls in nonexperimental studies, and (d) assessing impacts across subgroups to see whether the program benefits one particular group more (or less) than others. The collection closes with a proposal to develop a systematic federal research program to pursue improvements in child care and early childhood education programs.
- Published
- 2006
22. Propensity scores: an introduction and experimental test
- Author
William R. Shadish, Jason K. Luellen, and M. H. Clark
- Subjects
Models (Statistical), Randomized experiment, Decision tree learning, Ensemble learning, Statistical classification, Bias, Research Design, Covariate, Statistics, Propensity score matching, Outcome Assessment (Health Care), Humans, Quasi-experiment, Algorithms, Randomized Controlled Trials as Topic
- Abstract
Propensity score analysis is a relatively recent statistical innovation that is useful in the analysis of data from quasi-experiments. The goal of propensity score analysis is to balance two non-equivalent groups on observed covariates to get more accurate estimates of the effects of a treatment on which the two groups differ. This article presents a general introduction to propensity score analysis, provides an example using data from a quasi-experiment compared to a benchmark randomized experiment, offers practical advice about how to do such analyses, and discusses some limitations of the approach. It also presents the first detailed instructions to appear in the literature on how to use classification tree analysis and bagging for classification trees in the construction of propensity scores. The latter two examples serve as an introduction for researchers interested in computing propensity scores using more complex classification algorithms known as ensemble methods.
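The classification-tree-with-bagging approach named in the abstract can be sketched with off-the-shelf tools. The example below estimates propensity scores with bagged decision trees and performs naive 1:1 nearest-neighbor matching on simulated data; a real analysis would add balance diagnostics and the practical checks the article recommends.

```python
# Sketch: propensity scores from bagged classification trees, followed by
# simple 1:1 nearest-neighbor matching on the score. Data are simulated.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 500
X = rng.normal(size=(n, 4))                     # observed covariates
p_true = 1 / (1 + np.exp(-X[:, 0]))             # selection depends on X
t = rng.binomial(1, p_true)                     # nonrandom treatment uptake
y = X[:, 0] + 0.5 * t + rng.normal(size=n)      # true effect = 0.5

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
ps = bag.fit(X, t).predict_proba(X)[:, 1]       # estimated propensity scores

# Match each treated unit to the control nearest on the propensity score.
treated, controls = np.where(t == 1)[0], np.where(t == 0)[0]
matches = controls[np.abs(ps[treated][:, None] - ps[controls]).argmin(axis=1)]
print("matched ATT estimate:", (y[treated] - y[matches]).mean())
```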
- Published
- 2005
23. The 'experimenting agency'. The California Youth Authority Research Division
- Author
Ted Palmer and Anthony Petrosino
- Subjects
Research design, Adult, Male, Adolescent, Randomized experiment, Research methodology, Psychological intervention, Public administration, California, Professional Staff Committees, Politics, Random Allocation, Agency (sociology), Humans, Justice (ethics), Sociology, Randomized Controlled Trials as Topic, History (20th Century), United States, Juvenile Delinquency, Female, Health Services Research, Program Evaluation
- Abstract
During the 1960s and 1970s, the California Youth Authority embarked on a series of randomized field trials to test interventions for juvenile and young adult offenders. This article examines the institutional and political reasons why rigorous tests were adopted for such interventions as the Community Treatment Program. It also describes the effect these trials had on the agency and on California justice, as well as how the experimental method eventually became less often used in the Youth Authority. The authors explore some general reasons why this happened.
- Published
- 2003
24. Ethical practice and evaluation of interventions in crime and justice. The moral imperative for randomized trials
- Author
David Weisburd
- Subjects
Research design, Randomized experiment, Research Subjects, Psychological intervention, Moral imperative, Ethics (Professional), Ethics (Research), Criminal Law, Humans, Justice (ethics), Obligation, Randomized Controlled Trials as Topic, Nontherapeutic Human Experimentation, Research Personnel, United States, Law, Engineering ethics, Psychology
- Abstract
In considering the ethical dilemmas associated with randomized experiments, scholars ordinarily focus on the ways in which randomization of treatments or interventions violates accepted norms of conduct of social science research more generally or evaluation of crime and justice questions more specifically. The weight of ethical judgment is thus put on experimental research to justify meeting ethical standards. In this article, it is argued that just the opposite should be true, and that in fact there is a moral imperative for the conduct of randomized experiments in crime and justice. That imperative develops from our professional obligation to provide valid answers to questions about the effectiveness of treatments, practices, and programs. It is supported by a statistical argument that makes randomized experiments the preferred method for ruling out alternative causes of the outcomes observed. Common objections to experimentation are reviewed and found overall to relate more to the failure to institutionalize experimentation than to any inherent limitations in the experimental method and its application in crime and justice settings. It is argued that the failure of crime and justice practitioners, funders, and evaluators to develop a comprehensive infrastructure for experimental evaluation represents a serious violation of professional standards.
- Published
- 2003
25. Drug abuse treatment training in Peru. A social policy experiment
- Author
Geetha Suresh, Linda C. Young, Knowlton Johnson, and Michael L. Berbaum
- Subjects
Inservice Training, Time Factors, Quality Assurance (Health Care), Randomized experiment, Attitude of Health Personnel, Substance-Related Disorders, Health Personnel, Public Policy, Fidelity, Peru, Medicine, Humans, Program Development, Empowerment, Social policy, Medical education, Therapeutic community, Community Health Centers, Substance abuse, Workforce, Educational Measurement, Clinical psychology, Program Evaluation
- Abstract
A social policy experiment is presented that was conducted from 1997 to 2000 in a setting with a high level of readiness for implementing a randomized experiment of therapeutic community (TC) drug treatment training in Peru. Seventy-six drug abuse treatment organizations were randomly assigned into three groups, and data were collected at multiple assessment periods. Staff and directors in organizations assigned to the training groups participated in either 6-week basic training or 8-week basic plus booster training sessions, which were theoretically grounded. Small to medium positive effects were found on staff empowerment to use tools and principles from the training; medium to large positive effects were found on faithful implementation of TC methods after the training. A follow-up with the funding and training organizations 1 year later showed use of the evaluation results in decision making in both organizations.
- Published
- 2002
26. Implementing Randomized Experiments
- Author
Joan Petersilia
- Subjects
Program evaluation, Management science, Randomized experiment, Evaluation methods, Justice (ethics), Psychology
- Abstract
Eleven jurisdictions across the country are participating in the Intensive Supervision Demonstration Project funded by the Bureau of Justice Assistance (BJA). The demonstration is designed to assess the effects — and costs — of sentencing convicted felons to community-based programs. One of the unique aspects of the project is that it involves random assignment of offenders to intensive probation/parole supervision or control program conditions. The demonstration will run until 1990, but it has already provided instructive insights into the issues and problems involved in managing large randomized field experiments in criminal justice. The purpose of this article is to describe the programs and sites participating in the BJA Demonstration Project. The details of the RAND evaluation are then outlined, along with the data collection methods and random assignment procedures. The remaining sections describe and discuss the author's experiences, both positive and negative, in designing and evaluating the demonstration project. The author's hope is that the lessons learned from this experiment will prove instructive and help pave the way toward more refined randomized experiments in the future.
- Published
- 1989
27. Analysis of No-Difference Findings in Evaluation Research
- Author
Lawrence B. Mohr and George Julnes
- Subjects
Research evaluation, Alternative methods, Randomized experiment, Interval estimation, Control variable, Statistics, Econometrics, Mathematics
- Abstract
Conclusions of no difference are becoming increasingly important in evaluation research. We delineate three major uses of no-difference findings and analyze their meanings. (1) No-difference findings in randomized experiments can be interpreted as support for conclusions of the absence of a meaningful treatment effect, but only if the proper analytic methods are used. (2) Statistically based conclusions in quasi-experiments do not allow causal statements about the treatment impact but do provide a metric to judge the size of the resulting difference. (3) Using no-difference findings to conclude equivalence on control variables is inefficient and potentially misleading. The final section of the article presents alternative methods by which conclusions of no difference may be supported when applicable. These methods include the use of arbitrarily high alpha levels, interval estimation, and power analysis.
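One concrete way to support a no-difference conclusion along the lines discussed here is an equivalence procedure of two one-sided tests (TOST) against a prespecified negligible-effect bound, sketched below with hypothetical numbers. The article's own recommendations center on alpha levels, interval estimation, and power analysis; TOST is a closely related formalization, not necessarily the authors' procedure.

```python
# Sketch: two one-sided tests (TOST) against a prespecified bound for the
# smallest effect worth caring about. Estimate, SE, and bound are hypothetical.
from scipy.stats import norm

est, se = 0.03, 0.05        # estimated treatment effect and its standard error
bound = 0.15                # largest difference still considered negligible

p_lower = 1 - norm.cdf((est + bound) / se)   # H0: effect <= -bound
p_upper = norm.cdf((est - bound) / se)       # H0: effect >= +bound
p_tost = max(p_lower, p_upper)
print("TOST p-value:", p_tost)               # small p supports equivalence
```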
- Published
- 1989
28. Do We Need Experimental Data To Evaluate the Impact of Manpower Training On Earnings?
- Author
V. Joseph Hotz, Marcelo P. Dabós, and James J. Heckman
- Subjects
Program evaluation, Earnings, Randomized experiment, Inference, Experimental data, Estimator, Adult education, Econometrics
- Abstract
This article assesses several recent studies in the manpower training evaluation literature claiming that (1) nonexperimental methods of program evaluation produce unreliable estimates of program impacts and (2) randomized experiments are necessary to produce reliable ones. We present a more optimistic statement about the value of nonexperimental methods in analyzing the effects of training programs on earnings. Previous empirical demonstrations of the sensitivity of estimates of program impact to alternative nonexperimental procedures either do not test the validity of the testable assumptions that justify the nonexperimental procedures or else disregard the inference from such tests. We reanalyze data from the National Supported Work Demonstration experiment (NSW) utilized by LaLonde and by Fraker and Maynard and reexamine the performance of nonexperimental estimates of the net impact of the NSW program on the posttraining earnings of young high school dropouts and adult women. Using several simple strategies for testing the appropriateness of alternative formulations of such estimators, we show that a number of the nonexperimental estimators used in these studies can be rejected. Although we eliminate a number of nonexperimental estimators by such tests, we are able to find estimators that are not rejected by these tests. Estimators not rejected by such tests yield net impact estimates that lead to the same inference about the impact of the program as the experimental estimates. The empirical results from our limited study provide tangible evidence that the recent denunciation of nonexperimental methods for evaluating manpower training effects is premature.
- Published
- 1987
29. Lessons From the Delaware Dislocated Worker Pilot Program
- Author
Howard S. Bloom
- Subjects
Program evaluation, Process management, Randomized experiment, Retraining, Operations management, Social experiment
- Abstract
This article presents lessons learned from an innovative employment and training program for dislocated workers—persons who have lost stable, long-term jobs due to changing technology and/or increased international competition. Because the program was conducted as a randomized experiment, it was possible to obtain unbiased estimates of its impact as well as considerable information about its implementation. The article provides specific information about program design and serves as a prototype for how social experimentation can be used by state and local governments. In addition, the article illustrates how qualitative and quantitative information from process and impact analyses can be integrated to provide a richer picture of the program experience than would be possible otherwise.
- Published
- 1987
30. The Freshman Seminar Program
- Author
Melvin M. Mark and John J. Romano
- Subjects
Program evaluation, Liberal arts education, Higher education, Randomized experiment, Research methodology, General education, Academic advising, Liberal education, Psychology
- Abstract
The Freshman Seminar program at the Pennsylvania State University attempts to improve the quality of liberal education by giving freshman students a detailed introduction to a particular liberal arts discipline, improved advising, and an orientation to college life. The program was introduced on a trial basis and its effects investigated using a randomized experimental design. The outcomes measured were rate of retention in the College of Liberal Arts, grade point average, credit hours completed, ratings by instructors, and various attitudes, including those toward Penn State, advising, and the value of the liberal arts. Results suggest that the program led to more favorable attitudes but not to perceptible differences in performance or retention.
- Published
- 1982
31. Randomized Experiments for Planning and Testing Projects in Developing Countries
- Author
Michael L. Dennis and Robert F. Boruch
- Subjects
Program evaluation, Research design, Research evaluation, International studies, Management science, Randomized experiment, Developing country, Cross-cultural, Business
- Abstract
Increased use of randomized experiments to evaluate social programs throughout the world has been a major advance in evaluation research. This article focuses on determining which program evaluations are appropriate or feasible for randomized experiments. These threshold conditions include: (1) the present practice must need improvement; (2) the efficacy of the proposed intervention(s) must be uncertain; (3) there should be no simpler alternatives; (4) the results must be potentially important for policy; and (5) the design must meet the ethical standards of both the researchers and the service providers. To illustrate the issues involved and examine some of the innovative research designs for addressing them, experiments from Barbados, China, Colombia, Kenya, India, Israel, Nicaragua, Pakistan, Taiwan, and the U.S. are reviewed.
- Published
- 1989