Michael V. Ellis, University at Albany, State University of New York
Nicholas Ladany, Temple University
Maxine Krengel, Boston Veterans Affairs Medical Center and Boston University School of Medicine
Deborah Schult, University at Albany, State University of New York

The empirical studies in clinical supervision published from 1981 through 1993 were investigated to assess scientific rigor and to test whether the quality of methodology had improved since the review by R. K. Russell, A. M. Crimmings, and R. W. Lent (1984). The 144 studies were evaluated according to 49 threats to validity (T. D. Cook & D. T. Campbell, 1979; R. K. Russell et al., 1984; B. E. Wampold, B. Davis, & R. H. Good III, 1990) and 8 statistical variables (e.g., effect size, statistical power, and Type I and Type II error rates). The data revealed a shift to realistic field studies, unchecked Type I and Type II error rates, medium effect sizes, and inattention to hypothesis validity. Recommendations for designing and conducting a feasible and well-designed supervision study are offered.

Michael V. Ellis and Deborah Schult, Department of Counseling Psychology, University at Albany, State University of New York; Nicholas Ladany, Department of Counseling Psychology, Temple University; Maxine Krengel, Psychology Section (1168), Boston Veterans Affairs Medical Center and Department of Psychology, Boston University School of Medicine. Earlier versions of this article were presented at the 96th Annual Convention of the American Psychological Association, Atlanta, Georgia, August 1988, and at the meeting of the North Atlantic Regional Association of Counselor Education and Supervision, Albany, New York, October 1991. Maxine Krengel completed some of this research while a doctoral student in the Department of Counseling Psychology, University at Albany, State University of New York. We are grateful to Micki Friedlander, Richard Haase, and Erica Robbins Ellis for their insightful comments on earlier versions of this article. We express our appreciation to Eric Adams, Mafoozal Ali, Elizabeth Bhargava, Virginia Flander, David Hahn, Gohpa Khan, Michelle Mautner, Deborah Melincoff, Michael Remshard, Greg Savage, Heidi Weiss, Donna Wilson, and Bradley Wolgast for their data coding and entry assistance. Correspondence concerning this article should be addressed to Michael V. Ellis, Department of Counseling Psychology, Education 220, University at Albany, State University of New York, 1400 Washington Avenue, Albany, New York 12222. Electronic mail may be sent via Internet to me464@cnsibm.albany.edu.

It can be argued that a primary goal of research in clinical supervision is to test and improve theory and to guide the practice of supervision (Ellis, 1991b). A thorough understanding of the strengths and weaknesses of supervision research would ostensibly expand supervision theory and provide practitioners with information on how to train effective counselors who, in turn, will provide more effective therapy. Although there have been numerous calls for increasing the scientific rigor of research on counselor supervision and training (e.g., Ellis, 1991b; Hansen & Warner, 1971; Holloway & Hosford, 1983; Russell, Crimmings, & Lent, 1984), a comprehensive and in-depth investigation of the actual state of scientific rigor has yet to be conducted. If supervision research is going to meet the goal of informing theory and practice, then a thorough assessment of its methodological limitations and implications is warranted.

At least 32 reviews of empirically based articles pertaining to clinical supervision and counselor training have appeared in the literature. Although these reviews have made substantial contributions to the field, many did not evaluate systematically the methodological or the scientific rigor of the examined studies (e.g., Harkness & Poertner, 1989; Holloway, 1984, 1992; Holloway & Neufeldt, 1995; Kaplan, 1983; Lambert & Arnold, 1987; Leddick & Bernard, 1980; Liddle & Halpin, 1978; Matarazzo, 1971, 1978; Matarazzo & Garner, 1992; Matarazzo & Patterson, 1986; Russell & Petrie, 1994; Stoltenberg, McNeill, & Crethar, 1994; Yutrzenka, 1995) or did so in a cursory fashion (i.e., Baker & Daniels, 1989; Baker, Daniels, & Greeley, 1990; Ford, 1979; Hansen, Pound, & Petro, 1976; Hansen, Robins, & Grimes, 1982; Hansen & Warner, 1971; Holloway & Johnston, 1985; Holloway & Wampold, 1986; Kurtz, Marshall, & Banspach, 1985; Loganbill, Hardy, & Delworth, 1982; Stein & Lambert, 1995; Worthington, 1987). Only four reviewers presented details of the methodological flaws encountered in the studies reviewed (Alberts & Edelstein, 1990; Avis & Sprenkle, 1990; Holloway, 1987; Russell et al., 1984). The result of not systematically evaluating the methodological issues may have erroneously led to (a) equating (or even outweighing) the findings of excellent research with poor research (Hogarty, 1989; Kline, 1983), (b) exacerbating the theoretical ambiguity in the field (Meehl, 1990), and (c) drawing inaccurate inferences and conclusions (Cooper, 1989; Ellis, 1991a). The most recent review of research in individual clinical supervision that