Managed care plans in the United States are increasingly using paid claims records to monitor and judge the physicians who provide services to plan members. The motivation for economic, or “practice efficiency,” profiling, as this is termed, is primarily financial: physicians identified as inefficient are considered wasteful of health plan resources, and they can be encouraged to change their practice styles or be dropped from the plan's provider network (Sandy 1999; Nickerson and Rutledge 1999; Litton, Sisk, and Akins 2000). Vendors and clients (primarily health plans) of profiling systems consider a physician efficient if the total claims costs of services provided to managed patients are no greater than the costs expected for those patients, given the patients' demographic characteristics and health conditions. “Inefficient” physicians are those for whom actual claims costs exceed the expected amounts.1

When several physicians are involved in providing care to a patient, it is often quite difficult to determine from claims records which of the physicians ordered a particular service, prescribed a particular drug, or even admitted the patient to a hospital. Because of this attribution problem, economic profiling appears to be used most frequently in managed care plans that require primary care physicians (PCPs) to serve as gatekeepers. In these plans, the PCPs are assumed to be responsible for all costs incurred on behalf of the patients they manage, regardless of whether they themselves performed or even ordered the services. With this assumption, attribution of responsibility ceases to be an analytic problem.

Even though dozens of firms offer profiling software and services to health plans, there are relatively few methodologies that can be used for risk-adjusting physician profiles, that is, for estimating patients' expected costs. Nearly all profiling vendors use one or more of these established methodologies.
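The actual-versus-expected comparison described above can be sketched in a few lines of code. This is a minimal illustration, not any vendor's actual formula: the field names and the use of an observed/expected cost ratio as the efficiency index are assumptions made for the example.

```python
# Illustrative sketch: flag PCPs whose panels' actual claims costs exceed
# risk-adjusted expected costs. Field names and the observed/expected
# ratio are assumptions for illustration, not any vendor's formula.
from collections import defaultdict

def profile_pcps(patients):
    """patients: iterable of dicts with 'pcp', 'actual_cost', 'expected_cost'."""
    actual = defaultdict(float)
    expected = defaultdict(float)
    for p in patients:
        actual[p["pcp"]] += p["actual_cost"]
        expected[p["pcp"]] += p["expected_cost"]
    # A ratio above 1.0 marks a PCP as "inefficient" under the
    # definition used in the text: actual costs exceed expected costs.
    return {pcp: actual[pcp] / expected[pcp] for pcp in actual}

panel = [
    {"pcp": "A", "actual_cost": 1200.0, "expected_cost": 1000.0},
    {"pcp": "A", "actual_cost": 800.0,  "expected_cost": 900.0},
    {"pcp": "B", "actual_cost": 500.0,  "expected_cost": 700.0},
]
ratios = profile_pcps(panel)
# PCP A: 2000/1900 ≈ 1.05 (costs exceed expected); PCP B: 500/700 ≈ 0.71
```

The risk-adjustment methodologies compared in this study differ precisely in how the `expected_cost` figure for each patient is produced.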
In the project reported here, our purposes were to determine whether some risk-adjustment methodologies used for physician profiling are more accurate than others, and whether differences in risk adjustment lead to different judgments about PCP practice efficiency. Our findings on the accuracy of risk-adjustment methodologies are presented in Thomas, Grazier, and Ward (2002). Here, we describe findings related to levels of agreement among PCP practice efficiency rankings when, for a common set of PCPs, patients' expected costs are estimated using different risk-adjustment methodologies and two different measures of practice efficiency.

For our analyses, we used the membership and claims databases of an independent practice association HMO that serves the five counties of Southeast Michigan. Six profiling system vendors/developers agreed to work with us on the project, either to make their risk-adjustment software available to us or to process our data through their software and return the risk-adjusted results. A detailed description of each of the participating methodologies is provided in our project report (Thomas, Grazier, and Ward 2002). The methodologies are:

Adjusted Clinical Groups (ACGs Version 4.5, 2000) from Johns Hopkins University. Adjusted clinical groups cluster health plan members having similar comorbidities into groups that have similar resource requirements and clinical characteristics. The ACG Case-Mix System then uses a branching algorithm to place each patient into one of 82 discrete, mutually exclusive categories based on the mix of clinical groups experienced during the time period under study (Johns Hopkins University 2000).

Burden of Illness Score (BOI Version PRS 4.6, 2001) from MEDecision, Inc. This system is based on MEDecision's Practice Review System (PRS), which partitions care into episodes of illness and assigns services, severity levels, and medications to these episodes.
The BOI Score is a linear-scaled measure that indicates the relative health care cost risks associated with the particular mix of episodes experienced by a patient during a defined time period (Anderson and Gilbert 2002).

Clinical Complexity Index (CCI Version 3.6, 1997) from Solucient, Inc. The CCI methodology considers age, severity, comorbidity, hospital admissions, and categories of diagnoses (acute, chronic, mental health, and pregnancy) to assign patients to mutually exclusive CCI risk categories. Although the system provides for 1,418 different categories, 95 percent of patients fall into just 45 of these (Solucient 1999).

Diagnostic Cost Groups (DCGs Version 5.1, 2000) from DxCG, Inc. The DCG system comprises a family of multiple linear regression models. For this study, we used the all-encounter, hierarchical model (DCG/HCC) for a commercial population, which uses data on age, sex, and all diagnoses, inpatient and outpatient, to explain patients' health care expenditures for the period under study. In our analyses, we used the DCG retrospective risk measure, together with patients' age/sex categories (DxCG Inc. 2002).

Episode Risk Groups (ERGs Version 4.2, 2001) from Symmetry Health Systems, Inc. Like the BOI Score, ERGs are episode based. The episodes underlying ERGs are created using Symmetry's Episode Treatment Groups (ETG™) methodology, an illness classification system that uses a series of clinical and statistical algorithms to combine related services into more than 600 mutually exclusive and exhaustive categories. For a given patient, episodes experienced during a time period are mapped into 119 Episode Risk Groups, and a risk score is then determined based on age, gender, and mix of ERGs. For our analyses, we used the ERG retrospective risk score (Symmetry Health Data Systems 2001).

General Diagnostic Groups (GDGs Version 1.0, 2000) from Allegiance LLC.
General Diagnostic Groups were developed using the Agency for Health Care Policy and Research's Clinical Classification Software (CCS). CCS aggregates the individual ICD-9-CM codes identified on health care claims into 260 broad diagnosis categories for statistical analysis and reporting. The GDG system then combines CCS categories considered to be clinically similar and to have similar associated per-patient charges into 57 diagnostic categories. These 57 diagnostic categories are used as dummy variables in a multiple regression model for predicting health care costs (Cowen et al. 1998).

Although CCI and the two episode-based methodologies include utilization data as independent predictors of risk, our analyses (not reported here) found that these systems did not differ from the other three in the accuracy of member cost predictions (Thomas, Grazier, and Ward forthcoming).
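The dummy-variable regression idea described for GDGs can be sketched as follows. Here, three toy diagnostic categories stand in for the 57 GDG categories, and all cost figures are invented for illustration; this does not reproduce the actual GDG model or its coefficients.

```python
# Sketch of dummy-variable cost regression: patients' diagnostic-category
# memberships become 0/1 indicators, and ordinary least squares estimates
# a per-category cost increment. Three toy categories stand in for the
# 57 GDG categories; all data are invented for illustration.
import numpy as np

# Rows: patients. Columns: intercept, then one indicator per category.
X = np.array([
    [1, 1, 0, 0],   # patient with category 1 only
    [1, 0, 1, 0],   # category 2 only
    [1, 1, 1, 0],   # categories 1 and 2
    [1, 0, 0, 1],   # category 3 only
    [1, 0, 0, 0],   # no flagged categories
], dtype=float)
y = np.array([1500.0, 2200.0, 3400.0, 900.0, 300.0])  # annual costs

# Least-squares fit; beta[0] is baseline cost, beta[1:] are the
# estimated cost increments associated with each category.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
expected = X @ beta   # fitted values serve as risk-adjusted expected costs
```

In a profiling application, the fitted values for a PCP's panel would be summed to give the panel's expected cost, the denominator of the efficiency comparison described earlier in this section.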