Objective: To evaluate the ability of two prognostic systems to predict hospital mortality in adult intensive care patients.
Design: Prospective cohort study.
Setting: A mixed medical and surgical intensive care unit (ICU) in the United Kingdom.
Patients: A total of 1,144 patients consecutively admitted to the study.
Interventions: None.
Measurements and Main Results: Acute Physiology and Chronic Health Evaluation (APACHE) II and III prognostic systems were applied to assess probabilities of hospital mortality, which were compared with the actual outcome. The overall goodness-of-fit of both models was assessed. Hospital death rates were higher than those predicted by each system. Risk estimates showed a strong positive correlation between both systems (nonsurvivors r² = 0.756, p < .0001; survivors r² = 0.787, p < .0001). Calibration of APACHE II (χ² = 98.6, Lemeshow-Hosmer) was superior to that of APACHE III (χ² = 129.8, Lemeshow-Hosmer). The total correct classification rate of APACHE III was greater for all decision criteria applied; the best overall total correct classification rate was 80.6% for APACHE III and 77.9% for APACHE II (both for a decision criterion of 40%). The areas under the receiver operating characteristic curves were 0.806 and 0.847 for APACHE II and III, respectively, confirming the better discrimination of APACHE III. When patients were classified by diagnostic categories, risk predictions did not fit uniformly across the spectrum of disease groups. For both models, mortality ratios (MR) were highest for trauma patients and lowest for the group with respiratory disease. APACHE II predictions for patients with gastrointestinal disease were significantly better than those of APACHE III. Risk estimates for surgical admissions were superior with APACHE II (MR = 1.27) compared with APACHE III (MR = 1.56), but were similar for medical patients (MR = 1.22 vs. 1.28 for APACHE II and III, respectively).
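The two discrimination measures reported above, the total correct classification rate at a decision criterion and the area under the receiver operating characteristic curve, can be illustrated with a minimal sketch. The function names and the toy data are hypothetical and are not taken from the study; the sketch also assumes that a patient is classified as a predicted death when the risk estimate is at or above the criterion, which the abstract does not specify.

```python
from typing import Sequence


def correct_classification_rate(
    risks: Sequence[float], died: Sequence[int], criterion: float = 0.40
) -> float:
    """Fraction of patients whose predicted outcome matches the actual outcome.

    Assumption: a patient is predicted to die when risk >= criterion.
    """
    if len(risks) != len(died) or not risks:
        raise ValueError("risks and died must be non-empty and of equal length")
    correct = sum((r >= criterion) == bool(d) for r, d in zip(risks, died))
    return correct / len(risks)


def roc_auc(risks: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve computed by pairwise comparison.

    Equals the probability that a randomly chosen nonsurvivor received a
    higher risk estimate than a randomly chosen survivor (ties count half).
    """
    nonsurvivors = [r for r, d in zip(risks, died) if d]
    survivors = [r for r, d in zip(risks, died) if not d]
    if not nonsurvivors or not survivors:
        raise ValueError("both outcomes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in nonsurvivors
        for n in survivors
    )
    return wins / (len(nonsurvivors) * len(survivors))


# Toy data for illustration only (not study data).
toy_risks = [0.05, 0.10, 0.35, 0.45, 0.60, 0.80, 0.20, 0.90]
toy_died = [0, 0, 1, 0, 1, 1, 0, 1]

print(correct_classification_rate(toy_risks, toy_died, criterion=0.40))
print(roc_auc(toy_risks, toy_died))
```

Repeating the first call over a range of criteria (for example 0.10 to 0.90) gives the kind of comparison across decision criteria that the abstract describes, whereas the area under the curve summarizes discrimination independently of any single criterion.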
Bias induced by factors reflecting the clinical practice in an individual ICU (e.g., admission criteria, treatment before admission) may have considerable impact on risk estimates. The identification of such factors appears to be a prerequisite for the meaningful interpretation of observed and predicted death rates on the individual ICU level.
Conclusions: Both predictive models demonstrated a similar degree of overall goodness-of-fit. APACHE II showed better calibration, but discrimination was better with APACHE III. Hospital mortality was higher than predicted by both models, but was underestimated to a greater degree by APACHE III. Risk estimates by both models showed considerable variation across the disease spectrum of ICU patients. Risk predictions for surgical patients and patients with gastrointestinal disease were better with APACHE II. Factors reflecting the clinical practice of an individual ICU are not accounted for by APACHE II and III. Overall, the performance of APACHE III was not superior to that of its predecessor for a cohort of United Kingdom ICU patients; for certain diagnostic categories, APACHE III performed worse than APACHE II despite an improved system of disease classification.
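The comparison of observed and predicted death rates can be sketched as a mortality ratio per patient group. This is a minimal sketch that assumes the mortality ratio is the number of observed hospital deaths divided by the sum of the individual predicted risks, so that a value above 1 means the model underestimates mortality; the abstract does not state the formula, and the function names and toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def mortality_ratio(risks: Sequence[float], died: Sequence[int]) -> float:
    """Observed deaths divided by expected deaths (sum of predicted risks)."""
    if len(risks) != len(died):
        raise ValueError("risks and died must be of equal length")
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected


def mortality_ratio_by_group(
    records: Iterable[Tuple[str, float, int]]
) -> Dict[str, float]:
    """Mortality ratio for each group from (group, risk, died) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, risk, outcome in records:
        grouped[group][0].append(risk)
        grouped[group][1].append(outcome)
    return {g: mortality_ratio(r, d) for g, (r, d) in grouped.items()}


# Toy records for illustration only (not study data).
toy_records = [
    ("surgical", 0.25, 0),
    ("surgical", 0.25, 1),
    ("surgical", 0.50, 1),
    ("medical", 0.50, 1),
    ("medical", 0.50, 0),
    ("medical", 0.25, 0),
    ("medical", 0.75, 1),
]

ratios = mortality_ratio_by_group(toy_records)
print(ratios)
```

Under this assumed definition, a ratio computed per diagnostic category or admission type shows where predictions fit poorly, but it cannot by itself distinguish a model deficiency from unit-specific factors such as admission criteria or treatment before admission.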