The assessment of change over time is an important health issue: it is a way of testing new medical treatments and of investigating the natural courses of aging and disease. Of considerable import and interest in clinical research is the definition of “clinically meaningful” change. The definition of clinically meaningful change is challenged on at least two fronts, one statistical and the other clinical. Statistically, there are two important problems associated with measuring change. First, there are limits to the amount of change possible when scores or ratings at the first of two timepoints in question are very high or very low. This differential sensitivity to change due to extreme scores is a characteristic of many assessment instruments, and it makes interpretation of some change scores impossible. Second is the issue of the reliability of the raw change score. Several authors1–3 have argued that the errors associated with measurements at each timepoint greatly complicate the computation, interpretation, and subsequent testing of change scores. Clinical impediments to defining meaningful change arise because (1) there is little information available on the performance of control populations on many instruments and (2) there is little information on the longitudinal use of many instruments and the types of changes to be expected (although see Green et al.4). An example of a domain in which measuring change over time is important but difficult is behavioral problems arising during the course of Alzheimer’s disease (AD). There are many different tests being used to study various aspects of behavioral symptoms in AD; however, the relationship between changes on any two of these tests is unknown. Complicating interpretation of change on behavioral measures is the finding of waxing and waning4 or recurrence5 of behavioral symptoms, rather than sustained change, in persons with AD. 
Rasmusson and colleagues6 concisely reviewed several mechanisms for defining cognitive changes in AD. In discussions of predicting decline, the assumption is that decline will be the only change. This assumption is valid over the very long term, but a substantial proportion of AD patients may remain stable or show cognitive improvement over a 12-month period. The distribution of cognitive change in the short term can also cover the spectrum from improvement through no change to worsening.7 Similarly, no change, worsening, and improvement have all been observed with different behavioral measures in studies of persons with AD.4,8,9 There are clearly many contingencies and difficulties in defining change in behavioral symptoms, as well as in other domains. Although degenerative disease inevitably results in decline, truly sustained change in AD, especially behavioral change, may be observed only over very long periods of study. Therefore, change scores observed in clinical trials may not be unidirectional. It would be helpful to have a simple way of characterizing change over the short or long term. Another impediment to defining change is that on many instruments, change in the total score has no fixed meaning.
Currently, no norms exist for the behavior assessment instruments used widely in research on patients with AD, such as the Behavior Rating Scale for Dementia (BRSD)10 and the Cohen-Mansfield Agitation Inventory (CMAI),11 although standardization data are available for the BRSD from a large sample of individuals with AD.12,13 Statistical means to assess change in behavioral symptom frequency do exist,3 but mapping clinical meaning onto a statistically significant change over a study period is not straightforward.14,15 Only the actual quantification of change is required to determine differences in change across treatment groups, but this approach may lead to the identification of a treatment as effective when the actual change, while statistically significant, is clinically meaningless.16 Therefore, this study sought to qualify change in an effort to differentiate clinically meaningful change from fluctuation and, at the same time, to distinguish stability (no change) caused by the absence of symptoms from that caused by the continued presence of symptoms. Bereiter1 points out that the unreliability of change scores comes from the respective unreliabilities of test scores at times 1 and 2 (echoed by Overall and Woodward2). To increase the reliability of a change score, Bereiter proposed focusing analysis on the item level. The model described here follows from this proposal by computing change on a per-item basis and then qualifying this change to facilitate the decision as to whether the observed change is clinically meaningful. To define clinically meaningful change, using the example of a behavioral measurement for persons with AD, we defined the types of change we expect to carry the most clinical meaning, based on the possible changes we can observe between two visits. We classified changes depending on whether the symptom never appeared, emerged, disappeared, increased or decreased in frequency, or remained at a constant level over 1 year.
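As a rough illustration of the per-item classification just described, the following Python sketch assigns each item to one of the six change categories from its reported frequency at two visits. It is a minimal sketch under stated assumptions, not the scoring procedure of the BRSD or the implementation used in this study: frequencies are assumed to be coded as nonnegative integers with 0 meaning the symptom was absent, "never appeared" is operationalized as absent at both visits, and the names `ChangeType`, `classify_item_change`, and `summarize_changes` are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class ChangeType(Enum):
    """Hypothetical labels for the six per-item change categories."""
    NEVER_PRESENT = "never present"   # absent at both visits
    EMERGED = "emerged"               # absent at visit 1, present at visit 2
    DISAPPEARED = "disappeared"       # present at visit 1, absent at visit 2
    INCREASED = "increased"           # present at both visits, higher at visit 2
    DECREASED = "decreased"           # present at both visits, lower at visit 2
    CONSTANT = "constant"             # present at both visits, same frequency


def classify_item_change(baseline: int, followup: int) -> ChangeType:
    """Classify one item from its frequency rating at two visits.

    Assumes (for illustration only) that ratings are nonnegative
    integers and that 0 means the symptom was not reported.
    """
    if baseline < 0 or followup < 0:
        raise ValueError("frequency ratings must be nonnegative")
    if baseline == 0 and followup == 0:
        return ChangeType.NEVER_PRESENT
    if baseline == 0:
        return ChangeType.EMERGED
    if followup == 0:
        return ChangeType.DISAPPEARED
    if followup > baseline:
        return ChangeType.INCREASED
    if followup < baseline:
        return ChangeType.DECREASED
    return ChangeType.CONSTANT


def summarize_changes(baseline_items: Sequence[int],
                      followup_items: Sequence[int]) -> Counter:
    """Count how many items fall into each change category for one person."""
    if len(baseline_items) != len(followup_items):
        raise ValueError("both visits must rate the same items")
    return Counter(
        classify_item_change(b, f)
        for b, f in zip(baseline_items, followup_items)
    )


if __name__ == "__main__":
    # Made-up ratings for six items at two visits (not study data).
    visit_1 = [0, 0, 2, 1, 3, 2]
    visit_2 = [0, 1, 0, 3, 1, 2]
    for change_type, n_items in summarize_changes(visit_1, visit_2).items():
        print(f"{change_type.value}: {n_items}")
```

Keeping "never present" and "constant" as separate categories reflects the distinction drawn above between stability caused by the absence of symptoms and stability caused by their continued presence; a raw difference score of zero would not separate the two.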
To test this approach, we applied it to changes in the reported frequencies of the specific behaviors assessed in the BRSD over 12 months in a group of well-characterized AD patients living in the community. The BRSD total and subscores, as well as each item, have well established validity and reliability.9,10,12,13