## Analytic and Enumerative Studies: Thinking about Control Charts-2

In part 1 of this two-part post, I reviewed Deming’s distinction between analytic and enumerative studies and presented a record of No Show data from a dental clinic.

In this post, I explore the options for control charts applied to the No Show data. Control charts are a fundamental tool for analytic studies.

### What kind of control chart could we use for no-show rates?

To build a control chart, we need to assess variation to calculate the control limits. Deming’s picture helps clarify the nature of the assessment.

Each month we count all the patients with appointments; the monthly data corresponds to a monthly sequence of “Lots.”

In survey language, we have a complete count, or census, of the patients each month with respect to no-shows. (Deming and colleagues at the U.S. Census Bureau understood that a census could be either enumerative or analytic in nature. The analytic use of a census means that you treat the census itself as a sample from a cause system. See, for example, W. E. Deming & F. F. Stephan (1941), “On the Interpretation of Censuses as Samples,” *Journal of the American Statistical Association*, **36**(213), 45–49.)

Using Deming’s notation for the June 2016 record, we have n = N = 467. There is no sampling variation associated with the rate value p = 25.9% in terms of the Sample and Lot relationship—the Sample in Deming’s picture in our case is the same as the Lot.

However, the improvement team in Clinic A wants to understand the cause system that generates the no-show rate, month after month. They consider each Lot—the monthly patients with appointments—as a sample from the cause system. Each Lot yields a rate of no-shows. Different control charts assess the variation between and within Lots in different ways.

### p chart option

A p chart for the No Show data assumes that the monthly count of no-shows follows the binomial distribution, which requires:

1. We can count individual items.
2. We can classify each item as one of two types, e.g. keeps appointment or no-shows.
3. Each month, the probability p that a patient no-shows is constant.
4. Each month, whether one patient no-shows does not depend on whether any other patient no-shows (independence).

We have no way to guarantee assumptions 3 and 4. The cause system represents a conceptual bowl, so we cannot carry out a simple random sampling operation to justify assumption 4.
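To see what a constant p and independence imply for the spread of monthly rates, here is a minimal simulation sketch in Python. It does not use the Clinic A data: the 467 appointments per month are borrowed from the June 2016 record, while the mean no-show probability of 0.26 and the month-to-month spread of 0.04 are made-up values for illustration, and the helper `simulate_rates` is hypothetical. When p stays fixed, the spread of the monthly rates matches the binomial standard error $\sqrt{p(1-p)/n}$; when p drifts from month to month, the rates spread out more than the binomial model predicts.

```python
import numpy as np

def simulate_rates(n_months, n_patients, p_mean, p_sd, seed=0):
    """Simulate monthly no-show rates (hypothetical helper, illustration only).

    p_sd = 0 keeps the no-show probability fixed, so monthly counts are binomial.
    p_sd > 0 lets the underlying probability drift from month to month.
    """
    rng = np.random.default_rng(seed)
    # Month-specific no-show probability, clipped to the valid range [0, 1]
    p_month = np.clip(rng.normal(p_mean, p_sd, n_months), 0.0, 1.0)
    # One binomial count of no-shows per month, given that month's probability
    no_shows = rng.binomial(n_patients, p_month)
    return no_shows / n_patients

# Binomial standard error of a monthly rate with p = 0.26 and n = 467
binomial_sd = np.sqrt(0.26 * (1 - 0.26) / 467)
stable = simulate_rates(1000, 467, 0.26, 0.0)
drifting = simulate_rates(1000, 467, 0.26, 0.04)
print(f"binomial sd {binomial_sd:.4f}, stable {stable.std():.4f}, drifting {drifting.std():.4f}")
```

The stable series should show a standard deviation close to the binomial value; the drifting series shows extra, more than binomial, variation.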

To build the control limits, first estimate p by $\bar{p}$, the overall proportion of patients who no-show across the months in the record set (total no-shows divided by total appointments). Then calculate the limits based on the number of patients each month, $n_i$; the limits vary month to month:

$$\bar{p} \pm 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}}$$

This formula combines information from the sequence of Lots, summarized in the estimate $\bar{p}$, with information about each Lot, $n_i$, the number of patients with appointments month by month. However, the limit calculation incorporates no information about Lot to Lot variation.
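A minimal Python sketch of these p chart limits follows. It assumes the center line is the pooled proportion (total no-shows over total appointments); the helper name `p_chart_limits` and the counts in the usage line are made up for illustration. Each month's limits are $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n_i}$, clipped to [0, 1]. Note that the monthly rates themselves never enter the width of the limits.

```python
import numpy as np

def p_chart_limits(no_shows, appointments):
    """Return (p_bar, lcl, ucl) for a p chart with month-specific limits."""
    x = np.asarray(no_shows, dtype=float)
    n = np.asarray(appointments, dtype=float)
    # Pooled center line: total no-shows divided by total appointments
    p_bar = x.sum() / n.sum()
    # Binomial standard error for each month, driven only by p_bar and n_i
    sigma = np.sqrt(p_bar * (1 - p_bar) / n)
    # Three-sigma limits, kept inside the valid range for a proportion
    lcl = np.clip(p_bar - 3 * sigma, 0.0, None)
    ucl = np.clip(p_bar + 3 * sigma, None, 1.0)
    return p_bar, lcl, ucl

# Hypothetical counts for three months (not the Clinic A data)
p_bar, lcl, ucl = p_chart_limits([100, 120, 90], [400, 480, 360])
```

Smaller months get wider limits: here the 360-appointment month has the widest band and the 480-appointment month the narrowest.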

The p chart shows multiple rates outside the control limits. Taken at face value, the chart says we should seek to explain these special months to get ideas to run the clinic system more effectively. Is this an effective course of action? Does this chart provide a better guide to action than the run chart?

### Chart options that incorporate Lot to Lot variation

Like the run chart, the individuals control chart characterizes each Lot by a single number. It uses only Lot to Lot variation in observed rates to create the control limits and ignores any variation within the Lots. For happenstance rate data like the No Shows example, this ignorance matches the fact that we have no observed sampling variation within a month.

Importantly, in characterizing the Lot to Lot variation, the calculation of the process variation weights each Lot equally, no matter the size of the Lot.
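For comparison, here is a minimal sketch of individuals (XmR) chart limits for the monthly rates, using the conventional moving-range estimate of sigma, $\overline{MR}/1.128$. The helper name `individuals_limits` and the example rates are hypothetical. The function never sees the monthly denominators, so every Lot counts equally in both the center line and the limits.

```python
import numpy as np

def individuals_limits(rates):
    """Return (center, lcl, ucl) for an individuals (XmR) chart of monthly rates."""
    r = np.asarray(rates, dtype=float)
    # Simple mean: every month is weighted equally, whatever its Lot size
    center = r.mean()
    # Average moving range between successive months (Lot to Lot variation only)
    mr_bar = np.abs(np.diff(r)).mean()
    # 1.128 is the d2 constant for moving ranges of two points; 3 / 1.128 is about 2.66
    half_width = 3 * mr_bar / 1.128
    return center, center - half_width, center + half_width

# Hypothetical monthly rates (not the Clinic A data)
center, lcl, ucl = individuals_limits([0.25, 0.27, 0.24, 0.26])
```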

The p’ (“p-prime”) chart developed by D. B. Laney ((2002), “Improved Control Charts for Attributes,” *Quality Engineering*, **14**(4), 531–537) interpolates between the p chart and the individuals chart. The p’ chart reduces to a p chart if all variation is binomial (no underlying Lot to Lot variation) and reduces to an individuals chart if the Lot sizes are all the same, regardless of the distribution within Lots.
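Here is a minimal Python sketch of the p’ calculation as it is commonly described: standardize each month's rate with its binomial standard error, $z_i = (p_i - \bar{p})/\sigma_{p_i}$; estimate the spread of the $z_i$ from their average moving range, $\sigma_z = \overline{MR}_z/1.128$; and set the limits at $\bar{p} \pm 3\,\sigma_{p_i}\,\sigma_z$. The helper name `laney_p_prime_limits` and the example counts are hypothetical, and this is a sketch rather than a validated implementation.

```python
import numpy as np

def laney_p_prime_limits(no_shows, appointments):
    """Return (p_bar, lcl, ucl, sigma_z) for a Laney p' chart."""
    x = np.asarray(no_shows, dtype=float)
    n = np.asarray(appointments, dtype=float)
    p = x / n
    # Pooled center line, as in the p chart
    p_bar = x.sum() / n.sum()
    # Within-Lot (binomial) standard error for each month
    sigma_p = np.sqrt(p_bar * (1 - p_bar) / n)
    # Standardized monthly rates
    z = (p - p_bar) / sigma_p
    # Lot to Lot spread of the z-scores, from their average moving range
    sigma_z = np.abs(np.diff(z)).mean() / 1.128
    # sigma_z = 1 gives p chart limits; sigma_z > 1 widens them for extra-binomial variation
    half_width = 3 * sigma_p * sigma_z
    lcl = np.clip(p_bar - half_width, 0.0, None)
    ucl = np.clip(p_bar + half_width, None, 1.0)
    return p_bar, lcl, ucl, sigma_z

# Hypothetical counts with equal Lot sizes (not the Clinic A data)
p_bar, lcl, ucl, sigma_z = laney_p_prime_limits([100, 130, 90, 120], [400, 400, 400, 400])
```

With equal Lot sizes, $\sigma_{p_i}$ cancels out of the half-width, so these limits match an individuals chart on the monthly rates, which is one way to check the reduction property described above.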

For the No Show data, the individuals chart and the p’ chart are similar: both show all points within the control limits, and their limits have similar widths. Because the p’ limits are wider than the p chart limits, which flagged several months, the No Show data show more than binomial variation; the Laney procedure models this extra variation as variation in the month-to-month standardized No Show rates.

The two charts have the same message: Don’t spend time chasing explanations for individual months varying up or down. However, the pattern of values flagged by the run chart still deserves our attention.

### Conclusion

How should we monitor performance of Clinic A's No Show rates?

In general, there appears to be no fully satisfactory solution for charting happenstance rate data like the No Shows example.

All four charting solutions reduce the No Shows phenomenon to one or two summary numbers for the monthly Lots.

For my clients, a run chart of No Show rates along with a run chart of monthly denominators is the starting point; for our dental project work these charts will be good enough for monitoring.

To understand why patients do not show up for appointments and to improve, we need to dig beneath the summary numbers and move toward the experience of the individual patient: break up the monthly Lots and stratify rates by day of week, time of day, type of appointment, age, or other patient characteristics. What patterns emerge that suggest changes you can try?

In fact, I now think we should follow the Lean advice to abandon batch processing and ideally move to tracking one patient at a time, in the order of patient appointments.

If tradition or constraints of your dashboard lead you to continue to display monthly No Show rates, adding notes of actions to a run chart is a better use of time for most people than calculating limits and wrestling with assumptions and interpretations that arise for happenstance rate data.

Note: An R Markdown file and Excel® data table used to create the control charts in this post may be found here on GitHub.