In the previous post I described an exercise that uses Galton’s Quincunx.
The challenge: maximize the number of results in a five-value range over three rounds of 20 drops of the Quincunx. In the original exercise devised by Dr. Rob Stiratelli, the Quincunx starts aimed at the low end of the range, and at the beginning of the second and third rounds the device develops a calibration problem that shifts the aim three or four units.
Last week, we ran the exercise with four teams in a workshop. Groups A, B and C used a simulation I wrote in R and Group D used the modern Quincunx shown at left.
The exercise's participant instruction sheet is available here.
As I stated in the first post, if you can aim the funnel at the center of the desired range, you usually can get at least 16 of 20 values to fall in that range.
All four teams realized on Round 1 that the system was running on the low side of the range and made adjustments to move the aim closer to the center of the range.
However, the hidden change in the aim at the start of Round 2 caused confusion and uncertainty. Teams that had been getting almost all values in the desired range now were getting values outside the range, even though the nominal position of the meter setting was the same.
With some testing and debate, teams recovered and by the end of the second round were again getting results mostly in the desired range.
The change at the start of Round 3 caused the same challenges—initial hesitation and lack of confidence in the relationship between meter setting and output gave way to better performance, after adjustments to the aim.
Team B had the best results, but this seems to be related to a problem in the simulation (see below).
1. No team made a run chart of the results and the meter values. They looked at the table of numbers and did their best to judge average results and the impact of the meter setting. Our management exercise followed a 90-minute presentation and practice with run charts. The failure to make run charts is a sobering reminder that it takes repetition and presence of mind to apply improvement methods and tools when under pressure to perform, even in a training-room exercise.
2. No team got perfect results because the Quincunx system as designed is not capable of regularly producing values in a five-point range. Team best efforts, management incentives, and public scorecards don’t change the underlying structure.
3. No team systematically tested the relationship between meter setting and output. Experimentation comes at the cost of possibly poor results. I did not hear any clear discussion of how to test in the face of uncertain outcomes.
4. The teams did not cooperate; no one sent any representatives to other tables to try to learn what strategies seemed promising.
1. The Quincunx device shows that
a. Variation in results arises from variation in funnel position (an input measured by the meter setting) and system structure, represented by the pins.
b. In other words, variation in input and system structure causes the results to vary.
c. If we can study and identify how the changes in inputs and system conditions drive the variation in results, we can work “upstream” to reduce this variation in a cost-effective way.
2. A Model for Common Causes
a. The structure of the pins provides a physical model for what we call “common causes of variation.”
b. The built-in variation of the pins drives variation in results.
c. Each pin contributes a small but meaningful amount of variation to the results.
d. We can describe the variation that results from the pattern of pins: we expect to see a range of plausible values, without systematic patterns.
3. A Model for Special Causes
a. We can assign specific changes in results to the movement of the funnel, so this movement will serve as our model for “special causes of variation.” Walter Shewhart, the inventor of control charts, used the term “assignable causes” to indicate that we may be able to assign a specific cause to this class of variation.
b. Movement of the funnel causes variation in results on top of the variation that arises from the pins.
c. The movement of the funnel can be relatively large or small; the larger the movement, the easier it is to detect in the results.
d. As a useful approximation, the total variation in results is composed of variation from movement of the funnel plus the variation contributed by the pins.
4. General Description of “Common Causes of Variation”
Common causes of variation are system conditions and inputs with the following properties:
a. Variation of the common causes drives variation in system results.
b. Variation in the common causes (and hence variation in the results) is built into the current structure of the system.
c. Each common cause contributes a small portion of the variation in the results.
d. Common causes of variation combine to give a range of plausible values in the results, without systematic patterns or unusual values.
e. In practice, we can mimic “variation without systematic patterns or unusual values” by models of randomness. Also, we define system variation (with respect to a particular type of result) in terms of common causes of variation.
5. General Description of “Special Causes of Variation”
Special causes of variation are system conditions and inputs with the following properties:
a. Variation of special causes leads to variation in system results.
b. The variation in results from special causes is added to the variation that comes from common causes.
c. If the variation in special causes crosses a threshold, we will see patterns of variation or unusually large or small values in the results.
d. In practice, we say we have “evidence of special causes” when we detect patterns of variation or unusual values in system results. Then, using system knowledge, we often can match the variation in the results with variation in system conditions or inputs. In such a case, we say we have identified one or more special causes of variation.
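The approximate decomposition in point 3d above is easy to check numerically. Here is a minimal sketch (in Python rather than R; the row count and slip size are illustrative, not taken from the exercise): model the pins as ten ±1 bounces, model the funnel as an aim that shifts partway through the run, and the observed variance of the results comes out close to the pin variance plus the variance of the funnel movement.

```python
import random

random.seed(1)

ROWS = 10  # pins: each row bounces the bead one unit left (-1) or right (+1)

def pin_noise():
    """Total deflection from the pins for one bead."""
    return sum(random.choice([-1, 1]) for _ in range(ROWS))

# Funnel aim: on target for 500 drops, then slipped 4 units for 500 drops.
funnel = [0] * 500 + [4] * 500
results = [aim + pin_noise() for aim in funnel]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

pin_var = ROWS            # variance of a sum of 10 independent +/-1 steps
funnel_var = var(funnel)  # = 4.0 for this half-and-half slip pattern
print(round(var(results), 1), pin_var + funnel_var)
```

The simulated variance of the results lands near the sum of the two components, which is the sense in which the funnel and the pins “add.”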
A control chart is the primary tool to help to distinguish common causes of variation from special causes. See for example L.P. Provost and S.K. Murray (2011), The Health Care Data Guide: Learning from Data for Improvement, Jossey-Bass: San Francisco, especially chapters 4-8.
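As a minimal illustration of the mechanics (a Python sketch of an individuals chart, not code from the book; 1.128 is the standard d2 constant for moving ranges of two, and the data are invented):

```python
def individuals_limits(values):
    """Center line and 3-sigma limits estimated from the average moving range."""
    mr = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = (sum(mr) / len(mr)) / 1.128  # d2 constant for moving ranges of 2
    center = sum(values) / len(values)
    return center - 3 * sigma, center, center + 3 * sigma

data = [50, 49, 51, 50, 48, 52, 50, 49, 51, 40]  # last value: a possible special cause
lcl, center, ucl = individuals_limits(data)
signals = [x for x in data if x < lcl or x > ucl]
print(signals)  # values outside the limits: evidence of a special cause
```

Points outside the limits (here, the final value) are the “unusual values” that prompt a search for an assignable cause.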
If you don’t have a physical Quincunx device, you can use the quincunx function in the animation package in R, which animates the pin variation.
My R simulation is an R shiny web app that mimics the physical Quincunx, available at https://iecodesign.shinyapps.io/Quincunx_shiny/ .
The code for the simulation is available at https://github.com/klittle314/Stiratelli_quincunx
The Admin-T and Admin-P tabs show a table and plot, respectively, for the simulation results. I do not show these tabs to participants during the simulation.
To generate a value for a given Meter Setting, the system manager clicks the “Tell System to Get Ready!” button and then clicks the “Get value” button.
The meter setting of 30 corresponds to an output value of 48, on the low end of the desired range 48-52.
After generation of 20 values, the meter setting “slips”: if the average of the first 20 values is less than 50, the meter value is offset three units lower; otherwise, the meter is offset three units higher. Similarly, after the next 20 values, the meter slips again: if the average of the next 20 values is less than 50, the meter value is offset four units lower; otherwise, the meter is offset four units higher.
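In code, my reading of the slip rule looks like this (a Python sketch of the logic only; the Shiny app’s actual implementation and variable names differ):

```python
def slip_offset(round_values, slip_size):
    """After a round of 20 drops, the meter slips by slip_size units:
    down if the round averaged below 50, otherwise up."""
    avg = sum(round_values) / len(round_values)
    return -slip_size if avg < 50 else slip_size

offset = slip_offset([48] * 20, 3)   # after round 1: a 3-unit slip
offset += slip_offset([51] * 20, 4)  # after round 2: a 4-unit slip
print(offset)  # net offset applied to the meter entering round 3
```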
Here’s a graph of 60 results from the Admin-P tab of the web app, with no adjustment to the meter (hence, no adjustment to the aim of the Quincunx funnel). You can see how the system “center” changes during each set of 20.
If you lose connection to the server in the middle of a simulation, the web app restarts—there is no persistent memory in the version I used last week.
This restart phenomenon accounts for Team B’s good performance on rounds 2 and 3: the laptop used with this group repeatedly lost the connection to the server, so they worked with a system that never experienced a “slip” in the meter—once they had learned that a meter value of 32 was about right, they just could keep that setting and get pretty good results.
To avoid the reset problem, you can run the simulation locally or edit the code to allow for persistent storage, e.g. https://shiny.rstudio.com/articles/persistent-data-storage.html.
In the late 19th century, Francis Galton introduced the ideas and tools of regression and correlation to the world, derived from his studies of inherited characteristics.
To help his thinking and explanations, he built an analog simulation device, called the Quincunx. The picture at left is from Galton’s lucid description of the Quincunx on pp. 63-65 of his book Natural Inheritance (available in facsimile at http://galton.org).
The device consists of a chamber at the top that holds small metal balls and a funnel that directs the balls to drop through a field of pins, symmetrically placed.
Quincunx is a Latin word for an arrangement of five points, like the pattern of the pips representing the number five on a common six-sided die. This pattern, repeated, gives the field of pins in the device.
As the illustration from Galton’s book shows and Galton explicitly observed, the balls falling through the field of pins will pile up in a shape that looks like a “normal” distribution.
Dr. Rob Stiratelli taught me to use a Quincunx model slightly different from Galton’s version—Rob’s modern version, pictured at left, has a funnel that an operator can move left or right, changing the aim of the dropped bead. The operator of the modern Quincunx also can drop one bead at a time rather than dumping a whole collection at once as in Galton's model.
Rob used the Quincunx to help people understand the difference between special and common causes of variation.
While these two types of variation are the basis for control charts, Rob’s exercise aimed to get people to grasp the essence of acting rationally when faced with system results that vary.
Rob’s exercise poses a challenge: maximize the number of beads that fall in a specific range of five consecutive slots. That’s not too hard to do if you can see the funnel and aim it at the middle of that range. Given ten rows of pins and a funnel centered correctly on the middle of the range in the Quincunx pictured here, you will consistently achieve at least 16 out of 20, and frequently 18 of 20.
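You can check this claim with a simple model (a Python sketch, not Rob’s device: assume each of the ten rows of pins sends the bead one slot left or right with equal probability, so the final slot is binomial; a centered five-slot range then catches a bead about 89% of the time, and a round of 20 drops lands 16 or more in range in roughly nine rounds out of ten):

```python
import random

random.seed(42)

def drop_bead():
    """Slot index 0..10: the number of rightward bounces over 10 rows of pins."""
    return sum(random.random() < 0.5 for _ in range(10))

def round_score():
    """Beads out of 20 that land in the centered five-slot range (slots 3..7)."""
    return sum(1 for _ in range(20) if 3 <= drop_bead() <= 7)

scores = [round_score() for _ in range(1000)]
print(sum(s >= 16 for s in scores) / len(scores))  # fraction of rounds with >= 16 in range
```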
But Rob made things more difficult, with two twists.
First, he covered up the front of the Quincunx. The only thing the audience could see was the bottom row of the display, where the slots were numbered. A five-slot target range could be numbered 48 to 52.
To give people a little help, Rob described the position of the funnel by means of a “meter value” in a 1:1 relationship with the position.
The exercise starts with the funnel centered at the low end of the target range. In the set up I've used for years, the initial funnel position corresponds to a meter value of 30. Move the meter to 32 and the funnel will be centered at the middle of the target range.
Rob added one more twist. After the first 20 beads are dropped, he surreptitiously offset the funnel position—slipping the funnel left or right 3 positions, without telling the audience (remember, the Quincunx’s funnel is hidden from view).
For example, at the beginning of the second round of 20 bead drops, a meter value of 30 will now point the funnel at outcome slot 45. At the beginning of the third round of 20, Rob slipped the funnel left or right 4 positions.
Teams are often confused by the offset, which mimics the fact that human-engineered systems never behave the same for long without regular maintenance and attention. How should managers respond? There’s a tension between calling for an adjustment when the funnel is stationary (variation from the pins only) and waiting too long to make an adjustment for a system that has moved off target—either way on average leads to a lower total score.
I’ve continued to use Rob’s exercise with my clients.
In the next post, I’ll summarize the results from a training class running next week and link to the workshop materials I’m using.
You can buy a version of Galton’s device (e.g. http://www.qualitytng.com/). I am fortunate enough to own three of these devices, a legacy from a Ford Motor Company training project a number of years ago.
The R statistical language has a package, animation, that includes a Quincunx simulator. I thought about using that simulator for Rob’s exercise, but it turned out to be easier to build my own version as a Shiny app. I’ll use it for the first time in public next week at the training class and then post the code to GitHub.
I like having a physical Quincunx in class though it is heavy and awkward to transport.
The physical version allows you to easily make changes to the system to see what happens—tilting the device or blocking off channels in the pin block with tape or paper. These changes are easy for anyone to see, requiring no knowledge of R code.
More importantly, the physical device provokes a discussion of why the outcomes of the beads dropping through the pins vary. Usually, someone in the class will say “it’s just random variation”, which allows us to ponder the difference between math models of a system and the system itself.
That’s an important lesson any day but perhaps particularly so this month in the aftermath of our recent U.S. national election.
In the previous post, I discussed a problem in the application of control charts to per cent data. One type of control chart, the p-chart, uses the binomial probability model to define the upper and lower control limits.
We need to distinguish a related but different use of the binomial probability model, which arises from random sampling:
Suppose a clinic applies simple random sampling to all patients served in September, with a sample size of n (say, n = 100) that is small relative to the total number of patients served. We ask each of those 100 patients a question about how well staff listened; we want to know the per cent of patients who answer “Always” to our question.
We can use the binomial probability model to characterize the per cent of all patients in September who would have responded “Always” to the listening question, if we were able to ask every patient and get a reply.
Simple random sampling starts with a frame that lists each of the patients served in September. We apply a specific random procedure to select the 100 patients from the frame.
Here’s an example of a simple random sampling procedure:
(1) amend the frame of patients served in September by numbering the patients from 1 to N;
(2) go to www.random.org and generate 100 random integers between 1 and N;
(3) Select the patients from the frame according to the list of integers in step (2);
(4) Ask the patients in step (3) the listening question.
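The same procedure in code might look like this (a Python sketch with a made-up frame size; the standard library’s random.sample stands in for the random.org step and draws without replacement, as simple random sampling requires):

```python
import random

random.seed(2024)

N = 1500  # hypothetical count of patients served in September
frame = [f"patient-{i}" for i in range(1, N + 1)]  # step (1): the numbered frame

# Steps (2)-(3): draw 100 distinct patients; random.sample never repeats an item,
# so every patient in the frame has the same chance of selection.
sample = random.sample(frame, 100)
print(len(sample), len(set(sample)))  # 100 distinct patients to ask in step (4)
```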
In practice, there will be issues that affect the sampling procedure like patients in the sample who refuse to respond or who can’t be located. Let’s set those aside for now, even though we should recognize that such issues often turn out to be more important than the math details we’re discussing.
(A short, clear article on the effects of sampling bias, applied to political polling, appeared in The New York Times on 5 October 2016.)
To illustrate the binomial calculation, if 60 patients in the random sample of patients answer “Always”, our simple estimate for all patients served and answering “Always” in September is 60%.
We can estimate two- or three-sigma limits for the simple estimate. We use the standard deviation formula of the binomial distribution, σ-hat = √(p-hat × (1 − p-hat)/n), to get a value for σ-hat, our estimate of σ.
In our example, p-hat = 0.60 and n = 100, so σ-hat ≈ 0.05.
If we use three-sigma limits, we can make an interval: (0.60 – 3 x 0.05, 0.60 + 3 x 0.05) or (0.45, 0.75).
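The arithmetic above is easy to reproduce (Python):

```python
import math

p_hat = 0.60  # 60 of 100 sampled patients answered "Always"
n = 100

# Binomial standard deviation of a proportion.
sigma_hat = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 3 * sigma_hat, p_hat + 3 * sigma_hat
print(round(sigma_hat, 3), round(lower, 2), round(upper, 2))  # 0.049 0.45 0.75
```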
The interpretation of this interval is a little tricky, as the interval depends on numbers generated from the random sample. We have to imagine repeating the sampling procedure many times, each time calculating the interval in the same way. In that series of intervals, more than 99% of the intervals will contain the actual per cent of September patients who would respond “Always” to our question.
So, when we look at the interval (0.45, 0.75), it would be surprising for that interval to miss the actual per cent from the entire September population; missing the actual value would happen less than 1% of the time.
The assumption that the sample size is small relative to the total number of patients served in September allows us to say that the probability of selecting patients answering “Always” does not vary appreciably for different patients in our sample.
If the sample size is larger than about 10% of the population, the estimate of variability derived from the binomial distribution will over-estimate the sampling variation.
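The size of the over-estimate is given by the finite population correction, √((N − n)/(N − 1)), which shrinks the binomial sigma toward zero as the sample approaches the whole population. A sketch (Python; the population sizes are invented for illustration):

```python
import math

def sigma_binomial(p, n):
    """Binomial standard deviation of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

def sigma_hypergeometric(p, n, N):
    """Binomial sigma shrunk by the finite population correction for a
    sample of n drawn without replacement from a population of N."""
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma_binomial(p, n) * fpc

p, n = 0.60, 100
for N in (10_000, 500, 150):  # sample is 1%, 20%, 67% of the population
    print(N, round(sigma_hypergeometric(p, n, N), 4))
```

With a large population the correction is negligible; when the sample is a big fraction of the population, the binomial sigma noticeably over-states the sampling variation.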
See this blog post that includes a simulator of the binomial and alternative (hypergeometric) sampling distributions.
If you have a time series of per cents that describe the answer “Always” to the listening question, two conditions will allow you to construct a valid p-chart:
(1) the sample size must be small relative to the frame size; (2) each member of the series must be generated by simple random sampling as described in the previous section.
In terms of the four assumptions described in the previous post, these two conditions assure assumptions (3) and (4), respectively.
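With those conditions satisfied, the p-chart limits use the pooled proportion: p-bar ± 3√(p-bar × (1 − p-bar)/n). A sketch (Python, with invented monthly proportions and a constant subgroup size of 100):

```python
import math

def p_chart_limits(proportions, n):
    """Center line and 3-sigma limits for a p-chart with constant subgroup size n."""
    p_bar = sum(proportions) / len(proportions)
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - 3 * sigma, p_bar, p_bar + 3 * sigma

monthly = [0.58, 0.62, 0.60, 0.57, 0.63, 0.61]  # hypothetical monthly "Always" proportions
lcl, p_bar, ucl = p_chart_limits(monthly, n=100)
print(round(lcl, 3), round(p_bar, 3), round(ucl, 3))
```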
W.E. Deming distinguished between two types of studies that benefit from statistical methods.
Enumerative studies focus on characterizing a particular situation, for example, the properties of patients seen by an organization in one month. We are concerned with counting or assessing attributes of those patients. We are not concerned with the system of causes that generated the pattern of attributes exhibited by the population.
The simple random sample application, which also uses the binomial probability distribution to characterize proportion of patients answering “Always” to a question, is an example of an enumerative study.
On the other hand, analytic studies purposely focus on a system of causes; calculations of a specific instance help the analyst gain insight into that system. Typically, analytic studies inquire about the behavior of a system of causes over time. A key question in analytic studies is whether the system of causes remains essentially the same over time; if so, analysts may make a prediction about future performance, with the prediction derived from study of past performance.
In our discussion of p-charts in the previous post, the study of patient experience over time is an example of an analytic study.
W.E. Deming (1975), “On Probability as a Basis For Action”, The American Statistician, 29, 4, pp 146-152, https://deming.org/media/pdf/145.pdf