Control Chart Theory and the Replication Crisis in Science

Two weeks ago in Science, Eric Loken and Andrew Gelman provided another perspective on the "replication crisis" in science: the inability of scientists to repeat an experiment and get results similar to the original study.

(“Measurement error and the replication crisis”, Eric Loken and Andrew Gelman, Science, 10 Feb 2017: Vol. 355, Issue 6325, pp. 584-585; http://science.sciencemag.org/content/355/6325/584).

There is a healthy body of literature that offers both theoretical explanations and empirical evidence of this problem. 

A foundational article laying out the theory is “Why Most Published Research Findings Are False”, John P. A. Ioannidis, PLOS Medicine, August 30, 2005, http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124.   

An influential empirical study is “Estimating the reproducibility of psychological science” by the Open Science Collaboration, Science, 28 Aug 2015, Vol. 349, Issue 6251; http://science.sciencemag.org/content/349/6251/aac4716.

Loken and Gelman carefully explain how studies with noisy measurements that report statistically significant effects may contribute to the replication problem.  For example:

“…in noisy research settings, statistical significance provides very weak evidence for either the sign or the magnitude of any underlying effect. Statistically significant estimates are, roughly speaking, at least two standard errors from zero. In a study with noisy measurements and small or moderate sample size, standard errors will be high and statistically significant estimates will therefore be large, even if the underlying effects are small. This is known as the statistical significance filter and can be a severe upward bias in the magnitude of effects; as one of us has shown, reported estimates can be an order-of-magnitude larger than any plausible underlying effects.”
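The significance filter can be illustrated with a small simulation. This is a sketch with assumed values only: a small true effect of 0.1, noisy measurements with standard deviation 1, and a sample size of 20; none of these numbers come from the article.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.1   # small underlying effect (assumed for illustration)
NOISE_SD = 1.0      # noisy measurements
N = 20              # small sample size
TRIALS = 10_000

significant_estimates = []
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N)]
    est = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # "Statistically significant estimates are, roughly speaking,
    # at least two standard errors from zero."
    if abs(est) > 2 * se:
        significant_estimates.append(est)

avg_sig = statistics.mean(abs(e) for e in significant_estimates)
print(f"true effect: {TRUE_EFFECT}")
print(f"average |estimate| among significant results: {avg_sig:.2f}")
```

With these assumed values the standard error is roughly 0.22, so any estimate that clears the two-standard-error bar must be several times larger than the true effect of 0.1, which is exactly the upward bias Loken and Gelman describe.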

The authors set up their analysis by discussing the nature of measurement error and offer a definition that deserves further explanation: 

“Measurement error adds noise to predictions, increases uncertainty in parameter estimates, and makes it more difficult to discover new phenomena or to distinguish among competing theories…Measurement error can be defined as random variation, of some distributional form, that produces a difference between observed and true values.”

Should measurement error be defined as random variation of some distributional form? 

We can take a step back and identify the circumstances under which measurement error in a particular setting can in fact be treated as random variation.

Here’s the argument.

Measurement error arises from the act of measurement; in other words, measurement is a production process, created and operated by people, whose outputs are measurements.

Production processes yield values that are random with a distributional form if the production process is stable in the control chart sense, as explained by Wheeler and Lyday (Donald J. Wheeler and Richard W. Lyday, Evaluating the Measurement Process, 2nd edition, SPC Press, Knoxville, TN, 1989).

For example, will repeated measurement of one object yield values that show no signals of assignable (special) causes on an appropriate control chart?   If so, we have evidence of the random variation invoked by the definition of measurement error.
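That question can be examined with an individuals (XmR) chart on the repeated measurements. A minimal sketch using made-up data and Wheeler's standard limits of 2.66 times the average moving range (the object and its measurements are hypothetical):

```python
import statistics

# Hypothetical repeated measurements of a single object.
measurements = [10.2, 10.0, 10.3, 9.9, 10.1, 10.4, 10.0, 10.2, 9.8, 10.1,
                10.3, 10.0, 9.9, 10.2, 10.1]

x_bar = statistics.mean(measurements)
# Moving ranges: absolute differences between successive measurements.
moving_ranges = [abs(b - a) for a, b in zip(measurements, measurements[1:])]
mr_bar = statistics.mean(moving_ranges)

# Individuals (X) chart limits: center ± 2.66 * average moving range.
ucl = x_bar + 2.66 * mr_bar
lcl = x_bar - 2.66 * mr_bar

out_of_limits = [x for x in measurements if x > ucl or x < lcl]
print(f"center: {x_bar:.3f}, limits: [{lcl:.3f}, {ucl:.3f}]")
if out_of_limits:
    print("points outside limits: evidence of assignable causes", out_of_limits)
else:
    print("no points outside limits; consistent with a stable measurement process")
```

A full analysis would also apply run rules and chart the moving ranges themselves, but even this simple check treats measurement as a process whose stability can be verified rather than assumed.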

The theory of control charts also relates to the second part of the Loken and Gelman definition that refers to a "true value" of a thing or an activity.

As discussed by Shewhart in the context of quality characteristics rooted in the “economic mass production of interchangeable parts”:

“…the concept of true value leads us to choose operationally verifiable criteria that measurements of a quality characteristic must satisfy in order that they may be considered to be measurements of the true value… These criteria, as we shall see, include those for control of any method of measurement and those for checking the consistency between measurements by different methods.”  (W.A. Shewhart (1939), Statistical Method from the Viewpoint of Quality Control, Dover Edition, New York, reprinted 1986, p. 72).

In contrast to a definition of measurement error as random variation, the control chart view of measurement error requires the scientist to focus on measurement as a process that can be studied, controlled and improved.  

 

Scales built by Experts and Decision-Making

Improvement Project Progress on a 1 to 5 scale—Part 2
