Limits to Naïve Application of Fisher’s Advice in Social Science

In the last post, I began to describe how R.A. Fisher’s experimental design principles might apply to studies of management interventions.

A little digging revealed a rich literature on the limitations of randomized experiments, especially in social science, with cogent criticisms of a naïve application of Fisher’s advice to studies of behavior.

For example, economist James Heckman’s 1991 article (“Randomization and Social Policy Evaluation”, NBER Technical Working Paper #107) outlines three methodological challenges in applying randomized experimental designs to program evaluation in social science.

Heckman focuses on designs that are structured as randomized controlled trials (RCTs), where researchers “…randomly assign persons to a program and compare target responses of participants to those of randomized-out nonparticipants. The mean difference between participants and randomized-out nonparticipants is defined to be the effect of the program.” (p. 3)
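The estimator in Heckman's definition can be sketched in a few lines of Python. This is only an illustration: the outcome values are made up, the function name mean_difference is hypothetical, and the unpooled standard error is one common choice rather than anything Heckman prescribes.

```python
import math
import statistics


def mean_difference(treated, control):
    """Return (estimate, standard_error) for the difference in mean outcomes."""
    diff = statistics.mean(treated) - statistics.mean(control)
    # Unpooled standard error of a difference between two independent means.
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, se


# Made-up outcomes, for illustration only.
participants = [12.0, 15.0, 11.0, 14.0, 13.0]
randomized_out = [10.0, 12.0, 9.0, 11.0, 13.0]

effect, se = mean_difference(participants, randomized_out)
print(f"estimated effect = {effect:.2f}, standard error = {se:.2f}")
```

The calculation is only as good as the assumptions behind it: the number it prints summarizes the program's effect only if the two groups differ in nothing but the randomized assignment.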

RCTs have three major assumptions that Heckman proceeds to examine:

(1) Randomization does not alter the program being studied; that is, randomization does not influence the behavior of people assigned to treatment or control regimes;

(2) Average (mean) differences in outcome variables are the primary (or only) measures relevant to program evaluation;

(3) Randomization is applied only one time (in allocation to treatment or control); social programs with multiple stages would appear to require randomization at each stage.

Note that a simple “A versus B” experiment, for example a random delivery of alternate web-pages to visitors to a website and a simple response (clicks or time on page), will not violate any of the assumptions.

However, more complex applications merit careful thinking along the lines Heckman describes.

In light of last week’s post, I want to focus on Heckman’s discussion of the first major assumption. (Heckman’s 2000 Nobel Prize in Economics, awarded for methods to reduce selection bias in non-randomized studies, forms the intellectual backdrop of this discussion.)

Heckman generalizes the concern I noted last week. In a study of two management interventions to improve care of stroke patients, I worried about cross-talk between work units assigned to the management interventions: “Cross-talk may mean that instead of two distinct interventions, you actually have two hybrid interventions that share bits and pieces of the interventions in unanticipated ways. In other words, the two interventions may not be as separated as you think. If you don’t have a clear understanding of the interventions, your inferences are less solid than if you can maintain clean distinctions.”

Heckman points out an issue related to cross-talk, again a consequence of human agency, that has potentially serious implications for analysis. Work units or individuals may partially or completely opt out of treatments. Opting out may plausibly be related to performance, in which case the experiment as actually run confounds treatment allocation with other factors that randomization does not "average out." The confounding can bias any estimate of the average difference between the interventions: the calculated difference and its standard error no longer give a correct summary of the effect of treatment.
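A small simulation makes the opt-out problem concrete. It is a sketch under assumptions of my own, not Heckman's model: every unit gets the same true effect, units with below-average baseline performance opt out of treatment with a fixed probability, and the analyst naively compares the units that actually received treatment with the randomized-out controls. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def naive_estimate(n=20000, true_effect=2.0, opt_out_rate=0.8, seed=0):
    """Difference in means between units actually treated and controls."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 2.0)   # latent performance of the unit
        assigned = rng.random() < 0.5     # random allocation to treatment
        noise = rng.gauss(0.0, 1.0)
        if not assigned:
            control.append(baseline + noise)
            continue
        # Assumption: only weaker performers opt out, with the given probability.
        if baseline < 10.0 and rng.random() < opt_out_rate:
            continue  # the unit never receives treatment and is dropped
        treated.append(baseline + true_effect + noise)
    return statistics.mean(treated) - statistics.mean(control)


no_opt_out = naive_estimate(opt_out_rate=0.0)
with_opt_out = naive_estimate(opt_out_rate=0.8)
print(f"true effect 2.0 | no opt-out: {no_opt_out:.2f} | with opt-out: {with_opt_out:.2f}")
```

With no opting out, the estimate should land close to the true effect. With performance-related opting out, the treated group is enriched with stronger performers, so the same calculation overstates the effect even though the original allocation was random.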

In the context of an experiment to increase the success of trainees after training in federally funded job centers, Heckman writes:

“…the Fisher model is a potentially misleading paradigm for social science. Humans act purposively and their behaviour is likely to be altered by introducing randomization into their choice environment. The Fisher model may be ideal for the study of fertilizer treatments on crop yields. Plots of ground do not respond to anticipated treatments of fertilizer nor can they excuse themselves from being treated. Commercial manufacturers of fertilizer can be excluded from selecting favorable plots of ground in an agricultural experimental setting in a way that training center managers cannot be excluded from selecting favorable trainees in a social science setting.” (p. 18)

In summary, treatments applied to people can provoke actions by the experimental units (people as individuals or teams) in ways unanticipated by the experimenter that depart from the ideal structure of the experiment. Such departures weaken or wreck the foundation for inference that randomization provides. Determining the extent and impact of the departures from assumptions requires sophisticated modeling and additional data, which Heckman implies are rare complements to typical RCT analyses in the social sciences.

Back to Fisher

My reading of Fisher suggests that he would not have been surprised by Heckman’s criticisms.  In Chapter II of Design of Experiments, Fisher introduces randomization in the context of a social setting, the lady tasting tea, and discusses how to design a useful experiment that involves a human subject. 

Of course, Fisher's extensive experience designing and analyzing experiments in agriculture and genetics vitally informs his perspective.  Fisher insists that experimenters understand the nature of their experimental material or subjects to develop effective designs and not be swayed by “…mathematicians who often discuss experimentation without personal experience of the material.” (p. 49, Design of Experiments, 8th edition, 1966; Oliver and Boyd, Edinburgh). 

Read Chapter III of Design of Experiments, “A Historical Experiment on Growth Rate”, to get a deeper sense of Fisher’s insights into the practice of experimental design. Fisher examines in detail an experiment published by Charles Darwin and analyzed by Darwin's half-cousin, Francis Galton. Fisher first quotes from Darwin’s and Galton’s analyses and then adds his own commentary. The text is free of complicated mathematics and full of useful advice to experimenters, including experimenters faced with the problem of human agency in social settings.

Chapter II of Design of Experiments is available on-line here; chapter III is available on-line here, accessed 4 January 2016.

Next Steps

In my next post on designed experiments, I want to explore additional criticisms of RCTs raised by Nancy Cartwright (the philosopher, not the actress who is the voice of Bart Simpson).

Developments in Data Science

Rereading Fisher: Experimental Design and Quality Improvement in Healthcare
