More on Randomized Controlled Trials: The Views of Philosopher Nancy Cartwright

My recent blog posts on the application of randomized experiments have pushed me to review more articles that examine the nature of randomized controlled trials (RCTs).

Nancy Cartwright has considered the structure of RCTs in several articles. I draw upon two of them in this post:

(1) “What are randomised controlled trials good for?”, Philosophical Studies (2010) 147, 59–70.

(2) “A philosopher’s view of the long road from RCTs to effectiveness”, The Lancet (2011) 377, 1400–1401 (April 23 issue).

Logic and Limitations of RCTs

As Cartwright (2010) states, the logic of RCTs is compelling when we seek to demonstrate whether or not a cause-and-effect relationship exists:

“The RCT is neat because it allows us to learn causal conclusions without knowing what the possible confounding factors actually are. By definition of an ideal RCT, these are distributed equally in both the treatment and control wing [through the operation of randomization], so that when a difference in probability of the effect between treatment and control wings appears, we can infer that there is an arrangement of confounding factors in which C and E are probabilistically dependent and hence in that arrangement C causes E because no alternative explanation is left.” (p. 64)
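To make the randomization argument concrete, here is a minimal Python simulation of an idealized two-arm trial. The hidden trait `u`, the true effect of 2.0, and the function name `simulate_rct` are hypothetical choices for illustration, not anything from Cartwright. The analysis never looks at `u`, yet coin-flip assignment leaves it nearly balanced across the two wings, so the plain difference in mean outcomes lands close to the built-in effect.

```python
import random
import statistics


def simulate_rct(n=10_000, true_effect=2.0, seed=1):
    """Simulate an idealized two-arm RCT with one unmeasured confounder.

    Hypothetical setup: each unit carries a hidden trait u (think baseline
    severity) that strongly shifts its outcome. The analysis never uses u;
    only coin-flip assignment protects the comparison.
    """
    rng = random.Random(seed)
    treated_y, control_y = [], []
    treated_u, control_u = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # unmeasured confounding factor
        gets_treatment = rng.random() < 0.5  # randomization
        y = 3.0 * u + rng.gauss(0.0, 1.0)    # outcome without treatment
        if gets_treatment:
            treated_y.append(y + true_effect)
            treated_u.append(u)
        else:
            control_y.append(y)
            control_u.append(u)
    return {
        # How unequal the hidden trait ended up between the wings
        "u_gap": statistics.mean(treated_u) - statistics.mean(control_u),
        # Difference in mean outcomes: the usual RCT effect estimate
        "effect_estimate": statistics.mean(treated_y) - statistics.mean(control_y),
    }


result = simulate_rct()
print(f"imbalance in hidden trait:  {result['u_gap']:.3f}")
print(f"estimated treatment effect: {result['effect_estimate']:.3f}")
```

With thousands of units the chance imbalance in `u` is small; with only a handful of units per arm it can be sizable, which is why the "ideal RCT" in the quotation is an idealization.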

If we find that the treated and control groups differ on some measured outcome, what can we conclude?

Cartwright (2011) notes:

“The circumstances [defined by an RCT] are ideal for ensuring ‘the treatment caused the outcome in some members of the study’—i.e., they are ideal for supporting ‘it-works-somewhere’ claims. But they are in no way ideal for other purposes; in particular they provide no better base for extrapolating or generalising than knowledge that the treatment caused the outcome in any other individuals in any other circumstances.” (p. 1401)

This is the nub of the inferential problem, the tension between the internal and external validity of a study. External validity is what we need in improvement applications: will the change we observed in the contrived situation of the RCT work in other circumstances?

(Cartwright also points out that ‘the treatment caused the outcome in some members of the study’ may mask effects in the opposite direction for a relatively small number of treated individuals. That’s another reason to be modest in our inferential leaps from one study.)
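Cartwright's caution about effects in the opposite direction can be sketched the same way. In the hypothetical population below (the 20% harmed share and the individual effects of +3 and -2 are invented for illustration), the treatment helps most units and harms a minority. The trial can estimate only the average, roughly 0.8 × 3 + 0.2 × (-2) = 2.0, so a clearly positive result says nothing by itself about who was harmed.

```python
import random
import statistics


def simulate_mixed_effects(n=10_000, share_harmed=0.2, seed=7):
    """Hypothetical RCT in which individual causal effects differ in sign.

    Returns the difference in mean outcomes (what the trial reports) and the
    share of units whose own effect is negative (what the trial cannot see).
    """
    rng = random.Random(seed)
    treated, control = [], []
    harmed = 0
    for _ in range(n):
        is_harmed = rng.random() < share_harmed
        harmed += int(is_harmed)
        unit_effect = -2.0 if is_harmed else 3.0  # this unit's causal effect
        baseline = rng.gauss(10.0, 1.0)           # outcome if untreated
        if rng.random() < 0.5:                    # randomized assignment
            treated.append(baseline + unit_effect)
        else:
            control.append(baseline)
    average_effect = statistics.mean(treated) - statistics.mean(control)
    return average_effect, harmed / n


avg, share = simulate_mixed_effects()
print(f"average effect reported by the trial: {avg:.2f}")
print(f"share of units treatment would harm:  {share:.2f}")
```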

Cartwright (2010) outlines several requirements to justify generalization from our study to other situations.

Even if we suppose that individuals enrolled in our RCT are ‘representative’ of the larger population to which we want to apply our change, Cartwright lists three assumptions that need to hold to support a logical bridge from trial to wider application:

  • The single cause C in our treatment is the only thing to be changed in our new applications;
  • C is introduced in the new circumstances just as it was introduced in the RCT (no new psychological twists, additional catalysts, campaigns, etc.);
  • The introduction of C does not affect the causal structure in the new circumstances.

As Cartwright (2010) says: “These are heavy demands.” (p. 67). This seems especially true in social or management situations, which are common in quality improvement.

What Options Do We Have to Achieve Relevance Beyond the Circumstances of the Original Trial?

As far as I can tell, we have only two options, which are bound together in practice.

Option (1): we have to develop a provisional theory, a causal explanation that reaches beyond the immediate testing circumstances and that we are ready to modify given new evidence.

Option (2): we have to repeat the test of change in additional settings, building up our degree of belief incrementally.
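One simple way to picture Option (2), offered as an illustration rather than as Cartwright's own method, is a Beta-Binomial update on the share of settings in which a change works. It assumes each replication can be scored as a plain success or failure and that settings are exchangeable, an assumption close to the projectability that Cartwright says needs its own justification. The class name `BeliefInTransport` and the sequence of results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BeliefInTransport:
    """Beta(alpha, beta) belief about the share of settings where a change works.

    Illustrative assumptions: each replication is scored as success/failure,
    and settings are treated as exchangeable draws from one population.
    """
    alpha: float = 1.0  # Beta(1, 1): uniform prior, no initial preference
    beta: float = 1.0

    def update(self, worked: bool) -> None:
        # Conjugate update: one more success or one more failure
        if worked:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean of the share of settings where the change works
        return self.alpha / (self.alpha + self.beta)


belief = BeliefInTransport()
for worked in [True, True, False, True, True]:  # hypothetical replication results
    belief.update(worked)
    n_settings = int(belief.alpha + belief.beta - 2)
    print(f"after {n_settings} settings: {belief.mean:.2f}")
```

The printed posterior means run 0.67, 0.75, 0.60, 0.67, 0.71: belief builds gradually, and a single failure pulls it back, rather than jumping to certainty after one good trial.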

Angus Deaton makes the case for theory clearly in a discussion of experiments to demonstrate effective social programs:

“…[RCTs] even when done without error or contamination, are unlikely to be helpful for policy, or to move beyond the local, unless they tell us something about why the program worked, something to which they are often neither targeted nor well-suited…For an RCT to produce ‘useful knowledge’ beyond its local context, it must illustrate some general tendency, some effect that is the result of mechanism that is likely to apply more broadly.” (“Instruments, Randomization, and Learning about Development”, Journal of Economic Literature, Vol. XLVIII (June 2010), p. 448)

Cartwright (2011) weaves the two options together:

“For policy and practice we do not need to know ‘it works somewhere’. We need evidence for ‘it-will-work-for-us’ claims: the treatment will produce the desired outcome in our situation as implemented there. How can we get from it-works-somewhere to it-will-work-for-us? Perhaps by simple enumerative induction: swan 1 is white; swan 2 is white…so the next swan will be white. For this we need a large and varied inductive base—lots of swans from lots of places; lots of RCTs from different populations—plus reason to believe the observations are projectable, plus an account of the range across which they project.” (p. 1401)

Cartwright (2011) then reminds us that naïve analogies between experiments in physics and experiments in social settings may lead us to think our treatment effects have more general application than is warranted:

“Electron charge is projectable everywhere—one good experiment is enough to generalise to all electrons; bird colour sometimes is; causality is dicey. Many causal connections depend on intimate, complex interactions among factors present so that no special role for the factor of interest can be prised out and projected to new situations.” (p. 1401).
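The dependence of a causal connection on other factors can be illustrated with a hypothetical interaction. In the sketch below, the cause C raises the outcome by 3 only when a supporting factor S is present; S, its prevalences of 0.9 and 0.2, and the function `estimated_effect` are invented for illustration. Both trials are internally valid, yet the effect found at the trial site does not carry over to a site where S is rare.

```python
import random
import statistics


def estimated_effect(p_support, n=20_000, seed=0):
    """Difference in means from a hypothetical RCT at a site where a
    supporting factor S is present with probability p_support.

    Assumed causal structure: C raises the outcome by 3 only when S is present.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        s_present = rng.random() < p_support
        y0 = rng.gauss(0.0, 1.0)         # outcome without C
        if rng.random() < 0.5:           # randomized assignment to C
            treated.append(y0 + (3.0 if s_present else 0.0))
        else:
            control.append(y0)
    return statistics.mean(treated) - statistics.mean(control)


trial_site = estimated_effect(p_support=0.9)  # expected near 0.9 * 3 = 2.7
new_site = estimated_effect(p_support=0.2)    # expected near 0.2 * 3 = 0.6
print(f"trial site: {trial_site:.2f}   new site: {new_site:.2f}")
```

Nothing in the trial-site estimate alone warns that the effect will shrink; anticipating it requires some account of why C works, that is, of the role S plays.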

Implications for Practice

Knowledge of what works in social settings is hard won, typically requiring many cycles of experience, whether informal or rigorously designed.

Experimenters benefit from the challenge of designing a trial: the design process invites careful thought about causal relations and measurement systems.

Randomized trials should explore the limits of our theories. The aim of a randomized trial should be to help us understand why differences between treatment and control arise, not just whether they exist.

More complex designs than the single treatment-versus-control structure of a basic RCT are attractive. As Parry and Power advised in the BMJ Quality and Safety commentary that kicked off this series of blogs, revisiting Fisher’s work on experimental design seems in order. Randomized block designs, which account for variation across experimental units, and screening designs, which put a large number of factors into play, are both tools for our toolkit. These alternatives are more complex than the simple RCT and so can yield more insight, contributing to better theories and stronger degrees of belief.
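As an illustration of why blocking helps, here is a minimal sketch of a randomized complete block design, in which treatment is randomized separately within each block (here, hypothetical sites with very different baselines), compared with simple randomization across all units. The site spread, effect size, block sizes, and the function `one_trial` are assumptions. Both designs are unbiased on average, but the blocked estimates should vary much less from trial to trial because between-site differences cancel inside each block.

```python
import random
import statistics


def one_trial(blocked, rng, n_blocks=10, per_block=10, effect=1.0):
    """Run one hypothetical trial across sites with very different baselines.

    blocked=True: randomize within each site (half treated per site) and
    average the within-site differences. blocked=False: flip a coin for every
    unit regardless of site and compare the two wings directly.
    """
    site_shift = [rng.gauss(0.0, 5.0) for _ in range(n_blocks)]  # between-site variation
    if blocked:
        diffs = []
        for shift in site_shift:
            n_treat = per_block // 2
            arms = [True] * n_treat + [False] * (per_block - n_treat)
            rng.shuffle(arms)  # random assignment within this block
            t = [shift + effect + rng.gauss(0.0, 1.0) for is_treated in arms if is_treated]
            c = [shift + rng.gauss(0.0, 1.0) for is_treated in arms if not is_treated]
            diffs.append(statistics.mean(t) - statistics.mean(c))
        return statistics.mean(diffs)
    treated, control = [], []
    for shift in site_shift:
        for _ in range(per_block):
            if rng.random() < 0.5:  # simple randomization, ignoring site
                treated.append(shift + effect + rng.gauss(0.0, 1.0))
            else:
                control.append(shift + rng.gauss(0.0, 1.0))
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(42)
blocked = [one_trial(True, rng) for _ in range(500)]
unblocked = [one_trial(False, rng) for _ in range(500)]
print(f"blocked:   mean {statistics.mean(blocked):.2f}, spread {statistics.stdev(blocked):.2f}")
print(f"unblocked: mean {statistics.mean(unblocked):.2f}, spread {statistics.stdev(unblocked):.2f}")
```

Under these assumptions the blocked spread should come out several times smaller, which is the sense in which a block design extracts more from the same number of units.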
