Causal Diagrams to Data and Back Again
In my previous post, I argued for the use of causal diagrams to guide thinking and action.
Of course, pictures with arrows that represent causal connections are not new.
Deming’s ‘Chain Reaction’ picture, first sketched in 1950, embodies a causal theory.
Starting in the 1950s, Jay Forrester and his students promoted causal loop diagrams that explicitly include feedback loops and interactions. You can simulate system performance using difference equations and compare modeled performance with observations.
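Here is a minimal sketch of the difference-equation idea. It assumes a single stock governed by one balancing feedback loop. The function name, parameters, and values are hypothetical and are not taken from Forrester's models. At each step, the gap between a target and the current stock drives the flow, and the flow in turn closes the gap.

```python
# Hypothetical balancing feedback loop written as a difference equation.
# The stock moves toward a target at a rate set by an adjustment time.
# All names and parameter values are illustrative assumptions.

def simulate_stock(initial: float, target: float,
                   adjustment_time: float, steps: int) -> list[float]:
    """Return the stock level at each time step, starting with `initial`."""
    if adjustment_time < 1:
        raise ValueError("adjustment_time must be >= 1 for a stable one-step update")
    levels = [initial]
    stock = initial
    for _ in range(steps):
        # The gap between target and stock drives the flow: this is the feedback loop.
        flow = (target - stock) / adjustment_time
        stock = stock + flow
        levels.append(stock)
    return levels


if __name__ == "__main__":
    path = simulate_stock(initial=0.0, target=100.0, adjustment_time=4.0, steps=10)
    for t, level in enumerate(path):
        print(f"t={t:2d} stock={level:6.2f}")
```

Comparing a simulated path like this with an observed series, for example by summing squared differences, is the step that ties such a model to data.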
So, what’s different about the causal diagrams that I’ve discussed in my last two posts?
How are Directed Acyclic Graphs Different from other Causal Pictures?
The theory of directed acyclic graphs (DAGs) complements statistical inference; other causal pictures don’t connect to statistical theory and practice in the same way.
DAGs explicitly inform both data analysis and study planning. In particular:
1. Given a DAG, you can make specific predictions about causal and non-causal associations in relevant data.
2. Given associations in observational data, you can distinguish among causal theories represented by different DAGs.
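To make these two points concrete, the sketch below shows how a DAG yields checkable predictions. It uses a d-separation test based on the moralized ancestral graph criterion. If X and Y are d-separated given Z, the DAG predicts no association between X and Y once Z is held fixed. If they are not, the DAG allows an association, and typically implies one. Data that violate a predicted independence count against that DAG. The graph and the node names are hypothetical: a rough sketch in which smoking and an unmeasured birth defect both affect low birth weight (`lbw`), and `lbw` and the defect both affect mortality.

```python
from itertools import combinations

# A DAG is represented as a dict mapping each node to the set of its parents.
DAG = dict[str, set[str]]


def ancestors(dag: DAG, nodes: set[str]) -> set[str]:
    """Return `nodes` together with all of their ancestors."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in dag.get(node, set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(dag: DAG, x: set[str], y: set[str], z: set[str]) -> bool:
    """True if x and y are d-separated given z (moralized ancestral graph test)."""
    keep = ancestors(dag, x | y | z)
    # Undirected moral graph of the ancestral subgraph.
    adj: dict[str, set[str]] = {n: set() for n in keep}
    for child in keep:
        parents = dag.get(child, set()) & keep
        for p in parents:
            adj[p].add(child)
            adj[child].add(p)
        # "Marry" any two parents that share a child.
        for a, b in combinations(parents, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete the conditioning set, then search for any path from x to y.
    seen = set(x - z)
    stack = list(seen)
    while stack:
        node = stack.pop()
        if node in y:
            return False
        for nbr in adj[node]:
            if nbr not in z and nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return True


# Hypothetical DAG (not the paper's figure): no direct smoking -> mortality edge.
birth_weight_dag: DAG = {
    "smoking": set(),
    "defect": set(),
    "lbw": {"smoking", "defect"},
    "mortality": {"lbw", "defect"},
}

if __name__ == "__main__":
    print(d_separated(birth_weight_dag, {"smoking"}, {"defect"}, set()))    # True
    print(d_separated(birth_weight_dag, {"smoking"}, {"defect"}, {"lbw"}))  # False
```

The second line shows the kind of prediction that matters for the example that follows. Smoking and the defect are independent in this graph, but they become associated once you condition on low birth weight, which is their common effect.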
Sonia Hernández-Díaz and co-authors give a clear example of these DAG features in a non-mathematical paper. The paper discusses an apparent paradox: Maternal smoking appears to reduce mortality for low-birth weight babies.
Here’s the key graph that sets the stage: Figure 2, “Birth-weight-specific infant mortality curves for infants born to smokers and nonsmokers, United States, 1991 (national linked birth/infant-death data, National Center for Health Statistics),” from Hernández-Díaz S, Schisterman EF, Hernán MA. The birth weight “paradox” uncovered. Am J Epidemiol. 2006 Dec 1;164(11):1115-20, available here, accessed 25 September 2018.
As birth weight decreases, the excess mortality associated with maternal smoking, shown by the gap between the two curves, shrinks to zero near 2500 grams, the common threshold for low birth weight. Below 2000 grams, infants of smoking mothers actually have slightly lower mortality than infants of non-smoking mothers.
However, the authors show that under reasonable biological causal models represented by DAGs, the protective association between maternal smoking and infant mortality is spurious.
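As I read the paper, the mechanism is collider stratification. Low birth weight is a common effect of smoking and of other causes, such as birth defects, that also raise mortality. Among low-birth-weight infants, a smoking mother "explains" the low weight, so those infants are less likely to carry the more lethal causes. The sketch below computes exact probabilities for a toy version of that DAG. Every number in it is hypothetical and chosen only for illustration. None is an estimate from the paper. By assumption, smoking affects death only through birth weight.

```python
from itertools import product
from typing import Iterator, Optional

# Hypothetical parameters for illustration only; NOT estimates from the paper.
P_SMOKE = 0.30   # P(mother smokes)
P_DEFECT = 0.05  # P(unmeasured birth defect), independent of smoking
P_LBW = {        # P(low birth weight | smoke, defect)
    (0, 0): 0.05, (1, 0): 0.15,
    (0, 1): 0.60, (1, 1): 0.70,
}


def p_death(lbw: int, defect: int) -> float:
    """P(death | lbw, defect). By assumption, smoking has no direct effect on death."""
    return 0.01 + 0.03 * lbw + 0.20 * defect


def joint() -> Iterator[tuple[int, int, int, int, float]]:
    """Yield (smoke, defect, lbw, death, probability) for all 16 combinations."""
    for s, u, l, d in product((0, 1), repeat=4):
        p = P_SMOKE if s else 1 - P_SMOKE
        p *= P_DEFECT if u else 1 - P_DEFECT
        p *= P_LBW[(s, u)] if l else 1 - P_LBW[(s, u)]
        p *= p_death(l, u) if d else 1 - p_death(l, u)
        yield s, u, l, d, p


def mortality(smoke: int, lbw: Optional[int] = None) -> float:
    """Exact P(death | smoke), or P(death | smoke, lbw) when lbw is given."""
    num = den = 0.0
    for s, _u, l, d, p in joint():
        if s != smoke or (lbw is not None and l != lbw):
            continue
        den += p
        num += p * d
    return num / den


if __name__ == "__main__":
    for label, stratum in [("all infants", None), ("low birth weight", 1)]:
        print(f"{label}: smokers {mortality(1, stratum):.4f}, "
              f"non-smokers {mortality(0, stratum):.4f}")
```

With these made-up numbers, infants of smokers have higher mortality overall (about 0.0253 vs 0.0223) because smoking raises the chance of low birth weight. Within the low-birth-weight stratum, however, they have lower mortality (about 0.079 vs 0.117). Smoking has no direct effect on death in this toy model, so that within-stratum reversal is entirely bias created by conditioning on a common effect.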
As an added bonus, DAGs also give operational meaning to types of bias in estimates of causal effects (confounding, selection bias and measurement bias).
Hernández-Díaz’s co-author Miguel Hernán explains these types of bias in his edX course “Causal Diagrams” (PH559x, https://www.edx.org/course/causal-diagrams-draw-assumptions-harvardx-ph559x).
Note on the History of Diagrams linked to Statistical Inference
The roots of DAG theory stretch back to the early 20th century. Judea Pearl outlines the history of causal diagrams linked to statistical inference in The Book of Why. Sewall Wright developed path analysis, which linked causal relationships to observed data. In Chapter 9, Pearl makes the case that Barbara Burks developed path analysis independently, and her story illustrates the challenges faced by creative women scientists.