Hotspotting Lessons—Part 3

Part 1 of this post described the context and conclusions of a randomized trial of the Camden Coalition’s hotspotting intervention (“Health Care Hotspotting — A Randomized, Controlled Trial,” New England Journal of Medicine, 382(2), January 9, 2020).

The authors found no difference in 180-day readmission rates for patients treated by the Camden intervention compared to a control group of patients. 

Part 2 of this post discussed the challenge of defining treatment and control in the study, a feature of real-world social experiments. 

In this post, I will draw and discuss simple directed acyclic graphs related to the NEJM study.   The graphs clarify the message of the NEJM study and illustrate the use of Pearl’s ‘do’ operator.  

Causal Diagram 

Here is a directed acyclic graph (DAG) that represents one view of the hotspotting system.   We accept the regression-to-the-mean phenomenon identified by the authors:  patients selected because of an index hospitalization tend, on average, to have a lower rate of subsequent hospitalization afterward than patients who do not have an index hospitalization.  The arrow between index hospitalization and subsequent hospitalization represents this relationship. 

[Figure Hotspotting2: DAG relating disease burden and social factors, index hospitalization, the Camden program, and subsequent hospitalization]

In this diagram, there is a ‘back-door’ path between subsequent hospitalization and hotspotting:  you can trace a path between the subsequent hospitalization outcome and the Camden program intervention through the index hospitalization.  You can block the back-door path by conditioning on index hospitalization. 

There is another ‘back-door’ path through disease burden and social factors; as drawn, this path is also blocked by conditioning on index hospitalization. 

From DAG to conditional probabilities to an estimate of causal effect  

Let’s consider a simpler version of the DAG that ignores Disease burden and social factors.   We can use the simpler DAG to march through a few calculations that will clarify how to carry out causal inference.  The calculations will shed light on the NEJM study, too. 

[Figure Hotspotting3: simplified DAG relating the Camden program, index hospitalization, and subsequent hospitalization]

Can we estimate a causal effect of the Camden Program in terms of this causal diagram?  

With the ‘do’ operator invented by Judea Pearl, the average causal effect is one way to characterize the impact of the Camden Program: 

P(Subsequent Hospitalization = Yes | do(Camden Program) = Yes) –  P(Subsequent Hospitalization = Yes | do(Camden Program) = No). 

If this difference in conditional probabilities is negative, with appropriate uncertainty bounds, then we have evidence of causal impact. 

Pearl tells us that the do(X) notation indicates an intervention to set the level of variable X; we don’t just rely on passive observation of the level of variable X.   In Pearl’s words, we distinguish doing from seeing. 

In general, the average causal effect is NOT the same as a similar-looking difference that is not tied to a specific causal diagram: 

P(Subsequent Hospitalization = Yes | Camden Program = Yes) – P(Subsequent Hospitalization = Yes | Camden Program = No) 

The average causal effect can only be interpreted and estimated given a specific causal diagram.  Use of the do-operator forces you to state your causal beliefs.   That’s good! 

Given our simplified DAG, the average causal effect (ACE) reduces to a sum of four terms built from ordinary conditional probabilities; details are in the appendix:   

ACE = P(Subs Hosp = Yes | do(Camden) = Yes) – P(Subs Hosp = Yes | do(Camden) = No)

= {(1) + (2)} – {(3) + (4)} 

with 

(1) = P(Subs Hosp=Yes| Camden=Yes, Index Hosp=Yes) x P(Index Hosp=Yes)  

(2) = P(Subs Hosp=Yes| Camden=Yes, Index Hosp=No) x P(Index Hosp=No)  

(3) = P(Subs Hosp=Yes| Camden=No, Index Hosp=Yes) x P(Index Hosp=Yes)  

(4) = P(Subs Hosp=Yes| Camden=No, Index Hosp=No) x P(Index Hosp=No) 
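As a sketch, the four-term decomposition above can be computed directly. The probabilities below are made-up, illustrative values, not figures from the study:

```python
# Made-up conditional probabilities for illustration only (not study data).
# Keys are (Camden, Index Hosp); values are P(Subs Hosp = Yes | Camden, Index Hosp).
p_subs = {
    ("Yes", "Yes"): 0.40,   # factor in term (1)
    ("Yes", "No"):  0.10,   # factor in term (2)
    ("No",  "Yes"): 0.40,   # factor in term (3)
    ("No",  "No"):  0.15,   # factor in term (4)
}
p_index = {"Yes": 0.3, "No": 0.7}   # P(Index Hosp)

def ace(p_subs, p_index):
    """Average causal effect via the four-term (back-door) decomposition."""
    def do_term(camden):
        # P(Subs Hosp = Yes | do(Camden) = camden), adjusting for Index Hosp.
        return sum(p_subs[(camden, h)] * p_index[h] for h in ("Yes", "No"))
    return do_term("Yes") - do_term("No")

print(ace(p_subs, p_index))  # a negative value would suggest fewer hospitalizations
```

With these invented numbers the ACE works out to about -0.035; the point is only to show how the four conditional probabilities and the two marginal probabilities of index hospitalization combine.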

Connection to the NEJM Study 

To calculate the average causal effect, we must account for the patients who had an index hospitalization and those who did not have an index hospitalization.  

The NEJM study only included patients with an index hospitalization.  

For those patients, the authors showed that the rates of subsequent hospitalization for the Camden = Yes group and the Camden = No group were roughly the same.   As discussed in the earlier posts in this series, the authors demonstrate that the dominant effect seems to be ‘regression to the mean’, with both groups having similar 180-day hospitalization rates after the index hospitalization. 

The approximate equality of subsequent hospitalization rates for the two groups implies approximate equality of terms (1) and (3) in the ACE: 

P(Subsequent Hosp=Yes| Camden=Yes, Index Hosp=Yes) ≈  P(Subsequent Hosp=Yes| Camden=No, Index Hosp=Yes). 

The approximate equality means that terms (1) and (3) cancel each other, since they enter the ACE formula with opposite signs. 

However, the ACE will only be zero if the other two terms (2) and (4) also cancel out.    

The algebra raises a caution: given our causal diagram, an analysis of the causal effect of the Camden program also needs to account for patients who did NOT have an index hospitalization.

In other words, if the causal diagram is a reasonable summary of the causal system, the study’s conclusion should be stated this way:  there does not seem to be any effect from the Camden program on subsequent 180-day hospitalizations for patients with an index admission.   This restricted conclusion is not equivalent to a general demonstration that the Camden program has no causal effect. 
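To make the caution concrete, here is a hypothetical scenario with entirely made-up numbers: terms (1) and (3) cancel, exactly as the NEJM data suggest, yet the ACE is nonzero because terms (2) and (4) differ.

```python
# Made-up numbers for illustration. Among patients WITH an index hospitalization,
# treated and control rates are equal, so terms (1) and (3) cancel (as in the study).
# Among patients WITHOUT an index hospitalization (outside the study's sample), the
# rates differ, so terms (2) and (4) do not cancel and the ACE is nonzero.
p_subs = {("Yes", "Yes"): 0.40, ("No", "Yes"): 0.40,   # (1) and (3): equal
          ("Yes", "No"):  0.05, ("No", "No"):  0.15}   # (2) and (4): unequal
p_index_yes = 0.3   # P(Index Hosp = Yes)

terms = {
    1: p_subs[("Yes", "Yes")] * p_index_yes,
    2: p_subs[("Yes", "No")] * (1 - p_index_yes),
    3: p_subs[("No", "Yes")] * p_index_yes,
    4: p_subs[("No", "No")] * (1 - p_index_yes),
}
ace = (terms[1] + terms[2]) - (terms[3] + terms[4])
print(terms[1] == terms[3], round(ace, 3))  # True -0.07: (1) and (3) cancel, ACE nonzero
```

The equality observed inside the study is consistent with a nonzero average causal effect overall; everything hinges on the unobserved no-index-hospitalization terms.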

Conclusion 

In his edX course on causal analysis, Miguel Hernán thoughtfully advises:  Draw your assumptions before you draw your conclusions.  A sketch of a causal diagram helped me to understand and interpret the NEJM study. I plan to keep sketching DAGs as I read research studies. Reading Pearl’s Book of Why and taking Hernán’s class are good places to start to build your causal skills.

Appendix: Going Through the Details

To cut down on typing, let’s use X to represent the Camden Program intervention and Y to be subsequent hospitalization.

Given a table of data for patients in Camden and the definition of conditional probability, we can estimate P(Y = Yes | X = Yes) as P(Y = Yes and X = Yes)/P(X = Yes) = f11/(f11 + f10).

Table 1. Frequencies fXY, where the first subscript gives the level of X (Camden program) and the second the level of Y (subsequent hospitalization):

            Y = Yes   Y = No
  X = Yes     f11       f10
  X = No      f01       f00

Similarly, we can also estimate P(Y = Yes |X = No) as f01/(f01 + f00).
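As a quick illustration with made-up frequencies, the two conditional probabilities can be estimated from the 2x2 table like this:

```python
# Made-up 2x2 frequencies; keys are (X, Y) with X = Camden program, Y = subsequent hosp.
f = {("Yes", "Yes"): 20, ("Yes", "No"): 80,   # f11, f10
     ("No",  "Yes"): 30, ("No",  "No"): 70}   # f01, f00

def p_y_given_x(f, x):
    """Estimate P(Y = Yes | X = x) = f[x, Yes] / (f[x, Yes] + f[x, No])."""
    return f[(x, "Yes")] / (f[(x, "Yes")] + f[(x, "No")])

print(p_y_given_x(f, "Yes"), p_y_given_x(f, "No"))  # 0.2 0.3
```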

Could we measure the causal effect of the Camden program as the difference between these two conditional probabilities? In general, no—we can be badly fooled if there are confounding variables. And the simplified DAG shows that the index hospitalization is a confounder.

Judea Pearl’s invention of the do-operator and calculus can help us out. He distinguishes observational studies from interventional studies. In a causal study, we want to estimate quantities like P(Y = Yes | do(X) = Yes), which we interpret as ‘the probability of a subsequent hospitalization given that we assigned the Camden program intervention’.

Pearl figured out how to link the structure of a DAG with calculation rules that allow us to translate a conditional causal quantity like P(Y = Yes| do(X)= Yes) into ordinary conditional probabilities.

Our simplified DAG introduces a third variable, the index hospitalization; let’s call that variable H, which also has two levels “Yes” and “No.”

Now let’s translate.

By ordinary ‘anti-marginalization’ (the law of total probability, conditioning on H), we can rewrite the conditional causal quantity this way:

P(Y = Yes| do(X) = Yes) = ∑H P(Y = Yes| do(X)=Yes, H) P(H| do(X) = Yes)

Given our DAG, Pearl’s calculation Rule 2 tells us how to rewrite the first term in the sum, since the index hospitalization H blocks the back-door path from Y to X.

 “We know that if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write P(Y|do(X), Z) = P(Y|X, Z) if Z satisfies the back-door criterion.”  (Pearl, Judea. The Book of Why: The New Science of Cause and Effect (p. 234). Basic Books. Kindle Edition). 

Pearl’s calculation Rule 3 simplifies the second term in the sum: since the DAG shows that there are no forward arrows from the Camden Program X to the index hospitalization H, P(H | do(X) = Yes) reduces to P(H).  As Pearl summarizes, if we do something that does not affect the index hospitalization, then the probability of index hospitalization will not change.  (pp. 234-235 The Book of Why, Kindle Edition.) 

Combining rules 2 and 3, we get: 

P(Y = Yes | do(X) = Yes) = {P(Y = Yes|  X =Yes, H = Yes) x P(H = Yes ) + P(Y = Yes| X = Yes, H = No) x P(H = No) } 

We have the first term of the average causal effect. 

The second term, P(Y = Yes | do(X) = No) is translated using the same approach and gives:     

P(Y = Yes| do(X) = No) = {P(Y = Yes| X = No, H = Yes) x P(H = Yes ) + P(Y = Yes| X = No, H = No) x P(H = No) } 

That’s how we got the expression for the average causal effect in the main section of this blog. 

To estimate the average causal effect, we need data on all three factors and again apply the definition of conditional probability.  

Table 2. Frequencies fHXY, with subscripts giving the levels of H (index hospitalization), X (Camden program), and Y (subsequent hospitalization), in that order:

                 H = Yes            H = No
             Y = Yes  Y = No    Y = Yes  Y = No
  X = Yes      f111     f110      f011     f010
  X = No       f101     f100      f001     f000

P(Y = Yes | do(X) = Yes) = (f111/(f111 + f110)) x ((f111 + f101 + f110 + f100)/N) + (f011/(f011 + f010)) x ((f011 + f001 + f010 + f000)/N)

P(Y = Yes | do(X) = No) = (f101/(f101 + f100)) x ((f111 + f101 + f110 + f100)/N) + (f001/(f001 + f000)) x ((f011 + f001 + f010 + f000)/N)

Here N is the total number of patients, the sum of all eight frequencies; dividing by N converts the frequency sums into the probabilities P(H = Yes) and P(H = No).

Slogging through the algebra shows that a quantity like P(Y = Yes | do(X) = Yes) equals the corresponding P(Y = Yes | X = Yes) only in special cases, such as when all the frequencies are equal, a case in which X has no effect. More generally, P(Y | X) will not equal P(Y | do(X)).
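A small numerical sketch makes the gap between seeing and doing concrete. The frequencies are made up (subscripts read as H, X, Y, consistent with the formulas above) and constructed so that the adjusted do-probabilities are identical for the two arms, so the true ACE is zero, while the naive conditional probabilities differ because the confounder H is distributed unevenly across the arms:

```python
# Made-up frequencies f[(H, X, Y)]; H = index hosp, X = Camden, Y = subsequent hosp.
f = {("Yes", "Yes", "Yes"): 60, ("Yes", "Yes", "No"): 90,
     ("Yes", "No",  "Yes"): 20, ("Yes", "No",  "No"): 30,
     ("No",  "Yes", "Yes"): 5,  ("No",  "Yes", "No"): 45,
     ("No",  "No",  "Yes"): 15, ("No",  "No",  "No"): 135}
N = sum(f.values())

def p_y_do_x(f, x):
    """Back-door adjustment: sum over H of P(Y=Yes | X=x, H=h) * P(H=h)."""
    total = 0.0
    for h in ("Yes", "No"):
        p_y_given_xh = f[(h, x, "Yes")] / (f[(h, x, "Yes")] + f[(h, x, "No")])
        p_h = sum(f[(h, xx, yy)] for xx in ("Yes", "No") for yy in ("Yes", "No")) / N
        total += p_y_given_xh * p_h
    return total

def p_y_given_x(f, x):
    """Naive conditional probability P(Y=Yes | X=x), ignoring H."""
    yes = sum(f[(h, x, "Yes")] for h in ("Yes", "No"))
    no = sum(f[(h, x, "No")] for h in ("Yes", "No"))
    return yes / (yes + no)

print(round(p_y_do_x(f, "Yes") - p_y_do_x(f, "No"), 3))      # 0.0  (adjusted)
print(round(p_y_given_x(f, "Yes") - p_y_given_x(f, "No"), 3))  # 0.15 (naive)
```

In this fabricated population, X has no causal effect on Y within either level of H, yet the unadjusted comparison suggests a sizable difference. That is the confounding the back-door adjustment removes.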
