Is the most recent week in the Wisconsin Covid death series higher?

Background

Last week, Wisconsin data colleagues posed a question.  Was the most recent week’s count of Wisconsin deaths attributed to Covid-19 higher than the previous week?

It pays to consider the purpose of the question. Government agencies and media outlets use a range of measures, including reported deaths, to upgrade or downgrade their assessment of Covid-19 in states and countries, based on threshold levels and shifts.

For example, last week The New York Times included Wisconsin in the set of states with recent increase in deaths attributed to Covid-19, with implications for travel restrictions and resource decisions.

The basic question might be rephrased like this: Is the count of 50 deaths for the week ending 27 July sufficiently higher than previous weeks to warrant acting differently?

A control chart analysis shifts the attention from the last week of the series to the pattern in the entire series.  The weeks early in July have an unusually low number of deaths.  Deaths in the week ending 27 July revert to levels seen earlier in the series.

I used R 4.0.1 to create the data tables and charts in this post.  You can find an RMarkdown file and support files on GitHub.

Plot the Data

What do the data tell us? I made run charts with the same R code applied in my previous post. The death series comes from The New York Times through 27 July, accessed 28 July.

The blue panel shows the 14-day period 14 July through 27 July.   The median is based on the 30 days before the final 14-day period.
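A minimal base-R sketch of such a run chart; the data frame `wi`, its column names, and the window arithmetic are my illustrative assumptions, not the actual code from the previous post:

```r
# Run chart sketch: daily deaths, a median reference line from the
# 30 days before the final 14-day period, and a shaded final 14 days.
# wi is assumed to be a data frame with columns date (Date) and deaths.
last_day <- max(wi$date)
recent   <- wi$date > last_day - 14            # final 14-day panel
baseline <- wi$date > last_day - 44 & !recent  # the 30 days before it

plot(wi$date, wi$deaths, type = "b", pch = 16,
     xlab = "Date", ylab = "Reported deaths")
abline(h = median(wi$deaths[baseline]), lty = 2)       # baseline median
rect(last_day - 14, par("usr")[3], last_day, par("usr")[4],
     col = rgb(0, 0, 1, alpha = 0.1), border = NA)     # blue panel
```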

[Figure: run chart of daily Wisconsin Covid-19 deaths; blue panel marks 14 July through 27 July, with the median of the preceding 30 days]

In Wisconsin and other locations, Covid data show strong day-to-day differences, and the death series follows a similar pattern. Reported values in The New York Times data table tend to be lowest on Sunday and Monday, peak on Tuesday, then drop a bit from Wednesday through Saturday. The table here splits the data series by day of week and shows the minimum, quartiles, mean, and maximum for each day.
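A day-of-week summary like this can be produced in a few lines of base R; the data frame `daily` and its column names are illustrative assumptions:

```r
# Split daily deaths by day of week and summarize each day:
# minimum, quartiles, mean, and maximum.
# daily is assumed to be a data frame with columns date (Date) and deaths.
# Note: weekdays() labels are locale-dependent.
daily$dow <- factor(weekdays(daily$date),
                    levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                               "Thursday", "Friday", "Saturday"))
by_day <- tapply(daily$deaths, daily$dow, summary)
do.call(rbind, by_day)   # one row per day of week
```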

[Table: minimum, quartiles, mean, and maximum of reported daily deaths, by day of week]

Some analysts like to use seven-day moving averages to smooth out the day-to-day variation.

My Wisconsin colleagues prefer seven-day sums. Each sum includes every day of the week exactly once. I refer to these seven-day sums as ‘weeks’ in the following discussion. Just remember that a week is seven consecutive days, not necessarily Sunday through Saturday.
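One way to compute these ‘weeks’ in base R, working backwards from the final date so the last sum ends 27 July; the function name and the choice to drop leftover days at the start of the series are my assumptions:

```r
# Rolling 7-day sums of a daily count series, then every 7th sum
# selected backwards from the final day (non-overlapping "weeks").
# deaths is a numeric vector of daily counts in date order.
sum7 <- function(x) {
  cs <- cumsum(x)
  c(cs[7], diff(cs, lag = 7))   # 7-day sum ending on each day >= 7
}
rolling <- sum7(deaths)
idx <- rev(seq(length(rolling), 1, by = -7))  # step back 7 days at a time
weeks <- rolling[idx]                          # non-overlapping weekly sums
```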

The next chart shows the weekly data, with a median based on all the data except the week ending 27 July and the same blue band as before.

[Figure: run chart of weekly (seven-day sum) deaths; median based on all weeks except the week ending 27 July, blue band as before]

Is the week ending 27 July 2020 different from previous weeks?

Just look! The plot answers the question: the weekly count of 50 deaths for the seven days ending 27 July is higher than any of the previous five weeks. Deaths in late June and early July appear to be much lower than in other weeks in April, May, and early June.

Control Chart Approach

Control chart calculations require specification of a baseline period to build preliminary limits.   

If we cannot specify a baseline period, then we cannot construct a chart.  

Sometimes we must guess a baseline period and proceed with humility.

The baseline period in my initial control charts uses all the data except the most recent 7-day period, shown as a blue dot.

The c-chart uses the structure of the Poisson distribution to represent common cause variation. The control limits are the average count c̄ in the baseline period ± 3√c̄. This formula uses the fact that the variance of a Poisson variable is equal to its mean; this remarkable relationship constrains how much variation we can expect to see in a sequence of Poisson counts.
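The limit calculation is short; this helper is a sketch, with the convention (my assumption) that the lower limit is truncated at zero and that the most recent week is excluded from the baseline, matching the text:

```r
# c-chart for weekly counts: center line is the baseline mean cbar,
# control limits are cbar +/- 3 * sqrt(cbar) (Poisson: variance = mean).
# weeks is a numeric vector of weekly death counts in time order.
c_chart <- function(weeks) {
  baseline <- weeks[-length(weeks)]   # all but the most recent week
  cbar <- mean(baseline)
  list(center = cbar,
       lcl = max(0, cbar - 3 * sqrt(cbar)),
       ucl = cbar + 3 * sqrt(cbar))
}
```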

[Figure: c-chart of weekly Wisconsin Covid-19 deaths]

The chart shows seven of 18 points outside the control limits; 11 consecutive points are at or above the weekly average. The variation in weekly deaths is not adequately described by a Poisson distribution with a common mean of 50!

In contrast to the c-chart, the individuals chart uses the week-to-week differences to represent common cause variation.   
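A sketch of the i-chart limits, using the standard moving-range estimate (the constant 2.66 is 3/d2, with d2 = 1.128 for ranges of two points); the function name and baseline convention are illustrative:

```r
# Individuals (i-) chart: common-cause variation estimated from the
# average moving range of successive weekly counts.
# weeks is a numeric vector of weekly death counts in time order;
# the most recent week is excluded from the baseline.
i_chart <- function(weeks) {
  baseline <- weeks[-length(weeks)]
  mr_bar <- mean(abs(diff(baseline)))  # average week-to-week moving range
  xbar <- mean(baseline)
  list(center = xbar,
       lcl = xbar - 2.66 * mr_bar,
       ucl = xbar + 2.66 * mr_bar)
}
```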

[Figure: individuals (i-) chart of weekly Wisconsin Covid-19 deaths]

The individuals chart has the same center line as the c-chart, so it shows the same 11 consecutive counts at or above the mean. Three counts lie beyond the control limits: low counts in the first full week of deaths in March and the week ending 6 July, and a high count the week ending 1 June.

The i-chart limits are 41% wider than the c-chart limits. This ‘over-dispersion’ is commonly observed by epidemiologists, who use statistical models that are less restrictive than the Poisson model. See, for example, this paper in Nature.
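The 41% figure corresponds to the ratio of the two half-widths; a hypothetical helper to compute it from a series of weekly counts, using the same baseline convention as the charts:

```r
# Ratio of i-chart half-width (2.66 * average moving range) to
# c-chart half-width (3 * sqrt of the baseline mean); values well
# above 1 indicate over-dispersion relative to the Poisson model.
dispersion_ratio <- function(weeks) {
  baseline <- weeks[-length(weeks)]   # exclude the most recent week
  (2.66 * mean(abs(diff(baseline)))) / (3 * sqrt(mean(baseline)))
}
```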

Stiratelli’s Guidance

Thirty years ago, my colleague Rob Stiratelli, an industrial statistician, gave me this guidance: when you see signals of special cause in a control chart, stand back and look at the chart. You have license to consider the entire series and interpret what you see in light of those signals. In the absence of special cause signals, you have no license to interpret patterns. Do not waste your time asking why one point is higher than the next, or whether three consecutive points increasing is really a trend.

Applying Rob’s guidance, here’s the message of the two control charts:  After the initial low value, death counts in April and May were consistently above the 17-week average. Starting the second week in June, death counts dropped. Low death counts in early July are a signal that indicates a change in the system of causes.  The higher death count for the week ending 27 July is more consistent with counts in April-June than with counts in early July.

Appendix:  An alternative approach to answering the question?

Colleagues in Wisconsin proposed a series of statistical tests to investigate whether the last week in the Wisconsin death series was ‘significantly’ higher than weeks earlier in July. They created a set of adjacent blocks of increasing size, working backwards from 27 July. Next, they used the poisson.test function in R to perform a hypothesis test for each block comparison: how likely is it to observe counts in the two blocks as or more extreme than what we see in the data, if both counts are drawn from the same Poisson distribution?
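The shape of one such comparison with R’s poisson.test; the counts and block lengths below are illustrative numbers, not the Wisconsin data:

```r
# Two-sample comparison with poisson.test: do two adjacent blocks of
# days look like draws from the same Poisson rate?
# Counts and block lengths here are made up for illustration.
block1_deaths <- 18; block1_days <- 7   # e.g. an earlier 7-day block
block2_deaths <- 50; block2_days <- 7   # e.g. the week ending 27 July
poisson.test(c(block1_deaths, block2_deaths),
             T = c(block1_days, block2_days))$p.value
```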

[Figure: table of Poisson test block comparisons and p-values]

Critique of statistical testing

Major criticism: The test procedure ignores the observed order of the counts. The tests will yield the same statistical significance no matter what the order of counts within the blocks. As larger blocks are constructed in an effort to increase statistical power, the presumed exchangeability becomes increasingly problematic.

In contrast, the control chart approach analyzes the counts in the order reported.  The evolution of the pandemic is inherently a process unfolding in time. The control chart respects this structure.

“Presentation of results, to be optimally useful, and to be good science, must conform to Shewhart’s rule: viz., preserve, for the uses intended, all the evidence in the original data.” (W. E. Deming, “On probability as a basis for action,” The American Statistician, 29, No. 4, 148).

Time order is central to the evidence in the death count data.

Additional criticism:  The control charts do not yield a precise statistical statement, compared to the Poisson tests.   This is a feature, not a bug.   Precise statements depend on distributional assumptions, in this case Poisson structure, that the control charts call into question.


Analytic vs. Enumerative, one more time


Measurement Notes:  Monitoring Covid-19 Signals
