Problem: death data lags behind by a random amount

Fatalities, however, continue to come in some of the largest numbers of the pandemic, with an average of nearly 500 per day over the past week and a monthly total well over 10,000, far and away the state’s deadliest month of the pandemic, with another week still to go.

Source: Coronavirus: California’s cases, hospitalizations looking up

Reports of deaths are reports of past events. When you hear a daily body count report of “Reported 54 deaths today”, you may think this implies that 54 deaths occurred the prior day, or at least recently.

No. When someone dies, a record has to eventually make its way to the state’s public health office. This takes time. Sometimes a long time. Consequently, a daily report of deaths may become a number that does not mean much. (To clarify, yes, people died of Covid-19 – it’s that their date of death is presented in a confusing or misleading way.)

I analyzed my state’s death data and found that as deaths increased, the state got further and further behind in recording and reporting them. Consequently, daily death reports are increasingly far off from the actual number of recent deaths.

This chart, below, is up to date through January 24th, 2021.

The orange line is the public “daily body count” report. This is the number the media reports each day – this is the number the public hears.

If you dive beneath the headline number, the daily report includes the actual date of death for each person who died. These actual dates are days, weeks and even months ago. It is completely understandable why the data is reported this way – for one, it has to be, since deaths are past events.

When we re-tally the data to display deaths by actual date of death – not the late date on which they were reported to the public – we get the blue line. The red line is 7-day smoothing of actual dates of death. The green dotted line is a 7-day smoothing of the “day of report” deaths.
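The re-tally described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used: the records are made up, the names `tally` and `smooth` are hypothetical, and the smoothing is assumed to be a trailing 7-day average (the chart may use a different window alignment).

```python
from collections import Counter
from datetime import date, timedelta

# Made-up records: (date the death was reported, actual date of death).
records = [
    (date(2020, 12, 15), date(2020, 12, 9)),
    (date(2020, 12, 15), date(2020, 11, 25)),
    (date(2020, 12, 15), date(2020, 12, 9)),
    (date(2020, 12, 16), date(2020, 12, 6)),
    (date(2020, 12, 16), date(2020, 12, 9)),
]

def tally(dates):
    """Count deaths per calendar day, filling days with no deaths with zero."""
    counts = Counter(dates)
    day, last = min(counts), max(counts)
    series = {}
    while day <= last:
        series[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return series

def smooth(series, window=7):
    """Trailing moving average; uses fewer days at the start of the series."""
    days = sorted(series)
    out = {}
    for i, day in enumerate(days):
        chunk = days[max(0, i - window + 1): i + 1]
        out[day] = sum(series[d] for d in chunk) / len(chunk)
    return out

by_report = tally(r for r, _ in records)  # "day of report" view (orange line)
by_death = tally(d for _, d in records)   # "actual date of death" view (blue line)
by_report_7d = smooth(by_report)          # green dotted line
by_death_7d = smooth(by_death)            # red line

print("Peak by report date:  ", max(by_report, key=by_report.get))
print("Peak by date of death:", max(by_death, key=by_death.get))
```

Both tallies count the same five deaths; only the date each death is assigned to differs, which is why the two lines can tell different stories.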

Seeing only the Orange line, is the pandemic getting worse in January or better?

January looks worse, based on the Orange line. But the Orange line is not measuring an actual death trend – because it includes deaths that occurred long ago.

By actual date of death, the peak day was Dec 9, the second highest was Nov 25 and the third highest was Dec 6.

That is different from the view created by the highly visible daily body counts.

Some people do not think this is a problem. But suppose you use the Orange trend line to set policies? That would not be good because it does not reflect the current state of affairs.

The orange line is not a measure of health data – it is a measure of office efficiency in collating death records. The orange line only tells us that the Oregon Health Authority (OHA) is getting further behind – it is not epidemiological data.

(The right edge of the chart, especially the past 2 weeks, will backfill with lagging data in the weeks to come. Currently, we are not seeing backfills sufficient to change January into the peak month. Hospitalizations have fallen by half, ICU bed usage is down, ventilator usage is down – these are coincident indicators suggesting that deaths in January will not magically leap skyward.)

The “Day of report” death count is always off because it reports on past deaths. When deaths in Oregon were low, we never noticed the error between “Day of report” and “Actual date of death”. “Day of report” was adequate for seeing the general trend.

In November, the number dying increased dramatically and the state got further and further behind. Even before November, the state was behind in reporting prior deaths, sometimes by months. By late December the state began to “catch up”. On the first peak day of 54 deaths in December, the overwhelming majority of deaths occurred weeks and months earlier – all the way back to May! This pattern continued into January – these orange line peaks were catching up on past deaths.

For example, this daily update report of 44 deaths included deaths going back to May. Take a look.

When the death reports fall far behind, the daily death report numbers no longer provide an accurate trend indicator.
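One way to put a number on “how far behind” a single daily report is: compute, for each death in it, the days between the date of death and the report date. The sketch below uses made-up dates, not a real OHA report, and the function name `report_lags` is hypothetical.

```python
from datetime import date
from statistics import median

def report_lags(report_day, death_dates):
    """Days between each actual date of death and the day it was reported."""
    return sorted((report_day - d).days for d in death_dates)

# Made-up example of one daily report listing five deaths.
report_day = date(2021, 1, 12)
death_dates = [
    date(2021, 1, 10),
    date(2021, 1, 3),
    date(2020, 12, 20),
    date(2020, 11, 30),
    date(2020, 5, 14),
]

lags = report_lags(report_day, death_dates)
median_lag = median(lags)
share_older_than_2_weeks = sum(lag > 14 for lag in lags) / len(lags)

print(lags)                      # [2, 9, 23, 43, 243]
print(median_lag)                # 23
print(share_older_than_2_weeks)  # 0.6
```

Tracking the median lag from one report to the next would show directly whether reporting is falling further behind or catching up.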

Another way to think of this is that death reports are ALWAYS time shifted forward by a random amount based on reporting delays. Today’s report is of past deaths occurring over a broad period of time.

It is understandable how this occurs, but it leads to a misleading public perception of the current state of the pandemic, and it will always occur when the change in deaths is large. It’s not a big problem when the number of deaths per day is low (and possibly stable) – even though the daily report measures something other than deaths per day, the numbers stay close together.

The easy solution is for public health to create their own chart based on actual deaths – that would be perfect.

Someone argued my chart is wrong: since reports are lagging, the right side will probably just fill up, so the peak cannot be placed in early December. While possible, that’s unlikely, because new cases, hospitalizations and ICU bed usage are also dropping. It would be very unlikely for deaths to continue rising while the other metrics are dropping by a lot. Note that hospitalizations have fallen by half.

Another argument is that since data is lagging, even the chart by actual date of death is wrong. Of course, it too will catch up in time but it is LESS WRONG than the method used by OHA to publicize deaths due to Covid-19. All OHA needs to do is show both graphs – the “reported by date” chart and the “actual date of death” chart.

Furthermore, and perhaps the bottom line, the OHA’s method of reporting does not meet CDC standards, which call for “condition onset time” (not the day it is reported).

The above ought to be obvious, and it is why you want to use “onset time”, not office reporting time, for a key metric like deaths.

One more analogy. Suppose a company reported its quarterly revenue. Past quarters were lower than the present, showing steadily increasing revenue. From an investor standpoint, it looks like a growing company. But what if they had moved past revenues into the present? What if (as in the OHA chart), a chart by actual quarterly revenue showed revenues were going down? Would you invest in this company?

This is what OHA is doing and, regardless of the reasons, it does not accurately convey the current situation. (Note – in accounting there is a concept of “deferred revenue” recognition. Suppose you publish a magazine and renewals are on January 1st. Do you “book” all the year’s subscription revenue in January? No. You match your revenue with the expenses associated with delivering the product or service. Thus, you book 1/12th of that amount every month. And you disclose this. This time-shifts past revenue into the period in which the revenue is spent. But this is not what OHA is doing.)

Death data is always going to be lagging – by random amounts. The effect is that the actual peak will always be sometime in the past – even though “reported deaths” may still be climbing.
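A toy model makes this concrete. The sketch below is an assumption-laden illustration, not a fit to Oregon data: the daily death counts are invented, and instead of drawing random delays it spreads each day’s deaths over a fixed, hypothetical delay distribution (25% reported after 2 days, 50% after 5 days, 25% after 9 days).

```python
def reported_series(actual, delay_weights):
    """Spread each day's deaths across later report days by delay share."""
    horizon = len(actual) + max(delay_weights)
    reported = [0.0] * horizon
    for day, count in enumerate(actual):
        for delay, share in delay_weights.items():
            reported[day + delay] += count * share
    return reported

# Hypothetical delay distribution: {days of delay: share of deaths}.
delay_weights = {2: 0.25, 5: 0.5, 9: 0.25}

# Invented surge: actual deaths per day rise to a peak on day 5, then fall.
surge = [2, 4, 8, 14, 20, 26, 20, 14, 8, 4, 2, 1]
surge_reported = reported_series(surge, delay_weights)

actual_peak_day = surge.index(max(surge))
reported_peak_day = surge_reported.index(max(surge_reported))
print(actual_peak_day, reported_peak_day)  # 5 10

# Invented stable period: 10 deaths every day for 30 days.
stable = [10] * 30
stable_reported = reported_series(stable, delay_weights)
print(stable_reported[9:30] == [10.0] * 21)  # True
```

In the surge case, reported deaths keep climbing from day 6 through day 10 while actual deaths are already falling. In the stable case, once the longest delay has passed, the reported count matches the actual count, which is why the mismatch goes unnoticed when deaths are low and steady.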

Another argument is that OHA is being transparent – in the 250 or so daily updates issued since last spring, they list all of the Actual Dates of Death and anyone can see what they are.

Let’s get real. I had to go through all 250+ reports and transcribe each of the 1,600+ deaths, by hand, into a spreadsheet. Sure, anyone can do this who has a spare day available. That is not being transparent and such arguments fall flat.

Could someone do this by hand for a state with 25,000 deaths?

Could someone do this by hand for the entire country?

No. Instead, we end up with charts that are time shifting deaths by a random and unknown amount. We will think we are hitting a new peak when that peak actually occurred some time ago.

You can argue this is like economic data, which is always in the past. But past economic data is graphed as past economic data – not contemporary data.

Leading indicators are somewhat like the Orange line – but it is clear and well known they are unreliable.

Death data, on the other hand, is treated as a “gold standard” – but here it is off by weeks or months and almost no one is aware of how large the error has become.

Presenting data that does not reflect reality may lead to increased public fear, increased mental health problems, and the setting of inappropriate policies by politicians who, for some reason, are not transcribing 1000s of death records, by hand, into a spreadsheet.

Some people think this is not a problem. Do you think this is a problem or not?

AGAIN: EASY FIX. Just draw both charts on the OHA web site. One by date of report and one by actual date of death.

It should not be a big deal to accurately display this information to the public.