How can a highly accurate COVID test be wrong half of the time?

The developing confusion of overly-informed people

Science is often mistakenly touted as an undeniable collection of facts. But the availability of information about COVID-19 has introduced people to the inevitable uncertainty of scientific discovery. Those unfamiliar with the process have begun to use this uncertainty as a reason to doubt science as a whole, but complex issues like a pandemic are full of layers that scientists have to navigate and re-evaluate. What I want to do here is give a (hopefully) intuitive example of one of these layers using a popular and sometimes counterintuitive topic: testing¹.

There have been a number of confusing reports about testing accuracy in recent months. For example, Ohio's Governor Mike DeWine tested negative after getting a positive test a couple of days earlier. Some months ago the CDC announced that highly accurate (93%) tests can yield such false positives more than half of the time (1, 2). So, how do we make sense of the fragility of highly accurate tests, and what can we trust?

What’s the problem? Considering base rates.

First, it’s important to know that there are two general types of tests: diagnostic ones that identify whether you currently have the virus, and antibody ones that check if you’ve had it in the recent past. Diagnostic tests like PCR are slow, but produce very reliable results. Antibody and diagnostic antigen tests are the fast ones you often get in drive-thru testing facilities, and that’s where things get tricky. We can look at a specific antibody example to understand why.

Below is an accuracy table for a popular 15-minute test (source). Compared to the molecular gold standard, the antibody test correctly identified 67 out of 77 positive cases, and 89 out of 90 negatives (the diagonal numbers from the table versus the totals at the bottom). So we have 67 + 89 = 156 correct results out of 77 + 90 = 167 total, or 93% accuracy. We are often content when we see such a high number.
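If you'd rather see that arithmetic spelled out, here's a tiny Python sketch (the variable names are mine; the counts come straight from the table):

```python
# Counts from the accuracy table, compared to the molecular gold standard
true_positives = 67   # test said positive, person truly had COVID
false_negatives = 10  # test said negative, person truly had COVID (77 - 67)
true_negatives = 89   # test said negative, person truly didn't have it
false_positives = 1   # test said positive, person truly didn't have it (90 - 89)

correct = true_positives + true_negatives  # 156
total = 77 + 90                            # 167 people evaluated
print(f"Accuracy: {correct / total:.0%}")  # -> Accuracy: 93%
```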

So, what's the problem? Antibody tests like this one can produce so many false positives because this calculation disregards base rates. One of my daughter's baby books portrays this issue pretty well. Let's say that you're at a party where they're serving sugar and chocolate chip cookies, and someone hands you a third of a cookie that has no chips. If you had to guess, you'd probably say they gave you a sugar cookie. That's perfectly fine if you assume that there are equal numbers of each cookie available. But if you then realized that most cookies at the party are actually chocolate chip, you might guess that the piece came from a chocolate chip cookie instead (one that just happened to have few chips in it). The point is: knowing how many of each cookie there are (in other words, the base rate) should persuade you to update your estimate of the probability.

Visualizing the issue

Just like your initial guess of what cookie you were given, the sample the company used to calculate accuracy doesn't consider how often the disease occurs in the wild. In fact, we humans are pretty bad at considering this intuitively². This little cognitive hiccup is very common, and I wrote in another post how it's possible to explain the 2008 economic recession using it (based on a book). In any case, it might be helpful to visualize the problem. The circles below depict the 77 positive cases and the 90 negative cases that the company identified to evaluate their antibody test (confirming them with a molecular test).

Now let's see how the antibody test categorized them below. True positives (TP) and true negatives (TN) are the correctly-assigned cases that we used to calculate the accuracy. You can also see where the test went wrong: 10 people with COVID who were wrongly classified as negative (false negatives, FN) and the single negative case who was mistakenly told they had COVID (false positive, FP).
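These four counts give us the two rates we'll need below. Continuing the sketch (the terms are the standard ones, even if I avoid the jargon in the text):

```python
# Same counts as above: 67 TP, 10 FN, 89 TN, 1 FP
sensitivity = 67 / 77         # ≈ 0.87: share of true cases the test catches
specificity = 89 / 90         # ≈ 0.99: share of true negatives it clears
false_positive_rate = 1 / 90  # ≈ 0.011: share of healthy people wrongly flagged
print(f"{sensitivity:.0%} sensitivity, {false_positive_rate:.0%} false positive rate")
```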

That's where we left off above. Visually, it looks like the test is doing well in keeping false positives at a minimum (only 1%), especially when compared to the number of true positives (87%)! But here is where base rates come in. Even though COVID has spread a lot across the US, the percentage of the population that has it remains relatively low. Here in Boston, around our worst time during the pandemic, we knew about 10,000 or so people who actively had the disease on a given day (a very rough estimate using data from 6/5/2020, when I first considered writing this post). For a city of almost 700,000 people, that means around 1.43% of our residents were truly COVID positive. Let's stick with that number, as I think it remains a fair estimate (and this is a toy example anyway).

It's clear that the roughly even sample the company used doesn't represent this low case percentage. Instead, let's say we randomly picked 1,000 people from the streets of Boston as our sample. At 1.43%, we can expect only 14 of those people to be infected. Now, let's run the antibody test again and group our new sample just like before.
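Here's that sample construction in code, using the rough Boston numbers above:

```python
# Rough Boston figures from the text: ~10,000 active cases in ~700,000 people
prevalence = 10_000 / 700_000               # ≈ 1.43% base rate
sample_size = 1_000                         # people picked off the street
infected = round(sample_size * prevalence)  # ≈ 14 truly infected
healthy = sample_size - infected            # 986 truly healthy
print(infected, healthy)                    # -> 14 986
```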

Notice how different the case populations look. Since the test picks up 87% of truly infected people, we are left with 12 true positives. On the other hand, because the number of sampled people without the disease is so large (986), the 1% false positive rate results in 11 people being wrongly told they’re hosting the virus. Now let’s isolate everyone who tested positive using the antibody test.
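In code, that's just the two rates from earlier applied to our street sample:

```python
# Apply the test's rates to the 14 infected and 986 healthy people
expected_true_positives = round(14 * (67 / 77))   # ≈ 12 infected people caught
expected_false_positives = round(986 * (1 / 90))  # ≈ 11 healthy people flagged
print(expected_true_positives, expected_false_positives)  # -> 12 11
```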

Now you can see how much of a coin toss a positive result from this accurate test can be: out of all our hypothetical people who were told they had a positive test, only 52% (12 out of 23) actually had the disease. Hopefully you can see where the CDC was coming from, and how DeWine's positive rapid (antigen) test was later overturned by molecular methods (PCR). The moral of the statistical story is that base rates matter.
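For completeness, here's that final number in code. The same "coin toss" falls straight out of Bayes' theorem if we skip the rounding:

```python
# Positive predictive value: of everyone flagged positive, who's truly infected?
ppv = 12 / (12 + 11)  # ≈ 0.52, using the rounded counts above

# The unrounded version is just Bayes' theorem:
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
sensitivity, fpr, prevalence = 67 / 77, 1 / 90, 10_000 / 700_000
p_positive = sensitivity * prevalence + fpr * (1 - prevalence)
ppv_bayes = (sensitivity * prevalence) / p_positive
print(f"{ppv:.0%} vs {ppv_bayes:.0%}")  # -> 52% vs 53% (rounding explains the gap)
```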

Closing

Of course this is a simplistic way of looking at things, and I've only looked at part of the issue. In reality, those who get tested are likely symptomatic (depending on your local testing policy), so the base rate changes. Still, this shortcoming of testing is observed across other diseases, which is why doctors get paid big bucks to properly diagnose us! What's more, the issue of improper sampling is very important, and has been (purposefully?) used to mislead people into believing that COVID's death rate is impossibly low. Similarly, some states have mixed antibody and molecular tests in their COVID reports, throwing the reliability of the case fatality rate out the window.

Either way, the point here is to exemplify how even things that appear simple have hidden layers, and scientists have to deal with much harder ones than the testing situation (epidemiologists naturally account for these issues pretty well!). Mistakes are bound to happen, but that’s not necessarily out of incompetence or malice. It’s all just part of the process, and being flexible in the face of uncertainty is just what science (the honest type, at least) is supposed to be like.


  1. This is not my area of expertise, but the statistical nature of this problem is very similar to what I face for a living.

  2. Handbook of Behavioral Economics: Foundations and Applications 1 (2018). Netherlands: Elsevier Science.
