A COVID-19 blood test is administered outside of Delmont Medical Care in Franklin Square, New York, on April 22, 2020. The test identifies antibodies to the coronavirus.
Frustrated statisticians and epidemiologists took to social media this week to point out significant shortcomings in two widely publicized studies that attempted to estimate the actual prevalence of COVID-19 in two California counties, Santa Clara and Los Angeles.
The studies suggested that far more people in each county had been infected with the new coronavirus than thought: they estimated that the actual number of cases in the two counties was up to 85 times and 55 times the number of currently confirmed cases, respectively. This, in turn, suggests that COVID-19 is far less deadly than thought. With a much larger number of estimated infections set against an unchanged number of deaths, the death rate of COVID-19 lands in the same range as that of seasonal flu.
How dangerous is that?
We go into the details of the studies below, but it is important to note that neither has been published in a scientific journal, and neither has gone through standard scientific peer review. Instead, they were posted online in draft form (a common occurrence in the midst of a rapidly evolving pandemic that pushes researchers to get at data quickly, even if it is uncertain).
The results seem to support minority arguments that COVID-19 may be no worse than seasonal flu (itself one of the main causes of death in the United States) and that the restrictive mitigation efforts currently strangling the economy may be unnecessary. In fact, three researchers who co-authored the new studies have publicly made exactly these arguments.
As early as mid-March, Stanford public health researcher John Ioannidis argued in a controversial opinion piece in the biomedical news outlet STAT that the mortality rate of COVID-19 may be much lower than expected, which could make the current lockdowns "completely irrational". Health researchers Eran Bendavid and Jay Bhattacharya, both from Stanford, made a similar argument in the Wall Street Journal at the end of March. They called the current COVID-19 death rate estimates, in the range of 2 to 4 percent, "deeply flawed".
Ioannidis is a co-author of the study conducted in Santa Clara County, and Bendavid and Bhattacharya were leading researchers in both studies released online this month.
The new studies appear to support the researchers' earlier arguments. But a chorus of their peers is anything but convinced. In fact, criticism of the two studies has woven a damning tapestry of Twitter threads and blog posts that point to flaws in the studies, from basic math errors to alleged statistical sloppiness and sampling bias.
In a blog post on the Santa Clara County study, Columbia University statistician Andrew Gelman detailed several troubling aspects of the statistical analysis. He concluded:
I think the authors of the linked paper owe us an apology. We wasted time and effort discussing this paper, the main selling point of which was some numbers that were essentially the product of a statistical error.
I'm serious about the apology. Everybody makes mistakes. I don't think the authors have to apologize just because they screwed it up. I think they have to apologize because these were avoidable mistakes.
A Twitter account from the laboratory of Erik van Nimwegen, a computational systems biologist at the University of Basel, responded to the study by tweeting the joke "Loud sobs under the tombstone of Reverend Bayes". The tweet refers to Thomas Bayes, an 18th-century English reverend and statistician who formulated a fundamental theorem of probability.
Pleuni Pennings, an evolutionary biologist at San Francisco State University, wrote in a blog post about the Santa Clara study: "In science, we like to say that extraordinary claims require extraordinary evidence. Here the claim is extraordinary, but the evidence is not. We also learn that even if a study comes from a great university, that is no guarantee that the study is good."
Harvard epidemiologist Marc Lipsitch said on Twitter that he agreed with similar statistical criticisms made online. He also offered "praise" to the authors for conducting the study and for providing an interpretation of it, one that he noted supports their "it's over the top" view.
So what has all these researchers up in arms?
The goal of the studies
The primary aim of both studies was to estimate how many people in each of the two counties have been infected with SARS-CoV-2 at some point. This is an extremely important endeavor because it can show us the true extent of the infection, guide efforts to stop transmission, and help us better assess the full range of COVID-19 severity and the death rate.
Because diagnostic testing in the United States has been so limited and many COVID-19 cases have mild or no symptoms, researchers expect the actual number of people infected to be much higher than the number of confirmed cases. There is no debate about that. But how much higher is the subject of significant debate.
The researchers conducted their studies by recruiting small groups of residents and testing their blood for antibodies to SARS-CoV-2. Antibodies are Y-shaped proteins that the immune system produces to attack specific molecular enemies such as viruses. If a person has antibodies that recognize SARS-CoV-2 or its components, this indicates that the person was previously infected.
In the Santa Clara County study, researchers recruited volunteers on Facebook and had them come to one of three drive-through testing sites. In the end, they tested the blood of 3,330 adults and children for antibodies. They found that 50 blood samples, or 1.5 percent, were positive for SARS-CoV-2 antibodies.
They then adjusted their numbers to estimate what share of positive tests they would have seen if their volunteer pool had matched the county's demographics better. The volunteer pool was skewed toward certain zip codes in the county and, relative to the county's actual makeup, over-represented women and white people. The researchers' adjustment nearly doubled the prevalence of positives, from 1.5 percent to an estimated 2.8 percent.
The data was then adjusted again to account for inaccuracies in the antibody test. There are two metrics for accuracy here: sensitivity and specificity. Sensitivity refers to how well the test correctly identifies all true positives. Specificity refers to how well the test correctly identifies all true negatives, in other words, how well it avoids false positives.
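To make the demographic adjustment concrete, here is a minimal sketch of one common reweighting approach (post-stratification). The groups, counts, and population shares below are hypothetical and are not the study's data, and the study's actual weighting scheme may well differ.

```python
def poststratified_rate(sample, population_share):
    """Reweight per-group positive rates by each group's share of the population.

    sample: {group: (n_tested, n_positive)}
    population_share: {group: share of the real population in that group}
    """
    total_share = sum(population_share.values())
    rate = 0.0
    for group, (n_tested, n_positive) in sample.items():
        group_rate = n_positive / n_tested
        # Each group counts in proportion to the population, not the sample.
        rate += group_rate * population_share[group] / total_share
    return rate


# Hypothetical example: group A is over-represented among volunteers
# (800 of 1,000 tested) but makes up only half of the real population.
sample = {"A": (800, 8), "B": (200, 6)}
population_share = {"A": 0.5, "B": 0.5}

raw_rate = sum(pos for _, pos in sample.values()) / sum(n for n, _ in sample.values())
adjusted_rate = poststratified_rate(sample, population_share)
print(f"raw: {raw_rate:.1%}, reweighted: {adjusted_rate:.1%}")
```

In this made-up case the raw rate of 1.4 percent rises to 2.0 percent after reweighting, because the under-sampled group has the higher positive rate. The same mechanism can move an estimate in either direction.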
According to the authors of the Santa Clara study, the sensitivity and specificity data of their antibody test led to an estimate that the actual prevalence of SARS-CoV-2 infections was between 2.49 percent and 4.16 percent.
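For readers who want to see how sensitivity and specificity feed into such an adjustment, here is a minimal sketch of a standard textbook correction (often called the Rogan-Gladen estimator). The article does not say which formula the authors used, and the sensitivity and specificity values below are illustrative assumptions, not the study's figures.

```python
def correct_for_test_accuracy(observed_rate, sensitivity, specificity):
    """Estimate true prevalence from the observed positive rate.

    Solves observed = true * sensitivity + (1 - true) * (1 - specificity)
    for the true prevalence, then clamps the result to [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    corrected = (observed_rate + specificity - 1.0) / denom
    return min(max(corrected, 0.0), 1.0)


observed = 0.028  # the 2.8 percent reweighted rate reported in the article

# Illustrative (assumed) accuracy values, NOT the study's own figures.
for sens, spec in [(0.80, 0.995), (0.80, 0.985)]:
    estimate = correct_for_test_accuracy(observed, sens, spec)
    print(f"sensitivity={sens}, specificity={spec}: {estimate:.2%}")
```

When the observed positive rate is only a few percent, a one-point change in assumed specificity shifts the corrected estimate substantially, which is why the false positive rate matters so much in low-prevalence surveys.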
Based on the population of the county, this would indicate that between 48,000 and 81,000 people in the county were infected. The number of confirmed cases at the time of publication was only 956. This means that their infection estimate is 50 to 85 times higher than that of the confirmed cases.
The researchers then estimated an infection fatality rate (IFR) using this large number of estimated infections and an estimate of only 100 cumulative deaths from infections up to this point (deaths can lag behind the initial infections by weeks). They calculated an IFR of 0.12 to 0.2 percent. This falls in the range of seasonal flu, which has an estimated death rate of around 0.1 percent.
Less data is available from the Los Angeles study. In an unusual step, even by today's pandemic standards, the results were first published in a press release from the county health department, which contained only a few statistical and methodological details. A brief draft of the study (available as a PDF) was also circulated online, but it contains even less methodological information than the Santa Clara study. The draft also has higher prevalence estimates than the press release. It is unclear why the estimates differ, but we mainly focus on the conclusions that were officially released by the health department.
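The arithmetic behind these Santa Clara figures is simple division, and it can be checked with a short sketch that uses only the numbers quoted in the article. The variable names are ours.

```python
# Figures quoted in the article for Santa Clara County.
confirmed_cases = 956
estimated_deaths = 100
infections_low, infections_high = 48_000, 81_000

# How many times larger the estimated infections are than confirmed cases.
multiplier_low = infections_low / confirmed_cases
multiplier_high = infections_high / confirmed_cases

# IFR = deaths / infections: more infections with the same deaths means a lower IFR.
ifr_low = estimated_deaths / infections_high
ifr_high = estimated_deaths / infections_low

print(f"multiplier: {multiplier_low:.1f}x to {multiplier_high:.1f}x")
print(f"IFR: {ifr_low:.2%} to {ifr_high:.2%}")
```

The results round to the article's "50 to 85 times" and "0.12 to 0.2 percent". Note that the whole IFR range depends on the infection estimate in the denominator, so any error there carries straight through to the death rate.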
In general, the researchers used data from a market research company to randomly select residents and invite them to be tested at one of six test locations for the study. The researchers set quotas for participants by age, gender, race, and ethnicity to match the population characteristics of the county. Their goal was to recruit 1,000 participants.
They tested 863 adults using the same antibody test used in the Santa Clara study, a test from Premier Biotech of Minneapolis, Minnesota. Of the tests reported, 35 (or 4.1 percent) were positive. According to the press release, the adjusted data indicate that 2.8 to 5.6 percent of the county's adult population had been infected with the new coronavirus.
Given the county's population, this indicates that 221,000 to 442,000 adults in the county had been infected. This estimate is 28 to 55 times higher than the 7,994 cases confirmed at the time. As in the Santa Clara study, this puts the IFR in the range of 0.13 to 0.3 percent, closer to the IFR of seasonal flu.
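The Los Angeles figures quoted above can be sanity-checked the same way. This sketch uses only numbers from the article; the helper name is ours, and the IFR is not recomputed because the article does not give the county's death count.

```python
def times_higher(estimated_infections, confirmed_cases):
    """Ratio of estimated infections to confirmed cases."""
    return estimated_infections / confirmed_cases


# Figures quoted in the article for Los Angeles County.
tested, positive = 863, 35
confirmed_cases = 7_994
infections_low, infections_high = 221_000, 442_000

raw_positive_rate = positive / tested
ratio_low = times_higher(infections_low, confirmed_cases)
ratio_high = times_higher(infections_high, confirmed_cases)

print(f"raw positive rate: {raw_positive_rate:.1%}")
print(f"estimated infections: {ratio_low:.1f}x to {ratio_high:.1f}x confirmed cases")
```

The raw rate rounds to the reported 4.1 percent, and the ratios round to the reported 28 and 55.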