Journal Club

February 16, 2022

Data Quality and Data Quantity: Analyzing Discrepancies in Large Surveys of COVID-19 Vaccine Uptake

By Brendan J. Kelly, MD, MS

A recently published report from Bradley and colleagues in Nature presents an analysis of discrepancies between two large surveys of early COVID-19 vaccine uptake and subsequent, gold-standard benchmark data published by the Centers for Disease Control and Prevention.

The authors found that the two large surveys of U.S. adults (Delphi-Facebook and Census Household Pulse) significantly overestimated early COVID-19 vaccine uptake, by as much as 17 percentage points. The authors compared the poor performance of the two large surveys to a smaller survey (Axios-Ipsos Coronavirus Tracker), which performed better. They highlight “the big data paradox”: conventional formulas for statistical uncertainty mislead when applied to surveys with systematic sampling bias, because as sample size increases, the sampling error shrinks while the bias does not, so the bias comes to dominate the estimator error.
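The paradox can be illustrated with a few lines of arithmetic. The sketch below is not the authors' analysis. It assumes a hypothetical survey whose estimate is shifted by a fixed 2 percentage points from a true rate of 50% (both numbers are made up for illustration) and shows how the conventional 95% margin of error shrinks with sample size while the actual error barely moves.

```python
import math

def error_breakdown(true_rate, bias, n):
    """Split the error of a biased survey proportion into its parts.

    Returns (margin_of_error, rmse, bias_share), where bias_share is the
    fraction of the mean squared error that comes from the bias alone.
    """
    expected_estimate = true_rate + bias
    # Conventional sampling variance of a proportion; it ignores the bias.
    variance = expected_estimate * (1 - expected_estimate) / n
    mse = bias ** 2 + variance
    margin_of_error = 1.96 * math.sqrt(variance)
    return margin_of_error, math.sqrt(mse), bias ** 2 / mse

if __name__ == "__main__":
    # true_rate and bias are illustrative assumptions, not figures from the paper.
    for n in (1_000, 10_000, 250_000):
        moe, rmse, share = error_breakdown(true_rate=0.50, bias=0.02, n=n)
        print(f"n={n:>7,}  95% margin={moe:.4f}  RMSE={rmse:.4f}  bias share={share:.1%}")
```

At the largest sample size, the reported margin of error is a small fraction of a percentage point, yet the root mean squared error stays just above the 2-point bias. The interval becomes very narrow around the wrong value.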

In their report, the authors propose a novel framework for quantifying survey data quality, which breaks total survey error into three components. To the familiar “data scarcity” (the sample-size term on which conventional sampling-error formulas rely), it adds the “data quality defect” (the correlation between the event that an individual’s response is recorded and the response itself) and the “inherent problem difficulty” (population heterogeneity). They found that the errors in the two large surveys were dominated by increasing data quality defects. Though the raw weekly sample size of the largest survey was 250,000, the authors found that its bias-adjusted effective sample size was less than 10. Comparing the survey methods, the authors found no single factor that drove bias: panel recruitment, sampling, and weighting of the survey data all contributed.
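A rough back-of-the-envelope version of this calculation is sketched below. It relies on the identity behind the framework, in which the estimation error equals the data quality defect correlation times the data scarcity term sqrt((N - n) / n) times the population standard deviation. The benchmark uptake of 50% and the adult population of 255 million are illustrative assumptions, the function names are hypothetical, and the effective sample size uses a simplification (a single observed error stands in for the expected squared error, and the finite population correction is ignored), so this is not the authors' exact procedure.

```python
import math

def data_defect_correlation(error, n, population, sigma):
    """Solve  error = ddc * sqrt((N - n) / n) * sigma  for ddc."""
    data_scarcity = math.sqrt((population - n) / n)
    return error / (data_scarcity * sigma)

def effective_sample_size(error, sigma):
    """Size of a simple random sample whose expected squared error is error**2.

    Ignores the finite population correction, which is negligible when the
    result is tiny relative to the population.
    """
    return sigma ** 2 / error ** 2

if __name__ == "__main__":
    true_rate = 0.50          # assumed benchmark uptake (illustrative)
    error = 0.17              # overestimate quoted in the summary, as a proportion
    n = 250_000               # weekly sample size quoted in the summary
    population = 255_000_000  # assumed U.S. adult population (illustrative)
    sigma = math.sqrt(true_rate * (1 - true_rate))  # SD of a yes/no outcome

    ddc = data_defect_correlation(error, n, population, sigma)
    n_eff = effective_sample_size(error, sigma)
    print(f"data defect correlation ~ {ddc:.4f}")
    print(f"effective sample size   ~ {n_eff:.1f}")
```

Under these assumptions, a correlation of roughly 0.01 between responding and being vaccinated is enough to reduce a quarter-million responses to the information content of fewer than 10 randomly sampled ones, which is in line with the figure reported above.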

They conclude with a statement of caution for the big data era, urging clinicians, scientists, and policymakers to consider that large sample sizes can actually exacerbate the effects of small biases in data collection and lead to incorrect inferences.

(Bradley et al. Nature. 2021;600:695-700.)

