Divergent modes of online collective attention to the COVID-19 pandemic are associated with future caseload variance

D. R. Dewhurst, T. Alshaabi, M. V. Arnold, J. R. Minot, C. M. Danforth, and P. S. Dodds



Using a random 10% sample of tweets authored from 2019-09-01 through 2020-03-25, we analyze the dynamic behavior of words (1-grams) used on Twitter to describe the ongoing COVID-19 pandemic.

Across 24 languages, we find two distinct dynamic regimes: one characterizing the rise and subsequent collapse in collective attention to the initial coronavirus outbreak in late January, and a second that represents March COVID-19-related discourse.

Aggregating countries by dominant language use, we find that volatility in the first dynamic regime is associated with future volatility in new cases of COVID-19 roughly three weeks (average 22.7 ± 2.17 days) later.

Our results suggest that surveillance of change in usage of epidemiology-related words on social media may be useful in forecasting later change in disease case numbers, but we emphasize that our current findings are not causal or necessarily predictive.
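
To make the lagged association between attention volatility and later caseload volatility concrete, the following is a minimal sketch of one way such a lag could be estimated. It is not the procedure used in this work: the function names `rolling_volatility` and `best_lag`, the 7-day window, the log transform, and the use of Pearson correlation are all illustrative assumptions.

```python
import numpy as np


def rolling_volatility(series, window=7):
    """Rolling standard deviation of day-over-day changes in log(1 + series).

    Assumption: volatility is proxied by the windowed std of log-differences;
    the 7-day window is an arbitrary illustrative choice.
    """
    changes = np.diff(np.log1p(np.asarray(series, dtype=float)))
    if len(changes) < window:
        raise ValueError("series too short for the chosen window")
    windows = np.lib.stride_tricks.sliding_window_view(changes, window)
    return windows.std(axis=1)


def best_lag(attention_vol, case_vol, max_lag=40):
    """Find the lag (in days) at which attention volatility best matches
    later case volatility.

    Returns (lag, r), where r is the Pearson correlation between
    attention_vol[t] and case_vol[t + lag]. Only non-negative lags are
    scanned, i.e. attention is assumed to lead cases.
    """
    a = np.asarray(attention_vol, dtype=float)
    c = np.asarray(case_vol, dtype=float)
    n = min(len(a), len(c))
    a, c = a[:n], c[:n]
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        if n - lag < 3:  # too few overlapping points to correlate
            break
        r = np.corrcoef(a[: n - lag], c[lag:])[0, 1]
        if np.isfinite(r) and r > best[1]:
            best = (lag, r)
    return best
```

Given a daily word-frequency series and a daily new-case series for one language group, one would pass each through `rolling_volatility` and then call `best_lag` on the two results. A high correlation at some lag is only an association, consistent with the caveat above, and says nothing about causation.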