Socio-technical instruments for public health

How do we quantify the well-being of a population in real time? Traditionally, we’ve done so with a mixture of key economic indicators, from GDP to consumer confidence indices.

But are people happy? Do they believe their lives—their personal stories—are going well? Do they have hope for the future?

More than 2 billion people use social media such as Twitter, WeChat, Facebook and Instagram to express themselves. Can this information be used as an input to build a showcase of daily collective happiness?

The Computational Story Lab (co-directed by Peter and Chris), along with several other groups, is working to develop such a “hedonometer”: a measure of real-time happiness with the power of a major economic index, using social media data as an input.

What do our Tweets reveal about us?

Our analyses of a representative subset of English-language Tweets reveal that, over the course of a day, people use the happiest words on Twitter in the morning, with profanity increasing in frequency as the day wears on: what we call the Daily Unraveling of the Human Mind. We see a robust weekly cycle, with Saturdays tending to be the happiest day of the week, when “love”, “family”, and “fun” are used more often. Mondays and Tuesdays are low points, with less “hahaha”, “awesome”, and “playing”.
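
To make the idea of a word-based happiness measure concrete, here is a minimal sketch in Python. It assumes a tiny, invented lexicon of per-word happiness scores on a 1 (sad) to 9 (happy) scale; the scores, the `average_happiness` function, and the sample messages are all hypothetical illustrations, not the lexicon or code behind the hedonometer.

```python
import re
from collections import Counter

# Hypothetical per-word happiness scores on a 1 (sad) to 9 (happy) scale.
# These values are invented for illustration, not taken from a real lexicon.
WORD_SCORES = {
    "love": 8.4, "family": 7.7, "fun": 7.9, "hope": 7.4, "joy": 8.2,
    "no": 3.5, "not": 3.9, "hate": 2.3,
}

def average_happiness(messages, scores=WORD_SCORES):
    """Return the frequency-weighted mean score of all scored words."""
    counts = Counter()
    for text in messages:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in scores:
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return None  # no scored words appeared in these messages
    return sum(scores[w] * n for w, n in counts.items()) / total

# Made-up messages standing in for two different days.
saturday = ["Fun day with family, love it", "so much joy"]
monday = ["I hate this", "no, not again"]

print("Saturday:", average_happiness(saturday))
print("Monday:", average_happiness(monday))
```

Running the same function on each day's messages gives one number per day, and that sequence of numbers is what makes daily and weekly cycles visible.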

Society’s collective attention to major events leads to large spikes and crashes in happiness. U.S. data show that holidays tend to be our happiest days, with Christmas and New Year’s Eve producing more “hope”, “blessed”, and “joy” and less “no”, “not”, and “hate”. Similar findings are likely to appear in Mumbai on Diwali or in Beijing on Chinese New Year. We appear to need organized, structured celebrations for people to coalesce around positivity.

Likewise, at a subpopulation level, large fan bases unsurprisingly explode in happiness when their team wins a major sports championship (e.g. Super Bowl in the U.S.), while the losing team’s fans’ tweets are sad and angry.

The saddest days are responses to system-level shocks. Natural disasters like the earthquake and tsunami in Indonesia, celebrity deaths like that of Michael Jackson, and terrorist attacks like those in Paris all led to extremely negative days. Geo-tagged social media data allow this analysis to be narrowed down to specific geographic zones such as a city or state.

Daily time series of happiness in English tweets, based on a random 10% of messages posted worldwide. Interactive version available at https://hedonometer.org

Can social media provide indicators of population health?

Social media-based hedonometers also correlate strongly with some traditional measures of health and well-being.

For example, in a recent study, specific words and phrases in tweets showed strong correlations with obesity and diabetes rates in the US: states where people tweeted about running, skiing and hiking had lower levels of these diseases. In another study of geo-tagged public tweets from about 1,300 U.S. counties, tweets conveying negative emotions (e.g. anxiety, hostility) correlated with heart disease death rates for these same counties (as provided by the Centers for Disease Control and Prevention). Another analysis of tweets posted by 1,400 users with self-reported ADHD revealed that the group posted more messages related to lack of focus, self-regulation, and expressions of mental, physical and emotional exhaustion than a control group without the condition.
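
As a rough illustration of what “correlated” means in studies like these, the sketch below computes a Pearson correlation coefficient between two county-level series. The five counties and all of their values are made up for the example; they are not data from any of the studies mentioned, and the `pearson` helper is our own illustrative function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for five hypothetical counties (illustration only).
negative_tweet_share = [0.10, 0.12, 0.15, 0.18, 0.20]  # fraction of tweets
heart_death_rate = [150, 160, 170, 185, 190]           # deaths per 100,000

r = pearson(negative_tweet_share, heart_death_rate)
print(f"correlation: {r:.3f}")
```

A value near +1 means the two series rise together across counties. It does not, on its own, say anything about cause.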

Analyses of Instagram photos from 166 individuals successfully identified markers of depression: depressed subjects tended to post bluer and darker photos (at a pixel-by-pixel level) than non-depressed subjects.

And Microsoft researchers recently reported that search engine queries may identify some cases of cancer even before people had received such a diagnosis. One of us (Murali) is currently working with the Microsoft team to examine whether web search phenotypes (the pattern of search queries) can provide clues about early symptoms of brain diseases like Parkinson’s and Alzheimer’s disease.

With further refinement it is conceivable that these “digital markers” could serve as public-health dashboards, offering near real-time feedback to policy-makers on the impacts of their decisions.

What are the pitfalls?

Clearly we must bear in mind lessons from the rise and fall of Google Flu Trends (GFT), an algorithm to predict flu rates based on people’s web searches, which after initial success missed the peak of the 2013 season by a large margin (140%). A subsequent post-mortem revealed that GFT’s algorithms were prone to “overfitting” (giving too much weight to search terms that only correlated with flu by chance) and did not take into account changes in search behavior over time.
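
The “by chance” part of overfitting is easy to reproduce. The sketch below is a toy simulation under stated assumptions, not a reconstruction of GFT: it generates a short random “flu” series and many random “search term” series that are unrelated to it by construction, then reports the strongest correlation found. The function name, sizes, and seed are all hypothetical choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def strongest_chance_correlation(n_weeks=10, n_terms=1000, seed=0):
    """Largest |correlation| between a random target and unrelated series."""
    rng = random.Random(seed)
    # Random stand-in for weekly flu rates.
    flu = [rng.gauss(0, 1) for _ in range(n_weeks)]
    strongest = 0.0
    for _ in range(n_terms):
        # Each "search term" is pure noise, independent of the flu series.
        term = [rng.gauss(0, 1) for _ in range(n_weeks)]
        strongest = max(strongest, abs(pearson(term, flu)))
    return strongest

print(strongest_chance_correlation())
```

With only a handful of weeks and a thousand candidate terms, at least one pure-noise series will look like a strong predictor. A model that selects terms this way will tend to do well on past data and poorly on new data.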

Sentiment and mood analyses using social media are still in their early days. The lack of random samples, the inability to do deep sentiment analyses (i.e. go beyond what people think about specific topics), and the difficulty of deciphering the intricacies of hundreds of native dialects are some of the challenges researchers are working to overcome.

Hence, it is vital that such instruments be tunable, improvable, and transparent, and that they not rely on black-box algorithms. A public-private partnership may be ideal for validating, monitoring and refining such instruments.

What’s next?

The development of multilingual social media-based hedonometers that also cover languages such as Mandarin, Arabic, Hindi and Spanish could allow for a global sensing mechanism to detect worldwide emotional changes in response to events such as Brexit, World Cup football matches, nuclear tests, and financial market moves.

A large-scale database of word-based measures for emotions other than happiness and sadness, such as fear, anger, and surprise, would allow these emotions to be incorporated into future versions of the hedonometer. Text from other online sources, such as Google or Baidu (what people are searching for) and BBC, Xinhua or Times of India (what people are reading about), can serve as different lenses through which to explore societal trends.

Intelligence agencies around the world have already begun to put such tools into use, but their data and algorithms are not available for public scrutiny or use. We believe it is time to establish an ethical and scientific framework for such instruments for public use, and we are seeking funding to create such a framework.

As social media continues to reveal finer-scale insights into our behavior, we aim to make sense of the habits, emotions, and stories we share with each other. Just as the invention of accurate clocks, thermometers and telescopes did for so many scientific fields, society should build instruments that unlock the power of science to describe and explain social phenomena, thereby enabling technological developments for the betterment of people everywhere.

*This blog post was co-authored by Chris Danforth and Peter Dodds from the University of Vermont, and Murali Doraiswamy of Duke University. It originally appeared on the World Economic Forum on November 16, 2017.