By Fabio Rojas
Digital democracy is here. We no longer passively watch our leaders on television and register our opinions on Election Day. Modern politics happens when somebody comments on Twitter or links to a campaign through Facebook. In our hyper-networked world, anyone can say anything, and it can be read by millions.
This new world will undermine the polling industry. For nearly a century, conventional wisdom has argued that we can only truly know what the public thinks about an issue if we survey a random sample of adults. An entire industry is built on this view. Nearly every serious political campaign in the United States spends thousands, even millions, of dollars hiring campaign consultants who conduct these polls and interpret the results.
Digital democracy will put these campaign professionals out of work. New research in computer science, sociology and political science shows that data extracted from social media platforms yield accurate measurements of public opinion. It turns out that what people say on Twitter or Facebook is a very good indicator of how they will vote.
How good? In a paper to be presented Monday, co-authors Joseph DiGrazia, Karissa McKelvey, Johan Bollen and I show that Twitter discussions are an unusually good predictor of U.S. House elections. Using a massive archive of billions of randomly sampled tweets stored at Indiana University, we extracted 542,969 tweets that mention a Democratic or Republican candidate for Congress in 2010. For each congressional district, we computed each candidate's percentage of the tweets that mentioned the district's candidates. We found a strong correlation between a candidate's "tweet share" and the final two-party vote share, especially when we account for a district's economic, racial and gender profile. In the 2010 data, tweet share predicted the winner in 404 out of 406 competitive races.
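To make the "tweet share" idea concrete, here is a minimal sketch of the calculation described above. It is not the code behind the paper: the districts, candidate names, mention counts and vote shares are invented toy values, the function names are hypothetical, and the sketch leaves out the adjustment for each district's economic, racial and gender profile.

```python
from collections import Counter, defaultdict
from math import sqrt

# Invented toy data: one (district, candidate) pair per tweet that mentions a candidate.
MENTIONS = [
    ("D1", "Rep A"), ("D1", "Rep A"), ("D1", "Rep A"), ("D1", "Dem B"),
    ("D2", "Rep C"), ("D2", "Dem D"), ("D2", "Dem D"), ("D2", "Dem D"),
    ("D3", "Rep E"), ("D3", "Rep E"), ("D3", "Dem F"), ("D3", "Dem F"),
]

# Invented two-party vote shares for one candidate per district.
VOTE_SHARE = {"Rep A": 0.70, "Rep C": 0.30, "Rep E": 0.52}


def tweet_shares(mentions):
    """Return each candidate's fraction of the candidate mentions in their district."""
    per_district = defaultdict(Counter)
    for district, candidate in mentions:
        per_district[district][candidate] += 1
    shares = {}
    for counts in per_district.values():
        total = sum(counts.values())
        for candidate, n in counts.items():
            shares[candidate] = n / total
    return shares


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


shares = tweet_shares(MENTIONS)
names = sorted(VOTE_SHARE)
r = pearson([shares[n] for n in names], [VOTE_SHARE[n] for n in names])
print(shares)
print(f"correlation between tweet share and vote share: {r:.3f}")
```

With real data, the list of mentions would come from the tweet archive and the vote shares from official returns; the toy numbers here only show the shape of the computation.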
Why does this happen? We believe that Twitter and other social media reflect the underlying trend in a political race that goes beyond a district's fundamental geographic and demographic composition. If people feel compelled to talk about a candidate, even in negative ways, it is a signal that the candidate is on the verge of victory. The attention given to winners creates a situation in which all publicity is good publicity.
This finding is remarkable because it doesn't depend on exactly what people say or who says it. We measured only the total discussion and estimated each candidate's share. It is this relative level of discussion that matters for tracking public opinion in electoral contests. Furthermore, social media data mimic what polls measure. For example, in Ohio's 3rd Congressional District, we found that Republican Mike Turner had a tweet share of 65.4 percent. In the final election, he got 68.1 percent of the two-party vote. The tweet prediction was off by 2.7 percentage points, a figure comparable to the margin of error of a typical poll.
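The Ohio comparison is simple arithmetic, and it can be checked in a few lines. The two percentages below are the ones reported above; the function name and the 3-point margin of error are assumptions chosen for illustration, not figures from the paper.

```python
def prediction_error(tweet_share_pct, vote_share_pct):
    """Absolute gap between tweet share and vote share, in percentage points."""
    return round(abs(vote_share_pct - tweet_share_pct), 1)


# Figures reported for Mike Turner in Ohio's 3rd Congressional District.
error = prediction_error(65.4, 68.1)

# Assumed margin of error for a hypothetical poll, for comparison only.
ASSUMED_POLL_MARGIN = 3.0
within_margin = error <= ASSUMED_POLL_MARGIN
print(error, within_margin)
```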
This finding has profound implications for the democratic process. Many nations remain mired in poverty and do not have the infrastructure required for extensive polling. Furthermore, these nations often have governments that are suspicious of polling and try to suppress it. For these reasons, it is very hard to monitor elections. In contrast, as long as citizens have access to the Internet, they can talk about their views in a less-restricted manner. The "grassroots" buzz found in social media can be studied, and it will reveal how elections are conducted and whether the state is respecting human rights. And as with U.S. elections, even if the people who use social media are not completely representative of the public, the amount of attention paid to an issue is an indicator of what is happening in society. Important events generate scrutiny that can be measured and studied.
Social media analysis is also important for elections in the United States. Polling favors the established candidates because it is relatively expensive. In contrast, social media analysis is cheap. Anyone with programming skills can write a program that will harvest tweets, sort them for content and analyze the results. This can be done with nothing more than a laptop computer.
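As a rough illustration of the "sort and analyze" step, the sketch below counts how many tweets mention each candidate in a race. It assumes the tweets have already been collected as plain text, since harvesting depends on whatever access a platform offers. The candidate names, aliases and tweets are all invented, and simple substring matching is an assumption that a real project would likely need to refine.

```python
from collections import Counter

# Invented candidates and the lowercase strings treated as a mention of each.
CANDIDATES = {
    "Jane Roe": ["jane roe", "@janeroe"],
    "John Doe": ["john doe", "@johndoe"],
}

# Invented tweets standing in for a harvested sample.
TWEETS = [
    "Jane Roe spoke downtown today",
    "Not impressed by @JohnDoe",
    "Debate night: Jane Roe vs John Doe",
    "@janeroe has my vote",
    "Traffic is terrible",
]


def count_mentions(tweets, candidates):
    """Count tweets mentioning each candidate; one tweet may count for several."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for name, aliases in candidates.items():
            if any(alias in lowered for alias in aliases):
                counts[name] += 1
    return counts


counts = count_mentions(TWEETS, CANDIDATES)
total = sum(counts.values())
shares = {name: counts[name] / total for name in CANDIDATES}
print(counts)
print(shares)
```

Note that a tweet naming both candidates counts once for each, and a tweet naming neither is ignored. That matches the idea of measuring attention rather than sentiment.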
Current polling practices also pay disproportionate attention to "big" races. Every four years, we have dozens of polls on the presidential election, but many other races for important offices will not be consistently polled. Some congressional races are never polled. Social media analysis can be used to systematically gather data on any race at any time. Thus, people in smaller states no longer need to rely on polling organizations for information. A single citizen can harvest social media data and learn about the election in his or her area.
Traditional polling will remain useful, especially for learning about voters' beliefs and backgrounds, but polls are no longer the only tool for forecasting elections. In the future, you will not need a polling organization to understand how your elected representative will fare at the ballot box. Instead, all you will need is an app on your phone.