Who Said What?

U.S. Presidential Elections 2016


Motivation

I wanted to explore the Twitter data of the U.S. presidential candidates to find out which topics they talked about most on the platform.

Project Details

I used the script from this Kaggle webpage to download roughly 50,000 tweets in total from Donald Trump, Hillary Clinton, Jill Stein, Mike Pence, Gary Johnson and Tim Kaine, posted between January 2014 and December 2016. I then used Python to categorize the tweets through keyword analysis. The visualizations were built entirely in D3.js.

The trend in the total number of tweets per month for each candidate can be explored interactively below. Hover the cursor over the chart to see the number of tweets at the end of a month.

Figure 1: Twitter Presence by Month

We observe some interesting trends here:

  • Donald Trump of the Republican Party had the highest average number of tweets per month and tweeted 970 times in January 2015 alone, after which his Twitter presence declined, with some fluctuation.
  • Hillary Clinton of the Democratic Party tweeted consistently more from March 2015 onward.
  • Jill Stein of the Green Party had a sudden rise in the number of tweets in March 2016 and overtook Hillary Clinton in total tweets per month. She was consistently the most active candidate on Twitter from March 2016 until December 2016.
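The monthly counts behind Figure 1 could be produced with a small aggregation step before handing the data to D3.js. The following is only a minimal sketch, not the original script: the function name `tweets_per_month`, the `(candidate, created_at)` input shape, the timestamp format and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; the real export may use a different format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def tweets_per_month(tweets):
    """Count tweets per (candidate, 'YYYY-MM') pair.

    `tweets` is an iterable of (candidate, created_at) tuples, where
    created_at is a string in TIMESTAMP_FORMAT.
    """
    counts = Counter()
    for candidate, created_at in tweets:
        month = datetime.strptime(created_at, TIMESTAMP_FORMAT).strftime("%Y-%m")
        counts[(candidate, month)] += 1
    return counts


# Placeholder rows, not real data.
sample = [
    ("Candidate A", "2016-03-01 09:15:00"),
    ("Candidate A", "2016-03-17 18:40:00"),
    ("Candidate B", "2016-03-02 12:00:00"),
    ("Candidate A", "2016-04-05 08:30:00"),
]
monthly = tweets_per_month(sample)
```

A `Counter` keyed by candidate and month keeps the step independent of any plotting code; the result can be flattened to CSV or JSON for the line chart.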

The first step was to identify keywords by searching online news articles about the important issues discussed in the 2016 U.S. presidential election, and then to group those keywords into broad categories:

  • Economy and Trade
  • Foreign Policy
  • Domestic Policy
  • Energy and Environment
  • Campaign Controversies
  • People
  • Miscellaneous (including Social Issues, National Security, Education, Electoral Issues)
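The keyword-based categorization can be sketched as follows. This is an illustrative assumption rather than the code actually used: the keyword lists are tiny placeholders, the assignment of keywords to categories is a guess, and the names `CATEGORY_KEYWORDS`, `compile_patterns` and `categorize` are hypothetical.

```python
import re

# Hypothetical keyword lists; the real lists and groupings were
# compiled from news articles and are not reproduced here.
CATEGORY_KEYWORDS = {
    "Economy and Trade": ["jobs", "tax", "debt"],
    "Energy and Environment": ["climate change"],
    "Domestic Policy": ["immigration", "guns", "healthcare"],
}


def compile_patterns(category_keywords):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, keywords in category_keywords.items():
        alternation = "|".join(re.escape(keyword) for keyword in keywords)
        patterns[category] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(CATEGORY_KEYWORDS)


def categorize(text):
    """Return every category whose keywords appear in the tweet text."""
    return [category for category, pattern in PATTERNS.items() if pattern.search(text)]


labels = categorize("We need more jobs and real action on climate change!")
```

Matching on word boundaries avoids counting a keyword that only appears inside a longer word, and returning a list lets one tweet fall into several categories.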

Figure 2: Important Issues in U.S. Elections 2016

The analysis of tweets by category is summarized in the interactive visualizations below. The donut chart compares the candidates' presence in each category, and the histogram summarizes each candidate's tweets by category.



We see that Donald Trump had the highest total number of tweets, and had the highest category presence in foreign policy, campaign controversies and people. Hillary Clinton tweeted the most among all the candidates about domestic policy and social issues. Jill Stein had the highest category presence in economy and trade, energy and environment and education, while Tim Kaine tweeted the most about electoral reforms.

Furthermore, the most-discussed issues can be explored interactively in the Sankey diagram below. Individual nodes can be dragged for better visibility, and the tooltips on the links and nodes show the number of tweets.


Figure 5: Number of tweets by candidates on issues

It is interesting to note that the highest number of tweets referenced Barack Obama, with Donald Trump tweeting about him 357 times. Climate change was another main topic of discussion, followed by issues related to jobs, tax, voters, immigration, the military, guns and debt. The main issues tweeted about by the candidates are as follows:

  • Jill Stein - climate change, voters, debt, jobs, police and healthcare
  • Hillary Clinton - guns, tax, climate change and immigration
  • Donald Trump - Barack Obama, jobs, immigration, ISIS and Obamacare
  • Tim Kaine - military, school, veterans and gun-related issues
  • Mike Pence - jobs, unemployment, school and education
  • Gary Johnson - voter- and tax-related issues
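A Sankey diagram of candidates and issues needs its input as a list of nodes plus weighted links between them. The sketch below shows one way the candidate-to-issue counts might be shaped in Python before being loaded by the D3.js code; the nodes/links layout with index-based `source` and `target` is assumed here, and the function name `build_sankey` and the sample pairs are hypothetical.

```python
import json
from collections import Counter


def build_sankey(pairs):
    """Turn (candidate, issue) pairs into a nodes/links structure.

    Each distinct candidate or issue becomes a node; each distinct
    (candidate, issue) pair becomes a link weighted by its tweet count.
    """
    counts = Counter(pairs)
    names = []
    index = {}
    for candidate, issue in counts:
        for name in (candidate, issue):
            if name not in index:
                index[name] = len(names)
                names.append(name)
    nodes = [{"name": name} for name in names]
    links = [
        {"source": index[candidate], "target": index[issue], "value": value}
        for (candidate, issue), value in counts.items()
    ]
    return {"nodes": nodes, "links": links}


# Placeholder pairs, not real data: one entry per categorized tweet.
sample_pairs = [
    ("Candidate A", "jobs"),
    ("Candidate A", "jobs"),
    ("Candidate B", "jobs"),
    ("Candidate A", "tax"),
]
graph = build_sankey(sample_pairs)
graph_json = json.dumps(graph, indent=2)
```

The link `value` is the number shown in a tooltip, so the same structure can drive both the link widths and the hover text.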

Possible developments

  • Create an integrated interactive framework allowing the user to navigate the data coherently
  • Do analysis of retweets and temporal variations of the topics discussed

Code