I set up a Python script using Twitter Python that grabs all tweets matching the following search keywords:
Although the list is not exhaustive, it resulted in some rather interesting insights.
This specific blog post is about the approximately 7,800 tweets collected on 11 April 2014.
Fun with HashTags
The first bit of analysis was to take all the tweets and extract the most used hashtags related to the upcoming elections in South Africa.
I created a histogram of the number of occurrences of a hashtag in the figure below.
The interesting part about the above histogram (most frequent hashtag) is that it also gives us a glimpse of what was the main talking point on April 11th, 2014 as far as South African elections are concerned.
As can be expected, the South African President, Jacob Zuma, featured heavily on the day: #zuma was the second most used hashtag.
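The hashtag count behind the histogram can be sketched roughly as follows. This is a minimal illustration, not the script I actually ran: it assumes each tweet is a dict parsed from Twitter's JSON, with hashtags listed under `entities` and `hashtags`, and the sample tweets are made up.

```python
from collections import Counter


def count_hashtags(tweets):
    """Count hashtag occurrences across tweet dicts, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        # Assumption: hashtags sit under entities -> hashtags -> text
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts["#" + tag["text"].lower()] += 1
    return counts


# Made-up sample tweets, for illustration only
sample_tweets = [
    {"entities": {"hashtags": [{"text": "Zuma"}, {"text": "Elections2014"}]}},
    {"entities": {"hashtags": [{"text": "zuma"}]}},
    {"entities": {"hashtags": []}},
]

top_hashtags = count_hashtags(sample_tweets).most_common(10)

# Crude text histogram: one star per occurrence
for tag, n in top_hashtags:
    print(f"{tag:<20} {'*' * n} ({n})")
```

Lowercasing the tags is a design choice so that #Zuma and #zuma are counted together.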
What would be more interesting, and somewhat more useful, would be to do some sentiment analysis on these tweets.
Given that South Africa has eleven official languages, building a good sentiment model would be complex and time-consuming.
Having said that, if anyone is interested in doing sentiment analysis on these tweets, the data is available and can be accessed from the link I will provide later in this post.
Klout and Retweets
I am aware that this might be circular, but I wanted to measure the correlation between the user's Klout score and the number of retweets they received in a given day.
For this I used the Klout API via their Python package to retrieve the Klout scores of each retweeted user. I present the scatter plot of the number of retweets versus the user's Klout score below.
I annotated the plot with the labels of the top five retweeted Twitter handles.
The correlation for the top 30 most retweeted users was 0.51, so the relationship is positive but not strong.
This was still interesting, as we can see from the plot that @EconFreedomZA were punching above their weight on this day with more retweets than the @DA_News Twitter account, which has the highest Klout score.
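For anyone who wants to repeat the correlation on their own numbers, here is a small sketch. It assumes the Pearson correlation coefficient, and the Klout scores and retweet counts in it are hypothetical values, not the data from this post.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the spread of each sample
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical Klout scores and retweet counts, for illustration only
klout_scores = [45.0, 52.5, 61.0, 70.2, 80.1]
retweet_counts = [12, 30, 18, 25, 40]

print(round(pearson(klout_scores, retweet_counts), 2))
```

A value near 1 means the two quantities rise together, while a value near 0 means there is little linear relationship between them.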
Something interesting can also be observed when we look at the number of mentions versus Klout score.
Below I plot only the top 20 mentioned users and their Klout scores. I did not annotate the plot, but to help in reading it, the user with the second highest Klout score is @hellenzille (leader of the Democratic Alliance).
The correlation here is a strong 0.71.
Now it might be interesting if we can reverse engineer the complex Klout calculation via linear regression.
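As a starting point for that idea, here is a rough sketch of a least-squares line fit with a single predictor (mentions) against the Klout score. It is only an illustration under that assumption: a real attempt would need many more features, and the numbers below are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values must not all be identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mention counts and Klout scores, for illustration only
mention_counts = [5, 12, 20, 33, 41]
klout_scores = [48.0, 55.0, 60.5, 72.0, 79.5]

slope, intercept = fit_line(mention_counts, klout_scores)
print(f"klout ~ {slope:.2f} * mentions + {intercept:.2f}")
```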
The Mention Network
The first image in this post is a snapshot of the social network constructed by tracking the mentions of specific accounts in the Twitter dataset.
The relative size of each node (Twitter account) reflects the number of mentions that account received in the network.
As can somewhat be expected with this metric, @DA_News is the largest. You can view the full network with accounts that have at least 6 mentions in the network here: 11th April SA Election Mention Network
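The construction of the mention network can be sketched like this. It is a simplified illustration rather than the exact code behind the image: it assumes tweets are dicts parsed from Twitter's JSON with mentions under `entities` and `user_mentions`, and the rule of keeping only edges that point at a retained account is my assumption for the sketch. The sample tweets are made up.

```python
from collections import Counter


def build_mention_network(tweets, min_mentions=6):
    """Return (node sizes, weighted edges) for a directed mention network."""
    edges = Counter()     # (author, mentioned account) -> number of mentions
    mentions = Counter()  # mentioned account -> total mentions received
    for tweet in tweets:
        author = tweet["user"]["screen_name"].lower()
        # Assumption: mentions sit under entities -> user_mentions -> screen_name
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            target = mention["screen_name"].lower()
            edges[(author, target)] += 1
            mentions[target] += 1
    # Node size is the number of mentions received; drop rarely mentioned accounts
    nodes = {name: n for name, n in mentions.items() if n >= min_mentions}
    # Assumption: keep only the edges that point at a retained account
    kept_edges = {pair: w for pair, w in edges.items() if pair[1] in nodes}
    return nodes, kept_edges


# Made-up sample tweets, for illustration only
sample_tweets = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "DA_News"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "DA_News"},
                                    {"screen_name": "alice"}]}},
]

nodes, kept_edges = build_mention_network(sample_tweets, min_mentions=2)
print(nodes)
print(kept_edges)
```

The resulting node sizes and weighted edges can then be handed to whatever graph tool you prefer for drawing.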
I have made the Twitter JSON dumps available at my GitHub. Grab the continuously updated data here: github:za-2014-election-tweets
Yes, I will upload them when they get interesting.
Cover Image Credit: Niko Knigge