Evidence that ‘Whatdoestheinternetthink? (.Net)’ cannot track changes in public opinion.
‘Whatdoestheinternetthink? (.Net)’ is a website that tries to do something quite special. It tries to tell you how the internet ‘feels’. Specifically, how it feels about particular words and phrases. It will tell you whether the internet is “positive”, “negative” or “indifferent” about a piece of text. How can it do this? According to the “How It Works” section of the website, this is achieved by using the Google and Bing search engines and some kind of algorithm. The website collects together sentences from all over the internet (containing the word or phrase of choice). Then it sorts and counts them according to whether they are “positive”, “negative”, or “indifferent” to that word. So for example, if you type the word “science”, and press enter, you see this:
It isn’t clear exactly how the process of sorting sentences into categories works. The site doesn’t say, for example, whether the algorithm just counts positive or negative words, or does something more sophisticated. If the former, then we have good reasons to expect that it will make some mistakes, as this sentence illustrates:
“We could do a lot worse than Jeremy Corbyn as prime minister, he isnt nearly as bad as some people say”
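To make the worry concrete, here is a minimal sketch of the simplest approach mentioned above, one that just counts positive and negative words. This is not the site's actual algorithm (which isn't published). The word lists and the function name `naive_sentiment` are my own inventions for illustration.

```python
import re

# Tiny hypothetical word lists, invented for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "best", "excellent"}
NEGATIVE_WORDS = {"bad", "worse", "worst", "hate", "awful"}


def naive_sentiment(sentence: str) -> str:
    """Label a sentence purely by counting positive and negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "indifferent"


sentence = (
    "We could do a lot worse than Jeremy Corbyn as prime minister, "
    "he isnt nearly as bad as some people say"
)
# "worse" and "bad" are counted; the negation around them is ignored.
print(naive_sentiment(sentence))  # negative
```

A counter like this files a broadly supportive sentence under “negative”, because it has no notion of negation or comparison.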
Without knowing how it works, it isn’t clear how seriously we should take this website. The “How It Works” and “Disclaimer” sections of the site say that its results shouldn’t be relied upon. However, some experts are concerned that results from the website will be used to make real decisions. One cyber security expert has called it another example of “lies, damn lies and statistics”, expressing concerns that the website will be used by unscrupulous marketing firms. For example, some firms might try to recruit clients by pointing out that ‘Whatdoestheinternetthink? (.Net)’ gives their product an abysmal popularity rating.
On the first page of Google search results for “Whatdoestheinternetthink? (.Net)” I found a company called “SourceCon” which appears to have used ‘Whatdoestheinternetthink? (.Net)’ to promote a professional conference. According to SourceCon, ‘Whatdoestheinternetthink? (.Net)’ shows that ‘recruiters’ are even less popular than ‘dentists’ (and their conference can help)!
Not only are some companies taking this site seriously, but it has been gaining press on the internet. In an article called “Why Emo Fashion Doesn’t Deserve All The Hate It Gets” Georgina Jones says this about the site:
“search results for “emo” yield over 13,000 negative hits and only 1,000 positive ones. Statistically, that means 88.6 percent of the Internet hates emo fashion while only 11.3 percent looks back upon it with nostalgia.”
Another reviewer describes how much he “loves” the website, comparing its usefulness to that of Gallup Polling!
This site isn’t a fringe internet oddity. It receives lots of attention and some of the people using it appear to take its reports quite seriously. But we know very little about how it works, or whether the results it produces actually represent any meaningful information. Even if we could verify the way that the website collects and processes data, that wouldn’t tell us what these results actually mean. Do the results correspond to real world attitudes and opinions? Do they tell us anything meaningful about the views of people in society? The purpose of this article is to provide an initial test of whether or not the results from the site can track changes in public opinion over time.
One author is already convinced that they do. Velar Trill defends the legitimacy of the site, suggesting that bizarre results like “Hitler” being reportedly negative and “Adolf Hitler” being reportedly positive are real indicators of people’s attitudes. The argument is that far-right fans of Adolf Hitler tend to feel positive about him and to use his full name, whereas others don’t bother.
No evidence is presented for this explanation, so I suspect that it may be a post-hoc rationalisation. In other words, it may be a story, invented after the fact, to explain a counter-intuitive result. It would be easy to discount evidence that undermines the legitimacy of the site and to focus on evidence that supports it. This is called confirmation bias. If the result had been the other way around and “Hitler” had been the more positive phrase, Velar could have made an equally plausible argument. Velar could have claimed that this was because “Hitler” is more familiar and the full name, “Adolf Hitler”, is more formal. In other words, the logic of this argument is largely retrospective.
EDIT: Since writing the paragraph above I have been made aware that the results for “Adolf Hitler” and “Hitler” have reversed in popularity since Velar’s article. Hitler is now more popular. It also appears that ‘the internet’ is now rather fond of Hitler in all his forms.
In reality, the difference between “Adolf Hitler” and “Hitler” is trivial. Discrepancies between the reported positivity of the different phrases are simply evidence that the site is a weak indicator of public attitudes.
Still, perhaps it would be unfair to use extreme cases like “Hitler” to evaluate the site. Also, it is clear that the results from the site change over time. So, perhaps the results are just very sensitive to relative shifts in attitudes. Perhaps, rather than being absolute indicators of public perceptions, these results indicate changes in perceptions?
In order to conduct a more lenient test of the website’s ability to track public attitudes, I took advantage of its facility for displaying changes in the positivity of words and phrases over time. The website provides a graph showing the reported positivity of a word/phrase for each month since the inception of the site in 2009. I decided to test the site by finding out whether the changes in this graph, over time, bore any relationship with changes in public attitudes during the same period.
The site claims that results get more “accurate” with a larger number of hits. So I chose a phrase that would get a large number of results: “Barack Obama”, the current president of the USA. I then compared Barack Obama’s popularity, according to the site, to real political (Gallup) polls of presidential approval rating conducted since 2009. Plotted on a graph, the results look something like this:
It’s clear that there is a divergence between public opinion polls and “what the internet thinks”. In many places “what the internet thinks” shows a dramatic drop in popularity, whilst the real popularity of Barack Obama was actually increasing! However, it’s hard to be systematic in evaluating a graph. It is easy to focus on interesting details and forget the data as a whole.
In order to analyse this graph rigorously, I used the statistical software ‘SPSS’ and a statistical test called “Spearman’s Rho”. To cut a long story short, Spearman’s Rho (aka the Rank Order Correlation Coefficient) is a statistical test that looks at whether two variables are correlated. It looks at whether, when the values of one measure on the graph increase (real approval ratings), the values of the other measure (“what the internet thinks”) increase as well (and vice versa). It is insensitive to the size of these changes because it works by ranking the values in each dataset. This makes it an appropriate statistical technique for this test: it is less sensitive to whether ‘Whatdoestheinternetthink? (.Net)’ over-estimates or under-estimates changes in real approval ratings.
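For readers who want to see the idea rather than take SPSS on trust, here is a rough sketch of Spearman’s Rho in plain Python: rank each series (tied values share their average rank), then take the ordinary Pearson correlation of the ranks. The function names and the six monthly figures are made up for illustration; they are not the data analysed in this article.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither series is constant."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Spearman's Rho: the Pearson correlation of the two sets of ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up monthly figures, NOT the data analysed in this article.
approval = [62, 60, 55, 53, 50, 48]
site_positivity = [10, 30, 20, 45, 40, 80]
stretched = [value ** 2 for value in site_positivity]  # exaggerate every change

print(round(spearman_rho(approval, site_positivity), 3))  # -0.886
print(round(spearman_rho(approval, stretched), 3))        # -0.886
```

Squaring the site’s figures exaggerates every rise and fall but leaves their ordering untouched, so the coefficient does not move. That is the property that makes the test forgiving of a site that over- or under-states changes.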
The results of the analysis were clear. There is no useful relationship between changes in Barack Obama’s approval rating and the results reported by ‘Whatdoestheinternetthink? (.Net)’. There was a weak, statistically significant, negative correlation between “Real Approval Polls” and “What The Internet Thinks” (Rs(68) = -0.418, p < 0.01). This means that at time points when the site reported that Barack Obama was becoming less popular, he was likely to be becoming slightly more popular (and vice versa)! Although it is statistically significant, this small negative relationship (17.5% of the variance in the data) may well be a coincidence of unrelated trends rather than a genuine relationship. Either way, it is clear that ‘Whatdoestheinternetthink? (.Net)’ cannot provide a meaningful indicator of changes in Barack Obama’s approval rating.
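The figures quoted above can be sanity-checked with a little arithmetic. The sketch below squares the coefficient to get the share of variance explained, and converts it to a t statistic using the usual approximation for testing a correlation against zero. It assumes the 68 in brackets is the degrees of freedom (n - 2), which would mean 70 monthly data points; that reading is my assumption here and SPSS may compute its p-value slightly differently.

```python
import math

# Values as reported in the text above.
r_s = -0.418
df = 68  # assumed to be degrees of freedom, i.e. n - 2

variance_explained = r_s ** 2
# Usual t approximation for testing a correlation coefficient against zero.
t_statistic = r_s * math.sqrt(df / (1 - r_s ** 2))

print(f"variance explained: {variance_explained:.1%}")  # 17.5%
print(f"t statistic: {t_statistic:.2f}")                # -3.79
```

Standard t tables put the two-tailed 1% cut-off for around 68 degrees of freedom at roughly 2.65, so a t statistic of about -3.79 is consistent with the reported p < 0.01.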
It may be that the website provides interesting insights of some other kind. However, the lesson here is clear. People should not be using ‘Whatdoestheinternetthink? (.Net)’ as an indicator of popular attitudes or opinions.
Ideas for Further Work:
Barack Obama’s approval rating is a US-only measure, whereas ‘Whatdoestheinternetthink? (.Net)’ uses results from all over the English-speaking world. That said, I suspect that the majority of the online writing about the president of the US occurs in the US.
Nevertheless, a further interesting test might involve tracking down an opinion poll taken for the whole English-speaking world (at multiple points in time). However, I am not aware of the existence of such a poll.
I obtained data from ‘Whatdoestheinternetthink? (.Net)’ by copying from the “Statistics” section of the website by hand. I used the search term “barack obama”.
Presidential Approval Ratings are based on the results of publicly available Gallup polling. The monthly approval rating figures are based on the mean average of all opinion polls begun within each month.
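The monthly averaging described above amounts to something like the following sketch. The poll dates and percentages are placeholders invented for illustration, not real Gallup figures, and `monthly_means` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical polls as (start date, approval %); NOT real Gallup figures.
polls = [
    (date(2009, 1, 21), 68),
    (date(2009, 1, 28), 66),
    (date(2009, 2, 4), 64),
    (date(2009, 2, 18), 62),
    (date(2009, 2, 25), 63),
]


def monthly_means(polls):
    """Average all polls that began in the same calendar month."""
    by_month = defaultdict(list)
    for start_date, approval in polls:
        by_month[(start_date.year, start_date.month)].append(approval)
    return {month: mean(values) for month, values in sorted(by_month.items())}


for (year, month), value in monthly_means(polls).items():
    print(f"{year}-{month:02d}: {value}")
```

Grouping by the date a poll began (rather than the date it ended) means a poll that straddles two months is counted only once, in the month it started.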
PDF copies of relevant webpages, SPSS output and the raw data for the analysis are available on request.