Following the referendum result - and while everyone tries to answer the question 'What happens now?' - we've been doing our own analysis. Why did our referendometer.eu experiment predict a 52% to 48% win for remain?
We looked again at our algorithm and have identified three things which we think swung our prediction the wrong way:
- In the days running up to the vote, the bookies' data swung strongly in favour of remain, having been much less certain earlier in the campaign. Betting odds are not a measure of public opinion, but the bookies are usually attuned to the public mood. In the event, the betting data proved unhelpful: without it, the referendometer would have scored 59% leave, as the adjusted results show
- Secondly, we had applied an artificial Twitter weighting of -15%, which also proved detrimental to our overall prediction. Without this weighting, the referendometer would have scored 53% leave, as the adjusted results show
- Finally, the Twitter results became flatter towards the end of the campaign period. This could reflect public opinion becoming more consistent over time. Or the system's error margin may have increased because the public began talking about new issues which were not present in the initial keyword data we used for training (we all know the language being used got ugly towards the end). Had we retrained the system at regular intervals with new keywords we might have improved accuracy, but further investigation and testing would be needed here. And maybe - just maybe - if there's a second referendum we might try!
Thank you all for your interest in this experiment and for your comments and feedback along the way. We are TheTin, a brand and technology agency.
Referendometer - How we did it
On Thursday 23 June 2016 the UK will vote in a referendum to decide on our continued membership of the European Union. We wanted to find out who’s winning the debate.
Opinion polls alone aren’t working
In recent times, opinion polls have been a little more miss than hit. The polls in the run-up to the general election in 2015 were criticised for providing inaccurate predictions of the outcome. According to BBC News on 31 March:
"Opinion polls before the 2015 election were among the most inaccurate since surveying began more than 70 years ago, an industry-wide review has suggested."
At TheTin we thought: there must be a better, more accurate way of measuring the public mood.
We wanted to find a better way of analysing who is winning the debate. Here’s the full technical explanation of what we did and how the Referendometer works…
News data is very straightforward to collect - all credible news organisations provide an RSS (Really Simple Syndication) feed. These feeds follow a standard format for listing article information, so we built a scheduled task that checks for updates and stores the headlines, along with publish times and URLs, in our database.
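As a rough sketch of that collection step (not our production code - the feed below is a made-up example, and a real task would fetch the feed over HTTP on a schedule):

```python
# Illustrative RSS parsing sketch. Real RSS 2.0 feeds use the same
# <item><title>/<link>/<pubDate> structure parsed here.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>EU referendum: campaigns enter final week</title>
    <link>https://example.com/eu-referendum</link>
    <pubDate>Thu, 16 Jun 2016 09:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

def parse_headlines(rss_xml):
    """Extract (title, link, pubDate) tuples from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]
```

Each tuple returned can then be written straight to the database alongside its publish time.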
Collecting poll and betting data
Collecting poll and betting data is less straightforward, as it is not exposed through APIs. We used a technique called 'screen scraping', in which a program is written to scan web pages and extract data by looking for specific elements such as tables.
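A minimal screen-scraping sketch using only the standard library (the HTML snippet and column names here are invented for illustration; a real scraper would target the specific table markup of each polling site):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every table cell encountered on a page."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

scraper = TableScraper()
scraper.feed("<table><tr><th>Pollster</th><th>Leave</th></tr>"
             "<tr><td>Example Polls</td><td>52%</td></tr></table>")
```

From the flat list of cell values, the scraper can reassemble rows using the known column count of the target table.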
In order to select tweets relating to the EU referendum we implemented a continuous background task that subscribes to the Twitter Firehose API. The Tweets received from this API are filtered by searching for specific hashtags:
| Leave-oriented hashtags | Stay-oriented hashtags |
| --- | --- |
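The filtering step amounts to a simple lookup. A sketch (the hashtag lists below are examples we've chosen for illustration, not the full lists the system tracked, and the tokenisation is deliberately naive):

```python
# Illustrative hashtag lists - placeholders, not the tracked set.
LEAVE_TAGS = {"#voteleave", "#leaveeu", "#takecontrol"}
STAY_TAGS = {"#strongerin", "#voteremain", "#remain"}

def hashtag_label(tweet_text):
    """Dumb classification: 'leave' if any leave-oriented hashtag appears,
    'stay' if any stay-oriented hashtag appears, otherwise None."""
    words = tweet_text.lower().split()
    if any(w in LEAVE_TAGS for w in words):
        return "leave"
    if any(w in STAY_TAGS for w in words):
        return "stay"
    return None
```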
Machine learning and crowd-sourcing
Classification of Tweets based simply on the hashtag was not enough to determine whether they were in favour of or against leaving the EU. To achieve higher classification accuracy, we built a machine learning classifier using Microsoft Azure ML tools.
To begin training the system, we created a simple interface which displays a single tweet at a time, with a selection of options to choose from:
The aim was to gamify this process and crowdsource the work, for maximum efficiency. This worked really well and we received very positive feedback from participants.
Following an audit of this manual classification, we developed a training algorithm. Microsoft offers some extremely powerful tools for building these types of systems. We experimented with different models and found that the Support Vector Machine algorithm produced the most accurate results.
The results were tested by splitting the data 4:1, using the larger portion (80%) for training and the smaller (20%) for testing against the known manual scores.
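The split itself is simple to sketch (an illustration of the 4:1 idea, not the actual Azure ML evaluation step, which handles this internally):

```python
import random

def split_4_to_1(examples, seed=42):
    """Shuffle labelled examples and split them 4:1 into train/test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before splitting matters here: tweets arrive in time order, so an unshuffled split would test the model on a later period than it was trained on.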
During experimentation, we also wanted to see if we could measure the sentiment of the Tweet. To achieve this we used an open source data set containing manually classified Tweets, based on sentiment.
We used this data in conjunction with a Bayesian classifier to train a second algorithm:
The two training algorithm flow charts: Left: Brexit (Stay / Leave). Right: Sentiment (Positive / Negative)
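To give a feel for how a Bayesian text classifier works, here is a minimal multinomial naive Bayes from scratch (an illustration of the technique only - the real system used Azure ML, and the toy tweets below are invented):

```python
import math
from collections import Counter

class NaiveBayesSentiment:
    """Minimal multinomial naive Bayes over tweet words, with
    Laplace (add-one) smoothing for unseen words."""

    def fit(self, tweets, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(tweets, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, tweet):
        best, best_lp = None, float("-inf")
        total = sum(self.class_counts.values())
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.class_counts[c] / total)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tweet.lower().split():
                lp += math.log((self.word_counts[c][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

clf = NaiveBayesSentiment().fit(
    ["love this great idea", "great news love it",
     "hate this awful mess", "awful terrible hate"],
    ["positive", "positive", "negative", "negative"])
```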
With the test results scoring between 70% and 80% accuracy, we refactored the training model into a classification model which could then be published as a web service. Put more simply, this service accepts Tweets as input and outputs a Stay / Leave percentage and a sentiment positive / negative percentage.
We found it necessary to add a 'neutral' category: Tweets are classified as neutral if their percentage scores for stay and leave are too low or too close together. This catches Tweets which do not have a clear classification, or where the system is simply not certain enough to place them in one group.
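That rule can be sketched in a few lines (the threshold values are placeholders we've chosen for illustration, not the ones the system used):

```python
def classify_with_neutral(stay_pct, leave_pct, min_score=0.4, min_gap=0.1):
    """Return 'stay', 'leave' or 'neutral' for a pair of 0-1 scores.
    Neutral when neither score is high enough, or the two are
    too close together to call."""
    if max(stay_pct, leave_pct) < min_score or abs(stay_pct - leave_pct) < min_gap:
        return "neutral"
    return "stay" if stay_pct > leave_pct else "leave"
```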
Combining classification metrics
Once the machine learning service had been set up, we created a background task to periodically process the tweets stored in our database. This task is made up of five steps:
- Dumb classification based on the hashtag
- Classification of Tweets based on Brexit percentage using the machine learning service
- Classification of Tweets based on sentiment percentage using the machine learning service
- Invert or retain the result of step one, based on the result of step two
- Compare the results of steps three and four and select the score with the highest certainty (percentage)
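The published steps are terse, so the sketch below is our reading of them, not the production logic: negative sentiment flips the hashtag-based label (a tweet attacking a campaign often carries that campaign's hashtag), and the final label is whichever of the two classifications carries the higher certainty.

```python
def combine(hashtag_label, ml_label, ml_cert, sentiment, sent_cert):
    """Combine the three classifications into one (label, certainty) pair.
    The inversion rule and tie-breaking here are assumptions."""
    # Invert the dumb hashtag label when sentiment is negative.
    flipped = {"stay": "leave", "leave": "stay"}
    adjusted = flipped[hashtag_label] if sentiment == "negative" else hashtag_label
    # Keep whichever classification is more certain.
    return (ml_label, ml_cert) if ml_cert >= sent_cert else (adjusted, sent_cert)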
Warehousing classification data
Referendometer.eu processes approximately 35,000 tweets a day. To ensure that the system scales and remains responsive, we built a warehousing process. This calculates the average classification across all tweets in each 24-hour block. The same logic is also applied to betting data and poll data. The aggregate of this data is then stored as a single row per day.
The web application connects to the warehoused data and performs one final calculation to determine the final score for each day. This is done by averaging the Tweet, poll and betting scores.
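The warehousing and final-score steps boil down to two small aggregations (a sketch only - the record shape `(date, leave_score)` is assumed for illustration):

```python
from collections import defaultdict
from statistics import mean

def warehouse_daily(records):
    """Average per-tweet 'leave' scores into one value per day.
    Each record is an assumed (date_string, leave_score) pair."""
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}

def daily_score(tweet_avg, poll_avg, betting_avg):
    """Final daily score: a simple average of the three aggregates."""
    return (tweet_avg + poll_avg + betting_avg) / 3
```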
Finally, we implemented data caching to reduce dependency on the database and speed up page load times.
This project has allowed us to experiment with Microsoft Application Insights. The platform offers great insight into the performance of the application and flags up any issues so that we can fix them proactively.
Are we right?
Of course we won’t know until after the vote if this experiment provides a more accurate prediction than the opinion polls have done, but we think it is an interesting exercise.
You can keep up to speed with the debate by following @referendometer.
And if you’d like to talk to us about this project or what TheTin might be able to do for you, please get in touch:
In any case - and if you haven't already - please do register to vote.