While researching models that might help me predict the Cannes Lions, I found two papers discussing approaches to predicting the Oscars. A fitting link, as the Cannes Lions have often been dubbed the Oscars of the ad industry.
One paper explains how to predict from historic data, the other focuses on social networks, but there was one statement that stood out:
“The Web has turned into a major platform for information exchange, thus becoming a mirror of the real world.”
That got me thinking about Snow White and the magic mirror. The evil Queen asks the mirror a question and he gives her an answer. Now, is he an expert on everything? Is he God? Or is he in fact reporting back the wisdom of the crowd? I mean, how can Snow White be the ‘fairest’ of them all if there was no opinion poll? In 2015, one can only assume he mined some social media sites to get that answer. Tinder and Facebook likes, perhaps, or a URL scrape of all the ‘hottest 100’ lists from the lads’ mags.
Over the last decade (or so), the idea that a large volume of non-experts can predict a trend more reliably than a single expert has come to the forefront of our consciousness. Luckily, the tools of agent-based modeling, Natural Language Processing (NLP) and network theory were there, ready to help.
The paper predicting the Oscars through social network analysis of IMDb chat narrowed it all down to one equation, a truly amazing feat of elegance to anyone who has tried to construct a mathematical model. So amazing, in fact, that even Vanity Fair decided to tell people about it.
That one equation covers a whole lot of mathematical awesomeness and technical tricks.
Firstly, they scraped ALL the content created by IMDb’s 4 million users.
Then they started with some basic adding up and dividing. They measured the intensity of comments: how many comments each movie had received, and at what frequency, gave them communication intensity.
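To make the adding up and dividing concrete, here is a minimal sketch of how comments per movie and how often they arrive could be computed. It is my own illustration, not the paper's code: the function name `communication_intensity`, the (movie, date) input format, the choice of "comments per day" as the frequency, and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import date


def communication_intensity(comments):
    """Return {movie: (comment_count, comments_per_day)}.

    `comments` is an iterable of (movie, posted_on) pairs.
    Measuring frequency as comments per day is an assumption,
    not something the paper is confirmed to do.
    """
    dates_by_movie = defaultdict(list)
    for movie, posted_on in comments:
        dates_by_movie[movie].append(posted_on)

    result = {}
    for movie, dates in dates_by_movie.items():
        # Inclusive span, so a single comment counts as one day.
        span_days = (max(dates) - min(dates)).days + 1
        result[movie] = (len(dates), len(dates) / span_days)
    return result


# Invented sample data, purely for illustration.
sample = [
    ("Movie A", date(2015, 1, 1)),
    ("Movie A", date(2015, 1, 2)),
    ("Movie A", date(2015, 1, 4)),
    ("Movie B", date(2015, 1, 1)),
]
intensity = communication_intensity(sample)
```

Here "Movie A" has three comments spread over four days, so its rate is lower than "Movie B", which has a single comment on a single day. A real version would probably want a fixed time window so that movies are compared over the same period.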
Then came something a bit more involved. They used NLP to decide whether the sentiment of a comment was positive or negative, looking for words like “win,” “nominate,” “great,” “good,” “award,” “super,” “oscar,” and “academy”. They gave each of these words a rating, as some were found to be more relevant than others. Then they worked out the frequency with which the positive mentions were made, and they called this the Positivity Index.
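A rough sketch of the weighted keyword idea might look like the following. The word list comes from the paragraph above, but the weights are invented (I don't have the ratings the authors used), and dividing by the number of comments is my own assumption about what "frequency" means here. The name `positivity_index` is hypothetical too.

```python
import re

# Hypothetical weights: the words were rated by relevance,
# but the actual ratings are not given here.
WEIGHTS = {
    "win": 1.0, "nominate": 1.0, "award": 1.0,
    "oscar": 1.0, "academy": 1.0,
    "great": 0.5, "good": 0.5, "super": 0.5,
}


def positivity_index(comments):
    """Average weighted keyword score per comment (an assumed definition)."""
    if not comments:
        return 0.0
    total = 0.0
    for text in comments:
        # Lowercase and split into alphabetic words; exact matches only,
        # so "nominated" would NOT match "nominate" in this sketch.
        for word in re.findall(r"[a-z]+", text.lower()):
            total += WEIGHTS.get(word, 0.0)
    return total / len(comments)


# Invented comments, purely for illustration.
score = positivity_index([
    "This deserves to win an Oscar",
    "Good but not great",
    "Too long",
])
```

Notice that "Good but not great" still scores as positive, because simple keyword counting ignores negation. That is one reason a proper sentiment step needs more than a word list.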
Here comes the first catch (luckily Ettie has already worked on solving this problem).
“Intensity and Positivity Index are not fully independent: the number of positive terms mentioned in context with a movie will increase with the number of messages about this movie. However, it is also possible that a movie will be talked up in a negative context. To prove this we would also need to incorporate a ‘Negativity Index.’”
This they kept for a later date.
Their final metric looks at timing, on the assumption that a movie is talked about most around its release date. Together, these three measures predicted box office sales and Oscar awards.
Today, I am seeing whether YouTube data can give us a magic Lion formula. I will be correlating the comments and likes with the Lion nominees to see if the wisdom of the crowd can predict the winners. In doing so, we will discover whether the tastes and trends of ad-industry professionals are reflected in those of the general public.
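As a first pass at "correlating", something like a plain Pearson correlation between like counts and a won/lost flag would do. This is only a sketch of the idea: the helper `pearson` is written by hand so nothing beyond the standard library is needed, and the numbers below are made up, not real YouTube or Cannes Lions data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers: likes per ad, and whether that ad won (1) or not (0).
likes = [120, 45, 300, 80]
won = [1, 0, 1, 0]
r = pearson(likes, won)
```

A value of `r` near 1 would suggest that the ads the public likes are also the ones that win; a value near 0 would suggest the crowd and the jury don't agree much. With only a handful of nominees, any such number should be read with a lot of caution.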