Making predictions this year has been an incredibly uncomfortable experience as far as my ‘sciencey spider senses’ are concerned. I’d like to say I sought solace in meditation, through focused reflection on Buddhist teachings.

That’s sort of true. In reality, I watched ‘Orange is the New Black’, and it taught me something. Specifically, the hippie inmate who leads all the yoga classes taught me something. She tells the other inmates to think of their time in prison as a mandala, one of the Tibetan sand sculptures that are laboriously constructed and then destroyed.

I’m applying that philosophy to the Cannes Lions prediction model.

We worked hard to pull together the best model we could, and in the end we gained insight into what we did wrong, but had no real success story. Now it’s time to move on, and this model gets washed away until next year.

Like building a sandcastle where each grain of sand is important, each piece of data in a model can totally transform what you are left with. There’s always that critical point at which you take away enough sand, and the whole thing collapses to a rather uninspiring mound. I fear this is what happened with our model last week.

It took us a really long time to get our hands on the historic data, and even longer to tease it out and clean it up into a usable format. Even after all that, we did not have enough information to get a clear picture. To predict the Grand Prix, it would have been useful to see all the historic Bronze, Silver and Gold winners, but we did not have them. We could characterise each entry, yet we didn’t actually know who won what, so we had very, very limited data to map it to.

Part of this is to do with developer time. Data Science can tell you what you need to build the model, but it needs help getting all the pieces together. To borrow another structural analogy: it can tell you exactly what you need to build a brick wall and knows how to construct it, but it is too weak to lift the bricks. Or it is a bit like sending an army of ants off to fetch the bricks, only to find they have all been washed away by the rain.

We learnt this week that advertisers don’t tweet enough, and that our historic data set was not complete enough. This weekend we also learnt something about YouTube: the volume of activity on a video (likes and dislikes count equally) points to who won the Grand Prix.

You can see from the colourful doughnut charts how much more attention these YouTube videos receive:


I’m still left with the question of causation. Does winning the Grand Prix suddenly make the general public love your advert? Are the judges setting the trends? Are popular adverts given extra consideration at the Lions? Do the judges and the general public share the same opinion of what makes a good advert?

Answering any of these questions requires YouTube metrics at different points in time: pre-Lions and post-Lions. It also requires us to allow for adverts that have been online longer and have therefore had more time to build up popularity.
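To make the timing point concrete, here is a minimal Python sketch, assuming we had two snapshots of each advert’s like and dislike counts (one taken before the Lions and one after) plus its upload date. The `Snapshot` class and the `lions_uplift` function are hypothetical names for illustration; they are not part of our model or of any YouTube API. Dividing by days online is one simple way to stop older adverts looking more popular just because they have been around longer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snapshot:
    """A hypothetical reading of one advert's public YouTube counts on a given day."""
    taken_on: date
    likes: int
    dislikes: int

    @property
    def activity(self) -> int:
        # Likes and dislikes are pooled: only the volume of activity matters here.
        return self.likes + self.dislikes


def _days_between(start: date, end: date) -> int:
    """Whole days from start to end; refuses zero or negative windows."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end date must fall after start date")
    return days


def pre_lions_rate(uploaded_on: date, pre: Snapshot) -> float:
    """Average activity per day from upload until the pre-Lions snapshot."""
    return pre.activity / _days_between(uploaded_on, pre.taken_on)


def post_lions_rate(pre: Snapshot, post: Snapshot) -> float:
    """Average new activity per day between the pre- and post-Lions snapshots."""
    return (post.activity - pre.activity) / _days_between(pre.taken_on, post.taken_on)


def lions_uplift(uploaded_on: date, pre: Snapshot, post: Snapshot) -> float:
    """Ratio of post- to pre-Lions daily activity; above 1 means attention sped up."""
    before = pre_lions_rate(uploaded_on, pre)
    if before == 0:
        raise ValueError("no pre-Lions activity to compare against")
    return post_lions_rate(pre, post) / before
```

Comparing this ratio for Grand Prix winners against the other entrants would be a first step: winners that were already busy before the festival and show an uplift near 1 would hint that popularity came first. A raw ratio still ignores the way attention naturally fades over time, so on its own it would not settle the causation question.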

As we came to the YouTube metrics too late in the day to get our ‘pre-Lions’ data, the answers shall just have to wait for next year. To repeat this in 12 months, we’ll need to acquire a much more comprehensive backlog of information on the Bronze, Silver and Gold winners. That’s a lot of tricky data acquisition.

To summarise: this year we pulled out all the maths tricks, but we just weren’t prepared enough, and had too little data to play with. Next year we’ll be ready.

Charlotte is a Data Scientist Researcher with a PhD in Engineering Maths and two Masters degrees: one in Complex Systems and one in Earth Sciences. After thinking a lot about systemic risk in economics and finance, she now focuses on finding the right mathematical tools for our algorithms.