This one is a little late, given that the election was a month ago, but I think it is still worthwhile given the opinions I keep hearing about Nate Silver’s performance.
I’m walking out of this election with an increased respect for Nate Silver and the work he is doing at fivethirtyeight.com. Statistics and predictive modeling are hard. Even that is an understatement, because any time you try to predict the future – even tomorrow – you are more likely to end up slightly wrong than completely right.
A big portion of my job involves trying to understand the impact of decisions today on the business tomorrow. If we build out an office for 40 people today, what is the likelihood we have to close or expand it in 3 years? What is the likelihood that we can support 50 people in the same place without redoing the furniture? Is this city still going to be the right location for this function based on both business and geographic trends?
Why do I respect Nate Silver more today than before? Because he understands the single biggest rule of data analysis: garbage in, garbage out. If you have any questions at all about the quality of the data you are being given, it is your responsibility to account for that and note its potential impact. Saying that you were wrong because the data was wrong is exactly the wrong answer, because the follow-up question is “did you have any reason to suspect the data was wrong?” and any answer other than “yes” makes you incompetent in this case.
Data is fickle, and many people think that data itself provides an answer. But if that were really true, IBM’s Watson would have taken over the planet already. Real intelligence is in being able to understand what data means. Where is it best applied? Where should it not be applied at all? Where is it misleading, incomplete, or biased? What conditions could lead to a change in trends? There is an art to actually dealing with data.
It does not surprise me in the slightest that several polling aggregators this year showed Clinton with a 90%+ chance of winning the election, some as high as 98%. The data seemed to reflect that. Relying on the cold hard numbers would point almost any model in that direction. But this simply proves that some people are better at this than others, and that you should never trust any model until you sense-check its approach, its strengths and its weaknesses. It’s like restaurants. The aggregators who gave Clinton a 90%+ chance are fast food: they throw everything in and give you a generic burger. Those that talked a lot about uncertainty are actually chefs: they know what to do with the raw food they are handed, throwing out the worst and making the best sing before the plate goes out.