Trump, Failure of Prediction, and Lessons for Data Scientists

Donald Trump's shocking and unexpected win of the presidency of the United States has once again shown the limits of Data Science and prediction when dealing with human behavior.

By Gregory Piatetsky, KDnuggets on November 9, 2016 in Donald Trump, Elections, Failure, Hillary Clinton, Nate Silver, Poll


Just before the Nov 8, 2016 election, most pollsters gave Clinton an edge of ~3% in the popular vote. Nate Silver's FiveThirtyEight put the chances of a Trump victory at ~30%, while NYTimes Upshot and Princeton Election Consortium estimated ~15%, and other pollsters had even lower numbers. Still, Trump won. So what are the lessons for Data Scientists?

Correct prediction is based on statistics, and statistics requires a history of similar events and assumptions, such as independence between variables, to function correctly.

If we toss 100 million fair coins, we can predict the number of heads and tails quite accurately. But using polling to predict the votes of 100 million people is much more difficult. Pollsters need to get a representative sample, estimate the likelihood of a person actually voting, make many justified and unjustified assumptions, and avoid following their conscious and unconscious biases.
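To make the coin-versus-poll contrast concrete, here is a minimal sketch using the standard binomial formulas. The sample size of 1,000 respondents and the 2-point systematic bias are hypothetical values chosen for illustration, not figures from any actual 2016 poll.

```python
import math

def binomial_std(n, p=0.5):
    """Standard deviation of the number of heads in n independent tosses."""
    return math.sqrt(n * p * (1 - p))

# 100 million independent fair coins: the outcome is tightly concentrated.
n_coins = 100_000_000
std_heads = binomial_std(n_coins)          # 5000.0 heads
relative_band = 2 * std_heads / n_coins    # roughly 95% band, as a share of all tosses

# A poll of 1,000 respondents (hypothetical sample size).
n_sample = 1000
margin_of_error = 1.96 * math.sqrt(0.5 * 0.5 / n_sample)  # 95% sampling error

# A systematic bias (hypothetical 2 points, e.g. from an unrepresentative
# sample) adds to the error and does not shrink with a larger sample.
systematic_bias = 0.02
worst_case_error = margin_of_error + systematic_bias

print(f"coins: +/- {relative_band:.4%} of tosses")
print(f"poll sampling error: +/- {margin_of_error:.1%}")
print(f"poll with bias: up to {worst_case_error:.1%}")
```

The coin result is pinned down to about one hundredth of a percent, while the poll carries about 3 points of sampling error before any bias is counted. The sampling error formula also assumes respondents are independent and representative, which is exactly the assumption that fails in practice.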

In the case of the US Presidential election, correct prediction is even more difficult because of our system, in which each state (except for Maine and Nebraska) awards all of its electoral college votes to the statewide winner, so results must be predicted state by state.
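A toy sketch can show why winner-take-all makes this fragile. The three states, their electoral votes, and their polled margins below are entirely hypothetical, and the split allocation used by Maine and Nebraska is not modeled.

```python
# Hypothetical states: (name, electoral votes, polled margin for candidate A, in points)
states = [
    ("State X", 20, 2.0),
    ("State Y", 16, 1.5),
    ("State Z", 10, -4.0),
]

def electoral_votes(states, shift=0.0):
    """Electoral votes won by candidate A under winner-take-all
    when every polled margin is off by `shift` points."""
    return sum(ev for _, ev, margin in states if margin + shift > 0)

print(electoral_votes(states))        # polls taken at face value
print(electoral_votes(states, -2.5))  # same polls with a shared 2.5-point miss
```

With the polls taken at face value, candidate A wins 36 of 46 electoral votes. If every state poll overstates A by the same 2.5 points, A wins none, because a small error shared across close states flips all of them at once.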

The chart below shows that pollsters were off the mark in many states, mostly underestimating the Trump vote.

[Chart: polling misses by state]
Source: @NateSilver538 tweet, Nov 9, 2016

To be fair, some statisticians like Salil Mehta (@salilstatistics) were warning about the unreliability of polls, and David Wasserman of 538 actually described this scenario in his Sep 2016 article "How Trump Could Win The White House While Losing The Popular Vote". But most pollsters were way off.

So a good lesson for Data Scientists is to question their assumptions and to be especially skeptical when predicting a rare event with limited history that depends on human behavior.

Other important lessons are:

Examine data quality: in this election, polls were not reaching all likely voters.
Beware of your own biases: many pollsters were likely Clinton supporters and did not want to question the results that favored their candidate. For example, Huffington Post had forecast a 98% chance of a Clinton victory.

Other analyses of polling failures:

Wired: Trump’s Win Isn’t the Death of Data—It Was Flawed All Along.
NYTimes: How Data Failed Us in Calling an Election
Datanami: Six Data Science Lessons from the Epic Polling Failure
InformationWeek: Trump's Election: Poll Failures Hold Data Lessons For IT
Why I Had to Eat a Bug on CNN, by Sam Wang, Princeton, whose Princeton Election Consortium gave Trump 15% to win.
