KDnuggets Home » News » 2011 » Aug » Publications » Lies, damn lies and data mining algorithms (11:n22)

Lies, damn lies and data mining algorithms


 
  
Excessive use of data mining can undermine an entire industry: segmenting risk in insurance eventually destroys the possibility of spreading risk equitably. A similar danger exists in other industries.


B-Eye-Network, Barry Devlin, Aug 2011

"We are running through the United States with dynamite and rock saws so an algorithm can close the deal three microseconds faster," according to algorithm expert Kevin Slavin at last month's TEDGlobal conference. Slavin was describing how Spread Networks is building a fiber optic connection to shave three microseconds off the 825-mile trading journey between Chicago and New York. The quote comes courtesy of a thought-provoking article called "When algorithms control the world" by Jane Wakefield on the BBC website.

... Just because we can do some particular analysis, does it really make sense?

The classic case is in the use of BI and data mining in the insurance industry. Large data sets and advanced algorithms allow insurers to discover subtle clues to risk in segments of the population and adjust premiums accordingly. Now, of course, actuaries have been doing this since the 18th and 19th centuries. But the principal driver in the past was to derive an equitable spread of risk in a relatively large population, such that the cost of a single event was effectively spread over a significantly larger number of people.

However, data mining allows ever more detailed segmentation of a population, and insurers have responded by identifying particularly high-risk groups and either effectively denying them insurance or pricing premiums so high that such people cannot insure their risk. While in some cases we can argue that this drives behavior changes that reduce overall risk (for example, safer driving practices among young males), in many other instances no such change is possible (for example, for house owners living on flood plains). I would argue that excessive use of data mining to segment risk in insurance eventually destroys the possibility of spreading risk equitably and thus undermines the entire industry.
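
To make the arithmetic behind this argument concrete, here is a minimal sketch comparing a single pooled premium with fully segmented premiums. Everything in it is an assumption for illustration: the two groups, their sizes, the claim probabilities, and the claim cost are invented, and the "break-even premium" ignores expenses, profit, and everything else a real actuary would include.

```python
# Hypothetical portfolio: group names, sizes, and annual claim
# probabilities are invented for illustration, not taken from the article.
CLAIM_COST = 100_000  # assumed cost of one insured event

groups = {
    "low_risk": {"people": 9_000, "claim_prob": 0.002},
    "flood_plain": {"people": 1_000, "claim_prob": 0.050},
}


def pooled_premium(groups, claim_cost):
    """Break-even premium when everyone shares one risk pool."""
    total_people = sum(g["people"] for g in groups.values())
    expected_claims = sum(g["people"] * g["claim_prob"] for g in groups.values())
    return expected_claims * claim_cost / total_people


def segmented_premiums(groups, claim_cost):
    """Break-even premium when each segment pays only for its own risk."""
    return {name: g["claim_prob"] * claim_cost for name, g in groups.items()}


pooled = pooled_premium(groups, CLAIM_COST)
segmented = segmented_premiums(groups, CLAIM_COST)

print(f"pooled premium: {pooled:.0f}")
for name, premium in segmented.items():
    print(f"{name} premium: {premium:.0f}")
```

With these made-up numbers, pooling charges everyone 680, while full segmentation charges the low-risk group 200 and the flood-plain group 5,000. The finer the segmentation, the closer each premium gets to that person's own expected loss, which is exactly the point at which risk is no longer being spread.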

But it gets worse! The BBC article reveals how a British company, Epagogix, uses algorithms to predict what makes a hit movie. Using metrics such as script, plot, stars, location, and the box office takings of similar films, it tries to predict the success of a proposed production. The problem here, and note that the same applies to book suggestions on Amazon and all similar approaches, is that the algorithm depends on past consumer behavior and predicts future preferences based upon that. The question is: how do new preferences emerge if the only thing being offered is designed solely to satisfy past preferences?
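
The feedback loop behind that question can be shown with a toy simulation. This is a deliberately crude sketch, not how Epagogix or Amazon actually work: the titles, the view counts, the "offer the top two" rule, and the assumption that consumers only ever choose among what is offered are all hypothetical.

```python
from collections import Counter


def recommend(history, catalog, k=2):
    """Offer the k catalog items with the most past views.

    sorted() is stable, so ties keep their catalog order.
    """
    return sorted(catalog, key=lambda item: -history[item])[:k]


catalog = ["sequel", "remake", "romcom", "novel_idea"]  # hypothetical titles
# Past behavior; "novel_idea" has no history, so Counter reports 0 for it.
history = Counter({"sequel": 5, "remake": 3, "romcom": 1})

for round_number in range(100):
    offered = recommend(history, catalog)
    # Assumption: consumers can only pick from what is offered,
    # and they alternate between the offered items.
    choice = offered[round_number % len(offered)]
    history[choice] += 1

print(dict(history))
```

After 100 rounds the two items that were already popular have absorbed every view, and "novel_idea" still has none. Under these assumptions a preference for it can never show up in the data, because it was never on offer.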


