KDnuggets : News : 2008 : n20 : item6

Features

From: Remi Raphael
Date: Tue, 7 Oct 2008
Subject: Open Standards, is the Predictive Analytics Community Aware?

For many years, programmers and developers of predictive models were locked into their systems. Building a predictive model was a rigid process, with no practical way to share models or move code from one system to another.

That was a time when the industry relied on a single mainstream system, and competing frameworks dedicated to improving specific tasks of the data mining process (e.g. data cleansing, model training, model deployment) had not yet been conceived. Exchanging models and data among multiple systems was therefore not a concern.

Times have changed. Many applications are available today to optimize the many processes involved in data mining, and the need to move and exchange data and models across systems has emerged. A decade ago, XML was introduced to free developers from custom, system-specific formats and to give them the ability to exchange data between different systems. Many applications were built on top of XML, which gained wide acceptance. Various consortia and standardization groups were involved in shaping the use of XML across the entire IT industry.

Fortunately, this revolution didn't miss the field of predictive analytics and data mining. A consortium, the DMG (Data Mining Group), has emerged with the support of many industry leaders to leverage the potential of XML within data mining applications. The DMG began by rationalizing and categorizing predictive models into a markup language called PMML (Predictive Model Markup Language). Its current version, 3.2, covers association models, clustering models, general regression models, neural networks, and more.
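To make the idea of a model expressed in XML concrete, here is a minimal sketch in Python. The PMML fragment is a hand-written, simplified toy (one linear regression with made-up coefficients); it has not been validated against the DMG schema, and the helper names load_regression and score are hypothetical, not part of any PMML tool. It only illustrates that a model written once as XML can be read and scored by any system with an XML parser.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PMML 3.2 documents.
PMML_NS = {"pmml": "http://www.dmg.org/PMML-3_2"}

# Toy, hand-written PMML document (illustrative only, not schema-validated).
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="Toy linear regression (illustrative only)"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy" functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_regression(pmml_text):
    """Return (intercept, {field name: coefficient}) from the toy document.

    Hypothetical helper: it reads only the first RegressionTable and
    ignores everything else a real PMML consumer would handle
    (exponents, categorical predictors, transformations, ...).
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", PMML_NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", PMML_NS)
    }
    return intercept, coefficients


def score(intercept, coefficients, row):
    """Evaluate intercept + sum(coefficient * value) for one input row."""
    return intercept + sum(c * row[name] for name, c in coefficients.items())


if __name__ == "__main__":
    intercept, coefficients = load_regression(PMML_DOC)
    print(score(intercept, coefficients, {"x": 3.0}))
```

The same document could just as well be produced by one package and consumed by another, which is the portability argument the standard rests on.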

Unfortunately, in a recent KDnuggets poll, 69% of participants said they do not use PMML for deploying models, and consequently are not taking advantage of the ability to move their models from one system to another. At the other extreme, the 15% who use PMML do so for more than 50% of their data mining models (http://www.kdnuggets.com/polls/2008/using-PMML-to-deploy-data-mining.htm).

Many interesting comments from participants reflected these polarized views of PMML:

On one hand, one comment shouted "No XML please!", claiming that XML is a degenerative technology that should never have been used. On the other hand, Alex Guazzelli, whose company (http://www.Zementis.com) supports the deployment of PMML, commented: "... It is not very productive to replicate a single solution in many different formats (for different packages) if you could represent it in a single PMML format. ... R, SAS, SPSS, and a range of other data mining packages already support the standard in one way or another. ... Open standards is the way to go."

Although predictive analytics developers were reluctant to embrace the XML revolution, they finally realized the virtue of "Open Standards".

Will PMML prevail as the industry standard for predictive analytics? That remains to be answered by the degree of adoption in the user community. But the good news is that many users are trying to free themselves from committing to a single "does-it-all" system and are using this inter-application exchange format today. Many vendors (e.g. SAS, SPSS, IBM) are actively engaged in pushing PMML to the next level, extending its usability to more kinds of data mining models. It seems that the benefits of PMML will continue to draw new practitioners to the standard.

Remi Raphael, remiDOTraphaelATgmailDOTcom


Copyright © 2008 KDnuggets.