KDnuggets Home » News :: 2013 :: Jan :: Software :: PMML FAQ: Predictive Model Markup Language ( 13:n02 )

PMML FAQ: Predictive Model Markup Language

          


An update on PMML (Predictive Model Markup Language), the de facto standard for representing predictive solutions. With PMML 4.1, all the capabilities available for data pre-processing were also made available for post-processing.

Alex Guazzelli kindly updated the KDnuggets FAQ entry on PMML, and his entry was so good that I want to share it with KDnuggets readers.

Alex Guazzelli (VP, Analytics at Zementis) answers:

PMML stands for "Predictive Model Markup Language". It is the de facto standard to represent predictive solutions. A PMML file may contain a myriad of data transformations (pre- and post-processing) as well as one or more predictive models.

Because it is a standard, PMML allows for different statistical and data mining tools to speak the same language. In this way, a predictive solution can be easily moved among different tools and applications without the need for custom coding. For example, it may be developed in one application and directly deployed on another.

Traditionally, the deployment of a predictive solution could take months: after building it, the data science team had to write a document describing the entire solution. This document was then passed to the IT engineering team, which would recode it in the production environment to make the solution operational. With PMML, that double effort is no longer required, since the predictive solution as a whole (data transformations + predictive model) is simply represented as a PMML file, which is then used as is for production deployment. What took months before now takes hours or minutes with PMML.

PMML is developed by the Data Mining Group (DMG), a consortium of commercial and open-source data mining companies. The latest version of PMML, version 4.1, was released by the DMG in December 2011.

Since PMML is XML-based, it is not rocket science. Its structure follows a set of pre-defined elements and attributes which reflect the inner structure of a predictive workflow: data manipulations followed by one or more predictive models.
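As a rough illustration of that structure, the sketch below embeds a toy PMML 4.1 document in a Python string and lists its top-level elements with the standard library. The field names, coefficients, and the helper local_name are invented for the example; only the element names and the namespace are taken from the PMML specification.

```python
import xml.etree.ElementTree as ET

# Toy PMML 4.1 document. Field names and numbers are made up for illustration.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header copyright="Example" description="Toy model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="age_scaled" optype="continuous" dataType="double">
      <NormContinuous field="age">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.1">
      <NumericPredictor name="age_scaled" coefficient="0.8"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


root = ET.fromstring(PMML_DOC)

# The children of <PMML> mirror the workflow: data description,
# data manipulations, then the model itself.
top_level = [local_name(child.tag) for child in root]
print(top_level)
```

The order of the printed elements follows the workflow described above: the data dictionary first, then the transformations, then the model.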

What are the benefits of PMML?

PMML makes it extremely easy for any predictive solution to be moved from one data mining system to another. For example, once represented as a PMML file, a predictive solution can be operationally deployed right away, without the need for custom code. In this way, PMML transforms predictive analytic solutions into dynamic assets that can be put to work immediately.

For big companies with many in-house statistical and data mining tools, PMML works as the common denominator, since wherever a solution is built, it can immediately be represented as a PMML file. This allows companies to use "best of breed" tools to build the best possible solutions.

Since PMML is a standard, it also fosters transparency and best practices. Transparency comes from the fact that the predictive solution is no longer a black box. By opening the box and understanding what is inside, the analytics team can easily recognize past decisions and establish practices that work.

What kind of predictive techniques are supported by PMML?

PMML defines specific elements for several predictive techniques, including neural networks, decision trees, and clustering models, to name just a few. Recently added techniques include k-Nearest Neighbors and Scorecards, the latter of which support reason codes.
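To make one of these elements concrete, here is a minimal sketch of a decision tree expressed with the TreeModel element, together with a small Python evaluator. The field names, scores, and the functions predicate_holds and score are invented for illustration. The evaluator handles only True and SimplePredicate with numeric fields, and it ignores missing-value handling and the noTrueChildStrategy attribute, so it should not be read as a conforming scoring engine.

```python
import operator
import xml.etree.ElementTree as ET

# Fragment only: a full PMML file would wrap this in <PMML> with a Header
# and DataDictionary. Field names and scores are invented.
TREE_XML = """
<TreeModel functionName="classification">
  <MiningSchema>
    <MiningField name="age"/>
    <MiningField name="churn" usageType="predicted"/>
  </MiningSchema>
  <Node score="no">
    <True/>
    <Node score="yes">
      <SimplePredicate field="age" operator="lessThan" value="30"/>
    </Node>
    <Node score="no">
      <SimplePredicate field="age" operator="greaterOrEqual" value="30"/>
    </Node>
  </Node>
</TreeModel>
"""

# Comparison operators defined for SimplePredicate (missing-value ones omitted).
OPERATORS = {
    "equal": operator.eq,
    "notEqual": operator.ne,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def predicate_holds(node, record):
    """Return True if the node's predicate matches the input record."""
    if node.find("True") is not None:
        return True
    pred = node.find("SimplePredicate")
    if pred is None:
        return False  # other predicate types are not handled in this sketch
    compare = OPERATORS[pred.get("operator")]
    return compare(float(record[pred.get("field")]), float(pred.get("value")))


def score(node, record):
    """Descend into the first child whose predicate holds; return the last score."""
    for child in node.findall("Node"):
        if predicate_holds(child, record):
            return score(child, record)
    return node.get("score")


root_node = ET.fromstring(TREE_XML).find("Node")
print(score(root_node, {"age": 25}))
print(score(root_node, {"age": 42}))
```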

PMML also defines an element for representing multiple models. That is, PMML can be used to represent model segmentation, composition, chaining, cascading, and ensembles, including random forest models.
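The multiple-model element is MiningModel, which holds its member models inside a Segmentation element. The sketch below shows a toy ensemble of three one-node trees combined by majority vote, and a few lines of Python that read the combination method and tally the votes. The scores and field names are invented, and each tree is deliberately reduced to a single leaf so that its prediction is simply the score of its root node; a real ensemble such as a random forest would, as far as the structure goes, use the same layout with many deeper trees.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Fragment only; names and scores are invented for illustration.
ENSEMBLE_XML = """
<MiningModel functionName="classification">
  <MiningSchema>
    <MiningField name="age"/>
    <MiningField name="churn" usageType="predicted"/>
  </MiningSchema>
  <Segmentation multipleModelMethod="majorityVote">
    <Segment id="1">
      <True/>
      <TreeModel functionName="classification">
        <MiningSchema><MiningField name="age"/></MiningSchema>
        <Node score="yes"><True/></Node>
      </TreeModel>
    </Segment>
    <Segment id="2">
      <True/>
      <TreeModel functionName="classification">
        <MiningSchema><MiningField name="age"/></MiningSchema>
        <Node score="no"><True/></Node>
      </TreeModel>
    </Segment>
    <Segment id="3">
      <True/>
      <TreeModel functionName="classification">
        <MiningSchema><MiningField name="age"/></MiningSchema>
        <Node score="yes"><True/></Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>
"""

root = ET.fromstring(ENSEMBLE_XML)
segmentation = root.find("Segmentation")

# How the member models are combined (majority vote in this toy example).
method = segmentation.get("multipleModelMethod")

# Each toy tree is a single leaf, so its vote is the root node's score.
votes = [seg.find("TreeModel/Node").get("score")
         for seg in segmentation.findall("Segment")]

# Simple tally; tie-breaking is not addressed in this sketch.
winner = Counter(votes).most_common(1)[0][0]
print(method, votes, winner)
```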

To review all the elements supported by PMML, take a look at the language specification at the DMG website (see Resources below).

Can PMML represent data pre- and post-processing?

PMML has several built-in functions, such as IF-THEN-ELSE and arithmetic functions, that allow for extensive data manipulation. It also defines specific elements for the most common pre-processing tasks such as normalization, discretization, and value mapping. To review all the pre-processing capabilities PMML has to offer, refer to the PMML pre-processing primer.
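As a hedged sketch of two of these pre-processing elements, the example below defines a normalization (NormContinuous) and a discretization (Discretize) as derived fields and applies them to a single value in Python. The field names, bin labels, and the helper functions norm_continuous and discretize are invented for illustration; the helpers cover only in-range values and closedOpen intervals, not the full behavior described in the specification.

```python
import xml.etree.ElementTree as ET

# Fragment only; field names and bin labels are invented.
TRANSFORMS_XML = """
<TransformationDictionary>
  <DerivedField name="age_scaled" optype="continuous" dataType="double">
    <NormContinuous field="age">
      <LinearNorm orig="0" norm="0"/>
      <LinearNorm orig="100" norm="1"/>
    </NormContinuous>
  </DerivedField>
  <DerivedField name="age_band" optype="categorical" dataType="string">
    <Discretize field="age">
      <DiscretizeBin binValue="young">
        <Interval closure="closedOpen" leftMargin="0" rightMargin="30"/>
      </DiscretizeBin>
      <DiscretizeBin binValue="older">
        <Interval closure="closedOpen" leftMargin="30" rightMargin="120"/>
      </DiscretizeBin>
    </Discretize>
  </DerivedField>
</TransformationDictionary>
"""


def norm_continuous(elem, value):
    """Piecewise-linear interpolation between consecutive LinearNorm points."""
    points = [(float(p.get("orig")), float(p.get("norm")))
              for p in elem.findall("LinearNorm")]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
    return None  # outlier treatment is not modelled in this sketch


def discretize(elem, value):
    """Return the binValue of the first interval that contains value."""
    for bin_elem in elem.findall("DiscretizeBin"):
        interval = bin_elem.find("Interval")
        left = float(interval.get("leftMargin"))
        right = float(interval.get("rightMargin"))
        if left <= value < right:  # only closure="closedOpen" is handled
            return bin_elem.get("binValue")
    return elem.get("defaultValue")  # None when no default is declared


root = ET.fromstring(TRANSFORMS_XML)
scaled = norm_continuous(root.find(".//NormContinuous"), 25.0)
band = discretize(root.find(".//Discretize"), 25.0)
print(scaled, band)
```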

With PMML 4.1, all the capabilities available for data pre-processing were also made available for post-processing. As a result, a PMML file can now also contain a set of business rules that define actions or decisions to be taken based on the outcome of the predictive model. A PMML file can thus represent the entire predictive solution, from raw data and model to business decisions.
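One way such a business rule might look is sketched below: an Output element whose second field turns the model's prediction into a decision using the built-in if and greaterThan functions. The field names, the 0.5 threshold, the action labels, and the Python helpers evaluate and decide are all invented for illustration, and the evaluator supports only the handful of expression types used in this fragment.

```python
import xml.etree.ElementTree as ET

# Fragment only; names, threshold, and actions are invented.
OUTPUT_XML = """
<Output>
  <OutputField name="predicted_risk" feature="predictedValue"
               optype="continuous" dataType="double"/>
  <OutputField name="action" feature="decision"
               optype="categorical" dataType="string">
    <Apply function="if">
      <Apply function="greaterThan">
        <FieldRef field="predicted_risk"/>
        <Constant dataType="double">0.5</Constant>
      </Apply>
      <Constant dataType="string">review</Constant>
      <Constant dataType="string">approve</Constant>
    </Apply>
  </OutputField>
</Output>
"""


def evaluate(expr, fields):
    """Evaluate a tiny subset of PMML expressions against known field values."""
    if expr.tag == "FieldRef":
        return fields[expr.get("field")]
    if expr.tag == "Constant":
        return float(expr.text) if expr.get("dataType") == "double" else expr.text
    if expr.tag == "Apply":
        name = expr.get("function")
        args = list(expr)
        if name == "if":
            # IF-THEN-ELSE: condition, value if true, value if false.
            branch = args[1] if evaluate(args[0], fields) else args[2]
            return evaluate(branch, fields)
        if name == "greaterThan":
            return evaluate(args[0], fields) > evaluate(args[1], fields)
    raise ValueError("expression not supported in this sketch: " + expr.tag)


decision_expr = (ET.fromstring(OUTPUT_XML)
                 .find("OutputField[@name='action']")
                 .find("Apply"))


def decide(predicted_risk):
    """Map a model prediction to a business action via the PMML rule."""
    return evaluate(decision_expr, {"predicted_risk": predicted_risk})


print(decide(0.7))
print(decide(0.2))
```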

Resources

Websites

Book: PMML in Action (2nd Edition) - Available on Amazon.com

Talks/Presentations

Articles:







