Comments
Phil, PMML Scoring
Tim, I think you misunderstood my point about creating a scoring engine that would take, say, a function in C (or even SQL) as its input.
If the scoring engine included a C compiler to compile the function (or equation!), then it should be able to deal with any C code you generate.
Also, you personally would not have to like C or whatever other language; it would be the modelling software that would spit it out, as it does the PMML code.
Also, you would not have to wait for PMML to evolve; you could put in anything that could compile.
Tim Manns, PMML scoring
I don't think there is anything wrong with PMML. It offers a simple, formalised way (e.g. XML) to describe a generated model.
I disagree that a better alternative is to use "a more flexible and developed language such as SAS, C or even VBA." I agree these languages are powerful, but they are too flexible. I doubt a scoring engine could interpret all the C or VBA code I could write. What if you hate SAS, or VB, or Java? I'd write code differently from the next person. A formalised structure such as XML is the best way to go, and can be used by a scoring engine written in any language.
Why would a company develop a scoring engine for free, when they can sell it and make a profit? There are scoring engines out there, many embedded in the data mining software we use.
The reason I don't use PMML is convenience. I also have the option to use my models and score tables as SQL. It's easier and far quicker for me to use SQL. My neural nets, CART etc. are converted into SQL automatically on the fly. Not all databases support predictive or clustering models as PMML, and since all my data is in a huge, terabyte-sized data warehouse, I must process it in the database. For me it's not so much a problem with PMML as with support for it in the database I use.
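To illustrate what "converted into SQL" can look like, here is a minimal sketch in Python that renders a small decision tree as a nested SQL CASE expression. The tree layout, the column names and the `customers` table are all hypothetical, chosen only for illustration; they are not the output format of any particular data mining package.

```python
# Hypothetical tree as a nested dict (not the format of any real tool).
# Internal nodes test "column < threshold"; leaves carry a score.
tree = {
    "column": "age", "threshold": 30,
    "left": {"score": 0.2},
    "right": {
        "column": "income", "threshold": 50000,
        "left": {"score": 0.5},
        "right": {"score": 0.9},
    },
}

def tree_to_sql(node):
    """Render a binary decision tree as a nested SQL CASE expression."""
    if "score" in node:  # leaf: emit the score literal
        return str(node["score"])
    return (
        f"CASE WHEN {node['column']} < {node['threshold']} "
        f"THEN {tree_to_sql(node['left'])} "
        f"ELSE {tree_to_sql(node['right'])} END"
    )

# Hypothetical table and key column names.
sql = f"SELECT customer_id, {tree_to_sql(tree)} AS score FROM customers"
print(sql)
```

The resulting statement runs entirely inside the database, which is the appeal when the data cannot reasonably be moved out of the warehouse.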
RobertB, No XML please!
XML is a degenerative technology that should never have been used. What PMML offers is only XML-worthy. Some reasons:
1. Precision: Representing 64-bit or 128-bit numbers as XML strings is not something that you would expect to work properly. There are truncations and conversion differences. With PMML, you get inferior models!
2. The sequence of computation can have a significant impact on accuracy. You should use the same binaries for both modelling and evaluation! No point using XML.
3. XML makes models hard to use for non-XML enthusiasts, especially non-IT people.
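Points 1 and 2 can be made concrete with a small Python sketch using ordinary 64-bit doubles. It is an illustration only and says nothing about how any specific PMML producer writes its numbers: whether precision is lost depends on how the exporter formats the value, and the 6-digit format below is an assumed example of a lossy export.

```python
x = 0.1 + 0.2  # stored as the double 0.30000000000000004

# Point 1: a truncated text form does not parse back to the same double...
truncated = float(f"{x:.6g}")   # assumed lossy export with 6 significant digits
lossy = truncated != x

# ...whereas Python's repr() is a round-trip text form that does.
exact = float(repr(x)) == x

# Point 2: floating-point addition is not associative, so the order of
# evaluation can change the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
order_matters = a != b

print(lossy, exact, order_matters)
```

All three flags come out true: truncation changes the value, a round-trip format preserves it, and two engines summing the same terms in a different order can disagree in the last bits.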
Rick Pechter, PMML
Phil is correct that PMML supports a finite number of model types, and I can understand why this would be seen as a limitation for an algorithm producer. But the standard covers the vast majority of algorithms required by users. More algorithms are being added to the standard all the time.
For a model consumer, PMML has the added benefit of separating the model definition from the model execution. Using PMML, my company can accept models from a wide variety of sources without having to build special drivers for each vendor. More importantly, we don't have to worry about viruses, malware and other security issues that would prevent most IT organizations from blindly deploying alien code into their data infrastructure. Our scoring engine simply parses the XML and scores the models.
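As a rough illustration of "parses the XML and scores the models", here is a minimal Python sketch for a linear regression. It is not our engine: the fragment is a simplified, namespace-free piece of a PMML RegressionModel, and the field names and coefficients are made up. Because the model is only data, nothing from the model file is ever executed as code.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on PMML's RegressionModel element.
# A real file wraps this in <PMML> with a namespace, header and data dictionary.
PMML_FRAGMENT = """
<RegressionModel functionName="regression">
  <RegressionTable intercept="1.5">
    <NumericPredictor name="age" coefficient="0.25"/>
    <NumericPredictor name="income" coefficient="0.5"/>
  </RegressionTable>
</RegressionModel>
""".strip()

def score(pmml_text, record):
    """Score one record: intercept plus the sum of coefficient * field value."""
    table = ET.fromstring(pmml_text).find("RegressionTable")
    total = float(table.get("intercept"))
    for predictor in table.findall("NumericPredictor"):
        total += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return total

result = score(PMML_FRAGMENT, {"age": 40, "income": 2.0})
print(result)  # 12.5
```

A production engine would also have to validate the document against the schema, handle missing values and support the other model types, none of which is shown here.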
I like Phil's suggestion of an open PMML scoring engine. Over time, I wouldn't be surprised if one became available via the open source community.
Saed Sayad (iSmartsoft), Universal PMML Reader
We surely need a common way to save models, not only for scoring but also for all other model-related information (e.g., data dictionary, variable statistics, etc.). However, the difficulty is finding a universal PMML reader with the capability to generate at least the scoring code in SQL, C, Java, SAS, VB and more.
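The code-generation half of such a reader could be sketched as follows in Python, assuming the model has already been read into memory. The `model` dict, the field names and the `customers` table are hypothetical, and only a linear model with two targets (SQL and C) is covered; a universal reader would need one such generator per model type and per target language.

```python
# Hypothetical in-memory model, e.g. as loaded from a model file.
model = {"intercept": 1.5, "coefficients": {"age": 0.25, "income": 0.5}}

def linear_expression(model):
    """Build 'intercept + c1 * x1 + ...', which is valid in both SQL and C."""
    terms = [repr(model["intercept"])]
    terms += [f"{coef!r} * {name}" for name, coef in model["coefficients"].items()]
    return " + ".join(terms)

def to_sql(model, table):
    """Emit a SELECT statement that scores every row of the given table."""
    return f"SELECT {linear_expression(model)} AS score FROM {table}"

def to_c(model):
    """Emit a C function taking one double argument per model field."""
    args = ", ".join(f"double {name}" for name in model["coefficients"])
    return f"double score({args}) {{ return {linear_expression(model)}; }}"

print(to_sql(model, "customers"))
print(to_c(model))
```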
Alex Guazzelli (Zementis), PMML
PMML is constantly evolving to allow for the representation of more modeling techniques as well as to be easier to produce and consume. It aims to cover the most widely used predictive techniques, and so it is limited to these techniques. However, it does allow users to represent data transformations, which in my view gives the standard great flexibility.
It is not very productive to replicate a single solution in many different formats (for different packages) if you could represent it in a single format. Obviously such a format needs to be understood by all your favorite statistical packages. PMML is such a format. R, SAS, SPSS, and a range of other data mining packages already support the standard in one way or another. Obviously, the more people that start using PMML, the more statistical packages will support it. Open standards are the way to go.
When we started working on a scoring engine for our models, we decided not to write yet another language to represent models and so opted to use PMML. Actually, the engine itself was born out of the necessity to deploy the models we were building (using different packages as well as our own internal code) into our client's production environment.
Phil Brierley (Tiberius Data Mining), PMML
A common way to score models is a good idea, but using PMML is too inflexible. It relies on the scoring engine to know what to do. If my company develops a fancy new algorithm, then how do I convert this to PMML? Our current solution is to generate scoring code in as many formats as possible, such as SAS, SPSS, SQL, Excel, C, etc., so most people can score data in their favourite application.
In our experience your average Joe has never heard of PMML and would not know what to do with the XML code.
A more useable/workable idea to make models (or equations) more transferable would have been for the software vendors to fund a freely available scoring engine that can interpret a more flexible and developed language such as SAS, C or even VBA. Anyone could then easily write a model to be scored.