KDnuggets : News : 2007 : n23 : item3

Features


Subject: Ronny Kohavi Interview: on MLC++, SGI, and MineSet

This is the second part of the interview. (Here is the beginning of the interview with Ronny Kohavi.)

Gregory Piatetsky-Shapiro: After Stanford you moved to SGI where MLC++ became part of their MineSet tool.
How difficult was it to adapt your PhD thesis work into a product?

Ronny Kohavi:

MLC++, the Machine Learning library in C++ (http://www.sgi.com/tech/mlc/), wasn't my thesis; rather, it was a tool that I built with the help of multiple students in order to test machine learning algorithms. It was extremely useful for implementing research ideas, and all my thesis algorithms were developed using MLC++.

When I interviewed for a job, I talked to several companies about using MLC++. IBM, for example, said they can't put public domain software in a commercial product (the company's policy has changed significantly since 1995). SGI, on the other hand, said: wow, this could shave two years from our product development cycle. I went to SGI.

To answer the specific question, it was extremely easy to take the work and develop it further at SGI, but that's because of the culture at SGI. Because it was public domain, Stanford also gave permission to SGI to host it for public use, and at SGI we continued to share source-code level revisions on the web site.

At SGI, we used MLC++ for building classifiers, and either repurposed or invented unique visualizations for providing insights. For the backend algorithms, some of the early multi-algorithm comparisons were done with MLC++ and described in robotics.stanford.edu/~ronnyk/mlcj.pdf .

For visualizations, the team built impressive models specifically to convey the meaning of the classifier and help interpret the predictions. Some QuickTime videos are available that show the ideas at the bottom of http://ai.stanford.edu/~ronnyk .

Specifically:

  • Decision Tree. This is a great example where decision trees with many nodes can be visualized effectively. The visualizer was based on SGI's file system navigator, shown in the Jurassic Park movie.
  • Evidence Visualizer (aka Naive Bayes). This is an example of how to make conditional probabilities easier to understand. Working in log space, they add up (instead of being multiplied) so the concept of additive "evidence" is easy to understand.
  • Decision Table. This classifier is simple yet effective and is easy to visualize.
  • Splat Visualizer. This allows visualizing large amounts of data by creating Gaussian splats.
  • Scatter Visualizer. This allows visualizing scatterplots with sliders for a total of 7 dimensions: X, Y, Z, color, size, and two sliders.
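
The additive "evidence" idea behind the Evidence Visualizer can be sketched in a few lines. The following is a minimal illustration in Python, not MineSet or MLC++ code: the function names and the example probabilities are hypothetical, and it assumes a two-class Naive Bayes model where each attribute value contributes log(P(value | class) / P(value | other class)) to the total.

```python
import math

def log_evidence(likelihoods_pos, likelihoods_neg, prior_pos=0.5):
    """Return (prior term, per-attribute evidence terms, total log-odds).

    likelihoods_pos[i] is P(attribute value i | positive class) and
    likelihoods_neg[i] is P(attribute value i | negative class).
    All inputs here are illustrative assumptions.
    """
    prior_term = math.log(prior_pos / (1.0 - prior_pos))
    # In log space each attribute contributes one additive term.
    terms = [math.log(p / q) for p, q in zip(likelihoods_pos, likelihoods_neg)]
    return prior_term, terms, prior_term + sum(terms)

def posterior_from_log_odds(log_odds):
    """Convert total log-odds back to P(positive class | attributes)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def posterior_by_multiplying(likelihoods_pos, likelihoods_neg, prior_pos=0.5):
    """Same posterior computed the usual way, by multiplying probabilities."""
    num = prior_pos * math.prod(likelihoods_pos)
    den = num + (1.0 - prior_pos) * math.prod(likelihoods_neg)
    return num / den

# Hypothetical conditional probabilities for three attribute values.
pos = [0.8, 0.3, 0.6]
neg = [0.2, 0.6, 0.5]

prior_term, terms, total = log_evidence(pos, neg)
for i, t in enumerate(terms):
    direction = "for" if t > 0 else "against"
    print(f"attribute {i}: evidence {t:+.3f} ({direction} the positive class)")
print(f"posterior from summed evidence: {posterior_from_log_odds(total):.4f}")
print(f"posterior from multiplication:  {posterior_by_multiplying(pos, neg):.4f}")
```

Positive terms count as evidence for the class and negative terms as evidence against it, so the contribution of each attribute can be read off directly, while the two posterior computations agree.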




Copyright © 2007 KDnuggets.