Interview: Michael Berthold, President and Founder of KNIME, on Data Mining, Startups, and Visual Workflow

We discuss KNIME's key features and how it compares to the competition, the KNIME business model, pharma, planned development, and the transition from an academic project to a company.

By Gregory Piatetsky, @kdnuggets, Aug 9, 2014.

Prof. Dr. Michael Berthold is the founder and president of KNIME.com AG, makers of the popular KNIME open source data mining and processing platform.

I have known Michael for many years, and have watched his transition from a leading researcher and organizer of IDA (Intelligent Data Analysis) meetings to the founder and president of a successful company. In our meetings at different conferences Michael always had interesting comments and observations, so I was glad he found time to answer my questions.

In the first part of the interview I asked him about KNIME the company, KNIME the software platform, upcoming features, his research, Big Data hype, and more - read on! Michael's bio is at the end of this post. Here is part 2 of the interview.

Gregory Piatetsky, Q1. You are now best known as the founder and president of KNIME, a leading open source data mining platform. What are the key features of KNIME, and how does it compare to other platforms: commercial like SAS, open source like Weka, and in-between like RapidMiner?

Michael Berthold: There are many differences: the license, the GUI, the breadth of functionality. KNIME comes under an open source license like Weka, so we can integrate other open source projects such as R and Weka. We also have a super active community that continuously adds cool new features. Our community contributions are a huge asset, adding highly specialized functionality that you have a hard time finding in other, closed tools.

Like many others, KNIME started at a university - however, it didn't start as a research project but from day #1 as a well-designed software project since we knew from the beginning that we needed to build a professional scale platform - a software architect was part of the founding team! That has resulted in KNIME consistently getting very positive ratings when it comes to stability and scalability.

Another, lesser known difference is the graphical workflow editor. All of the tools you list above have that, too, but it was added afterwards. Only KNIME really runs underneath the hood what you model visually. Many of the other tools are based on a different (scripting...) representation that the visual representation must be matched to. And the result you can see in the Rexer Analytics Data Miner Survey, Gartner and other reports: users consistently rank KNIME highest for ease of use.

Finally, KNIME has probably the most comprehensive set of ETL nodes (in addition to lots of analytics and visualization nodes) out there. Thanks to the extensive and powerful ETL component, thanks to its professional open architecture, and thanks to its active community, KNIME enables quick, easy, and seamless integration of tools and data.

GP: Q2. What is KNIME's business model? How do you plan to compete with other companies that have software licenses or a business-source model (where only limited versions are free, and the latest version requires payment)?

MB: KNIME creates revenue by licensing additional tools around the open analytics platform KNIME. Those tools allow you to run KNIME more productively in larger teams (such as the KNIME Server), deploy KNIME workflows to others in your group (via the KNIME WebPortal), or make it easier to access Big Data (via the Cloudera-certified Big Data Extensions). This is all stuff you can also do with the open source KNIME, but our commercial software just makes your life easier or simply saves you time. We strongly believe in this concept - others provide outdated or crippled, lightweight versions that you cannot use for real work - with open source KNIME you can do everything you want. And this model is working well: KNIME is profitable and growing strongly.

GP: Q3. KNIME seems to be especially popular in pharma and life sciences. What makes it so? Did KNIME contribute to discovering new drugs or medicine?

MB: That's an impression from the past. KNIME initially found very quick traction in life science research because people there had big data problems before that term even became popular. Nowadays our user base in pharma still grows but it grows much faster in other areas, such as customer intelligence, predictive maintenance, and others.

I'd like to believe that KNIME was a critical part of the discovery of a new drug but that's hard to validate. Drug discovery is a complex and long process. Plus I am sure nobody would ever admit that out of fear that we'd demand royalties :)

GP: Q4. Can you share some of the developments and exciting features planned for KNIME?

MB: We are working hard on a lot of things - the Big Data Extensions will grow further to embrace other platforms as well, and we will be reaching out to some of the machine learning libraries there too. We are also making all of our visualizations "web aware" so that KNIME can ultimately be run on the web. And we are also working on our Enterprise Server setup, so that big companies can share workflows using distributed servers around the globe. And we will continue to add lots of smaller and bigger nodes to KNIME, of course. We have a completely new Python integration in the works and will also add better JSON support in the next version, to name just two categories.

GP: Q5. Tell us about the transition of KNIME from a Konstanz University project to a company - what were the key transition points? What was most surprising to you?

MB: As I said before, we knew from day #1 that we were building a professional scale, open analytics platform (note that I am not claiming we knew from day #1 what a successful business model around that platform would look like, though). So the transition to create a spin-off and later move the company to Zurich was pretty natural. At some point it became obvious that work needed to be done that was not really useful for an academic research group anymore.

Were there surprises? On the business side we all learned a lot of new things. But also on the software side. To be quite honest, I didn't initially expect that the visual workflow editor would have such an impact. I found it the most natural way to do things, but I underestimated what impact it would have for others to document what they were doing and create blueprints of sophisticated analyses that others could use as a template.

Bio: Michael Berthold got his PhD from Karlsruhe University, Germany, and then spent over 7 years in the US - at CMU, Intel, UC Berkeley, and as director of an industrial think tank in San Francisco.

Since August 2003 he has held the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University, Germany, where his research focuses on using machine learning methods for the interactive analysis of large information repositories in the Life Sciences.

Most of the research results of Michael Berthold's group are made available to the public via the open source data processing platform KNIME. In 2008 Michael Berthold co-founded KNIME.com AG, located in Zurich, Switzerland. KNIME.com offers consulting and training for the KNIME platform in addition to an increasing range of enterprise products.

M. Berthold is a Fellow of the IEEE, Past President of the North American Fuzzy Information Processing Society, Associate Editor of several journals, and Past President of the IEEE Systems, Man, and Cybernetics Society. He was involved in the organization of various conferences, most notably the IDA series of symposia on Intelligent Data Analysis and the conference series on Computational Life Science. With David Hand, he co-edited a successful textbook, "Intelligent Data Analysis: An Introduction". He is also a co-author of the "Guide to Intelligent Data Analysis" (2010).