Ingo Mierswa and Ralf Klinkenberg are co-founders of the open-source company Rapid-I, which provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The main product of Rapid-I, the data analysis solution RapidMiner, is today the world-leading open-source system for professional data mining. It is available as a stand-alone application for data analysis and as a data mining engine that can be integrated into other products.
I am pleased to present this interview with both of them. Gregory PS
Gregory Piatetsky-Shapiro: 1) How did you and your colleagues start Rapid-I?
Ingo: The story around Rapid-I is closely connected to the history of RapidMiner. Back in 2001, we first started developing a data mining software environment named YALE as research assistants of Prof. Dr. Katharina Morik at the University of Dortmund, Germany. We envisioned a data mining tool that was more flexible and far more powerful than the tools available in the market. We made YALE available as open source software from the first version on and it quickly attracted lots of users.
Ralf: In 2006, when I had already left the university and worked as a data mining freelancer, the number of requests for RapidMiner consulting and training increased so much that Ingo and I decided to start the company Rapid-I, to provide professional support, training, consulting, projects and other data mining services for the software users. YALE was completely rebuilt to better meet the scalability and robustness requirements of global corporations with large volumes of data and renamed RapidMiner to better describe its key feature: rapid data mining application development.
Ingo: Now we had a widely used analysis solution and a great team providing all necessary services for professional analysts. From that point on, we extended our data mining and text mining product and service portfolio as well as our customer base. Rapid-I now serves more than 200 customers world-wide.
2) Tell us about the best features of the software RapidMiner and the company Rapid-I behind it.
Ralf: RapidMiner is today by far the most comprehensive data mining solution for all steps of the data mining process, from data loading and transformation to descriptive and predictive modelling, model evaluation, deployment, and reporting. Key features of RapidMiner are its enormous flexibility and functional breadth, which support all kinds of data mining, text mining, web mining, audio mining, time series analysis and forecasting, and predictive analytics tasks. This flexibility and rapid deployment in projects lead to really fast project implementations.
Ingo: Combine this with the open source model free of license fees and you get not only fast project implementations but also unbeatably low total cost of ownership and a really fast return on investment. The Rapid-I team helps its customers with its expertise to get a faster start and to achieve the most efficient and effective deployment of innovative data mining technology for their needs. This allows our customers not only to be more innovative with their products and services than their competitors, but also to be more effective and profitable. By the way, RapidMiner has been downloaded more than 500,000 times and is now used in more than 50 countries world-wide.
3) What are the challenges of running a business based on open-source software?
Ingo: Well, you give away a world-leading data mining solution for free instead of becoming rich by charging high license fees. So the only way to be successful as an open source company is to be more innovative and to provide better expertise and better services than the competition. The customers only pay for the services they want and that are valuable to them. They do not pay license fees and there is no vendor lock-in. If there is a better service provider, they can switch. And if the services do not provide good value, they do not pay anything. So there is no choice for an open source business but to be better than the competition and to provide true value at a fair price.
Ralf: Let me add that, so far, Rapid-I has mastered this challenge extremely well and doubled the number of customers, the sales volume, and the team size every year. The good side of the open source business model is: if you provide world-leading, high-quality software for free and offer correspondingly high-quality services at a fair price, the software and the word-of-mouth spread really fast, and that makes the business grow really fast without extensive marketing budgets ...
Ingo: ... which at the end again is better for our customers ...
Ralf: Right. There was no need for venture capital or other external funding for us. We were profitable from the start and achieved our growth rates by organic growth of our customer base.
4) Ingo, before Rapid-I you earned a Ph.D. on "Non-Convex and Multi-Objective Optimization for Numerical Feature Engineering and Data Mining". Can you tell us the main idea of your thesis?
Ingo: For me, data mining is always about a trade-off: you of course prefer models or model parameters with fewer prediction errors on known data. But at the same time, you also prefer less complex models in order to avoid overfitting. Those goals are clearly conflicting, and I made this trade-off explicit and solved both problems simultaneously by means of multi-objective optimization techniques. Those deliver the full Pareto front of all possible and meaningful models within a single optimization run and hence in exactly the same time as the known optimizations for a single model. The user can then simply choose one of the meaningful models afterwards, once all the information is available.
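The trade-off described in this answer can be illustrated with a minimal sketch. It is not the method from the thesis and not RapidMiner code: it only filters a handful of hypothetical candidate models, each with a made-up prediction error and complexity value, down to the non-dominated ones (the Pareto front). The names `Model`, `dominates`, `pareto_front`, and all the numbers are assumptions chosen for illustration.

```python
from typing import List, Tuple

# (name, prediction error, model complexity); both objectives are minimized.
Model = Tuple[str, float, int]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical candidates, invented purely for this illustration.
candidates = [
    ("A", 0.30, 2),
    ("B", 0.20, 5),
    ("C", 0.25, 5),   # same complexity as B but higher error
    ("D", 0.10, 12),
    ("E", 0.12, 15),  # higher error and more complex than D
]

front = pareto_front(candidates)
print([name for name, _, _ in front])  # prints ['A', 'B', 'D']
```

Models C and E drop out because another candidate is at least as good on both objectives. The remaining models A, B, and D each represent a different balance between error and complexity, and picking among them is the choice left to the user.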
Here is part 2 of the interview