Ingo Mierswa and Ralf Klinkenberg are co-founders of the open-source company Rapid-I, which provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining.
Here is part I of the interview
5) Ralf, what attracted you to data mining and research?
Ralf: The mechanisms of learning fascinate me, both learning in humans and animals and learning in machines. Therefore I focused my studies and research on machine learning, data mining, and descriptive and predictive modelling. Developing software for automated learning and process automation was then just a natural way to go if you are interested not only in the theory of what can be learned, how it can be learned, and what the limits of learnability are, but also in making this knowledge deployable so that it works for many people and organizations.
6) Ingo, you are a very young CEO and Ralf, same for you as CBDO. How did you make the transition from graduate student to those positions?
Ralf: The transition happened quite smoothly. In 2001, I initiated the open source project RapidMiner / YALE with a certain vision of flexibility, broad applicability, and extensibility in mind.
Ingo quickly took over the project lead and headed the development team and the further development of RapidMiner, while I focused on enabling innovative data mining applications with our software and on attracting new users and customers for these solutions. So it seemed quite logical to us that Ingo would lead the technology development at Rapid-I and, thanks to his excellent leadership skills, Ingo was also the best fit to become the Rapid-I CEO. For me, the focus stays on developing ideas for innovative data mining solutions and on helping customers and partner companies make these visions work for their businesses.
Ingo: As a CEO, I often serve as a connection point between the world inside and outside of Rapid-I. The ability to always keep an overview, the ability to motivate people, and strong communication skills are definitely important for this - and they are things which I always try to improve.
7) What future developments (both in Rapid-I and outside) make you excited?
Ingo: RapidMiner 5 is going to mark a real milestone for data analysis: in the new version, RapidMiner continuously checks and propagates the meta data through the analysis process. This allows design errors to be detected on the fly, as early as possible, and enables the quite cool quick fixes, which are a great help in analysis design, especially for less experienced users. I am sure that this meta data handling in RapidMiner will be the start of a revolution, since it will completely change the way analysts define their data mining processes.
Ralf: Another pretty cool result arises from our cooperation with the open source enterprise database provider Ingres and the highly innovative data warehouse experts of VectorWise. Together, we are developing next-generation data mining solutions that extend the scalability of data mining to larger data sets. Together with Ingres VectorWise, we demonstrated the results at the Open Source Business Intelligence (OSBI) event last December: we used the same large data set for an analysis in memory and directly in the database. The surprising result was that we did not need any memory on the client at all and - more importantly - the analysis was ten times faster in the database than in memory. We got standing ovations for that.
Ingo: Some people claim that open source solutions are merely copies of established closed source solutions. They are wrong. The speed of development is much faster, the release cycles are shorter, and the time from an idea to a deployable solution is much shorter. I really believe that many open source companies will outpace their closed source competitors.
For example, Rapid-I was one of the first providers of solutions for automated sentiment analysis and online market research, customer insight, and competitive intelligence. We quickly attracted some of the leading European and US market research companies as well as major brand product manufacturers, either with RapidMiner as text mining and sentiment analysis software, with corresponding web services like RapidDoc, or with full service offerings like RapidSentilyzer. Other application areas are currently emerging, such as predictive maintenance, machine failure prevention, process monitoring, quality control, and manufacturing optimization, as well as intelligent network analysis and visualisation with our new tool RapidNet. Rapid-I has a well-filled product and service pipeline to be rolled out in 2010.
Ralf: For the data mining industry as a whole, I see strong trends toward larger data sets, longer time periods of data collection being considered, finer grained data collections, holistic data collection covering all aspects of customer interaction or of a production process for optimization, more scalable solutions, etc. At the same time, increased usability and more seamless integration of data mining into bigger solutions will also become more important. Accordingly, we see a strong increase in other software vendors and integrators wanting to use RapidMiner as a powerful OEM data mining, text mining, and/or time series analysis and forecasting engine inside their products. In the last three months we had more OEM requests than in the last three years.
8) What was a recent book you read and liked?
Ingo: The last book I read was Lawrence Norfolk's "Lempriere's Dictionary". I love the creativity of the author as well as the different layers of very similar things happening at different times to different people - history repeats itself, even if it is partly fictional.
Ralf: My recent readings are quite diverse and include, besides a lot of data mining, business, and economics literature, novels like "Drei Minuten mit der Wirklichkeit" (German) by Wolfram Fleischhauer. This book lets you dive into a different world where technology plays no major role, which I sometimes really like. Our world is quite multifaceted, and in the real world, too, you can dive into quite different "sub-worlds" or "realities" - or cultures and subcultures or communities, or however you like to view and call them.
9) What advice would you give to young students considering data mining?
Ralf: Learn and understand the basics of machine learning, statistics, and data mining, and imagine what you can do with these tools. It's hard to find a place where you can get a good education in both statistics and algorithms. We are really lucky to have such a place in Dortmund.
Ingo: Use your common sense, your imagination, and your creativity to come up with new approaches and applications. That is fun, quite satisfying, and could be the basis for a successful research or business career or for your own business.