Interview: Ingo Mierswa, RapidMiner CEO on “Predaction” and Key Turning Points

RapidMiner CEO Ingo Mierswa talks about "predaction", reasons for RapidMiner popularity, business source model, analytics to investigate fraud, key turning points, and more.

By Ajay Ohri, DecisionStats, June 2014.

Here is my interview with Ingo Mierswa, co-founder and CEO of RapidMiner.

Ingo Mierswa is an industry-veteran data scientist who began developing RapidMiner at the AI Division of the University of Dortmund, Germany. Mierswa has authored numerous award-winning publications about predictive analytics and big data. Mierswa, the entrepreneur, is the founder of RapidMiner. Under his leadership, RapidMiner has grown by up to 300% per year over the past five years. In 2012, he spearheaded the go-international strategy with the opening of offices in the US.

Ajay Ohri: Q1. RapidMiner continues to be very popular data mining software. Based on your customer feedback what are the reasons for this popularity?

Ingo Mierswa: Everything at RapidMiner follows three simple principles: "predaction", collaboration and simplicity. "Predaction" represents the fact that, contrary to popular belief, the biggest value of predictive analytics is not in high-level predictions, but rather in performing millions of micro-predictions and acting on those predictions.

"What is the weather predaction for tomorrow?" - "I will bring my umbrella!"

The value is not in the knowledge that it is going to rain; the true value lies in determining the best option for you when this happens so you can be prepared. Should you stay home? Do you bring your umbrella? RapidMiner is a platform that can create those millions of predictions and trigger the right business actions as a result.
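As an editorial illustration of the "predaction" idea (this is not RapidMiner code), the following minimal Python sketch maps each micro-prediction to an action. The `Forecast` type, the `choose_action` and `predact` helpers, the thresholds and the action names are all hypothetical, chosen only to mirror the umbrella example.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    day: str
    rain_probability: float  # hypothetical model output in [0, 1]


def choose_action(forecast, umbrella_at=0.4, stay_home_at=0.9):
    """Turn one prediction into an action (thresholds are assumptions)."""
    if forecast.rain_probability >= stay_home_at:
        return "stay_home"
    if forecast.rain_probability >= umbrella_at:
        return "bring_umbrella"
    return "no_action"


def predact(forecasts):
    """Apply the decision rule to every micro-prediction."""
    return {f.day: choose_action(f) for f in forecasts}


forecasts = [
    Forecast("Mon", 0.95),
    Forecast("Tue", 0.55),
    Forecast("Wed", 0.10),
]
actions = predact(forecasts)
```

The point of the sketch is that the prediction itself is only an intermediate value; the output that matters is the action chosen for each case.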

Collaboration means that we offer teams of people with different backgrounds a white board on which they can express their ideas on data integration, transformation, and modeling and turn them into reality with a single click. It's true that RapidMiner makes predictive analytics easy for business analysts since the product requires no programming and users can build analytics processes with a drag and drop interface. And indeed, across our customer base, the business analyst is often in the driver's seat. But if business users can contribute the business problem, then data scientists are freed up from standard tasks and can focus on the specialized algorithms where needed, and IT professionals can contribute the data and control access rights. RapidMiner empowers such a team to effectively reach the best solution together.

Finally, simplicity means that everybody can create predictions and predactions within just a few minutes. We identified a speed-up of up to 40-50 times compared to pure scripting approaches for data integration, transformation, modeling, deployment and maintenance. It is amazing how RapidMiner achieves this ease-of-use and performance gain given that it supports more than 1,500 analytical operations, including hundreds of methods for data integration, data transformation, data modeling and data visualization - with access to data sources including Excel, Access, Oracle, IBM DB2, Microsoft SQL, Sybase, Ingres, MySQL, Postgres, SPSS, dBase, text files and more.

AO: Q2. You embraced open source for RapidMiner many years ago. As an early adopter of the open source model, what have been your good and bad experiences? What would you say to other enterprise software creators who have been reluctant and even mildly fearful of intellectual property protection once their existing popular software goes partially or fully open source?

IM: We're proud of our open source roots and have a very large community of dedicated users that create add-ons to our software, which we will continue to support. But a pure open source-based business model poses a lot of challenges - and handling of intellectual property rights is only one of them. It is important to find a balance between being open and embracing innovation as well as offering extra value for paying customers.

Over the last six years, we evaluated multiple open source business models before settling on our existing business source model. We started with a pure open source model where the software was completely open and customers would only pay for guarantees and technical support. While this might work for mission-critical infrastructure parts like Red Hat's operating system, it is not a good model for business applications like RapidMiner. We then moved on - together with other large open source vendors like Jaspersoft, Pentaho and Talend - to offer our software in an open core model where the core of the software was open and freely available but customers had to pay for premium features.

But this model has disadvantages as well, including the fact that it is disconnected from the original concept of open source since those premium features will never be available for the community. It also poses challenges in product strategy since each feature could only support either community growth or conversion, but not both.

Our business source model represents a perfect balance of this community support and a scalable business model. The idea behind business source is incredibly simple: the latest and greatest version of RapidMiner is available under a freemium model, while previous versions are available under an open source license.

Popularized by Michael (Monty) Widenius, one of the founders of MySQL and an investor in RapidMiner, business source is a commercial software license model that offers many of the benefits of open source, but with a built-in time delay on users being able to access new versions of our products. This allows us to deliver feature-rich versions of the software to all groups of users, while commercial, paid users are able to analyze larger data sets and connect the software to more data sources.

AO: Q3. In the history of RapidMiner what are the top 5 turning points that you have seen?

IM: We started the development of RapidMiner under the name of "YALE" in 2001, and although we basically rewrote the complete product three times, we cover a history of more than ten years and have been through all phases of the market.

We started with a pure computation engine for analytical processes in 2001, and the first big turning point was when we introduced the graphical user interface, which was flexible enough to perform, for example, loops and cross-validation for preprocessing to measure its effect, and many other complex analytical tasks - all without the need for writing a single line of code. We achieved a flexibility known from programming languages like Base SAS or R but combined it with the ease of use of a SAS Enterprise Miner or Clementine, which later on became SPSS Modeler.

Believe it or not, the second turning point was the creation of a Windows installer which helped to overcome the technical hurdles of installing RapidMiner and the necessary underlying technologies. This doubled and tripled our community growth immediately.

The next significant phase was the development and release of RapidMiner 5 in 2010. We made a complete paradigm shift from our process trees to workflows. However, at the same time, we kept RapidMiner-specific features like the ability to propagate metadata through the process and detect errors during the design phase. This was also a major turning point towards usability and analyst support. With RapidMiner 5, we also introduced our Server product which allowed for remote execution, scheduling and one-click integration into business processes.

The fourth and fifth key milestones took place in 2013. The fourth was the change in our business model to business source, which I described previously. And fifth, the release of RapidMiner 6, which included our new "accelerators" that allow users to get the first predictions and predictive insights within five minutes after installation. All users have to do is pick a problem type like "churn" and throw in their data, and RapidMiner builds the necessary process. Users can then use this process as a starting point for optimizations, add more of their own data, or deploy it directly on the server for integration. Accelerators are a perfect blend between the ease-of-use of a shrink-wrapped vertical solution and the power and flexibility of a platform like RapidMiner.

AO: Q4. Analytics to investigate cyber-crimes - How can RapidMiner help in this area?

IM: In a broad sense, cyber-crime covers all types of crimes committed with the help of computers and networks. Typical examples are fraud detection in transactions, phishing attempts, identification of offensive content, attacks against computers or networks and even spam, which is considered unlawful in many jurisdictions.

[Figure: RapidMiner dashboard for Fraud Analytics]

We have seen applications of RapidMiner in all of these named examples. RapidMiner is often used as a central part of an intrusion detection system to analyze web logs for unusual patterns. We have also seen a lot of applications in the field of fraud detection, where both supervised and unsupervised machine learning methods of RapidMiner have been modeling fraud cases in financial transactions, and also in insurance claims, mainly in the health insurance sector.
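To make the unsupervised side concrete, here is a minimal sketch of outlier flagging on transaction amounts using the modified z-score (median and median absolute deviation). It is an editorial illustration, not the method used by RapidMiner or its customers; the `flag_outliers` helper, the 3.5 cutoff and the sample amounts are assumptions.

```python
from statistics import median


def flag_outliers(amounts, cutoff=3.5):
    """Return indices whose modified z-score exceeds the cutoff.

    Uses the median absolute deviation (MAD), which is less affected
    by the outliers themselves than mean and standard deviation.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        # No spread to measure against; flag nothing in this sketch.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > cutoff
    ]


# Made-up transaction amounts; the last one is far from the rest.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 950.0]
suspicious = flag_outliers(amounts)
```

A supervised approach would instead train a classifier on transactions already labeled as fraudulent or legitimate; the unsupervised variant above needs no labels, and in practice a flagged case would typically go to a human for review.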

Additionally, RapidMiner is used in the banking sector to identify logins that are a consequence of a phishing attack. All of these examples clearly show how predictive analytics, and RapidMiner, can help to identify cyber-crime. But we are still not seeing a "true" predictive approach since all identifications are reactive in nature. Personally, however, I would not like the idea of the movie Minority Report becoming a reality, so keeping a human being in the loop is the way to go.

AO: Q5. What is one use or application of RapidMiner that has surprised you with its popularity and creativity, which you had not thought of while creating it?

IM: I am surprised at how many users are deploying RapidMiner for sports analytics. This was definitely unexpected and came as a big surprise. We have seen applications of RapidMiner in sports betting, in predicting the winner of the World Series or the World Cup, and for identifying underrated players on the transfer market. We have even seen the use of RapidMiner for creating predictions during the games themselves, such as predicting the most likely landing zones for a certain combination of a pitcher and batter in baseball.

AO: Q6. Name a few anecdotal case studies where companies achieved analytics excellence by blending in RapidMiner and its R extension?

IM: We constantly see cases where optimal results are achieved when each member of an analytical team can focus on their key strengths. R programmers are true data scientists and, as the name "scientist" suggests, they are at their best when they focus on the creation of a new algorithm for a completely new task. But in order to work on this time-intensive task, they need to be freed from standard tasks related to data integration, transformation and visualization.

One of our customers in the automotive sector has optimized this approach. The complete team for advanced analytics is working as a center of excellence and supports multiple other business units with predictive intelligence and prediction-based action. The team consists of approximately 10 people with the majority being business analysts, plus one database expert and one data scientist creating new algorithms. In just two weeks, this team created and deployed the overall workflows for predictive maintenance of their machines. The data scientist used R to create an ensemble of methods using the preprocessed sensor data coming from RapidMiner. The overall workflow, as well as the notifications, are all governed by RapidMiner while the actual prediction was created by a specialized new algorithm created in R. This is a perfect example of how a collaborative team can blend the easier maintenance and reproducibility of RapidMiner with the programming possibilities of R to efficiently achieve a solution within a shorter time frame.

AO: Q7. What are some of the challenges you foresee in the immediate and medium term for your company as well as your sector and domain?

IM: I think the biggest challenge for our field, as I mentioned above, is that many organizations believe that they can only rely on data scientists to work on predictive and advanced analytics. We are focused on changing this paradigm by empowering a team of experts with different backgrounds to help us achieve the results we're looking for, and other organizations should be using this approach as well in order to be successful.

Ajay Ohri is the founder of DecisionStats and the author of R for Business Analytics (Springer 2012).