Data Science of Process Mining – Understanding Complex Processes

Process Mining is introduced and explained, including its benefits for data science and key resources for exploring it further, such as videos, articles, and MOOCs.

By Anne Rozinat, PhD and Christian W. Gunther, PhD.

Imagine that your data science team is supposed to help find the cause of a growing number of complaints in the customer service process. They delve into the service portal data and generate a series of charts and statistics for the distribution of complaints over the different departments and product groups. However, in order to solve the problem, the weaknesses in the process itself must be identified and communicated to the department.

You then include the CRM data, and with the help of Process Mining you can quickly identify unwanted loops and delays in the process. These variations are even displayed automatically as a graphical process map! The head of the CS department can see at a glance what the problem is and can immediately take corrective measures.

This is exactly where we see growing enthusiasm for Process Mining across all industries: the data analyst can not only provide answers quickly but also speak the language of the process manager and display the discovered process problems visually.

Data scientists move deftly through a whole range of technologies. They know that 80% of the work consists of processing and cleaning data. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. But for many of them, Process Mining is not yet part of the data science toolbox.

What is Process Mining?

Process Mining is a relatively young technology that was developed about 15 years ago at Eindhoven University of Technology by the research group of Prof. Wil van der Aalst. Given the name, it seems to be related to the much older field of data mining. Historically, however, Process Mining has its origins in business process management, and current data mining tools contain no Process Mining technology.

So what exactly is Process Mining?

Process Mining allows us to map and analyze complete processes based on the digital traces they leave in information systems. A process is a sequence of steps, so the following three requirements must be met in order to use Process Mining:

  1. Case ID: A case ID must identify the process instance, that is, one specific execution of the process (for example, a customer number, order number, or patient ID).
  2. Activity: The most important steps or status changes in the process must be logged. These can usually be found in the business data in the IT system's database (e.g., the date of an offer to the customer in the sales process).
  3. Timestamp: Every process step needs a timestamp so that the steps of each case can be put in the correct order.
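
To make the three requirements concrete, here is a minimal sketch of an event log and of how it is turned into one ordered sequence of steps per case. The rows, activity names, and the `build_traces` helper are hypothetical and only serve as an illustration; a real log would come from the IT system's database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per logged step (case ID, activity, timestamp).
# The rows are deliberately unsorted, as they often are in raw exports.
EVENTS = [
    ("case-2", "Offer sent", "2024-03-02 11:15"),
    ("case-1", "Request received", "2024-03-01 09:00"),
    ("case-1", "Order confirmed", "2024-03-04 16:20"),
    ("case-2", "Request received", "2024-03-01 10:30"),
    ("case-1", "Offer sent", "2024-03-01 14:45"),
]


def build_traces(events):
    """Group events by case ID and order the steps of each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    return {case: [act for _, act in sorted(steps)] for case, steps in by_case.items()}


traces = build_traces(EVENTS)
for case_id in sorted(traces):
    print(case_id, "->", " > ".join(traces[case_id]))
```

Without the case ID the steps could not be assigned to a process instance, and without the timestamp they could not be ordered, which is why all three columns are needed.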

[Figure 1 - Data Requirements Process Mining]

If you find these three elements in your IT system, Process Mining can produce an accurate representation of the process very quickly. The visualization of the process is generated directly from the historical raw data.
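
One simple way to see how a process map can be derived from raw data is to count how often one activity is directly followed by another. The sketch below does this for a few hypothetical cases; it is a simplified illustration of the idea, not the algorithm of any particular Process Mining tool, and the `directly_follows` helper and all activity names are assumptions.

```python
from collections import Counter

# Hypothetical traces: for each case, the activities already ordered by timestamp.
TRACES = {
    "case-1": ["Request received", "Offer sent", "Order confirmed"],
    "case-2": ["Request received", "Offer sent", "Offer sent", "Order confirmed"],
    "case-3": ["Request received", "Offer sent"],
}


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    edges = Counter()
    for activities in traces.values():
        # Pair every step with its successor within the same case.
        edges.update(zip(activities, activities[1:]))
    return edges


dfg = directly_follows(TRACES)
for (source, target), count in dfg.most_common():
    print(f"{source} -> {target}: {count}")
```

Each counted pair corresponds to an arrow in the process map, and the count to its frequency. A pair such as "Offer sent" followed by "Offer sent" shows up as a loop.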

What You Can Do With Process Mining

Process Mining is not a reporting tool but an analysis tool. It enables you to quickly analyze any process, even a very complex one. Take, for example, clickstreams from websites, which show how visitors navigate a site (and where they "drop out" or "wander around" due to poor usability). Or take the new workflow system in your company, which was introduced only recently and for which the department now wants to know how many cases really follow the redesigned, streamlined process path.
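
The workflow question can be sketched as a simple variant count: group the cases by their exact sequence of activities and check what share matches the intended path. The reference path, the traces, and the `variant_counts` helper below are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical reference path of the redesigned, streamlined process.
EXPECTED_PATH = ("Submit", "Review", "Approve")

# Hypothetical traces: for each case, the activities ordered by timestamp.
TRACES = {
    "c1": ["Submit", "Review", "Approve"],
    "c2": ["Submit", "Review", "Review", "Approve"],  # repeated review (loop)
    "c3": ["Submit", "Review", "Approve"],
    "c4": ["Submit", "Approve"],  # review skipped
}


def variant_counts(traces):
    """Count how many cases share exactly the same sequence of activities."""
    return Counter(tuple(activities) for activities in traces.values())


variants = variant_counts(TRACES)
conforming = variants[EXPECTED_PATH]
share = conforming / len(TRACES)
print(f"{conforming} of {len(TRACES)} cases ({share:.0%}) follow the expected path")
for variant, count in variants.most_common():
    print(count, " > ".join(variant))
```

The remaining variants are the interesting part for the department: they show which deviations occur and how often.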

You can display the activity flow as well as the handovers between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process. These process views can also be animated to aid communication with the department: the actual process is 'replayed' based on the timestamps in the data, showing in a very tangible way where the problems in the process are.
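
As a rough illustration of how a bottleneck can be located, the following sketch computes the average waiting time between consecutive steps and picks the slowest transition. The event rows and the `mean_waiting_hours` helper are hypothetical; real tools offer this kind of analysis interactively on the process map.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log rows: (case ID, activity, timestamp).
EVENTS = [
    ("c1", "Submit", "2024-03-01 09:00"),
    ("c1", "Review", "2024-03-01 11:00"),
    ("c1", "Approve", "2024-03-03 11:00"),
    ("c2", "Submit", "2024-03-02 08:00"),
    ("c2", "Review", "2024-03-02 12:00"),
    ("c2", "Approve", "2024-03-06 12:00"),
]


def mean_waiting_hours(events):
    """Average time in hours between each pair of directly consecutive activities."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    waits = defaultdict(list)
    for steps in by_case.values():
        steps.sort()  # order the steps of one case by timestamp
        for (t1, a1), (t2, a2) in zip(steps, steps[1:]):
            waits[(a1, a2)].append((t2 - t1).total_seconds() / 3600)
    return {edge: sum(hours) / len(hours) for edge, hours in waits.items()}


waiting = mean_waiting_hours(EVENTS)
slowest = max(waiting, key=waiting.get)
print("Slowest transition:", " -> ".join(slowest), f"({waiting[slowest]:.1f} h on average)")
```

In this made-up data the step from "Review" to "Approve" takes far longer than the step before it, so that is where an analyst would look first.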

[Figure 2 - Animation in the Process Mining Software Disco]