
Change in Perspective with Process Mining


Process mining focuses on the analysis of processes and is particularly well suited to the exploratory analysis of process-related data. Learn how to use it effectively as an exploratory analysis tool that can rapidly and flexibly take different perspectives on your processes.



By Anne Rozinat, Fluxicon

Data scientists spend a large part of their day on exploratory analysis. In the 2015 Data Science Salary Survey [1], 46% of respondents said that they spend one to three hours per day summarizing, visualizing, and understanding data, even more than on data cleansing and data preparation.

Process mining is focused on the analysis of processes [2], and is an excellent tool in particular for the exploratory analysis of process-related data. If your data science project concerns business or IT processes, then you need to explore these processes and understand them first before you can train machine learning algorithms or run statistical analyses in any meaningful way.

With process mining you can get a process view of the data. The specific process view results from the following three parameters:

  1. Case ID: The selected case ID determines the scope of the process and connects the individual steps of a process instance from the beginning to the end (for example, a customer number, order number, or patient ID).
  2. Activity: The activity name determines the steps that are shown in the process view (such as “order received” or “X-ray examination completed”).
  3. Timestamp: One or more timestamps per step (for example for the beginning and the end of an X-ray examination) are used to calculate the process sequence and to derive parallel process steps.

When you analyze a data set with process mining, you decide at the beginning of the analysis which columns in the data correspond to the case ID, the activity name, and the timestamps. You set these parameters in the configuration when importing the data into the process mining tool.
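To make the role of the three parameters concrete, here is a minimal Python sketch of how a flat event table can be turned into a simple process view. It is not how Disco or ProM work internally; it only counts how often one step directly follows another within the same case. The column names (`order`, `step`, `time`), the sample rows, and the function name `directly_follows` are all made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: column names and values are invented for this sketch.
EVENTS = [
    {"order": "PO-1", "step": "Create Purchase Requisition", "time": "2016-01-04 09:00"},
    {"order": "PO-1", "step": "Approve Purchase Requisition", "time": "2016-01-04 11:30"},
    {"order": "PO-2", "step": "Create Purchase Requisition", "time": "2016-01-04 10:00"},
    {"order": "PO-1", "step": "Send Purchase Order", "time": "2016-01-05 08:15"},
    {"order": "PO-2", "step": "Approve Purchase Requisition", "time": "2016-01-05 14:00"},
]


def directly_follows(events, case_col, activity_col, time_col):
    """Count how often one activity is directly followed by another within a case."""
    # The case ID groups rows into process instances.
    cases = defaultdict(list)
    for row in events:
        ts = datetime.strptime(row[time_col], "%Y-%m-%d %H:%M")
        cases[row[case_col]].append((ts, row[activity_col]))

    counts = Counter()
    for steps in cases.values():
        # The timestamp determines the order of the steps inside each case.
        steps.sort(key=lambda s: s[0])
        # The activity column determines which labels appear in the view.
        names = [name for _, name in steps]
        counts.update(zip(names, names[1:]))
    return counts


graph = directly_follows(EVENTS, case_col="order", activity_col="step", time_col="time")
for (src, dst), n in sorted(graph.items()):
    print(f"{src} -> {dst}: {n}")
```

Because the three columns are passed in as arguments, pointing `case_col` or `activity_col` at a different column yields a different view of the same data without touching the log itself.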

When importing a CSV file into the process mining software Disco, you can specify for each column in your data set how it should be interpreted. (Note: For the open-source software ProM (http://www.promtools.org/) you often use XML formats such as XES or MXML, which contain this configuration.)

In the following example of a purchasing process, the Case ID column (the purchase order number) is configured as Case ID, the start and complete timestamps as Timestamp, and the Activity column as Activity. As a result, the process mining software automatically produces a graphical representation of the actual purchasing process based on historical data. The process can now be further analyzed based on facts.

process-mining-fig1

Usually, the first process view (and the import configuration derived from it) follows from the process understanding and the task at hand.

However, many process mining newcomers are not yet aware that a major strength of process mining as an exploratory analysis tool is that you can rapidly and flexibly take different perspectives on your process. The above parameters function as a lens that lets you view the process from different angles.

Here are three examples:

1. Focus on Another Activity

For the above purchasing process, we can shift the focus to the organizational process flow by setting the Role column (the function or department of the employee) as Activity.

process-mining-fig2

This way, the same process (and even the same data set) can now be analyzed from an organizational perspective. Ping-pong behavior and increased transfer times when passing on operations between organizational units can be made visible and addressed.
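As a rough illustration of this organizational perspective, the following Python sketch treats the role as the activity, collapses consecutive steps performed by the same role, and then counts handovers between roles. It flags a case as "ping-pong" when work returns to a role that had already passed it on; that definition, the role names, the sample rows, and the function names are assumptions made for this example, not something a particular tool prescribes.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (case ID, activity, role, ISO timestamp). All values are invented.
LOG = [
    ("PO-1", "Create Requisition", "Requester", "2016-01-04T09:00"),
    ("PO-1", "Analyze Requisition", "Purchasing Agent", "2016-01-04T11:00"),
    ("PO-1", "Amend Requisition", "Requester", "2016-01-05T09:30"),
    ("PO-1", "Analyze Requisition", "Purchasing Agent", "2016-01-05T15:00"),
    ("PO-2", "Create Requisition", "Requester", "2016-01-04T10:00"),
    ("PO-2", "Analyze Requisition", "Purchasing Agent", "2016-01-04T16:00"),
]


def role_sequences(log):
    """Return, per case, the ordered list of roles with consecutive repeats collapsed."""
    per_case = defaultdict(list)
    for case_id, _activity, role, ts in log:
        per_case[case_id].append((ts, role))

    sequences = {}
    for case_id, steps in per_case.items():
        # ISO timestamps in the same format sort correctly as plain strings.
        roles = [role for _, role in sorted(steps)]
        sequences[case_id] = [
            r for i, r in enumerate(roles) if i == 0 or r != roles[i - 1]
        ]
    return sequences


def handovers(sequences):
    """Count transfers of work from one role to the next across all cases."""
    counts = Counter()
    for roles in sequences.values():
        counts.update(zip(roles, roles[1:]))
    return counts


def ping_pong_cases(sequences):
    """Cases in which work returns to a role that had already handed it off."""
    return sorted(c for c, roles in sequences.items() if len(set(roles)) < len(roles))


seqs = role_sequences(LOG)
handover_counts = handovers(seqs)
ping_pong = ping_pong_cases(seqs)
print(handover_counts)
print(ping_pong)
```

Measuring transfer times would additionally require keeping the timestamps next to each role and taking the difference at every handover, which is omitted here to keep the sketch short.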