Privacy, Security and Ethics in Process Mining
Data privacy, security and ethics are hot yet complex topics in the business and data science world. This article provides guidelines for privacy, security and ethics, specifically in the context of Process Mining.
3. Consider Anonymization
If you have sensitive information in your data set, you can consider anonymizing it instead of removing it. When you anonymize a set of values, each actual value (for example, the employee names “Mary Jones”, “Fred Smith”, etc.) is replaced by a placeholder value (for example, “Resource 1”, “Resource 2”, etc.).
If the same original value appears multiple times in the data set, then it will be replaced with the same replacement value (“Mary Jones” will always be replaced by “Resource 1”). This way, anonymization allows you to obfuscate the original data but it preserves the patterns in the data set for your analysis. For example, you will still be able to analyze the workload distribution across all employees without seeing the actual names.
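As a minimal sketch of this consistent replacement, the following Python snippet maps each distinct value to a numbered placeholder and then counts the workload per placeholder. The function name `pseudonymize` and the sample resource column are illustrative assumptions; this is not how Disco or ProM implement the feature.

```python
from collections import Counter


def pseudonymize(values, prefix):
    """Replace each distinct value with '<prefix> <n>', numbered by first appearance."""
    mapping = {}
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix} {len(mapping) + 1}"
        result.append(mapping[value])
    return result, mapping


# Hypothetical resource column of an event log (one entry per event)
resources = ["Mary Jones", "Fred Smith", "Mary Jones", "Mary Jones", "Fred Smith"]
anonymized, mapping = pseudonymize(resources, "Resource")

# The same name always receives the same placeholder, so the
# workload distribution can still be analyzed without the names.
workload = Counter(anonymized)
print(anonymized)
print(workload)
```

Note that the `mapping` dictionary still links every placeholder back to the original name. If the data is supposed to stay anonymous, the mapping must be discarded or stored separately under restricted access.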
Some process mining tools (Disco and ProM) include anonymization functionality. This means that you can import your data into the process mining tool and select which data fields should be anonymized. For example, you can choose to anonymize just the Case IDs, the resource name, attribute values, or the timestamps. Then you export the anonymized data set and you can distribute it among your team for further analysis.
Do:
- Determine which data fields are sensitive and need to be anonymized (see also the list of common process mining attributes and how they are impacted if anonymized in the extended version of this article).
- Keep in mind that despite the anonymization certain information may still be identifiable. For example, there may be just one patient having a very rare disease, or the birthday information of your customer combined with their place of birth may narrow down the set of possible people so much that the data is not anonymous anymore.
Don't:
- Anonymize the data before you have cleaned it, because after the anonymization the data cleaning may not be possible anymore. For example, imagine that slightly different customer category names are used in different regions but they actually mean the same thing. You would like to merge these different names in a data cleaning step. However, after you have anonymized the names as “Category 1”, “Category 2”, etc., the data cleaning cannot be done anymore.
- Anonymize fields that do not need to be anonymized. While anonymization can help to preserve patterns in your data, you can easily lose relevant information. For example, if you anonymize the Case ID in your incident management process, then you cannot look up the ticket number of the incident in the service desk system anymore. By establishing a collaborative culture around your process mining initiative (see guideline No. 4) and by working in a responsible, goal-oriented way, you can often work openly with the original data that you have within your team.
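The re-identification risk described in this section can be roughly checked by counting how many records share each combination of indirectly identifying attributes. The following is a minimal sketch under stated assumptions: the function `rare_combinations`, the field names `birthday` and `birthplace`, the sample records and the threshold `k` are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def rare_combinations(records, fields, k=2):
    """Return the attribute combinations that are shared by fewer than k records."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical customer records whose names are already anonymized
customers = [
    {"customer": "Customer 1", "birthday": "1980-02-14", "birthplace": "Utrecht"},
    {"customer": "Customer 2", "birthday": "1980-02-14", "birthplace": "Utrecht"},
    {"customer": "Customer 3", "birthday": "1975-09-30", "birthplace": "Eindhoven"},
]

# Combinations that occur only once point to a single person,
# even though the customer name itself has been replaced.
risky = rare_combinations(customers, ["birthday", "birthplace"])
print(risky)
```

Any combination returned here is a candidate for further anonymization, generalization (for example, keeping only the birth year), or removal before the data set is shared.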
4. Establish a Collaborative Culture
Perhaps the most important ingredient in creating a responsible process mining environment is to establish a collaborative culture within your organization. Process mining can make the flaws in your processes very transparent, much more transparent than some people may be comfortable with. Therefore, you should include change management professionals, for example, Lean practitioners who know how to encourage people to tell each other “the truth”, in your team.
Furthermore, be careful how you communicate the goals of your process mining project and involve relevant stakeholders in a way that ensures their perspective is heard. The goal is to create an atmosphere where people are not blamed for their mistakes (which only leads to them hiding what they do and working against you) but where everyone is on board with the goals of the project and where the analysis and process improvement are a joint effort.
Do:
- Make sure that you verify the data quality before going into the data analysis, ideally by involving a domain expert already in the data validation step. This way, you can build trust among the process managers that the data reflects what is actually happening and ensure that you have the right understanding of what the data represents.
- Work in an iterative way and present your findings as a starting point for discussion in each iteration. Give people the chance to explain why certain things are happening and let them ask additional questions (to be picked up in the next iteration). This will help to improve the quality and relevance of your analysis as well as increase the buy-in of the process stakeholders in the final results of the project.
Don't:
- Jump to conclusions. You can never assume that you know everything about the process. For example, slower teams may be handling the difficult cases, people may deviate from the process for good reasons, and you may not see everything in the data (for example, there might be steps that are performed outside of the system). By consistently using your observations as a starting point for discussion, and by allowing people to join in the interpretation, you can start building trust and the collaborative culture that process mining needs to thrive.
- Force any conclusions that you expect, or would like to have, by misrepresenting the data (or by stating things that are not actually supported by the data). Instead, keep track of the steps that you have taken in the data preparation and in your process mining analysis. If there are any doubts about the validity or questions about the basis of your analysis, you can always go back and show, for example, which filters have been applied to the data to come to the particular process view that you are presenting.
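One lightweight way to keep track of the preparation steps is to record every filter together with the number of cases before and after it was applied. The sketch below is an illustration only: the class `AnalysisLog`, the sample cases and the filter descriptions are hypothetical and do not correspond to a feature of any particular process mining tool.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisLog:
    """Records each data preparation step with the case count before and after."""
    steps: list = field(default_factory=list)

    def apply(self, description, cases, keep):
        # Keep only the cases for which the filter predicate is true
        remaining = [case for case in cases if keep(case)]
        self.steps.append((description, len(cases), len(remaining)))
        return remaining

    def report(self):
        return [f"{desc}: {before} -> {after} cases" for desc, before, after in self.steps]


# Hypothetical cases of a process
cases = [
    {"id": "C1", "status": "closed", "region": "North"},
    {"id": "C2", "status": "open", "region": "North"},
    {"id": "C3", "status": "closed", "region": "South"},
]

log = AnalysisLog()
closed = log.apply("Keep closed cases only", cases, lambda c: c["status"] == "closed")
north = log.apply("Keep region North", closed, lambda c: c["region"] == "North")

for line in log.report():
    print(line)
```

When a stakeholder questions a particular process view, the report shows which filters led to it and how many cases each filter removed.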
We would like to thank Frank van Geffen and Léonard Studer, who initiated the first discussions in the workgroup around responsible process mining in 2015. Furthermore, we would like to thank Moe Wynn, Felix Mannhardt and Wil van der Aalst for their feedback on earlier versions of this article.
 Responsible Data Science (RDS) initiative: http://www.responsibledatascience.org.
 Wil van der Aalst’s presentation on Responsible Data Science at Process Mining Camp 2016: http://coda.fluxicon.com/assets/downloads/Camp/2016/8-Wil-van-der-Aalst.pdf
 Process Mining software Disco: https://fluxicon.com/disco/
 Academic Process Mining framework ProM: http://www.promtools.org/
 Privacy, Security and Ethics in Process Mining. Extended version: http://coda.fluxicon.com/assets/downloads/Articles/PMNews/Privacy-Security-and-Ethics-In-Process-Mining.pdf
 Success Criteria for Process Mining: /2016/07/success-criteria-process-mining.html
 Data Quality Problems in Process Mining and What To Do About Them — Part 11: Data Validation Session with Domain Expert: http://fluxicon.com/blog/2016/10/data-quality-problems-in-process-mining-and-what-to-do-about-them-part-11-data-validation-session-with-domain-expert/
Authors: Anne Rozinat, PhD & Christian W. Günther, PhD
Anne Rozinat has more than 10 years of experience with the application of Process Mining. Christian W. Günther received his doctorate under Prof. Wil van der Aalst, and his research ensured that nowadays even the most complex and heterogeneous processes can be analysed using Process Mining. Both are the founders of Fluxicon and the makers of the popular Process Mining software Disco. They organize the annual Process Mining conference, Process Mining Camp, and blog regularly about Process Mining on fluxicon.com/blog/.