GDPR after 2 months – What does it mean for Machine Learning?

Almost two months on from the introduction of GDPR, how has machine learning been affected? What does the future hold?


GDPR (the General Data Protection Regulation) is a significant EU law that offers major new data and privacy protections for all individuals within the European Union (EU) and the European Economic Area (EEA). GDPR took effect on May 25, 2018. You have probably received countless emails from companies updating their privacy policies to comply with it.

Much has been written about the potential impact of GDPR on Machine Learning, including on KDnuggets - see our previous overviews.

These were written before the new regulation came into force, when its ambiguity had everyone second-guessing what effect it would have on the machine learning field. So, two months on, has it become clearer?

Well, yes, slightly. There’s still a “smoke and mirrors” type feeling when it comes to discussing GDPR, but several revelations have come to light over the last couple of months:



AI software named Claudette (short for “automated clause detector”) has been developed by researchers at the European University Institute and a consumer group to detect suspected GDPR breaches.

In June, the month after GDPR went live, Claudette examined the privacy policies of the following 14 major technology companies:

  • Google
  • Facebook + Instagram
  • Amazon
  • Apple
  • Microsoft
  • WhatsApp
  • Twitter
  • Uber
  • AirBnB
  • Booking
  • Skyscanner
  • Netflix
  • Steam
  • Epic Games

According to Bloomberg, Google, Amazon and Facebook were among the companies highlighted as having potential GDPR breaches. Despite the findings, the researchers explained that the results could not be considered 100% accurate, as the software is still new and has only reviewed a small number of policies. However, you can be sure these companies will be reviewing their data policies, including any machine learning aspects, to avoid getting caught out.

Following the introduction of GDPR, it took exactly 48 minutes for the first legal complaint to be filed. It came from an Austrian privacy advocacy group named NOYB (None Of Your Business) and was against Facebook and Google. NOYB claimed that the two tech giants, along with two Facebook subsidiaries, WhatsApp and Instagram, failed to give European users specific control over their data. It asked regulators to hand out fines running well into the billions for these alleged breaches. Expect more of this in the coming months.


Time limit on user data

Restricting the time that a user’s data is held isn’t a new concept at all. In fact, the UK’s Data Protection Act 1998 contains the following principle:

“Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.”

GDPR brings this into focus once more, stating that companies may only hold personal data for as long as is necessary for the purpose it was collected for. Companies have to be able to justify how long they have kept data and to explain that decision if questioned.

Of course, this isn’t ideal for machine learning algorithms: the more historical data is available, the better a model can generally learn correlations and draw conclusions. Developers, data scientists, engineers and system administrators will have to work in unison to ensure that the customer data they’re using is entirely necessary. In the long run this could actually make companies more efficient, by forcing them to be clean and thorough with the data they hold.
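As a rough illustration of what a retention limit could mean in a training pipeline, the sketch below drops records that have outlived a retention window before they reach a model. The purposes, the retention periods and the `collected_at` field are all hypothetical; GDPR does not prescribe specific durations, so each company would have to set and justify its own.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per processing purpose (illustrative values,
# not taken from GDPR or from any real policy).
RETENTION_BY_PURPOSE = {
    "recommendations": timedelta(days=365),
    "fraud_detection": timedelta(days=730),
}


def within_retention(records, purpose, now=None):
    """Return the records still inside the retention window for `purpose`.

    Each record is assumed to be a dict with a timezone-aware
    `collected_at` datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION_BY_PURPOSE[purpose]
    return [r for r in records if r["collected_at"] >= cutoff]
```

Running a filter like this before every training job, rather than once, means the training set shrinks automatically as data ages out, and the retention table doubles as a written record of the periods the company has chosen.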


Modelling Explanation

Among the rights GDPR grants, the data subject (the user or customer) has what is commonly called the right to explanation. For machine learning, this means that the data controller must provide meaningful information about the logic involved in automated decision-making, as well as the significance and the envisaged consequences of such processing for the data subject.

Authorities are still discussing how deep this explanation should go. Should it reach down to machine-level functions, or just give an overview of what the data controller is aiming to achieve? It’s sure to be an ongoing discussion.
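To make the "overview" end of that spectrum concrete, here is a minimal sketch of one possible form of explanation: for a simple linear scoring model, report how much each input pushed the score up or down. The feature names, weights and bias are invented for illustration, and nothing here is a statement of what regulators will actually accept as "meaningful information".

```python
import math

# Hypothetical weights of a toy linear scoring model (assumed, not real).
WEIGHTS = {"income": 0.8, "missed_payments": -1.5, "account_age_years": 0.3}
BIAS = -0.2


def explain_score(features):
    """Return (probability, contributions ranked by absolute impact).

    Each contribution is weight * feature value, so the ranked list shows
    which inputs moved this individual's score the most, and in which direction.
    """
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-score))  # logistic link
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked
```

This only works so directly because the model is linear. For more complex models, an explanation of this kind would need an approximation technique, which is part of why the required depth is still being debated.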


Removing user data

Another of the data subject’s rights is the right to be forgotten. In general this appears straightforward: simply delete the user’s records. But how does it apply to machine learning? Does it mean that data subjects have the right to demand that your predictive model is retrained without their data? It’s an interesting grey area.

There are tools available to help machine learning engineers with this, such as the BigML platform’s reification capability.
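If the strict reading of the right to be forgotten does prevail, the simplest response is to delete the user's rows and retrain from scratch on what is left. The sketch below shows that idea with a deliberately trivial "model" (the mean of a target column); the record layout and function names are hypothetical and this is not how BigML or any particular platform implements it.

```python
from statistics import mean


def train(rows):
    """Toy stand-in for model training: the mean of `target` over all rows.

    Returns None when there is no data left to train on.
    """
    if not rows:
        return None
    return mean(r["target"] for r in rows)


def erase_and_retrain(rows, user_id):
    """Drop every row belonging to `user_id`, then retrain on the remainder."""
    remaining = [r for r in rows if r["user_id"] != user_id]
    return remaining, train(remaining)
```

For a real model, retraining on every erasure request could be expensive, so in practice requests might be batched and handled at the next scheduled retraining. Whether that delay would be acceptable is exactly the kind of grey area described above.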


ePrivacy Regulation

What is it?

The elephant in the room is the new ePrivacy Regulation (EPR), which is due to be introduced at some point in the not-so-distant future. EPR is similarly vague to GDPR, but will essentially work as a complement to it. It is aimed at digital data privacy and will provide a single framework that all companies doing business in the EU must conform to. Failure to do so will result in fines similar to those enforced under GDPR.



So what’s the difference exactly? Well, they both focus on the protection of customer data, but there is a subtle difference. GDPR focuses on protecting customer data that is stored in a company’s database and on how that information is passed to third parties. In comparison, EPR will concentrate specifically on how companies transmit electronic communications data.


Machine Learning?

Similar to GDPR, EPR is sure to scare businesses when it comes to Machine Learning, due to the legal uncertainty it creates. The scope of ‘electronic communications’ will affect a range of services and organisations, from messaging apps such as WhatsApp to almost any internet service that acquires user data.

The processing and storing of machine-generated data could be prohibited by default, meaning that a user would have to consent to any use of their information. This could result either in lots of consent forms for a user to read and agree to, or in a blanket form that covers everything.

Although this sounds great for users, this is bad news for business. If users fail to consent, the progress of machine learning could be significantly slowed. Machines cannot learn and improve without the data and these new regulations could leave Europe disadvantaged compared to other regions when it comes to new technology discovery.
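A consent requirement of this kind could be pictured as a filter applied before any training run. The sketch below assumes a hypothetical consent store that maps each user to the set of purposes they have agreed to; the purpose names and record layout are invented, and since EPR is not final this is only an illustration of the idea.

```python
def consented_records(records, consents, purpose):
    """Keep only records whose user has consented to `purpose`.

    `consents` maps user_id -> set of purposes the user agreed to.
    Users with no entry are treated as not having consented.
    """
    return [r for r in records if purpose in consents.get(r["user_id"], set())]
```

Every user who declines, or who is never asked, simply disappears from the training set, which is how low consent rates would translate directly into less data for models to learn from.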

It is worth reiterating that there is a lot of guesswork at this stage, and until EPR is fully introduced we won’t know the impact it could have on the industry. Watch this space.



This article has barely scratched the surface of the impact of GDPR on Machine Learning, but hopefully you feel up to date with some of the many developments that have taken place since its introduction. This isn’t a settled topic and is sure to progress further over the next few months.