How Data Science Fuels Fraud Prevention

Photo by Tima Miroshnichenko


Like an Oklahoman megafarmer being approached by a fracking company, most ecommerce companies don't know how much potential value they are sitting on. In this case, we aren't talking about natural resources; we're talking about data. Data mining has been touted as the new oil rush for some 15 years now, but the methods of refining the crude, raw data are still being discovered, as are the mechanisms we can fuel with it.

Many ecommerce companies are familiar with applying the data they harvest from their incoming traffic to functions like customer segmentation and targeted marketing. These uses sit at the top of the profit checklist, but generating an impressive profit also means mitigating losses to fraud, a concern that grows year on year. Data science fuels that bus as well.


Powering Your Fraud Solutions with Data


Regardless of which security provider you turn to, the layers of protection it delivers will start with data. Generally speaking, the data points a fraud solution wants to look at begin with the kind of information any ecommerce website wants to collect. These might be contributed actively, meaning that the user opts to provide the information themselves when prompted. Common examples include registering an account on a website or signing up for a newsletter. This usually yields data points like:

  • Name
  • Email address
  • Phone number
  • Physical address
  • Birthdate

Or, the data points on a user might be gathered passively, without the user's knowing, intentional participation:

  • IP address and other connection information
  • How they interact with the website, including the time spent and other behavioral biometric data
  • Identifying information about the device used to connect to the website, known as a device fingerprint

By themselves, these data points will probably not provide much insight into a single customer. However, a company that has some or all of this information is well positioned to form a strong idea of how legitimate its visitors are. Fraud software can take these data points and enrich them to turn a relatively anonymous data skeleton into a fully fleshed digital profile.


Enriching Your Fraud Fighting


Nearly every fraud solution that employs identity verification to mitigate loss will lean heavily on data enrichment: taking known data points and expanding them into associated, more useful data points. With this method, a single (albeit important) piece of information like a phone number can potentially lead to social media posts, photos, and connections to friends and family. This expanded data can be gathered in several ways:

  • Closed-source data is personally identifying information that users submit themselves during an onboarding or registration process and that can’t be found elsewhere on the open internet.
  • Proprietary databases of aggregated user data are another kind of closed-source data. Many fraud prevention solutions make use of massive databases to cross-reference incoming traffic. Such databases might consist of historical good or bad users, fraudulent transaction behavior, reputation data, or even credit history, with some companies counting the reference points in their proprietary databases in the billions.
  • OSINT data, short for Open Source INTelligence, is the set of data that can be collected from publicly accessible sources, such as accounts and registrations associated with an email or phone number, images and posts from social media, traditional journalistic sources, matters of public record like marriages or arrests, geolocation data, and much more.

After the initially gathered data points are subjected to this kind of enrichment, the fraud software has a user profile that is much easier to evaluate, and to evaluate conclusively. As it scrutinizes the profile, the software assigns each user a fraud score based on its findings. Potential indicators of fraud, like connecting via a VPN, increase the score. Once the score reaches a predefined threshold, most solutions offer the ability to either stop the user’s progress automatically or escalate the case to a human reviewer.
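The scoring step described above can be illustrated with a minimal rule-based sketch. Everything in it is an assumption made for illustration: the signal names, the weights, and the two thresholds (one for manual review, one for automatic blocking) are hypothetical, and a real fraud solution would tune them to its own traffic.

```python
# Hypothetical rule weights: each fraud indicator adds points to the score.
RULES = {
    "uses_vpn": 30,
    "new_email": 20,
    "no_social_profiles": 25,
    "datacenter_ip": 35,
}

# Hypothetical thresholds reflecting a company's risk tolerance.
REVIEW_THRESHOLD = 50  # escalate to a human reviewer at or above this score
BLOCK_THRESHOLD = 80   # stop the user automatically at or above this score


def fraud_score(signals):
    """Sum the weights of every rule whose signal is present and true."""
    return sum(weight for name, weight in RULES.items() if signals.get(name, False))


def decide(score):
    """Map a fraud score to one of three actions: allow, review, or block."""
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"
```

For instance, a user who connects through a VPN with a newly created email address scores 30 + 20 = 50 under these made-up weights, which is enough to send the case to manual review but not to block it outright.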

Defining your company’s risk tolerance threshold is part of executing a proactive anti-fraud initiative – the most effective kind. Just as with the fraudsters under your floorboards, the more data you have about your own company, the tighter the security can become. Having well-defined goals – stopping ATO attacks, for example – at the outset is paramount, as is cleaning your data and labeling it accordingly.

This kind of data preparation is crucial for the machine-learning algorithms that power nearly every fraud solution’s AI. These algorithms, regardless of what model they rely on, need to be trained to produce accurate results for an individual company. Training teaches the software the subtle differences between fraudsters and good customers in your system, so that it identifies with increasing accuracy what is passable and what is a suspicious outlier. Without training, trusting machine-learning algorithms to run autonomously is a risky decision, but a well-trained algorithm might need only minimal human oversight, freeing up resources.
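To make the idea of training on labeled data concrete, here is a toy sketch: a tiny logistic regression written in plain Python and fitted on a handful of invented, hand-labeled examples. The three features, the labels, and the hyperparameters are all assumptions chosen for illustration; this is not how any particular vendor's model works, and production systems use far more data and more capable models.

```python
import math

# Hypothetical labeled history. Each row is
# ((uses_vpn, new_email, no_social_profiles), label), with label 1 = fraud.
TRAINING_DATA = [
    ((1, 1, 1), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((1, 1, 0), 1),
    ((0, 0, 0), 0),
    ((0, 1, 0), 0),
    ((1, 0, 0), 0),
    ((0, 0, 1), 0),
]


def predict(weights, bias, features):
    """Return the model's estimated probability that a user is fraudulent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=2000, learning_rate=0.1):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            # Gradient of the log loss for one example is (prediction - label) * x.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(TRAINING_DATA)
```

After training, `predict(weights, bias, (1, 1, 1))` returns a probability above 0.5 and `predict(weights, bias, (0, 0, 0))` a probability below it. The point of the sketch is the dependency on labels: if the historical rows are mislabeled or dirty, the learned weights inherit those mistakes.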


Example of a Data-Powered Fraud Investigation


A user arrives on your ecommerce platform, registers for a new account, and begins their shopping journey. Their user data appears legitimate, insofar as they have filled out each section of the registration form with a valid entry – phone number, email address, name, and location. 

Then your fraud software steps in, just in case. By performing lookups on the OSINT data associated with the provided credentials, the program notes that this user’s email address appears to be new, and their phone number is not associated with a single social network, which is very unusual in 2022. While this user may simply avoid social media, most fraud stacks can be customized to flag such an anomalous user as potentially suspicious, and their journey can be put on pause while the case is escalated to manual review.

A manual reviewer from the dedicated fraud team steps in to take a closer look at this user. Initially, the reviewer is inclined to label this user as a false positive for fraud, despite their minimal digital presence. Still somewhat on the fence, they decide to zoom out and look at the data trend analysis reported by the software. The analysis tells a different tale.

The software, which automatically derives insights from the aggregated data, notes that this user’s device fingerprint is nearly identical to those of 70 other users. In addition, velocity checks show that all of those users spent a similar amount of time on the website, and all of them visited within the last 72 hours. Furthermore, IP analysis of those accounts shows locations far removed from the addresses they claimed upon registration, and many of those IPs originate from datacenter proxies that have previously been flagged as suspicious.
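A check of this kind can be sketched in a few lines. The function below groups recent signups by device fingerprint and flags any fingerprint shared by more accounts than a chosen limit within the 72-hour window. The tuple layout, the function name, and the limit of five accounts are hypothetical, and for simplicity the sketch matches fingerprints exactly, whereas the scenario above describes fingerprints that are only nearly identical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)  # look-back window taken from the scenario
MAX_ACCOUNTS_PER_DEVICE = 5   # hypothetical limit; tune to your own traffic


def flag_shared_fingerprints(signups, now):
    """Return fingerprints used by too many distinct accounts inside the window.

    `signups` is an iterable of (user_id, fingerprint_hash, signup_time) tuples.
    The result maps each flagged fingerprint to a sorted list of its user IDs.
    """
    users_by_fingerprint = defaultdict(set)
    for user_id, fingerprint, signup_time in signups:
        # Keep only signups that happened within the last 72 hours.
        if timedelta(0) <= now - signup_time <= WINDOW:
            users_by_fingerprint[fingerprint].add(user_id)
    return {
        fingerprint: sorted(users)
        for fingerprint, users in users_by_fingerprint.items()
        if len(users) > MAX_ACCOUNTS_PER_DEVICE
    }
```

Calling `flag_shared_fingerprints(signups, datetime.now())` on recent registration records would surface clusters like the one in this example, which a reviewer could then compare against IP and location data before deciding what to block.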

Thankful they did not simply give this user the green light, the fraud team member blocks all the transactions with the same profile. They set up a custom rule to detect future connections that match this profile, then eat a lunch made more delicious by a satisfying and productive morning.


Key Takeaways


First and foremost, the biggest takeaway for any ecommerce business should be that a fraud prevention solution applied to incoming traffic is crucial to curbing fraud inside the system. Fraud techniques are becoming more sophisticated all the time, and according to UK Finance, the UK alone lost £2.4 billion to fraud just last year.

The second key point is that any fraud solution will be most effective when it is fed the best data available. While providing a low-friction, low-churn shopping journey is important for every ecommerce sector, that experience has to be weighed against your company’s appetite for fraud losses. Adding a moment of friction to the customer journey by requesting additional identifying information shouldn’t result in a huge ROI dropoff. It may also return a much more insightful view of your customer base, which, as discussed, is the fuel that will drive both your fraud mitigation and, hopefully, your profits.

Gergő Varga has been fighting online fraud since 2009 at various companies – even co-founding his own anti-fraud startup. He’s the author of the Fraud Prevention Guide for Dummies – SEON Special edition. He currently works as Content Evangelist at SEON, using his industry knowledge to keep marketing sharp, communicating between the different departments to understand what’s happening on the frontlines of fraud detection. He lives in Budapest, Hungary, and is an avid reader of philosophy and history.