KDnuggets Home » News » 2016 » Mar » Opinions, Interviews, Reports » Fraud Bots Mess Up Your Big Data ( 16:n10 )

Fraud Bots Mess Up Your Big Data

          

The bots that cause digital ad fraud also mess up analytics. When they create fake visits, pageviews, ad impressions, clicks, etc., those metrics are not real and should be corrected for.

By Dr. Augustine Fou

The rise of programmatic ad buying facilitated the rise of automated digital ad fraud.

Over the last few years, programmatic ad buying - the automated buying and placement of digital ads - has risen sharply because the number of “long-tail” websites that carry ads has exploded to the point that it is impossible for any human media planner to keep track of them all in a spreadsheet. Technology was therefore required to handle the scale of the media buying and the big data that its processes generated. However, along with this increase in automation came a corresponding increase in digital ad fraud, which is also automated by technology.

For the two largest forms of digital ads -- 1) display ads (aka banner ads) and 2) search ads -- “bad guys” can use automated tools to generate millions upon millions of ad impressions and clicks in order to earn ad revenue illicitly. These automated tools are commonly called “bots,” and they are programmed to hit webpages (like a human would) and cause ad impressions to load. These bots can also do other things that humans normally do, like click on ads, scroll up and down webpages, and simulate mouse movements.

CPM Fraud Example

The fake visits, ad impressions, and clicks caused by bots are polluting analytics.

As these bots go about committing ad fraud, their actions are recorded by analytics packages and other measurement platforms. The fake visits from these bots artificially inflate visitor counts on website analytics platforms. The fake clicks push click-through rates higher than they would be if only real humans were clicking. These bots can also perform more advanced actions, like staying on websites to create fake “time on site” or tuning bounce rates higher or lower. They can visit multiple pages to increase pages per visit, and they can fake virtually every other KPI that is commonly used to judge the effectiveness of marketing campaigns.
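To make the inflation concrete, here is a minimal sketch of how a click-through rate shifts once bot traffic is excluded. The `Impression` record, the `is_bot` flag (imagined here as the output of some bot-detection tool), and all the numbers are made up for illustration; they are not data from any real campaign.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    visitor_id: str
    clicked: bool
    is_bot: bool  # hypothetical flag, assumed to come from a bot-detection tool


def click_through_rate(impressions):
    """Return clicks / impressions, or 0.0 when there are no impressions."""
    impressions = list(impressions)
    if not impressions:
        return 0.0
    clicks = sum(1 for imp in impressions if imp.clicked)
    return clicks / len(impressions)


# Made-up log: 1,000 human impressions with 2 clicks,
# plus 200 bot impressions with 50 clicks.
log = (
    [Impression(f"h{i}", clicked=(i < 2), is_bot=False) for i in range(1000)]
    + [Impression(f"b{i}", clicked=(i < 50), is_bot=True) for i in range(200)]
)

raw_ctr = click_through_rate(log)                                      # 52 / 1200
human_ctr = click_through_rate(imp for imp in log if not imp.is_bot)   # 2 / 1000

print(f"raw CTR:       {raw_ctr:.2%}")
print(f"corrected CTR: {human_ctr:.2%}")
```

In this toy log a relatively small number of bot impressions is enough to make the reported click-through rate look more than twenty times higher than the human-only rate.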

Fake source example

Furthermore, bots cover up their fraudulent activities by passing fake metrics and tricking fraud detection technologies. For example, bots pass back fake source information to trick analytics platforms into thinking that a click came from a legitimate source, like a mainstream publisher or search engine, when it actually did not. Bots will also generate fake clicks in order to earn fraudulent click revenue on search ads. Unscrupulous website owners will stack dozens of ads on top of each other “above the fold” in order to trick viewability measurement tools. All of these fraudulent actions mean that the numbers are suspect and unreliable -- and should not be used for making business decisions.

Being aware of fraud and actively verifying, corroborating, and correcting the data is necessary.

Unsuspecting brand marketers and the agencies that serve them might actually be sending more money to the bad guys when they optimize their campaigns using corrupted data. For example, optimizing towards ads with higher click-through rates would usually make sense, but not when the clicks are all fake. Spending more money on websites with higher viewability should make sense, except when the bad guys create fake sites with abnormally high viewability. So optimizations and business decisions based on bad data, corrupted by fraudulent bot activity, are going to be sub-optimal, if not downright wrong.
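The optimization trap described above can be sketched with two hypothetical sites. The site names, the per-site totals, and the assumption that bot impressions and bot clicks have already been counted separately are all illustrative.

```python
# Hypothetical per-site totals:
# (impressions, clicks, bot_impressions, bot_clicks)
sites = {
    "site_a": (10000, 30, 500, 5),
    "site_b": (10000, 400, 9000, 399),
}


def raw_ctr(stats):
    """Click-through rate as reported, with bot traffic included."""
    impressions, clicks, _, _ = stats
    return clicks / impressions if impressions else 0.0


def corrected_ctr(stats):
    """Click-through rate after subtracting bot impressions and bot clicks."""
    impressions, clicks, bot_impressions, bot_clicks = stats
    human_impressions = impressions - bot_impressions
    human_clicks = clicks - bot_clicks
    return human_clicks / human_impressions if human_impressions > 0 else 0.0


# Which site would an optimizer favor under each metric?
best_raw = max(sites, key=lambda name: raw_ctr(sites[name]))
best_corrected = max(sites, key=lambda name: corrected_ctr(sites[name]))

print("best by raw CTR:      ", best_raw)
print("best by corrected CTR:", best_corrected)
```

With these made-up numbers, the raw metric favors `site_b`, where almost all of the clicks come from bots, while the corrected metric favors `site_a`. Budget shifted on the raw numbers would flow toward the fraudulent site.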

Bot filtering example

Because bots are the common ingredient in all forms of digital ad fraud, detecting and filtering bot activity is a great first step toward reducing data corruption and bad business decisions. In Google Analytics, “bot filtering” can be turned on to exclude a list of known bots -- like search engine crawlers -- that typically visit websites. For the more advanced bots that actively trick measurement systems and pass fake data, more advanced detection tools should be used. Ad networks, sites, or other sources that show high bot activity should then be actively blocked or removed to reduce their impact on analytics and big data.
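As a rough sketch of that first step, the snippet below flags hits whose user agent matches a short list of known-bot markers and then lists the traffic sources whose bot share crosses a threshold. The marker list, the `(source, user_agent)` log format, the source names, and the 50% threshold are all assumptions made for illustration; this is not how Google Analytics implements its bot filtering, and simple matching like this only catches bots that identify themselves, not the advanced bots that pass fake data.

```python
from collections import defaultdict

# Illustrative substrings only; a real deployment would rely on a maintained list.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "crawler", "spider")


def is_known_bot(user_agent):
    """True if the user agent contains one of the known-bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


def bot_share_by_source(hits):
    """hits: iterable of (source, user_agent). Returns {source: bot fraction}."""
    totals = defaultdict(int)
    bots = defaultdict(int)
    for source, user_agent in hits:
        totals[source] += 1
        if is_known_bot(user_agent):
            bots[source] += 1
    return {source: bots[source] / totals[source] for source in totals}


def sources_to_block(hits, threshold=0.5):
    """Sources whose share of known-bot hits is at or above the threshold."""
    shares = bot_share_by_source(hits)
    return sorted(s for s, share in shares.items() if share >= threshold)


# Made-up traffic log with two hypothetical ad networks.
hits = [
    ("network_x", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("network_x", "ExampleSpider/1.0"),
    ("network_x", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
    ("network_y", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3)"),
    ("network_y", "Mozilla/5.0 (Windows NT 6.1)"),
]

print(sources_to_block(hits))
```

The threshold is a judgment call: set it too low and legitimate sources that happen to attract crawlers get blocked, set it too high and heavily polluted sources stay in the data.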

For further reading, see: http://www.slideshare.net/augustinefou/presentations.

Dr. Augustine Fou

Bio: Dr. Augustine Fou is an industry-recognized expert in digital ad fraud research. He has been advising clients on digital strategy and optimizing marketing campaigns for over 20 years. Dr. Fou was formerly Chief Digital Officer of the Healthcare Consultancy Group, a group of agencies within Omnicom, and a McKinsey consultant. He holds a PhD in Materials Science and Engineering from MIT.
