Blog: The dummy’s guide to fraud detection techniques

Chris Williams and Ehsan Mokhtari explain the pros and cons of in-line and in-parallel processes for dealing with bots and spiders.

By Chris Williams and Ehsan Mokhtari 

The Media Rating Council (MRC) has released its Invalid Traffic Detection and Filtration Guidelines and has put a 180-day deadline on planning and implementation for digital media metric providers.

With so many different schemes to get invalid traffic into the supply chain, it shouldn’t come as a surprise that there are many different ways to get it out of the supply chain as well. The MRC divides invalid traffic detection into two buckets: general and sophisticated. Think of general detection as a checklist of previous offenders, while sophisticated detection is detective work over time.

General methods of examining traffic rely on lists of known problems: data-centre traffic, specific spiders and bots on industry lists, or activity patterns no human would produce. This filtering occurs within ad servers, web analytics or any other ad tech that provides a counting service. It’s important to point out that while it is all invalid traffic, not all of it is fraud; some of it could just be sloppy work. This points to the need for process audits to tighten up the work done on the supply side, ensuring the list-based approach is applied throughout the supply chain.
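List-based general filtering boils down to matching each request against known-offender lists. A minimal sketch, with made-up signatures and a placeholder IP range (real implementations use the industry spiders-and-bots lists and commercial data-centre IP databases the article alludes to):

```python
import ipaddress

# Hypothetical examples only; real lists come from industry bodies and vendors.
KNOWN_BOT_SIGNATURES = ["googlebot", "bingbot", "ahrefsbot", "curl", "python-requests"]
DATA_CENTRE_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder range

def is_general_invalid(user_agent: str, ip: str) -> bool:
    """Flag traffic matching a known-offender list (the MRC's 'general' bucket)."""
    ua = user_agent.lower()
    if any(sig in ua for sig in KNOWN_BOT_SIGNATURES):
        return True  # declared bot, or tooling no human browser would report
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATA_CENTRE_RANGES)  # data-centre traffic
```

Because every counting service in the chain can run the same cheap checks, this is the layer that audits verify is applied consistently end to end.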

The MRC’s other bucket is Sophisticated Invalid Traffic. Here fraudsters seek to install malware on people’s computers, steal ad tags, redirect users away from their desired sites, stuff their cookies with fake data and much more. JICWEBS identifies 16 categories of malicious activity.

General methods are consistently applied; sophisticated methods, however, operate differently depending on where they reside in the ecosystem. They include blocking bots before they can generate pages (sometimes called a reverse proxy), presenting challenges such as CAPTCHAs, running pre-load or pre-bid checks in programmatic transaction environments, and serving third-party pixels on ad creative. It’s important to underline that all of these methods are valuable to the overall ecosystem as long as their strengths and weaknesses are understood.

One of the most important distinctions among sophisticated methods is whether they occur “in-line” or “in-parallel.” In-line is a sequential process: the browser makes a request, the traffic verification method has a very short period of time to determine human versus bot, and only then are pages, ads and so on delivered. In-parallel works by activating a script at a certain point and separating the delivery of ads and pages from the verification decision, which can then take as long as necessary.
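The distinction can be sketched in a few lines. This is an illustrative toy, not anyone’s actual product: the in-line path blocks the response on a fast verdict within a strict time budget, while the in-parallel path serves immediately and queues the request for deep analysis off the hot path:

```python
import queue
import time

ANALYSIS_QUEUE = queue.Queue()  # an offline scoring process would consume this

def serve_inline(request, classify, budget_ms=50):
    """In-line: hold the response until a verdict, inside a hard time budget."""
    start = time.monotonic()
    verdict = classify(request)  # must be a cheap, fast check
    elapsed_ms = (time.monotonic() - start) * 1000
    if verdict == "bot" or elapsed_ms > budget_ms:
        return None  # refuse delivery (or fail open, depending on policy)
    return "page+ads"

def serve_in_parallel(request):
    """In-parallel: deliver at once; verification happens later, unhurried."""
    ANALYSIS_QUEUE.put(request)  # scored afterwards, taking as long as needed
    return "page+ads"
```

The `budget_ms` parameter is the crux of the trade-off the next paragraphs describe: every extra in-line check spends page-load time the user notices.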

In-line systems protect downstream users. No publisher wants to let a malicious bot scrape its pages for content or code. No programmatic buyer wants to win real-time ad placements only to find out later they weren’t valid; even if they aren’t billed for them, they are lost opportunities. The challenge is that user experience is determined by page load times, so any in-line method must balance speed against the number of operations performed.

In-parallel methods sidestep the user-experience challenge and focus instead on accuracy and depth of the operations performed. Some will take milliseconds; others will take considerably longer. Skip to page eight of the MRC’s Invalid Traffic Detection Guidelines: “… [D]etection procedures for Sophisticated Invalid Traffic take time to execute and may not be feasible to apply to real-time processes.”

Fast-forward now to page 25: “Requirement for Backward Looking Assessments and Correction.” The MRC mentions a 14-day window that can be extended longer depending on campaign duration or customer service requirements.

Clearly the in-parallel approach isn’t the one to apply to programmatic buying decisions; it is, however, the one to apply to reporting and billing. No one really cares if their ads were shown to bots, but they sure as hell don’t want to get a bill for it.
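The backward-looking correction the MRC describes amounts to re-filtering logged impressions before invoicing. A minimal sketch with hypothetical data shapes, using the 14-day window mentioned above:

```python
from datetime import datetime, timedelta, timezone

CORRECTION_WINDOW = timedelta(days=14)  # per the MRC; extendable by campaign needs

def reconcile_billing(impressions, flagged_ids, now):
    """Drop impressions that in-parallel analysis later flagged as invalid,
    provided the flag arrived inside the correction window.
    Each impression is assumed to be {'id': ..., 'ts': aware datetime}."""
    billable = []
    for imp in impressions:
        if imp["id"] in flagged_ids and now - imp["ts"] <= CORRECTION_WINDOW:
            continue  # flagged in time: strike it from the bill
        billable.append(imp)
    return billable
```

The design point is that the slow, accurate verdicts never touch the ad-serving path; they only adjust what the advertiser ultimately pays for.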

Another advantage of in-parallel methods is the time lag before fraudsters get any feedback. The MRC raises its concern about reverse engineering of invalid-traffic detection secret sauce on page eight: “Measurement organizations that apply Sophisticated Invalid Traffic techniques are likely to need to remove identified Sophisticated Invalid Traffic downstream from original detection at later times to protect detection procedures from reverse engineering.”

In-line processes with immediate feedback, by contrast, could be used to train better bots, since fraudsters can observe which probes trigger detection and which slip through.

The point is that success against fraudsters and the invalid traffic they generate requires a multi-layered approach, with specific defences for specific points. The system becomes more effective as learning from one point is applied somewhere else.

However, detecting and filtering invalid traffic is only one part of the picture. Beyond it lies the work still to come: tracing traffic back to unscrupulous businesses, setting terms and conditions and standards of operation, and reconciling billing workflows to provide advantages to audited media suppliers and benefits to advertisers focused on quality. Digital media, like any supply chain, is going through the growing pains of securing its marketplace. This is just the beginning.

Chris Williams is the former president of IAB Canada and currently works as principal at Chris Williams Consulting. Ehsan Mokhtari is the co-founder and chief technology officer at Sentrant, a security company focused on digital media integrity. He is also a PhD candidate at the University of New Brunswick.

Image courtesy of Shutterstock