Among the misconceptions regarding Big Data, two important ones stand out: that correlations alone suffice and that Big Data means sampling bias is no longer an issue.

Fooled by Association

First, Big Data mining advocates claim that correlations suffice and the quest for causal interpretation should be abandoned. The real danger is that you will be "fooled by association," as explained in Freakonomics.

I consulted with managers at a car company who were upset because, though profits were up, a "key performance indicator" was down. Causal analysis made clear that this indicator did not cause or lead profits; the earlier correlation was merely a coincidence and had reversed in recent periods. As a result of the causal analysis, the managers could refocus their energies on moving the indicators that do lead to sales.
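One simple way to spot a correlation that "turned around" is to compute it separately for early and recent periods. The sketch below does that with made-up numbers; the `profit` and `kpi` series and the `pearson` helper are illustrative assumptions, not the data or method from the car company engagement. A stability check like this is only a first step, and a full causal analysis would go further, for example by testing whether the indicator actually leads profits.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative numbers (hypothetical), one value per period.
profit = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24, 25, 27]
kpi    = [50, 52, 53, 56, 57, 60, 59, 57, 56, 53, 52, 50]

# Split the history in half and compare the correlation in each window.
half = len(profit) // 2
early_corr = pearson(profit[:half], kpi[:half])
recent_corr = pearson(profit[half:], kpi[half:])

print(f"early periods:  r = {early_corr:+.2f}")
print(f"recent periods: r = {recent_corr:+.2f}")
```

In this toy series the indicator moves with profits in the early window and against them in the recent window, so a manager who trusted the early correlation would be chasing the wrong number.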

The Sampling Bias

Second, Big Data sometimes gives the illusion that sampling bias is no longer an issue (as it is for small data) because the data capture the entire population. However, "N = all is often an assumption rather than a fact about the data" (Kaiser Fung, Numbersense).

For example, your social media data may accurately capture online sentiment but only for those consumers who are online and care enough about your brand and product category to comment through the online channel.
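A small simulation can make the point concrete. The sketch below assumes a hypothetical population of consumers with sentiment scores centered on zero, and an assumed self-selection rule in which enthusiastic consumers are ten times more likely to comment online. All numbers, along with the names `comments_online`, `online_sample`, and `survey_sample`, are illustrative assumptions rather than findings from the research discussed here.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: sentiment scores centered on zero.
population = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def comments_online(sentiment):
    """Assumed self-selection rule: fans comment far more often than others."""
    probability = 0.5 if sentiment > 1.0 else 0.05
    return random.random() < probability

def mean(values):
    return sum(values) / len(values)

# "Big" data: everyone who chose to comment online.
online_sample = [s for s in population if comments_online(s)]

# "Small" data: a random (representative) survey of 500 consumers.
survey_sample = random.sample(population, 500)

population_mean = mean(population)
online_mean = mean(online_sample)
survey_mean = mean(survey_sample)

print(f"population:    n = {len(population):>7}, mean = {population_mean:+.2f}")
print(f"online sample: n = {len(online_sample):>7}, mean = {online_mean:+.2f}")
print(f"survey sample: n = {len(survey_sample):>7}, mean = {survey_mean:+.2f}")
```

Under these assumptions the online sample is many times larger than the survey, yet its average sentiment sits well above the population's, while the small random survey lands close to it. More observations do not correct for who chose to speak.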

In recent research across 15 product categories, we compared the power of representative offline survey metrics (awareness, consideration, and liking) and online behavior metrics (paid ad clicks, site visits, and social media conversations) to explain and predict sales. We found that online behavior metrics excelled in short-term predictions, but that offline survey metrics excelled in medium-term predictions.

What Have We Learned?

Blowing up data size does not absolve us of the challenges of drawing meaningful inferences from the data. The recent review of the Google Flu Trends "success" story, excellently described by Tim Harford, illustrates the importance of both causal inference and sampling bias.

"Big Data has arrived, but big insights have not," states Harford. "The challenge now is to solve new problems and gain new answers—without making the same old statistical mistakes on a grander scale than ever."

Dr. Koen Pauwels is distinguished professor of marketing at Northeastern University, in Boston. He is the author of Modeling Markets and Advanced Methods for Modeling Markets for researchers, and It's Not the Size of the Data—It's How You Use It for managers.

LinkedIn: Prof. dr. Koen Pauwels

Twitter: @romimarketer