Among the misconceptions regarding Big Data, two important ones stand out: that correlations alone suffice and that Big Data means sampling bias is no longer an issue.
Fooled by Association
First, Big Data mining advocates claim that correlations suffice and the quest for causal interpretation should be abandoned. The real danger is that you will be "fooled by association," as explained in Freakonomics.
I consulted car company managers who were upset because, though profits were up, a "key performance indicator" was down. After causal analysis, what became clear was that this indicator did not cause or lead profits; the correlation was merely a coincidence and turned around in recent periods. As a result of the causal analysis, managers could refocus their energies on moving those indicators that do lead to sales.
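To make the "turned around in recent periods" point concrete, here is a minimal sketch that computes the correlation between an indicator and profits separately for an early and a recent window. The numbers, the variable names, and the `pearson` helper are all invented for illustration; they are not the data from the car company case.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data for eight periods (assumption, not real figures):
# the indicator rises and then falls while profits keep growing.
kpi = [10, 12, 14, 16, 16, 14, 12, 10]
profit = [100, 110, 120, 130, 140, 150, 160, 170]

early = pearson(kpi[:4], profit[:4])    # first four periods
recent = pearson(kpi[4:], profit[4:])   # last four periods
overall = pearson(kpi, profit)          # all eight periods

print(f"early: {early:+.2f}, recent: {recent:+.2f}, overall: {overall:+.2f}")
```

In this made-up series the indicator moves perfectly with profits in the early window and perfectly against them in the recent one, so a correlation measured on the early window alone would have been a poor guide. Checking whether an association holds across periods is only a first sanity check, not a substitute for causal analysis.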
The Sampling Bias
Second, Big Data sometimes gives the illusion that sampling bias is no longer an issue (as it is for small data) because the data capture the entire population. However, "N = all is often an assumption rather than a fact about the data" (Kaiser Fung, Numbersense).
For example, your social media data may accurately capture online sentiment but only for those consumers who are online and care enough about your brand and product category to comment through the online channel.
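The gap between "everyone who commented" and "everyone" can be sketched with a small, deliberately artificial population. The sentiment scores, group sizes, and names below are assumptions chosen for illustration, not measurements.

```python
from statistics import mean

# Hypothetical population of 100 consumers (assumption, not real data).
# Each record is (sentiment score from 1 to 5, comments online?).
population = (
    [(5, True)] * 30      # enthusiastic fans who post about the brand
    + [(1, True)] * 10    # unhappy customers who post complaints
    + [(3, False)] * 60   # lukewarm majority who never comment online
)

def average_sentiment(records):
    """Mean sentiment score over a list of (score, is_online) records."""
    return mean(score for score, _ in records)

# What social media data captures: only those who comment online.
online_only = [record for record in population if record[1]]

population_mean = average_sentiment(population)
online_mean = average_sentiment(online_only)

print(f"all consumers: {population_mean:.1f}, online commenters: {online_mean:.1f}")
```

Here the online data covers every single comment, yet it describes only 40 of the 100 consumers and overstates average sentiment, because the silent majority differs from the people who post. Collecting more comments from the same channel would not close that gap.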
In recent research across 15 product categories, we compared the power of representative offline survey metrics (awareness, consideration, and liking) and online behavior metrics (paid ad clicks, site visits, and social media conversations) to explain and predict sales. We found that online behavior metrics excelled in short-term predictions, but that offline survey metrics excelled in medium-term predictions.
What Have We Learned?
Increasing the size of the data does not absolve us from the challenges of drawing meaningful inferences from it. The recent review of the Google Flu Trends "success" story, excellently described by Tim Harford, illustrates the importance of both causal inference and sampling bias.
"Big Data has arrived, but big insights have not," states Harford. "The challenge now is to solve new problems and gain new answers—without making the same old statistical mistakes on a grander scale than ever."