Among the misconceptions regarding Big Data, two important ones stand out: that correlations alone suffice and that Big Data means sampling bias is no longer an issue.
Fooled by Association
First, some Big Data mining advocates claim that correlations suffice and that the quest for causal interpretation should be abandoned. The real danger in this view is that you will be "fooled by association," as explained in Freakonomics.
I consulted for car company managers who were upset because, although profits were up, a "key performance indicator" was down. Causal analysis made it clear that this indicator did not cause or lead profits: its earlier correlation with profits had been a coincidence, and the relationship had reversed in recent periods. With that established, managers could refocus their energy on moving the indicators that do lead sales.
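A coincidental correlation like this is easy to reproduce. As a minimal sketch (not the causal analysis used with the car company, whose method and data are not described here), the Python below generates two independent random walks under the hypothetical names `profits` and `kpi`. Because each series wanders persistently, their levels can look strongly correlated over the full sample, and that correlation can weaken or change sign from one window to the next. Correlating period-to-period changes (first differences) strips out this wandering and typically shows little relationship. This is a diagnostic for spurious correlation, not a test of causation.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Cumulative sum of independent Gaussian steps; unrelated to any other series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """Period-to-period changes (first differences)."""
    return [b - a for a, b in zip(series, series[1:])]


def windowed_correlations(x, y, window):
    """Correlation within consecutive, non-overlapping windows."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(0, len(x) - window + 1, window)]


rng = random.Random(42)
profits = random_walk(500, rng)  # hypothetical "profits" series
kpi = random_walk(500, rng)      # hypothetical KPI, generated independently of profits

print("levels, full sample:", round(pearson(profits, kpi), 2))
print("levels, by window:  ", [round(r, 2) for r in windowed_correlations(profits, kpi, 100)])
print("first differences:  ", round(pearson(diff(profits), diff(kpi)), 2))
```

Changing the seed and rerunning shows how unstable the level correlations are, while the correlation of first differences stays close to zero.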
The Sampling Bias
Second, Big Data sometimes creates the illusion that sampling bias, a familiar problem with small data, is no longer an issue because the data capture the entire population. However, "N = all is often an assumption rather than a fact about the data" (Kaiser Fung, Numbersense).
For example, your social media data may accurately capture online sentiment, but only for consumers who are online and who care enough about your brand and product category to comment through that channel.
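A small simulation makes the point concrete. As a sketch with entirely hypothetical parameters (the 70% online share, the link between engagement and sentiment, and the `e ** 2` posting probability are illustrative assumptions, not estimates), the Python below captures every comment that is posted, yet the average sentiment in those comments is pulled toward the most engaged consumers and drifts away from the population average. Having all of the data from a channel is not the same as having the whole population.

```python
import random


def simulate_population(n, rng):
    """Hypothetical consumers as (sentiment, engagement, is_online) tuples."""
    people = []
    for _ in range(n):
        engagement = rng.random()  # 0 = indifferent, 1 = highly engaged
        # Assumption: more engaged consumers tend to feel more positively about the brand.
        sentiment = rng.gauss(0.6 * engagement - 0.3, 0.4)
        sentiment = max(-1.0, min(1.0, sentiment))  # clip to [-1, 1]
        is_online = rng.random() < 0.7  # assumed share of consumers who are online
        people.append((sentiment, engagement, is_online))
    return people


def mean(values):
    return sum(values) / len(values)


rng = random.Random(7)
population = simulate_population(100_000, rng)

# Every comment is captured ("N = all" of the comments), but who comments is not random:
# only online consumers post, and the chance of posting rises with engagement.
comments = [s for s, e, online in population if online and rng.random() < e ** 2]

true_mean = mean([s for s, _, _ in population])
observed_mean = mean(comments)
print(f"population sentiment:        {true_mean:+.2f}")
print(f"sentiment in all comments:   {observed_mean:+.2f}")
print(f"share of consumers captured: {len(comments) / len(population):.0%}")
```

Collecting more comments would not close the gap, because the bias comes from who chooses to comment, not from how many comments are collected.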
In recent research across 15 product categories, we compared how well representative offline survey metrics (awareness, consideration, and liking) and online behavior metrics (paid ad clicks, site visits, and social media conversations) explain and predict sales. We found that online behavior metrics performed better in short-term predictions, whereas offline survey metrics performed better in medium-term predictions.