Three Big Data Biases That Screw Up Decision Making

I firmly believe that an organization's ability to aid (if not automate) empirical decision making separates the winners from the rest. Big data analytics plays a big role in that.

Be it popular tech blogs, newspaper articles or CIO conferences, there is huge hype around how analysing big data has helped organizations get actionable insights on previously intractable unknowns. And this hype is not unfounded. It only becomes problematic when it leads to what I call "data fanaticism": the notion that correlation always indicates causation, and that datasets and analytics always portray the truth lucidly and accurately. So whenever I come across phrases like "...the numbers speak for themselves" or "In God we trust; all others must bring data", I ponder: can data science deliver on its promise? Do numbers really speak for themselves?

Sadly, datasets aren't perfect. They are human creations, and therefore aren't objective; after all, it is humans who give datasets their voice and form. Hidden biases in the data collection and analysis stages pose considerable risks, and CIOs need to be mindful of them.

Here are three biases we encounter often:

Selection Bias

Selection bias occurs when data is chosen subjectively rather than objectively, or when a non-random subset is selected. Because the selected dataset does not represent the complete population, the results are skewed.

Examples - Feedback forms sent to customers; making a feature live on only a selected page.

In both cases the selected dataset is not random, and hence the insights derived from it are bound to be skewed.
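As a minimal sketch of the feedback-form case (all figures invented for illustration), suppose that the more satisfied a customer is, the more likely they are to return the form. The responders are then a non-random subset, and the survey average drifts above the true average:

```python
import random

random.seed(42)

# Hypothetical population: satisfaction scores (1-5) for all customers.
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed response model: a customer with score s responds with probability s/5,
# so happier customers are over-represented among responders.
responders = [s for s in population if random.random() < s / 5]

true_mean = sum(population) / len(population)
survey_mean = sum(responders) / len(responders)

print(f"True mean satisfaction:   {true_mean:.2f}")
print(f"Survey mean (responders): {survey_mean:.2f}")  # biased upwards
```

The gap between the two means is pure selection bias: nothing about the customers changed, only who made it into the sample.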


Outliers

Outliers are extreme data values that fall significantly above or below the range of normal values or the pattern of the normal distribution. They are particularly dangerous when you are building systems that take automated decisions based on historical averages.

Example - Demand forecasting for an item based on average historical daily sales.

One day of disproportionate sales of an item can cause the average to spike significantly. To mitigate this, normalise the average and nullify the effect of outliers by applying the three-sigma (standard deviation) rule.
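The three-sigma rule above can be sketched as follows (the sales figures are invented; one flash-sale day of 400 units sits among otherwise ordinary days):

```python
import statistics

# Hypothetical daily unit sales for one item; the last day is a flash-sale outlier.
daily_sales = [18, 22, 19, 21, 20, 23, 17, 20, 22, 19,
               21, 18, 20, 24, 19, 22, 20, 21, 18, 23,
               20, 19, 21, 22, 20, 18, 23, 21, 19, 400]

mean = statistics.mean(daily_sales)
sigma = statistics.pstdev(daily_sales)

# Keep only values within three standard deviations of the mean.
filtered = [x for x in daily_sales if abs(x - mean) <= 3 * sigma]
robust_mean = statistics.mean(filtered)

print(f"Raw average:      {mean:.1f}")     # inflated by the outlier
print(f"Filtered average: {robust_mean:.1f}")
```

Forecasting demand from the raw average would overstock the item; the filtered average reflects typical daily sales.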

Simpson's Paradox

According to Simpson's Paradox, a trend or pattern found in separate groups of data can reverse when the groups are combined. This occurs because the absolute numbers, or weightages, are disproportionate across the split. It is one of the most common biases in data analysis and is deeply counterintuitive.

Example - Open Rates of Email Marketing Campaign

(Table: weekly and overall open rates of Campaign A and Campaign B across Week 1 and Week 2)
While the open rate of campaign A is better than that of campaign B in both weeks, the overall open rate of campaign B is better than that of A.

Final Take

There are many other kinds of statistical and cognitive bias, but the three presented above are especially common and often distort decision making.

This raises the next question: how do we address these biases? While the raw data may look abstract, qualitative analysis of a dataset's meta information can reveal what biases it might bring to its interpretation. CIOs should make a habit of asking where the data came from and what methods were used to gather and analyse it.

There is a strong case for complementing data sources with a qualitative outlook. Insights can be found at multiple levels of granularity, and by combining such methods with analytics we can add depth to the data we analyse. This may make the challenge of understanding big data more complex, but it would bring context-awareness to our predictive systems.


About Author


Suyash Katyayani

Suyash Katyayani is CTO at
