Learn to recognise and fight the most common biases in data analysis.
Data-informed decisions deliver great results when used wisely, but we've also seen many examples where data doesn't help. Most of the time, this points to biases in the way the data was collected or processed, and it is one of the main reasons why I prefer the term "data-informed" to "data-driven".
Decisions should be supported by data, not the other way around. Here is a sample of typical biases data teams can stumble upon, along with some tips to avoid them.
Confirmation bias is the star of data biases, even though it is closely tied to how our brain works in general.
It happens every day, sometimes without you even knowing, and it can be frustrating for Data Analysts. When the initial question turns out to be "Can we prove that our retention is above 60%?", the answer will most probably be "Retention is at 60%", whatever actual insights could have been extracted from the data. This is confirmation bias.
Social media strongly favors the emergence of confirmation bias. Very often, social media platforms promote content you are likely to agree with, creating echo chambers in which the initial bias grows ever more confident. The impact of confirmation bias on attitudes toward Covid-19 vaccines has recently been studied.
The most powerful way to fight this bias is to write down your initial beliefs and assumptions. Once you are close to a conclusion, check as thoroughly as possible that your results are not merely a product of those initial assumptions and that you haven't overlooked other possible outcomes along the way.
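A small sketch of that discipline, applied to the retention question above: the hypothesis and threshold are written down before the analysis, and the result is reported as a 95% Wilson confidence interval rather than a single figure, so "inconclusive" stays a possible outcome. The counts and cohort sizes are hypothetical, and the Wilson interval is just one reasonable choice of interval for a proportion.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return centre - half_width, centre + half_width


# Written down BEFORE looking at the data (hypothetical threshold).
HYPOTHESIS = "Retention is above 60%"
THRESHOLD = 0.60


def assess_retention(retained: int, cohort_size: int, threshold: float = THRESHOLD) -> str:
    """Only call the hypothesis supported if the whole interval clears the threshold."""
    low, high = wilson_interval(retained, cohort_size)
    if low > threshold:
        return "supported"
    if high < threshold:
        return "contradicted"
    return "inconclusive"


print(f"Hypothesis: {HYPOTHESIS}")
for retained, cohort in [(620, 1000), (6200, 10000), (550, 1000)]:
    print(f"{retained}/{cohort}: {assess_retention(retained, cohort)}")
# 620/1000: inconclusive
# 6200/10000: supported
# 550/1000: contradicted
```

Note the first case: a 62% point estimate on 1,000 users does not, on its own, prove that retention is above 60%.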
Availability bias is one of my favorites as a former Product Manager.
"This new release is a success. We've reached 99.9% bug-free." A couple of hours later, the Customer Success team complains about an unusually high number of open issues. The reality? The bug tracking software itself had a problem and was not reporting actual crashes. You only saw data coming from users for whom everything was fine. (Between us, confirmation bias is also hiding here, as you were longing for a bug-free release.)
Second example: product analytics are usually consent-based! This means that not all of your users report analytics. When drawing conclusions like "80% of our website visitors watch our video", make sure the available data is representative of your whole population; otherwise your availability bias turns into... selection bias.
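A minimal sketch of such a representativeness check, assuming you can count all users per segment from a source that does not depend on analytics consent (for example server-side records). The segment names, counts and the 10-point tolerance are hypothetical. It compares each segment's tracking coverage with the overall coverage and flags segments that are far off.

```python
from typing import Dict, List


def coverage_by_segment(total_users: Dict[str, int], tracked_users: Dict[str, int]) -> Dict[str, float]:
    """Share of users in each segment who actually report analytics."""
    return {
        segment: tracked_users.get(segment, 0) / total
        for segment, total in total_users.items()
        if total > 0
    }


def skewed_segments(
    total_users: Dict[str, int],
    tracked_users: Dict[str, int],
    max_gap: float = 0.10,
) -> List[str]:
    """Segments whose coverage differs from the overall coverage by more than max_gap."""
    overall = sum(tracked_users.get(s, 0) for s in total_users) / sum(total_users.values())
    coverage = coverage_by_segment(total_users, tracked_users)
    return sorted(s for s, c in coverage.items() if abs(c - overall) > max_gap)


# Hypothetical counts: consent rates differ a lot between devices.
total = {"desktop": 1000, "mobile": 3000, "tablet": 500}
tracked = {"desktop": 800, "mobile": 600, "tablet": 175}
print(skewed_segments(total, tracked))  # ['desktop', 'mobile']
```

Here desktop users are heavily over-represented in the tracked data, so a figure like "80% of visitors watch our video" computed on tracked users alone would mostly describe desktop behavior.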
To reduce availability bias, whenever a figure suddenly drops or jumps, consider what might have changed in the conditions of your experiment. Did something break in your data pipeline?
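One simple way to catch such breaks, sketched below under the assumption that you have a daily count per metric (the 7-day window and 50% threshold are arbitrary illustrative choices): compare each day with the median of the previous days and flag large relative changes. A crash count collapsing to almost zero, as in the release example above, would then be flagged for investigation rather than celebrated.

```python
from statistics import median
from typing import List, Sequence


def flag_sudden_changes(values: Sequence[float], window: int = 7, max_rel_change: float = 0.5) -> List[int]:
    """Indices whose value deviates from the median of the previous `window`
    values by more than `max_rel_change` (0.5 = 50%)."""
    flagged = []
    for i in range(window, len(values)):
        baseline = median(values[i - window:i])
        if baseline == 0:
            # Any activity after a silent baseline is worth a look too.
            if values[i] != 0:
                flagged.append(i)
        elif abs(values[i] - baseline) / abs(baseline) > max_rel_change:
            flagged.append(i)
    return flagged


# Hypothetical daily crash reports: the last two days look "bug-free".
daily_crashes = [40, 38, 42, 41, 39, 40, 43, 2, 1]
print(flag_sudden_changes(daily_crashes))  # [7, 8]
```

Using the median of the trailing window keeps a single odd day from distorting the baseline. A flag is only a prompt to check the pipeline, not proof that something broke.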
Selection bias is the last one for today, even though the list is actually longer.
At an e-commerce company, you would expect customers who return a product to have a lower NPS (Net Promoter Score) than those who don't. Right?
Looking at the data, they actually have a higher NPS. You might conclude that they are happy with the return process, but the actual cause is selection bias.
Customers who send products back are more likely to be repeat customers (they place more orders, so they have more opportunities to return one), and repeat customers are more likely to give a higher NPS. So customers who send products back end up with a higher NPS... If you want to see the actual impact of returns, compare similar categories of customers: those who ordered once, twice, three times, and so on.
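The stratified comparison can be sketched in a few lines. NPS is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6); the survey responses and order buckets below are made up purely to illustrate the effect, in which the pooled comparison and the within-bucket comparison point in opposite directions.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Response = Tuple[str, bool, int]  # (order_bucket, returned_a_product, score 0-10)


def nps(scores: List[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("cannot compute NPS without responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def nps_by_group(rows: List[Response], key: Callable[[Response], Hashable]) -> Dict[Hashable, float]:
    """Group responses by `key` and compute the NPS of each group."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: nps(v) for k, v in groups.items()}


# Made-up survey data: repeat customers both return more often and score higher.
SCORES = {
    ("1 order", False): [10, 8, 6, 5, 9, 3, 7, 6],
    ("1 order", True): [5, 4],
    ("3+ orders", False): [10, 10, 9],
    ("3+ orders", True): [10, 9, 9, 10, 8, 6, 10, 9],
}
responses = [(bucket, returned, s) for (bucket, returned), scores in SCORES.items() for s in scores]

pooled = nps_by_group(responses, key=lambda r: r[1])
stratified = nps_by_group(responses, key=lambda r: (r[0], r[1]))

print(f"pooled: returned {pooled[True]:.1f} vs kept {pooled[False]:.1f}")
# pooled: returned 30.0 vs kept 9.1  -> returners look happier
for bucket in ("1 order", "3+ orders"):
    print(f"{bucket}: returned {stratified[(bucket, True)]:.1f} vs kept {stratified[(bucket, False)]:.1f}")
# 1 order: returned -100.0 vs kept -25.0
# 3+ orders: returned 62.5 vs kept 100.0
```

Within each bucket, customers who returned a product score lower; the pooled figure only looks better because returners are concentrated among repeat customers.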
There are many more data biases: survivorship, outlier, historical, ...
It is important to know the most common ones as a data expert, but you should also share examples of them with your business users. When business users self-serve data, they might run into these biases without even noticing. "Data champions" are a great way to spread this knowledge around.