The sun starts to shine; how do you expect to keep your data fresh?
This week, I was asked the following question: "Very often on Notion, and actually in every Knowledge Management tool, the quality and freshness of the information is still an issue. How do you deal with that in Husprey?"
Well, this is a more than appropriate question. What is freshness? And what tips can help mitigate freshness issues around insights?
Is the data I am looking at up-to-date? That would be my definition of freshness for data.
Very often, in order to evaluate freshness, we can check "last updated" timestamps, sometimes associated with the last executed computation. This is, for example, how dbt suggests computing freshness. Freshness issues can appear when something stops working in the data pipeline. The error might or might not be displayed to the end user, but in both cases, trust is impacted — read more on Trust in data.
Is the insight/piece of information I am looking at up-to-date? This one is trickier to define. An analyst might have answered a question from a stakeholder and sent an email stating something pretty brief: "Churn is decreasing since our latest pricing update," with a screenshot of a chart backing up her point. However, was that only true for a short period of time, or is it a generalized trend? Has an important event, like another pricing change, impacted the metric again since then?
Metric freshness is (mostly) about the last data point being up-to-date or not.
Insight freshness is more complex: it should take into account the insight itself, but also the changes that might have occurred in the meantime and impacted its conclusions.
Notebooks (Notion, Slite, even Google Docs, ...) are a great help for conveying all the context around an insight, ensuring a future reader (and it might be you) has the keys to understand and exploit it properly. And enter data notebooks. Not only do they let you convey the needed context, you can also very easily duplicate one and check whether the findings are still true. A quick check, without having to set up an entire environment, is then available to analytics teams of all sizes.
No matter your current way of sharing findings and insights, there are a few tips that can help you and your stakeholders.
A first tip: explicitly state the scope of your analysis, e.g. "We are looking at the impact of our latest onboarding flow release on the signup conversion rate".
Readers then understand that they should only reuse those conclusions later, when working on the next iteration of the onboarding flow. Even if they won't reuse the same specific figures, that first analysis can serve as a relevant starting point.
Another tip: depending on the type of analysis you run and the frequency of past changes, you might be able to define a timeframe during which your results hold true. It can be a simple line next to the conclusion stating that timeframe: a month, a quarter, a year. Beyond it, when someone needs to work on the same topic again, it is worth quickly re-running the analysis to see whether any changes should be taken into account.
A last tip: stick to standard, well-known metrics. Why? Stakeholders understand them easily because they are used to them.
They ease impact comparison from one analysis to another, and they greatly reduce the time spent inventing new metrics (which are sometimes created only to support your point). Most importantly for the topic at hand, they benefit from long-term support, so running the analysis again will be easier.
One might argue, "What happens when the underlying model has changed and you cannot easily duplicate an analysis?" Great answers to this question are to be expected in the years to come, both on the data lineage side and on the metric impact side.
Our role at Husprey will be to make sure this metadata is easily available in any notebook, next to any conclusion so stakeholders will be able to read and reuse insights with trust!