I am back from a very intense week in New Orleans for Coalesce, the annual dbt conference. It was fun to meet so many passionate data people: dbt Labs members, day-to-day practitioners, or "people like us" vendors...
I wanted to present dbt to all our readers (for those who don't know it) and explain what the Semantic Layer they introduced is about.
For the Analytics Engineers reading this, we know you know, so you can skip this part.
dbt (data build tool) was created as an open-source package that simplifies data transformation in data warehouses using SQL.
While the idea is not new, their execution was exceptional. In particular, there are two factors they nailed.
First, their community. They now gather 50k+ data practitioners in their Slack, giving them real power in the market. Second, their opinionated product. They limited their initial feature set to a minimum while providing exceptional guardrails for both junior and experienced data engineers to work with best-in-class practices. The result was astonishing growth.
Transformation is the step in which the "production" raw data in the warehouse is transformed (😉) and prepared into an "analytics" version of the same data that can be read by a human or by BI tools. Very often, this transformation step embeds the business logic: the definition of a specific metric, or the list of items you want to get rid of because you know they come from a bug or from a bot. Very often, this step is also the best moment to "denormalize" your data. On the production side, data is never repeated twice, which leads to architectures that are powerful and easy to update, but complex to aggregate. On the analytics side, you'd rather duplicate some information if it helps run aggregations more quickly or eases the life of a business user.
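As a sketch of what such a denormalizing transformation looks like in dbt (all table and column names here are hypothetical), a model might be:

```sql
-- models/analytics/fct_orders.sql (hypothetical example)
-- Denormalize: repeat customer info on each order row so business
-- users can aggregate without joins, and filter out known bad rows.
select
    o.order_id,
    o.ordered_at,
    o.amount,
    c.customer_name,  -- duplicated on purpose, for easy aggregation
    c.plan
from {{ ref('stg_orders') }} as o
join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
where o.amount > 0    -- drop rows caused by a known bug
  and not c.is_bot    -- drop bot traffic
```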
Learn more about data transformation on the dbt blog.
Now enters the Semantic Layer.
One of the biggest analytics issues of the past years is the famous: "I don't get the same result to the same question." Mmmmh. This is something all companies have experienced, and for the sake of trust it should disappear.
Until now, dbt transformation could model the data, but the final aggregation happened in the various BI tools. Some BI tools (Looker with LookML as the flagship) even force data teams to do both modeling and aggregation in their tool.
Modeling for a specific tool leads to computation differences when using various BI tools (which is VERY frequent), but it also creates very high switching costs, frightening data teams.
dbt suggests a new layer above the traditional modeling. You can define a metric as a specific aggregation on a column, which can then be looked at through a predefined set of filters and dimensions. MRR can be defined as the SUM of the VALUE column, which can then be aggregated by transaction date, organization, plan, and so on.
It comes as a set of files used to describe what you expect for a specific metric. Those files have the exact same characteristics as their dbt model counterparts: they're defined via simple versionable, git-friendly files, therefore available as code, and the structure is open source.
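As a sketch, the MRR example above could be declared like this in a metrics YAML file, following the spec dbt shipped around this launch (model and column names are hypothetical):

```yaml
# metrics/mrr.yml (hypothetical model and column names)
metrics:
  - name: mrr
    label: MRR
    model: ref('fct_transactions')
    calculation_method: sum      # the aggregation
    expression: value            # the column to aggregate
    timestamp: transaction_date
    time_grains: [day, month, quarter]
    dimensions:
      - organization
      - plan
```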
Any client (think BI tools, notebooks, ...) can then call the dbt API with a description of what it needs (the metric, filters, and dimensions) and get the result set back.
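At launch, one way a client could express such a request was a query compiled through the dbt Semantic Layer using the `dbt_metrics` package's `calculate` macro. A sketch, reusing the hypothetical `mrr` metric above:

```sql
-- Sent by any client through the dbt proxy; the Semantic Layer
-- compiles it against the warehouse and returns the result set.
select *
from {{ metrics.calculate(
    metric('mrr'),
    grain='month',
    dimensions=['plan']
) }}
```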
Get dbt's vision of the new Semantic Layer feature.
This layer was long expected.
A great semantic/metrics layer needs to work across BI tools so computations stay the same in dashboards, notebooks, machine learning algorithms, ...
It's very difficult for any existing vendor to build such a layer, as most of them built their own modeling 1. to offer a great dashboard experience but 2. to create switching costs for their users... This "headless" layer makes a lot of sense for the community (and for dbt 😉).
I hope dbt succeeds in making this Semantic Layer widely used. But it's not a done deal yet.
At Husprey, we made the decision to integrate with the Semantic Layer over the next few months, and we will be happy to share the first implementation of the BigQuery dbt Semantic Layer as soon as it's available.
My reservations are mostly about some technical features (joining entities, for example) that should soon be taken care of, and about the monetization of their open-source software. They are entering a test-and-iterate phase I am really curious about.
Husprey recently integrated fully with dbt (yay!). All Husprey users can now sync their data using dbt Core and dbt Cloud.
This integration allows Data Analysts to see all transformation metadata directly alongside their queries and notebooks. It enables faster iteration, whether they are investigating a data quality issue or answering an ad-hoc request.
Everything about this integration is very much aligned with our vision. We believe documenting your data should happen next to your actions: documenting your columns while transforming your data with dbt, and documenting your investigation while iterating on an ad-hoc request in Husprey.
As I said above, our Husprey team is ready to add the Semantic Layer as soon as it's made available. Stay tuned!
You can access the Husprey guide on integrating with dbt; it only takes two minutes ⏱
We offer a two-week free trial, giving you time to connect your data source, build notebooks, and test out the many features available. Join the modern data analysts and sign up for Husprey.