Recap - dbt Semantic Layer and Metrics
At dbt’s 2022 Coalesce event, the Semantic Layer was released to the public. The core goal of the Semantic Layer is to pull definitions for Metrics (the hard, well-defined metrics like customer churn, annual recurring revenue, and monthly active users, which should have a standard definition across the whole business) out of end-user tools and store them in a central location where they are defined once and once only. The outcome of this exercise is that everybody connects to the exact same set of metric definitions, and everybody calculates metrics in a standardized, consistent way.
The architecture upon which dbt has built its Semantic Layer consists of three core components:
- Metric nodes – These are simply the metric definitions that exist in your dbt Project, configured in .yml files.
- Proxy Server – This is an intermediary server between a third party and your data warehouse. The third party in this scenario could be dbt Cloud, an integrated partner solution, or it could be you sending a query from your local machine. The job of the Proxy Server is to convert SQL queries from the client that contains jinja and metric syntax into typical SQL that your data warehouse can understand and return results for.
- Metadata API – This API allows an authorized third party to access information about the metrics specified in a dbt project, including the list of metrics available and the grains or dimensions associated with each metric.
This illustration from dbt’s documentation shows how the different components are connected:
Image credit: dbt Labs
How can we query metrics?
Currently, a handful of companies partner with dbt to integrate directly with the dbt Semantic Layer and expose dbt metrics in out-of-the-box solutions. These range from BI solutions like Lightdash, Hex, and Mode to data cataloging staples like Atlan and Collibra. The list of partner integrators is available here, and if growth in this space is anything like growth in dbt usage itself, the list is likely to grow massively over the coming months.
However, not everybody uses the current set of partner integration solutions, and it’s unclear how soon, or likely, it will be that more common enterprise solutions (think Tableau, Power BI, Alteryx) will work with dbt to support metrics in such an automated way. If the tools at your company aren’t represented in the list of partner integrations yet, it might seem like dbt metrics are a thing of the distant future. Building a custom integration yourself that fully traverses the entire Semantic Layer architecture may be a daunting prospect.
But what if users want to query metrics with the analytical tools they use daily? For this, it’s possible to send queries directly to the Proxy Server:
Suppose we know what metric we want to query and what options we can provide for the metric calculation. In that case, we can provide this information to the Proxy Server in the form of a SQL query that uses the calculate() macro from the metrics package. This macro aggregates a metric to the specified dimensions and returns the dataset based on the parameters selected in the macro.
Here is an example query for calculating a metric called n_injuries, with some additional parameters:
In the remainder of this post, we will illustrate several ways you can calculate metrics using this method in the most commonly available analytical tools. In the following examples, Snowflake is used as the underlying data warehouse – methods using other warehouses are likely to be similar but different.
dbt Metrics from a SQL IDE
Firstly, it’s possible to calculate metrics in your favorite SQL IDE (dBeaver and DataGrip are but a few popular solutions) by opening a connection to the Proxy Server. Here is an example in dBeaver connected to a dbt Proxy Server in front of a Snowflake instance. Create a new Snowflake connection, and enter the dbt Proxy Server address as the Host, keeping your credentials and other Snowflake parameters as you would usually have them.
To calculate a metric, use the syntax mentioned above of the dbt Metrics package. The dbt Proxy Server converts this query into SQL that Snowflake understands, executes the query against Snowflake, and returns the metric values.
This enables any analyst who writes SQL to pull metrics from the Semantic Layer, by (1) reading the dbt project for available metrics and parameters, (2) filling out the dbt metrics.calculate() function in their query and (3) using a SQL IDE to run the query against the dbt Proxy Server.
dbt Metrics from Python
Not everybody is an analyst and not everybody works from a SQL IDE. Python is the world’s most commonly used programming language for data science, and dbt metrics will be indispensable for data scientists to use metrics in their models and analyses.
Calculating dbt metrics from Python can be achieved in much the same way as through a SQL IDE, by pointing your connection at the dbt Proxy Server. In our example, we used the Snowflake Connector for Python. To make a successful connection, keep the Account parameter pointed at your Snowflake instance (this is a required parameter), but set the Host parameter to the dbt Proxy Server URL.
The following code creates a connection like this (the credentials are stored in environment variables), and executes the same metric query as above, but from Python.
The ability to query metrics from Python enables analysts and data scientists using any Python-enabled platform (think Databricks, Dataiku, Alteryx, etc.) to calculate metrics consistently, regardless of their platform.
dbt Metrics for the rest of us
Not everybody uses a SQL IDE and not everybody writes Python. What about everybody else? The dbt Proxy Server can of course, be hit via ODBC, surfacing metrics in local applications like Microsoft Excel. While not exactly the core of the Modern Data Stack, it’s essential to show that dbt metrics can be widely accessed, even outside modern tooling.
To set up an ODBC connection to the dbt Proxy Server, first download and install the Snowflake ODBC Driver (if you use a different Data Warehouse, you would get the relevant driver). In the ODBC Data Sources application (this is installed on all Windows machines), create a new User DSN and select the Snowflake Driver. In the configuration, enter the dbt Proxy Server URL as the Server parameter for the configuration, and set the rest of your Snowflake credentials as you normally would.
You can then use this DSN in your favorite ODBC-enabled application to send a query for a given dbt Metric. In Microsoft Excel, you can navigate to Data>Get Data>From Other Sources>From ODBC to create a query.
In the following dialogue, select your DSN from the dropdown menu. Expand Advanced options and enter your metrics query into the SQL statement box.
Implement dbt at Your Organization With Aimpoint Digital
We hope this post has been helpful in showcasing how accessible dbt metrics are, even in this early stage of development. If you are interested in implementing dbt metrics at your organization, please reach out using the Contact Us form below!