Semantic layer: what are the benefits for the business?

Tags: Data Engineering, Semantic Layer, Self Serve Analytics, Data Analytics
Published: April 17, 2024

Semantic layers are getting more and more attention in the data industry, and for good reason.
A lot has been written about semantic layer features: APIs, caching, permissions…
However, it is harder to find details about the benefits one actually gets from using a semantic layer.
If semantic layers are becoming more and more popular, why don't we hear more about the value they bring to the business?
This is what I will try to bring to light in this short write-up.

👨‍🏫  Small recap: what are semantic layers?

A semantic layer is a layer that sits between your database and your data visualisation or notebook tool. It centralises metric and entity definitions and lets tools access the database through an API. Its main job is to generate SQL queries from a query definition.
 
To be accurate, semantic layers are nothing new. Tools like Power BI or Business Objects (for the veterans among us) have their own embedded semantic layers.
The novelty in the modern data stack (no, it's not dead 😏) is that semantic layers have become a standalone part of the stack. The main tools include Cube and the dbt Semantic Layer.
You can read a good comparison here.
The main difference I would like to highlight is that Cube is open source and can be self-hosted, whereas dbt's semantic layer requires you to be a paying dbt Cloud customer.

💰 What are the benefits?

The benefits of using a semantic layer depend on the type of user.
For developers, the benefits are generally pretty clear; less so from a business user's perspective.
In my opinion, it is important that everyone on the data team, as well as the more data-savvy stakeholders, understands why it is so useful 👇

Less time spent on report maintenance

Why is that? Semantic layers centralise metric definitions as code. When a metric needs to be added or updated, the change only has to be made once in the semantic layer (generally a YAML or Python file), and just like that, every report is in sync 💫
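To illustrate the maintenance point, here is a minimal Python sketch, assuming a hypothetical in-memory metric registry (`METRICS`) and hypothetical `users` columns (`country`, `plan`, `last_login_days`). It is not the YAML schema of Cube or dbt; it only shows the mechanism: reports reference a metric by name, so redefining it in one place changes the SQL of every report.

```python
# HYPOTHETICAL metric registry: the one place where metrics are defined.
# Not Cube's or dbt's schema format; the column names are assumptions too.
METRICS = {
    "active_subscribers": "COUNT(DISTINCT CASE WHEN is_active THEN user_id END)",
}


def report_sql(metric: str, group_by: str) -> str:
    """Build the SQL a report runs for one metric, grouped by one column."""
    return (
        f"SELECT {group_by}, {METRICS[metric]} AS {metric} "
        f"FROM users GROUP BY {group_by}"
    )


def country_report() -> str:
    # Reports reference the metric by name, never by copying its formula.
    return report_sql("active_subscribers", "country")


def plan_report() -> str:
    return report_sql("active_subscribers", "plan")


if __name__ == "__main__":
    print(country_report())
    # The business redefines "active": must also have logged in within 30 days.
    # One edit here; neither report function changes.
    METRICS["active_subscribers"] = (
        "COUNT(DISTINCT CASE WHEN is_active AND last_login_days <= 30 "
        "THEN user_id END)"
    )
    print(country_report())
    print(plan_report())
```

The reports never embed the formula, which is exactly what removes the "find every dashboard that uses this metric" chore.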

Metric consistency across reports and dashboards

Long gone are the days when report A and report B showed different numbers. With a semantic layer, the surface for inconsistencies is considerably reduced. Again, this is because a single place holds the definitions of entities and metrics.
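The consistency argument can be checked on a toy dataset. This sketch assumes Python's built-in `sqlite3` and a hypothetical `users` table filled with made-up rows; it builds two reports from one shared metric expression and a third from a hand-typed copy that forgot the `is_active` filter.

```python
import sqlite3

# HYPOTHETICAL shared definition, stored in one place and reused by reports.
ACTIVE_SUBSCRIBERS = "COUNT(DISTINCT CASE WHEN is_active = 1 THEN user_id END)"


def make_db() -> sqlite3.Connection:
    """In-memory users table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, country TEXT, is_active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "FR", 1), (2, "FR", 0), (3, "UK", 1), (4, "UK", 1)],
    )
    return conn


def report_a(conn: sqlite3.Connection) -> int:
    """Report A: headline number, built from the shared definition."""
    return conn.execute(f"SELECT {ACTIVE_SUBSCRIBERS} FROM users").fetchone()[0]


def report_b(conn: sqlite3.Connection) -> int:
    """Report B: per-country breakdown summed back up, same shared definition."""
    rows = conn.execute(
        f"SELECT country, {ACTIVE_SUBSCRIBERS} FROM users GROUP BY country"
    ).fetchall()
    return sum(count for _, count in rows)


def report_c_handwritten(conn: sqlite3.Connection) -> int:
    """Report C: someone re-typed the metric and forgot the is_active filter."""
    return conn.execute("SELECT COUNT(DISTINCT user_id) FROM users").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    print(report_a(conn), report_b(conn), report_c_handwritten(conn))
```

Reports A and B both return 3 because they share `ACTIVE_SUBSCRIBERS`; the hand-written report C returns 4. Summing per-country distinct counts only matches the total here because each user belongs to a single country in the sample data.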

Analyst productivity skyrockets 🚀

The main job of the semantic layer is to generate SQL from an input JSON query.
A typical query could look like this:
```json
{
  "measures": ["active_subscribers"],
  "dimensions": ["country"],
  "timeDimensions": [{"dimension": "traffic_date", "dateRange": "Last month"}]
}
```
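This would translate to the following pseudo SQL:
```sql
SELECT
  `country`.name AS country,
  COUNT(DISTINCT CASE WHEN `users`.is_active THEN `users`.user_id ELSE NULL END) AS active_subscribers
FROM users `users`
LEFT JOIN traffic `traffic` ON `users`.user_id = `traffic`.user_id
LEFT JOIN country `country` ON `country`.name = `traffic`.country
WHERE `traffic`.traffic_date >= '<first day of last month>'
  AND `traffic`.traffic_date < '<first day of this month>'
GROUP BY 1
ORDER BY 2 DESC
```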
This is gold for analysts, as it becomes easy for them to self-serve SQL templates from the semantic layer.
The workflow here lets analysts compose their queries rapidly and get the resulting SQL instantly, usually through a UI (Cube's UI playground particularly shines here!).
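To show what generating SQL from a JSON query involves, here is a minimal sketch of a compiler for the example query in this section. The `MODEL` dictionary, its fixed join list and the column names are assumptions chosen to mirror the pseudo SQL; real semantic layers resolve join paths, SQL dialects and many more date ranges than the single `"Last month"` case handled here.

```python
import json
from datetime import date

# HYPOTHETICAL semantic model: member names mirror the example query; table
# and column names are assumptions, not Cube's or dbt's schema format.
MODEL = {
    "base_table": "users",
    "joins": [  # fixed join path, always emitted in this order
        "LEFT JOIN traffic ON users.user_id = traffic.user_id",
        "LEFT JOIN country ON country.name = traffic.country",
    ],
    "measures": {
        "active_subscribers": "COUNT(DISTINCT CASE WHEN users.is_active THEN users.user_id END)",
    },
    "dimensions": {
        "country": "country.name",
        "traffic_date": "traffic.traffic_date",
    },
}


def last_month(today: date) -> tuple[date, date]:
    """Half-open range [first day of last month, first day of this month)."""
    this_month = today.replace(day=1)
    if this_month.month == 1:
        start = this_month.replace(year=this_month.year - 1, month=12)
    else:
        start = this_month.replace(month=this_month.month - 1)
    return start, this_month


def compile_query(query: dict, today: date) -> str:
    """Turn a JSON query (measures, dimensions, timeDimensions) into SQL."""
    dims = [f"{MODEL['dimensions'][d]} AS {d}" for d in query.get("dimensions", [])]
    measures = [f"{MODEL['measures'][m]} AS {m}" for m in query.get("measures", [])]
    if not dims and not measures:
        raise ValueError("query needs at least one measure or dimension")
    where = []
    for td in query.get("timeDimensions", []):
        # Only the one relative range used in the example is supported.
        if td.get("dateRange") != "Last month":
            raise ValueError(f"unsupported dateRange: {td.get('dateRange')!r}")
        start, end = last_month(today)
        col = MODEL["dimensions"][td["dimension"]]
        where.append(f"{col} >= '{start.isoformat()}' AND {col} < '{end.isoformat()}'")
    sql = f"SELECT {', '.join(dims + measures)} FROM {MODEL['base_table']} "
    sql += " ".join(MODEL["joins"])
    if where:
        sql += " WHERE " + " AND ".join(where)
    if dims:
        # Group by every dimension position, as in the pseudo SQL.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dims)))
    if measures:
        sql += f" ORDER BY {len(dims) + 1} DESC"  # first measure, descending
    return sql


if __name__ == "__main__":
    raw = (
        '{"measures": ["active_subscribers"], "dimensions": ["country"], '
        '"timeDimensions": [{"dimension": "traffic_date", "dateRange": "Last month"}]}'
    )
    print(compile_query(json.loads(raw), today=date(2024, 4, 17)))
```

With `today=date(2024, 4, 17)` the time filter resolves to `traffic.traffic_date >= '2024-03-01' AND traffic.traffic_date < '2024-04-01'`. The half-open interval (`>=` start, `<` end) avoids having to know how many days last month had.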

Greater scalability for your self-serve BI platform

It is commonly said that self-serve BI solutions ultimately fail: by leaving too much flexibility to end users, they end up creating more chaos than aha moments.
Here, semantic layers are particularly useful for scaling self-serve analytics efficiently, thanks to:
  1. consistency of metrics
  1. up-to-date documentation (made easier by documentation as code)
  1. interoperability provided by APIs (e.g. GraphQL), which means all departments can access the same data, whether it's Technology through APIs or Sales and Marketing through spreadsheet tools that integrate with the semantic layer.
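A sketch of the interoperability idea, assuming a single made-up query result (`ROWS`) and two hypothetical adapters: one serialising it as JSON for engineering services, one as CSV for spreadsheet users. The `{"data": ...}` envelope is an assumption, not any specific product's response format. Both consumers see the same numbers because they read the same result.

```python
import csv
import io
import json

# HYPOTHETICAL result of one semantic-layer query (made-up rows).
ROWS = [
    {"country": "FR", "active_subscribers": 120},
    {"country": "UK", "active_subscribers": 95},
]


def as_json(rows: list[dict]) -> str:
    """What an engineering team's service might consume over an HTTP API."""
    return json.dumps({"data": rows})


def as_csv(rows: list[dict]) -> str:
    """What a spreadsheet connector might import: same rows, same numbers."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(as_json(ROWS))
    print(as_csv(ROWS))
```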
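🤖 Lately, we have been seeing a lot of hype around semantic layers and LLMs. The former helps the latter perform much better: instead of writing SQL against raw tables, an LLM can pick from a curated set of metric and dimension definitions, which means semantic layers reduce LLM hallucinations.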
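One way this can work, sketched under assumptions: the LLM is asked to emit a JSON query like the one in the productivity section rather than raw SQL, and that query is checked against the catalogue of known members before anything runs. `CATALOG` and `validate` are hypothetical names; this catches invented metrics, it does not guarantee the LLM picks the right one.

```python
# HYPOTHETICAL catalogue exposed by the semantic layer (names from the example query).
CATALOG = {
    "measures": {"active_subscribers"},
    "dimensions": {"country", "traffic_date"},
}


def validate(query: dict) -> list[str]:
    """Return every member an LLM-generated query references that does not exist."""
    unknown = [m for m in query.get("measures", []) if m not in CATALOG["measures"]]
    unknown += [d for d in query.get("dimensions", []) if d not in CATALOG["dimensions"]]
    unknown += [
        td["dimension"]
        for td in query.get("timeDimensions", [])
        if td["dimension"] not in CATALOG["dimensions"]
    ]
    return unknown


if __name__ == "__main__":
    # An LLM that invents "monthly_churn_rate" is caught before any SQL runs.
    print(validate({"measures": ["monthly_churn_rate"], "dimensions": ["country"]}))
```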

Conclusion

Semantic layers enable businesses to scale and sustain the value they get from their data. Productivity usually declines as more and more pipelines are churned out. The beauty of semantic layers lies in their ability to bring clarity and consistency to all data workloads across the company. They also bring security thanks to built-in governance (with the authorisation layer defined as code).
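To make "authorisation layer as code" more concrete, here is a minimal sketch of row-level security, assuming a hypothetical `POLICIES` mapping from role to allowed dimension values and a hypothetical filter shape (`member`, `operator`, `values`) that the query compiler is assumed to combine with AND. The role's filters are appended before the query is compiled; unknown roles are rejected rather than silently given full access.

```python
# HYPOTHETICAL access policies: which dimension values each role may see.
# Not Cube's or dbt's configuration format; just "authorisation as code".
POLICIES = {
    "analyst_fr": {"country": ["FR"]},
    "admin": {},  # no restriction
}


def apply_policy(query: dict, role: str) -> dict:
    """Return a copy of the query with the role's row-level filters appended."""
    if role not in POLICIES:
        # Default deny: an unknown role never falls through to full access.
        raise PermissionError(f"unknown role: {role!r}")
    secured = {**query, "filters": list(query.get("filters", []))}
    for member, allowed in POLICIES[role].items():
        # Appended, never replacing user filters, so a query can only narrow access
        # (assuming the compiler ANDs all filters together).
        secured["filters"].append(
            {"member": member, "operator": "equals", "values": list(allowed)}
        )
    return secured


if __name__ == "__main__":
    q = {"measures": ["active_subscribers"], "dimensions": ["country"]}
    print(apply_policy(q, "analyst_fr"))
```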
If I had to recommend somewhere to get started, it would be cube.dev. It offers more flexibility, with both cloud and self-hosted deployments.