Enhancing DBT Workflows with Dynamic Tag-Based Macros

In the world of modern data analytics and transformation, the Data Build Tool (DBT) has emerged as a cornerstone for managing SQL-based workflows within data warehouses. A powerful feature of DBT lies in its ability to leverage Jinja, a templating engine for Python, enabling developers to create dynamic and reusable code snippets. I will cover one such examples. A macro that I have crafted macro designed to collect DBT models based on tags. In this article, we'll delve into the significance of DBT tags, explore the necessity of executing code within DBT macros, and outline various use cases where the get_models_by_tag macro proves invaluable.

Understanding DBT Tags

First thing first, what are DBT tags? DBT tags serve as metadata labels assigned to DBT resources, including models, sources, or seeds. They provide a means for organizing and categorizing these resources based on criteria such as functionality, departmental ownership, or business unit. Tags offer clarity and structure within DBT projects, aiding in documentation, testing, deployment, and monitoring.

Tag could be placed, really simply:

{{ config(
    tags=["finance"]
) }}

select * from ....

As a result, you could perform any DBT operation over the group of models defined by the tag. Let's say you need to rebuild all models dedicated to finance data:

dbt run --select tag:finance

Or run you hourly pipeline:

dbt run --select tag:hourly

That is super flexible and easy.

DBT Tags and Jinja macro

But what if I need to perform some action over tagged models from DBT Jinja macro? Let's say: the pipeline has crafted some new models and I want to have macro that will do some complex cleanup of a temporary table that is not possible to be done by post-hook of DBT? Sorry, you could not collect a list of models by tag from macro code... until you create your own macro.

{% macro get_models_by_tag(tag) %}
    {% set models = [] -%}
    {% if execute %}
        {% for node in graph.nodes.values()
            | selectattr("resource_type", "equalto", "model")
            | selectattr("config.enabled", "equalto", True) %}
            {% if tag in node.tags %}
                    {% do models.append(("dm_" ~ type_tag, node.name)) %}
            {% endif %}
        {% endfor %}
    {% endif %}
    {{  return(models) }}
{% endmacro %}

You could collect a full list of models from graph object, which will appear after doing dbt compile/dbt run and will be stored in manifest file. Next, we need to filter out disabled nodes and collect only models. The last step - to check if target tag exists in tags list of each node. And that is it.