Our Thinking
Thu 31st March, 2022
How to create a dbt package
In a previous blog post, we shared our journey to create a framework to support unit testing in dbt. We've started to use the approach in different projects, so we needed a strategy to share, evolve and maintain the code, which is composed of a set of dbt macros. The appropriate way to do that is by creating a dbt package. In this article, we want to share the process of creating a dbt package, which is reasonably straightforward once you know a couple of details.
To create a dbt package, you need to work on three main tasks:
- Package the code you want to share
- Create tests
- Create a CI pipeline
Package the code
To package the code, you need to create a root folder for the package, containing a folder for macros and a dbt_project.yml to declare the package configurations. The dbt_project.yml could have a configuration similar to this:
name: 'sample-package'
version: '0.1.0'
config-version: 2
require-dbt-version: [">=0.20.2"]
target-path: "target"
macro-paths: ["macros"]
This declares a name for the package, its version, the required dbt version and the path for the macros. If your package also contains models, you will need to configure the models folder as well.
With this setup, the package is installable via dbt deps.
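For example, a project that consumes the package could reference it in its own packages.yml and then run dbt deps; the Git URL and revision below are purely illustrative:

packages:
  - git: "https://github.com/your-org/sample-package.git"
    revision: 0.1.0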
Tests
Currently, dbt doesn’t provide a tool to unit test macros, so we need to rely on integration tests. Integration tests in dbt mean the package code will run within a test dbt project. You need to think about the macros’ usage and how to test them by creating models/tests to support the scenarios that you want to cover. The integration tests project is usually at the root of the package code, and it includes a packages.yml referring to the package code as a dbt package:
packages:
  - local: ../
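As an illustration of such a test, a singular test in the integration tests project could exercise a macro from the package and fail if the result is wrong. The sample_package namespace and the cents_to_dollars macro below are hypothetical names, not part of our package:

-- tests/cents_to_dollars_check.sql
-- Fails (returns rows) if the hypothetical macro does not convert 100 cents to 1 dollar
select *
from (select {{ sample_package.cents_to_dollars('100') }} as amount) as conversions
where amount != 1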
You can run the tests in multiple different environments (BigQuery, Snowflake, etc.) to ensure the interoperability of your package. Also, you could and should have the integration tests running in your CI pipeline.
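One way to support several environments is to declare one output per warehouse in the integration tests' profiles.yml and pick the target at run time. The sketch below assumes the credentials come from environment variables; the database, schema and keyfile settings are placeholders you would adapt:

integration_tests:
  target: snowflake
  outputs:
    snowflake:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_TEST_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_TEST_USER') }}"
      password: "{{ env_var('SNOWFLAKE_TEST_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_TEST_ROLE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_TEST_WAREHOUSE') }}"
      database: tests_db
      schema: dbt_ci
      threads: 1
    bigquery:
      type: bigquery
      method: service-account
      project: my-test-project
      dataset: dbt_ci
      keyfile: "{{ env_var('BIGQUERY_TEST_KEYFILE') }}"
      threads: 1

Switching environments is then just a matter of passing a different target, for example dbt test --target bigquery.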
CI pipeline
The goal of the CI pipeline is to introduce monitoring and automation to improve your package development process, particularly running the integration tests after code changes. Also, as mentioned before, the CI pipeline should run the tests against the different environments you want to support.
We’ve chosen GitHub Actions to create a CI pipeline for our package because it checked all our requirements, and it’s free for public repositories. In GitHub Actions, the concept of a workflow represents a pipeline and is configured by a YAML file. Each workflow has one or more jobs that run on separate runners. Each job has one or more steps that define the pipeline, and there is a set of predefined, reusable steps available, known as actions.
We wanted to have one pipeline to run the integration tests, with a different job per environment (BigQuery, Snowflake, etc.) inside the pipeline. To keep the following example simple, let’s assume we want to create just one workflow that runs the integration tests against Snowflake.
Snowflake authentication requires credentials (account, user, password, role and warehouse); the way to store credentials on GitHub is by using Actions secrets.
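If you prefer the command line to the GitHub UI, the GitHub CLI can create these secrets. This is just one way to do it, and the values below are placeholders:

gh secret set SNOWFLAKE_TEST_ACCOUNT --body "my-account"
gh secret set SNOWFLAKE_TEST_USER --body "ci_user"
gh secret set SNOWFLAKE_TEST_PASSWORD --body "********"
gh secret set SNOWFLAKE_TEST_ROLE --body "CI_ROLE"
gh secret set SNOWFLAKE_TEST_WAREHOUSE --body "CI_WAREHOUSE"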
Setting up the workflow is relatively simple with the credentials in place – we need to set up a runner/container with the repository code. We also need to install dbt inside the container to run and test dbt against Snowflake using the credentials. That being said, let’s take a look at the workflow:
name: Dbt Unit Testing
on:
  push:
    branches: [master]
jobs:
  integration-tests-snowflake:
    runs-on: ubuntu-latest
    environment: test
    env:
      SNOWFLAKE_TEST_ACCOUNT: ${{ secrets.SNOWFLAKE_TEST_ACCOUNT }}
      SNOWFLAKE_TEST_USER: ${{ secrets.SNOWFLAKE_TEST_USER }}
      SNOWFLAKE_TEST_PASSWORD: ${{ secrets.SNOWFLAKE_TEST_PASSWORD }}
      SNOWFLAKE_TEST_ROLE: ${{ secrets.SNOWFLAKE_TEST_ROLE }}
      SNOWFLAKE_TEST_WAREHOUSE: ${{ secrets.SNOWFLAKE_TEST_WAREHOUSE }}
    steps:
      - uses: "actions/checkout@v2"
      - name: test snowflake
        run: ./run_test.sh snowflake
This sample workflow is triggered when a push to master happens (the on block) and contains one job that runs on the ubuntu-latest runner. The job starts with an env configuration that reads the Actions secrets into environment variables inside the container. Then there are two steps: the first uses a predefined action that checks out the repository into the container, and the second (test snowflake) runs the run_test.sh shell script. That script hides the details: it installs dbt on the runner using pip, runs dbt deps to install the package code, and then tests and runs dbt itself.
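The script itself is not shown here, but a minimal sketch of what run_test.sh could look like is below; the dbt adapter packages, the integration-tests directory name and the use of a local profiles.yml are assumptions you would adjust to your own layout:

#!/bin/bash
set -e

# Environment to test against, e.g. "snowflake", passed from the workflow
TARGET=$1

# Install dbt and the matching adapter on the runner
pip install "dbt-$TARGET"

cd integration-tests

# Install the package under test, declared as a local dependency in packages.yml
dbt deps

# Build the test models and run the tests against the chosen target,
# using the profiles.yml kept inside the integration tests project
dbt run --target "$TARGET" --profiles-dir .
dbt test --target "$TARGET" --profiles-dir .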
The workflow above covers what was needed to keep the example simple. However, if you want to accept external contributions, as we do, the workflow needs a slightly more complex triggering strategy on pull requests, which we will share in a follow-up article.
Final notes
Our goal was to give you a high-level overview of the work involved in creating a dbt package from a set of macros or models. We hope that, after reading the article, you have a clearer picture of the package creation process. If you want to explore it further, you can look at our unit testing package at https://github.com/EqualExperts/dbt-unit-testing, where we test our dbt package against multiple environments.