Are you at risk from this critical dbt vulnerability?
A newly discovered critical security vulnerability in the dbt ecosystem
UPDATE 17th July 2024: CVE-2024-40637 assigned and noted in GitHub.
CVSS score 4.2.
Today we’re sharing news of a critical security vulnerability that affects users of the dbt package ecosystem. This vulnerability, which I discovered with Michal Czerwinski, highlights the challenges our industry faces around the security of new software package supply chains. We responsibly disclosed our concerns to dbt Labs, who accepted the vulnerability and have implemented mitigations.
Understanding the vulnerability
The dbt tool is widely used to transform data within data warehouses. It allows data analysts and engineers to write modular SQL queries, which can be used in data pipelines.
dbt’s power and flexibility have made it a popular choice in the analytics engineering space, but that same flexibility also introduces significant risks. Because dbt brings its own ecosystem of software packages, the core of this vulnerability is the trust model inherent in software supply chains.
The potential impact of this vulnerability is severe. An attacker could:
Manipulate data: alter or delete data, leading to data integrity issues
Exfiltrate data: extract sensitive information from or change permissions in the database
“During a threat assessment for one of our clients, we encountered several security concerns. As I explored how to securely expose the DBT ecosystem to our developers, it became clear that there are significant challenges in addressing software supply chain security within the current DBT module ecosystem.” – Michal Czerwinski
When users install dbt packages from sources other than dbt Labs, they trust that these packages perform the advertised function and nothing more. In affected versions, the new vulnerability abuses the way dbt generates SQL, allowing a malicious dbt package to execute SQL injection attacks without any user interaction. An attacker could craft a dbt package that, once installed, could change, exfiltrate, or delete data within the victim database. We believe this vulnerability affects both dbt-core and the dbt Cloud hosted service.
We should note that dbt packages are not Python packages. They are part of a dbt-specific package ecosystem that is largely unknown to the infosec community. Software Composition Analysis (SCA) tools like safetycli and Snyk can, along with Static Application Security Testing (SAST), scan third-party and transitive dependencies, alerting users to known vulnerabilities they might be exposed to.
Because dbt packages sit outside the ecosystems such tools typically cover, this is a critical blind spot for users who depend on that tooling to inform them of vulnerabilities they are exposed to.
Simple example: exfiltrating at scale on Google Cloud via dbt
Here’s a simple exploit we crafted to demonstrate the problem. An attacker creates a malicious dbt package that copies your data out of Google BigQuery in the background whilst performing its advertised function.
The attacker creates a project named “myco_example_project” in Google Cloud, and creates a dataset “example_dataset” inside. This dataset is shared with public Data Editor permissions, so a table can be created in this dataset, and data copied into it from anywhere.
Within our exploit package’s directory structure is an innocuous-looking file “macros/example.sql”, starting with the following Jinja macro text:
An unsuspecting victim installs the package from GitHub or dbt Hub. With no further interaction, they execute `dbt run` as usual, or it is run by their automation.
In affected versions of dbt, this macro is run silently in place of the legitimate and trusted BigQuery adapter’s version. Whatever `SELECT *` produces against this model, and against every other model included in the run, is copied into a new table in the attacker’s dataset in seconds. Evidence of the exfiltration would only be present in the dbt log files and GCP audit logging, neither of which would, by default, proactively alert the victim to the attack.
How to mitigate against this dbt vulnerability
The vendor has provided a mitigation for the issue in the form of the config flag `require_explicit_package_overrides_for_builtin_materializations`. The behaviour of this flag varies by version of dbt-core and dbt Cloud, so refer to the Legacy Behaviours documentation to understand your current position and upgrade options. We offer the following advice for any dbt users to assess and mitigate the risks posed by this vulnerability:
dbt-core versions are Python dependencies. dbt Labs have recently updated their documentation to make a strong recommendation to keep versions up to date. Ensure dbt-core is actively updated to the latest version as these fixes become available, including in dbt Cloud.
Review dbt package usage in your organisation. Ensure packages are obtained from trusted sources, such as dbt Labs itself, and check that the value of each package outweighs the risk it brings.
Ensure software dependencies are being scanned for known vulnerabilities, and that you have a vulnerability management process in place to respond to any alarms.
Review and minimise permissions that dbt is run with for human and unattended workloads.
Review the controls you have in place in your infrastructure that prevent transfer of data outside your organisational boundaries.
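As a starting point for the package review suggested above, here is a minimal Python sketch that lists every materialization defined by installed dbt packages, so a human can check whether any of them shadows a built-in one. It is an illustration under stated assumptions, not an official dbt tool: the function name `find_materializations` is ours, we assume packages are installed in the default `dbt_packages` directory, and the set of built-in materialization names is an illustrative list rather than an exhaustive one. A simple text match like this will also miss anything that is obfuscated or generated dynamically.

```python
import re
from pathlib import Path

# Matches the opening tag of a dbt materialization block, for example:
#   {% materialization view, adapter='bigquery' %}
#   {%- materialization table, default -%}
MATERIALIZATION_TAG = re.compile(
    r"\{%-?\s*materialization\s+(\w+)\s*,\s*([^%]*?)\s*-?%\}"
)

# ASSUMPTION: an illustrative, non-exhaustive set of materialization names
# that ship with dbt. Extend it to suit your dbt version and adapters.
BUILTIN_NAMES = {"view", "table", "incremental", "ephemeral", "materialized_view"}


def find_materializations(packages_dir):
    """Return one dict per materialization definition found under packages_dir."""
    root = Path(packages_dir)
    if not root.is_dir():
        return []
    findings = []
    for path in sorted(root.rglob("*.sql")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in MATERIALIZATION_TAG.finditer(text):
            name, target = match.group(1), match.group(2)
            findings.append(
                {
                    "file": str(path),
                    "name": name,
                    "target": target,  # e.g. "default" or "adapter='bigquery'"
                    "overrides_builtin": name in BUILTIN_NAMES,
                }
            )
    return findings


if __name__ == "__main__":
    # ASSUMPTION: "dbt_packages" is the install path relative to the project root.
    for finding in find_materializations("dbt_packages"):
        marker = "REVIEW" if finding["overrides_builtin"] else "info"
        print(
            f"[{marker}] {finding['file']}: "
            f"materialization {finding['name']} ({finding['target']})"
        )
```

Run from the root of a dbt project after `dbt deps`, any line marked REVIEW points at a package file that defines a materialization sharing a name with one in the built-in list. That is not proof of malice, since some packages override materializations for legitimate reasons, but it is exactly the kind of file that deserves a manual read before the package is trusted.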