All About dbt (Data Build Tool) with BigQuery 🚀

Source: https://www.linkedin.com/pulse/all-dbt-data-build-tool-bigquery-arabinda-mohapatra-kjekc




What is dbt?

dbt is the T in ELT (Extract, Load, Transform). It allows you to write SQL queries that transform raw data into usable insights, making it easier to:

✅ Create repeatable workflows.

✅ Perform testing and ensure data quality.

✅ Automate transformations for scalable pipelines.

💡 It’s specifically designed for the modern data stack and integrates seamlessly with cloud warehouses like BigQuery, Snowflake, and Redshift.


dbt Products

🔹 dbt Core:

  • A free Python package.
  • Interact via the CLI for building, testing, and running models.
  • Great for individual contributors or teams familiar with the command line.
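
💡 A typical dbt Core workflow from the command line looks like this (a minimal sketch; dbt-bigquery is the adapter package for BigQuery, and my_project is a placeholder name):

  pip install dbt-bigquery   # installs dbt Core plus the BigQuery adapter
  dbt init my_project        # scaffolds a new dbt project
  dbt run                    # builds your models in the warehouse
  dbt test                   # runs data quality tests
  dbt docs generate          # compiles project documentation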

🔹 dbt Cloud:

  • Built on top of dbt Core.
  • A web-based platform with scheduling, job orchestration, and a collaborative IDE.
  • Perfect for teams needing scalability, ease of use, and enterprise-level features.

🔹 dbt Desktop:

  • A lesser-known product that’s focused on local development for non-technical users.
  • Ideal for smaller teams or organizations transitioning to dbt Cloud.

Which dbt Tool is Right for You?

  • dbt CLI: Best for developers and technical teams comfortable with the command line.
  • dbt Cloud: Ideal for enterprises or teams that value collaboration, scheduling, and ease of use.
  • dbt Desktop: Great for smaller teams or those transitioning to dbt.


Why Use dbt?

🔸 Streamline data transformations without needing complex ETL tools.

🔸 Build modular, maintainable models with SQL and Jinja macros.

🔸 Enable data quality testing (e.g., uniqueness, not null checks).

🔸 Document your data pipelines for better transparency.

🔸 Collaborate seamlessly on data projects with version control (Git).


Understanding dbt Project Structure 🚀

🔹 models/ This is where the magic happens! Models are SQL files that define the transformations in your data pipeline.

✨ Example:

stg_customers.sql and stg_orders.sql: Staging models for cleaning and standardizing raw data.
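
✨ A minimal staging model sketch (the raw source and column names are illustrative, and assume a source declared in a sources YAML file):

  -- models/stg_customers.sql: clean and standardize raw customer data
  select
      id as customer_id,
      lower(email) as email,
      created_at
  from {{ source('raw', 'customers') }}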

🔹 schema.yml: Add tests (e.g., uniqueness, not null) and documentation for your models.
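
✨ A minimal schema.yml sketch (model and column names follow the staging example above):

  version: 2

  models:
    - name: stg_customers
      description: "Cleaned and standardized customer records."
      columns:
        - name: customer_id
          description: "Primary key for a customer."
          tests:
            - unique
            - not_null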

🔹 views/ Organize your final views or data outputs here.

✨ Example:

v_customers.sql: A curated view of customer data ready for reporting.
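
✨ A sketch of such a view (column names are illustrative, and this assumes the views/ folder is included in your model paths); the config block tells dbt to materialize it as a view:

  -- views/v_customers.sql: curated customer view for reporting
  {{ config(materialized='view') }}

  select
      customer_id,
      email,
      created_at
  from {{ ref('stg_customers') }}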

🔹 seeds/ Store small, static datasets (like CSV files) to load as tables. Think of it as your reference data hub!
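
✨ For example, a hypothetical seeds/country_codes.csv is loaded as a table with the dbt seed command and can then be referenced like any model via ref('country_codes'):

  country_code,country_name
  IN,India
  US,United States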

🔹 snapshots/ Track historical changes in your data with snapshots. Perfect for slowly changing dimensions (SCDs).

🔹 target/ Holds auto-generated files like compiled SQL and manifests during execution.

💡 This folder is excluded from version control.

🔹 Other Files:

  • dbt_project.yml: The brain of your dbt project, defining configurations like model paths and materialization.
  • .gitignore: Ensures you exclude unnecessary files (like target/) from version control.
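
✨ A minimal dbt_project.yml sketch (the project name and folder-level materializations are illustrative):

  name: 'dbt_bq_demo'
  version: '1.0.0'
  profile: 'dbt_bq_demo'

  model-paths: ["models"]
  seed-paths: ["seeds"]
  snapshot-paths: ["snapshots"]

  models:
    dbt_bq_demo:
      staging:
        +materialized: view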



dbt Key Concepts

MODELS:

A model is a SQL file containing a select statement. Models can reference other models or tables in your data warehouse, and the table or view dbt creates takes its name from the model's file name.
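
✨ For example, a downstream model can reference upstream models with ref(), and dbt works out the dependency order automatically (a sketch reusing the staging model names from earlier; the order columns are assumptions):

  -- models/customer_orders.sql: one row per customer with their order count
  select
      c.customer_id,
      count(o.order_id) as order_count
  from {{ ref('stg_customers') }} c
  left join {{ ref('stg_orders') }} o
      on o.customer_id = c.customer_id
  group by c.customer_id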

MACRO:

Code reusability: using macros in dbt simplifies reusing SQL fragments across models, much like functions in a programming language.
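
✨ A macro sketch (the name and logic are hypothetical), defined once under macros/ and reusable in any model:

  -- macros/cents_to_dollars.sql
  {% macro cents_to_dollars(column_name) %}
      round({{ column_name }} / 100.0, 2)
  {% endmacro %}

  -- usage inside a model:
  select {{ cents_to_dollars('amount_cents') }} as amount_dollars
  from {{ ref('stg_orders') }}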

TESTS:

There are two primary ways to define dbt tests: generic and singular.

  • Generic: predefined, out-of-the-box tests (unique, not_null, accepted_values, relationships) that you can apply across multiple data models.
  • Singular: custom SQL tests you write for a specific data model.
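
✨ A singular test is just a SQL file under tests/ that selects failing rows; the test passes when the query returns zero rows (table and column names here are illustrative):

  -- tests/assert_no_negative_order_totals.sql
  select *
  from {{ ref('stg_orders') }}
  where order_total < 0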

SNAPSHOTS:

Snapshots let you track slowly changing dimensions (SCDs) over time.
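
✨ A snapshot sketch using dbt's timestamp strategy (the source model and columns are illustrative); dbt adds dbt_valid_from and dbt_valid_to columns to record each version of a row:

  -- snapshots/customers_snapshot.sql
  {% snapshot customers_snapshot %}

  {{
      config(
          target_schema='snapshots',
          unique_key='customer_id',
          strategy='timestamp',
          updated_at='updated_at'
      )
  }}

  select * from {{ ref('stg_customers') }}

  {% endsnapshot %}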


💡 Why this structure matters:

This modular setup enables:

✅ Clear separation of data transformation layers.

✅ Easy testing and documentation of your data models.

✅ Scalability for collaborative projects.




dbt Cloud and BigQuery

PREREQUISITES:

  • GCP account
  • GCP project
  • BigQuery set up
  • GitHub account



Steps to create dbt models with BigQuery:

1. Go to https://www.getdbt.com/ and create an account via "Try dbt Cloud Free".

(Screenshot: Try dbt Cloud Free)

2. Create a GitHub repository named, for example, dbt-bq-demo.

3. Set up BigQuery at https://console.cloud.google.com/bigquery.

4. In dbt Cloud, click your name in the left corner, then go to "Connections" and click "New connection".

(Screenshot: Connection)

5. Select BigQuery as the connection type.


(Screenshot: BigQuery as connection)


(Screenshot: Upload Service Account)

6. Next, download a service account JSON key file from the Google Cloud console; uploading it will automatically fill in all the required fields. To get the JSON file, go to https://console.cloud.google.com/bigquery.

7. Search for "IAM & Admin" in the search bar.

(Screenshot: IAM & Admin)

8. Click on "Service Accounts" and then "CREATE SERVICE ACCOUNT".


(Screenshot: CREATE SERVICE ACCOUNT)

9. Provide a service account name and then click "CREATE AND CONTINUE".


10. In step 2, select the role Owner (Owner is used here only for the demo) and then click "DONE".

(Screenshot: role as an OWNER)

11. The service account is now created; click on it.


12. Click on "Keys" and then "ADD KEY".

(Screenshot: ADD KEY)

13. Choose the JSON key type and click "CREATE"; a JSON key file is automatically downloaded to your local system.


14. Go back to the dbt Cloud screen and upload the service account JSON file. Leave the optional fields as they are; they will be populated automatically after uploading.



15. Create a project under Settings > Projects > New project.



16. Provide the project name, the development connection, and the GitHub connection.



17. Click "Develop" on the right side, then "Cloud IDE"; the IDE will load.

dbt Cloud is now successfully connected to BigQuery.
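
✨ To sanity-check the connection, you can create a trivial model and run it (a smoke-test sketch; the model name is arbitrary):

  -- models/my_first_model.sql
  select 1 as id

Run dbt run from the IDE's command bar; the model should appear as a view in your BigQuery project (view is dbt's default materialization).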



I'll write more on models, views, and staging with examples in a follow-up article.


