Question 1

dbt vs traditional ETL (like Informatica): what's the difference?

Accepted Answer

dbt handles the transform step of ELT (Extract, Load, Transform). It does not extract or load data itself: a separate tool lands raw data in your warehouse first, then dbt transforms it there using SQL. Traditional ETL transforms data before loading it. dbt's advantages: it uses the warehouse's own compute (Snowflake, BigQuery), keeps transformations under version control in Git, tests data quality, documents lineage, and makes iteration quick. Drawback: it requires a central data warehouse, which can be hard to justify for small datasets. Rule of thumb: use dbt if you already have Snowflake/BigQuery/Redshift and more than about 1 GB of data. Use traditional ETL if you are moving raw files between on-prem systems.

Question 2

What's a 'model' in dbt and how does it differ from a view or table?

Accepted Answer

A dbt model is a SQL file (a single SELECT statement) that defines a transformation. When you run dbt, each model is built in your warehouse as a view or a table, depending on its materialization config (view is the default). View = the query re-runs every time the view is read (nothing stored and cheap to build, but slower to query). Table = results are pre-computed and stored on each run (more expensive to build, fast to query). Incremental model = a table that dbt updates with only new or changed rows since the last run. Use tables for frequently queried fact tables and views for intermediate staging layers. A model reads from an upstream model with `SELECT * FROM {{ ref('my_model') }}`, and dbt uses those ref() calls to work out the build order.
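To make the view/table difference concrete, here is a minimal sketch in plain Python, with an in-memory SQLite database standing in for the warehouse. This is not dbt itself; the table and column names (`raw_orders`, `orders_view`, `orders_table`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (id INTEGER, amount REAL);
    INSERT INTO raw_orders VALUES (1, 10.0), (2, 25.0);

    -- Like a "view" materialization: only the query is stored.
    CREATE VIEW orders_view AS
        SELECT id, amount * 2 AS doubled FROM raw_orders;

    -- Like a "table" materialization: the query result is stored.
    CREATE TABLE orders_table AS
        SELECT id, amount * 2 AS doubled FROM raw_orders;
""")

# New raw data arrives after the build.
conn.execute("INSERT INTO raw_orders VALUES (3, 40.0)")

# The view re-runs its query on every read; the table keeps the old result
# until it is rebuilt.
view_rows = conn.execute("SELECT COUNT(*) FROM orders_view").fetchone()[0]
table_rows = conn.execute("SELECT COUNT(*) FROM orders_table").fetchone()[0]
print(view_rows, table_rows)
```

The view sees the new row immediately because it recomputes on read, while the table stays as it was until the next rebuild. That is the trade-off between freshness plus per-query compute and stored results plus fast reads.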

Question 3

How do I test data quality in dbt and prevent bad data from reaching downstream dashboards?

Accepted Answer

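dbt ships four built-in generic tests (unique, not_null, relationships, accepted_values), declared in schema.yml, and also supports custom SQL tests. Example: `- name: user_id
  tests:
    - unique
    - not_null`. (dbt 1.8 and later also accept `data_tests:` as the key; `tests:` still works.) Run `dbt test` to validate. To fail fast, use `dbt build`: it runs models and tests together in DAG order and skips everything downstream of a failed test. Running `dbt run` followed by `dbt test` builds every model before any test executes, so on its own it does not keep bad data out of downstream models. Advanced: write custom tests (e.g., 'conversion rate between 0 and 1'). Test at source level (raw data) and at model level (transformed). Monitor test coverage: as a rough target, aim for more than 80% of columns tested.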
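A dbt test is a query that selects the rows violating a rule, and it passes when the query returns no rows. The sketch below imitates `unique` and `not_null` on `user_id` in plain Python with SQLite. The table, the sample rows, and the helper `failing_rows` are illustrative assumptions, and the SQL is a simplified stand-in rather than the exact SQL dbt generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, email TEXT);
    INSERT INTO users VALUES
        (1, 'a@example.com'),
        (2, 'b@example.com'),
        (2, 'c@example.com'),     -- duplicate user_id
        (NULL, 'd@example.com');  -- missing user_id
""")


def failing_rows(sql):
    """Return the rows that violate a rule; an empty list means the test passes."""
    return conn.execute(sql).fetchall()


# Roughly what a `unique` test checks: values that appear more than once.
unique_failures = failing_rows(
    "SELECT user_id FROM users WHERE user_id IS NOT NULL "
    "GROUP BY user_id HAVING COUNT(*) > 1"
)

# Roughly what a `not_null` test checks: rows where the column is NULL.
not_null_failures = failing_rows(
    "SELECT email FROM users WHERE user_id IS NULL"
)

print(unique_failures, not_null_failures)
```

Custom SQL tests follow the same pattern: write a SELECT that returns the offending rows, and treat any returned row as a failure.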

Question 4

Snapshots vs incremental models: when do I use each?

Accepted Answer

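Incremental = transforms only rows that are new (or changed) since the last run, so it is fast and cheap. Use it for fact tables that grow daily (events, transactions). Snapshot = records how a mutable source table changes over time, in the style of a type-2 slowly changing dimension (SCD): each version of a row is kept with `dbt_valid_from` and `dbt_valid_to` timestamps. Example: snapshot the user profile table each day to track when the 'plan' column changed from free to paid. An incremental model is the wrong tool if you need historical versions of a row, because it keeps only the latest state it has written; snapshots are designed for that. Rule of thumb: incremental for immutable events, snapshots for mutable dimensions.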
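The two behaviours can be sketched in a few lines of plain Python. This is an illustration of the ideas, not dbt's implementation: the function names, the `ts` field, and the `valid_from`/`valid_to` keys (standing in for dbt's `dbt_valid_from`/`dbt_valid_to`) are assumptions made for the example.

```python
def incremental_append(target, source):
    """Append only source rows newer than the newest row already in target."""
    high_water = max((row["ts"] for row in target), default="")
    target.extend(row for row in source if row["ts"] > high_water)
    return target


def snapshot(history, current, run_date):
    """Type-2 style: when a value changes, close the open row and open a new one."""
    for user_id, plan in current.items():
        open_rows = [
            row for row in history
            if row["user_id"] == user_id and row["valid_to"] is None
        ]
        if open_rows and open_rows[0]["plan"] == plan:
            continue  # unchanged since the last snapshot
        for row in open_rows:
            row["valid_to"] = run_date
        history.append(
            {"user_id": user_id, "plan": plan,
             "valid_from": run_date, "valid_to": None}
        )
    return history


# Incremental: the second run adds only the event it has not seen yet.
events = [{"id": 1, "ts": "2024-01-01"}, {"id": 2, "ts": "2024-01-02"}]
fact = incremental_append([], events[:1])
fact = incremental_append(fact, events)

# Snapshot: the plan change on the third run is recorded as a new version.
history = []
snapshot(history, {7: "free"}, "2024-01-01")
snapshot(history, {7: "free"}, "2024-01-02")
snapshot(history, {7: "paid"}, "2024-01-03")
print(len(fact), history)
```

After these runs the fact list holds two events, and the history holds two versions of user 7: a closed "free" row and an open "paid" row. The incremental path never looks back at rows it has already loaded, which is why it suits immutable events.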

Question 5

How do I set up CI/CD for dbt in production (dbt Cloud vs self-hosted)?

Accepted Answer

dbt Cloud = managed solution: jobs run on dbt's infrastructure, with a built-in scheduler and CI jobs that run on pull requests. Cost is roughly $100-300/mo for a small team (check current pricing). Self-hosted (Airflow/Dagster) = you manage a Python orchestrator and invoke the dbt CLI in its tasks, which gives more control but more ops work. Common practice: dbt Cloud for small teams (for example, companies under $10M ARR), self-hosted for large companies managing complex DAGs. dbt Cloud CI flow: open a pull request, and a CI job runs `dbt build` (which runs models and their tests) in a temporary schema before you merge to main.

Question 6

What's the dbt semantic layer and how does it change analytics?

Accepted Answer

Semantic layer = centralized metric definitions (one source of truth for KPIs). Instead of each BI tool defining 'revenue' differently, the dbt semantic layer defines it once (for example, SUM(amount) filtered to status = 'paid'), and BI tools such as Looker, Tableau, or Metabase query that definition where an integration is available. Benefit: consistency and faster iteration (change the metric definition once and every dashboard picks it up). Drawbacks: a new workflow, a learning curve, and serving metrics to BI tools requires dbt Cloud (not Core alone). Use it if multiple BI tools or teams define metrics differently. Skip it if you are a small team with a single BI tool.

Question 7

How do I debug dbt when transformations produce wrong results?

Accepted Answer

Use `dbt debug` to check the warehouse connection and project setup. Use `dbt run --select my_model` to run a single model. Check the `target/compiled/` folder to see the actual SQL generated (Jinja templates expanded). Check `target/run_results.json` for execution times and test failures. Common bugs: a missing WHERE clause (source data not filtered), wrong join logic (a many-to-many join creating duplicates), and stale dependencies (an old ref() pointing at the wrong model). Always run `dbt test` after model changes. Use `dbt source freshness` to check whether source data is stale.
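For the duplicate-producing join in particular, a quick check is to compare row counts and totals before and after the join, then look for repeated join keys on the side that should be unique. Here is a minimal sketch in plain Python with SQLite; the `orders` and `addresses` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE addresses (customer_id INTEGER, city TEXT);
    INSERT INTO orders VALUES (1, 100, 50.0), (2, 100, 30.0);
    -- Two addresses for one customer: the join key is not unique here.
    INSERT INTO addresses VALUES (100, 'Berlin'), (100, 'Paris');
""")

joined = (
    "SELECT o.order_id, o.amount FROM orders o "
    "JOIN addresses a ON o.customer_id = a.customer_id"
)

# Compare row counts and totals before and after the join.
rows_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rows_after = conn.execute(f"SELECT COUNT(*) FROM ({joined})").fetchone()[0]
total_before = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
total_after = conn.execute(f"SELECT SUM(amount) FROM ({joined})").fetchone()[0]

# Find the join keys responsible for the fan-out.
dup_keys = conn.execute(
    "SELECT customer_id FROM addresses "
    "GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()

print(rows_before, rows_after, total_before, total_after, dup_keys)
```

If the row count or the total grows across a join that should be one-to-one, the keys returned by the last query show where to deduplicate or tighten the join condition. The same queries can be run against the SQL found in `target/compiled/`.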

Region	Junior	Mid	Senior
USA	$85k	$128k	$163k
UK	£52k	£82k	£110k
EU	€58k	€90k	€125k
Canada	C$92k	C$135k	C$175k

dbt (Data Build Tool)

What is dbt (Data Build Tool)

