4 min read · dbt, Data engineering, Tutorial

dbt for small teams: how to start without overcomplicating

dbt looks like overkill when you're a team of one. It isn't, but the way most tutorials introduce it is. Here's the minimum viable setup that scales.

Gabriel Fernandes
Data Wizard

Half the small companies I talk to dismiss dbt as "the tool FAANG-scale analytics teams use". The other half try to adopt it, follow a tutorial that introduces twelve packages on day one, get scared, and quietly go back to SQL views in their warehouse.

Both groups are wrong, but the second group has the better instinct. dbt is worth the investment for almost any team running more than ten production queries, but only if you adopt it in proportion to your size. Here's the version I deploy at small companies: opinionated, minimal, and designed to grow with you instead of forcing you to grow into it.

What dbt actually buys you on day one

Strip away the marketing and the YouTube tutorials. On day one, dbt gives you four things you don't have if you're writing SQL views in your warehouse:

  1. Version control. Your transformations live in git. You can review changes, roll back, and see who introduced that weird CASE statement in the revenue model six months ago.
  2. Tests. One line of YAML and dbt will warn you when a primary key isn't unique, when a column starts producing nulls it shouldn't, or when a foreign key drifts.
  3. Documentation that doesn't rot. Models and columns are documented next to the SQL that defines them, so the docs only drift when you change the SQL and skip the description sitting right beside it.
  4. A repeatable build. One command, dbt build, rebuilds your entire analytics pipeline in dependency order. No more "I think this view depends on that one but I'm not sure".

Everything else dbt offers (the semantic layer, exposures, snapshots, the entire hub of community packages) is value you'll grow into. Don't try to use it all on day one.

The minimum viable dbt project

For a team of 1–5 working with a single warehouse, here's the smallest project structure that I'm willing to put in production:

models/
  staging/
    stg_stripe__charges.sql
    stg_stripe__customers.sql
    stg_hubspot__deals.sql
  marts/
    fct_revenue.sql
    dim_customers.sql
  schema.yml

Two layers. Staging models map 1:1 to raw source tables: they rename columns to a consistent convention, cast types, and stop. No business logic. Marts models are the business-facing tables your dashboards read from: they join the staging models together and apply the actual rules that define your KPIs.
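
For concreteness, a staging model is usually nothing more than a select with renames and casts. A minimal sketch, assuming your ingestion tool lands raw Stripe charges with Stripe's own field names (the column names and the cents-to-dollars conversion are illustrative):

-- models/staging/stg_stripe__charges.sql
-- Rename, cast, convert units, and stop: no joins, no business logic.
select
    id                         as charge_id,
    customer                   as customer_id,
    amount / 100.0             as amount_usd,
    cast(created as timestamp) as charged_at,
    status
from {{ source('stripe', 'charges') }}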

Notice what's missing: an intermediate layer, custom macros, a utils package, exposures, snapshots, seeds. You'll add those when you have a concrete reason: a join that gets reused four times, a slowly changing dimension that actually matters, a finance team asking when the revenue model last refreshed. Adding them prematurely creates folders with one file in them, which is the opposite of clarifying.
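
The only other file the skeleton really needs is a short dbt_project.yml. A sketch, assuming the common convention of materializing staging as views and marts as tables (the project and profile names are placeholders):

# dbt_project.yml -- trimmed to the lines that matter here
name: analytics
profile: analytics          # must match a profile in your profiles.yml
model-paths: ["models"]

models:
  analytics:
    staging:
      +materialized: view
    marts:
      +materialized: table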

The four tests every model needs

For each mart model, write these in schema.yml:

  • unique + not_null on the primary key. If fct_revenue has a row per invoice, the invoice id should be unique and never null. This catches an enormous class of pipeline bugs.
  • relationships test on every foreign key. If fct_revenue.customer_id references dim_customers.customer_id, dbt will tell you the day a customer is deleted upstream and your fact table starts orphaning rows.
  • accepted_values on enum-like columns. Status fields, country codes, plan tiers: if a new value ever appears, you want a warning, not a silent dashboard mystery.
  • A freshness check on each source. A few lines in your sources YAML and dbt will fail the run when the raw charges table hasn't been updated in 24 hours.

Four tests, fifteen minutes of YAML. They will catch 80% of the silent breakages that make data teams look unreliable.
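
Here is roughly what that YAML looks like, with the source freshness check included. A sketch, not a template to copy blindly: the column names, accepted status values, and 24-hour threshold are illustrative and should match your own warehouse.

# models/schema.yml
version: 2

sources:
  - name: stripe
    schema: raw_stripe               # wherever your loader lands the data
    loaded_at_field: _loaded_at      # or whatever timestamp your loader writes
    freshness:
      error_after: {count: 24, period: hour}
    tables:
      - name: charges

models:
  - name: fct_revenue
    columns:
      - name: invoice_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['paid', 'refunded', 'disputed']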

CI from day one (yes, really)

I know: you're a team of two, you don't need CI. Except you absolutely do. The single thing that separates a dbt project that stays trustworthy from one that decays into "the model that we're not sure works" is automated test runs on every PR.

On dbt Cloud, this is a single toggle: enable Slim CI on a job that runs on every PR. If you're on dbt Core, a 30-line GitHub Actions workflow gets you the same thing. Either way, the contract is: no PR merges with red tests. That contract is much easier to enforce on day one with two contributors than on day two hundred with twenty.
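
If you go the Core route, here is a sketch of that workflow, assuming a Snowflake warehouse, credentials stored as GitHub secrets, and a ci target in a profiles.yml checked into the repo that reads them from environment variables (the adapter, secret names, and target name are placeholders for your own setup):

# .github/workflows/dbt_ci.yml -- build and test the project on every PR
name: dbt CI
on:
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-snowflake    # swap in your adapter
      - name: Build and test every model
        env:
          DBT_PROFILES_DIR: .                      # profiles.yml lives in the repo
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        run: dbt build --target ci

This version rebuilds the whole project on every PR, which is fine at small scale; state-based selection (building only what changed) is an optimization to add once the full build gets slow.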

A 1-week dbt setup engagement gives you the layered project, the four-test convention, CI on every PR, and a documented runbook your future hires will be grateful for. Painful to retrofit later, cheap to do right now.

Set up the right way the first time

What to skip until you actually need it

A short list of things tutorials will tell you to do that you should resist until you have a reason:

  • Snapshots. Useful when you genuinely need to track changes over time. Premature when you just want yesterday's data.
  • Custom macros. Don't write a macro until you've copied the same SQL three times. The third copy is when you know what the abstraction actually wants to be.
  • The dbt-utils package. Adds dozens of macros you won't use. Add it when you need a specific one, not preemptively.
  • The Semantic Layer. Worth it when you have multiple BI tools competing for the same metric. Overkill when one analyst is hitting one dashboard.

The version of dbt you adopt on day one is not the version you'll be running in two years. That's fine. The point is to lock down the boring fundamentals (version control, layered models, tests, CI) and add complexity only when a concrete pain demands it.

Want to discuss your setup?

Let's turn your data into decisions.