4 min read · AI, Governance, Strategy

Why your company needs to fix its data before adopting generative AI

Generative AI is only as good as the data feeding it. If your KPIs disagree with each other, your copilot will too. Here's how to think about the order of operations.

Gabriel Fernandes
Data Wizard

Every founder and CDO I talk to in 2026 has the same conversation queued up: "we need to put AI in front of our data." Some are already piloting a copilot. Others are evaluating vendors. A few are quietly admitting that the pilot they ran six months ago stalled and nobody quite wants to revive it.

The pattern I see in the stalled pilots is almost always the same, and it has very little to do with the model, the prompt, or the chosen vendor. It has to do with what the model is reading. When I dig in, I find a foundation that wasn't ready for a human analyst to trust, let alone a language model to query.

The model is only as honest as the data

A large language model is a confidence engine. Ask it about your revenue last quarter, and it will produce a fluent, coherent answer regardless of whether the underlying numbers are clean. If two tables in your warehouse define "active customer" differently, the model has no way of knowing which one to trust. It will pick one, narrate it confidently, and the executive on the other side of the chat interface will believe it.

I've watched this happen in real companies. A CFO asks the copilot about churn, gets a number, then asks the BI team to confirm it, gets a different number. Within two weeks the experiment is dead, not because the model is bad, but because nobody knows which version of the truth to defend.

Three signs your foundation isn't ready

Before you spin up another pilot, walk through these three checks. If any answer is "no", you're not ready to put a model in front of internal users.

  1. One source of truth per metric. If "revenue", "active customer" or "MRR" can be calculated three different ways depending on which dashboard you open, your model will inherit that ambiguity and make it worse.
  2. Lineage you can read. When the model surfaces a number, can a human trace it back through the pipeline to the raw source in under five minutes? If not, you can't audit hallucinations when they happen.
  3. Access controls that actually mean something. Generative AI is a confidentiality multiplier. If your warehouse has loose row-level access, a copilot will happily quote the salary table to anyone who asks nicely.
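The first check is easy to automate. Here's a minimal sketch in Python, assuming you've inventoried each dashboard's metric definitions (the dashboard names, metrics, and SQL snippets below are all invented for illustration):

```python
# Sketch of check 1: flag metrics that carry more than one definition
# across dashboards. The inventory below is hypothetical example data.

from collections import defaultdict

# Hypothetical inventory: (dashboard, metric, SQL definition).
definitions = [
    ("finance_dash", "active_customer", "status = 'active'"),
    ("growth_dash",  "active_customer", "last_order_at > now() - interval '90 days'"),
    ("finance_dash", "mrr",             "sum(plan_price) filter (where status = 'active')"),
]

def conflicting_metrics(defs):
    """Return metric names whose definition differs between dashboards."""
    by_metric = defaultdict(set)
    for _dashboard, metric, sql in defs:
        by_metric[metric].add(sql)
    return sorted(m for m, variants in by_metric.items() if len(variants) > 1)

print(conflicting_metrics(definitions))  # -> ['active_customer']
```

If this script prints anything at all, that's the ambiguity your copilot will inherit.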

What "AI-ready" actually means

AI-ready isn't a marketing label. It's a concrete checklist. When I run a foundation audit before a generative-AI initiative, I'm looking for four specific layers:

  • A modelled warehouse with clear staging, intermediate and mart layers. Each metric has a single canonical definition that downstream tools cannot redefine.
  • A data catalog (Atlan, DataHub or equivalent) where every metric has an owner, a definition in plain language, and a freshness SLA.
  • End-to-end lineage so the model, and the human reviewing its output, can walk from a number on a chart back to the source row in the system that generated it.
  • Governance hooks that decide what the model can and can't see. Not a wishlist; an enforced policy at the warehouse layer.
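The lineage layer above doesn't need to be exotic to be useful. A readable lineage is, at minimum, a graph you can walk upstream. Here's a sketch, with entirely made-up table names standing in for a real pipeline:

```python
# A minimal sketch of walkable lineage: a map from each table to its
# parents. Layer prefixes (mart/int/stg/raw) and names are illustrative.

lineage = {
    "mart.revenue":        ["int.orders_enriched"],
    "int.orders_enriched": ["stg.orders", "stg.customers"],
    "stg.orders":          ["raw.shop_db.orders"],
    "stg.customers":       ["raw.crm.customers"],
}

def trace_to_sources(node, graph):
    """Walk upstream from a metric to the raw tables that feed it."""
    parents = graph.get(node)
    if not parents:           # nothing upstream recorded: treat as raw source
        return {node}
    sources = set()
    for parent in parents:
        sources |= trace_to_sources(parent, graph)
    return sources

print(sorted(trace_to_sources("mart.revenue", lineage)))
# -> ['raw.crm.customers', 'raw.shop_db.orders']
```

Tools like dbt and DataHub give you this graph for free; the point is that someone can run the walk in seconds, not days.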

Notice what's not on this list: which model you use, which vector database you chose, whether you're doing RAG or fine-tuning. Those decisions matter, but they're cheap to change later. The four items above are expensive to retrofit once a copilot is already in production and your team has built habits around wrong answers.
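To make the governance hook from the checklist concrete: at its simplest it's an allow-list consulted before any result reaches the model. This sketch invents the roles and table names; in practice the equivalent policy lives in the warehouse itself, not in application code:

```python
# Hedged sketch of a governance hook: a policy check applied before
# query results reach the copilot. Roles and tables are invented.

SENSITIVE_TABLES = {"hr.salaries", "hr.reviews"}

def allowed(table, role):
    """Only the 'hr' role may read sensitive tables."""
    return table not in SENSITIVE_TABLES or role == "hr"

def fetch_for_model(table, rows, role):
    """Hand rows to the copilot only if the asking role is allowed."""
    if not allowed(table, role):
        raise PermissionError(f"role {role!r} may not read {table}")
    return rows

print(fetch_for_model("sales.orders", [{"id": 1}], role="analyst"))
# -> [{'id': 1}]
```

An enforced deny at this layer is what stops the copilot from quoting the salary table to anyone who asks nicely.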

If you're sitting on a stalled AI pilot, the model is rarely the problem. I run a 1–2 week foundation audit that tells you exactly what to fix before you re-launch, and in what order.

Audit my foundation

The order I recommend

The companies that ship AI features that actually get used internally tend to follow the same order of operations: foundation, then tooling, then adoption.

First, fix the foundation. Pin down metric definitions, ship a tested dbt warehouse, deploy a catalog, wire lineage. Boring work that nobody will celebrate, but it's the difference between a copilot that's trusted and a copilot that's quietly ignored.

Then, pick your tooling. Once the warehouse is the source of truth, the choice between vendors becomes much smaller. You're not buying a miracle worker; you're buying a UI on top of data you already trust.

Finally, drive adoption. Train your team on what the model is good at and what it isn't. Set the expectation that every answer can be traced back to the warehouse. Build the muscle of asking "where does this number come from?" before acting on it.

The companies trying to do this in reverse (buy the tool, then patch the data, then drive adoption) are the ones writing me six months later asking why their pilot stalled. The order matters more than the model.

Want to discuss your setup?

Let's turn your data into decisions.