AI agents for small and medium businesses: where to start, and why most pilots stall
Only 23% of organizations are scaling agentic AI, and ~95% of GenAI pilots show no P&L impact. For an SMB, the realistic starting point is one narrow agent on cleaner data, not transformation.

McKinsey's November 2025 State of AI survey, run on 1,993 executives across 105 countries, found that 23% of organizations are scaling agentic AI somewhere in the business and another 39% are still experimenting. Inside any single business function, no more than about 10% have scaled agents at all. The press coverage reads as if AI agents are now rewriting the operating model of every company on earth. The data says agents are mostly a slide in a steering committee deck.
This piece is for a different reader. A 50-person logistics company in São Paulo with three SaaS tools and a billing workflow that still moves through email attachments. A 120-person agency where the support inbox routes through one overworked manager. A mid-market industrial firm whose accounts payable team types invoice numbers into an ERP for four hours a day. None of these companies needs a transformation. They need one agent, on cleaner data, quietly removing one workflow's worth of friction. The hard part is almost never the model.
What an "agent" actually means at SMB scale
The chatbot on the homepage is not the agent that will pay back. The agent that pays back is narrower, slower-looking, and lives inside one workflow. It triages a support inbox and routes the top 30% by topic. It reads an emailed PDF invoice, pulls header fields, and creates the draft AP record. It scores inbound leads against the last twelve months of closed-won deals and assigns the right rep. None of those are demos. They're plumbing.
The definition matters because the failure mode is set by the scope. A wide, ambitious "AI assistant for the whole company" inherits every messy table, every undefined metric, and every undocumented exception in the business. A narrow agent inherits one workflow's worth of mess, which is usually small enough to fix. That's the entire reason narrow agents ship and broad ones don't.
Why most pilots stall
The most-cited stall number this year comes from MIT's NANDA initiative. Their August 2025 report, The GenAI Divide, drew on 150 interviews, a 350-employee survey, and analysis of 300 public deployments. About 95% of enterprise GenAI pilots delivered no measurable P&L impact. The 5% that did shared a pattern: deployments built with specialized vendors succeeded about 67% of the time, against roughly 33% for internal builds. The report's own framing is worth quoting: "the core issue is not the quality of the AI models, but the learning gap for both tools and organizations."
Where does that gap live? Two other 2025-2026 surveys point at the same place. Cloudera and Harvard Business Review Analytic Services, in a March 2026 study of about 230 HBR audience members surveyed in October 2025, found that only 7% of organizations describe their data as "completely ready" for AI. Fifty-six percent named data silos and integration issues as their top obstacle. Gartner's February 2025 release, drawing on 248 data management leaders, said 63% of organizations either lack AI-ready data management or aren't sure they have it, and forecast that through 2026 organizations would abandon 60% of AI projects unsupported by AI-ready data.
BCG put a clean number on the same picture. Their AI Radar 2024, fielded on more than 1,000 C-level executives across 59 countries and 14 industries, attributed roughly 70% of AI implementation challenges to people and process, 20% to technology and data, and only 10% to the algorithms. Most pilots don't fail because the model is wrong. They fail because the inputs are wrong, the workflow around the model was never redesigned, and nobody owns the data the agent reads from.
Where AI implementation actually breaks
Share of AI implementation challenges, by source.
- People & process: 70%
- Technology & data: 20%
- AI algorithms: 10%
The surprise isn't that people and process matter. It's the ratio. Seven units of difficulty live outside the model for every one unit inside it. A 30-person operations team that can't reconcile its own AR aging report inside one BI tool is not going to be rescued by a frontier model. The model will read whatever it's given and produce a confident sentence about the wrong number.
Where agents actually return value in an SMB
Three patterns show up reliably enough in the public record and in real engagements to recommend by name. None of them is a transformation. All of them are narrow.
1. Customer-service triage and assist
The most-documented pattern, and the one with the cleanest ROI evidence, is a support agent that handles top-of-funnel triage and assists humans on the harder tickets. Forrester's vendor-commissioned Total Economic Impact studies, built from customer interviews, put three-year ROI at 315% for Microsoft Dynamics 365 Customer Service, 301% for Zendesk Advanced AI, 391% for PolyAI voice, and 210% for Sprinklr, with payback periods inside six months in the strongest cases. These are directional, not audited; vendor-commissioned TEIs run optimistic, and the "composite organization" is built from selected real customers. Read them as the ceiling, not the benchmark.
Customer-service agent ROI claims, side by side
Three-year ROI for composite organizations across five vendor-commissioned studies.
Klarna's February 2024 agent is the celebrity in this category. Inside one month the system handled 2.3 million customer-service conversations, two-thirds of the company's chat volume, did the equivalent work of roughly 700 full-time agents, cut average resolution time from 11 minutes to under 2, and projected about $40 million in profit improvement. The walk-back is part of the story too. By 2024-2025 Klarna had publicly softened the AI-first posture and rehired some human agents on the harder paths. The right reading is not "Klarna failed." The right reading is that the ceiling exists, the ceiling is high, and the path to it is human plus agent, not agent alone. Klarna is a useful illustration. It is not an SMB benchmark: a 30-person company will never see 2.3 million conversations in a month.
2. Accounts payable and back-office workflows
The second pattern is invoice and document workflow automation: an agent that reads emailed PDF invoices, extracts header and line-item fields, matches against POs, and creates a draft entry the AP clerk approves. The evidence here is weaker. AP-vendor blog posts cite "up to 80%" cost savings without methodology or sample, which is why this piece doesn't quote them. What's defensible is the workflow logic. An AP team that currently rekeys 400 invoices a month with a 4% error rate spends a measurable number of hours on rekeying and a measurable number of hours on rework. An agent that gets the first pass right 85% of the time turns most of those hours into review hours. The math is real even where the vendor numbers aren't.
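That back-of-envelope math can be made concrete. A minimal sketch with assumed inputs; the per-invoice minutes here are invented for illustration, only the 400-invoice volume, 4% error rate, and 85% first-pass accuracy come from the scenario above:

```python
# Back-of-envelope AP automation math. Per-task minutes are assumptions
# for illustration, not benchmarks.
invoices_per_month = 400
minutes_to_rekey = 6        # assumed manual entry time per invoice
error_rate = 0.04           # share of manual entries needing rework
minutes_to_rework = 20      # assumed time to find and fix one error
agent_first_pass = 0.85     # share of invoices the agent drafts correctly
minutes_to_review = 1.5     # assumed time to approve a correct draft
minutes_to_correct = 6      # fixing a wrong draft ~ rekeying it from scratch

# Current state: every invoice is rekeyed, and a share needs rework.
manual_hours = (invoices_per_month * minutes_to_rekey
                + invoices_per_month * error_rate * minutes_to_rework) / 60

# With the agent: most invoices become quick reviews, the rest corrections.
agent_hours = (invoices_per_month * agent_first_pass * minutes_to_review
               + invoices_per_month * (1 - agent_first_pass) * minutes_to_correct) / 60

print(f"manual: {manual_hours:.1f} h/month, "
      f"with agent: {agent_hours:.1f} h/month, "
      f"saved: {manual_hours - agent_hours:.1f} h/month")
```

Under these assumptions the team goes from roughly 45 hours a month of typing and rework to roughly 15 hours of review, which is the shape of the claim even when the exact minutes differ.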
This is also the pattern with the loudest data-readiness requirement. If supplier names are inconsistent across the ERP, the agent will create duplicate vendors. If invoice line items aren't tied to a clean chart of accounts, the agent will guess. The cleanup is half the project. Anyone who quotes you a four-week deployment without asking to see the supplier table is selling something.
3. Lead routing and document extraction
The third pattern is narrower still: an agent that reads inbound form data plus the free-text message, scores against the last twelve months of closed-won deals, and routes to the right rep or tier. Or one that ingests a contract PDF and produces a structured summary with the dates, parties, and renewal terms a paralegal would otherwise type out. There is no clean public SMB-segmented number for lead-routing lift, so I won't fake one. What I can say is that the pattern is short to ship, short to evaluate, and tends to survive the budget cycle because the failure is loud (wrong rep, wrong tier) and easy to correct.
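The routing logic itself is simple enough to sketch. A hypothetical, minimal version that scores an inbound lead by its overlap with attributes seen in recent closed-won deals; the field names, deals, and threshold are all invented for illustration, and a real version would score the free-text message with a model rather than exact field matches:

```python
from collections import Counter

# Hypothetical closed-won history: one dict of attributes per deal.
closed_won = [
    {"industry": "logistics", "size_band": "11-50",  "source": "referral"},
    {"industry": "logistics", "size_band": "51-200", "source": "referral"},
    {"industry": "agency",    "size_band": "11-50",  "source": "form"},
]

# Count how often each (field, value) pair appears among the wins.
win_counts = Counter(
    (field, value) for deal in closed_won for field, value in deal.items()
)

def score_lead(lead: dict) -> float:
    """Sum, over the lead's fields, of how often each value won before."""
    n = len(closed_won)
    return sum(win_counts[(f, v)] / n for f, v in lead.items())

def route(lead: dict, threshold: float = 1.0) -> str:
    # Above-threshold leads go to a senior rep; the rest to the shared pool.
    return "senior_rep" if score_lead(lead) >= threshold else "pool"

lead = {"industry": "logistics", "size_band": "11-50", "source": "form"}
print(round(score_lead(lead), 2), route(lead))
```

The point of the sketch is the shape, not the scoring function: the agent's whole job is one score and one routing decision, which is why the failure is visible the same day and cheap to fix.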
The Brazil overlay
For a Brazilian reader the international numbers above understate where the local market sits. CGI.br's TIC Empresas 2024, the federal benchmark survey of 4,453 enterprises with 10 or more employees, found that only 13% of Brazilian companies were running AI applications by 2024. Large companies sat at 38%, medium at 29%, and small at 10%. The top reported use case was process and workflow automation, at 63% of AI-using firms. That number is a clue: when Brazilian companies do adopt, they adopt where this piece recommends starting.
Brazilian AI adoption falls off a cliff by size
Share of Brazilian companies running AI applications, by headcount band.
Small companies sit closer to the national total than to medium ones because the population is almost entirely small firms. A separate September 2025 study from SEBRAE, FGV-IBRE, and Google surveyed about 5,000 companies and measured something different. It asked about familiarity with generative AI tools and about frequent use. Familiarity is near-universal: 99% in mid and large companies, 96% in MPEs, 87% in MEIs. Frequent use collapses the picture: 35% in mid and large, 15% in MPEs, 18% in MEIs. The two surveys disagree because they measure different things. CGI.br measures deployed AI applications. SEBRAE measures hands-on tool use. Reading them together, the Brazilian MPE has heard of ChatGPT, has tried it once, and has not yet wired it into anything that runs without a person typing into a chat box.
Brazilian companies know AI; few use it daily
Familiarity with generative AI versus frequent use, by company size.
The gap between familiarity and frequent use is wide at every tier, but it's widest where the company is smallest relative to its appetite. MPEs report higher familiarity than MEIs and lower frequent use; awareness has arrived faster than operational fit. The implication is more useful than the headline. The Brazilian SMB doesn't need to be sold on AI. It needs help moving from "we've used ChatGPT a few times" to one agent that runs against the company's own data without supervision. That's a sequencing problem, not an awareness problem.
The objection that deserves a real answer
The strongest pushback to everything above is: if 95% of GenAI pilots show no P&L impact, why should a 50-person company go anywhere near this? The honest answer is that the same MIT NANDA finding splits the population. Projects bought from specialized partners succeeded at roughly 67%. Internal builds succeeded at about 33%. The failure rate is not uniform. It tracks closely with whether the team running the project has done it before, whether the data was prepared first, and whether the workflow around the agent was redesigned rather than decorated.
For an SMB the lesson is specific. Don't run an AI program. Run one project. Pick one workflow whose pain is measurable in hours per week. Hire or contract someone who has shipped this exact pattern before. Spend the first two weeks on the data the agent will read from, not on the agent. Ship the narrow version, measure for a month, decide whether to extend. That sequence isn't the heroic version of AI adoption. It's the version with the failure rate that doesn't bankrupt anyone.
What "starting" looks like with Data Concierge
McKinsey's State of AI 2025 ran a regression across 25 attributes that might explain why some companies see EBIT impact from AI and others don't. The single biggest lever, ahead of model choice, vendor choice, and budget size, was workflow redesign. Only 21% of GenAI users in the survey had redesigned even some workflows around AI. That 21% accounted for a disproportionate share of the EBIT impact. High performers, the cohort seeing meaningful returns, were roughly 6% of the sample.
The Data Concierge sequence is built around that finding. The work splits into four steps, in this order, with no skipping.
- Pick the one workflow. Audit two or three candidate workflows with the team. Cost each one in hours per week. Pick the workflow with the highest hours and the cleanest data inputs, not the one with the most strategic-sounding name.
- Clean the data the agent will read. Two to four weeks, depending on the mess. Reconcile supplier names, define the one metric the agent will quote, map the source tables, set the access policy. This is the boring step the failures skip.
- Deploy the narrow agent. Buy from a specialized vendor where one exists; build only if no off-the-shelf option fits the workflow. Wire it into the cleaned data. Keep a human in the loop on every decision the agent makes in the first month.
- Measure for a month, then decide. Hours saved, error rate, escalation rate, customer-side signal. If the numbers move, extend. If they don't, pull the agent out and run the next candidate. Don't argue about whether AI works; measure whether this AI works for this workflow.
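The decision in the last step can be as mechanical as it sounds. A hypothetical sketch of the month-end check; the metric names and thresholds are invented for illustration, and the whole point is that each team writes its own numbers down before launch:

```python
# Hypothetical month-one metrics for a deployed agent.
metrics = {
    "hours_saved_per_week": 9.0,
    "error_rate": 0.03,        # agent mistakes caught in human review
    "escalation_rate": 0.22,   # share of items kicked back to a human
}

# Illustrative thresholds, agreed before deployment: ("min", x) means the
# metric must be at least x; ("max", x) means at most x.
thresholds = {
    "hours_saved_per_week": ("min", 5.0),
    "error_rate": ("max", 0.05),
    "escalation_rate": ("max", 0.35),
}

def decide(metrics: dict, thresholds: dict) -> str:
    """Extend only if every metric clears its pre-agreed threshold."""
    for key, (kind, limit) in thresholds.items():
        value = metrics[key]
        ok = value >= limit if kind == "min" else value <= limit
        if not ok:
            return f"pull the agent: {key}={value} fails {kind} {limit}"
    return "extend: all thresholds met"

print(decide(metrics, thresholds))
```

Writing the thresholds down before deployment is what keeps the month-end conversation from becoming an argument about whether AI works in general.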
The Data Concierge role in this sequence is the bridge between the executive ambition ("we should be doing AI") and the technical reality ("our data isn't ready and we don't know which workflow to start with"). The role isn't to be the AI vendor. It's to make sure the agent the vendor sells has something trustworthy to read, and that the workflow around it has been redesigned, not just plumbed in. I wrote about the same dynamic at more strategic length in companies transitioning to AI need a data concierge.
If you want help choosing the first workflow, getting the data the agent will read into shape, and deciding whether to buy or build, that's the work we do. A 30-minute call to talk through where you are; a 1 to 2 week diagnostic if it's worth doing.
Pick the first workflow with me
The quiet version
The version of this story that ages well isn't the one where a board approves a $2 million AI program and the CEO tells the analyst day call about transformation. It's the one where, eighteen months from now, three agents are running quietly inside a 100-person company: the support inbox triages itself, the AP team approves invoices instead of typing them, and the lead-router puts the right rep on the right deal. Nobody got a promotion for shipping any of them. They just saved an aggregate forty hours a week that the company gets back.
That's what AI for an SMB looks like when it works. The companies that get there did one thing the 95% didn't: they fixed the plumbing before they bought the model.
Related reading
For the strategic framing of why the foundation comes before the copilot, see fix your data before adopting generative AI. For the SaaS-specific cut of the same adoption boom, see AI adoption in small SaaS product teams. For the Brazilian-market view, see Brazil's AI adoption boom in public numbers.
Sources
McKinsey, The State of AI, November 2025 (n=1,993 executives, 105 countries, June to July 2025 fieldwork): McKinsey QuantumBlack State of AI.
MIT NANDA, The GenAI Divide, August 2025 (150 interviews, 350-employee survey, 300 public deployments): Fortune coverage of the MIT NANDA report.
Cloudera and Harvard Business Review Analytic Services, AI data readiness study, March 2026 (n≈230 HBR audience, October 2025 fieldwork): Cloudera / HBR press release.
Gartner, AI-ready data research, February 2025 (n=248 data management leaders, Q3 2024): Gartner press release on AI-ready data.
BCG, Where's the Value in AI?, October 2024 (AI Radar 2024, 1,000+ C-suite, 59 countries): BCG Where's the Value in AI.
Forrester Total Economic Impact composites, vendor-commissioned, 2024-2025: Microsoft Dynamics 365 Customer Service TEI; Zendesk Advanced AI TEI; Sprinklr Customer Service TEI.
Klarna AI assistant first-month results, February 2024: Klarna press release.
Goldman Sachs 10,000 Small Businesses Voices, February 2026 (n=1,256 US SMBs): Goldman Sachs press release. Intuit QuickBooks Small Business Insights, April 2025 (n=2,200 US businesses up to 100 employees): Intuit QuickBooks survey.
CGI.br, TIC Empresas 2024 (n=4,453 Brazilian enterprises with 10+ employees, fielded March to November 2024): CGI.br press release. SEBRAE / FGV-IBRE / Google, September 2025 (n≈5,000 Brazilian companies): Blog do IBRE / FGV.