dbt Labs dropped their annual State of Analytics Engineering report on Tuesday, and one number keeps rattling around my head: 72% of data teams now use AI-assisted coding in their daily workflows. Twenty-four percent invest in AI-assisted pipeline management — the testing, observability, and quality controls that keep dashboards from lying to executives at 8 AM. That three-to-one ratio tells you everything about where the industry is right now.
## The Gap Nobody Named Until Now
The report surveyed 363 data practitioners and leaders, and the headline finding is what dbt calls the "acceleration-governance gap." Teams generate dbt models, Spark jobs, and SQL transformations faster than ever. The tooling for verifying those transformations hasn't kept pace.
Here's what shifted year-over-year:
| Priority | 2025 | 2026 | Change |
|---|---|---|---|
| Trust in data and data teams | 66% | 83% | +17 pp |
| Speed of delivery | 50% | 71% | +21 pp |
| Cost reduction | 48% | 53% | +5 pp |
Trust shot up 17 points in a single year. That kind of jump doesn't happen because things are going well. It happens because someone's quarterly revenue dashboard was wrong, the CFO noticed, and suddenly "data quality" appears in company OKRs. Speed climbed even faster — up 21 points. Teams want to move faster and trust the output more. AI gives you the first one essentially for free. The second one costs engineering discipline, and discipline doesn't autocomplete.
## Your Warehouse Bill Is Growing Faster Than Your Team
57% of respondents reported increased warehouse and compute spend this year. Only 36% reported increased team budgets. That 21-point spread deserves its own incident review.
AI-generated code is cheap to write and expensive to execute. A copilot spins up 15 new dbt models in an afternoon — each triggering incremental builds, running tests (if anyone wrote them), materializing tables, racking up compute. Nobody asked whether those 15 models should exist. The majority of practitioners still spend their time maintaining and organizing existing datasets, not building new features. The promise was that AI would change that ratio. It seems to have just manufactured more datasets to maintain.
## 71% Fear Hallucinated Data — and Practitioners Are More Worried Than Their Bosses
This statistic hits differently than LLM hallucination in a chatbot. When a language model fabricates a fact in conversation, you get a wrong answer and move on. When your pipeline produces a hallucinated metric — a join that silently drops rows, an aggregation that double-counts because the copilot didn't understand your SCD2 logic — that number ends up in a board deck. Or worse, in a financial filing.
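To make the double-count concrete, here is a toy sketch in Python. Everything in it is invented for illustration (the tables, the `is_current` flag, the amounts): an SCD2 dimension keeps multiple historical rows per customer, and a join that ignores the current-row flag multiplies every order.

```python
# Toy SCD2 double-count: the tables and columns here are invented.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 100},
    {"order_id": 2, "customer_id": "c1", "amount": 50},
]
# SCD2 dimension: two historical versions of the same customer.
dim_customer = [
    {"customer_id": "c1", "segment": "smb",        "is_current": False},
    {"customer_id": "c1", "segment": "enterprise", "is_current": True},
]

def total_revenue(facts, dim):
    # Naive join: every order matches every historical version.
    joined = [{**f, **d} for f in facts for d in dim
              if f["customer_id"] == d["customer_id"]]
    return sum(row["amount"] for row in joined)

def total_revenue_current(facts, dim):
    # Same join, restricted to the current dimension row.
    current = [d for d in dim if d["is_current"]]
    joined = [{**f, **d} for f in facts for d in current
              if f["customer_id"] == d["customer_id"]]
    return sum(row["amount"] for row in joined)

print(total_revenue(orders, dim_customer))          # 300 -- double-counted
print(total_revenue_current(orders, dim_customer))  # 150 -- correct
```

The SQL equivalent is a join with no `is_current` (or date-range) predicate on the dimension: syntactically valid, runs clean, and exactly the kind of mistake a copilot won't flag.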
71% of data professionals cite incorrect or hallucinated outputs reaching stakeholders as a top concern. And there's a telling gap in who worries most: on exposing sensitive data to LLMs, practitioners report concern 7 percentage points higher than the leaders approving those tool purchases. The people closest to the code see the cracks that dashboards don't show.
Pooja Crahen, a Senior Manager at Okta, nailed it in the report: "There's tension between moving fast and building trust... discipline in modeling becomes a requirement, not a best practice." When Okta's data lead says governance is no longer optional, that's not conference talk — that's a production lesson.
Meanwhile, data ownership remains unsolved, cited as a challenge by 41% of respondents, unchanged from last year. You cannot govern what nobody owns. And technical integration challenges actually improved (from 35% to 27%), which suggests the problem was never wiring tools together. It was always about who's responsible when the numbers are wrong.
## Iceberg in the Wild: Less Than You'd Think
Buried in the architecture section is a reality check for anyone drowning in Iceberg hype. Only 9% of respondents run Apache Iceberg in production. Another 12% are in proof-of-concept. A full 68% have no plans at all.
The barriers are telling: knowledge gaps and unclear use cases tie at 27% each. Teams know the format exists — they've read the blog posts, attended the summit talks. They just can't articulate why they'd migrate from their current Delta Lake or Hive setup, and nobody on the team has done it before. The primary driver for those who are adopting? Multi-engine compatibility, at 22%. Not performance improvements. Not storage savings. Portability. The teams moving to Iceberg got burned by vendor lock-in and want to query the same table from Spark, Trino, and Snowflake without maintaining three copies of their data.
## What To Do About It Before Your Next Oncall Rotation
The report's implications are clearer than its recommendations. The gap isn't a tooling gap — it's a prioritization gap. Your AI coding assistant already knows how to generate dbt test blocks. The problem is nobody asks it to. Teams sprint to ship models and skip the `not_null`, `unique`, and `relationships` tests that catch the 3 AM page. A dbt project with 200 models and 15 tests isn't "modern analytics engineering." It's technical debt wearing a CI/CD badge.
Measure your test-to-model ratio. Run `dbt ls --resource-type test | wc -l` alongside `dbt ls --resource-type model | wc -l`. If tests aren't at least half the model count, you have a coverage problem. Not a philosophical one — a "71% of teams worry about bad data" one.
Audit AI-generated models before they materialize. Tag them `staging` so they run in CI but don't build in production until someone reviews the SQL. Copilots nail syntax and fumble business logic. That `LEFT JOIN` might be technically valid and semantically catastrophic.
Track compute cost per model. When 57% of teams report rising warehouse spend while only 36% report rising budgets, you need to know which models are driving the bill. Most warehouses expose query-cost metadata — Snowflake's `QUERY_HISTORY`, BigQuery's `INFORMATION_SCHEMA.JOBS`. Use them. Kill the models nobody queries.
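The aggregation itself is trivial once the metadata is in hand. A minimal Python sketch with invented rows standing in for query-history output (the `model` and `credits` keys are illustrative, not a real warehouse schema):

```python
from collections import defaultdict

def cost_per_model(query_rows):
    """Sum compute cost by model, most expensive first. The "model"
    and "credits" keys are illustrative, not a warehouse schema."""
    totals = defaultdict(float)
    for row in query_rows:
        totals[row["model"]] += row["credits"]
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

# Invented rows standing in for QUERY_HISTORY-style output:
rows = [
    {"model": "fct_orders",   "credits": 4.25},
    {"model": "stg_payments", "credits": 0.5},
    {"model": "fct_orders",   "credits": 3.75},
]
print(cost_per_model(rows))  # {'fct_orders': 8.0, 'stg_payments': 0.5}
```

In practice the hard part is attribution — mapping queries back to the dbt model that issued them — which is why tagging generated SQL with model names (dbt's query comments do this by default) pays off here.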
Jason Ganz, Director at dbt Labs, put the shift bluntly: "Two years ago... most didn't expect AI generating the majority of code. Today, that's where we are." The generation problem is solved. The trust problem is wide open. And it won't get fixed by a faster model — it'll get fixed by the same boring engineering practices we've always known work: tests, ownership, reviews, and someone who actually understands the business logic behind column `rev_adj_3`.