
For the past year, I've been hearing a version of the same thing from engineering leaders: AI tools are working, productivity is up, the business case is there. And yet, something about the picture still feels incomplete. So we decided to go find out how widespread that feeling actually is. We surveyed 700 engineers and managers across five countries, and published the results in the State of Engineering Excellence 2026.
89% of engineering leaders say developer productivity has improved since deploying AI. It's a clean story. AI is working. Engineering teams are moving faster.
But we also found that 81% of those same leaders say code review time has gone up since deploying AI. Significantly up, in a lot of cases. And developers estimate that roughly a third of their day is now consumed by AI-related work that remains largely invisible to traditional productivity metrics.
So which is it? Is AI making engineering teams more productive, or simply shifting effort into places they don't yet measure? After sitting with this data for a few weeks, I think the answer is both. That's the more honest read, even if it's less satisfying.
The gap between generating code and shipping value
AI has been very good at increasing output. But more output has not automatically translated into more shipped value.
I talked to a customer recently, a large enterprise engineering org, and they were genuinely proud of how much their output metrics had improved. Lines of code written, PR velocity per developer, tickets closed, features delivered. All of it up. Then we dug into what was actually making it to production, and the numbers looked much less clean. A meaningful share of the AI-generated code never made it there.
Most organizations can tell you how much AI code was accepted. Very few can tell you how much of it actually landed in production, and that's the number that matters. Hard dollars spent on agent compute that never ships anything aren't a productivity story. That's a visibility gap, and one most organizations aren't measuring today.
What "invisible work" actually looks like
The 31% figure, the estimated share of developer time now consumed by AI-related work that appears in no metric, probably sounds abstract until you break down what it actually is.
It's a developer sitting with a pull request for 45 minutes because the AI-generated code is technically correct but written in a style nobody on the team recognizes, and they need to fully understand it before they can approve it. It's debugging a subtle edge case the AI missed, which takes longer to track down than writing the function by hand would have. It's the orchestration and context-switching overhead of supervising 10 agents working in parallel on 10 different tasks. None of this makes it into velocity or cycle time, and even code review metrics only catch a fraction of it.
What this data shows is that organizations are running a business where the costs are partially off the books. You can show your CFO a 20% productivity improvement, and it's true. You just can't show them what it cost to get there.
High confidence in a broken system is its own problem
The finding that surprised me most: 89% of engineering leaders say their current metrics accurately reflect AI's impact. And 94% say key factors like tech debt, validation time, and developer burnout are missing from those same metrics.
When there's no established standard for measuring something, people default to trusting the frameworks they already know. Not because they've validated them for the new environment, but because they're familiar. High confidence in an incomplete system is a coping mechanism, not an accuracy signal.
The lesson: confidence in your measurement system should go up as you add instrumentation, not stay high when important dimensions of the work are still invisible. When 94% of leaders acknowledge those gaps while 89% still call their metrics accurate, that's not a minor calibration issue. That's a signal worth taking seriously.
The trust problem is structural, not individual
54% of practitioners fear that AI productivity data will be used for individual performance evaluations. Managers, by contrast, are far more comfortable with these systems: they are nearly four times more likely than developers to report having no concerns at all.
Measurement systems almost always get built top-down, by the people who won't be measured by them. The practitioners who experience the day-to-day pressures of AI adoption, and who understand where invisible overhead actually lives, are rarely involved in defining the frameworks used to measure it. The result is a system that captures what leadership can see and misses what developers actually experience.
What developers said they need is straightforward: keep improvement data separate from performance evaluation, be transparent about what's being measured, and involve them in defining the metrics. None of that is technically hard. It requires organizational commitment. When measurement feels like surveillance, you don't get accurate data. You get people performing for the system instead of working in it.
What we're doing about it at Harness
The productivity gains from AI are real. The problem is that organizations are making multi-year investment decisions with dashboards built for a different era, and the gap between what those dashboards show and what's actually happening widens as AI adoption scales.
This is a problem we’ve been thinking deeply about at Harness. We’re working on new capabilities in Software Engineering Insights (SEI) that are designed to give engineering leaders visibility into the full picture: not just how much code is being generated, but how much of it is shipping, what the review and validation overhead actually looks like, and where AI spend is producing returns versus producing churn.
We believe the next generation of engineering measurement needs to be built for AI-native workflows, and we’ll be sharing more about that direction in the coming weeks.
Getting the measurement right isn't a reporting exercise. It's what makes the productivity gains from AI sustainable.
Download the full State of Engineering Excellence 2026 report [here].
