Which topics does this article cover?

It highlights AI productivity, developer tools, METR study, AI coding, GitHub Copilot.

AI Made Developers 19% Slower. They Thought It Made Them 24% Faster.

Q: What is "AI Made Developers 19% Slower. They Thought It Made Them 24% Faster." about?

METR ran a proper randomized controlled trial. 246 real tasks. Expert developers. State-of-the-art AI tools. Result: developers were 19% slower with AI — and were convinced they were 20% faster. Here's what that gap means.

You probably have a feeling about whether AI makes you faster.

Most developers do.

And that feeling is almost certainly wrong.

In July 2025, a nonprofit called METR published a study that should have broken the internet but mostly got buried under the next wave of AI announcements. They ran a proper randomized controlled trial — the kind of study that drug companies run before putting something on the market — to measure what AI coding tools actually do to developer productivity.

Here is what they found:

Developers took 19% longer on tasks where AI tools were allowed than on tasks where they were not.

And here is the part that is even harder to sit with:

Before starting, those same developers predicted AI would make them 24% faster.

After finishing the tasks, they still believed AI had made them 20% faster.

The data said slower. The developers felt faster. Both things were simultaneously true.

That gap — between what you experience and what is actually happening — is what this post is about.
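To make those percentages concrete, here is a minimal arithmetic sketch. The 60-minute baseline is a hypothetical number, not a figure from the study, and the sketch reads "X% faster" as "X% less time", which is one plausible interpretation rather than the study's exact definition.

```python
# Hypothetical baseline: a task that takes 60 minutes without AI (assumption).
BASELINE_MINUTES = 60.0


def minutes_when_slower(baseline: float, pct_longer: float) -> float:
    """Duration when the task takes pct_longer percent more time."""
    return baseline * (1 + pct_longer / 100)


def minutes_when_faster(baseline: float, pct_less_time: float) -> float:
    """Duration when the task takes pct_less_time percent less time."""
    return baseline * (1 - pct_less_time / 100)


measured = minutes_when_slower(BASELINE_MINUTES, 19)   # what the data showed
predicted = minutes_when_faster(BASELINE_MINUTES, 24)  # forecast beforehand
felt = minutes_when_faster(BASELINE_MINUTES, 20)       # belief afterwards

print(f"measured with AI: {measured:.1f} min")
print(f"predicted:        {predicted:.1f} min")
print(f"felt afterwards:  {felt:.1f} min")
print(f"gap between felt and measured: {measured - felt:.1f} min")
```

On that hypothetical hour-long task, the measured time is roughly 71 minutes against a felt time of 48 minutes.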

How the study worked

Most AI productivity studies are garbage.

They use weak proxies. They are vendor-sponsored. They test toy problems on junior developers. They measure speed on isolated tasks that look nothing like real work.

METR did it differently.

They used actual state-of-the-art tools: Cursor Pro with Claude 3.5 and 3.7 Sonnet — the best coding AI available in early 2025. They gave developers 246 real issues to fix across actual open-source repositories they knew well. Bug fixes. New features. Real codebases with real complexity.

Then they randomly assigned each task: some with AI access, some without.

That random assignment is what matters. It means the results are not confounded by "developers using AI choosing easier tasks" or "motivated developers working harder with tools they like." Because a coin flip, not the developer, decides which tasks get AI, those differences average out across the two conditions.
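A rough sketch of what per-task random assignment looks like in code. The function names, task identifiers and the simple comparison of means are all illustrative assumptions; the study's actual statistical analysis is more involved than this.

```python
import random
from statistics import mean


def assign_conditions(task_ids, seed=0):
    """Flip a fair coin per task: 'ai' (tools allowed) or 'no_ai' (not allowed)."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {task_id: rng.choice(["ai", "no_ai"]) for task_id in task_ids}


def percent_change(durations, assignment):
    """Percent change in mean duration of 'ai' tasks relative to 'no_ai' tasks."""
    ai = [durations[t] for t, cond in assignment.items() if cond == "ai"]
    no_ai = [durations[t] for t, cond in assignment.items() if cond == "no_ai"]
    return (mean(ai) / mean(no_ai) - 1) * 100


tasks = [f"issue-{i}" for i in range(246)]  # hypothetical task identifiers
assignment = assign_conditions(tasks, seed=42)
ai_count = sum(cond == "ai" for cond in assignment.values())
print(ai_count, "of", len(tasks), "tasks assigned to the AI condition")
```

The point of the design is that the developer never picks which tasks get AI, so task difficulty and motivation cannot line up systematically with one condition.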

The result: AI made experienced developers 19% slower.

Not beginners. Not developers on unfamiliar code. Experienced developers, on their own codebases, with the best tools available.

Why it happens

The study doesn't fully explain the why, but the pattern is recognisable to anyone who has spent serious time with AI coding tools.

The integration tax is real.

When you write code yourself, you are in direct conversation with the problem. Your fingers and your thinking are coupled. Bringing in an AI breaks that coupling. You have to explain the problem, evaluate the output, figure out what it got wrong, course-correct, re-evaluate. For a simple, isolated task — generate this function, write this test — the AI is fast. For a task with real complexity and context, the back-and-forth costs more than you save.

Debugging AI output is expensive in a specific way.

When you write a bug, you have a theory. You wrote the code, you know what you were thinking, you can reason about where it went wrong. When an AI writes a bug, you have no theory. You are reading code you did not author, trying to reverse-engineer the reasoning behind a decision that was essentially stochastic. Stack Overflow's 2025 developer survey found that 45% of developers say debugging AI-generated code is time-consuming. That is not 45% of developers who hate AI. That is nearly half of the developers surveyed acknowledging the hidden cost.

Confidence is miscalibrated.

The most interesting thing in the METR study is the perception gap. Developers felt faster. They reported enjoying the experience. They said they would keep using the tools.

They were slower.

One possible explanation: AI assistance changes the subjective feel of work without changing the objective output. Typing less feels like doing more. Having something to react to feels more productive than staring at a blank editor. The cognitive load shifts — you are orchestrating rather than generating — and orchestration feels lighter, even when it is not faster.

The counterevidence that is also true

Stopping here would be dishonest.

Because here is what happened at Spotify.

In December 2025, Spotify's co-CEO Gustav Söderström told analysts during the Q4 earnings call that since deploying Claude Code with Opus 4.5, senior engineers at Spotify had shifted almost entirely from writing syntax to generating and supervising AI-produced code. Spotify built an internal background agent called Honk, running on MCP, that handles source-to-source transformations across their repositories autonomously.

That is not 19% slower. That is a fundamentally different workflow.

And Faros AI — a developer metrics company — published a counter-study in February 2026 that measured something METR couldn't: parallel work. Their data across real development teams showed that developers on high-AI-adoption teams handled 47% more pull requests per day and completed 21% more tasks. Not because individual tasks got faster. Because AI allowed developers to run multiple workstreams simultaneously.
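Slower individual tasks and higher total throughput can coexist, and a toy model shows how. The numbers below are illustrative, not taken from the Faros or METR data, and the model assumes workstreams run perfectly in parallel with no supervision overhead, which is a strong simplification.

```python
def tasks_per_day(hours_per_task: float, parallel_streams: int,
                  workday_hours: float = 8.0) -> float:
    """Tasks finished per day if each stream works through tasks independently."""
    return parallel_streams * workday_hours / hours_per_task


# Hypothetical developer: 2-hour tasks, one at a time, no AI.
solo = tasks_per_day(hours_per_task=2.0, parallel_streams=1)

# Same developer with AI: each task takes 19% longer, but two streams run at once.
with_agents = tasks_per_day(hours_per_task=2.0 * 1.19, parallel_streams=2)

print(f"without AI: {solo:.2f} tasks/day")
print(f"with AI:    {with_agents:.2f} tasks/day")
```

In this toy setup every task is slower, yet daily throughput still rises, because the second workstream more than offsets the per-task slowdown.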

So which is it? Slower or faster?

Both. Depending on what you are measuring and how you are working.

METR measured isolated task completion. Faros measured total throughput across a workflow. Spotify measured a specific, mature, team-scale deployment of background agents.

These are three different questions about three different things. They are all real data about real teams. The mistake is treating any one of them as the definitive answer.

The productivity paradox explained simply

Here is the framework that makes sense of the contradictions.

Think of AI coding tools as a new type of vehicle.

A bicycle is faster than walking on a flat road. Slower than walking up a steep staircase. Dramatically faster than walking on a long, smooth highway.

AI coding tools are the same.

Where AI is genuinely faster:

Greenfield code with few constraints
Boilerplate: CRUD routes, test scaffolding, config files
Unfamiliar territory: a language or framework you don't know well
Parallelisable work: background agents handling routine tasks while you focus on hard problems

Where AI is slower:

Complex, mature codebases where context is everything
Tasks where understanding the code is most of the work
Debugging — your own bugs or the AI's
Architecture and design decisions that require judgement the AI does not have

The experienced developers in the METR study were working on mature, complex, open-source codebases they already knew deeply. That is among the hardest environments for AI assistance, so a slowdown there is not surprising.

Your experience probably varies by task type in exactly this pattern, even if you have not consciously noticed it.

What the data says you should actually do

Stop measuring by feel.

The METR result — developers who were slower being convinced they were faster — is a warning about a specific kind of self-deception. The experience of using AI tools is genuinely different from traditional coding. Different feels faster. Faster is not guaranteed.

Track your actual velocity. Not your subjective sense of productivity. Cycle time on tasks. PRs merged per week. Time from ticket open to deployment. Measure the same things with and without AI assistance, on similar tasks. Let the data tell you where the tool helps.
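One possible way to do that, sketched with made-up data: log each finished task with its type, whether AI was used, and its cycle time, then compare medians per group. The record layout and the sample numbers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def median_cycle_time(records):
    """Median cycle time in hours, grouped by (task_type, used_ai)."""
    groups = defaultdict(list)
    for task_type, used_ai, hours in records:
        groups[(task_type, used_ai)].append(hours)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical log entries: (task_type, used_ai, cycle_time_hours)
records = [
    ("boilerplate", True, 1.0), ("boilerplate", True, 1.5),
    ("boilerplate", False, 2.0), ("boilerplate", False, 3.0),
    ("bugfix", True, 5.0), ("bugfix", True, 6.0),
    ("bugfix", False, 4.0), ("bugfix", False, 4.5),
]

medians = median_cycle_time(records)
for (task_type, used_ai), hours in sorted(medians.items()):
    label = "with AI" if used_ai else "without AI"
    print(f"{task_type:12s} {label:10s} {hours:.2f} h")
```

A handful of tasks per group is too few to conclude anything, so treat early numbers as a prompt to keep logging rather than as a verdict.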

Be specific about what you are automating.

The developers getting the most from AI tools in 2026 are not the ones who use AI for everything. They are the ones who have mapped their specific workflow and identified the stages where AI genuinely accelerates them — and use it deliberately there.

Boilerplate: yes. Initial test cases: yes. Explaining a bug to a rubber duck: yes. Core business logic that needs to be exactly right: review carefully. System design: don't delegate.
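If it helps to make that list explicit, it can be written down as a small lookup table. The stage names, policy labels and the "measure_first" default are all hypothetical choices for this sketch, not an established scheme.

```python
# Hypothetical mapping from workflow stage to how much to lean on AI.
AI_POLICY = {
    "boilerplate": "delegate",
    "initial_tests": "delegate",
    "rubber_duck_debugging": "delegate",
    "core_business_logic": "review_carefully",
    "system_design": "do_not_delegate",
}


def policy_for(stage: str) -> str:
    """Look up the policy for a stage; unknown stages default to measuring first."""
    return AI_POLICY.get(stage, "measure_first")


print(policy_for("boilerplate"))
print(policy_for("system_design"))
print(policy_for("database_migration"))  # not in the table, falls back to the default
```

The default matters more than the table: for any stage you have not mapped yet, the sensible first step is to measure rather than assume.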

Background agents are a different thing from Copilot.

The Spotify result and the METR result are not contradictions — they are measuring different things. Cursor's inline suggestions change how you write individual lines. Background agents like Claude Code running autonomously change how you structure your whole workday. The second one is the more significant shift, and most developers have not made it yet.

If you are still primarily using AI as autocomplete, you are using the least powerful part of the technology.

The uncomfortable question underneath all of this

The METR study has a detail that deserves more attention than it gets.

Developers who used AI tools said they found the experience more enjoyable. They said they would keep using them even knowing the results. Some said they thought of it as an investment — learning to work with AI now, for future tools that would be more capable.

That reasoning might be exactly right.

The tools available in July 2025 are not the tools available today. Claude Code with Opus 4.7 is not Cursor with 3.5 Sonnet. Spotify's Honk didn't exist when METR ran their study. The 2026 update METR published with late-2025 tools showed a meaningfully different picture.

So the honest answer is: the data on current tools is still being written.

What is already clear is that the gap between what developers feel and what the data shows is large enough to matter. And the developers who are measuring rather than assuming are the ones who will understand fastest when the tools cross the threshold where they reliably accelerate rather than slow down.

That threshold might already be here for some workflows.

It is not here for all of them yet.

Are you tracking your actual velocity with and without AI, or going by feel? The METR result suggests most of us are going by feel. Comments below.

Sources:

METR study, July 2025: the original randomized controlled trial
Faros AI counter-study, February 2026: parallel workstream data
Stack Overflow Developer Survey 2025: debugging AI code finding
arXiv paper: preprint version of the METR findings

