<https://pivot-to-ai.com/2025/07/11/ai-coders-think-theyre-20-faster-but-theyre-actually-19-slower/>
"Model Evaluation and Threat Research is an AI research charity that looks into
the threat of AI agents! That sounds a bit AI doomsday cult, and they take
funding from the AI doomsday cult organisations, but they don’t seem strident
about it and they like to show their working. So good, I guess.
METR funded 16 experienced open-source developers with “moderate AI experience”
to do what they do. They set the devs to work in projects they knew well,
fixing 246 real bug reports across those projects — not synthetic tasks like in your
typical AI coding benchmark.
The projects were large and reasonably popular — over a million lines of code,
average 22,000 stars on GitHub. It was real work on real software.
For each task, METR randomly told the dev to either use a chatbot helper —
Cursor Pro with Claude 3.5/3.7 Sonnet — or use no assistance.
The developers predicted they’d go 24% faster with AI. After they’d done the
work, the developers said they’d done it 20% faster. But they’d actually been
slowed down by 19%. They thought they were faster, they were actually slower.
[blog post; paper, PDF]
When the devs use the AI, they’re spending less time looking for information
and writing code — and instead they’re prompting the AI, they’re reviewing the
AI, or they’re doing nothing while they’re waiting for the AI.
Even the devs who liked the AI found it was bad at large and complex code bases
like these ones, and over half the AI suggestions were not usable. Even the
suggestions they accepted needed a lot of fixing up.
The number of people tested in the study was n=16. That’s a small number. But
it’s a lot better than the usual AI coding promotion, where n=1 ’cos it’s just
one guy saying “I’m so much faster now, trust me bro. No, I didn’t measure it.”
The interesting thing is how the devs thought these tools were rocket fuel —
then the numbers showed the opposite.
When you’re developing software, you never claim you’ve optimised your program
without measuring its performance. That should apply to claims about
dev tools. How many AI advocates measure their performance numbers properly,
not just self-reporting? It’s all vibe advocacy.
Devs are famously bad at estimating how long a software project will take. And
so it is here.
How you feel about your job is important. If an experienced dev enjoys using
the AI tab complete and their stuff is still good, I mean, fine. But if someone
claims a speedup, that’s a number, and they should show that number."
Via
The RISKS Digest Volume 34 Issue 71
https://catless.ncl.ac.uk/Risks/34/71#subj9
Cheers,
*** Xanni ***
--
mailto:xanni@xanadu.net Andrew Pam
http://xanadu.com.au/ Chief Scientist, Xanadu
https://glasswings.com.au/ Partner, Glass Wings
https://sericyb.com.au/ Manager, Serious Cybernetics