
Photo by Craig Pattenaude on Unsplash
An earlier METR study cast AI tools in a more positive light than the latest one did, but of course there was a catch:
Although AI systems have learned to perform an impressive set of tasks, they struggle to complete those tasks with the consistency and accuracy demanded in real-world settings. The results of the March METR study, for example, were based on a “50 percent success rate,” meaning the AI system could reliably complete the task only half the time—making it essentially useless on its own. This gap makes using AI in a work context challenging. Even the most advanced systems make small mistakes or slightly misunderstand directions, requiring a human to carefully review their work and make changes where needed.
That gap seems to explain the newest result: METR found that developers completed tasks 20% slower when using AI tools than when working without them. This despite the fact that, at the end of the study, the participating developers themselves estimated a 20% gain. They felt they were going faster but were in fact going slower. Why?
Developers ended up spending a lot of time checking and redoing the code that AI systems had produced—often more time than it would have taken to simply write it themselves. One participant later described the process as the “digital equivalent of shoulder-surfing an overconfident junior developer.”
Coding with these tools amounts to two steps forward and one step back, just as I pointed out in my earlier post on the matter.