
Remember how I warned you a year ago that maintaining GenAI code would be harder than writing code with GenAI?

Any coder with any chops at all knows that it is one thing to write code, and another to debug it (and still another to maintain it, a year or a decade later, which is even harder).

And remember how Nathan Hamiel and I warned you in August that

Right on cue, big problems have indeed started to arrive. FT just reported that “Amazon holds engineering meeting following AI-related outages”:

H/t Lukasz Olejnik, whose highlighted screen capture I reprint. Elon Musk himself took note.

A new study from Sun Yat-sen University and Alibaba reports similar observations in a new benchmark that focuses on long-term maintainability:

As Chris Laub summarized the study on X,

Alibaba tested 18 AI coding agents on 100 real codebases, spanning 233 days each. they failed spectacularly. [It] turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI completely collapses.

In fairness, some of the latest systems have done better than earlier ones, but for mission-critical systems, even a small number of errors can be deadly, as Amazon is discovering in the real world.

We may well move to a regime in which AI writes most code — but for a long time to come we are going to need humans to fix the mess.


“A spate of outages, including incidents tied to the use of AI coding tools”, right on schedule