This is so true. Reviewing an MR from someone who uses AI takes ten times longer, because you can’t trust that ANY thought has gone into each decision.
Catastrophically bad decisions (such as the ones OP noted, which crash prod…) are instead camouflaged as GREAT decisions and backed up by a mountain of text. It’s a garbage fire.
Unfortunately, the next step is for AI to get better and better at generating verification trails that look correct but are not.
I mean, they already do that, right? If buggy code comes with polished documentation and passing unit tests, that’s a verification trail that looks correct but is not. The problem is that, until recently, those signals were broadly assumed to be good enough. We’ve known for decades that flawed unit test suites can ignore or even obscure bugs, and that documentation can be incorrect or out of date, but their mere existence implied, reasonably safely, that the dev who wrote the code understood it well enough to write the docs and the tests. The signals of correctness have always been imperfect; they were just good enough that we got by with them until now. Now we need to think of something more rigorous.
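To make that concrete, here’s a made-up sketch of what I mean by a verification trail that looks correct but isn’t (the function and tests are entirely hypothetical, not from the article): a happy-path-only test suite like this passes CI and makes the code look verified, while the edge-case bug sails straight through.

    def days_in_month(month: int) -> int:
        """Return the number of days in a month (1-12)."""
        days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        return days[month - 1]  # Bug: leap years are ignored entirely.

    def test_days_in_month():
        # Happy-path-only tests: they all pass, so the function
        # looks verified even though February is wrong in leap years.
        assert days_in_month(1) == 31
        assert days_in_month(4) == 30
        assert days_in_month(2) == 28
        # Missing: a leap-year case (Feb 2024 == 29) that would fail.

    if __name__ == "__main__":
        test_days_in_month()
        print("all tests pass")

Polish the docstring, attach a green test run, and you have a perfectly plausible trail for code that will still blow up in production.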
I was responding to the following paragraph in the article:
We used to get proof-of-thought for free because producing a patch took real effort. Now that writing code is cheap, verification becomes the real proof-of-work. I mean proof of work in the original sense: effort that leaves a trail: careful reviews, assumption checks, simulations, stress tests, design notes, postmortems. That trail is hard to fake. [emphasis mine] In a world where AI says anything with confidence and the tone never changes, skepticism becomes the scarce resource.
I am a bit wary that the trail of verification will continue to be so “hard to fake”.
Right, I guess I’m taking issue with the article more than with you, because the “trail” is always documents, and it’s pretty easy for LLMs to fake documents. I mean, humans have been half-assing verification checks for decades, and it has kind of worked, because even a half-assed verification document required at least some fluency with the code under test, which in turn forced the engineering team to develop enough understanding of the code to maintain it, just to produce plausible verification documents. Now the relationship between plausible documentation and a dev’s understanding of the code being verified is much less reliable, so we need more precise mechanisms. In other words, the signals of trust have always been broken; it just hasn’t been a problem until now, because there were side effects that made those signals a good-enough proxy for what we actually wanted. That proxy is no longer reliable.