Cool. That sure sounds nice and simple. What do you do when multiple LLMs disagree on what the correct tests are? Do you sit down and compare five different diffs to see which one has the tests you actually want? That sounds like a task you would need an actual programmer for.
At some point a human has to actually use their brain to decide what the goals of a given task are. That person needs to be a domain expert to draw the lines correctly. There's no shortcut around that, and throwing more stochastic parrots at it doesn't help.
Just because you can't (yet) remove the human from the loop entirely doesn't mean that economising on the human's time is impossible.
For comparison, have a look at compilers: nowadays approximately no one writes machine code by hand. We write a 'prompt' in something like Rust or C and ask another computer program, the compiler, to create the actual software.
We still need the human in the loop here, but it takes far less human time than writing the ELF binary directly.
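To make that concrete, here's a minimal sketch (assuming a Linux box with rustc installed; the file names are just examples). The human writes the 'prompt':

    // main.rs -- the 'prompt' the human writes
    fn main() {
        println!("Hello, world!");
    }

and another program turns it into the actual software:

    $ rustc main.rs    # the compiler creates the real artifact
    $ file main        # on Linux prints something like: ELF 64-bit LSB pie executable ...

Nobody had to lay out the ELF headers or pick registers by hand; the human time went entirely into the few lines at the top.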
It’s not “economizing” if I have to verify every test myself. To actually validate that the tests are good I need to understand the system under test, and at that point I might as well just write the damn thing myself.
This is the fundamental problem with this “AI” mirage. If I have to be an expert to validate that the LLM actually did the task I set for it, and isn’t just cheating on the tests, then I might as well code the solution myself.
From a PM perspective, the main differentiator between an engineering team and AI is "common sense". As these tools see more and more use, enough training data will accumulate that AI's "common sense" about coding and engineering decisions could become indistinguishable from a human's. At that point, the only advantage a human has is that they're also useful on the ops and incident-response side, so it's beneficial if they're also comfortable with the codebase.
Eventually these human advantages will be overcome, and AI will effectively pass a "Turing Test" for software engineering. PMs will work with it directly and get the same kinds of guidance, feedback, documentation, and conversational planning and coordination that they'd get from an engineering team, just at far greater speed and lower cost. At that point, yeah, you'll probably need to keep a few human engineers around to run the system, but the system itself will manage the software. The advantage of keeping a human in the loop will dwindle to zero.
I can see how LLMs can help with testing, but one should never compare LLMs with deterministic tools like compilers. LLMs are an entirely separate category.
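A toy sketch of the difference (not a real LLM call, just a stand-in for sampling behaviour): a deterministic tool is a pure function of its input, while an LLM run at temperature > 0 mixes randomness into every invocation.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::time::{SystemTime, UNIX_EPOCH};

    // A compiler behaves like this: same source in, same binary out, every run.
    fn deterministic(input: &str) -> u64 {
        let mut h = DefaultHasher::new();
        input.hash(&mut h);
        h.finish()
    }

    // An LLM at temperature > 0 behaves like this: same prompt in, different
    // tokens out. (Toy stand-in: xor in a clock-derived value, the way a
    // sampler mixes in randomness on each call.)
    fn stochastic(input: &str) -> u64 {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .subsec_nanos() as u64;
        deterministic(input) ^ nanos
    }

    fn main() {
        // Always holds: a deterministic tool is a pure function of its input.
        assert_eq!(deterministic("prompt"), deterministic("prompt"));

        // Almost never equal: two "runs" on the same prompt diverge.
        println!("{} vs {}", stochastic("prompt"), stochastic("prompt"));
    }

Run it as many times as you like: the assertion never fails, while the two "stochastic" values differ on virtually every run. That gap is exactly why a compiler's output can be trusted mechanically while an LLM's has to be reviewed.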