Originally posted on my LinkedIn page.

In my last AI-related post I described my angst over how the potentially massive impact of GenAI on software development has become so divisive in the industry. This post is different: if GenAI is going to write a lot of the software we produce, how does that become a sustainable part of our process / Software Development Lifecycle (SDLC)? In other words, what's the steady state we're tending towards?

This has been on my mind because of a few things. On one hand there's the discussion of disposable vs durable software, which both Charity Majors (here) and Ross Pettit (here) have written about. On the other there are the many anecdotal reports I hear of engineers regularly having to review thousand-line changes, mostly filled with LLM-generated code.

The problem: validation

Engineering teams are faced with tricky choices when they use LLM-generated code for a production application. Should they validate the code the LLM has generated, and if so, how?

These questions aren't new - any industrial software team needs to grapple with whether to perform code review, and how much and what type of testing to perform. But the context is significantly different when an LLM is in play, e.g.:

  • An LLM can generate code far faster than a human, so when human review is required an engineer's work skews much more towards "review" than "write".

  • An LLM may generate code that is beyond a team member's knowledge to review effectively.

  • For human-generated code there's already been a review cycle within the developer's own head, whereas an LLM is hallucinating all the time (it's just that sometimes the hallucinations are useful).

  • Engineer-written automated tests usually target various levels of the codebase - e.g. external "integration" tests vs. "unit" tests that thoroughly cover edge cases of domain logic. But when an LLM has designed and written the complete internals of a codebase, an engineer may not know where best to write unit tests, forcing slower integration tests to cover more of the business logic (see the sketch after this list).
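
To make that last point concrete, here's a minimal sketch in Python/pytest. The fee logic, the service URL, and the JSON shape are all hypothetical stand-ins I've invented for illustration - the point is the trade-off between the two testing seams.

```python
# A minimal sketch (pytest) of the trade-off above. The fee logic and
# the service URL are hypothetical stand-ins, not from a real codebase.
import pytest

def calculate_fee(amount_cents: int) -> int:
    """Hypothetical domain logic: a flat 2% fee with a 30-cent minimum."""
    return max(30, amount_cents * 2 // 100)

# Unit tests: fast and thorough on edge cases, but they presume you know
# this internal function exists and that it's the right seam to test.
@pytest.mark.parametrize("amount,expected", [(0, 30), (1_000, 30), (10_000, 200)])
def test_fee_edge_cases(amount, expected):
    assert calculate_fee(amount) == expected

# Integration test: exercises the system from the outside. If an LLM
# designed the internals this may be the only seam you trust, but every
# edge case now costs a full round trip through a running service.
def test_fee_via_api():
    import requests  # assumes a deployed service; the URL is illustrative
    response = requests.post("https://staging.example.com/quotes",
                             json={"amount_cents": 10_000})
    assert response.status_code == 200
    assert response.json()["fee_cents"] == 200
```

If you didn't design `calculate_fee`, you may not know it exists, which edge cases matter, or whether it's a stable seam - so everything drifts towards the slower external test.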

Disposable vs Durable

For "disposable" software we may not care too much about these questions - we can just throw something into production and see if it works. If it does, great, if not then get the LLM to fix it. Or just not worry. YOLO.

Frankly I'm not worried about what happens with the process around disposable software; it will work itself out. I know some people think that all software is going to become disposable and ... well, I don't agree. I would really rather my bank balance doesn't spontaneously drop to zero, and that the world's air-traffic-control systems don't stop working. Call me old fashioned if you like.

My concern is what we're doing with "durable" software, a subset of which is "anything involved with the movement of money". Which is ... quite a lot of the software in the world.

For durable software we need to make sure that the system is (a) acting as intended at the time it is specified, (b) continues to act that way as changes are made over time, and (c) doesn't cost the world to change. I know, I've already lost the YOLOers with these quaintly boring concepts.

Two solutions

Personally, I currently see only two reasonable steady states for "durable" applications or components:

A - LLMs generate and maintain all the code, and humans only perform external system testing. The internal design and implementation are a black box.

B - LLMs can be used as a tool, but all code must continue to be reviewed and understood by human members of the team.

(A) is a super interesting theory. Taken to its logical conclusion it means we no longer care about things like design, underlying language, code duplication, etc. As long as the code works in the functional and non-functional ways required, then it's valid. It's also a very fast way of building software, fitting with everything I see about agent swarms and the like.

Unfortunately I just don't believe it's realistic, for durable software specifically. I think an LLM can write code; I'm yet to be convinced about all the other things that go into building software, like security, performance, etc. Maybe we can write code in small enough components that each component works this way, and humans join them all together ... but that seems a stretch. Also, there's always logic in the joins, and who writes that logic? I've seen this game of software pinball before.

If (A) does come to pass, then by far the most valuable people on a software team will be those who can write effective automated integration tests. Because we're going to need A LOT of them. So all you QA people, get ready!
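
As a rough sketch of what external-only testing at volume might look like, here's a property-based test using the hypothesis library, asserting an invariant through the public boundary without knowing anything about the internals. The `transfer` function is a hypothetical stand-in for a call into the black-box system:

```python
# A sketch of high-volume external testing under option (A), using the
# hypothesis property-based testing library. `transfer` is a hypothetical
# stand-in for a call into the black-box system (imagine an API client).
from hypothesis import given, strategies as st

def transfer(balance_cents: int, amount_cents: int) -> int:
    """Stand-in for the black box: debit an account, return the new balance."""
    if amount_cents < 0 or amount_cents > balance_cents:
        raise ValueError("invalid transfer")
    return balance_cents - amount_cents

# One stated property, hundreds of generated cases: the balance never
# goes negative, whatever the internals look like.
@given(balance=st.integers(min_value=0, max_value=10**9),
       amount=st.integers(min_value=0, max_value=10**9))
def test_balance_never_goes_negative(balance, amount):
    try:
        new_balance = transfer(balance, amount)
    except ValueError:
        return  # rejecting an invalid transfer is acceptable behaviour
    assert 0 <= new_balance <= balance
```

Generative approaches like this are one way to get "A LOT" of external test cases without hand-writing each one - though they only check the invariants someone thought to state.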

(B) means that we're basically using an LLM like a super-charged IDE. We can write code faster, but at the end of the day the SDLC remains unchanged. We're going to need to make sure senior engineers aren't burned out just reviewing 10,000-line PRs all day, but avoiding that is definitely possible with better use of agentic LLM rules and some team culture guidelines. Personally, I suspect this is where we are headed, but I don't see how it fits with what people are saying about "10X improvement in development speed".
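
As one small example of the kind of guardrail I mean, here's a sketch of a CI step that fails oversized PRs before they reach a reviewer. The 400-line budget and the base branch are assumptions made up for illustration, not a universal recommendation:

```python
# A sketch of one team guideline made executable: a CI check that fails
# oversized PRs. The 400-line budget and base branch are assumptions.
import subprocess
import sys

MAX_CHANGED_LINES = 400        # hypothetical per-PR review budget
BASE_BRANCH = "origin/main"    # hypothetical base branch

def changed_lines(base: str = BASE_BRANCH) -> int:
    # `git diff --numstat` prints "added<TAB>deleted<TAB>path" per file
    out = subprocess.run(["git", "diff", "--numstat", base],
                         capture_output=True, text=True, check=True).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":       # binary files report "-" for line counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > MAX_CHANGED_LINES:
        sys.exit(f"PR changes {n} lines (budget is {MAX_CHANGED_LINES}); please split it up.")
    print(f"PR size OK: {n} changed lines.")
```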

My best guess is that today many companies aren't settling on either of these strategies. I can't help but imagine, therefore, that the amount of unreviewed, untested, barely maintainable code being shipped today for durable systems is skyrocketing. I know one answer is "but LLMs will solve that down the road", and if so I look forward to being proven wrong. Or, more than likely, to using an LLM, and my own experience, to fix the mess.