

Over the last year or so, I've been trying to describe what we actually do when we build software with AI, because too much of the conversation still treats AI like magic beans. "Vibe coding" never fit what we do. That term implies you prompt, accept, and pray.
There's a better name floating around now: agentic engineering. Andrej Karpathy floated the term earlier this year as the grown-up successor to "vibe coding" (which he also coined). Simon Willison picked it up and put real patterns behind it, most recently on a Lenny's Podcast episode. The label is small, but the distinction it draws is huge in practice.
Don't get me wrong: vibe coding has a place in building software. I've used it. When I'm exploring a design space, checking whether an idea is feasible, or standing up a throwaway script to answer a question, I'm happy to move at the speed of "whatever the model hands me, I'll accept."
That mode is optimized for information, not durability. You're buying answers. You're not building a product.
Agentic engineering is how real software gets built. The simplest way I've heard it framed is this: you are the senior engineer. The AI is a fast, tireless junior who can write a lot of code quickly but needs guardrails.
That mental model reshapes everything. You write the spec. You define the architecture. You review every diff. You run the tests. You push back when the output doesn't match your intent.
The leverage isn't that you type less. It's that the boring parts move faster, so your time compounds into the decisions that actually matter: what to build, how it should behave, and why.
Simon pointed out a pattern that really resonated with me, and there's another I'd add from our own work.
Test-driven development. TDD has never fit neatly into most developers' day-to-day workflow, but it is a strong way to set AI up for success. Define the tests first, then have the AI write the implementation, and the output lands much closer to what you intended. Behavior-driven development has always appealed to me more because it makes test intent easier for humans to read, and we have been experimenting with ways to bring more of that into our flow.
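To make the tests-first flow concrete, here's a minimal sketch. The function and test names are invented for illustration: the human writes the behavior-style tests first to pin the contract, and the implementation below them is the part you'd hand to the model and then verify.

```python
import re

# Written first, by the human. The behavior-style names make the
# intent readable before any implementation exists.
def test_slugify_collapses_spaces_and_lowercases():
    assert slugify("Agentic  Engineering") == "agentic-engineering"

def test_slugify_strips_punctuation():
    assert slugify("Vibe coding, redux!") == "vibe-coding-redux"

# Written second (the part the AI generates), then checked
# against the tests above rather than trusted on sight.
def slugify(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)  # non-alphanumeric runs -> one hyphen
    return text.strip("-")
```

The point isn't the slug logic; it's the order of operations. The tests exist before the prompt, so "done" is defined by something the model can't rewrite to suit its own output.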
Tight review loops. The model can produce two hundred lines that look right and have one subtle bug on line one thirty-seven. So we keep batches small, we run the tests after every change, and we read every diff. The faster the generation, the more important it is that something or someone is checking it before mistakes compound across files. Treat AI output as something to verify, not something to trust.
You can point a team of agents at a product, one reviewing requirements, one building features, one reviewing PRs, one running tests, and get working software out the other side. I've done it. It works.
What it does not do by itself is make feature 1000 as easy to build as feature 1. Codebases get heavier over time. Patterns get harder to change. A bad integration decision in month two can cost you a week of rework in month ten.
That part still needs experienced engineers. Someone has to decide the data model before the agents pave it in. Someone has to notice when three features are secretly the same feature and unify them. Agents will cheerfully build the fourth slightly different version of the thing you already have.
Foundations are what let speed compound. Without them, you get a codebase that's easy to add the next feature to and impossible to change when reality shifts.
AI doesn't understand your system. It doesn't know your users, your constraints, or the decisions you made last year that you can't revisit now. That context lives with the engineers.
Models are accelerants. They make good engineering faster, and they make bad engineering faster. The difference between the two is the discipline around them, not the model itself.
If you're serious about using AI in real product work, invest in the engineering habits that make the automation worth the acceleration. That's where the leverage is hiding.