It is a startling admission: one of the people who helped build modern AI, who co-founded OpenAI and led Tesla's Autopilot effort, says he has never felt more behind as a programmer. That is exactly the sentiment Andrej Karpathy recently shared, describing a mixture of exhilaration and unease at the rapid evolution of software development.
For Karpathy, the turning point came in December. With some time off, he noticed a stark transition: the latest AI models were generating perfect chunks of code, and he found himself trusting the system completely, without needing to make corrections. He switched to "vibe coding" full-time, building a seemingly endless stream of side projects. This wasn't just the "ChatGPT-adjacent" experience of the previous year; it was the dawn of coherent, agentic workflows.
Based on his recent deep dive into this new computing paradigm, here are the 10 most profound things I learned from Andrej Karpathy about the shift from vibe coding to agentic engineering.
Karpathy argues that Large Language Models (LLMs) are not just better software; they represent an entirely new computer. To understand this, we have to look at the progression of programming: Software 1.0 is explicit code written by humans; Software 2.0 is neural network weights tuned by training data rather than written by hand; and Software 3.0 is natural language, prompts that program an LLM directly.
When models are trained on sufficiently large sets of tasks across the internet, they implicitly become programmable computers. In Software 3.0, the context window is your primary lever, and the LLM acts as an interpreter that performs computations within the digital information space.
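To make that concrete, here is a minimal sketch of prompt-as-program, assuming an OpenAI-style Python SDK; the model name and the little converter prompt are illustrative placeholders, not anything Karpathy built:

```python
# A minimal sketch of Software 3.0: the "program" is plain English placed in
# the context window, and the LLM acts as the interpreter that executes it.
# Assumes the OpenAI Python SDK and an API key in the environment; the model
# name below is a placeholder.
from openai import OpenAI

client = OpenAI()

# This string is the program. Editing it reprograms the "computer".
program = (
    "You are a unit converter. Given a value in miles, "
    "return only the equivalent in kilometers, rounded to two decimals."
)

result = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": program},
        {"role": "user", "content": "3.5 miles"},
    ],
)
print(result.choices[0].message.content)  # e.g. "5.63"
```

The point is that nothing here is compiled or deployed; changing one English sentence changes what the machine does.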
One of Karpathy's most mind-blowing revelations came from a side project called MenuGen. When you go to a restaurant, menus rarely have pictures, leaving you guessing what 30% to 50% of the items actually are. He built a traditional app hosted on Vercel to solve this. It allowed users to upload a photo, utilized OCR to read the text, ran an image generator to visualize the items, and then re-rendered the menu.
Then he saw the Software 3.0 version. By simply feeding the raw photo of the menu to Gemini and instructing it to use an image generator to overlay the pictures directly onto the menu, he got the exact same result. There was no need for an app in between. The neural network handled the entire workflow, taking an image as the context and outputting a modified image directly. The old paradigm of stringing services together becomes superfluous; the neural network is now raw and powerful enough to do the work natively.
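A rough sketch of what that one-call version can look like, assuming the google-genai Python SDK and an image-capable Gemini model; the model name, file names, and prompt wording are placeholders, not Karpathy's actual MenuGen code:

```python
# A rough sketch of the Software 3.0 MenuGen: one multimodal call replaces the
# OCR + image-generation + re-render pipeline. Assumes the google-genai SDK
# and an image-capable Gemini model; all names here are illustrative.
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

menu_photo = open("menu.jpg", "rb").read()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # placeholder: any model that can return images
    contents=[
        types.Part.from_bytes(data=menu_photo, mime_type="image/jpeg"),
        "Generate a picture of each dish and overlay it directly onto this "
        "menu, then return the modified menu as a single image.",
    ],
)

# Save the first image part the model returns.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        open("menu_with_pictures.png", "wb").write(part.inline_data.data)
        break
```

The OCR step, the per-item image generation, and the re-rendering all disappear into a single request.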
In the Software 1.0 universe, installing a tool like OpenClaw required running a complex bash shell script. Because these scripts have to account for countless different operating systems, platforms, and computer types, they become bloated and incredibly complex.
In the Software 3.0 era, you don't use a shell script; you use a simple copy-paste instruction given to an AI agent. The agent utilizes its packaged intelligence to observe your specific computer environment, follow the high-level instructions, and intelligently debug the installation in a loop. The new programming paradigm is simply figuring out what text to copy-paste to your agent.
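For illustration, the "installer" in this world might be nothing more than a paste-able instruction like the hypothetical one below; the wording and steps are invented for this example, not taken from OpenClaw's documentation:

```
Install OpenClaw on this machine. Detect my OS and package manager yourself,
install any missing dependencies, verify the binary runs with --version, and
if anything fails, read the error output and fix it before trying again.
```

The platform-detection logic that bloated the bash script is replaced by the agent looking at the machine it is actually running on.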
Karpathy famously coined the term "vibe coding," but he draws an important line between that concept and the more serious discipline of "agentic engineering": vibe coding is the casual mode of accepting whatever the model produces without really reading the code, while agentic engineering is the deliberate practice of directing, reviewing, and orchestrating agents for professional work.
For the people who master agentic engineering, the ceiling is incredibly high. The old concept of the "10x engineer" is magnified; developers who invest heavily in their agentic setups are reaching output well beyond a 10x multiplier.
When extrapolating what computing might look like in 2026 and beyond, Karpathy paints a wildly foreign picture. In the 1950s and 60s, it wasn't obvious if computers would evolve to look like calculators or neural networks. We went down the calculator path, and today, neural nets run virtualized on top of classical computers.
In the future, this dynamic is going to flip. Neural networks will likely become the "host process" doing the heavy lifting, and traditional CPUs will be relegated to being co-processors—historical appendages used only for deterministic tool use. We might see entirely neural computers that ingest raw video and audio, utilizing diffusion to dynamically render custom user interfaces unique to that exact moment.
Despite their massive power, these models possess a "jagged" form of intelligence. Traditional computers easily automated what could be specified in code, but modern LLMs automate what can be verified.
Because frontier labs train these models in giant reinforcement learning (RL) environments driven by verification rewards, the models become incredibly capable at highly verifiable tasks like mathematics and coding. However, outside of those domains, they can stagnate. Karpathy points out a striking contradiction: a state-of-the-art model can easily refactor a 100,000-line codebase or discover zero-day vulnerabilities, but if you ask it whether you should walk or drive to a car wash 50 meters away, it will tell you to walk, missing the obvious point that the car has to be there to get washed. These models are not animals shaped by evolution; they are statistical "ghosts" lacking intrinsic motivation, curiosity, or common sense.
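As a toy illustration of what "verifiable" means here, a coding task can be scored by simply running the model's output against tests; this is only a conceptual sketch, not any lab's actual RL setup:

```python
# A toy sketch of a verification-style reward for RL on coding tasks:
# the model's output earns reward only if it passes an automatic check.
# Purely illustrative; not any frontier lab's training code.
import subprocess
import sys
import tempfile

def coding_reward(candidate_code: str, test_code: str) -> float:
    """Return 1.0 if the candidate passes the tests, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0

# Example: the task is to write add(); the tests verify it automatically.
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(coding_reward(candidate, tests))  # 1.0
```

Whether to walk or drive to the car wash offers no such reward signal, which is one way to think about why the model's judgment there stays jagged.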
Because of how jagged these models are, users have to actively explore their capabilities. If you are working within the "circuits" that were heavily represented in the reinforcement learning data, you will fly at light speed. If you are outside of them, it feels like pulling teeth.
Karpathy points to the jump from GPT-3.5 to GPT-4. Chess capabilities improved massively not just because of a general capability progression, but because someone at the lab decided to include a huge amount of chess data in the pre-training set. As developers, we are at the mercy of whatever the labs decide to put into the mix. If your application falls out of the data distribution, you will likely need to fine-tune the model yourself.
As agents take on more work, they effectively function as extremely fast, capable interns. They have perfect recall for trivial details, meaning human developers no longer need to memorize API nuances.
For example, when working with tensor and array operations across PyTorch, NumPy, and Pandas, Karpathy notes that he has entirely forgotten whether the argument is keepdims or keepdim, axis or dim, or whether he wants reshape, permute, or transpose. The agentic intern handles all of that. However, the human must still understand the fundamental engineering concepts, such as knowing that manipulating a view is more memory-efficient than creating new storage, to ensure the intern isn't unnecessarily copying memory around.
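A small sketch of the split, with the argument-name trivia an agent can recall for you and the view-versus-copy concept a human still has to hold; it assumes NumPy and PyTorch are installed:

```python
# The trivia an agent remembers for you (keepdims vs keepdim, axis vs dim)
# versus the concept you still need: views share storage, copies allocate new memory.
import numpy as np
import torch

a = np.arange(6).reshape(2, 3)
t = torch.arange(6).reshape(2, 3)

# Same reduction, different argument spellings across libraries.
a.sum(axis=1, keepdims=True)   # NumPy: axis / keepdims
t.sum(dim=1, keepdim=True)     # PyTorch: dim / keepdim

# transpose() returns a view: no new storage, so it is memory-efficient...
view = t.transpose(0, 1)
print(view.data_ptr() == t.data_ptr())   # True: shares the original memory

# ...but forcing it contiguous allocates fresh storage, i.e. a real copy.
copy = view.contiguous()
print(copy.data_ptr() == t.data_ptr())   # False: new storage was created
```

The agent can get the spellings right every time; knowing when the second case is happening unnecessarily is still on the engineer.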
If the AI is doing the coding, what is the human doing? According to Karpathy, human skill becomes centered around aesthetics, system design, and oversight.
Agents still lack fundamental judgment. In his MenuGen project, Karpathy used Google accounts for user sign-ups and Stripe for purchasing credits. His AI agent bizarrely tried to cross-correlate users' funds by matching their Google email to their Stripe email, failing to realize that a user might use two different email addresses for the two services. Agents will make these weird, illogical mistakes because they don't understand the real world. The human's job is to work with the agent to design detailed specs, enforce high-level categories, and ensure logical constraints (like tying funds to a unique, persistent user ID) are met.
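The kind of constraint the human has to enforce looks something like the hypothetical data model below, where credits hang off a stable internal user ID rather than whichever email Google or Stripe happens to report; this is illustrative only, not MenuGen's real schema:

```python
# A hypothetical sketch of the constraint: credits are keyed to a persistent
# internal user_id, never to the Google or Stripe email, which may differ.
import uuid
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    google_email: str = ""          # identity reported by Google sign-in
    stripe_customer_id: str = ""    # identity reported by Stripe checkout
    credits: int = 0                # funds tied to user_id, not to any email

def grant_credits(user: User, amount: int) -> None:
    # Funds follow the persistent user_id even if the two emails never match.
    user.credits += amount

alice = User(google_email="alice@gmail.com", stripe_customer_id="cus_123")
grant_credits(alice, 50)
print(alice.user_id, alice.credits)
```

The agent can write all of this code; deciding that the email must never be the join key is the part that still requires a human spec.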
Perhaps the most philosophical takeaway is regarding the future of human education and intellect. Karpathy cited a recent idea that blew his mind: "You can outsource your thinking, but you can't outsource your understanding."
Even with agents doing the processing, humans remain the ultimate bottleneck. We still have to understand why a product is worth building, what we are trying to achieve, and how to intelligently direct our agents. Karpathy uses LLMs to compile and reorganize large volumes of text into personal wikis, seeing them as incredible tools for enhancing his own understanding by offering different projections of data. Ultimately, LLMs currently do not excel at true understanding; they are tools that humans must uniquely direct.
As we look forward, the transition requires an entire refactoring of how we interact with technology. We are currently stuck in a world where documentation is still written for humans to read, frustratingly telling us what steps to manually take rather than giving us blocks of text to feed our agents.
Karpathy envisions an exciting "agent-native" future where workloads are decomposed into "sensors" and "actuators" over the world. Infrastructure will be described to agents first, and our personal agents will communicate with other agents to negotiate details like scheduling meetings. Until then, mastering Agentic Engineering is about learning how to expertly direct these powerful, jagged models—combining their infinite speed with our irreplaceable human judgment.
About the Author

Rejith Krishnan
Founder and CEO
Rejith Krishnan is the Founder and CEO of lowtouch.ai, a platform dedicated to empowering enterprises with private, no-code AI agents. With expertise in Site Reliability Engineering (SRE), Kubernetes, and AI systems architecture, he is passionate about simplifying the adoption of AI-driven automation to transform business operations.
Rejith specializes in deploying Large Language Models (LLMs) and building intelligent agents that automate workflows, enhance customer experiences, and optimize IT processes, all while ensuring data privacy and security. His mission is to help businesses unlock the full potential of enterprise AI with seamless, scalable, and secure solutions that fit their unique needs.