October 29, 2025
The Ambiguity of Vibe Coding: A Deal Breaker for Enterprise AI

“The hottest new programming language is English.” 

That’s how Andrej Karpathy, co-founder of OpenAI, described the new wave of vibe coding – a style of software creation where developers (and sometimes non-developers) describe what they want in natural language, and AI writes the code. 

The promise is undeniably exciting: the idea that you can create almost anything your imagination allows…  and maybe even challenge the long-standing dominance of programmers in writing code and building systems. Some even say it could wipe most of them out in one clean sweep.  

But how much of that is really possible, and how much is just smoke and mirrors? 

Because as James Gosling, the creator of Java, told The New Stack, “as soon as your vibe coding project gets even slightly complicated, they blow their brains out.” 

In other words: vibe coding might work for quick demos or prototypes. But for enterprise software, where systems must run “every f***ing time,” as Gosling added, it simply doesn’t hold up.

In this article, we go through the three reasons why English (or any natural language) can’t replace your favorite programming language.  

And why it’s far too soon to think AI could replace most engineers overnight. 

Every ambiguous prompt is a potential security hole.

1) Is the Code in the Data AI Learns From Good Enough? 

Let’s start with the most uncomfortable truth: AI code generators learn from public repositories — GitHub, Stack Overflow, and millions of open datasets floating around the internet. And like everything online, there’s a bit of everything in there: brilliant work, half-finished experiments, and a whole lot of “it runs, so ship it.” 

The problem is that during training, models don’t really know the difference between good and bad code. They aren’t taught to evaluate quality, performance, or security. Instead, they optimize for statistical correlation — learning which words, symbols, or code structures often appear together in context. 

So if thousands of examples in their training data include a bad practice (like poor error handling or insecure database queries), the model absorbs it just the same as the good ones. It can’t tell which snippet came from a senior engineer at Google and which came from a weekend hobbyist who never finished the project.
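To make that concrete, here’s a minimal Python sketch (the table and column names are invented for illustration) of a pattern that shows up constantly in public repositories, next to the version a code review would demand. During training, nothing tells the model which of the two to prefer:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # The pattern a model sees all over public code: building SQL by
    # string interpolation, which leaves the query open to SQL injection.
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchone()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The reviewed version: a parameterized query, so user input is
    # passed as data and never spliced into the SQL text itself.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchone()
```

Both functions run, both return the same row for benign input, and both appear thousands of times in training data. Statistically, they’re near neighbors; in production, one of them is a breach waiting to happen.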

As Simon Ritter, Deputy CTO at Azul Systems, said in The New Stack, “GitHub is full of quick hacks, abandoned experiments, and code that was never reviewed. Train an AI on mediocre code, and you’ll get mediocre results.” 

Sure, there’s also high-quality code out there (open-source frameworks, academic projects, and production-grade libraries). But without explicit quality signals or metadata, the model treats everything equally. 

And that’s where the problem begins: AI isn’t learning how to code well; it’s learning how code looks.

That’s fine for prototyping. For enterprises, this is a dealbreaker. Their systems can’t afford “good enough.” When you’re managing healthcare data, payments, or logistics, every line of code has to be right — not 90% of the time, but 100%. 

2) Can English Really Be a Programming Language? 

Some of the biggest voices in tech have suggested that we’re entering a new era where anyone can code just by talking to a computer. Today, we already communicate with machines through programming languages. But the new promise goes further: to do it using natural language, whether it’s English, Spanish, French, or any other native tongue. 

For example, Jensen Huang, CEO of NVIDIA, said that “AI means everyone can now be a programmer — you just have to say something to the computer.” Karpathy went even further with the line we opened with, calling English “the hottest new programming language.”

It’s an appealing idea, without a doubt: the notion that software creation could become as natural as speaking. But it’s worth taking a closer look at whether natural language could really function as a good programming language, one capable of building enterprise-level applications and systems.

The idea has gained traction because it feels intuitive: if large language models can understand human instructions and generate working code, why shouldn’t natural language be considered the next programming language? 

But that logic misses a key point. Programming languages exist precisely because natural language isn’t precise enough. They were invented to remove ambiguity — to make every instruction explicit, reproducible, and machine-readable. 

As a recent article from The New Stack explains, this is the real reason English can’t replace Python, Java, or C++. Human language is full of nuance and interpretation, while programming requires determinism. When we say “make it faster” or “improve security,” a person might understand the context, but an AI model doesn’t. It predicts what seems statistically correct based on patterns in its training data, not what’s technically or logically right for your system. 

Let’s take a simple example

Imagine you ask an AI model to “add a login system that’s easy to use and secure.” 

To a human engineer, that phrase triggers a series of design and implementation decisions: using OAuth2 or SSO, validating inputs, hashing passwords, managing sessions, and applying MFA policies depending on the context. 

To an AI, the same instruction could mean almost anything. It might generate a basic username-password form, or it might skip encryption entirely. “Easy to use” could be interpreted as removing password complexity requirements, while “secure” might mean simply adding HTTPS. 
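Here’s a minimal sketch of that gap, with all names hypothetical. The first registration function is one plausible reading of the prompt (and a pattern common in tutorial code); the second is closer to what the engineer actually meant by “secure”:

```python
import hashlib
import hmac
import os

# One plausible reading of "easy to use and secure": store the password
# as-is. It "works", which is all the model was trained to reproduce.
naive_store: dict[str, str] = {}

def register_naive(username: str, password: str) -> None:
    naive_store[username] = password  # plaintext at rest

# What the engineer meant: at minimum a salted, deliberately slow hash
# (PBKDF2 here; bcrypt or argon2 are common production choices).
hashed_store: dict[str, tuple[bytes, bytes]] = {}

def register_hashed(username: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    hashed_store[username] = (salt, digest)

def verify(username: str, password: str) -> bool:
    salt, digest = hashed_store[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Nothing in the prompt rules out the first version. Only an engineer who already knows what “secure” entails can tell the difference at review time.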

That’s the danger of ambiguity. What’s clear to us as humans becomes distorted when filtered through a model trained to reproduce what typical code looks like — not necessarily how it should be built. These systems aren’t applying best practices, nor are they adapting solutions to the specific needs, architecture, or compliance standards of the business they’re coding for. 

In other words, large language models don’t actually understand intent; they just complete text based on probability. When you give them an instruction, they predict which sequence of words or lines of code is most statistically likely to follow. And because they learn from existing public code (which, as we saw, varies greatly in quality), they can easily reproduce technical debt, bad habits, outdated practices, or even hidden vulnerabilities found in their training data.

Why senior engineers still have plenty of work ahead 

This is where senior engineers remain essential. They know that a vague goal like “optimize performance” actually involves dozens of concrete technical decisions: caching API calls, reducing payload sizes, profiling queries, or rethinking data structures. They also know how to verify that what the AI generated actually works under enterprise conditions. 
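To see how much hides behind just one of those decisions, here’s a minimal caching sketch; the decorated fetch_exchange_rate function is a made-up stand-in for a slow external API call:

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results for a fixed time window.

    Even this tiny decision hides trade-offs a vague prompt never states:
    how stale may the data get? How much memory may the cache hold?
    Should failed calls be cached too?
    """
    def decorator(func):
        store = {}  # maps positional args to (timestamp, result)

        @wraps(func)
        def wrapper(*args):  # positional args only, to keep the sketch short
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # fresh enough: serve the cached result
            result = func(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(seconds=30)
def fetch_exchange_rate(currency: str) -> float:
    ...  # stand-in for a slow external API call
```

“Optimize performance” never says whether 30 seconds of staleness is acceptable. That number is a business decision, and someone has to make it deliberately.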

Now, when people without programming experience use these tools (a trend often described as a sort of “democratization” of software development), the situation gets even more ambiguous. A non-developer may not write a detailed enough prompt, nor have the skills to review or debug what the AI produced. The result is that errors move from the coding phase to the review and debugging phase, and that usually means more time, not less.

That’s the gap between what Andrej Karpathy originally called “vibe coding” — giving the AI a light description and getting something that seems to work — and what real software engineering requires. In the enterprise world, “it works” isn’t enough. Systems must integrate with existing architecture, respect compliance rules, protect data, and scale predictably. 

That’s also why there’s such a difference between an AI tool in the hands of a senior developer and one used by a non-technical user. Senior engineers understand architecture, business needs, and user behavior. They know how to give AI the right context and how to verify what it delivers. When that expertise is missing, what looks like democratization can quickly turn into a wave of hidden technical debt.

3) Why Probabilistic Systems Can’t Guarantee Reliability 

We’ve reached the third reason why vibe coding struggles to meet enterprise standards — and it might be the most fundamental of all. 

Even if a model could perfectly interpret our intent and follow best practices, there’s a deeper limitation at play: large language models are probabilistic systems. That means they don’t always give the same answer, even when you ask the same question twice. 

In traditional programming, if you run the same piece of code, you’ll always get the same result: it’s deterministic by design. AI models, on the other hand, generate text (or code) by predicting what’s statistically most likely to come next based on patterns in their training data. They don’t reason; they approximate.

This works beautifully for creative tasks like writing or idea generation, where variation is valuable. But in software development, unpredictability is the opposite of what you want. Two identical prompts can produce slightly different functions, logic flows, or dependencies, and since the model can’t actually verify which one is correct, every variation has to be tested, reviewed, and debugged by a human. 
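A toy illustration of that difference, with an invented probability distribution standing in for a real model (actual LLMs are vastly more complex, but the sampling principle is the same):

```python
import random

def deterministic(x: int) -> int:
    # Traditional code: same input, same output, every single run.
    return x * 2

def sampled_next_token(context: str) -> str:
    # Toy stand-in for how an LLM picks its next token: it samples from a
    # probability distribution over candidates, so two identical prompts
    # can take different paths. The candidates and weights are invented.
    candidates = ["set_cookie(", "issue_token(", "create_session("]
    weights = [0.5, 0.3, 0.2]
    return random.choices(candidates, weights=weights, k=1)[0]

print(deterministic(21), deterministic(21))                    # always: 42 42
print(sampled_next_token("auth"), sampled_next_token("auth"))  # may differ
```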

So, it’s not a matter of whether one model or another is better at writing code… the issue is how LLMs fundamentally work.  

They were built to simulate language, not to guarantee correctness. That means the more complex your system becomes, the higher the risk of inconsistency. Imagine you ask the AI to generate a “login system with user authentication,” and later, another team asks for the same thing to integrate into a different module. Both outputs might look fine — but one could use cookies while the other relies on tokens. Each works on its own, yet when you try to connect them, they won’t recognize each other’s authentication method. 
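Here’s a minimal sketch of that failure mode, with all names hypothetical. Each module is internally consistent, yet neither accepts the other’s credentials:

```python
VALID_SESSIONS = {"abc123"}   # known session IDs (module A's world)
VALID_TOKENS = {"tok-789"}    # known bearer tokens (module B's world)

# Module A, generated for one team: session ID carried in a cookie.
def is_authenticated_a(request: dict) -> bool:
    return request.get("cookies", {}).get("session_id") in VALID_SESSIONS

# Module B, generated later for another team: bearer token in a header.
def is_authenticated_b(request: dict) -> bool:
    auth = request.get("headers", {}).get("Authorization", "")
    return auth.removeprefix("Bearer ") in VALID_TOKENS

# A request that module A accepts is rejected by module B:
request_from_a = {"cookies": {"session_id": "abc123"}, "headers": {}}
print(is_authenticated_a(request_from_a))  # True
print(is_authenticated_b(request_from_a))  # False: no shared auth contract
```

Each snippet passed its own tests. The bug only appears at the seam, which is exactly where enterprise systems live.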

At scale, this kind of inconsistency fragments your architecture and increases technical debt.  

Conclusion 

We’re not against using AI to write code; quite the opposite. At Inclusion Cloud, we see it as one of the most exciting ways to accelerate development today. But there’s an important distinction between vibe coding and AI-assisted development.

When senior engineers use AI, they understand what’s happening behind every line of code that goes into production. If something fails, they know why and how to fix it. AI helps them move faster through repetitive tasks so they can focus on what really matters: building better architectures, optimizing costs, and improving how data flows across systems. 

That’s the balance every company should aim for. Generative AI is an amazing creative partner, even a great co-developer. But it’s not ready to replace human judgment — or to be the architect just yet. 

How inMOVE™ by Inclusion Cloud keeps the human + AI balance 

That same mindset drives inMOVE™ by Inclusion Cloud, our AI-powered recruiting engine. We built it to find developers who know how to combine the best of both worlds: human expertise and AI efficiency. 

inMOVE™ doesn’t just identify top talent; it helps us understand how they use AI. Which frameworks do they rely on? How do they integrate automation responsibly? That’s how we separate true engineers, the ones who know how to use AI to enhance their work, from mere vibe coders.

Every candidate goes through a double validation process, with both HR and technical leaders reviewing their skills. It’s a human-in-the-loop approach that ensures we bring in people who understand systems, architecture, and business value — and who can use AI responsibly to deliver all three. 

Contact us and discover how inMOVE™ by Inclusion Cloud can help you find the talent your AI strategy needs. 
