Soft ROI vs. Hard ROI: Why Businesses Struggle to Adopt GenAI at the Process Level

May 26, 2026

If enterprises are investing billions into GenAI and consuming record amounts of tokens, why are measurable business outcomes still so elusive?

According to a 2025 MIT report, 95% of generative AI pilots are failing. But… what’s behind that figure?

We are seeing two major answers to that question.

1) The copilot problem:

Part of the answer may be that most organizations still use GenAI primarily as a copilot: a tool that improves individual productivity but struggles to drive operational transformation at scale.

That does not mean the time and efficiency gained with copilots and chatbots are useless. They are not. Helping employees write faster, summarize information, search internal knowledge, draft code, or move through daily tasks with less friction can absolutely create value.

The problem is that this value often stays at an individual level.

And when productivity gains remain tied to how each person works, they become harder to measure, harder to capture, and harder to connect to the profit and loss statement (P&L). A developer may save 100 minutes a day with an AI coding assistant. But unless the business redesigns the workflow around that saved time, those 100 minutes may disappear into more meetings, more context switching, or simply more fragmented work.

In other words, the gain exists. But it has not been institutionalized.

That may be one of the biggest reasons companies still struggle to move from soft ROI to hard ROI with GenAI.

Soft ROI is usually tied to productivity, convenience, employee experience, or faster execution at the individual level. Hard ROI appears when AI changes the operational structure of the business itself: lower processing costs, fewer manual interventions, reduced error rates, faster throughput, and the ability to scale output without growing the team at the same pace.

2) The use case problem:

But there is another problem hiding behind many failed GenAI initiatives: not every use case is a good fit for probabilistic systems like LLMs.

By their very nature, large language models can hallucinate. That makes them difficult to trust in workflows that require deterministic logic, strict governance, or highly sensitive decision-making. In many enterprise scenarios, a system that is “usually correct” is simply not reliable enough.

But there are domains where GenAI appears to fit much more naturally.

Modern multimodal models are exceptionally good at language, semantic interpretation, document understanding, transcription, and text extraction. In clean printed documents, OCR benchmarks often report character-level error rates below 1%, while handwriting or lower-quality scans can move closer to the 3–5% error range. For field extraction, which is usually the more relevant metric in business processes, modern AI-based document systems are often benchmarked in the 95–99%+ accuracy range, depending on document quality, layout consistency, and the complexity of the fields being extracted.
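To make the difference between those two metrics concrete, here is a minimal Python sketch. It assumes character error rate is computed as edit distance divided by the length of the reference text, and that a field only counts as correct on an exact match. The function names and the sample invoice values are hypothetical and do not come from any specific benchmark or product.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edits per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def field_accuracy(expected: dict[str, str], extracted: dict[str, str]) -> float:
    """Share of expected fields whose extracted value matches exactly."""
    if not expected:
        raise ValueError("expected fields must not be empty")
    correct = sum(1 for name, value in expected.items() if extracted.get(name) == value)
    return correct / len(expected)


# Hypothetical example: two misread characters in one line of text.
cer = character_error_rate("INVOICE 10423", "INV0ICE 10428")
print(f"character error rate: {cer:.1%}")  # 2 edits over 13 characters

# Hypothetical example: one of four fields carries a misread digit.
expected = {"invoice_number": "10423", "currency": "USD", "total": "1250.00", "vendor": "Acme"}
extracted = {"invoice_number": "10428", "currency": "USD", "total": "1250.00", "vendor": "Acme"}
print(f"field accuracy: {field_accuracy(expected, extracted):.1%}")
```

A single misread digit is a small character-level error, but it makes the whole invoice number wrong. That is why field-level accuracy is usually the number that matters to the business process.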

The “tokenmaxxing” symptom

A new term has recently started circulating across corporate America: tokenmaxxing.

The concept, discussed in outlets such as Business Insider, refers to the push to aggressively increase AI token consumption as a visible signal of AI adoption. In simple terms, companies and employees are starting to treat the number of tokens they burn as proof that they are serious about AI.

On the surface, the logic is easy to understand. AI companies often price their products based on token usage, so tokens become a convenient way to measure how much employees are interacting with AI systems. If usage is rising, leaders can point to dashboards and say adoption is growing.

Some enterprises, like JPMorgan or Disney, are already moving in that direction, with internal dashboards tracking how employees use AI tools.

But this is also where the metric becomes tricky.

A leaderboard or dashboard can create friendly competition, but it can also create the wrong incentives. People may start using GenAI because the activity is visible, not because the use case is meaningful. They may burn tokens on low-value tasks, force AI into work that does not need it, or learn how to game the system so their adoption numbers look better.

For CFOs, this can quickly become a nightmare. Token consumption may look like momentum on an AI adoption dashboard, but it also shows up as a growing bill. Without clear KPIs for AI value creation, it becomes difficult to know whether the company is investing in real transformation or simply running up its AI spend.

There is also the issue of AI sprawl.

When companies encourage broad experimentation without a clear direction, teams often start building overlapping tools that create new silos between departments. One team builds an assistant for finance. Another creates a similar tool for procurement. A third launches its own chatbot for operations. Each project may help solve a local problem, but together they can duplicate functionality, fragment data, blur ownership, and create tools that do not talk to each other.

That is essentially Shadow AI, the GenAI version of Shadow IT.

And this is probably the biggest limitation of tokenmaxxing as an ROI strategy. More token usage does not necessarily mean companies are getting better at applying AI where the business actually needs it.

A lot still depends on the individual employee: how well they understand the tool, how often they use it, what kind of work they bring to it, and whether they even know where AI can realistically create leverage.

At the same time, the AI bill keeps growing. Encouraging employees to consume more tokens can dramatically increase costs, so eventually, companies need to ask a harder question: are these experiments, assistants, and productivity gains actually creating enough operational value to justify that increase in spend?

Saving time in isolated tasks may help. Building internal AI tools may help too. But if those gains remain fragmented at the individual level, it becomes difficult for the business to capture them structurally.

So companies can end up with rising adoption numbers, rising token bills, and still very little change in the workflows underneath.

That is where a use case like SAP Document AI starts to look different. As a service available on SAP BTP (Business Technology Platform), Document AI can move GenAI from individual experimentation into a specific business process. Instead of hoping employees find value on their own, the technology becomes part of the workflow itself, helping institutionalize productivity gains, reduce repetitive manual work, and redefine the role humans play in the process.

Let’s look at that more closely.

Why SAP Document AI May Be One of the Clearest Paths to Hard ROI

Recently, we had the chance to test some of these ideas in practice during a hands-on workshop held at SAP’s Dallas offices.

Together with SAP, we brought together 15 participants from industries including telecom, energy and utilities, pharma, automotive, food and beverage, and high tech to explore how SAP Document AI could automate document-heavy processes and integrate that data directly into SAP BTP workflows.

From the beginning, we tried to structure the session around a very specific idea: avoid generic demos as much as possible.

Instead of walking everyone through the same predefined scenario, participants worked around their own operational realities. Some teams were dealing with invoice intake across multiple regions and languages. Others were trying to reduce repetitive ERP entries, simplify fragmented approval chains, or improve document validation processes that still depended heavily on manual intervention.

Our technical team worked directly with participants so they could get hands-on with the platform, configure workflows, test extraction logic, and start building small PoCs connected to their own processes. The idea was for companies to leave with something they could realistically continue evaluating internally afterward, not simply a presentation about the tool.

As Nick Baca-Storni, CRO at Inclusion Cloud, described it after the session: “What we wanted to avoid was the typical AI demo where everything works perfectly because the environment is artificial. The value comes when companies can start testing the tool against their own bottlenecks, documents, approval flows, and operational constraints.”

We also tried to keep the business conversation grounded in operational metrics.

Alongside the technical exercises, we worked with an ROI calculator developed by Inclusion Cloud to compare the current costs of manual or semi-automated document-processing workflows against what those same processes could look like with Document AI integrated.

Inside an energy company’s document workflow

One of the scenarios we used during the workshop came from a large energy company in the U.S. managing thousands of documents across multiple regions, formats, and languages. In the current workflow, much of the process still depends on people manually reviewing documents, extracting information, validating fields, routing files between teams, and making sure the correct data eventually reaches the ERP.

[Image: SAP Document AI workshop, Inclusion Cloud]

That is exactly the type of workflow where Document AI starts making operational sense.

The task itself is relatively bounded. Documents follow recognizable structures. Validation rules already exist. And when the system detects uncertainty, exceptions can still be routed to human reviewers instead of forcing fully autonomous automation.
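As a rough illustration of that exception routing, the Python sketch below sends a document to human review whenever any extracted field falls under a confidence threshold. It assumes the extraction service reports a per-field confidence score between 0 and 1. The `ExtractedField` type, the `route_document` function, and the 0.90 threshold are hypothetical choices for this example, not SAP Document AI's API or defaults.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; in practice tuned per field and document type


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed: score in [0, 1] reported by the extraction service


def route_document(fields: list[ExtractedField],
                   threshold: float = REVIEW_THRESHOLD) -> tuple[str, list[str]]:
    """Return the target queue and the names of the fields that need a human look."""
    flagged = [f.name for f in fields if f.confidence < threshold]
    queue = "human_review" if flagged else "auto"
    return queue, flagged


# Hypothetical document: the total was read with low confidence.
fields = [
    ExtractedField("invoice_number", "10423", 0.99),
    ExtractedField("total", "1250.00", 0.62),
]
print(route_document(fields))  # ('human_review', ['total'])
```

The reviewer only sees the flagged fields, so human attention goes to the uncertain part of the document rather than to re-keying the whole thing.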

Using the ROI calculator, we estimated that the payback period approached six months.

That obviously does not mean every organization will see ROI on the same timeline. Different companies operate with different levels of process complexity, document quality, operational maturity, and integration requirements. But the workshop reinforced something we have been seeing repeatedly with Document AI: document-heavy workflows tend to offer a clearer path toward measurable operational impact than many other GenAI use cases.
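For readers who want to see the arithmetic behind a payback estimate, here is a simplified Python sketch. It assumes payback is the upfront cost divided by monthly net savings, with savings driven by the per-document cost difference. This is not the formula used in the Inclusion Cloud ROI calculator, and every input below is an invented placeholder rather than a figure from the workshop.

```python
def payback_months(upfront_cost: float,
                   docs_per_month: int,
                   manual_cost_per_doc: float,
                   automated_cost_per_doc: float,
                   monthly_run_cost: float) -> float:
    """Months needed for cumulative net savings to cover the upfront investment."""
    savings_per_doc = manual_cost_per_doc - automated_cost_per_doc
    monthly_net_savings = docs_per_month * savings_per_doc - monthly_run_cost
    if monthly_net_savings <= 0:
        raise ValueError("automation does not save money under these assumptions")
    return upfront_cost / monthly_net_savings


# All inputs are hypothetical placeholders for illustration only.
months = payback_months(
    upfront_cost=300_000,        # implementation and integration
    docs_per_month=20_000,
    manual_cost_per_doc=4.00,    # labor cost of manual handling
    automated_cost_per_doc=1.50, # includes human review of exceptions
    monthly_run_cost=10_000,     # licenses, usage fees, support
)
print(f"payback period: {months:.1f} months")  # 7.5 months with these placeholder inputs
```

The result is most sensitive to document volume and to the share of documents that still need human review, which is why the same tool can pay back at very different speeds in different organizations.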

Why this use case fits GenAI better

Part of the reason is that document processing operates inside a more bounded environment.

Invoices, purchase orders, forms, shipping documents, and operational records usually follow repetitive structures and relatively clear validation rules. Modern multimodal models tend to perform well in that type of environment because the task itself is narrower and easier to validate: classify, extract, compare, validate, route, escalate.

That is very different from deploying GenAI inside highly ambiguous workflows where hallucinations can multiply operational risk much more quickly.
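One way to picture the validate, route, and escalate part of that sequence is as a set of deterministic checks wrapped around the model's output. The Python sketch below is a minimal example under stated assumptions: the invoice is a plain dictionary with amounts as decimal strings, and the field names, rules, and queue labels are hypothetical rather than taken from SAP Document AI.

```python
from decimal import Decimal

REQUIRED_FIELDS = ("vendor", "po_number", "total", "line_items")  # hypothetical schema


def validate_invoice(invoice: dict, open_purchase_orders: set[str]) -> list[str]:
    """Run deterministic business rules on extracted data and list every failed rule."""
    missing = [name for name in REQUIRED_FIELDS if name not in invoice]
    if missing:
        return [f"missing field: {name}" for name in missing]

    problems = []
    # Compare: the extracted PO number must match a known open purchase order.
    if invoice["po_number"] not in open_purchase_orders:
        problems.append("po_number does not match an open purchase order")
    # Validate: line items must add up to the stated total (Decimal avoids float rounding).
    line_sum = sum((Decimal(amount) for amount in invoice["line_items"]), Decimal("0"))
    if line_sum != Decimal(invoice["total"]):
        problems.append("line items do not add up to total")
    return problems


def next_step(invoice: dict, open_purchase_orders: set[str]) -> tuple[str, list[str]]:
    """Route clean documents onward and escalate anything that fails a rule."""
    problems = validate_invoice(invoice, open_purchase_orders)
    return ("escalate", problems) if problems else ("post_to_erp", problems)


# Hypothetical invoice whose line items do not match the stated total.
invoice = {"vendor": "Acme", "po_number": "PO-1001",
           "total": "1300.00", "line_items": ["1000.00", "250.00"]}
print(next_step(invoice, {"PO-1001"}))  # ('escalate', ['line items do not add up to total'])
```

The extraction may be probabilistic, but the rules that decide what happens next are not. An extraction error tends to surface as a failed check instead of silently reaching the ERP.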

And perhaps more importantly, the gains become easier to institutionalize.

Instead of depending on whether employees decide to use AI effectively on a given day, the workflow itself starts changing. Documents can move automatically through validation layers, ERP checks, routing logic, and approval flows before humans need to intervene.

That also changes the role humans play in the process.

Instead of spending hours reading, copying, validating, and routing repetitive information manually, teams can focus more on exception handling, approvals, governance, and operational analysis.

Nick Baca-Storni summarized it this way after the workshop: “The discussion stopped being about prompts or experimentation and became much more concrete: how much repetitive work exists today, what part of it can realistically be automated, and how that changes the economics of the process.”

That is where Document AI starts addressing the two major problems many organizations face with GenAI adoption, the use case problem and the copilot problem:

The use case itself is better aligned with what modern multimodal models are actually good at. These are language models with strong capabilities for reading, interpreting, and extracting text from documents. In relatively controlled workflows, where the documents follow recognizable patterns and the extracted fields can be validated, the margin of error can be low enough to support automation at volume. That does not mean companies should remove humans from the process entirely. In most enterprise environments, it still makes sense to keep people in the loop at specific control points, especially for exceptions, low-confidence extractions, unusual formats, or decisions that require business judgment.

With Document AI, the company can rethink the process itself. With copilots, saved time can easily disappear into the daily fog of work. An employee may become faster, but the organization may not redesign the process around that extra capacity. The gain exists, but it stays informal. If document intake, extraction, and validation become automated inside the workflow, the organization can redefine what the teams involved in that process actually do. Instead of spending hours on repetitive document handling, people can be reassigned toward higher-value tasks where GenAI is less reliable, such as analyzing exceptions, reviewing potential risks, improving process controls, or handling cases that require judgment.

Conclusion

The current GenAI wave may be forcing enterprises to confront a much bigger question than simply:

“How do we get employees to use AI?”

The real question is becoming:

“How do we redesign the business so AI can generate structural operational value?”

That distinction may ultimately define the difference between soft ROI and hard ROI.

Over the last two years, most organizations approached GenAI primarily as a productivity layer:

Copilots,

Chat interfaces,

Summarization tools,

Coding assistants,

Enterprise search systems.

Those tools absolutely created value. They helped employees move faster, reduce friction, and access knowledge more efficiently.

But adoption alone does not guarantee transformation.

That is why so many companies now find themselves in a strange middle ground. AI usage is growing. Token consumption is exploding. Executives continue investing billions into GenAI initiatives. Yet measurable business impact often remains difficult to isolate.

Part of the problem is that many organizations are still measuring AI through activity metrics:

Prompts,

Active users,

Token consumption,

Or adoption dashboards.

But those metrics mostly describe quantity, not quality.

And that is where the idea of trapped productivity becomes important. Saving 90 minutes per day for an engineer may absolutely be a real efficiency gain. But unless the organization redesigns the surrounding process, that saved time can easily disappear into the normal chaos of enterprise work: more meetings, more context switching, more parallel tasks, or fragmented coordination across systems.

That is why so many companies are starting to encounter what analysts increasingly describe as incremental AI gains rather than transformational ones. AI improves parts of the workflow, but the operational architecture underneath often remains intact.

The companies that create the most long-term value from GenAI may not necessarily be the ones generating the most prompts, burning the most tokens, or deploying the most copilots. They may be the organizations capable of redesigning workflows, operational models, and business architectures around where AI can realistically create measurable leverage.

That is also why we believe tools like SAP Document AI are becoming such important starting points for enterprise AI adoption. They offer a more controlled environment, clearer operational boundaries, measurable KPIs, and a realistic path toward institutionalizing efficiency gains inside the workflow itself.

If your organization is currently evaluating GenAI initiatives or exploring a Document AI PoC, feel free to contact us. Our team can help you evaluate operational use cases, estimate potential ROI, and design a PoC aligned with your real business workflows instead of a generic demo.