Why AI Costs Spike After the First Use Cases

by Michael Marolda Posted on April 13, 2026

The Hidden Second-Wave AI Tax and How Modular Agentic RAG Prevents It

Most organizations begin their AI journey with a single high-impact use case: a chatbot, an internal copilot or a knowledge assistant. The idea is to prove the concept and show that AI can deliver ROI. Most of the time, this approach is fast and efficient, and it works exactly as planned.

However, shortcuts that make AI pilots successful can become liabilities when you try to expand beyond that initial use case. As experimentation with AI transitions into an operational imperative, these shortcuts can strain infrastructure and cause your costs to spike.

Why does this happen? Early deployments are typically siloed. Each team builds its own ingestion pipelines, indexing strategies, embedding generation workflows and retrieval configurations. As departments adopt AI independently, duplication multiplies with separate vector stores, repeated governance reviews, isolated large language model (LLM) integrations, and redundant tuning efforts. 

Moreover, each department's standalone AI experiment also requires ongoing maintenance, and as AI use grows, you’ll need an evaluation layer to ensure quality and performance. If you decide to add external web search, additional engineering time is required, and the new capability must be governed. When new models are released, you’ll need to re-engineer your pipelines, taking time away from important strategic projects.

According to McKinsey, 65% of organizations now use generative AI in at least one function, nearly double the previous year, and broader AI adoption has surged from 50% of organizations in 2022 to 88% in 2025. Among organizations that have reached limited or full-scale maturity, nearly 30% have already integrated AI agents into their operations. 

At these adoption rates, only organizations that have a modular foundation in place for scaling AI across the enterprise will be able to afford it.

In this blog, we take a look at why AI costs accelerate so quickly after an initial success, and how a modular approach to Agentic RAG can transform isolated pilots into a scalable, sustainable foundation for enterprise AI.

Duplication, Lock-In and Fragmented Governance

As AI deployments multiply across functions, the hidden costs of fragmented architecture and decentralized controls are hard to ignore. Let’s examine what they are.

Pipeline-Per-Use-Case Architecture

In many enterprises, each new AI assistant is built as its own isolated stack. In this “pipeline-per-use-case” pattern, every chatbot, copilot or knowledge agent has its own ingestion workflow, indexing configuration and retrieval logic.

 

This approach requires re-ingesting and re-indexing the same enterprise data for every new AI project. That means recomputing all embeddings and duplicating vector stores. It also requires rewriting retrieval logic as compute costs and operational overhead skyrocket. 

Instead of building on a shared knowledge foundation, teams recreate it over and over and pay for the same data processing multiple times, quickly spiking costs.
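To make the duplication concrete, here is a back-of-the-envelope sketch of re-embedding the same corpus per use case versus embedding it once. The corpus size, token counts and price are illustrative assumptions, not vendor figures.

```python
# Illustrative cost sketch: all numbers below are assumptions for the example.

def embedding_cost(docs: int, tokens_per_doc: int, price_per_million_tokens: float) -> float:
    """Cost of embedding a corpus one time."""
    return docs * tokens_per_doc / 1_000_000 * price_per_million_tokens

CORPUS_DOCS = 100_000
TOKENS_PER_DOC = 2_000
PRICE = 0.10  # assumed dollars per million embedding tokens

one_pass = embedding_cost(CORPUS_DOCS, TOKENS_PER_DOC, PRICE)

# Pipeline-per-use-case: five teams each re-embed the same enterprise data.
siloed_total = 5 * one_pass
# Shared knowledge foundation: embed once, reuse across every assistant.
shared_total = one_pass

print(f"siloed: ${siloed_total:,.2f}, shared: ${shared_total:,.2f}")
```

The gap widens with every new assistant: the siloed total grows linearly with the number of use cases, while the shared total stays flat.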

Hard-Coded LLM Dependencies

Early RAG implementations often tightly integrate a specific LLM directly into the application logic. While this can accelerate time-to-value initially, it also limits flexibility. As a result, a change in model pricing or governance requirements can cause major disruption.


The pace of change in AI models introduces a new kind of risk. New models are released frequently, often delivering better performance at lower cost—but organizations locked into a single provider or model can’t take advantage of those gains.

There’s also lifecycle risk. When providers deprecate models (as we have seen with the recent sunsetting of GPT-4 variants), LLM-dependent workflows can break or require costly rework. At the same time, pricing varies widely. Some models cost up to 10X more than alternatives while delivering only marginal improvements for specific use cases.

To manage this volatility, organizations need flexibility to switch models, control costs and ensure continuity as the AI landscape evolves.
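One common way to get that flexibility is to put a thin abstraction between application logic and any vendor SDK. The sketch below is a minimal illustration of the pattern, not the product's actual architecture; the provider names and registry are hypothetical, and real adapters would call vendor SDKs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelConfig:
    provider: str
    model: str

# Registry of provider adapters; real adapters would wrap vendor SDK calls.
PROVIDERS: Dict[str, Callable[[str, str], str]] = {}

def register(name: str):
    def wrap(fn):
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("stub_a")
def call_stub_a(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"

@register("stub_b")
def call_stub_b(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"

def complete(cfg: ModelConfig, prompt: str) -> str:
    # Application code depends only on this function, never on a vendor SDK,
    # so a deprecation or price change is absorbed by the config, not the app.
    return PROVIDERS[cfg.provider](cfg.model, prompt)

cfg = ModelConfig(provider="stub_a", model="model-v1")
print(complete(cfg, "summarize policy X"))

cfg = ModelConfig(provider="stub_b", model="model-v2")  # one-line model swap
print(complete(cfg, "summarize policy X"))
```

With this shape, switching models when a provider sunsets one is a configuration change rather than a rewrite of every workflow.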

Governance and Security Rebuilds

In many organizations, each new AI system has its own separate access controls and policy rules. For most large enterprises, this duplication should be unacceptable. Yet, in the rush to move fast, teams often overlook the associated cost and risk.

 

When governance is rebuilt every time a new AI agent is deployed, inconsistencies emerge and oversight becomes more complex, making systems more vulnerable to security incidents. According to IBM, the average data breach costs $4.88 million globally.

 

Minimizing duplication will help control costs and reduce risk, but it requires building a centralized knowledge layer capable of supporting multiple AI agents, workflows and use cases. 

How a Centralized, Modular Knowledge Layer Supports Scalable AI Deployment 

As AI adoption spreads across the enterprise, the goal is to ensure every new AI experience builds on a shared infrastructure. 

Ingest Once. Use Many Times.

Modular agentic RAG centralizes ingestion, normalization and indexing, so enterprise data can be processed once and reused many times. 

 

Instead of each team building its own pipelines, they leverage a shared knowledge layer and unified framework for parsing documents, enriching metadata, generating embeddings and indexing data. This also creates a shared objective reality from which AI can pull relevant knowledge and generate accurate answers, all while dramatically reducing the risk of hallucinations.

 

With this foundation in place, teams can launch new AI experiences on top of existing infrastructure, instead of having to rebuild it from scratch. For example, a compliance assistant, a sales copilot and an engineering knowledge bot can all operate from the same indexed data. Redundant compute costs disappear and time-to-deployment for new AI initiatives drops from months to weeks or less, all because the basic foundation already exists.
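The "ingest once, use many times" pattern can be sketched in a few lines. The toy keyword index below stands in for a real parsing, embedding and vector-store pipeline; the class and document names are invented for illustration.

```python
from collections import defaultdict

class SharedKnowledgeLayer:
    """Toy stand-in for a centralized ingestion and indexing layer."""

    def __init__(self):
        self._index = defaultdict(set)   # token -> doc ids
        self._docs = {}
        self.ingest_runs = 0             # how many times ingestion was paid for

    def ingest(self, docs: dict) -> None:
        self.ingest_runs += 1
        for doc_id, text in docs.items():
            self._docs[doc_id] = text
            for token in text.lower().split():
                self._index[token].add(doc_id)

    def search(self, query: str) -> list:
        hits = set()
        for token in query.lower().split():
            hits |= self._index.get(token, set())
        return sorted(hits)

layer = SharedKnowledgeLayer()
layer.ingest({"hr-1": "vacation policy details",
              "eng-1": "deployment runbook steps"})

# Different assistants query the same index; ingestion ran exactly once.
compliance_hits = layer.search("policy")
engineering_hits = layer.search("runbook")
print(compliance_hits, engineering_hits)
```

Each additional assistant is just another caller of `search`; none of them re-ingests or re-indexes the corpus.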

Retrieval as Configuration, Not Reconstruction

Different departments require different retrieval strategies. For instance, a compliance assistant may prioritize authoritative policy documents and enforce strict filtering to reduce risk, whereas an engineering copilot may require technical documentation and spec sheets, searching across different knowledge bases.

 

In the traditional RAG model, each of these use cases would require separate indexes and custom retrieval logic. But in a modular architecture, retrieval is configurable and teams can adjust things like chunking strategies, ranking logic, search weighting (keyword vs. semantic), metadata filtering and model orchestration. This removes the need for separating data pipelines for each use case. 

 

Teams can also refine ranking signals and optimize query orchestration over time for better results, all without destabilizing the underlying infrastructure.
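The "retrieval as configuration" idea can be illustrated with a small sketch: one shared corpus, with each use case expressed as a config object rather than a separate pipeline. The scoring fields and filter keys here are invented for the example; a real system would compute keyword and semantic scores per query.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalConfig:
    keyword_weight: float = 0.5   # blend of keyword vs. semantic ranking
    semantic_weight: float = 0.5
    top_k: int = 3
    metadata_filter: dict = field(default_factory=dict)

# Shared corpus; kw/sem stand in for precomputed match scores for one query.
CORPUS = [
    {"id": "d1", "source": "policy", "kw": 0.9, "sem": 0.4},
    {"id": "d2", "source": "specs",  "kw": 0.2, "sem": 0.95},
    {"id": "d3", "source": "policy", "kw": 0.5, "sem": 0.6},
]

def retrieve(cfg: RetrievalConfig) -> list:
    docs = [d for d in CORPUS
            if all(d.get(k) == v for k, v in cfg.metadata_filter.items())]
    ranked = sorted(
        docs,
        key=lambda d: cfg.keyword_weight * d["kw"] + cfg.semantic_weight * d["sem"],
        reverse=True,
    )
    return [d["id"] for d in ranked[: cfg.top_k]]

# Compliance: authoritative policy docs only, keyword-heavy, strict filtering.
compliance = RetrievalConfig(keyword_weight=0.8, semantic_weight=0.2,
                             metadata_filter={"source": "policy"})
# Engineering: semantic-heavy search across all knowledge bases.
engineering = RetrievalConfig(keyword_weight=0.2, semantic_weight=0.8)

print(retrieve(compliance))   # policy docs, keyword-ranked
print(retrieve(engineering))  # everything, semantically ranked
```

Tuning a use case means editing its `RetrievalConfig`, not rebuilding its data pipeline.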

Audience-Specific Experiences Without Data Duplication

A centralized knowledge layer enables multiple tailored AI agents to operate from the same trusted data foundation. Sales, engineering, HR, compliance and operations can all have role-specific AI experiences with role-appropriate access controls and filtering. Additionally, sync agents automatically ingest and update data from underlying knowledge repositories such as SharePoint, Google Drive or Dropbox, further reducing the need for manual intervention.

 

With a single knowledge layer, governance is simpler, and updates can propagate automatically across all AI experiences. As new agents are introduced, they inherit a mature, optimized knowledge base rather than requiring their own isolated stack. 
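Role-appropriate filtering over one knowledge base can be sketched as a single access check that every agent passes through. The roles and document labels below are invented for illustration, not the product's actual policy model.

```python
# One knowledge base; access rules live in one place above retrieval,
# so they are defined once instead of per assistant.
DOCS = [
    {"id": "sal-1", "allowed_roles": {"sales", "ops"}},
    {"id": "hr-1",  "allowed_roles": {"hr"}},
    {"id": "pub-1", "allowed_roles": {"sales", "ops", "hr", "engineering"}},
]

def visible_docs(role: str) -> list:
    # Every agent's retrieval is filtered through this one check.
    return [d["id"] for d in DOCS if role in d["allowed_roles"]]

print(visible_docs("hr"))
print(visible_docs("sales"))
```

A new agent for a new department inherits this filter automatically; no security logic is rebuilt, and a policy change propagates to every AI experience at once.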

Continuous Optimization and Long-Term Cost Control

Sustaining AI performance requires visibility into how well your system is actually responding in production, and this is where RAG Evaluation Metrics (REMi) in Progress Agentic RAG plays a critical role.

REMi continuously measures the quality of every AI-generated response using metrics such as relevance, accuracy and groundedness to create an objective view of performance over time.

REMi also captures a full record of user interactions, logging every question asked along with the system’s response. This creates a traceable history that supports tuning, auditing and improvement. REMi also highlights gaps by tracking unanswered questions (where the system lacks sufficient knowledge to respond), allowing you to proactively enrich your organization's knowledge base and close coverage gaps.

Instead of relying on guesswork, REMi enables a continuous improvement loop: measure performance, identify weaknesses and refine the knowledge layer. As a result, the accuracy, reliability and usefulness of the AI systems continue to increase over time, without the need for constant reengineering.
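The measure-identify-refine loop can be sketched as follows. REMi's actual metrics and scoring are internal to the product, so the groundedness check here is a deliberately naive stand-in: it just measures how much of an answer's wording appears in the retrieved sources.

```python
log = []          # traceable history of every interaction
unanswered = []   # coverage gaps to enrich the knowledge base

def toy_groundedness(answer: str, sources: list) -> float:
    # Toy metric: fraction of answer tokens found in the retrieved source text.
    tokens = answer.lower().split()
    source_text = " ".join(sources).lower()
    if not tokens:
        return 0.0
    return sum(t in source_text for t in tokens) / len(tokens)

def record(question: str, answer, sources: list) -> None:
    if answer is None:
        unanswered.append(question)   # flag a knowledge gap to fill
        return
    log.append({
        "question": question,
        "answer": answer,
        "groundedness": toy_groundedness(answer, sources),
    })

record("What is the vacation policy?",
       "employees get 20 days vacation",
       ["policy: employees get 20 days of vacation per year"])
record("What is the travel policy?", None, [])

print(log[0]["groundedness"], unanswered)
```

Even this toy loop shows the shape of the feedback cycle: every interaction is logged with a quality score, and unanswered questions surface exactly where the knowledge base needs enrichment.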

How Progress Agentic RAG Prevents the Second-Wave AI Tax

Progress Agentic RAG is designed to prevent the “second-wave AI tax” by separating knowledge orchestration from individual AI experiences. This approach is different from traditional RAG in the following ways:

  • Modular by Design: Built as a configurable knowledge orchestration layer rather than a fixed RAG pipeline, Progress Agentic RAG enables teams to build multiple AI experiences from a centralized data pipeline and configure retrieval strategies per workflow. This approach eliminates the need for costly rebuilds as requirements evolve.
  • Retrieval Control and Continuous Optimization: Teams can tune hybrid retrieval methods (vector, keyword and metadata search) based on the needs of each AI use case and adjust how content is organized, prioritized and filtered without reindexing. Continuous monitoring and optimization ensure the knowledge layer improves over time instead of degrading.
  • Model Flexibility: Because LLMs are not hard-wired into the architecture, you can switch models as needs, pricing, performance or compliance requirements change. This protects long-term investments and reduces risk.
  • Built-In Governance and Access Controls: Centralized security policies protect data access across AI use cases. Since governance sits above the retrieval layer, secure scaling across departments is possible without duplicating security logic for every AI agent.

These capabilities result in significant ROI. Unlike traditional RAG approaches in which each new assistant requires separate ingestion, indexing, tuning, security and monitoring, Progress Agentic RAG amortizes those investments across use cases. 

 

As adoption expands, the marginal cost of launching new AI workflows decreases, increasing infrastructure ROI because each deployment builds on shared capabilities instead of recreating them.

Architecture Determines AI Economics

When AI spending starts to spike, in many cases, a fragmented architecture is to blame. Every time a new assistant requires its own ingestion pipeline, indexing logic, security layer and orchestration workflow, costs and complexity multiply.

The companies that scale AI sustainably are the ones that are building a reusable knowledge layer as the foundation for AI innovation. Each new AI experience builds on shared capabilities and infrastructure, leveraging existing investments and reducing both risk and cost. 

AI will continue to expand across the enterprise. The question then becomes whether your architecture will make each instance less expensive and easier to deploy than the last.

Discover how Progress Agentic RAG helps you scale AI with control, flexibility and long-term cost efficiency.

 


Michael Marolda

Product Marketing Manager, Senior

Michael Marolda is a seasoned product marketer with deep expertise in data, analytics and AI-driven solutions. He is currently the lead product marketer for the Progress Agentic RAG solution. Previously, he held product marketing roles at Qlik, Starburst Data and Tellius, where he helped craft compelling narratives across analytics, data management and business intelligence product areas. Michael specializes in translating advanced technology concepts into clear, practical business terms, such as Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) and modern data platforms.
