Most organizations begin their AI journey with a single high-impact use case: a chatbot, an internal copilot or a knowledge assistant. The idea is to prove the concept and demonstrate that AI can deliver ROI. Most of the time, this approach is fast and efficient, and it works exactly as planned.
However, the shortcuts that make AI pilots successful can become liabilities when you try to expand beyond that initial use case. As AI experimentation becomes an operational imperative, those same shortcuts can strain infrastructure and cause your costs to spike.
Why does this happen? Early deployments are typically siloed. Each team builds its own ingestion pipelines, indexing strategies, embedding generation workflows and retrieval configurations. As departments adopt AI independently, duplication multiplies with separate vector stores, repeated governance reviews, isolated large language model (LLM) integrations, and redundant tuning efforts.
Moreover, each department’s standalone AI experiment also requires ongoing maintenance, and as AI use grows, you’ll need an evaluation layer to ensure quality and performance. If you decide to add external web search, it demands additional engineering time and must itself be governed. And when new models are released, you’ll need to re-engineer your pipelines, taking time away from important strategic projects.
According to McKinsey, 65% of organizations now use generative AI in at least one function, nearly double the previous year, and broader AI adoption has surged from 50% of organizations in 2022 to 88% in 2025. Among organizations that have reached limited or full-scale maturity, nearly 30% have already integrated AI agents into their operations.
At these adoption rates, only organizations that have a modular foundation in place for scaling AI across the enterprise will be able to afford it.
In this blog, we look at why AI costs accelerate so quickly after initial success, and how a modular approach to Agentic RAG can transform isolated pilots into a scalable, sustainable foundation for enterprise AI.
As AI deployments multiply across functions, the hidden costs of fragmented architecture and decentralized controls are hard to ignore. Let’s examine what they are.
In many enterprises, each new AI assistant is built as its own isolated stack. In this “pipeline-per-use-case” pattern, every chatbot, copilot or knowledge agent has its own ingestion workflow, indexing configuration and retrieval logic.
This approach requires re-ingesting and re-indexing the same enterprise data for every new AI project. That means recomputing all embeddings and duplicating vector stores. It also means rewriting retrieval logic for each project, while compute costs and operational overhead skyrocket.
Instead of building on a shared knowledge foundation, teams recreate it over and over and pay for the same data processing multiple times, quickly spiking costs.
Early RAG implementations often tightly integrate a specific LLM directly into the application logic. While this can accelerate time-to-value initially, it also limits flexibility. As a result, a change in model pricing or governance requirements can cause major disruption.
There’s also the issue of cost and lifecycle risk. The pace of change in AI models introduces a new kind of volatility: new models are released frequently, often delivering better performance at lower cost, but organizations locked into a single provider or model can’t take advantage of those gains. When providers deprecate models, as we have seen with the recent sunsetting of GPT-4 variants, LLM-dependent workflows can break or require costly rework. At the same time, pricing varies widely: some models cost up to 10X more than alternatives while delivering only marginal quality improvements for specific use cases.
To manage this volatility, organizations need flexibility to switch models, control costs and ensure continuity as the AI landscape evolves.
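The decoupling this calls for can be as simple as a thin abstraction layer between application logic and model providers. The sketch below is a hypothetical illustration (the `ModelConfig` and `PROVIDERS` names are invented, and the provider call is stubbed), not Progress Agentic RAG’s actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelConfig:
    provider: str              # e.g., "openai", "anthropic", "local"
    model: str                 # model identifier at that provider
    cost_per_1k_tokens: float  # tracked so cost comparisons are explicit

# Registry of callables that invoke each provider's SDK. Stubbed here;
# in practice each entry would wrap a real client library.
PROVIDERS: Dict[str, Callable[[str, str], str]] = {
    "stub": lambda model, prompt: f"[{model}] answer to: {prompt}",
}

def complete(cfg: ModelConfig, prompt: str) -> str:
    """Send a prompt to whichever model the config names."""
    return PROVIDERS[cfg.provider](cfg.model, prompt)

# Swapping models is now a one-line config change, not a rewrite:
cheap = ModelConfig("stub", "small-model", cost_per_1k_tokens=0.15)
print(complete(cheap, "Summarize our returns policy."))
```

With this shape, a model deprecation or price change becomes a configuration update rather than a rework of every workflow that embeds the model directly.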
In many organizations, each new AI system has its own separate access controls and policy rules. This duplication is simply unacceptable for many large enterprises. Yet, because they want to move fast, they often ignore the associated cost and risk.
When governance is rebuilt every time a new AI agent is deployed, inconsistencies emerge and oversight becomes more complex, making systems more vulnerable to security incidents. According to IBM, the average data breach costs $4.88 million globally.
Minimizing duplication will help control costs and reduce risk, but it requires building a centralized knowledge layer capable of supporting multiple AI agents, workflows and use cases.
As AI adoption spreads across the enterprise, the goal is to ensure every new AI experience builds on a shared infrastructure.
Modular agentic RAG centralizes ingestion, normalization and indexing, so enterprise data can be processed once and reused many times.
Instead of each team building its own pipelines, they leverage a shared knowledge layer and unified framework for parsing documents, enriching metadata, generating embeddings and indexing data. This also creates a shared objective reality from which AI can pull relevant knowledge and generate accurate answers, all while dramatically reducing the risk of hallucinations.
With this foundation in place, teams can launch new AI experiences on top of existing infrastructure, instead of having to rebuild it from scratch. For example, a compliance assistant, a sales copilot and an engineering knowledge bot can all operate from the same indexed data. Redundant compute costs disappear and time-to-deployment for new AI initiatives drops from months to weeks—or less, all because the basic foundation already exists.
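As a rough illustration of this ingest-once, reuse-many pattern, here is a toy in-memory version: the “embedding” step is just word tokenization and every name is invented for the example, but the structure shows multiple assistants querying one shared index instead of each re-ingesting the data.

```python
# Toy sketch: ingest documents once into a shared index, then let
# multiple assistants query the same store.

def ingest(store: dict, doc_id: str, text: str) -> None:
    """Parse, 'embed' (here: just tokenize) and index a document once."""
    store[doc_id] = set(text.lower().split())

def retrieve(store: dict, query: str) -> list:
    """Any agent can reuse the shared index; scoring is word overlap."""
    q = set(query.lower().split())
    hits = [(len(q & tokens), doc_id) for doc_id, tokens in store.items()]
    return [doc_id for score, doc_id in sorted(hits, reverse=True) if score > 0]

shared_index: dict = {}
ingest(shared_index, "policy.pdf", "expense policy travel approval")
ingest(shared_index, "spec.md", "api spec retry timeout")

# A compliance assistant and an engineering bot hit the same index:
print(retrieve(shared_index, "travel expense approval"))  # ['policy.pdf']
print(retrieve(shared_index, "api timeout"))              # ['spec.md']
```

The point is that `ingest` runs once per document regardless of how many assistants exist; each new AI experience only adds a `retrieve` caller.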
Different departments require different retrieval strategies. For instance, a compliance assistant may prioritize authoritative policy documents and enforce strict filtering to reduce risk, whereas an engineering copilot may require technical documentation and spec sheets, searching across different knowledge bases.
In the traditional RAG model, each of these use cases would require separate indexes and custom retrieval logic. But in a modular architecture, retrieval is configurable: teams can adjust chunking strategies, ranking logic, search weighting (keyword vs. semantic), metadata filtering and model orchestration. This removes the need for separate data pipelines for each use case.
Teams can also refine ranking signals and optimize query orchestration over time for better results, all without destabilizing the underlying infrastructure.
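To make that configurability concrete, here is a hedged sketch of what per-use-case retrieval settings over a shared index might look like. The `RetrievalConfig` fields and the weighting scheme are illustrative assumptions, not a real product API:

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalConfig:
    chunk_size: int = 512          # chunking strategy
    keyword_weight: float = 0.5    # hybrid search: keyword vs. semantic
    semantic_weight: float = 0.5
    metadata_filter: dict = field(default_factory=dict)
    top_k: int = 5

def hybrid_score(cfg: RetrievalConfig, keyword: float, semantic: float) -> float:
    """Blend keyword and semantic relevance per the use case's weights."""
    return cfg.keyword_weight * keyword + cfg.semantic_weight * semantic

# The same shared index, two different retrieval behaviors:
compliance = RetrievalConfig(keyword_weight=0.7, semantic_weight=0.3,
                             metadata_filter={"doc_type": "policy"})
engineering = RetrievalConfig(keyword_weight=0.3, semantic_weight=0.7,
                              metadata_filter={"doc_type": "spec"})

print(hybrid_score(compliance, keyword=0.9, semantic=0.4))  # roughly 0.75
```

Because only the configuration differs, tuning one team’s assistant (say, shifting the compliance weights further toward exact keyword matches) never touches the underlying pipelines or another team’s settings.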
A centralized knowledge layer enables multiple tailored AI agents to operate from the same trusted data foundation. Sales, engineering, HR, compliance and operations can all have role-specific AI experiences with role-appropriate access controls and filtering. Additionally, sync agents automatically ingest and update data from underlying knowledge repositories such as SharePoint, Google Drive or Dropbox, further reducing the need for manual intervention.
With a single knowledge layer, governance is simpler, and updates can propagate automatically across all AI experiences. As new agents are introduced, they inherit a mature, optimized knowledge base rather than requiring their own isolated stack.
Sustaining AI performance requires visibility into how well your system is actually responding in production, and RAG Evaluation Metrics (REMi) in Progress Agentic RAG plays a critical role here.
REMi continuously measures the quality of every AI-generated response using metrics such as relevance, accuracy and groundedness to create an objective view of performance over time.
REMi also captures a full record of user interactions, logging every question asked, along with the system’s response. This creates a traceable history that supports tuning, auditing and improvement. REMi highlights gaps by tracking unanswered questions (where the system lacks sufficient knowledge to respond), as well, allowing you to proactively enrich your organization's knowledge base and close coverage gaps.
Instead of relying on guesswork, REMi enables a continuous improvement loop: measure performance, identify weaknesses and refine the knowledge layer. As a result, the accuracy, reliability and usefulness of the AI systems continue to increase over time, without the need for constant reengineering.
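That measure-identify-refine loop can be pictured with a toy example. To be clear, this is not REMi’s API; the metric names simply mirror the terminology above, and the scoring is deliberately simplistic (real systems typically use model-based judges):

```python
def evaluate(interaction: dict) -> dict:
    """Score one logged Q&A pair on coverage and groundedness."""
    answered = interaction["answer"] is not None
    return {
        "question": interaction["question"],
        "answered": answered,
        # Crude proxy: an answer is "grounded" if it cites any source.
        "groundedness": 1.0 if answered and interaction["sources"] else 0.0,
    }

# A traceable log of user interactions: every question plus the response.
log = [
    {"question": "What is our PTO policy?", "answer": "15 days.", "sources": ["hr.pdf"]},
    {"question": "Who owns service X?", "answer": None, "sources": []},
]

results = [evaluate(i) for i in log]
# Unanswered questions point at coverage gaps to fill in the knowledge base.
gaps = [r["question"] for r in results if not r["answered"]]
print(gaps)  # ['Who owns service X?']
```

Feeding the `gaps` list back into content curation is the loop in miniature: each evaluation pass tells you exactly which knowledge to add next.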
Progress Agentic RAG is designed to prevent the “second-wave AI tax” by separating knowledge orchestration from individual AI experiences, a fundamental departure from traditional RAG.
These capabilities result in significant ROI. Unlike traditional RAG approaches in which each new assistant requires separate ingestion, indexing, tuning, security and monitoring, Progress Agentic RAG amortizes those investments across use cases.
As adoption expands, the marginal cost of launching new AI workflows decreases, increasing infrastructure ROI because each deployment builds on shared capabilities instead of recreating them.
When AI spending starts to spike, a fragmented architecture is often to blame. Every time a new assistant requires its own ingestion pipeline, indexing logic, security layer and orchestration workflow, costs and complexity multiply.
The companies that scale AI sustainably are the ones that are building a reusable knowledge layer as the foundation for AI innovation. Each new AI experience builds on shared capabilities and infrastructure, leveraging existing investments and reducing both risk and cost.
AI will continue to expand across the enterprise. The question then becomes whether your architecture will make each instance less expensive and easier to deploy than the last.
Discover how Progress Agentic RAG helps you scale AI with control, flexibility and long-term cost efficiency.
Senior Product Marketing Manager
Michael Marolda is a seasoned product marketer with deep expertise in data, analytics and AI-driven solutions. He is currently the lead product marketer for the Progress Agentic RAG solution. Previously, he held product marketing roles at Qlik, Starburst Data and Tellius, where he helped craft compelling narratives across analytics, data management and business intelligence product areas. Michael specializes in translating advanced technology concepts into clear, practical business terms, such as Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) and modern data platforms.