How DataVault Connected 20 Years of Research in 48 Hours to Transform Their Knowledge Management
Having proven the value of RAG technology with their initial knowledge box, DataVault Financial Services faced their next challenge: scaling beyond manually uploaded documents to create a comprehensive intelligence network. Their 20-year archive of research reports, real-time market feeds and compliance documentation remained scattered across systems, limiting the transformative potential they’d glimpsed.
This is the second article in our three-part series following DataVault’s implementation of Progress’ RAG-as-a-Service platform, Progress Agentic RAG. In our previous article, we saw how they solved their immediate knowledge crisis by creating their first searchable repository. Now we’ll explore how they built sophisticated data pipelines, connected multiple sources and created AI-powered systems that transformed scattered information into actionable intelligence—all while facing a critical compliance deadline.
Sarah Rodriguez stared at the email timestamp: 2:47 PM Thursday. A major client wanted a comprehensive market outlook report combining Fed policy analysis, global economic trends and regulatory updates. Due: Monday morning.
“Our research is scattered everywhere: Dropbox folders, news feeds we never read, regulatory documents in email attachments,” she told David Kim, their senior developer. “Last time this took two weeks. We have 48 hours.”
David had spent the morning after their initial Progress Agentic RAG success exploring the platform’s integration capabilities. The InvestmentInsights knowledge box was already proving valuable, but it only contained their most recent 1,000 documents—manually uploaded.
“Look at this,” David said, pulling up both the Progress Agentic RAG dashboard and his code editor. “They have connectors for folders, RSS feeds, even web scraping. I’ve already written scripts to automate everything.”
Progress Agentic RAG Synchronize Dashboard
The Progress Agentic RAG Synchronize interface showing three data source options: Folder for local file systems, RSS for news feeds and Sitemap for web content
Marcus Chen leaned over his shoulder. “How fast can you connect our historical archives?”
DataVault’s Dropbox contained 20 years of financial history: SEC filings, Federal Reserve reports, IMF analyses and internal compliance documentation. David knew that Progress Agentic RAG’s Sync Agent could handle this volume effortlessly by monitoring the local Dropbox folder on their workstation.
Within the Synchronize section, David reviewed the available data source options. The interface presented three main integration paths: Folder for local file systems, RSS for news feeds and Sitemap for web content.
Progress Agentic RAG Synchronize Dashboard
The Progress Agentic RAG Synchronize interface showing three data source options: Folder, RSS and Sitemap, with the Sync Agent setup panel above
“To connect our Dropbox archive, we’ll use the Progress Agentic RAG Sync Agent with a Folder connector,” David explained to Sarah. “Since Dropbox syncs files to a local folder on our workstation, we can point the Sync Agent directly to that directory. The agent will monitor the folder for any changes and automatically sync them to Progress Agentic RAG. RSS feeds work independently; we can set those up right away through the dashboard without the Sync Agent.”
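To make "monitor the folder for any changes" concrete, here is a minimal sketch of snapshot-based change detection using only the Python standard library. It is not the Sync Agent's actual implementation, which the article does not show; the function names `snapshot` and `diff_snapshots` are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its (mtime_ns, size)."""
    root = Path(root)
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            stat = full.stat()
            # Relative POSIX-style keys keep the folder structure visible
            state[full.relative_to(root).as_posix()] = (stat.st_mtime_ns, stat.st_size)
    return state


def diff_snapshots(old, new):
    """Return (added, modified, removed) as sorted lists of relative paths."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, modified, removed
```

Taking a snapshot on a schedule and diffing it against the previous one yields the list of files that would need to be uploaded, re-uploaded or deleted. Because the keys are relative paths, a folder such as Compliance stays recognizable after the sync.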
For the Dropbox integration, David configured a Folder connector:
Folder Connector Configuration
The folder configuration interface for setting up synchronized sources
Progress Agentic RAG Synchronizations dashboard with configured sources
The Synchronizations dashboard showing active data sources: DataVault Research Archive and Dropbox Archive (both using Folder connectors via Sync Agent), plus RSS feeds (MarketWatch Markets, Federal Reserve News and Reuters Business News), all actively syncing
Once configured, the Folder connector began processing their folder structure:
From /Compliance:
From /Market_Analysis:
From /Research:
Documents appearing in Progress Agentic RAG Resources list
The Progress Agentic RAG Resources list showing 144 processed documents from various sources including WSJ articles and financial advisory content
“Look at this,” David showed Sarah as documents began appearing in the Resources list. “It’s preserving our entire folder structure. Your compliance team can find documents using the same mental model they already have.”
While the historical documents synchronized, Lisa Thompson, their senior analyst, had her own request: “Can we get real-time market news flowing into this system? I need to correlate today’s events with our historical analyses.”
“RSS feeds are even easier,” David explained, pulling up his terminal. “They don’t require the Sync Agent; Progress Agentic RAG can pull from them directly through the cloud. Watch this.”
David ran his rss_feed_config.py script from the code_samples repository, showing Lisa the configuration in action:
# DataVault's RSS feed configuration
# File: rss_feed_config.py
rss_feeds = [
    {
        "name": "MarketWatch Markets",
        "url": "https://feeds.content.dowjones.io/public/rss/RSSMarketsMain",
        "category": "market_analysis"
    },
    {
        "name": "Reuters Business News",
        "url": "https://feeds.feedburner.com/reuters/businessNews",
        "category": "market_news"
    },
    {
        "name": "Bloomberg Markets",
        "url": "https://feeds.bloomberg.com/markets/news.rss",
        "category": "market_news"
    }
]
RSS feed configuration interface
Adding a new RSS feed for MarketWatch Markets with automatic 15-minute sync intervals
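For readers curious what a periodic feed sync involves, the sketch below parses an RSS 2.0 document with the standard library and keeps only items not seen on an earlier poll. It illustrates the general technique and is not how Progress Agentic RAG ingests feeds internally. The helper names `parse_rss_items` and `new_items` and the sample XML are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real poll would fetch this text from a feed URL
SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Fed holds rates</title><link>https://example.com/a</link><guid>a-1</guid></item>
<item><title>Markets rally</title><link>https://example.com/b</link></item>
</channel></rss>"""


def parse_rss_items(xml_text, category):
    """Extract title, link and an id from each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            # Fall back to the link when a feed omits <guid>
            "id": (item.findtext("guid") or link).strip(),
            "category": category,
        })
    return items


def new_items(items, seen_ids):
    """Return items not seen on earlier polls and record their ids in seen_ids."""
    fresh = []
    for it in items:
        if it["id"] not in seen_ids:
            seen_ids.add(it["id"])
            fresh.append(it)
    return fresh
```

Calling `new_items` with the same `seen_ids` set on every poll means a 15-minute interval only surfaces articles published since the previous run. For feeds from untrusted sources, a hardened XML parser such as defusedxml is a common substitute for `xml.etree`.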
As each feed was added, Progress Agentic RAG began indexing articles in real time. Lisa watched as breaking news about Federal Reserve policy changes appeared alongside historical Fed analyses in their knowledge base. Within moments of adding the Federal Reserve RSS feed, she could see the latest announcement, “Federal Reserve Board announces final individual capital requirements for large banks,” appearing in the Latest processed section.
Federal Reserve RSS Real-time Indexing
Real-time indexing in action - Federal Reserve announcements appearing immediately in the knowledge base alongside WSJ market articles
“This is exactly what we needed,” Lisa said. “Now I can search for ‘interest rate changes’ and get both today’s Fed announcement and our historical analysis of similar moves from 2019.”
With data flowing in from multiple sources, David implemented the next crucial component: a natural language Q&A system that could understand context across all their documents.
David accessed the Search configuration page and enabled Progress Agentic RAG’s semantic search capabilities. “I’ve been working on this all weekend,” he told Sarah, opening his terminal:
# Install the Progress Agentic RAG SDK (published on PyPI as nuclia) and required dependencies
pip install nuclia python-dotenv pytest
“The dependencies installed perfectly,” David continued. “Now let me show you the search implementation using the Progress Agentic RAG Python SDK.” He opened the search_financial_insights.py script he’d been testing:
# search_financial_insights.py
import os

from dotenv import load_dotenv
from nuclia import sdk

# Load environment variables
load_dotenv()

# DataVault's InvestmentInsights Knowledge Box configuration
# Knowledge Box docs: https://docs.nuclia.dev/docs/management/knowledgebox
NUCLIA_API_KEY = os.environ.get('NUCLIA_API_KEY', 'YOUR_API_KEY_HERE')
NUCLIA_ZONE = os.environ.get('NUCLIA_ZONE', 'aws-us-east-2-1')
KB_ID = os.environ.get('NUCLIA_KB_ID', 'investmentinsights')


def search_financial_insights(query, show_details=True):
    """
    Search across all DataVault's financial documents
    with semantic understanding using the Progress Agentic RAG SDK
    """
    # Initialize Progress Agentic RAG authentication
    kb_url = f"https://{NUCLIA_ZONE}.nuclia.cloud/api/v1/kb/{KB_ID}"
    sdk.NucliaAuth().kb(url=kb_url, token=NUCLIA_API_KEY)

    # Create search instance
    search_client = sdk.NucliaSearch()

    # Perform the search using the Progress Agentic RAG SDK
    try:
        response = search_client.find(
            query=query,
            filters=None  # Can add filters like ['/icon/application/pdf']
        )

        # Process results from multiple sources
        results = {
            'documents': [],
            'news': [],
            'compliance': []
        }
        # .....
        return results
    except Exception as e:
        print(f"Error performing search: {str(e)}")
        return None


# Test the search with Sarah's compliance query
if __name__ == "__main__":
    # Sarah's audit query
    test_query = "risk disclosure regulatory compliance SEC requirements"
    results = search_financial_insights(test_query)
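The script above leaves out the step that sorts hits into the documents, news and compliance buckets. The sketch below shows one plausible way to do that grouping. It is not the omitted code: it works on simplified, hypothetical hit records (plain dicts with `title` and `source` keys) rather than the SDK's response object, whose shape the article does not show. The folder and feed conventions are also assumptions, based on the /Compliance folder and RSS sources described earlier.

```python
def categorize(hits):
    """Group simplified hit records into the three buckets used in the article.

    Each hit is assumed to be a dict with 'title' and 'source' keys, where
    'source' is either a folder path such as '/Compliance/x.pdf' or a feed URL.
    """
    results = {"documents": [], "news": [], "compliance": []}
    for hit in hits:
        source = hit.get("source", "")
        if source.startswith(("http://", "https://")):
            # Anything that arrived from a feed URL is treated as news
            results["news"].append(hit["title"])
        elif source.startswith("/Compliance/"):
            results["compliance"].append(hit["title"])
        else:
            # Everything else (e.g. /Research, /Market_Analysis) is a document
            results["documents"].append(hit["title"])
    return results
```

Grouping by source path reuses the folder structure the team already relies on, so the script needs no separate tagging scheme.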
David ran his test suite to validate the implementation before Sarah’s critical demo:
# Running the test suite
python -m pytest test_nuclia_search.py -v
Pytest validation of Progress Agentic RAG search API
All 10 tests passing - API connectivity, search endpoints, filters and specific compliance term searches all validated
“Perfect! All tests passing,” David said. “The system is ready.”
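The article does not reproduce the test file itself. As a rough idea of what one such check might look like, here is a hedged pytest-style sketch that verifies a query and filter are passed through to the search client, using a mock in place of the real SDK so no API key or network is needed. The wrapper `run_search` and the test name are hypothetical.

```python
from unittest.mock import MagicMock


def run_search(client, query, filters=None):
    """Thin wrapper mirroring the find() call shown in the search script."""
    return client.find(query=query, filters=filters)


def test_search_passes_query_and_filters():
    # A MagicMock stands in for the SDK search client
    fake_client = MagicMock()
    fake_client.find.return_value = {"resources": {}}

    out = run_search(fake_client, "risk disclosure", filters=["/icon/application/pdf"])

    # The wrapper should forward both arguments unchanged
    fake_client.find.assert_called_once_with(
        query="risk disclosure", filters=["/icon/application/pdf"]
    )
    assert out == {"resources": {}}
```

Mocked tests like this one check the glue code quickly. Connectivity checks against the live knowledge box would still need real credentials.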
With just 24 hours remaining before the audit, Sarah was nervous. “Show me how this works with a real compliance query,” she said.
“I built a specific script just for your audit needs,” David replied, opening his compliance_audit_query.py file. “Try this: Show me all risk disclosure documentation, recent regulatory updates and any market analyses that mention systemic risk from the past quarter.”
David executed the script he’d prepared for exactly this scenario:
# Sarah's urgent compliance audit query
# File: compliance_audit_query.py
from search_financial_insights import search_financial_insights

audit_query = "risk disclosure regulatory updates systemic risk market analysis"
print(f"Executing search: {audit_query}\n")
results = search_financial_insights(audit_query)
# The function returned categorized results within seconds
The terminal output showed results appearing from multiple sources:
🔍 Query: 'risk disclosure regulatory updates systemic risk market analysis'
📊 Total results: 23
📋 Compliance Documents:
• Risk_Disclosure_Requirements.txt
• SEC_Form_ADV_Instructions.pdf
• Regulatory_Compliance_Checklist_2024.docx
📰 Recent News:
• Fed Warns of Systemic Risk in Commercial Real Estate
• New SEC Disclosure Rules Take Effect
• Banking Regulators Update Risk Management Guidelines
📄 Research Documents:
• IMF_Global_Financial_Stability_Oct_2024.pdf
• Q3_2024_Market_Summary.md
• Systemic_Risk_Assessment_Framework.pdf
Sarah’s eyes widened as she watched the terminal output. “This would have taken me three days to compile manually. And look – it’s showing me connections I hadn’t even considered.”
David smiled. “That’s because I configured the semantic search to understand financial synonyms and relationships. Let me show you the configuration.” He opened search_config.py to demonstrate the optimizations.
Inflation and market volatility correlation search
Progress Agentic RAG revealing connections between inflation uncertainty, market volatility and regulatory requirements across Risk Disclosure documents, Q3 Market Summary and IMF Global Financial Stability reports
As word spread about the system’s capabilities, other departments wanted in. The wealth management team needed a way to quickly answer client questions about market conditions.
David explored Progress Agentic RAG’s widget functionality, which allows embedding search and chat interfaces directly into websites. “We can create a client-safe search interface using Progress Agentic RAG’s pre-built widgets,” he explained. “They handle all the complexity while keeping our data secure.” The wealth management team could now provide instant insights to their clients:
Working client portal with Federal Reserve search results
DataVault’s custom client portal showing real-time search results for “Federal Reserve interest rates” – pulling from FOMC statements, IMF reports, Beige Books and regulatory announcements indexed in their InvestmentInsights knowledge box
The portal demonstrated immediate value:
The audit arrived Monday morning. Sarah had David’s scripts loaded on her laptop, ready to demonstrate. She pulled up the terminal alongside the Progress Agentic RAG dashboard and began her presentation to the regulators:
“Let me show you how we’ve transformed our compliance documentation system,” she began. “David, run the Basel III query.”
David typed into his terminal, executing a modified version of the compliance script:
python compliance_audit_query.py --query "Basel III implementation status DataVault 2024"
Instantly, results populated:
The lead auditor leaned forward. “How did you connect all these systems so quickly?”
Sarah smiled, gesturing to David. “My developer built a unified intelligence network using RAG-as-a-Service. Show them the code repository, David.”
David opened his file explorer, showing the organized structure of Python scripts, each one tested and documented. “Every document, every news feed, every analysis – it’s all connected through these scripts and searchable in natural language. The entire implementation took less than a week and the code is maintainable by our whole team.”
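The audit demo passed a `--query` flag to compliance_audit_query.py, but the modified script is not shown. A minimal sketch of how such a flag could be handled with the standard library's argparse follows. The function name `parse_args` and the choice of default query are assumptions, not the author's code.

```python
import argparse

# Default taken from the audit query shown earlier in the article
DEFAULT_QUERY = "risk disclosure regulatory updates systemic risk market analysis"


def parse_args(argv=None):
    """Parse the command line; --query overrides the default audit query."""
    parser = argparse.ArgumentParser(description="Run a compliance audit search")
    parser.add_argument(
        "--query",
        default=DEFAULT_QUERY,
        help="natural-language query to run against the knowledge box",
    )
    return parser.parse_args(argv)


# Passing an explicit list makes the behavior easy to try without a shell
args = parse_args(["--query", "Basel III implementation status DataVault 2024"])
print(f"Executing search: {args.query}")
```

In the real script, `parse_args()` would be called with no arguments so that it reads `sys.argv`, and `args.query` would be handed to `search_financial_insights`.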
For those implementing similar systems, here are key technical considerations DataVault discovered:
Not all sources are equal. DataVault structured their ingestion priority:
David had documented their optimal search configuration in a dedicated script:
# Search configuration for financial data
# File: search_config.py
search_config = {
    'semantic_weight': 0.7,   # Understand intent
    'keyword_weight': 0.3,    # Catch specific terms
    'enable_synonyms': True,  # "Fed" = "Federal Reserve"
    'boost_recent': True,     # Prioritize recent news
    'min_confidence': 0.75    # High accuracy requirement
}
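To show what the weights and threshold in this configuration could mean in practice, the sketch below blends a semantic score and a keyword score into one number and drops hits below `min_confidence`. This illustrates weighted hybrid scoring in general, not the platform's ranking algorithm. The article does not show how these settings are applied, and the per-hit `semantic` and `keyword` scores (assumed to lie between 0 and 1) are hypothetical inputs.

```python
# Subset of the article's configuration that this sketch uses
search_config = {
    "semantic_weight": 0.7,
    "keyword_weight": 0.3,
    "min_confidence": 0.75,
}


def blended_score(semantic, keyword, config=search_config):
    """Weighted sum of two scores that are each assumed to lie in [0, 1]."""
    return config["semantic_weight"] * semantic + config["keyword_weight"] * keyword


def filter_hits(hits, config=search_config):
    """Keep hits whose blended score meets min_confidence, best first."""
    scored = [
        (blended_score(h["semantic"], h["keyword"], config), h["title"])
        for h in hits
    ]
    kept = [pair for pair in scored if pair[0] >= config["min_confidence"]]
    return sorted(kept, reverse=True)
```

With a 0.7/0.3 split, a hit that matches the intent of a query strongly can still clear the threshold with few exact keyword matches, while a keyword-only match cannot.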
As David closed his laptop at the end of the week, Marcus Chen, the CTO, stopped by his desk.
“The board is impressed,” Marcus said. “They want to know if we can scale this globally. Our European acquisition needs the same system, but with multilingual support and stricter access controls.”
David pulled up the Progress Agentic RAG pricing page showing enterprise features. “Unlimited file sizes, cloud or on-premises deployment, custom AI tasks and enterprise support with private Slack channels. We can scale this globally.”
Progress Agentic RAG Enterprise features pricing page
Progress Agentic RAG’s Enterprise tier showing unlimited file sizes, on-premises deployment options and advanced AI capabilities
Sarah, still glowing from the successful audit, added: “And if we can implement AI agents for automated report generation…”
Marcus nodded. “That’s the next phase. But first, let’s document what we’ve built. This is going to transform how financial services handle information.”
Want to explore the exact documents and code that powered DataVault’s transformation? All the files used in this article are available on GitHub.
DataVault’s implementation demonstrates three critical success factors for building a financial intelligence network:
In our next article, we’ll explore how DataVault scaled their implementation globally, added multilingual capabilities and built AI agents that generate automated intelligence reports. The transformation from information repository to active intelligence platform was about to accelerate dramatically.
Ready to build your own financial intelligence network? Start your free Progress Agentic RAG trial and follow DataVault’s proven implementation path.
Editor's note: We'd like to thank Adam for this comprehensive guide on our newly launched RAG-as-a-Service product. Progress Agentic RAG is just at the beginning of its human-centric AI and innovation journey.
And as with all things AI, this product will change and evolve. We will be adding new models, features, functions and extending its capabilities. As such, elements in this How-To series might change.
If you spot areas that have been missed by this guide or if something is not factually correct, reach out to us, and we will fix it ASAP.
With so much innovation coming, mistakes can happen. Contact us if you spot anything or if you have a suggestion of what you'd like to see next.
Adam Bertram is a 25+ year IT veteran and an experienced online business professional. He’s a successful blogger, consultant, 6x Microsoft MVP, trainer, published author and freelance writer for dozens of publications. For how-to tech tutorials, catch up with Adam at adamtheautomator.com, connect on LinkedIn or follow him on X at @adbertram.