Monitoring the Quality of Your RAG Stack with REMi

By Eric Bréhault | Posted on October 21, 2025

Previously published on Nuclia.com. Nuclia is now Progress Agentic RAG.

Using RAG is about getting the most from an LLM’s ability to phrase proper answers while making sure it uses the most relevant and up-to-date data for the user’s question. The objective is to deliver high-quality answers to users.

That’s why it is important to monitor the quality of your RAG pipeline.

Even if you initially tuned your RAG pipeline very carefully to get the best answers, you need to monitor its quality over time.

Indeed, the content of the Knowledge Box changes as you ingest new resources, so questions that used to be answered correctly can start producing incorrect answers. The questions asked by users can also evolve, and all of a sudden they might touch on topics that your resources cover insufficiently.

Fortunately, Progress Agentic RAG provides a set of tools to help you keep a close eye on the quality of your RAG pipeline!

REMi, the RAG Evaluation Metrics framework

We’ve developed REMi (short for RAG Evaluation Metrics), an efficient, open-source, fine-tuned LLM that simplifies the assessment of RAG pipelines.

Principle

The main inputs/outputs in the RAG pipeline are:

  • Query: The user’s question, which the model will try to answer.
  • Context: The information returned by the retrieval step, which aims to be relevant to the user’s query.
  • Answer: The response generated by the language model after receiving the query and the context pieces.

Hence, REMi defines the following metrics to assess the quality of the RAG pipeline:

  • Answer relevance: relevance of the generated answer to the user query.
  • Context relevance: relevance of the retrieved context to the user query.
  • Groundedness: degree to which the generated answer is grounded in the retrieved context.

By combining these metrics, REMi provides a comprehensive view of the quality of the RAG pipeline.

For example:

  • If the context relevance is high but the answer relevance and groundedness are low, the model is generating evasive answers. The semantic search successfully retrieves relevant context pieces, but the model fails to generate a relevant and grounded answer. You should try a different LLM.
  • If the answer relevance is high but the context relevance and groundedness are low, the model is generating unverifiable answers. The LLM generates a relevant answer, but not one based on the information stored in your Knowledge Box. First, check whether the information is missing from your Knowledge Box. If the information is present, try changing your search and RAG strategy parameters.
  • If the groundedness is high but the answer relevance and context relevance are low, the model is generating unrelated answers. The LLM generates an answer based on the context, but the answer is not relevant to the user query. This can happen when the wrong context pieces are retrieved but the LLM still feels compelled to generate an answer from the available information, disregarding the nuances of the query. (A short sketch of this diagnostic logic is shown below.)
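Here is a minimal sketch, in Python, of how you might turn the three scores into such a diagnosis. The threshold, the 0-to-5 scale and the diagnose helper are illustrative assumptions for this sketch, not part of REMi itself:

# Illustrative only: adjust the threshold to however your REMi scores are reported.
def diagnose(answer_relevance: float, context_relevance: float, groundedness: float,
             threshold: float = 3.0) -> str:
    """Map a combination of REMi scores to a likely failure mode."""
    def high(score: float) -> bool:
        return score >= threshold

    if high(context_relevance) and not high(answer_relevance) and not high(groundedness):
        return "Evasive answers: retrieval works, consider trying a different LLM."
    if high(answer_relevance) and not high(context_relevance) and not high(groundedness):
        return "Unverifiable answers: check the Knowledge Box content and the search/RAG strategy."
    if high(groundedness) and not high(answer_relevance) and not high(context_relevance):
        return "Unrelated answers: the wrong context pieces are being retrieved."
    return "No obvious failure pattern: review the query manually."

print(diagnose(answer_relevance=1.2, context_relevance=4.5, groundedness=1.0))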

How to Use REMi

Progress Agentic RAG runs REMi on a regular basis to monitor the quality of the RAG pipeline. The results are displayed in a dashboard that shows the evolution of the metrics over time.

On your Knowledge Box home page, you will see a Health status section in the right column. It shows the answer relevance, context relevance and groundedness metrics for the past seven days. The dots represent the average value of each metric and the segments represent the minimum and maximum values.

You can click on the More Metrics button to access the RAG Evaluation Metrics page.

This page lets you choose the time range for the metrics, from the last 24 hours to the last 30 days. It displays the same Health status section as the home page, as well as three graphs showing how the three REMi metrics evolve over time (the red line is the average and the shaded area spans the minimum and maximum values).

It also lists the questions that received no answer and the questions with low context relevance. It is good practice to review these questions regularly and improve your resources so they can be answered correctly.

Collecting User Feedback

Another very important aspect of monitoring the quality of your RAG pipeline is to collect user feedback.

The API provides the /feedback endpoint that allows you to collect user feedback.

It only requires the READER role, so you can easily integrate it into your application.

It expects the identifier of the current query and a boolean indicating whether the answer was relevant. Optionally, it accepts a comment to provide more context.

The identifier of the query can be obtained from the NUCLIA-LEARNING-ID HTTP header returned in the /ask response.

If you want to collect feedback not only on the answer quality but also on the relevance of the returned results or citations, the /feedback endpoint also accepts a text_block_id parameter.
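As a rough sketch, here is how a query followed by a feedback call might look with Python and the requests library. The zone URL, Knowledge Box ID and API key are placeholders, and the payload field names (ident, good, task, feedback, text_block_id) are assumptions to be checked against the API reference:

# A minimal sketch, not a definitive client: endpoint paths, headers and field
# names are assumptions to verify against the Progress Agentic RAG API reference.
import requests

KB_URL = "https://europe-1.nuclia.cloud/api/v1/kb/<YOUR_KB_ID>"  # placeholder zone and KB id
API_KEY = "<A_READER_API_KEY>"  # the READER role is enough for /feedback
HEADERS = {"X-NUCLIA-SERVICEACCOUNT": f"Bearer {API_KEY}"}  # service-account auth header (assumed)

# 1. Ask a question and keep the learning id returned in the response headers.
ask = requests.post(
    f"{KB_URL}/ask",
    headers=HEADERS,
    json={"query": "How do I monitor my RAG pipeline?"},
)
learning_id = ask.headers["NUCLIA-LEARNING-ID"]

# 2. Send the user's feedback about the generated answer.
requests.post(
    f"{KB_URL}/feedback",
    headers=HEADERS,
    json={
        "ident": learning_id,  # identifier of the query being rated
        "good": True,          # was the answer relevant?
        "task": "CHAT",
        "feedback": "Clear and accurate answer.",  # optional free-text comment
        # "text_block_id": "<BLOCK_ID>",  # optionally rate a specific result or citation
    },
)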

Accessing the Activity Log

As mentioned earlier, the REMi metrics can be accessed from the dashboard. But if you want to have a more detailed view of the activity of your RAG pipeline, you can access the Activity Log.

In the Activity Log section of the Knowledge Box, you can download the monthly activity logs. They contain all the queries asked by users, the answers provided by the LLM and the feedback collected.

But the most flexible way to access the Activity Log is to use the Nuclia CLI/SDK. It provides a set of commands to access the Activity Log, filter the results and export them.

Examples:

  • CLI:
nuclia kb logs query --type=CHAT --query='{
  "year_month": "2024-10",
  "show": ["id", "date", "question", "answer", "feedback_good"],
  "filters": {
    "question": {"ilike": "user question"},
    "feedback_good": {"eq": true}
  },
  "pagination": {"limit": 10}
}'
  • Python SDK:

from nuclia import sdk
from nuclia.lib.kb import LogType
from nuclia_models.events.activity_logs import ActivityLogsChatQuery, Pagination

kb = sdk.NucliaKB()
query = ActivityLogsChatQuery(
    year_month="2024-10",
    show=["id", "date", "question", "answer"],
    filters={
        "question": {"ilike": "user question"},
        "feedback_good": {"eq": True}
    },
    pagination=Pagination(limit=10)
)
kb.logs.query(type=LogType.CHAT, query=query)

For more details, check the CLI/SDK activity logs documentation page.

