
Introducing Agent-based RAG
An implementation with LangGraph, Azure AI Search and Azure OpenAI GPT-4o
Among the various architectural patterns in the field of Generative AI, Retrieval Augmented Generation (RAG) was the first to emerge and is probably still the most popular.
RAG is a technique that allows generative models to access external knowledge sources, such as documents, databases, or web pages, and use them as additional inputs for generating responses. By doing so, RAG can improve the quality, diversity, and reliability of the generated content, and it also provides transparency and verifiability for users.
Over the last few months, many variations of RAG have been developed (GraphRAG, Adaptive RAG, Corrective RAG…), each aiming to address some weakness of the “traditional” RAG pipeline.
In this article, we are going to see one of these variations: Agentic RAG. Before diving into the topic, let’s refresh how the two main ingredients of this solution — RAG and Agents — are defined.
What is RAG?
Retrieval Augmented Generation (RAG) is a powerful technique in LLM-powered application scenarios that addresses the following problem: “what if I want to ask my LLM something that was not part of its training data?”. The idea behind RAG is to decouple the LLM from the knowledge base we want to navigate, which is embedded (vectorized) and stored in a VectorDB.
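To make this concrete, here is a minimal sketch of what embedding a knowledge base and querying it could look like with LangChain, Azure OpenAI embeddings, and Azure AI Search (the stack we will use later in this article). Note that the deployment name, index name, and environment variables below are placeholders for illustration, not values from the actual implementation.

```python
# A minimal sketch of the "embed and store in a VectorDB" idea described above.
# Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set in the environment;
# deployment and index names are hypothetical.
import os

from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch

# Embedding model served by Azure OpenAI (deployment name is a placeholder).
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",
    openai_api_version="2024-02-01",
)

# Azure AI Search acts as the VectorDB holding the embedded knowledge base.
vector_store = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="my-knowledge-base",  # hypothetical index name
    embedding_function=embeddings.embed_query,
)

# Index a few document chunks: each is embedded and stored in the index.
vector_store.add_texts(
    [
        "Azure AI Search supports vector, keyword and hybrid retrieval.",
        "LangGraph lets you build agentic workflows as state graphs.",
    ]
)

# Retrieve the chunks most similar to a user query (the Retrieval phase below).
docs = vector_store.similarity_search("How can I run hybrid retrieval?", k=3)
print([d.page_content for d in docs])
```

With the knowledge base stored this way, the LLM itself never needs to be retrained: new documents only need to be embedded and indexed to become searchable.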
RAG is made of three phases:
- Retrieval → given a user’s query and its corresponding vector, the most similar document chunks (those whose vectors are closest to the query’s vector) are retrieved and used as the base context for the LLM.

- Augmentation → the retrieved context is enriched with additional instructions, rules, safety guardrails, and similar prompt engineering practices.