RAG (Retrieval-Augmented Generation)

For certain LLM tasks like answering questions, providing relevant context is essential. One common architecture is a two-stage RAG pipeline:

  1. Offline stage: Preprocess and index documents ("building the index").

  2. Online stage: Given a question, generate answers by retrieving the most relevant context.


Stage 1: Offline Indexing

We create three Nodes, sketched below:

  1. ChunkDocs – chunks raw text.

  2. EmbedDocs – embeds each chunk.

  3. StoreIndex – stores embeddings into a vector database.
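
The snippet below is a minimal sketch of what these Nodes could look like. It assumes each Node exposes a run(shared) method over a shared dictionary; the helper get_embedding, the 128-dimension vectors, and the fixed-size chunking are illustrative placeholders, not a prescribed API.

```python
import numpy as np

def get_embedding(text: str) -> np.ndarray:
    # Hypothetical helper: in practice, call your embedding provider here.
    # A deterministic random vector stands in so the sketch runs end to end.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(128, dtype=np.float32)

class ChunkDocs:
    # Chunks raw text into fixed-size pieces.
    def run(self, shared: dict) -> None:
        size = 200  # naive fixed-size chunking; use smarter splitting in practice
        shared["chunks"] = [
            text[i:i + size]
            for text in shared["texts"]
            for i in range(0, len(text), size)
        ]

class EmbedDocs:
    # Embeds each chunk into a vector.
    def run(self, shared: dict) -> None:
        shared["embeddings"] = np.stack(
            [get_embedding(chunk) for chunk in shared["chunks"]]
        )

class StoreIndex:
    # Stores the embeddings; a plain in-memory array stands in for a vector database.
    def run(self, shared: dict) -> None:
        shared["index"] = shared["embeddings"]
```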

Usage example:
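
A minimal sketch of running the offline pipeline over a couple of hypothetical documents, reusing the Node sketches above. A real framework would typically chain the Nodes into a flow rather than loop over them manually.

```python
# Hypothetical input documents.
shared = {
    "texts": [
        "Cats are crepuscular and sleep for most of the day.",
        "The Eiffel Tower was completed in 1889 for the World's Fair.",
    ],
}

# Run the offline pipeline in order: chunk, embed, store.
for node in (ChunkDocs(), EmbedDocs(), StoreIndex()):
    node.run(shared)

print(len(shared["chunks"]), "chunks indexed")
```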


Stage 2: Online Query & Answer

We have three Nodes, sketched below:

  1. EmbedQuery – embeds the user’s question.

  2. RetrieveDocs – retrieves the most relevant chunk from the index.

  3. GenerateAnswer – calls the LLM with the question + chunk to produce the final answer.
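
A sketch of the online Nodes under the same assumptions, reusing get_embedding and the shared store from the offline stage. The call_llm helper is a hypothetical wrapper around your LLM provider.

```python
import numpy as np

def call_llm(prompt: str) -> str:
    # Hypothetical helper: in practice, call your LLM provider here.
    return "(answer generated from the retrieved context)"

class EmbedQuery:
    # Embeds the user's question with the same embedding model as the chunks.
    def run(self, shared: dict) -> None:
        shared["q_emb"] = get_embedding(shared["question"])

class RetrieveDocs:
    # Retrieves the most relevant chunk from the index by similarity score.
    def run(self, shared: dict) -> None:
        scores = shared["index"] @ shared["q_emb"]  # (num_chunks,) similarities
        best = int(np.argmax(scores))
        shared["retrieved_chunk"] = shared["chunks"][best]

class GenerateAnswer:
    # Calls the LLM with the question plus the retrieved chunk as context.
    def run(self, shared: dict) -> None:
        prompt = (
            f"Context:\n{shared['retrieved_chunk']}\n\n"
            f"Question: {shared['question']}\nAnswer:"
        )
        shared["answer"] = call_llm(prompt)
```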

Usage example:
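
A sketch of answering a question against the index built above, again chaining the Nodes manually for illustration.

```python
# Reuse the shared store built during offline indexing.
shared["question"] = "When was the Eiffel Tower completed?"

# Run the online pipeline in order: embed the query, retrieve, generate.
for node in (EmbedQuery(), RetrieveDocs(), GenerateAnswer()):
    node.run(shared)

print(shared["answer"])
```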
