In retrieval-augmented generation (RAG), retrieving the right source material is crucial for generating high-quality, informative responses. Standard RAG pipelines often retrieve small text chunks, which may not carry enough context for complex queries. Parent document retrieval (PDR) addresses this limitation by returning the complete parent documents that the most relevant child passages belong to. This gives the model the broader context it needs to handle intricate questions that depend on a wider understanding of the source material.

What is parent document retrieval (PDR)?

Parent document retrieval (PDR) is a technique used in advanced RAG pipelines: the search runs over small child passages (snippets), but the full parent documents from which those passages were derived are what gets passed to the model. This widens the context available for generation, leading to more comprehensive and informative responses, especially for complex or nuanced queries.

Here are the core steps of parent document retrieval in RAG models:

  • Data preprocessing: Split large documents into smaller chunks, keeping a reference from each chunk to its parent document.

  • Create embeddings: Convert each chunk into a numerical vector (an embedding) so that chunks can be compared with the query efficiently.

  • User query: The user submits a question.

  • Chunk retrieval: Search for the most relevant chunks based on the query’s embedding.

  • Identify parent documents: Find the original documents (or larger segments) for the shortlisted chunks.

  • Retrieve parent documents: Get the full parent documents for better context.
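
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the bag-of-words `embed` function stands in for a real embedding model, the linear scan stands in for a vector store, and all names (`build_index`, `retrieve_parents`, the sample documents) are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_chunks(text, chunk_size=8):
    """Data preprocessing: split a document into chunks of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_index(documents, chunk_size=8):
    """Create embeddings: store (parent_id, chunk, embedding) for every chunk."""
    index = []
    for parent_id, text in documents.items():
        for chunk in split_into_chunks(text, chunk_size):
            index.append((parent_id, chunk, embed(chunk)))
    return index


def retrieve_parents(query, index, documents, top_k=2):
    """Chunk retrieval, then parent identification and parent retrieval."""
    query_vec = embed(query)
    # Rank all chunks by similarity to the query and keep the top_k.
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[2]), reverse=True)
    # Map the shortlisted chunks back to their parents, without duplicates.
    parent_ids = []
    for parent_id, _chunk, _vec in ranked[:top_k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    # Return the full parent documents rather than the matching chunks.
    return [(pid, documents[pid]) for pid in parent_ids]


# Hypothetical sample data for illustration only.
documents = {
    "doc_solar": "Solar panels convert sunlight into electricity. Panel efficiency "
                 "depends on temperature, shading, and the angle of installation.",
    "doc_wind": "Wind turbines convert moving air into electricity. Turbine output "
                "depends on wind speed and blade length.",
}

index = build_index(documents)
results = retrieve_parents("How does shading affect panel efficiency?", index, documents, top_k=1)
for parent_id, text in results:
    print(parent_id, "->", text)
```

Note that the query matches a single chunk of the solar document, yet the whole document is returned. The deduplication step matters when several top-ranked chunks share one parent: the parent is passed to the model once, not once per matching chunk.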
