Retrieval-Augmented Generation (RAG)

Discover how to retrieve information from documents using retrieval-augmented generation in LangChain.

Retrieval-augmented generation (RAG) is an NLP architecture that combines retrieval-based and generation-based approaches so that a language model can answer questions using information from specified documents. Instead of relying only on what it learned during training, the model retrieves relevant passages from user-specific data and uses them as context when generating a response. This helps overcome a key limitation of purely generative models, which can produce inaccurate or out-of-context answers, and results in more informed and contextually appropriate responses.

LangChain facilitates the implementation of RAG applications by providing modular components, so developers can replace a specific piece of functionality within an application without rewriting the rest.

Answering a query using RAG
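
To make the flow above more concrete, here is a minimal sketch of the "augmentation" step, assuming the relevant chunks have already been retrieved. The function name build_rag_prompt and the prompt wording are illustrative assumptions, not part of LangChain.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved text with the user's question into one prompt.

    Hypothetical helper for illustration; the prompt wording is an assumption.
    """
    # Number each chunk so the model (and a reader) can tell them apart.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Example: two chunks that a retriever might have returned.
chunks = ["LangChain provides document loaders.", "Chroma is a vector store."]
prompt = build_rag_prompt("What is Chroma?", chunks)
print(prompt)
```

The resulting string is what would be sent to the language model, so the quality of the answer depends directly on which chunks the retrieval step selects.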

LangChain provides a retrieval system built from document loaders, document transformers, text embedding models, vector stores, and retrievers.

Steps in the retrieval process

Document loaders

Document loaders load documents from different sources and types, including HTML, PDF, code, and CSV. They also support loading documents from private S3 buckets and from public websites. (S3, short for Simple Storage Service, is an AWS service used to store, access, retrieve, and back up data in the form of objects.) This is particularly useful when we wish to retrieve information from documentation kept on external storage.

Loading a text file

The most basic type of document loading available in LangChain is the TextLoader class. Its load method reads a file as text and returns it as a single document, as follows:

from langchain.document_loaders import TextLoader
loader = TextLoader("inputFile.txt")
loader.load()
Loading a text file in LangChain
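
As a rough mental model, a loader returns a list of document objects that pair the text with some metadata. The sketch below mimics that shape using only the standard library. SimpleDocument and load_text_file are hypothetical stand-ins written for this illustration, not LangChain classes.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[SimpleDocument]:
    """Read the whole file as text and wrap it in a single document."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Example: write a small temporary file, then load it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "inputFile.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("RAG combines retrieval with generation.")
    docs = load_text_file(sample_path)

print(len(docs), docs[0].page_content)
```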

Loading a CSV file

Another commonly used document is a comma-separated value (CSV) file. A CSV file is a delimited text file often used for storing and exchanging tabular data in plain text form. Each line of the file represents a table row, and commas separate the values within each row. CSV files are widely used for data manipulation and analysis across different applications and platforms.

from langchain.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path='inputRecord.csv')
data = loader.load()
Loading a CSV file in LangChain
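
A CSV loader typically produces one document per row. The following sketch shows the idea with Python's built-in csv module. The helper rows_to_documents and the "column: value" text layout are assumptions chosen for readability, not a description of the loader's internals.

```python
import csv
import io


def rows_to_documents(csv_text: str, source: str) -> list[dict]:
    """Turn each CSV row into a small document (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # Render the row as "column: value" lines so it reads as plain text.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": content,
                "metadata": {"source": source, "row": row_number},
            }
        )
    return documents


sample = "name,role\nAda,engineer\nGrace,admiral\n"
data = rows_to_documents(sample, source="inputRecord.csv")
print(data[0]["page_content"])
```

Keeping the row number in the metadata makes it possible to trace a retrieved chunk back to the exact record it came from.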

Loading a PDF file

Portable Document Format (PDF) files are commonly used for storing documents across different platforms. LangChain provides the PyPDFLoader class to load a PDF file as follows:

from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("inputDocument.pdf")
pages = loader.load_and_split()
Loading a PDF file in LangChain

Here, PyPDFLoader loads the file into a list of documents, where each document contains the contents of a single page along with metadata that includes the corresponding page number.

Document transformers

After loading the document, it’s important to transform it according to our application or model requirements. This is where document transformers come into play. LangChain has various built-in transformers for documents that can perform several operations, including:

  • Splitting

  • Filtering

  • Combining

  • Translating to another language

  • Manipulating data

Let’s look at the simplest kind of transformer, one that performs splitting. We use text splitters in LangChain for this purpose.

Text splitters

When working with a long document, splitting it into smaller pieces is often necessary. Let’s look at how the text splitters work:

Working mechanism of text splitters

The text splitters follow the steps below:

  1. Split the document into smaller, readable chunks.

  2. Combine the small chunks into larger ones to reach the desired chunk size.

  3. Overlap part of smaller chunks at the boundary to keep the context between chunks.
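
The three steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name split_with_overlap is hypothetical, pieces are split on a single separator, and a piece longer than chunk_size is kept whole.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int,
                       separator: str = " ") -> list[str]:
    """Split, merge, and overlap, in that order (simplified illustration)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Step 1: split the text into small pieces (here, words).
    pieces = [p for p in text.split(separator) if p]

    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # Step 2: the current chunk is full, so emit it.
            chunks.append(separator.join(current))
            # Step 3: carry trailing pieces over as overlap.
            carried: list[str] = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            # Drop carried pieces if they would push the new chunk over the limit.
            while carried and len(separator.join(carried + [piece])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


chunks = split_with_overlap(
    "one two three four five six", chunk_size=13, chunk_overlap=5
)
print(chunks)
```

In this example, the word at the end of one chunk reappears at the start of the next, which is the overlap that preserves context across chunk boundaries.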

Let’s see how the recursive text splitter works in this example. We use the following syntax:

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # The chunk_size and chunk_overlap can be modified according to the requirements
    length_function=len,
    chunk_size=200,
    chunk_overlap=10,
    add_start_index=True,
)
# input_document is assumed to be a string containing the text to split
texts = text_splitter.create_documents([input_document])
  • Line 5: We define how the length of a chunk is measured. With len, this is done by counting the number of characters.

  • Line 6: We define the maximum size of the chunks we want the document to be split into.

  • Line 7: We define how much consecutive chunks overlap, to maintain context between them.

  • Line 8: We record the starting position of each chunk within the original text in its metadata.

The following table represents the different types of text splitters available to use:

Text Splitters

| Text Splitter | Usage |
|---|---|
| HTML header text splitter | Splits texts at an element level and adds relevant metadata for each header to a chunk |
| Split by character | Splits based on characters where a chunk size is measured by the number of characters present in a chunk |
| Split code | Enables splitting code in multiple programming languages |
| Markdown header text splitter | Splits a document in chunks identified by various headers creating header groups |
| Recursively split by character | Splits generic text based on a parameterized list of characters. The default list is ["\n\n", "\n", " ", ""]. |
| Split by tokens | Splits text based on the token limit of a language model |
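
To illustrate what "recursively split by character" means, the sketch below tries the separators from the default list in order and only falls back to a finer separator for pieces that are still too long. It is a simplified illustration under stated assumptions: recursive_split is a hypothetical function, and unlike the real splitter it does not merge small pieces back together or add overlap.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Split on the coarsest separator first; recurse on pieces still too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text]  # nothing left to split on; keep the long piece whole
    separator, finer = separators[0], separators[1:]
    if separator == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces: list[str] = []
    for part in text.split(separator):
        pieces.extend(recursive_split(part, chunk_size, finer))
    return pieces


sample = "Intro paragraph.\n\nline one\nline two is longer"
pieces = recursive_split(sample, chunk_size=12)
print(pieces)
```

Trying paragraph breaks before line breaks before spaces keeps related text together for as long as the size limit allows.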

Question: What happens when the chunk size or the chunk overlap is very large?


Text embedding models

Embeddings in LangChain capture a text’s semantic meaning as a vector representation, which lets us measure its similarity to other texts. For this purpose, LangChain provides the Embeddings class, a common interface for interacting with embedding models from providers such as Hugging Face, OpenAI, and Cohere.

The Embeddings class at a base level provides embeddings using two methods:

  • Embedding documents: This embeds multiple documents or texts into their numerical representation. To embed documents, we use the following syntax:

    embeddings = embeddings_model.embed_documents(
     [
         "This is the first text",
         "This is the second text",
         "This is the third text"
     ]
    )
    
  • Embedding a query: This embeds a single query into its numerical representation. The query is the text we want to search for in the documents.

    embedded_query = embeddings_model.embed_query("WRITE_YOUR_QUERY_HERE")
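
To show the shape of what these two methods return without calling an external service, here is a toy stand-in that runs offline. ToyEmbeddings and its fixed vocabulary are hypothetical. Real embedding models return dense learned vectors, whereas this sketch simply counts words.

```python
import math
from collections import Counter


class ToyEmbeddings:
    """Hypothetical bag-of-words embedder with the two methods described above."""

    def __init__(self, vocabulary: list[str]):
        self.vocabulary = vocabulary

    def _embed(self, text: str) -> list[float]:
        counts = Counter(text.lower().split())
        vector = [float(counts[word]) for word in self.vocabulary]
        norm = math.sqrt(sum(v * v for v in vector))
        # Normalize to unit length so vectors are comparable; leave zeros as-is.
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per input text.
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        # A single vector for a single query.
        return self._embed(text)


embeddings_model = ToyEmbeddings(["first", "second", "third", "text"])
vectors = embeddings_model.embed_documents(
    ["This is the first text", "This is the second text"]
)
query_vector = embeddings_model.embed_query("first")
print(len(vectors), len(vectors[0]))
```

Note that embed_documents returns a list of vectors while embed_query returns one vector, and both have the same dimensionality so they can be compared.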
    

Vector stores

Documents often contain unstructured data that must be given some structure before a model can search it. The most common approach is to embed the data and store the resulting vectors in a vector space. When a query arrives, it is embedded in the same way, and the stored vectors most similar to the embedded query are looked up. Vector stores take care of storing the embedded data and performing this search.

There are two ways to run a similarity search against the data stored in a vector store:

Computing a simple similarity index for searching data

In the simple similarity method, we pass the query text directly to the vector store. The store embeds the query, compares it with the stored document vectors, and returns the most similar results, as shown in the diagram below.

Vector stores for retrieving data
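
The ranking step behind a similarity search can be sketched as follows. Cosine similarity is used here as one common choice, and an actual vector store may use a different metric. The helper names and the three-dimensional example vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float],
          stored: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for embedded chunks.
stored_vectors = [
    ("chunk about cats", [0.9, 0.1, 0.0]),
    ("chunk about dogs", [0.1, 0.9, 0.0]),
    ("chunk about cars", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.2, 0.0], stored_vectors, k=2)
print(results)
```

A production vector store performs the same kind of ranking but uses an index so that it does not have to compare the query against every stored vector.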

The following code shows how we can use the simple similarity_search method for retrieving data:

from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the document
input_document = TextLoader('my_document.txt').load()

# Transform the document
text_splitter = RecursiveCharacterTextSplitter(
    # The chunk_size and chunk_overlap can be modified according to the requirements
    length_function=len,
    chunk_size=200,
    chunk_overlap=10,
    add_start_index=True,
)
# load() returns a list of documents, so we split them with split_documents()
documents = text_splitter.split_documents(input_document)

# Embed the chunks (OpenAIEmbeddings requires an OpenAI API key)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

# User query
query = "WRITE_YOUR_QUERY_HERE"

# Compute the search using the similarity_search() method
docs = db.similarity_search(query)
  • Line 4: We import the Chroma module, which is an open-source vector store for building AI applications.

  • Line 7: We load the input document my_document.txt using TextLoader.

  • Lines 10–18: We split the loaded document using a RecursiveCharacterTextSplitter and store the chunks in the documents list.

  • Line 21: We embed the chunks with OpenAIEmbeddings and store them in a Chroma vector store, db.

  • Line 27: We retrieve the data based on the query using the similarity_search method.

Using vector stores to compute the similarity

We can also embed the query ourselves and pass the resulting vector to the vector store to find the most similar data.

Vector stores for retrieving data

In the following code, after transforming and embedding the document chunks, we first embed the user query on line 5 using the embed_query method. Then we run the similarity search with the resulting vector by using the similarity_search_by_vector method on line 8:

# User query
query = "WRITE_YOUR_QUERY_HERE"

# Embed the query
embedding_vector = OpenAIEmbeddings().embed_query(query)

# Compute the search using the similarity_search_by_vector() method
docs = db.similarity_search_by_vector(embedding_vector)

Note: Embedding the query ourselves does not change the result, provided we use the same embedding model that was used to build the vector store.
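
The relationship between the two search methods can be seen in a small in-memory sketch, where the text-based search simply embeds the query and delegates to the vector-based search. TinyVectorStore, toy_embed, and VOCABULARY are hypothetical names for this illustration. This is not how Chroma is implemented, and it ranks by a plain dot product for brevity.

```python
VOCABULARY = ["loader", "splitter", "embedding", "store"]


def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in the text (stand-in for a real embedding model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


class TinyVectorStore:
    """Hypothetical in-memory vector store for illustration only."""

    def __init__(self, texts: list[str], embed):
        self.embed = embed
        # Store each text alongside its vector.
        self.items = [(text, embed(text)) for text in texts]

    def similarity_search_by_vector(self, vector: list[float], k: int = 1) -> list[str]:
        def score(item):
            return sum(x * y for x, y in zip(vector, item[1]))

        ranked = sorted(self.items, key=score, reverse=True)
        return [text for text, _ in ranked[:k]]

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        # Embed the query, then delegate to the vector-based search.
        return self.similarity_search_by_vector(self.embed(query), k=k)


store = TinyVectorStore(
    ["a loader reads files", "a splitter cuts text", "a store keeps each embedding"],
    toy_embed,
)
query = "which splitter should I use"
by_text = store.similarity_search(query)
by_vector = store.similarity_search_by_vector(toy_embed(query))
print(by_text, by_vector)
```

Because both calls end up comparing the same query vector with the same stored vectors, they return the same documents.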

Retrievers

As the name suggests, a retriever has the sole purpose of retrieving data and documents, unlike vector stores, which are also required to store the documents. A retriever can use any backbone for storing documents, including vector stores.

Using retrievers for retrieving data

To use a retriever with vector stores, we use the following code:

from langchain.vectorstores import Chroma

db = Chroma.from_texts(texts, embeddings)
retriever = db.as_retriever()

# Invoke the retriever
retrieved_docs = retriever.invoke(
    "WRITE_YOUR_QUERY_HERE"
)

Here, texts is assumed to be a list of strings and embeddings an embedding model instance, such as OpenAIEmbeddings(). We declare the retriever for our vector store using the as_retriever method on line 4 and pass our query to the retriever.invoke method on lines 7–9.
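
Since a retriever only needs to return documents for a query, it does not have to be backed by a vector store at all. The sketch below illustrates this with a keyword-based backbone. KeywordRetriever is a hypothetical class that mirrors the invoke call shown above and is not part of LangChain.

```python
class KeywordRetriever:
    """Hypothetical retriever that ranks documents by shared words, no vectors."""

    def __init__(self, documents: list[str], k: int = 2):
        self.documents = documents
        self.k = k

    def invoke(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc)
            for doc in self.documents
        ]
        # Keep only documents that share at least one word, best match first.
        matches = [item for item in scored if item[0] > 0]
        matches.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in matches[: self.k]]


retriever = KeywordRetriever(
    [
        "Document loaders read files",
        "Text splitters create chunks",
        "Vector stores index embeddings",
    ]
)
retrieved_docs = retriever.invoke("how do text splitters work")
print(retrieved_docs)
```

Code that depends only on invoke can switch between a keyword backbone like this one and a vector store without other changes.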

Test yourself

Check how well you understand RAG in LangChain with a short quiz.

1. Which option is not a transformer?

A) CharacterTextSplitter
B) Embed_documents
C) tiktoken
D) MarkdownHeaderTextSplitter
