Discover how to retrieve information from documents using retrieval-augmented generation in LangChain.
Retrieval-augmented generation (RAG) is an NLP architecture that combines retrieval-based and generation-based approaches, enabling a model to extract information from a specified document. The language model pulls the relevant information from user-specific data. By leveraging a retrieval mechanism, RAG overcomes a language model's limitations in generating contextually relevant and accurate responses, resulting in more informed and contextually appropriate answers.
LangChain facilitates the implementation of RAG applications, empowering developers to seamlessly replace specific functionalities within an application.
LangChain provides a retrieval system through its document loaders, document transformers, text embedding models, vector stores, and retrievers.
Document loaders
Document loaders load documents from different sources and types, including HTML, PDF, code, and CSV. They also support loading documents from private data sources.
Loading a text file
The most basic type of document loading available in LangChain is the TextLoader. Its load
method reads a file as text and saves it in a single document as follows:
from langchain.document_loaders import TextLoader

loader = TextLoader("inputFile.txt")
loader.load()
Loading a CSV file
Another commonly used document is a comma-separated value (CSV) file. A CSV file is a delimited text file often used for storing and exchanging tabular data in plain text form. Each line of the file represents a table row, and commas separate the values within each row. CSV files are widely used for data manipulation and analysis across different applications and platforms.
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='inputRecord.csv')
data = loader.load()
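Each row of the CSV file is loaded as a separate document. As a quick check (assuming an inputRecord.csv file is available), we can inspect the first loaded row and its metadata:

# Each row of the CSV file becomes its own document
print(len(data))               # number of rows loaded
print(data[0].page_content)    # first row rendered as "column: value" lines
print(data[0].metadata)        # e.g., {'source': 'inputRecord.csv', 'row': 0}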
Loading a PDF file
Portable Document Format (PDF) files are commonly used for storing documents across different platforms. LangChain provides a PyPDFLoader
method to load a PDF file as follows:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("inputDocument.pdf")
pages = loader.load_and_split()
Here, the PyPDFLoader
loads the document into an array of documents, each containing a single page's content along with metadata holding the corresponding page number.
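For example, assuming an inputDocument.pdf file is available, we can inspect the first page's content and its metadata:

# Inspect the first page returned by load_and_split()
first_page = pages[0]
print(first_page.page_content[:200])   # beginning of the page's text
print(first_page.metadata)             # e.g., {'source': 'inputDocument.pdf', 'page': 0}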
Document transformers
After loading the document, it’s important to transform it according to our application or model requirements. This is where document transformers come into play. LangChain has various built-in transformers for documents that can perform several operations, including:
Splitting
Filtering
Combining
Translating to another language
Manipulating data
Let’s look into the simplest transformation: splitting. We use text splitters in LangChain for this purpose.
Text splitters
When working with a long document, splitting it into smaller pieces is often necessary. The text splitters follow the steps below:
Split the document into smaller, readable chunks.
Combine the small chunks into larger ones to reach the desired chunk size.
Overlap part of smaller chunks at the boundary to keep the context between chunks.
Let’s see how the recursive text splitter works in this example. We use the following syntax:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # The chunk_size and chunk_overlap can be modified according to the requirements
    length_function = len,
    chunk_size = 200,
    chunk_overlap = 10,
    add_start_index = True,
)
texts = text_splitter.create_documents([input_document])
Line 5: We define the length of a chunk. By default, this is done by counting the number of characters.
Line 6: We define the size of the chunks we want the document to be split into.
Line 7: We define the overlapping between chunks from the document to maintain context between them.
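To see the effect of these parameters, assuming input_document holds a plain string of text, we can print the size of each chunk and the start index that add_start_index stores in each chunk's metadata:

# Inspect the generated chunks and their metadata
for chunk in texts:
    print(len(chunk.page_content), chunk.metadata)   # e.g., 198 {'start_index': 0}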
The following table represents the different types of text splitters available to use:
Text Splitters

| Text Splitter | Usage |
|---|---|
| HTML header text splitter | Splits text at the element level and adds relevant metadata for each header to a chunk |
| Split by character | Splits based on characters, where chunk size is measured by the number of characters in a chunk |
| Split code | Enables splitting code written in multiple programming languages |
| Markdown header text splitter | Splits a document into chunks identified by various headers, creating header groups |
| Recursively split by character | Splits generic text based on a parameterized list of characters. The default list is ["\n\n", "\n", " ", ""] |
| Split by tokens | Splits text based on the token limit of a language model |
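As an illustration of one of these, here is a minimal sketch of splitting by character with CharacterTextSplitter, reusing the chunk_size and chunk_overlap values from above (the separator is our own choice):

from langchain.text_splitter import CharacterTextSplitter

# Split on newlines, measuring chunk size by the number of characters
character_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 200,
    chunk_overlap = 10,
    length_function = len,
)
character_chunks = character_splitter.create_documents([input_document])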
Text embedding models
Embeddings in LangChain capture a text's semantic meaning in a vector representation, which we can use to determine its similarity to other texts in a document. For this purpose, LangChain provides the Embeddings class, an interface for interacting with all the embedding models. Several embedding model integrations are available, including Hugging Face, OpenAI, Cohere, etc.
The Embeddings class at a base level provides embeddings using two methods:
Embedding documents: This embeds multiple documents or texts into their numerical representations. To embed documents, we use the following syntax:

# embeddings_model is an instance of an embedding model, e.g., OpenAIEmbeddings()
embeddings = embeddings_model.embed_documents(
    [
        "This is the first text",
        "This is the second text",
        "This is the third text"
    ]
)
Embedding a query: This embeds a single query into its numerical representation. The query is the text we want to search for in the document.
embedded_query = embeddings_model.embed_query("WRITE_YOUR_QUERY_HERE")
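Both methods return plain lists of floating-point numbers. As a sketch of how the similarity mentioned above can be computed, assuming embeddings and embedded_query come from the previous snippets, we can compare the query embedding with each document embedding using cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score each embedded document against the embedded query
scores = [cosine_similarity(embedded_query, doc_vector) for doc_vector in embeddings]
print(scores)   # a higher score means the text is more similar to the query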
Vector stores
Documents often contain unstructured data that needs some structure before a model can search it. We most commonly structure that data by embedding it into a vector space. When a query is passed to retrieve data from the document, the unstructured query is embedded, and its similarity to the data in the vector space is computed to find the most similar results. Vector stores handle this entire search process.
There are two methods for searching for similar data stored in vector stores:
Computing a simple similarity index for searching data
In the simple similarity method, we pass the query directly to the database, which computes a similarity index against the stored documents and retrieves the most similar results.
The following code shows how we can use the simple similarity_search
method for retrieving data:
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the document
input_document = TextLoader('my_document.txt').load()

# transform the document
text_splitter = RecursiveCharacterTextSplitter(
    # The chunk_size and chunk_overlap can be modified according to the requirements
    length_function = len,
    chunk_size = 200,
    chunk_overlap = 10,
    add_start_index = True,
)

documents = text_splitter.split_documents(input_document)

# embed the chunks
db = Chroma.from_documents(documents, OpenAIEmbeddings())

# user query
query = "WRITE_YOUR_QUERY_HERE"

# computing the search using the similarity_search() method
docs = db.similarity_search(query)
Line 4: We import the Chroma module, which is an open-source vector store for building AI applications.
Line 7: We load the input document my_document.txt using the TextLoader function.
Lines 10–18: We split the document using the RecursiveCharacterTextSplitter and store the chunks in a documents array.
Line 21: We use OpenAIEmbeddings to create a Chroma database db as the vector store.
Line 27: We retrieve the data based on the query using the similarity_search method; the result can be inspected as shown below.
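docs is a list of Document objects ordered by similarity to the query, so printing the first element shows the most relevant chunk:

# The first result is the chunk most similar to the query
print(docs[0].page_content)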
Using vectors to compute the similarity
We can also embed the query ourselves and pass the resulting vector to the vector store to find the similarity index and retrieve data.
In the following code, after transforming and embedding the document chunks, we first embed the user query on line 5 using the embed_query
method. Then we conduct the similarity search with vectors by using the similarity_search_by_vector
method on line 8:
# user query
query = "WRITE_YOUR_QUERY_HERE"

# embedding the query
embedding_vector = OpenAIEmbeddings().embed_query(query)

# computing the search using the similarity_search_by_vector() method
docs = db.similarity_search_by_vector(embedding_vector)
Note: Embedding the query ourselves does not change the result; it retrieves the same documents as similarity_search.
Retrievers
As the name suggests, a retriever's sole purpose is to retrieve data and documents; unlike a vector store, it is not required to store the documents. A retriever can use any backbone for storing documents, including vector stores.
To use a retriever with vector stores, we use the following code:
from langchain.vectorstores import Chroma

db = Chroma.from_texts(texts, embeddings)
retriever = db.as_retriever()

# invoking the retriever
retrieved_docs = retriever.invoke(
    # write your query here
)
We declare the retriever for our vector store using the as_retriever
method on line 4 and pass our query to the retriever.invoke
method on lines 7–9.
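The as_retriever method also accepts optional search parameters. For example, here is a sketch (the specific values are our own choice) that configures the retriever to use maximal marginal relevance (MMR) and return only the top two documents:

# Configure the retriever to use MMR and return the top 2 documents
retriever = db.as_retriever(
    search_type = "mmr",
    search_kwargs = {"k": 2},
)
retrieved_docs = retriever.invoke("WRITE_YOUR_QUERY_HERE")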
Test yourself
Check how well you understand RAG in LangChain with a short quiz.
Which option is not a transformer?
CharacterTextSplitter
Embed_documents
tiktoken
MarkdownHeaderTextSplitter