python · beginner
LangChain Recursive Text Splitter
Split long documents into overlapping chunks optimised for LLM context windows.
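# Newer LangChain releases ship the splitters in the langchain-text-splitters package;
# older releases expose the same class as langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; PyPDFLoader returns one Document per page (requires the pypdf package)
loader = PyPDFLoader('document.pdf')
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum chunk length, as measured by length_function
    chunk_overlap=200,    # maximum overlap between consecutive chunks
    length_function=len,  # measures length in characters, not tokens
    separators=['\n\n', '\n', '. ', ' ', ''],  # tried in order: paragraph, line, sentence, word, character
)
chunks = splitter.split_documents(pages)
print(f'Pages: {len(pages)}, Chunks: {len(chunks)}')

# Preview the first three chunks
for i, chunk in enumerate(chunks[:3]):
    print(f'Chunk {i}: {len(chunk.page_content)} chars | page {chunk.metadata.get("page")}')
    print(chunk.page_content[:100], '...')
Use Cases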
- PDF ingestion
- RAG chunking
- context preparation
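To make the "recursive" part of the splitter concrete, here is a minimal pure-Python sketch of the general idea: try the coarsest separator first, and fall back to finer ones only for pieces that are still too long. This is an illustration under simplifying assumptions, not LangChain's actual implementation. The function name `recursive_split` is hypothetical, lengths are measured with `len`, and `chunk_overlap` is deliberately left out.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Simplified recursive splitting (hypothetical helper, no overlap handling)."""
    # Base case: the text already fits in one chunk.
    if len(text) <= chunk_size:
        return [text] if text else []

    # No separators left: fall back to a hard cut every chunk_size characters.
    if not separators:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    # An empty separator means "split into single characters".
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Piece is still too long: flush what we have, then retry
            # this piece with the finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Greedily merge small pieces back together while they fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "aaaa bbbb\n\ncccc dddd eeee"
    for chunk in recursive_split(sample, 10, ["\n\n", " ", ""]):
        print(repr(chunk))
```

The first paragraph fits within the limit and stays whole, while the second is too long and gets split on spaces instead. That preference for paragraph and sentence boundaries over arbitrary cuts is why the separators list is ordered from coarse to fine.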
Related Snippets
Similar patterns you can reuse in the same workflow.
typescript · advanced
LangChain RAG Chain Pipeline
Build a retrieval-augmented generation chain with LangChain using vector store retrieval and prompt templates.
Best for: Document Q&A
#langchain #rag

python · advanced
Build a RAG Pipeline with LangChain
Implement retrieval-augmented generation using LangChain, embeddings, and a vector store.
Best for: Knowledge base Q&A
#ai #langchain

typescript · intermediate
Text Chunking Strategies for RAG
Implement different text chunking strategies for RAG pipelines: fixed, recursive, and semantic.
Best for: RAG pipeline preprocessing
#ai #chunking

python · advanced
RAG with FAISS and LangChain Python
Build a local RAG pipeline using FAISS vector store and LangChain for document Q&A.
Best for: Document Q&A
#rag #faiss