Python | Beginner

LangChain Recursive Text Splitter

Split long documents into overlapping chunks small enough to fit an LLM context window. Chunk sizes here are measured in characters, not tokens.

python
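# Requires: pip install langchain-text-splitters langchain-community pypdf
# (older LangChain releases expose the same class at langchain.text_splitter)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader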

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader('document.pdf')
pages = loader.load()

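splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum chunk length, as measured by length_function
    chunk_overlap=200,    # up to 200 characters repeated between neighbouring chunks
    length_function=len,  # count characters, not tokens
    separators=['\n\n', '\n', '. ', ' ', ''],  # tried in order, coarsest first
)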

# Each page is split on its own; every chunk keeps its page's metadata
chunks = splitter.split_documents(pages)

print(f'Pages: {len(pages)}, Chunks: {len(chunks)}')
for i, chunk in enumerate(chunks[:3]):
    print(f'Chunk {i}: {len(chunk.page_content)} chars | page {chunk.metadata.get("page")}')
    print(chunk.page_content[:100], '...')
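The separators list in the snippet is tried in order: paragraph breaks first, then line breaks, sentence ends, spaces, and finally a cut at any character. The sketch below illustrates that fallback in plain Python. It is a simplified approximation, not LangChain's implementation: the function name recursive_split is hypothetical, separators are dropped rather than kept, and small pieces are neither merged back together nor overlapped.

```python
def recursive_split(text, chunk_size, separators):
    """Hypothetical helper: split text into pieces of at most chunk_size chars.

    Simplified illustration only: no merging of small pieces, no overlap,
    and the separator itself is discarded.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # Pick the first separator that occurs in the text; '' means "cut anywhere".
    for i, sep in enumerate(separators):
        if sep == '' or sep in text:
            break
    else:
        sep, i = '', len(separators)

    if sep == '':
        # Last resort: hard cut every chunk_size characters.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    pieces = []
    for part in text.split(sep):
        # Pieces that are still too long are retried with the finer separators.
        pieces.extend(recursive_split(part, chunk_size, separators[i + 1:]))
    return pieces


sample = (
    "First paragraph is short.\n\n"
    "Second paragraph is a fair bit longer. It has two sentences."
)
pieces = recursive_split(sample, 40, ['\n\n', '\n', '. ', ' ', ''])
for piece in pieces:
    print(len(piece), repr(piece))
```

The first paragraph fits within 40 characters and is left whole. The second does not, so it falls through to the '. ' separator and is split into its two sentences.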

Use Cases

  • PDF ingestion
  • RAG chunking
  • context preparation
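
For RAG chunking, chunk_overlap is what keeps a sentence that straddles a chunk boundary fully present in at least one chunk. The sketch below shows the idea with fixed-width windows, using the same sizes as the snippet (1000 and 200). It is an assumption-laden simplification: sliding_window_chunks is a hypothetical helper, and the LangChain splitter cuts at separators rather than at fixed offsets, so its real overlap is at most 200 characters and often less.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Hypothetical helper: fixed-width chunks that share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # new characters contributed per chunk
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 2500-character text so the window positions are easy to check.
text = ''.join(str(i % 10) for i in range(2500))
windows = sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200)

print([len(w) for w in windows])
# The tail of each chunk is repeated at the head of the next one.
print(windows[0][-200:] == windows[1][:200])
```

Each window after the first adds only 800 new characters, which is why a larger overlap produces more chunks for the same document.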
