python, advanced

LangChain Self-Query Retriever

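Enable natural-language metadata filtering in vector search with LangChain's SelfQueryRetriever. The retriever uses an LLM to split a question into a semantic search string and a structured metadata filter, then runs both against the vector store.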

python
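# Requires: langchain, langchain-openai, langchain-community, chromadb, lark,
# and the OPENAI_API_KEY environment variable.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# SelfQueryRetriever needs a vector store with a built-in query translator
# (e.g. Chroma); FAISS is not supported.
from langchain_community.vectorstores import Chroma
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain_core.documents import Document

# Sample documents; the metadata fields are what the retriever filters on.
docs = [
    Document(page_content='Python for beginners guide', metadata={'difficulty': 'beginner', 'language': 'python', 'year': 2023}),
    Document(page_content='Advanced LangChain patterns', metadata={'difficulty': 'advanced', 'language': 'python', 'year': 2024}),
    Document(page_content='JavaScript web development', metadata={'difficulty': 'intermediate', 'language': 'javascript', 'year': 2023}),
]

# Embed the documents and index them in an in-memory Chroma collection.
vs = Chroma.from_documents(docs, OpenAIEmbeddings())

# Describe each metadata field so the LLM knows which filters it may build.
meta_info = [
    AttributeInfo(name='difficulty', description='One of: beginner, intermediate, advanced', type='string'),
    AttributeInfo(name='language', description='Programming language, lowercase (e.g. python)', type='string'),
    AttributeInfo(name='year', description='Year published', type='integer'),
]

# Arguments: LLM, vector store, description of the document contents, metadata schema.
retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(model='gpt-4o-mini'),
    vs,
    'Coding tutorials',
    meta_info,
    verbose=True,
)

# The LLM turns the question into a search string plus a metadata filter.
results = retriever.invoke('Show me advanced Python tutorials from 2024')
for d in results:
    print(d.page_content, d.metadata)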
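
To make the filtering step concrete, here is a minimal plain-Python sketch of what a metadata filter for the query above might amount to. It is an illustration only, not LangChain's implementation: `Condition`, `COMPARATORS`, `matches`, and `filter_docs` are hypothetical names, the filter is written by hand rather than produced by an LLM, and there is no embedding or similarity search.

```python
import operator
from dataclasses import dataclass

# Same metadata as the snippet's documents, as plain dicts (no LangChain needed).
DOCS = [
    {'text': 'Python for beginners guide',
     'metadata': {'difficulty': 'beginner', 'language': 'python', 'year': 2023}},
    {'text': 'Advanced LangChain patterns',
     'metadata': {'difficulty': 'advanced', 'language': 'python', 'year': 2024}},
    {'text': 'JavaScript web development',
     'metadata': {'difficulty': 'intermediate', 'language': 'javascript', 'year': 2023}},
]

# Hypothetical comparator names mapped to Python operators.
COMPARATORS = {
    'eq': operator.eq, 'ne': operator.ne,
    'gt': operator.gt, 'gte': operator.ge,
    'lt': operator.lt, 'lte': operator.le,
}


@dataclass(frozen=True)
class Condition:
    """Hypothetical stand-in for one comparison in a structured filter."""
    attribute: str
    comparator: str  # a key of COMPARATORS
    value: object


def matches(metadata, conditions):
    """Return True when every condition holds (an implicit AND)."""
    for cond in conditions:
        if cond.attribute not in metadata:
            return False
        if not COMPARATORS[cond.comparator](metadata[cond.attribute], cond.value):
            return False
    return True


def filter_docs(docs, conditions):
    """Keep only the documents whose metadata satisfies all conditions."""
    return [d for d in docs if matches(d['metadata'], conditions)]


# Hand-written filter of the kind an LLM might derive from
# 'Show me advanced Python tutorials from 2024'.
conditions = [
    Condition('difficulty', 'eq', 'advanced'),
    Condition('language', 'eq', 'python'),
    Condition('year', 'eq', 2024),
]

hits = filter_docs(DOCS, conditions)
for d in hits:
    print(d['text'], d['metadata'])
```

In the real retriever the filter is translated into the vector store's own filter syntax and applied alongside the similarity search, which is why the store must have a supported translator.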

Use Cases

  • metadata filtering
  • smart retrieval
  • filtered RAG
