Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Qdrant sparse vector retriever #14814

Merged
merged 11 commits into from
Dec 20, 2023
255 changes: 255 additions & 0 deletions docs/docs/integrations/retrievers/qdrant-sparse.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ce0f17b9",
"metadata": {},
"source": [
"# Qdrant Sparse Vector Retriever\n",
"\n",
">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n",
"\n",
"\n",
">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c307b082",
"metadata": {},
"source": [
"Install the 'qdrant_client' package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bba863a2-977c-4add-b5f4-bfc33a80eae5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install qdrant_client"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c10dd962",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from qdrant_client import QdrantClient, models\n",
"\n",
"client = QdrantClient(location=\":memory:\")\n",
"collection_name = \"sparse_collection\"\n",
"vector_name = \"sparse_vector\"\n",
"\n",
"client.create_collection(\n",
" collection_name,\n",
" vectors_config={},\n",
" sparse_vectors_config={\n",
" vector_name: models.SparseVectorParams(\n",
" index=models.SparseIndexParams(\n",
" on_disk=False,\n",
" )\n",
" )\n",
" },\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f47a2bfe",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import QdrantSparseVectorRetriever\n",
"from langchain_core.documents import Document"
]
},
{
"cell_type": "markdown",
"id": "41baa0d1",
"metadata": {},
"source": [
"Create a demo encoder function:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f2eff08e",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"\n",
"def demo_encoder(_: str) -> tuple[list[int], list[float]]:\n",
" return (\n",
" sorted(random.sample(range(100), 100)),\n",
" [random.uniform(0.1, 1.0) for _ in range(100)],\n",
" )\n",
"\n",
"\n",
"# Create a retriever with a demo encoder\n",
"retriever = QdrantSparseVectorRetriever(\n",
" client=client,\n",
" collection_name=collection_name,\n",
" sparse_vector_name=vector_name,\n",
" sparse_encoder=demo_encoder,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b68debff",
"metadata": {},
"source": [
"Add some documents:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cd8a7b17",
"metadata": {},
"outputs": [],
"source": [
"docs = [\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Beyond Horizons: AI Chronicles\",\n",
" \"author\": \"Dr. Cassandra Mitchell\",\n",
" },\n",
" page_content=\"An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Synergy Nexus: Merging Minds with Machines\",\n",
" \"author\": \"Prof. Benjamin S. Anderson\",\n",
" },\n",
" page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"AI Dilemmas: Navigating the Unknown\",\n",
" \"author\": \"Dr. Elena Rodriguez\",\n",
" },\n",
" page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Sentient Threads: Weaving AI Consciousness\",\n",
" \"author\": \"Prof. Alexander J. Bennett\",\n",
" },\n",
" page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Silent Alchemy: Unseen AI Alleviations\",\n",
" \"author\": \"Dr. Emily Foster\",\n",
" },\n",
" page_content=\"Building upon her previous work, Dr. Foster unveils 'Silent Alchemy,' a profound examination of the covert presence of AI in our daily lives. This illuminating piece reveals the subtle yet impactful ways in which AI invisibly shapes our routines, emphasizing the need for heightened awareness in our technology-driven world.\",\n",
" ),\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "a5e673fa",
"metadata": {},
"source": [
"Perform a retrieval:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3c5970db",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['1a3e0d292e6444d39451d0588ce746dc',\n",
" '19b180dd31e749359d49967e5d5dcab7',\n",
" '8de69e56086f47748e32c9e379e6865b',\n",
" 'f528fac385954e46b89cf8607bf0ee5a',\n",
" 'c1a6249d005d4abd9192b1d0b829cebe']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4fffd0af",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\", metadata={'title': 'Sentient Threads: Weaving AI Consciousness', 'author': 'Prof. Alexander J. Bennett'}),\n",
" Document(page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\", metadata={'title': 'AI Dilemmas: Navigating the Unknown', 'author': 'Dr. Elena Rodriguez'}),\n",
" Document(page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\", metadata={'title': 'Synergy Nexus: Merging Minds with Machines', 'author': 'Prof. Benjamin S. Anderson'}),\n",
" Document(page_content='An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.', metadata={'title': 'Beyond Horizons: AI Chronicles', 'author': 'Dr. Cassandra Mitchell'})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\n",
" \"Life and ethical dilemmas of AI\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
4 changes: 4 additions & 0 deletions libs/community/langchain_community/retrievers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@
PineconeHybridSearchRetriever,
)
from langchain_community.retrievers.pubmed import PubMedRetriever
from langchain_community.retrievers.qdrant_sparse_vector_retriever import (
QdrantSparseVectorRetriever,
)
from langchain_community.retrievers.remote_retriever import RemoteLangChainRetriever
from langchain_community.retrievers.svm import SVMRetriever
from langchain_community.retrievers.tavily_search_api import TavilySearchAPIRetriever
Expand Down Expand Up @@ -93,6 +96,7 @@
"OutlineRetriever",
"PineconeHybridSearchRetriever",
"PubMedRetriever",
"QdrantSparseVectorRetriever",
"RemoteLangChainRetriever",
"SVMRetriever",
"TavilySearchAPIRetriever",
Expand Down
Loading
Loading