Add Document To Vector Store #838
Conversation
Would be cool to have this idea reviewed! I'll add a test script if I'm allowed to proceed.
Hey @khoangothe, this is a great direction! Can you share an example of how it can be used?
Hi @assafelovic, thanks for the review! I just added a commit documenting how it should be used. Here's the code I used to test locally. Basically, the info is stored in the vector store whenever a vector store is defined. Will add a test script soon.

```python
import asyncio

from dotenv import load_dotenv
from gpt_researcher import GPTResearcher
from langchain_community.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

load_dotenv()

async def main():
    vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
    query = "Which one is the best LLM"

    # Create an instance of GPTResearcher
    researcher = GPTResearcher(
        query=query,
        report_type="research_report",
        report_source="web",
        vector_store=vector_store,
    )

    # Conduct research and write the report
    await researcher.conduct_research()

    # Check if the vector_store contains information from the sources
    related_contexts = await vector_store.asimilarity_search("GPT-4", k=5)
    print(related_contexts)
    print(len(related_contexts))

asyncio.run(main())
```
Thanks @khoangothe, excuse me if I might be missing something, but how is this different than this? https://docs.gptr.dev/docs/gpt-researcher/context/vector-stores
@assafelovic Sorry if my examples were not clear enough. The one you linked allows you to talk to your vector store, so your vector store needs to already contain information for gpt-researcher to do research on. In my example, we start with an empty vector store, and it gets populated with the sources found during research.
Looks good 👍🏼
This path of persistent, re-usable vector storage that can be leveraged across reports and follow-up questions is very interesting to me. I've merged this branch into #819 and am planning on testing extensively with PGVector storage.
@ElishaKay @assafelovic Hi guys, I was able to resolve the merge conflict and provided test cases for the scenarios I implemented. For each type of knowledge source (URLs, hybrid, local, web, langchain documents), data will be ingested into the vector_store the user provided; usage is shown in the tests (I added a PDF to test the local and hybrid functionality). I also raised an issue in Discord, so hopefully you guys can check it out.
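For readers following along: ingestion along these lines typically splits each source into overlapping chunks before embedding and writing them to the store. A minimal pure-Python sketch of that step (the function name, chunk size, and overlap here are illustrative, not the PR's actual code):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character chunks (sizes are illustrative)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so adjacent chunks share context
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded and added to the user-provided vector store.
pieces = chunk_text("word " * 100)  # 500 characters of dummy text
print(len(pieces))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of some duplicated storage.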
This is awesome @khoangothe kudos for the hard work and implementation. Looking forward to the next PRs that can empower this
Love the idea, as you can build out a knowledge base through various queries this way. It adds a bit more human-in-the-loop for complex topics. One use case could be if you already have a corpus of literature, but want to add more recent content via GPT-R's searches.
Documents, crawled URLs, and websites will be chunked and loaded into the provided vector store if vector_store is not None. Although adding the data as Document objects directly would be more efficient, I think this solution keeps the code decoupled and easier to maintain.
By default these changes won't add any features, but new applications can be built on top of the vector store (like chatting with the sources).
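The "chatting with the sources" idea boils down to similarity search over the store that research has populated. A minimal sketch of that retrieval step, using a toy bag-of-words embedding and a plain list in place of the real OpenAIEmbeddings and vector store (all names and documents here are illustrative):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real app would use OpenAIEmbeddings etc.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Chunks that research (GPTResearcher, in the PR) would have written to the store.
store = [
    "GPT-4 is a large language model from OpenAI",
    "PGVector stores embeddings in Postgres",
    "The weather today is sunny",
]

def similarity_search(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

top = similarity_search("which language model is best")
print(top[0])
```

A chat application would feed the top-k chunks into an LLM prompt as context; the PR's contribution is making sure those chunks exist in the store in the first place.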