-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat add rag #974
Feat add rag #974
Conversation
examples/rag_pipeline.py
Outdated
self.engine.add_docs([travel_filepath]) | ||
await self.rag_pipeline(question=travel_question, print_title=False) | ||
|
||
async def rag_add_objs(self, print_title=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the function name, I thought it was an add_objs function for an actual engine class. It turns out to be an example function. Making an example as an class is a bit strange, because using class suggests you are doing abstraction, but why does an example need abstraction?
Perhaps below is simplier, like writing an unittest, each function is stating one scenario
def add_objs_and_query():
engine = SimpleEngine.from_docs(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here shows the recall changes before and after adding objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a lot of wrapper function around LI, such as asearch -> aquery, add_nodes -> insert_nodes, and some class whose init parameters are much the same with those in LI. So my major question is, what are the extra bits of work we have done here on top of LI, excluding those already supported by LI. Because for developers, it makes no difference from learning MG RAG APIs than LI APIs. Why should developers learn and use our RAG to develop MG if they can use LI to do the same? What are the extra features we are providing here (e.g. you can add pydantic model)? Can you guys give more explanation in the PR note?
if not input_dir and not input_files: | ||
raise ValueError("Must provide either `input_dir` or `input_files`.") | ||
|
||
documents = SimpleDirectoryReader(input_dir=input_dir, input_files=input_files).load_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this Reader fixed for this Engine? If so, you can write it as an attribute, otherwise, would a config be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SimpleDirectoryReader is enough for SimpleEngine. Using class variables may does not make it more convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the extend demands can be proposed if doesn't meet the current usage.
metagpt/rag/engines/simple.py
Outdated
retriever_configs: list[BaseRetrieverConfig] = None, | ||
ranker_configs: list[BaseRankerConfig] = None, | ||
) -> "SimpleEngine": | ||
"""Load from previously maintained""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously existed index? Say the object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Load from previously maintained index by self.persist(), index_config contains persis_path.
metagpt/rag/engines/simple.py
Outdated
retriever_configs: list[BaseRetrieverConfig] = None, | ||
ranker_configs: list[BaseRankerConfig] = None, | ||
) -> "SimpleEngine": | ||
"""Load from previously maintained""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Load from previously maintained index by self.persist(), index_config contains persis_path.
examples/rag_pipeline.py
Outdated
self.engine.add_docs([travel_filepath]) | ||
await self.rag_pipeline(question=travel_question, print_title=False) | ||
|
||
async def rag_add_objs(self, print_title=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here shows the recall changes before and after adding objects.
if not input_dir and not input_files: | ||
raise ValueError("Must provide either `input_dir` or `input_files`.") | ||
|
||
documents = SimpleDirectoryReader(input_dir=input_dir, input_files=input_files).load_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SimpleDirectoryReader is enough for SimpleEngine. Using class variables may does not make it more convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Features
examples/rag_pipeline.py
Feature Docs
Influence
further will remove
metagpt/document_store
and update LTM related works.Result
Other