Feat add rag #974

Merged
merged 106 commits into from
Mar 17, 2024

@better629 (Collaborator) commented Mar 7, 2024

Features

  • Add RAG and an example in examples/rag_pipeline.py
  • Apply RAG to memory_storage
  • Update the related unit tests

Feature Docs

Influence

A follow-up will remove metagpt/document_store and update the LTM-related work.

Result

(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/rag/*
============================================================================== test session starts ===============================================================================
collected 34 items

tests/metagpt/rag/engines/test_simple.py .....                                                                                                                             [ 14%]
tests/metagpt/rag/factories/test_base.py ..........                                                                                                                        [ 44%]
tests/metagpt/rag/factories/test_llm.py .....                                                                                                                              [ 58%]
tests/metagpt/rag/factories/test_ranker.py .....                                                                                                                           [ 73%]
tests/metagpt/rag/factories/test_retriever.py ......                                                                                                                       [ 91%]
tests/metagpt/rag/retrievers/test_bm25_retriever.py .                                                                                                                      [ 94%]
tests/metagpt/rag/retrievers/test_faiss_retriever.py .                                                                                                                     [ 97%]
tests/metagpt/rag/retrievers/test_hybrid_retriever.py .                                                                                                                    [100%]

======================================================================== 34 passed, 5 warnings in 16.35s =========================================================================




(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/document_store/*
============================================================================== test session starts ===============================================================================
collected 8 items

tests/metagpt/document_store/test_chromadb_store.py ..                                                                                                                     [ 12%]
tests/metagpt/document_store/test_document.py .                                                                                                                            [ 25%]
tests/metagpt/document_store/test_faiss_store.py ...                                                                                                                       [ 62%]
tests/metagpt/document_store/test_lancedb_store.py .                                                                                                                       [ 75%]
tests/metagpt/document_store/test_qdrant_store.py .

========================================================================= 8 passed, 4 warnings in 17.91s =========================================================================



(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/memory/*
============================================================================== test session starts ===============================================================================
collected 9 items

tests/metagpt/memory/test_brain_memory.py .....                                                                                                                            [ 55%]
tests/metagpt/memory/test_longterm_memory.py .                                                                                                                             [ 66%]
tests/metagpt/memory/test_memory.py .                                                                                                                                      [ 77%]
tests/metagpt/memory/test_memory_storage.py ..                                                                                                                             [100%]

========================================================================= 9 passed, 6 warnings in 10.73s =========================================================================

Other

examples/rag_pipeline.py (outdated)
        self.engine.add_docs([travel_filepath])
        await self.rag_pipeline(question=travel_question, print_title=False)

    async def rag_add_objs(self, print_title=True):

Collaborator

From the function name, I thought this was an add_objs function on an actual engine class, but it turns out to be an example function. Writing an example as a class is a bit strange: a class suggests you are building an abstraction, and why would an example need one?
Perhaps something like the below is simpler, like writing a unit test, where each function states one scenario:

def add_objs_and_query():
    engine = SimpleEngine.from_docs(...)

Contributor

This shows how recall changes before and after adding objects.
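To make the two points in this thread concrete, here is a minimal sketch of a function-per-scenario example that compares retrieval before and after adding objects. It is only an illustration under stated assumptions: `ToyEngine` and `Player` are hypothetical stand-ins written for this sketch, not MetaGPT's `SimpleEngine` or any class from the PR, and the keyword-overlap matching replaces real retrieval.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Player:
    """Hypothetical object to index; not taken from the PR."""
    name: str
    goal: str

    def rag_key(self) -> str:
        # Text the toy engine matches questions against.
        return self.goal


@dataclass
class ToyEngine:
    """Stand-in for a RAG engine (NOT MetaGPT's SimpleEngine)."""
    items: list = field(default_factory=list)

    def add_objs(self, objs: list) -> None:
        self.items.extend(objs)

    async def aretrieve(self, question: str) -> list:
        # Naive keyword overlap in place of real retrieval.
        words = set(question.lower().split())
        return [o for o in self.items if words & set(o.rag_key().lower().split())]


async def add_objs_and_query() -> tuple[list, list]:
    """One scenario: query, add objects, then run the same query again."""
    engine = ToyEngine()
    question = "Who wants to win the game?"
    before = await engine.aretrieve(question)
    engine.add_objs([Player(name="Jeff", goal="Win the game")])
    after = await engine.aretrieve(question)
    return before, after


if __name__ == "__main__":
    before, after = asyncio.run(add_objs_and_query())
    print(f"before: {len(before)} hit(s), after: {len(after)} hit(s)")
```

Each further scenario (adding documents, persisting, reranking) would get its own top-level function in the same style, so a reader can run one scenario without reading the others.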

@garylin2099 (Collaborator) left a comment

I see many wrapper functions around LlamaIndex (LI), such as asearch -> aquery and add_nodes -> insert_nodes, and some classes whose init parameters are much the same as those in LI. So my main question is: what extra work have we done here on top of LI, beyond what LI already supports? For developers, learning the MetaGPT (MG) RAG APIs is no different from learning the LI APIs. Why should developers learn and use our RAG to develop with MG if they can do the same with LI? What extra features are we providing here (e.g. you can add a pydantic model)? Could you give more explanation in the PR note?

metagpt/rag/engines/simple.py
        if not input_dir and not input_files:
            raise ValueError("Must provide either `input_dir` or `input_files`.")

        documents = SimpleDirectoryReader(input_dir=input_dir, input_files=input_files).load_data()

Collaborator

Is this Reader fixed for this Engine? If so, you could make it an attribute; otherwise, would a config be better?

Contributor

SimpleDirectoryReader is enough for SimpleEngine. Using a class variable may not make it any more convenient.

Collaborator Author

Extension requests can be raised later if this doesn't cover a use case.
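For readers weighing the options raised in this thread, one possible middle ground is a fixed default reader with an optional override. The sketch below is an assumption made for illustration, not what the PR implements: `load_documents`, `read_text_files` and the `reader` parameter are hypothetical names, and only the input validation mirrors the quoted snippet.

```python
from pathlib import Path
from typing import Callable, Optional

# A "reader" here is any callable that turns a list of paths into documents.
Reader = Callable[[list], list]


def read_text_files(paths: list) -> list:
    """Default reader: load each file as UTF-8 text."""
    return [Path(p).read_text(encoding="utf-8") for p in paths]


def load_documents(
    input_dir: Optional[str] = None,
    input_files: Optional[list] = None,
    reader: Reader = read_text_files,  # hypothetical override point
) -> list:
    """Validate inputs like the quoted snippet, then delegate to the reader."""
    if not input_dir and not input_files:
        raise ValueError("Must provide either `input_dir` or `input_files`.")
    paths = list(input_files or [])
    if input_dir:
        # Add the regular files found directly inside input_dir.
        paths += sorted(str(p) for p in Path(input_dir).iterdir() if p.is_file())
    return reader(paths)
```

Callers who are happy with the default never see the extra parameter, while a caller with a different source can pass their own callable without a new config class.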

metagpt/rag/engines/simple.py
        retriever_configs: list[BaseRetrieverConfig] = None,
        ranker_configs: list[BaseRankerConfig] = None,
    ) -> "SimpleEngine":
        """Load from previously maintained"""

Collaborator

A previously existing index? The docstring should name the object.

Contributor

It loads from an index previously saved by self.persist(); index_config contains persist_path.
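The persist-then-load round trip described in this reply can be sketched as follows. This is a toy illustration under stated assumptions: `ToyEngine` and `IndexConfig` are hypothetical stand-ins that store documents as JSON, not MetaGPT's `SimpleEngine` or its index config classes, and the `index.json` file name is invented for the sketch.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IndexConfig:
    """Hypothetical config that only carries the persist path."""
    persist_path: str


@dataclass
class ToyEngine:
    """Stand-in engine (NOT MetaGPT's SimpleEngine) with a JSON 'index'."""
    docs: list = field(default_factory=list)

    def persist(self, persist_dir: str) -> None:
        # Save the index so a later process can reload it.
        path = Path(persist_dir)
        path.mkdir(parents=True, exist_ok=True)
        (path / "index.json").write_text(json.dumps(self.docs), encoding="utf-8")

    @classmethod
    def from_index(cls, index_config: IndexConfig) -> "ToyEngine":
        """Load from an index previously saved by persist()."""
        data = (Path(index_config.persist_path) / "index.json").read_text(encoding="utf-8")
        return cls(docs=json.loads(data))


def roundtrip(docs: list) -> list:
    """Persist an engine, reload it from the same path, and return its docs."""
    with tempfile.TemporaryDirectory() as tmp:
        ToyEngine(docs=docs).persist(tmp)
        return ToyEngine.from_index(IndexConfig(persist_path=tmp)).docs
```

The point of the pattern is that the loader only needs the path from the config, which is why a docstring naming the index explicitly would help here.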

@geekan (Owner) left a comment

lgtm

@geekan merged commit e783e5b into geekan:main on Mar 17, 2024
1 of 3 checks passed
6 participants