Feat add rag #974

better629 · 2024-03-07T15:06:57Z

Features

Add RAG and an example in examples/rag_pipeline.py
Apply RAG to memory_storage
Update the related unit tests

Feature Docs

Influence

A follow-up will remove metagpt/document_store and update the LTM-related work.

Result

(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/rag/*
============================================================================== test session starts ===============================================================================
collected 34 items

tests/metagpt/rag/engines/test_simple.py .....                                                                                                                             [ 14%]
tests/metagpt/rag/factories/test_base.py ..........                                                                                                                        [ 44%]
tests/metagpt/rag/factories/test_llm.py .....                                                                                                                              [ 58%]
tests/metagpt/rag/factories/test_ranker.py .....                                                                                                                           [ 73%]
tests/metagpt/rag/factories/test_retriever.py ......                                                                                                                       [ 91%]
tests/metagpt/rag/retrievers/test_bm25_retriever.py .                                                                                                                      [ 94%]
tests/metagpt/rag/retrievers/test_faiss_retriever.py .                                                                                                                     [ 97%]
tests/metagpt/rag/retrievers/test_hybrid_retriever.py .                                                                                                                    [100%]

======================================================================== 34 passed, 5 warnings in 16.35s =========================================================================




(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/document_store/*
============================================================================== test session starts ===============================================================================
collected 8 items

tests/metagpt/document_store/test_chromadb_store.py ..                                                                                                                     [ 12%]
tests/metagpt/document_store/test_document.py .                                                                                                                            [ 25%]
tests/metagpt/document_store/test_faiss_store.py ...                                                                                                                       [ 62%]
tests/metagpt/document_store/test_lancedb_store.py .                                                                                                                       [ 75%]
tests/metagpt/document_store/test_qdrant_store.py .

========================================================================= 8 passed, 4 warnings in 17.91s =========================================================================



(metagpt_310) MacBook-Pro:MetaGPT xxxx$ pytest tests/metagpt/memory/*
============================================================================== test session starts ===============================================================================
collected 9 items

tests/metagpt/memory/test_brain_memory.py .....                                                                                                                            [ 55%]
tests/metagpt/memory/test_longterm_memory.py .                                                                                                                             [ 66%]
tests/metagpt/memory/test_memory.py .                                                                                                                                      [ 77%]
tests/metagpt/memory/test_memory_storage.py ..                                                                                                                             [100%]

========================================================================= 9 passed, 6 warnings in 10.73s =========================================================================

Other

examples/rag_pipeline.py

garylin2099 · 2024-03-14T13:00:40Z

examples/rag_pipeline.py

+        self.engine.add_docs([travel_filepath])
+        await self.rag_pipeline(question=travel_question, print_title=False)
+
+    async def rag_add_objs(self, print_title=True):


From the function name, I thought this was an add_objs method on an actual engine class, but it turns out to be an example function. Writing an example as a class is a bit strange: a class suggests abstraction, and an example should not need any.
Perhaps something like the following is simpler. As in a unit test, each function states one scenario:

def add_objs_and_query():
    engine = SimpleEngine.from_docs(...)

This example shows how recall changes before and after adding objects.
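To make the "one function per scenario" suggestion concrete, here is a minimal sketch. It assumes a toy in-memory engine with naive keyword matching; ToyEngine and its methods are hypothetical stand-ins written for illustration and are not the SimpleEngine API from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for a RAG engine; NOT MetaGPT's SimpleEngine."""

    texts: list[str] = field(default_factory=list)

    @classmethod
    def from_docs(cls, docs: list[str]) -> "ToyEngine":
        return cls(texts=list(docs))

    def add_objs(self, objs: list[str]) -> None:
        self.texts.extend(objs)

    def retrieve(self, question: str) -> list[str]:
        # Naive keyword overlap; a real engine would use vector or BM25 retrieval.
        words = set(question.lower().split())
        return [t for t in self.texts if words & set(t.lower().split())]


def add_objs_and_query() -> tuple[list[str], list[str]]:
    """One scenario per function: compare recall before and after adding objects."""
    engine = ToyEngine.from_docs(["Bob likes traveling"])
    question = "who is the player"
    before = engine.retrieve(question)
    engine.add_objs(["Mike is a player with 10 goals"])
    after = engine.retrieve(question)
    return before, after
```

Each scenario then reads like a unit test: build an engine, change it, and compare what is retrieved before and after.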

examples/rag_pipeline.py

garylin2099

I see a lot of wrapper functions around LlamaIndex (LI), such as asearch -> aquery and add_nodes -> insert_nodes, and some classes whose init parameters are much the same as those in LI. So my main question is: what extra work have we done here on top of LI, excluding what LI already supports? For developers, learning the MetaGPT (MG) RAG APIs is no different from learning the LI APIs. Why should developers learn and use our RAG to develop MG if they can do the same with LI? What extra features are we providing here (e.g. the ability to add a pydantic model)? Could you give more explanation in the PR note?

metagpt/roles/sales.py

metagpt/rag/engines/simple.py

garylin2099 · 2024-03-14T13:24:14Z

metagpt/rag/engines/simple.py

+        if not input_dir and not input_files:
+            raise ValueError("Must provide either `input_dir` or `input_files`.")
+
+        documents = SimpleDirectoryReader(input_dir=input_dir, input_files=input_files).load_data()


Is this Reader fixed for this Engine? If so, you can make it an attribute; otherwise, would a config be better?

SimpleDirectoryReader is enough for SimpleEngine. Using a class variable may not make it more convenient.

Extended requirements can be proposed if this does not meet actual usage.
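For reference, the trade-off discussed in this thread can be sketched as below. This is only an illustration under stated assumptions: load_documents and read_text_files are hypothetical names, the reader is a plain callable rather than LlamaIndex's SimpleDirectoryReader, and only the guard clause mirrors the diff.

```python
from pathlib import Path
from typing import Callable, Optional


def read_text_files(input_dir: Optional[str], input_files: Optional[list[str]]) -> list[str]:
    """Hypothetical default reader: returns the text of each file it finds."""
    paths = [Path(p) for p in (input_files or [])]
    if input_dir:
        paths += sorted(p for p in Path(input_dir).iterdir() if p.is_file())
    return [p.read_text(encoding="utf-8") for p in paths]


def load_documents(
    input_dir: Optional[str] = None,
    input_files: Optional[list[str]] = None,
    reader: Callable[[Optional[str], Optional[list[str]]], list[str]] = read_text_files,
) -> list[str]:
    # Same guard as in the diff: at least one source must be given.
    if not input_dir and not input_files:
        raise ValueError("Must provide either `input_dir` or `input_files`.")
    # The reader defaults to a fixed implementation but can be swapped by the caller.
    return reader(input_dir, input_files)
```

A default argument keeps the common case as simple as a hard-coded reader, while still leaving a seam if a different reader is needed later.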

metagpt/rag/engines/simple.py

garylin2099 · 2024-03-14T13:29:57Z

metagpt/rag/engines/simple.py

+        retriever_configs: list[BaseRetrieverConfig] = None,
+        ranker_configs: list[BaseRankerConfig] = None,
+    ) -> "SimpleEngine":
+        """Load from previously maintained"""


A previously existing index? Please name the object.

Loads from an index previously saved by self.persist(); index_config contains persist_path.
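The persist-then-load round trip described in this reply might look like the sketch below. It assumes a toy index stored as JSON; ToyIndex, from_index, the index.json file name, and the persist_path parameter are hypothetical choices for illustration, not the storage format or API used by this PR or by LlamaIndex.

```python
import json
import tempfile
from pathlib import Path


class ToyIndex:
    """Hypothetical in-memory index; stands in for a real vector index."""

    def __init__(self, nodes: list[str]):
        self.nodes = nodes

    def persist(self, persist_path: str) -> None:
        # Write the index contents under the given directory.
        path = Path(persist_path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "index.json").write_text(json.dumps(self.nodes), encoding="utf-8")

    @classmethod
    def from_index(cls, persist_path: str) -> "ToyIndex":
        # Rebuild the index from what persist() wrote earlier.
        data = (Path(persist_path) / "index.json").read_text(encoding="utf-8")
        return cls(json.loads(data))


def round_trip(nodes: list[str]) -> list[str]:
    """Persist an index to a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        ToyIndex(nodes).persist(tmp)
        return ToyIndex.from_index(tmp).nodes
```

The point of the docstring question is visible here: the loader only needs the path that the earlier persist call wrote to.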

metagpt/rag/factories/base.py

metagpt/memory/memory_storage.py

metagpt/rag/engines/simple.py

metagpt/rag/vector_stores/chroma/base.py

metagpt/rag/factories/base.py

seehi · 2024-03-15T02:14:15Z

metagpt/rag/engines/simple.py

+        retriever_configs: list[BaseRetrieverConfig] = None,
+        ranker_configs: list[BaseRankerConfig] = None,
+    ) -> "SimpleEngine":
+        """Load from previously maintained"""


Loads from an index previously saved by self.persist(); index_config contains persist_path.

metagpt/rag/engines/simple.py

examples/rag_pipeline.py

seehi · 2024-03-15T02:49:18Z

examples/rag_pipeline.py

+        self.engine.add_docs([travel_filepath])
+        await self.rag_pipeline(question=travel_question, print_title=False)
+
+    async def rag_add_objs(self, print_title=True):


This example shows how recall changes before and after adding objects.

examples/rag_pipeline.py

seehi · 2024-03-15T03:10:07Z

metagpt/rag/engines/simple.py

+        if not input_dir and not input_files:
+            raise ValueError("Must provide either `input_dir` or `input_files`.")
+
+        documents = SimpleDirectoryReader(input_dir=input_dir, input_files=input_files).load_data()


SimpleDirectoryReader is enough for SimpleEngine. Using a class variable may not make it more convenient.


examples/rag_pipeline.py

metagpt/memory/memory_storage.py

geekan

lgtm

better629 and others added 30 commits January 19, 2024 17:37

replace langchain with llama-index

4fcf724

rag pipeline

916b139

rag pipeline

bd980d4

add example data

ed6ce07

modify .gitignore and add examples/data/rag.txt

0b0be04

rag pipeline

3ae4221

rag add docs

254088b

add rag pipeline unittest

a4c0953

simplify rag factory

dd965a2

upgrade llama-index to support new openai model

2c98540

rag add objs

a98da52

rag add objs

ab045cc

reflection for checking methods

ada8e8e

RAGObject interface add model_dump method; modify by pylint

aca3d1a

fix by pre-commit hooks

fae24fd

upgrade llama-index to v0.10

19a9a98

upgrade llama-index to v0.10

c02dc5c

Add .gitattributes to treat certain files as binary

4264f8c

update requirements.txt

0a3c120

update requirements.txt

7b552ff

add index factory and chromadb

c657af4

rag add chromadb save&load example

525b47b

remove examples/search_kb

800054a

reconstruct object in rag node

af63eab

reconstruct object in rag node

f149007

fix chromadb ut

ac14814

fix chromadb ut

184b012

fix rag ut failed cases

f327798

add excluded_llm_metadata_keys

a3b2cf7

fix conflict

0f2f460

seehi had a problem deploying to unittest March 14, 2024 12:39 — with GitHub Actions Failure

garylin2099 reviewed Mar 14, 2024


add runtime_checkable to support 3.10

666cac9

seehi had a problem deploying to unittest March 14, 2024 14:41 — with GitHub Actions Failure

update rsp_cache.json

e58cef6

seehi had a problem deploying to unittest March 15, 2024 03:12 — with GitHub Actions Failure

update

b3d13ac

better629 had a problem deploying to unittest March 15, 2024 06:29 — with GitHub Actions Failure

seehi reviewed Mar 15, 2024


seehi added 2 commits March 15, 2024 15:36

rename ConfigFactory to RAGConfigRegistry

cb9543b

Merge branch 'feat_memory' of github.com:better629/MetaGPT into feat_…

f46cc95

…memory

seehi had a problem deploying to unittest March 15, 2024 07:37 — with GitHub Actions Failure

rename RAGConfigRegistry to ConfigBasedFactory

8e80753

seehi had a problem deploying to unittest March 15, 2024 10:55 — with GitHub Actions Failure

rag_add_objs catch exception

08f4e2a

seehi had a problem deploying to unittest March 15, 2024 11:15 — with GitHub Actions Failure

for consistency, move rag.llm to rag.factories.llm

ec2e8bd

seehi had a problem deploying to unittest March 16, 2024 01:03 — with GitHub Actions Failure

just change some func's position

d27026a

seehi had a problem deploying to unittest March 16, 2024 01:12 — with GitHub Actions Failure

modify comment of ObjectNodeMetadata

5448de3

seehi had a problem deploying to unittest March 16, 2024 01:30 — with GitHub Actions Failure

geekan reviewed Mar 16, 2024


change func name for more readable

bd91611

seehi had a problem deploying to unittest March 16, 2024 14:10 — with GitHub Actions Failure

fix persist naming

8c218a1

better629 had a problem deploying to unittest March 16, 2024 14:31 — with GitHub Actions Failure

geekan approved these changes Mar 17, 2024


geekan merged commit e783e5b into geekan:main Mar 17, 2024
1 of 3 checks passed
