Showing 16 changed files with 4,620 additions and 86 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,32 +1,54 @@ | ||
## Benchmarks | ||
We validate the benchmark results provided in [HippoRAG](https://arxiv.org/abs/2405.14831). | ||
We validate the benchmark results provided in [HippoRAG](https://arxiv.org/abs/2405.14831) and compare them with other methods: | ||
- NaiveRAG using the embedder `text-embedding-3-small` | ||
- [LightRAG](https://github.com/HKUDS/LightRAG) | ||
|
||
The scripts in this directory will generate and evaluate the 2wikimultihopqa dataset on subsets of 51 and 101 queries with the same methodology as in the paper above. We preloaded the results, so it is enough to run `evaluate.xx` to get the numbers. You can also run `create_dbs.xx` to regenerate the vector and graph databases. | ||
The scripts in this directory generate and evaluate the 2wikimultihopqa dataset on subsets of 51 and 101 queries, using the same methodology as the HippoRAG paper. In particular, we evaluate the retrieval capabilities of each method by measuring the percentage of queries for which all the required evidence was retrieved. We preloaded the results, so it is enough to run `evaluate.xx` to get the numbers. You can also run `create_dbs.xx` to regenerate the databases for the different methods (you will need to set `OPENAI_API_KEY`; LightRAG could take a while to process). | ||
|
||
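The "perfect retrieval" metric described above can be sketched as follows. This is a minimal illustration, not the code used by the benchmark scripts: the function name `perfect_retrieval_rate`, the dictionary layout, and the toy document ids are all assumptions made for this example.

```python
from typing import Dict, List, Set


def perfect_retrieval_rate(
    retrieved: Dict[str, List[str]],
    required: Dict[str, Set[str]],
) -> float:
    """Fraction of queries whose retrieved passages cover ALL required evidence.

    Hypothetical helper: `required` maps each query to the set of passages
    needed to answer it, `retrieved` maps each query to what a method returned.
    """
    if not required:
        return 0.0
    hits = sum(
        1
        for query, evidence in required.items()
        # A query counts only if every required passage was retrieved.
        if evidence <= set(retrieved.get(query, []))
    )
    return hits / len(required)


# Toy data (invented for illustration only).
required = {
    "q1": {"doc_a", "doc_b"},  # multi-hop: needs two passages
    "q2": {"doc_c"},
}
retrieved = {
    "q1": ["doc_a", "doc_x"],  # misses doc_b, so not a perfect retrieval
    "q2": ["doc_c", "doc_y"],
}

print(perfect_retrieval_rate(retrieved, required))  # 0.5
```

Under this definition a multi-hop query is only counted when every supporting passage is present, which is why partial matches such as `q1` score zero.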
The output should look similar to the following (the exact numbers could vary based on your graph configuration): | ||
``` | ||
Evaluation of the performance of the VectorDB and Circlemind on the same data (51 queries) | ||
Evaluation of the performance of different RAG methods on 2wikimultihopqa (51 queries) | ||
VectorDB | ||
Loading dataset... | ||
[all questions] Percentage of queries with perfect retrieval: 0.49019607843137253 | ||
[multihop only] Percentage of queries with perfect retrieval: 0.32432432432432434 | ||
LightRAG | ||
Loading dataset... | ||
Percentage of queries with perfect retrieval: 0.47058823529411764 | ||
[multihop] Percentage of queries with perfect retrieval: 0.32432432432432434 | ||
Circlemind | ||
Loading dataset... | ||
[all questions] Percentage of queries with perfect retrieval: 0.9607843137254902 | ||
[multihop only] Percentage of queries with perfect retrieval: 0.9459459459459459 | ||
Evaluation of the performance of the VectorDB and Circlemind on the same data (101 queries) | ||
Evaluation of the performance of different RAG methods on 2wikimultihopqa (101 queries) | ||
VectorDB | ||
Loading dataset... | ||
[all questions] Percentage of queries with perfect retrieval: 0.4158415841584158 | ||
[multihop only] Percentage of queries with perfect retrieval: 0.2318840579710145 | ||
LightRAG [local] | ||
Loading dataset... | ||
Percentage of queries with perfect retrieval: 0.44554455445544555 | ||
[multihop] Percentage of queries with perfect retrieval: 0.2753623188405797 | ||
Circlemind | ||
Loading dataset... | ||
[all questions] Percentage of queries with perfect retrieval: 0.9306930693069307 | ||
[multihop only] Percentage of queries with perfect retrieval: 0.8985507246376812 | ||
``` | ||
``` | ||
|
||
We also quickly benchmarked on the HotpotQA dataset (we will soon release the code for that as well). Here's a preview of the results (101 queries): | ||
|
||
``` | ||
VectorDB: 0.78 | ||
LightRAG [local mode]: 0.55 | ||
Circlemind: 0.84 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
from typing import Dict, List | ||
|
||
DOMAIN: Dict[str, str] = { | ||
"2wikimultihopqa": """Analyse the following passage and identify the people, creative works, and places mentioned in it. Your goal is to create an RDF (Resource Description Framework) graph from the given text. | ||
IMPORTANT: among other entities and relationships you find, make sure to extract as separate entities (to be connected with the main one) a person's | ||
role as a family member (such as 'son', 'uncle', 'wife', ...), their profession (such as 'director'), and the location | ||
where they live or work. Pay attention to the spelling of the names.""", # noqa: E501 | ||
"hotpotqa": """Analyse the following passage and identify all the entities mentioned in it and their relationships. Your goal is to create an RDF (Resource Description Framework) graph from the given text. | ||
Pay attention to the spelling of the entity names.""" | ||
} | ||
|
||
QUERIES: Dict[str, List[str]] = { | ||
"2wikimultihopqa": [ | ||
"When did Prince Arthur's mother die?", | ||
"What is the place of birth of Elizabeth II's husband?", | ||
"Which film has the director died later, Interstellar or Harry Potter I?", | ||
"Where does the singer who wrote the song Blank Space work at?", | ||
], | ||
"hotpotqa": [ | ||
"Are Christopher Nolan and Sathish Kalathil both film directors?", | ||
"What language were books being translated into during the era of Haymo of Faversham?", | ||
"Who directed the film that was shot in or around Leland, North Carolina in 1986?", | ||
"Who wrote a song after attending a luau in the Koolauloa District on the island of Oahu in Honolulu County?" | ||
] | ||
} | ||
|
||
ENTITY_TYPES: Dict[str, List[str]] = { | ||
"2wikimultihopqa": [ | ||
"person", | ||
"family_role", | ||
"location", | ||
"organization", | ||
"creative_work", | ||
"profession", | ||
], | ||
"hotpotqa": [ | ||
"person", | ||
"family_role", | ||
"location", | ||
"organization", | ||
"creative_work", | ||
"profession", | ||
"event", | ||
"year" | ||
], | ||
} |
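As an illustration of how these three per-dataset tables might be consumed together, the sketch below bundles them behind a single lookup. `DatasetConfig` and `get_config` are hypothetical names that do not appear in the commit, and the dictionaries are abbreviated stand-ins for the real ones above.

```python
from typing import Dict, List, NamedTuple

# Abbreviated stand-ins for the tables defined in the file above.
DOMAIN: Dict[str, str] = {
    "2wikimultihopqa": "Analyse the following passage ...",
    "hotpotqa": "Analyse the following passage ...",
}
QUERIES: Dict[str, List[str]] = {
    "2wikimultihopqa": ["When did Prince Arthur's mother die?"],
    "hotpotqa": ["Are Christopher Nolan and Sathish Kalathil both film directors?"],
}
ENTITY_TYPES: Dict[str, List[str]] = {
    "2wikimultihopqa": ["person", "location"],
    "hotpotqa": ["person", "location", "event", "year"],
}


class DatasetConfig(NamedTuple):
    """Hypothetical bundle of the settings for one dataset."""

    domain: str
    example_queries: List[str]
    entity_types: List[str]


def get_config(dataset: str) -> DatasetConfig:
    """Return the settings for `dataset`, failing loudly if any table lacks it."""
    tables = {"DOMAIN": DOMAIN, "QUERIES": QUERIES, "ENTITY_TYPES": ENTITY_TYPES}
    missing = [name for name, table in tables.items() if dataset not in table]
    if missing:
        raise KeyError(f"{dataset!r} is missing from: {', '.join(missing)}")
    return DatasetConfig(DOMAIN[dataset], QUERIES[dataset], ENTITY_TYPES[dataset])


config = get_config("hotpotqa")
print(config.entity_types)
```

Checking all three tables up front means a dataset added to only one of them is reported immediately rather than surfacing later as a partial configuration.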
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,20 @@ | ||
python vdb_benchmark.py -n 51 -c -b | ||
python vdb_benchmark.py -n 101 -c -b | ||
python graph_benchmark.py -n 51 -c -b | ||
python graph_benchmark.py -n 101 -c -b | ||
:: 2wikimultihopqa benchmark | ||
:: Creating databases | ||
python vdb_benchmark.py -n 51 -c | ||
python vdb_benchmark.py -n 101 -c | ||
python lightrag_benchmark.py -n 51 -c | ||
python lightrag_benchmark.py -n 101 -c | ||
python graph_benchmark.py -n 51 -c | ||
python graph_benchmark.py -n 101 -c | ||
|
||
:: Evaluation (create reports) | ||
python vdb_benchmark.py -n 51 -b | ||
python vdb_benchmark.py -n 101 -b | ||
python lightrag_benchmark.py -n 51 -b --mode=local | ||
python lightrag_benchmark.py -n 101 -b --mode=local | ||
:: feel free to try with 'global' as well | ||
python lightrag_benchmark.py -n 51 -b --mode=hybrid | ||
:: feel free to try with 'global' as well | ||
python lightrag_benchmark.py -n 101 -b --mode=hybrid | ||
python graph_benchmark.py -n 51 -b | ||
python graph_benchmark.py -n 101 -b |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,18 @@ | ||
# 2wikimultihopqa benchmark | ||
# Creating databases | ||
python vdb_benchmark.py -n 51 -c -b | ||
python vdb_benchmark.py -n 101 -c -b | ||
python lightrag_benchmark.py -n 51 -c -b | ||
python lightrag_benchmark.py -n 101 -c -b | ||
python graph_benchmark.py -n 51 -c -b | ||
python graph_benchmark.py -n 101 -c -b | ||
|
||
# Evaluation (create reports) | ||
python vdb_benchmark.py -n 51 -b | ||
python vdb_benchmark.py -n 101 -b | ||
python lightrag_benchmark.py -n 51 -b --mode=local | ||
python lightrag_benchmark.py -n 101 -b --mode=local | ||
python lightrag_benchmark.py -n 51 -b --mode=hybrid # feel free to try with 'global' as well | ||
python lightrag_benchmark.py -n 101 -b --mode=hybrid # feel free to try with 'global' as well | ||
python graph_benchmark.py -n 51 -b | ||
python graph_benchmark.py -n 101 -b |
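Because the same command list is maintained twice (once for Windows, once for POSIX shells), a single cross-platform driver is one possible alternative. The sketch below only rebuilds the evaluation commands listed above; `build_command` and `evaluation_commands` are hypothetical helpers, and the script names and flags (`-n`, `-b`, `--mode`) are copied from the scripts in this commit without any assumption about what they do internally.

```python
import sys
from typing import List, Optional

SUBSET_SIZES = (51, 101)  # query subset sizes used by the scripts above


def build_command(
    script: str, n: int, flags: List[str], mode: Optional[str] = None
) -> List[str]:
    """Build one benchmark invocation as an argument list."""
    cmd = [sys.executable, script, "-n", str(n), *flags]
    if mode is not None:
        cmd.append(f"--mode={mode}")
    return cmd


def evaluation_commands() -> List[List[str]]:
    """Return the evaluation commands in the same order as the shell script."""
    commands: List[List[str]] = []
    for n in SUBSET_SIZES:
        commands.append(build_command("vdb_benchmark.py", n, ["-b"]))
    for mode in ("local", "hybrid"):
        for n in SUBSET_SIZES:
            commands.append(build_command("lightrag_benchmark.py", n, ["-b"], mode))
    for n in SUBSET_SIZES:
        commands.append(build_command("graph_benchmark.py", n, ["-b"]))
    return commands


# Dry run: print the commands instead of executing them.
for cmd in evaluation_commands():
    print(" ".join(cmd))
```

To actually run the benchmarks, each list could be passed to `subprocess.run(cmd, check=True)` from the directory containing the benchmark scripts.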
File renamed without changes.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,26 @@ | ||
@echo off | ||
echo Evaluation of the performance of the VectorDB and Circlemind on the same data (51 queries) | ||
echo Evaluation of the performance of different RAG methods on 2wikimultihopqa (51 queries) | ||
echo. | ||
echo VectorDB | ||
python vdb_benchmark.py -n 51 -s | ||
echo. | ||
echo LightRAG | ||
python lightrag_benchmark.py -n 51 -s --mode=local | ||
python lightrag_benchmark.py -n 51 -s --mode=hybrid | ||
echo. | ||
echo Circlemind | ||
python graph_benchmark.py -n 51 -s | ||
|
||
echo. | ||
echo. | ||
echo Evaluation of the performance of the VectorDB and Circlemind on the same data (101 queries) | ||
echo Evaluation of the performance of different RAG methods on 2wikimultihopqa (101 queries) | ||
echo. | ||
echo VectorDB | ||
python vdb_benchmark.py -n 101 -s | ||
echo. | ||
echo LightRAG | ||
python lightrag_benchmark.py -n 101 -s --mode=local | ||
python lightrag_benchmark.py -n 101 -s --mode=hybrid | ||
echo. | ||
echo Circlemind | ||
python graph_benchmark.py -n 101 -s |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,27 @@ | ||
echo "Evaluation of the performance of the VectorDB and Circlemind on the same data (51 queries)"; | ||
echo "Evaluation of the performance of different RAG methods on 2wikimultihopqa (51 queries)"; | ||
echo; | ||
echo "VectorDB"; | ||
python vdb_benchmark.py -n 51 -s | ||
echo; | ||
echo "LightRAG [local mode]"; | ||
python lightrag_benchmark.py -n 51 -s --mode=local | ||
# feel free to try with global as well | ||
echo "[hybrid mode]"; | ||
python lightrag_benchmark.py -n 51 -s --mode=hybrid | ||
echo; | ||
echo "Circlemind" | ||
python graph_benchmark.py -n 51 -s | ||
|
||
echo "Evaluation of the performance of the VectorDB and Circlemind on the same data (101 queries)"; | ||
echo "Evaluation of the performance of different RAG methods on 2wikimultihopqa (101 queries)"; | ||
echo; | ||
echo "VectorDB"; | ||
python vdb_benchmark.py -n 101 -s | ||
echo; | ||
echo "LightRAG [local mode]"; | ||
python lightrag_benchmark.py -n 101 -s --mode=local | ||
# feel free to try with global as well | ||
echo "[hybrid mode]"; | ||
python lightrag_benchmark.py -n 101 -s --mode=hybrid | ||
echo; | ||
echo "Circlemind"; | ||
python graph_benchmark.py -n 101 -s |