Add SearchApi retriever #851

Merged
2 changes: 1 addition & 1 deletion README-ja_JP.md
@@ -96,7 +96,7 @@ $ export TAVILY_API_KEY={Your Tavily API Key here}
```

- **For the LLM, we recommend [OpenAI GPT](https://platform.openai.com/docs/guides/gpt)**, but you can also use other LLM models (including open-source ones) supported by the [Langchain Adapter](https://python.langchain.com/docs/guides/adapters/openai). Just change the LLM model and provider in config/config.py. Follow [this guide](https://python.langchain.com/docs/integrations/llms/) to learn how to integrate an LLM with Langchain.
- **For the search engine, we recommend the [Tavily Search API](https://app.tavily.com) (optimized for LLMs)**, but you can also choose another search engine. Just change the search provider in config/config.py to "duckduckgo", "googleAPI", "googleSerp", or "searx", then add the corresponding env API key to the config.py file.
- **For the search engine, we recommend the [Tavily Search API](https://app.tavily.com) (optimized for LLMs)**, but you can also choose another search engine. Just change the search provider in config/config.py to "duckduckgo", "googleAPI", "googleSerp", "searchapi", or "searx", then add the corresponding env API key to the config.py file.
- **For the best performance, we strongly recommend using [OpenAI GPT](https://platform.openai.com/docs/guides/gpt) models with the [Tavily Search API](https://app.tavily.com).**
<br />

2 changes: 1 addition & 1 deletion README-ko_KR.md
@@ -108,7 +108,7 @@ export TAVILY_API_KEY={Enter your Tavily API key}
For a more permanent setup, create a `.env` file in the current `gpt-researcher` directory and enter the environment variables (without `export`).

- The default LLM is [GPT](https://platform.openai.com/docs/guides/gpt), but you can also use other LLMs such as `claude`, `ollama3`, `gemini`, and `mistral`. To learn how to change the LLM provider, see the [LLMs documentation](https://docs.gptr.dev/docs/gpt-researcher/llms/llms). This project is optimized for OpenAI GPT models.
- The default retriever is [Tavily](https://app.tavily.com), but you can use other retrievers such as `duckduckgo`, `google`, `bing`, `serper`, `searx`, `arxiv`, and `exa`. To learn how to change the search provider, see the [retrievers documentation](https://docs.gptr.dev/docs/gpt-researcher/retrievers).
- The default retriever is [Tavily](https://app.tavily.com), but you can use other retrievers such as `duckduckgo`, `google`, `bing`, `searchapi`, `serper`, `searx`, `arxiv`, and `exa`. To learn how to change the search provider, see the [retrievers documentation](https://docs.gptr.dev/docs/gpt-researcher/retrievers).

### Quickstart

2 changes: 1 addition & 1 deletion README-zh_CN.md
@@ -95,7 +95,7 @@ $ export TAVILY_API_KEY={Your Tavily API Key here}
```

- **For the LLM, we recommend [OpenAI GPT](https://platform.openai.com/docs/guides/gpt)**, but you can also use any other LLM model (including open-source ones) supported by the [Langchain Adapter](https://python.langchain.com/docs/guides/adapters/openai); just change the LLM model and provider in config/config.py. Follow [this guide](https://python.langchain.com/docs/integrations/llms/) to learn how to integrate an LLM with Langchain.
- **For the search engine, we recommend the [Tavily Search API](https://app.tavily.com) (optimized for LLMs)**, but you can also choose another search engine; just change the search provider in config/config.py to "duckduckgo", "googleAPI", "googleSerp", or "searx". Then add the corresponding env API key to the config.py file.
- **For the search engine, we recommend the [Tavily Search API](https://app.tavily.com) (optimized for LLMs)**, but you can also choose another search engine; just change the search provider in config/config.py to "duckduckgo", "googleAPI", "searchapi", "googleSerp", or "searx". Then add the corresponding env API key to the config.py file.
- **We strongly recommend using [OpenAI GPT](https://platform.openai.com/docs/guides/gpt) models and the [Tavily Search API](https://app.tavily.com) for the best performance.**
<br />

2 changes: 1 addition & 1 deletion README.md
@@ -107,7 +107,7 @@ export TAVILY_API_KEY={Your Tavily API Key here}
For a more permanent setup, create a `.env` file in the current `gpt-researcher` directory and input the env vars (without `export`).

- The default LLM is [GPT](https://platform.openai.com/docs/guides/gpt), but you can use other LLMs such as `claude`, `ollama3`, `gemini`, `mistral` and more. To learn how to change the LLM provider, see the [LLMs documentation](https://docs.gptr.dev/docs/gpt-researcher/llms/llms) page. Please note: this project is optimized for OpenAI GPT models.
- The default retriever is [Tavily](https://app.tavily.com), but you can use other retrievers such as `duckduckgo`, `google`, `bing`, `serper`, `searx`, `arxiv`, `exa` and more. To learn how to change the search provider, see the [retrievers documentation](https://docs.gptr.dev/docs/gpt-researcher/search-engines/retrievers) page.
- The default retriever is [Tavily](https://app.tavily.com), but you can use other retrievers such as `duckduckgo`, `google`, `bing`, `searchapi`, `serper`, `searx`, `arxiv`, `exa` and more. To learn how to change the search provider, see the [retrievers documentation](https://docs.gptr.dev/docs/gpt-researcher/search-engines/retrievers) page.

### Quickstart

4 changes: 4 additions & 0 deletions backend/server.py
@@ -35,6 +35,7 @@ class ConfigRequest(BaseModel):
    GOOGLE_API_KEY: str = ''
    GOOGLE_CX_KEY: str = ''
    BING_API_KEY: str = ''
    SEARCHAPI_API_KEY: str = ''
    SERPAPI_API_KEY: str = ''
    SERPER_API_KEY: str = ''
    SEARX_URL: str = ''
@@ -137,6 +138,7 @@ async def get_config(
    google_api_key: str = Header(None),
    google_cx_key: str = Header(None),
    bing_api_key: str = Header(None),
    searchapi_api_key: str = Header(None),
    serpapi_api_key: str = Header(None),
    serper_api_key: str = Header(None),
    searx_url: str = Header(None)
@@ -148,6 +150,7 @@ async def get_config(
        "GOOGLE_API_KEY": google_api_key if google_api_key else os.getenv("GOOGLE_API_KEY", ""),
        "GOOGLE_CX_KEY": google_cx_key if google_cx_key else os.getenv("GOOGLE_CX_KEY", ""),
        "BING_API_KEY": bing_api_key if bing_api_key else os.getenv("BING_API_KEY", ""),
        "SEARCHAPI_API_KEY": searchapi_api_key if searchapi_api_key else os.getenv("SEARCHAPI_API_KEY", ""),
        "SERPAPI_API_KEY": serpapi_api_key if serpapi_api_key else os.getenv("SERPAPI_API_KEY", ""),
        "SERPER_API_KEY": serper_api_key if serper_api_key else os.getenv("SERPER_API_KEY", ""),
        "SEARX_URL": searx_url if searx_url else os.getenv("SEARX_URL", ""),
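`get_config` gives a value sent in a request header priority over the server's environment variable of the same name, and falls back to an empty string. The same rule as a standalone helper, as a minimal sketch; `resolve_setting` is a hypothetical name, not a function in this PR:

```python
import os
from typing import Mapping, Optional


def resolve_setting(header_value: Optional[str], env_name: str,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the header value if non-empty, else the env var, else ''.

    Mirrors the `x if x else os.getenv(NAME, "")` pattern in get_config
    (hypothetical helper, not part of the PR).
    """
    # None (header absent) and '' (header sent empty) both count as "not provided".
    if header_value:
        return header_value
    return env.get(env_name, "")


if __name__ == "__main__":
    fake_env = {"SEARCHAPI_API_KEY": "from-env"}
    print(resolve_setting(None, "SEARCHAPI_API_KEY", fake_env))           # from-env
    print(resolve_setting("from-header", "SEARCHAPI_API_KEY", fake_env))  # from-header
    print(repr(resolve_setting("", "SERPAPI_API_KEY", fake_env)))         # ''
```

Passing the environment in as a mapping keeps the helper testable without touching the real `os.environ`.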
@@ -170,6 +173,7 @@ async def set_config(config: ConfigRequest):
    os.environ["GOOGLE_API_KEY"] = config.GOOGLE_API_KEY
    os.environ["GOOGLE_CX_KEY"] = config.GOOGLE_CX_KEY
    os.environ["BING_API_KEY"] = config.BING_API_KEY
    os.environ["SEARCHAPI_API_KEY"] = config.SEARCHAPI_API_KEY
    os.environ["SERPAPI_API_KEY"] = config.SERPAPI_API_KEY
    os.environ["SERPER_API_KEY"] = config.SERPER_API_KEY
    os.environ["SEARX_URL"] = config.SEARX_URL
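`set_config` assigns every `ConfigRequest` field to `os.environ` unconditionally, and every field defaults to `''`, so a request that omits `SEARCHAPI_API_KEY` sets that variable to an empty string even if a key was configured earlier. A minimal sketch of a variant that skips empty values, using a plain dict in place of the Pydantic model; `apply_config` is hypothetical, and it assumes an empty value should leave the existing setting alone, which is a design choice rather than what the PR does:

```python
import os
from typing import Mapping, MutableMapping


def apply_config(config: Mapping[str, str],
                 env: MutableMapping[str, str] = os.environ) -> list[str]:
    """Copy non-empty config values into `env` and return the names that were set.

    Empty strings (the ConfigRequest default) are skipped, so a key the client
    did not send does not overwrite an existing value. Hypothetical helper.
    """
    updated = []
    for name, value in config.items():
        if value:
            env[name] = value
            updated.append(name)
    return updated


if __name__ == "__main__":
    env = {"SERPAPI_API_KEY": "keep-me"}
    changed = apply_config({"SEARCHAPI_API_KEY": "new-key", "SERPAPI_API_KEY": ""}, env)
    print(changed)  # ['SEARCHAPI_API_KEY']
    print(env)      # {'SERPAPI_API_KEY': 'keep-me', 'SEARCHAPI_API_KEY': 'new-key'}
```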
@@ -21,7 +21,7 @@ export TAVILY_API_KEY={Your Tavily API Key here}
For a more permanent setup, create a `.env` file in the current `gpt-researcher` directory and input the env vars (without `export`).

- For LLM provider, we recommend **[OpenAI GPT](https://platform.openai.com/docs/guides/gpt)**, but you can use any other LLM model (including open sources). To learn how to change the LLM model, please refer to the [documentation](https://docs.gptr.dev/docs/gpt-researcher/llms/llms) page.
- For web search API, we recommend **[Tavily Search API](https://app.tavily.com)**, but you can also use another search API of your choice by changing the search provider in config/config.py to `duckduckgo`, `google`, `bing`, `serper`, `searx` and more. Then add the corresponding env API key.
- For web search API, we recommend **[Tavily Search API](https://app.tavily.com)**, but you can also use another search API of your choice by changing the search provider in config/config.py to `duckduckgo`, `google`, `bing`, `searchapi`, `serper`, `searx` and more. Then add the corresponding env API key.

## Quickstart

2 changes: 1 addition & 1 deletion docs/docs/gpt-researcher/gptr/config.md
@@ -18,7 +18,7 @@ You can also include your own external JSON file `config.json` by adding the pat

Below is a list of current supported options:

- **`RETRIEVER`**: Web search engine used for retrieving sources. Defaults to `tavily`. Options: `duckduckgo`, `bing`, `google`, `serper`, `searx`. [Check here](https://github.com/assafelovic/gpt-researcher/tree/master/gpt_researcher/retrievers) for supported retrievers
- **`RETRIEVER`**: Web search engine used for retrieving sources. Defaults to `tavily`. Options: `duckduckgo`, `bing`, `google`, `searchapi`, `serper`, `searx`. [Check here](https://github.com/assafelovic/gpt-researcher/tree/master/gpt_researcher/retrievers) for supported retrievers
- **`EMBEDDING_PROVIDER`**: Provider for embedding model. Defaults to `openai`. Options: `ollama`, `huggingface`, `azure_openai`, `custom`.
- **`LLM_PROVIDER`**: LLM provider. Defaults to `openai`. Options: `google`, `ollama`, `groq` and much more!
- **`FAST_LLM_MODEL`**: Model name for fast LLM operations such as summaries. Defaults to `gpt-4o-mini`.
1 change: 1 addition & 0 deletions docs/docs/gpt-researcher/search-engines/retrievers.md
@@ -26,6 +26,7 @@ Thanks to our community, we have integrated the following web search engines:
- [Tavily](https://app.tavily.com) - Default
- [Bing](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) - Env: `RETRIEVER=bing`
- [Google](https://developers.google.com/custom-search/v1/overview) - Env: `RETRIEVER=google`
- [SearchApi](https://www.searchapi.io/) - Env: `RETRIEVER=searchapi`
- [Serp API](https://serpapi.com/) - Env: `RETRIEVER=serpapi`
- [Serper](https://serper.dev/) - Env: `RETRIEVER=serper`
- [Searx](https://searx.github.io/searx/) - Env: `RETRIEVER=searx`
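Each retriever above is selected with `RETRIEVER=<name>` and needs its own credentials. A minimal sketch of a start-up check that reports which variables are still unset; the `REQUIRED_SETTINGS` table and `missing_settings` function are hypothetical, the variable names come from `backend/server.py` and the README in this PR, and it assumes (as the name of `parse_retrievers` suggests) that `RETRIEVER` may list several comma-separated names:

```python
import os
from typing import Mapping

# Hypothetical mapping from retriever name to the env vars it needs.
# Names are taken from the variables used elsewhere in this PR.
REQUIRED_SETTINGS: dict[str, tuple[str, ...]] = {
    "tavily": ("TAVILY_API_KEY",),
    "bing": ("BING_API_KEY",),
    "google": ("GOOGLE_API_KEY", "GOOGLE_CX_KEY"),
    "searchapi": ("SEARCHAPI_API_KEY",),
    "serpapi": ("SERPAPI_API_KEY",),
    "serper": ("SERPER_API_KEY",),
    "searx": ("SEARX_URL",),
}


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required env vars that are unset or empty for the chosen retrievers."""
    # Default to Tavily, the documented default retriever.
    names = [n.strip() for n in env.get("RETRIEVER", "tavily").split(",") if n.strip()]
    missing = []
    for name in names:
        for var in REQUIRED_SETTINGS.get(name, ()):
            if not env.get(var):
                missing.append(var)
    return missing


if __name__ == "__main__":
    print(missing_settings({"RETRIEVER": "searchapi"}))  # ['SEARCHAPI_API_KEY']
```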
1 change: 1 addition & 0 deletions frontend/nextjs/app/page.tsx
@@ -54,6 +54,7 @@ export default function Home() {
        'google_api_key': apiVariables.GOOGLE_API_KEY,
        'google_cx_key': apiVariables.GOOGLE_CX_KEY,
        'bing_api_key': apiVariables.BING_API_KEY,
        'searchapi_api_key': apiVariables.SEARCHAPI_API_KEY,
        'serpapi_api_key': apiVariables.SERPAPI_API_KEY,
        'serper_api_key': apiVariables.SERPER_API_KEY,
        'searx_url': apiVariables.SEARX_URL
9 changes: 9 additions & 0 deletions frontend/nextjs/components/Settings/Modal.js
@@ -18,6 +18,7 @@ export default function Modal({ setChatBoxSettings, chatBoxSettings }) {
    GOOGLE_API_KEY: '',
    GOOGLE_CX_KEY: '',
    BING_API_KEY: '',
    SEARCHAPI_API_KEY: '',
    SERPAPI_API_KEY: '',
    SERPER_API_KEY: '',
    SEARX_URL: '',
@@ -80,6 +81,13 @@ export default function Modal({ setChatBoxSettings, chatBoxSettings }) {
            <input type="text" name="BING_API_KEY" value={apiVariables.BING_API_KEY} onChange={handleInputChange} />
          </div>
        );
      case 'searchapi':
        return (
          <div className="form-group">
            <label className="form-group-label">SEARCHAPI_API_KEY</label>
            <input type="text" name="SEARCHAPI_API_KEY" value={apiVariables.SEARCHAPI_API_KEY} onChange={handleInputChange} />
          </div>
        );
      case 'serpapi':
        return (
          <div className="form-group">
@@ -145,6 +153,7 @@ export default function Modal({ setChatBoxSettings, chatBoxSettings }) {
          <option value="tavily">Tavily</option>
          <option value="google">Google</option>
          <option value="searx">Searx</option>
          <option value="searchapi">SearchApi</option>
          <option value="serpapi">SerpApi</option>
          <option value="googleSerp">GoogleSerp</option>
          <option value="duckduckgo">DuckDuckGo</option>
1 change: 1 addition & 0 deletions gpt_researcher/config/config.py
@@ -60,6 +60,7 @@ def parse_retrievers(self, retriever_str: str):
            "duckduckgo",
            "exa",
            "google",
            "searchapi",
            "searx",
            "semantic_scholar",
            "serpapi",
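The list above is the allow-list `parse_retrievers` checks, so a new retriever has to be added here as well as in `get_retriever`. A minimal sketch of that kind of validation, assuming `RETRIEVER` holds a comma-separated string; `parse_retriever_list` is a hypothetical stand-in, not the method's actual body, and `VALID_RETRIEVERS` contains only the names visible in this hunk (the real list is longer):

```python
# Only the names visible in this diff hunk; the project's real list is longer.
VALID_RETRIEVERS = frozenset({
    "duckduckgo", "exa", "google", "searchapi",
    "searx", "semantic_scholar", "serpapi",
})


def parse_retriever_list(retriever_str: str) -> list[str]:
    """Split a comma-separated RETRIEVER value and reject unknown names (hypothetical)."""
    retrievers = [r.strip() for r in retriever_str.split(",") if r.strip()]
    invalid = [r for r in retrievers if r not in VALID_RETRIEVERS]
    if invalid:
        raise ValueError(
            f"Invalid retriever(s): {', '.join(invalid)}. "
            f"Valid options are: {', '.join(sorted(VALID_RETRIEVERS))}"
        )
    return retrievers


if __name__ == "__main__":
    print(parse_retriever_list("searchapi, duckduckgo"))  # ['searchapi', 'duckduckgo']
```

Failing fast on a typo such as `serpap` surfaces the mistake at start-up instead of at the first search.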
4 changes: 4 additions & 0 deletions gpt_researcher/master/actions.py
@@ -33,6 +33,10 @@ def get_retriever(retriever):
            from gpt_researcher.retrievers import SearxSearch

            retriever = SearxSearch
        case "searchapi":
            from gpt_researcher.retrievers import SearchApiSearch

            retriever = SearchApiSearch
        case "serpapi":
            from gpt_researcher.retrievers import SerpApiSearch

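`get_retriever` maps each name to a class with one `case` per retriever and imports the class lazily inside its branch, which is why supporting SearchApi needs the new `case "searchapi"` here in addition to the export in `retrievers/__init__.py`. A decorator-based registry is one alternative layout, sketched minimally below; it is not what the project does, and the `Fake*` classes are hypothetical placeholders for the real retrievers:

```python
from typing import Callable, Dict, Optional

# Name -> retriever class, filled in by the @register decorator.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a retriever class under `name`."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("searchapi")
class FakeSearchApiSearch:
    """Placeholder standing in for gpt_researcher's SearchApiSearch."""
    def __init__(self, query: str) -> None:
        self.query = query


@register("serpapi")
class FakeSerpApiSearch:
    """Placeholder standing in for gpt_researcher's SerpApiSearch."""
    def __init__(self, query: str) -> None:
        self.query = query


def get_retriever(name: str) -> Optional[type]:
    """Return the registered class, or None for an unknown name."""
    return _REGISTRY.get(name)


if __name__ == "__main__":
    print(get_retriever("searchapi").__name__)  # FakeSearchApiSearch
    print(get_retriever("unknown"))             # None
```

The trade-off: a registry only knows classes whose modules have already been imported, so it gives up the lazy imports that the `case` branches provide.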
2 changes: 2 additions & 0 deletions gpt_researcher/retrievers/__init__.py
@@ -6,6 +6,7 @@
from .pubmed_central.pubmed_central import PubMedCentralSearch
from .searx.searx import SearxSearch
from .semantic_scholar.semantic_scholar import SemanticScholarSearch
from .searchapi.searchapi import SearchApiSearch
from .serpapi.serpapi import SerpApiSearch
from .serper.serper import SerperSearch
from .tavily.tavily_search import TavilySearch
@@ -15,6 +16,7 @@
    "TavilySearch",
    "CustomRetriever",
    "Duckduckgo",
    "SearchApiSearch",
    "SerperSearch",
    "SerpApiSearch",
    "GoogleSearch",
Empty file.
84 changes: 84 additions & 0 deletions gpt_researcher/retrievers/searchapi/searchapi.py
@@ -0,0 +1,84 @@
# SearchApi Retriever

# libraries
import os

import requests


class SearchApiSearch:
    """
    SearchApi Retriever
    """

    def __init__(self, query):
        """
        Initializes the SearchApiSearch object.
        Args:
            query: The search query string.
        """
        self.query = query
        self.api_key = self.get_api_key()

    def get_api_key(self):
        """
        Gets the SearchApi API key from the environment.
        Returns:
            The value of the SEARCHAPI_API_KEY environment variable.
        Raises:
            Exception: If SEARCHAPI_API_KEY is not set.
        """
        try:
            api_key = os.environ["SEARCHAPI_API_KEY"]
        except KeyError:
            raise Exception("SearchApi key not found. Please set the SEARCHAPI_API_KEY environment variable. "
                            "You can get a key at https://www.searchapi.io/") from None
        return api_key

    def search(self, max_results=7):
        """
        Searches the query with SearchApi's Google engine.
        Useful for general internet search queries.
        Args:
            max_results: Maximum number of results to return (YouTube links are skipped).
        Returns:
            A list of dicts with "title", "href" and "body" keys; empty on failure.
        """
        print("SearchApiSearch: Searching with query {0}...".format(self.query))

        url = "https://www.searchapi.io/api/v1/search"
        params = {
            "q": self.query,
            "engine": "google",
        }

        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.api_key}',
            'X-SearchApi-Source': 'gpt-researcher'
        }

        search_response = []

        try:
            # requests URL-encodes the query parameters itself
            response = requests.get(url, params=params, headers=headers, timeout=20)
            if response.status_code == 200:
                search_results = response.json()
                # .get() avoids a KeyError when the payload has no organic results
                results = search_results.get("organic_results", []) if search_results else []
                results_processed = 0
                for result in results:
                    link = result.get("link", "")
                    # skip youtube results
                    if "youtube.com" in link:
                        continue
                    if results_processed >= max_results:
                        break
                    search_result = {
                        "title": result.get("title", ""),
                        "href": link,
                        "body": result.get("snippet", ""),
                    }
                    search_response.append(search_result)
                    results_processed += 1
        except Exception as e:
            print(f"Error: {e}. Failed fetching sources. Resulting in empty response.")
            search_response = []

        return search_response
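`search` mixes the HTTP call with the response handling, which makes the filtering hard to test without network access. A minimal sketch of the parsing step as a pure function, tested against a hand-written sample payload; `parse_organic_results` is hypothetical, and the field names (`organic_results`, `title`, `link`, `snippet`) are the ones the retriever above reads, not a verified SearchApi schema. Unlike the code above, it also drops entries that have no link:

```python
from typing import Any


def parse_organic_results(payload: dict[str, Any], max_results: int = 7) -> list[dict[str, str]]:
    """Convert a SearchApi-style JSON payload into title/href/body dicts.

    Mirrors SearchApiSearch.search: skips youtube.com links and stops after
    max_results kept items. Missing fields default to '' instead of raising.
    """
    results: list[dict[str, str]] = []
    for item in payload.get("organic_results", []):
        if len(results) >= max_results:
            break
        link = item.get("link", "")
        # Skip YouTube links and entries without a URL.
        if not link or "youtube.com" in link:
            continue
        results.append({
            "title": item.get("title", ""),
            "href": link,
            "body": item.get("snippet", ""),
        })
    return results


if __name__ == "__main__":
    # Hand-written sample payload, not real API output.
    sample = {
        "organic_results": [
            {"title": "A", "link": "https://example.com/a", "snippet": "first"},
            {"title": "Video", "link": "https://www.youtube.com/watch?v=x", "snippet": "skip"},
            {"title": "B", "link": "https://example.com/b"},
        ]
    }
    print(parse_organic_results(sample, max_results=2))
    # [{'title': 'A', 'href': 'https://example.com/a', 'body': 'first'},
    #  {'title': 'B', 'href': 'https://example.com/b', 'body': ''}]
```

With this split, `search` would only fetch the JSON and pass it to the parser, and the filtering rules can be tested offline.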