Retrieval augmented generation (RAG) demos with Mistral, Zephyr, Phi, Gemma, Llama, Aya-Expanse
The demos use quantized models and run on CPU with acceptable inference time. They can run offline without Internet access, thus allowing deployment in an air-gapped environment.
The demos also allow the user to:
- apply propositionizer to document chunks
- perform reranking upon retrieval
- perform hypothetical document embedding (HyDE)
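As background for the last option, HyDE asks an LLM to draft a hypothetical answer to the query and embeds that draft instead of the raw query. A minimal sketch of the flow, assuming LangChain-style `embed_query` and `similarity_search_by_vector` interfaces (the `hyde_retrieve` helper and its arguments are illustrative, not names from this code base):

```python
def hyde_retrieve(query, llm, embedder, vectorstore, k=4):
    # Draft a hypothetical answer with the LLM, then embed the draft and use it
    # for similarity search; the returned chunks still come from the real corpus.
    hypothetical_doc = llm(f"Write a short passage that answers: {query}")
    query_vector = embedder.embed_query(hypothetical_doc)
    return vectorstore.similarity_search_by_vector(query_vector, k=k)
```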
You will need to set up your development environment using conda.

```bash
conda create --name rag python=3.11
conda activate rag
pip install -r requirements.txt
```
We shall use `unstructured` to process PDFs. Refer to its Installation Instructions for Local Development. You will also need to download `punkt_tab` and `averaged_perceptron_tagger_eng` from `nltk`:
```python
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
```
Note that we shall only use `strategy="fast"` in this demo; extraction of tables from PDFs is still a work in progress.
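For reference, partitioning a PDF with `unstructured` in this mode looks roughly like the following; the file path is a placeholder.

```python
from unstructured.partition.pdf import partition_pdf

# strategy="fast" extracts embedded text quickly without OCR or layout detection,
# which is why table extraction is not covered yet.
elements = partition_pdf(filename="docs/sample.pdf", strategy="fast")
chunks = [el.text for el in elements if el.text]
```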
Activate the environment.

```bash
conda activate rag
```
Using a different LLM might lead to poor responses, or even no response at all; it will require testing, prompt engineering, and code refactoring.
Download and save the models in `./models` and update `config.yaml`. The models used in this demo are:
- Embeddings
- Rerankers (see the reranking sketch after this list):
  - BAAI/bge-reranker-base: save in `models/bge-reranker-base/`
  - facebook/tart-full-flan-t5-xl: save in `models/tart-full-flan-t5-xl/`
- Propositionizer:
  - chentong00/propositionizer-wiki-flan-t5-large: save in `models/propositionizer-wiki-flan-t5-large/`
- LLMs:
  - bartowski/aya-expanse-8b-GGUF
  - bartowski/Llama-3.2-3B-Instruct-GGUF
  - allenai/OLMoE-1B-7B-0924-Instruct-GGUF
  - bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
  - microsoft/Phi-3-mini-4k-instruct-gguf
  - QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
  - lmstudio-ai/gemma-2b-it-GGUF
  - TheBloke/zephyr-7B-beta-GGUF
  - TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  - TheBloke/Llama-2-7B-Chat-GGUF
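The bge reranker listed above is a cross-encoder: it scores each (query, passage) pair directly, and those scores are used to reorder the retrieved chunks. A minimal scoring sketch with `sentence_transformers`, assuming the model was saved to the path above (the query and chunks are placeholders, and the app's own reranking code may differ):

```python
from sentence_transformers import CrossEncoder

# Placeholder query and candidate chunks; in the app these come from retrieval.
query = "What is the warranty period?"
retrieved_docs = ["Chunk about the warranty...", "Chunk about shipping..."]

reranker = CrossEncoder("models/bge-reranker-base")
scores = reranker.predict([(query, doc) for doc in retrieved_docs])
ranked = sorted(zip(scores, retrieved_docs), key=lambda pair: pair[0], reverse=True)
reranked_docs = [doc for _, doc in ranked]
```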
The LLMs can be loaded directly in the app, or they can first be served with an Ollama server.
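As an illustration of the in-app path, a quantized GGUF model can be run on CPU with `llama-cpp-python`; the file name and parameters below are placeholders, not the app's actual loader.

```python
from llama_cpp import Llama

# Load a quantized GGUF model for CPU inference; n_ctx sets the context window.
llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder file name
    n_ctx=4096,
    n_threads=8,
)
output = llm("[INST] Summarise the uploaded document. [/INST]", max_tokens=256)
print(output["choices"][0]["text"])
```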
You can also choose to use models from Groq. Set `GROQ_API_KEY` in `.env`.
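For the Groq path, a minimal chat call with the `groq` Python client looks roughly like this; the model id is only an example, and the client reads the key set in `.env`.

```python
import os
from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # load GROQ_API_KEY from .env
client = Groq(api_key=os.environ["GROQ_API_KEY"])
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # example model id; check Groq's current model list
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```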
Since each model type has its own prompt format, include the format in `./src/prompt_templates.py`. For example, the format used in `openbuddy` models is:

```
"""{system}
User: {user}
Assistant:"""
```
We shall use Phoenix for LLM tracing. Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting. Before running the app, start a Phoenix server:

```bash
python3 -m phoenix.server.main serve
```

The traces can be viewed at http://localhost:6006.
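If the app uses LangChain, one pattern for sending its traces to the local Phoenix server is OpenInference instrumentation; the exact packages and registration call depend on your Phoenix and OpenInference versions, so treat this as a sketch.

```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point the tracer at the local Phoenix server started above, then instrument LangChain.
tracer_provider = register(endpoint="http://localhost:6006/v1/traces")
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```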
We use Streamlit as the interface for the demos. There are three demos:

- Conversational Retrieval

  ```bash
  streamlit run app_conv.py
  ```

- Retrieval QA

  ```bash
  streamlit run app_qa.py
  ```

- Conversational Retrieval using ReAct

  NOTE: This demo uses gemini-1.5-flash as the LLM.

  Create the vector store first and update `config.yaml`:

  ```bash
  python -m vectorize --filepaths <your-filepath>
  ```

  Run the app:

  ```bash
  streamlit run app_react.py
  ```
To get started, upload a PDF and click on `Build VectorDB`. Building the vector DB will take a while.