Popular AI company compatible LLM Server

This is a very simple Flask application that provides a popular AI company compatible API for other large language models.

Very useful if you have tests or are running lots of Collaborative Agent Modules :-)

It currently supports Llama2, Mistral-7b and RWKV. These models run fairly easily on local hardware, which makes them a good fit for the agent use case.

Streaming is supported as well.
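
A rough sketch of a streaming client in Python, standard library only. This README does not document the streaming wire format, so the sketch assumes the server follows the convention of the API it imitates: the request sets "stream": true and the response arrives as server-sent events, one "data: {...}" line per chunk, ending with "data: [DONE]". The helper names parse_sse_line and stream_chat are made up for this example, and the URL and model name are taken from the Sending Requests section.

```python
import json
import urllib.request


def parse_sse_line(raw_line):
    """Decode one 'data: ...' line into a dict.

    Returns None for blank lines, SSE comments, and the [DONE] sentinel.
    ASSUMPTION: the server emits server-sent events in this shape.
    """
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return None
    return json.loads(data)


def stream_chat(prompt, model="mistral-7b-instruct",
                base_url="http://localhost:5000"):
    """Yield decoded chunks from the chat completions endpoint as they arrive."""
    payload = {
        "model": model,
        "stream": True,  # ASSUMPTION: this flag turns streaming on
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response object is iterable line by line.
        for raw_line in response:
            chunk = parse_sse_line(raw_line)
            if chunk is not None:
                yield chunk


# Usage, with the server running:
# for chunk in stream_chat("Hello!"):
#     print(chunk)
```

Keeping the line parsing in its own function means it can be checked without a running server. If the server streams in a different format, only parse_sse_line should need to change.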

Setup

1. Create a venv: python3 -m venv venv
2. Activate the venv: source venv/bin/activate (or venv\Scripts\activate on Windows)
3. Install dependencies: pip install -r requirements.txt
4. Create a symlink to your models, for example: ln -s /mnt/ssd/models/rwkv models/rwkv
5. Run the server: python app.py

Sending Requests

curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer WE_DONT_NEED_NO_STINKING_TOKENS" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
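
The same request can be sent from Python. The following is a minimal sketch using only the standard library; build_chat_request and chat are hypothetical helper names, not part of this project. The shape of the JSON reply is not shown in this README, so the sketch returns the decoded reply as-is rather than picking fields out of it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same host and port as the curl example


def build_chat_request(prompt, model="mistral-7b-instruct", base_url=BASE_URL):
    """Build the POST request that mirrors the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder token copied from the curl example.
            "Authorization": "Bearer WE_DONT_NEED_NO_STINKING_TOKENS",
        },
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the decoded JSON reply."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        return json.load(response)


# Usage, with the server running:
# print(json.dumps(chat("Hello!"), indent=2))
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body in a test without starting the server.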
