Plug whisper audio transcription into a local ollama server and output TTS audio responses
This is just a simple combination of three tools in offline mode (chained together as sketched below):
- Speech recognition: whisper running local models in offline mode
- Large Language Model: ollama running local models in offline mode
- Offline Text To Speech: pyttsx3
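Under the hood the three pieces chain together roughly like this. A minimal sketch: the `query.wav` file name is an assumption, while the `mistral` model and `large-v3.pt` checkpoint match the defaults mentioned below.

```python
# Minimal pipeline sketch: transcribe -> query LLM -> speak.
# Assumes a WAV file was already recorded and the ollama server is up.
import requests
import whisper
import pyttsx3

audio_path = "query.wav"  # hypothetical recorded file

# 1. Speech recognition (offline, local whisper checkpoint)
stt = whisper.load_model("whisper/large-v3.pt")
text = stt.transcribe(audio_path)["text"]

# 2. Query the local ollama server (default port 11434)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": text, "stream": False},
)
answer = resp.json()["response"]

# 3. Offline text-to-speech via pyttsx3 (espeak backend on Linux)
tts = pyttsx3.init()
tts.say(answer)
tts.runAndWait()
```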
The whisper dependencies are set up to run on the GPU, so install CUDA before running `pip install`.
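A quick way to confirm whisper will actually use the GPU before loading a large checkpoint (it silently falls back to CPU otherwise):

```python
import torch
import whisper

# whisper runs on PyTorch, so CUDA availability is checked through torch.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running whisper on: {device}")

model = whisper.load_model("whisper/large-v3.pt", device=device)
```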
Install the `python3-pyaudio`, `portaudio19-dev` and `espeak` packages from your distribution (e.g. `sudo apt install python3-pyaudio portaudio19-dev espeak` on Debian/Ubuntu).
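A short check that the audio stack is wired up after installing those packages (pyttsx3 drives espeak on Linux):

```python
import pyaudio   # needs python3-pyaudio / portaudio19-dev
import pyttsx3   # needs espeak on Linux

# Enumerate input devices to confirm portaudio sees the microphone.
pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:
        print(i, info["name"])
pa.terminate()

# Say a test phrase to confirm espeak is reachable.
engine = pyttsx3.init()
engine.say("Audio stack ready")
engine.runAndWait()
```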
Install ollama and ensure the server is started locally first (under Windows, inside WSL), e.g. `curl https://ollama.ai/install.sh | sh`.
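Once the server is up, a quick sanity check from Python that it is reachable and the model has been pulled (a sketch using ollama's REST API on its default port 11434):

```python
import requests

# The ollama server listens on localhost:11434 by default.
# /api/tags lists the models that have been pulled locally.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json()["models"]]
print("ollama is up, local models:", models)
```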
Download a whisper model and place it in the `whisper` subfolder (e.g. the large model: https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt, or the smaller medium model: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt).
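Since `whisper.load_model` accepts a filesystem path, the downloaded checkpoint can be loaded directly from that subfolder without any network access. A minimal sketch (the audio file name is an assumption):

```python
import whisper

# Load the checkpoint from the local subfolder instead of letting
# whisper download it at runtime (keeps everything offline).
model = whisper.load_model("whisper/large-v3.pt")
result = model.transcribe("query.wav", language="fr")  # language hint is optional
print(result["text"])
```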
Configure the `assistant.yaml` settings. (It is set up to work in French with the ollama mistral model by default...)
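For illustration, here is how such a file could be read with PyYAML; the key names below are assumptions, the shipped `assistant.yaml` is the reference:

```python
import yaml  # PyYAML

# Key names here are illustrative assumptions; check the shipped
# assistant.yaml (French + mistral by default) for the real ones.
with open("assistant.yaml") as f:
    cfg = yaml.safe_load(f)

language = cfg.get("language", "fr")      # hypothetical key
llm_model = cfg.get("model", "mistral")   # hypothetical key
print(language, llm_model)
```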
Run `assistant.py`. Keep the space key pressed to talk; the AI will interpret the query when you release the key.
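A sketch of what the push-to-talk capture can look like, assuming the `keyboard` and `pyaudio` packages (the actual logic lives in `assistant.py`):

```python
import keyboard   # pip install keyboard (may need root on Linux)
import pyaudio
import wave

RATE, CHUNK = 16000, 1024  # 16 kHz mono suits whisper

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

print("Hold space to talk...")
keyboard.wait("space")               # block until the key goes down
frames = []
while keyboard.is_pressed("space"):  # capture while it stays pressed
    frames.append(stream.read(CHUNK))

stream.stop_stream()
stream.close()

# Dump the capture to a WAV file for whisper to transcribe.
with wave.open("query.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
pa.terminate()
```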
TODO:
- Rearrange the code base
- Multi-threading to overlap TTS and speech recognition (ollama already runs remotely in parallel); see the sketch below
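One possible shape for that multi-threading item: a queue feeding a dedicated TTS worker thread, so recognition can resume while the answer is still being spoken. A sketch, not the current implementation:

```python
import queue
import threading
import pyttsx3

speech_q = queue.Queue()

def tts_worker():
    # A dedicated thread owns the pyttsx3 engine; speaking no longer
    # blocks the recording/transcription loop.
    engine = pyttsx3.init()
    while True:
        text = speech_q.get()
        if text is None:  # sentinel to shut down
            break
        engine.say(text)
        engine.runAndWait()

threading.Thread(target=tts_worker, daemon=True).start()

# The main loop keeps recording/transcribing and just enqueues answers:
speech_q.put("First answer, spoken in the background.")
```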