Freeing VRAM after timeout #413

Open
ei23fxg opened this issue Nov 30, 2024 · 2 comments

Labels
enhancement New feature or request

@ei23fxg (Contributor)

ei23fxg commented Nov 30, 2024

I'm using this great program as a Docker container, and I'm wondering if it would be possible to unload the Whisper model from VRAM after a certain idle time, so that it doesn't permanently occupy VRAM.
Maybe some kind of timeout parameter would be useful: set it to 60 seconds, for example, and after 60 seconds of standby the model is removed from VRAM again.

Restarting the container is currently my workaround.

  whisper-webui:
    build:
      context: ./volumes/Whisper-WebUI/.
      dockerfile: dockerfile
    image: jhj0517/whisper-webui:latest
    container_name: Whisper-WebUI
    restart: "no"
    volumes:
      - ./volumes/Whisper-WebUI/models:/Whisper-WebUI/models
      - ./volumes/Whisper-WebUI/outputs:/Whisper-WebUI/outputs
      - ./volumes/Whisper-WebUI/configs:/Whisper-WebUI/configs
    ports:
      - "7860:7860"
    stdin_open: true
    tty: true
    entrypoint: ["python", "app.py", "--server_port", "7860", "--whisper_type", "insanely-fast-whisper", "--api_open", "True", "--server_name", "0.0.0.0"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]

Maybe something like this:

import gc
import torch

class App:
    def __init__(self, args):
        self.args = args
        # ... (rest of your __init__ code)
        self.whisper_inf = None  # Initialize as None

    def load_model(self):
        if self.whisper_inf is None:
            self.whisper_inf = WhisperFactory.create_whisper_inference(
                # ... your model loading parameters
            )

    def unload_model(self):
        if self.whisper_inf is not None:
            del self.whisper_inf
            self.whisper_inf = None
            gc.collect()  # Force garbage collection
            torch.cuda.empty_cache()  # Empty CUDA cache

As mentioned, this could be triggered by a timeout, or by an API call / web request / button for unloading.
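
For the timeout variant, a rough sketch of what an inactivity watcher could look like on top of the load_model()/unload_model() methods above (the IdleUnloader class, the touch() hook, and the one-second polling interval are just illustrative names, not existing Whisper-WebUI code):

import threading
import time

class IdleUnloader:
    """Unload the Whisper model after a period of inactivity (illustrative sketch)."""

    def __init__(self, app, timeout=60):
        self.app = app                    # object exposing unload_model() and whisper_inf
        self.timeout = timeout            # seconds of idle time before unloading
        self.last_used = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def touch(self):
        """Call at the start of every transcription request."""
        self.last_used = time.monotonic()

    def _watch(self):
        # Poll once per second; unload once the idle time exceeds the timeout.
        while not self._stop.wait(1.0):
            idle = time.monotonic() - self.last_used
            if self.app.whisper_inf is not None and idle > self.timeout:
                self.app.unload_model()

    def stop(self):
        self._stop.set()

The app would create it once, e.g. self.unloader = IdleUnloader(self, timeout=60), and each request handler would call self.unloader.touch() followed by self.load_model() before running inference.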

@ei23fxg ei23fxg added the enhancement New feature or request label Nov 30, 2024
@jhj0517 (Owner)

jhj0517 commented Dec 1, 2024

Hi, thanks for your suggestion!

Similar to #398, it seems a lot more people are asking for this feature than I thought.

Probably running a dedicated background thread for it, like in #398, would be a good way to approach this?
And I'm planning to add an offload() function to each inferencer, as I did in MusicSeparator():

def offload(self):
    """Offload the model and free up the memory"""

@ei23fxg (Contributor, Author)

ei23fxg commented Dec 1, 2024

> Hi, thanks for your suggestion!
>
> Similar to #398, it seems a lot more people are asking for this feature than I thought.
>
> Probably running a dedicated background thread for it, like in #398, would be a good way to approach this? And I'm planning to add an offload() function to each inferencer, as I did in MusicSeparator():
>
>     def offload(self):
>         """Offload the model and free up the memory"""

Sounds great!
BTW, I made a Python voice keyboard (currently only tested on Debian) based on this tool.
It runs pretty fast, even through the Gradio API. It already feels like mic streaming.
It's not public at the moment, but let me know if you're interested or if there's anything else I can help with.
