I'm using this great program as a Docker container, and I'm wondering if it would be possible to unload the Whisper model from VRAM after a certain idle time, so that it doesn't occupy VRAM permanently.
Maybe some kind of timeout parameter would be useful: set it to 60 seconds, for example, and after 60 seconds of standby the model is removed from VRAM again.
Restarting the container is currently my workaround.
Maybe something like this:

```python
import gc

import torch


class App:
    def __init__(self, args):
        self.args = args
        # ... (rest of your __init__ code)
        self.whisper_inf = None  # Initialize as None

    def load_model(self):
        if self.whisper_inf is None:
            self.whisper_inf = WhisperFactory.create_whisper_inference(
                # ... your model loading parameters
            )

    def unload_model(self):
        if self.whisper_inf is not None:
            del self.whisper_inf
            self.whisper_inf = None
            gc.collect()  # Force garbage collection
            torch.cuda.empty_cache()  # Empty CUDA cache
```
As mentioned, unloading could be triggered by a timeout, or by an API call / web request / button.
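A minimal sketch of the API-call variant, using only the standard library (a real integration would more likely go through Gradio or FastAPI): `unload_fn` is assumed to be a callback like the `unload_model()` above, and the `/unload` route name is illustrative.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_unload_server(unload_fn, port=0):
    """Start a tiny HTTP endpoint: POST /unload invokes the unload callback.

    `unload_fn` is assumed to be something like App.unload_model above.
    port=0 lets the OS pick a free port; the chosen port is in
    server.server_address[1].
    """
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path == "/unload":
                unload_fn()  # free VRAM on demand
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"model unloaded\n")
            else:
                self.send_response(404)
                self.end_headers()

        def log_message(self, *args):
            pass  # keep the server quiet

    server = HTTPServer(("127.0.0.1", port), Handler)
    # Daemon thread: the endpoint never blocks container shutdown.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Then `curl -X POST http://127.0.0.1:<port>/unload` frees the VRAM without restarting the container.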
Similar to #398, it seems there are a lot more people asking for this feature than I thought.
Probably running a fully dependent thread for it, like in #398, would be a good way to approach this?
And I'm planning to add an offload() function to each inferencer, as I did in MusicSeparator():
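A minimal sketch of such a dependent thread, assuming an unload callback like the `unload_model()` above; the class name, timeout, and polling interval are illustrative:

```python
import threading
import time


class IdleOffloader:
    """Call `unload_fn` after `timeout` seconds of inactivity.

    Call touch() on every inference request to reset the idle clock.
    """

    def __init__(self, unload_fn, timeout=60.0, poll_interval=1.0):
        self.unload_fn = unload_fn
        self.timeout = timeout
        self.poll_interval = poll_interval
        self.last_used = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # Daemon thread, so it never blocks interpreter shutdown.
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def touch(self):
        """Mark the model as just used (call this per request)."""
        with self._lock:
            self.last_used = time.monotonic()

    def _watch(self):
        # wait() doubles as a sleep and a stop check.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self.last_used
            if idle >= self.timeout:
                self.unload_fn()
                # Reset so unload_fn isn't called repeatedly while idle.
                self.touch()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Pairing this with a lazy `load_model()` on the request path would make the offloading transparent to callers.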
Sounds great!
BTW, I made a Python voice keyboard (currently only tested on Debian) based on this tool.
It runs pretty fast, even through the Gradio API. It already feels like mic streaming.
It's not public at the moment, but let me know if you're interested or if there's anything else I can help with.