Releases: Dadangdut33/Speech-Translate
1.3.10 - Add option to not use .en model
Small addition / changes
What's Changed
- [New] Add option to not use the .en model for the English language
- [Fix] Fix cases where translation shows an error when the text contains only digits
Full Changelog: 1.3.9...1.3.10
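As a rough illustration of what the new option could mean in practice, here is a minimal sketch of choosing between a multilingual model and its English-only variant. The function name, parameter names, and the accepted language strings are assumptions made for this example and are not taken from the Speech Translate source; the only fact relied on is that Whisper ships .en variants for every size except large.

```python
# Hypothetical helper: names are illustrative, not the app's actual API.
# Whisper sizes that have an English-only ".en" variant (large does not).
EN_VARIANTS = {"tiny", "base", "small", "medium"}


def resolve_model_name(size: str, language: str, use_en_model: bool = True) -> str:
    """Pick the model name to load for a given size and source language."""
    is_english = language.strip().lower() in {"en", "english"}
    if use_en_model and is_english and size in EN_VARIANTS:
        return f"{size}.en"
    # Option disabled, non-English input, or no .en variant for this size.
    return size


if __name__ == "__main__":
    print(resolve_model_name("small", "english"))
    print(resolve_model_name("small", "english", use_en_model=False))
```

With the option turned off, English input falls back to the multilingual model of the same size.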
Notes
- Before downloading / installing please take a look at the wiki and read the getting started section.
- Use the CUDA version for GPU support
- Linux/Mac users can follow this installation note to install Speech Translate as a module.
- If you previously installed Speech Translate as a module, you can update by running: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --upgrade --force-reinstall --no-deps
- If you installed from the installer, you can download and launch the installer below to update
- If you have any suggestions or found any bugs, please feel free to open a discussion or open an issue
Requirements
- Compatible OS Installation:
OS | Installation from Prebuilt binary | Installation as a Module | Installation from Git |
---|---|---|---|
Windows | ✔️ | ✔️ | ✔️ |
MacOS | ❌ | ✔️ | ✔️ |
Linux | ❌ | ✔️ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above (alternatively, you can make a loopback to capture your system audio as virtual input (like mic input) by using one of these guides/tools: [Voicemeeter on Windows]/[YT Tutorial] - [pavucontrol on Ubuntu with PulseAudio] - [blackhole on MacOS])
- Internet connection is needed only for translation with API & downloading models (if you want to go fully offline, you can set up LibreTranslate on your local machine and configure it in the app settings)
- Recommended to have the Segoe UI font installed on your system for the best UI experience (for OSes other than Windows, you can see this: Ubuntu - MacOS)
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) for faster results. Each Whisper model has different requirements; for more information, check the Whisper repository.
Size | Parameters | Required VRAM | Relative speed |
---|---|---|---|
tiny | 39 M | ~1 GB | ~32x |
base | 74 M | ~1 GB | ~16x |
small | 244 M | ~2 GB | ~6x |
medium | 769 M | ~5 GB | ~2x |
large | 1550 M | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection and a tooltip will show the model info). Also note that when using faster-whisper, the model will be significantly faster and use less VRAM; for more information, please visit the faster-whisper repository.
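To make the table easier to act on, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The figures are copied from the table above; the function name and the idea of choosing by VRAM alone are assumptions for illustration, and real usage will vary (faster-whisper, for instance, needs less).

```python
# Approximate VRAM needs in GB, copied from the table above, smallest to largest.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(vram_gb: float):
    """Return the largest model whose approximate requirement fits, or None."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(largest_model_for(6))  # medium
```

A card with about 6 GB of VRAM would land on medium under these rough numbers.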
1.3.9 - Minor bug Fix
This release fixes some minor bugs that can prevent logging in record.
What's Changed
- Fix the latest release url in about window
- Fix Error when activating Debug recording #64
Full Changelog: 1.3.8.1...1.3.9
1.3.8.1 - Fix model checking
This release fixes a model checking bug in the previous version.
What's Changed
- Fix faster whisper model checking #63
Full Changelog: 1.3.8...1.3.8.1
1.3.8 - Bug Fixes & Enhancement
This release fixes some bugs and improves the startup time.
What's Changed
- Fix model checking when no internet
- Fix prebuilt app might not start #62
- Fix align result
- Added a splash screen on startup
- Greatly improve startup speed
Full Changelog: 1.3.7...1.3.8
Notes
- Before downloading / installing please take a look at the wiki and read the getting started section.
- Use the CUDA version for GPU support
- Linux/Mac users can follow this installation note to install.
- If you previously installed Speech Translate as a module, you can update by running: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --upgrade --force-reinstall --no-deps
- If you installed from the installer, you can download and launch the installer below to update
- If you have any suggestions or found any bugs, please feel free to open a discussion or open an issue
1.3.7 - Some enhancement for record
This release adds a few small features, mainly for record.
Once again, thank you everyone for all the downloads and feedback.
What's Changed
- Added option for no sentence limit on record #57
- Added option for auto scrolling the record result
- Added confirmation prompt before record (this confirmation prompt can be disabled in setting)
- Added a way to run without the tray app by adding --no-tray when launching the app. Ex: .\SpeechTranslate.exe --no-tray
- Changed the emoji icon to NotoEmoji
- Fixed some issues when running on Linux
Full Changelog: 1.3.6...1.3.7
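For readers curious how a switch like --no-tray is typically wired up in a Python application, here is a minimal sketch using the standard argparse module. It is not the app's actual startup code; the parser setup and the prog name are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags; --no-tray is a simple on/off switch."""
    parser = argparse.ArgumentParser(prog="SpeechTranslate")
    parser.add_argument(
        "--no-tray",
        action="store_true",
        help="run without the tray app",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # argparse exposes "--no-tray" as the attribute "no_tray".
    if args.no_tray:
        print("Starting without tray icon")
    else:
        print("Starting with tray icon")
```

Because the flag uses store_true, omitting it keeps the default behavior of starting with the tray app.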
Notes
- Before downloading / installing please take a look at the wiki and read the getting started section.
- Use the CUDA version for GPU support
- If you previously installed Speech Translate as a module, you can update by running: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --upgrade --force-reinstall --no-deps
- If you installed from the installer, you can download and launch the installer below to update
- If you have any suggestions or found any bugs, please feel free to open a discussion or open an issue
1.3.6 - Bug Fixes and Enhancement
It seems that we have reached 7.4k downloads and almost 300 stars as of this release 🚀
Thank you very much, everyone, for downloading, starring, forking, opening discussions, and submitting bug reports or feature requests.
What's Changed
- Fixed translate with whisper not showing result in record
- Fixed continue after model download prompt
- Fixed random crashing when using faster whisper
- Fixed getting frame window in record
- Fixed log format when clear or change mode
- Changed some default options
- FFmpeg is now bundled with the app using static-ffmpeg #56 thanks to @zackees for the suggestion
- Added ways to filter result to counter hallucination
- Added optional silero vad that can be used alongside webrtcvad if possible on record
- Added setting for record min input length
Full Changelog: 1.3.5...1.3.6
Notes
- Before downloading / installing please take a look at the wiki and read the getting started section.
- If you previously installed Speech Translate as a module, you can update by running: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --upgrade --force-reinstall
- If you installed from the installer, you can download and launch the installer below to update
- If you have any suggestions or found any bugs, please feel free to open a discussion or open an issue
Requirements
- Compatible OS:
OS | Prebuilt binary | As a module |
---|---|---|
Windows | ✔️ | ✔️ |
MacOS | ❌ | ✔️ |
Linux | ❌ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above.
- Internet connection (for translation with API)
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) to run each model. Each Whisper model has different requirements; for more information, check the Whisper repository.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection and a tooltip will show the model info). Also note that when using faster-whisper, the speed will be significantly faster and the required VRAM will be reduced depending on the usage; for more information, please visit the faster-whisper repository.
1.3.5 - Bug Fixes and Some Enhancement
What's Changed
- Fixed theme bug in the subtitle window context menu #54
- Fixed character limit in record #44
- Fixed flickering results in record session
- Fixed some menu interaction state
- Changed the default file export format to folderize the export
- Changed the model name in the model selection to add more information
- Moved file export setting to its own tab in setting
- Updated library/dependencies
- Added character limit for file operation #44
- Added metadata on file operation
- Added more export naming format
- Added more options in tray app
- Added a more descriptive error message when there is no internet connection
Full Changelog: 1.3.4...1.3.5
Requirements
- Compatible OS:
OS | Prebuilt binary | As a module |
---|---|---|
Windows | ✔️ | ✔️ |
MacOS | ❌ | ✔️ |
Linux | ❌ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above.
- Internet connection (for translation with API)
- FFmpeg is required to be installed and added to the PATH environment variable. You can do it when prompted in the app, or you can download it here and add it to your path manually. Alternatively, you can also download and add it to path automatically by using the following commands:
# on Windows using powershell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
# Must be run in an elevated PowerShell prompt (Run as administrator)
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
& ([scriptblock]::Create(
(New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
)) -webdl
# on Windows using Winget (Default package manager for Windows 10 and above)
winget install --id=Gyan.FFmpeg -e
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) to run each model. Each Whisper model has different requirements; for more information, check the Whisper repository.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection and a tooltip will show the model info). Also note that when using faster-whisper, the speed will be significantly faster and the required VRAM will be reduced depending on the usage; for more information, please visit the faster-whisper repository.
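For the FFmpeg requirement listed above, a quick way to confirm that the install commands worked is to check whether the executable can be found on PATH. The sketch below uses Python's standard shutil.which; the function name is an assumption for illustration and this is not how the app itself performs the check.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path:
        print(f"FFmpeg found at {path}")
    else:
        print("FFmpeg not found on PATH")
```

If the script reports that FFmpeg is missing right after installation, opening a new terminal usually picks up the updated PATH.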
1.3.4 - Large v3 for faster whisper
What's Changed
- add large-v3 for faster whisper
- can now cancel faster whisper model download
- remove keys that do not work for mymemorytranslator
- fix using demucs in file import
- fix when using demucs and vad in recording
- fix some widget disabled state
Full Changelog: 1.3.3...1.3.4
1.3.3 - Bug Fix
Fixes bugs with the logger, language code, and loading the Whisper model.
What's Changed
Full Changelog: 1.3.2...1.3.3
1.3.2 - Bug Fix
Fixes a bug where the app won't start.
What's Changed
- Fix #51 thanks to @Adi900696 for reporting
Full Changelog: 1.3.1...1.3.2