This document shows how to build and run a Whisper model in TensorRT-LLM on a single GPU.
The TensorRT-LLM Whisper example code is located in examples/whisper. There are three main files in that folder:

- build.py to build the TensorRT engine(s) needed to run the Whisper model.
- run.py to run inference on a single wav file, or on a HuggingFace dataset (LibriSpeech test-clean).
- run_faster_whisper.py to do a benchmark comparison with Faster Whisper.
Supported precisions:

- FP16
- INT8 (weight-only)
The TensorRT-LLM Whisper example takes Whisper PyTorch weights as input and builds the corresponding TensorRT engines.
First, prepare the Whisper checkpoint by downloading the model and its supporting assets:
```bash
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/multilingual.tiktoken
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
wget --directory-prefix=assets https://raw.githubusercontent.com/yuekaizhang/Triton-ASR-Client/main/datasets/mini_en/wav/1221-135766-0002.wav
# large-v3 model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
```
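Once the downloads finish, a quick sanity check is to list the assets directory; it should contain the tokenizer, the mel filter bank, the sample audio, and the large-v3 checkpoint fetched above:

```bash
# List the downloaded assets (file names come from the wget commands above)
ls assets/
# multilingual.tiktoken  mel_filters.npz  1221-135766-0002.wav  large-v3.pt
```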
TensorRT-LLM Whisper builds the TensorRT engine(s) from the PyTorch checkpoint.
```bash
# install requirements first
pip install -r requirements.txt

# Build the large-v3 model using a single GPU with plugins.
python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin

# Build the large-v3 model using a single GPU with plugins and weight-only quantization.
python3 build.py --output_dir whisper_large_weight_only --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --use_weight_only
```
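The --use_weight_only flag quantizes the weights to INT8 while keeping activations in FP16, which reduces both the serialized engine size and the GPU memory needed to load it. As an optional rough check (exact sizes vary with the model and TensorRT-LLM version), you can compare the two build directories:

```bash
# Compare the on-disk size of the FP16 and INT8 weight-only engine directories
du -sh ./whisper_large_v3 ./whisper_large_weight_only
```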
```bash
# choose the engine you built [./whisper_large_v3, ./whisper_large_weight_only]
output_dir=./whisper_large_v3

# decode a single audio file
# If the input file does not have a .wav extension, ffmpeg needs to be installed first:
# apt-get update && apt-get install -y ffmpeg
python3 run.py --name single_wav_test --engine_dir $output_dir --input_file assets/1221-135766-0002.wav

# decode a whole dataset
python3 run.py --engine_dir $output_dir --dataset hf-internal-testing/librispeech_asr_dummy --enable_warmup --name librispeech_dummy_large_v3_plugin
```
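The dataset run fetches hf-internal-testing/librispeech_asr_dummy from the HuggingFace Hub on first use. If the machine running inference has restricted network access, one option (an assumption, not part of the example scripts) is to pre-fetch the dataset into the local HuggingFace cache with the huggingface_hub CLI:

```bash
# Pre-download the dummy LibriSpeech dataset into the local HuggingFace cache
# (requires the huggingface_hub package, which provides the huggingface-cli tool)
huggingface-cli download --repo-type dataset hf-internal-testing/librispeech_asr_dummy
```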
This implementation of TensorRT-LLM for Whisper has been adapted from the NVIDIA TensorRT-LLM Hackathon 2023 submission of Jinheng Wang, which can be found in the repository Eddie-Wang-Hackathon2023 on GitHub. We extend our gratitude to Jinheng for providing a foundation for the implementation.