
reazon-research/icefall

 
 


Introduction

The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse.

You can use sherpa, sherpa-ncnn, or sherpa-onnx to deploy models trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for more details.

You can try pre-trained models in your browser, without downloading or installing anything, by visiting this Hugging Face space. Please refer to the documentation for more details.

Installation

Please refer to the documentation for installation instructions.

Recipes

Please refer to the documentation for more details.

ASR: Automatic Speech Recognition

Supported Datasets

More datasets will be added in the future.

Supported Models

The LibriSpeech recipe supports the most comprehensive set of models; you are welcome to try them out.

CTC

  • TDNN LSTM CTC
  • Conformer CTC
  • Zipformer CTC
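A CTC model emits one score vector per frame, and the simplest way to turn those scores into tokens is best-path (greedy) decoding: take the best token at each frame, merge consecutive repeats, then drop blanks. The sketch below illustrates that rule in plain Python. It is not icefall's own decoding code, and the blank id of 0 is an assumption.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # Assumption: token id 0 is reserved for the CTC blank.


def ctc_greedy_decode(scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding over per-frame scores (probabilities or log-probs)."""
    # 1. Pick the highest-scoring token at every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]
    # 2. Collapse runs of the same token, then 3. drop blanks.
    return [token for token, _ in groupby(best_path) if token != blank]


if __name__ == "__main__":
    frames = [
        [0.1, 0.8, 0.1],    # token 1
        [0.1, 0.8, 0.1],    # token 1 again: merged with the previous frame
        [0.9, 0.05, 0.05],  # blank separates the two 1s
        [0.1, 0.8, 0.1],    # token 1: kept as a new symbol
        [0.1, 0.1, 0.8],    # token 2
    ]
    print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank is what lets CTC output the same token twice in a row: repeats are merged only when no blank sits between them.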

MMI

  • Conformer MMI
  • Zipformer MMI

Transducer

  • Conformer-based Encoder
  • LSTM-based Encoder
  • Zipformer-based Encoder
  • LSTM-based Predictor
  • Stateless Predictor
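A transducer combines an encoder, a predictor, and a joiner. With a stateless predictor, the prediction side conditions only on a fixed window of previously emitted tokens instead of a recurrent state. Below is a minimal sketch of greedy search under that setup. It assumes a hypothetical `joiner(frame, context)` callable standing in for encoder, predictor, and joiner, a blank id of 0, and a 2-token context window. It emits at most one symbol per frame, a simplification that is not icefall's `greedy_search` itself.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # Assumption: token id 0 is the blank symbol.

# Hypothetical interface standing in for encoder + stateless predictor + joiner:
# (frame index, last `context_size` tokens) -> one score per vocabulary entry.
Joiner = Callable[[int, Tuple[int, ...]], Sequence[float]]


def transducer_greedy_search(
    num_frames: int, joiner: Joiner, context_size: int = 2, blank: int = BLANK
) -> List[int]:
    """Greedy transducer decoding that emits at most one symbol per frame."""
    if context_size < 1:
        raise ValueError("context_size must be >= 1")
    # A stateless predictor only sees the last `context_size` tokens,
    # so the history starts padded with blanks.
    history = [blank] * context_size
    for t in range(num_frames):
        scores = joiner(t, tuple(history[-context_size:]))
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank:
            # Non-blank: emit the token; it becomes part of the next context.
            history.append(best)
        # Either way, move on to the next encoder frame.
    return history[context_size:]


def toy_joiner(t: int, context: Tuple[int, ...]) -> List[float]:
    """Hypothetical joiner: prefers a fixed token per frame and ignores context."""
    preferred = [3, BLANK, 5, 2]
    scores = [0.0] * 6
    scores[preferred[t]] = 1.0
    return scores


if __name__ == "__main__":
    print(transducer_greedy_search(4, toy_joiner))  # [3, 5, 2]
```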

Whisper

If you would like to contribute to icefall, please refer to the contributing guide for more details.

We would like to highlight the performance of some of the recipes here.

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
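The line above follows the usual convention WER = (insertions + deletions + substitutions) / number of reference words, so 1 deletion out of 240 reference words gives 1/240 ≈ 0.42%. The sketch below reproduces that arithmetic with a Levenshtein alignment. The function names and the synthetic YES/NO transcripts are illustrative assumptions, not icefall's scoring utilities.

```python
from typing import Dict, List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Dict[str, int]:
    """Count insertions, deletions and substitutions on a minimum-edit path."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (errors, ins, dels, subs) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = (j, j, 0, 0)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, a, d, s = dp[i - 1][j - 1]
            diag = (e, a, d, s) if ref[i - 1] == hyp[j - 1] else (e + 1, a, d, s + 1)
            e, a, d, s = dp[i - 1][j]
            dele = (e + 1, a, d + 1, s)
            e, a, d, s = dp[i][j - 1]
            ins = (e + 1, a + 1, d, s)
            # Tuples compare by total errors first, so min() keeps a minimum-cost path.
            dp[i][j] = min(diag, dele, ins)
    errors, n_ins, n_del, n_sub = dp[n][m]
    return {"errors": errors, "ins": n_ins, "del": n_del, "sub": n_sub}


def corpus_wer(pairs: List[Tuple[str, str]]) -> str:
    """Aggregate errors over all utterances and format a WER summary line."""
    totals = {"errors": 0, "ins": 0, "del": 0, "sub": 0}
    num_ref_words = 0
    for ref, hyp in pairs:
        counts = align_counts(ref.split(), hyp.split())
        for key in totals:
            totals[key] += counts[key]
        num_ref_words += len(ref.split())
    # WER = (insertions + deletions + substitutions) / number of reference words
    wer = 100.0 * totals["errors"] / num_ref_words
    return (
        f"%WER {wer:.2f}% [{totals['errors']} / {num_ref_words}, "
        f"{totals['ins']} ins, {totals['del']} del, {totals['sub']} sub ]"
    )


if __name__ == "__main__":
    # Synthetic data: 30 eight-word utterances, one of which drops a word.
    ref = "NO NO YES NO YES YES NO NO"
    pairs = [(ref, ref)] * 29 + [(ref, "NO NO YES NO YES YES NO")]
    print(corpus_wer(pairs))  # %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones.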

We provide a Colab notebook for this recipe.

Please see RESULTS.md for the latest results.

     test-clean  test-other
WER  2.42        5.73

We provide a Colab notebook to test the pre-trained model.

     test-clean  test-other
WER  6.59        17.69

We provide a Colab notebook to test the pre-trained model.

               test-clean  test-other
greedy_search  3.07        7.51

We provide a Colab notebook to test the pre-trained model.

                                    test-clean  test-other
modified_beam_search (beam_size=4)  2.56        6.27

We provide a Colab notebook to test the pre-trained model.

WER (%), decoded with modified_beam_search (beam_size=4) unless otherwise stated

  1. LibriSpeech-960hr

Encoder          Params  test-clean  test-other  epochs  devices
Zipformer        65.5M   2.21        4.79        50      4 32G-V100
Zipformer-small  23.2M   2.42        5.73        50      2 32G-V100
Zipformer-large  148.4M  2.06        4.63        50      4 32G-V100
Zipformer-large  148.4M  2.00        4.38        174     8 80G-A100

  2. LibriSpeech-960hr + GigaSpeech

Encoder    Params  test-clean  test-other
Zipformer  65.5M   1.78        4.08

  3. LibriSpeech-960hr + GigaSpeech + CommonVoice

Encoder    Params  test-clean  test-other
Zipformer  65.5M   1.90        3.98

     Dev    Test
WER  10.47  10.58

Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss

                      Dev    Test
greedy_search         10.51  10.73
fast_beam_search      10.50  10.69
modified_beam_search  10.40  10.51
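The table above compares three decoding methods. As a rough sketch of the idea behind modified_beam_search: each hypothesis in the beam may emit at most one symbol per encoder frame, hypotheses that reach the same token sequence are merged by adding their probabilities, and only the `beam` best survive to the next frame. The code assumes a hypothetical `joiner` returning log-probabilities, a blank id of 0, and a toy three-frame model. It merges before pruning, which may differ in detail from icefall's implementation, and it does not cover fast_beam_search, which relies on k2.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

BLANK = 0  # Assumption: token id 0 is the blank symbol.

# Hypothetical model interface: (frame index, last `context_size` tokens)
# -> log-probabilities over the vocabulary (blank included).
Joiner = Callable[[int, Tuple[int, ...]], Sequence[float]]


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def modified_beam_search(
    num_frames: int, joiner: Joiner, beam: int = 4, context_size: int = 2, blank: int = BLANK
) -> List[int]:
    if context_size < 1:
        raise ValueError("context_size must be >= 1")
    # Each hypothesis: token sequence (with blank padding) -> total log-prob.
    hyps: Dict[Tuple[int, ...], float] = {(blank,) * context_size: 0.0}
    for t in range(num_frames):
        candidates: Dict[Tuple[int, ...], float] = {}
        for ys, score in hyps.items():
            log_probs = joiner(t, ys[-context_size:])
            for token, lp in enumerate(log_probs):
                # Blank keeps the sequence; a non-blank adds exactly one symbol,
                # i.e. at most one symbol per hypothesis per frame.
                new_ys = ys if token == blank else ys + (token,)
                new_score = score + lp
                if new_ys in candidates:
                    # Alignments that yield the same text are merged (log-add).
                    candidates[new_ys] = log_add(candidates[new_ys], new_score)
                else:
                    candidates[new_ys] = new_score
        # Prune to the `beam` best hypotheses before the next frame.
        ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
        hyps = dict(ranked[:beam])
    best_ys = max(hyps, key=hyps.__getitem__)
    return list(best_ys[context_size:])


def toy_joiner(t: int, context: Tuple[int, ...]) -> List[float]:
    """Hypothetical joiner with fixed per-frame distributions (ignores context)."""
    table = [
        [0.10, 0.75, 0.15],  # frame 0: token 1 is likely
        [0.70, 0.20, 0.10],  # frame 1: blank is likely
        [0.20, 0.10, 0.70],  # frame 2: token 2 is likely
    ]
    return [math.log(p) for p in table[t]]


if __name__ == "__main__":
    print(modified_beam_search(3, toy_joiner, beam=4))  # [1, 2]
```

With `beam=1` the search keeps a single hypothesis and reduces to greedy search with one symbol per frame; larger beams trade speed for the chance to recover from an early wrong choice.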

                      Dev    Test
greedy_search         10.31  10.50
fast_beam_search      10.26  10.48
modified_beam_search  10.25  10.38

     test
CER  10.16

We provide a Colab notebook to test the pre-trained model.

     test
CER  4.38

We provide a Colab notebook to test the pre-trained model.

WER (%), modified_beam_search (beam_size=4)

Encoder          Params  dev   test  epochs
Zipformer        73.4M   4.13  4.40  55
Zipformer-small  30.2M   4.40  4.67  55
Zipformer-large  157.3M  4.03  4.28  56

1 Trained with all subsets:

     test
CER  29.08

We provide a Colab notebook to test the pre-trained model.

     TEST
PER  19.71%

We provide a Colab notebook to test the pre-trained model.

     TEST
PER  17.66%

We provide a Colab notebook to test the pre-trained model.

                                    dev   test
modified_beam_search (beam_size=4)  6.91  6.33

We provide a Colab notebook to test the pre-trained model.

                                    dev   test
modified_beam_search (beam_size=4)  6.77  6.14

We provide a Colab notebook to test the pre-trained model.

                      Dev   Test
greedy_search         5.53  6.59
fast_beam_search      5.30  6.34
modified_beam_search  5.27  6.33

We provide a Colab notebook to test the pre-trained model.

                      Dev   Test-Net  Test-Meeting
greedy_search         7.80  8.75      13.49
fast_beam_search      7.94  8.74      13.80
modified_beam_search  7.76  8.71      13.41

We provide a Colab notebook to test the pre-trained model.

                      Dev   Test-Net  Test-Meeting
greedy_search         8.78  10.12     16.16
fast_beam_search      9.01  10.47     16.28
modified_beam_search  8.53  9.95      15.81

                      Eval   Test-Net
greedy_search         31.77  34.66
fast_beam_search      31.39  33.02
modified_beam_search  30.38  34.25

We provide a Colab notebook to test the pre-trained model.

Best results, given as CER (%) for Chinese and WER (%) for English (zh: Chinese, en: English):

decoding-method       dev   dev_zh  dev_en  test  test_zh  test_en
greedy_search         7.30  6.48    19.19   7.39  6.66     19.13
fast_beam_search      7.18  6.39    18.90   7.27  6.55     18.77
modified_beam_search  7.15  6.35    18.95   7.22  6.50     18.70
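Code-switched output like this is scored per language: Chinese by characters (CER) and English by words (WER). Below is a minimal sketch of one way to make that split. It assumes that Chinese means characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF), that English words are lower-cased, and that the combined score simply treats both kinds of token together. The tokenization rules the recipe actually uses may differ.

```python
import re
from typing import List, Sequence

# Assumption: Chinese is scored per character (CJK Unified Ideographs only) and
# English per lower-cased word; other scripts and punctuation are ignored.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")
_CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def mixed_tokens(text: str) -> List[str]:
    """Split code-switched text into Chinese characters and English words."""
    return [token.lower() for token in _TOKEN_RE.findall(text)]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = cur
    return prev[-1]


def error_rate(ref: str, hyp: str, lang: str = "all") -> float:
    """Error rate (%) over all tokens, Chinese characters only ('zh'), or English words only ('en')."""
    if lang not in ("all", "zh", "en"):
        raise ValueError(f"unknown lang: {lang}")

    def keep(token: str) -> bool:
        if lang == "all":
            return True
        is_zh = _CJK_RE.fullmatch(token) is not None
        return is_zh if lang == "zh" else not is_zh

    ref_tokens = [t for t in mixed_tokens(ref) if keep(t)]
    hyp_tokens = [t for t in mixed_tokens(hyp) if keep(t)]
    if not ref_tokens:
        raise ValueError("reference has no tokens for this language")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


if __name__ == "__main__":
    ref = "我们用 Python 训练模型"
    hyp = "我们用 python 训练魔型"  # one Chinese character substituted
    for lang in ("all", "zh", "en"):
        print(lang, f"{error_rate(ref, hyp, lang):.2f}")
    # all 12.50
    # zh 14.29
    # en 0.00
```

Filtering both reference and hypothesis by script means a Chinese character misrecognized as an English word counts as a deletion on the zh side and an insertion on the en side.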

We provide a Colab notebook to test the pre-trained model.

TTS: Text-to-Speech

Supported Datasets

Supported Models

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.

Please refer to

for how to do this.

We also provide a Colab notebook showing how to run a TorchScript model in k2 with C++.

About

Implementing streaming ReazonSpeech model


Languages

  • Python 97.8%
  • Shell 1.9%
  • Other 0.3%