
Design thoughts #1

Open
danpovey opened this issue Jul 1, 2021 · 7 comments

danpovey commented Jul 1, 2021

Guys (mostly Piotr but also anyone who's listening),

Firstly, on timeline: we need something working well in time for September 1st-ish, when we give the tutorial.
So we can't be too ambitious: think a cleaned-up and reorganized version of Snowfall, working by early-to-mid August. Sorry I have delayed this for so long. Liyong is working on replicating ESPNet results with k2 mechanisms; he is making good progress, and we may want to incorporate parts of that.

I want to avoid big centralized APIs at the moment.

I also want to avoid the phenomenon in SpeechBrain and ESPNet where there is a kind of "configuration layer": you pass in configs, and these get parsed into actual Python code by some other code. I would rather keep it all Python code. Suppose we have a directory (this doesn't have to be the real name):
egs/librispeech/ASR/
then I am thinking we can have subdirectories of that where the scripts for different versions of experiments live.
We might have some data-prep scripts:
egs/librispeech/ASR/{prepare.sh,local/blahblah,...}
and these would write to some subdirectory, e.g. egs/librispeech/ASR/data/...
Then for different experiments we'd have the scripts in subdirectories, like:
egs/librispeech/ASR/tdnn_lstm_ctc/{model.py,train.py,decode.py,README.md}
and we might have
egs/librispeech/ASR/conformer_mmi/{model.py,train.py,decode.py,README.md}
that would refer to the alignment model in e.g. ../tdnn_lstm_ctc/8.pt, and to the data in ../data/blah...
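
Putting those pieces together, the layout might look roughly like this (the names are just the illustrative ones from above):

    egs/librispeech/ASR/
      prepare.sh                # top-level data preparation
      local/                    # recipe-specific data-prep helpers
      data/                     # written by prepare.sh (features, lang dirs, ...)
      tdnn_lstm_ctc/
        model.py train.py decode.py README.md
      conformer_mmi/
        model.py train.py decode.py README.md   # reads ../data/..., may reuse ../tdnn_lstm_ctc checkpoints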

The basic idea here is that if you want to change the experiment locally, you would copy-and-modify the scripts in conformer_mmi to e.g. conformer_mmi_1a/, and add them to your git repo if wanted. We would avoid overloading the scripts in these experiment directories with command-line options. Any back-compatibility would be at the level of the icefall Python libraries themselves. We could perhaps introduce versions of the data directories as well, e.g. data/, data2/ and so on (not sure whether it would make sense to have multiple versions of the data-prep scripts or use options).

In order to avoid overloading the model code, and utility code, with excessive back-compatibility, I suggest that we have versions of the model code and maybe even parts of the other libraries: e.g. snowfall/models1/. Then we can add options etc., but when it becomes oppressive we can just copy-and-modify to models2/ and strip out most of the options. This will tend to reduce "cyclomatic complexity" by keeping any given version of the code simple. At this point, let's think of this to some extent as a demo tool for k2 and lhotse; we don't have to think of it as some vast toolkit with zillions of features.


pzelasko commented Jul 1, 2021

OK, that is pretty similar to Kaldi. I can work with that. I also recently started to dislike the "configuration layer" and somehow prefer to write just the necessary things in a given script/notebook...

Some suggestions:

  • let's aim to keep all the model definitions in the "library" part of Icefall and specifically avoid local for that -- I feel that "well-tuned" configurations should be more easily re-usable than copy-paste style;
  • instead of models1/models2/modelsN, we could do icefall/models/conformer/v{1,2,3}.py, icefall/models/tdnnf/v{1,2,3}.py, etc. (or even better, models/conformer/conformer_{1a,1b,1c}.py); I like that approach more because models1/models2 suggests it's a different, new version of the toolkit (like espnet, espnet2) -- see the sketch after this list;
  • it'd be best to have the lexicon/lang creation in Python in the "library" part (as per Building lexicons in Python snowfall#191 which maybe I'll finally find the time for now that I'm back) -- I'd like to be able to re-use phone/BPE/subword/character lexicons and lexicon-related code across recipes, also merge multiple lexicons e.g. for multilingual/multi-accent models
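
A minimal sketch of the versioning idea, with hypothetical module and class names (not actual icefall code):

    # hypothetical file: icefall/models/conformer/conformer_1a.py
    import torch
    import torch.nn as nn

    class Conformer1a(nn.Module):
        """A frozen variant: later tweaks become conformer_1b.py rather than more options here."""

        def __init__(self, num_features: int = 80, num_classes: int = 500):
            super().__init__()
            # placeholder body; the real conformer blocks would live here
            self.proj = nn.Linear(num_features, num_classes)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.proj(features)

    # a recipe's train.py then pins one specific version explicitly:
    #   from icefall.models.conformer.conformer_1a import Conformer1a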

Mid/long-term considerations:

  • ideally, every "training" script wouldn't be too long (I think ~200, maybe ~300 loc sounds reasonable, unless we come up with something very involved and non-standard), we should regularly move some common patterns to the "library" part as functions/classes...
  • we should at least consider a common function signature for all model forward methods to allow simpler use via torchscript (or we could add methods like .get_inputs_info() [not necessarily with that name] that help users figure out which inputs to provide, although maybe that defeats the purpose?)
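
For instance, a self-describing model could look something like this sketch (names and shapes are purely illustrative):

    import torch
    import torch.nn as nn

    class TdnnCtc(nn.Module):
        """Toy model, only to illustrate a self-describing inputs helper."""

        def __init__(self, num_features: int = 80, num_classes: int = 500):
            super().__init__()
            self.layer = nn.Linear(num_features, num_classes)

        def forward(self, features: torch.Tensor, feature_lens: torch.Tensor) -> torch.Tensor:
            # feature_lens is unused here; a real model would mask padded frames
            return self.layer(features).log_softmax(dim=-1)

        def get_inputs_info(self) -> dict:
            # tells deployment code what forward() expects
            return {
                "features": "float32 tensor of shape (N, T, C)",
                "feature_lens": "int64 tensor of shape (N,)",
            }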


pzelasko commented Jul 1, 2021

One more suggestion: let's aim to make the library friendly to use with Jupyter notebooks by making most of the functionality importable (even if it's something used by just one recipe, I guess); they are really great for prototyping/tinkering with these things


danpovey commented Jul 2, 2021

OK, that is pretty similar to Kaldi. I can work with that. I also recently started to dislike the "configuration layer" and somehow prefer to write just the necessary things in a given script/notebook...

Cool!

  • let's aim to keep all the model definitions in the "library" part of Icefall and specifically avoid local for that -- I feel that "well-tuned" configurations should be more easily re-usable than copy-paste style;

I guess I'm OK with that, if we mean things like the conformer being in the "library", but I think it could be a good idea to use local model.py to "put pieces together". The point is, we may want to experiment with a lot of new things, and I want it to be easy to do.
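
As a rough sketch of what such a local model.py could do (the file path and class names are hypothetical, with the library encoder stubbed out so the example is self-contained):

    # hypothetical recipe-local file: egs/librispeech/ASR/conformer_ctc/model.py
    import torch
    import torch.nn as nn

    # in a real recipe the encoder would come from the library part, e.g. something like
    #   from icefall.models.conformer.conformer_1a import Conformer1a
    class EncoderStub(nn.Module):
        def __init__(self, num_features: int, d_model: int):
            super().__init__()
            self.proj = nn.Linear(num_features, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.proj(x)

    class CtcModel(nn.Module):
        """Recipe-local glue: take a library encoder and bolt on a CTC output layer."""

        def __init__(self, num_features: int = 80, num_classes: int = 500, d_model: int = 256):
            super().__init__()
            self.encoder = EncoderStub(num_features, d_model)
            self.ctc_output = nn.Linear(d_model, num_classes)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.ctc_output(self.encoder(features)).log_softmax(dim=-1)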

  • instead of models1/models2/modelsN, we could do icefall/models/conformer/v{1,2,3}.py, icefall/models/tdnnf/v{1,2,3}.py, etc. (or even better models/conformer/conformer_{1a,1b,1c}.py), I like that approach more because models1/models2 suggests it's a different, new version of the toolkit (like espnet, espnet2)

OK.

  • it'd be best to have the lexicon/lang creation in Python in the "library" part (as per Building lexicons in Python snowfall#191 which maybe I'll finally find the time for now that I'm back) -- I'd like to be able to re-use phone/BPE/subword/character lexicons and lexicon-related code across recipes, also merge multiple lexicons e.g. for multilingual/multi-accent models

After seeing a draft I might agree. I'm OK to centralize some code after patterns emerge. But I want any utilities and centralized classes to be fairly simple and easily separable, so that you can understand one piece without understanding the whole.

Mid/long-term considerations:

  • ideally, every "training" script wouldn't be too long (I think ~200, maybe ~300 loc sounds reasonable, unless we come up with something very involved and non-standard), we should regularly move some common patterns to the "library" part as functions/classes...

Sure, in the longer term hopefully they can become much shorter.

  • we should at least consider a common function signature for all model forward methods to allow simpler use via torchscript (or we could add methods like .get_inputs_info() [not necessarily with that name] that help users figure out which inputs to provide, although maybe that defeats the purpose?)

I'm OK with settling on a particular order of tensors, e.g. (N, C, T), not sure if that's what you mean? (I think speechbrain does that?).


pzelasko commented Jul 2, 2021

I'm OK with settling on a particular order of tensors, e.g. (N, C, T), not sure if that's what you mean? (I think speechbrain does that?).

My point was about the number of args/kwargs: currently the signature of forward in our models is inconsistent, so you can't use them as drop-in replacements with inference code. 100% plug-and-play doesn't seem attainable (we may want to keep adding inputs to try out new things), but at the very least we could modify all signatures to accept (*args, **kwargs) so that simpler models can work with extra inputs.
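
A minimal sketch of that idea (the model names and extra inputs are made up):

    import torch
    import torch.nn as nn

    class SimpleCtc(nn.Module):
        """Needs only features, but accepts and ignores extra inputs via *args/**kwargs,
        so inference code written for richer models still works with it."""

        def __init__(self, num_features: int = 80, num_classes: int = 500):
            super().__init__()
            self.layer = nn.Linear(num_features, num_classes)

        def forward(self, features: torch.Tensor, *args, **kwargs) -> torch.Tensor:
            return self.layer(features).log_softmax(dim=-1)

    class MmiModel(SimpleCtc):
        """A fancier model that actually consumes one of the extra inputs."""

        def forward(self, features, *args, alignments=None, **kwargs):
            out = super().forward(features)
            # would combine `alignments` with the encoder output here (illustrative only)
            return out

    # both models tolerate the same generic call from inference code:
    #   model(features, alignments=ali, supervision_segments=segs)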


danpovey commented Jul 3, 2021 via email


pzelasko commented Jul 3, 2021

Mm, do speechbrain or ESPNet do that? I'm cautious about this. Seems to me potentially a recipe for bugs.

I don't think they do. My knowledge might be out of date but the last time I checked, none of the existing PyTorch-based ASR frameworks I know of seriously addressed the matter of model deployment.


danpovey commented Jul 3, 2021

We can consider such things later on. We are working on a deadline right now and I want to consider this a demo of k2/lhotse for now, so we can explore architectures, before really settling on common APIs.
