Support fine-tuning #944
Conversation
To do fine-tuning on another dataset or your own dataset, you just need to replace …
Here are a few experiments with different learning rate schedules/initializations:
@marcoyang1998 Can you upload the tensorboard logs of the fine-tuning experiment?
See https://tensorboard.dev/experiment/5xPOGv6CQHO02YEApc1jxg/. @pingfengluo It's from the experiment with …
OK, thank you.
@marcoyang1998 Hello, I found that using the same config, I cannot reproduce the result. My running command is …, but I only get 14.66/15.04. My tensorboard logs are as follows:
Compared with https://tensorboard.dev/experiment/5xPOGv6CQHO02YEApc1jxg/#scalars, I found that my valid loss is smaller, but the performance is worse. Could you please provide a checkpoint of …
@marcoyang1998 By the way, can you provide the GigaSpeech results decoded with the LibriSpeech pretrained model (without fine-tuning), so I can validate whether my pretrained model is correct? Thanks. From the results above in #944 and #1059, there is no result without fine-tuning.
I got 20.96/20.26 with modified_beam_search (beam=8) without fine-tuning. Two 3090s were used with max-duration=500. There are some differences between https://tensorboard.dev/experiment/5xPOGv6CQHO02YEApc1jxg/#scalars and my experiments:
1. LR scale: it changes from 5e-3 to 1.5e-3 during training (while mine only goes down to 4.5e-3).
2. The dataset is built differently, as I reuse kaldi-fsa/kaldi:egs/gigaspeech as the data dir and use …
This seems to be caused by this change, which skips the warmup period and sets the batch count to a very large number. If you want to reproduce my results, you might need to remove those few lines so that warmup is not skipped. But theoretically, warmup is not needed during fine-tuning as long as a small enough learning rate is used.
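The exact lines being referred to are not quoted in this thread. As a rough illustration of the general idea only (the names below are hypothetical, not the actual icefall code): a warm-up schedule that ramps up over the first few thousand batches is effectively disabled if the batch counter is initialized to a very large value.

```python
def warmup_scale(batch_count: int, warmup_batches: int = 3000) -> float:
    """Toy linear warm-up: the scale grows from 0 to 1 over `warmup_batches`
    updates and then stays at 1."""
    return min(1.0, batch_count / warmup_batches)


# Normal training starts counting from 0, so warm-up is applied.
print(warmup_scale(0))          # 0.0
print(warmup_scale(1500))       # 0.5
# Initializing the counter to a huge value (as discussed above) means the
# scale is already 1.0 at the first update, i.e. warm-up is skipped.
print(warmup_scale(1_000_000))  # 1.0
```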
At 20k updates, my experiment had only reached the 8th epoch. Since we are both using 2 GPUs with the same max-duration, this looks a bit weird to me. But overall I think the WER discrepancy is reasonable, as your initialization is worse than mine.
We are using 3x speed perturb, which explains the total step difference.
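For context, "3x speed perturb" in lhotse-based recipes typically means keeping the original cuts together with 0.9x and 1.1x speed-perturbed copies, so each epoch sees roughly three times as many cuts (and hence roughly three times as many updates for the same max-duration). A minimal sketch, assuming lhotse's CutSet API and a hypothetical manifest path:

```python
from lhotse import CutSet

# Hypothetical manifest path; adjust to your own data dir.
cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")

# 3x speed perturbation: original cuts plus 0.9x and 1.1x copies.
cuts_3x = cuts + cuts.perturb_speed(0.9) + cuts.perturb_speed(1.1)
```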
@marcoyang1998 I want to train on a new dataset (larger than the previous one) starting from a previous checkpoint, which I think is quite similar to fine-tuning. The previous checkpoint used to take 4-5 days to train for one epoch. However, I've found that it now takes only about 2 days per epoch, even though the new dataset is larger. Therefore, I suspect there might be a problem somewhere in the dataloader's iteration over the larger dataset, causing the model to not actually traverse all the data during training.
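One way to sanity-check this, assuming the data is prepared as lhotse cut manifests (the path below is hypothetical), is to sum the durations in the manifest and compare the total against the amount of audio actually seen per epoch (roughly, number of batches per epoch × max-duration):

```python
from lhotse import CutSet

# Hypothetical manifest path; point this at your training cuts.
cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")

total_hours = sum(cut.duration for cut in cuts) / 3600.0
print(f"Manifest contains {total_hours:.1f} hours of training audio")
```

If the audio consumed per epoch is much less than this total, the sampler is indeed not traversing the full dataset.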
I haven't used the …
I think the …
Yes, in most cases.
This PR adds a fine-tune script for the recipe pruned_transducer_stateless7. It fine-tunes a model trained on LibriSpeech using GigaSpeech data. To do fine-tuning, you need to provide the path to the checkpoint from which the training will resume, and you also need to set --do-finetune True. Below is an example of fine-tuning on the GigaSpeech subset S:

The WERs on the GigaSpeech dev & test sets after fine-tuning are shown below. As a reference, the WERs of the same model trained on the GigaSpeech subset S are also shown:
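To summarize the mechanism described in this PR: with --do-finetune True, the model weights are initialized from the provided checkpoint before training starts, instead of starting from scratch. A minimal sketch of that idea only (the function name, checkpoint layout, and error handling below are assumptions, not the actual finetune script):

```python
import torch
import torch.nn as nn


def init_from_pretrained(model: nn.Module, ckpt_path: str) -> None:
    """Illustrative only: load pretrained weights before fine-tuning starts."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # icefall-style checkpoints usually keep the weights under a "model" key;
    # falling back to the raw dict is an assumption for other layouts.
    state_dict = checkpoint.get("model", checkpoint)
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    if missing or unexpected:
        print("missing keys:", missing)
        print("unexpected keys:", unexpected)
```

From there, training proceeds as usual, typically with a smaller learning rate than the one used for pre-training, as discussed earlier in this thread.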