Add mux to fine-tune recipe for pruned_transducer_stateless7 #1059
Conversation
@@ -104,6 +103,7 @@ def set_batch_count(model: Union[nn.Module, DDP], batch_count: float) -> None:


def add_finetune_arguments(parser: argparse.ArgumentParser):
    parser.add_argument("--do-finetune", type=str2bool, default=False)
    parser.add_argument("--use-mux", type=str2bool, default=False)
Could you please add some documentation for this argument?
Sure
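For reference, one possible shape for the documented arguments — the help strings below are illustrative suggestions, not the wording that was merged:

```python
import argparse

from icefall.utils import str2bool  # the recipe's boolean-flag parser


def add_finetune_arguments(parser: argparse.ArgumentParser):
    parser.add_argument(
        "--do-finetune",
        type=str2bool,
        default=False,
        help="Whether to fine-tune from a pretrained checkpoint "
        "instead of training from scratch.",
    )
    parser.add_argument(
        "--use-mux",
        type=str2bool,
        default=False,
        help="If True, mix the fine-tuning cuts with the original "
        "training cuts via CutSet.mux, to reduce forgetting of the "
        "original data.",
    )
```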
We should probably skip warmup even though it doesn't seem to help results, because it's a bit dangerous to include warmup code in an already-trained model (could cause regression if learning rate is too high).
I see. I will merge the skip-warmup version later.
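For context, a minimal sketch of what "skip warmup" could look like when fine-tuning, assuming warmup behaviour is driven by the batch count passed to the recipe's `set_batch_count` (the helper name, the `params.do_finetune` usage, and the 100,000 constant are illustrative, not the merged code):

```python
def maybe_skip_warmup(model, params, batch_idx_train: int) -> None:
    """Illustrative sketch: when fine-tuning an already-trained model,
    report a large batch count so that warmup-dependent modules use
    their fully trained behaviour instead of re-entering warmup."""
    if params.do_finetune:
        # Treat the model as fully warmed up.
        set_batch_count(model, 100_000.0)
    else:
        # Normal training: report the actual number of batches seen.
        set_batch_count(model, float(batch_idx_train))
```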
Docs: #1074
@yfyeung hello, could you please provide the exact fine-tuning command/config you used? There is only one difference on my side: I use pruned_transducer_stateless7_ctc as the pretrained model (without the CTC loss scale during fine-tuning). Also, why is epoch 11 avg 5 worse than epoch 11 avg 10 while the valid loss keeps decreasing? Do you know? Thanks.
Please have a look at #944.
This PR adds Lhotse's `mux` to the fine-tune recipe for pruned_transducer_stateless7. For adapting an already-trained model to new data, we mix 5% of the new/adaptation data with 95% of the original data via `CutSet.mux` in Lhotse and fine-tune on the mixture (a sketch of the mixing step follows below).

Our observation:
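A minimal sketch of the mixing step, assuming the manifests have already been prepared; the file paths and variable names are illustrative, while the 95/5 weighting is taken from the description above:

```python
from lhotse import CutSet

# Original training data (LibriSpeech) and new/adaptation data (GigaSpeech S).
# The manifest paths are examples only.
libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")
giga_cuts = CutSet.from_file("data/fbank/gigaspeech_cuts_S.jsonl.gz")

# Lazily interleave the two streams: roughly 95% original data,
# 5% adaptation data.
train_cuts = CutSet.mux(libri_cuts, giga_cuts, weights=[0.95, 0.05])
```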
Pretrained model and BPE model needed for fine-tuning:
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11
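One way to fetch that repository is with `huggingface_hub` (a sketch; the recipe's own instructions may use `git lfs` instead):

```python
from huggingface_hub import snapshot_download

# Download the pretrained checkpoint and BPE model listed above.
local_dir = snapshot_download(
    repo_id="csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11"
)
print("Pretrained files are in:", local_dir)
```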
Usage:
Here are some results of adapting a model trained with LibriSpeech on GigaSpeech S:
Note: we use the LR-related settings `--base-lr 0.005 --lr-epochs 100 --lr-batches 100000`.
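For context, these flags feed the recipe's ScaledAdam optimizer and Eden scheduler; with such large `--lr-epochs`/`--lr-batches`, the schedule stays close to the base LR for the whole (short) fine-tuning run. A rough, trimmed sketch of how the flags are consumed (the real recipe passes additional arguments to both constructors):

```python
from optim import Eden, ScaledAdam  # optim.py from the pruned_transducer_stateless7 recipe

optimizer = ScaledAdam(model.parameters(), lr=params.base_lr)      # --base-lr 0.005
scheduler = Eden(optimizer, params.lr_batches, params.lr_epochs)   # --lr-batches / --lr-epochs
```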
Baseline
Train from scratch on GigaSpeech S
Fine-tune from LibriSpeech
Note: It gets much higher WERs on LibriSpeech (about 3.1/7.1).
Full Result
Ablation Study
Skip warm-up