
Implementation of model average. #1

Open · wants to merge 13 commits into master

Conversation

yaozengwei
Owner

@yaozengwei yaozengwei commented May 1, 2022

This is an implementation of Dan's idea about model averaging (see k2-fsa#337).

@yaozengwei
Owner Author

yaozengwei commented May 2, 2022

The code is based on egs/librispeech/pruned_transducer_stateless2.
During training, the averaged model model_avg is updated every average_period batches with:
model_avg = (average_period / batch_idx_train) * model + ((batch_idx_train - average_period) / batch_idx_train) * model_avg
During decoding, let start = batch_idx_train of model-start and end = batch_idx_train of model-end. Then the model averaged over epochs [start+1, start+2, ..., end] is avg = (model_end * end - model_start * start) / (end - start).
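The two formulas can be sanity-checked with plain scalars standing in for model tensors (a minimal sketch; the real code operates on state dicts):

```python
def update_running_avg(param, param_avg, batch_idx_train, average_period):
    # Training-time update: param_avg is a running mean of the parameter
    # values sampled every average_period batches.
    t = batch_idx_train
    return (average_period / t) * param + ((t - average_period) / t) * param_avg


def average_over_range(avg_start, start, avg_end, end):
    # Decoding-time combination: recover the mean over batches
    # (start, end] from two running-average snapshots.
    return (avg_end * end - avg_start * start) / (end - start)


# Simulate a "parameter" whose value equals the batch index,
# with average_period = 1 so every batch is sampled.
snapshots = {}
avg = 0.0
for t in range(1, 11):
    avg = update_running_avg(float(t), avg, t, average_period=1)
    snapshots[t] = avg

# After t batches, avg is the mean of 1..t (snapshots[10] ~= 5.5).
# The subtraction trick over the range (4, 10] yields mean(5..10) = 7.5.
m = average_over_range(snapshots[4], 4, snapshots[10], 10)
```

The point of the subtraction trick is that only two checkpoints need to be stored to average over any range between them.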
When trained on train-clean-100 with 3 GPUs for 30 epochs and average_period=100, I got the following results with greedy search decoding:

  • decode with epoch-29, avg=5, 7.14 & 19.33 (without averaged model) -> 7.03 & 18.85 (with averaged model);
  • decode with epoch-29, avg=10, 6.99 & 18.93 (without averaged model) -> 6.91 & 18.65 (with averaged model).

When trained on full librispeech with 6 GPUs for 30 epochs and average_period=100, I got the following results with greedy search decoding:

  • decode with epoch-29, avg=5, 2.77 & 6.77 (without averaged model) -> 2.72 & 6.67 (with averaged model);
  • decode with epoch-29, avg=10, 2.78 & 6.68 (without averaged model) -> 2.74 & 6.67 (with averaged model).

"""
Usage:
(1) greedy search
./pruned_transducer_stateless2/decode.py \

@csukuangfj csukuangfj May 2, 2022


Suggested change
./pruned_transducer_stateless2/decode.py \
./pruned_transducer_stateless3/decode.py \

Also, please sync with the latest k2/icefall and rename it to pruned_transducer_stateless4.

Owner Author


Ok.

    model.load_state_dict(average_checkpoints(filenames, device=device))
else:
    assert params.iter == 0
    start = params.epoch - params.avg


Please add more doc to --use-average-model.
It is not clear from the current help info how it is used in the code.

filename_start = f"{params.exp_dir}/epoch-{start}.pt"
filename_end = f"{params.exp_dir}/epoch-{params.epoch}.pt"
logging.info(
    f"averaging modes over range with {filename_start} (excluded) "


Suggested change
f"averaging modes over range with {filename_start} (excluded) "
f"averaging models over range with {filename_start} (excluded) "

@@ -118,6 +126,10 @@ def load_checkpoint(

checkpoint.pop("model")

if model_avg is not None and "model_avg" in checkpoint:
    model_avg.load_state_dict(checkpoint["model_avg"], strict=strict)


Please add a log here, e.g., saying "loading averaged model".

Owner Author


ok.

Comment on lines 423 to 436
# Identify shared parameters. Two parameters are said to be shared
# if they have the same data_ptr
uniqued: Dict[int, str] = dict()
for k, v in avg.items():
    v_data_ptr = v.data_ptr()
    if v_data_ptr in uniqued:
        continue
    uniqued[v_data_ptr] = k

uniqued_names = list(uniqued.values())
for k in uniqued_names:
    avg[k] *= weight_end
    avg[k] += model_start[k] * weight_start


This part is almost the same as the above function. Please refactor it to reduce redundant code.
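One way to reduce the duplication (names hypothetical, sketched against the snippet above) would be to pull the data_ptr deduplication into a small helper that both averaging functions call:

```python
from typing import Dict, List


def unique_parameter_names(state_dict) -> List[str]:
    """Return one key per underlying storage.

    Parameters shared via weight tying have the same data_ptr; keeping
    only the first key per data_ptr ensures each tensor is scaled once.
    """
    uniqued: Dict[int, str] = {}
    for k, v in state_dict.items():
        ptr = v.data_ptr()
        if ptr not in uniqued:
            uniqued[ptr] = k
    return list(uniqued.values())
```

Both averaging loops could then iterate over unique_parameter_names(avg) instead of repeating the deduplication inline.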

parser.add_argument(
"--start-epoch",
type=int,
default=0,


Please change it so that epoch is counted from 1, not 0.
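A minimal sketch of the requested change (the help text is illustrative, not from the PR):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--start-epoch",
    type=int,
    default=1,  # epochs are counted from 1, not 0
    help="Resume training from this epoch. If larger than 1, the "
    "checkpoint saved at the end of the previous epoch is loaded.",
)

args = parser.parse_args([])
```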

def load_checkpoint_if_available(
    params: AttributeDict,
    model: nn.Module,
    model_avg: nn.Module = None,


Suggested change
model_avg: nn.Module = None,
model_avg: Optional[nn.Module] = None,

The return value of :func:`get_params`.
model:
The training model.
optimizer:


Please update the doc to include model_avg.

logging.info(f"Number of model parameters: {num_param}")

assert params.save_every_n >= params.average_period
model_avg: nn.Module = None


Suggested change
model_avg: nn.Module = None
model_avg: Optional[nn.Module] = None
