
Has anybody pre-trained successfully on En-De translation with MASS? #62

Open
ZhenYangIACAS opened this issue Sep 12, 2019 · 11 comments

@ZhenYangIACAS

I have tried a lot to reproduce the results on En-De, but I failed.

@StillKeepTry
Contributor

I have fixed some params in the uploaded model. Could you give it a try?

@ZhenYangIACAS
Author

I did not mean to reload the uploaded model. I want to reproduce the results from scratch (from pre-training to fine-tuning), but I failed at the pre-training stage. Anyway, which params did you fix?

@yuekai146

yuekai146 commented Jan 15, 2020

I also failed at pretraining from scratch. Here is my pre-training script.

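    # MASS pre-training: masked sequence-to-sequence pre-training (--mass_steps) on De/En monolingual data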
    export NGPU=8
    python3 -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
            --exp_name unsupMT_deen \
            --data_path /data/corpus/news_crawl/de-en/ \
            --lgs 'de-en' \
            --mass_steps 'de,en' \
            --encoder_only false \
            --emb_dim 512 \
            --n_layers 6 \
            --n_heads 8 \
            --dropout 0.1 \
            --attention_dropout 0.1 \
            --gelu_activation true \
            --tokens_per_batch 3000 \
            --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
            --epoch_size 200000 \
            --max_epoch 100 \
            --eval_bleu true \
            --word_mass 0.5 \
            --min_len 5 

I use 170,000,000 German sentences and 100,000,000 English sentences. Due to memory issues, I use Transformer-base instead of Transformer-big. Here is my fine-tuning script.

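    # Fine-tuning: unsupervised NMT with online back-translation (--bt_steps),
    # initialized from the pre-trained checkpoint via --reload_model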
    export NGPU=8
    ckpt_path="../base/dumped/unsupMT_deen/9g0eku48dy"
    python3 -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
            --exp_name unsupMT_deen_ft \
            --data_path /data/corpus/news_crawl/de-en/ \
            --lgs 'de-en' \
            --bt_steps 'de-en-de,en-de-en' \
            --encoder_only false \
            --emb_dim 512 \
            --n_layers 6 \
            --n_heads 8 \
            --dropout 0.1 \
            --attention_dropout 0.1 \
            --gelu_activation true \
            --tokens_per_batch 2000 \
            --batch_size 32 \
            --bptt 256 \
            --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
            --epoch_size 200000 \
            --max_epoch 30 \
            --save_periodic 1 \
            --eval_bleu true \
            --reload_model "$ckpt_path/checkpoint.pth,$ckpt_path/checkpoint.pth"

I could only get 22.86 BLEU points when translating German to English on newstest2016, which is far from what is reported in the paper.
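
For reference, one way to sanity-check that number outside the training loop is to score a detokenized hypothesis with sacrebleu; the file name hyp.de-en.detok below is a placeholder. As far as I can tell, --eval_bleu in this codebase reports tokenized multi-bleu.perl BLEU, so it is not directly comparable to detokenized sacrebleu scores.

    # Hypothetical scoring of a detokenized De->En hypothesis against newstest2016 (WMT16 test set)
    pip install sacrebleu
    cat hyp.de-en.detok | sacrebleu -t wmt16 -l de-en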

Could you give me some advice on pretraining from scratch and on how to fully reproduce your results?

@tdomhan

tdomhan commented Feb 5, 2020

@StillKeepTry Would you maybe be able to share the training log of the pre-trained models that are offered as downloads?

@StillKeepTry
Contributor

@tdomhan Here is the log.

@tdomhan

tdomhan commented Feb 5, 2020

Thanks!

@tdomhan

tdomhan commented Feb 6, 2020

@StillKeepTry quick question: the logs indicate that the training was done with 5 million sentences. Does this mean that the pretrained models offered were trained with a subset of the monolingual data?

@tdomhan

tdomhan commented Feb 10, 2020

@StillKeepTry Can you confirm that the provided pre-trained model was only trained with 5 million sentences?

@tdomhan

tdomhan commented Feb 20, 2020

@StillKeepTry Could you confirm that the pre-trained models provided are trained on a subsample? If so, did you randomly subsample the newscrawl data, or how were the 5 (or 50) million sentences selected?

@StillKeepTry
Contributor

@tdomhan It is trained on a subsample (50 million sentences). The corpus is first tokenized with mosesdecoder, then I remove the sentences whose length is greater than 175 after tokenization. Finally, I randomly choose 50M sentences from the tokenized data.
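
For concreteness, that pipeline roughly corresponds to something like the sketch below (paths and file names are placeholders, and the usual BPE/binarization steps from the repo's data scripts would follow).

    # Rough sketch of the described preprocessing for one language (English shown)
    MOSES=/path/to/mosesdecoder
    cat all.en \
        | perl $MOSES/scripts/tokenizer/tokenizer.perl -l en -threads 8 \
        | awk 'NF <= 175' \
        > all.tok.en

    # Randomly sample 50M of the remaining sentences
    shuf -n 50000000 all.tok.en > train.50M.en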

@tdomhan

tdomhan commented Feb 21, 2020

Thanks!
