I would like to ask how much monolingual data was used in each experiment.
In the paper, as well as in this issue, you mention that you use...
... all of the monolingual data from WMT News Crawl datasets, which covers 190M, 62M and 270M sentences from the year 2007 to 2017 for English, French, German respectively.
In get-data-nmt.sh, however, I see that the download links for the News Crawl data from many of the years are commented out for each language.
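For context, here is a minimal sketch (my own, not taken from the repository's script) of how one could enumerate the News Crawl archives for all years 2007-2017 per language, assuming the usual statmt.org file-naming scheme; the actual URLs and available years should be verified against statmt.org before downloading:

```shell
# Hypothetical sketch: list the WMT News Crawl archive URLs for
# English, French, and German, years 2007-2017 (the range the paper cites).
# The URL pattern below is an assumption based on statmt.org's naming
# convention and may differ for some years/languages.
for lang in en fr de; do
  for year in $(seq 2007 2017); do
    url="http://data.statmt.org/news-crawl/${lang}/news.${year}.${lang}.shuffled.deduped.gz"
    echo "$url"   # replace echo with, e.g.: wget -c "$url"
  done
done
```

This prints 33 URLs (3 languages x 11 years); swapping `echo` for `wget -c` would resume partial downloads.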
I may have missed something or misread the issues, but I am confused about how much data you actually used. I would appreciate it if you helped clear my confusion.
Thanks!
cbaziotis changed the title from "Confusion about the amount of data use in the experiments" to "Confusion about the amount of monolingual data used in the experiments" on Jul 20, 2020.
Same issue here: I am unable to tell which data is being used to reproduce the experiments. Could you please specify exactly how you created the data for pre-training and fine-tuning for en-fr, de-en, and en-ro?
Thank you very much. @StillKeepTry