[BUG]: nllb200_distilled_600M official not running properly #63

Closed
Jason-JP-Yang opened this issue Aug 2, 2024 · 0 comments · Fixed by #64

@Jason-JP-Yang

I'm using the official nllb200_distilled_600M model (loaded from the local cache, not downloaded for offline use) and running the following program:

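import dl_translate as dlt
import nltk

# Folder containing NLTK's "punkt" sentence-tokenizer data
nltk.data.path.append(r"E:\xxx\nltk_data")

# dlt.TranslationModel("nllb200") loads the same default NLLB checkpoint, so one load is enough
mt = dlt.TranslationModel("facebook/nllb-200-distilled-600M")
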
text = "This paper presents a literature survey on existing disparity map algorithms. It focuses on four main stages of processing as proposed by Scharstein and Szeliski in a taxonomy and evaluation of dense two-frame stereo correspondence algorithms performed in 2002. To assist future researchers in developing their own stereo matching algorithms, a summary of the existing algorithms developed for every stage of processing is also provided. The survey also notes the implementation of previous software-based and hardware-based algorithms. Generally, the main processing module for a software-based implementation uses only a central processing unit. By contrast, a hardware-based implementation requires one or more additional processors for its processing module, such as graphical processing unit or a field programmable gate array. This literature survey also presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mappings. "
# Split the paragraph into sentences, translate them as a batch, and join the results
sents = nltk.tokenize.sent_tokenize(text, "english")
print("".join(mt.translate(sents, source="eng_Latn", target="zho_Hans")))

Running it fails with the following error:

  File "E:\xxx\translation.py", line 16, in <module>
    print("".join(mt.translate(sents, source=dlt.lang.ENGLISH, target="zho_Hans")))
  File "E:\xxx\Anaconda3\envs\DLTranslation\lib\site-packages\dl_translate\_translation_model.py", line 173, in translate
    "forced_bos_token_id", self._tokenizer.lang_code_to_id[target]
AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'

Other models such as m2m100 work without any problem. I really need help with this!
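The traceback shows dl_translate reading `self._tokenizer.lang_code_to_id[target]`, an attribute the installed `NllbTokenizerFast` does not have. A possible local workaround (a sketch only, not necessarily what #64 changes) is to look up the target-language id with `convert_tokens_to_ids`, since NLLB language codes such as `zho_Hans` are ordinary tokens in the tokenizer vocabulary, and pass it as `forced_bos_token_id` when calling `transformers` directly. The helper names `resolve_lang_token_id` and `translate_with_nllb` are hypothetical, and the direct path assumes `transformers` and `torch` are installed.

```python
from typing import Any, List


def resolve_lang_token_id(tokenizer: Any, lang_code: str) -> int:
    """Return the vocabulary id of an NLLB language code such as "zho_Hans".

    Hypothetical helper: uses the old `lang_code_to_id` dict when the
    tokenizer still has one, otherwise falls back to `convert_tokens_to_ids`.
    """
    mapping = getattr(tokenizer, "lang_code_to_id", None)
    if mapping is not None:
        return mapping[lang_code]
    token_id = tokenizer.convert_tokens_to_ids(lang_code)
    # convert_tokens_to_ids maps unknown tokens to the unk id instead of
    # raising, so treat that result as an invalid language code.
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"unknown language code: {lang_code!r}")
    return token_id


def translate_with_nllb(
    sentences: List[str],
    source: str,
    target: str,
    model_name: str = "facebook/nllb-200-distilled-600M",
) -> List[str]:
    """Translate with transformers directly, bypassing dl_translate (hypothetical helper).

    transformers is imported lazily so resolve_lang_token_id can be used
    without it installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # src_lang tells the NLLB tokenizer which language code to prepend to the input
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=source)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    # The target language is selected by forcing its code as the first generated token
    outputs = model.generate(
        **inputs, forced_bos_token_id=resolve_lang_token_id(tokenizer, target)
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

In the script above, `"".join(translate_with_nllb(sents, "eng_Latn", "zho_Hans"))` would take the place of the `mt.translate(...)` call. The fallback keeps the helper working whether or not the installed tokenizer still exposes `lang_code_to_id`.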

@Galoist mentioned this issue Aug 19, 2024
@xhluca closed this as completed in #64 Sep 2, 2024