I am fine-tuning a LLaMA model and want to shrink the tokenizer vocabulary to reduce memory consumption. Specifically, I would like to:
Retain special tokens, English characters, symbols, and numbers.
Remove tokens related to other languages (as I don’t need them).
My questions are:
Is it feasible to shrink the tokenizer vocabulary in this way and still use a pre-trained model for fine-tuning without affecting its performance significantly?
What are the recommended approaches or tools for modifying the tokenizer vocabulary in such cases?
Are there any caveats I should be aware of when performing this adjustment (e.g., issues with token embeddings or alignment with the pre-trained model)?
Is it a good idea at all to reduce the vocabulary size? Can it meaningfully reduce memory consumption and make generation faster?
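For concreteness, here is a minimal sketch of the kind of pruning I have in mind. The toy vocabulary, the `keep` heuristic, and the random embedding matrix are all placeholders; in practice the vocabulary and embeddings would come from the pre-trained LLaMA checkpoint, and the kept rows would be copied into a resized embedding layer.

```python
import numpy as np

# Hypothetical toy setup: a 10-token vocabulary with 4-dim embeddings.
# In a real run these would come from the pre-trained checkpoint.
old_vocab = {"<s>": 0, "</s>": 1, "<unk>": 2, "the": 3, "cat": 4,
             "über": 5, "日本": 6, "7": 7, "!": 8, "naïve": 9}
old_embeddings = np.random.rand(len(old_vocab), 4)

def keep(token):
    # Placeholder heuristic: keep special tokens and ASCII-only tokens
    # (English letters, digits, symbols); drop everything else.
    return token.startswith("<") or token.isascii()

kept = sorted(i for t, i in old_vocab.items() if keep(t))
id_map = {old: new for new, old in enumerate(kept)}  # old id -> new id

# Reuse the pre-trained rows for kept tokens, so fine-tuning starts
# from the same learned representations.
new_embeddings = old_embeddings[kept]
new_vocab = {t: id_map[i] for t, i in old_vocab.items() if i in kept}

print(new_vocab)              # 7 surviving tokens with remapped ids
print(new_embeddings.shape)   # (7, 4)
```

The same remapping table would also have to be applied to the tokenizer's merge rules and to the model's output (LM head) layer, which is one of the alignment issues I am asking about above.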
Any guidance or references to similar implementations would be greatly appreciated.
Thank you!