Issues: huggingface/tokenizers
Cannot find package 'tokenizers-linux-x64-musl' - Alpine support
#1703 · opened Dec 14, 2024 by PylotLight
If split_special_tokens==True, fast_tokenizer is slower than slow_tokenizer
#1700 · opened Dec 12, 2024 by gongel
Request for a pre-tokenizer that creates words based on length alone
#1697 · opened Dec 10, 2024 by filbeofITK
How to determine the splicing logic in post_processor based on the sentence to be tokenized?
#1696 · opened Dec 5, 2024 by gongel
Bug: is_pretokenized is not used when calling tokenizer.encode(...)
#1695 · opened Nov 29, 2024 by jannessm
wikitext-103-raw-v1.zip is no longer available on amazonaws
#1683 · opened Nov 18, 2024 by gec1-dev
Out of memory when training a BBPE tokenizer on a large corpus
#1681 · opened Nov 14, 2024 by yucc-leon
Option to disable cache for FromPretrained and FromFile
#1680 · Feature Request · opened Nov 12, 2024 by daulet
Allow users to select/write encoding strategies
#1655 · Feature Request · opened Oct 16, 2024 by pietrolesci
Inconsistent behaviour of PreTrainedTokenizerFasts on diacritics marked texts
#1663 · bug · opened Oct 11, 2024 by sven-nm · 2 of 4 tasks
Disable pretty-print when saving tokenizer.json files
#1656 · Feature Request · opened Oct 7, 2024 by xenova
How to build a custom tokenizer on top of an existing Llama 3.2 tokenizer?
#1644 · training · opened Oct 5, 2024 by yakhyo
NormalizedString.clear() broken?
#1636 · bug · opened Sep 25, 2024 by lkurlandski