Package: tok 0.1.4.9000

Daniel Falbel

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is fast both at training new vocabularies and at tokenizing text.

Authors: Daniel Falbel [aut, cre], Regouby Christophe [ctb], Posit [cph]

tok_0.1.4.9000.tar.gz
tok_0.1.4.9000.zip (r-4.5), tok_0.1.4.9000.zip (r-4.4), tok_0.1.4.9000.zip (r-4.3)
tok_0.1.4.9000.tgz (r-4.4-x86_64), tok_0.1.4.9000.tgz (r-4.4-arm64), tok_0.1.4.9000.tgz (r-4.3-x86_64), tok_0.1.4.9000.tgz (r-4.3-arm64)
tok_0.1.4.9000.tar.gz (r-4.5-noble), tok_0.1.4.9000.tar.gz (r-4.4-noble)
tok.pdf | tok.html
tok/json (API)
NEWS

# Install 'tok' in R:
install.packages('tok', repos = c('https://mlverse.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/mlverse/tok/issues

20 exports, 40 stars, 3.04 score, 2 dependencies, 1 dependent, 7 scripts, 335 downloads

Last updated 15 hours ago from: 33ec18b1fc. Checks: OK: 9. Indexed: yes.

Target               Result   Date
Doc / Vignettes      OK       Sep 17 2024
R-4.5-win-x86_64     OK       Sep 17 2024
R-4.5-linux-x86_64   OK       Sep 17 2024
R-4.4-win-x86_64     OK       Sep 17 2024
R-4.4-mac-x86_64     OK       Sep 17 2024
R-4.4-mac-aarch64    OK       Sep 17 2024
R-4.3-win-x86_64     OK       Sep 17 2024
R-4.3-mac-x86_64     OK       Sep 17 2024
R-4.3-mac-aarch64    OK       Sep 17 2024

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6