Package: tok 0.1.5

Daniel Falbel

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It's extremely fast for both training new vocabularies and tokenizing texts.
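
For readers unfamiliar with 'Byte-Pair Encoding', the following is a minimal base R sketch of the training idea: repeatedly find the most frequent adjacent pair of symbols and merge it into one new symbol. It is an illustration only, not the 'tok' or 'Hugging Face' implementation. The function names (count_pairs, merge_pair, train_bpe) are hypothetical, and the sketch assumes the input words contain no spaces, because a space is used internally to separate the two halves of a pair.

```r
# Illustrative Byte-Pair Encoding training loop in base R.
# NOT the tok / Hugging Face implementation; all function names are hypothetical.

# Count adjacent symbol pairs across a list of symbol vectors.
# Pairs are encoded as "left right" (assumes symbols contain no spaces).
count_pairs <- function(words) {
  pairs <- unlist(lapply(words, function(w) {
    if (length(w) < 2) return(character(0))
    paste(w[-length(w)], w[-1], sep = " ")
  }))
  if (length(pairs) == 0) return(integer(0))
  table(pairs)
}

# Replace every non-overlapping occurrence of (left, right) in one word
# with the merged symbol, scanning left to right.
merge_pair <- function(w, left, right) {
  out <- character(0)
  i <- 1
  while (i <= length(w)) {
    if (i < length(w) && w[i] == left && w[i + 1] == right) {
      out <- c(out, paste0(left, right))
      i <- i + 2
    } else {
      out <- c(out, w[i])
      i <- i + 1
    }
  }
  out
}

# Learn up to n_merges merge rules from a character vector of words.
train_bpe <- function(corpus, n_merges) {
  words <- strsplit(corpus, "")  # start from single characters
  merges <- character(0)
  for (step in seq_len(n_merges)) {
    counts <- count_pairs(words)
    if (length(counts) == 0) break  # nothing left to merge
    # which.max returns the first maximum, so ties go to the pair
    # that sorts first in the table.
    best <- names(counts)[which.max(counts)]
    parts <- strsplit(best, " ", fixed = TRUE)[[1]]
    words <- lapply(words, merge_pair, left = parts[1], right = parts[2])
    merges <- c(merges, best)
  }
  list(merges = merges, words = words)
}

result <- train_bpe(c("low", "low", "lower"), n_merges = 2)
print(result$merges)
print(result$words)
```

A production tokenizer additionally handles byte-level fallback, word frequencies, special tokens and a fast merge lookup, which this sketch leaves out.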

Authors: Daniel Falbel [aut, cre], Regouby Christophe [ctb], Posit [cph]

tok_0.1.5.tar.gz
tok_0.1.5.zip (r-4.5), tok_0.1.5.zip (r-4.4), tok_0.1.5.zip (r-4.3)
tok_0.1.5.tgz (r-4.4-x86_64), tok_0.1.5.tgz (r-4.4-arm64), tok_0.1.5.tgz (r-4.3-x86_64), tok_0.1.5.tgz (r-4.3-arm64)
tok_0.1.5.tar.gz (r-4.5-noble), tok_0.1.5.tar.gz (r-4.4-noble)
tok.pdf | tok.html
tok/json (API)
NEWS

# Install 'tok' in R:
install.packages('tok', repos = c('https://mlverse.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/mlverse/tok/issues

6.10 score, 42 stars, 1 packages, 25 scripts, 57 downloads, 20 exports, 2 dependencies

Last updated 30 days ago from: ff883e2dba. Checks: OK: 9. Indexed: yes.

Target               Result   Date
Doc / Vignettes      OK       Nov 22 2024
R-4.5-win-x86_64     OK       Nov 22 2024
R-4.5-linux-x86_64   OK       Nov 22 2024
R-4.4-win-x86_64     OK       Nov 22 2024
R-4.4-mac-x86_64     OK       Nov 22 2024
R-4.4-mac-aarch64    OK       Nov 22 2024
R-4.3-win-x86_64     OK       Nov 22 2024
R-4.3-mac-x86_64     OK       Nov 22 2024
R-4.3-mac-aarch64    OK       Nov 22 2024

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6