Package: tok 0.2.2.9000

Tomasz Kalinowski

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It's extremely fast for both training new vocabularies and tokenizing text.

Authors: Tomasz Kalinowski [ctb, cre], Daniel Falbel [aut], Regouby Christophe [ctb], Posit [cph]

tok_0.2.2.9000.tar.gz
tok_0.2.2.9000.zip (r-4.7), tok_0.2.2.9000.zip (r-4.6), tok_0.2.2.9000.zip (r-4.5)
tok_0.2.2.9000.tgz (r-4.6-x86_64), tok_0.2.2.9000.tgz (r-4.6-arm64), tok_0.2.2.9000.tgz (r-4.5-x86_64), tok_0.2.2.9000.tgz (r-4.5-arm64)
tok_0.2.2.9000.tar.gz (r-4.7-arm64), tok_0.2.2.9000.tar.gz (r-4.7-x86_64), tok_0.2.2.9000.tar.gz (r-4.6-arm64), tok_0.2.2.9000.tar.gz (r-4.6-x86_64)
manual.pdf | manual.html
card.svg | card.png
tok/json (API)
NEWS

# Install 'tok' in R:
install.packages('tok', repos = c('https://mlverse.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/mlverse/tok/issues

rust, cargo

7.07 score, 47 stars, 1 package, 15 scripts, 11k downloads, 20 exports, 2 dependencies

Last updated from: f925ad65e3. Checks: 12 OK, 1 FAIL. Indexed: yes.

Target                Result  Time
linux-devel-arm64     OK      267
linux-devel-x86_64    OK      253
source / vignettes    OK      339
linux-release-arm64   OK      256
linux-release-x86_64  OK      290
macos-release-arm64   OK      209
macos-release-x86_64  OK      411
macos-oldrel-arm64    OK      176
macos-oldrel-x86_64   OK      413
windows-devel         OK      365
windows-release       OK      294
windows-oldrel        OK      295
wasm-release          FAIL    226

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6