Package: tok 0.2.2.9000

Tomasz Kalinowski

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is extremely fast both for training new vocabularies and for tokenizing texts.

Authors: Tomasz Kalinowski [ctb, cre], Daniel Falbel [aut], Regouby Christophe [ctb], Posit [cph]

Source package: tok_0.2.2.9000.tar.gz
Windows binaries (tok_0.2.2.9000.zip): r-4.7, r-4.6, r-4.5
macOS binaries (tok_0.2.2.9000.tgz): r-4.6-x86_64, r-4.6-arm64, r-4.5-x86_64, r-4.5-arm64
Linux binaries (tok_0.2.2.9000.tar.gz): r-4.7-arm64, r-4.7-x86_64, r-4.6-arm64, r-4.6-x86_64
Documentation: manual.pdf, manual.html, DESCRIPTION, NEWS

# Install 'tok' in R:
install.packages('tok', repos = c('https://mlverse.r-universe.dev', 'https://cloud.r-project.org'))
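Once installed, a minimal usage sketch looks like the following. It loads a pretrained tokenizer from the Hugging Face Hub and round-trips a string through it. This assumes network access, assumes the `tokenizer$from_pretrained()`, `$encode()` and `$decode()` methods behave as described in the package documentation, and assumes any optional download helper that `from_pretrained()` relies on (such as the `hfhub` package) is installed; the model id "gpt2" is only an example.

```r
library(tok)

# Download the GPT-2 tokenizer definition from the Hugging Face Hub
# (assumption: network access is available; "gpt2" is just an example id)
tok <- tokenizer$from_pretrained("gpt2")

text <- "Hello world! Tokenizers are really cool."

# encode() returns an `encoding` object; its $ids field holds the token ids
enc <- tok$encode(text)
ids <- enc$ids
print(ids)

# decode() maps the token ids back to a string
decoded <- tok$decode(ids)
print(decoded)
```

Because GPT-2 uses byte-level BPE, decoding the ids should reproduce the input text exactly.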

Bug tracker: https://github.com/mlverse/tok/issues

System requirements: rust, cargo

Score 6.10; 47 stars; 1 package; 18 scripts; 80 downloads; 20 exports; 2 dependencies

Last updated from: f925ad65e3. Checks: 12 OK, 1 FAIL. Indexed: yes.

Target                  Result  Time
linux-devel-arm64       OK      248
linux-devel-x86_64      OK      246
source / vignettes      OK      281
linux-release-arm64     OK      252
linux-release-x86_64    OK      240
macos-release-arm64     OK      158
macos-release-x86_64    OK      482
macos-oldrel-arm64      OK      211
macos-oldrel-x86_64     OK      466
windows-devel           OK      300
windows-release         OK      312
windows-oldrel          OK      335
wasm-release            FAIL    198

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6