This is a work-in-progress walkthrough of the OpenAI tiktoken package including Python examples of text tokenization, viewing the internal dictionary, multi-byte unicode, logit bias, and more. The ...
A lightweight Rust library for training GPT-style BPE tokenizers. The tiktoken library is excellent for inference but doesn't support training. The HuggingFace tokenizers library supports training but ...