sentencepiece

Unsupervised text tokenizer and detokenizer for NLP

Available via Homebrew (brew) on macOS and Linux.

About

Unsupervised text tokenizer and detokenizer

Commands

spm_train
spm_encode
spm_decode
spm_export_vocab

Examples

Train a SentencePiece model on a text corpus (with --model_prefix=model, this writes model.model and model.vocab):
$ spm_train --input=corpus.txt --model_prefix=model --vocab_size=8000

Encode text using the trained model:
$ spm_encode --model=model.model < input.txt > encoded.txt

Decode the tokenized text back to its original form:
$ spm_decode --model=model.model < encoded.txt > decoded.txt
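The encode/decode round trip in these examples can be illustrated with a minimal sketch, assuming the default unigram model type. In that setting, encoding picks the segmentation whose pieces have the highest total score (a Viterbi search), and decoding is lossless because whitespace is kept as an ordinary meta symbol, "▁" (U+2581), which is mapped back to spaces. The vocabulary, its scores, and the `encode`/`decode` functions below are hypothetical and exist only for illustration. They are not the sentencepiece library API, and a real model.model produced by spm_train would supply the pieces and scores.

```python
import math

# Meta symbol SentencePiece uses to represent whitespace (U+2581).
META = "\u2581"

# Hypothetical toy vocabulary with made-up log-probability scores.
# A real model trained with spm_train would provide these instead.
VOCAB = {
    META + "hello": -2.0,
    META + "world": -2.5,
    META: -3.0,
    "hell": -4.0,
    "o": -3.5,
    "w": -4.0,
    "orld": -5.0,
}


def normalize(text: str) -> str:
    """Replace spaces with the meta symbol and prepend one (dummy prefix)."""
    return META + text.replace(" ", META)


def encode(text: str) -> list[str]:
    """Viterbi search for the highest-scoring segmentation into vocab pieces."""
    s = normalize(text)
    n = len(s)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for prefix s[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = s[start:end]
            if piece in VOCAB and best[start] + VOCAB[piece] > best[end]:
                best[end] = best[start] + VOCAB[piece]
                back[end] = start
    if best[n] == -math.inf:
        # A real model falls back to an unknown-piece id; this sketch just fails.
        raise ValueError("text cannot be segmented with this toy vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        pieces.append(s[back[end]:end])
        end = back[end]
    return pieces[::-1]


def decode(pieces: list[str]) -> str:
    """Concatenate pieces, map the meta symbol to spaces, drop the leading space."""
    text = "".join(pieces).replace(META, " ")
    return text[1:] if text.startswith(" ") else text


if __name__ == "__main__":
    pieces = encode("hello world")
    print(pieces)          # ['▁hello', '▁world']
    print(decode(pieces))  # hello world
```

With these scores, "▁hello" + "▁world" (total -4.5) beats finer splits such as "▁" + "hell" + "o" for the first word, so the two whole-word pieces win. Because spaces survive as "▁" inside the pieces, decoding recovers the input text exactly.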