Unsupervised text tokenizer and detokenizer for NLP
spm_train
spm_encode
spm_decode
spm_export_vocab

$ spm_train --input=corpus.txt --model_prefix=model --vocab_size=8000
$ spm_encode --model=model.model < input.txt > encoded.txt
$ spm_decode --model=model.model < encoded.txt > decoded.txt
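The train, encode, and decode commands above form a pipeline, so it can help to run them in order and compare the decoded text with the original input. The following is a minimal sketch of that, not an official script. It assumes the tools are on PATH and that corpus.txt and input.txt exist in the current directory. The roundtrip function name is hypothetical, and whether the decoded output matches the input byte for byte depends on the input text and the model settings.

```shell
#!/bin/sh
# Sketch: run the three documented steps in order, then compare the result.
# corpus.txt and input.txt are the placeholder file names used above.

roundtrip() {
    # Train a model; with --model_prefix=model this writes model.model and model.vocab.
    spm_train --input=corpus.txt --model_prefix=model --vocab_size=8000 || return 1
    # Encode raw text from stdin into tokenized output.
    spm_encode --model=model.model < input.txt > encoded.txt || return 1
    # Decode the tokenized output back into text.
    spm_decode --model=model.model < encoded.txt > decoded.txt || return 1
    # cmp -s exits 0 only when the two files are byte-identical.
    cmp -s input.txt decoded.txt
}

# Run only when the command-line tools are installed.
if command -v spm_train >/dev/null 2>&1; then
    if roundtrip; then
        echo "round trip OK"
    else
        echo "a step failed or decoded.txt differs from input.txt"
    fi
else
    echo "spm_train not found on PATH; skipping"
fi
```

If training fails on a small corpus, a smaller --vocab_size value is the first thing to try.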