update runner doc (#778)

pytorch · Jul 17, 2024 · 001b279
1 parent c43135f
commit 001b279

Showing 2 changed files with 12 additions and 0 deletions.
diff --git a/docs/runner_build.md b/docs/native-execution.md
diff --git a/parking_lot/unsupported/runner-tokenizer.md b/parking_lot/unsupported/runner-tokenizer.md
@@ -0,0 +1,12 @@
+The SentencePiece tokenizer implementation for Python (developed by
+Google) and the C/C++ implementation (developed by Andrej Karpathy)
+use different input formats. The Python implementation reads a
+tokenizer specification in tokenizer.model format. The C/C++
+implementation reads the tokenizer instructions from a file in
+tokenizer.bin format. The XXXutilsXXX subdirectory includes Andrej's
+SentencePiece converter, which translates a SentencePiece tokenizer
+from tokenizer.model format to tokenizer.bin format:
+
+```
+python3 XXXutilsXXX/tokenizer.py --tokenizer-model=${MODEL_DIR}/tokenizer.model
+```