Unofficial bindings / ports in other languages #97

Open
hauntsaninja opened this issue Apr 5, 2023 · 11 comments

@hauntsaninja
Collaborator

hauntsaninja commented Apr 5, 2023

The following projects are not maintained by OpenAI. I cannot vouch that any of them are correct or safe to use. Use at your own risk.

Note that if a tokeniser fails to exactly match tiktoken's behaviour, you may get worse results when sampling from models, with no warning.

JavaScript

Rust

Java

Ruby

C#

Go

PHP

Kotlin

Thanks to everyone for building useful things!

I'm happy to link to other projects in this comment.

@hauntsaninja changed the title from "Unofficial bindings in other languages" to "Unofficial bindings / ports in other languages" on Apr 5, 2023
@hauntsaninja pinned this issue on Apr 5, 2023
@bluescreen10

👋,

I built a Go port, which you can find at the link below:

https://github.com/tiktoken-go/tokenizer

@fang2hou

fang2hou commented Apr 9, 2023

I am currently using another Go port:
https://github.com/pkoukk/tiktoken-go

@rex-remind101

Hello @hauntsaninja, I was looking at https://github.com/openai/tiktoken/blob/main/src/lib.rs and it appears to be written in Rust. Could this be open sourced into a crate of its own?

@hauntsaninja
Collaborator Author

See the FAQ #98

@danielcompton

@hauntsaninja would it be possible to publish the full test suite publicly? That would make it easier to tell whether a given implementation matches (or is close to) the official implementation.
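
As a rough sketch of what such a comparison could look like (not the official test suite, which is not public in this thread): run the same sample strings through a reference encoder and a port, and collect every input where the token IDs differ. The two encoders below are hypothetical stand-ins so the snippet runs with the standard library alone; in practice the reference would be something like `tiktoken.get_encoding("cl100k_base").encode`, and the candidate would be whatever function the port exposes.

```python
from typing import Callable, Iterable, List, Tuple

Tokenizer = Callable[[str], List[int]]


def find_mismatches(
    reference: Tokenizer,
    candidate: Tokenizer,
    samples: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, expected, actual) for every sample where the outputs differ."""
    mismatches = []
    for text in samples:
        expected = reference(text)
        actual = candidate(text)
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


# Hypothetical stand-in for the reference tokenizer: one token per UTF-8 byte.
def reference_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


# Hypothetical stand-in for a port with a bug: it emits Unicode code points
# instead of UTF-8 bytes, so it only agrees with the reference on ASCII input.
def buggy_port_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


# Samples chosen to include edge cases that ports often get wrong:
# empty input, leading whitespace, accented characters, CJK, and emoji.
SAMPLES = ["hello world", "", "  leading spaces", "naïve café", "日本語", "emoji 👍"]

if __name__ == "__main__":
    for text, expected, actual in find_mismatches(
        reference_encode, buggy_port_encode, SAMPLES
    ):
        print(f"MISMATCH {text!r}: expected {expected}, got {actual}")
```

A check like this only shows agreement on the samples it is given, so a shared public corpus of tricky inputs would still be needed to make results comparable across ports.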

@niieani

niieani commented Jun 1, 2023

Here's a pure JavaScript / TypeScript port of tiktoken: https://github.com/niieani/gpt-tokenizer
Playground online: https://gpt-tokenizer.dev

@shylockWu

This comment was marked as resolved.

@niieani

niieani commented Sep 5, 2023

@shylockWu they're not incorrect. You've set gpt-tokenizer to tokenize using the GPT-3.5/GPT-4 encoding, whereas the official OpenAI token calculator uses the older GPT-3 encoding. If you switch the playground to the older model, you'll get the same result.

@danny50610

👋

I wrote a PHP port; the link is here:

https://github.com/danny50610/bpe-tokeniser

@aallam

aallam commented Oct 11, 2023

I have built and published a port for Kotlin: https://github.com/aallam/ktoken :)

@Gabriella439

Pure Haskell implementation of tiktoken: https://hackage.haskell.org/package/tiktoken
