Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2 from petergoldstein/feature/add_github_actions_ci
Add CI with GitHub Actions
Showing 12 changed files with 118 additions and 69 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
name: build | ||
on: [push, pull_request] | ||
jobs: | ||
test: | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
os: [ubuntu-latest] | ||
ruby: [3.2, 3.1, "3.0"] | ||
runs-on: ${{ matrix.os }} | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: actions/cache@v3 | ||
with: | ||
path: | | ||
~/.cargo/registry | ||
~/.cargo/git | ||
tmp | ||
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }} | ||
- uses: ruby/setup-ruby@v1 | ||
with: | ||
ruby-version: ${{ matrix.ruby }} | ||
bundler-cache: true | ||
- run: bundle exec rake compile | ||
- run: bundle exec rake spec | ||
lint: | ||
strategy: | ||
matrix: | ||
os: [ubuntu-latest] | ||
runs-on: ${{ matrix.os }} | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: ruby/setup-ruby@v1 | ||
with: | ||
ruby-version: 3.1 | ||
bundler-cache: true | ||
- run: bundle exec rake standard |
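A note on the quoted `"3.0"` in the `ruby` matrix above: an unquoted `3.0` is a YAML float, not a string, so a consumer that stringifies the number may no longer see the text `3.0`. The sketch below uses Ruby's own YAML parser to show the type difference. The trailing unquoted `3.0` entry is a hypothetical addition for comparison and is not part of the workflow, and how GitHub Actions itself renders the float is not shown here.

```ruby
require "yaml"

# The first three entries mirror the workflow's matrix; the last, unquoted
# 3.0 is a hypothetical entry added only to contrast with the quoted one.
matrix = YAML.safe_load('ruby: [3.2, 3.1, "3.0", 3.0]')

versions = matrix["ruby"]
classes = versions.map(&:class)

# Quoted values stay strings; unquoted ones are parsed as floats.
versions.zip(classes).each do |value, klass|
  puts "#{value.inspect} parsed as #{klass}"
end
```

Quoting every version would sidestep the question entirely; the workflow quotes only the value whose trailing zero matters.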
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,6 +69,8 @@ GEM | |
|
||
PLATFORMS | ||
arm64-darwin-22 | ||
x86_64-darwin-22 | ||
x86_64-linux | ||
|
||
DEPENDENCIES | ||
pry (~> 0.14.2) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
require 'lib/tiktoken_ruby.rb' | ||
require "lib/tiktoken_ruby" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,51 +1,52 @@ | ||
# frozen_string_literal: true | ||
|
||
class Tiktoken::Encoding | ||
attr_reader :name | ||
|
||
# This returns a new Tiktoken::Encoding instance for the requested encoding | ||
# @param encoding [Symbol] The name of the encoding to load | ||
# @return [Tiktoken::Encoding] The encoding instance | ||
def self.for_name(encoding) | ||
Tiktoken::Encoding.new(Tiktoken::BpeFactory.send(encoding.to_sym), encoding.to_sym) | ||
end | ||
|
||
# This returns a Tiktoken::Encoding instance for the requested encoding | ||
# It will reuse an existing encoding if it's already been loaded | ||
# @param encoding [Symbol] The name of the encoding to load | ||
# @return [Tiktoken::Encoding] The encoding instance | ||
def self.for_name_cached(encoding) | ||
@encodings ||= {} | ||
@encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding) | ||
end | ||
|
||
# Encodes the text as a list of integer tokens. This encoding will encode special non text tokens | ||
# basically it's unescaped | ||
# @param text [String] The text to encode | ||
# @return [Array<Integer>] The encoded tokens | ||
def encode_ordinary(text) | ||
@ext_base_bpe.encode_ordinary(text) | ||
end | ||
|
||
# Encodes the text as a list of integer tokens. This encoding will treat special non text tokens | ||
# as text unless they're in the allowed_special array. It's basically like the text was escaped | ||
# @param text [String] The text to encode | ||
# @param allowed_special [Array<String>] An array of special tokens to allow | ||
# @return [Array<Integer>] The encoded tokens | ||
def encode(text, allowed_special: []) | ||
@ext_base_bpe.encode(text, allowed_special) | ||
end | ||
|
||
# Decodes the tokens back into text | ||
# @param tokens [Array<Integer>] The tokens to decode | ||
# @return [String] The decoded text | ||
def decode(tokens) | ||
@ext_base_bpe.decode(tokens) | ||
end | ||
|
||
private | ||
def initialize(ext_base_bpe, name) | ||
@ext_base_bpe = ext_base_bpe | ||
@name = name | ||
end | ||
attr_reader :name | ||
|
||
# This returns a new Tiktoken::Encoding instance for the requested encoding | ||
# @param encoding [Symbol] The name of the encoding to load | ||
# @return [Tiktoken::Encoding] The encoding instance | ||
def self.for_name(encoding) | ||
Tiktoken::Encoding.new(Tiktoken::BpeFactory.send(encoding.to_sym), encoding.to_sym) | ||
end | ||
|
||
# This returns a Tiktoken::Encoding instance for the requested encoding | ||
# It will reuse an existing encoding if it's already been loaded | ||
# @param encoding [Symbol] The name of the encoding to load | ||
# @return [Tiktoken::Encoding] The encoding instance | ||
def self.for_name_cached(encoding) | ||
@encodings ||= {} | ||
@encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding) | ||
end | ||
|
||
# Encodes the text as a list of integer tokens. This encoding will encode special non text tokens | ||
# basically it's unescaped | ||
# @param text [String] The text to encode | ||
# @return [Array<Integer>] The encoded tokens | ||
def encode_ordinary(text) | ||
@ext_base_bpe.encode_ordinary(text) | ||
end | ||
|
||
# Encodes the text as a list of integer tokens. This encoding will treat special non text tokens | ||
# as text unless they're in the allowed_special array. It's basically like the text was escaped | ||
# @param text [String] The text to encode | ||
# @param allowed_special [Array<String>] An array of special tokens to allow | ||
# @return [Array<Integer>] The encoded tokens | ||
def encode(text, allowed_special: []) | ||
@ext_base_bpe.encode(text, allowed_special) | ||
end | ||
|
||
# Decodes the tokens back into text | ||
# @param tokens [Array<Integer>] The tokens to decode | ||
# @return [String] The decoded text | ||
def decode(tokens) | ||
@ext_base_bpe.decode(tokens) | ||
end | ||
|
||
private | ||
|
||
def initialize(ext_base_bpe, name) | ||
@ext_base_bpe = ext_base_bpe | ||
@name = name | ||
end | ||
end |
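The difference between `for_name` and `for_name_cached` in the class above is plain memoization in a class-level hash. The sketch below reproduces that pattern in a form that runs without the native extension. `SketchBpeFactory`, `CodepointBpe`, `SketchEncoding`, and the `toy_codepoints` encoding are hypothetical stand-ins invented for illustration; they are not part of tiktoken_ruby, and the "tokens" are just Unicode codepoints rather than real BPE output.

```ruby
# Hypothetical stand-in for Tiktoken::BpeFactory (the real one is backed by a
# native extension). Its only "encoding" maps characters to codepoints.
module SketchBpeFactory
  class CodepointBpe
    def encode(text, _allowed_special)
      text.codepoints
    end

    def decode(tokens)
      tokens.pack("U*")
    end
  end

  def self.toy_codepoints
    CodepointBpe.new
  end
end

# Mirrors the shape of Tiktoken::Encoding from the diff, under a different name.
class SketchEncoding
  attr_reader :name

  # Always builds a fresh instance.
  def self.for_name(encoding)
    new(SketchBpeFactory.public_send(encoding.to_sym), encoding.to_sym)
  end

  # Reuses an instance if this encoding name was requested before.
  def self.for_name_cached(encoding)
    @encodings ||= {}
    @encodings[encoding.to_sym] ||= for_name(encoding)
  end

  def initialize(ext_base_bpe, name)
    @ext_base_bpe = ext_base_bpe
    @name = name
  end

  def encode(text, allowed_special: [])
    @ext_base_bpe.encode(text, allowed_special)
  end

  def decode(tokens)
    @ext_base_bpe.decode(tokens)
  end
end

cached_a = SketchEncoding.for_name_cached(:toy_codepoints)
cached_b = SketchEncoding.for_name_cached("toy_codepoints") # String or Symbol
fresh = SketchEncoding.for_name(:toy_codepoints)

tokens = cached_a.encode("hi")
round_trip = cached_a.decode(tokens)

puts cached_a.equal?(cached_b) # same object from the cache
puts cached_a.equal?(fresh)    # for_name builds a new object each time
```

Because the cache key is `encoding.to_sym`, a String and a Symbol naming the same encoding share one cached instance. The `||=` memoization is not synchronized, so two threads racing on the first lookup could each build an instance; whether that matters for the real gem is not addressed by this commit.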
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,9 +9,9 @@ Gem::Specification.new do |spec| | |
spec.email = ["[email protected]"] | ||
|
||
spec.summary = "Ruby wrapper for Tiktoken" | ||
spec.description = "An unofficial Ruby wrapper for Tiktoken, " + | ||
"a BPE tokenizer written by and used by OpenAI. It can be used to " + | ||
"count the number of tokens in text before sending it to OpenAI APIs." | ||
spec.description = "An unofficial Ruby wrapper for Tiktoken, " \ | ||
"a BPE tokenizer written by and used by OpenAI. It can be used to " \ | ||
"count the number of tokens in text before sending it to OpenAI APIs." | ||
|
||
spec.homepage = "https://github.com/IAPark/tiktoken_ruby" | ||
spec.license = "MIT" | ||
|
@@ -22,8 +22,8 @@ Gem::Specification.new do |spec| | |
spec.metadata["homepage_uri"] = spec.homepage | ||
spec.metadata["source_code_uri"] = "https://github.com/IAPark/tiktoken_ruby" | ||
spec.metadata["documentation_uri"] = "https://rubydoc.info/github/IAPark/tiktoken_ruby/main" | ||
#spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here." | ||
|
||
# spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here." | ||
|
||
# Specify which files should be added to the gem when it is released. | ||
# The `git ls-files -z` loads the files in the RubyGem that have been added into git. | ||
|
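The `spec.description` change in the first gemspec hunk swaps `+` for a trailing backslash between the string pieces. A minimal sketch of the two forms follows, using a shortened version of the description text. The resulting strings are equal; the backslash form joins adjacent literals when the file is parsed, while `+` calls `String#+` when the line runs. That this change was made to satisfy the `rake standard` lint step added in this commit is an assumption, not something the diff states.

```ruby
# Runtime concatenation: each + is a String#+ method call.
with_plus = "An unofficial Ruby wrapper for Tiktoken, " +
  "a BPE tokenizer written by and used by OpenAI."

# Adjacent literals joined by a line-continuation backslash:
# the parser treats them as one string literal.
with_backslash = "An unofficial Ruby wrapper for Tiktoken, " \
  "a BPE tokenizer written by and used by OpenAI."

same_text = with_plus == with_backslash
puts same_text
```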