Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tiktoken example from readme.md giving trait bound error related to ChunkSizer. #27

Closed
vamseekm opened this issue Jul 2, 2023 · 6 comments

Comments

@vamseekm
Copy link

vamseekm commented Jul 2, 2023

Describe the bug
tiktoken example from readme.md giving trait bound error related to ChunkSizer.

To Reproduce
Steps to reproduce the behavior:
Try to run the code block from readme.md related to tiktoken.

Expected behavior
Should compile without errors.

Desktop (please complete the following information):

  • OS: MacOS.

Additional context
Below is the error that I am getting.

13 | let splitter = TextSplitter::new(tokenizer);
| ----------------- ^^^^^^^^^ the trait ChunkSizer is not implemented for CoreBPE
| |
| required by a bound introduced by this call
|
= help: the following other types implement trait ChunkSizer:
Characters
tiktoken_rs::vendor_tiktoken::CoreBPE
note: required by a bound in TextSplitter::<S>::new
--> /Users/xxx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/text-splitter-0.4.1/src/lib.rs:294:8
|
294 | S: ChunkSizer,
| ^^^^^^^^^^ required by this bound in TextSplitter::<S>::new

error[E0277]: the trait bound CoreBPE: ChunkSizer is not satisfied
--> src/chunker/chunker.rs:13:20
|
13 | let splitter = TextSplitter::new(tokenizer);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait ChunkSizer is not implemented for CoreBPE
|
= help: the following other types implement trait ChunkSizer:
Characters
tiktoken_rs::vendor_tiktoken::CoreBPE

@vamseekm
Copy link
Author

vamseekm commented Jul 2, 2023

Even tests are failing for this library.

@benbrandt
Copy link
Owner

Thanks for reporting, and apologies. It seems I forgot to document that these tokenizers require features to be enabled.

So if you enable the feature tiktoken-rs in your Cargo.toml then it will include the trait impl you need.

text-splitter = { version = "0.4.1", features = ["tiktoken-rs"] }

The reason is so that not every user has to depend on every tokenizer crate.

But I will make sure to add more documentation about the features, thanks again!

@vamseekm
Copy link
Author

vamseekm commented Jul 2, 2023

Thanks for the reply. Already enabled the feature flag and still failing with the feature flag enabled.

These are the dependencies that I am using in my cargo.toml.

text-splitter = {version= "0.4.1", features=["tiktoken-rs"]}

tiktoken-rs = "0.5.0"

@vamseekm
Copy link
Author

vamseekm commented Jul 2, 2023

Downgrading tiktoken-rs from 0.5.0 to 0.4.2 (the one used by) text-splitter seems to have resolved the error. I no longer see trait bound errors.

@benbrandt
Copy link
Owner

I don't know what you mean by the "tests are failing", but yes tiktoken-rs has released a new version, and I need to release a new version that supports it.

I can't assume every breaking change by them is compatible so unfortunately I need to verify before I allow the new version range.

I will do a release with a more flexible version range soon.

@benbrandt
Copy link
Owner

The reason this happened is that cargo resolved to you needing two versions of tiktoken, with only one of them getting the trait impl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants