can't pack a repo due to presence of special token: <|endoftext|> #89

Closed
Shubxam opened this issue Sep 28, 2024 · 4 comments

Comments

Shubxam commented Sep 28, 2024

I'm working with a repo that contains a Hugging Face model. I tried ignoring the folders that might cause the problem, but I'm still unable to pack it.

[Image: Screenshot 2024-09-28 at 19 50 58]

yamadashy (Owner) commented Sep 29, 2024

Thank you for reporting this issue, @Shubxam!

It seems the error is related to token counting, specifically to the special token <|endoftext|>, which is used by some NLP models.

I've addressed this in our latest release, version 0.1.39. In this update, I've changed how Repopack handles token counting errors:

  • Instead of stopping the process, it now logs a warning and continues.
  • Files that cause token counting errors are assigned a token count of 0.

https://github.com/yamadashy/repopack/releases/tag/v0.1.39

This should allow Repopack to complete the packing process even when encountering these special tokens.
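
For readers who want to see the shape of this fallback, here is a minimal sketch of the "warn and count as 0" behavior described above. It is not Repopack's actual code (Repopack is not written in Python), and every name in it (`strict_count`, `safe_count`, `count_all`) is hypothetical. The stand-in tokenizer simply splits on whitespace and refuses any text containing `<|endoftext|>`, which is an assumption made only to reproduce the failure.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("pack_sketch")

SPECIAL_TOKEN = "<|endoftext|>"


def strict_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: rejects text containing a special token."""
    if SPECIAL_TOKEN in text:
        raise ValueError(f"disallowed special token: {SPECIAL_TOKEN}")
    # Whitespace splitting is a placeholder for real tokenization.
    return len(text.split())


def safe_count(path: str, text: str,
               count: Callable[[str], int] = strict_count) -> int:
    """Count tokens; on failure, log a warning and report 0 instead of aborting."""
    try:
        return count(text)
    except Exception as err:
        logger.warning("Token counting failed for %s: %s", path, err)
        return 0


def count_all(files: Dict[str, str]) -> Dict[str, int]:
    """Count every file, so one problematic file cannot stop the whole run."""
    return {path: safe_count(path, text) for path, text in files.items()}
```

With this shape, a file such as a tokenizer config that contains `<|endoftext|>` shows up with a count of 0 while the remaining files are counted normally. The trade-off is that the reported total undercounts whenever a file is skipped this way.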

yamadashy (Owner) commented

@Shubxam Hi there!
Just checking in about the special token issue. We've released v0.1.39 which should address this problem. Could you please try it out and let us know if it resolves your issue? Thanks!

Shubxam (Author) commented Oct 18, 2024

Yes, I can confirm that the issue has been fixed and I can pack the repo without any errors. Thanks, @yamadashy!

yamadashy (Owner) commented

@Shubxam Thank you for confirming! I'm glad to hear that the issue has been resolved.

I'll go ahead and close this issue now.
If you encounter any other issues or have any suggestions in the future, please don't hesitate to open a new issue.

Thanks again for your contribution!
