Unable to find ENCODING_CONSTRUCTORS #61

Closed
dheerajiiitv opened this issue Mar 14, 2023 · 6 comments · May be fixed by #152

Comments

@dheerajiiitv

This happens when building tiktoken with Bazel:

```
    enc = tiktoken.get_encoding(encoder)
  File "/private/var/tmp/_bazel_dheerajagrawal/795e110180f9443c94e6bab86cf49f84/execroot/main/bazel-out/darwin_arm64-fastbuild/bin/metadata_extraction/functions/server/app.runfiles/main/utils/pypi/tiktoken_0.3.0/site-packages/tiktoken/registry.py", line 56, in get_encoding
    _find_constructors()
  File "/private/var/tmp/_bazel_dheerajagrawal/795e110180f9443c94e6bab86cf49f84/execroot/main/bazel-out/darwin_arm64-fastbuild/bin/metadata_extraction/functions/server/app.runfiles/main/utils/pypi/tiktoken_0.3.0/site-packages/tiktoken/registry.py", line 36, in _find_constructors
    raise ValueError(
ValueError: tiktoken plugin tiktoken_ext.__pycache__ does not define ENCODING_CONSTRUCTORS
```

@hauntsaninja
Collaborator

What's the best way for me to reproduce this error?

@hauntsaninja closed this as not planned on Apr 5, 2023
@dheerajiiitv
Author

dheerajiiitv commented Jun 23, 2023

Hi @hauntsaninja,
I found the cause. The issue shows up specifically under Bazel, and it is caused by a `__pycache__` module being created inside `tiktoken_ext`.

Here is a script that shows it:


```python
import importlib
import pkgutil

import tiktoken_ext

if __name__ == "__main__":
    # Mirrors the plugin discovery in tiktoken/registry.py (_find_constructors):
    # every submodule of tiktoken_ext is expected to define ENCODING_CONSTRUCTORS.
    plugin_mods = pkgutil.iter_modules(tiktoken_ext.__path__, tiktoken_ext.__name__ + ".")

    for _, mod_name, _ in plugin_mods:
        mod = importlib.import_module(mod_name)
        try:
            constructors = mod.ENCODING_CONSTRUCTORS
            print(constructors)
        except AttributeError:
            print("Error", mod)
```


Here's the output:

First loop value:

```
Error <module 'tiktoken_ext.__pycache__' from '/tiktoken_0.3.1/site-packages/tiktoken_ext/__pycache__/__init__.py'>
```

Second loop value:

```
{'gpt2': <function gpt2 at 0x104b74dc0>, 'r50k_base': <function r50k_base at 0x106386670>, 'p50k_base': <function p50k_base at 0x106386700>, 'p50k_edit': <function p50k_edit at 0x106386790>, 'cl100k_base': <function cl100k_base at 0x106386820>}
```
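This can likely be reproduced without Bazel. In CPython, `pkgutil.iter_modules` reports a subdirectory as a package only when that directory contains an `__init__` file, and the path in the output above ends in `tiktoken_ext/__pycache__/__init__.py`. The sketch below assumes that stray `__init__.py` is the trigger and simulates it by hand in a temporary directory; the helper `list_plugin_names` and the directory layout are hypothetical stand-ins for the installed `tiktoken_ext` package.

```python
import os
import pkgutil
import tempfile


def list_plugin_names(with_init_in_pycache: bool) -> list:
    """Build a throwaway plugin directory and return what pkgutil.iter_modules reports."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-in for site-packages/tiktoken_ext: one real plugin module.
        with open(os.path.join(root, "openai_public.py"), "w") as f:
            f.write("ENCODING_CONSTRUCTORS = {}\n")
        pycache = os.path.join(root, "__pycache__")
        os.makedirs(pycache)
        if with_init_in_pycache:
            # Simulates the stray file seen above: tiktoken_ext/__pycache__/__init__.py
            open(os.path.join(pycache, "__init__.py"), "w").close()
        # Consume the iterator while the temporary directory still exists.
        return [name for _, name, _ in pkgutil.iter_modules([root], "tiktoken_ext.")]


print(list_plugin_names(False))  # ['tiktoken_ext.openai_public']
print(list_plugin_names(True))   # ['tiktoken_ext.__pycache__', 'tiktoken_ext.openai_public']
```

With the empty `__pycache__` directory it is ignored; once it holds an `__init__.py`, it is listed first and then fails the `ENCODING_CONSTRUCTORS` lookup.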

@rishabh-sagar-20

Any update on this @hauntsaninja?

@azmathr

azmathr commented Feb 15, 2024

@hauntsaninja I get this error in AWS Lambda when deploying through Zappa:

```
  File "/var/task/summaryserver.py", line 13, in <module>
    encoding = tiktoken.get_encoding("cl100k_base")
  File "/var/task/tiktoken/registry.py", line 64, in get_encoding
    _find_constructors()
  File "/var/task/tiktoken/registry.py", line 44, in _find_constructors
```

@flying-sheep

The problem is not that `__pycache__` exists; that's normal. The problem is that `pkgutil.iter_modules` isn't smart enough to skip it, so you should skip it yourself.
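Along those lines, a plugin loader could drop `__pycache__` before importing anything. The sketch below follows the same loop as the reproduction script earlier in the thread; `find_constructors`, `demo_ext`, and `demo_base` are hypothetical names, and this is only an illustration, not necessarily what tiktoken or #152 does. The usage part builds a throwaway namespace package containing a stray `__pycache__/__init__.py`.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
from types import ModuleType
from typing import Callable, Dict


def find_constructors(namespace_pkg: ModuleType) -> Dict[str, Callable]:
    """Collect ENCODING_CONSTRUCTORS from each plugin submodule, skipping __pycache__."""
    constructors: Dict[str, Callable] = {}
    prefix = namespace_pkg.__name__ + "."
    for _, mod_name, _ in pkgutil.iter_modules(namespace_pkg.__path__, prefix):
        # iter_modules reports __pycache__ as a package if an __init__.py ended up
        # inside it; it never holds a real plugin, so skip it before importing.
        if mod_name.rpartition(".")[2] == "__pycache__":
            continue
        mod = importlib.import_module(mod_name)
        try:
            plugin_constructors = mod.ENCODING_CONSTRUCTORS
        except AttributeError as e:
            # Real plugins that forget the attribute still fail loudly.
            raise ValueError(f"plugin {mod_name} does not define ENCODING_CONSTRUCTORS") from e
        constructors.update(plugin_constructors)
    return constructors


# Usage: hypothetical namespace package "demo_ext" (no top-level __init__.py)
# with one plugin plus a stray __pycache__/__init__.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_ext")
os.makedirs(os.path.join(pkg_dir, "__pycache__"))
with open(os.path.join(pkg_dir, "plugin_a.py"), "w") as f:
    f.write("ENCODING_CONSTRUCTORS = {'demo_base': dict}\n")
open(os.path.join(pkg_dir, "__pycache__", "__init__.py"), "w").close()

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_ext = importlib.import_module("demo_ext")
found = find_constructors(demo_ext)
print(sorted(found))  # ['demo_base']
```

Filtering on the last name component avoids importing `__pycache__` at all. Catching the missing attribute and silently skipping would also get past this error, but it would hide genuinely broken plugins.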

@dheerajiiitv
Author

@flying-sheep please make them merge it.
