
Pickle issue with load_ParametricUMAP #1134

Open
eafpres opened this issue Jun 19, 2024 · 5 comments

Comments

@eafpres

eafpres commented Jun 19, 2024

Describe the bug

gunicorn log, Jun 19 17:36:52:

  File "/var/app/current/application.py", line 478, in load_stuff
    model = load_ParametricUMAP(model_set + '/' + full_name,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/umap/parametric>
    model = pickle.load((open(model_output, "rb")))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/numba/core/seri>
    ctor, states = loads(serialized)
                   ^^^^^^^^^^^^^^^^^
TypeError: code() argument 13 must be str, not int

To Reproduce
Steps to reproduce the behavior:
Ubuntu 20.04
Python 3.11
umap-learn==0.5.3

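  1. Create an embedding:
  import umap  # umap.ParametricUMAP requires TensorFlow to be installed

  # model_vectors, data_path and date_prefix are defined elsewhere in the original script
  distance = 'sokalsneath'
  op_mix_ratio = 0.3
  embed_dim = 10
  reducer = umap.ParametricUMAP(random_state=42,
                                transform_seed=42,
                                n_neighbors=15,
                                n_epochs=500,
                                metric=distance,
                                min_dist=0.0,
                                set_op_mix_ratio=op_mix_ratio,
                                n_components=embed_dim)
  mapper = reducer.fit(model_vectors)
  # save() writes a pickled model.pkl under this path; load_ParametricUMAP() unpickles it
  mapper.save(data_path + '/' + date_prefix + '/' +
              date_prefix + '_umap_mapper.umap')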
  2. Attempt to load the model on a different Linux machine using load_ParametricUMAP(); this raises the TypeError shown above.

Expected behavior
Loading should succeed; the same code worked on another machine. I believe this is a subtle pickle issue. I had similar problems with other pickle files, which I solved with pickle.dump(obj, open(filename, 'wb'), protocol=2). I have not found a way to make umap use that protocol.
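A minimal sketch of the protocol-2 workaround mentioned above, using only the standard library. `dump_with_protocol` and `detect_protocol` are hypothetical helper names, not part of umap. Note the file has to be opened in binary write mode (`"wb"`); `open(filename)` on its own opens it read-only in text mode. It appears that umap-learn 0.5.x's `save()` calls `pickle.dump` with `pickle.HIGHEST_PROTOCOL` and exposes no protocol argument. Also, the protocol only changes how the pickle stream is encoded, so it may not affect the `code()` error in the traceback.

```python
import os
import pickle
import pickletools
import tempfile


def dump_with_protocol(obj, path, protocol=2):
    """Pickle obj to path using an explicit protocol (hypothetical helper)."""
    # pickle.dump needs a file opened in binary write mode.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=protocol)


def detect_protocol(path):
    """Return the protocol number of a pickle file, or None for protocol 0/1."""
    with open(path, "rb") as fh:
        # Only the first opcode is needed, so stop after one step of the generator.
        first_opcode, arg, _pos = next(pickletools.genops(fh))
    # Protocols 2 and later begin with a PROTO opcode carrying the version number;
    # protocols 0 and 1 have no PROTO opcode, so they cannot be told apart here.
    return arg if first_opcode.name == "PROTO" else None


path = os.path.join(tempfile.mkdtemp(), "example.pkl")
dump_with_protocol({"n_components": 10}, path, protocol=2)
print(detect_protocol(path))  # -> 2
```

Running `detect_protocol` on an existing pickle shows which protocol wrote it.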

Desktop:

  • OS: Windows 11 Pro, running WSL 2 with Ubuntu 20.04
@eafpres
Author

eafpres commented Jun 20, 2024

Update: this may be a Python 3.11-related issue. I tested downgrading the server to Python 3.9, and things seem to work then. I also tried installing Python 3.11 on my dev system and re-saving the model, but I still got the error on the Python 3.11 server.
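One reading consistent with this: the traceback fails inside numba while it rebuilds a Python code object, and code objects are not portable across Python minor versions. In Python 3.11 the `code()` constructor takes `qualname` (a `str`) as argument 13, where Python 3.8 to 3.10 expect `firstlineno` (an `int`) in that position, which matches the `TypeError` message. If that is the cause, the model file has to be written and read under the same Python minor version. Below is a minimal sketch of a guard that records the interpreter version next to a pickle and refuses to load it under a different one; `save_with_version` and `load_checked` are hypothetical helpers, not umap APIs.

```python
import os
import pickle
import sys
import tempfile


def _current_version():
    # Major and minor version only, e.g. (3, 11).
    return tuple(sys.version_info[:2])


def save_with_version(obj, path):
    """Write a small version-header pickle, then the object (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(_current_version(), fh)
        pickle.dump(obj, fh)


def load_checked(path):
    """Read the header first; only unpickle the payload if the versions match."""
    with open(path, "rb") as fh:
        saved = pickle.load(fh)
        current = _current_version()
        if saved != current:
            raise RuntimeError(
                "pickled under Python %d.%d but loading under %d.%d"
                % (saved + current)
            )
        return pickle.load(fh)


path = os.path.join(tempfile.mkdtemp(), "model_with_version.pkl")
save_with_version({"n_components": 10}, path)
print(load_checked(path))  # -> {'n_components': 10}
```

Because the header is read before the payload, a version mismatch surfaces as a clear error before any code object is reconstructed.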

@timsainb
Collaborator

Hey, can you try the branch in #1123 to see if it resolves the issue on Python 3.11?

@kobiche

kobiche commented Aug 21, 2024

I can confirm this is related to the Python version. How should I proceed?

@rantoniuk

@timsainb I can see #1123 has conflicts to be resolved. Is it in good enough shape that I could build a custom version from it to see whether it solves the issue, or do you want to rebase first?

@timsainb
Collaborator

We are just about to pull in an updated version of Parametric UMAP (#1153), so my plan is to wait until that is merged before integrating #1123.
