Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent release brings in all globals #202

Closed
mrocklin opened this issue Sep 14, 2018 · 8 comments
Closed

Recent release brings in all globals #202

mrocklin opened this issue Sep 14, 2018 · 8 comments
Labels

Comments

@mrocklin
Copy link
Contributor

The following test started to fail in the latest 0.5.6 release:

import numpy as np
import cloudpickle

def test_pickle_globals():
    def f(x):
        return np.sin(x) + np.cos(x)  # requires access to global numpy

    def g():  # some other object not necessary for f
        pass

    d = cloudpickle.loads(cloudpickle.dumps(f)).__globals__
    assert 'g' not in d
    assert set(d) == {'np', '__builtins__'}

It now seems that cloudpickle brings in many more globals into the serialization.

In [6]: set(d)
Out[6]: 
{'In',
 'Out',
 '_',
 '_5',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'cloudpickle',
 'd',
 'exit',
 'f',
 'g',
 'get_ipython',
 'np',
 'quit'}

This is causing dask tests to fail

@ogrisel
Copy link
Contributor

ogrisel commented Sep 14, 2018

This is a consequence of #188 followed by #198.

I believe #188 already introduced the issue but only for functions defined in the main while #198 propagated it to functions defined in a local scope of a regular module.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 14, 2018

This test does not really check that those extra objects where included in the pickle though. The __globals__ attribute of the unpickled function is now repopulated with the references of the module where it is unpickled. To check that we did not include any extra unwanted object, we need to depickle in a new vanilla python process.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 14, 2018

So maybe it's not a bug :) But we need to improve the tests in cloudpickle to make sure that we did no introduce a regression.

@mrocklin
Copy link
Contributor Author

Ah, interesting. Another way to test:

In [1]: import numpy as np
   ...: import cloudpickle
   ...: 

In [2]: def f(x):
   ...:     return np.sin(x) + np.cos(x)
   ...: 

In [3]: b = cloudpickle.dumps(f)

In [4]: def g():
   ...:     pass
   ...: 

In [5]: b2 = cloudpickle.dumps(f)

In [6]: b == b2
Out[6]: True

@tomMoral
Copy link
Contributor

tomMoral commented Sep 14, 2018

The current behavior is to try to retrieve the globals from the module where f lives before creating new variables. For the test you are referring to, as f is re-imported in the same module, f.__globals__ is set to vars(f.__module__) and no globals are changed because they all exists. You can check that with assert d is f.__globals__

So I think the test should be changed to test that importing a function from a __main__ module in a new subprocess does not import too many global variables.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 14, 2018

Here is another one that passes:

import cloudpickle
import numpy as np
from io import StringIO
from pickletools import dis


def unrelated_function(a):
    return np.array([a])


def my_small_function(a, b):
    return a + b


pickled = cloudpickle.dumps(my_small_function)
buffer_ = StringIO()
dis(pickled, out=buffer_)
pickled_asm = buffer_.getvalue()
assert 'my_small_function' in pickled_asm
assert 'unrelated_function' not in pickled_asm
assert 'numpy' not in pickled_asm

@mrocklin
Copy link
Contributor Author

OK. Closing as not-a-bug. Thanks everyone for the help.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 14, 2018

Great, I opened #203 to improve the cloudpickle tests.

jakirkham pushed a commit to dask/dask that referenced this issue Sep 14, 2018
* Change test_pickle_globals to not check __globals__ directly

Implementation taken from cloudpipe/cloudpickle#202 (comment)

* [skip ci] Fix docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants