__globals__ of function in deleted module restored inconsistently #532
Comments
Actually, you can trigger this problem without closures: if you pickle several imported functions as above, each will get its own global dict instead of sharing a single one. For example:
@dfremont, thanks for reporting. @leogama: you may want to have a look at this due to current work on saving module dicts. @anivegesana: you should probably look at this due to the recursive post-processing code.
First section of code you mentioned: I think it is doing it because deleting the module makes the functions and their global dictionaries inaccessible, so dill pickles a copy of it. Because the global dictionary of other is not accessible by running
Unfortunately, I am unaware of a fix to this. I am open to suggestions, but I don't think there is a way to pickle the reference to a global dictionary of a module that has been deleted.
Second section where both functions are pickled together in the same file: It was a duplicate of #466. I wasn't working on it because it would be a pain to squeeze it around the more pressing software engineering changes that @leogama and @mmckerns were working on (regarding removing dead code from Python 2), so I just left it open. Before now, it was just a theoretical issue that could cause problems. Good to know that someone will use that feature when it is done.
It's not a good idea to delete from Line 230 in 87b8541
Thanks to both of you for your fast response (and all your work on
Is there a reason why the namespace of a module can't be treated like any ordinary
Ah, yes, sorry for missing that! That case does indeed matter for me, since I expect the unpickled functions to share the same global namespace dict (as they did before pickling) so that any mutations they make affect each other.
Agreed! But it is allowed, and unfortunately difficult to avoid in my application, so if there's a relatively easy way to fix this on the
I'm not understanding why it would be necessary to do anything complicated here.
The reason that the globals dictionary cannot be treated like a normal dictionary is a bit nuanced. The
A future fix would keep track of all copies of global dictionaries created so that no global dictionary is copied more than once. This is a bit more complicated and would require
I will open a PR tomorrow night.
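The idea of copying each globals dictionary at most once can be sketched in plain Python, without dill. This is only an illustration of the bookkeeping, not the code from the eventual PR: the helper name rebuild_sharing_globals and the id()-keyed table are assumptions made for this example.

```python
import types

def rebuild_sharing_globals(funcs):
    """Rebuild functions on copied globals, copying each globals dict only once."""
    copies = {}  # id(original globals dict) -> the single copy made for it
    rebuilt = []
    for f in funcs:
        g = f.__globals__
        if id(g) not in copies:
            copies[id(g)] = dict(g)  # shallow copy, created on first sight only
        rebuilt.append(types.FunctionType(
            f.__code__, copies[id(g)], f.__name__, f.__defaults__, f.__closure__))
    return rebuilt

# Two functions sharing one namespace, standing in for functions from one module.
namespace = {}
exec(
    "counter = 0\n"
    "def bump():\n"
    "    global counter\n"
    "    counter += 1\n"
    "def read():\n"
    "    return counter\n",
    namespace,
)
new_bump, new_read = rebuild_sharing_globals([namespace["bump"], namespace["read"]])
new_bump()  # mutates the copied namespace, which new_read shares
```

Because both rebuilt functions point at the same copy, a mutation made by one is visible to the other, while the original namespace is left untouched.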
@anivegesana, actually, under the hood, what happens is a bit more complex. Consider a function
If the module is built-in, the reloaded module is different from the one that was removed from
>>> import sys
>>> import math as math1
>>> del sys.modules['math']
>>> import math as math2
>>> math1 == math2
False
>>> math1.__dict__ is math2.__dict__
False
>>> math1.__dict__ == math2.__dict__
True
>>> math1.sin is math2.sin
True
If the module is not built-in, however, the functions' code is re-executed, so the objects don't match:
>>> import sys
>>> import statistics as stats1
>>> del sys.modules['statistics']
>>> import statistics as stats2
>>> stats1 == stats2
False
>>> stats1.__dict__ == stats2.__dict__
False
>>> stats1.mean == stats2.mean
False
>>> stats1.mean.__code__ is stats2.mean.__code__
False
>>> # But the function's code *is* identical:
>>> stats1.mean.__code__ == stats2.mean.__code__
True
I'd argue that, if the function (or class, etc.) would normally be saved by reference, i.e. "as global", dill should be able to identify this edge case and still save it the same way.
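One way to picture "identifying this edge case" is an identity check against whatever sys.modules currently holds. The sketch below is a simplified stand-in, not dill's or pickle's actual logic, and the function name can_find_by_reference is hypothetical.

```python
import sys

def can_find_by_reference(obj):
    """Return True if obj is reachable through sys.modules by its qualified name."""
    module = sys.modules.get(getattr(obj, "__module__", None))
    if module is None:
        return False  # the module was removed from sys.modules
    found = module
    try:
        for part in obj.__qualname__.split("."):
            found = getattr(found, part)
    except AttributeError:
        return False  # e.g. a nested function, whose qualname contains "<locals>"
    # After a re-import of a pure-Python module, the name resolves to a new object.
    return found is obj
```

A function from a pure-Python module passes this check while its module is loaded, and fails it both after the module is deleted from sys.modules and after the module is imported again, since the re-import creates new function objects.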
I think this issue is concerning whether or not all functions that shared the same globals dictionary still share the same globals dictionary as each other after unpickling, not that they have the same exact dictionary that was used during pickling. This seems impossible because there is no way to access the old globals dictionary if the pickle was unpickled in a different interpreter. I fail to see why standard library modules would cause issues. If you have a case where this condition fails, I'd be happy to fix it.
>>> import dill, sys, math
>>> mdict = math.__dict__
>>> md1 = dill.copy(mdict)
>>> del sys.modules['math']
>>> import math
>>> mdict2 = math.__dict__
>>> md2 = dill.copy(mdict2)
>>> md1 is mdict
True
>>> md2 is mdict2
True
>>> md2 is md1
False
Disclaimer: none of the following comments deal with the shared namespace problem. Anyway... I've just opened a PR that would likely solve the problem for OP's specific examples, but there's yet another approach to unloaded modules: patching
This would be inside
import sys
from functools import wraps
import builtins as __builtin__  # assumption: __builtin__ is an alias for the builtins module

_modules_cache = sys.modules.copy()  # initialization
__import__ = __import__  # copy from __builtins__ to __dict__

@wraps(__import__)
def _import_hook(*args, **kwds):
    module = __import__(*args, **kwds)  # calls the original, saved above
    _modules_cache.update(sys.modules)  # remember every module seen so far
    return module

__builtin__.__import__ = _import_hook  # original __import__ is _import_hook.__wrapped__
It doesn't break anything in principle; it's like a decorator. All tests passed on both CPython and PyPy with this snippet included. Idea stolen from: https://stackoverflow.com/questions/14778407/do-something-every-time-a-module-is-imported/14778568#14778568
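For experimenting with this idea outside of dill, the same hook can be wrapped in a context manager so that the patch is always undone. This is a sketch under that assumption only; the name remember_imports is hypothetical and nothing here is part of dill.

```python
import builtins
import sys
from contextlib import contextmanager
from functools import wraps

@contextmanager
def remember_imports():
    """Temporarily wrap builtins.__import__ and keep a cache of every module seen."""
    cache = sys.modules.copy()
    original = builtins.__import__

    @wraps(original)
    def hook(*args, **kwds):
        module = original(*args, **kwds)
        cache.update(sys.modules)  # record modules before they can be deleted
        return module

    builtins.__import__ = hook
    try:
        yield cache
    finally:
        builtins.__import__ = original  # always restore the real import function
```

Inside the with block, a module that is imported and then deleted from sys.modules is still available from the yielded cache, which is the property the approach relies on.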
@leogama: I'm not sure if you looked at the code I was referencing (in
I saw the
My application sometimes pickles functions after the module they were imported from has been removed from sys.modules (for reasons we hopefully don't have to get into). Usually dill handles this just fine, but there seems to be a bug when the pickled function is used in a closure which also maintains a reference to the pickled function's __globals__ dictionary. The following code yields an assertion failure with dill 0.3.5.1 (and also the latest version on master; in each case I was using Python 3.9):
where other.py contains the code:
The assertion fails because in the unpickled version of inner, the free variable func is bound to the unpickled version of blah but the free variable globs is bound to a dict which is not the globals dict of func (nor is it the globals dict of other.blah, in fact).
What seems to be happening is that when blah is unpickled, dill calls _create_function with the argument fglobals set to an empty dictionary, so that the line
creates a new dict to be the globals dict of the new function. Then when the "post-processing" updates the cells of inner, it sets the cell for globs to the old dictionary instead of this newly-created one. To confirm this diagnosis, I removed the or dict() from the line above, and changed the "F1" case of save_function to always set globs = globs_copy (i.e. removing the else branch where it sets globs = {'__name__': obj.__module__}), and indeed the problem went away. I'm not sure how to properly fix the issue, though.