-
-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes some bugs when using dump_session()
with byref=True
#463
Conversation
… modules When no objects are found to be imported from external modules, `_stash_modules()` should return `main_module` unmodified, and not the pair list of objects created from it (`original`).
…was imported Just calling `import multiprocessing` creates a `'__mp_main__'` entry in `sys.modules` that is simply a reference to the `__main__` module. In the old version of the test for module membership, objects in the global scope are wrongly attributed to this virtual `__mp_main__` module. And therefore `load_session()` fails. Testing for object identity instead of module name resolves the issue.
Yes, if we didn't set The PR looks good thus far. FYI, I expect that we will create a new release this coming week. |
This are great news! It would be awesome to have |
if we can finish the PR, it will be included in the release |
d430c2a
to
67f0b78
Compare
…e in `dump_session(byref=TRUE)` Currently, `dump_session(byref=True)` misses some imported objects. For example: - If the session had a statement `import numpy as np`, it may find a reference to the `numpy` named as `np` in some internal module listed in `sys.resources`. But if the module was imported with a non-canonical name, like `import numpy as nump`, it won't find it at all. Mapping the objects by id in `modmap` solves the issue. Note that just types of objects usually imported under an alias must be looked up by id, otherwise common objects like singletons may be wrongly attributed to a module, and such reference in the module could change to a different object depending on its initialization and state. - If a object in the global scope is a top level module, like `math`, again `save_session` may find a reference to it in another module and it works. But if this module isn't referenced anywhere else, it won't be found because the function only looks for objects inside the `sys.resources` modules and not for the modules themselves. This commit introduces two new attributes to session modules saved by reference: - `__dill_imported_as`: a list with (module name, object name, object alias in session) - `__dill_imported_top_level`: a list with (module name, module alias in session) I did it this way for forwards (complete) and backwards (partial) compatibility. Oh, and I got rid of that nasty `exec()` call in `_restore_modules()`! ;)
67f0b78
to
c891eb5
Compare
Fixes RecursionWarning error where a function defined in the top level of the module being saved with `dump_session(byref=True)`, of which "globals" is a reference to the original module's `__dict__`, makes dill to try to save this instead of the modified module's `__dict__`, triggering recursion.
c891eb5
to
8b15b99
Compare
Hello there! My PR is ready for review. I rushed to solve all the problems I've found using Do we need some unit tests? How can I provide them? Should it just reproduce the kinds of usage that were failing before these modification? I'm new to testing. |
That's great. I'll review it tonight. Yes, |
Thank you! I'll take a look at the tests. |
tests have to pass for the currently supported versions of python which are: (python 2.7, 3.7-3.10; PyPy 2.7, 3.7-3.9) and should also pass for python 3.11. We will drop support for python 2.7 / PyPy 2.7 after this upcoming release. |
I did some random testing -- building a class and an instance of a class, then dumping a session (with and without byref), then loading, and getting the same results either way. Also, testing that things like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Would be nice to have some tests, but can't complain as there are currently none.
baf9ec5
to
d535f70
Compare
Added some tests. The Travis build is failing because of a warning raised in a assert statement before the proper tests. It's not critical... |
d535f70
to
17b025d
Compare
Travis doesn't like warnings. Can I substitute it by a print to stderr or a call to log()? Otherwise I will just comment out the offending lines. |
If the raising of a warning is what you want to test, then you can catch it in a try/except block, and assert True on except. If this isn't the case, you can always use an if block to skip the line on travis, or something else. It looks like you are just trying to ensure that the modules you will test are not already imported. Could you not instead delete them from sys.modules if they are already loaded? |
I added the line to check if the modules that I choose to import in the tests weren't already loaded by python or dill. My assumption is that |
Actually, I remove all the non "pre-loaded" modules from |
I would think that if you delete a module from |
17b025d
to
47a060d
Compare
I've removed the header "pre-test" and adapted the code. Travis builds are failing with Python 3.7 and Python 2.7. This last one "is allowed to fail", and all the tests pass with Python 2.7 on my machine though (Ubuntu Focal). The error is the same: the This is an artificial test setting however, if you start to think. The use case of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the changes to _dill.py
-- the tests need a few changes, however are generally good. please address the comments left inline
Answered all the comments. All tests passing. Ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@leogama: In case the OP doesn't open a |
@mmckerns, take a look at my answer there. It is half a bug and half misdesign/misuse. It may need a decision about what to do. |
yep, that's my thought exactly |
I've already made up my mind. Restoring the whole namespace of a module to a different module (with a different name!) is a really bad idea, as it overwrites the second module's attributes like The only doubt I have is whether >>> import dill, types
>>> mod = types.ModuleType('runtime')
>>> mod.var = 'spam'
>>> dill.dump_session(main=mod) In other session: >>> import dill
>>> mod = dill.load_session()
>>> print(mod.var)
spam EditI feel it is not so hard to do the same as above without changing >>> import dill, sys, types
>>> sys.modules['runtime'] = types.ModuleType('runtime')
>>> mod = dill.load_session() # it could return the module anyway
>>> print(mod.var)
spam |
I'm pretty sure that I agree with you. I also think it'd be good if modules created at runtime (as in your example) worked. |
Here are the proposed fixes for bugs reported in #462.
You can review the changes already, but please don't merge it yet as I'm still tracking a strange third bug in
dump_session()
. And I spotted yet another possible bug, but need to test it first. I'll probably append more commits in the next days.I'm working on this feature for it to be used as the cache mechanism for the
reticulate
R package —knitr
's Python engine. It's mostly functional by now. (Related discussion: yihui/knitr#1505)By the way, could you clarify a thing for me? Why is it that, even when the
byref
parameter is set toTrue
, after the imported objects are identified and stored in the__dill_imported
object, thePickler
is called with the_byref
attribute set toFalse
. What's the rationale? Is it because the session is saved as a whole module and thenpickle
would try to store everything inside it by name reference?Fixes #462