load_session() fails when dump_session() is used with byref=True #462
Comments
This should be the expected behavior for what you have above:
|
Taking a closer look at the logic in the function
>>> from dill import dump_session, load_session
>>> x = 1
>>> dump_session()
>>> x = 0
>>> load_session()
>>> x
1
The problem seems to be the last line of the mentioned function: Lines 418 to 436 in 914d47f
The function should return the
    if len(imported):
        ...
        return newmod
    else:
        return main_module
|
I've found another bug in |
Nice catch. Please do. |
OK, the next problem I'm having seems to be related to functions defined in the global namespace. First,
In summary, what
I've found the logic of Line 1866 in 5bd56a8
But what actually happens is this: Line 1853 in 5bd56a8
Lines 1863 to 1872 in 5bd56a8
Step by step:
Finally, the module's dictionary is pushed to the "postproc" stack to be pickled later: Lines 1907 to 1910 in 5bd56a8
In the case of
It would be nice to have a truth table for the series of questions: Should this function's
I need some help to solve this without breaking anything else. |
@anivegesana: as you introduced |
@anivegesana FYI my issue was solved with a small change to |
TLDR: The
Looking back at it, I realize that the comments are very poor. Each individual PR I made properly described how its changes fixed something, but the descriptions do not explain how the changes interact with each other, and now it appears to be a messy hodgepodge. This is actually the first time that I am seeing all of my changes to the function merged together, and it looks daunting even for me. The entire
Before talking about the
The first step that has to be decided is whether the globals dictionary is going to be copied or not. The globals dictionary is copied when recurse is on. The copied globals dictionary will be a subset of the globals dictionary of the function (usually that of the module it resides in) that contains all global variables, and only the global variables, that are used by the function. Unfortunately, passing the globals dictionary generated by
The next step is more straightforward. There are special attributes of a function that belong to the function itself and not the
The last bit is just a trick to allow for some of the other recursive patterns that I mentioned in #458. In those cases, the cell doesn't contain the function itself, but a container that contains that function. To handle this, we delay the assignment of the cell as long as possible (add it to the postprocessing list that is on the bottom of the stack) and perform the assignment earlier if possible (move it to the current postprocessing list from the bottom one).
Now that should be a good description of what
Should this function's globals be pushed to the pickle stack...
This should be easier to understand now. The globals dictionary is not pushed onto the stack. An instruction to update the dictionary is pushed onto the stack. This is because the objects in the dictionary are not yet available, so we delay the assignment of its members until all members are present.
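As a rough illustration of the "copied globals dictionary" described above, the sketch below collects only the globals that a function's bytecode names. It is not dill's implementation: `used_globals` is a hypothetical helper written for this example, and scanning `co_names` over-approximates, because attribute names appear there as well.

```python
import types

def used_globals(func):
    """Return the subset of func.__globals__ named in func's bytecode."""
    names = set()
    pending = [func.__code__]
    while pending:
        code = pending.pop()
        names.update(code.co_names)
        # Nested functions and comprehensions are stored as code constants.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return {n: func.__globals__[n] for n in names if n in func.__globals__}

SCALE = 3
UNUSED = 7

def scaled(x):
    return SCALE * x

# Only the globals that scaled() actually references are kept.
subset = used_globals(scaled)
```

Here `subset` keeps `SCALE` and leaves `UNUSED` behind, which is the "all and only the globals used by the function" property the comment refers to.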
This is only needed with
In the case that the globals dictionary is not copied, there are two possibilities: the function was created with a custom globals dictionary or was created in a module (the most likely situation). What the
    globs = obj.__globals__ if PY3 else obj.func_globals
    # If the globals is a module __dict__, do not save it in the pickle.
    if globs is not None and obj.__module__ is not None and \
            getattr(_import_module(obj.__module__, True), '__dict__', None) is globs:
        globs_copy = globs
    else:
        # Fake save globs as an empty dictionary and delay copying elements into it
        # until all of the elements have been created.
        globs_copy = globs.copy()
        from pickle import EMPTY_DICT, MARK, DICT, POP
        if pickler.bin:
            pickler.write(EMPTY_DICT)
        else: # proto 0 -- can't use EMPTY_DICT
            pickler.write(MARK + DICT)
        pickler.memoize(obj)
        pickler.write(POP)
Feel free to ask more questions. It will help me make the documentation for this function, and talking through this helped me realize that this |
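The idea of creating an empty dictionary first and filling it in once its members exist can be shown without any pickling at all. The following is a minimal sketch using only the standard library, not dill's code: `rebuild_with_delayed_globals` is a hypothetical helper, and it assumes the function's only global reference is to itself.

```python
import builtins
import types

def rebuild_with_delayed_globals(func):
    """Clone func around a placeholder globals dict that is filled afterwards."""
    # Hypothetical stand-in for the "fake" empty dictionary: it exists before
    # any of its eventual members do.
    placeholder = {"__builtins__": builtins}
    clone = types.FunctionType(func.__code__, placeholder, func.__name__,
                               func.__defaults__, func.__closure__)
    # Delayed update: the clone now exists, so the dictionary can refer to it.
    placeholder[func.__name__] = clone
    return clone

def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

clone = rebuild_with_delayed_globals(factorial)
result = clone(5)
```

The clone recurses through its own globals dictionary, not the original module's, which is only possible because the dictionary was populated after the function object had been created.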
@anivegesana: I was just about to sign off for tonight. Thanks for the detailed response. I'll go through it early tomorrow morning. FYI, a release is imminent (expected to be by midday Friday, ET). If you have anything here that you wanted to resolve, let me know and I will delay the release a bit, if needed. No last minute features, etc... however a small patch or documentation should be fine. |
Since I have the bug fix ready, I'll open another PR tonight after I verify that it works. If you need a little bit of time to look over it, you could push the release a couple of days. |
I'll see what your PR looks like before I make that decision. Everything else is in a good state across the UQF codebases, so it's ready to cut otherwise. |
Sorry for hijacking this issue, but @mmckerns I believe that the namespace bug issue was actually present for a very long time. It definitely would need more ironing out and we would have to push out the release a couple of days to fix it. Cloudpickle just uses a weak ref dictionary to keep track of the global dictionaries that it has created so far and the same pattern could be used here to solve the problem. |
@anivegesana: if you are proposing to alter the behavior of |
Don't worry, it's fine for me. Thank you for the detailed answer. For my part, this issue could be closed as soon as my PR (#463) is merged; @mmckerns already reviewed it and it's passing all the tests now. If you prefer, I can edit the title and leave the issue open to keep the discussion history. Please, @anivegesana, if any changes to be included in the pending release conflict with my PR, let me know so that I can adapt my changes to the new code. |
The namespace breaking bug will definitely not be fixed in the pending release. Your PR should be unaffected. |
@leogama, don't edit this one... @anivegesana will open a new PR and reference this one. |
Oh, nice. And the discussion can continue there. Okay |
@anivegesana: finally worked through your summary and analysis. Very nice, and I agree with you. |
load_session() is failing even in the simplest cases when dump_session() was used with the byref parameter set to True:
Am I missing something? I tried to take a look at what happens differently when the byref option is used, but I don't understand the manipulations made to the __main__ module.
My setup: