load_session() fails when dump_session() is used with byref=True #462

Closed
leogama opened this issue Apr 13, 2022 · 18 comments · Fixed by #463

@leogama
Contributor

leogama commented Apr 13, 2022

load_session() is failing even in the simplest cases when dump_session() was used with the byref parameter set to True:

>>> import dill
>>> x = 1
>>> dill.dump_session(byref=True)
# restart the interpreter, or not
>>> x = 0
>>> dill.load_session()
>>> x
0
# should have updated 'x' value
>>> del x
>>> dill.load_session()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leogama/.local/lib/python3.8/site-packages/dill/_dill.py", line 423, in load_session
    _restore_modules(main)
  File "/home/leogama/.local/lib/python3.8/site-packages/dill/_dill.py", line 383, in _restore_modules
    exec("from %s import %s" % (module, name), main_module.__dict__)
  File "<string>", line 1, in <module>
ImportError: cannot import name 'x' from '__main__' (unknown location)

Am I missing something? I tried to take a look at what happens differently when the byref option is used, but I don't understand the manipulations made to the __main__ module.


My setup:

  • Ubuntu 20.04.4 LTS
  • Linux 5.4.0-107
  • Python 3.8.10
  • dill 0.3.4
@mmckerns mmckerns added the bug label Apr 14, 2022
@mmckerns
Member

This should be the expected behavior for what you have above:

>>> import dill
>>> x = 1
>>> dill.dump_session()
>>> x = 0
>>> dill.load_session()
>>> x
1

byref should only affect class definitions, so there's something amiss here. I can reproduce what you are seeing.
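
For reference, this is what byref is meant to change, assuming the usual behavior for classes defined in __main__ (saved by reference when byref=True, reconstructed by value otherwise):

>>> import dill
>>> class Foo: pass
...
>>> dill.loads(dill.dumps(Foo, byref=True)) is Foo    # saved as a reference to __main__.Foo
True
>>> dill.loads(dill.dumps(Foo, byref=False)) is Foo   # saved by value: a distinct, reconstructed class
False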

@leogama
Contributor Author

leogama commented Apr 20, 2022

Taking a closer look at the logic in the function _stash_modules(), I've found that if you import at least one object from a module (not the module itself), that example works fine:

>>> from dill import dump_session, load_session
>>> x = 1
>>> dump_session()
>>> x = 0
>>> load_session()
>>> x
1

The problem seems to be the last line of the mentioned function:

dill/dill/_dill.py

Lines 418 to 436 in 914d47f

def _stash_modules(main_module):
    modmap = _module_map()
    imported = []
    original = {}
    items = 'items' if PY3 else 'iteritems'
    for name, obj in getattr(main_module.__dict__, items)():
        source_module = _lookup_module(modmap, name, obj, main_module)
        if source_module:
            imported.append((source_module, name))
        else:
            original[name] = obj
    if len(imported):
        import types
        newmod = types.ModuleType(main_module.__name__)
        newmod.__dict__.update(original)
        newmod.__dill_imported = imported
        return newmod
    else:
        return original

The function should return the main module, modified or not, rather than (in the latter case) the original dictionary built from the module's contents. Therefore, it should be sufficient to change the last return statement to:

if len(imported):
    ...
    return newmod
else:
    return main_module

@leogama
Contributor Author

leogama commented Apr 20, 2022

I've found another bug in dump_session(). I think I'll file a pull request soon.

@mmckerns
Member

Nice catch. Please do.

@leogama
Contributor Author

leogama commented Apr 24, 2022

OK, the next problem I'm having seems to be related to functions defined in the global namespace. First, _stash_modules() defers imported modules and objects to be handled by _restore_modules() on load, and excludes them from the list of objects to be pickled. But then, when a function defined in the module is being pickled (be it __main__ or not), its "globals" are looked up and the parent module's dictionary (module.__dict__) is picked up to be pickled again, but without that deferral logic from dump_session(byref=True). The current result is:

  • dill version 0.3.4: I eventually get a PicklingError for an unpicklable object or imported module in the global namespace that should only be saved by reference.
  • dill master branch: I get a PicklingWarning for recursive self-references.

In summary, what dill.dump_session(byref=True) currently tries to do is:

save(module) → save(function) (defined in module) → save(function.__globals__) (which is module.__dict__) → errors
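
For concreteness, a minimal sketch of that chain (my reading of it; on dill 0.3.4 I'd expect the PicklingError, on master the PicklingWarning; the module os below is just an arbitrary stand-in for something that should only be saved by reference):

import dill
import os                # imported module in the global namespace; should be saved by reference only

def f():                 # function defined at module level: f.__globals__ is __main__.__dict__
    return os.getpid()

dill.dump_session(byref=True)
# save(__main__) → save(f) → save(f.__globals__), which re-pickles 'os' by value → error/warning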

I've found the logic of save_function() to be rather contrived, even more so after the recent changes. I know that dump_session() is a relatively minor use case of the package, especially with the byref option, and these kinds of corner cases are expected to pop up from time to time as the code expands. But the strange thing is that a comment in that function explicitly says:

# If the globals is a module __dict__, do not save it in the pickle.

But what actually happens is this:

if _recurse:

(_recurse is False here, so the else branch below is taken)

dill/dill/_dill.py

Lines 1863 to 1872 in 5bd56a8

else:
    globs_copy = obj.__globals__ if PY3 else obj.func_globals
    # If the globals is a module __dict__, do not save it in the pickle.
    if globs_copy is not None and obj.__module__ is not None and \
            getattr(_import_module(obj.__module__, True), '__dict__', None) is globs_copy:
        globs = globs_copy
    else:
        globs = {'__name__': obj.__module__}

Step by step:

  1. obj is the function being pickled;
  2. [L1864] globs_copy is created as a reference to (not a copy of) obj.__globals__;
  3. [L1867] obj.__module__ is the name of the module containing the function definition (the module currently being saved, in the case of dump(module) and dump_session(main=module));
  4. [L1868] Indeed, module.__dict__ is obj.__globals__ (see the quick check after this list);
  5. [L1869] Therefore, globs is created as another reference to obj.__globals__.
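
A quick sanity check of items 4 and 5 (any pure-Python function from the standard library works here; json.dumps is just an example):

>>> import json
>>> json.dumps.__module__
'json'
>>> json.dumps.__globals__ is json.__dict__
True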

Finally, the module's dictionary is pushed to the "postproc" stack to be pickled later:

dill/dill/_dill.py

Lines 1907 to 1910 in 5bd56a8

_save_with_postproc(pickler, (_create_function, (
        obj.__code__, globs, obj.__name__, obj.__defaults__,
        closure
), state), obj=obj, postproc_list=postproc_list)

In the case of dump_session(byref=True) we definitely don't want this to happen. I'm not sure if this is the correct behavior for other cases. The if-else branches on line 1867 seem to be switched.

It would be nice to have a truth table for the series of questions:

Should this function's globals be pushed to the pickle stack...

  • ...if it is the dictionary of the module where this function is defined?
    • ...if that module is already being pickled (this is an indirect call from save(module))?
  • ...if it was called by dump_session()?
  • ...if the byref option is (or was) set?
  • ...if the recurse option is set?

I need some help to solve this without breaking anything else.

@mmckerns
Member

@anivegesana: as you introduced _save_with_postproc and related changes, can you comment on this? I'll also try to read through the current logic and respond here.

@leogama
Contributor Author

leogama commented Apr 28, 2022

@anivegesana FYI my issue was solved with a small change to save_function(), without touching the main logic. But I'm still curious about it.

@anivegesana
Contributor

anivegesana commented Apr 29, 2022

TLDR: The else is not quite right.

Looking back at it, I realize that the comments are very poor. Each individual PR I made properly described how its changes fixed something, but they do not explain how the changes interact with each other, and now it appears to be a messy hodge-podge. This is actually the first time that I am seeing all of my changes to the function merged together, and it looks daunting even for me. The entire save_function function probably deserves a comment giving a high-level overview of what is happening.

Before talking about the save_function function, I should explain the postprocessing list. _save_with_postproc performs pickler._save_reduce and then allows us to specify a list of postprocessing actions that are done immediately after the object is created. We use it to save a recursive object because this allows us to check if we are in a recursive cycle between immutable objects (particularly function -> tuple (__closure__) -> cell -> function) and break the cycle somewhere. We chose to break the cycle at the cell and delay the assignment of the cell until the object has been created. Unfortunately, that doesn't cover all cases. In fact, it only covers the one case that I mentioned, and valid Python programs exist that do not fit this structure (see #458 for more details). _save_with_postproc also performs some additional checks to see if it is trapped in a recursive cycle and tries to break it and generate a warning (maybe we should change it to an error now that #458 is pulled), but that is just for easier debugging and doesn't affect functionality.

The first step that has to be decided is whether the globals dictionary is going to be copied or not. The globals dictionary is copied when recurse is on. The copied globals dictionary will be a subset of the globals dictionary of the function (which will usually be that of the module it resides in), containing all the global variables, and only the global variables, used by the function. Unfortunately, passing the globals dictionary generated by dill.detect.globalvars to the function constructor will not work. The globals have to be populated after every object has been created. This is because the globals dictionary will contain the current function as a member, and the __globals__ attribute of the function class is read-only, so it cannot be changed later. The good news is that the globals dictionary itself can be changed at any time. So, we add an action to update the globals dictionary to a postprocessing list. Unfortunately, the globals dictionary could also contain functions that are themselves on the postprocessing list stack (in the process of being created and not yet available to be assigned to the dictionary). Because of this, we need to create the globals dictionary after we are done with the recursive object that will be finalized last (the lowest entry on the stack that is in the globals dictionary). It would be impossible to create the globals dictionary any earlier.
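
A stripped-down sketch of that ordering constraint (not dill's actual code; types.FunctionType stands in for _create_function here):

import types

def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

# 1. Build the function with a globals dict that is still incomplete: the function
#    itself cannot be in it yet, because it does not exist yet, and __globals__
#    cannot be swapped out afterwards (it is read-only).
globs = {'__name__': '__main__'}
rebuilt = types.FunctionType(factorial.__code__, globs, 'factorial')

# 2. "Postprocessing": only after the function exists can the globals dict be
#    updated in place so the recursive call resolves to the rebuilt function.
globs['factorial'] = rebuilt

print(rebuilt(5))  # 120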

The next step is more straightforward. There are special attributes of a function that belong to the function itself and not to the __dict__ dictionary that contains the object's fields. In order to correctly pickle these attributes, they need to be extracted and saved in the state dictionary, which will then be assigned to the attributes when the function is unpickled. And we use _save_with_postproc instead of _save_reduce to indicate that the function may be a recursive object and must go on the postprocessing list stack to avoid potential infinite recursion.

The last bit is just a trick to allow for some of the other recursive patterns that I mentioned in #458. In those cases, the cell doesn't contain the function itself, but a container that contains that function. To handle this, we delay the assignment of the cell as long as possible (add it to the postprocessing list that is on the bottom of the stack) and perform the assignment earlier if possible (move it to the current postprocessing list from the bottom one).
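
A tiny illustration of the delayed cell assignment (only the shape of the trick, not dill's code; types.CellType and writable cell_contents require Python 3.8 and 3.7 or later, respectively):

import types

# The recursive pattern from #458: a function whose closure cell holds a
# container that, in turn, refers back to the function.
cell = types.CellType()            # 1. create the cell empty

def _make(c):
    def f():
        return c.cell_contents     # f reads through the cell
    return f

f = _make(cell)
container = {'f': f}               # 2. rebuild the container that refers back to f

cell.cell_contents = container     # 3. the delayed assignment closes the cycle
assert f()['f'] is f               # the cycle is intact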

That should be a good description of what save_function is doing; now I will answer the question(s) you asked.

Should this function's globals be pushed to the pickle stack...

This should be easier to understand now. The globals dictionary is not pushed onto the stack; an instruction to update the dictionary is pushed onto the stack. This is because the objects in the dictionary are not yet available, so we delay the assignment of its members until all members are present. This is only needed when _recurse is on, because that is the only case where a (subset) copy of the globals dictionary is made.

In the case that the globals dictionary is not copied, there are two possibilities: the function was created with a custom globals dictionary, or it was created in a module (the most likely situation). dill does not pickle module __dict__ attributes (except in _stash_modules) because they are the global state of the module, and it would be unclear how to merge multiple states of a module at runtime if multiple pickles were unpickled (at least that is my understanding; @mmckerns feel free to jump in if there was a different reason). Because of this, we pickle the globals dictionary directly, and save_module_dict will treat it as a global.
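
A quick check of that behavior as I understand it (run interactively in __main__; assumed behavior, not quoted from dill's docs):

>>> import dill
>>> def f(): return x
...
>>> x = 1
>>> g = dill.loads(dill.dumps(f))   # f lives in __main__, so it is pickled by value
>>> g.__globals__ is f.__globals__  # but its globals (a module __dict__) is not duplicated
True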

What the else of that if statement is trying to do is create an empty dictionary (save for the __name__ attribute) and copy the function's real globals dictionary into it in postprocessing. Unfortunately (and thank goodness that you drew attention to it), this is not quite right. As it is written, a copy of a shared non-module globals dictionary is made for every function, which decouples their shared global namespace. Thankfully, I don't think this bug will occur much (hence why I didn't catch it). It should only really happen when you call dill.copy twice or more on a function using recurse=True. Also thankfully, it should be a simple fix that I already have ready. I'll verify it later tonight (we have no test cases for it right now), but I think it makes more sense than the original:

globs = obj.__globals__ if PY3 else obj.func_globals

# If the globals is a module __dict__, do not save it in the pickle.
if globs is not None and obj.__module__ is not None and \
        getattr(_import_module(obj.__module__, True), '__dict__', None) is globs:
    globs_copy = globs
else:
    # Fake save globs as an empty dictionary and delay copying elements into it
    # until all of the elements have been created.
    globs_copy = globs.copy()

    from pickle import EMPTY_DICT, MARK, DICT, POP
    
    if pickler.bin:
        pickler.write(EMPTY_DICT)
    else:   # proto 0 -- can't use EMPTY_DICT
        pickler.write(MARK + DICT)

    pickler.memoize(obj)
    
    pickler.write(POP)

Feel free to ask more questions. It will help me make the documentation for this function and talking through this helped me realize that this if statement is not needed anymore. Also, I need to find a place to put this or a way to comprehensibly split it up since this is a little bit dense and would clutter the codebase.

@mmckerns
Member

mmckerns commented Apr 29, 2022

@anivegesana: I was just about to sign off for tonight. Thanks for the detailed response. I'll go through it early tomorrow morning. FYI, a release is imminent (expected by midday Friday, ET). If you have anything here that you want to resolve, let me know and I will delay the release a bit, if needed. No last-minute features, etc.; however, a small patch or documentation should be fine.

@anivegesana
Contributor

Since I have the bug fix ready, I'll open another PR tonight after I verify that it works. If you need a little bit of time to look over it, you could push the release a couple of days.

@mmckerns
Member

I'll see what your PR looks like before I make that decision. Everything else is at a good state all across the UQF codebases, so it's ready to cut otherwise.

@anivegesana
Contributor

anivegesana commented Apr 29, 2022

Sorry for hijacking this issue, but @mmckerns I believe that the namespace bug issue was actually present for a very long time. recurse broke all shared namespaces for children and it always has. This does seem like incorrect behavior, but it also seems like a new feature if it has been there for so long.

It definitely would need more ironing out and we would have to push out the release a couple of days to fix it. Cloudpickle just uses a weak ref dictionary to keep track of the global dictionaries that it has created so far and the same pattern could be used here to solve the problem.
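
Roughly the shape of that memo pattern (not cloudpickle's actual code; helper names are hypothetical, and a real version keyed on long-lived objects would use weak references to avoid leaks):

# While pickling, remember each globals dict that has already been copied (keyed
# by the original dict's id) so that functions sharing a globals dict also share
# a single copy in the pickle, instead of being decoupled into separate copies.
_globals_memo = {}   # id(original globals dict) -> the one copy used in the pickle

def _copy_globals_once(globs):
    copied = _globals_memo.get(id(globs))
    if copied is None:
        copied = {'__name__': globs.get('__name__')}
        _globals_memo[id(globs)] = copied
    return copied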

@mmckerns
Member

mmckerns commented Apr 29, 2022

@anivegesana: if you are proposing to alter the behavior of recurse, then open a new PR and we can discuss it there. If the issue is indeed due to older lines of code from recurse, then it's best to deal with it after the pending release.

@leogama
Contributor Author

leogama commented Apr 29, 2022

Sorry for hijacking this issue, but @mmckerns I believe that this issue was actually present for a very long time. recurse broke all shared namespaces for children and it always has. This does seem like incorrect behavior, but it also seems like a new feature if it has been there for so long.

Don't worry, it's fine by me. Thank you for the detailed answer. For my part, this issue can be closed as soon as my PR (#463) is merged; @mmckerns already reviewed it and it's passing all the tests now. If you prefer, I can edit the title and leave the issue open to keep the discussion history.

Please, @anivegesana, if any changes to be included in the pending release conflict with my PR let me know so that I can adapt my changes to the new code.

@anivegesana
Contributor

The namespace breaking bug will definitely not be fixed in the pending release. Your PR should be unaffected.

@mmckerns
Member

@leogama, don't edit this one... @anivegesana will open a new PR and reference this one.

@leogama
Contributor Author

leogama commented Apr 29, 2022

Oh, nice. And the discussion can continue there. Okay.

@mmckerns
Member

@anivegesana: finally worked through your summary and analysis. Very nice, and I agree with you.

@mmckerns mmckerns added this to the dill-0.3.5 milestone May 19, 2022