Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX More func base globals fixes #198

Merged
merged 18 commits into from
Sep 13, 2018

Conversation

pierreglaser
Copy link
Member

Follow up on #192, itself being a follow up on #187

The goal is to keep track of global variable modifications in the child
environment where functions are unpickled, and to prevent those modifications to be overriden if an object containing those globals gets reloaded.

NOTE: The case where the global variables live in __main__ was handled in here by @tomMoral
for the case where they live in any other module, a solution was proposed here by @ogrisel. The principle was to keep the name of the module where the globals lived instead of the variables themselves, and to call vars(sys.modules[module_name]) at unpickling time only.

However, the module where the pickled object lived in the parent environment does not always exist in the child environment. So sys.modules has to be checked in the new environment.
If the module does not exist, base_globals start to get tracked in case of further
re-loadings, using _BASE_GLOBALS_CACHE

Ultimately, this was causing an issue in python 2 in the case of dynamic
modules, because dynamic_subimport, that creates the dynamic module, writes into sys.modules, which can cause recursive imports in python 2.

@codecov-io
Copy link

codecov-io commented Sep 7, 2018

Codecov Report

Merging #198 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #198      +/-   ##
==========================================
+ Coverage   84.95%   84.98%   +0.02%     
==========================================
  Files           1        1              
  Lines         565      566       +1     
  Branches      109      110       +1     
==========================================
+ Hits          480      481       +1     
  Misses         62       62              
  Partials       23       23
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 84.98% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81a7a8c...7bc6a82. Read the comment docs.

Copy link
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but could you add a test for the reuse mechanism?

# lived in
base_globals = vars(sys.modules[base_globals])
elif id(base_globals) in _BASE_GLOBALS_CACHE.keys():
base_globals = _BASE_GLOBALS_CACHE[id(base_globals)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test this mechanism?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, was about to do it!

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments:

@@ -934,7 +939,10 @@ def subimport(name):
def dynamic_subimport(name, vars):
mod = imp.new_module(name)
mod.__dict__.update(vars)
sys.modules[name] = mod
# adding a dynamic module to sys.module can cause recursive imports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: sys.modules

tests/cloudpickle_test.py Show resolved Hide resolved
cloned_f0()

# Ensure that the global variable is the same for another function
result_f1 = cloned_f1()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to "result_cloned_f1".

# adding a dynamic module to sys.module can cause recursive imports
# in python 2.x.x because site.Quitter contains sys as one of its globals
if sys.version > '3':
sys.modules[name] = mod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating sys.modules only under Python 3 seems weird to me. Why the discrepancy between the two major versions? Why is this not required for Python 2?

@@ -78,6 +78,10 @@
PY3 = True


# cache that tracks the values of global variables.
_BASE_GLOBALS_CACHE = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that this module level dict will cause memory leaks if loky is used in long-running processes with dynamic modules. Is this cache really necessary?

# this checks if we can import the previous environment the object
# lived in
base_globals = vars(sys.modules[base_globals])
elif id(base_globals) in _BASE_GLOBALS_CACHE.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not use the id of an object that can be garbage collected as a key in a dict: this object id can be reused by Python for a subsequent object.

@pierreglaser
Copy link
Member Author

@ogrisel I wrote a test that shows the use case of _BASE_GLOBALS_CACHE.
In summary, when pickling a function living in a dynamic module, it will no longer be in sys.modules, so calling vars(sys.modules[name]) at loading time won't work, and globals variables changes will be overriden. Hence this cache creation, that is basically an alternative to sys.modules.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 10, 2018

I think this should be renamed to _dynamic_modules and use the module name as the key rather than id(module_name) that could be recycled for another object in case of garbage collection.

@pierreglaser
Copy link
Member Author

pierreglaser commented Sep 10, 2018

Agreed, I switched from id(module_name) to module_name in the last commit.
Noted for the _dynamic_modules comment.

# this checks if we can import the previous environment the object
# lived in
base_globals = vars(sys.modules[base_globals_name])
elif base_globals_name in _dynamic_modules.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the get function to simplify the code:

base_globals = _dynamic_modules_globals.get(base_globals_name], {})
_dynamic_modules_globals[base_globals_name] = base_globals

I would also rename _dynamic_modules to _dynamic_module_globals

first_f = pickle.loads(fileobj)
# change the mod's global variable x
mod.x = 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add an assert that first_f() == 2


fileobj = cloudpickle.dumps(mod.func_that_relies_on_dynamic_module)

mod.x = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put mod.x = 3 and then test that first_f() == 3

# finally, re-load the dynamic module
new_f = pickle.loads(fileobj)

assert first_f() == new_f()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also verify the value here as one issue can be that unpickling new_f might reset the global variable value to the initial value 1.

@pierreglaser
Copy link
Member Author

as discussed with Olivier and Tom, dynamic modules caching is adressed in another PR.
Tests about dynamic modules global changes have been modified to comply with the standard behavior (e.g each time a function from a dynamic module is unpickled, a new set of globals gets created).

ogrisel and others added 14 commits September 11, 2018 15:04
during unpickling, in order to recreate the previous environnement an object lived in,
the module under the base_globals name is re-imported. However, it
is not always possible, especially if base_globals pointed to a
temporary or locally defined module. This PR introduces a fix that
checks if the module still exists during unpickling
two main fixes here:
* added a _BASE_GLOBALS_CACHE, that will track a functions globals in the
environnement where it is loaded

* fixed a python 2 issue due to dynamic_subimport
this modification was done to prevent side effects, and affectation
of cloudpickle's global variable, (here, _dynamic_modules).
Now, those global variables are only changed in ephemere child
processes
dynamic modules globals are not stored in a dict.
Each time a function from a dynamic module gets unpickled,
a new set of globals is created
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a first review with a couple of requested changes.

@@ -934,7 +936,6 @@ def subimport(name):
def dynamic_subimport(name, vars):
mod = imp.new_module(name)
mod.__dict__.update(vars)
sys.modules[name] = mod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be changed in this PR, right? This line come from a very old commit (e7341b6).

The fact that this deleted line does not break any existing test is worrisome but this should not be changed by this PR in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, registering dynamic modules was a bug. Once registered as such, subsequent calls to dumps will consider the module as not dynamic which will make it impossible to load in other python processes.

@@ -78,6 +78,7 @@
PY3 = True



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not add this extra blank line.

CHANGES.md Outdated
master
======

- global variables from modules referenced in sys.modules in the child process
now overrides the initial global variables of the pickled function.
([issue #187](https://github.com/cloudpipe/cloudpickle/issues/187))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase this as:

"""
Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized.
"""

@@ -887,6 +891,107 @@ def f1():
clone_func=clone_func)
assert_run_python_script(textwrap.dedent(code))

def test_closure_interacting_with_a_global_variable(self):
global _TEST_GLOBAL_VARIABLE
orig_value = _TEST_GLOBAL_VARIABLE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add:

assert _TEST_GLOBAL_VARIABLE == "default_value"

to make the test both more easy to follow and robust to side effects by other tests.

Hence, a modification of the globals of one function should not
interact with the globals of another function.
This test validates this behavior.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I don't know what "should" happen in this case. I would rather remove this test from this PR otherwise it might give the impression that the current behavior is on purpose.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of nitpicks though ;)

# make sure that a when loaded, a dynamic module
# preserves its dynamic property. Otherwise,
# this will lead to an import error if pickled in the child process
# and reloaded in another one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosmetics: please do no insert new line too early in comments. Try to use your 80 char budgets. Also please start sentences with a capital case :)

child_of_child_process_script = {}

with open('dynamic_module_from_parent_process.pk', 'rb') as fid:
mod = pickle.load(fid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: fid => f. This is not an integer but a file object.

'''

# the script ran by the process created by the child process
child_of_child_process_script = ''' \'\'\'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use triple double quote on the outside """ so that you can use nested triple single quotes ''' in the inside.


child_of_child_process_script = {}

with open('dynamic_module_from_parent_process.pk', 'rb') as fid:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: use .pkl as filename extension for pickle files.

if os.path.exists('dynamic_module_from_parent_process.pk'):
os.unlink('dynamic_module_from_parent_process.pk')
if os.path.exists('dynamic_module_from_child_process.pk'):
os.unlink('dynamic_module_from_child_process.pk')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put those two filenames in local variables to avoid duplication and avoid hard to debug typos.

@ogrisel ogrisel merged commit 8eaf637 into cloudpipe:master Sep 13, 2018
@ogrisel
Copy link
Contributor

ogrisel commented Sep 13, 2018

Thanks @pierreglaser!

@mrocklin
Copy link
Contributor

This may have caused a regression described here: #202

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants