-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX More func base globals fixes #198
FIX More func base globals fixes #198
Conversation
Codecov Report
@@ Coverage Diff @@
## master #198 +/- ##
==========================================
+ Coverage 84.95% 84.98% +0.02%
==========================================
Files 1 1
Lines 565 566 +1
Branches 109 110 +1
==========================================
+ Hits 480 481 +1
Misses 62 62
Partials 23 23
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but could you add a test for the reuse mechanism?
cloudpickle/cloudpickle.py
Outdated
# lived in | ||
base_globals = vars(sys.modules[base_globals]) | ||
elif id(base_globals) in _BASE_GLOBALS_CACHE.keys(): | ||
base_globals = _BASE_GLOBALS_CACHE[id(base_globals)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you test this mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, was about to do it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments:
cloudpickle/cloudpickle.py
Outdated
@@ -934,7 +939,10 @@ def subimport(name): | |||
def dynamic_subimport(name, vars): | |||
mod = imp.new_module(name) | |||
mod.__dict__.update(vars) | |||
sys.modules[name] = mod | |||
# adding a dynamic module to sys.module can cause recursive imports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: sys.modules
tests/cloudpickle_test.py
Outdated
cloned_f0() | ||
|
||
# Ensure that the global variable is the same for another function | ||
result_f1 = cloned_f1() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to "result_cloned_f1".
cloudpickle/cloudpickle.py
Outdated
# adding a dynamic module to sys.module can cause recursive imports | ||
# in python 2.x.x because site.Quitter contains sys as one of its globals | ||
if sys.version > '3': | ||
sys.modules[name] = mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating sys.modules
only under Python 3 seems weird to me. Why the discrepancy between the two major versions? Why is this not required for Python 2?
cloudpickle/cloudpickle.py
Outdated
@@ -78,6 +78,10 @@ | |||
PY3 = True | |||
|
|||
|
|||
# cache that tracks the values of global variables. | |||
_BASE_GLOBALS_CACHE = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried that this module level dict will cause memory leaks if loky is used in long-running processes with dynamic modules. Is this cache really necessary?
cloudpickle/cloudpickle.py
Outdated
# this checks if we can import the previous environment the object | ||
# lived in | ||
base_globals = vars(sys.modules[base_globals]) | ||
elif id(base_globals) in _BASE_GLOBALS_CACHE.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not use the id of an object that can be garbage collected as a key in a dict: this object id can be reused by Python for a subsequent object.
@ogrisel I wrote a test that shows the use case of |
I think this should be renamed to |
Agreed, I switched from |
cloudpickle/cloudpickle.py
Outdated
# this checks if we can import the previous environment the object | ||
# lived in | ||
base_globals = vars(sys.modules[base_globals_name]) | ||
elif base_globals_name in _dynamic_modules.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the get
function to simplify the code:
base_globals = _dynamic_modules_globals.get(base_globals_name], {})
_dynamic_modules_globals[base_globals_name] = base_globals
I would also rename _dynamic_modules
to _dynamic_module_globals
tests/cloudpickle_test.py
Outdated
first_f = pickle.loads(fileobj) | ||
# change the mod's global variable x | ||
mod.x = 2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add an assert that first_f() == 2
tests/cloudpickle_test.py
Outdated
|
||
fileobj = cloudpickle.dumps(mod.func_that_relies_on_dynamic_module) | ||
|
||
mod.x = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put mod.x = 3
and then test that first_f() == 3
tests/cloudpickle_test.py
Outdated
# finally, re-load the dynamic module | ||
new_f = pickle.loads(fileobj) | ||
|
||
assert first_f() == new_f() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also verify the value here as one issue can be that unpickling new_f
might reset the global variable value to the initial value 1
.
as discussed with Olivier and Tom, dynamic modules caching is adressed in another PR. |
during unpickling, in order to recreate the previous environnement an object lived in, the module under the base_globals name is re-imported. However, it is not always possible, especially if base_globals pointed to a temporary or locally defined module. This PR introduces a fix that checks if the module still exists during unpickling
two main fixes here: * added a _BASE_GLOBALS_CACHE, that will track a functions globals in the environnement where it is loaded * fixed a python 2 issue due to dynamic_subimport
this modification was done to prevent side effects, and affectation of cloudpickle's global variable, (here, _dynamic_modules). Now, those global variables are only changed in ephemere child processes
dynamic modules globals are not stored in a dict. Each time a function from a dynamic module gets unpickled, a new set of globals is created
f76c3c2
to
da966e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a first review with a couple of requested changes.
@@ -934,7 +936,6 @@ def subimport(name): | |||
def dynamic_subimport(name, vars): | |||
mod = imp.new_module(name) | |||
mod.__dict__.update(vars) | |||
sys.modules[name] = mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be changed in this PR, right? This line come from a very old commit (e7341b6).
The fact that this deleted line does not break any existing test is worrisome but this should not be changed by this PR in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, registering dynamic modules was a bug. Once registered as such, subsequent calls to dumps will consider the module as not dynamic which will make it impossible to load in other python processes.
cloudpickle/cloudpickle.py
Outdated
@@ -78,6 +78,7 @@ | |||
PY3 = True | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not add this extra blank line.
CHANGES.md
Outdated
master | ||
====== | ||
|
||
- global variables from modules referenced in sys.modules in the child process | ||
now overrides the initial global variables of the pickled function. | ||
([issue #187](https://github.com/cloudpipe/cloudpickle/issues/187)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase this as:
"""
Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized.
"""
tests/cloudpickle_test.py
Outdated
@@ -887,6 +891,107 @@ def f1(): | |||
clone_func=clone_func) | |||
assert_run_python_script(textwrap.dedent(code)) | |||
|
|||
def test_closure_interacting_with_a_global_variable(self): | |||
global _TEST_GLOBAL_VARIABLE | |||
orig_value = _TEST_GLOBAL_VARIABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add:
assert _TEST_GLOBAL_VARIABLE == "default_value"
to make the test both more easy to follow and robust to side effects by other tests.
tests/cloudpickle_test.py
Outdated
Hence, a modification of the globals of one function should not | ||
interact with the globals of another function. | ||
This test validates this behavior. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I don't know what "should" happen in this case. I would rather remove this test from this PR otherwise it might give the impression that the current behavior is on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A couple of nitpicks though ;)
tests/cloudpickle_test.py
Outdated
# make sure that a when loaded, a dynamic module | ||
# preserves its dynamic property. Otherwise, | ||
# this will lead to an import error if pickled in the child process | ||
# and reloaded in another one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cosmetics: please do no insert new line too early in comments. Try to use your 80 char budgets. Also please start sentences with a capital case :)
tests/cloudpickle_test.py
Outdated
child_of_child_process_script = {} | ||
|
||
with open('dynamic_module_from_parent_process.pk', 'rb') as fid: | ||
mod = pickle.load(fid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: fid => f. This is not an integer but a file object.
tests/cloudpickle_test.py
Outdated
''' | ||
|
||
# the script ran by the process created by the child process | ||
child_of_child_process_script = ''' \'\'\' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use triple double quote on the outside """
so that you can use nested triple single quotes '''
in the inside.
tests/cloudpickle_test.py
Outdated
|
||
child_of_child_process_script = {} | ||
|
||
with open('dynamic_module_from_parent_process.pk', 'rb') as fid: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: use .pkl
as filename extension for pickle files.
tests/cloudpickle_test.py
Outdated
if os.path.exists('dynamic_module_from_parent_process.pk'): | ||
os.unlink('dynamic_module_from_parent_process.pk') | ||
if os.path.exists('dynamic_module_from_child_process.pk'): | ||
os.unlink('dynamic_module_from_child_process.pk') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put those two filenames in local variables to avoid duplication and avoid hard to debug typos.
Thanks @pierreglaser! |
This may have caused a regression described here: #202 |
Follow up on #192, itself being a follow up on #187
The goal is to keep track of global variable modifications in the child
environment where functions are unpickled, and to prevent those modifications to be overriden if an object containing those globals gets reloaded.
NOTE: The case where the global variables live in
__main__
was handled in here by @tomMoralfor the case where they live in any other module, a solution was proposed here by @ogrisel. The principle was to keep the name of the module where the globals lived instead of the variables themselves, and to call
vars(sys.modules[module_name])
at unpickling time only.However, the module where the pickled object lived in the parent environment does not always exist in the child environment. So
sys.modules
has to be checked in the new environment.If the module does not exist,
base_globals
start to get tracked in case of furtherre-loadings, using
_BASE_GLOBALS_CACHE
Ultimately, this was causing an issue in python 2 in the case of dynamic
modules, because
dynamic_subimport
, that creates the dynamic module, writes into sys.modules, which can cause recursive imports in python 2.