
Make keras.Model picklable #14748

Merged (22 commits) Jul 16, 2021

Conversation

adriangb (Contributor):
An attempt at porting over from tensorflow/tensorflow#39609.

Is this something that should now go here (in this repo)?

Thanks

@google-cla google-cla bot added the cla: yes label Jun 18, 2021
fchollet (Member) left a comment:
Thanks for the PR. This is valuable functionality.

    Returns:
        keras.Model: a Keras Model instance.
    """
    temp_dir = f"ram://{uuid4()}"
fchollet (Member):

Will this work across all systems?

adriangb (Contributor, author):

As per tensorflow/tensorflow#48086, there is currently a bug in TF that makes this fail on Windows. It looks like the current momentum is to fix it via tensorflow/tensorflow#48125, which is getting close to being ready.

So for now this would only work on *nix, but once the bug in TF is fixed via that PR or otherwise, this should work on all platforms without any change to this PR / Keras.

from keras.saving.save import load_model


def unpack_model(packed_keras_model):
fchollet (Member):

Let's use the names serialize_as_bytecode and deserialize_from_bytecode to be more explicit.

adriangb (Contributor, author):

Sounds great, thank you for the suggestion
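The pack/unpack idea under discussion — save the model to a scratch location, then slurp the resulting files back as raw bytes — can be sketched without TensorFlow. This is an illustrative stand-in only: the function names and the zip container are my assumptions, while the PR itself saves via SavedModel into TF's in-memory `ram://` filesystem.

```python
import io
import os
import zipfile


def serialize_dir_as_bytes(dir_path):
    # Pack every file under dir_path (e.g. a SavedModel directory)
    # into an in-memory zip archive and return the raw bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for root, _, files in os.walk(dir_path):
            for name in files:
                full = os.path.join(root, name)
                archive.write(full, os.path.relpath(full, dir_path))
    return buf.getvalue()


def deserialize_bytes_to_dir(data, dir_path):
    # Unpack bytes produced above back into a directory tree,
    # from which the model could then be reloaded.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dir_path)
```

Once a directory tree can round-trip through bytes, pickling reduces to handing those bytes to a reconstructor function.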

"""Tests pickle protocol support."""

@keras_parameterized.run_all_keras_modes
fchollet (Member):

This parameterization isn't useful here.

adriangb (Contributor, author):

Ok, removed


@keras_parameterized.run_all_keras_modes
def test_pickle_model(self):
"""Test copy.copy, copy.deepcopy and pickle on Functional Model."""
fchollet (Member):

Let's test all model types (Sequential, Functional, subclass). We have a parameterization for that. @keras_parameterized.run_with_all_model_types. See examples.

adriangb (Contributor, author):

I attempted to incorporate this parametrization based on other examples, but I'm not 100% sure I got it right.

adriangb (Contributor, author):

It seems to be working, I'm seeing tests run for sequential, subclass, etc.
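The round-trip the test exercises can be sketched as a small helper. This is a guess at the shape of the test's `roundtrip` utility (the real test applies it to Keras models built by the parameterization, not plain objects):

```python
import copy
import pickle


def roundtrip(obj):
    # Chain copy.copy, copy.deepcopy and every pickle protocol, so a
    # single pass exercises all supported duplication paths.
    obj = copy.copy(obj)
    obj = copy.deepcopy(obj)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        obj = pickle.loads(pickle.dumps(obj, protocol=protocol))
    return obj
```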

fchollet (Member) left a comment:

LGTM, thank you!

@fchollet fchollet added the ready to pull Ready to be merged into the codebase label Jun 20, 2021
@fchollet fchollet added ready to pull Ready to be merged into the codebase and removed ready to pull Ready to be merged into the codebase labels Jun 20, 2021
fchollet (Member):

I'm actually seeing an error in the subclass model case:

Traceback (most recent call last):
  File "<embedded stdlib>/copyreg.py", line 69, in _reduce_ex
    getstate = self.__getstate__
AttributeError: 'ObjectIdentityDictionary' object has no attribute '__getstate__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "absl/testing/parameterized.py", line 316, in bound_param_test
    return test_method(self, *testcase_params)
  File "keras/keras_parameterized.py", line 284, in decorated
    _test_subclass_model_type(f, self, *args, **kwargs)
  File "keras/keras_parameterized.py", line 301, in _test_subclass_model_type
    f(test_or_class, *args, **kwargs)
  File "keras/saving/pickle_utils_test.py", line 47, in test_pickle_model
    model = roundtrip(original_model)
  File "keras/saving/pickle_utils_test.py", line 38, in roundtrip
    model = pickle.loads(pickle.dumps(model, protocol=protocol))
  File "<embedded stdlib>/copyreg.py", line 72, in _reduce_ex
    raise TypeError("a class that defines __slots__ without "
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

ObjectIdentityDictionary appears to be a class defined in tensorflow/python/util/object_identity.py. It may be that we have to fix it there first before we can proceed with this PR.

fchollet (Member):

Correction: this object is actually replicated in keras/utils/object_identity.py.

adriangb (Contributor, author) commented Jun 22, 2021:

Yeah, I see that as well.

Would you propose implementing __{get,set}state__ for those objects? If it's just those, that seems reasonable.

I think another alternative might be to require pickle protocol >=3 (default as of Python 3.4 I believe); I'll test this and report back.

fchollet (Member):

> Would you propose implementing __{get,set}state__ for those objects? If it's just those, that seems reasonable.

Yes, I think it would be straightforward. We'd have to do it for ObjectIdentityDictionary, _ObjectIdentityWrapper, ObjectIdentitySet. Some have weakrefs so it will require a little bit of thinking but still very straightforward.

Changes would have to be replicated in the TF versions of these objects in a separate PR (for consistency).

> I think another alternative might be to require pickle protocol >=3 (default as of Python 3.4 I believe); I'll test this and report back.

That is fine too if that works.
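For a `__slots__` class like ObjectIdentityDictionary, the state hooks fchollet describes would look roughly like this. `SlottedMapping` is a toy stand-in of my own; the real objects also wrap keys for identity semantics and some hold weakrefs, which need extra care.

```python
import pickle


class SlottedMapping:
    # Toy stand-in for ObjectIdentityDictionary: because it defines
    # __slots__ (and hence has no __dict__), older pickle protocols
    # need explicit state hooks to serialize it.
    __slots__ = ("_storage",)

    def __init__(self):
        self._storage = {}

    def __setitem__(self, key, value):
        self._storage[key] = value

    def __getitem__(self, key):
        return self._storage[key]

    def __getstate__(self):
        # Return the slot contents explicitly, since there is no __dict__.
        return {"_storage": self._storage}

    def __setstate__(self, state):
        self._storage = state["_storage"]


m = SlottedMapping()
m["lr"] = 0.01
restored = pickle.loads(pickle.dumps(m, protocol=1))
```

Without the two hooks, protocol-1 pickling of such a class is exactly what raises the "a class that defines __slots__ without defining __getstate__ cannot be pickled" error from the traceback above.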

@fchollet fchollet self-assigned this Jun 22, 2021
adriangb (Contributor, author):
> require pickle protocol >=3

> implementing __{get,set}state__ for those objects

Both of these solutions led to the same problem: somewhere, a weakref object is being pickled (or an attempt is made to pickle one). I can't tell where, because this happens inside cPickle, so there is no traceback. I haven't had any luck setting up a test.py to debug manually in pdb or otherwise; it seems that some protobuf compiling and such is needed, which I guess Bazel does automatically.

But maybe let's think higher level for a second: this is only happening for subclassed models, and only for untrained models (as per this PR, if Model.built is False copying/pickling is delegated to object because SavedModel doesn't support unbuilt models; please correct me if this is wrong). Why would subclassed models behave any differently than non-subclassed models? That is, I'd expect:

class MyModel(keras.Model):
    ...

To behave just like keras.Model. So there must be some other stuff going on with the subclassing?

fchollet (Member):

> Both of these solutions led to the same problem

Did you apply the fix to all 3 objects I listed? I really do expect that 3 objects are the only problem.

> But maybe let's think higher level for a second: this is only happening for subclassed models, and only for untrained models (as per this PR, if Model.built is False copying/pickling is delegated to object because SavedModel doesn't support unbuilt models

I would suggest simply raising a clear error message when people try to pickle an unbuilt model. That would solve the issue.

fchollet (Member) commented Jul 4, 2021

adriangb (Contributor, author) commented Jul 4, 2021:

Apologies, I didn't run the entire test suite before pushing.

It looks like the failure is coming from here:

class MyModel(keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_variable = tf.Variable(0.0, trainable=False)
        self.layer = keras.layers.Dense(4)

    def call(self, obs):
        return self.layer(obs)

model = MyModel()
model.my_variable.assign_add(1.0)
new_model = copy.deepcopy(model)

Please correct me if I am misinterpreting the logs.

The above test instantiates a subclassed model, then tries to copy.deepcopy it without fitting or calling .build.
Before this PR, that would implicitly fall back to object.__reduce__, but with this PR it is now routed to SavedModel.
But as discussed above, SavedModel does not support unbuilt models.
And, also as discussed in #14748 (comment), object.__reduce__ does not support all types of models.

In my opinion, the existing test is not great; it is testing an API that only partially works, and was never explicitly supported/implemented. I would remove that test and only explicitly support built models, as discussed in #14748 (comment). But perhaps I am missing some of the finer details.

What do you think @fchollet ?

fchollet (Member) commented Jul 5, 2021:

> Before this PR, that would implicitly fall back to object.__reduce__, but with this PR it is now routed to SavedModel.

Can't we just fall back to object.__reduce__ when the model is unbuilt? That way we preserve the existing behavior, while adding robust support for pickling / copying built models.

It would be bad practice to break an existing, tested behavior. Given how extensively Keras is used at Google this would be pretty much guaranteed to break some internal builds, which we'd have to resolve on our end before merging the PR, which could significantly delay merging the PR.

adriangb (Contributor, author) commented Jul 5, 2021:

> Can't we just fall back to object.__reduce__ when the model is unbuilt?

That is what we were doing, but object.__reduce__ doesn't support models that have those weakref-based wrappers (which I guess are part of the parametrized tests but not the existing copy.copy test that is now failing), which is why back at #14748 (comment) we had decided to only support built models.

In other words, the existing test (keras/keras/tests/model_subclassing_test.py -> test_deepcopy) is testing with a class of model that seems to work fine with object.__reduce__, but the naïve behavior of object.__reduce__ can't support all Keras models, since we already know that some of the ones generated by parametrization can't be copied via object.__reduce__.

At least that is my understanding of the situation.

There are 3 options I can think of:

  1. Don't support unbuilt models and remove the existing test. As you say above, this is probably a bad idea.
  2. Revert this PR back to 28c1187 (when we were supporting unbuilt models) and exclude the subclassed parametrization in the new test, maybe manually testing a simple subclassed model like the one in keras/keras/tests/model_subclassing_test.py -> test_deepcopy.
  3. Revert this PR back to 28c1187 and fix the pickling of those wrapper objects and any other issues that get uncovered once that's fixed (which I attempted to do and failed, but can try again).

fchollet (Member) commented Jul 7, 2021:

> Revert this PR back to 28c1187 and fix the pickling of those wrapper objects and any other issues that get uncovered once that's fixed (which I attempted to do and failed, but can try again).

This is the more robust solution of the lot, and I believe it is quite doable -- adding support for the list of objects I mentioned should be enough.

If you try this and it fails, I would recommend falling back to:

> exclude the subclassed parametrization in the new test, maybe manually testing a simple subclassed model like the one in keras/keras/tests/model_subclassing_test.py -> test_deepcopy.

This is an acceptable option because, for the models that fail, the user will see an explicit error message: "your model contains these weird objects that can't be pickled." The user will be left wondering, "wait, why?", but hopefully this will only happen for a small set of models.

adriangb (Contributor, author):
@fchollet I think the weak refs can be worked around by doing something like*

class Model:

    def __getstate__(self):
        state = super().__getstate__()
        state.pop("_compiled_trainable_state", None)
        state.pop("_trackable_saver", None)
        return state

    def __setstate__(self, state):
        super().__setstate__(state)
        self._reset_compile_cache()
        self._trackable_saver = saver_with_op_caching(self)

I'm not sure if this is safe or not, I do not have the context for what the expected behavior/use of this data is.
For what it's worth, bazel test -c opt -- //keras/saving/... //keras/engine/... //keras/tests/... still passes all tests.

In any case, this just led me down a rabbit hole of more unpicklable things around the Keras/TensorFlow codebase:

And some more that I lost track of.
For those two, the fix consists of moving the locals to the module level.
For the Metric issue, maybe using keras.metrics.serialize would work.
Unfortunately, I do not have time to keep digging deeper and make multiple PRs across both repos.
And I'd also fear breaking something that might not even be tested, these fixes start to get deep into implementation details which I'm afraid may only be obvious to those with extensive experience in the codebase.

Thus, I would like to propose that we scope/implement this PR as follows:

  • Built models can be copied, deepcopied and pickled (backed by SavedModel)
  • Unbuilt models can be copied and deepcopied, but we make no promises about pickling (this will depend on what losses / optimizers / etc. the model is constructed with).

I think this should take care of the most common use cases without breaking any existing use cases or tests, while also leaving the door open for those loose ends around the codebase to be tied up so that Keras can promise picklability of unbuilt models in the future.

dca259a implements this proposal, and passes bazel test -c opt -- //keras/saving/... //keras/engine/... //keras/tests/... locally for me.

* I did not push this because it is not needed unless we wanted to attempt to support pickling of unbuilt models in this PR, which I am proposing we do not.

fchollet (Member) left a comment:

Thanks for the update!

> I think the weak refs can be worked around by doing something like*

I believe this would invalidate model compilation (you'd have to recompile the model after copy). Which of course would not matter for an unbuilt model. Overall the workaround looks mysterious/complex so it's better not to do it, for the sake of future maintainability.

> Thus, I would like to propose that we scope/implement this PR as follows:

That sounds good to me.

# it _may_ be possible to serialize as a plain Python object,
# as long as the constituent parts (layers, optimizers, losses, etc.)
# can be serialized as plain Python objects.
# Thus we call up the MRO to get an implementation of __reduce__
fchollet (Member):

MRO?

adriangb (Contributor, author):

MRO == Method Resolution Order here, but I admit it's probably not the best term to use, much less abbreviated.

How about "Thus we call up the superclass hierarchy to get an implementation of __reduce__"?
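The delegation being discussed — custom serialization for built models, the inherited behavior otherwise — can be sketched with a toy class. `FakeModel` and `_rebuild` are illustrative names of my own; in the actual PR, the built path round-trips through SavedModel bytes rather than pickling the instance dict.

```python
import pickle


def _rebuild(payload):
    # Reconstructor for the "built" path; stands in for deserializing
    # SavedModel bytes back into a model. Must live at module level so
    # pickle can locate it by name.
    model = FakeModel()
    model.__dict__.update(pickle.loads(payload))
    return model


class FakeModel:
    def __init__(self):
        self.built = False
        self.weights = None

    def build(self):
        self.built = True
        self.weights = [1.0, 2.0]

    def __reduce__(self):
        if self.built:
            # Built models: custom reconstructor plus a byte payload.
            return (_rebuild, (pickle.dumps(self.__dict__),))
        # Unbuilt models: call up the superclass hierarchy for the
        # default object implementation of __reduce__.
        return super().__reduce__()
```

Because `super().__reduce__()` preserves the stock behavior, unbuilt models still copy and deepcopy exactly as they did before the change, which is the scoping adriangb proposed above.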

Additional review threads on keras/engine/training.py and keras/saving/pickle_utils.py were marked outdated and resolved.
adriangb (Contributor, author) commented Jul 12, 2021:

@fchollet thank you for that last round of review. I pushed the requested changes in 8a96eed

fchollet (Member) left a comment:

LGTM, thanks!

@fchollet fchollet added kokoro:force-run and removed ready to pull Ready to be merged into the codebase labels Jul 13, 2021
@qlzh727 qlzh727 added the ready to pull Ready to be merged into the codebase label Jul 15, 2021
@copybara-service copybara-service bot merged commit a86bc99 into keras-team:master Jul 16, 2021
@adriangb adriangb deleted the make-model-picklable branch July 16, 2021 18:34
Labels: cla: yes · ready to pull (Ready to be merged into the codebase)

5 participants