Cannot save optimizer weights due to h5 error "object header message is too large" #11104

Closed
4 tasks
sakvaua opened this issue Sep 7, 2018 · 5 comments

sakvaua commented Sep 7, 2018

When I try to save my model, I get the runtime error below. A similar issue occurred when model layer names were too long, and it could be solved by giving the layers shorter names. This time the error appears when saving the optimizer weights. getattr(model.optimizer,'weights') shows:

[<tf.Variable 'Adam/iterations:0' shape=() dtype=int64_ref>,
 <tf.Variable 'training/Adam/Variable:0' shape=(3, 3, 1, 64) dtype=float32_ref>,
 <tf.Variable 'training/Adam/Variable_1:0' shape=(64,) dtype=float32_ref>,
 <tf.Variable 'training/Adam/Variable_2:0' shape=(64,) dtype=float32_ref>,
...]

If I convert the list of weight names to a numpy array, its size exceeds the 64k limit, which triggers the h5 runtime error. I can save the model with save_model(....,include_optimizer=False), but I need the optimizer state. Is there any way to shorten the "training/Adam/Variable:0"... names so that they fit into the 64k HDF5 limit? Thanks.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-130-d231b4a5a40c> in <module>()
----> 1 model.save('model')

C:\Anaconda3\lib\site-packages\keras\engine\network.py in save(self, filepath, overwrite, include_optimizer)
   1083             raise NotImplementedError
   1084         from ..models import save_model
-> 1085         save_model(self, filepath, overwrite, include_optimizer)
   1086 
   1087     def save_weights(self, filepath, overwrite=True):

C:\Anaconda3\lib\site-packages\keras\engine\saving.py in save_model(model, filepath, overwrite, include_optimizer)
    173                     #print('Weight names',weight_names,len(weight_names),np.asarray(weight_names).nbytes)
    174                     optimizer_weights_group.attrs[
--> 175                         'weight_names'] = weight_names
    176                     for name, val in zip(weight_names, weight_values):
    177                         param_dset = optimizer_weights_group.create_dataset(

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

C:\Anaconda3\lib\site-packages\h5py\_hl\attrs.py in __setitem__(self, name, value)
     93         use the methods create() and modify().
     94         """
---> 95         self.create(name, data=value, dtype=base.guess_dtype(value))
     96 
     97     @with_phil

C:\Anaconda3\lib\site-packages\h5py\_hl\attrs.py in create(self, name, data, shape, dtype)
    186 
    187             try:
--> 188                 attr = h5a.create(self._id, self._e(tempname), htype, space)
    189             except:
    190                 raise

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5a.pyx in h5py.h5a.create()

RuntimeError: Unable to create attribute (object header message is too large)


@jlherren
Contributor

jlherren commented Sep 8, 2018

Do I understand correctly that you're working with a model that has quite a large number of weights? I'm guessing more than 2200 or so. Unfortunately, it doesn't look like the names of these variables can be controlled, except for the name of the optimizer. So if you exceed the 64k limit by only a small amount, you could perhaps save 3 bytes per variable by doing the following, but it really won't save you much:

from keras.optimizers import Adam

class A(Adam):
    pass
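
For a rough sense of where the limit bites, here is a minimal sketch. It assumes the names are stored as a fixed-width byte-string array (which is what np.array(weight_names).nbytes measures) and assumes a flat 64 KiB budget that ignores any HDF5 header overhead. The helper names and the naming pattern are illustrative, modeled on the variable names quoted in the report.

```python
import itertools

LIMIT = 64 * 1024  # assumed budget in bytes; real HDF5 overhead is ignored


def optimizer_weight_names(optimizer='Adam'):
    """Yield names following the pattern in the report (hypothetical helper)."""
    yield ('%s/iterations:0' % optimizer).encode('utf8')
    for i in itertools.count():
        suffix = '' if i == 0 else '_%d' % i
        yield ('training/%s/Variable%s:0' % (optimizer, suffix)).encode('utf8')


def max_weights(optimizer='Adam', limit=LIMIT):
    """Largest number of names whose fixed-width array stays within limit."""
    width = count = 0
    for name in optimizer_weight_names(optimizer):
        # A fixed-width array costs (number of names) * (longest name).
        width = max(width, len(name))
        if (count + 1) * width > limit:
            return count
        count += 1


print('Adam:', max_weights('Adam'))
print('A:   ', max_weights('A'))
```

Under these assumptions the cutoff for Adam lands a little above 2200 variables, and the one-letter class name only raises it by a few hundred.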

While you wait for this to be fixed properly, you could call save_model(..., include_optimizer=False) as you described, then re-open the file with h5py and use your own custom optimizer-saving code that doesn't have this problem. I regularly save additional data into h5py files and it works great.
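
One possible shape for such custom saving code is to split the name list across several smaller attributes. The following is only a sketch under stated assumptions: the helper names, the numbered attribute keys and the 64512-byte budget are all hypothetical, and `group` is any object with a dict-like `.attrs` (an h5py group is the intended target, but a stand-in class is used here so the sketch runs without h5py).

```python
HEADER_LIMIT = 64512  # assumed safe budget, a little under the 64 KiB cap


def fixed_width_nbytes(names):
    """Bytes used when names are stored as a fixed-width string array."""
    return len(names) * max(len(n) for n in names) if names else 0


def split_names(names, limit=HEADER_LIMIT):
    """Greedily split names into chunks whose fixed-width size fits limit."""
    chunks, current, width = [], [], 0
    for name in names:
        if len(name) > limit:
            raise ValueError('a single name exceeds the limit')
        new_width = max(width, len(name))
        if current and (len(current) + 1) * new_width > limit:
            chunks.append(current)
            current, new_width = [], len(name)
        current.append(name)
        width = new_width
    if current:
        chunks.append(current)
    return chunks


def save_names(group, key, names, limit=HEADER_LIMIT):
    """Write names as key0, key1, ... attributes; return the chunk count."""
    chunks = split_names(names, limit)
    for i, chunk in enumerate(chunks):
        group.attrs['%s%d' % (key, i)] = chunk
    return len(chunks)


def load_names(group, key):
    """Read back the chunks written by save_names, in order."""
    names, i = [], 0
    while '%s%d' % (key, i) in group.attrs:
        names.extend(group.attrs['%s%d' % (key, i)])
        i += 1
    return names


class FakeGroup:
    """Stand-in for an h5py group, used only for this demonstration."""
    def __init__(self):
        self.attrs = {}


names = [('training/Adam/Variable_%d:0' % i).encode('utf8')
         for i in range(3000)]
group = FakeGroup()
n_chunks = save_names(group, 'weight_names', names)
print(fixed_width_nbytes(names), 'bytes split into', n_chunks, 'attributes')
```

The weight values themselves are not the problem, since they are written as datasets; only the name list stored as an attribute runs into the header limit.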

gabrieldemarmiesse added the "To investigate" label Sep 9, 2018
@gabrieldemarmiesse
Contributor

Could you provide a minimal script to help us narrow down the possible bug? Thank you.

@sakvaua
Author

sakvaua commented Sep 10, 2018

Here is sample code that reproduces the error. It takes quite a while, though: about 10 minutes to save. I'm not sure why.

import numpy as np
from keras.models import Model, Input
from keras.layers import Conv2D, Concatenate, GlobalAveragePooling2D
from keras.optimizers import Adam

inp = Input(shape=(10, 10, 1))

layersC = []
for i in range(10):
    layers = []
    for j in range(40):
        x = Conv2D(1, (1, 1))(inp)
        layers.append(x)
    layersC.append(Concatenate()(layers))

out = Concatenate()(layersC)
out = Conv2D(1, (1, 1))(out)
out = GlobalAveragePooling2D()(out)
m = Model(inputs=inp, outputs=out)
m.compile(optimizer=Adam(1e-4), loss='mse')
m.summary()
x = np.array(np.random.normal(size=(100, 10, 10, 1), loc=0, scale=1))
y = np.array(np.random.normal(size=(100, 1), loc=0, scale=1))
m.fit(x=x, y=y)

symbolic_weights = getattr(m.optimizer, 'weights')
if symbolic_weights:
    weight_names = []
    for i, w in enumerate(symbolic_weights):
        if hasattr(w, 'name') and w.name:
            name = str(w.name)
        else:
            name = 'param_' + str(i)
        weight_names.append(name.encode('utf8'))
print(np.array(weight_names).nbytes)

m.save('model')

@gabrieldemarmiesse
Contributor

I could indeed reproduce the issue with this script. Thanks for the detailed report.

gabrieldemarmiesse added the "type:bug/performance" label and removed the "To investigate" label Sep 10, 2018
@gabrieldemarmiesse
Contributor

gabrieldemarmiesse commented Sep 10, 2018

Linked to issue #6766. PR welcome.
