Dictionary changed size during iteration #1201

Closed
cinjon opened this issue May 21, 2018 · 9 comments

cinjon commented May 21, 2018

  • Tensorboard 1.8.0
  • OS: Ubuntu 16.04
  • Python 3.6

I'm getting this RuntimeError. There doesn't seem to be a way for the tag_to_content to change though.

E0521 09:37:15.386984 Thread-3 _internal.py:88] Error on request:
Traceback (most recent call last):
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/werkzeug/serving.py", line 270, in run_wsgi
    execute(self.server.app)
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/werkzeug/serving.py", line 258, in execute
    application_iter = app(environ, start_response)
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/tensorboard/backend/application.py", line 272, in __call__
    return self.data_applications[clean_path](environ, start_response)
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/werkzeug/wrappers.py", line 308, in application
    resp = f(*args[:-2] + (request,))
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/tensorboard/plugins/scalar/scalars_plugin.py", line 185, in tags_route
    index = self.index_impl()
  File "/private/home/cinjon/anaconda3/lib/python3.6/site-packages/tensorboard/plugins/scalar/scalars_plugin.py", line 117, in index_impl
    for (tag, content) in six.iteritems(tag_to_content):
RuntimeError: dictionary changed size during iteration

cinjon commented May 21, 2018

I'm still baffled about what is mutating this. Additionally, it appears that content (as assigned in content = metadata.parse_plugin_metadata(content)) is not used.


cinjon commented May 21, 2018

I also tried this with python 3.5 and got the same error.

Contributor

jart commented May 21, 2018

That definitely looks like a bug. Chances are it's not being mutated by that function, but rather by the thread that's reading for new events. In Python 2 the solution would be simple: call items() rather than iteritems(). But this behavior flipped in Python 3, and I'm not sure if six provides an abstraction for it. In either case, there could be a thread race condition. @nfelt could you take a look?
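For illustration, the same error can be reproduced without any threads: inserting into a dict while iterating over it is enough. The sketch below is a minimal example, not TensorBoard code; the function names and the small dict standing in for tag_to_content are made up. It also shows that iterating over a snapshot (list(d.items())) avoids the error in the single-threaded case, though that alone does not rule out a race with a writer thread.

```python
def iterate_live(tag_to_content):
    """Iterates the live dict while inserting into it.

    Returns the error message if a RuntimeError is raised, else None.
    """
    try:
        for tag, content in tag_to_content.items():
            # Stand-in for the reload thread adding a new tag mid-iteration.
            tag_to_content[tag + "/new"] = content
    except RuntimeError as e:
        return str(e)
    return None


def iterate_snapshot(tag_to_content):
    """Iterates a list snapshot, so inserts do not disturb the loop."""
    seen = []
    for tag, content in list(tag_to_content.items()):
        tag_to_content[tag + "/new"] = content
        seen.append(tag)
    return seen
```

On Python 3, iterate_live should hit the RuntimeError from the traceback above, while iterate_snapshot visits only the tags that existed when the snapshot was taken.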


cinjon commented May 21, 2018

A deepcopy on tag_to_content seems to fix it, although wow is this going slow now.

Contributor

jart commented May 21, 2018

Yes there's a lot of stuff in there. Adding a lock around the precise data structure is likely what's required. I'm surprised we haven't encountered this before.

Contributor

nfelt commented May 21, 2018

I actually have seen this crop up before, though it's generally been hard to reproduce.

I think a sufficient fix would be adding another set of locks, one per value of PluginEventAccumulator._plugin_to_tag_to_content (where the value is a dict of tag to content). Then we acquire the relevant lock:

  1. when adding a new tag to a tag-to-content dict:
      if tag not in self.summary_metadata:
        self.summary_metadata[tag] = value.metadata
        plugin_data = value.metadata.plugin_data
        if plugin_data.plugin_name:
          self._plugin_to_tag_to_content[plugin_data.plugin_name][tag] = (
              plugin_data.content)
  2. when fetching the tag-to-content dict for a given plugin:
      def PluginTagToContent(self, plugin_name):
        """Returns a dict mapping tags to content specific to that plugin.

        Args:
          plugin_name: The name of the plugin for which to fetch plugin-specific
            content.

        Raises:
          KeyError: if the plugin name is not found.

        Returns:
          A dict mapping tags to plugin-specific content (which are always strings).
          Those strings are often serialized protos.
        """
        if plugin_name not in self._plugin_to_tag_to_content:
          raise KeyError('Plugin %r could not be found.' % plugin_name)
        return self._plugin_to_tag_to_content[plugin_name]

I think this shouldn't be too much of a performance hit, and hopefully less of one than a deepcopy on every read. I also don't think a deepcopy would 100% prevent the issue: I don't believe deepcopy is atomic in Python, so it could still hit the race condition, just with a smaller window for it.
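The locking idea above could be sketched roughly as follows. This is only an illustration under assumptions, not TensorBoard's actual code: TagContentStore, its method names, and the extra guard lock around the outer dict are all hypothetical. Each plugin gets its own lock, writes take that lock, and reads return a shallow copy made while holding it, so callers iterate a snapshot instead of the live dict.

```python
import threading


class TagContentStore(object):
    """Hypothetical stand-in for the plugin -> tag -> content mapping."""

    def __init__(self):
        # Maps plugin name to a (lock, tag_to_content dict) pair.
        self._entries = {}
        # Guards the outer dict itself (an assumption of this sketch).
        self._guard = threading.Lock()

    def _entry(self, plugin_name, create):
        with self._guard:
            if plugin_name not in self._entries:
                if not create:
                    raise KeyError(
                        'Plugin %r could not be found.' % plugin_name)
                self._entries[plugin_name] = (threading.Lock(), {})
            return self._entries[plugin_name]

    def add(self, plugin_name, tag, content):
        """Called by the writer (reload) thread when a new tag shows up."""
        lock, tag_to_content = self._entry(plugin_name, create=True)
        with lock:
            tag_to_content[tag] = content

    def plugin_tag_to_content(self, plugin_name):
        """Returns a snapshot copy; raises KeyError for unknown plugins."""
        lock, tag_to_content = self._entry(plugin_name, create=False)
        with lock:
            # Shallow copy taken under the lock, so no write can interleave.
            return dict(tag_to_content)
```

A shallow copy is used here because the contents are described as strings, which are immutable; it should be much cheaper than a deepcopy, though it still costs time proportional to the number of tags on every read.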


tothovak commented Jun 1, 2018

Is there any way around this? TensorBoard was running for me for a good 2 weeks, then this problem started happening and it won't work anymore. I can't really downgrade to 1.7 as I am using TF 1.8 functionality.

Contributor

nfelt commented Jun 5, 2018

For now, one workaround could be setting the background reload interval via --reload_interval to something higher than the default (5 seconds), which should at least make this happen less often.

Contributor

nfelt commented Jun 11, 2018

This should be fixed by #1235 and will be in the 1.9 release of TensorBoard.
