
Remove expensive plugin initialization code #1114

Merged 1 commit into master on May 17, 2018
Conversation

@jart (Contributor) commented Apr 9, 2018

This change does some code cleanup on Beholder to fix #1107, where invoking
TensorFlow routines caused a nontrivial amount of GPU to be reserved.

@jart jart requested a review from nfelt April 9, 2018 20:22
@nfelt (Contributor) left a comment:


So this change looks like it's mostly refactoring to keep this code from running at construction time. Does that fix #1107? If it does, does that mean TensorBoard will start using the GPU again if somebody activates Beholder (or anything else using encode_png(), for that matter) and the PersistentOpEvaluator config doesn't actually work?

@@ -137,8 +135,7 @@ def _serve_change_config(self, request):
   def _serve_section_info(self, request):
     path = '{}/{}'.format(
         self.PLUGIN_LOGDIR, shared_config.SECTION_INFO_FILENAME)
-    info = file_system_tools.read_pickle(path, default=self.most_recent_info)
-    self.most_recent_info = info
+    info = file_system_tools.read_pickle(path, default=DEFAULT_INFO)
@nfelt:
Again, I'm not sure why we're removing the "cache last data shown" behavior. It's not really necessary to fix this bug: we could still keep most_recent_info, default to None here, and then

if info is None:
  info = self.most_recent_info if self.most_recent_info is not None else DEFAULT_INFO
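The suggestion above is a read-with-cached-fallback pattern: serve fresh data when it can be read, fall back to the last value shown, and only then to a static default. A minimal, Beholder-independent sketch of that pattern (the read_pickle helper here is a hypothetical stand-in for file_system_tools.read_pickle, and DEFAULT_INFO is a placeholder default):

```python
import pickle

DEFAULT_INFO = {'sections': []}  # hypothetical placeholder default

def read_pickle(path, default=None):
    # Mirrors the read_pickle contract: return `default` when the file
    # is missing or unreadable instead of raising.
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (IOError, OSError, EOFError, pickle.UnpicklingError):
        return default

class SectionInfoServer(object):
    def __init__(self):
        self.most_recent_info = None  # cache of the last data shown

    def serve_section_info(self, path):
        info = read_pickle(path, default=None)
        if info is None:
            # Fall back to the cached value, then to the static default.
            info = (self.most_recent_info
                    if self.most_recent_info is not None else DEFAULT_INFO)
        else:
            self.most_recent_info = info
        return info
```

With this shape, a transient read failure keeps showing the previous data on screen rather than snapping back to the default.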

@nfelt:

Could we preserve the caching behavior here too if we're doing it for the frame? Right now init is storing most_recent_info but it's never being used. I think just reverting the delta at these lines would suffice.

@jart:
Done

except (message.DecodeError, IOError, tf.errors.NotFoundError):
return self.most_recent_frame

with self._lock: # only to put boundaries on workload for now
@nfelt:
I don't understand this comment: what does the lock accomplish? This code should be called synchronously from the _frame_generator() helper, so the locking would only come into play if we were somehow streaming via two requests at once, which shouldn't normally be possible in the front-end logic. Also, if we did want to bound the workload during multiple streams, it would make more sense to run the frame-fetching logic in a single thread. As written, two request threads would compete to fetch frames, so you might end up round-robining frames between the two streams, which seems very weird.

@jart:
_frame_generator() is an endpoint for a multi-threaded HTTP server. I agree that a single thread should compute this for all viewers, but it might not be worth the effort quite yet, since there will likely only be a single viewer. We don't yet have the resources to make Beholder scale, but we will in the future.
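The single-thread design discussed in this thread could be sketched as follows: one background thread polls for new frames, and any number of streaming request threads read the latest one. This is a hedged sketch of the idea, not Beholder's actual implementation; fetch_frame is a hypothetical stand-in for the real frame-reading logic.

```python
import threading

class FrameBroker(object):
    """One poller thread fetches frames; many reader threads share the result."""

    def __init__(self, fetch_frame, interval_secs=0.1):
        self._fetch_frame = fetch_frame  # callable returning the newest frame
        self._interval = interval_secs
        self._latest = None
        self._cond = threading.Condition()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()

    def _poll(self):
        # Only this thread ever fetches, so concurrent streams never
        # compete for (or round-robin) frames.
        while not self._stopped.is_set():
            frame = self._fetch_frame()
            with self._cond:
                self._latest = frame
                self._cond.notify_all()  # wake any streaming request threads
            self._stopped.wait(self._interval)

    def latest_frame(self, timeout=1.0):
        # Called from request threads; blocks until a first frame exists.
        with self._cond:
            if self._latest is None:
                self._cond.wait(timeout)
            return self._latest
```

Every viewer then sees the same frame sequence, and the fetch workload is bounded by the single poller regardless of how many streams are open.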

return frame

except (message.DecodeError, IOError, tf.errors.NotFoundError):
return self.most_recent_frame
@nfelt:
Why are we removing most_recent_frame? That's a behavior change (missing data now immediately goes blank instead of preserving the last frame and section info on screen), and it's not clear to me that this is better for users. It doesn't seem necessary for fixing the initialization logic; you can still keep most_recent_frame and, if it's unset, default to the no-data PNG.

@jart:
Fixed.

@jart force-pushed the disable-gpu branch 2 times, most recently from 8a5d91c to 769dd26 on April 16, 2018 23:26
@jart:
PTAL @nfelt


@jart commented Apr 26, 2018

PTAL @nfelt; this is blocking #1155.

@@ -419,7 +419,8 @@ def _lazily_initialize(self):
     graph = tf.Graph()
     with graph.as_default():
       self.initialize_graph()
-    self._session = tf.Session(graph=graph)
+    config = tf.ConfigProto(device_count={'GPU': 0})
@nfelt:
Maybe add a comment here noting that we're deliberately keeping this off the GPU?

@jart:
Done
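For context on this thread: the diff above pins the session to the CPU with tf.ConfigProto(device_count={'GPU': 0}). A related, process-wide alternative is to hide the devices from CUDA entirely via the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch of that alternative approach (not what this PR does):

```python
import os

def hide_gpus():
    # CUDA treats an empty CUDA_VISIBLE_DEVICES as "no devices visible",
    # so any TensorFlow session created afterwards falls back to the CPU.
    # This must run before the process creates its first CUDA context.
    os.environ['CUDA_VISIBLE_DEVICES'] = ''

hide_gpus()
```

The per-session ConfigProto approach used in the PR is more surgical, since it keeps other sessions in the same process free to use the GPU.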

@jart jart merged commit 2b7bb8b into tensorflow:master May 17, 2018
@jart jart deleted the disable-gpu branch May 17, 2018 23:50
Successfully merging this pull request may close these issues.

TensorBoard 1.7.0 is reserving GPU