Remove expensive plugin initialization code #1114
Conversation
So this change looks like it's mostly refactoring to keep this code from running at construction time. Does that fix #1107? If it does, does this mean TensorBoard will start using GPU again if somebody activates Beholder (or anything else using encode_png() for that matter) and the PersistentOpEvaluator config doesn't actually work?
@@ -137,8 +135,7 @@ def _serve_change_config(self, request):
  def _serve_section_info(self, request):
    path = '{}/{}'.format(
        self.PLUGIN_LOGDIR, shared_config.SECTION_INFO_FILENAME)
    info = file_system_tools.read_pickle(path, default=self.most_recent_info)
    self.most_recent_info = info
    info = file_system_tools.read_pickle(path, default=DEFAULT_INFO)
Again, I'm not sure why we're removing the "cache last data shown" behavior. It's not really necessary to fix this bug; we could still keep most_recent_info, just default to None here, and then:

    if info is None:
      info = self.most_recent_info if self.most_recent_info is not None else DEFAULT_INFO
Could we preserve the caching behavior here too if we're doing it for the frame? Right now init is storing most_recent_info but it's never being used. I think just reverting the delta at these lines would suffice.
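For concreteness, here is a minimal sketch of the caching fallback being suggested. The `read_pickle` stub, the `DEFAULT_INFO` value, and the `SectionInfoServer` class are hypothetical stand-ins, not Beholder's actual code:

```python
import pickle

DEFAULT_INFO = []  # hypothetical placeholder for the plugin's default section info


def read_pickle(path, default=None):
    # Stub standing in for file_system_tools.read_pickle: returns `default`
    # when the file is missing or unreadable.
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (IOError, OSError, EOFError):
        return default


class SectionInfoServer(object):
    def __init__(self):
        self.most_recent_info = None  # cache of the last info successfully read

    def serve_section_info(self, path):
        info = read_pickle(path, default=None)
        if info is None:
            # Fall back to the cached value first, then to the static default.
            info = (self.most_recent_info
                    if self.most_recent_info is not None else DEFAULT_INFO)
        else:
            self.most_recent_info = info
        return info
```

This keeps the "cache last data shown" behavior while still avoiding any expensive work at construction time, since the cache starts as None and is only populated on successful reads.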
Done
    except (message.DecodeError, IOError, tf.errors.NotFoundError):
      return self.most_recent_frame

    with self._lock:  # only to put boundaries on workload for now
I don't understand this comment: what does the lock accomplish? This code should be called synchronously from the _frame_generator() helper, so the locking would only come into play if we were somehow streaming via two requests at once, which shouldn't normally be possible in the front-end logic. Also, if we did want to bound workload during multiple streams, it would make more sense to run the frame-fetching logic in only a single thread. As written, two request threads would instead compete to fetch frames, so you might end up sort of round-robining frames between the two streams, which seems very weird.
_frame_generator() is an endpoint for a multi-threaded HTTP server. I agree that a single thread should be what's computing this for all viewers. It may not be worth the effort yet, since there will likely be only a single viewer, and we don't yet have the resources to make Beholder scale. But we will in the future.
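One hedged sketch of the single-producer design being discussed: a lone background thread does the expensive frame fetching, and every request thread only reads the shared latest frame, so streams never compete for work. All names here (FrameProducer, fetch_frame) are hypothetical, not Beholder's actual code:

```python
import threading
import time


class FrameProducer(object):
    """One thread fetches frames; any number of streams read the latest."""

    def __init__(self, fetch_frame):
        self._fetch_frame = fetch_frame  # callable doing the expensive work
        self._lock = threading.Lock()
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            frame = self._fetch_frame()  # expensive work happens once, here
            with self._lock:
                self._latest = frame
            time.sleep(0.01)  # pacing; a real plugin would match its FPS

    def latest_frame(self):
        # Request threads only hold the lock long enough to copy a reference,
        # so N concurrent streams add no extra fetching workload.
        with self._lock:
            return self._latest
```

Each streaming request handler would then just poll latest_frame() at its own rate instead of fetching frames itself.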
          return frame

    except (message.DecodeError, IOError, tf.errors.NotFoundError):
      return self.most_recent_frame
Why are we removing most_recent_frame? That's a behavior change (missing data now immediately goes blank instead of preserving the last frame and section info on screen), and it's not clear that this is better for users. It doesn't seem necessary for fixing the initialization logic; you can still keep most_recent_frame and, if it's unset, default to the "no data" PNG.
Fixed.
Force-pushed from 8a5d91c to 769dd26.
PTAL @nfelt
tensorboard/util.py
@@ -419,7 +419,8 @@ def _lazily_initialize(self):
    graph = tf.Graph()
    with graph.as_default():
      self.initialize_graph()
    self._session = tf.Session(graph=graph)
    config = tf.ConfigProto(device_count={'GPU': 0})
Maybe add a comment here noting that we're deliberately keeping this off the GPU?
Done
This change does some code cleanup on Beholder to fix #1107, where invoking
TensorFlow routines caused a nontrivial amount of GPU to be reserved.