-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the /data/plugins_listing response. #625
Labels
Comments
nfelt
added a commit
that referenced
this issue
Oct 19, 2017
This is a preliminary step in addressing #625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.
nfelt
added a commit
that referenced
this issue
Oct 26, 2017
Addresses #625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.
jart
pushed a commit
to jart/tensorboard
that referenced
this issue
Oct 31, 2017
) Addresses tensorflow#625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.
jart
pushed a commit
to jart/tensorboard
that referenced
this issue
Oct 31, 2017
This is a preliminary step in addressing tensorflow#625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.
jart
pushed a commit
that referenced
this issue
Nov 1, 2017
Addresses #625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.
jart
pushed a commit
that referenced
this issue
Nov 1, 2017
This is a preliminary step in addressing #625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.
chihuahua
added a commit
that referenced
this issue
Mar 15, 2018
…st before checking plugin assets. (#1021) We make the text plugin's is_active method return true if it detects relevant data present within the multiplexer. Subsequently, there would be no more need to check for data stored within plugin assets. This solves an issue: The text plugin used to be inactive when the user initially loads into it because the thread for computing whether the plugin is active would not have completed executing yet. Part of #625. Test plan: Run the [text demo](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/text/text_demo.py). Verify behavior. The text plugin should be active on the first load.
I think we can mark this done; we moved the expensive |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
TensorBoard only renders dashboards after it determines which ones are active. To do the latter, TensorBoard issues a request to the
/data/plugins_listing
endpoint.tensorboard/tensorboard/components/tf_tensorboard/tf-tensorboard.html
Line 717 in bb57d4c
Hence, the
/data/plugins_listing
endpoint must return quickly - any time we shave off of the response time maps directly to time reduced for loading TensorBoard.The server logic for that endpoint serially calls the
is_active
method for all plugins (active or not):tensorboard/tensorboard/backend/application.py
Line 227 in bb57d4c
In the past, optimizing the
is_active
method for plugins has made the difference between say a 1s load vs a 30s load ... vs a frontend that fails to load due to outlandish response times.Some Action Items
is_active
methods that check for presence of data should short-circuit when any data is found. See Make scalars is_active short circuit if apt #621 for example.is_active
methods that read from disk should consider not only short-circuiting, but also (1) computing on a separate thread and (2) caching after the plugin is found to be active. See Make is_active for projector return quickly #326 for example. Peruse the profile and text dashboards for potential optimizations.The text was updated successfully, but these errors were encountered: