Optimize the /data/plugins_listing response. #625

chihuahua · 2017-10-05T20:24:53Z

TensorBoard only renders dashboards after it determines which ones are active. To do the latter, TensorBoard issues a request to the /data/plugins_listing endpoint.

tensorboard/tensorboard/components/tf_tensorboard/tf-tensorboard.html

Line 717 in bb57d4c

.then(updateActiveDashboards, onFailure);

Hence, the /data/plugins_listing endpoint must return quickly - any time we shave off of the response time maps directly to time reduced for loading TensorBoard.

The server logic for that endpoint serially calls the is_active method for all plugins (active or not):

tensorboard/tensorboard/backend/application.py

Line 227 in bb57d4c

{plugin.plugin_name: plugin.is_active() for plugin in self._plugins},

In the past, optimizing the is_active method for plugins has made the difference between say a 1s load vs a 30s load ... vs a frontend that fails to load due to outlandish response times.

Some Action Items

is_active methods that check for presence of data should short-circuit when any data is found. See Make scalars is_active short circuit if apt #621 for example.
is_active methods that read from disk should consider not only short-circuiting, but also (1) computing on a separate thread and (2) caching after the plugin is found to be active. See Make is_active for projector return quickly #326 for example. Peruse the profile and text dashboards for potential optimizations.

The text was updated successfully, but these errors were encountered:

This is a preliminary step in addressing #625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.

Addresses #625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.

) Addresses tensorflow#625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.

This is a preliminary step in addressing tensorflow#625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.

Addresses #625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard. Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread. This way the `/tags` route can also benefit from not blocking. The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.

This is a preliminary step in addressing #625 that just adds logging (at INFO) for the amount of time taken by the is_active() call for each plugin loaded by TensorBoard when the plugins listing endpoint is invoked. This way as we make fixes we can monitor for regressions, and it also can help isolate the problematic plugin if we get reports from users that TensorBoard is taking a long time to respond with the list of active plugins (which is a prerequisite for basically any other usage of TensorBoard). I also bundled in a couple small tweaks to other logging lines that can help diagnose slow plugin behavior.

…st before checking plugin assets. (#1021) We make the text plugin's is_active method return true if it detects relevant data present within the multiplexer. Subsequently, there would be no more need to check for data stored within plugin assets. This solves an issue: The text plugin used to be inactive when the user initially loads into it because the thread for computing whether the plugin is active would not have completed executing yet. Part of #625. Test plan: Run the [text demo](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/text/text_demo.py). Verify behavior. The text plugin should be active on the first load.

nfelt · 2019-12-17T23:19:14Z

I think we can mark this done; we moved the expensive is_active() calls into background threads. This introduces new problems as described in #2102 (plugins don't load initially and seem like they're just absent) but we can track that there.

chihuahua added the core:frontend label Oct 5, 2017

nfelt self-assigned this Oct 19, 2017

nfelt removed the core:frontend label Oct 19, 2017

nfelt mentioned this issue Oct 19, 2017

Log time taken by is_active() calls for plugin listing #662

Merged

nfelt added the core:backend label Oct 19, 2017

nfelt mentioned this issue Oct 20, 2017

Change TextPlugin to compute index_impl() asynchronously #663

Merged

chihuahua mentioned this issue Mar 5, 2018

Make text plugin's is_active method check for data in multiplexer first before checking plugin assets. #1021

Merged

nfelt mentioned this issue Aug 31, 2018

Allow profile plugin to search subdirectories. #1360

Closed

nfelt closed this as completed Dec 17, 2019

sanatmpa1 mentioned this issue Jan 4, 2022

The tensorboard is blocked by "/data/plugins_listing" #5481

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the /data/plugins_listing response. #625

Optimize the /data/plugins_listing response. #625

chihuahua commented Oct 5, 2017 •

edited

Loading

nfelt commented Dec 17, 2019

Optimize the /data/plugins_listing response. #625

Optimize the /data/plugins_listing response. #625

Comments

chihuahua commented Oct 5, 2017 • edited Loading

Some Action Items

nfelt commented Dec 17, 2019

chihuahua commented Oct 5, 2017 •

edited

Loading