-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL activity pages (sessions, transactions and statements) are empty during an active workload #68602
Comments
requests when the API requests fails Previously, Statement Page and Transaction Page issues API requests in conmponentDidUpdate() React lifecycle hook. However, when the backend API returns an error, this lifecycle hook is constantly triggered which resulted in infinite loop. This crashes the entire frontend application. This commit removes API calls from within the componentDidUpdate() hooks. Resolves cockroachdb#68602 Release note: None
Spent some times investigation this. This is the result of backend not properly surfacing the error message to frontend. Specifically in this case, the intrusion was using an expired kubenet cookie, which resulted in a error in the backend DB Console proxy. (This can be reproduced using a hacked version of intrusion). As the result, it simply returns a HTTP response with 502 error code without any error message. This causes the frontend to surface an empty error string. However, our error handling is severely flawed here due to the API calls within PR here addressed that issue. |
Thanks for the update Archer! Should there even be an error message surfaced? What's the expected behavior? I would expect Statements/Transactions to still be populated in this view. Also did you forget to link the PR? |
This should definitely be an error. Though I am expecting the error should be coming from Console API backend.
Well, ideally the expected behavior is there's never going to be an error :P . But in this case, it's hard to say. If either the Console -> Intrusion connection or Intrusion -> Cluster breaks, this would means something really wrong is happening in the cloud. Last few times this happened we both declared incidents.
If the connection borked, we won't be able to get data from the backend. I suppose we can still load the page with our cached stale data until the cache TTL expires. But once the cache TTL expires there's not much we can do. Also we won't be able to change selected time ranges or go into statement detail page, since they would need additional API call. |
Chatted with @maryliag offline. To summarize:
@Annebirzin , do you mind create a design for the error box with a Retry button? |
Thanks Archer. I was more wondering why we're consistently getting this (unexpected) error related cookie expiration and how we can make this more resilient. So it seems like we'll be polling at a fixed interval which should reduce the likelihood of running into this error (expired cookie) but for edge cases, we need an error message and retry button. Correct? |
This was due to erroneous logic within the Intrusion, which was recently patched.
Hmm running at a fixed interval is not going to reduce the likelihood of running into this error, since it's something completely out of our control. This is more of a technical issue of how we are managing out API calls. Currently, we would attempt to issue an API call as soon as anything on the page has changed (e.g. filter, search bar, sorting order). This attempt is aborted if we have already requested for data from the API within last cache TTL (5 minutes). This means we would constantly attempt to issue API calls, keep getting aborted in a normal case. But in unhappy case, er would keep attempt to issue API calls, keep getting errors and retry them. This is extremely wasteful. The new approach would be instead of running this loop of:
We simply poll at a fixed interval, and pause the polling if an error is encountered. |
Got it! Thanks for providing the context Archer. |
Here are error states for the statements and transactions pages, including a link to reload the page: https://www.figma.com/file/MoSPFkEyJfGipCmQvvLg1a/21.2_SQL-obsrv_query-performance?node-id=3524%3A74970 Not sure if we want to be more specific around the messaging cc @ianjevans Also, does the same thing happen on the transaction page? |
Yes, same thing would also happen on the Transaction Page |
I have a slight preference for "error" over "issue" on wording, and we can be more clear that the error isn't in the cluster but the console. |
Sounds good, updated the designs to reflect those changes |
Resolves cockroachdb#68602 Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page.
Resolves cockroachdb#68602 Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page.
Resolves cockroachdb#68602 Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page.
Resolves cockroachdb#68602 Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page.
71846: ui: add reload button for Loading component r=maryliag a=Azhng Resolves #68602 https://user-images.githubusercontent.com/9267198/138362614-96f15559-9761-4c4b-8a47-2843cad7978d.mov Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page. Co-authored-by: Azhng <[email protected]>
Resolves #68602 Release note (ui change): when an error is encoutered in the statement/ transaction/session page, user can now click on reload button to reload the page.
CC dedicated clusters show no activity in the Statements, Sessions, or Transactions pages within the CC console. There’s only a red info icon with no text. This was during an active workload (TPCC).
Note these pages were working after initially creating the cluster. Screenshot:
The text was updated successfully, but these errors were encountered: