ui: transaction fingerprints page updates data sporadically, ignores custom time interval #96186
Comments
This issue encompasses #68375. In addition to execution count, the following column values are affected:
I took a quick look. The cause is in the render method for the txns table section of the page, where we aggregate the txn stats. That sounds fine, but the mergeTransactionStats method, which performs the aggregation grouping on txnFingerprintID, takes a shallow copy of the first txn in the txns data array to use as the base txn that it returns. Because the copy is shallow, the base txn still shares its stats object with the original txn, so each aggregation mutates the original stats.
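To make the failure mode concrete, here is a minimal sketch of how a shallow copy can leak mutations back into the input. The `Txn` and `TxnStats` shapes, the `mergeShallow` function, and the sample data are all simplified assumptions for illustration; this is not the actual `mergeTransactionStats` code.

```typescript
// Hypothetical, simplified shapes; the real types carry many more fields.
interface TxnStats {
  count: number;
  bytesRead: number;
}

interface Txn {
  txnFingerprintID: string;
  stats: TxnStats;
}

// Buggy aggregation: the spread copies only the top level of the first txn,
// so `base.stats` is the very same object as `txns[0].stats`.
function mergeShallow(txns: Txn[]): Txn {
  const base: Txn = { ...txns[0] };
  for (const txn of txns.slice(1)) {
    base.stats.count += txn.stats.count; // also mutates txns[0].stats
    base.stats.bytesRead += txn.stats.bytesRead;
  }
  return base;
}

// Stand-in for props that do not change between re-renders.
const firstTxn: Txn = { txnFingerprintID: "a", stats: { count: 1, bytesRead: 10 } };
const secondTxn: Txn = { txnFingerprintID: "a", stats: { count: 2, bytesRead: 20 } };
const props: Txn[] = [firstTxn, secondTxn];

// Each call stands in for one render of the table.
const firstRender = mergeShallow(props).stats.count; // 1 + 2 = 3
const secondRender = mergeShallow(props).stats.count; // 3 + 2 = 5, input unchanged by the caller
```

With identical input, the second call returns a larger count than the first, which matches the symptom of stats climbing while the page sits idle.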
As an aside, the txns page needs some serious cleanup. It's gotten so messy that we can't even catch when we're mutating props we shouldn't be 😬
I spoke to Eric, and since I had a fix on my branch I took over issuing the PR for this. Kudos to him for narrowing the bug down to the aggregation process, though.
96967: changefeedccl: skip testing queries that are too slow as regular SQL r=[samiskin] a=HonoreDB

TestChangefeedRandomExpressions was occasionally timing out when doing the regular SELECT query--it's tricky to get sqlsmith not to generate complex expressions that are likely to not be valid for changefeeds anyway, so this PR just skips predicates that take more than a second to process.

Informs #96532.

Release note: None

97860: jobs: add VIEWJOB global privilege, remove role option r=jayshrivastava a=jayshrivastava

This change updates `VIEWJOB` to be a global privilege instead of a role option so that it can be inherited from roles to their members. Previously, `VIEWJOB` was a role option which could be granted to users. Now, `VIEWJOB` is a global privilege. Granting this privilege to a user or role has the syntax `GRANT SYSTEM VIEWJOB TO user`. Using `VIEWJOB` as a role option is deprecated.

Note that the `VIEWJOB` role option was not included in any release so far. It was queued up to be released in 23.1, but was not. This change is also being queued for 23.1, so there should not be any backwards compatibility issues.

Informs: #96382
Epic: None
Release Note: None

98135: cdc: copy request body when registering schemas r=jayshrivastava a=jayshrivastava

cdc: copy request body when registering schemas

Previously, when the schema registry encountered an error when registering a schema, it would retry the request. The problem is that upon hitting an error, we clean the body before retrying. Retrying with an empty body results in an obscure error message. With this change, we now retry with the original request body so the original error is sustained.

This change also adds the metric `changefeed.schema_registry.retry_count` which is a counter for the number of retries performed by the schema registry. Seeing nonzero values indicates that there is an issue with contacting the schema registry and/or registering schemas.

Release note (ops change): A new metric `changefeed.schema_registry.retry_count` is added. This measures the number of request retries performed when sending requests to the schema registry. Observing a nonzero value may indicate improper configuration of the schema registry or changefeed parameters.
Epic: None

98212: authors: add Mira Radeva to authors r=miraradeva a=miraradeva

Release note: None
Epic: None

98249: backupccl: incremental schedules always wait on_previous_running r=benbardin a=adityamaru

An incremental backup schedule must always wait if there is a running job that was previously scheduled by this incremental schedule. This is because until the previous incremental backup job completes, all future incremental jobs will attempt to backup data from the same `StartTime` corresponding to the `EndTime` of the last incremental layer. In this case only the first incremental job to complete will succeed, while the remaining jobs will either be rejected or worse corrupt the chain of backups.

This change overrides the Wait behaviour for an incremental schedule to always default to `wait` during schedule creation or in an alter statement. Note the user specified value will still be applied to the full backup schedule.

Ideally we'd have a way to configure options for both the full and incremental schedule separately, in which case we could reject the `on_previous_running` configuration for incremental schedules. Until then this workaround will have to do and we should call out this known limitation.

Fixes: #96110

Release note (enterprise change): backup schedules created or altered to have the option `on_previous_running` will have the full backup schedule created with the user specified option, but will override the incremental backup schedule to always default to `on_previous_running = wait`. This ensures correctness of the backup chains created by the incremental schedule by preventing duplicate incremental jobs from racing against each other.
98307: ui: fix txn aggregations in txns fingerprints page r=xinhaoz a=xinhaoz

This commit addresses 2 issues on the txns overview page:

1. We were previously grouping txns by txn fingerprint id, agg time, agg interval, and app name. This is from a time when we wanted all these fields, but recently we only want to aggregate on txn fingerprint id. This commit changes the grouping to only the txn fingerprint id.

2. Stats aggregation causing undesired data mutations: We were seeing that in the txns fingerprint page, stats columns would seemingly randomly continue to increase while on the page (e.g. exec count, bytes read). During stats aggregation after grouping by the fields mentioned above, we were using the first txn in the grouping as the base object for stats aggregation, meaning we inherited and mutated the stats object of that txn. Since we aggregate on every re-render, this meant that we were using the result of any previous aggregations as the base for our current aggregation in the re-render. This explains the never-ending incrementing stats. This commit addresses this bug by ensuring we don't re-use the stats object between re-renders by creating a new copy of the stats for every aggregation.

Fixes: #96186
Fixes: #68375

Release note (bug fix): stats columns in the txns fingerprint overview page no longer increment continuously

BEFORE
https://www.loom.com/share/d9bbd98ced2742dd899031fbc16df6af

AFTER
https://www.loom.com/share/5407fbbad086404c8d9d63e7f5ef15dd

98321: backupccl: add restore/pause/tpce/80GB/aws/nodes=4/cpus=8 to aws nightlies r=lidorcarmel a=msbutler

Epic: none
Release note: None

Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
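The two changes described in the commit message (group only on the txn fingerprint id, and start every aggregation from a fresh copy of the stats) can be sketched roughly as follows. The `Txn` and `TxnStats` shapes, the `aggregateByFingerprint` name, and the sample data are assumptions made for illustration; the actual change in the PR may be structured differently.

```typescript
// Hypothetical, simplified shapes; the real types carry many more fields.
interface TxnStats {
  count: number;
  bytesRead: number;
}

interface Txn {
  txnFingerprintID: string;
  stats: TxnStats;
}

// Groups on the fingerprint id only and never writes to the input objects.
function aggregateByFingerprint(txns: Txn[]): Txn[] {
  const merged = new Map<string, Txn>();
  for (const txn of txns) {
    const existing = merged.get(txn.txnFingerprintID);
    if (existing === undefined) {
      // Copy the stats object as well, so the accumulator is independent of
      // the input. A flat spread is enough here only because this sketch's
      // stats have no nested objects; nested stats would need a deeper copy.
      merged.set(txn.txnFingerprintID, { ...txn, stats: { ...txn.stats } });
    } else {
      existing.stats.count += txn.stats.count;
      existing.stats.bytesRead += txn.stats.bytesRead;
    }
  }
  return Array.from(merged.values());
}

// Stand-in for props that do not change between re-renders.
const input: Txn[] = [
  { txnFingerprintID: "a", stats: { count: 1, bytesRead: 10 } },
  { txnFingerprintID: "b", stats: { count: 5, bytesRead: 50 } },
  { txnFingerprintID: "a", stats: { count: 2, bytesRead: 20 } },
];

// Each call stands in for one render; both should yield the same totals.
const renderOnce = aggregateByFingerprint(input);
const renderTwice = aggregateByFingerprint(input);
```

Because the accumulator owns its own stats object, repeated calls over the same input return identical totals, which is the property the release note describes.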
The Transaction Fingerprints overview page updates data sporadically, with no regard for the time interval. Something is broken in the table row population.
For example, the Execution Count increments at an impossible rate, while the same stat on the Statements Page remains the same:
https://www.loom.com/share/b331d10e58364848b2908c1dd49d8a6f
The behavior is the same even when the time interval is fixed (i.e., with a fixed start and end time, the data should not change at all).
This issue is present in both CC console and DB console.
Jira issue: CRDB-23991