ui: transaction fingerprints page updates data sporadically, ignores custom time interval #96186

ericharmeling · 2023-01-30T15:44:12Z

The Transaction Fingerprints overview page updates data sporadically, with no regard for the time interval. Something is broken in the table row population.

For example, the Execution Count increments at an impossible rate, while the same stat on the Statements Page remains the same:

https://www.loom.com/share/b331d10e58364848b2908c1dd49d8a6f

This behavior is the same, even when the time interval is fixed (i.e., there should be no change in data with a fixed start and end time).

This issue is present in both CC console and DB console.

Jira issue: CRDB-23991

ericharmeling · 2023-03-08T15:30:37Z

This issue encompasses #68375.

In addition to execution count, the following column values are affected:

Transaction Time
Contention Time
CPU Time
Max Memory

xinhaoz · 2023-03-09T15:34:36Z

I took a quick look. In the render method for the txns table section of the page, we aggregate the txn stats. That sounds fine, but mergeTransactionStats, which performs the aggregation after grouping on txnFingerprintID, takes a shallow copy of the first txn in the txns data array to use as the base txn that we return.
So we inherit and mutate the stats_data of the first txn, which holds the stats (like exec count) in one field alongside other things like txn fingerprint id, app, etc. We then set the stats field of stats_data to the aggregated result of the stats. This means that on every re-render, the base txn stats we start from are those of the previous re-render, which explains the incrementing stats.
An easy, backportable fix is to not mutate the stats_data object of the first txn and instead create a new object:

const mergeTransactionStats = function (txns: Transaction[]): Transaction {
  if (txns.length === 0) {
    return null;
  }
  const txn = { ...txns[0], stats_data: { ...txns[0]?.stats_data } }; // Copy the stats_data object explicitly here.
  txn.stats_data.stats = combineTransactionStats(
    txns.map(t => t.stats_data.stats),
  );
  return txn;
};
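The aliasing described above can be reproduced outside the UI. The following is a minimal sketch, not the actual cluster-ui code: the `Txn`, `StatsData`, and `Stats` shapes, the `combineStats` helper (a stand-in for combineTransactionStats that only sums a count), and the three simulated re-renders are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; the real Transaction type has more fields.
interface Stats { count: number }
interface StatsData { stats: Stats; app: string }
interface Txn { stats_data: StatsData }

// Stand-in for combineTransactionStats: sums execution counts only.
const combineStats = (all: Stats[]): Stats => ({
  count: all.reduce((sum, s) => sum + s.count, 0),
});

// Shallow merge: the spread copies only the top level, so txn.stats_data
// is the same object as txns[0].stats_data and the input gets mutated.
function mergeShallow(txns: Txn[]): Txn {
  const txn = { ...txns[0] };
  txn.stats_data.stats = combineStats(txns.map(t => t.stats_data.stats));
  return txn;
}

// Copying merge: stats_data is copied as well, so the input is untouched.
function mergeCopied(txns: Txn[]): Txn {
  const txn = { ...txns[0], stats_data: { ...txns[0].stats_data } };
  txn.stats_data.stats = combineStats(txns.map(t => t.stats_data.stats));
  return txn;
}

const makeTxns = (): Txn[] => [
  { stats_data: { stats: { count: 2 }, app: "a" } },
  { stats_data: { stats: { count: 3 }, app: "a" } },
];

// Simulate three re-renders over the same (unchanged) props.
const shallowInput = makeTxns();
const shallowCounts = [1, 2, 3].map(
  () => mergeShallow(shallowInput).stats_data.stats.count,
);

const copiedInput = makeTxns();
const copiedCounts = [1, 2, 3].map(
  () => mergeCopied(copiedInput).stats_data.stats.count,
);

console.log(shallowCounts); // [5, 8, 11]
console.log(copiedCounts); // [5, 5, 5]
```

With the shallow merge, each pass folds the previous aggregate back into the first txn, so the count keeps growing even though the input data never changes. With the copied stats_data, every pass yields the same total.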

xinhaoz · 2023-03-09T15:36:49Z

As an aside, the txns page needs some serious cleanup. It's gotten messy to the point that we can't even catch when we're mutating props we shouldn't be 😬

xinhaoz · 2023-03-09T16:31:57Z

I spoke to Eric, and since I had a fix on my branch, I took over issuing the PR for this. Kudos to him for narrowing the bug down to the aggregation process, though.

96967: changefeedccl: skip testing queries that are too slow as regular SQL r=[samiskin] a=HonoreDB

TestChangefeedRandomExpressions was occasionally timing out when doing the regular SELECT query--it's tricky to get sqlsmith not to generate complex expressions that are likely to not be valid for changefeeds anyway, so this PR just skips predicates that take more than a second to process.

Informs #96532.
Release note: None

97860: jobs: add VIEWJOB global privilege, remove role option r=jayshrivastava a=jayshrivastava

This change updates `VIEWJOB` to be a global privilege instead of a role option so that it can be inherited from roles to their members. Previously, `VIEWJOB` was a role option which could be granted to users. Now, `VIEWJOB` is a global privilege. Granting this privilege to a user or role has the syntax `GRANT SYSTEM VIEWJOB TO user`. Using `VIEWJOB` as a role option is deprecated.

Note that the `VIEWJOB` role option was not included in any release so far. It was queued up to be released in 23.1, but was not. This change is also being queued for 23.1, so there should not be any backwards compatibility issues.

Informs: #96382
Epic: None
Release Note: None

98135: cdc: copy request body when registering schemas r=jayshrivastava a=jayshrivastava

cdc: copy request body when registering schemas

Previously, when the schema registry encountered an error when registering a schema, it would retry the request. The problem is that upon hitting an error, we clean the body before retrying. Retrying with an empty body results in a obscure error message. With this change, we now retry with the original request body so the original error is sustained.

This change also adds the metric `changefeed.schema_registry.retry_count` which is a counter for the number of retries performed by the schema registry. Seeing nonzero values indicates that there is an issue with contacting the schema registry and/or registering schemas.

Release note (ops change): A new metric `changefeed.schema_registry.retry_count` is added. This measures the number of request retries performed when sending requests to the schema registry. Observing a nonzero value may indicate improper configuration of the schema registry or changefeed parameters.
Epic: None

98212: authors: add Mira Radeva to authors r=miraradeva a=miraradeva

Release note: None
Epic: None

98249: backupccl: incremental schedules always wait on_previous_running r=benbardin a=adityamaru

An incremental backup schedule must always wait if there is a running job that was previously scheduled by this incremental schedule. This is because until the previous incremental backup job completes, all future incremental jobs will attempt to backup data from the same `StartTime` corresponding to the `EndTime` of the last incremental layer. In this case only the first incremental job to complete will succeed, while the remaining jobs will either be rejected or worse corrupt the chain of backups.

This change overrides the Wait behaviour for an incremental schedule to always default to `wait` during schedule creation or in an alter statement. Note the user specified value will still be applied to the full backup schedule.

Ideally we'd have a way to configure options for both the full and incremental schedule separately, in which case we could reject the `on_previous_running` configuration for incremental schedules. Until then this workaround will have to do and we should call out this known limitation.

Fixes: #96110
Release note (enterprise change): backup schedules created or altered to have the option `on_previous_running` will have the full backup schedule created with the user specified option, but will override the incremental backup schedule to always default to `on_previous_running = wait`. This ensures correctness of the backup chains created by the incremental schedule by preventing duplicate incremental jobs from racing against each other.
98307: ui: fix txn aggregations in txns fingerprints page r=xinhaoz a=xinhaoz

This commit addresses 2 issues on the txns overview page:

1. We were previously grouping txns by txn fingerprint id, agg time, agg interval, and app name. This is from a time when we wanted all these fields, but recently we only want to aggregate on txn fingerprint id. This commit changes the grouping to only the txn id.

2. Stats aggregation causing undesired data mutations: We were seeing that in the txns fingerprint page, stats columns would seemingly randomly continue to increase while on the page (e.g. exec count, bytes read). During stats aggregation after grouping by the fields mentioned above, we were using the first txn in the grouping as the base object for stats aggregation, meaning we inherited and mutated the stats object of that txn. Since we aggregate on every re-render, This meant that we were using the result of any previous aggregations as the base for our current aggregation in the re-render. This explains the never-ending incrementing stats. This commit addresses this bug by ensuring we don't re-use the stats object between re-renders by creating a new copy of the stats for every aggregation.

Fixes: #96186
Fixes: #68375
Release note (bug fix): stats columns in txns fingerprint overview page does not continuously increment

BEFORE https://www.loom.com/share/d9bbd98ced2742dd899031fbc16df6af
AFTER https://www.loom.com/share/5407fbbad086404c8d9d63e7f5ef15dd

98321: backupccl: add restore/pause/tpce/80GB/aws/nodes=4/cpus=8 to aws nightlies r=lidorcarmel a=msbutler

Epic: none
Release note: None

Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
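The first change in 98307 (grouping only on the txn fingerprint id rather than on fingerprint id, agg time, agg interval, and app name) could look roughly like the sketch below. The `TxnRow` shape, its field names, and both helper functions are hypothetical and are not taken from the PR; they only illustrate that rows differing in app or aggregation timestamp collapse into a single group per fingerprint, and that totals are computed into fresh values without touching the input rows.

```typescript
// Hypothetical row shape, for illustration only.
interface TxnRow {
  fingerprintId: string;
  app: string;
  aggregatedTs: number;
  execCount: number;
}

// Group rows by fingerprint id alone; app and aggregatedTs are ignored.
function groupByFingerprint(rows: TxnRow[]): Map<string, TxnRow[]> {
  const groups = new Map<string, TxnRow[]>();
  for (const row of rows) {
    const group = groups.get(row.fingerprintId) ?? [];
    group.push(row);
    groups.set(row.fingerprintId, group);
  }
  return groups;
}

// Sum execution counts per fingerprint into a new map (no input mutation).
function totalExecCounts(rows: TxnRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  groupByFingerprint(rows).forEach((group, id) => {
    totals.set(id, group.reduce((sum, r) => sum + r.execCount, 0));
  });
  return totals;
}

const rows: TxnRow[] = [
  { fingerprintId: "a", app: "app1", aggregatedTs: 1, execCount: 2 },
  { fingerprintId: "a", app: "app2", aggregatedTs: 2, execCount: 3 },
  { fingerprintId: "b", app: "app1", aggregatedTs: 1, execCount: 7 },
];

const totals = totalExecCounts(rows);
console.log(totals.get("a"), totals.get("b")); // 5 7
```

Calling `totalExecCounts(rows)` repeatedly returns the same totals, because each call builds its result from scratch.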


ericharmeling added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-observability labels Jan 30, 2023

ericharmeling changed the title ~~ui: transaction fingerprints page updates data sporadically~~ ui: transaction fingerprints page updates data sporadically, ignores custom time interval Jan 30, 2023

ericharmeling self-assigned this Mar 8, 2023

xinhaoz mentioned this issue Mar 9, 2023

ui: fix txn aggregations in txns fingerprints page #98307

Merged

xinhaoz self-assigned this Mar 9, 2023

craig bot closed this as completed in a59d6b6 Mar 9, 2023

blathers-crl bot mentioned this issue Mar 9, 2023

release-22.2: ui: fix txn aggregations in txns fingerprints page #98336

Merged

exalate-issue-sync bot unassigned ericharmeling Mar 9, 2023

xinhaoz mentioned this issue Mar 23, 2023

release-22.1: ui: fix txn aggregations in txns fingerprints page #99405

Merged
