
server, ui: aggregate statements on period selected #76301

Merged: 1 commit into cockroachdb:master on Feb 14, 2022

Conversation

@maryliag maryliag (Contributor) commented Feb 9, 2022

Previously, the aggregation timestamp was used as part
of the key of a statement. The new query to collect
statements now ignores the aggregation timestamp as part of the key.
This commit also removes the aggregated timestamp value from the UI,
since it no longer applies.

Fixes #74513

Release note (ui change): We no longer show the aggregation
timestamp on the Statements overview and Details pages, since
all statement fingerprints are now grouped inside the same
time selection.
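
(For context, a rough before/after sketch of the aggregation change described above, written against crdb_internal.transaction_statistics, whose columns appear later in this thread; the statement query is analogous, and the exact queries live in pkg/server/combined_statement_stats.go, so treat this purely as an illustration.)

  -- Before: the aggregation timestamp is part of the grouping key, so there is
  -- one row per hourly bucket per fingerprint.
  SELECT
    aggregated_ts, fingerprint_id, app_name,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics
  FROM crdb_internal.transaction_statistics
  GROUP BY aggregated_ts, fingerprint_id, app_name;

  -- After: aggregated_ts is dropped from the key, so all buckets inside the
  -- selected time period collapse into a single row per fingerprint.
  SELECT
    fingerprint_id, app_name, max(aggregated_ts) AS aggregated_ts,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics
  FROM crdb_internal.transaction_statistics
  GROUP BY fingerprint_id, app_name;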

@maryliag maryliag requested a review from a team February 9, 2022 16:00
@cockroach-teamcity (Member) commented:

This change is Reviewable

@matthewtodd matthewtodd (Contributor) left a comment

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


-- commits, line 7 at r1:
nits: "commits" -> "commit", "aggreagted" -> "aggregated"


-- commits, line 13 at r1:
nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?


pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):

				transaction_fingerprint_id,
				app_name,
				max(aggregated_ts) as aggregated_ts,

I'm wondering if we want max or min here -- do we want to show the latest or the earliest timestamp?


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

				max(aggregated_ts) as aggregated_ts,
				fingerprint_id,
				max(metadata) as metadata,

I see we were already using max in collectCombinedStatements, but this has me wondering what the type of metadata is. From its name, I'm surprised that max is meaningful here, though maybe we just needed some way to say "take any one of them"?
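
(A toy illustration of the "take any one of them" intuition, not taken from the PR: when every value in a group is identical, max simply returns that value, at the cost of comparing the JSON blobs.)

  SELECT k, max(m) AS metadata
  FROM (VALUES (1, '{"query": "SELECT 1"}'::JSONB),
               (1, '{"query": "SELECT 1"}'::JSONB)) AS t (k, m)
  GROUP BY k;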

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @yuzefovich)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I see we were already using max in collectCombinedStatements, but this has me wondering what the type of metadata is. From its name, I'm surprised that max is meaningful here, though maybe we just needed some way to say "take any one of them"?

Hmm, I think the SQL engine has an any_not_null aggregation operator. However, it's used only internally and I don't think it's exposed anywhere in the SQL CLI.

On second thought, the metadata column is a big JSON blob. Running the max aggregation essentially performs an ordering comparison between those JSON blobs. However, since we know that for each distinct fingerprint_id its metadata column should be identical, the JSON ordering comparison is mostly wasted computation.

cc: @yuzefovich any thoughts on how we can better express this any_not_null semantics in this query?


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

				fingerprint_id,
				max(metadata) as metadata,
				crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,

array_agg buffers all the rows in the same GROUP BY bucket in memory before feeding them into merge_transaction_stats. So array_agg here effectively means that we are performing buffered aggregation, which might raise some memory usage concerns.

Previously, we used array_agg to perform aggregation across the statement/transaction stats from multiple nodes.
This was mostly fine since the amount of data was quite small (e.g. we have at most around 100 nodes, so the array was at most around 100 elements).

Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (i.e. the size of the tree.Datums array when array_agg is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically, assuming each row takes 1 KiB, that means we are buffering roughly 72,000 KiB (about 70 MiB) per distinct fingerprint. I'm afraid that at some point the amount of data is just too large and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.

Hmm, so perhaps a better strategy here would be to ditch the GROUP BY clause and use ORDER BY to get the input in the correct order. Then we could perform streaming aggregation without worrying about blowing up memory.

cc: @yuzefovich I also want to pick your brain a bit to see if my concern is valid, and whether there's a better strategy.
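
(Purely as an illustration of the ORDER BY idea above: drop the GROUP BY, stream the rows back sorted on the grouping key, and merge statistics row by row in the Go layer instead of buffering an array per bucket. Column names follow the queries in this thread; this is a sketch, not the query the PR ships.)

  SELECT
    app_name,
    fingerprint_id,
    aggregation_interval,
    aggregated_ts,
    statistics
  FROM
    crdb_internal.transaction_statistics
  ORDER BY
    app_name, fingerprint_id, aggregation_interval;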

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Hmm, I think the SQL engine has an any_not_null aggregation operator. However, it's used only internally and I don't think it's exposed anywhere in the SQL CLI.

On second thought, the metadata column is a big JSON blob. Running the max aggregation essentially performs an ordering comparison between those JSON blobs. However, since we know that for each distinct fingerprint_id its metadata column should be identical, the JSON ordering comparison is mostly wasted computation.

cc: @yuzefovich any thoughts on how we can better express this any_not_null semantics in this query?

any_not_null is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. app_name in this query). Archer is right that using max will make us perform comparisons of metadata objects within each aggregation bucket, and if all metadata values are the same, that is redundant computation. We could include metadata in the GROUP BY clause and not use max, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).

Unfortunately, I don't see a way to avoid these comparisons without allowing the any_not_null aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).

I think using max or min is ok for now. I'd be curious to see benchmarks of the performance overhead of these redundant comparisons (i.e. try the query with and without the metadata column) on a large dataset.
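
(For concreteness, a hedged sketch of the "metadata in the GROUP BY" option being discussed, adapted from the transaction query shown below; the exact query the PR ends up with may differ.)

  SELECT
    app_name,
    max(aggregated_ts) AS aggregated_ts,
    fingerprint_id,
    metadata,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
    aggregation_interval
  FROM
    crdb_internal.transaction_statistics
  GROUP BY
    app_name, fingerprint_id, metadata, aggregation_interval;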


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

array_agg buffers all the rows in the same GROUP BY bucket in memory before feeding them into merge_transaction_stats. So array_agg here effectively means that we are performing buffered aggregation, which might raise some memory usage concerns.

Previously, we used array_agg to perform aggregation across the statement/transaction stats from multiple nodes.
This was mostly fine since the amount of data was quite small (e.g. we have at most around 100 nodes, so the array was at most around 100 elements).

Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (i.e. the size of the tree.Datums array when array_agg is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically, assuming each row takes 1 KiB, that means we are buffering roughly 72,000 KiB (about 70 MiB) per distinct fingerprint. I'm afraid that at some point the amount of data is just too large and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.

Hmm, so perhaps a better strategy here would be to ditch the GROUP BY clause and use ORDER BY to get the input in the correct order. Then we could perform streaming aggregation without worrying about blowing up memory.

cc: @yuzefovich I also want to pick your brain a bit to see if my concern is valid, and whether there's a better strategy.

Your description seemed reasonable to me, so I decided to take a look at the EXPLAIN of the query:

EXPLAIN (TYPES)
  SELECT
    app_name,
    max(aggregated_ts) AS aggregated_ts,
    fingerprint_id,
    max(metadata) AS metadata,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
    aggregation_interval
  FROM
    crdb_internal.transaction_statistics
  GROUP BY
    app_name, fingerprint_id, aggregation_interval
  ORDER BY
    1, 2, 3
  LIMIT
    1;
                                                                                         info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ columns: (app_name string, aggregated_ts timestamptz, fingerprint_id bytes, metadata jsonb, statistics jsonb, aggregation_interval interval)
  │ estimated row count: 1 (missing stats)
  │ render statistics: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
  │ render fingerprint_id: (fingerprint_id)[bytes]
  │ render app_name: (app_name)[string]
  │ render aggregation_interval: (aggregation_interval)[interval]
  │ render max: (max)[timestamptz]
  │ render max: (max)[jsonb]
  │
  └── • top-k
      │ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
      │ ordering: +app_name,+max,+fingerprint_id
      │ estimated row count: 1 (missing stats)
      │ order: +app_name,+max,+fingerprint_id
      │ k: 1
      │
      └── • group (hash)
          │ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
          │ estimated row count: 2,000 (missing stats)
          │ aggregate 0: max(aggregated_ts)
          │ aggregate 1: max(max)
          │ aggregate 2: array_agg(crdb_internal.merge_transaction_stats)
          │ group by: fingerprint_id, app_name, aggregation_interval
          │
          └── • render
              │ columns: ("crdb_internal.merge_transaction_stats" jsonb, aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb)
              │ estimated row count: 2,000 (missing stats)
              │ render crdb_internal.merge_transaction_stats: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
              │ render aggregated_ts: (aggregated_ts)[timestamptz]
              │ render fingerprint_id: (fingerprint_id)[bytes]
              │ render app_name: (app_name)[string]
              │ render aggregation_interval: (aggregation_interval)[interval]
              │ render max: (max)[jsonb]
              │
              └── • group (hash)
                  │ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb, array_agg jsonb[])
                  │ estimated row count: 2,000 (missing stats)
                  │ aggregate 0: max(metadata)
                  │ aggregate 1: array_agg(statistics)
                  │ group by: aggregated_ts, fingerprint_id, app_name, aggregation_interval
                  │
                  └── • union all
                      │ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
                      │ estimated row count: 2,000 (missing stats)
                      │
                      ├── • virtual table
                      │     columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
                      │     estimated row count: 1,000 (missing stats)
                      │     table: cluster_transaction_statistics@primary
                      │
                      └── • scan
                            columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, agg_interval interval)
                            estimated row count: 1,000 (missing stats)
                            table: transaction_statistics@primary
                            spans: FULL SCAN

Note that we perform the aggregation twice: first on aggregated_ts, fingerprint_id, app_name, aggregation_interval, then we do crdb_internal.merge_transaction_stats, and then a second aggregation on fingerprint_id, app_name, aggregation_interval. IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets, and for each grouping bucket we're accumulating []JSONB in memory. However, as the plan shows, we seem to be merging the stats after the first aggregation, which still has the timestamp in its GROUP BY clause, so maybe the concern is not applicable?

I don't have any bright ideas at the moment, but I think it'd be worth taking a look at EXPLAIN ANALYZE of the query before and after this change on a large dataset to see whether the performance and memory usage are reasonable.
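
(For reference, a sketch of how those measurements can be gathered on the query shown above; swapping EXPLAIN ANALYZE for EXPLAIN ANALYZE (DISTSQL) yields the plan-diagram URLs shared below.)

  EXPLAIN ANALYZE
    SELECT
      app_name,
      max(aggregated_ts) AS aggregated_ts,
      fingerprint_id,
      max(metadata) AS metadata,
      crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
      aggregation_interval
    FROM
      crdb_internal.transaction_statistics
    GROUP BY
      app_name, fingerprint_id, aggregation_interval;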

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

any_not_null is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. app_name in this query). Archer is right that using max will make us perform comparisons of metadata objects within each aggregation bucket, and if all metadata values are the same, that is redundant computation. We could include metadata in the GROUP BY clause and not use max, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).

Unfortunately, I don't see a way to avoid these comparisons without allowing the any_not_null aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).

I think using max or min is ok for now. I'd be curious to see benchmarks of the performance overhead of these redundant comparisons (i.e. try the query with and without the metadata column) on a large dataset.

Hmm interesting. 🤔

It seems like it somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.

Query Plan using max

Query Plan putting the metadata in the GROUP BY


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets

I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.

Testing the query in the CLI, the second aggregator stage's:

  • execution time went from 2 seconds -> 9 seconds
  • allocated memory went from 76 MiB -> 166 MiB

This is with roughly 180k rows in the table. Thoughts on whether this growth is reasonable?

The URL output of EXPLAIN ANALYZE (DISTSQL) is attached below:

Original Query

New Query

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets

I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.

Testing the query in the CLI, the second aggregator stage's:

  • execution time went from 2 seconds -> 9 seconds
  • allocated memory went from 76 MiB -> 166 MiB

This is with roughly 180k rows in the table. Thoughts on whether this growth is reasonable?

The URL output of EXPLAIN ANALYZE (DISTSQL) is attached below:

Original Query

New Query

Thanks for checking this out. I want to play around a bit with the data set; do you mind sharing it somehow?

@Azhng Azhng (Contributor) left a comment

deleted

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Hmm interesting. 🤔

It seems like it somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.

Query Plan using max

Query Plan putting the metadata in the GROUP BY

I wouldn't look too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

here's the zipped cockroach-data directories

Thanks! I'll take a closer look.

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Thanks! I'll take a closer look.

BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?

No confidential information here. It's just a local database I used myself. Also the link is for CRL only.

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wouldn't look too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).

I made the change to allow using the any_not_null aggregate function explicitly, but it gives only a minor improvement (10% of the second aggregation's time, which is about 2% of the overall query latency):
max
metadata in GROUP BY
any_not_null

Given these observations, I think it's not worth making any_not_null non-internal. I'd probably choose the option of putting metadata in the GROUP BY clause since overall it seems cleaner.


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

No confidential information here. It's just a local database I used myself. Also the link is for CRL only.

Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.

I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.

@maryliag maryliag (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


-- commits, line 7 at r1:

Previously, matthewtodd (Matthew Todd) wrote…

nits: "commits" -> "commit", "aggreagted" -> "aggregated"

Done


-- commits, line 13 at r1:

Previously, matthewtodd (Matthew Todd) wrote…

nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?

Done


pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I'm wondering if we want max or min here -- do we want to show the latest or the earliest timestamp?

This one really doesn't matter since we won't use it anymore, but I wanted to still return something just in case, so I went with the latest, since seeing the latest is the priority.


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I made the change to allow using the any_not_null aggregate function explicitly, but it gives only a minor improvement (10% of the second aggregation's time, which is about 2% of the overall query latency):
max
metadata in GROUP BY
any_not_null

Given these observations, I think it's not worth making any_not_null non-internal. I'd probably choose the option of putting metadata in the GROUP BY clause since overall it seems cleaner.

Thank you for the input, @yuzefovich and @Azhng. I made the change to move metadata into the GROUP BY clause.


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.

I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.

Indeed, this won't be backported. I'll keep an eye out when you land your PR to see if we indeed get improvements here. Thank you for taking a look into it!

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

Indeed, this won't be backported. I'll keep an eye out when you land your PR to see if we indeed get improvements here. Thank you for taking a look into it!

That bug is quite tricky to fix, so I filed #76464 to track it (I'll put my WIP to fix it aside for now). However, even with the bug present I think it's safe to proceed with this change. The behavior of the bug is such that we might create vectorized batches of sizes larger than we want; however, crucially we're still correctly accounting for that large size, so we should not OOM.

@Azhng Azhng (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):

			FROM crdb_internal.transaction_statistics %s
			GROUP BY
				app_name,

Oh boy, did we forget to aggregate by app_name this whole time? 🤦‍♂️

@maryliag maryliag (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Oh boy, did we forget to aggregate by app_name this whole time? 🤦‍♂️

No, the internal view aggregates by app_name; I'm just continuing the GROUP BY here. This function didn't use a GROUP BY before, which is probably why you got that impression :)

@maryliag maryliag (Contributor, Author) commented:

TFTR!
bors r+

@craig craig bot (Contributor) commented Feb 14, 2022

Build succeeded.

@craig craig bot merged commit 3be8121 into cockroachdb:master Feb 14, 2022
@maryliag maryliag deleted the new-aggregation branch March 1, 2022 15:46
Successfully merging this pull request may close these issues.

Modify the Statements and Transactions tables to aggregate rows across aggregation intervals