
server, ui: aggregate statements on period selected #76301

Merged: 1 commit into cockroachdb:master on Feb 14, 2022

Conversation

@maryliag maryliag (Contributor) commented Feb 9, 2022

Previously, the aggregation timestamp was used as part
of the key of a statement. The new query to collect
statements now ignores the aggregation timestamp as part of the key.
This commit also removes the aggregated timestamp value from the UI,
since it no longer applies.

Fixes #74513

Release note (ui change): We no longer show the aggregation
timestamp on the Statements overview and Details pages, since
all statement fingerprints are now grouped inside the same
time selection.
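
(For context, a rough before/after sketch of the aggregation change described above, written against crdb_internal.transaction_statistics, whose columns appear later in this thread; the statement query is analogous, and the exact queries live in pkg/server/combined_statement_stats.go, so treat this purely as an illustration.)

  -- Before: the aggregation timestamp is part of the grouping key, so there is
  -- one row per hourly bucket per fingerprint.
  SELECT
    aggregated_ts, fingerprint_id, app_name,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics
  FROM crdb_internal.transaction_statistics
  GROUP BY aggregated_ts, fingerprint_id, app_name;

  -- After: aggregated_ts is dropped from the key, so all buckets inside the
  -- selected time period collapse into a single row per fingerprint.
  SELECT
    fingerprint_id, app_name, max(aggregated_ts) AS aggregated_ts,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics
  FROM crdb_internal.transaction_statistics
  GROUP BY fingerprint_id, app_name;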

@maryliag maryliag requested a review from a team February 9, 2022 16:00
@cockroach-teamcity (Member) commented:

This change is Reviewable

@matthewtodd matthewtodd (Contributor) left a comment

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


-- commits, line 7 at r1:
nits: "commits" -> "commit", "aggreagted" -> "aggregated"


-- commits, line 13 at r1:
nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?


pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):

				transaction_fingerprint_id,
				app_name,
				max(aggregated_ts) as aggregated_ts,

I'm wondering if we want max or min here -- do we want to show the latest or the earliest timestamp?


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

				max(aggregated_ts) as aggregated_ts,
				fingerprint_id,
				max(metadata) as metadata,

I see we were already using max in collectCombinedStatements, but this has me wondering what the type of metadata is. From its name, I'm surprised that max is meaningful here, though maybe we just needed some way to say "take any one of them"?
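
(A toy illustration of the "take any one of them" intuition, not taken from the PR: when every value in a group is identical, max simply returns that value, at the cost of comparing the JSON blobs.)

  SELECT k, max(m) AS metadata
  FROM (VALUES (1, '{"query": "SELECT 1"}'::JSONB),
               (1, '{"query": "SELECT 1"}'::JSONB)) AS t (k, m)
  GROUP BY k;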

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @yuzefovich)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I see we were already using max in collectCombinedStatements, but this has me wondering what the type of metadata is. From its name, I'm surprised that max is meaningful here, though maybe we just needed some way to say "take any one of them"?

Hmm, I think the SQL engine has an any_not_null aggregation operator. However, it's used only internally and I don't think it's exposed anywhere in the SQL CLI.

On second thought, the metadata column is a big JSON blob. Running the max aggregation essentially performs an ordering comparison between those JSON blobs. However, since we know that for each distinct fingerprint_id its metadata column should be identical, the JSON ordering comparison is mostly wasted computation.

cc: @yuzefovich any thoughts on how we can better express this any_not_null semantics in this query?


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

				fingerprint_id,
				max(metadata) as metadata,
				crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,

array_agg buffers all the rows in the same GROUP BY bucket in memory before feeding them into merge_transaction_stats. So array_agg here effectively means that we are performing buffered aggregation, which might raise some memory usage concerns.

Previously, we used array_agg to perform aggregation across the statement/transaction stats from multiple nodes.
This was mostly fine since the amount of data was quite small (e.g. we have at most around 100 nodes, so the array was at most around 100 elements).

Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (i.e. the size of the tree.Datums array when array_agg is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically, assuming each row takes 1 KiB, that means we are buffering roughly 72,000 KiB (about 70 MiB) per distinct fingerprint. I'm afraid that at some point the amount of data is just too large and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.

Hmm, so perhaps a better strategy here would be to ditch the GROUP BY clause and use ORDER BY to get the input in the correct order. Then we could perform streaming aggregation without worrying about blowing up memory.

cc: @yuzefovich I also want to pick your brain a bit to see if my concern is valid, and whether there's a better strategy.
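
(Purely as an illustration of the ORDER BY idea above: drop the GROUP BY, stream the rows back sorted on the grouping key, and merge statistics row by row in the Go layer instead of buffering an array per bucket. Column names follow the queries in this thread; this is a sketch, not the query the PR ships.)

  SELECT
    app_name,
    fingerprint_id,
    aggregation_interval,
    aggregated_ts,
    statistics
  FROM
    crdb_internal.transaction_statistics
  ORDER BY
    app_name, fingerprint_id, aggregation_interval;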

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Hmm, I think the SQL engine has an any_not_null aggregation operator. However, it's used only internally and I don't think it's exposed anywhere in the SQL CLI.

On second thought, the metadata column is a big JSON blob. Running the max aggregation essentially performs an ordering comparison between those JSON blobs. However, since we know that for each distinct fingerprint_id its metadata column should be identical, the JSON ordering comparison is mostly wasted computation.

cc: @yuzefovich any thoughts on how we can better express this any_not_null semantics in this query?

any_not_null is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. app_name in this query). Archer is right that using max will make us perform comparisons of metadata objects within each aggregation bucket, and if all metadata values are the same, that is redundant computation. We could include metadata in the GROUP BY clause and not use max, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).

Unfortunately, I don't see a way to avoid these comparisons without allowing the any_not_null aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).

I think using max or min is ok for now. I'd be curious to see benchmarks of the performance overhead of these redundant comparisons (i.e. try the query with and without the metadata column) on a large dataset.
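
(For concreteness, a hedged sketch of the "metadata in the GROUP BY" option being discussed, adapted from the transaction query shown below; the exact query the PR ends up with may differ.)

  SELECT
    app_name,
    max(aggregated_ts) AS aggregated_ts,
    fingerprint_id,
    metadata,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
    aggregation_interval
  FROM
    crdb_internal.transaction_statistics
  GROUP BY
    app_name, fingerprint_id, metadata, aggregation_interval;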


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

array_agg buffers all the rows in the same GROUP BY bucket in memory before feeding them into merge_transaction_stats. So array_agg here effectively means that we are performing buffered aggregation, which might raise some memory usage concerns.

Previously, we used array_agg to perform aggregation across the statement/transaction stats from multiple nodes.
This was mostly fine since the amount of data was quite small (e.g. we have at most around 100 nodes, so the array was at most around 100 elements).

Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (i.e. the size of the tree.Datums array when array_agg is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically, assuming each row takes 1 KiB, that means we are buffering roughly 72,000 KiB (about 70 MiB) per distinct fingerprint. I'm afraid that at some point the amount of data is just too large and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.

Hmm, so perhaps a better strategy here would be to ditch the GROUP BY clause and use ORDER BY to get the input in the correct order. Then we could perform streaming aggregation without worrying about blowing up memory.

cc: @yuzefovich I also want to pick your brain a bit to see if my concern is valid, and whether there's a better strategy.

Your description seemed reasonable to me, so I decided to take a look at the EXPLAIN of the query:

EXPLAIN (TYPES)
  SELECT
    app_name,
    max(aggregated_ts) AS aggregated_ts,
    fingerprint_id,
    max(metadata) AS metadata,
    crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
    aggregation_interval
  FROM
    crdb_internal.transaction_statistics
  GROUP BY
    app_name, fingerprint_id, aggregation_interval
  ORDER BY
    1, 2, 3
  LIMIT
    1;
                                                                                         info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ columns: (app_name string, aggregated_ts timestamptz, fingerprint_id bytes, metadata jsonb, statistics jsonb, aggregation_interval interval)
  │ estimated row count: 1 (missing stats)
  │ render statistics: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
  │ render fingerprint_id: (fingerprint_id)[bytes]
  │ render app_name: (app_name)[string]
  │ render aggregation_interval: (aggregation_interval)[interval]
  │ render max: (max)[timestamptz]
  │ render max: (max)[jsonb]
  │
  └── • top-k
      │ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
      │ ordering: +app_name,+max,+fingerprint_id
      │ estimated row count: 1 (missing stats)
      │ order: +app_name,+max,+fingerprint_id
      │ k: 1
      │
      └── • group (hash)
          │ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
          │ estimated row count: 2,000 (missing stats)
          │ aggregate 0: max(aggregated_ts)
          │ aggregate 1: max(max)
          │ aggregate 2: array_agg(crdb_internal.merge_transaction_stats)
          │ group by: fingerprint_id, app_name, aggregation_interval
          │
          └── • render
              │ columns: ("crdb_internal.merge_transaction_stats" jsonb, aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb)
              │ estimated row count: 2,000 (missing stats)
              │ render crdb_internal.merge_transaction_stats: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
              │ render aggregated_ts: (aggregated_ts)[timestamptz]
              │ render fingerprint_id: (fingerprint_id)[bytes]
              │ render app_name: (app_name)[string]
              │ render aggregation_interval: (aggregation_interval)[interval]
              │ render max: (max)[jsonb]
              │
              └── • group (hash)
                  │ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb, array_agg jsonb[])
                  │ estimated row count: 2,000 (missing stats)
                  │ aggregate 0: max(metadata)
                  │ aggregate 1: array_agg(statistics)
                  │ group by: aggregated_ts, fingerprint_id, app_name, aggregation_interval
                  │
                  └── • union all
                      │ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
                      │ estimated row count: 2,000 (missing stats)
                      │
                      ├── • virtual table
                      │     columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
                      │     estimated row count: 1,000 (missing stats)
                      │     table: cluster_transaction_statistics@primary
                      │
                      └── • scan
                            columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, agg_interval interval)
                            estimated row count: 1,000 (missing stats)
                            table: transaction_statistics@primary
                            spans: FULL SCAN

Note that we perform the aggregation twice: first on aggregated_ts, fingerprint_id, app_name, aggregation_interval, then we do crdb_internal.merge_transaction_stats, and then a second aggregation on fingerprint_id, app_name, aggregation_interval. IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets, and for each grouping bucket we're accumulating []JSONB in memory. However, as the plan shows, we seem to be merging the stats after the first aggregation, which still has the timestamp in its GROUP BY clause, so maybe the concern is not applicable?

I don't have any bright ideas at the moment, but I think it'd be worth taking a look at EXPLAIN ANALYZE of the query before and after this change on a large dataset to see whether the performance and memory usage are reasonable.
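
(For reference, a sketch of how those measurements can be gathered on the query shown above; swapping EXPLAIN ANALYZE for EXPLAIN ANALYZE (DISTSQL) yields the plan-diagram URLs shared below.)

  EXPLAIN ANALYZE
    SELECT
      app_name,
      max(aggregated_ts) AS aggregated_ts,
      fingerprint_id,
      max(metadata) AS metadata,
      crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
      aggregation_interval
    FROM
      crdb_internal.transaction_statistics
    GROUP BY
      app_name, fingerprint_id, aggregation_interval;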

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

any_not_null is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. app_name in this query). Archer is right that using max will make us perform comparisons of metadata objects within each aggregation bucket, and if all metadata values are the same, that is redundant computation. We could include metadata in the GROUP BY clause and not use max, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).

Unfortunately, I don't see a way to avoid these comparisons without allowing the any_not_null aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).

I think using max or min is ok for now. I'd be curious to see benchmarks of the performance overhead of these redundant comparisons (i.e. try the query with and without the metadata column) on a large dataset.

Hmm interesting. 🤔

It seems like it somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.

Query Plan using max

Query Plan putting the metadata in the GROUP BY


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets

I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.

Testing the query in the CLI, the second aggregator stage's:

  • execution time went from 2 seconds -> 9 seconds
  • allocated memory went from 76 MiB -> 166 MiB

This is with roughly 180k rows in the table. Thoughts on whether this growth is reasonable?

The URL output of EXPLAIN ANALYZE (DISTSQL) is attached below:

Original Query

New Query

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

IIUC the concern is that removing aggregated_ts from GROUP BY in this query creates many more buckets

I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.

Testing the query in the CLI, the second aggregator stage's:

  • execution time went from 2 seconds -> 9 seconds
  • allocated memory went from 76 MiB -> 166 MiB

This is with roughly 180k rows in the table. Thoughts on whether this growth is reasonable?

The URL output of EXPLAIN ANALYZE (DISTSQL) is attached below:

Original Query

New Query

Thanks for checking this out. I want to play around a bit with the data set; do you mind sharing it somehow?

@Azhng Azhng (Contributor) left a comment

deleted

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Hmm interesting. 🤔

It seems like it somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.

Query Plan using max

Query Plan putting the metadata in the GROUP BY

I wouldn't look too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

here's the zipped cockroach-data directories

Thanks! I'll take a closer look.

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Thanks! I'll take a closer look.

BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?

@Azhng Azhng (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?

No confidential information here. It's just a local database I used myself. Also the link is for CRL only.

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wouldn't look too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).

I made the change to allow using the any_not_null aggregate function explicitly, but it gives only a minor improvement (10% of the second aggregation's time, which is about 2% of the overall query latency):
max
metadata in GROUP BY
any_not_null

Given these observations, I think it's not worth making any_not_null non-internal. I'd probably choose the option of putting metadata in the GROUP BY clause since overall it seems cleaner.


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

No confidential information here. It's just a local database I used myself. Also the link is for CRL only.

Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.

I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.

@maryliag maryliag (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


-- commits, line 7 at r1:

Previously, matthewtodd (Matthew Todd) wrote…

nits: "commits" -> "commit", "aggreagted" -> "aggregated"

Done


-- commits, line 13 at r1:

Previously, matthewtodd (Matthew Todd) wrote…

nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?

Done


pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I'm wondering if we want max or min here -- do we want to show the latest or the earliest timestamp?

This one really doesn't matter since we won't use it anymore, but I wanted to still return something just in case, so I went with the latest, since seeing the latest is the priority.


pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I made the change to allow using the any_not_null aggregate function explicitly, but it gives only a minor improvement (10% of the second aggregation's time, which is about 2% of the overall query latency):
max
metadata in GROUP BY
any_not_null

Given these observations, I think it's not worth making any_not_null non-internal. I'd probably choose the option of putting metadata in the GROUP BY clause since overall it seems cleaner.

Thank you for the input, @yuzefovich and @Azhng. I made the change to move metadata into the GROUP BY clause.


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.

I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.

Indeed, this won't be backported. I'll keep an eye out when you land your PR to see if we indeed get improvements here. Thank you for taking a look into it!

@yuzefovich yuzefovich (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

Indeed, this won't be backported. I'll keep an eye out when you land your PR to see if we indeed get improvements here. Thank you for taking a look into it!

That bug is quite tricky to fix, so I filed #76464 to track it (I'll put my WIP to fix it aside for now). However, even with the bug present I think it's safe to proceed with this change. The behavior of the bug is such that we might create vectorized batches of sizes larger than we want; however, crucially we're still correctly accounting for that large size, so we should not OOM.

@Azhng Azhng (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):

			FROM crdb_internal.transaction_statistics %s
			GROUP BY
				app_name,

Oh boy, did we forget to aggregate by app_name this whole time? 🤦‍♂️

@maryliag maryliag (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Oh boy, did we forget to aggregate by app_name this whole time? 🤦‍♂️

No, the internal view aggregates by app_name; I'm just continuing the GROUP BY here. This function didn't use a GROUP BY before, which is probably why you got that impression :)

@maryliag maryliag (Contributor, Author) commented:

TFTR!
bors r+

@craig craig bot (Contributor) commented Feb 14, 2022

Build succeeded.

@craig craig bot merged commit 3be8121 into cockroachdb:master Feb 14, 2022
@maryliag maryliag deleted the new-aggregation branch March 1, 2022 15:46
Successfully merging this pull request may close these issues.

Modify the Statements and Transactions tables to aggregate rows across aggregation intervals