server, ui: aggregate statements on period selected #76301
Conversation
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)
-- commits, line 7 at r1:
nits: "commits" -> "commit", "aggreagted" -> "aggregated"
-- commits, line 13 at r1:
nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?
pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):
transaction_fingerprint_id, app_name, max(aggregated_ts) as aggregated_ts,
I'm wondering if we want `max` or `min` here -- do we want to show the latest or the earliest timestamp?
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
max(aggregated_ts) as aggregated_ts, fingerprint_id, max(metadata) as metadata,
I see we were already using `max` in `collectCombinedStatements`, but this has me wondering what the type of `metadata` is. From its name, I'm surprised that `max` is meaningful here, though maybe we just needed some way to say "take any one of them"?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @yuzefovich)
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
I see we were already using `max` in `collectCombinedStatements`, but this has me wondering what the type of `metadata` is. From its name, I'm surprised that `max` is meaningful here, though maybe we just needed some way to say "take any one of them"?
Hmm, I think in the SQL engine we have an `any_not_null` aggregation operator. However, it's used only internally, and I don't think it's exposed anywhere in the SQL CLI.
Second thought here: the `metadata` column is a big JSON blob. Running the `max` aggregation essentially performs ordering comparisons between those JSON blobs. However, since we know that for each distinct `fingerprint_id` its `metadata` column should be identical, the JSON ordering comparison is mostly wasted computation.
cc: @yuzefovich any thoughts on how we can better express this `any_not_null` semantics in this query?
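To make the point concrete, here is a minimal sketch (not taken from the PR; columns trimmed relative to the query under review) of what `max(metadata)` is doing in this aggregation:

```sql
-- Simplified sketch of the aggregation under discussion. max(metadata) only
-- serves to pick a representative value, but it pays for that with a JSON
-- ordering comparison per input row, even though every row within a
-- (app_name, fingerprint_id, aggregation_interval) bucket should carry
-- identical metadata.
SELECT
  app_name,
  fingerprint_id,
  max(metadata) AS metadata,
  crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
  aggregation_interval
FROM crdb_internal.transaction_statistics
GROUP BY app_name, fingerprint_id, aggregation_interval;
```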
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
fingerprint_id, max(metadata) as metadata, crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
`array_agg` buffers all the rows in the same `GROUP BY` bucket in memory first, before feeding them into `merge_transaction_stats`. So `array_agg` here effectively means we are performing buffered aggregation, which might raise some memory usage concerns.
Previously, we used `array_agg` to perform aggregation across the statement/transaction stats from multiple nodes. This was mostly fine since the amount of data is quite small (e.g. we have at most around 100 nodes, so the array is at most about 100 elements).
Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (the size of the `tree.Datums` array when `array_agg` is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically assuming each row takes 1 KiB, that means we'd be buffering ~72 MiB per distinct fingerprint. I'm afraid that at some point the amount of data is just too large, and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.
Hmm, so perhaps a better strategy here would be to ditch the `GROUP BY` clause and use `ORDER BY` to get the input in the correct order. Then we can perform streaming aggregation here without worrying about blowing up memory.
cc: @yuzefovich also I want to pick your brain here a bit to see if my concern is valid, and if there's a better strategy.
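For illustration, a rough sketch of that ORDER BY alternative (not from the PR; it assumes the caller in combined_statement_stats.go would then merge consecutive rows itself instead of relying on `array_agg`):

```sql
-- Sketch only: with no GROUP BY, nothing needs to be buffered per bucket on
-- the SQL side. Rows arrive sorted by the would-be grouping key, so the
-- consumer can merge the statistics of consecutive rows that share
-- (app_name, fingerprint_id, aggregation_interval), holding one group in
-- memory at a time.
SELECT
  app_name,
  fingerprint_id,
  aggregation_interval,
  aggregated_ts,
  metadata,
  statistics
FROM crdb_internal.transaction_statistics
ORDER BY app_name, fingerprint_id, aggregation_interval;
```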
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm, I think in the SQL engine we have an `any_not_null` aggregation operator. However, it's used only internally, and I don't think it's exposed anywhere in the SQL CLI.
Second thought here: the `metadata` column is a big JSON blob. Running the `max` aggregation essentially performs ordering comparisons between those JSON blobs. However, since we know that for each distinct `fingerprint_id` its `metadata` column should be identical, the JSON ordering comparison is mostly wasted computation.
cc: @yuzefovich any thoughts on how we can better express this `any_not_null` semantics in this query?
`any_not_null` is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. `app_name` in this query). Archer is right that using `max` will make us perform comparisons of `metadata` objects within each aggregation bucket, and if all `metadata` values are the same, that would be redundant computation. We could include `metadata` in the GROUP BY clause and not use `max`, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).
Unfortunately, I don't see a way to avoid these comparisons without allowing the `any_not_null` aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).
I think using `max` or `min` is ok for now. I'd be curious to see benchmarks of the performance overhead of having these redundant comparisons (i.e. try the query with and without the `metadata` column) on a large dataset.
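A sketch of that alternative (again simplified, not from the PR), with `metadata` moved from the aggregate list into the grouping key:

```sql
-- Sketch only: metadata becomes part of the grouping key instead of being
-- aggregated with max(). The JSON comparisons move from the aggregation stage
-- into the grouping stage, but the intent ("metadata is identical within a
-- group") reads more clearly.
SELECT
  app_name,
  fingerprint_id,
  metadata,
  max(aggregated_ts) AS aggregated_ts,
  crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
  aggregation_interval
FROM crdb_internal.transaction_statistics
GROUP BY app_name, fingerprint_id, metadata, aggregation_interval;
```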
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
`array_agg` buffers all the rows in the same `GROUP BY` bucket in memory first, before feeding them into `merge_transaction_stats`. So `array_agg` here effectively means we are performing buffered aggregation, which might raise some memory usage concerns.
Previously, we used `array_agg` to perform aggregation across the statement/transaction stats from multiple nodes. This was mostly fine since the amount of data is quite small (e.g. we have at most around 100 nodes, so the array is at most about 100 elements).
Now we are aggregating across a wider span of rows. We can do some quick math to get an idea of how much data we are dealing with here. E.g. if a user requests statement statistics for the last month, assuming a 1-hour aggregation interval: 30 days * 24 hours/day * 100 nodes * 1 row/hour = 72,000 rows (the size of the `tree.Datums` array when `array_agg` is executed). That's how much data we need to buffer per distinct statement fingerprint. Optimistically assuming each row takes 1 KiB, that means we'd be buffering ~72 MiB per distinct fingerprint. I'm afraid that at some point the amount of data is just too large, and the SQL engine would either decide to spill to disk (the better scenario) or kill the query to protect itself from blowing up memory.
Hmm, so perhaps a better strategy here would be to ditch the `GROUP BY` clause and use `ORDER BY` to get the input in the correct order. Then we can perform streaming aggregation here without worrying about blowing up memory.
cc: @yuzefovich also I want to pick your brain here a bit to see if my concern is valid, and if there's a better strategy.
Your description seemed reasonable to me, so I decided to take a look at the EXPLAIN of the query:
EXPLAIN (TYPES)
SELECT
app_name,
max(aggregated_ts) AS aggregated_ts,
fingerprint_id,
max(metadata) AS metadata,
crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
aggregation_interval
FROM
crdb_internal.transaction_statistics
GROUP BY
app_name, fingerprint_id, aggregation_interval
ORDER BY
1, 2, 3
LIMIT
1;
info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• render
│ columns: (app_name string, aggregated_ts timestamptz, fingerprint_id bytes, metadata jsonb, statistics jsonb, aggregation_interval interval)
│ estimated row count: 1 (missing stats)
│ render statistics: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
│ render fingerprint_id: (fingerprint_id)[bytes]
│ render app_name: (app_name)[string]
│ render aggregation_interval: (aggregation_interval)[interval]
│ render max: (max)[timestamptz]
│ render max: (max)[jsonb]
│
└── • top-k
│ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
│ ordering: +app_name,+max,+fingerprint_id
│ estimated row count: 1 (missing stats)
│ order: +app_name,+max,+fingerprint_id
│ k: 1
│
└── • group (hash)
│ columns: (fingerprint_id bytes, app_name string, aggregation_interval interval, max timestamptz, max jsonb, array_agg jsonb[])
│ estimated row count: 2,000 (missing stats)
│ aggregate 0: max(aggregated_ts)
│ aggregate 1: max(max)
│ aggregate 2: array_agg(crdb_internal.merge_transaction_stats)
│ group by: fingerprint_id, app_name, aggregation_interval
│
└── • render
│ columns: ("crdb_internal.merge_transaction_stats" jsonb, aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb)
│ estimated row count: 2,000 (missing stats)
│ render crdb_internal.merge_transaction_stats: (crdb_internal.merge_transaction_stats((array_agg)[jsonb[]]))[jsonb]
│ render aggregated_ts: (aggregated_ts)[timestamptz]
│ render fingerprint_id: (fingerprint_id)[bytes]
│ render app_name: (app_name)[string]
│ render aggregation_interval: (aggregation_interval)[interval]
│ render max: (max)[jsonb]
│
└── • group (hash)
│ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, aggregation_interval interval, max jsonb, array_agg jsonb[])
│ estimated row count: 2,000 (missing stats)
│ aggregate 0: max(metadata)
│ aggregate 1: array_agg(statistics)
│ group by: aggregated_ts, fingerprint_id, app_name, aggregation_interval
│
└── • union all
│ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
│ estimated row count: 2,000 (missing stats)
│
├── • virtual table
│ columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, aggregation_interval interval)
│ estimated row count: 1,000 (missing stats)
│ table: cluster_transaction_statistics@primary
│
└── • scan
columns: (aggregated_ts timestamptz, fingerprint_id bytes, app_name string, metadata jsonb, statistics jsonb, agg_interval interval)
estimated row count: 1,000 (missing stats)
table: transaction_statistics@primary
spans: FULL SCAN
Note that we perform the aggregation twice: first on `aggregated_ts, fingerprint_id, app_name, aggregation_interval`, then we do `crdb_internal.merge_transaction_stats`, and then a second aggregation on `fingerprint_id, app_name, aggregation_interval`. IIUC the concern is that removing `aggregated_ts` from GROUP BY in this query creates many more buckets, and for each grouping bucket we're accumulating `[]JSONB` in memory. However, as the plan shows, we seem to be merging the stats after the first aggregation, which still has the timestamp in its GROUP BY clause, so maybe the concern is not applicable?
I don't have any bright ideas at the moment, but I think it'd be worth taking a look at EXPLAIN ANALYZE of the query before and after this change on a large dataset and seeing whether the performance and memory usage are reasonable.
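For concreteness, a sketch of such a comparison (not from the PR): `EXPLAIN ANALYZE (DISTSQL)` also emits the DistSQL diagram URL referenced later in this thread. Shown here with the new query shape; running the same statement with `aggregated_ts` kept in GROUP BY gives the "before" numbers.

```sql
-- Sketch only: run against a sizeable dataset before and after dropping
-- aggregated_ts from the grouping key, then compare the execution time and
-- max memory of the aggregation stages.
EXPLAIN ANALYZE (DISTSQL)
SELECT
  app_name,
  max(aggregated_ts) AS aggregated_ts,
  fingerprint_id,
  max(metadata) AS metadata,
  crdb_internal.merge_transaction_stats(array_agg(statistics)) AS statistics,
  aggregation_interval
FROM crdb_internal.transaction_statistics
GROUP BY app_name, fingerprint_id, aggregation_interval;
```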
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
`any_not_null` is an internal-only aggregate function that is used for columns from the GROUP BY clause that need to be projected out (e.g. `app_name` in this query). Archer is right that using `max` will make us perform comparisons of `metadata` objects within each aggregation bucket, and if all `metadata` values are the same, that would be redundant computation. We could include `metadata` in the GROUP BY clause and not use `max`, but then we'd have the same problem of redundant comparisons, just at a different stage (during grouping rather than during aggregation).
Unfortunately, I don't see a way to avoid these comparisons without allowing the `any_not_null` aggregate to be used explicitly in the query - that, however, might be a somewhat large change (the execution support is there, but the amount of changes needed in the optimizer is unknown to me).
I think using `max` or `min` is ok for now. I'd be curious to see benchmarks of the performance overhead of having these redundant comparisons (i.e. try the query with and without the `metadata` column) on a large dataset.
Hmm, interesting. 🤔
It seems like this somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.
Query plan putting the `metadata` in the `GROUP BY`
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
IIUC the concern is that the removal of aggregated_ts from GROUP BY in this query made many more buckets
I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.
Testing the query in the CLI, the second aggregator stage's
- execution time went from 2 seconds -> 9 seconds
- allocated memory went from 76 MiB -> 166 MiB
This is with about ~180k rows in the table. Thoughts on whether this growth is reasonable?
The URL output of the `EXPLAIN ANALYZE (distsql)` is attached below.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
IIUC the concern is that the removal of aggregated_ts from GROUP BY in this query made many more buckets
I think the number of buckets here is less of a concern; I'm more concerned that the size of each bucket is now a lot bigger.
Testing the query in the CLI, the second aggregator stage's
- execution time went from 2 seconds -> 9 seconds
- allocated memory went from 76 MiB -> 166 MiB
This is with about ~180k rows in the table. Thoughts on whether this growth is reasonable?
The URL output of the `EXPLAIN ANALYZE (distsql)` is attached below.
Thanks for checking this out. I wanna play around a bit with the data set, do you mind sharing it somehow?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm, interesting. 🤔
It seems like this somehow slightly improved the memory usage (from 166 MB -> 138 MB) in the second aggregator, though the execution time remained mostly the same.
I wouldn't read too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
here's the zipped `cockroach-data` directories
Thanks! I'll take a closer look.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Thanks! I'll take a closer look.
BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, and @yuzefovich)
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
BTW does this data directory contain any confidential information? Is the link working only for folks in CRL?
No confidential information here. It's just a local database I used myself. Also the link is for CRL only.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, and @matthewtodd)
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I wouldn't read too much into the decrease in memory usage - in both cases we have to spill to disk, so the memory usage shouldn't really be a concern (and it probably varies somewhat).
I made the change to allow the `any_not_null` aggregate function to be used explicitly, but it gives only a minor improvement (about 10% of the time of the second aggregation, which is something like 2% of the overall query latency):
- max
- metadata in GROUP BY
- any_not_null
Given these observations, I think it's not worth making `any_not_null` non-internal. I'd probably choose the option of putting `metadata` in the GROUP BY clause since overall it seems cleaner.
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
No confidential information here. It's just a local database I used myself. Also the link is for CRL only.
Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.
I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.
Force-pushed from e42a90b to 614a74c
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)
Previously, matthewtodd (Matthew Todd) wrote…
nits: "commits" -> "commit", "aggreagted" -> "aggregated"
Done
Previously, matthewtodd (Matthew Todd) wrote…
nit: Maybe spell out "timestamps" here (for docs writers / future readers) instead of abbreviating "ts"?
Done
pkg/server/combined_statement_stats.go, line 134 at r1 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
I'm wondering if we want `max` or `min` here -- do we want to show the latest or the earliest timestamp?
This one really doesn't matter - we won't use it anymore - but I wanted to still return something just in case, so I went with the latest, since seeing the latest is the priority.
pkg/server/combined_statement_stats.go, line 248 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I made the change to allow the `any_not_null` aggregate function to be used explicitly, but it gives only a minor improvement (about 10% of the time of the second aggregation, which is something like 2% of the overall query latency):
- max
- metadata in GROUP BY
- any_not_null
Given these observations, I think it's not worth making `any_not_null` non-internal. I'd probably choose the option of putting `metadata` in the GROUP BY clause since overall it seems cleaner.
Thank you for the input, @yuzefovich and @Azhng. I made the change to move the `metadata` to the `GROUP BY` clause.
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Here the increase in memory usage is actually due to a bug - we are incorrectly limiting the size of the output batches in the hash aggregation, and I have a WIP fix in #76421. With that fix applied I get this.
I'm assuming this PR will not be backported? If not, then I think we should be good with the change to this query - the vectorized engine will spill to disk if necessary and (mostly) correctly accounts for the in-memory state, so we should not blow up.
Indeed, this won't be backported. I can keep an eye out when you land your PR to see if we indeed get improvements here. Thank you for taking a look into it!
Previously, the aggregation timestamp was used as part of the key of a statement. The new query to collect statements now ignores the aggregation timestamp as a key. This commit also removes the aggregated timestamp value on the UI, since it no longer applies.
Fixes cockroachdb#74513
Release note (ui change): We no longer show information about the aggregation timestamp on the Statements overview and Details pages, since all the statement fingerprints are now grouped inside the same time selection.
Force-pushed from 614a74c to 71ef6b8
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)
pkg/server/combined_statement_stats.go, line 249 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Indeed, this won't be backported. I can keep an eye out when you land your PR to see if we indeed get improvements here.
That bug is quite tricky to fix, so I filed #76464 to track it (I'll put my WIP to fix it aside for now). However, even with the bug present I think it's safe to proceed with this change. The behavior of the bug is such that we might create vectorized batches of sizes larger than we want; however, crucially we're still correctly accounting for that large size, so we should not OOM.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)
pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):
FROM crdb_internal.transaction_statistics %s GROUP BY app_name,
Oh boy did we forget to aggregate by app_name this whole time 🤦♂️
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)
pkg/server/combined_statement_stats.go, line 254 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Oh boy did we forget to aggregate by app_name this whole time 🤦♂️
No, the internal view aggregates by app_name; I'm just continuing the GROUP BY here. This function didn't use the GROUP BY before, which is probably why you got that impression :)
TFTR!
Build succeeded.
Previously, the aggregation timestamp was used as part
of the key of a statement. The new query to collect
statements now ignores the aggregation timestamp as a key.
This commit also removes the aggregated timestamp value on the UI,
since it no longer applies.
Fixes #74513
Release note (ui change): We no longer show information about
the aggregation timestamp on the Statements overview and Details
pages, since all the statement fingerprints are now grouped inside
the same time selection.