-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: cleanups around a tenant's metrics registry #97889
Conversation
First two commits are #97884. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @andreimatei)
pkg/server/tenant.go
line 620 at r7 (raw file):
// If there's a higher-level recorder, we link our metrics registry to it. if s.sqlCfg.NodeMetricsRecorder != nil { s.sqlCfg.NodeMetricsRecorder.AddTenantRegistry(s.sqlCfg.TenantID, s.registry)
Do we need to "unregister" the metrics if the tenant is stopped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are appreciated. I think you just need to gen bazel
for CI but LGTM.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @andreimatei)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Do we need to "unregister" the metrics if the tenant is stopped?
Is there such a thing as the tenant being stopped independently from the host node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with merging without resolving the unregister discussion since that was a problem prior to the refactors as well.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @andreimatei)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is there such a thing as the tenant being stopped independently from the host node?
via ALTER TENANT x STOP SERVICE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, dhartunian (David Hartunian) wrote…
via
ALTER TENANT x STOP SERVICE
?
Yes OK for merging this but we need a follow-up issue to handle this case. Can you file that issue and add a link here? Thanks
It was surprising and unnecessary for the MetricsRegistry to take an rpc.Context. It turns out it only needed something about measuring latencies to other nodes, and even that for some peripheral functionality. Release note: None
The MetricRecorded has nothing to do with hlc; it was using an hlc.Clock purely for reading the physical time. Time patch makes it take a simpler clock. Release note: None
Release note: None
Before this patch, the MetricRecorded was a recursive structure - a server's Recorder contained references to the recorders of shared-proc tenants. This was confusing - a recorder does not need the functionality of child recorders (no methods of the child recorders were used); it only needs a reference to the child registries. Also, a recorder also had a number of child registries, so having both registries and other recorders as children was extra dubious. This patch makes the MetricRecorder reference the tenants' registries directly. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Yes OK for merging this but we need a follow-up issue to handle this case. Can you file that issue and add a link here? Thanks
maybe I can address this if it's easy - but I can't seem to find any code that deals with stopping a tenant server. Can someone point me to it or is the whole thing yet to be written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
maybe I can address this if it's easy - but I can't seem to find any code that deals with stopping a tenant server. Can someone point me to it or is the whole thing yet to be written?
@dhartunian already had it implemented as part of my TSDB PR, which is in draft but I expect to have it ready for review this week. It will require some tweaking after this refactor lands. 28577fc#diff-8453981e212eb3eee4afbfde1f551dc981960870b369580d9b4f911b9108533cR180.
I can look into where the stopping code is implemented, though. But I would prefer if this PR doesn't block because of it.
In case of shared-process tenants, the metrics registry of the tenant's SQLServer is registered with the node's metrics recorder, so that the tenant metrics are exported together with the system's metrics. Before this patch, this registration was done in a weird place - it was done when preparing the arguments that will be used for ultimately constructing the respective server. Doing this registration before the server is even constructed, let alone started, is surprising. It's also pretty broken, since the tenant start can fail afterwards. This patch rearranges things so that the registration is done when the tenant server is started. I believe this patch also fixes a bug around the exporting of tenant metrics to Graphite. Before, I think the tenant's metrics were exported twice - once by the tenant and once by the node recorder. Release note: None
6ac48dc
to
a4543af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, and @knz)
pkg/server/tenant.go
line 620 at r7 (raw file):
Previously, aadityasondhi (Aaditya Sondhi) wrote…
@dhartunian already had it implemented as part of my TSDB PR, which is in draft but I expect to have it ready for review this week. It will require some tweaking after this refactor lands. 28577fc#diff-8453981e212eb3eee4afbfde1f551dc981960870b369580d9b4f911b9108533cR180.
I can look into where the stopping code is implemented, though. But I would prefer if this PR doesn't block because of it.
ack.
Opened #98100 and added a reference to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, and @knz)
Build succeeded: |
As a follow up to cockroachdb#97889, this patch adds a function to unregister a tenant's metric registry when it is stopped. Fixes cockroachdb#98100. Release note: None
98228: sql,clusterversion,gc_job: hoist version gates for DelRange to 23.1 r=ajwerner a=ajwerner We added version gates to unconditionally enable sending DelRange tombstones in 22.2, but then we pre-empted that by disabling them with a cluster setting. This PR hoists those gates and that invariant checking up to 23.1. Relates to #96763 Epic: None Release note: None 98529: server: unregister metric registry for in-process tenants r=aadityasondhi a=aadityasondhi As a follow up to #97889, this patch adds a function to unregister a tenant's metric registry when it is stopped. Fixes #98100. Release note: None Co-authored-by: ajwerner <[email protected]> Co-authored-by: Aaditya Sondhi <[email protected]>
Please see individual commits.
Release note: None
Epic: None