Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: cleanups around a tenant's metrics registry #97889

Merged
merged 5 commits into from
Mar 6, 2023

Conversation

andreimatei
Copy link
Contributor

Please see individual commits.

Release note: None
Epic: None

@andreimatei andreimatei requested review from a team as code owners March 1, 2023 23:36
@andreimatei andreimatei requested a review from a team March 1, 2023 23:36
@andreimatei andreimatei requested review from a team as code owners March 1, 2023 23:36
@andreimatei andreimatei requested review from herkolategan, smg260 and a team and removed request for a team, herkolategan and smg260 March 1, 2023 23:36
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei
Copy link
Contributor Author

First two commits are #97884.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @andreimatei)


pkg/server/tenant.go line 620 at r7 (raw file):

	// If there's a higher-level recorder, we link our metrics registry to it.
	if s.sqlCfg.NodeMetricsRecorder != nil {
		s.sqlCfg.NodeMetricsRecorder.AddTenantRegistry(s.sqlCfg.TenantID, s.registry)

Do we need to "unregister" the metrics if the tenant is stopped?

Copy link
Collaborator

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are appreciated. I think you just need to gen bazel for CI but LGTM.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @andreimatei)

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Do we need to "unregister" the metrics if the tenant is stopped?

Is there such a thing as the tenant being stopped independently from the host node?

@andreimatei andreimatei requested a review from dhartunian March 6, 2023 18:25
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I'm fine with merging without resolving the unregister discussion since that was a problem prior to the refactors as well.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @andreimatei)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is there such a thing as the tenant being stopped independently from the host node?

via ALTER TENANT x STOP SERVICE?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, dhartunian (David Hartunian) wrote…

via ALTER TENANT x STOP SERVICE?

Yes OK for merging this but we need a follow-up issue to handle this case. Can you file that issue and add a link here? Thanks

It was surprising and unnecessary for the MetricsRegistry to take an
rpc.Context. It turns out it only needed something about measuring
latencies to other nodes, and even that for some peripheral
functionality.

Release note: None
The MetricRecorded has nothing to do with hlc; it was using an hlc.Clock
purely for reading the physical time. Time patch makes it take a simpler
clock.

Release note: None
Before this patch, the MetricRecorded was a recursive structure - a
server's Recorder contained references to the recorders of shared-proc
tenants. This was confusing - a recorder does not need the functionality
of child recorders (no methods of the child recorders were used); it
only needs a reference to the child registries. Also, a recorder also
had a number of child registries, so having both registries and other
recorders as children was extra dubious. This patch makes the
MetricRecorder reference the tenants' registries directly.

Release note: None
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Yes OK for merging this but we need a follow-up issue to handle this case. Can you file that issue and add a link here? Thanks

maybe I can address this if it's easy - but I can't seem to find any code that deals with stopping a tenant server. Can someone point me to it or is the whole thing yet to be written?

Copy link
Collaborator

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

maybe I can address this if it's easy - but I can't seem to find any code that deals with stopping a tenant server. Can someone point me to it or is the whole thing yet to be written?

@dhartunian already had it implemented as part of my TSDB PR, which is in draft but I expect to have it ready for review this week. It will require some tweaking after this refactor lands. 28577fc#diff-8453981e212eb3eee4afbfde1f551dc981960870b369580d9b4f911b9108533cR180.

I can look into where the stopping code is implemented, though. But I would prefer if this PR doesn't block because of it.

In case of shared-process tenants, the metrics registry of the tenant's
SQLServer is registered with the node's metrics recorder, so that the
tenant metrics are exported together with the system's metrics. Before
this patch, this registration was done in a weird place - it was done
when preparing the arguments that will be used for ultimately
constructing the respective server. Doing this registration before the
server is even constructed, let alone started, is surprising. It's also
pretty broken, since the tenant start can fail afterwards. This patch
rearranges things so that the registration is done when the tenant
server is started.

I believe this patch also fixes a bug around the exporting of tenant
metrics to Graphite. Before, I think the tenant's metrics were exported
twice - once by the tenant and once by the node recorder.

Release note: None
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, and @knz)


pkg/server/tenant.go line 620 at r7 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

@dhartunian already had it implemented as part of my TSDB PR, which is in draft but I expect to have it ready for review this week. It will require some tweaking after this refactor lands. 28577fc#diff-8453981e212eb3eee4afbfde1f551dc981960870b369580d9b4f911b9108533cR180.

I can look into where the stopping code is implemented, though. But I would prefer if this PR doesn't block because of it.

ack.
Opened #98100 and added a reference to it.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, and @knz)

@craig
Copy link
Contributor

craig bot commented Mar 6, 2023

Build succeeded:

@craig craig bot merged commit b0f6985 into cockroachdb:master Mar 6, 2023
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this pull request Mar 13, 2023
As a follow up to cockroachdb#97889, this patch adds a function to unregister a
tenant's metric registry when it is stopped.

Fixes cockroachdb#98100.

Release note: None
@andreimatei andreimatei deleted the metrics.cleanup branch March 13, 2023 21:19
craig bot pushed a commit that referenced this pull request Mar 15, 2023
98228: sql,clusterversion,gc_job: hoist version gates for DelRange to 23.1 r=ajwerner a=ajwerner

We added version gates to unconditionally enable sending DelRange tombstones in 22.2, but then we pre-empted that by disabling them with a cluster setting. This PR hoists those gates and that invariant checking up to 23.1.

Relates to #96763

Epic: None

Release note: None

98529: server: unregister metric registry for in-process tenants r=aadityasondhi a=aadityasondhi

As a follow up to #97889, this patch adds a function to unregister a tenant's metric registry when it is stopped.

Fixes #98100.

Release note: None

Co-authored-by: ajwerner <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants