release-23.1: metrics: improve ux around _status/vars output #99820

Merged: dhartunian merged 1 commit into release-23.1 from blathers/backport-release-23.1-99516 on Mar 30, 2023

@blathers-crl blathers-crl bot commented Mar 28, 2023

Backport 1/1 commits from #99516 on behalf of @dhartunian.

/cc @cockroachdb/release


Previously, the addition of the tenant metric label was applied uniformly and could result in confusion for customers who never enable multi-tenancy or c2c. The tenant="system" label carries little meaning when there's no tenancy in use.

This change modifies the system tenant label application to only happen when a non-system in-process tenant is created.

Additionally, the environment variable COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS can be set to true to disable the new tenant and node_id labels. This can be used on single-process tenants to disable the tenant label.

Resolves: #94668

Epic: CRDB-18798

Release note (ops change): The COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS env var can be used to disable the newly introduced metric labels in the _status/vars output if they conflict with a customer's scrape configuration.


Release justification: low-risk, high-impact addition to the Prometheus metric output that makes it easier to use and reduces the impact of new changes.

Previously, the addition of the `tenant` metric label was applied
uniformly and could result in confusion for customers who never enable
multi-tenancy or c2c. The `tenant="system"` label carries little
meaning when there's no tenancy in use.

This change modifies the system tenant label application to only
happen when a non-system in-process tenant is created.

Additionally, the environment variable
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` can be set to
`true` to disable the new `tenant` and `node_id` labels. This can be
used on single-process tenants to disable the `tenant` label.

When the `tenantNameContainer` is nil, or the `nodeID` is set to
0, the labels will not be applied during recorder configuration on
startup. This is currently the case when running a separate process
tenant using `mt start-sql`. Those tenants *will not* have `tenant` or
`node_id` labels available.
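
The startup conditions above can be sketched roughly as follows. This is an assumption-laden illustration, not the recorder's real code: the `tenantNameContainer` struct and `startupLabels` function here are hypothetical, the `disabled` flag stands for whatever the environment variable resolves to, and the sketch reads the paragraph above as "if either the container is nil or the node ID is 0, neither label is applied".

```go
package main

import (
	"fmt"
	"strconv"
)

// tenantNameContainer is a hypothetical stand-in that only holds a name.
type tenantNameContainer struct {
	name string
}

// startupLabels returns the labels a recorder would attach at startup.
// Assumed rules, taken from the description above:
//   - disabled == true: no labels at all
//   - tenant == nil or nodeID == 0: no labels at all
//   - otherwise: both tenant and node_id are applied
func startupLabels(disabled bool, tenant *tenantNameContainer, nodeID int) map[string]string {
	labels := map[string]string{}
	if disabled || tenant == nil || nodeID == 0 {
		return labels
	}
	labels["tenant"] = tenant.name
	labels["node_id"] = strconv.Itoa(nodeID)
	return labels
}

func main() {
	system := &tenantNameContainer{name: "system"}

	// In-process tenant with a known node ID: both labels present.
	fmt.Println(startupLabels(false, system, 1))

	// Separate-process tenant (no container, node ID 0): no labels.
	fmt.Println(startupLabels(false, nil, 0))

	// Labels disabled through the environment variable: no labels.
	fmt.Println(startupLabels(true, system, 1))
}
```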

Resolves: #94668

Epic: CRDB-18798

Release note (ops change): The
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` env var can be used
to disable the newly introduced metric labels in the `_status/vars`
output if they conflict with a customer's scrape configuration.
@blathers-crl blathers-crl bot requested a review from a team March 28, 2023 16:36
@blathers-crl blathers-crl bot requested a review from a team as a code owner March 28, 2023 16:36
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99516 branch 2 times, most recently from 1767e2d to 9521951 Compare March 28, 2023 16:36

blathers-crl bot commented Mar 28, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot closed this Mar 28, 2023
@blathers-crl blathers-crl bot deleted the blathers/backport-release-23.1-99516 branch March 28, 2023 16:36
@cockroach-teamcity
Member

This change is Reviewable

@knz knz restored the blathers/backport-release-23.1-99516 branch March 28, 2023 16:42
@knz knz reopened this Mar 28, 2023

knz commented Mar 28, 2023

@dhartunian is this still relevant?

@dhartunian
Collaborator

@knz yep, thx for reopening.

@dhartunian dhartunian requested a review from knz March 28, 2023 21:19
// We assume that all stores have been added to the registry
// prior to calling `AddNode`.
for _, s := range mr.mu.storeRegistries {
	s.AddLabel("node_id", strconv.Itoa(int(desc.NodeID)))
Contributor

Hold on I am confused here. Why are we adding the node_id label again for every store? Don't we want separate store IDs instead?

Collaborator

The stores have a store_id label already. This adds an additional node_id to those because I was operating under the assumption that every metric emitted by a node under _status/vars should label itself with that node's ID.

Should we omit node_id label from store metrics entirely?

Contributor

Thanks for explaining.
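
For readers following the thread, here is a small self-contained sketch of the labelling being discussed: each store registry already carries a `store_id` label, and `AddNode` then adds `node_id` to every store registry. The `recorder`, `registry`, and `labelPair` types and the `AddStore` method are hypothetical simplifications written for this example; only the loop in `AddNode` mirrors the snippet under review.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelPair is one name/value label attached to a registry.
type labelPair struct {
	name, value string
}

// registry is a hypothetical stand-in for a per-store metrics registry.
type registry struct {
	labels []labelPair
}

func (r *registry) AddLabel(name, value string) {
	r.labels = append(r.labels, labelPair{name, value})
}

// recorder is a simplified, hypothetical metrics recorder.
type recorder struct {
	storeRegistries map[int]*registry // keyed by store ID
}

// AddStore registers a store and gives its registry a store_id label.
func (mr *recorder) AddStore(storeID int) {
	reg := &registry{}
	reg.AddLabel("store_id", strconv.Itoa(storeID))
	mr.storeRegistries[storeID] = reg
}

// AddNode labels every known store registry with the node's ID.
// As in the reviewed snippet, it assumes that all stores have been
// added before it is called; stores added later would miss the label.
func (mr *recorder) AddNode(nodeID int) {
	for _, s := range mr.storeRegistries {
		s.AddLabel("node_id", strconv.Itoa(nodeID))
	}
}

func main() {
	mr := &recorder{storeRegistries: map[int]*registry{}}
	mr.AddStore(1)
	mr.AddStore(2)
	mr.AddNode(7)
	for _, id := range []int{1, 2} {
		fmt.Println(id, mr.storeRegistries[id].labels)
	}
}
```

Each store's metrics end up with both labels, so a scrape can group by store, by node, or by both.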

@dhartunian dhartunian merged commit 79eddfe into release-23.1 Mar 30, 2023
@dhartunian dhartunian deleted the blathers/backport-release-23.1-99516 branch March 30, 2023 13:58