Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: improved UX for tenant labels on prometheus metrics #94668

Closed
2 tasks
dhartunian opened this issue Jan 3, 2023 · 1 comment · Fixed by #99516
Closed
2 tasks

metrics: improved UX for tenant labels on prometheus metrics #94668

dhartunian opened this issue Jan 3, 2023 · 1 comment · Fixed by #99516
Assignees
Labels
A-observability-inf branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker

Comments

@dhartunian
Copy link
Collaborator

dhartunian commented Jan 3, 2023

  • tenants running via mt start-sql may want to omit tenant_id label. uncertain if this is necessary, could be optional.
  • single-tenant configurations should not output any tenant labels

Jira issue: CRDB-23074

Epic CRDB-18798

@dhartunian dhartunian added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability-inf labels Jan 3, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 22, 2023

Hi @dhartunian, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@dhartunian dhartunian added the branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 label Mar 22, 2023
dhartunian added a commit to dhartunian/cockroach that referenced this issue Mar 27, 2023
Previously, the addition of the `tenant` metric label was applied
uniformly and could result in confusion for customers who never enable
multi-tenancy or c2c. The `tenant="system"` label carries little
meaning when there's no tenancy in use.

This change modifies the system tenant label application to only
happen when a non- sytem in-process tenant is created.

Additionally, an environment variable:
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` can be set to
`false` to disable the new `tenant` and `node_id` labels. This can be
used on single-process tenants to disable the `tenant` label.

Resolves: cockroachdb#94668

Epic: CRDB-18798

Release note (ops change): The
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` env var can be used
to disable the newly introduced metric labels in the `_status/vars`
output if they conflict with a customer's scrape configuration.
craig bot pushed a commit that referenced this issue Mar 28, 2023
…99752 #99774

99433: opt: fixup CTE stats on placeholder queries r=cucaroach a=cucaroach

During optbuilder phase we copy the initial expressions stats into the
fake-rel but this value can change when placeholders are assigned so add
code in AssignPlaceholders to rebuild the cte if the stats change.

Fixes: #99389
Epic: none
Release note: none

99516: metrics: improve ux around _status/vars output r=aadityasondhi a=dhartunian

Previously, the addition of the `tenant` metric label was applied uniformly and could result in confusion for customers who never enable multi-tenancy or c2c. The `tenant="system"` label carries little meaning when there's no tenancy in use.

This change modifies the system tenant label application to only happen when a non- sytem in-process tenant is created.

Additionally, an environment variable:
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` can be set to `false` to disable the new `tenant` and `node_id` labels. This can be used on single-process tenants to disable the `tenant` label.

Resolves: #94668

Epic: CRDB-18798

Release note (ops change): The
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` env var can be used to disable the newly introduced metric labels in the `_status/vars` output if they conflict with a customer's scrape configuration.

99522: jobsprofiler: store DistSQL diagram of jobs in job info r=dt a=adityamaru

This change teaches import, cdc, backup and restore
to store their DistSQL plans in the job_info table
under a timestamped info key. The generation and writing
of the plan diagram is done asynchronously so as to not
slow down the execution of the job. A new plan will be
stored everytime the job sets up its DistSQL flow.

Release note: None
Epic: [CRDB-8964](https://cockroachlabs.atlassian.net/browse/CRDB-8964)
Informs: #99729

99574: streamingccl: skip acceptance/c2c on remote cluster setup r=stevendanna a=msbutler

acceptance/c2c currently fails when run on a remote cluster. This patch ensures the test gets skipped when run on a remote cluster. There's no need to run the test on a remote cluster because the other c2c roachtests provide sufficient coverage.

Fixes #99553

Release note: none

99691: codeowners: update sql obs to cluster obs r=maryliag a=maryliag

Update mentions of `sql-observability` to
`cluster-observability`.

Epic: none
Release note: None

99712: ui: connect metrics provider to metrics timescale object r=xinhaoz a=dhartunian

Previously, the `MetricsDataProvider` component queried the redux store for the `TimeScale` object which contained details of the currently active time window. This piece of state was assumed to update to account for the "live" moving window that metrics show when pre-set lookback time windows are selected.

A recent PR: #98331 removed the feature that polled new data from SQL pages, which also disabled polling on metrics pages due to the re-use of `TimeScale`.

This commit modifies the `MetricsDataProvider` to instead read the `metricsTime` field of the `TimeScaleState` object. This object was constructed for use by the `MetricsDataProvider` but was not wired up to the component.

Resolves #99524

Epic: None

Release note: None

99733: telemetry: add FIPS-specific channel r=knz a=rail

Previously, all official builds were reporting using the same telemetry channel.

This PR adds an new telemetry channel for the FIPS build target.

Fixes: CC-24110
Epic: DEVINF-478
Release note: None

99745: spanconfigsqlwatcher: deflake TestSQLWatcherOnEventError r=arulajmani a=arulajmani

Previously, this test was setting the no-op checkpoint duration to be
 every hour to effectively disable checkpoints. Doing so is integral to
what the test is testing. However, this was a lie, given how `util.Every` works -- A call to `ShouldProcess` returns true the very first time.

This patch achieves the original goal by introducing a new testing knob. Previously, the test would fail in < 40 runs locally.  Have this running strong for ~1000 runs.

Fixes #76765

Release note: None

99747: roachtest: use persistent disks for disk-stall tests r=jbowens a=nicktrav

Currently, the `disk-stall` tests use local SSDs. When run on GCE VMs, a higher test flake rate is observed due to known issues with fsync latency for local SSDs.

Switch the test to use persistent disks instead.

Touches: #99372.

Release note: None.

Epic: CRDB-20293

99752: kvserver: bump tolerance more r=ajwerner a=ajwerner

I'm not digging into this more, but the test is flakey.

Epic: none

https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests_BazelUnitTests/9161972?showRootCauses=false&expandBuildChangesSection=true&expandBuildProblemsSection=true&expandBuildTestsSection=true

Release note: None

99774: *: identify remaining uses of TODOSQLCodec r=stevendanna a=knz

The `TODOSQLCodec` was a bug waiting to happen.

The only reasonable remaining purpose is for use in tests. As such, this change moves its definition to a test-only package (we have a linter that verifies that `testutils` is not included in non-test code).

This change then identifies the one non-reasonable remaining purposes and identifies it properly as a bug linked to #48123.

Release note: None
Epic: None

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
Co-authored-by: ajwerner <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig craig bot closed this as completed in a6a1a4c Mar 28, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 28, 2023
Previously, the addition of the `tenant` metric label was applied
uniformly and could result in confusion for customers who never enable
multi-tenancy or c2c. The `tenant="system"` label carries little
meaning when there's no tenancy in use.

This change modifies the system tenant label application to only
happen when a non- sytem in-process tenant is created.

Additionally, an environment variable:
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` can be set to
`false` to disable the new `tenant` and `node_id` labels. This can be
used on single-process tenants to disable the `tenant` label.

When the `tenantNameContainer` is nil, or the `nodeID` is set to
0, the labels will not be applied during recorder configuration on
startup. This is currently the case when running a separate process
tenant using `mt start-sql`. Those tenants *will not* have `tenant` or
`nodeID` labels available.

Resolves: #94668

Epic: CRDB-18798

Release note (ops change): The
`COCKROACH_DISABLE_NODE_AND_TENANT_METRIC_LABELS` env var can be used
to disable the newly introduced metric labels in the `_status/vars`
output if they conflict with a customer's scrape configuration.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant