Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsdump: histograms are missing #107701

Closed
tbg opened this issue Jul 27, 2023 · 2 comments · Fixed by #108597
Closed

tsdump: histograms are missing #107701

tbg opened this issue Jul 27, 2023 · 2 comments · Fixed by #108597
Assignees
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@tbg
Copy link
Member

tbg commented Jul 27, 2023

In the tsdump in #106140 (comment), the histogram quantiles are not recorded. For example, exec.latency is not there. I remember this working at some point; unclear if it's an issue with this particular tsdump or a regression.

tsdump.gob.yaml (July 5-6)

1: 1
2: 2
3: 3
4: 4
5: 5
6: 6
7: 7
8: 8

Jira issue: CRDB-30167

@tbg tbg added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-observability-inf labels Jul 27, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 27, 2023

Hi @tbg, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@ericharmeling ericharmeling self-assigned this Aug 3, 2023
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 11, 2023
This commit assigns prometheusgo.MetricType_HISTOGRAM to the
Metadata.MetricType in Histogram.GetMetadata.

Before this change, GetMetadata() was returning the
Metadata.MetricType zero value (prometheusgo.MetricType_COUNTER)
for all histograms that did not explicitly specify the
prometheusgo.MetricType_HISTOGRAM for Metadata.MetricType in
their Metadata definitions. This prevented checks on histogram
Metadata.MetricType from properly evaluating the metrics as
histograms.

Fixes cockroachdb#106448.
Fixes cockroachdb#107701.

Releaes note: None
@ericharmeling
Copy link
Contributor

We're not actually constructing the histogram names for most histograms in the tsdump code path.

GetInternalTimeseriesNamesFromServer checks the meta.MetricType of the metadata for each metric here:

func GetInternalTimeseriesNamesFromServer(

The zero value for the MetricType in the prometheus go client is COUNTER: https://github.com/prometheus/client_model/blob/62e59ec852bda61ad6d00519f9e81595c41f865b/go/metrics.pb.go#L41

The histogram mentioned in this issue (exec.latency) does not specify a MetricType:

metaExecLatency = metric.Metadata{
Name: "exec.latency",
Help: `Latency of batch KV requests (including errors) executed on this node.
This measures requests already addressed to a single replica, from the moment
at which they arrive at the internal gRPC endpoint to the moment at which the
response (or an error) is returned.
This latency includes in particular commit waits, conflict resolution and replication,
and end-users can easily produce high measurements via long-running transactions that
conflict with foreground traffic. This metric thus does not provide a good signal for
understanding the health of the KV layer.
`,
Measurement: "Latency",
Unit: metric.Unit_NANOSECONDS,
}

In fact, the only metrics for which we do specify a MetricType are ttljob_metrics:

SpanTotalDuration: b.Histogram(metric.HistogramOptions{
Metadata: metric.Metadata{
Name: "jobs.row_level_ttl.span_total_duration",
Help: "Duration for processing a span during row level TTL.",
Measurement: "nanoseconds",
Unit: metric.Unit_NANOSECONDS,
MetricType: io_prometheus_client.MetricType_HISTOGRAM,
},
MaxVal: time.Hour.Nanoseconds(),
SigFigs: sigFigs,
Duration: histogramWindowInterval,
BucketConfig: metric.LongRunning60mLatencyBuckets,
}),
SelectDuration: b.Histogram(metric.HistogramOptions{
Metadata: metric.Metadata{
Name: "jobs.row_level_ttl.select_duration",
Help: "Duration for select requests during row level TTL.",
Measurement: "nanoseconds",
Unit: metric.Unit_NANOSECONDS,
MetricType: io_prometheus_client.MetricType_HISTOGRAM,
},
MaxVal: time.Minute.Nanoseconds(),
SigFigs: sigFigs,
Duration: histogramWindowInterval,
BucketConfig: metric.BatchProcessLatencyBuckets,
}),
DeleteDuration: b.Histogram(metric.HistogramOptions{
Metadata: metric.Metadata{
Name: "jobs.row_level_ttl.delete_duration",
Help: "Duration for delete requests during row level TTL.",
Measurement: "nanoseconds",
Unit: metric.Unit_NANOSECONDS,
MetricType: io_prometheus_client.MetricType_HISTOGRAM,

And these are the only histogram metrics that actually show up in the tsdump.

Rather than explicitly specify a metric type for each histogram, we should be assigning this type on histogram creation, or returning it on metadata introspection. This PR should resolve this issue: #108597

ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 21, 2023
This commit assigns prometheusgo.MetricType_HISTOGRAM to the
Metadata.MetricType on histogram construction.

Before this change, GetMetadata() was returning the
Metadata.MetricType zero value (prometheusgo.MetricType_COUNTER)
for all histograms that did not explicitly specify the
prometheusgo.MetricType_HISTOGRAM for Metadata.MetricType in
their Metadata definitions. This prevented checks on histogram
Metadata.MetricType from properly evaluating the metrics as
histograms.

Fixes cockroachdb#106448.
Fixes cockroachdb#107701.

Releaes note: None
craig bot pushed a commit that referenced this issue Aug 25, 2023
108597: metrics: assign histogram metric type on histogram construction r=ericharmeling a=ericharmeling

This commit assigns prometheusgo.MetricType_HISTOGRAM to the
Metadata.MetricType on histogram construction.

Before this change, GetMetadata() was returning the
Metadata.MetricType zero value (prometheusgo.MetricType_COUNTER)
for all histograms that did not explicitly specify the
prometheusgo.MetricType_HISTOGRAM for Metadata.MetricType in
their Metadata definitions. This prevented checks on histogram
Metadata.MetricType from properly evaluating the metrics as
histograms.

Fixes #106448.
Fixes #107701.

Releaes note: None

109345: changefeedccl: refactor kvfeed startup in changeaggregator processor  r=miretskiy a=jayshrivastava

changefeedccl: refactor kvfeed startup in changeaggregator processor
This change cleans up the code used to start up the kv feed
in change aggregator processors. This change removes uncessessary code,
adds a better API, and makes code easier to reason about.

Informs: #96953
Release note: None
Epic: None

109386: sql: adjust many tests to work with test tenant r=yuzefovich a=yuzefovich

Epic: CRDB-18499
Informs #76378

Release note: None

109476: dev: make `dev test --count 1` invalidate cached test results r=dt a=rickystewart

This matches the behavior of `go test`.

Epic: none
Release note: None

109506: changefeedccl: ensure rangefeed setting is enabled in tests r=miretskiy a=jayshrivastava

Previously, many tests which create rangefeeds would not explicitly set the `kv.rangefeed.enabled` setting to be true. These tests would still work because, by default, rangefeeds are enabled via span configs. However, it was observed that span configs are not immediately applied when range splits occur. This would cause the testing rangefeed reader to encounter errors and/or timeout on very rare occasions. See #109306 (comment) for more info.

This change updates these tests to set the `kv.rangefeed.enabled` cluster setting to be true, which removes the dependency on span configs.

Closes: #109306
Epic: None
Release note: None

109511: concurrency: use generic lists in the lock table r=nvanbenschoten a=arulajmani


Now that cda4fa2 has landed, we can make use of generic lists in a few places in the lock table.

Epic: none
Release note: None

Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
@craig craig bot closed this as completed in fb8f99b Aug 25, 2023
blathers-crl bot pushed a commit that referenced this issue Aug 25, 2023
This commit assigns prometheusgo.MetricType_HISTOGRAM to the
Metadata.MetricType on histogram construction.

Before this change, GetMetadata() was returning the
Metadata.MetricType zero value (prometheusgo.MetricType_COUNTER)
for all histograms that did not explicitly specify the
prometheusgo.MetricType_HISTOGRAM for Metadata.MetricType in
their Metadata definitions. This prevented checks on histogram
Metadata.MetricType from properly evaluating the metrics as
histograms.

Fixes #106448.
Fixes #107701.

Releaes note: None
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 28, 2023
This commit assigns prometheusgo.MetricType_HISTOGRAM to the
Metadata.MetricType on histogram construction.

Before this change, GetMetadata() was returning the
Metadata.MetricType zero value (prometheusgo.MetricType_COUNTER)
for all histograms that did not explicitly specify the
prometheusgo.MetricType_HISTOGRAM for Metadata.MetricType in
their Metadata definitions. This prevented checks on histogram
Metadata.MetricType from properly evaluating the metrics as
histograms.

Fixes cockroachdb#106448.
Fixes cockroachdb#107701.

Releaes note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants