release-22.1: Adding metrics required by the serverless autoscaler #79519

darinpp · 2022-04-06T16:58:44Z

Backport:

1/1 commits from "server/status: selective metric export" (server/status: selective metric export #79021)
1/1 commits from "server/status: add running non-idle jobs metric" (server/status: add running non-idle jobs metric #79022)
1/1 commits from "server/status: add load related metrics" (server/status: add load related metrics #79023)

Please see individual PRs for details.

/cc @cockroachdb/release

Release justification: Low risk, high reward changes to existing functionality

Release note: None

server/status: selective metric export

Previously, PrometheusExporter could only export all of the metrics in a registry, with no way to select a subset. For serverless we use a separate metrics endpoint (_status/load) that currently shows CPU utilization metrics, which are generated each time the metrics are pulled. However, we also need some additional metrics that are currently tracked by the MetricsRecorder. Exporting all of the metrics tracked by the MetricsRecorder is not desirable, because it incurs a performance penalty given the higher poll rate on the load endpoint. This PR therefore modifies PrometheusExporter so that it scrapes only a subset of the metrics.

A second change concerns how locking is done when scraping and writing the scraped output. Previously the lock was external, and taking it was the responsibility of the caller. This PR adds a ScrapeAndPrintAsText method to the exporter that is thread-safe and does the locking internally.

Release justification: Low risk, high reward changes to existing functionality

Release note: None

server/status: add running non-idle jobs metric

Previously, serverless used the SQL jobs running metric to determine whether a tenant process is idle and can be shut down. With the introduction of continuously running jobs, this is no longer a good indicator. A recent addition is per-job metrics that show whether jobs are running or idle. The autoscaler doesn't care about individual jobs; it only cares about the total number of jobs that are running but haven't reported being idle. The poll rate is also very high, so retrieving all of the individual running/idle metrics for each job type isn't optimal. This PR therefore adds a single metric that aggregates and tracks the total count of jobs that are running and not idle.

Release justification: Bug fixes and low-risk updates to new functionality

Release note: None
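The aggregation can be sketched as a single counter that is updated on job state transitions, rather than summed from per-job-type metrics at poll time. JobTracker, Job, Start, MarkIdle and Finish are hypothetical names for illustration, not the CockroachDB jobs API; the sketch assumes every job reports its idle/not-idle transitions and its completion through these calls.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// JobTracker (hypothetical) keeps one aggregate gauge: the number of jobs
// that are running and have not reported themselves idle.
type JobTracker struct {
	runningNonIdle int64 // accessed only through sync/atomic
}

// Job is a hypothetical handle for one running job.
type Job struct {
	tracker *JobTracker
	mu      sync.Mutex
	idle    bool
	done    bool
}

// Start registers a new job, which counts as running and non-idle.
func (t *JobTracker) Start() *Job {
	atomic.AddInt64(&t.runningNonIdle, 1)
	return &Job{tracker: t}
}

// MarkIdle records the job's idle state. Repeating the current state is a
// no-op, so the aggregate is never double-counted.
func (j *Job) MarkIdle(idle bool) {
	j.mu.Lock()
	defer j.mu.Unlock()
	if j.done || j.idle == idle {
		return
	}
	j.idle = idle
	if idle {
		atomic.AddInt64(&j.tracker.runningNonIdle, -1)
	} else {
		atomic.AddInt64(&j.tracker.runningNonIdle, 1)
	}
}

// Finish removes the job; it decrements only if the job was still counted.
func (j *Job) Finish() {
	j.mu.Lock()
	defer j.mu.Unlock()
	if j.done {
		return
	}
	j.done = true
	if !j.idle {
		atomic.AddInt64(&j.tracker.runningNonIdle, -1)
	}
}

// RunningNonIdle is the single value a poller would read.
func (t *JobTracker) RunningNonIdle() int64 {
	return atomic.LoadInt64(&t.runningNonIdle)
}

func main() {
	var t JobTracker
	a, b := t.Start(), t.Start()
	fmt.Println(t.RunningNonIdle()) // 2
	a.MarkIdle(true) // a long-running job reports that it is idle
	fmt.Println(t.RunningNonIdle()) // 1
	b.Finish()
	fmt.Println(t.RunningNonIdle()) // 0, even though job a is still running
}
```

Making MarkIdle and Finish idempotent keeps the gauge from drifting if a job reports the same state twice, and reading it is a single atomic load, which suits a high poll rate.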

server/status: add load related metrics

Previously, only CPU-related metrics were available on the _status/load endpoint. For serverless we also need metrics that show the total number of current SQL connections, the number of SQL queries executed, and the number of currently running jobs that are not idle. This PR adds these three metrics by using the selective Prometheus exporter to scrape the MetricsRecorder.

Release justification: Low risk, high reward changes to existing functionality

Release note: None

blathers-crl · 2022-04-06T16:58:46Z

cockroach-teamcity · 2022-04-06T16:58:57Z


darinpp added 3 commits April 6, 2022 09:56

darinpp requested a review from a team April 6, 2022 16:58

darinpp requested review from a team as code owners April 6, 2022 16:58

darinpp requested review from samiskin and removed request for a team April 6, 2022 16:58

darinpp merged commit 2963625 into cockroachdb:release-22.1 Apr 6, 2022
