add metrics to scrubber #4398

clockfort · 2019-11-08T22:02:35Z

Add metrics to scrubbing, for the purposes of debugging a stack that may not be keeping up with its own scrub queue.

(Aside) I believe the eventual goal is to get rid of 90% of scrubbing and trust in targeted sweep, but in the interim this needs metrics.

changelog-app · 2019-11-08T22:02:40Z

Generate changelog in `changelog/@unreleased`

Type

Description

The scrubber has been instrumented with metrics.

jkozlowski · 2019-11-11T10:42:48Z

atlasdb-impl-shared/src/main/java/com/palantir/atlasdb/cleaner/Scrubber.java

@@ -172,6 +184,13 @@ private Scrubber(KeyValueService keyValueService,
        this.threadCount = threadCount;
        this.readThreadCount = readThreadCount;
        this.followers = followers;
+        this.metricsManager = metricsManager;
+
+        this.enqueuedCells = metricsManager.registerOrGetMeter(Scrubber.class, AtlasDbMetricNames.ENQUEUED_CELLS);


Can we make those metrics be lazily registered? I am only mildly familiar with the Scrubber, but it looks like it's only used for hard delete?

    private void scrubForAggressiveHardDelete(SnapshotTransaction tx) {
        if ((tx.getTransactionType() == TransactionType.AGGRESSIVE_HARD_DELETE) && !tx.isAborted()) {
            // t.getCellsToScrubImmediately() checks that t has been committed
            cleaner.scrubImmediately(this, tx.getCellsToScrubImmediately(), tx.getTimestamp(), tx.getCommitTimestamp());
        }
    }

    if (getTransactionType() == TransactionType.AGGRESSIVE_HARD_DELETE
            || getTransactionType() == TransactionType.HARD_DELETE) {
        cleaner.queueCellsForScrubbing(getCellsToQueueForScrubbing(), getStartTimestamp());
    }

So we wouldn't want to register all of these metrics unconditionally, as not all products do hard deletes.

Have a look at SweepOutcomeMetrics or LegacySweepMetrics.

We should add logic for this to MetricsManager/TaggedMetricRegistry, or port it upstream to tritium: some sort of proxy that wraps the metric and only instantiates/registers it when it's first used. If every additional metric has to do this itself, it becomes unwieldy.
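For illustration, the proxy idea could look roughly like the sketch below. Everything in it is a hypothetical stand-in (`SimpleMeter`, `SimpleRegistry`, `LazyMeter`, and the `"enqueuedCells"` name); it is not the MetricsManager, TaggedMetricRegistry, or tritium API, and not necessarily how this PR ended up doing it. It only shows the shape: the proxy holds a factory and calls into the registry the first time the meter is marked.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical stand-ins only; not the AtlasDB or tritium classes.
final class LazyMeterSketch {

    /** Minimal meter: counts marked events. */
    static final class SimpleMeter {
        private final LongAdder count = new LongAdder();

        void mark(long n) {
            count.add(n);
        }

        long getCount() {
            return count.sum();
        }
    }

    /** Minimal registry: a metric counts as registered once it is in the map. */
    static final class SimpleRegistry {
        private final Map<String, SimpleMeter> meters = new ConcurrentHashMap<>();

        SimpleMeter registerOrGetMeter(String name) {
            return meters.computeIfAbsent(name, ignored -> new SimpleMeter());
        }

        boolean isRegistered(String name) {
            return meters.containsKey(name);
        }
    }

    /** Proxy that defers registration until the first call to mark(). */
    static final class LazyMeter {
        private final Supplier<SimpleMeter> factory;
        private volatile SimpleMeter delegate;

        LazyMeter(Supplier<SimpleMeter> factory) {
            this.factory = factory;
        }

        void mark(long n) {
            get().mark(n);
        }

        // Double-checked locking so the factory runs at most once.
        private SimpleMeter get() {
            SimpleMeter result = delegate;
            if (result == null) {
                synchronized (this) {
                    result = delegate;
                    if (result == null) {
                        result = factory.get();
                        delegate = result;
                    }
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SimpleRegistry registry = new SimpleRegistry();
        LazyMeter enqueuedCells = new LazyMeter(() -> registry.registerOrGetMeter("enqueuedCells"));

        System.out.println(registry.isRegistered("enqueuedCells")); // false: nothing marked yet
        enqueuedCells.mark(3);
        System.out.println(registry.isRegistered("enqueuedCells")); // true: registered on first use
    }
}
```

With a wrapper like this living in the metrics layer, a class such as the Scrubber could keep plain fields for its meters and would not need its own laziness logic.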

This is a good call, as Atlas-level metrics cost $$ in many deployments and we're trying to keep that down.

Though as a heads up: this is confusingly named, but any services that use Cleanup Tasks (and thus Stream Stores) will invoke hard delete transactions even if no user-level hard deletes have taken place, and thus register the metrics. So the savings might be less than we expect as things like internal shopping or compute deployer products will still use them.

Internal product creates the stream store but does not use it in most deployments, so it wouldn't actually run any hard delete transactions? In that case we need the laziness here, and we also need to make sure the scrubber doesn't report, say, zeros for those metrics (and thus register them) when it had nothing to do.
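A minimal sketch of the "don't report zeros" concern, assuming a registry where a metric is created the first time it is touched. The class, the method names, and the `"scrubbedCells"` metric name are hypothetical and are not taken from the Scrubber code; the point is only that the empty case returns before touching the registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; names are illustrative, not from Scrubber.java.
final class SkipEmptyScrubSketch {
    // Stand-in registry: a metric exists once it has an entry in this map.
    private final Map<String, LongAdder> registry = new ConcurrentHashMap<>();

    /** Records a scrubbed batch, touching the registry only when there was real work. */
    void recordScrubbedCells(int cellCount) {
        if (cellCount <= 0) {
            return; // nothing scrubbed: do not create the metric just to report zero
        }
        registry.computeIfAbsent("scrubbedCells", ignored -> new LongAdder()).add(cellCount);
    }

    boolean isRegistered(String name) {
        return registry.containsKey(name);
    }

    long count(String name) {
        LongAdder adder = registry.get(name);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        SkipEmptyScrubSketch scrubber = new SkipEmptyScrubSketch();

        scrubber.recordScrubbedCells(0);
        System.out.println(scrubber.isRegistered("scrubbedCells")); // false: idle run registers nothing

        scrubber.recordScrubbedCells(7);
        System.out.println(scrubber.isRegistered("scrubbedCells")); // true
        System.out.println(scrubber.count("scrubbedCells"));        // 7
    }
}
```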

@felixdesouza +1 for tracking something to make this easier, don't think it needs to block this PR.

@jkozlowski Internal product uses the stream store for storing media, so almost all deployments would be using it.

I switched the new scrub metrics over to being lazily registered, though support for this in the upstream metrics library, transparent to the user, would be better and would surely result in additional cost savings for other less-used instrumented classes.

@clockfort haha, which internal product? The one I'm talking about only actually uses the stream store when installed on-prem, where we do not care about the cost of those metrics. Changes look good from my perspective; @jeremyk-91 can review the actual scrubber code. +1 for upstreaming something like this.

(@jkozlowski was probably referring to the fjord product, @clockfort the large internal product!)

jeremyk-91

I verified the marking of the meters is done at an appropriate time, and they aren't registered unless the relevant codepaths are hit.

svc-autorelease · 2019-11-20T18:39:59Z

Released 0.173.10

clockfort force-pushed the scrub_metrics branch 3 times, most recently from fe65c5c to bf5dfea November 8, 2019 22:39

add metrics to scrubber

70ac4be

clockfort force-pushed the scrub_metrics branch from bf5dfea to 70ac4be November 8, 2019 23:04

clockfort requested a review from jeremyk-91 November 8, 2019 23:24

jkozlowski reviewed Nov 11, 2019

lazy registration of scrub metrics for cost savings

34b29d4

clockfort assigned jeremyk-91 Nov 19, 2019

jeremyk-91 approved these changes Nov 20, 2019

jeremyk-91 added the autorelease label Nov 20, 2019

jeremyk-91 merged commit 970313d into develop Nov 20, 2019

delete-merged-branch bot deleted the scrub_metrics branch November 20, 2019 18:39

clockfort commented Nov 8, 2019

changelog-app bot commented Nov 8, 2019 (edited by clockfort)

jkozlowski Nov 11, 2019 (edited)

felixdesouza Nov 11, 2019 (edited)

jeremyk-91 Nov 11, 2019 (edited)

jkozlowski Nov 12, 2019 (edited)

jkozlowski Nov 12, 2019

clockfort Nov 12, 2019

jkozlowski Nov 13, 2019 (edited)

jeremyk-91 Nov 20, 2019

clockfort Nov 21, 2019 (edited)

jeremyk-91 left a comment

svc-autorelease commented Nov 20, 2019
