-
Notifications
You must be signed in to change notification settings - Fork 15
Conversation
Generate changelog in
|
fe65c5c
to
bf5dfea
Compare
bf5dfea
to
70ac4be
Compare
@@ -172,6 +184,13 @@ private Scrubber(KeyValueService keyValueService, | |||
this.threadCount = threadCount; | |||
this.readThreadCount = readThreadCount; | |||
this.followers = followers; | |||
this.metricsManager = metricsManager; | |||
|
|||
this.enqueuedCells = metricsManager.registerOrGetMeter(Scrubber.class, AtlasDbMetricNames.ENQUEUED_CELLS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make those metrics be lazily registered? I am only mildly familiar with the Scrubber, but it looks like it's only used for hard delete?
private void scrubForAggressiveHardDelete(SnapshotTransaction tx) {
if ((tx.getTransactionType() == TransactionType.AGGRESSIVE_HARD_DELETE) && !tx.isAborted()) {
// t.getCellsToScrubImmediately() checks that t has been committed
cleaner.scrubImmediately(this,
tx.getCellsToScrubImmediately(),
tx.getTimestamp(),
tx.getCommitTimestamp());
}
}
if (getTransactionType() == TransactionType.AGGRESSIVE_HARD_DELETE
|| getTransactionType() == TransactionType.HARD_DELETE) {
cleaner.queueCellsForScrubbing(getCellsToQueueForScrubbing(), getStartTimestamp());
}
So we wouldn't want to add all of those always, as not all products do hard delete.
Have a look at SweepOutcomeMetrics or LegacySweepMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add logic to this in MetricsManager
/TaggedMetricRegistry
or port it upstream to tritium
. Some sort of proxy which wraps around the metric and only instantiates/registers it when it's first used. If any additional metrics have to do this themselves, it becomes unwieldy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good call, as Atlas-level metrics cost $$ in many deployments and we're trying to keep that down.
Though as a heads up: this is confusingly named, but any services that use Cleanup Tasks (and thus Stream Stores) will invoke hard delete transactions even if no user-level hard deletes have taken place, and thus register the metrics. So the savings might be less than we expect as things like internal shopping or compute deployer products will still use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal product creates the stream store, but does not use it in most deployments, so that wouldn't actually run any hard delete transactions? In which case we need the laziness here and also making sure the scrubber doesn't put, say, zeros for those metrics (and thus register them) when it had nothing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixdesouza +1 for tracking something to make this easier, don't think it needs to block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkozlowski Internal product uses the stream store for storing media, most all deployments would be using it.
I switched the new scrub metrics over to being lazily registered, though better upstream metrics library support transparent to the user would be better and surely result in additional cost savings for other less-used instrumented classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clockfort haha which internal product? the one I'm talking about only actually uses the stream store when installed on-prem, and the cost of those metrics we do not care about. Changes look good from my perspective, @jeremyk-91 can review the actual scrubber code. +1 for upstreaming something like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@jkozlowski was probably referring to the fjord product, @clockfort the large internal product!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified the marking of the meters is done at an appropriate time, and they aren't registered unless the relevant codepaths are hit.
Released 0.173.10 |
Add metrics to scrubbing, for the purposes of debugging a stack that may not be keeping up with its own scrub queue.
(Aside) I believe the eventual goal is to get rid of 90% of scrubbing and trust in targeted sweep, but in the interim this needs metrics.