Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add weighted size counter to caffeine instrumentation #1246

Open
jhominal opened this issue Jan 3, 2025 · 10 comments
Open

Add weighted size counter to caffeine instrumentation #1246

jhominal opened this issue Jan 3, 2025 · 10 comments

Comments

@jhominal
Copy link
Contributor

jhominal commented Jan 3, 2025

Hello,

I am using caffeine in a context where I am using a Weigher to bound the cache instance memory consumption.

In that context, I find that the weighted size (which can be found on the EvictionPolicy object in caffeine) is an interesting statistic to track.

I would be happy to author a PR to add that stat to caffeine instrumentation. (guava cache does not seem to expose an API that would make that statistic available)

@zeitlinger
Copy link
Member

Great - PR would be welcome 😄

@jhominal
Copy link
Contributor Author

jhominal commented Jan 7, 2025

@zeitlinger While looking at the details of the metrics collected in caffeine-instrumentation, I have come to believe that caffeine_cache_eviction_weight should be a Counter instead of a Gauge (Weigher.weigh is annotated as returning a non-negative value.)

Would you approve a change of metric type from gauge to counter?
What provisions, if any, should be made for backward compatibility?

@zeitlinger
Copy link
Member

@zeitlinger While looking at the details of the metrics collected in caffeine-instrumentation, I have come to believe that caffeine_cache_eviction_weight should be a Counter instead of a Gauge (Weigher.weigh is annotated as returning a non-negative value.)

Would you approve a change of metric type from gauge to counter? What provisions, if any, should be made for backward compatibility?

Gauge seems to be correct, because the method returns the current accumulated value rather than a difference

 /**
   * Returns the sum of weights of evicted entries. This total does not include manual
   * {@linkplain Cache#invalidate invalidations}.
   *
   * @return the sum of weights of evicted entities
   */

@jhominal
Copy link
Contributor Author

jhominal commented Jan 7, 2025

@zeitlinger Semantically the eviction weight is also described in the CacheStats class doc string

/**
 * Statistics about the performance of a {@link Cache}.
 * <p>
 * Cache statistics are incremented according to the following rules:
 * <ul>
 *   <li>When an entry is evicted from the cache, {@code evictionCount} is incremented and the
 *       weight added to {@code evictionWeight}.
 * </ul>

That is, each time there is an eviction:

  • evictionCount is incremented by 1;
  • evictionWeight is incremented by the evicted entry's weight. By contract (on the Weigher interface), the weight of an entry must be non-negative.

Because of that, I do believe that evictionWeight is a monotically increasing counter.

In other words, if eviction is a counter (which it is, even though it returns an accumulated count of evicted entries), then eviction_weight should, in my opinion, also be a counter.

@zeitlinger
Copy link
Member

You're right 😄

@fstab do you think we can make a breaking change in a minor release?

@jhominal
Copy link
Contributor Author

jhominal commented Jan 7, 2025

In case you are not comfortable with making a breaking change in a minor release (which is quite understandable!), I am willing to make any changes necessary to ease such a migration:

  • Adding an optional constructor parameter to opt-in into the change from gauge to counter;
  • documenting the change on the library documentation (in addition to making a more explicit backward compatibility policy regarding instrumentation libraries, as described on the PR);
  • Adding deprecation annotations/warnings on the current, parameter-less constructor;
  • Anything else that you may think of;

The main point that I am unsure of, is how we will be reminded to make the breaking change, when the next major version will be planned. I guess a deprecated annotation with a comment mentioning that it will be removed in the next major would do the trick

@dhoard
Copy link
Collaborator

dhoard commented Jan 7, 2025

I feel we shouldn't make a breaking change in minor release. This breaks the contract of what should be contained in a minor
release.

@jhominal
Copy link
Contributor Author

jhominal commented Jan 7, 2025

@zeitlinger @dhoard @fstab: As mentioned in the corresponding PR, I consider that there are two changes that may break users of the caffeine-instrumentation library:

  • Changing the caffeine_cache_eviction_weight metric from a gauge to a counter;
  • Adding the caffeine_cache_weighted_size as a new metric;

Here is my proposal for handling these changes:

  • Implement a Builder pattern on caffeine-instrumentation CacheMetricsCollector ;
  • The builder will add boolean options in order to control, both adding the new caffeine_cache_weighted_size metric, and the change of caffeine_cache_eviction_weight to a gauge;
  • When using the Builder pattern, by default the two changes will be applied;
  • The current constructor will be marked as deprecated and will lock its users into the legacy behavior;

@ben-manes
Copy link

here is some historical context fwiw, in case it is somehow helpful.

The immutable CacheStats consists only of saturating monotonically increasing statistics, per its history in Guava's cache. This was the approach of Google's monitoring system (Borgmon), which Prometheus was inspired by, as a best practice on time series based monitoring systems. That allows for the data system to perform the differences and rate computations, such as for alerting. Those who wanted client-side computed rates for non-monotonic reporting could instead use the plus/minus of snapshots to diff it themselves.

The eviction weight was added in Caffeine by request of Apache Druid to optimize their deployments of their local / remote caches. The developer nicely summarized why the metric would be useful for his tuning.

@jhominal
Copy link
Contributor Author

Thank you for your feedback.

I have implemented the Builder pattern that was higher in this issue.

The corresponding PR #1251 is, in my view, ready and waiting for a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants