
[ML] Inference API Rate limiter #106330

Merged: 10 commits merged on Mar 19, 2024


jonathan-buttner
Contributor

This PR adds a RateLimiter class. It is currently unused but will be leveraged once the queuing and threading of the external services are refactored.

It implements the token bucket algorithm: https://en.wikipedia.org/wiki/Token_bucket
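As a rough illustration of the token bucket idea, here is a minimal Java sketch. It is not the PR's RateLimiter: the class name TokenBucket, its constructor, and tryAcquire() are hypothetical. The bucket holds at most capacity tokens, refills continuously at a fixed rate, and each request spends one token or is refused. It assumes an injectable java.time.Clock and a rate converted to tokens per microsecond, two choices that also come up in the review below.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

// Hypothetical token bucket for illustration; not the RateLimiter added by this PR.
final class TokenBucket {
    private final Clock clock;            // injectable so tests can control time
    private final double capacity;        // maximum number of tokens the bucket holds
    private final double tokensPerMicro;  // refill rate
    private double tokens;                // tokens currently available
    private Instant lastRefill;           // point up to which refill has been credited

    TokenBucket(double capacity, double tokensPerUnit, TimeUnit unit, Clock clock) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        long unitInMicros = unit.toMicros(1);
        if (tokensPerUnit <= 0 || unitInMicros == 0) {
            throw new IllegalArgumentException("rate must be positive and the unit at least 1 microsecond");
        }
        this.clock = clock;
        this.capacity = capacity;
        this.tokensPerMicro = tokensPerUnit / unitInMicros;
        this.tokens = capacity;           // start full so an initial burst need not wait
        this.lastRefill = Instant.now(clock);
    }

    // Credits tokens for whole microseconds elapsed since lastRefill, capped at capacity.
    private void refill() {
        Instant now = Instant.now(clock);
        long elapsedMicros = ChronoUnit.MICROS.between(lastRefill, now);
        if (elapsedMicros > 0) {
            tokens = Math.min(capacity, tokens + tokensPerMicro * elapsedMicros);
            // Advance by exactly the credited amount so sub-microsecond remainders carry over.
            lastRefill = lastRefill.plus(elapsedMicros, ChronoUnit.MICROS);
        }
    }

    // Spends one token if available; returns false instead of blocking.
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Refusing instead of blocking keeps the sketch short; a limiter that sits in front of a request queue might instead compute how long to wait for the next token.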

@jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML (Meta label for the ML team) and v8.14.0 labels on Mar 13, 2024
@jonathan-buttner marked this pull request as ready for review on March 13, 2024 at 20:20
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

throw new IllegalArgumentException("Accumulated tokens limit must be greater than or equal to 0");
}

if (newAccumulatedTokensLimit == Double.POSITIVE_INFINITY) {
Member

Suggested change
if (newAccumulatedTokensLimit == Double.POSITIVE_INFINITY) {
if (Double.isInfinite(newAccumulatedTokensLimit)) {
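The suggested Double.isInfinite check is true for both infinities, so it is only equivalent to the == POSITIVE_INFINITY comparison because negative limits are rejected first. A small hypothetical sketch of that validate-then-check order (LimitChecks and its methods are illustrative names, not the PR's):

```java
// Hypothetical helpers showing why Double.isInfinite is safe after the range check.
final class LimitChecks {
    static void validateLimit(double limit) {
        // Rejects negative values, including Double.NEGATIVE_INFINITY.
        if (limit < 0) {
            throw new IllegalArgumentException("Accumulated tokens limit must be greater than or equal to 0");
        }
    }

    // Double.isInfinite(x) is true for both +Infinity and -Infinity; after
    // validateLimit only +Infinity can reach this check.
    static boolean isUnbounded(double limit) {
        validateLimit(limit);
        return Double.isInfinite(limit);
    }
}
```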

return Double.POSITIVE_INFINITY;
}

return Double.NEGATIVE_INFINITY;
Member

In the accumulateTokens code:

            var newTokens = tokensPerNanos * elapsedTimeNanos;
            accumulatedTokens = Math.min(accumulatedTokensLimit, newTokens);

Math.min(a_positive_number, Double.NEGATIVE_INFINITY) returns Double.NEGATIVE_INFINITY. If accumulatedTokens becomes a negative number, I think that could cause errors.

One option is to return 0 (not positive or negative infinity). Using ChronoUnit.MICROS or ChronoUnit.MILLIS reduces the chance of an arithmetic overflow.

Contributor Author

Good point. I'll switch it to 0 and use micros instead.
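The failure mode and the proposed fix can be reproduced in a few lines. This is a hypothetical sketch (ElapsedTime and microsBetweenOrZero are illustrative names, not the merged code) that assumes, as the thread suggests, that an overflow should yield 0 rather than an infinity. ChronoUnit.between throws ArithmeticException when the result does not fit in a long.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical overflow-safe elapsed-time helper; not the code merged in this PR.
final class ElapsedTime {
    static long microsBetweenOrZero(Instant start, Instant end) {
        try {
            // Throws ArithmeticException if the result does not fit in a long.
            return ChronoUnit.MICROS.between(start, end);
        } catch (ArithmeticException e) {
            // Returning 0 (not an infinity) means the Math.min(limit, newTokens)
            // step can never produce a negative token count from an overflow.
            return 0;
        }
    }
}
```

For example, Math.min(10.0, Double.NEGATIVE_INFINITY) evaluates to -Infinity, while ElapsedTime.microsBetweenOrZero(Instant.MIN, Instant.EPOCH) returns 0 because that span (about 3.2e16 seconds) overflows a long even in microseconds. Switching units widens the safe range: a long holds roughly 292 years of nanoseconds but roughly 292,000 years of microseconds.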


private static double nanosBetweenExact(Instant start, Instant end) {
try {
return ChronoUnit.NANOS.between(start, end);
Member

TemporalUnit.between() returns a long, not a double.
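Because Java widens long to double implicitly, a helper declared to return double still compiles; the review point is that the declared type should match what between() actually returns. A hypothetical sketch (BetweenReturnsLong and its methods are illustrative names) of the long-returning form, plus a check showing where the widening stops being exact:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; names are not from the PR.
final class BetweenReturnsLong {
    // Declared as long to match ChronoUnit.between; callers can widen to double
    // explicitly where they multiply by a fractional rate.
    static long nanosBetween(Instant start, Instant end) {
        return ChronoUnit.NANOS.between(start, end);
    }

    // Widening long -> double is implicit but inexact above 2^53,
    // which for nanoseconds is only about 104 days.
    static boolean roundTripsExactly(long value) {
        return (long) (double) value == value;
    }
}
```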

private void accumulateTokens() {
var now = Instant.now(clock);
if (now.isAfter(nextTokenAvailability)) {
var elapsedTimeNanos = nanosBetweenExact(nextTokenAvailability, now);
Member

This is called in the constructor via setRate(), at which point nextTokenAvailability == Instant.MIN. Because the calculated elapsedTimeNanos is very large, the class will be initialised with accumulatedTokens == accumulatedTokensLimit.

That seems reasonable to me, or at least as good as initialising accumulatedTokens to 0. I just want to check that this is the intention.

Contributor Author

@jonathan-buttner, Mar 18, 2024


Yeah, that was intentional. My thinking was that the first request can proceed without having to wait for tokens to accumulate, as long as the limit is set to a positive number. If we always want it to start at 0, that's fine with me too.
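The two initialization choices can be compared with a little arithmetic. The sketch below is hypothetical (InitialTokens and its methods are illustrative names) and assumes a request needs one token and that tokens refill at a constant tokensPerMicros.

```java
// Hypothetical comparison of starting the bucket full versus empty.
final class InitialTokens {
    // Starting full lets the first request proceed immediately if the limit is at
    // least one token; starting empty makes the first request wait for a refill.
    static double initialTokens(double accumulatedTokensLimit, boolean startFull) {
        return startFull ? accumulatedTokensLimit : 0.0;
    }

    // Microseconds until one token is available, given the current balance and rate.
    static double microsUntilOneToken(double tokens, double tokensPerMicros) {
        return tokens >= 1 ? 0.0 : (1 - tokens) / tokensPerMicros;
    }
}
```

With a limit of 5 tokens and a rate of one token per second (1e-6 tokens per microsecond), starting full means no wait, while starting empty means waiting about 1,000,000 microseconds for the first token.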

@jonathan-buttner
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@jonathan-buttner
Contributor Author

@elasticmachine merge upstream

private void accumulateTokens() {
var now = Instant.now(clock);
if (now.isAfter(nextTokenAvailability)) {
var elapsedTimeNanos = microsBetweenExact(nextTokenAvailability, now);
Member

Suggested change
var elapsedTimeNanos = microsBetweenExact(nextTokenAvailability, now);
var elapsedTimeMicros = microsBetweenExact(nextTokenAvailability, now);

Contributor Author

Ugh, sorry for all the nanos I missed in the find-and-replace 🤦‍♂️

var now = Instant.now(clock);
if (now.isAfter(nextTokenAvailability)) {
var elapsedTimeNanos = microsBetweenExact(nextTokenAvailability, now);
var newTokens = tokensPerMicros * elapsedTimeNanos;
Member

Suggested change
var newTokens = tokensPerMicros * elapsedTimeNanos;
var newTokens = tokensPerMicros * elapsedTimeMicros;


accumulatedTokensLimit = newAccumulatedTokensLimit;

var unitsInNanos = newUnit.toMicros(1);
Member

Suggested change
var unitsInNanos = newUnit.toMicros(1);
var unitsInMicros = newUnit.toMicros(1);

accumulatedTokensLimit = newAccumulatedTokensLimit;

var unitsInNanos = newUnit.toMicros(1);
tokensPerMicros = newTokensPerTimeUnit / unitsInNanos;
Member

Suggested change
tokensPerMicros = newTokensPerTimeUnit / unitsInNanos;
tokensPerMicros = newTokensPerTimeUnit / unitsInMicros;
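The renamed variables describe a unit conversion: a rate given per newUnit is divided by the number of microseconds in one newUnit. Assuming newUnit is a java.util.concurrent.TimeUnit (which provides toMicros(long)), a hypothetical standalone version (RateConversion is an illustrative name) looks like this. For example, 60 tokens per minute becomes 60 / 60,000,000 = 1e-6 tokens per microsecond.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical standalone version of the conversion; assumes newUnit is a TimeUnit.
final class RateConversion {
    static double tokensPerMicros(double newTokensPerTimeUnit, TimeUnit newUnit) {
        long unitsInMicros = newUnit.toMicros(1); // e.g. MINUTES -> 60_000_000
        if (unitsInMicros == 0) {
            // NANOSECONDS.toMicros(1) truncates to 0; dividing would give Infinity.
            throw new IllegalArgumentException("time unit must be at least one microsecond");
        }
        return newTokensPerTimeUnit / unitsInMicros;
    }
}
```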

if (now.isAfter(nextTokenAvailability)) {
var elapsedTimeNanos = microsBetweenExact(nextTokenAvailability, now);
var newTokens = tokensPerMicros * elapsedTimeNanos;
accumulatedTokens = Math.min(accumulatedTokensLimit, newTokens);
Member

Should this include the previously accumulated tokens?

Suggested change
accumulatedTokens = Math.min(accumulatedTokensLimit, newTokens);
accumulatedTokens = Math.min(accumulatedTokensLimit, accumulatedTokens + newTokens);

Contributor Author

Yep thanks for that.
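The effect of the suggested change is easy to see with concrete numbers. In this hypothetical before/after sketch (Refill and its methods are illustrative names), the original line discards whatever balance was left, so a bucket holding 4 of 10 tokens that earns 3 more ends up with 3 instead of 7.

```java
// Hypothetical before/after comparison of the refill step, using plain numbers.
final class Refill {
    // Original line: ignores the current balance, so unused tokens are lost.
    static double refillWithoutCarryOver(double limit, double current, double newTokens) {
        return Math.min(limit, newTokens);
    }

    // Suggested line: adds newly earned tokens to the balance, capped at the limit.
    static double refillWithCarryOver(double limit, double current, double newTokens) {
        return Math.min(limit, current + newTokens);
    }
}
```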

@jonathan-buttner
Contributor Author

@elasticmachine merge upstream

@jonathan-buttner
Contributor Author

@elasticmachine run elasticsearch-ci/part-3

Member

@davidkyle left a comment


LGTM

@jonathan-buttner
Contributor Author

@elasticmachine merge upstream

@jonathan-buttner
Contributor Author

@elasticmachine merge upstream

@jonathan-buttner merged commit edbff94 into elastic:main on Mar 19, 2024
14 checks passed
@jonathan-buttner deleted the ml-token-bucket branch on March 19, 2024 at 21:44
@lkts mentioned this pull request on Mar 21, 2024
@maxhniebergall
Contributor

This PR might be relevant to this issue #106877

5 participants