Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributor rate-limiter should call Wait() not Allow() #879

Closed
bboreham opened this issue Jul 12, 2018 · 1 comment
Closed

Distributor rate-limiter should call Wait() not Allow() #879

bboreham opened this issue Jul 12, 2018 · 1 comment

Comments

@bboreham
Copy link
Contributor

bboreham commented Jul 12, 2018

I think this will work better under very sudden bursts of input.

Under certain circumstances I have seen a scraping Prometheus (v1.8) expand to 1,000 shards, and then have all 1,000 shards upload at the same time, creating an input burst of x00,000 samples.

Our rate-limit defaults to 25,000 samples per second, per distributor.

The behaviour of Allow() under a large burst is to count up to 25,000 saying 'yes', then say 'no' to the rest.

The behaviour of Wait() is to say 'yes' to the first 25,000, then start a sleep for the next one, then the next one, and so on. If the Context has a timeout, let's say 10 seconds, then it figures out it can allow 250,000 samples to wait, and it starts saying 'no' immediately to subsequent callers.

Note we don't currently have a timeout on the input side of distributor, but we have one defaulted to 2 seconds downstream to ingesters.

@bboreham
Copy link
Contributor Author

bboreham commented Jan 6, 2021

In practice, holding all the pending requests in memory of distributors is a bad idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants