Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1676712: Reduce throttle backoff time to 1s and fixes it for python and swift #1315

Closed
wants to merge 1 commit into from

Conversation

fviard
Copy link

@fviard fviard commented Nov 11, 2020

Fixes regression introduced by PR #1240 :
Python and Swift "sleep" functions are expecting "seconds" and not ms.

…on and swift

Fixes regression introduced by PR mozilla#1240 :
Python and Swift "sleep" functions are expecting "seconds" and not ms.
@auto-assign auto-assign bot requested a review from badboy November 11, 2020 20:38
Copy link
Contributor

@brizental brizental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @fviard , thanks for finding and fixing this issue. The intended throttle backoff for is 60s though. Glean's internal throttle time is 60s, which means the binding will still need to wait 60s to get next pings even if they sleep for less than that. Would you mind changing it?

I understand this probably does not solve the issue you describe on the bug about the tests hanging for too long. They will still hang for at least 60s. We have a bug open to allow for customizing the throttling interval, which would allow for you to disable throttling completely for tests. See Bug 1647630.

@fviard
Copy link
Author

fviard commented Nov 12, 2020

Hi @brizental,
The issue with your proposed change is that, whatever happens you will have to wait these 60s because of the uninterruptible sleep.
And so, what I mean, is if the process was to be interrupted with a DONE.
There is no reason to wait 60s to process that, no?
So, is it not better to just let this loop run with 1s sleep? even if it waits 60s in total?
As anyway, the glean-core rate limiter will just say WAIT each time, it is not like really sending the upload request, here it should probably cost around nothing to ask for the task and get the WAIT tag.

But at least, you can react immediately when the time is arrived.

Btw, If you think about it, if your rate limiter is 60s and you plan to sleep for 60s when you receive a WAIT, then:

  • let's suppose you receive the WAIT at 5s before the end of the throttle interval period from the core, you will then wait 55s for nothing.

We have a bug open to allow for customizing the throttling interval, which would allow for you to disable throttling completely for tests
Take care with that, because it is probably the reason why you did not notice the current issue in the first place, as the throttling was specifically set to 1s just for the tests.

@brizental
Copy link
Contributor

brizental commented Nov 13, 2020

And so, what I mean, is if the process was to be interrupted with a DONE.

Yes, you are right about this. But glean-core only throttles in case there are next tasks, if there are none it will return DONE.

So, is it not better to just let this loop run with 1s sleep? even if it waits 60s in total?

If the binding asks for a new task and gets a WAIT response 3 times in a row, it will then get a DONE response. So, if we go ahead with your suggestion of the binding asking every 1s, we make obsolete the rate limiting applied by glean-core, because anytime the limit is reached the binding will ask three times, get a DONE response and shut down the upload worker.

let's suppose you receive the WAIT at 5s before the end of the throttle interval period from the core, you will then wait 55s for nothing.

Regarding this issue, we have also opened a bug to implement your suggestion of a sliding window algorithm, where glean-core tells the binding to wait exactly the amount of time it should until the next rate limit interval instead of defaulting to 60s. See Bug 1676853.

Take care with that, because it is probably the reason why you did not notice the current issue in the first place, as the throttling was specifically set to 1s just for the tests.

This feature has not been implemented yet, so it couldn't have been the issue we didn't notice. I am guessing we didn't notice just because we do not reach the rate limit on our unit tests.

@fviard
Copy link
Author

fviard commented Nov 17, 2020

Hi, sorry for the delay of my reply.
Indeed, if you want 60, it is ok for 60 so good that you merged it.

Regarding the sliding window, it is good, but please don't forget that one of the issue of the sleep is also to have the process interruptible at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants