Rate limit prebuilds #5176
Comments
/schedule |
Looking into the old solution for now: https://github.com/gitpod-io/gitpod/blob/main/components/server/ee/src/prebuilds/prebuild-queue-maintainer.ts I hope @csweichel or @geropl can find time to provide more details on this, please 🙏🏻
Talked to Sven and we decided to try solving it with extra storage alongside the servers. That would basically be a queue, and servers would pick up a prebuild event when they are ready to execute it. We assume that a proper technology that already solves queueing (such as Redis) would also allow us to manage prebuilds properly, i.e. enforce rate limits and implement throttling. |
I don't think this is actively worked on, and I also think this has to be specified and broken into smaller issues. Removing it from groundwork. |
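To make the pull-based idea above concrete, here is a minimal sketch in which servers take a prebuild event from a shared queue only when they have capacity. Everything in it is an assumption for illustration: `PrebuildEvent`, `PrebuildQueue`, `PrebuildServer`, and the concurrency cap are hypothetical names and values, and the in-memory array only stands in for an external store such as Redis.

```typescript
// Hypothetical sketch: an in-memory stand-in for an external prebuild queue.
// None of these names come from the Gitpod codebase.
interface PrebuildEvent {
  id: string;
  cloneUrl: string;
}

class PrebuildQueue {
  private readonly pending: PrebuildEvent[] = [];

  // Webhook handlers would enqueue instead of starting prebuilds directly.
  enqueue(event: PrebuildEvent): void {
    this.pending.push(event);
  }

  // Returns the oldest pending event, or undefined if the queue is empty.
  dequeue(): PrebuildEvent | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}

class PrebuildServer {
  private readonly queue: PrebuildQueue;
  private readonly maxConcurrent: number; // assumed per-server capacity
  private running = 0;
  readonly started: string[] = [];

  constructor(queue: PrebuildQueue, maxConcurrent: number) {
    this.queue = queue;
    this.maxConcurrent = maxConcurrent;
  }

  // Picks up one event if this server has capacity; returns true if it did.
  pollOnce(): boolean {
    if (this.running >= this.maxConcurrent) {
      return false;
    }
    const event = this.queue.dequeue();
    if (event === undefined) {
      return false;
    }
    this.running += 1;
    this.started.push(event.id);
    return true;
  }

  // Called when a prebuild finishes and frees a slot.
  finishOne(): void {
    if (this.running > 0) {
      this.running -= 1;
    }
  }
}
```

With this shape, a burst of events only lengthens the queue; the start rate is bounded by how often servers poll and by their capacity.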
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This just led to an incident: https://app.incident.io/incidents/101 /cc @jldec |
Scheduled this issue. |
Remove highest priority label - not 🔥. |
Happy to look into implementing a reasonable rate limit that doesn't break normal(+) usage but prevents incidents. |
I did some research (internal) on our top prebuild rates for February 2022:
Also, the "Dependabot" case is interesting:
All-in-all, I believe that an initial per-user rate limit at 50 prebuilds / minute makes sense (i.e. allow up to 50 prebuilds per minute, then reject any new prebuild requests beyond that until the next minute):
|
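The per-user limit proposed above (allow up to 50 prebuilds per minute, then reject until the next minute) corresponds to a fixed-window rate limiter. The following is only a sketch of that technique, not the actual Gitpod implementation: the class name, the in-memory `Map`, and keying by a user ID string are assumptions.

```typescript
// Hypothetical fixed-window rate limiter; names are illustrative only.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number = 50, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  tryAcquire(key: string, nowMs: number = Date.now()): boolean {
    // Align the window to fixed boundaries (e.g. the start of each minute).
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let entry = this.windows.get(key);
    if (entry === undefined || entry.windowStart !== windowStart) {
      // A new window has begun for this key: reset the counter.
      entry = { windowStart, count: 0 };
      this.windows.set(key, entry);
    }
    if (entry.count >= this.limit) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}
```

A caller would check `limiter.tryAcquire(userId)` before starting a prebuild and reject the request when it returns `false`. With several server instances, the counters would have to live in shared storage rather than in process memory.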
Hi @jankeromnes , would it be prohibitive to also include a check to see how many total prebuilds this user is running now, and not allow them to do more than 50? |
Hi @kylos101! I think this is an interesting, but somewhat more complicated proposal. For one, it's a bit harder to research how many prebuilds in progress all users have at any given time, in order to get the full picture of what normal vs abuse looks like (but I think it can be done). Also, long-building, high-frequency projects like GitLab might well fit under a 25/min rate, but have prebuilds running for 45min, thus potentially ramping up to 1000+ parallel prebuilds at certain times. Suddenly failing 950+ of these might have dramatic consequences (and, if we do fail these 950+ excess prebuilds, I think we need to be smart about picking which ones should get cancelled). All-in-all, I think that:
Just my 2 cents though, I may be wrong. 😊 Please feel free to open a feature request about limiting concurrent prebuilds per user, and I'd be happy to help with research and/or implementation. |
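The "1000+ parallel prebuilds" figure above follows from multiplying the start rate by the build duration (in steady state, concurrency is roughly arrival rate times time in the system). A small sketch of that arithmetic, with a hypothetical function name:

```typescript
// Steady-state estimate: concurrent prebuilds ≈ start rate × build duration.
// The function name is illustrative; the numbers come from the comment above.
function estimateConcurrentPrebuilds(startsPerMinute: number, durationMinutes: number): number {
  return startsPerMinute * durationMinutes;
}

// 25 prebuilds/min, each running for 45 min.
const concurrent = estimateConcurrentPrebuilds(25, 45);
console.log(concurrent); // 1125
```

So a project that stays well under a 50/min rate limit could still sit far above a cap of 50 concurrent prebuilds.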
Hey @jankeromnes 👋 ,
I thought that might be, but wanted to ask for the sake of posterity (my lens was the one user, two repos, and ~100 PRs). Thank you very much for your thoughtful response!
Totally agreed, this will 💯 help! 🙏 🌟
Those are excellent points! 🥇 I appreciate you sharing the related use cases / different ways to control the throttle. @jldec is there a related epic for |
We just discussed a possible approach: using the DB to limit the starting rate of prebuilds per repo. |
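One way the DB-based idea could look, purely as a sketch: before starting a prebuild, count how many prebuilds for the same repository were started within a recent window, and refuse if the count has reached a limit. The store below is an in-memory stand-in for a database table; `PrebuildRecord`, `InMemoryPrebuildStore`, `tryStartPrebuild`, and the limit and window values are all assumptions, not details from the discussion.

```typescript
// Hypothetical sketch of a DB-backed per-repository start limit.
// The in-memory store stands in for a prebuild table; names are illustrative.
interface PrebuildRecord {
  cloneUrl: string;
  startedAtMs: number;
}

class InMemoryPrebuildStore {
  private readonly rows: PrebuildRecord[] = [];

  // Equivalent to a COUNT query filtered by repository and start time.
  countStartedAfter(cloneUrl: string, afterMs: number): number {
    return this.rows.filter((r) => r.cloneUrl === cloneUrl && r.startedAtMs > afterMs).length;
  }

  insert(row: PrebuildRecord): void {
    this.rows.push(row);
  }
}

// Returns true and records the start if the repository is under its limit.
// In a real database, the count and the insert would need to run in one
// transaction (or use a lock) so concurrent servers cannot both pass the check.
function tryStartPrebuild(
  store: InMemoryPrebuildStore,
  cloneUrl: string,
  nowMs: number,
  maxPerWindow: number,
  windowMs: number,
): boolean {
  const recent = store.countStartedAfter(cloneUrl, nowMs - windowMs);
  if (recent >= maxPerWindow) {
    return false;
  }
  store.insert({ cloneUrl, startedAtMs: nowMs });
  return true;
}
```

Because the count is taken over persisted records, the limit holds across all server instances without any of them keeping state in memory.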
Bug description
Today at around 13:24 UTC, there was a sudden jump in the number of prebuilds: Over about a minute, the repository https://gitlab.com/yo/gitlab triggered nearly 100 new prebuilds.
Unfortunately, this put excessive pressure on the US cluster, and in particular on the database, which led to several workspaces not being able to start (for example: https://community.gitpod.io/t/django-pod-is-down/4541).
This incident lasted about 20 minutes and was logged here: https://www.gitpodstatus.com/incidents/ddyzghh3mfbm
Steps to reproduce
Instantly trigger 100 prebuilds
Expected behavior
There should be no outage, especially in the database
Example repository
No response
Anything else?
Maybe it would make sense to introduce a reasonable rate limit on prebuilds, for example per repository.