
[server, dashboard] Do basic rate limiting on startWorkspace #8073

Merged — 1 commit, Feb 7, 2022

Conversation

@geropl (Member) commented Feb 7, 2022

Description

This configures a rate limit for API calls to `startWorkspace` of 1 call per 10 seconds. The dashboard handles the resulting error and waits until `retryAfter` before re-trying the call (currently indefinitely).

Note that this is currently enforced per server instance, as the rate-limiter state is not shared between instances. But it's a first step that already reduces the impact of future incidents.
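The per-instance behaviour described above can be illustrated with a minimal fixed-window limiter sketch. This is not Gitpod's actual implementation (the server uses a dedicated rate-limiter library); the class and field names below are illustrative only, but they show why unshared state means each server instance counts requests independently:

```typescript
// Illustrative sketch of a per-instance fixed-window rate limiter.
// `points` and `durationSec` mirror the config shape discussed in this PR.

interface ConsumeResult {
    allowed: boolean;
    retryAfterSec: number; // seconds the caller should wait before retrying
}

class FixedWindowLimiter {
    // State lives in this process only: a second server instance
    // would keep its own Map and count the same user separately.
    private windows = new Map<string, { start: number; count: number }>();

    constructor(private points: number, private durationSec: number) {}

    consume(key: string, now = Date.now()): ConsumeResult {
        const windowMs = this.durationSec * 1000;
        const w = this.windows.get(key);
        if (!w || now - w.start >= windowMs) {
            // First request in a fresh window: start counting.
            this.windows.set(key, { start: now, count: 1 });
            return { allowed: true, retryAfterSec: 0 };
        }
        if (w.count < this.points) {
            w.count++;
            return { allowed: true, retryAfterSec: 0 };
        }
        // Budget exhausted: report how long until the window resets.
        const msBeforeNext = w.start + windowMs - now;
        return { allowed: false, retryAfterSec: Math.ceil(msBeforeNext / 1000) || 1 };
    }
}
```

With `points: 1, durationSec: 10` (matching this PR's `startWorkspace` config), a second call within the same 10-second window is rejected with a `retryAfterSec` covering the remainder of the window.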

Related Issue(s)

Context: #8043

How to test

Release Notes

configure basic rate-limiting for `startWorkspace`

Documentation

@geropl geropl requested a review from a team February 7, 2022 16:59
@github-actions github-actions bot added the team: webapp Issue belongs to the WebApp team label Feb 7, 2022
codecov bot commented Feb 7, 2022

Codecov Report

Merging #8073 (056ead0) into main (29c3a7d) will decrease coverage by 1.15%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #8073      +/-   ##
==========================================
- Coverage   12.01%   10.86%   -1.16%     
==========================================
  Files          20       18       -2     
  Lines        1190     1022     -168     
==========================================
- Hits          143      111      -32     
+ Misses       1043      909     -134     
+ Partials        4        2       -2     
Flag Coverage Δ
components-gitpod-cli-app 10.86% <ø> (ø)
components-local-app-app-darwin-amd64 ?
components-local-app-app-darwin-arm64 ?
components-local-app-app-linux-amd64 ?
components-local-app-app-linux-arm64 ?
components-local-app-app-windows-386 ?
components-local-app-app-windows-amd64 ?
components-local-app-app-windows-arm64 ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
components/local-app/pkg/auth/auth.go
components/local-app/pkg/auth/pkce.go

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 29c3a7d...056ead0.

@jankeromnes (Contributor) left a comment
Wow, so cool to see the rate limiter code actually working! 👀 (Last time we tried it with Team Plan methods, I think it didn't behave as expected at all and we had to revert it again.)

I guess this makes our "infinite-redirect on workspace fast crash" failure mode a little better (i.e. we make the self-DDoS 10x less severe), and the code looks good to me. 👍

I'm just curious if this can have any impact in normal situations.

  • For one, I guess it will impact the use case "have 5 stopped workspaces and restart them all by clicking all the tabs" (I guess it will just make their restart [0-4] * 10s slower, but maybe that's okay)
  • Other impacts can probably be tested on staging once merged. I expect no blockers, but if starting workspaces on staging becomes unreliable, we might have to revert again before deploying (that's what happened with our Team Plan rate-limiting attempt last time)

In any case, this looks good for merging & further testing on staging! 🚀
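The dashboard-side behaviour described in this PR — catch the TOO_MANY_REQUESTS error and wait `retryAfter` seconds before re-trying, currently without an attempt limit — can be sketched as follows. This is an illustrative helper, not the dashboard's actual code; the error shape mirrors the `{ method, retryAfter }` data thrown by the server change below:

```typescript
// Illustrative retry helper honoring the server's retryAfter hint.
type RateLimitErrorData = { code: "TOO_MANY_REQUESTS"; retryAfter: number };

function isRateLimitError(e: unknown): e is RateLimitErrorData {
    return typeof e === "object" && e !== null &&
        (e as RateLimitErrorData).code === "TOO_MANY_REQUESTS";
}

async function callWithRetry<T>(
    call: () => Promise<T>,
    // Injectable sleep so the behaviour is testable without real delays.
    sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
    // Retry indefinitely, as the dashboard currently does per this PR.
    while (true) {
        try {
            return await call();
        } catch (e) {
            if (!isRateLimitError(e)) throw e; // only rate-limit errors are retried
            await sleep(e.retryAfter * 1000);
        }
    }
}
```

Note that "indefinitely" means a permanently rate-limited call would loop forever; a production version would likely want a cap or backoff ceiling on top of this.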

Comment on lines +39 to +42
startWorkspace: {
    points: 1, // 1 workspace start per user per 10s
    durationsSec: 10
},

👍

@@ -356,7 +356,7 @@ class GitpodJsonRpcProxyFactory<T extends object> extends JsonRpcProxyFactory<T>
throw rlRejected;
}
log.warn({ userId }, "Rate limiter prevents accessing method due to too many requests.", rlRejected, { method });
-            throw new ResponseError(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { "Retry-After": String(Math.round(rlRejected.msBeforeNext / 1000)) || 1 });
+            throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });

Probably safer with a Math.ceil, but not a major issue (since the front-end retry code works well):

Suggested change
throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });
throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.ceil(rlRejected.msBeforeNext / 1000) || 1 });
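The reviewer's point can be checked with simple arithmetic: `Math.round` can under-report the remaining wait, so a client that obeys the hint retries while the window is still closed and gets rejected again, while `Math.ceil` always covers the full remainder:

```typescript
// Why Math.ceil is the safer rounding for a Retry-After hint.
// Example: 1400 ms actually remain in the current rate-limit window.
const msBeforeNext = 1400;

const roundedSec = Math.round(msBeforeNext / 1000) || 1; // 1 — tells the client to wait
                                                         // only 1s, ~400ms too little
const ceiledSec = Math.ceil(msBeforeNext / 1000) || 1;   // 2 — always >= the true wait

// The `|| 1` fallback in both variants guards against a 0-second hint
// when fewer than 500ms (round) or 0ms (ceil) remain.
const nearZero = Math.round(400 / 1000) || 1; // 1, not 0
```

So with `round`, any remainder between 0.5s and the next whole second is truncated and the first retry self-defeats; with `ceil` the retry lands just after the window reopens, which is why the suggestion is "probably safer" even though the dashboard's retry loop masks the difference.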

@roboquat roboquat merged commit d955ce1 into main Feb 7, 2022
@roboquat roboquat deleted the gpl/8043-redirect-loop branch February 7, 2022 17:54
@geropl (Member, Author) commented Feb 8, 2022

@jankeromnes Thx for the thorough review! 🙏

I'm just curious if this can have any impact in normal situations.

Indeed. This is first and foremost a safety measure to help with the next days. Next on the list:

  • see if we can get the "number of workspaces started by the user recently" from the DB, and maybe show a modal if it's too excessive
  • fix the underlying issue: requires a re-deploy of workspace cluster (ws-proxy)

Labels
release-note size/L team: webapp Issue belongs to the WebApp team
3 participants