
[server, dashboard] Do basic rate limiting on startWorkspace #8073

Merged — 1 commit, Feb 7, 2022

Conversation

@geropl (Member) commented Feb 7, 2022

Description

This configures a rate limit for API calls to `startWorkspace` of 1 call per 10 seconds. The dashboard handles the resulting error and waits until `retryAfter` before re-trying the call (currently indefinitely).

Note that this is currently enforced per server instance, as the rate-limiter state is not shared between instances. But it's a first step that already reduces the impact of future incidents.
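The per-instance behaviour described above can be illustrated with a minimal fixed-window limiter sketch. This is not Gitpod's actual implementation (the server uses a dedicated rate-limiter library); the class and field names below are illustrative only, but they show why unshared state means each server instance counts requests independently:

```typescript
// Illustrative sketch of a per-instance fixed-window rate limiter.
// `points` and `durationSec` mirror the config shape discussed in this PR.

interface ConsumeResult {
    allowed: boolean;
    retryAfterSec: number; // seconds the caller should wait before retrying
}

class FixedWindowLimiter {
    // State lives in this process only: a second server instance
    // would keep its own Map and count the same user separately.
    private windows = new Map<string, { start: number; count: number }>();

    constructor(private points: number, private durationSec: number) {}

    consume(key: string, now = Date.now()): ConsumeResult {
        const windowMs = this.durationSec * 1000;
        const w = this.windows.get(key);
        if (!w || now - w.start >= windowMs) {
            // First request in a fresh window: start counting.
            this.windows.set(key, { start: now, count: 1 });
            return { allowed: true, retryAfterSec: 0 };
        }
        if (w.count < this.points) {
            w.count++;
            return { allowed: true, retryAfterSec: 0 };
        }
        // Budget exhausted: report how long until the window resets.
        const msBeforeNext = w.start + windowMs - now;
        return { allowed: false, retryAfterSec: Math.ceil(msBeforeNext / 1000) || 1 };
    }
}
```

With `points: 1, durationSec: 10` (matching this PR's `startWorkspace` config), a second call within the same 10-second window is rejected with a `retryAfterSec` covering the remainder of the window.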

Related Issue(s)

Context: #8043

How to test

Release Notes

configure basic rate-limiting for `startWorkspace`

Documentation

@geropl geropl requested a review from a team February 7, 2022 16:59
@github-actions github-actions bot added the team: webapp Issue belongs to the WebApp team label Feb 7, 2022
codecov bot commented Feb 7, 2022

Codecov Report

Merging #8073 (056ead0) into main (29c3a7d) will decrease coverage by 1.15%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #8073      +/-   ##
==========================================
- Coverage   12.01%   10.86%   -1.16%     
==========================================
  Files          20       18       -2     
  Lines        1190     1022     -168     
==========================================
- Hits          143      111      -32     
+ Misses       1043      909     -134     
+ Partials        4        2       -2     
Flag Coverage Δ
components-gitpod-cli-app 10.86% <ø> (ø)
components-local-app-app-darwin-amd64 ?
components-local-app-app-darwin-arm64 ?
components-local-app-app-linux-amd64 ?
components-local-app-app-linux-arm64 ?
components-local-app-app-windows-386 ?
components-local-app-app-windows-amd64 ?
components-local-app-app-windows-arm64 ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
components/local-app/pkg/auth/auth.go
components/local-app/pkg/auth/pkce.go

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 29c3a7d...056ead0.

@jankeromnes (Contributor) left a comment
Wow, so cool to see the rate limiter code actually working! 👀 (Last time we tried it with Team Plan methods, I think it didn't behave as expected at all and we had to revert it again.)

I guess this makes our "infinite-redirect on workspace fast crash" failure mode a little better (i.e. we make the self-DDoS 10x less severe), and the code looks good to me. 👍

I'm just curious if this can have any impact in normal situations.

  • For one, I guess it will impact the use case "have 5 stopped workspaces and restart them all by clicking all the tabs" (I guess it will just make their restart [0-4] * 10s slower, but maybe that's okay)
  • Other impacts can probably be tested on staging once merged. I expect no blockers, but if starting workspaces on staging becomes unreliable, we might have to revert again before deploying (that's what happened with our Team Plan rate-limiting attempt last time)

In any case, this looks good for merging & further testing on staging! 🚀
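The dashboard-side behaviour described in this PR — catch the TOO_MANY_REQUESTS error and wait `retryAfter` seconds before re-trying, currently without an attempt limit — can be sketched as follows. This is an illustrative helper, not the dashboard's actual code; the error shape mirrors the `{ method, retryAfter }` data thrown by the server change below:

```typescript
// Illustrative retry helper honoring the server's retryAfter hint.
type RateLimitErrorData = { code: "TOO_MANY_REQUESTS"; retryAfter: number };

function isRateLimitError(e: unknown): e is RateLimitErrorData {
    return typeof e === "object" && e !== null &&
        (e as RateLimitErrorData).code === "TOO_MANY_REQUESTS";
}

async function callWithRetry<T>(
    call: () => Promise<T>,
    // Injectable sleep so the behaviour is testable without real delays.
    sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
    // Retry indefinitely, as the dashboard currently does per this PR.
    while (true) {
        try {
            return await call();
        } catch (e) {
            if (!isRateLimitError(e)) throw e; // only rate-limit errors are retried
            await sleep(e.retryAfter * 1000);
        }
    }
}
```

Note that "indefinitely" means a permanently rate-limited call would loop forever; a production version would likely want a cap or backoff ceiling on top of this.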

Comment on lines +39 to +42
startWorkspace: {
    points: 1, // 1 workspace start per user per 10s
    durationsSec: 10
},

👍

@@ -356,7 +356,7 @@ class GitpodJsonRpcProxyFactory<T extends object> extends JsonRpcProxyFactory<T>
throw rlRejected;
}
log.warn({ userId }, "Rate limiter prevents accessing method due to too many requests.", rlRejected, { method });
-            throw new ResponseError(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { "Retry-After": String(Math.round(rlRejected.msBeforeNext / 1000)) || 1 });
+            throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });

Probably safer with a Math.ceil, but not a major issue (since the front-end retry code works well):

Suggested change
throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });
throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.ceil(rlRejected.msBeforeNext / 1000) || 1 });
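The reviewer's point can be checked with simple arithmetic: `Math.round` can under-report the remaining wait, so a client that obeys the hint retries while the window is still closed and gets rejected again, while `Math.ceil` always covers the full remainder:

```typescript
// Why Math.ceil is the safer rounding for a Retry-After hint.
// Example: 1400 ms actually remain in the current rate-limit window.
const msBeforeNext = 1400;

const roundedSec = Math.round(msBeforeNext / 1000) || 1; // 1 — tells the client to wait
                                                         // only 1s, ~400ms too little
const ceiledSec = Math.ceil(msBeforeNext / 1000) || 1;   // 2 — always >= the true wait

// The `|| 1` fallback in both variants guards against a 0-second hint
// when fewer than 500ms (round) or 0ms (ceil) remain.
const nearZero = Math.round(400 / 1000) || 1; // 1, not 0
```

So with `round`, any remainder between 0.5s and the next whole second is truncated and the first retry self-defeats; with `ceil` the retry lands just after the window reopens, which is why the suggestion is "probably safer" even though the dashboard's retry loop masks the difference.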

@roboquat roboquat merged commit d955ce1 into main Feb 7, 2022
@roboquat roboquat deleted the gpl/8043-redirect-loop branch February 7, 2022 17:54
@geropl (Member, Author) commented Feb 8, 2022

@jankeromnes Thx for the thorough review! 🙏

I'm just curious if this can have any impact in normal situations.

Indeed. This is first and foremost a safety measure to help with the next days. Next on the list:

  • see if we can get the "number of workspaces started by the user recently" from the DB, and maybe show a modal if it's too excessive
  • fix the underlying issue: requires a re-deploy of workspace cluster (ws-proxy)

Labels
release-note size/L team: webapp Issue belongs to the WebApp team
3 participants