helm: add rate limit env variable to launch workflow #619

VMois · 2022-03-09T07:19:53Z

codecov-commenter · 2022-03-09T07:38:01Z

Codecov Report

Merging #619 (a9e58d4) into master (5e71e74) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #619   +/-   ##
=======================================
  Coverage   19.34%   19.34%           
=======================================
  Files          26       26           
  Lines        2078     2078           
=======================================
  Hits          402      402           
  Misses       1676     1676

audrium · 2022-03-11T12:13:58Z

helm/reana/README.md

@@ -20,6 +20,7 @@ This Helm automatically prefixes all names using the release name to avoid colli
 | `components.reana_server.environment.REANA_WORKFLOW_SCHEDULING_POLICY` | Define workflow scheduling strategy. Options are "fifo" for first-in-first-out strategy regardless of users and "balanced" for multi-user-aware scheduling strategy. | "fifo" |
 | `components.reana_server.environment.REANA_RATELIMIT_GUEST_USER` | Set API limiter config for guest users. Users using reana-client will be treated as guests. | "20 per second" |
 | `components.reana_server.environment.REANA_RATELIMIT_AUTHENTICATED_USER` | Set API limiter config for authenticated web UI users. | "20 per second" |
+| `components.reana_server.environment.REANA_RATELIMIT_LAUNCH_WORKFLOW` | Set API limiter config for launch workflow endpoint. | "1/30 second" |


Tested the PR set; it works nicely. Just a couple of questions:

1. Perhaps it's worth going for a more generic "fast" and "slow" endpoint solution, as described in the issue? Creating a separate env variable only for the launch endpoint seems too specific. WDYT? cc @tiborsimko

2. The "1/30 second" rate limit seems too strict to me. Perhaps we should relax it a bit, to something like "1/5 second"?

3. I haven't looked deeply, but would it be possible to return more user-friendly error messages when the rate limit is exceeded? For example, if a user hits Launch on the REANA web page twice within 30 seconds, this is what will be displayed:

"1/5 second" means "1 per 5 seconds" which is stricter than "1/30 second" which means "1 per 30 seconds". We can make it "1/15 second" which will mean "1 per 15 seconds".

Good point. There are two ways to make it nicer: change the UI or change the backend. On the backend side, I think it is possible with the limiter and flask-limiter libraries. But we rely on invenio-app, so I will need to check whether it allows this somehow.

"1/30 second" allows a user to launch a workflow only once every 30 seconds, while "1/5 second" allows it every 5 seconds, which seems less strict to me.

Oops. You are right. My brain is not working.
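The limit strings discussed here appear to follow the `limits`-library notation that Flask-Limiter accepts ("N per [M] unit" or "N/[M] unit"), where the optional M multiplies the unit, so "1/5 second" means 1 request per 5-second window. The stdlib-only sketch below makes the window length explicit. `parse_limit` and `RateLimit` are hypothetical helpers written for illustration, not the library's own parser, and only second/minute/hour/day units are handled.

```python
import re
from dataclasses import dataclass

# Seconds per granularity unit ("month" and "year" omitted for brevity).
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# Accepts "N per [M] unit" or "N/[M] unit", e.g. "20 per second", "1/5 second".
_LIMIT_RE = re.compile(
    r"^\s*(\d+)\s*(?:/|per)\s*(\d+)?\s*(second|minute|hour|day)s?\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class RateLimit:
    amount: int  # requests allowed per window
    window: int  # window length in seconds


def parse_limit(text: str) -> RateLimit:
    """Parse a limit string into (requests, window seconds); hypothetical helper."""
    match = _LIMIT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised rate limit: {text!r}")
    amount, multiples, unit = match.groups()
    # The optional multiplier M stretches the window: "1/5 second" -> 5 s.
    window = int(multiples or 1) * _UNIT_SECONDS[unit.lower()]
    return RateLimit(amount=int(amount), window=window)


for spec in ("20 per second", "1/5 second", "1/30 second"):
    limit = parse_limit(spec)
    print(f"{spec!r}: {limit.amount} request(s) every {limit.window} s")
```

For the same request count, a longer window is a stricter limit, which is why "1/30 second" is stricter than "1/5 second".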

WRT 1, I agree with @audrium, as we'll soon have other possibly "slow" endpoints, such as validating Docker images, which would require fetching images, spawning pods, etc. These could create a DoS-like workload if someone overuses them. So we could keep the generic REANA_RATELIMIT_GUEST_USER and REANA_RATELIMIT_AUTHENTICATED_USER variables for the default "fast" endpoints, as up to now, and introduce a new REANA_RATELIMIT_SLOW variable for "slow" endpoints that need to be protected, and start applying it e.g. to the workflow launcher, the notebook launcher, etc.

We could actually proactively introduce three values already, *_FAST, *_MEDIUM, and *_SLOW, if we want, and start tagging other endpoints as well. For example, for the ATLAS pMSSM use case, where many concurrent workflows are running, one may want to go up to 1000 per second for some endpoints. So we could already start tagging endpoints as fast, medium, or slow: those needed for the internal working of the system (which need to be ultra-permissive), those for regular user consumption such as checking status (medium, the default), and those that might be too time-consuming (which need slow protection).

(As for naming, fast/medium/slow might be enough; we could introduce "rocket" or something later if need be. Or we could invent some fancy naming scheme such as cheetah/rabbit/ant/snail that would also be nicely extensible in the future if there is a need for more fine-grained categorisation than just fast/slow.)
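The tiering idea can be sketched as a fixed-window limiter keyed by user and tier, assuming each endpoint is tagged with one tier name. The `TIERS` table and `TieredLimiter` class are hypothetical, and the "fast" and "medium" values are placeholders taken from the numbers mentioned above (only REANA_RATELIMIT_SLOW was added in this PR). Limits are written as (requests, window seconds) pairs rather than parsed strings to keep it short, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical endpoint tiers as (requests allowed, window length in seconds).
# Only "slow" corresponds to a real variable (REANA_RATELIMIT_SLOW, "1/5 second");
# "fast" and "medium" illustrate the proposed *_FAST and *_MEDIUM categories.
TIERS: Dict[str, Tuple[int, int]] = {
    "fast": (1000, 1),  # internal endpoints polled by many running workflows
    "medium": (20, 1),  # regular user queries, e.g. workflow status
    "slow": (1, 5),     # expensive endpoints, e.g. launching a workflow
}


class TieredLimiter:
    """Fixed-window request counter kept per (user, tier) pair."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (user, tier) -> (window start time, requests counted in this window)
        self._windows: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def allow(self, user: str, tier: str) -> bool:
        """Return True and count the request if it fits in the current window."""
        amount, window = TIERS[tier]
        now = self._clock()
        start, hits = self._windows.get((user, tier), (now, 0))
        if now - start >= window:
            start, hits = now, 0  # previous window expired: open a new one
        if hits >= amount:
            return False  # over the limit; rejected requests are not counted
        self._windows[(user, tier)] = (start, hits + 1)
        return True


# Simulate a user clicking "Launch" twice, then again 5 seconds later.
fake_now = [0.0]
limiter = TieredLimiter(clock=lambda: fake_now[0])
print(limiter.allow("alice", "slow"))  # True: first launch
print(limiter.allow("alice", "slow"))  # False: second launch in the same 5 s window
fake_now[0] = 5.0
print(limiter.allow("alice", "slow"))  # True: a new window has started
```

A fixed window is the simplest strategy but allows short bursts at window boundaries; Flask-Limiter also offers a moving-window strategy if that matters for the slow tier.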

WRT 2, allowing 1 launch every 5 seconds seems reasonable.

WRT 3, prettifying the output message would be a nice bonus. See also #286 for how the rate limiter error message looks in a regular reana-client usage scenario when a user has many input files. Currently it is not easily understandable by users, so improving it for both Web UI and CLI users would be really nice to have.
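One possible shape for a friendlier error is an HTTP 429 response whose message states the limit in plain words and uses the standard `Retry-After` header to say when to try again. The sketch below is a hypothetical helper (`rate_limit_response` and its message wording are illustrations, not Flask-Limiter's or reana-server's API); it only builds the status, headers, and JSON body that a backend error handler could return.

```python
import json
from http import HTTPStatus
from typing import Dict, Tuple


def _plural(n: int, word: str) -> str:
    """Return e.g. '1 request' or '5 seconds'."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def rate_limit_response(
    amount: int, window: int, retry_after: int
) -> Tuple[int, Dict[str, str], str]:
    """Build a hypothetical user-friendly HTTP 429 response (status, headers, body)."""
    message = (
        f"Too many requests: this endpoint allows {_plural(amount, 'request')} "
        f"every {_plural(window, 'second')}. "
        f"Please try again in {_plural(retry_after, 'second')}."
    )
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),  # standard HTTP header, in seconds
    }
    return HTTPStatus.TOO_MANY_REQUESTS.value, headers, json.dumps({"message": message})


status, headers, body = rate_limit_response(amount=1, window=5, retry_after=3)
print(status, headers["Retry-After"])
print(json.loads(body)["message"])
```

Keeping the message in a JSON body lets both the web UI and reana-client display the same text, which matches the wish to improve both interfaces at once.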

Changed to REANA_RATELIMIT_SLOW. I think we can do FAST and MEDIUM later when it is needed. I will create an issue.

Changed to 1/5.

Can be a separate issue to improve Web and CLI messages.

Created reanahub/reana-server#455 to address the 3rd point

audrium · 2022-03-14T11:25:00Z

helm/reana/README.md

@@ -20,6 +20,7 @@ This Helm automatically prefixes all names using the release name to avoid colli
 | `components.reana_server.environment.REANA_WORKFLOW_SCHEDULING_POLICY` | Define workflow scheduling strategy. Options are "fifo" for first-in-first-out strategy regardless of users and "balanced" for multi-user-aware scheduling strategy. | "fifo" |
 | `components.reana_server.environment.REANA_RATELIMIT_GUEST_USER` | Set API limiter config for guest users. Users using reana-client will be treated as guests. | "20 per second" |
 | `components.reana_server.environment.REANA_RATELIMIT_AUTHENTICATED_USER` | Set API limiter config for authenticated web UI users. | "20 per second" |
+| `components.reana_server.environment.REANA_RATELIMIT_SLOW` | Set API limit config for slow endpoints. | "1/5 second" |


We could enhance the description to give users a bit more context, for example:
"Set API limiter config for slow endpoints that need to be protected, e.g. the launch endpoint."
or something similar. WDYT?

Also, we could add a similar message to the changelog.

closes reanahub/reana-server#443

audrium

Works nicely!

VMois force-pushed the flexible-rate-limiter branch from caafe8d to b2c6c31 March 9, 2022 12:47

VMois marked this pull request as ready for review March 9, 2022 12:55

audrium reviewed Mar 11, 2022

VMois force-pushed the flexible-rate-limiter branch 2 times, most recently from a9e58d4 to 6732776 March 14, 2022 10:56

audrium reviewed Mar 14, 2022

audrium mentioned this pull request Mar 14, 2022

API rate limiter: prettify error messages reanahub/reana-server#455

Closed

helm: add API rate limit env variable for slow endpoints

52010f6

closes reanahub/reana-server#443

VMois force-pushed the flexible-rate-limiter branch from 6732776 to 52010f6 March 14, 2022 12:39

audrium approved these changes Mar 14, 2022

audrium merged commit 52010f6 into reanahub:master Mar 14, 2022


audrium Mar 11, 2022

VMois Mar 11, 2022

audrium Mar 11, 2022

VMois Mar 11, 2022

tiborsimko Mar 14, 2022

VMois Mar 14, 2022

audrium Mar 14, 2022

audrium Mar 14, 2022
