Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm: add rate limit env variable to launch workflow #619

Merged
merged 1 commit into from
Mar 14, 2022

Conversation

VMois
Copy link

@VMois VMois commented Mar 9, 2022

@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2022

Codecov Report

Merging #619 (a9e58d4) into master (5e71e74) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #619   +/-   ##
=======================================
  Coverage   19.34%   19.34%           
=======================================
  Files          26       26           
  Lines        2078     2078           
=======================================
  Hits          402      402           
  Misses       1676     1676           

@VMois VMois marked this pull request as ready for review March 9, 2022 12:55
@@ -20,6 +20,7 @@ This Helm automatically prefixes all names using the release name to avoid colli
| `components.reana_server.environment.REANA_WORKFLOW_SCHEDULING_POLICY` | Define workflow scheduling strategy. Options are "fifo" for first-in-first-out strategy regardless of users and "balanced" for multi-user-aware scheduling strategy. | "fifo" |
| `components.reana_server.environment.REANA_RATELIMIT_GUEST_USER` | Set API limiter config for guest users. Users using reana-client will be treated as guests. | "20 per second" |
| `components.reana_server.environment.REANA_RATELIMIT_AUTHENTICATED_USER` | Set API limiter config for authenticated web UI users. | "20 per second" |
| `components.reana_server.environment.REANA_RATELIMIT_LAUNCH_WORKFLOW` | Set API limiter config for launch workflow endpoint. | "1/30 second" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the PR set, works nicely, just a couple of questions:

  1. Perhaps it's worth to go for a more generic "fast" and a "slow" endpoint solution as described in the issue? Creating a separate envar only for launch endpoint seems to be too specific. WDYT? cc @tiborsimko

  2. "1/30 second" rate limits seems to be too strict for me? Perhaps we should relax it a bit to something like "1/5 second"

  3. I haven't looked deeply, but would it be possible to return more user friendly error messages if rate limit is not respected? For example if user hits Launch on REANA web page twice in 30 seconds, this is what will be displayed:
    Screenshot 2022-03-11 at 13-10-42 REANA

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. "1/5 second" means "1 per 5 seconds" which is stricter than "1/30 second" which means "1 per 30 seconds". We can make it "1/15 second" which will mean "1 per 15 seconds".

  2. Good point. There are two ways how to make it nice: change UI or change backend. With the backend, I think, it is possible with limiter and flask-limiter libraries. But, we rely on invenio-app so I will need to check if they allow it somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"1/30 second" allows a user to launch a workflow only once per 30 seconds
while "1/5 second" allows to do it every 5 seconds, seems less strict to me

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops. You are right. My brain is not working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT 1 I agree with @audrium, as we'll soon have other possibly "slow" endpoints, such as for validation of docker images that would need fetching, spawning pods, etc. Could create DoS kind of workload if someone overdoes those. So we could have the generic REANA_RATELIMIT_GUEST_USER and REANA_RATELIMIT_AUTHENTICATED_USER variables for default "fast" endpoints, as up to now, and we could introduce a new REANA_RATELIMIT_SLOW variable for "slow" endpoints that need to be protected, and start to applying them e.g. to workflow launcher, to notebook launcher, etc.

We could actually proactively introduce three values already, *_FAST, *_MEDIUM, *_SLOW if we want, and start tagging other endpoints as well. For example, for ATLAS pMSSM use case, where there are many concurrent workflows running, one may want to go into 1000 per second for some endpoints. So we could start already tagging via fast/medium/slow those endpoints which are needed for internal system working (and need to be ultra permissive), which are just for regular user consumption wondering about status (and which are medium default), which might be too time consuming (and need slow protection).

(As for naming, fast/medium/slow might be enough, we could introduce "rocket" or something later if need be. Or we could invent some fancy naming scheme such as cheetah/rabbit/ant/snail that would also be nicely extensible in the future if there will be a need for more fine-grained categorisation than just fast/slow.).

WRT 2 allowing 1 launch each 5 seconds seems reasonable.

WRT 3 prettifying the output message would be a nice bonus. See also #286 how the rate limiter error message looks like in regular reana-client usage scenario if user has many input files. Currently it is not easily understandable by users. So improving it for both Web UI users and CLI UI users would be really nice to have.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Changed to REANA_RATELIMIT_SLOW. I think we can do FAST and MEDIUM later when it is needed. I will create an issue.

  2. Change to 1/5

  3. Can be a separate issue to improve Web and CLI messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created reanahub/reana-server#455 to address the 3rd point

@VMois VMois force-pushed the flexible-rate-limiter branch 2 times, most recently from a9e58d4 to 6732776 Compare March 14, 2022 10:56
@@ -20,6 +20,7 @@ This Helm automatically prefixes all names using the release name to avoid colli
| `components.reana_server.environment.REANA_WORKFLOW_SCHEDULING_POLICY` | Define workflow scheduling strategy. Options are "fifo" for first-in-first-out strategy regardless of users and "balanced" for multi-user-aware scheduling strategy. | "fifo" |
| `components.reana_server.environment.REANA_RATELIMIT_GUEST_USER` | Set API limiter config for guest users. Users using reana-client will be treated as guests. | "20 per second" |
| `components.reana_server.environment.REANA_RATELIMIT_AUTHENTICATED_USER` | Set API limiter config for authenticated web UI users. | "20 per second" |
| `components.reana_server.environment.REANA_RATELIMIT_SLOW` | Set API limit config for slow endpoints. | "1/5 second" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could enhance the message for the user to give a bit more context:
Set API limiter config for slow endpoints that need to be protected e.g. launch endpoint
or something similar, WDYT?

Also, we could add a similar message to the changelog

Copy link
Member

@audrium audrium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works nicely!

@audrium audrium merged commit 52010f6 into reanahub:master Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rate limit for run-on-reana launch endpoint
4 participants