Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23
Comments
For Harvard Dataverse we have been talking about investigating rate limiting solutions offered by AWS and I just pushed b1b703a to mention the new "Rate-Based Rules" offering that's part of AWS WAF (Web Application Firewall). This blog post provides a good overview: https://aws.amazon.com/blogs/aws/protect-web-sites-services-using-rate-based-rules-for-aws-waf/ |
At standup this morning I inquired if there is any specific technical plan or approach and got feedback that we are fine with an AWS-specific solution for now so I went ahead and made pull request IQSS/dataverse#4693 based on the commit I mentioned above and moved this issue to code review at https://waffle.io/IQSS/dataverse |
Thanks @pdurbin. I re-titled the issue to reflect that this is AWS-specific. We'll want a general solution at some point, but I think it's good to get this small chunk tested and out in a release. |
I sent a note to LTS:
(by the time I hit send I kinda felt like I was maybe pushing it with them... well, if that's the case they'll tell us to do it ourselves and we will. but I figured I'd ask) |
Meeting with LTS on Wednesday, will discuss. |
@matthew-a-dunlap Both nodes are using the database on dvn-cloud-dev-1. |
This story is mostly blocked until we hear back from LTS about access to the web console; so far they have only given us access to the boxes themselves. I can do some deeper research into web application firewalls in the meantime. |
Prio meeting with Stefano.
|
Top priority for upcoming sprint |
Sizing:
|
This came up yet again, recently. What I'm proposing is that instead of trying to re-visit this issue as a whole, we should just start chipping away at the problem by addressing certain specific cases of limiting excessive load that we can define and know how to address. I've proposed some, like detecting and blocking aggressive crawlers (basically what I do by hand occasionally; also blocking crawlers may be one area where some off the shelf solution may/should work); or limiting specific expensive activity on the user level (like a limit on how many files/data an unprivileged user can upload per hour). Features like this are in fact long overdue. And I'm convinced by now that it would be more productive to just work on them one clearly defined case at a time. |
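To make the "uploads per hour" case above concrete, here is a minimal sketch of a per-user sliding-window quota. It is illustrative only and not Dataverse code: the class name `HourlyUploadQuota`, the `try_upload` method, and the limit of 2 are all hypothetical, and the clock is passed in explicitly so the behavior is easy to check.

```python
from collections import defaultdict, deque


class HourlyUploadQuota:
    """Sliding-window cap on uploads per user (hypothetical helper, not Dataverse code)."""

    def __init__(self, max_uploads, window_seconds=3600):
        self.max_uploads = max_uploads
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user id -> timestamps of accepted uploads

    def try_upload(self, user_id, now):
        """Record an upload at time `now` (seconds) and return True, or return False if over quota."""
        events = self._events[user_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_uploads:
            return False
        events.append(now)
        return True


quota = HourlyUploadQuota(max_uploads=2)  # assumed limit, for illustration
results = [
    quota.try_upload("alice", now=0),
    quota.try_upload("alice", now=10),
    quota.try_upload("alice", now=20),    # third upload inside the hour: rejected
    quota.try_upload("bob", now=20),      # other users are unaffected
    quota.try_upload("alice", now=3600),  # the upload at t=0 has aged out
]
```

Rejected attempts are not recorded, so a client that keeps retrying does not extend its own lockout. A real implementation would presumably also need shared state across application nodes, which this in-memory sketch does not address.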
Sprint board review
(I can't wait until some of this is automated) |
Sprint board review
|
There are a few specific areas that have been identified where we can start working immediately.
|
Per feedback from @qqmyers, I'll run some quick practical analysis on the ActionLogRecord data in production, to see if any obvious results can be derived from it immediately, smoking guns/worst offenders, etc. |
Actually, I'll add any useful stats from the prod. ActionLogRecord to the "command engine" issue (#9356). |
Reviewed the new issues added - I think they look good and represent what we can first get done, in order to help with rate limiting. There may well be more to do after those, but let's get them working (I've gone ahead and added them to the Dataverse Dev column in the backlog board) and we can revisit after, as needed. |
Grooming:
|
grooming:
|
Closing this, now that we have IQSS/dataverse#10211 in progress. |
@scolapasta Are you sure you wanted to close this one? I can see how an argument can be made that if there is anything potentially expensive that we want to ration, that's done bypassing the command system, then it could potentially be addressed by creating dedicated commands for all such things... But I still think that would need to be discussed to make sure we're not missing anything. |
@landreev If there are other areas that we do need to ration, outside of the command system, then I'd vote for creating more specific actionable issues for it. This one here was in the dm-project and I do think we've made plenty of headway on different aspects and I think that accomplished the goal of "creat[ing] a list of actionable issues to start the effort". But if you feel otherwise and think there's something more we can do for this one specifically, that's fine too. |
This ticket is a placeholder for general API rate and access limiting logic to better control the load placed on the service and provide options in case of system instability.
Rate limiting was mentioned during Search API testing, and the GitHub Search API uses this concept too:
https://developer.github.com/v3/search/
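As a rough illustration of the GitHub-style approach, the sketch below counts requests per client in a fixed window and reports the remaining budget through `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers, which is the naming convention GitHub's API uses. Everything else (the class name `FixedWindowLimiter`, the limit of 2, the 60-second window, keying by IP address) is an assumption for the example, not a proposal for actual values.

```python
class FixedWindowLimiter:
    """Fixed-window request counter per client (hypothetical sketch)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._windows = {}  # client -> (window_start, count)

    def check(self, client, now):
        """Return (allowed, headers) for a request from `client` at time `now` (epoch seconds)."""
        start, count = self._windows.get(client, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # previous window expired: start a fresh one
        allowed = count < self.limit
        if allowed:
            count += 1
        self._windows[client] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(start + self.window_seconds),
        }
        return allowed, headers


limiter = FixedWindowLimiter(limit=2, window_seconds=60)  # assumed values
first = limiter.check("203.0.113.7", now=1000)
second = limiter.check("203.0.113.7", now=1010)
third = limiter.check("203.0.113.7", now=1020)        # over the limit in this window
after_reset = limiter.check("203.0.113.7", now=1060)  # a new window has started
```

A fixed window is the simplest variant; it allows short bursts around window boundaries, which may or may not matter for the cases discussed here.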
Limiting access might involve options at several levels: a general API access on/off switch, per-API switches, and/or a whitelist/blacklist of IP addresses/users. The last might be integrated with groups and permissions.
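One possible way to combine the options above into a single decision is sketched here. The `AccessPolicy` class, its field names, and the evaluation order (blacklist first, then whitelist, then the global and per-API switches) are assumptions made for the example; in particular, letting whitelisted addresses bypass the switches is just one design choice, and user-based lists or integration with groups and permissions are left out.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class AccessPolicy:
    """Hypothetical access policy: global switch, per-API switches, IP lists."""

    api_enabled: bool = True                              # general API on/off switch
    disabled_apis: set = field(default_factory=set)       # per-API switches
    blocked_networks: list = field(default_factory=list)  # blacklist (addresses or CIDR ranges)
    allowed_networks: list = field(default_factory=list)  # whitelist (addresses or CIDR ranges)

    def decide(self, api_name, client_ip):
        """Return 'allow' or a 'deny: <reason>' string for one request."""
        addr = ip_address(client_ip)
        if any(addr in ip_network(net) for net in self.blocked_networks):
            return "deny: blocked address"
        # Assumption: whitelisted clients keep access even when switches are off.
        if any(addr in ip_network(net) for net in self.allowed_networks):
            return "allow"
        if not self.api_enabled:
            return "deny: API disabled"
        if api_name in self.disabled_apis:
            return "deny: " + api_name + " disabled"
        return "allow"


policy = AccessPolicy(
    disabled_apis={"search"},
    blocked_networks=["198.51.100.0/24"],
    allowed_networks=["192.0.2.10"],
)
decisions = [
    policy.decide("search", "203.0.113.5"),    # per-API switch is off
    policy.decide("search", "192.0.2.10"),     # whitelisted address
    policy.decide("access", "198.51.100.77"),  # inside the blocked range
    policy.decide("access", "203.0.113.5"),    # ordinary request
]
policy.api_enabled = False                     # flip the general switch
during_outage = policy.decide("access", "203.0.113.5")
```

Keeping the switches as plain data means they could be flipped at runtime when the system is unstable, which is the "options in case of system instability" goal stated in the description.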
Update: additional terms for this: