Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add referrer based throttle scope #3486

Merged
merged 4 commits into from
Dec 19, 2023
Merged

Conversation

sarayourfriend
Copy link
Collaborator

Fixes

Fixes #3484 by @sarayourfriend
Fixes #3485 by @sarayourfriend

Description

Adds a new set of throttle classes to match requests with the openverse.org referrer header. Refer to the issue for rationale behind this approach. Keep in mind it is a temporary stop gap with known issues we seek to resolve with more advanced approaches later on.

The rollout for this will need infrastructure updates to include the new rate limiting scope environment variables.

Testing Instructions

Check over the new unit tests. I made significant modifications here to remove artificiallity from the tests. Instead, the tests run against endpoints known to have specific rate limiting configuration (whether the default or otherwise) and confirms that those rate limit scopes are used, based on the rate limiting headers).

To manually test the new throttle, run the API with DISABLE_GLOBAL_THROTTLING=False in your api/.env file (just api/up to get the new setting loaded). Then make a request to localhost:50280/v1/ in your browser and inspect the response. Confirm that the anon rate limit scope is applied based on the X-RateLimit headers. Now use your browser's dev tools to edit and resend the request, but set the referrer header to openverse.org. Inspect the response, and confirm that the ov_referrer scope is used instead, with separate rate limit counts. If you made a single anon request, you should see 4 remaining burst requests for ov_referrer. For extra confidence, run out of requests on the anon by refreshing the page 5 times (then once more to show the rate limit response) and then make the request with the referrer header. You should now have a separate limit.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend sarayourfriend added 🟧 priority: high Stalls work on the project or its dependents 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Dec 7, 2023
@sarayourfriend sarayourfriend requested a review from a team as a code owner December 7, 2023 20:31
@openverse-bot openverse-bot added the 🕹 aspect: interface Concerns end-users' experience with the software label Dec 7, 2023
Comment on lines +26 to +53
DEFAULT_THROTTLE_RATES = {
"anon_burst": THROTTLE_ANON_BURST,
"anon_sustained": THROTTLE_ANON_SUSTAINED,
"anon_healthcheck": THROTTLE_ANON_HEALTHCHECK,
"anon_thumbnail": THROTTLE_ANON_THUMBS,
"ov_referrer_burst": THROTTLE_OV_REFERRER_BURST,
"ov_referrer_sustained": THROTTLE_OV_REFERRER_SUSTAINED,
"ov_referrer_thumbnail": THROTTLE_OV_REFERRER_THUMBS,
"oauth2_client_credentials_thumbnail": THROTTLE_OAUTH2_THUMBS,
"oauth2_client_credentials_sustained": "10000/day",
"oauth2_client_credentials_burst": "100/min",
"enhanced_oauth2_client_credentials_sustained": "20000/day",
"enhanced_oauth2_client_credentials_burst": "200/min",
# ``None`` completely by-passes the rate limiting
"exempt_oauth2_client_credentials": None,
}

DEFAULT_THROTTLE_CLASSES = (
"api.utils.throttle.BurstRateThrottle",
"api.utils.throttle.SustainedRateThrottle",
"api.utils.throttle.OpenverseReferrerBurstRateThrottle",
"api.utils.throttle.OpenverseReferrerSustainedRateThrottle",
"api.utils.throttle.OAuth2IdBurstRateThrottle",
"api.utils.throttle.OAuth2IdSustainedRateThrottle",
"api.utils.throttle.EnhancedOAuth2IdBurstRateThrottle",
"api.utils.throttle.EnhancedOAuth2IdSustainedRateThrottle",
"api.utils.throttle.ExemptOAuth2IdRateThrottle",
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these so that they are easier to reference in the tests when we need to enable the throttling for a scoped set of tests.

Comment on lines 143 to 150
if token_info.rate_limit_model not in self.applies_to_rate_limit_model:
return None

return self.cache_format % {"scope": self.scope, "ident": token_info.client_id}


class OAuth2IdThumbnailRateThrottle(AbstractOAuth2IdRateThrottle):
applies_to_rate_limit_model = "standard"
applies_to_rate_limit_model = ["standard", "enhanced"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"enhanced" previously had no thumbnail rate applied to it. I don't think that's the correct behaviour, but I could be wrong. This change makes enhanced tokens have the same thumbnail rate limit as standard oauth. I think that's right because we have no separate thumbnail rate limit configured for enhanced tokens.

Exempt tokens, I assume, should be exempted from all rate limiting, so there's no change for that. That already worked.


from api.utils.oauth2_helper import get_token_info


parent_logger = logging.getLogger(__name__)


class SimpleRateThrottleHeader(SimpleRateThrottle, metaclass=abc.ABCMeta):
class SimpleRateThrottle(BaseSimpleRateThrottle, metaclass=abc.ABCMeta):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this because it no longer only handled header-related enhancements.

Comment on lines +16 to +44
def enable_throttles(settings):
# Stash current settings so we can revert them after the test
original_default_throttle_rates = api_settings.DEFAULT_THROTTLE_RATES

# Put settings into base Django settings from which DRF reads
# settings when we call `api_settings.reload()`
settings.REST_FRAMEWORK["DEFAULT_THROTTLE_RATES"] = settings.DEFAULT_THROTTLE_RATES
settings.REST_FRAMEWORK[
"DEFAULT_THROTTLE_CLASSES"
] = settings.DEFAULT_THROTTLE_CLASSES

def get_redis_connection(*args, **kwargs):
return fake_redis
# Reload the settings and read them from base Django settings
# Also handles importing classes from class strings, etc
api_settings.reload()

monkeypatch.setattr("api.utils.throttle.get_redis_connection", get_redis_connection)
# Put the parsed/imported default throttle classes onto the base media view set
# to emulate the application startup. Without this, MediaViewSet has cached the
# initial setting and won't re-retrieve it after we've called `api_settings.reload`
MediaViewSet.throttle_classes = api_settings.DEFAULT_THROTTLE_CLASSES
throttle.SimpleRateThrottle.THROTTLE_RATES = api_settings.DEFAULT_THROTTLE_RATES

yield fake_redis
fake_redis.client().close()
yield

# Set everything back as it was before and reload settings again
del settings.REST_FRAMEWORK["DEFAULT_THROTTLE_CLASSES"]
settings.REST_FRAMEWORK["DEFAULT_THROTTLE_RATES"] = original_default_throttle_rates
api_settings.reload()
MediaViewSet.throttle_classes = api_settings.DEFAULT_THROTTLE_CLASSES
throttle.SimpleRateThrottle.THROTTLE_RATES = api_settings.DEFAULT_THROTTLE_RATES
Copy link
Collaborator Author

@sarayourfriend sarayourfriend Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we don't really need to modify settings directly, just the MediaViewSet and throttle.SimpleRateThrottle attributes. But to do that, we'd have to mimic a lot of what api_settings.reload() does, so it seemed worth it to go through that approach. Open to suggestions here though.

@sarayourfriend sarayourfriend removed the 💻 aspect: code Concerns the software code in the repository label Dec 7, 2023
@AetherUnbound AetherUnbound linked an issue Dec 11, 2023 that may be closed by this pull request
Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests well for me, and the stop gap approach makes sense. No blocking feedback. The test improvements in particular are great and so easy to read, cheers!

api/api/utils/throttle.py Show resolved Hide resolved
@openverse-bot
Copy link
Collaborator

Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR:

@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 7 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2.

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There is one potential scope for improvement but I don't think like it should block a PR that works.

I also have mixed feelings about having to redo #3505 and #3544.

api/test/unit/utils/test_throttle.py Show resolved Hide resolved
@sarayourfriend sarayourfriend merged commit 20d104d into main Dec 19, 2023
47 checks passed
@sarayourfriend sarayourfriend deleted the add/referrer-header-throttling branch December 19, 2023 19:20
@sarayourfriend
Copy link
Collaborator Author

I also have mixed feelings about having to redo #3505 and #3544.

Well, yes, I felt the same when I saw those PRs opened after this one when this one had no reviews 😉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🕹 aspect: interface Concerns end-users' experience with the software 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API
Projects
Archived in project
4 participants