Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove client application headers from API responses #4569

Closed
sarayourfriend opened this issue Jun 28, 2024 · 2 comments · Fixed by #4635
Closed

Remove client application headers from API responses #4569

sarayourfriend opened this issue Jun 28, 2024 · 2 comments · Fixed by #4635
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Copy link
Collaborator

Problem

application: ThrottledApplication = request.auth.application
response["x-ov-client-application-name"] = application.name
response["x-ov-client-application-verified"] = application.verified

The code linked above adds the client application name and verification status to response headers. This is useful when we scan Nginx request logs. However, it also makes responses that would not otherwise be sensitive to the authorization header suddenly sensitive.

Thumbnails, for example, can be universally cached at the edge without issue. However, when an authorization header is present, the cache (correctly) will bypass and send the request upstream. That is only correct and necessary behaviour due to our inclusion of authorization sensitive materials in responses.

Description

If we remove these headers from responses (and instead log them to our structured API logging), we can use more aggressive caching for thumbnails that does not vary on the authorization header.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Jun 28, 2024
@krysal
Copy link
Member

krysal commented Jun 28, 2024

@sarayourfriend To confirm my understanding of the desired solution, will transforming the response_headers_middleware middleware into a response_logging_middleware and replacing the added headers with the following logging do the job?

logger.info(
    "application_response_verification_status",
    application_id=application.id,
    application_name=application.name,
    application_verified=application.verified,
)

Or is there a better place for logging this info? I tried extending request log metadata as the django-structlog documentation suggest but I couldn't get the application's information in conf/settings/signals.py. I thought it could be the placing of the middleware in the base config list but it doesn't seem to be that.

# File: api/conf/settings/signals.py

@receiver(signals.bind_extra_request_metadata)
def bind_application_id(request, logger, **kwargs):
    if not (hasattr(request, "auth") and hasattr(request.auth, "application")):
        return

    # This code is never reached ❌
    application = request.auth.application
    structlog.contextvars.bind_contextvars(application_id=application.id)

@sarayourfriend
Copy link
Collaborator Author

sarayourfriend commented Jun 28, 2024

Your overall understanding is correct 👍 I don't think we need to use the middleware, though.

I haven't looked into the implementation of that signal, but I'm going to guess it must happen before authentication occurs, if request.auth isn't available? But it's weird that the documentation says user_id is included, which would seem to imply it happens after authentication. Maybe there's an issue with how we've ordered the middlewares?

If that approach with the signals isn't possible, then I think it would be fine to add logging somewhere around here, in the authentication class itself: https://github.com/WordPress/openverse/blob/main/api/conf/oauth2_extensions.py#L30-L32

I don't think we necessarily need to add these properties to every log line, which it looks like extending the request metadata would do. Instead, we can have a specific log event client_application_authentication (or whatever makes sense) that logs these pieces of information. Then, if we need to identify the client app for a request, we can query for that event where request_id=<the request id of the other request/event we're investigating>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants