Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add middleware to log application name and verification status #3369

Merged
merged 2 commits into from
Nov 26, 2023

Conversation

sarayourfriend
Copy link
Collaborator

@sarayourfriend sarayourfriend commented Nov 19, 2023

Also add authorization, app name and verified status to nginx logs

Fixes

Closes #2246 by @sarayourfriend
Fixes #3368 by @sarayourfriend

Description

Add logging as described in the issue.

To add the application name and verified status, I added a new middleware. The tests for this are in the test_auth suite.

Testing Instructions

You need to rebuild the nginx image to get the new logging: just build nginx. Afterwards, make requests to your local API via the nginx container: localhost:50270. Follow that container's logs (just logs nginx) and confirm that you see the following JSON log properties:

  • request_id
  • client_application_name
  • client_application_verified

These come from the following headers, respectively, which you should also see on responses:

  • x-request-id
  • x-ov-client-application-name
  • x-ov-client-application-verified

Note that the last two are only present on authorized requests. Nginx will log an empty string when those headers are not present, but if you make an authorized request locally (by following the auth flow), you'll be able to see these populated both in your response headers and in Nginx's logs.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Also add authorization, app name and verified status to nginx logs
@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Nov 19, 2023
@sarayourfriend sarayourfriend requested a review from a team as a code owner November 19, 2023 23:49
@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@obulat
@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the code works as expected! I added a small note about more docs but it's not blocking.

from api.utils.oauth2_helper import get_token_info


def client_application_middleware(get_response):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few docstrings about this middleware would be nice! Just what's in the PR description is good enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll rename this to add "headers" in there as well. The name I've chosen to commit is bafflingly obscure 😅

Copy link
Contributor

@obulat obulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well locally, and has great tests.
A question not related to this PR: what is request_id for? It's different every time I request something. ('"request_id":"$sent_http_x_request_id",').

Seeing these x_ headers here, I thought that we could also add some x_ header to all the client frontend requests to identify them instead of trying to set the header in Axios (which is not allowed as it's not safe).


@dataclass
class TokenInfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@sarayourfriend
Copy link
Collaborator Author

A question not related to this PR: what is request_id for? It's different every time I request something.

The ID assigned to the request is meant to help us trace an individual requests logs. In a context where logs relevant to different requests can interweave (either in an async context or a multi-worker context), it helps us determine which request caused what logs.

Take the following scenario, for example, considering the same working in an async context.

Request A comes in at timestamp 0
Request B comes in at timestamp 1

Request A will run until it reaches an await, for example, to send the query to ES or to make a database request.

At that point, Request B will run. Imagine that request B finishes completely before request A's await resolves. For example, if request B is not an async route or if request B's async operations finish faster than A's. In that case, all the logging for request B will happen between the logging for request A before and after request A's await point. It could look like this:

req_a: start
req_a: Sending query {query}
req_b: start
req_b: some important error log!
req_b: end
req_a: Received query response {response}
req_a: end

Without a request ID, it wouldn't be either impossible or very difficult to determine whether the error log caused by request b was caused by request B or some other request. If the two requests are totally different code paths, then that makes it possible to deduce which request it's for. But if they're for the same endpoint? It's impossible without reading further logs, and even then, only if further logs are clearly tied to a specific request via the request's context (e.g., by logging the query).

Request ID eliminates the need to do this kind of mental juggling and reliance on deep knowledge of the code when trying to trace the lifecycle of a request in logs and also eliminates the possibility of two requests' logs getting mixed up altogether.

@sarayourfriend
Copy link
Collaborator Author

Seeing these x_ headers here, I thought that we could also add some x_ header to all the client frontend requests to identify them instead of trying to set the header in Axios (which is not allowed as it's not safe).

We can identify frontend client requests in two ways:

  1. After this PR, via the client application name, for SSR requests
  2. Already existing, we can see the openverse.org referrer header (already in our Nginx logs)

Would that cover what you're looking to find, @obulat?

@sarayourfriend sarayourfriend merged commit 520d4d9 into main Nov 26, 2023
43 checks passed
@sarayourfriend sarayourfriend deleted the add/app-middleware branch November 26, 2023 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
4 participants