
Add additional logging around search_controller's ES query building #689

Closed · 1 task
sarayourfriend opened this issue Jun 28, 2022 · 8 comments
Labels
💻 aspect: code (Concerns the software code in the repository)
🧰 goal: internal improvement (Improvement that benefits maintainers, not users)
🟧 priority: high (Stalls work on the project or its dependents)
🧱 stack: api (Related to the Django API)

Comments

@sarayourfriend
Collaborator

Problem

Currently we only log the query itself, not the process that led to it being constructed. We are seeing queries sent with sizes of up to 1800 but have no way of telling why 1800 was chosen as the size.

Description

Add logging to the _get_query_slice method. Log at each branch, and be sure to log the variables that lead to each branch as well. The idea is to log as much as is necessary to understand how the method works in production to produce the from and size values that are sent to ES.

In addition, log some basic search facts like the term and the serializer data.

Be sure to introduce a "trace" variable so that we can easily follow a particular search request. This can just be a uuidv4 generated at the top of the search method, passed through the helpers, and included in each log line.
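A minimal sketch of what this could look like, assuming a simplified _get_query_slice signature, a plain dict of validated serializer data, and a made-up over-fetch factor (the real search controller code will differ):

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def _get_query_slice(page_size, page, filter_dead, trace_id):
    """Return the (start, end) slice to request from Elasticsearch."""
    start = page_size * (page - 1)
    if filter_dead:
        # Over-fetch so dead links can be filtered out downstream;
        # the factor of 3 is a placeholder, not the real heuristic.
        end = start + page_size * 3
        branch = "filter_dead"
    else:
        end = start + page_size
        branch = "no_filter"
    logger.info(
        "get_query_slice branch=%s trace=%s page=%s page_size=%s start=%s end=%s",
        branch, trace_id, page, page_size, start, end,
    )
    return start, end


def search(search_params):
    # One trace id per request so every related log line can be correlated.
    trace_id = str(uuid.uuid4())
    logger.info(
        "search trace=%s term=%s serializer_data=%s",
        trace_id, search_params.get("q"), search_params,
    )
    return _get_query_slice(
        page_size=search_params.get("page_size", 20),
        page=search_params.get("page", 1),
        filter_dead=search_params.get("filter_dead", True),
        trace_id=trace_id,
    )
```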

Implementation

  • 🙋 I would be interested in implementing this feature.
@sarayourfriend sarayourfriend added the 🟧 priority: high, 💻 aspect: code, and 🧰 goal: internal improvement labels Jun 28, 2022
@sarayourfriend sarayourfriend self-assigned this Jun 29, 2022
@sarayourfriend
Collaborator Author

Fixed by WordPress/openverse-api#777

@zackkrida
Member

Re-opening for consideration, since some of the logging in WordPress/openverse-api#777 had to be reverted while triaging API stability. Does it make sense to try to reintroduce some of that logging before revisiting the search controller refactor?

@sarayourfriend sarayourfriend removed their assignment Aug 19, 2022
@sarayourfriend
Collaborator Author

Agreed, Zack, that we should add the logging back to the search controller.

@sarayourfriend
Collaborator Author

@WordPress/openverse-api Does this issue still make sense to implement as it reads today? I'm wondering if we still find value in this particular approach or if we'd want to do something more like labelling specific queries and making it possible to measure timings for specific query types. Given we've already identified the most time-intensive parts of the search controller (dead link filtering and very deep pagination), should we add logging around those specifically rather than more general logs? A rough sketch of what that might look like follows.
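Purely illustrative, with hypothetical names not taken from the actual codebase:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timed(phase, trace_id):
    """Log how long a labelled phase of query handling takes."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        logger.info(
            "query_timing phase=%s trace=%s elapsed_ms=%.1f",
            phase, trace_id, elapsed_ms,
        )


# Hypothetical usage around the known hot spots:
#
#     with timed("dead_link_filtering", trace_id):
#         results = filter_dead_links(results)
#
#     with timed("deep_pagination_fetch", trace_id):
#         response = search.execute()
```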

@krysal
Member

krysal commented Nov 22, 2022

I think we can lower this issue's priority and look into Kibana, which should have everything built in to see what is happening inside Elasticsearch. The additional logging around dead link filtering and pagination sounds good to have in the short term as well.

@sarayourfriend
Collaborator Author

> which should have everything built in to see what is happening inside Elasticsearch

Can you elaborate on what you mean by this? I'm not familiar with using Kibana in this way. Is there documentation I could read about it? Is Kibana equipped to do meta-analytics of our ES cluster?

@obulat obulat transferred this issue from WordPress/openverse-api Feb 22, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Feb 23, 2023
@krysal krysal self-assigned this Feb 28, 2023
@obulat obulat moved this from 📋 Backlog to 📅 To do in Openverse Backlog Mar 7, 2023
@sarayourfriend
Collaborator Author

@krysal @zackkrida @obulat Do y'all think this issue is still necessary? The original conceit of it is no longer valid: we pretty accurately understand why big queries get sent to Elasticsearch: dead links.

The methods mentioned in the issue description have all been well documented and unit tested at this point as well, so I think we understand them. If there are other motivations for adding process logging to the query building then we should update the issue description to accurately reflect the new motivations. Otherwise, we could close the issue. At the very least it doesn't seem like it should be high priority anymore 🤔

@krysal
Member

krysal commented Mar 16, 2023

I looked at this a few days ago and was thinking the same, but forgot to comment due to other priorities. The logs are there, and I don't think we need more here at the moment. Thanks for confirming, Sara!

@krysal krysal closed this as not planned Mar 16, 2023
@github-project-automation github-project-automation bot moved this from 📅 To do to ✅ Done in Openverse Backlog Mar 16, 2023
@AetherUnbound AetherUnbound added the 🧱 stack: api label and removed the 🧱 stack: backend label May 15, 2023