
Implementation Plan: Staging Elasticsearch reindex DAGs for both potential index types #1987

Closed
AetherUnbound opened this issue May 2, 2023 · 1 comment · Fixed by #2358
Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧭 project: implementation plan An implementation plan for a project 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server


@AetherUnbound
Collaborator

Description

This issue tracks the drafting of the implementation plan described in the search relevancy sandbox project proposal: Staging Elasticsearch reindex DAGs for both potential index types (these will be subsets of the full data refresh).

From the project plan:

This plan will describe the DAG or DAGs which will be used to create/update both the proportional-by-provider and production-data-volume indices.

It will also describe the mechanism by which maintainers can rapidly switch which index the staging API uses. This could be done in two ways: a DAG that changes the primary index alias, or a set of API changes that would allow queries to specify which index they use. The implementation plan should explore and describe both options.
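For the alias option, Elasticsearch's `_aliases` endpoint accepts a list of actions that it applies atomically, so moving an alias from one index to another can happen in a single request. A minimal sketch of the request body such a DAG task might send, assuming a hypothetical alias name (`image`) and hypothetical index names (`image-proportional`, `image-full`); none of these names come from the plan itself:

```python
def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build a request body for Elasticsearch's POST /_aliases endpoint.

    Elasticsearch applies all actions in one request atomically, so
    clients see either the old alias target or the new one.
    """
    if old_index == new_index:
        raise ValueError("old_index and new_index must differ")
    return {
        "actions": [
            # Detach the alias from the index it currently points at...
            {"remove": {"index": old_index, "alias": alias}},
            # ...and attach it to the newly built index in the same request.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = build_alias_swap("image", "image-proportional", "image-full")
```

The official Python client exposes this endpoint as `indices.update_aliases`; how the body is passed differs between client versions.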

There was some discussion of this in #1107, including from @zackkrida and @sarayourfriend:

(Zack)
Feels quite important to me and like it may be a bit under-explained by this proposal. As written, the proposal seems to suggest that the DAGs would be used to allow for creating a new staging index that replaces the default, "live" index the staging API is pointed at. Is that correct?

Alternatively, I can see a lot of value in having DAGs that allow for creating and updating staging indices with different qualities, but having a mechanism that allows developers to switch between them much more rapidly, perhaps even a query param-based solution (which would allow, for example, to switch indices via a checkbox on the staging.openverse.org/preferences page).

To put this all another way, I'm wondering if requirement #5 should be a bit more specific in terms of what "easily" means.

...

(Madison)
That's a great point, but I actually think this would be a good thing to determine in the implementation plan! It was my intention to have a single alias (like production, at least at the current moment) and allow maintainers to swap between these aliases. However, I do quite like the idea of an API setting which might allow dynamically changing which index is used at query time! That would bring increased complexity to the plan and the project that I wasn't initially considering, but if we scope it out as part of the plan, potentially starting with the rapid swapping but laying out a plan for what changing the index at query time might look like, we could even add that capability down the line outside of the bounds of this specific effort!

...

(Sara)
FWIW, if anyone is looking at this later during implementation planning, this probably doesn't need to be more complicated than a query parameter that determines the index, but we'd need to be careful to juggle things in light of https://docs.openverse.org/projects/proposals/detecting_sensitive_textual_content/20230308-implementation_plan_filtering_and_designating_results_with_sensitive_textual_content.html
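The query-parameter option could reduce to validating a requested index name against an allowlist before it reaches Elasticsearch, falling back to a default when no index is requested. A minimal sketch, assuming a hypothetical `index` query parameter and hypothetical index names; how this interacts with the sensitive-content filtering linked above would still need to be worked out in the plan:

```python
# Hypothetical staging indices; real names would be defined by the plan.
ALLOWED_INDICES = {"image", "image-proportional", "image-full"}
DEFAULT_INDEX = "image"


def resolve_index(query_params: dict) -> str:
    """Choose the index for a search from a hypothetical `index` param.

    Uses the default index when the parameter is missing or empty, and
    rejects names outside the allowlist instead of forwarding them.
    """
    requested = query_params.get("index")
    if not requested:
        return DEFAULT_INDEX
    if requested not in ALLOWED_INDICES:
        raise ValueError(f"unknown index: {requested!r}")
    return requested
```

An allowlist keeps the parameter from becoming a way to query arbitrary indices on the cluster.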

@AetherUnbound added labels May 2, 2023
@AetherUnbound self-assigned this May 2, 2023
@github-project-automation (bot) moved this to 📋 Backlog in Openverse Backlog May 2, 2023
@zackkrida assigned @krysal and unassigned @AetherUnbound May 3, 2023
@AetherUnbound
Collaborator Author

@krysal, since you're working on this, I wanted to point you to this comment on #1154.

It might be something we want to address as part of this IP, or at least in the issues created for it.

@sarayourfriend moved this to Pending proposal in Openverse Discussions Jun 8, 2023
@krysal moved this from 📋 Backlog to 🏗 In progress in Openverse Backlog Jun 20, 2023
@github-project-automation (bot) moved this from 🏗 In progress to ✅ Done in Openverse Backlog Jul 14, 2023
@github-project-automation (bot) moved this from Pending proposal to Accepted in Openverse Discussions Jul 14, 2023
Projects
Archived in project
Status: Accepted