Removal of the ingestion server #3925

Open
8 of 13 tasks
stacimc opened this issue Mar 14, 2024 · 29 comments
Labels
🧭 project: thread An issue used to track a project and its progress

Comments

@stacimc
Collaborator

stacimc commented Mar 14, 2024

@stacimc stacimc added the 🧭 project: thread An issue used to track a project and its progress label Mar 14, 2024
@stacimc stacimc self-assigned this Mar 14, 2024
@stacimc
Collaborator Author

stacimc commented Mar 14, 2024

Adding myself as project lead, at least for now. As discussed at the priority meeting, we will move forward with making the proposal and at least starting work on the implementation plan, with the understanding that implementation may not move forward if this project turns out to be more complicated than we expect.

@stacimc stacimc changed the title Removal of the data refresh server Removal of the ingestion server Mar 19, 2024
@stacimc stacimc moved this from ⌛ Not Started to 🚀 In Kickoff in Openverse Project Tracker Mar 19, 2024
@stacimc
Collaborator Author

stacimc commented Mar 19, 2024

Updated the language from "data refresh server" to "ingestion server" to reflect the current name of the service, in order to be less confusing. We had experimented with renaming the ingestion server in some places in documentation; if this project proceeds we won't have to worry about this confusion anymore regardless :)

Work on the project proposal is underway, as is some early investigation into the feasibility of the project.

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented Apr 15, 2024

The project proposal was approved, and the implementation plan (IP) is awaiting its second approval.

@stacimc
Collaborator Author

stacimc commented Apr 17, 2024

The implementation plan has been merged and approved, and issues have been created under the milestone.

@zackkrida zackkrida moved this from 💬 In RFC to ⏸ On Hold in Openverse Project Tracker Apr 22, 2024
@zackkrida
Member

@stacimc I've moved this to "On Hold" while we determine how to prioritize it: whether we start work on this project now or aim to complete others first.

@stacimc stacimc moved this from ⏸ On Hold to 🚧 In Progress in Openverse Project Tracker Apr 23, 2024
@stacimc
Collaborator Author

stacimc commented Apr 23, 2024

After discussing with @zackkrida we've decided to move this into In Progress and kick off implementation. I'll be picking up the first issue today 🥳

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented May 9, 2024

This project is underway. The first major PR is up for review, with work in progress on the indexer worker image. There is also considerable progress on the infrastructure side with preparing for the indexer workers.

@stacimc
Collaborator Author

stacimc commented May 10, 2024

Noting that https://github.com/WordPress/openverse-infrastructure/pull/871 has been merged and the indexer worker pools are now available!

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented May 28, 2024

Progress was delayed due to AFK. The indexer worker has been drafted here and will be up for review this week.

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented Jun 24, 2024

The indexer worker is fully implemented and we're moving on to the last few implementation issues. One additional issue was added to the milestone after discussion on #4464.

@stacimc
Collaborator Author

stacimc commented Jul 3, 2024

After a conversation with @sarayourfriend yesterday, I'm considering removing the use of the autoscaling group and going back to a plan very similar to my original proposal in the IP for this project, allowing Airflow to directly manage the EC2 instances. The ASG has caused a few problems with error handling and, as noted in inline comments in the PR for implementing the distributed reindex, with retrying individual workers (one of the stated goals of the project, and a situation we've run into in recent memory).

This is not yet definitive, and I will make a PR to change the implementation plan, since this is a big enough change that I want to get approval. However, I'll note that this would not invalidate any work that's been done so far, except for a few in-progress changes in the linked PR (which need work either way!); on the infrastructure side, only the ASG would need to be removed.
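
To make the "retry individual workers" goal concrete, here is a minimal, self-contained sketch in plain Python. It is not the project's code: it uses neither Airflow nor AWS, and every name in it (`split_ranges`, `run_with_retries`, `reindex_all`, the idea of one id range per worker) is a hypothetical stand-in. It only illustrates the behaviour being discussed, where a failed worker is retried on its own without rerunning the workers that already succeeded.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: each worker reindexes one half-open range of record ids.
IdRange = Tuple[int, int]


def split_ranges(total: int, workers: int) -> List[IdRange]:
    """Divide [0, total) into `workers` contiguous, near-equal ranges."""
    base, extra = divmod(total, workers)
    ranges: List[IdRange] = []
    start = 0
    for i in range(workers):
        # The first `extra` workers each take one additional record.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_with_retries(
    job: Callable[[IdRange], None], id_range: IdRange, max_attempts: int
) -> int:
    """Run one worker's job, retrying only that worker. Returns attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            job(id_range)
            return attempt
        except RuntimeError:
            # Give up only after the final attempt; other workers are unaffected.
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")


def reindex_all(
    total: int,
    workers: int,
    job: Callable[[IdRange], None],
    max_attempts: int = 3,
) -> Dict[IdRange, int]:
    """Run every worker's range and report how many attempts each one needed."""
    return {
        id_range: run_with_retries(job, id_range, max_attempts)
        for id_range in split_ranges(total, workers)
    }
```

In this sketch, if the job for one range raises `RuntimeError` on its first attempt, only that range is run again; the returned mapping shows one attempt for every other range.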

@sarayourfriend
Collaborator

Great! I'll look out for the IP change (please ping me there) and will get the infra side ready for you. It should be a simple enough change, with some room for a small refactor I've been wanting to do for a while now: extracting the launch template and security group creation into a separate module, so that one-off instances like the bastion can use it instead of the old user-data approach. Anyway, it should be a quick one to implement on the infra side and unblock live testing with staging.

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented Aug 28, 2024

This project was considerably delayed due to a sequence of AFK and assigned support work/meetups. Work has now been resumed. Importantly the following have been merged:

The final large chunk of implementation, to add the remaining steps, is in progress. Afterward there will be a few cleanup PRs and small pieces, but the major piece of work left will be integration tests.

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented Oct 16, 2024

This project is code complete but has been postponed while we find time to rigorously test the new data refresh in staging. Any issues discovered during testing will then have to be addressed, and the project will be kept open for an extended period, until several production runs have been completed, before we retire the old DAGs.

@openverse-bot
Collaborator

Hi @stacimc, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@stacimc
Collaborator Author

stacimc commented Oct 31, 2024

After tackling a number of small issues encountered during testing, the staging audio data refresh has been run successfully on the production Airflow instance, and the staging image data refresh is underway!

@krysal krysal self-assigned this Nov 6, 2024
@openverse-bot
Collaborator

Hi @krysal, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

@WordPress WordPress deleted a comment from openverse-bot Dec 13, 2024
@krysal
Member

krysal commented Dec 13, 2024

The alter step failed for the image data refresh: it consumed too much of Airflow's memory and degraded the service to the point of bringing it down. We still need to find an alternative to the current parallel tasks.
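
For context on the kind of alternative that might be considered, one generic option is to cap how many tasks run at once instead of launching them all in parallel. The sketch below is plain Python using the standard library's `concurrent.futures`, not Airflow, and it is not a solution chosen by the project. The helper name `run_bounded` and its peak-concurrency counter are hypothetical and exist only to show the cap in action.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_bounded(
    tasks: Sequence[Callable[[], T]], max_parallel: int
) -> Tuple[List[T], int]:
    """Run tasks with at most `max_parallel` in flight at any time.

    Returns the results in input order plus the peak number of tasks that
    were observed running simultaneously.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(task: Callable[[], T]) -> T:
        # Record how many tasks are running so the cap can be verified.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the concurrency cap: extra tasks wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(tracked, tasks))
    return results, state["peak"]
```

The trade-off is the usual one: a lower cap bounds peak memory at the cost of a longer total runtime, so the right value would have to be found by measurement.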

@openverse-bot
Collaborator

Hi @krysal, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

Projects
Status: 🚧 In Progress