Automatically clean up after failed indexing runs (original #402) #1756

Open
obulat opened this issue Apr 21, 2021 · 2 comments
Labels
💻 aspect: code (Concerns the software code in the repository)
🧰 goal: internal improvement (Improvement that benefits maintainers, not users)
🟩 priority: low (Low priority and doesn't need to be rushed)
🧱 stack: catalog (Related to the catalog and Airflow DAGs)

Comments

@obulat
Contributor

obulat commented Apr 21, 2021

This issue has been migrated from the CC Search API repository

Author: aldenstpage
Date: Tue Jan 14 2020
Labels: Hacktoberfest, help wanted, ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued

When an indexing job fails (such as if a node in our Elasticsearch cluster has a full disk, or a bug in indexer-worker halts the process), the incomplete index is left inside of the Elasticsearch cluster, requiring someone to manually delete it. The indexer should detect this condition when the job starts and handle it.

The production index is determined by the image alias. The indexer should delete any index that follows the naming scheme image-<uuid> but is NOT pointed to by this alias.
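
The selection rule above could be sketched roughly as follows. This is only an illustration, not the indexer's actual code: the helper names find_stale_indices and delete_stale_indices are hypothetical, the UUID pattern is a guess at what image-<uuid> looks like, and client is assumed to behave like an elasticsearch-py client whose indices.get and indices.get_alias calls return mappings keyed by index name.

```python
import re

# Assumption: the <uuid> suffix is either hyphenated or 32 bare hex characters.
# Adjust this pattern if the real naming scheme differs.
_UUID = r"[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}"


def find_stale_indices(all_indices, live_indices, alias="image"):
    """Return indices named <alias>-<uuid> that the alias does not point to."""
    pattern = re.compile(rf"^{re.escape(alias)}-{_UUID}$", re.IGNORECASE)
    live = set(live_indices)
    return sorted(
        name for name in all_indices
        if pattern.match(name) and name not in live
    )


def delete_stale_indices(client, alias="image"):
    """Delete leftover indices when a job starts (hypothetical helper)."""
    live = set(client.indices.get_alias(name=alias).keys())
    if not live:
        # Refuse to delete anything if the alias resolves to no index,
        # since the production index could not be told apart from leftovers.
        return []
    candidates = client.indices.get(index=f"{alias}-*").keys()
    stale = find_stale_indices(candidates, live, alias)
    for name in stale:
        client.indices.delete(index=name)
    return stale
```

Keeping the name filtering in a pure function makes the rule easy to test without a cluster; only delete_stale_indices touches the client.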


Original Comments:

hedonhermdev commented on Sat Feb 22 2020:

Can I work on this issue?
source

CodeMonk263 commented on Sun Feb 23 2020:

Can I work on this issue?

source

kgodey commented on Tue Feb 25 2020:

@hedonhermdev go ahead. @CodeMonk263 please find another issue to work on since @hedonhermdev commented first.

DantrazTrev commented on Sat Feb 29 2020:

@hedonhermdev are you still working on this issue?

hedonhermdev commented on Sat Feb 29 2020:

No.

source

DantrazTrev commented on Sat Feb 29 2020:

Can I take it over? @aldenstpage

kgodey commented on Tue Mar 03 2020:

Go ahead @DantrazTrev

tushar912 commented on Fri Oct 02 2020:

@DantrazTrev are you still working on this?

kgodey commented on Fri Oct 02 2020:

@tushar912 it's been a few months since @DantrazTrev's post, so I think you can go ahead and work on this.

tushar912 commented on Fri Oct 02 2020:

Ok

tushar912 commented on Tue Oct 06 2020:

Here is how I understand this issue. The main indexing job is done by indexer.py in ingestion_server. The TableIndexer class contains a method, _index_table, which checks whether the database is in sync with the index and replicates if not. There are two methods of indexing: reindex, which creates a new index and points the live alias at it, and update, which updates the existing index. Currently, if the index is not created successfully during reindex, it still persists in the cluster, so the job is to delete the index if indexing fails. @kgodey or @aldenstpage, please tell me if I have understood correctly.
source

tushar912 commented on Tue Oct 06 2020:

Also, I am thinking of modifying the existing consistency_check method and adding it to reindex so that the index is deleted if it was not indexed properly. Am I on the right track?
source
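
The approach described in the two comments above (delete the new index if reindex fails or the consistency check does not pass) might look something like the sketch below. It is a rough outline under stated assumptions, not the ingestion server's code: reindex_with_cleanup and all four callables are hypothetical stand-ins, and the real reindex and consistency_check methods may have different signatures.

```python
import uuid


def reindex_with_cleanup(create_and_fill, is_consistent, promote,
                         delete_index, alias="image"):
    """Build a new index and delete it if it fails before promotion.

    All four callables are hypothetical stand-ins:
      create_and_fill(name) -> creates and populates the index
      is_consistent(name)   -> True if the index matches the database
      promote(name)         -> points the live alias at the index
      delete_index(name)    -> removes the index from the cluster
    """
    # Assumption: the suffix is a bare hex UUID, e.g. image-<32 hex chars>.
    new_index = f"{alias}-{uuid.uuid4().hex}"
    try:
        create_and_fill(new_index)
        if not is_consistent(new_index):
            raise RuntimeError(f"{new_index} failed the consistency check")
    except Exception:
        # The index has not been promoted yet, so removing it cannot
        # affect the index the alias currently points to.
        delete_index(new_index)
        raise
    promote(new_index)
    return new_index
```

Promotion happens outside the try block on purpose: once the alias may point at the new index, automatic deletion is no longer clearly safe.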

@sarayourfriend
Collaborator

sarayourfriend commented Dec 16, 2022

@WordPress/openverse-catalog Would this fit into the data refresh DAG workflow? Does this issue need to be moved to the catalog repo?

@sarayourfriend sarayourfriend added 🟩 priority: low Low priority and doesn't need to be rushed 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users labels Dec 16, 2022
@AetherUnbound
Collaborator

Definitely! I'll move it over.

@AetherUnbound AetherUnbound transferred this issue from WordPress/openverse-api Dec 16, 2022
@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 24, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023