Automatically clean up after failed indexing runs (original #402) #1756

obulat · 2021-04-21T12:14:04Z

This issue has been migrated from the CC Search API repository

Author: aldenstpage
Date: Tue Jan 14 2020
Labels: Hacktoberfest,help wanted,✨ goal: improvement,🏷 status: label work required,🙅 status: discontinued

When an indexing job fails (for example, because a node in our Elasticsearch cluster has a full disk, or because a bug in indexer-worker halts the process), the incomplete index is left in the Elasticsearch cluster until someone deletes it manually. The indexer should detect this condition when a job starts and clean it up.

The production index is determined by the image alias. The indexer should delete any index that follows the naming scheme image-<uuid> and is NOT pointed to by this alias.
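A minimal sketch of that startup check, for illustration only. It assumes `es` is an elasticsearch-py client (only `indices.get_alias`, `indices.get`, and `indices.delete` are used), and the helper names and the regular expression for the `<uuid>` suffix are hypothetical, not taken from the indexer code.

```python
import re


def stale_index_pattern(alias="image"):
    # Assumption: the <uuid> suffix is 32 hex characters, optionally hyphenated.
    return re.compile(rf"^{re.escape(alias)}-[0-9a-fA-F-]{{32,36}}$")


def find_stale_indices(all_indices, live_indices, alias="image"):
    """Return names that match <alias>-<uuid> but are not behind the alias."""
    pattern = stale_index_pattern(alias)
    live = set(live_indices)
    return sorted(n for n in all_indices if pattern.match(n) and n not in live)


def delete_stale_indices(es, alias="image"):
    """Delete leftover indices; `es` is assumed to be an elasticsearch-py client."""
    # Index names currently behind the alias (the production index).
    live = es.indices.get_alias(name=alias).keys()
    # Every index whose name starts with "<alias>-".
    candidates = es.indices.get(index=f"{alias}-*").keys()
    stale = find_stale_indices(candidates, live, alias)
    for name in stale:
        es.indices.delete(index=name)
    return stale
```

If the alias does not exist, `get_alias` raises an error in elasticsearch-py rather than returning an empty result, so this sketch stops before deleting anything. That seems like the safer default when there is no known production index.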

Original Comments:

hedonhermdev commented on Sat Feb 22 2020:

Can I work on this issue?

CodeMonk263 commented on Sun Feb 23 2020:

Can I work on this issue?


kgodey commented on Tue Feb 25 2020:

@hedonhermdev go ahead. @CodeMonk263 please find another issue to work on since @hedonhermdev commented first.

DantrazTrev commented on Sat Feb 29 2020:

@hedonhermdev are you still working on this issue?

hedonhermdev commented on Sat Feb 29 2020:

No.


DantrazTrev commented on Sat Feb 29 2020:

Can I take it over?
@aldenstpage

kgodey commented on Tue Mar 03 2020:

Go ahead @DantrazTrev

tushar912 commented on Fri Oct 02 2020:

@DantrazTrev are you still working on this?

kgodey commented on Fri Oct 02 2020:

@tushar912 it's been a few months since @DantrazTrev's post, so I think you can go ahead and work on this.

tushar912 commented on Fri Oct 02 2020:

Ok

tushar912 commented on Tue Oct 06 2020:

The way I understand this issue is as follows. The main indexing job is done by indexer.py in ingestion_server. The TableIndexer class has a method, _index_table, which checks whether the database is in sync with the index and replicates if it is not. There are two methods of indexing: reindex, which creates a new index and points the live alias at it, and update, which updates the existing index. Currently, if the index is not created successfully during reindex, it still persists in the cluster, so the job is to delete the index if indexing fails. @kgodey or @aldenstpage, please tell me whether I have understood this correctly.
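A rough sketch of the "delete the index if indexing fails" idea, under stated assumptions: `es` is an elasticsearch-py client, `build_index` is a hypothetical callable standing in for whatever creates and fills the new index, and `reindex_with_cleanup` is an illustrative name rather than an existing method of TableIndexer. Switching the alias to the new index is left out.

```python
import uuid


def reindex_with_cleanup(es, build_index, alias="image"):
    """Build a new <alias>-<uuid> index and delete it again if building fails.

    `build_index` is a hypothetical callable that receives the new index name.
    """
    new_index = f"{alias}-{uuid.uuid4().hex}"
    try:
        build_index(new_index)
    except Exception:
        # ignore_unavailable avoids a second error if the index was never created.
        es.indices.delete(index=new_index, ignore_unavailable=True)
        raise  # let the original failure propagate to the caller
    return new_index
```

This only helps when the failure surfaces as a Python exception. It would not cover a worker that is killed outright, which is why the issue asks for a check when the job starts.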

tushar912 commented on Tue Oct 06 2020:

Also, I am thinking of modifying the existing consistency_check method and adding it to reindex so that the index is deleted if it was not indexed properly. Am I on the right track?


sarayourfriend · 2022-12-16T04:59:35Z

@WordPress/openverse-catalog Would this fit into the data refresh DAG workflow? Does this issue need to be moved to the catalog repo?

AetherUnbound · 2022-12-16T19:29:35Z

Definitely! I'll move it over.

sarayourfriend added the 🟩 priority: low, 💻 aspect: code, and 🧰 goal: internal improvement labels Dec 16, 2022

AetherUnbound transferred this issue from WordPress/openverse-api Dec 16, 2022

obulat added the 🧱 stack: catalog label Feb 24, 2023

github-project-automation bot added this to Openverse Backlog Apr 17, 2023

github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023

obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023

obulat mentioned this issue Nov 24, 2023

Clean up all previous indexes after successfully switching to a new one during data refresh #1481

Closed

1 task
