This GH issue is to support the implementation of the search feature (#825) and was discussed during the search call on Nov. 7/8.
There are times when the ES index for a project needs to be recreated: when the index is first populated, when it has gone out of sync with the platform data and needs to be refreshed, when a data migration occurs and new (or existing) fields need to be (re)indexed, etc. To support this, a reindex management command needs to be implemented.
This command accepts a project slug as its argument and performs the following:
1. Create an empty ES index for the specified project. The project's existing index (if any) keeps functioning and serving search queries.
2. Index all of the project's records and resource metadata by pulling the data from the database and then pushing it to the ES cluster via its bulk API.
3. Switch the new index with the active index using ES's index aliasing feature, then drop the now inactive, outdated index.
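The steps above can be sketched as the request bodies the command would send. This is only a rough illustration, not a settled design: the function names, the `<slug>-<timestamp>` index naming scheme, and the assumption that each document carries an `id` key are all hypothetical. The payload shapes follow the ES bulk API (one action line plus one source line per document) and the `_aliases` API (`remove` and `add` actions applied in one request).

```python
import json
import time


def new_index_name(project_slug, now=None):
    """Hypothetical naming scheme: <slug>-<unix timestamp>."""
    stamp = int(now if now is not None else time.time())
    return f"{project_slug}-{stamp}"


def bulk_body(index_name, docs):
    """Build the NDJSON payload for the ES bulk API.

    Each document becomes two lines: an `index` action line and the source.
    Assumes every doc is a dict with an `id` key (an assumption of this sketch).
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


def alias_swap_actions(alias, new_index, old_index=None):
    """Build the body for POST /_aliases that repoints the alias in one request."""
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

Because search queries go through the alias, the old index keeps serving them until the swap request is applied; the old index can then be deleted in a separate request.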
When all projects need to be reindexed, the idea is to reindex a small project first, test that the reindexing works as expected, reindex another project if needed, then reindex the rest of the projects via an ad hoc script that calls the command for the remaining projects sequentially.
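A minimal sketch of such an ad hoc script follows. It assumes the command ends up being invoked as `manage.py reindex <slug>`; that command name and the `reindex_all` helper are hypothetical. It stops at the first failing project so the problem can be inspected before continuing.

```python
import subprocess
import sys


def reindex_all(slugs, run=None):
    """Run the reindex command for each slug in order; stop at the first failure.

    Returns the list of slugs that were reindexed successfully.
    """
    if run is None:
        # Assumption: the command is exposed as `manage.py reindex <slug>`.
        def run(slug):
            return subprocess.run([sys.executable, "manage.py", "reindex", slug]).returncode

    done = []
    for slug in slugs:
        if run(slug) != 0:
            break
        done.append(slug)
    return done
```

The `run` parameter exists only so the loop can be exercised without a real cluster, for example `reindex_all(["a", "b"], run=lambda slug: 0)`.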
Question: How do we ensure that records created, updated, or deleted while reindexing is ongoing are not lost and do not become stale? One possible solution is to stop the update processing (#908) while a project is being reindexed, with the queue still available to receive data. A record may then be updated in the index twice (once when the reindexing picks up the updated record, and again when the queue is finally processed), but this is not a problem. Not sure, though, about deletions: deleting a record from the index a second time may result in an error.
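One way to think about the double-processing concern is to make every queued operation idempotent, so that replaying the queue after a reindex leaves the index unchanged. The sketch below uses a plain dict as a stand-in for the ES index; the event shape (`op`, `id`, `doc`) and the function names are hypothetical and not taken from #908.

```python
def apply_event(index, event):
    """Apply one queued event to a dict standing in for the ES index."""
    op, record_id = event["op"], event["id"]
    if op == "delete":
        # pop() with a default makes deleting an absent record a no-op.
        index.pop(record_id, None)
    elif op in ("create", "update"):
        # Writing the full document overwrites any earlier copy.
        index[record_id] = event["doc"]
    else:
        raise ValueError(f"unknown op: {op}")


def drain(index, queue):
    """Apply all queued events in order and return the index."""
    for event in queue:
        apply_event(index, event)
    return index
```

Against a real cluster, the equivalent would presumably be to treat a "not found" result on delete as success rather than as an error.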