Document how to reindex a tsdb data stream #99176

Closed
martijnvg opened this issue Sep 4, 2023 · 3 comments · Fixed by #99476
Labels
>docs General docs changes :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Docs Meta label for docs team

Comments

@martijnvg
Member

Reindexing a tsdb data stream is more challenging than reindexing a regular data stream. When a new tsdb data stream is created, the start and end time settings of its first backing index are blindly set to $now-2h and $now+2h. The time ranges covered by the backing indices of the existing tsdb data stream may overlap with this window or fall entirely outside it. As a result, reindexing directly from the old tsdb data stream into the new one only works for documents whose timestamp falls between $now-2h and $now+2h.
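
To illustrate the problem, here is a minimal Python sketch of the time window check. It is not Elasticsearch code: the function name and the sample timestamps are made up for illustration, and it assumes the lower bound is inclusive and the upper bound is exclusive.

```python
from datetime import datetime, timedelta, timezone

# The 2h window on each side of "now" described above.
WINDOW = timedelta(hours=2)


def accepted_by_new_backing_index(ts: datetime, now: datetime) -> bool:
    """Return True if ts falls inside [now - 2h, now + 2h).

    Assumption: start is inclusive and end is exclusive.
    """
    return now - WINDOW <= ts < now + WINDOW


# Hypothetical creation time of the new data stream.
now = datetime(2023, 9, 4, 12, 0, tzinfo=timezone.utc)

# Hypothetical @timestamp values of documents in the old data stream.
old_docs = [
    datetime(2023, 8, 1, 0, 0, tzinfo=timezone.utc),   # weeks old
    datetime(2023, 9, 4, 9, 59, tzinfo=timezone.utc),  # one minute too old
    datetime(2023, 9, 4, 11, 0, tzinfo=timezone.utc),  # inside the window
]

results = [accepted_by_new_backing_index(ts, now) for ts in old_docs]
print(results)
```

Only the last document would land in the new backing index; the other two fall outside the window, which is why a plain reindex is not enough.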

Given that reindexing a tsdb data stream is possible, we should document how to do it. Currently, no documentation on reindexing a tsdb data stream exists.

The process looks something like this:

  • Create an index template that matches only the new data stream that will contain the reindexed data. Otherwise other data streams may be affected. This index template should contain the new mappings and index settings that should be applied.
  • Update the template to set explicit index.time_series.start_time and index.time_series.end_time index settings. The start and end times should be based on the lowest and highest @timestamp values in the data stream to be reindexed. This way the first backing index is fixed to cover all the data in the data stream to be reindexed.
  • Update the template to set the index.number_of_shards index setting to the sum of the primary shards of all backing indices of the data stream to be reindexed.
  • Update the template to set index.number_of_replicas to zero and unset the index.lifecycle.name index setting.
  • Start the reindex operation.
  • After the reindex has completed, remove the index.time_series.start_time and index.time_series.end_time index settings from the template and set index.number_of_replicas, index.number_of_shards and index.lifecycle.name back to their original values.
  • Invoke the rollover API without any conditions set. The data stream should now be ready to accept recent data.
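
The steps above can be sketched roughly as follows. This is only an outline under stated assumptions, not a tested procedure: the data stream names, template name, lifecycle policy name, timestamps and shard counts are hypothetical, the code only builds request bodies and sends nothing, and the one second added to the end time assumes that index.time_series.end_time is an exclusive bound.

```python
import json
from datetime import datetime, timedelta, timezone


def iso(ts: datetime) -> str:
    """Format a timestamp as an ISO 8601 UTC string."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def reindex_settings(original: dict, min_ts: datetime, max_ts: datetime,
                     primaries: list) -> dict:
    """Temporary template settings to use while the reindex runs."""
    settings = dict(original)
    settings.pop("index.lifecycle.name", None)       # unset the lifecycle policy
    settings["index.number_of_replicas"] = 0
    settings["index.number_of_shards"] = sum(primaries)
    settings["index.time_series.start_time"] = iso(min_ts)
    # Assumption: end_time is exclusive, so pad past the highest timestamp.
    settings["index.time_series.end_time"] = iso(max_ts + timedelta(seconds=1))
    return settings


def restored_settings(temporary: dict, original: dict) -> dict:
    """Settings to put back in the template after the reindex completed."""
    settings = dict(temporary)
    settings.pop("index.time_series.start_time", None)
    settings.pop("index.time_series.end_time", None)
    for key in ("index.number_of_replicas", "index.number_of_shards",
                "index.lifecycle.name"):
        if key in original:
            settings[key] = original[key]
        else:
            settings.pop(key, None)
    return settings


# Hypothetical original settings of the new data stream's template.
original = {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.lifecycle.name": "metrics-policy",
}
# Hypothetical lowest / highest @timestamp and primary shard counts
# of the backing indices of the data stream to be reindexed.
min_ts = datetime(2023, 8, 1, 0, 0, 0, tzinfo=timezone.utc)
max_ts = datetime(2023, 9, 1, 12, 30, 15, tzinfo=timezone.utc)
temporary = reindex_settings(original, min_ts, max_ts, primaries=[1, 1, 2])
restored = restored_settings(temporary, original)


def template_body(settings: dict) -> dict:
    # Matches only the new data stream, so other data streams are unaffected.
    return {"index_patterns": ["metrics-new"], "data_stream": {},
            "template": {"settings": settings}}


steps = [
    ("PUT", "/_index_template/metrics-new-template", template_body(temporary)),
    ("POST", "/_reindex", {"source": {"index": "metrics-old"},
                           "dest": {"index": "metrics-new", "op_type": "create"}}),
    ("PUT", "/_index_template/metrics-new-template", template_body(restored)),
    ("POST", "/metrics-new/_rollover", None),
]
for method, path, body in steps:
    print(method, path, json.dumps(body))
```

The new mappings would also go into the template body; they are left out here to keep the sketch short.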
@martijnvg martijnvg added >docs General docs changes :StorageEngine/TSDB You know, for Metrics labels Sep 4, 2023
@elasticsearchmachine elasticsearchmachine added Team:Docs Meta label for docs team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Sep 4, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@martijnvg
Member Author

Related issue #98157
