Add index.look_back_time setting for tsdb data streams #98463

Closed
martijnvg opened this issue Aug 15, 2023 · 3 comments · Fixed by #98518
Assignees
Labels
>enhancement, :StorageEngine/TSDB, Team:Analytics

Comments

@martijnvg
Member

martijnvg commented Aug 15, 2023

Currently, the index.look_ahead_time index setting controls the index.time_series.end_time index setting when rolling over a tsdb data stream. The index.look_ahead_time setting defaults to 2 hours, which means that by default the index.time_series.end_time index setting of the new backing index will be 2 hours later than the index.time_series.start_time index setting. The index.time_series.start_time and index.time_series.end_time index settings control which @timestamp values are allowed to be indexed.

Additionally, the index.look_ahead_time setting is used to generate the index.time_series.start_time and index.time_series.end_time index settings for the first backing index when creating a new data stream. With the default index.look_ahead_time, the index.time_series.start_time index setting is set to current time - 2 hours and the index.time_series.end_time index setting is set to current time + 2 hours.
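To make the window arithmetic concrete, here is a minimal Python sketch of how the first backing index's accepted @timestamp range follows from index.look_ahead_time as described above. The function names are hypothetical and this is not the Elasticsearch implementation; treating the end of the window as exclusive is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def first_backing_index_window(now, look_ahead=timedelta(hours=2)):
    """Return (start_time, end_time) for the first backing index.

    Mirrors the description above: both bounds are derived from
    index.look_ahead_time (default 2 hours).
    """
    start_time = now - look_ahead  # index.time_series.start_time
    end_time = now + look_ahead    # index.time_series.end_time
    return start_time, end_time


def accepts(timestamp, window):
    """True if a document's @timestamp falls inside the window.

    Assumption: start is inclusive and end is exclusive.
    """
    start_time, end_time = window
    return start_time <= timestamp < end_time


if __name__ == "__main__":
    now = datetime(2023, 8, 15, 12, 0, tzinfo=timezone.utc)
    window = first_backing_index_window(now)
    midnight = now.replace(hour=0, minute=0)
    print(window[0].isoformat(), window[1].isoformat())
    print(accepts(midnight, window))  # False: midnight is before now - 2h
```

With a data stream created at noon, a document stamped at midnight that day falls outside the window.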

However, for indexing the initial data, setting the index.time_series.start_time index setting to 2 hours in the past sometimes creates too small a time window, for example when data gets indexed starting from midnight that day.

The idea is to add an index.look_back_time setting that controls how the index.time_series.start_time index setting is generated when creating the data stream (for the first backing index). This allows creating a first backing index that accepts data that is earlier than current time - $index.look_ahead_time, without affecting how the index.time_series.start_time and index.time_series.end_time index settings are generated during rollover.

I think a good default for the index.look_back_time setting would be 24 hours, meaning that by default the index.time_series.start_time index setting for the first backing index of a new data stream would be current time - 24 hours.
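As a rough illustration of the proposal, the sketch below computes the first backing index's window with a separate look-back value and shows how the setting might appear alongside index.look_ahead_time in index settings. The function name and the `template_settings` dict are hypothetical, the 24-hour default is the value suggested in this issue, and the exclusive end bound is an assumption.

```python
from datetime import datetime, timedelta, timezone


def proposed_first_window(
    now,
    look_back=timedelta(hours=24),   # proposed index.look_back_time default
    look_ahead=timedelta(hours=2),   # existing index.look_ahead_time default
):
    """Return (start_time, end_time) for the first backing index.

    Only the start bound changes compared to today's behavior; the end
    bound still comes from index.look_ahead_time.
    """
    return now - look_back, now + look_ahead


# Hypothetical index settings for a tsdb data stream template.
template_settings = {
    "index.mode": "time_series",
    "index.look_ahead_time": "2h",
    "index.look_back_time": "24h",
}

if __name__ == "__main__":
    now = datetime(2023, 8, 15, 12, 0, tzinfo=timezone.utc)
    start_time, end_time = proposed_first_window(now)
    midnight = now.replace(hour=0, minute=0)
    # Data from midnight that day now fits in the first backing index.
    print(start_time <= midnight < end_time)  # True
```

Because rollover is not touched by this sketch, later backing indices would still derive their windows from index.look_ahead_time alone.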

Relates to elastic/integrations#7345

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine added the Team:Analytics label Aug 15, 2023
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Aug 16, 2023
This change adds an `index.look_back_time` index setting that sets the `index.time_series.start_time` setting for the first backing index when a data stream is created.

This allows accepting older data during initial indexing without changing the `index.look_ahead_time` setting. The `index.look_ahead_time` setting also controls the `index.time_series.end_time` setting, so changing it would affect rollovers as well.

The default for `index.look_back_time` is `2h`, which means documents with an `@timestamp` up to 2 hours before the creation of the data stream are allowed to be indexed. This matches the behavior without this change, where `index.look_ahead_time` is used to set `index.time_series.start_time` of the first backing index.

Closes elastic#98463
martijnvg self-assigned this Aug 16, 2023
@lalit-satapathy

The look back setting is useful for metric integrations that have some data before the $now - 2h period. The expectation is that data volume would be lower than for the data that is to be received for $now + 2h and $now - 2h.

Thanks @martijnvg,
Some integration packages can potentially use this flag, particularly packages that have a known delayed arrival of metrics and packages that ingest fewer metrics in general. One example is AWS billing, for which we have currently disabled TSDB.

Let's also keep a long-term discussion open in case we see dropped metrics because they are being queued at the source. This could impact all packages.

@martijnvg
Member Author

@lalit-satapathy I've opened #99343 to track this problem.

martijnvg added a commit that referenced this issue Sep 8, 2023
Closes #98463