Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time series index settings #297

Closed
wants to merge 1 commit into from

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Mar 15, 2022

What does this PR do?

Checks that settings for time series indexes are supported.

Settings needed:

  • index.routing_path in the index settings. This routing path must be a keyword dimension.
  • index_mode in the data stream settings.

Why is it important?

To improve support for time series data in the context of elastic/elasticsearch#74660.

Checklist

Related issues

@jsoriano jsoriano self-assigned this Mar 15, 2022
@elasticmachine
Copy link

elasticmachine commented Mar 15, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-04-04T09:41:36.280+0000

  • Duration: 2 min 5 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Comment on lines +19 to +20
data_stream:
index_mode: "time_series"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about adding this data_stream.index_mode setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path is configured to a dimension field of type keyword. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.

Another option could be to make Fleet to automatically enable this mode if the index meets the requirements, that is if it has dimension fields, and it has an index.routing_path.

@mtojek @joshdover opinions?

Copy link
Member Author

@jsoriano jsoriano Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to install a data stream with these settings, and it fails with:

Error: can't install the package: can't install the package: could not install package; API status code = 500; response body = {"statusCode":500,"error":"Internal Server Error","message":"illegal_argument_exception: [illegal_argument_exception] Reason: [index.routing_path] requires [index.mode=time_series]"}

So it seems that current fleet would use index_template.settings but not index_template.data_stream.

We may need to check the required version to enable these settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about adding this data_stream.index_mode setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path is configured to a dimension field of type keyword. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.

I can comment on this one. I'm pretty sure we have a lot of gaps in the validation, but at some point, it helps developers discover what are available properties (index_mode with value time _series), even though it doesn't validate all dependent places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions:

  • Why add a data_stream.index_mode field rather than using the existing index_template.settings.index.mode?
  • What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?
    • If we prefer the latter (which would be my preference), I think we should require the user to set the index.mode explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why add a data_stream.index_mode field rather than using the existing index_template.settings.index.mode?

It seems that after elastic/elasticsearch#82621, it is recommended to use data_stream.index_mode. The setting under the index template settings seems to also require to manually set the start and end time of each index.

@martijnvg can probably confirm what setting we should use.

What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?

  • If we prefer the latter (which would be my preference), I think we should require the user to set the index.mode explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.

Not sure what would be the best experience. What concerns me here is that there are three settings that are strongly dependant, a developer needs to use dimensions, set one of these dimensions as routing path and enable time series mode to actually enable time series. I can imagine that a developer can forget of any of these things.

I think that dimensions validation belongs to the spec, there we can statically check if the dimensions and the routing path are valid. But at this point the index mode would be redundant, the developer most likely wants to use it (actually it doesn't seem possible to use a routing path without the time series mode, see #297 (comment)).
I guess that adding another static check for this in the spec is preferred to adding more magic to Kibana.
One problem of this approach is that packages enabling the time series mode will only work with versions of the stack supporting these settings. We make harder for developers to support a broad range of stack versions.

Another, more explicit option, could be to add to the data stream manifest something like this:

time_series:
  enabled: true
  routing_path: elastic.agent.id

This would need to be explicitly interpreted by Kibana and applied to the index template.

There are two big advantages of this:

  • It is more explicit for developers.
  • It is more backwards compatible, older versions of the stack could continue using these packages, but without time series features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to also require to manually set the start and end time of each index

This is not required, and when setting index_mode to time_series inside data stream snippet of a template then start and end time index settings are automatically set.

Only the the routing_path index setting needs to be additionally configured as index setting inside the template.

time_series:
enabled: true
routing_path: elastic.agent.id

I like this idea. Assuming that time_series is a field inside data_stream object of a template?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to also require to manually set the start and end time of each index

This is not required, and when setting index_mode to time_series inside data stream snippet of a template then start and end time index settings are automatically set.

Oh ok, I understood from elastic/elasticsearch#82621 that this is required, and when defining it in the index settings then it to be set manually. In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?

time_series:
enabled: true
routing_path: elastic.agent.id

I like this idea. Assuming that time_series is a field inside data_stream object of a template?

Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?

Yes, defining index_mode in data stream object would automatically setup the start and end time index settings for the indices that are backing a data stream.

Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.

I see. I'm currently discussing with others to change how tsdb data streams are configured in templates. In the way that you describe. (bringing routing_path index setting into the data stream object of a template)

@jsoriano
Copy link
Member Author

jsoriano commented May 11, 2022

Closing this, we are going to use the following options to enable time series at the end:

elasticsearch:
  index_template:
    settings:
      index.mode: time_series

Enabling it for kubernetes in elastic/integrations#3119.

This doesn't require changes in the spec or in Fleet.

Routing path will be automatically configured by Elasticsearch.

@jsoriano jsoriano closed this May 11, 2022
@martijnvg
Copy link
Member

Routing path will be automatically configured by Elasticsearch.

Yes, we plan to add this. But only for non dynamic fields defined in the template. All fields with type keyword and time_series_dimension set to true defined in the matching template will be included in the index.routing_path setting if this isn't defined already.

In case a template has a dynamic template definition where fields could be dynamically mapped to be keyword fields with time_series_dimension set true then the index.routing_path setting should be manually defined.

@jsoriano
Copy link
Member Author

jsoriano commented May 11, 2022

I haven't found any case for dimensions in dynamic fields.

If we find it, the approach of using the elasticsearch settings in manifests would also work, putting something like this in the integration manifest:

elasticsearch:
  index_template:
    settings:
      index.routing_path:
      - "kubernetes.pod.name"
      - "kubernetes.pod.uid"
      - "kubernetes.labels.some_dynamic_label"
      index.mode: "time_series"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for TSDB in package spec
5 participants