-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time series index settings #297
Conversation
data_stream: | ||
index_mode: "time_series" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about adding this data_stream.index_mode
setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path
is configured to a dimension field of type keyword
. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.
Another option could be to make Fleet to automatically enable this mode if the index meets the requirements, that is if it has dimension fields, and it has an index.routing_path
.
@mtojek @joshdover opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to install a data stream with these settings, and it fails with:
Error: can't install the package: can't install the package: could not install package; API status code = 500; response body = {"statusCode":500,"error":"Internal Server Error","message":"illegal_argument_exception: [illegal_argument_exception] Reason: [index.routing_path] requires [index.mode=time_series]"}
So it seems that current fleet would use index_template.settings
but not index_template.data_stream
.
We may need to check the required version to enable these settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about adding this data_stream.index_mode setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path is configured to a dimension field of type keyword. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.
I can comment on this one. I'm pretty sure we have a lot of gaps in the validation, but at some point, it helps developers discover what are available properties (index_mode
with value time _series
), even though it doesn't validate all dependent places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments/questions:
- Why add a
data_stream.index_mode
field rather than using the existingindex_template.settings.index.mode
? - What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?
- If we prefer the latter (which would be my preference), I think we should require the user to set the
index.mode
explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.
- If we prefer the latter (which would be my preference), I think we should require the user to set the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why add a
data_stream.index_mode
field rather than using the existingindex_template.settings.index.mode
?
It seems that after elastic/elasticsearch#82621, it is recommended to use data_stream.index_mode
. The setting under the index template settings seems to also require to manually set the start and end time of each index.
@martijnvg can probably confirm what setting we should use.
What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?
- If we prefer the latter (which would be my preference), I think we should require the user to set the
index.mode
explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.
Not sure what would be the best experience. What concerns me here is that there are three settings that are strongly dependant, a developer needs to use dimensions, set one of these dimensions as routing path and enable time series mode to actually enable time series. I can imagine that a developer can forget of any of these things.
I think that dimensions validation belongs to the spec, there we can statically check if the dimensions and the routing path are valid. But at this point the index mode would be redundant, the developer most likely wants to use it (actually it doesn't seem possible to use a routing path without the time series mode, see #297 (comment)).
I guess that adding another static check for this in the spec is preferred to adding more magic to Kibana.
One problem of this approach is that packages enabling the time series mode will only work with versions of the stack supporting these settings. We make harder for developers to support a broad range of stack versions.
Another, more explicit option, could be to add to the data stream manifest something like this:
time_series:
enabled: true
routing_path: elastic.agent.id
This would need to be explicitly interpreted by Kibana and applied to the index template.
There are two big advantages of this:
- It is more explicit for developers.
- It is more backwards compatible, older versions of the stack could continue using these packages, but without time series features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to also require to manually set the start and end time of each index
This is not required, and when setting index_mode
to time_series
inside data stream snippet of a template then start and end time index settings are automatically set.
Only the the routing_path index setting needs to be additionally configured as index setting inside the template.
time_series:
enabled: true
routing_path: elastic.agent.id
I like this idea. Assuming that time_series
is a field inside data_stream
object of a template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to also require to manually set the start and end time of each index
This is not required, and when setting
index_mode
totime_series
inside data stream snippet of a template then start and end time index settings are automatically set.
Oh ok, I understood from elastic/elasticsearch#82621 that this is required, and when defining it in the index settings then it to be set manually. In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?
time_series:
enabled: true
routing_path: elastic.agent.idI like this idea. Assuming that
time_series
is a field insidedata_stream
object of a template?
Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?
Yes, defining index_mode in data stream object would automatically setup the start and end time index settings for the indices that are backing a data stream.
Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.
I see. I'm currently discussing with others to change how tsdb data streams are configured in templates. In the way that you describe. (bringing routing_path index setting into the data stream object of a template)
cc20e98
to
f1329bb
Compare
Closing this, we are going to use the following options to enable time series at the end:
Enabling it for kubernetes in elastic/integrations#3119. This doesn't require changes in the spec or in Fleet. Routing path will be automatically configured by Elasticsearch. |
Yes, we plan to add this. But only for non dynamic fields defined in the template. All fields with type In case a template has a dynamic template definition where fields could be dynamically mapped to be keyword fields with |
I haven't found any case for dimensions in dynamic fields. If we find it, the approach of using the
|
What does this PR do?
Checks that settings for time series indexes are supported.
Settings needed:
index.routing_path
in the index settings. This routing path must be a keyword dimension.index_mode
in the data stream settings.Why is it important?
To improve support for time series data in the context of elastic/elasticsearch#74660.
Checklist
test/packages
that prove my change is effective.versions/N/changelog.yml
.Related issues