Allow customizing managed data streams at different levels of granularity #97664
Comments
Pinging @elastic/es-data-management (Team:Data Management)
imho this also relates to #91370
Big +1 on solving this with an ability to reference a "templated" component template name. I have one suggestion on the solution which may make it a bit simpler. I think there will be a slight issue with trying to use … Instead, I'd suggest that we have some ability to name the wildcards in the main template's index pattern (similar to a named regexp capture group) and then reference those as variables in the component template names:
index_patterns:
- logs-(*:dataset)-(*:namespace)
composed_of:
- logs@custom
- logs-*-{{namespace}}@custom
- logs-{{dataset}}@custom
- logs-{{dataset}}-{{namespace}}@custom
I think this is simpler, more obvious, and less tied to any specific convention. It also has the nice side benefit of constraining the possibilities to strings that appear in the actual name of the index/data stream, rather than fields in the document that may not be part of the index name.
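To make the named-capture idea concrete, here is a minimal Python sketch of how such a pattern could be resolved for one data stream. It is an illustration under assumptions, not how Elasticsearch implements anything: the `(*:name)` and `{{name}}` syntax comes from the proposal above, `compile_pattern` and `resolve_composed_of` are hypothetical helpers, and we assume a captured wildcard cannot contain `-` (datasets and namespaces in the data stream naming scheme cannot).

```python
import re
from typing import List, Optional

# Matches a named wildcard such as "(*:dataset)" in the proposed syntax.
NAMED_WILDCARD = re.compile(r"\(\*:(\w+)\)")
# Matches a "{{name}}" placeholder in a component template name.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def _literal(text: str) -> str:
    # Escape literal text, but keep a bare '*' as a match-anything wildcard.
    return ".*".join(re.escape(part) for part in text.split("*"))


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate e.g. 'logs-(*:dataset)-(*:namespace)' into a regex with named groups."""
    regex, pos = "", 0
    for m in NAMED_WILDCARD.finditer(pattern):
        regex += _literal(pattern[pos:m.start()])
        # Assumption: a captured segment is non-empty and contains no '-'.
        regex += f"(?P<{m.group(1)}>[^-]+)"
        pos = m.end()
    return re.compile(regex + _literal(pattern[pos:]))


def resolve_composed_of(pattern: str, composed_of: List[str],
                        data_stream: str) -> Optional[List[str]]:
    """Return concrete component template names, or None if the pattern doesn't match."""
    match = compile_pattern(pattern).fullmatch(data_stream)
    if match is None:
        return None
    captured = match.groupdict()
    # A placeholder that was not captured raises KeyError instead of resolving silently.
    return [PLACEHOLDER.sub(lambda p: captured[p.group(1)], name) for name in composed_of]


names = resolve_composed_of(
    "logs-(*:dataset)-(*:namespace)",
    ["logs@custom", "logs-*-{{namespace}}@custom",
     "logs-{{dataset}}@custom", "logs-{{dataset}}-{{namespace}}@custom"],
    "logs-nginx.access-prod",
)
print(names)
# ['logs@custom', 'logs-*-prod@custom', 'logs-nginx.access@custom', 'logs-nginx.access-prod@custom']
```

Because values only come from the matched name, a placeholder can never resolve to something that is not part of the data stream name, which is the side benefit mentioned above.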
@BBQigniter do you think this will fully solve the problems described in #91370, or is there more we need to accommodate?
@joshdover not completely sure, but your proposal looks good to me :)
Seems like a much better and simpler idea compared to relying on constant_keyword fields! Love it! From what I can tell, these are some aspects of #91370 that this proposal wouldn't tackle:
Fleet adding an explicit component template would work. Another option would be to make the dotted part of the dataset part of the pattern, so you could do something like this (not sure I like the names I used, but you get the idea):
index_patterns:
- logs-(*:dataset_prefix).(*:dataset_suffix)-(*:namespace)
composed_of:
- logs@custom
- logs-*-{{namespace}}@custom
- logs-{{dataset_prefix}}@custom
- logs-{{dataset_prefix}}.{{dataset_suffix}}@custom
- logs-{{dataset_prefix}}.{{dataset_suffix}}-{{namespace}}@custom
I've wondered whether the two-part dataset should be part of the DSNS convention or not; we use this pattern fairly consistently, though not everywhere.
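A quick way to see the limitation raised in the next comment is to try the dotted variant against a few names. The Python regex below is a hand-written stand-in for `logs-(*:dataset_prefix).(*:dataset_suffix)-(*:namespace)`; the rule that the prefix stops at the first `.` is our assumption, and Python's `{name}` formatting stands in for the proposed `{{name}}` placeholders.

```python
import re
from typing import List, Optional

# Hand-written stand-in for 'logs-(*:dataset_prefix).(*:dataset_suffix)-(*:namespace)'.
# Assumption: the prefix ends at the first '.', and no segment contains '-'.
DOTTED = re.compile(
    r"logs-(?P<dataset_prefix>[^.-]+)\.(?P<dataset_suffix>[^-]+)-(?P<namespace>[^-]+)"
)

# str.format placeholders ({name}) stand in for the proposed {{name}} syntax.
COMPOSED_OF = [
    "logs@custom",
    "logs-*-{namespace}@custom",
    "logs-{dataset_prefix}@custom",
    "logs-{dataset_prefix}.{dataset_suffix}@custom",
    "logs-{dataset_prefix}.{dataset_suffix}-{namespace}@custom",
]


def resolve(data_stream: str) -> Optional[List[str]]:
    match = DOTTED.fullmatch(data_stream)
    if match is None:
        # The index template would not apply to this data stream at all.
        return None
    return [name.format(**match.groupdict()) for name in COMPOSED_OF]


print(resolve("logs-nginx.access-default"))
print(resolve("logs-generic-default"))  # None: the dataset has no '.'
```

Data streams whose dataset has no dot simply fall outside this pattern, so they would need some other extension point.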
As not all datasets have a prefix and a suffix separated by a dot, the … But either way, it seems that placeholders in component template references could also be used to add extension points to all data streams of an integration.
++ on moving forward with the placeholder approach. It won't solve every problem, but I think it will solve quite a few. @dakrone, it would be great to get your feedback on this.
Thanks for bringing this up, Felix, and thanks to everyone else for the discussion so far. We met today as a team to discuss this. We have a couple of reservations and some thoughts that I'll try to share.

First, the proposed solution of having placeholders where wildcards are essentially "captured" (the …

Second, the other option that I currently see would be for us to use a naming scheme for customizing component templates; for example, we'd change all of our …

The challenging part of the second solution is that we run into a composition problem when a user wants to change a particular attribute of a data stream. For example, imagine a user who wants to change the "global" data stream configuration to set a project-level retention for all …

I don't think the placeholder approach meets our needs without introducing unacceptable leniency. The second option is more workable, but it has some pieces and use cases that we'd need to work through to make sure we don't end up with a rigid or brittle system. What do you think?
If all individual component templates are valid themselves, in what situation can the composition be invalid?
This sounds similar to elastic/kibana#121118. We've closed this issue because we'd like a solution that doesn't rely on Fleet to set up the data streams in the right way, so that we can have the same extension points for the index templates that ship with Elasticsearch, such as …
It's not just the component templates that must be valid, but also their use by the index template. For a (contrived) example, this is valid and allows an index to be created:
PUT /_component_template/one
{
"template": {
"mappings": {
"properties": {
"field": {
"type": "text"
}
}
}
}
}
PUT /_index_template/it
{
"index_patterns": ["foo"],
"data_stream": {},
"composed_of": ["one"],
"template": {
"mappings": {
"properties": {
"alias-field": {
"type": "alias",
"path": "field"
}
}
}
}
}
But if you then try to rename the field, you get an error:
PUT /_component_template/one
{
"template": {
"mappings": {
"properties": {
"other-field": {
"type": "text"
}
}
}
}
}
// Returns:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "updating component template [one] results in invalid composable template [it] after templates are merged"
}
],
"type" : "illegal_argument_exception",
"reason" : "updating component template [one] results in invalid composable template [it] after templates are merged",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "composable template [it] template after composition with component templates [one] is invalid",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "invalid composite mappings for [it]",
"caused_by" : {
"type" : "mapper_parsing_exception",
"reason" : "Invalid [path] value [field] for field alias [alias-field]: an alias must refer to an existing field in the mappings."
}
}
}
},
"status" : 400
}
This is just one contrived example.
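The example shows that validity belongs to the merged result, not to each piece. The following toy Python model reproduces it: mappings are flat (field name to definition), component templates are merged in order, and the index template's own mappings are applied last. `compose` and `validate` are hypothetical helpers, not Elasticsearch APIs, and real mapping merges are deep rather than flat.

```python
from typing import Dict, List

# Toy model: field name -> field definition. Real mappings are nested objects.
Mapping = Dict[str, dict]


def compose(component_mappings: List[Mapping], own: Mapping) -> Mapping:
    """Merge component templates in order; the index template's own mappings win."""
    merged: Mapping = {}
    for mapping in component_mappings + [own]:
        merged.update(mapping)
    return merged


def validate(mappings: Mapping) -> List[str]:
    """Return errors for alias fields whose target path does not exist."""
    errors = []
    for name, definition in mappings.items():
        if definition.get("type") == "alias":
            path = definition.get("path")
            if path not in mappings:
                errors.append(f"alias [{name}] refers to missing field [{path}]")
    return errors


one_v1 = {"field": {"type": "text"}}
one_v2 = {"other-field": {"type": "text"}}  # the updated component template "one"
it_own = {"alias-field": {"type": "alias", "path": "field"}}

print(validate(one_v2))                      # [] -> valid on its own
print(validate(compose([one_v1], it_own)))   # [] -> original composition is valid
print(validate(compose([one_v2], it_own)))   # ['alias [alias-field] refers to missing field [field]']
```

This is why any scheme that resolves component templates late (for example, from placeholders) has to decide when this merged check runs.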
I think this is the biggest downside: the index templates that would be needed are not known at integration installation time; they may only come into existence later. Now, you could argue that the user won't really need to make any namespace-specific customizations until there is a known namespace they want to customize, so creating a new index template is a viable option. But then the user needs to either (1) manually copy the index template and keep it up to date with changes to the integration; or (2) use Fleet/Integration APIs in Kibana to handle these customizations for them, which makes for a confusing experience of switching between Elasticsearch and Kibana APIs for template management.

A similar alternative without this downside is to add data stream naming scheme template management APIs directly to Elasticsearch, so that users could manage this from Elasticsearch. IMO this might be the best middle ground, but I'd like to hear from @felixbarny whether this fully solves the problem.

Another idea is to solve the validation problem at indexing time instead of at template creation time, with a fallback to a "failure data stream", the idea we discussed at EAH for documents that fail to be processed or indexed. This case feels pretty similar and could make use of the same mechanism. That said, I believe that's a fairly large enhancement that we haven't begun work on, and it would be unfortunate to block on it.
Could you elaborate on how that would work? One potential issue may be how the precedence of these custom component templates is defined. How are they ordered among themselves, and how are they ordered relative to the component templates that already exist on the data stream?
I'm thinking of a higher-level API for managing templates that are part of the data stream naming scheme, like we've brainstormed in the past. This would solve the problem of being able to direct users to a single API surface for template management (Elasticsearch) and of having Elasticsearch manage the namespace-specific settings. I think these APIs would need to support all of the granularity levels in the main issue description, in addition to global defaults. Under the hood, it would need to dynamically create and update the required index and component templates, validating them all before committing the change. This API would probably also need to distinguish between user-customized settings and package-managed ones. The package API would be restricted to Kibana's system user only, so that end users use the …
For a basic case such as setting a type-wide default, no new index templates need to be created; it only requires updating the …
Namespace-specific customizations require more work under the hood to create higher-priority index templates where needed. In this example, for every data stream managed by the system, a new namespace-specific index template would need to be created with a higher priority, referencing a namespace-specific component template:
This has the added benefit of making Elasticsearch the source of truth for how these customization layers are stacked on top of one another, instead of spreading that logic across Fleet and Elasticsearch's default templates.
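A minimal sketch of that "under the hood" step: deriving a namespace-specific override from each managed index template. Everything here is hypothetical (the `IndexTemplate` class, the `@<namespace>` template name suffix, the `<type>-*-<namespace>@custom` component template name, and the example template names); it only illustrates narrowing the pattern, bumping the priority, and appending a namespace-level component template. Raising the priority matters because Elasticsearch rejects composable templates whose index patterns overlap at the same priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexTemplate:
    name: str
    index_patterns: List[str]
    priority: int
    composed_of: List[str]


def namespace_override(base: IndexTemplate, namespace: str) -> IndexTemplate:
    """Derive a higher-priority template that only matches one namespace."""
    patterns = []
    for pattern in base.index_patterns:
        # Assumption: managed patterns end in '-*', the namespace wildcard.
        if not pattern.endswith("-*"):
            raise ValueError(f"cannot narrow {pattern!r} to a namespace")
        patterns.append(pattern[:-1] + namespace)
    data_stream_type = base.index_patterns[0].split("-", 1)[0]
    return IndexTemplate(
        name=f"{base.name}@{namespace}",
        index_patterns=patterns,
        # Higher priority so the override wins over the base template.
        priority=base.priority + 1,
        # Builds a new list; the base template is left unchanged.
        composed_of=base.composed_of + [f"{data_stream_type}-*-{namespace}@custom"],
    )


managed = [
    IndexTemplate("logs", ["logs-*-*"], 100, ["logs@custom"]),
    IndexTemplate("logs-nginx.access", ["logs-nginx.access-*"], 200,
                  ["logs-nginx.access@package", "logs-nginx.access@custom"]),
]
overrides = [namespace_override(t, "prod") for t in managed]
for t in overrides:
    print(t.name, t.index_patterns, t.priority, t.composed_of)
```

Each generated template would still need the merged-mapping validation discussed earlier before the change is committed.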
One potential challenge I see is what happens when you create a new data stream after adding a namespace customization.
How do we ensure that ds2 also gets the customizations from step 2? Dataset customizations (such as … But maybe it's fine to rely on copying the index templates? On the plus side, it makes existing data streams less susceptible to breaking changes caused by modifications to the global templates. However, they also don't benefit from improvements to those templates. Maybe that's the right tradeoff if it allows us to statically verify that the merged index templates are valid.
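The ds2 concern can be reproduced with a small Python model of index template selection: among templates whose patterns match, the highest priority wins. `fnmatchcase` stands in for Elasticsearch's wildcard matching (it also understands `?` and `[...]`, which the patterns below avoid), and all template names and priorities are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional


class Template(NamedTuple):
    name: str
    index_patterns: List[str]
    priority: int


def select_template(data_stream: str, templates: List[Template]) -> Optional[str]:
    """Return the name of the highest-priority template matching the data stream."""
    matching = [
        t for t in templates
        if any(fnmatchcase(data_stream, p) for p in t.index_patterns)
    ]
    return max(matching, key=lambda t: t.priority).name if matching else None


templates = [
    Template("logs", ["logs-*-*"], 100),
    Template("logs-nginx.access", ["logs-nginx.access-*"], 200),
    # Step 2: a "prod" customization creates higher-priority copies of existing templates.
    Template("logs@prod", ["logs-*-prod"], 101),
    Template("logs-nginx.access@prod", ["logs-nginx.access-prod"], 201),
    # Later: a new integration is installed; nothing creates its "prod" copy.
    Template("logs-apache.access", ["logs-apache.access-*"], 200),
]

print(select_template("logs-nginx.access-prod", templates))   # logs-nginx.access@prod
print(select_template("logs-apache.access-prod", templates))  # logs-apache.access (prod customization lost)
```

The copy-based approach only covers templates that existed when the customization was made; anything added afterwards silently wins over the namespace-level copy unless the copies are regenerated.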
We had a brief brainstorming session on this today and discussed these requirements & constraints:
Next step is for @tylerperk to flesh these requirements out more, and we'll then meet again for another brainstorming session on potential solutions.
Use `<data_stream.type>@custom` instead of `apm@custom`. This is an enhancement over what Fleet sets up; it is an additive improvement in the direction of elastic#97664. The rollup data streams' `@custom` component templates now include the duration, like what Fleet sets up. Add a YAML REST test, and a unit test ensuring consistency across the index templates.
Hey @bytebilly and @dakrone, I wanted to bump this issue based on a recent thread with @lucabelluccini about various pain points that Support observes when users work with component templates. The most recent example was elastic/integrations#8542, where we started including the … Here's a writeup of this specific issue from Luca:
In this particular instance, it seems like the … Overall, recommending that users clone index templates for customizations has proven brittle and problematic. More stack-level functionality for customization and for preventing breakages would go a long way toward helping with use cases like the above. One of the bigger things from this issue that would help is being able to customize data streams at the namespace level, namely applying an ILM policy to all data streams under a given namespace. We don't support this today, and this has led users down hacky paths that cause them lots of headaches during stack upgrades. The list Josh provided previously still seems very relevant today, and a means to provide customizations without creating new templates with their own set of rules/logic would help a lot with cases like the one above. What would it take to bump the priority on this, and what might a path forward look like?
Thanks, @kpollich, for sharing the feedback. We instruct users to clone index templates to customize … At the moment, this doesn't play well with our index template "structure" updates.
In parallel, these situations can only be spotted with heuristic approaches, as we do not have a "marker" for cloned index templates.
What are we trying to achieve?
On several occasions, we've discussed adding ways for users to customize data streams that are set up via Fleet and via the built-in index templates, without having to create a copy of the index template and take on the burden of maintaining the whole index template going forward. Instead, we'd want to offer dedicated extension points so that users can configure different settings/mappings/lifecycles at different levels of the data stream naming scheme:
- `*-*-*`
- `{type}-*-*`
- `{type}-{dataset}-*`
- `{type}-{dataset}-{namespace}`
- `{type}-*-{namespace}`
- `*-*-{namespace}`
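For a concrete data stream, each of these levels corresponds to one pattern. The Python sketch below expands them for a given name, assuming the name has exactly the three `-`-separated parts of the data stream naming scheme. The helper name `expand_levels` is hypothetical, and the list order does not imply any precedence between levels.

```python
from fnmatch import fnmatchcase
from typing import List

# The granularity levels listed above, as format strings.
LEVELS = [
    "*-*-*",
    "{type}-*-*",
    "{type}-{dataset}-*",
    "{type}-{dataset}-{namespace}",
    "{type}-*-{namespace}",
    "*-*-{namespace}",
]


def expand_levels(data_stream: str) -> List[str]:
    """Return the concrete pattern for every level that targets this data stream."""
    # Raises ValueError unless the name has exactly three '-'-separated parts.
    type_, dataset, namespace = data_stream.split("-")
    patterns = [lvl.format(type=type_, dataset=dataset, namespace=namespace) for lvl in LEVELS]
    # Sanity check: every expanded pattern matches the data stream it came from.
    if not all(fnmatchcase(data_stream, p) for p in patterns):
        raise AssertionError("expanded pattern does not match its data stream")
    return patterns


print(expand_levels("logs-foo-bar"))
# ['*-*-*', 'logs-*-*', 'logs-foo-*', 'logs-foo-bar', 'logs-*-bar', '*-*-bar']
```

Note that the namespace-only level (`*-*-bar`) also matches data streams of other types, such as `metrics-baz-bar`, so a customization at that level crosses data stream types.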
Some concrete use cases:
- … the `logs-foo-*` data stream that uses the `logs-*-*` index template, without having to create a copy of the index template with a `logs-foo-*` index pattern.

Why this should be in Elasticsearch
The previous discussions (elastic/kibana#149484, elastic/kibana#121118) have mostly focused on Fleet. But I have a strong preference for putting this into Elasticsearch rather than Fleet, so that data streams that are not managed by Fleet (such as the data streams for the built-in index templates `logs-*-*` and `metrics-*-*`) can benefit from it as well.

Why is this important
This becomes more important in the context of the reroute processor, as documents can be routed to data streams that aren't managed by or known to Fleet. Also, we're considering moving APM index templates out of Fleet and into Elasticsearch (see #97546).
A potential solution
I've proposed one potential solution to this here: elastic/kibana#121118 (comment)
Essentially, we'd add a couple of component templates to the index templates that are managed by Fleet and Elasticsearch. For example, the `composed_of` section of the `logs-*-*` index template that is built into Elasticsearch would be extended by component templates that have a placeholder in them (exact naming tbd).

Valid placeholders are any `constant_keyword` fields.

If a user wants to customize a concrete data stream `logs-foo-bar`, they can create the following component templates: