-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discussion] Naming convention for builtin pipelines #96267
Comments
Pinging @elastic/es-data-management (Team:Data Management) |
Summarizing the conventions for other assets:
Given that, the proposed naming convention What about In #96083 (comment), @ruflin mentioned a concern that this naming convention may conflict with the data stream naming scheme so that To be honest, I don't think that's a huge concern and reason enough to break with existing naming conventions. After all, that's a name of a pipeline, not the name of a data stream. |
For me this discussion is linked to elastic/kibana#149484 @joshdover. We should not only discuss pipeline names but also index template and component names as I expect we will also ship some of these by default. The old one don't follow any convention as there wasn't any. One difference here is, that the existing I like the direction @eyalkoren is taking with For |
As #96171 is moving forward, we should come to a conclusion here quickly. The most important part for me is, that it is directly visible that an ingest pipeline or template is managed by the stack. Based on the above, I propose we follow the We still need to continue iterating on what the conventions are after |
Anyone wishing to propose a convention that will cover what we need- please take a look at the current list of resources we use to get a good idea. |
Documenting here the decision we made in our offline discussion: we are good with replacing component names within the registry, as opposed to registering each component both with the old and the new name. This means that in rolling upgrades, the already registered components will not be removed, so the cluster would serve both. |
I think the idea of @ruflin I would prefer to leave out ES or stack versions anywhere in the id of the thing itself (so no If I had to pick my preference, it would be the format of
I don't have an extremely strong opinion about the I'd like to enumerate the values for what What about the |
++
I don't expect the versioned templates to be loaded by Elasticsearch itself. Agree, it doesn't make sense for these. I expect us eventually to have an ECS / semconv package which is installed / managed by Fleet and could install different versions of the templates to be used. This are only there on demand.
I like the For the delimiter, the rule I was thinking of is that For the |
We could just use |
Summarizing points from the discussion above:
Adding my own points:
First attempt to accommodate all of those, I propose this schema:
Open questions:
Any feedback would be highly appreciated. |
Yes, I think it makes sense to not only apply this to component templates, but also index templates.
I don't think we need to enforce this but I agree that there should be consistency in terms of naming conventions. Currently, the convention in fleet seems to be For example, While I do agree that having the component type as a part of the identifier would be nice, I'm not sure if that's worth breaking with the conventions in fleet which either imply an inconsistency or that we need changes in fleet. However, we can't change the conventions in fleet easily as it would mean the existing cc @jpountz FYI that we're planning to change the names of the built-in assets like index templates, component templates, and ILM policies. |
Well, if we are using this as a guideline, then we already have a convention and what we need to do is make everything adhere to it. Second attempt then: the naming scheme has only two mandatory parts with one delimiter: |
There is a potential way to include what we do today in Fleet into this:
This gives a bit more structure to what is before the Most important from my perspective is that we have Fleet following the same convention (but I hope it already does). @eyalkoren Your table above did not include ingest pipelines, is that one purpose? |
I am OK with that, but I tried to avoid Let me try and explain my thinking behind take 2 in a different way: this discussion made me think that it's better to make the schema looser, as long as it contains The other important guidance is that the two parts of the schema are such where the first describes the data for which the component is for and the second part is essentially the ID of the component,. For example, if your data is described as I hope this makes sense...
No, not intended, but since we decided to not include the component type in the name by default, no need for examples for such. |
I like it. I got confused by the usage of +1 on using this.
Having some examples also for pipelines would help to make the point that pipelines are also affected by it especially as pipelines is in the title of the github issue. If everyone agrees with the your above recommendation, it would be great to have a final table as conclusion where we can send people to look up examples. |
I started laying out my third proposal, based on everything discussed above. Some ways we can deal with that, while still accommodate everything we decided so far:
@felixbarny @ruflin your input on that will be required in order for me to proceed |
I'm wary about making these templates reusable, as components usually have dependencies. E.g. if you have an ingest pipeline that introduces a new field, this field needs to exist in mappings. You can't easily reason about components in isolation. So things become quite complicated if we start mixing up component templates across Elasticsearch's built-in component templates and Fleet's templates, as the two have different upgrade lifecycles. I worry that this would eventually result in any change in built-in templates being considered as a potential breaking change. I can think of two ways to avoid this issue:
I have less experience dealing with templates than you do so I'm happy to take feedback, am I making it a bigger issue than it is? In any case, I'd like to make sure that changing the default template for logs would never be considered a breaking change, as it's an important way of keeping improving the out-of-the-box user experience. |
@eyalkoren, I think we can solve these issues by treating
I think we'll want to use some of the built-in component templates, such as default mappings and settings, in Fleet in the future. Currently, when we want to make changes, such as using It does imply, however, that we'll need to evaluate all changes to the built-in component templates on whether they make sense for Fleet and the integrations. I think this is a feature, not a bug, and acts as a forcing function to ensure integrations are leveraging best practices. I'm having second thoughts about renaming the component templates. We've previously concluded (#96267 (comment)) that renaming wouldn't be a breaking change as we wouldn't delete component templates in existing deployments so that the current names will continue to live in the cluster and only new installations are affected. One use-case came to mind that would break if we did that: ephemeral Elasticsearch environments that are provisioned with custom index templates that use built-in component templates that are versioned in a git repository. When spinning up an ephemeral ES cluster, we can't differ between existing use-cases and new ones. Therefore, we might want to re-consider adding the components under both their old and their new name or bring up this question in the breaking changes committee. |
@jpountz I need a clarification on this- is it about the order in which stack components get upgraded? How is this different from any other possible inconsistencies between Elasticsearch and Kibana?
@felixbarny I don't get what the convention is in this case and how it will be applied to other components. I created a spreadsheet so that we can compare multiple convention options and examples of how components will be renamed, please add yours. What I was thinking is that part of the refactoring will include the merge of mappings and settings components. From what I can see, couples of those ( Based on that, following is my third attempt to propose a convention: Take IIIThere are three dimensions for the matrix we are trying to describe here
We need a scheme that will allow any combination of those. For example, if we want to have specific Based on this rationale, the naming scheme can have only two mandatory parts with one delimiter:
where the only special character is
|
I have a slight preference for the proposal from @felixbarny in the spreadsheet. The reason is the split up of settings and mappings. I understand @eyalkoren you want to merge these as often these are used together. But I would argue, it should be possible to use logs mappings only without being force to inherit the global settings too. One thing I don't like too much in @felixbarny proposal is that it creates many prefixed, but I get the logic that it matches the data stream / index names so hopefully there should not be a naming conflict. For templates like Around the concern from @jpountz on the upgrade lifecycles different from the Fleet / Package Manager ones. I historically shared this concern but have come around to think that the benefits here outweigh the risks. Lets take a step back and decouple the template conventions from who installs it. Lets assume there is a component X that ensures some built in templates are always of the most recent version and we expect users to be able to use this when they build integrations. I'm stating integrations here because I expect in most scenarios in a future, users using these templates are doing it during building an integration. Decoupled from who loads the templates, do we agree on a convention? As a follow up, we can focus on how installs it. As the templates are needed directly after startup, Elasticsearch is currently our only option. I personally would prefer if we could bundle these default templates in one more multiple integrations that are loaded on startup by Elasticsearch instead of building it into the source code. But we are not there yet. So I think for now Elasticsearch is the right place. What about breaking changes? In general, all these templates should only get additions. But even then, things could become breaking as a field previous not mapped has suddenly become mapped. My assumption is, this is the edge case and not the norm. Most users will get a better experience thanks to changes made to the basic templates. But how do we make sure we don't break some of the users where this would be a breaking change? Could we make the upgrade a config option? By default, these component templates are always overwritten and ideally everything using these templates would be rolled over automatically. A config option could exist to disable this feature and the existing component templates would be kept forever. It is then up to the user to get the newest templates and update these manually through docs. Having an integration package with all the templates ,would make this even simpler. The nice part about our |
OK, then trying to summarize @felixbarny's proposal, as it is reflected in the examples spreadsheet:
@dakrone @joshdover would this convention cover the use cases you can think of? Please feel free to add either rows to spreadsheet with additional examples and/or add a column with an alternative proposal. |
@dakrone @joshdover we really need your input in order to proceed with actually enforcing a convention. If the latest proposal makes sense, a simple 👍 would do 😉 |
I think the proposal sounds good in general. The only thing that I would change is instead of doing anything at the " |
Apologies for my lack of engagement on this topic and thanks to everyone driving this forward. 👍 from my end on the naming convention we landed on, it seems to cover all of the existing and upcoming use cases for the Elastic Package Manager installed templates. |
LGTM. |
Closing this discussion issue as decided now. I've created implementation issues for ES and Fleet-managed components. |
See elastic#96267 (comment) - Index templates have a "@template" suffix - Component templates are split into settings and mappings, and have a @settings and @mappings suffix respectively - Ingest pipelines generally have a @pipeline suffix, except for one special case where we use @default-pipeline We no longer have a component template for every data stream, rather we have reusable component templates where it makes sense. For example, we now have metrics-apm@settings and metrics-apm@mappings, which are included by all APM metrics data streams. We now set both default_pipeline and final_pipeline in the index templates, preventing users from overriding them completely. The default pipeline is always apm@default-pipeline, which performs some rerouting of legacy data, and invokes user-defined @Custom pipelines based on the data_stream.type and data_stream.dataset fields. The final pipeline just performs built-in processing.
* x-pack/plugin/apm: introduce x-pack-apm plugin * Dependency fix and tests * Restore addition to ESRestTestCase * Replace IngestPipelineConfig instantiation * Update DataStreamUpgradeRestIT to expect logs-apm.error * Adding rollover functionality * Extend basic rollover funtionality tests * Start adding integration test * Hide rollup data streams * Apm ingest fixes - Map transaction.duration.us independently of span.duration.us, fix ingest pipeline. We might consider aliasing later. - Set event.ingested - Set processor.event, using constant_keyword where possible * Fix error.grouping_name script * Only set event.ingested for traces-apm.sampled * Enabling APMRolloverIT * Spotless... * Assertion change * Wait a bit before assertBusy * Adjust template and pipeline names to convention See #96267 (comment) - Index templates have a "@template" suffix - Component templates are split into settings and mappings, and have a @settings and @mappings suffix respectively - Ingest pipelines generally have a @pipeline suffix, except for one special case where we use @default-pipeline We no longer have a component template for every data stream, rather we have reusable component templates where it makes sense. For example, we now have metrics-apm@settings and metrics-apm@mappings, which are included by all APM metrics data streams. We now set both default_pipeline and final_pipeline in the index templates, preventing users from overriding them completely. The default pipeline is always apm@default-pipeline, which performs some rerouting of legacy data, and invokes user-defined @Custom pipelines based on the data_stream.type and data_stream.dataset fields. The final pipeline just performs built-in processing. * No need for manual cluster change events anymore * Fix test * Lower template priority to 110 Certain index templates installed by Fleet are given a priority of 150, so lower the priority to avoid conflicts. See https://github.com/elastic/kibana/blob/a18c68f7ac0b2cb6a46552328e9c6fc28c223970/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L58 * Add logging * Remove explicit timeout for rollovers * Add _meta to component templates too * Increase template priorities to 140 This goes closer to the Fleet-installed template priorities, allowing for more builtin templates to be introduced without affecting APM. * Test multiple index template upgrades with rollovers * Enfore correct component template versions * Rename to x-pack-apm-data, default to disabled Because there's already an "apm" plugin, we haven't yet solved all issues with rollovers, and the templates need some more love. * Fix checkstyle * Fix more checkstyle * Fix even more linting issues * Adjust test to disabled by default * remove todo * Fixing test * Refactor leftovers * Fix constant keyword mapping conflict * Fix after merge * Comment fixes * Rename ApmIngestPipelineConfig to YamlIngestPipelineConfig * Always return a registry * spotless, what else * Add rollover integration test to core * Remove APMRolloverIT * Refactor in renamed components --------- Co-authored-by: eyalkoren <[email protected]>
TL;DR
This issue quickly evolved to include not only ingest pipelines, but all builtin assets.
The eventual agreed schema is described in #96267 (comment), with examples shown in a spreadsheet.
Description
#95782 introduced a way for stack components to automatically install prebuilt ingest pipelines.
Since then, the following pipelines were already added:
logs-default-pipeline
(through [Logs+] Default pipeline for logs data streams #95971)behavioral_analytics-events-final_pipeline
(through [Behavioral Analytics] Analytics pipeline to the index template registry #96104)logs@json-message
(through [Logs+] Add pipeline that parses JSON log events into top-level fields #96083)We want to come up with a naming convention that will:
Just as an example, consider a template such as:
{data-stream-type}@{purpose}[-{description}]
, where-{description}
is optional, yielding the following names:logs@default
behavioral_analytics@final
logs@json-message
all@ecs-core
all@ecs-extended
logs@ecs
Maybe as an extension to this discussion, we should decide on how and where we properly document the builtin pipelines, as well as builtin component templates. We could use the
org.elasticsearch.xpack.core.template.IndexTemplateRegistry
concrete implementations to produce this documentation, or at least to verify that all automatically-installed components are documented (e.g. through a unit test that matches documented lists to registry-produced lists).@ruflin @felixbarny @jimczi @dakrone - please provide your input.
The text was updated successfully, but these errors were encountered: