Add transform to spec #307
Conversation
I'm happy to see that we're completing the package spec with the things we already support / have implemented. We should have a follow-up discussion on how transforms should work in the future. Even though it works today, I would argue it is less than ideal. Many conversations happened in the past in #23 and linked issues. There were also discussions with the team behind transforms on how it could be improved, but I'm not sure where we landed there. @kevinlog you might know?
A few nit-picks on my side. Regarding the transform logic, I'm leaving the final decision about this PR to @joshdover.
versions/1/elasticsearch/spec.yml (outdated diff):

# https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html
type: file
contentMediaType: "application/json"
pattern: '^.+\.json$'
Do you think we should apply any naming requirements? For example, to prepend a package name to the file. Is it possible that another package can override this transform?
I'm not sure, but I think the filename doesn't affect the created transform at all.
As you can see in our integration (works locally), we have two different transforms in different directories with the same file name (default.json) and they don't override each other.
Probably @joshdover could explain better the exact meaning and implications of the filenames and the containing directory names.
Endpoint security integration transforms look the same.
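To illustrate the layout described above (directory and file names here are made up, not the actual package contents):

elasticsearch/
  transform/
    latest_metadata/
      default.json
    latest_policy/
      default.json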
Endpoint security integration transforms look the same.
From the spec perspective, all Endpoint security integration transforms are illegal, as they haven't been documented. This is why you need to deal with it now. We will enable strict package validation soon, and it will block the publishing of packages with undocumented (= illegal) features. cc @pzl
I'm not sure, but I think the filename doesn't affect the created transform at all.
As you can see in our integration (works locally), we have two different transforms in different directories with the same file name (default.json) and they don't override each other.

Let me elaborate here: what will happen if we have another package foobar with the exact same transforms and directories? When we install it, will it overwrite your transforms?
cc @pzl
Let me elaborate here: what will happen if we have another package foobar with the exact same transforms and directories? When we install it, will it overwrite your transforms?
To my understanding the answer is no.
It seems like the ID of the transform is derived from both the integration package name and the transform directory name.
Additionally, after a conversation with @CohenIdo about the naming of the transform files, I understood he was instructed to name them default.json, so I'll change the spec to reflect this requirement.
Anyway, I'm sure Josh will shed more light on this topic.
@eyalkraft thanks for doing this, it's something I should have done some time back when my team originally added transforms.
I'm not sure, but I think the filename doesn't affect the created transform at all.

Let me elaborate here: what will happen if we have another package foobar with the exact same transforms and directories? When we install it, will it overwrite your transforms?

Currently, when transforms are installed, the package name is prepended (along with the directory path) to the actual name of the transform. Thus, it doesn't matter what the definition .json file is named. So, as long as the packages themselves are named differently, I don't think there will be any collisions. Here is what our transforms are named when installed. (@pzl let me know if I have anything wrong here)
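As a purely hypothetical illustration of that naming pattern (these IDs are made up, not the actual installed Endpoint transform names):

endpoint.metadata-default-1.0.0
endpoint.metadata_current-default-1.0.0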
That being said, I am OK with any naming convention we want to enforce and we can update the names of files to match.
We will enable strict package validation soon, and it will block the publishing of packages with undocumented (= illegal) features.
For clarity: adding this PR to the package spec will "document" the transform feature, so it will no longer be illegal, correct?
Further, let us know when we can test against strict package enforcement so we can fix any other problems on our end and be prepared for when it is live. I would ask that we can publish packages as-is for the next couple of weeks, so that we can push our package with changes for the 8.2 release, and in case we need to push anything additional to fix bugs.
That being said, I am OK with any naming convention we want to enforce and we can update the names of files to match.
That's the killer argument for this issue. I guess we can resolve the conversation. Thanks for jumping in, Kevin.
For clarity: adding this PR to the package spec will "document" the transform feature, so it will no longer be illegal, correct?
Yes, that's the idea.
Further, let us know when we can test against strict package enforcement so we can fix any other problems on our end and be prepared for when it is live. I would ask that we can publish packages as-is for the next couple of weeks, so that we can push our package with changes for the 8.2 release, and in case we need to push anything additional to fix bugs.
@eyalkraft It would be great if we can copy Endpoint's transforms here as test cases to make sure they are aligned with new spec requirements. @kevinlog What would be the best source to copy from?
@eyalkraft It would be great if we can copy Endpoint's transforms here as test cases to make sure they are aligned with new spec requirements. @kevinlog What would be the best source to copy from?
Either of the definitions here is good: https://github.com/elastic/endpoint-package/tree/master/package/endpoint/elasticsearch/transform
Added both
I've been meaning to sit down and write some thoughts on the open issue for this (#23), but now is as good a time as any. I see two main problems with the existing implementation of transforms as used by the Endpoint package:
1. An index template is required, but there is no explicit link in the package (or the package install implementation) between the template and the transform
Transforms require an index template as well (which is another 'illegal' feature), yet in Endpoint's current transform implementation there is no explicit link between the index template and the transform. This creates problems when a transform's source index is another transform's destination index, as @CohenIdo noted in #23 (comment). In such a scenario, it'd be ideal for our package install code to be aware of both the dependencies between transforms and the dependency between a transform and its index template. This would allow us to install the index templates and create the destination indices before setting up the transforms.
The CSP plugin is currently hacking around this limitation by manually creating the index template in their Kibana plugin. This means the index template is currently duplicated between the Kibana codebase and the CSP package source code.
2. Adding support for namespace-specific transforms
We need to keep an eye towards this future requirement with an idea of how we will support it (see private issue linked below). Currently the transforms contain the default namespace directly in the transform's name and source/dest indices. We should instead provide some sort of naming convention/pattern that does not include the namespace, and rely on the package install code to add the namespace at package install time. In the future, we'll be providing an API to add namespaces and the associated resources; see elastic/kibana#121118
I think we should redesign how this works a bit. What about supporting a YAML file like this?
description: Latest Endpoint metadata document per host
frequency: 10s
source:
  # namespace would be appended as `metrics-endpoint.metadata-default*`
  index_prefix: metrics-endpoint.metadata-
  query: |
    {
      "range": {
        "@timestamp": {
          "gt": "now-90d/d"
        }
      }
    }
dest:
  # namespace would be appended as `metrics-endpoint.metadata_current-default`
  index_prefix: metrics-endpoint.metadata_current-
  # Raw index template json
  index_template: |
latest: |
  {
    "unique_key": [
      "elastic.agent.id"
    ],
    "sort": "@timestamp"
  }
sync: |
  {
    "time": {
      "field": "event.ingested",
      "delay": "10s"
    }
  }
One challenge here is that to solve the above problems we may also need to support the existing pattern that the Endpoint plugin uses. Alternatively, we could switch over to a new pattern in a given stack version and older Endpoint packages wouldn't be installable in those newer Stack versions. @kevinlog would this be acceptable?
This would be OK from our perspective since we tear down the old transform and install the new one on each package upgrade. We always tie a new package release to a new stack release. One thing to keep in mind is that the new package doesn't get installed until a user with proper permissions visits Fleet or Security in Kibana. I know there is talk of installing packages in ES at upgrade time which would solve this drift. I still think we would be OK since the older package/assets would already be installed and we would only try to install the new package with the new patterns. Let me know if the above makes sense.
Is this still true? Fleet changed to auto upgrade packages on boot using the kibana_system user in 8.0 without requiring any users to visit the UI. Is there custom code on the Endpoint side that still needs to run separate from Fleet's package install/upgrade process?
It is best practice for transforms (pivot transforms), when deployed as part of a package, to explicitly define the destination index mappings. This can be achieved by either […]

I agree with @joshdover that it should be part of the package. I'll also add a requirement to specify […]
Controlling the name of the transform and its destination and source indices might not be applicable to all use cases. It might fit the Endpoint use case, but there are multiple future requests for packages which contain transforms (such as Host Risk Score and Beaconing, from a different part of Security). These do not currently conform to the Fleet naming conventions for data streams, afaik. What naming conventions are being enforced with this PR? Can we review to ensure there is enough flexibility to cover other transform use cases?
You are correct - there is no other custom code on the Endpoint side. I simply forgot about this change, apologies for the confusion.
versions/1/elasticsearch/spec.yml (outdated diff):

contents:
  - description: An Elasticsearch transform folder
    type: folder
    pattern: '^[a-z|_]+$'
CI complains about your test case. Should this pattern also contain 0-9?
Done (and removed the | character).
@@ -0,0 +1,46 @@
{
Could you please also add the transforms referenced by Kevin in the other conversation?
Done
As can be seen in the spec, this PR allows:
Regarding the naming of the created transform, as can be seen here:
The ids of the created transforms:
Note that it's not the spec that determines these ids but the package installation code in Kibana.
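For a concrete feel of the pattern those IDs follow, a made-up example (package example_package, folder example_name, data type logs; not an actual installed ID):

logs-example_package.example_name-default-0.0.1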
Not going into the complicated details of your comment, but just noting that mixing YAML and raw JSON as strings inside it can result in pretty ugly parsing logic, and might lead to mistakes while creating these assets. One thing that really helped us during the integration development is that the asset formats are (almost?) identical to the exported saved object JSONs. This makes sense, since under the hood Kibana actually uses the "import saved object" API for the package asset installation. This let us first create whatever asset we wanted our integration to have in Kibana using the UI, and only then export it as JSON and add it to our integration (no manual asset creation).

This is your YAML parsed to JSON:

{
  "description": "Latest Endpoint metadata document per host",
  "frequency": "10s",
  "source": {
    "index_prefix": "metrics-endpoint.metadata-",
    "query": "{\n \"range\": {\n \"@timestamp\": {\n \"gt\": \"now-90d/d\"\n }\n }\n}\n"
  },
  "dest": {
    "index_prefix": "metrics-endpoint.metadata_current-",
    "index_template": ""
  },
  "latest": "{\n \"unique_key\": [\n \"elastic.agent.id\"\n ],\n \"sort\": \"@timestamp\"\n}\n",
  "sync": "{\n \"time\": {\n \"field\": \"event.ingested\",\n \"delay\": \"10s\"\n }\n}"
}

I'm probably missing what the value is of having the raw index template JSON inside the YAML as a string. Alternatively, what's wrong with going for YAML alone?

Alternative YAML suggestion:

description: Latest Endpoint metadata document per host
frequency: 10s
source:
  # namespace would be appended as `metrics-endpoint.metadata-default*`
  index_prefix: metrics-endpoint.metadata-
  query:
    range:
      "@timestamp":
        gt: "now-90d/d"
dest:
  # namespace would be appended as `metrics-endpoint.metadata_current-default`
  index_prefix: metrics-endpoint.metadata_current-
  index_template:
latest:
  unique_key:
    - elastic.agent.id
  sort: "@timestamp"
sync:
  time:
    field: event.ingested
    delay: 10s

And as JSON this looks like:

{
"description": "Latest Endpoint metadata document per host",
"frequency": "10s",
"source": {
"index_prefix": "metrics-endpoint.metadata-",
"query": {
"range": {
"@timestamp": {
"gt": "now-90d/d"
}
}
}
},
"dest": {
"index_prefix": "metrics-endpoint.metadata_current-",
"index_template": null
},
"latest": {
"unique_key": [
"elastic.agent.id"
],
"sort": "@timestamp"
},
"sync": {
"time": {
"field": "event.ingested",
"delay": "10s"
}
}
} |
Also, another suggestion, since: […]
What do you think about merging this PR to align the spec with Kibana and the Endpoint integration package, and opening another issue to continue discussing the problems and open ends with today's implementation of transform assets? I'll happily open the follow-up issue with all the information from this thread if you all agree.
@sophiec20 Is there a reason they can't conform to the data stream naming convention? It's a goal for all data ingested by Elastic integrations to conform to this convention as it provides many benefits across the entire architecture.
@eyalkraft Yeah I'm not too particular on the mix of yaml and json in my example. Going pure YAML probably makes sense. The goal of my example was to demonstrate how accommodating my concerns around the associated index template and dynamic namespace could work.
If we're going to make breaking changes to how this works, I'd rather not add it to the spec now, since then other packages may start to use this and we'd then need to support two ways of doing the same thing from the install side. If we defer on adding it to the spec now, could CSP continue using the existing implementation in Kibana today and then accommodate a new updated pattern with a hard version requirement (…
Hey @joshdover, thanks for syncing everyone with the index-template issue.
So what I think is, in addition to adding a section of […]
@joshdover To be honest, I'm on the fence, as we obey spec rules. We don't have a mechanism to disable spec validation now, but even if we had one, it would be a backdoor many developers could use :) So rather than disabling the spec, I would recommend merging this PR, as it just legalizes a "transform directory":

- description: Folder containing Elasticsearch Transforms
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html
  type: folder
  name: transform
  required: false
  contents:
    - description: An Elasticsearch transform folder
      type: folder
      pattern: '^[a-z0-9_]+$'
      contents:
        - description: An Elasticsearch transform file
          type: file
          contentMediaType: "application/json"
          pattern: '^default\.json$'
^ this, to be honest. We already "committed" a crime, let's just confirm it :)
This will mean we have to support this for quite a while until we do a v2 of the spec, which is unfortunate since it's only used by one package today, one that we have very tight control over. If we add this way of configuring transforms to the spec, any package can start using it, and we already know this way has problems. This current way of configuring transforms even poses problems for the very use case we're trying to unblock here, and we're already having to hack around it in the CSP Kibana plugin. If we're already having to hack around it, why don't we do this: […]
I just don't see the benefit of adding a broken feature to the spec that we're going to have to keep supporting. We've lived with this 'illegal' feature for some time; why not leave it as-is until we have the time to really fix it, rather than committing to supporting its problems for potentially years when there's not much incentive to do so? In terms of the long-term approach, how much do we still need to decide? Could we iron out the details in the next week or so?
Looking very close! A couple of last suggestions:
If we conform to the proposed index template naming, it seems it would imply that we should rename the destination indices for our transforms. Our index is coded to be: […]

Let me know if the above makes sense. I may be missing something.
TODO: index naming constraints.
Could be done in a following iteration.
"$id": "#root/dest/index" | ||
title: Index | ||
type: string | ||
pattern: "^.*$" # TODO: Enforce the data stream naming scheme. |
Destination index naming constraints
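If we do tighten this in a follow-up, a sketch of what the pattern could look like, assuming the {type}-{dataset}-{namespace} scheme described in this PR (this regex is an illustration, not something this PR enforces):

pattern: '^(logs|metrics|traces|synthetics)-[a-z0-9._]+-[a-z0-9_]+$' # hypothetical stricter pattern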
- type: string
  examples:
    - kibana_sample_data_ecommerce
  pattern: "^.*$" # TODO: Enforce the data stream naming scheme.
Source index naming constraints
You are correct here, thanks for showing your team's example as well! @joshdover my prior concerns with upgrade and deleting old indices shouldn't be an issue. We can change the name of our index pattern and/or move the template files in the directory, but we can leave the index patterns the same.
@eyalkraft Is there a reason not to include the type of the data in the name of the index template? I think it should match the existing pattern we have for all other index templates, which does include the type prefix, for example the […]

You are right that the name of the index template shouldn't impact the name of the destination indices, but IMO they should actually match so this isn't confusing or inconsistent.
Actually, there is no reason I know of why we don't include the type of the data in the name of the index template. I'm updating the PR and the description to match, then. cc @kevinlog sorry for the confusion
Following another quick sync with @joshdover regarding the naming of the transform, we thought it makes sense for the transform to be named in a similar way to the index template (i.e. to include the data type prefix). As with the index template names, the name of the transform doesn't affect the destination/source indices it runs on, but it is optimally correlated with them to provide a clear user experience.
LGTM, thank you for locking down the transform fields.
🚀
Implementation for supporting this in Fleet's installation code will be discussed in elastic/kibana#134321. This is not yet scheduled.
I've opened a follow-up issue for the future improvements that still need to be made, many of which were discussed in this PR: #370
What does this PR do?
This is the first iteration of adding Elasticsearch transforms to the spec.
This PR allows adding a transform asset to packages, with an optional destination index template to be installed with it. It also allows installing a transform without starting it.
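As a sketch, the package layout this enables looks like the following (the folder name example_name is illustrative):

elasticsearch/
  transform/
    example_name/
      default.json   # the transform definition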
Upon installation of a package that contains a transform asset, the create behavior is as follows:
- The name of the created index template will be derived from the data type, the name of the package, and the name of the containing folder. For example: logs-example_package.example_name.
- _meta fields are added to mark the matching indices as managed by Fleet.
- Assets are installed for the default namespace using kibana_system privileges.
- The name of the created transform will be derived from the data type, the name of the package, and the name of the containing folder. For example: logs-example_package.example_name-default-0.0.1. Namespace and version are added by Fleet.
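Putting the naming rules together for a hypothetical package (all values made up):

# package: example_package, version: 0.0.1, data type: logs
# transform folder: elasticsearch/transform/example_name/
index template: logs-example_package.example_name
transform id:   logs-example_package.example_name-default-0.0.1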
Upon upgrading a package that contains a transform asset, the upgrade behavior is as follows:
The recommended naming convention for the source and destination indices is the data stream naming scheme, or patterns matching the data stream naming scheme.
Data streams are named in the following way: {type}-{dataset}-{namespace}. For example: logs-example_package.example_name-default.
For more information, see the introduction to the Elastic data stream naming scheme.
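A hypothetical fragment of a transform's default.json whose source and destination indices follow that scheme (index names are illustrative):

{
  "source": {
    "index": ["logs-example_package.example_name-*"]
  },
  "dest": {
    "index": "logs-example_package.example_name_latest-default"
  }
}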
What doesn't this PR allow?
This list of possible changes will be further discussed in follow-up issues:
- Using a namespace other than default.
- Installing a transform without starting it. (Included.)
- Running the transform with permissions other than kibana_system (such as the installing user / some service user). This limits the source indices that package-installed transforms can use.

Why is it important?
Checklist
- I have added test packages to test/packages that prove my change is effective.
- I have added an entry in versions/N/changelog.yml.

Related issues