Add Facility for deploying ElasticSearch Transform #23

Closed
nnamdifrankie opened this issue Aug 13, 2020 · 19 comments · Fixed by #307
Comments

nnamdifrankie (Contributor) commented Aug 13, 2020

Background

Add the ability to deploy a defined Elasticsearch transform when a package is applied or upgraded.

Acceptance Criteria

  • As a user I should be able to define an ElasticSearch transform as part of a package.
  • The ElasticSearch transform should be started after it is added to the search database.
  • As a user I should be able to update the attributes of the transform, possibly including the name. This should not result in two transforms running.
  • As a user I should be able to delete a transform through an update to the package.
  • As a user I should be able to view statistics and information about a transform after deployment using the ElasticSearch API or Kibana if available.

Sample Transform Creation Statements Captured From Kibana Devtools

PUT _transform/endpoint_host_metadata_transform
{
  "source": {
    "index": "metrics-endpoint.metadata-default"
  },
  "dest": {
    "index": "metrics-endpoint.metadata_current-default"
  },
  "pivot": {
    "group_by": {
      "agent.id": {
        "terms": {
          "field": "agent.id"
        }
      }
    },
    "aggregations": {
      "HostDetails": {
        "scripted_metric": {
          "init_script": "state.timestamp_latest = 0L; state.last_doc=''",
          "map_script": "def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli(); if (current_date > state.timestamp_latest) {state.timestamp_latest = current_date;state.last_doc = new HashMap(params['_source']);}",
          "combine_script": "return state",
          "reduce_script": "def last_doc = '';def timestamp_latest = 0L; for (s in states) {if (s.timestamp_latest > (timestamp_latest)) {timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}} return last_doc"
        }
      }
    }
  },
  "description": "collapse and update the latest document for each host",
  "frequency": "1m",
  "sync": {
    "time": {
      "field": "event.created",
      "delay": "60s"
    }
  }
}

POST _transform/endpoint_host_metadata_transform/_start
DELETE _transform/endpoint_host_metadata_transform
ycombinator (Contributor) commented:

@ruflin could use your input here.

As a user I should be able to define an ElasticSearch transform as part of a package.

This item impacts the package spec. We could enhance the spec to allow transforms to be optionally defined as <packageRoot>/dataset/<dataset>/elasticsearch/transform/<transform>.json files. So your sample transform above would be defined in endpoint/dataset/metadata_current/elasticsearch/transform/endpoint_host_metadata_transform.json. The contents of the JSON file would be the same JSON that you'd send to the PUT _transform Elasticsearch API. The only part I'm not 100% sure about is whether the JSON should hardcode the values of the source.index and dest.index properties or if those need to be dynamically generated by Kibana before loading the transform into Elasticsearch?
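
For illustration only, here is roughly what the contents of that endpoint_host_metadata_transform.json file could look like if the index names stay hardcoded (a sketch, not the actual Endpoint asset; the aggregation is simplified to a plain max for brevity):

{
  "source": { "index": "metrics-endpoint.metadata-default" },
  "dest": { "index": "metrics-endpoint.metadata_current-default" },
  "pivot": {
    "group_by": {
      "agent.id": { "terms": { "field": "agent.id" } }
    },
    "aggregations": {
      "latest_timestamp": { "max": { "field": "@timestamp" } }
    }
  },
  "description": "collapse and update the latest document for each host",
  "frequency": "1m",
  "sync": { "time": { "field": "event.created", "delay": "60s" } }
}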

The ElasticSearch transform should be started after it is added to the search database.
As a user I should be able to update the attributes of the transform, possibly including the name. This should not result in two transforms running.
As a user I should be able to delete a transform through an update to the package.
As a user I should be able to view statistics and information about a transform after deployment using the ElasticSearch API or Kibana if available.

These items impact the actual implementation, which would happen in Kibana. Once we agree on the spec changes (discussion in the previous paragraph), I'll make a PR to update the spec in this repo here and then we can file an issue in Kibana for the implementation items.

ruflin (Contributor) commented Aug 14, 2020

It seems the transform depends on one dataset to create a second one. The index metrics-endpoint.metadata-default would be part of the existing metadata dataset. The transform part would be its own dataset and must follow the indexing strategy. With metrics-endpoint.metadata_current-default this seems to be the case, so the dataset would be called metadata_current. This dataset also contains the mapping for the data after the transformation.

So <packageRoot>/dataset/<dataset>/elasticsearch/transform/<transform>.json makes sense to me. @ycombinator Should we support json and yaml? ;-)

But what about the namespace part? Will the transform be applied to all namespaces? Can the namespace somehow be passed in as a param on start?

@nnamdifrankie Could you open a second issue in Kibana covering the installing / removing / updating of the transforms, as this does not directly apply to the spec?

nnamdifrankie (Contributor, Author) commented:

@ruflin Which repository should the Kibana ticket go in? And to answer your question about metadata_current: we have already defined it and it is managed by ingest. https://github.com/elastic/package-storage/pull/325/files#diff-fd155145fe68664d8a5cbd8b7727ac5f

ruflin (Contributor) commented Aug 17, 2020

@nnamdifrankie https://github.com/elastic/kibana

Could you also share your thoughts on how you would see it working with the namespaces?

nnamdifrankie (Contributor, Author) commented:

Could you also share your thoughts on how you would see it working with the namespaces?
@ruflin

This is a good question. I will answer it and also ask some questions about how the indices are created.

Answer:

We currently write to the default namespace and read from all namespaces with a -* index pattern; we do not specify any particular namespace. So, to begin with, we will be successful if we target the default namespace.
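
In other words (index pattern shown only as an example), our queries run against all namespaces:

GET metrics-endpoint.metadata-*/_search

while documents are written only to metrics-endpoint.metadata-default.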

Questions:

  1. Are all indices created in the default namespace by default, i.e. is the default namespace always the fallback? I ask this because of the destination index, which is defined by a template. Assuming we specify metrics-endpoint.metadata_current, will this create an index metrics-endpoint.metadata_current-default?

ruflin (Contributor) commented Aug 18, 2020

Related Kibana issue can be found here: elastic/kibana#75153

ruflin (Contributor) commented Aug 18, 2020

Indices are not created by default; there is only the template. The index (data stream) is created when the first document is ingested.
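
As a minimal sketch (data stream name and fields are illustrative), nothing exists right after the template is installed; the data stream only appears once the first document is indexed into it:

POST metrics-endpoint.metadata-default/_doc
{
  "@timestamp": "2020-08-18T00:00:00.000Z",
  "agent": { "id": "example-agent-id" }
}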

polyfractal commented:

Side note: it looks like the scripted_metric agg is trying to capture the "most recent" document for each host? If that's true, I think it can be done with a top_metrics agg instead.

We're trying to discourage the use of scripted-metric... it can be very dangerous, and the top-metrics agg should be faster too. :)
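
For reference, a rough sketch of what the aggregations section of the pivot could look like with top_metrics (field names here are made up, and top_metrics only returns the selected fields rather than the whole source document, which may or may not be enough for this use case):

"aggregations": {
  "latest": {
    "top_metrics": {
      "metrics": [
        { "field": "host.name" },
        { "field": "agent.version" }
      ],
      "sort": { "@timestamp": "desc" }
    }
  }
}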

nnamdifrankie (Contributor, Author) commented Aug 31, 2020

@polyfractal

I know we spoke offline on Slack at the time of your comment, but I wanted to comment here for posterity. We are not trying to gather specific metrics; we are trying to retrieve the latest document for the pivot property and replace the previous document in the destination. Hence this approach, but if there are better approaches please comment. Thanks.

polyfractal commented:

If you need the latest document, perhaps a top_hits agg instead?
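
For context, the classic "latest document per key" pattern with top_hits looks like this in a plain search (a sketch only; whether a transform pivot accepts top_hits depends on the Elasticsearch version):

GET metrics-endpoint.metadata-default/_search
{
  "size": 0,
  "aggs": {
    "per_agent": {
      "terms": { "field": "agent.id" },
      "aggs": {
        "latest_doc": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}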

mtojek (Contributor) commented Nov 19, 2021

This is a fairly old issue. Feel free to reopen it if it's still valid.

mtojek closed this as completed Nov 19, 2021
ruflin (Contributor) commented Nov 19, 2021

This actually got implemented. We are now in the process of figuring out how we can improve it.

mtojek reopened this Mar 7, 2022
mtojek (Contributor) commented Mar 7, 2022

Reopening this issue as we noticed that transforms and index templates haven't been formalized with package-spec, which means they're "illegal".

@pzl @nnamdifrankie Could you please add the missing definitions to the spec?

joshdover (Contributor) commented:

For context, back in elastic/kibana#75153 support was added to Kibana to install transforms and the required associated index template for a transform's destination index. Endpoint is currently the only package using these today; however, no associated package-spec changes were ever merged for transforms or their associated index templates.

mtojek (Contributor) commented Mar 8, 2022

ping @kevinlog

CohenIdo (Contributor) commented Mar 9, 2022

Reopening this issue as we noticed that transforms and index templates haven't been formalized with package-spec, which means they're "illegal".

@pzl @nnamdifrankie Could you please add the missing definitions to the spec?

Hey @mtojek, another related point: when we run elastic-package check after adding the mentioned assets, we receive the following linting error:

Error: checking package failed: linting package failed: found 2 validation errors:
   1. item [index_template] is not allowed in folder [integrations/packages/cis_kubernetes_benchmark/elasticsearch]
   2. item [transform] is not allowed in folder [integrations/packages/cis_kubernetes_benchmark/elasticsearch]

Another point I would like to raise after conversation with @joshdover is:

Adding assets to create the cloud security posture solution led us to discover that in some cases we would like to create a dependency between assets.
As an example, I created two transform assets (transform1 and transform2).
The source index of transform2 is the destination index of transform1.

When the assets were initialized, we got an error saying the source index of transform2 didn't exist.
As a workaround, I created the destination index of transform1 before initializing the assets.
As a user, I'd like to be able to tell the integration that transform2 depends on transform1.
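
As a sketch of the workaround (the index and transform names below are placeholders, not the actual asset names): create transform1's destination index up front so that transform2's source exists, then start both transforms.

PUT transform1-destination-index

POST _transform/transform1/_start
POST _transform/transform2/_start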

cc: @kfirpeled

kfirpeled (Contributor) commented Mar 22, 2022

Hey all, I'd like to raise several points to consider here regarding the latest transform and the transform destination index convention.
In this discussion @joshdover brought to our attention that our transform destination index does not follow the data stream naming scheme, and how important following it is for the user.

A latest transform can't use a data stream as its destination, and I had the impression the naming convention is reserved for data streams only.

I wonder if we should introduce a hint in the naming convention that the index is not a data stream, or that this is a latest-transform kind of index. Or maybe that is just me misunderstanding how it works.

The second question is: when installing a package's transforms, how do we make them aware of Kibana namespaces? How do we create a transform whose source is a data stream (which is Kibana namespace aware), and how do we create a destination data stream that is Kibana namespace aware?
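
For reference, a minimal sketch of a latest transform definition (index, field, and transform names are made up, not the actual cloud security posture assets). Note that the destination is a plain index even though the name below imitates the <type>-<dataset>-<namespace> pattern, which is exactly the ambiguity raised above:

PUT _transform/example_latest_transform
{
  "source": {
    "index": "logs-example.findings-*"
  },
  "dest": {
    "index": "logs-example.findings_latest-default"
  },
  "latest": {
    "unique_key": [ "resource.id" ],
    "sort": "@timestamp"
  },
  "frequency": "1m",
  "sync": {
    "time": {
      "field": "event.ingested",
      "delay": "60s"
    }
  }
}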

kevinlog commented:

@kfirpeled

The second question is: when installing a package's transforms, how do we make them aware of Kibana namespaces? How do we create a transform whose source is a data stream (which is Kibana namespace aware), and how do we create a destination data stream that is Kibana namespace aware?

The issue has some relevant discussion about transforms and space awareness. We're currently discussing options with the ML team on how to solve this. There's nothing concrete yet, but you can follow along; @joshdover's latest comment here has some ideas.

ruflin mentioned this issue Mar 31, 2022
joshdover (Contributor) commented:

closed by #307

joshdover linked a pull request Jul 12, 2022 that will close this issue