Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup: Add default pipeline to templates #14001

Closed
wants to merge 2 commits into from

Conversation

cwurm
Copy link
Contributor

@cwurm cwurm commented Oct 10, 2019

Adds the default_pipeline index setting (ES docs) to the templates loaded on setup.

Enabled by default, Libbeat first loads the ingest pipeline, then adds it to the template and loads the template. The built-in ingest pipeline contains processors for GeoIP and ASN lookups (and renaming to ECS fields) and adds an ingest timestamp. These are especially relevant for the SIEM app, where several widgets (network map, top IP tables, IP details overview) depend on GeoIP and ASN data being present. The ingest timestamp will help with detecting events that arrive much later than they occur or with a wrong timestamp (more details in elastic/ecs#582).

This feature is governed by four settings:

  1. setup.template.default_pipeline.enabled - if it is enabled at all (default true)
  2. setup.template.default_pipeline.overwrite - if an existing pipeline with the same name should be overwritten
  3. setup.template.default_pipeline.name - allows specifying a custom pipeline name (the default is beats_default_pipeline)
  4. setup.template.default_pipeline.file - instead of using the built-in default pipeline users can also load a custom pipeline from a file. I've re-used the unmarshalling logic from filesets so both JSON and YAML files can be used.

A few other notes:

  1. This would be enabled by default on all Beats. This will allow the SIEM app to work out of the box with all Beats data, but also in general it makes sense to have these enrichments enabled - they are probably the most common ones with both users and in our shipped pipelines (e.g. Filebeat modules). We can put a focus on this for testing, but could also disable it by default at first and turn it on by default later.
  2. Instead of one beats_default_pipeline pipeline for all Beats we could also do Beats-specific pipelines (packetbeat_default_pipeline, auditbeat_default_pipeline, etc.). I don't have a strong opinion.
  3. apm-server seems to be unaffected, it has its own pipeline-loading logic. I should probably test that a bit more to be sure.

libbeat/template/pipeline.go Outdated Show resolved Hide resolved
libbeat/template/pipeline.go Outdated Show resolved Hide resolved
filebeat/fileset/fileset.go Outdated Show resolved Hide resolved
@cwurm cwurm requested review from a team and removed request for a team October 11, 2019 11:09
"set": common.MapStr{
"field": "event.ingested",
"value": "{{_ingest.timestamp}}",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool. I wonder if event.ingested should be in ECS. CC @MikePaquette @webmat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already on it: elastic/ecs#582 :-)

version65, _ := common.NewVersion("6.5.0")
if config.DefaultPipelineEnabled && !ver.LessThan(version65) {
indexSettings.Put("default_pipeline", config.DefaultPipelineName)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipeline itself requires the Geoip plugin, which is only bundled into ES since 6.7. Might be worth checking for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense - changing to 6.7.0. I confirmed it works out of the box with that Elasticsearch version.

@tsg
Copy link
Contributor

tsg commented Oct 11, 2019

Nice work @cwurm! I can't think of anything that might get broken due to this, so I'm supportive to have it enabled by default unless someone else thinks of something that breaks.

@webmat
Copy link
Contributor

webmat commented Oct 11, 2019

Haven't reviewed in detail, but I love this!

BTW we've been tracking ideas of what can be done, with events that are ECS formatted. Check out this issue elastic/ecs#181. You don't have to change anything here, but just in case it sparks more ideas of things general enough to include here eventually :-)

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just commenting on a few operational aspects. I didn't review the code.

Will beat setup --index-management load this pipeline?

The documentation for installing the index template when using a non-ES output will need to be updated to include how to install the pipeline. We might need an export default-pipeline sub-command to get a copy of the pipeline JSON.

@@ -48,5 +53,8 @@ func DefaultConfig() TemplateConfig {
Enabled: true,
Fields: "",
Order: 1,

DefaultPipelineEnabled: true,
DefaultPipelineName: "beats_default_pipeline",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pipeline name should have version number in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I like the idea of using the beat's name rather than "beats".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pipeline name should have version number in it.

Good catch, definitely. We'll need it when running multiple versions of Beats.

And I like the idea of using the beat's name rather than "beats".

👍 - now the default pipeline for all Beats would be the same, but in the future this might well diverge.

@cwurm
Copy link
Contributor Author

cwurm commented Oct 13, 2019

Will beat setup --index-management load this pipeline?

Yes

The documentation for installing the index template when using a non-ES output will need to be updated to include how to install the pipeline. We might need an export default-pipeline sub-command to get a copy of the pipeline JSON.

Good point, let me implement export default-pipeline.

@cwurm
Copy link
Contributor Author

cwurm commented Oct 14, 2019

I've implemented export default-pipeline, and also fitted the code into the existing structure we use for loading the template and ILM policy.

Would like to do more testing on this, and need to add it in some places in the docs, but I think the code is ready to take a look at.

Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a lovely change! I have only very minor comments on what's here, but please add some unit tests for the libbeat/template changes :-)


// loadIngestPipeline loads an ingest pipeline into Elasticsearch,
// overwriting an existing pipeline if it exists.
// If you wish to not overwrite an existing pipeline then use ingestPipelinExists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: ingestPipelineExists

return false, nil
}

status, body, _ := l.client.Request("GET", esIngestPipelinePath+name, "", nil, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please either include the error result here as above, or add a comment why it is skipped in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I included the error

@cwurm cwurm force-pushed the libbeat_default_ingest_pipeline branch from 9055be8 to 7aba709 Compare November 6, 2019 19:20
@cwurm
Copy link
Contributor Author

cwurm commented Nov 6, 2019

@faec Thanks, I've addressed your comments and added TestFileLoader_LoadDefaultPipeline in load_test.go and TestDefaultPipeline in template_test.go. There is also the new pipeline.go but it's not doing much so wouldn't say it needs a pipeline_test.go.

Let me know what you think. :-)

Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Approved

@cwurm
Copy link
Contributor Author

cwurm commented Dec 5, 2019

Now that we have a final_pipeline in Elasticsearch (elastic/elasticsearch#49470), I think it might make more sense to set that instead of default_pipeline. That way, the enrichments are ensured to run, making it easier to develop content (e.g. alerts, dashboards) using those fields.

If we change to final_pipeline, this would then only work for ES 7.5+ - but I think that's ok.

What do others think? @urso @tsg @andrewkroh @faec?

@tsg
Copy link
Contributor

tsg commented Dec 5, 2019

If we make it a final_pipeline, for the modules that already do GeoIP enhancement, we could end up doing it twice? Would it then remove the enhancement from all modules where it already exists?

Other than that, final_pipeline does seem to match better the use case here.

@cwurm cwurm force-pushed the libbeat_default_ingest_pipeline branch from 7aba709 to 5160f91 Compare February 6, 2020 14:09
@cwurm cwurm force-pushed the libbeat_default_ingest_pipeline branch from 5160f91 to 986f3ac Compare February 6, 2020 15:25
@cwurm cwurm force-pushed the libbeat_default_ingest_pipeline branch from 986f3ac to 9ad7b8a Compare February 6, 2020 15:30
@cwurm
Copy link
Contributor Author

cwurm commented Feb 6, 2020

@tsg @urso @andrewkroh @faec I've updated this to set final_pipeline instead of default_pipeline. Please have a look.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. The final_pipeline concept is very useful. I didn't scrutinize the code, but the pipeline looks good.


"github.com/elastic/beats/libbeat/common"
"github.com/elastic/beats/libbeat/common/cfgwarn"
commonP "github.com/elastic/beats/libbeat/common/pipeline"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package names should be lowercase.

@simitt
Copy link
Contributor

simitt commented Feb 7, 2020

APM Server currently has a mechanism to load a set of default pipelines to ES on startup and applying them by default on event ingestion. This current approach does not use index pipelines, therefore it is definitely interesting to investigate how to best deprecate and move over to this new handling. I can look into that in more detail.

The pipelines partially overlap with the default pipelines introduced in this PR, but APM Server additionally createa a user-agent pipeline. Not necessarily a requirement for this PR, but have you thought about how a beat could extend the beatsFinalPipeline? It would be great to have a mechanism to add beats specific processor to this pipeline.

if v[idx], err = fixYAMLMaps(value); err != nil {
return nil, err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a default case erroring out, just in case.

@@ -102,6 +103,39 @@ func (l *ESLoader) Load(config TemplateConfig, info beat.Info, fields []byte, mi
return nil
}

// LoadFinalPipeline loads the final pipeline.
func (l *ESLoader) LoadFinalPipeline(config TemplateConfig, info beat.Info) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing a 'good' package for loading pipelines. Maybe we can introduce it to the idxmgmt package instead of template? The idxmgmt package configures lifecycle.name and lifecycle.rollover_alias when creating the TemplateConfig.

@urso
Copy link

urso commented Feb 13, 2020

@simitt Would this idea fit apm-server needs? If so, do you think we can proceed here, and create a follow up issue to make the final pipeline more extensible?

Depending on the capabilities of the pipeline we might need a version check, plus the pipeline contents. One option to make it somewhat extensible would be to pass the default final pipeline to idxmgmt.MakeDefaultSupport. The pipeline also needs a version check. Maybe we could pass something like this to MakeDefaultSupport:

type PipelineConfig struct {
  // configures default pipeline name. libbeat chooses a name if empty
  Name string  
  
  // raw JSON (or YAML?)
  Definition string 

  // CheckVersion returns an error if the Elasticsearch version does not match the required version. Can be `nil`.
  CheckVersion func(common.Version) error

  // Check returns an error if the pipeline is not compatible with Elasticsearch (e.g. check license or for other capabilies?)
  Check func(*elasticsearch.Client) error
}

All setup should rely on this type being passed where it is required. For pipeline handling in general we might want to introduce a package named libbeat/idxmgmt/pipeline, so to move even more index management related functionality into one umbrella package. As we do for ILM already, it would be helpful if the template does not really know about the final pipeline, but the idxmgmt package would add the right settings to TemplateConfig, once the pipeline has been setup.

@simitt
Copy link
Contributor

simitt commented Feb 14, 2020

@urso your suggestion generally sounds reasonable to me.

Regarding moving forward with this PR - it is introducing some breaking changes to exported interfaces in the idxmgmt package, which require some changes on the APM Server side. That shouldn't be a big issue, as long as the changes do not introduce user facing breaking changes. As pointed out in above comment, we currently have a dedicated setup for registering pipelines on startup and have them set as default for output.elasticsearch.apm-server. Is my understanding correct that the index pipelines would be ignored in this case? If so I think moving forward with this PR and adding the changes you suggest in a follow up is fine.

@urso urso added candidate Candidate to be added to the current iteration Team:Services (Deprecated) Label for the former Integrations-Services team labels Mar 3, 2020
@botelastic
Copy link

botelastic bot commented Jul 17, 2020

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it in as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 17, 2020
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@botelastic botelastic bot removed the Stalled label Jul 17, 2020
@botelastic
Copy link

botelastic bot commented Aug 16, 2020

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it in as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Aug 16, 2020
@botelastic
Copy link

botelastic bot commented Sep 15, 2020

Hi!
This PR has been stale for a while and we're going to close it as part of our cleanup procedure.
We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team.
Feel free to re-open this PR if you think it should stay open and is worth rebasing.
Thank you for your contribution!

@botelastic botelastic bot closed this Sep 15, 2020
@zube zube bot removed the [zube]: Done label Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
candidate Candidate to be added to the current iteration libbeat review Stalled Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants