Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove event.ingested processor from ingest pipelines #4462

Closed
gsantoro opened this issue Oct 14, 2022 · 10 comments
Closed

Remove event.ingested processor from ingest pipelines #4462

gsantoro opened this issue Oct 14, 2022 · 10 comments
Assignees

Comments

@gsantoro
Copy link
Contributor

As part of a PR it has been brought to my attention that we shouldn't add event.ingested to ingest pipelines since that field is already added by a default fleet pipeline.

According to @andrewkroh here and here, we shouldn't add the field event.ingested in individual integrations since this fleet pipeline runs both in managed and standalone mode.

I would like to use this issue to agree on this topic and create some action points like:

  • add this suggestion to our internal docs so that we can refer to them in the future
  • create PR to remove event.ingested if necessary from ingest pipeline in integrations.
@ruflin
Copy link
Contributor

ruflin commented Oct 17, 2022

I want to take this as a chance to take a step back. Why do we need event.ingested in the first place? My understanding is that it is useful for monitoring (ingestion delay) and for security purpose. On the second point, I assume @andrewkroh can add a few more bits on how it is used.

Both of the above are not package related but "centralised" concers. Also the addition of the data can only happen centrally. Based on this, I agree with not adding it to every package and have this as part of our docs.

As it is a centralised concern, who should be in control of it? Is it a global setting, is it per package or per data stream? One thing to consider is that timestamps in general have quite a bit of overhead as these cannot be compressed well (@jpountz might have here some more details).

Few questions that come to mind:

  • Should we allow users to turn on / off this feature?
  • Is the final pipeline the right place for this or are there better places like mappings, setting on a data stream?
  • Who is responsible for the mapping of event.ingested?
  • Does event.ingested have to be indexed or is stored enough?

@gsantoro If nobody objects, it seems we can move forward with your suggestions but I would like to keep this open to have a follow up discussions on the long term of event.ingested.

@scunningham
Copy link

I also am curious as to what we are using event.ingested for? It does seem useful for diagnostic reasons; it is interesting when there is a large lag between event creation and delivery. This may affect the efficacy of the SIEM as it is time window bound. May also demonstrate a clock skew issue.

However, does it necessarily have to be mapped? If it is not mapped, then the compression overhead concerns go away. We do still have pipeline overhead concerns.

@ruflin ruflin self-assigned this Oct 17, 2022
@ruflin
Copy link
Contributor

ruflin commented Nov 1, 2022

@gsantoro Is there any follow up issue or PR for your action points to track the progress?

@joshdover Pinging you on this one for awareness as event.ingested seems to be a more generic ingest challenge together with the final pipeline. @axw Is apm using event.ingested in any way?

@gsantoro
Copy link
Contributor Author

gsantoro commented Nov 1, 2022

Hello @ruflin ,
regarding the PR that created this discussion in the first place, it has now been merged as it was (without the event.ingested) as agreed.

I assume that if we agree to remove event.ingested from all the ingest pipeline, this would not happen in a single PR but would probably involve multiple teams according to the CODEOWNERS responsibilities.

@andrewkroh
Copy link
Member

andrewkroh commented Nov 1, 2022

I also am curious as to what we are using event.ingested for?

These are the use cases that I am aware of.

  • It is used to prevent evasion of the detection rules 1 2 that look at all data from the Agents for host masquerading. We use this as a trusted timestamp for a means of iterating over data in roughly the order in which it was ingested.
  • In general, the detection engine relies on event.ingested to deal with documents that are delayed 3.
  • Diagnosing ingestion lag via analysis of differences between various timestamps.

Footnotes

  1. https://github.com/elastic/detection-rules/blob/46d5e37b76f8f55287b92f27fdc3da089c60c520/rules/cross-platform/defense_evasion_agent_spoofing_mismatched_id.toml

  2. https://github.com/elastic/detection-rules/blob/46d5e37b76f8f55287b92f27fdc3da089c60c520/rules/cross-platform/defense_evasion_agent_spoofing_multiple_hosts.toml

  3. https://www.elastic.co/guide/en/security/current/alerts-ui-monitor.html#troubleshoot-ingestion-pipeline-delay

@andrewkroh
Copy link
Member

If it is not mapped, then the compression overhead concerns go away.

It is mapped as a date. The value does not contain subseconds to help improve compressibility. 1 2

Footnotes

  1. https://github.com/elastic/beats/issues/22388#issuecomment-857755008

  2. https://github.com/elastic/kibana/pull/104044

@axw
Copy link
Member

axw commented Nov 2, 2022

@axw Is apm using event.ingested in any way?

No. I have used it in the past to calculate ingestion delay, but that was a one-off exercise, not something that is visualised in the product.

@ruflin
Copy link
Contributor

ruflin commented Dec 6, 2022

The way I read the above comments, there are specific features which need event.ingested. But these are not features specific to integrations but in general to data ingestion. We can currently enable on data coming from Elastic Agent that uses integrations (my understanding is final pipeline is only applied for integrations) but if used, it would be ideal that it is applied on all data (at least all data coming in for the data stream naming scheme).

event.ingested is a feature that a user or a feature should be able to run on / off based on user needs. It can be turned on globally for evasion detection rules and more locally on some data streams for detecting ingest lag. This sounds a lot like an "ingest/fleet" feature @amitkanfer .

My current take is, we need to continue the discussion around this feature but with a more holistic view as part of ingest and is not tied to any of the integrations. Because of this, I'm closing this issue but I think we need a separate issue for further discussion.

@gsantoro @tommyers-elastic Can we ensure that the decision not adding event.ingested is documented? #4462 (comment) @jsoriano Should we even have a check?

@ruflin ruflin closed this as completed Dec 6, 2022
@jsoriano
Copy link
Member

jsoriano commented Dec 6, 2022

Should we even have a check?

A check to disallow adding event.ingested from integrations? This could be an option, yes.

@ruflin
Copy link
Contributor

ruflin commented Dec 6, 2022

@jsoriano Yes. Or in a way that someone could still overwrite it but has to set a flag. But we can do that if it is actually needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants