Round metrics timestamps to the second #22388

Closed
exekias opened this issue Nov 3, 2020 · 6 comments
Labels
discuss Issue needs further discussion. enhancement Metricbeat Metricbeat Stalled Team:Integrations Label for the Integrations team

Comments

@exekias
Contributor

exekias commented Nov 3, 2020

Describe the enhancement:

We have been doing some storage benchmarking for the metrics use case. One of the outcomes was that we could potentially save a fair amount of space if we drop some precision in our timestamps, for instance, by rounding them to the second.

I think this may be fair for the metrics coming out of Metricbeat: we normally collect every 10s, so users probably don't care about subsecond precision on their events.

This issue is meant to discuss that, perhaps do some more testing focused on this particular change, and make a decision based on the results.

@exekias exekias added enhancement discuss Issue needs further discussion. Metricbeat Metricbeat Team:Integrations Label for the Integrations team labels Nov 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@exekias
Contributor Author

exekias commented Nov 12, 2020

Something we will need to check with this change:

It may happen that by rounding we increase the possibility of skewing values across buckets when doing a date_histogram aggregation.

Right now it's very unlikely that data points get misaligned in a way that we end up with 2 points in one bucket and 0 points in the next one. By rounding I think we increase the chance of this happening, so we need to test for this.
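
One rough way to test this before touching any Beats code is to simulate it: have a fleet of hosts each emit a point every 10s with a bit of collection jitter, round the timestamps, and count how often a fixed 10s bucket ends up with 2 points (and a neighbor with 0). A self-contained Go sketch; the ±500ms jitter, host count, and series length are made-up parameters:

package main

import (
	"fmt"
	"math/rand"
	"time"
)

const interval = 10 * time.Second

// bucket assigns a timestamp to a fixed, epoch-aligned 10s bucket, the way
// a date_histogram with a fixed 10s interval would.
func bucket(t time.Time) int64 { return t.UnixMilli() / 10_000 }

// misaligned counts buckets in one series that received a number of points
// other than one (the "2 points in one bucket, 0 in the next" case).
func misaligned(points []time.Time) int {
	counts := map[int64]int{}
	for _, p := range points {
		counts[bucket(p)]++
	}
	n := 0
	for _, c := range counts {
		if c != 1 {
			n++
		}
	}
	return n
}

func main() {
	start := time.Date(2020, 11, 3, 0, 0, 0, 0, time.UTC)
	rawTotal, roundedTotal := 0, 0

	for series := 0; series < 1000; series++ {
		// Each simulated host starts its 10s schedule at a random phase.
		phase := time.Duration(rand.Int63n(int64(interval)))
		var raw, rounded []time.Time
		for i := 0; i < 100; i++ {
			// Up to ±500ms of collection jitter around the nominal tick.
			jitter := time.Duration(rand.Int63n(int64(time.Second))) - 500*time.Millisecond
			ts := start.Add(phase + time.Duration(i)*interval + jitter)
			raw = append(raw, ts)
			rounded = append(rounded, ts.Round(time.Second)) // the proposed rounding
		}
		rawTotal += misaligned(raw)
		roundedTotal += misaligned(rounded)
	}

	fmt.Println("misaligned buckets without rounding:", rawTotal)
	fmt.Println("misaligned buckets with rounding:   ", roundedTotal)
}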

@andrewkroh
Member

andrewkroh commented Jun 9, 2021

@jpountz Are the benefits of rounding that there are fewer unique data points so we get better compression?

How would this be implemented on the Beats side? Do we need to pass any additional settings in our index mappings for the date field?

Would the Beat simply round 2021-04-13T13:51:38.123Z to 2021-04-13T13:51:38.000Z? Or would it need to truncate the subseconds entirely (e.g. 2021-04-13T13:51:38Z)?

I was thinking about this for the logging use case where we are adding event.ingested for the Security Detection engine. We don't really need the precision in that case either.

@jpountz
Contributor

jpountz commented Jun 9, 2021

Are the benefits of rounding that there are fewer unique data points so we get better compression?

It's mostly helpful because Lucene has logic to detect when all values share a common divisor. In that case, if all timestamps were rounded to whole seconds, it would notice that the stored millisecond values are all multiples of 1000 and compress accordingly.
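
Lucene's actual doc-values encoding is more involved than this, but the common-divisor idea can be sketched in a toy Go program (nothing here is Lucene code):

package main

import "fmt"

// gcd computes the greatest common divisor of two non-negative values.
func gcd(a, b int64) int64 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func main() {
	// Millisecond timestamps rounded to the second: all multiples of 1000.
	values := []int64{1617889891000, 1617889901000, 1617889911000}

	g := values[0]
	for _, v := range values[1:] {
		g = gcd(g, v)
	}
	fmt.Println("common divisor:", g) // 1000

	// Storing value/g instead of value needs roughly log2(g) fewer bits
	// per value (~10 bits here), which is where the savings come from.
	for _, v := range values {
		fmt.Println(v / g)
	}
}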

Would the Beat simply round 2021-04-13T13:51:38.123Z to 2021-04-13T13:51:38.000Z? Or would it need to truncate the subseconds entirely (e.g. 2021-04-13T13:51:38Z)?

Both would work. I have a slight preference for the latter, which better conveys that we don't care about milliseconds, but the former would work equally well.
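
For the Beats side, both options are a one-liner on top of Go's standard library; a minimal sketch of the two serializations from the question above:

package main

import (
	"fmt"
	"time"
)

func main() {
	ts := time.Date(2021, 4, 13, 13, 51, 38, 123_000_000, time.UTC)
	trunc := ts.Truncate(time.Second) // drop the subsecond part

	// Option 1: keep the millisecond field, zeroed out.
	fmt.Println(trunc.Format("2006-01-02T15:04:05.000Z07:00")) // 2021-04-13T13:51:38.000Z

	// Option 2: serialize without the fractional seconds.
	fmt.Println(trunc.Format("2006-01-02T15:04:05Z07:00")) // 2021-04-13T13:51:38Z
}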

I was thinking about this for the logging use case where we are adding event.ingested for the Security Detection engine. We don't really need the precision in that case either.

Let's do the same with event.ingested then. :)

@andrewkroh
Member

I'm thinking about how to implement the removal of subseconds from the data. How does this look as an implementation?

I was hoping I could do this in one step, but the script context doesn't have access to the _ingest metadata. So I copy the timestamp, which is a ZonedDateTime, and then reformat it.

PUT _ingest/pipeline/truncate-event-ingested
{
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "copy_from": "_ingest.timestamp"
      }
    },
    {
      "script": {
        "source": "ctx.event.ingested = ctx.event.ingested.withNano(0).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);"
      }
    }
  ]
}

POST _ingest/pipeline/truncate-event-ingested/_simulate
{
  "docs": [
    {
      "_source": {
        "hello": "world"
      }
    }
  ]
}

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_id" : "_id",
        "_source" : {
          "hello" : "world",
          "event" : {
            "ingested" : "2021-06-30T12:06:28Z"
          }
        },
        "_ingest" : {
          "timestamp" : "2021-06-30T12:06:28.962709798Z"
        }
      }
    }
  ]
}

andrewkroh added a commit to andrewkroh/kibana that referenced this issue Jul 1, 2021
The `event.ingested` field is added to all documents ingested via
Fleet and Agent. By removing the subseconds we get better compression
of the values in Elasticsearch.

The primary user of `event.ingested` today is the Security Detection Engine,
as a tie-breaker in search_after, but once it moves to using the
point-in-time API the need for precision will be lessened because PIT has
an implicit tie-breaker.

Relates elastic#103944
Relates elastic/beats#22388
andrewkroh added a commit to elastic/kibana that referenced this issue Aug 3, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this issue Aug 3, 2021
kibanamachine added a commit to elastic/kibana that referenced this issue Aug 3, 2021
streamich pushed a commit to vadimkibana/kibana that referenced this issue Aug 8, 2021
@botelastic

botelastic bot commented Jun 30, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
