Round metrics timestamps to the second #22388
Pinging @elastic/integrations (Team:Integrations)
Something we will need to check with this change: it may happen that by rounding we increase the possibility of skewing values within buckets when doing a date_histogram agg. Right now it's very unlikely that data points get misaligned in a way that we end up with 2 points in one bucket and 0 points in the next one. By rounding I think we increase the chance of this happening, so we need to test for this.
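To test for this, a query along these lines could be used to compare bucket counts before and after rounding; this is only a sketch, and the index pattern, field, and interval are illustrative rather than the actual benchmark setup:

```json
POST metrics-*/_search
{
  "size": 0,
  "aggs": {
    "per_collection_period": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10s"
      }
    }
  }
}
```

Worth noting that the risk seems to apply mainly to rounding to the *nearest* second: a sample taken at 00:00:19.6 becomes 00:00:20 and the next one at 00:00:29.4 becomes 00:00:29, so both land in the [00:00:20, 00:00:30) bucket and the previous bucket gets nothing from this pair. Truncating (always rounding down) cannot move a timestamp across a second-aligned bucket boundary.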
@jpountz Are the benefits of rounding that there are fewer unique data points so we get better compression? How would this be implemented on the Beats side? Do we need to pass any additional settings in our index mappings for the […]? Would the Beat simply round […]? I was thinking about this for the logging use case where we are adding […].
It's mostly helpful because Lucene has logic to detect when all values have common multiples. In that case, if all timestamps were seconds, it would notice that they are all multiples of 1000 and compress accordingly.
Both would work. I have a slight preference for the latter, which better conveys that we don't care about milliseconds, but the former would work equally well.
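If the mapping-side option is the one preferred here, a minimal sketch of what it could look like is below; the field name and the `date_time_no_millis` choice are assumptions. Note that a millisecond-free `format` only changes how values are parsed and rendered (timestamps with a fractional second would be rejected at index time), so the shipper still has to emit second-precision values to get the compression benefit.

```json
PUT metrics-example
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "date_time_no_millis"
      }
    }
  }
}
```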
Let's do the same with […].
I'm thinking about how to implement the removal of sub-second precision from the data. How does this look for an implementation? I was hoping I could do this in one step, but the script context doesn't have access to the […].
```json
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_id" : "_id",
"_source" : {
"hello" : "world",
"event" : {
"ingested" : "2021-06-30T12:06:28Z"
}
},
"_ingest" : {
"timestamp" : "2021-06-30T12:06:28.962709798Z"
}
}
}
]
}
```
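For reference, a `_simulate` request along these lines could produce the output above. This is only a sketch of the two-step approach (copy `{{_ingest.timestamp}}` with a `set` processor, then truncate it in a Painless `script` processor); the exact template rendering and Painless may need adjusting:

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "description": "Copy the full-precision ingest timestamp into event.ingested",
          "field": "event.ingested",
          "value": "{{_ingest.timestamp}}"
        }
      },
      {
        "script": {
          "description": "Drop the sub-second part of event.ingested",
          "lang": "painless",
          "source": "ctx.event.ingested = ZonedDateTime.parse(ctx.event.ingested).truncatedTo(ChronoUnit.SECONDS).format(DateTimeFormatter.ISO_INSTANT);"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "hello": "world"
      }
    }
  ]
}
```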
The `event.ingested` field is added to all documents ingested via Fleet plus Agent. By removing the subseconds we can get better compression of the values in Elasticsearch. The primary user of `event.ingested` today is the Security Detection Engine, as a tie-breaker in search_after, but once it moves to using the point-in-time API the need for precision will be lessened because PIT has an implicit tie-breaker.

Relates #103944
Relates elastic/beats#22388

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Andrew Kroh <[email protected]>
Hi! We're labeling this issue as […].
Describe the enhancement:
We have been doing some storage benchmarking for the metrics use case. One of the outcomes was that we could potentially save a fair amount of space if we drop some precision in our timestamps, for instance, by rounding them to the second.
I think this may be fair for the metrics coming out of Metricbeat; we normally collect every 10s, so users probably don't care about subsecond precision on their events.
This issue is intended to discuss this, maybe do some more testing for this particular change only, and make a decision based on the results.
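As a rough before/after illustration (field values made up), rounding to the second turns the first document below into the second; both still identify the same 10-second collection period:

```json
[
  { "@timestamp": "2020-11-03T10:15:30.974Z", "system": { "cpu": { "total": { "pct": 0.12 } } } },
  { "@timestamp": "2020-11-03T10:15:30Z", "system": { "cpu": { "total": { "pct": 0.12 } } } }
]
```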