fix: skip points outside of archive retention on first pass #82
Conversation
```go
// If we have already seen a point with the same timestamp it means
// we already have a point from an archive with higher precision that
// we want to keep. So we skip this point.
if _, ok := seenTs[p.Timestamp]; ok {
```
@pstibrany and @ywwg This solves problem 2 that we discussed yesterday where we want to keep points from prior archives because they have higher resolutions.
```go
for _, p := range archivePoints {
	if p.Timestamp < minArchiveTs {
		continue
```
@pstibrany and @ywwg This solves problem 1 where we just drop all points that are beyond the retention of the archive but still present in the ring buffer.
```go
// may overlap and have older points
sort.Slice(keptPoints, func(i, j int) bool {
	return keptPoints[i].Timestamp < keptPoints[j].Timestamp
})
```
With the two additions above, we now only need to sort once.
LGTM, but I'm not too familiar with this code, so you probably want another approval from Owen/Peter too.
As mentioned privately, I don't think it solves the problem that we're facing. With archives and samples like this:
Archive #0:
Archive #1:
I believe the end result will still be:
Using archives:
@pstibrany I've added 647f350 together with a unit test, where I precisely take into account the previous archives' maxTs to skip points in subsequent archives. Let's try again with these changes.
Signed-off-by: Jesus Vazquez <[email protected]>
Force-pushed from 07d14e0 to 647f350
I'm trying to achieve something like this:

```
   MaxT                                  MinT
0: [XXXXXXXXXX]                               1d, 1m
1: [          XXXXXXXXXXXXXXXXX]              1w, 10m
2: [                           XXXXXXXXXXXXX] 1y, 60m
```

So we process archive 0 first, and then we track archive 0's min ts, and while processing archive 1 we discard all samples up to that min ts and grab the remaining ones.
```go
for _, p := range archivePoints {
	// We want to track the max timestamp of the archive because we know
	// it virtually represents now() and we won't have newer points.
	// Then the min timestamp of the archive would be maxTs - the archive
	// retention.
	if p.Timestamp > maxArchiveTs {
		maxArchiveTs = p.Timestamp
	}
}
```
Why do we do this for each archive again and again?
Don't all archives have the same "now" and they only differ in their retention period?
Ah, you're right. We had seen the "offset" value in the database dump but that is a byte offset, not a timestamp offset.
```go
// if we are already in the second or subsequent archive and we had
// some points in the prior archives, we want to skip
// samples in the previous archives
maxArchiveTs = lastMinTs
```
We update maxArchiveTs here, but this variable isn't used afterwards.
```go
lastMinTs = minArchiveTs
```

```go
for _, p := range archivePoints {
	if p.Timestamp < minArchiveTs {
```
Shall we check if minArchiveTs <= p.Timestamp && p.Timestamp <= maxArchiveTs holds here? If we do that, we don't need to keep seenTs.
Actually, the check for minArchiveTs should be exclusive, i.e. minArchiveTs < p.Timestamp must hold.
Because points are written to all archives at once, all archives will contain points from timestamps that are within their retentions, so we do need a check like this. I think we can do it more efficiently if we pre-sort the points and then do a double-index comparison.
```go
name:       "test retention when first archives are empty",
metricName: "mymetric",
```
This seems like an unrealistic scenario to me. If the first archive (with raw data) is empty, how can there be any aggregations at all?
An end-to-end test broke, so we added this test to cover that edge case.
👍
I think the logic checking to see if the point is a duplicate timestamp is redundant: by definition we are only looking at points for timestamps where that archive is the highest-resolution block. We can be much more efficient: start with the oldest block and just blast through the points that are within the correct bounds.
a much faster algo is:
done
…s, not the archives themselves
Thanks, this looks good.
I think we have a tiny mistake in the bounds computation (an archive with 10 points, max timestamp 100, and resolution 1 should NOT include point 90).
```go
continue
```

```go
// Don't include any points in this archive that were covered in a higher
// resolution archive.
if p.Timestamp >= lastMinTs {
```
Similarly:

```diff
- if p.Timestamp >= lastMinTs {
+ if p.Timestamp > lastMinTs {
```
Ahh, I was wondering if I'd got that switched. Thanks.
```go
name:       "test retention when first archives are empty",
metricName: "mymetric",
```
👍
thanks for the bounds-check fix!
Explaining the new tests I had to write, this is a valid archive:
Code makes sense to me. Thanks.
We realized that during the first pass over an archive we were including points outside of its retention, which can mess up some calculations. So we are fixing the code to only include points within the expected archive boundaries (min timestamp and max timestamp).
Previously, we kept points from a lower archive if that exact timestamp did not exist in a higher-resolution archive. This was incorrect because the lower archive might be a SUM aggregation, which results in spike points amidst the original high-res points. Instead, we only take points from an archive if no higher-resolution archive covers that entire time span. Similarly, we do not keep any points in a high-res archive if those points are covered by a lower-res archive. In effect, we always know which points are valid based on the boundaries between the archives:
Based on this new approach, we can greatly reduce the amount of logic in the ReadSamples function and make it much faster and more space-efficient. We also discovered new edge cases and added tests for those.