
fix: skip points outside of archive retention on first pass #82

Merged · 19 commits into main from jvp/remove-points-outside-of-retention-on-first-pass · Sep 28, 2023

Conversation

@jesusvazquez (Member) commented Sep 25, 2023

We realized that during the first pass over an archive we were counting points that fall outside of its retention, which can skew some calculations. This change fixes the code to only include points within the expected archive boundaries, i.e. between the archive's min and max timestamps.

Previously, we kept points in a lower archive if that exact timestamp did not exist in a higher-resolution archive. This was incorrect because the lower archive might be a SUM aggregation, which results in spike points amidst the original high-resolution points. Instead, we only take points from an archive if no higher-resolution archive covers that entire time span. Similarly, we do not keep any points in a high-resolution archive if those points are covered by a lower-resolution archive. In effect, we always know which points are valid based on the boundaries between the archives:

   MaxT     MinT
0: [XXXXXXXXXX] 1d, 1m
1: [           XXXXXXXXXXXXXXXXX] 1w, 10m
2: [                            XXXXXXXXXXXXX]  1y, 60m

Based on this new approach, we can greatly reduce the amount of logic in the ReadSamples function and make it much faster and more space-efficient. We also discovered new edge cases and added tests for those.
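
As a minimal sketch of that boundary rule (the names and types here are hypothetical illustrations, not the converter's actual API), each archive's valid window follows directly from the newest timestamp and the archives' retentions:

package converter // hypothetical package, for illustration only

// bounds is the half-open window (MinTs, MaxTs] that one archive is
// authoritative for.
type bounds struct{ MinTs, MaxTs int64 }

// archiveBounds derives each archive's valid window from the newest
// timestamp ("now") and each archive's retention in seconds, with
// archives ordered from highest to lowest resolution.
func archiveBounds(retentions []int64, newestTs int64) []bounds {
	out := make([]bounds, len(retentions))
	upper := newestTs
	for i, ret := range retentions {
		lower := newestTs - ret // oldest timestamp still inside this retention
		out[i] = bounds{MinTs: lower, MaxTs: upper}
		upper = lower // the next, coarser archive takes over below this boundary
	}
	return out
}

For the archives in the diagram, archive 0 would get the most recent day, archive 1 the span from one day to one week back, and archive 2 the span from one week to one year back.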

@jesusvazquez jesusvazquez marked this pull request as ready for review September 25, 2023 17:47
@jesusvazquez jesusvazquez requested a review from a team as a code owner September 25, 2023 17:47
@jesusvazquez jesusvazquez marked this pull request as draft September 25, 2023 17:47
// If we have already seen a point with the same timestamp it means
// we already have a point from an archive with higher precision that
// we want to keep. So we skip this point.
if _, ok := seenTs[p.Timestamp]; ok {
@jesusvazquez (Member, Author) commented Sep 26, 2023

@pstibrany and @ywwg This solves problem 2 that we discussed yesterday where we want to keep points from prior archives because they have higher resolutions.


for _, p := range archivePoints {
	if p.Timestamp < minArchiveTs {
		continue
@jesusvazquez (Member, Author):

@pstibrany and @ywwg This solves problem 1 where we just drop all points that are beyond the retention of the archive but still present in the ring buffer.

// may overlap and have older points
sort.Slice(keptPoints, func(i, j int) bool {
	return keptPoints[i].Timestamp < keptPoints[j].Timestamp
})
@jesusvazquez (Member, Author):

With the two additions above, we now only need to sort once.

@jesusvazquez jesusvazquez marked this pull request as ready for review September 26, 2023 10:25
@fionaliao (Contributor) left a comment:

LGTM, but I'm not too familiar with this code so you probably want another approval from Owen/Peter too.

@pstibrany (Member) commented:

As mentioned privately, I don't think it solves the problem that we're facing. With archives and samples like this:

Archive #0:

218	1663891200	2022-09-23T00:00:00Z
217	1663891320	2022-09-23T00:02:00Z
216	1663891440	2022-09-23T00:04:00Z
214	1663891560	2022-09-23T00:06:00Z
213	1663891680	2022-09-23T00:08:00Z

Archive #1:

651	1663891200	2022-09-23T00:00:00Z
427	1663891500	2022-09-23T00:05:00Z

I believe the end result will still be:

218	1663891200	2022-09-23T00:00:00Z
217	1663891320	2022-09-23T00:02:00Z
216	1663891440	2022-09-23T00:04:00Z
427	1663891500	2022-09-23T00:05:00Z *** we don't want this sample ***
214	1663891560	2022-09-23T00:06:00Z
213	1663891680	2022-09-23T00:08:00Z

Using archives:

 "Archives": [
    {
      "Offset": 40,
      "SecondsPerPoint": 60,
      "Points": 1051200 // 60s * 1051200 points = 730 days
    },
    {
      "Offset": 12614440,
      "SecondsPerPoint": 300,
      "Points": 735840 // 300s * 735840 points = 2555 days
    }
  ]

@jesusvazquez (Member, Author) commented Sep 26, 2023

@pstibrany I've added 647f350 together with a unit test where I take the previous archives' max timestamps into account to skip points in subsequent archives. Let's try again with these changes.

@jesusvazquez jesusvazquez force-pushed the jvp/remove-points-outside-of-retention-on-first-pass branch from 07d14e0 to 647f350 Compare September 26, 2023 11:31
I'm trying to achieve something like this

   MaxT     MinT
0: [XXXXXXXXXX] 1d, 1m
1: [           XXXXXXXXXXXXXXXXX] 1w, 10m
2: [                            XXXXXXXXXXXXX]  1y, 60m

So we process archive 0 first and track archive 0's min ts; then, while
processing archive 1, we discard all samples newer than that min ts and
keep the remaining ones.
Comment on lines 71 to 79
for _, p := range archivePoints {
	// We want to track the max timestamp of the archive because we know
	// it virtually represents now() and we won't have newer points.
	// The min timestamp of the archive is then maxTs minus the archive
	// retention.
	if p.Timestamp > maxArchiveTs {
		maxArchiveTs = p.Timestamp
	}
}
Member:

Why do we do this for each archive again and again?

Don't all archives have the same "now" and they only differ in their retention period?

Member:

Ah, you're right. We had seen the "offset" value in the database dump but that is a byte offset, not a timestamp offset.

pkg/graphite/convert/whisperconverter/whisper.go (outdated, resolved)
// If we are already in the second or subsequent archive and we had
// some points in the prior archives, we want to skip samples that are
// already covered by those prior archives.
maxArchiveTs = lastMinTs
Member:

We update maxArchiveTs here, but this variable isn't used afterwards.

lastMinTs = minArchiveTs

for _, p := range archivePoints {
	if p.Timestamp < minArchiveTs {
Member:

Shall we check if minArchiveTs <= p.Timestamp && p.Timestamp <= maxArchiveTs holds here?

If we do that, we don't need to keep seenTs.

Member:

Actually, the check for minArchiveTs should be exclusive, i.e. minArchiveTs < p.Timestamp must hold.

Member:

Because points are written to all archives at once, all archives will contain points from timestamps that are within their retentions, so we do need a check like this. I think we can do it more efficiently if we pre-sort the points and then do a double-index comparison.
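
For illustration, a minimal sketch of such a double-index (two-pointer) comparison, assuming both slices are already sorted by timestamp; the point type and function name are hypothetical, not the converter's code:

package converter // hypothetical package, for illustration only

type point struct {
	Timestamp int64
	Value     float64
}

// mergeSkipDuplicates merges two timestamp-sorted slices in O(n+m),
// preferring the higher-resolution sample whenever both archives have a
// point at the same timestamp. This removes the need for a seenTs map.
func mergeSkipDuplicates(highRes, lowRes []point) []point {
	out := make([]point, 0, len(highRes)+len(lowRes))
	i, j := 0, 0
	for i < len(highRes) && j < len(lowRes) {
		switch {
		case highRes[i].Timestamp < lowRes[j].Timestamp:
			out = append(out, highRes[i])
			i++
		case highRes[i].Timestamp > lowRes[j].Timestamp:
			out = append(out, lowRes[j])
			j++
		default: // same timestamp: keep the higher-resolution sample
			out = append(out, highRes[i])
			i++
			j++
		}
	}
	out = append(out, highRes[i:]...)
	out = append(out, lowRes[j:]...)
	return out
}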

pkg/graphite/convert/whisperconverter/whisper.go (outdated, resolved)
Comment on lines +259 to +260
name: "test retention when first archives are empty",
metricName: "mymetric",
Member:

This seems like an unrealistic scenario to me. If the first archive (with raw data) is empty, how can there be any aggregations at all?

Member:

An end-to-end test broke, so we added this test to cover that edge case.

Member:

👍

@ywwg ywwg requested a review from pstibrany September 27, 2023 16:46
@ywwg (Member) commented Sep 27, 2023

I think the logic checking to see if the point is a duplicate timestamp is redundant -- by definition we are only looking at points for timestamps where that archive is the highest resolution block. We can be much more efficient -- start with the oldest block and just blast through the points that are within the correct bounds

@ywwg (Member) commented Sep 27, 2023

A much faster algo (sketched below) is:

  • determine maxTs by finding the highest ts in the first archive
  • calculate ts bounds for each archive
  • loop through, starting at lowest resolution block.
    • sort its points
    • only add points to Kept that are within the bounds

done
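
A minimal sketch of those steps, under the same caveat that the types and names (point, archive, readSamples) are hypothetical stand-ins for the converter's actual code:

package converter // hypothetical package, for illustration only

import "sort"

type point struct {
	Timestamp int64
	Value     float64
}

type archive struct {
	Retention int64 // SecondsPerPoint * Points, in seconds
	Points    []point
}

// readSamples derives maxTs from the first (highest-resolution) archive,
// computes each archive's bounds, then walks the archives from coarsest
// to finest so the kept points come out already ordered oldest to newest.
func readSamples(archives []archive) []point {
	if len(archives) == 0 {
		return nil
	}
	// Step 1: maxTs is the highest timestamp in the first archive.
	var maxTs int64
	for _, p := range archives[0].Points {
		if p.Timestamp > maxTs {
			maxTs = p.Timestamp
		}
	}
	var kept []point
	// Steps 2 and 3: per-archive bounds, then keep only in-bounds points.
	for i := len(archives) - 1; i >= 0; i-- {
		lower := maxTs - archives[i].Retention // exclusive lower bound
		upper := maxTs
		if i > 0 {
			// Everything newer than this belongs to the finer archive.
			upper = maxTs - archives[i-1].Retention
		}
		pts := append([]point(nil), archives[i].Points...)
		sort.Slice(pts, func(a, b int) bool { return pts[a].Timestamp < pts[b].Timestamp })
		for _, p := range pts {
			if p.Timestamp > lower && p.Timestamp <= upper {
				kept = append(kept, p)
			}
		}
	}
	return kept
}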

@pstibrany (Member) left a comment:

Thanks, this looks good.

I think we have a tiny mistake in the bounds computation (an archive with 10 points, max timestamp 100, and a resolution of 1 should NOT include point 90).
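
(A worked version of that example, not from the thread itself: minTs = maxTs - points x secondsPerPoint = 100 - 10 x 1 = 90, and the ten valid slots are timestamps 91 through 100, so the lower bound must be exclusive and point 90 falls outside retention.)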

pkg/graphite/convert/whisperconverter/whisper.go (outdated, resolved)
continue
// Don't include any points in this archive that were covered in a higher
// resolution archive.
if p.Timestamp >= lastMinTs {
Member:

Similarly:

Suggested change:
- if p.Timestamp >= lastMinTs {
+ if p.Timestamp > lastMinTs {

Member:

Ahh, I was wondering if I'd got that switched. Thanks.


@ywwg (Member) commented Sep 28, 2023

thanks for the bounds-check fix!

@ywwg (Member) commented Sep 28, 2023

Explaining the new tests I had to write: this is a valid archive:

Meta data:
  aggregation method: sum
  max retention: 2012232704

Archive 0 info:
  offset: 52
  seconds per point: 1
  points: 86400
  retention: 86400
  size: 1036800

Archive 1 info:
  offset: 1036852
  seconds per point: 3600
  points: 840
  retention: 3024000
  size: 10080

Archive 2 info:
  offset: 1046932
  seconds per point: 86400
  points: 73000
  retention: 6307200000
  size: 876000

Archive 0 data:
Archive 1 data:
Archive 2 data:
0: 1517443200,          1
1: 1517529600,          1
2: 1517616000,          1

@pstibrany (Member) left a comment:

Code makes sense to me. Thanks.

@ywwg ywwg merged commit b4b782c into main Sep 28, 2023
@ywwg ywwg deleted the jvp/remove-points-outside-of-retention-on-first-pass branch September 28, 2023 19:29