counter reset on aggregation boundaries #1568
Comments
I don't think that's the output that I did notice that thanos/pkg/compact/downsample/downsample.go Line 594 in 99bc7d2
@brian-brazil : thanks for looking at this. Are the points you posted from a single aggregation chunk? If not, where did the aggregation boundary fall?
Oops, got my timestamps wrong. That's two counter chunks, with two samples each.
Here's a diagram that may better illustrate the potential issue. It shows two sections of the same time series, where a counter reset occurred between times 3 and 4. If that also happens to be the boundary between aggregated chunks, the reset will effectively get lost when using the aggregated counter state: the reset check will compare the state of the second section (4) to the last value of the first section (3). That could be avoided by recording the 'first value' of an aggregated chunk and using it instead of the counter state.
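The two checks described above can be contrasted in a small sketch. This is not the Thanos code: `chunkSummary`, `resetByCounterState`, and `resetByFirstValue` are hypothetical names, and the values are made up to mimic a reset that falls exactly on a chunk boundary.

```go
package main

import "fmt"

// chunkSummary is a hypothetical, simplified view of one aggregated counter chunk.
type chunkSummary struct {
	firstRaw float64 // first raw value in the chunk (the proposed addition)
	lastRaw  float64 // last raw value in the chunk
	counter  float64 // aggregated counter state at the end of the chunk
}

// resetByCounterState compares the later chunk's counter state with the
// earlier chunk's last raw value.
func resetByCounterState(prev, next chunkSummary) bool {
	return next.counter < prev.lastRaw
}

// resetByFirstValue compares the later chunk's first raw value instead.
func resetByFirstValue(prev, next chunkSummary) bool {
	return next.firstRaw < prev.lastRaw
}

func main() {
	// Illustrative values: the counter reached 10, restarted, and was
	// already back at 12 by the end of the next chunk.
	prev := chunkSummary{firstRaw: 4, lastRaw: 10, counter: 10}
	next := chunkSummary{firstRaw: 2, lastRaw: 12, counter: 12}

	fmt.Println(resetByCounterState(prev, next)) // false: the reset is missed
	fmt.Println(resetByFirstValue(prev, next))   // true: the reset is seen
}
```

The point of the comparison is only that the first raw value carries information that the end-of-chunk counter state no longer has.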
I'm not seeing that in the code, thanos/pkg/compact/downsample/downsample.go Line 582 in bd9aa1b
thanos-io#1568 Signed-off-by: Alfred Landrum <[email protected]>
Yes - here's a unit test that fails due to the missing reset, and also demos storing the first raw value in the chunk to address the issue: |
You're missing this bit of logic in that test: thanos/pkg/compact/downsample/downsample.go Line 523 in bd9aa1b
it's there; it's called via |
Your test is calling |
The posted test doesn't call
neither of which would prevent the chosen boundary from occurring at the same time as a counter reset. So, the test calls |
I see what you're saying. I'm guessing maintaining state across the boundaries would handle this. We'd also want to check for the across-blocks case.
The posted branch includes a proof-of-concept solution: which is to store the first raw value in the aggregated chunk. The first values would also need to be preserved during downsampling of aggregated chunks as well, which is not demonstrated in the posted branch. Preserving the first raw value seems like a minimal change that may require no changes to |
fyi, I plan to put up a PR that adds the first raw value for the first downsampling (as in the posted branch above), and also preserves the first value at subsequent aggregations.
As discussed in thanos-io#1568, storing only the last raw value of a chunk will lose a counter reset when:
a) the reset occurs at a chunk boundary, and
b) the last raw value of the earlier chunk is less than the first aggregated value of the later chunk.

This commit stores the first raw value of a chunk during the initial raw aggregation, and retains it during subsequent aggregations. This is similar to the existing handling for the last raw value of a chunk.

With this change, when counterSeriesIterator iterates over a chunk boundary, it will see the last raw value of the earlier chunk, then the first raw value of the later chunk, and then the first aggregated value of the later chunk. The first raw value will always be less than or equal to the first aggregated value, so the only difference in counterSeriesIterator's output will be the possible detection of a reset and an extra sample after the chunk boundary.

Fixes: thanos-io#1568
Signed-off-by: Alfred Landrum <[email protected]>
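The iteration order in the commit message can be modelled roughly as follows. This is a simplified sketch, not the actual counterSeriesIterator: the `chunk` type and `replay` function are hypothetical. It assumes the last raw value only updates the comparison state for the next chunk and is not itself added to the running total.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for an aggregated counter chunk.
type chunk struct {
	firstRaw float64   // first raw value, stored at the start of the chunk
	counters []float64 // aggregated counter values
	lastRaw  float64   // last raw value, stored at the end of the chunk
}

// replay walks the chunks in the order: first raw value, aggregated values,
// last raw value. It returns the reset-adjusted total and the number of
// resets it detected.
func replay(chunks []chunk) (total float64, resets int) {
	var prev float64
	seen := false
	visit := func(v float64) {
		switch {
		case !seen:
			// The very first value sets the initial counter state.
			total, seen = v, true
		case v < prev:
			// A drop means the counter restarted from zero.
			total += v
			resets++
		default:
			total += v - prev
		}
		prev = v
	}
	for _, c := range chunks {
		visit(c.firstRaw)
		for _, v := range c.counters {
			visit(v)
		}
		// Assumption: the last raw value only sets the state used for the
		// reset check against the next chunk.
		prev = c.lastRaw
	}
	return total, resets
}

func main() {
	// Values taken from the example in the issue description.
	chunks := []chunk{
		{firstRaw: 95, counters: []float64{100}, lastRaw: 100},
		{firstRaw: 97, counters: []float64{107}, lastRaw: 107},
	}
	total, resets := replay(chunks)
	fmt.Println(total, resets) // 207 1
}
```

Because the first raw value (97) is visited right after the earlier chunk's last raw value (100), the drop is visible and the total matches what a walk over the raw samples would produce.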
* store the first raw value of a chunk during downsampling

  As discussed in #1568, storing only the last raw value of a chunk will lose a counter reset when:
  a) the reset occurs at a chunk boundary, and
  b) the last raw value of the earlier chunk is less than the first aggregated value of the later chunk.

  This commit stores the first raw value of a chunk during the initial raw aggregation, and retains it during subsequent aggregations. This is similar to the existing handling for the last raw value of a chunk.

  With this change, when counterSeriesIterator iterates over a chunk boundary, it will see the last raw value of the earlier chunk, then the first raw value of the later chunk, and then the first aggregated value of the later chunk. The first raw value will always be less than or equal to the first aggregated value, so the only difference in counterSeriesIterator's output will be the possible detection of a reset and an extra sample after the chunk boundary.

  Fixes: #1568
  Signed-off-by: Alfred Landrum <[email protected]>

* changelog for #1709
  Signed-off-by: Alfred Landrum <[email protected]>

* adjust existing downsampling tests
  Signed-off-by: Alfred Landrum <[email protected]>

* add counter aggregation comments to CounterSeriesIterator
  Signed-off-by: Alfred Landrum <[email protected]>
* store the first raw value of a chunk during downsampling

  As discussed in thanos-io#1568, storing only the last raw value of a chunk will lose a counter reset when:
  a) the reset occurs at a chunk boundary, and
  b) the last raw value of the earlier chunk is less than the first aggregated value of the later chunk.

  This commit stores the first raw value of a chunk during the initial raw aggregation, and retains it during subsequent aggregations. This is similar to the existing handling for the last raw value of a chunk.

  With this change, when counterSeriesIterator iterates over a chunk boundary, it will see the last raw value of the earlier chunk, then the first raw value of the later chunk, and then the first aggregated value of the later chunk. The first raw value will always be less than or equal to the first aggregated value, so the only difference in counterSeriesIterator's output will be the possible detection of a reset and an extra sample after the chunk boundary.

  Fixes: thanos-io#1568
  Signed-off-by: Alfred Landrum <[email protected]>

* changelog for thanos-io#1709
  Signed-off-by: Alfred Landrum <[email protected]>

* adjust existing downsampling tests
  Signed-off-by: Alfred Landrum <[email protected]>

* add counter aggregation comments to CounterSeriesIterator
  Signed-off-by: Alfred Landrum <[email protected]>

Signed-off-by: Aleksey Sin <[email protected]>
My colleague @aponjavic and I have a question about the counter aggregation implementation. We were wondering if the following scenario could occur in real-world usage:
In this scenario, a CounterSeriesIterator wouldn't treat the A1 -> A2 transition as a reset. I can force this scenario by modifying the unit tests in downsample_test.go; for example, using raw values of: {t: 5, v: 95}, {t: 15, v: 100}, {t: 25, v: 97}, {t: 35, v: 107} and forcing aggregation at 20 time unit boundaries causes 2 aggregation chunks, whose values are:
[]downsample.sample{downsample.sample{t:15, v:100}, downsample.sample{t:15, v:100}, downsample.sample{t:35, v:107}, downsample.sample{t:35, v:107}}
Since the value of 107 is > 100, it seems that any iteration over these wouldn't track the reset that occurred between time 15 and time 25.
If this can actually occur, we imagine that adding a "first raw" sample point in A2 for {t: 25, v: 97}, and then ensuring it's passed through the aggregation iterations, would address the issue.
Please let us know if this is a real possibility, or if we've misunderstood how counter downsampling works!
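The scenario above can be reproduced outside the test suite with a small model. This is a hedged sketch rather than the code in downsample.go: `aggregateCounter` and `dropDetected` are hypothetical helpers, and the windowing rule (fixed windows aligned to multiples of the window size) is an assumption made for illustration.

```go
package main

import "fmt"

type sample struct {
	t int64
	v float64
}

// aggregateCounter is a simplified, hypothetical model of counter
// downsampling. For each window it emits the reset-adjusted counter at the
// window's last timestamp, followed by the last raw value at the same
// timestamp.
func aggregateCounter(raw []sample, window int64) []sample {
	var out []sample
	i := 0
	for i < len(raw) {
		end := (raw[i].t/window + 1) * window // exclusive upper bound of this window
		counter, last := raw[i].v, raw[i]
		i++
		for i < len(raw) && raw[i].t < end {
			if raw[i].v < last.v {
				counter += raw[i].v // reset inside the window
			} else {
				counter += raw[i].v - last.v
			}
			last = raw[i]
			i++
		}
		out = append(out, sample{last.t, counter}, last)
	}
	return out
}

// dropDetected reports whether any value is lower than its predecessor,
// which is the signal a reset check relies on.
func dropDetected(samples []sample) bool {
	for i := 1; i < len(samples); i++ {
		if samples[i].v < samples[i-1].v {
			return true
		}
	}
	return false
}

func main() {
	raw := []sample{{5, 95}, {15, 100}, {25, 97}, {35, 107}}
	agg := aggregateCounter(raw, 20)

	fmt.Println(agg)
	fmt.Println(dropDetected(raw)) // true: 100 -> 97 is visible in the raw data
	fmt.Println(dropDetected(agg)) // false: 100 -> 107 hides the reset
}
```

Under these assumptions the aggregated samples never decrease, so nothing that iterates over them can tell that a reset happened between time 15 and time 25.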