Add query memory limits #1747
Conversation
- there are tons of iterations - why add the overhead of a function call? Not that it's going to save the world, but still.
I think this is pretty much ready. If people could give some love to this PR it would be great :) @bwplotka ?
- this somewhat addresses thanos-io#703
I've added it; however, I'm not sure this does anything much, or at least I couldn't find any sensible value for it beyond the existing query limits. This pathological query still uses tons of memory, despite all limits:
Probably due to the reasons I have mentioned here: #1369 (comment).
I'm not sure what others think of this, and I understand what you are trying to do, but I'm not sure I like it: the GC does not release memory immediately, so counting these sizes might still lead to OOM situations, and it feels more like papering over the situation than solving it at its core. That said, I understand that stopping the world on every Series() call, or from time to time, is not the solution either. We currently have per-sample limits, and the next logical step would probably be to add some kind of limit on the label sets that a time series might have. But that probably needs to be solved elegantly at the Prometheus level instead of here.
I'm also not sure I like the implementation as it stands. We probably do not want to introduce magical variables like this.
return parsedLimit
}

func byteCountToHuman(n int64) string {
Yes, I considered that, but I don't want to take an extra dependency for something that can be done trivially in 10 lines of code and is used in only one place. Would you agree?
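For reference, a minimal sketch of what such a ten-line helper could look like (SI units; illustrative rather than the exact code in this PR):

```go
package main

import "fmt"

// byteCountToHuman renders a byte count with SI units (kB, MB, GB, ...),
// which is enough for log and error messages without an extra dependency.
func byteCountToHuman(n int64) string {
	const unit = 1000
	if n < unit {
		return fmt.Sprintf("%dB", n)
	}
	div, exp := int64(unit), 0
	for m := n / unit; m >= unit; m /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.2f%cB", float64(n)/float64(div), "kMGTPE"[exp])
}

func main() {
	fmt.Println(byteCountToHuman(523 * 1000 * 1000)) // 523.00MB
}
```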
// buffer query sizes and process in chunks of 20MB or so
// to avoid hammering cache lines with atomic increments.
const querySizeBufferSize = 20 * 1000 * 1000
Any benchmarks to show that this is really optimal?
This is really just a "finger in the air" figure, an arbitrary number. The reasoning: we don't care if we go over the limit by +/-20MB, since we are talking about limits of 500MB+ in real scenarios.
I don't know what kind of benchmarks would show that this is "optimal". What is "optimal" in this case? I'm open to suggestions if someone has better ideas.
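For concreteness, here is a rough sketch of the buffering pattern under discussion. The constant and the counter name echo the diff, but the surrounding function is illustrative only, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Flush the per-goroutine counter into the shared atomic counter only every
// ~20MB, so most increments are plain adds on a local int64 rather than
// contended atomic operations on a shared cache line.
const querySizeBufferSize = 20 * 1000 * 1000

var queryTotalSize int64 // shared across goroutines, updated atomically

// accountBytes buffers sizes locally and flushes them in large chunks.
func accountBytes(queryLocalSize *int64, n int64) {
	*queryLocalSize += n
	if *queryLocalSize >= querySizeBufferSize {
		atomic.AddInt64(&queryTotalSize, *queryLocalSize)
		*queryLocalSize = 0
	}
}

func main() {
	var local int64
	for i := 0; i < 1000000; i++ {
		accountBytes(&local, 128) // pretend each series contributes ~128 bytes
	}
	atomic.AddInt64(&queryTotalSize, local) // flush the remainder at the end
	fmt.Println("total bytes accounted:", atomic.LoadInt64(&queryTotalSize))
}
```

With a scheme like this the counter can lag the true total by up to the buffer size per goroutine, which is why being +/-20MB off is considered acceptable against limits of 500MB+.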
// approximate the length of each label being about 20 chars, e.g. "k8s_app_metric0"
approxLabelLen = int64(10)

// approximate the size if chunks by having 120 bytes in each?
Suggested change:
// approximate the size if chunks by having 120 bytes in each?
// approximate the size of chunks by having 120 bytes in each?
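For readers following the approximation discussion, here is a hedged sketch of how constants like these might feed a per-series size estimate. The helper name and the formula are illustrative only; the PR's actual computation may differ:

```go
package main

import "fmt"

const (
	// rough bytes per label name or value, mirroring the constant in the diff
	approxLabelLen = int64(10)
	// rough bytes per chunk, mirroring the ~120-byte figure in the diff
	approxChunkBytes = int64(120)
)

// approxSeriesBytes estimates the footprint of one series from its label and
// chunk counts, without inspecting the actual payloads (illustrative formula).
func approxSeriesBytes(numLabels, numChunks int) int64 {
	// each label contributes a name and a value of roughly approxLabelLen bytes
	return int64(numLabels)*2*approxLabelLen + int64(numChunks)*approxChunkBytes
}

func main() {
	// e.g. a series with 10 label pairs and 40 chunks
	fmt.Printf("~%d bytes for one series\n", approxSeriesBytes(10, 40)) // ~5000 bytes
}
```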
defer func() {
	totalSizeMsg := fmt.Sprintf("%.2fMB", float64(queryTotalSize)/float64(1000000))
	localSizeMsg := fmt.Sprintf("%.2fMB", float64(queryLocalSize)/float64(1000000))
We could probably reuse byteCountToHuman here.
Yes, but I don't want to expose byteCountToHuman as it's really an internal package detail; I don't think it makes sense for the package to provide ByteCountToHuman.
If this bothers anyone, maybe taking a dependency on humanize does make sense, and then we can use it in both places. I don't know whether the benefits of the extra dependency outweigh the cost, though.
Not too fussed about either approach if this is what blocks this PR :)
Yes, correct. There are other allocations (index, shared chunk pool) which are not controlled by these knobs.
I see it more as a "step in the right direction for having rate limits" and an "experimental feature to see if it actually helps in the real world". But maybe I am indeed solving this at the wrong level. What do you mean by "solving it at its core"? If you could elaborate a bit, that would be great!
Errm, I'm not familiar with this! Is it the "chunk pool" thing you are referring to? If so, I think this is slightly different. Or did you mean something completely different?
I'm not sure where at the Prometheus level this would be solved, as the whole thing seems to be Thanos-specific?
What's not to like about this beautiful zero-abstraction, carefully crafted implementation? :) But seriously, any concrete dislikes are very welcome; I want to have a happy consensus on the way this is implemented.
Yes. Which ones? The 20MB one? The approximate sizes? All of them? I agree! I will take on board any better ideas if someone has them. I'm very happy to discuss and change things to make them better. It would be ideal if we could reach this:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is a draft implementation of Query Memory Limits proposal (#1746).
Things to note:
Steps I've taken to minimise any performance impact: size counting is buffered and only occasionally flushed with atomic.AddInt64, since the costs of using the atomic package across multiple goroutines can be exceptionally high.

If the above steps are deemed insufficient and we really want to not count anything when the feature is not used, we can extract the counting bits into a func and set it to a no-op when the feature is not enabled. But what is more expensive: doing a few int64 additions or making a function call? :)
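As an illustration of the "extract into a func and no-op it" idea, a minimal sketch follows; countBytes and disableQueryLimits are made up for this example, and only queryTotalSize echoes the diff above:

```go
package main

import "fmt"

var queryTotalSize int64 // echoes the counter seen in the diff; illustrative here

// countBytes is the hook called on the hot path. When the feature is enabled it
// accumulates bytes (a real implementation would flush to the shared counter
// atomically); when disabled it is swapped for a no-op, so the only remaining
// cost is one indirect function call per invocation.
var countBytes = func(n int64) { queryTotalSize += n }

func disableQueryLimits() {
	countBytes = func(int64) {} // count nothing at all
}

func main() {
	countBytes(1024)
	disableQueryLimits()
	countBytes(4096) // ignored once disabled
	fmt.Println("counted:", queryTotalSize) // counted: 1024
}
```

Whether the indirect call is cheaper than a handful of plain int64 additions is exactly the trade-off questioned above.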
Surfacing the feature to users: currently via the THANOS_LIMIT_QUERY_PIPE and THANOS_LIMIT_QUERY_TOTAL environment variables, as this is the least intrusive option.

To run with limits:
This will show things like this:
Verification
The memory counters seem to be "good enough", at least if we trust pprof to tell us how many bytes we allocate :) Using instrumentation (which is now removed), we get these figures.
// Store Gateway - single block
// Store Gateway - query overall
// Querier - query overall
Note that the Querier and the Store Gateway broadly agree on the total query size.
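The instrumentation behind these figures has been removed from the branch, and the numbers themselves are not reproduced above. As a generic way to sanity-check counters like these (a rough alternative to reading pprof output, not what was used in this PR), one could compare them against the Go runtime's own allocation statistics:

```go
package main

import (
	"fmt"
	"runtime"
)

// allocDelta reports how many heap bytes the runtime says were allocated while
// fn ran. It is a coarse cross-check: it covers every allocation in the
// process, not just the series data the query-size counters track.
func allocDelta(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	var kept [][]byte
	n := allocDelta(func() {
		for i := 0; i < 1000; i++ {
			kept = append(kept, make([]byte, 1000)) // ~1MB of payload
		}
	})
	fmt.Printf("runtime reports roughly %d bytes allocated (kept %d slices)\n", n, len(kept))
}
```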
Changelog
Not done yet :)
Docker image
The latest build of this branch is on docker hub if anyone wants to give it a spin: