
Add query memory limits #1747

Closed

Conversation


@ppanyukov ppanyukov commented Nov 14, 2019

This is a draft implementation of Query Memory Limits proposal (#1746).

Things to note:

  • Things are counted inside some quite substantial loops.
  • Ideally I'd like this feature to work on the basis of "you don't use it, you don't pay for it".
  • However, it might not be possible to remove the cost entirely.
  • The cost of counting is likely to be only a fraction of the overall cost of queries.

Steps I've taken to minimise any performance impact:

  • Using buffered counters, especially to avoid excessive calls to atomic.AddInt64; the cost of using the atomic package across multiple goroutines can be exceptionally high (see the sketch after this list).
  • Minimising function calls by inlining where possible and adding to raw local variables that are not shared between goroutines.
  • Approximating the query sizes where possible:
    • Querier: to avoid iterating over labels and chunks;
    • Store Gateway: to avoid excessive increments to counters, as the loops are quite substantial.
  • Overall I couldn't see any measurable negative impact in my single-machine environment.
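To make the buffered-counter point concrete, here is a minimal sketch of the idea (names and the 20MB threshold are mine for illustration, not the exact code in this PR): each goroutine adds to a plain local int64 and only folds the buffer into the shared counter via atomic.AddInt64 once it crosses the threshold.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Flush the local buffer into the shared counter only once it reaches ~20MB,
// so the hot loop does plain additions instead of atomic operations.
const flushThreshold = 20 * 1000 * 1000

var queryBytes int64 // shared, per-query counter

// countSizes simulates one goroutine accounting for the bytes it touches.
func countSizes(sizes []int64) {
	var local int64
	for _, s := range sizes {
		local += s // cheap: no atomics, no function calls inside the loop
		if local >= flushThreshold {
			atomic.AddInt64(&queryBytes, local)
			local = 0
		}
	}
	if local > 0 {
		atomic.AddInt64(&queryBytes, local) // flush the remainder
	}
}

func main() {
	sizes := make([]int64, 100000)
	for i := range sizes {
		sizes[i] = 512 // pretend each series/chunk contributes 512 bytes
	}

	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			countSizes(sizes)
		}()
	}
	wg.Wait()
	fmt.Println("total bytes counted:", atomic.LoadInt64(&queryBytes))
}

With limits of 500MB and up, being up to ~20MB late on the shared counter is an acceptable trade for keeping atomics off the hot path.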

If the above steps are deemed insufficient and we really want to count nothing when the feature is not used, we can extract the counting into a func and set it to a no-op when the feature is not enabled (see the sketch below). But which is more expensive: a few int64 additions, or a function call? :)
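For completeness, the "make it a no-op when disabled" alternative would look roughly like this (a sketch with my own names, not code from this PR):

package main

import (
	"fmt"
	"sync/atomic"
)

var queryBytes int64

// addQueryBytes is the counting hook called from the hot loops. By default it
// is a no-op, so with the feature disabled the only cost is an indirect call.
var addQueryBytes = func(int64) {}

// enableQueryLimits swaps the no-op for a real counter at startup.
func enableQueryLimits() {
	addQueryBytes = func(n int64) { atomic.AddInt64(&queryBytes, n) }
}

func main() {
	addQueryBytes(100) // feature disabled: nothing counted
	enableQueryLimits()
	addQueryBytes(100) // feature enabled: counted
	fmt.Println(atomic.LoadInt64(&queryBytes)) // prints 100
}

Whether that indirect call ends up cheaper than a couple of unconditional int64 additions is exactly the trade-off in question.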

Surfacing the feature to users:

  • Currently using the env vars THANOS_LIMIT_QUERY_PIPE and THANOS_LIMIT_QUERY_TOTAL, as this is the least intrusive option (a sketch of how these might be read follows below).
  • By default limits are not enforced, although the statistics are still provided via the debug logger.
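To make the env-var surface concrete, a limit could be read at startup along these lines (a sketch; the helper name and the "treat malformed values as no limit" behaviour are my assumptions, not necessarily what limit.go does):

package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseLimitFromEnv reads a byte limit from the named env var. A missing or
// malformed value is treated as 0, i.e. "limit not enforced". (Illustrative.)
func parseLimitFromEnv(name string) int64 {
	v := os.Getenv(name)
	if v == "" {
		return 0
	}
	parsedLimit, err := strconv.ParseInt(v, 10, 64)
	if err != nil || parsedLimit < 0 {
		return 0
	}
	return parsedLimit
}

func main() {
	pipeLimit := parseLimitFromEnv("THANOS_LIMIT_QUERY_PIPE")
	totalLimit := parseLimitFromEnv("THANOS_LIMIT_QUERY_TOTAL")
	fmt.Printf("pipe limit: %d bytes, total limit: %d bytes (0 means not enforced)\n", pipeLimit, totalLimit)
}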

To run with limits:

THANOS_LIMIT_QUERY_PIPE=1200000000 THANOS_LIMIT_QUERY_TOTAL=1200000000 ./thanos

This will show things like this:

// Store Gateway
level=debug ts=2019-11-21T14:35:47.267444Z caller=limit.go:29 THANOS_LIMIT_QUERY_PIPE=500.00MB THANOS_LIMIT_QUERY_TOTAL=1.20GB
level=debug ts=2019-11-21T14:35:47.703926Z caller=bucket.go:678 queryTotalSize=9.36MB queryLocalSize=9.36MB
level=debug ts=2019-11-21T14:35:47.711417Z caller=bucket.go:678 queryTotalSize=18.72MB queryLocalSize=9.36MB
level=debug ts=2019-11-21T14:35:47.745473Z caller=bucket.go:678 queryTotalSize=28.08MB queryLocalSize=9.36MB
level=debug ts=2019-11-21T14:35:47.751189Z caller=bucket.go:678 queryTotalSize=37.44MB queryLocalSize=9.36MB
level=debug ts=2019-11-21T14:35:48.626972Z caller=bucket.go:678 queryTotalSize=89.36MB queryLocalSize=31.92MB
level=debug ts=2019-11-21T14:35:48.643825Z caller=bucket.go:678 queryTotalSize=101.28MB queryLocalSize=31.92MB
level=debug ts=2019-11-21T14:35:52.967735Z caller=bucket.go:678 queryTotalSize=643.61MB queryLocalSize=182.32MB
level=debug ts=2019-11-21T14:35:53.123464Z caller=bucket.go:678 queryTotalSize=645.92MB queryLocalSize=182.32MB
level=debug ts=2019-11-21T14:35:53.214819Z caller=bucket.go:678 queryTotalSize=648.24MB queryLocalSize=182.32MB
// Querier
level=debug ts=2019-11-21T15:05:38.760654Z caller=limit.go:29 THANOS_LIMIT_QUERY_PIPE=1.20GB THANOS_LIMIT_QUERY_TOTAL=1.20GB
level=debug ts=2019-11-21T15:17:16.325742Z caller=proxy.go:390 msg="THANOS_LIMIT_QUERY_TOTAL limit 1.20GB violated (got 1.20GB)"
level=debug ts=2019-11-21T15:17:16.325791Z caller=proxy.go:376 queryTotalSize=1200.03MB queryLocalSize=940.02MB
level=debug ts=2019-11-21T15:17:16.362089Z caller=proxy.go:390 msg="THANOS_LIMIT_QUERY_TOTAL limit 1.20GB violated (got 1.22GB)"
level=debug ts=2019-11-21T15:17:16.362125Z caller=proxy.go:376 queryTotalSize=1220.03MB queryLocalSize=280.01MB
level=debug ts=2019-11-21T15:17:16.362143Z caller=proxy.go:191 queryTotalSize=1220.03MB
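For context, the "limit ... violated" lines above boil down to comparing the running counter against the configured limit, roughly like this (a sketch with my own names, not the PR's exact code):

package main

import "fmt"

// checkQueryLimit returns an error once the running total crosses the
// configured limit. A limit of 0 means "not enforced", in which case only the
// debug statistics above are emitted. (Illustrative sketch.)
func checkQueryLimit(limitBytes, totalBytes int64) error {
	if limitBytes > 0 && totalBytes > limitBytes {
		return fmt.Errorf("query total size limit %d bytes violated (got %d bytes)", limitBytes, totalBytes)
	}
	return nil
}

func main() {
	// Mirrors the violation in the Querier log: 1.20GB limit, 1220.03MB observed.
	if err := checkQueryLimit(1200000000, 1220030000); err != nil {
		fmt.Println("aborting query:", err)
	}
}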

Verification

The memory counters seem to be "good enough", at least if we trust pprof to tell us how many bytes we allocate :) Using instrumentation (which is now removed), we get these figures.

// Store Gateway - single block

WRITTEN HEAP DUMP TO /Users/philip/thanos/github.com/ppanyukov/thanos-oom/heap-sg-blockSeries-11-before.pb.gz
MEM STATS DIFF:   	sg-blockSeries 	sg-blockSeries - AFTER 	-> Delta
    HeapAlloc  : 	288.14M 	563.41M 		-> 275.27M
    HeapObjects: 	5.02M 		7.27M 			-> 2.26M

MEM PROF DIFF:    	sg-blockSeries 	sg-blockSeries - AFTER 	-> Delta
    InUseBytes  : 	233.15M 	425.31M 		-> 192.16M    <==|
    InUseObjects: 	847 		1.27K 			-> 426
    AllocBytes  : 	2.17G 		2.52G 			-> 350.76M
    AllocObjects: 	5.24K 		5.71K 			-> 464

WRITTEN HEAP DUMP TO /Users/philip/thanos/github.com/ppanyukov/thanos-oom/heap-sg-blockSeries-11-after.pb.gz
queryLocalSize: 180.25MB    <==|

// Store Gateway - query overall

MEM STATS DIFF:   	sg-Series 	sg-Series - AFTER 	-> Delta
    HeapAlloc  : 	275.09M 	1.13G 			-> 858.34M
    HeapObjects: 	4.90M 		12.70M 			-> 7.80M

MEM PROF DIFF:    	sg-Series 	sg-Series - AFTER 	-> Delta
    InUseBytes  : 	223.81M 	946.70M 		-> 722.89M    <==|
    InUseObjects: 	843 		2.63K 			-> 1.78K
    AllocBytes  : 	2.15G 		3.48G 			-> 1.33G
    AllocObjects: 	5.18K 		7.35K 			-> 2.16K
queryTotalSize: 642.51MB    <==|

// Querier - query overall

MEM STATS DIFF:   	q-Series 	q-Series - AFTER 	-> Delta
    HeapAlloc  : 	5.27M 		892.61M 		-> 887.34M
    HeapObjects: 	16.99K 		13.83M 			-> 13.81M

MEM PROF DIFF:    	q-Series 	q-Series - AFTER 	-> Delta
    InUseBytes  : 	4.24M 		654.24M 		-> 650.00M    <==|
    InUseObjects: 	12 		14.89K 			-> 14.88K
    AllocBytes  : 	7.19M 		1.62G 			-> 1.61G
    AllocObjects: 	29 		18.04K 			-> 18.01K

WRITTEN HEAP DUMP TO /Users/philip/thanos/github.com/ppanyukov/thanos-oom/heap-q-Series-1-after.pb.gz
queryTotalSize: 668.00MB    <==|

Note that Querier and SG broadly agree on the total query size.
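The instrumentation that produced these diffs has been removed from the branch, but a similar before/after measurement can be taken with the standard runtime and runtime/pprof packages; a rough sketch (file names are illustrative, and the removed instrumentation may have worked differently):

package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// snapshot records heap stats and writes a heap profile for later diffing.
// (Sketch only; the removed instrumentation in this PR may differ.)
func snapshot(label, path string) runtime.MemStats {
	runtime.GC() // make consecutive snapshots a little more comparable

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	if f, err := os.Create(path); err == nil {
		_ = pprof.WriteHeapProfile(f)
		_ = f.Close()
	}
	fmt.Printf("%s: HeapAlloc=%d HeapObjects=%d (heap dump: %s)\n", label, m.HeapAlloc, m.HeapObjects, path)
	return m
}

func main() {
	before := snapshot("before", "heap-before.pb.gz")
	// ... run the Series() call under test here ...
	after := snapshot("after", "heap-after.pb.gz")

	fmt.Printf("HeapAlloc delta: %d bytes, HeapObjects delta: %d\n",
		int64(after.HeapAlloc)-int64(before.HeapAlloc),
		int64(after.HeapObjects)-int64(before.HeapObjects))
}

The two heap dumps can then be compared with pprof's -base option.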

Changelog

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Not done yet :)

Docker image

The latest build of this branch is on Docker Hub if anyone wants to give it a spin:

docker pull ppanyukov/thanos:qlimit

Signed-off-by: Philip Panyukov <[email protected]>
@ppanyukov ppanyukov force-pushed the feature/CDATA-1163-query-limits branch from b2ea7c5 to c960b7b Compare November 21, 2019 13:32
@ppanyukov ppanyukov marked this pull request as ready for review November 21, 2019 15:34
@ppanyukov ppanyukov changed the title DRAFT: Add query memory limits Add query memory limits Nov 21, 2019
@ppanyukov (Contributor Author)

I think this is pretty much ready. If people could give some love to this PR it would be great :) @bwplotka ?

@ppanyukov (Contributor Author)

I've added THANOS_LIMIT_PROMQL_MAX_SAMPLES env var which is passed on to the PromQL engine as the limit.

However, I'm not sure this does anything much, or at least I couldn't find any sensible value for it beyond existing query limits.

This pathological query still uses tons of memory, despite all limits:

count({__name__=~".+"}) by (__name__)
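For reference, a max-samples value ends up in the PromQL engine options, roughly like this (a sketch against the upstream promql package; the actual wiring in this PR may differ):

package main

import (
	"time"

	"github.com/prometheus/prometheus/promql"
)

// newEngine builds a PromQL engine with the given sample limit; the
// THANOS_LIMIT_PROMQL_MAX_SAMPLES value would feed maxSamples. (Sketch only.)
func newEngine(maxSamples int) *promql.Engine {
	return promql.NewEngine(promql.EngineOpts{
		MaxSamples: maxSamples, // cap on samples held in memory by one query
		Timeout:    2 * time.Minute,
	})
}

func main() {
	_ = newEngine(50000000)
}

As far as I understand, MaxSamples only bounds the samples held in memory during evaluation, not label data, which would explain why the label-heavy query above still eats memory.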

@GiedriusS (Member)

> I've added THANOS_LIMIT_PROMQL_MAX_SAMPLES env var which is passed on to the PromQL engine as the limit.
>
> However, I'm not sure this does anything much, or at least I couldn't find any sensible value for it beyond existing query limits.
>
> This pathological query still uses tons of memory, despite all limits:
>
> count({__name__=~".+"}) by (__name__)

Probably due to the reasons I have mentioned here: #1369 (comment).

@GiedriusS (Member) left a comment:

I'm not sure what others think of this. I understand what you are trying to do, but I'm not sure that I like it: the GC does not release memory immediately, so counting these sizes might still lead to OOM situations, and it feels more like papering over the problem than solving it at its core. But I understand that stopping the world on every Series() call, or from time to time, is also not the solution. We currently have per-sample limits, and probably the next logical step would be to add some kind of limit on the label sets that a time series might have. But that probably needs to be solved elegantly at the Prometheus level instead of here.

I'm not sure I like the implementation as it is right now, either. We probably do not want to introduce magical variables like this.

return parsedLimit
}

func byteCountToHuman(n int64) string {
@ppanyukov (Contributor Author)

Yes, I considered that, but I don't want to take an extra dependency for something that can trivially be done in 10 lines of code and is used in only one place. Would you agree?
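For the record, this is the sort of ten-line helper I mean (a sketch; the exact formatting in the PR may differ slightly):

package main

import "fmt"

// byteCountToHuman renders a byte count using decimal (SI) units, matching
// the MB/GB figures in the debug logs above. (Sketch only.)
func byteCountToHuman(n int64) string {
	const unit = 1000
	if n < unit {
		return fmt.Sprintf("%dB", n)
	}
	div, exp := int64(unit), 0
	for v := n / unit; v >= unit; v /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.2f%cB", float64(n)/float64(div), "kMGTPE"[exp])
}

func main() {
	fmt.Println(byteCountToHuman(1200000000)) // 1.20GB
}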


// buffer query sizes and process in chunks of 20MB or so
// to avoid hammering cache lines with atomic increments.
const querySizeBufferSize = 20 * 1000 * 1000
@GiedriusS (Member)

Any benchmarks to show that this is really optimal?

@ppanyukov (Contributor Author)

This is really just a "finger in the air" figure, an arbitrary number. The reasoning: we don't care if we go over the limit by +/-20MB, since we are talking about limits of 500MB+ in real scenarios.

I don't know what kind of benchmarks we could run to show this is "optimal". What would "optimal" even mean in this case? I'm open to suggestions if someone has better ideas.

// approximate the length of each label being about 20 chars, e.g. "k8s_app_metric0"
approxLabelLen = int64(10)

// approximate the size if chunks by having 120 bytes in each?
@GiedriusS (Member)

Suggested change
// approximate the size if chunks by having 120 bytes in each?
// approximate the size of chunks by having 120 bytes in each?


defer func() {
	totalSizeMsg := fmt.Sprintf("%.2fMB", float64(queryTotalSize)/float64(1000000))
	localSizeMsg := fmt.Sprintf("%.2fMB", float64(queryLocalSize)/float64(1000000))
@GiedriusS (Member)

We could probably reuse byteCountToHuman here.

@ppanyukov (Contributor Author)

Yes, but I don't want to expose byteCountToHuman, as it's really an internal package detail; I don't think it makes sense for the package to provide a ByteCountToHuman.

If this bothers anyone, maybe taking a dep on humanize does make sense, and then we can use it in both places. I don't know whether the benefits of the extra dep outweigh the cost, though.

Not too fussed about either approach if this is what blocks this PR :)

@ppanyukov (Contributor Author)

... so counting of these sizes might still lead to OOM situations

Yes, correct. There are other allocations (index, shared chunk pool) which are not controlled by these knobs.

... and it feels more like tapering over the situation instead of solving it at its core.

I see it more as a step in the right direction for having resource limits, and an experimental feature to see whether it actually helps in the real world. But maybe I am indeed solving this at the wrong level. What do you mean by "solving it at its core"? If you could elaborate a bit, that would be great!

We currently have per-samples limits and probably the next logical step would be to add some kind of limits in terms of label sets that a time series might have.

Errm, I'm not familiar with this! Is it the "chunk pool" thing you are referring to? If so, I think this is slightly different. Or did you mean something else entirely?

But that probably needs to be elegantly solved somehow at the Prometheus level instead of here.

I'm not sure where at the Prometheus level this would be solved, as the whole thing seems to be Thanos-specific?

I'm not sure I like the implementation of how it is right now as well.

What's not to like about this beautiful zero-abstractions, carefully-crafted implementation? :) But seriously, any concrete dislikes are very welcome; I want to reach a happy consensus on the way this is implemented.

We probably do not want to introduce any magical variables like this.

Yes. Which ones? The 20MB one? The approximate sizes? All of them? I agree! I will take on board any better ideas if someone has them.


I'm very happy to discuss and change things to make them better. It would be ideal if we could reach agreement on the following:

  • Are we happy with this kind of approach in general?
  • If so, are we happy to expose it as an experimental feature via env vars?
  • If so, are we happy to iterate on it as we see real-world usage?
  • If there are any "no" answers, what do we need to do to turn them into "yes" answers?

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020