Replace string interner with an LRU and per-origin cache up top. #20943
Conversation
Force-pushed from 3b69508 to 50b0fa7.
Go Package Import Differences (Baseline: 3f5d700)
package cache

import (
	"fmt"
I don't think that's how we usually sort imports in this project
I just let the IDE do it. Is there a doc for what the preference is?
https://github.com/DataDog/datadog-agent/blob/main/docs/dev/imports.md?plain=1
Note that this is not well-known and not enforced (but setting up your IDE to do it automatically is quick and helps keep a consistent import order).
@@ -188,7 +189,7 @@ func (p *parser) parseMetricSample(message []byte) (dogstatsdMetricSample, error
	}

	return dogstatsdMetricSample{
-		name: p.interner.LoadOrStore(name),
+		name: p.interner.LoadOrStore(name, "", nil),
If LoadOrStore() is always called with ("", nil) as the last two arguments, it's probably worth adding a helper that doesn't require them.
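A minimal sketch of the kind of wrapper being suggested (the method name is hypothetical, and it assumes the KeyedInterner type introduced in this PR):

```go
// LoadOrStoreString is a hypothetical convenience wrapper for call sites that
// have no origin or retainer to attribute the allocation to; it just fills in
// the defaults used at the dogstatsd parse call site today.
func (i *KeyedInterner) LoadOrStoreString(key []byte) string {
	return i.LoadOrStore(key, "", nil)
}
```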
The third PR in the series has the plumbing that includes useful values for all 3 args. Use the lally/exp-mem-metrefs branch for reference: https://github.com/DataDog/datadog-agent/blob/lally/exp-mem-metrefs/comp/dogstatsd/server/parse.go#L190
I think it'd be better to only add it when we need it, no?
Actually, we could start passing in the origin IDs now (I'll have to start integrating the plumbing for this bit) so that we get per-origin (e.g. per-container) tracking now with this PR. How does that sound?
// backingBytesPerInCoreEntry is the number of bytes to allocate in the mmap file per
// element in our LRU. E.g., some value of initialInternerSize * POW(growthFactor, N).
const backingBytesPerInCoreEntry = 4096
We want this to be configurable, no?
This is set to match the page size on the platform. For the MMUs in x86 and ARM, that's 4k (hugepages go bigger).
FYI, os.Getpagesize() is available.
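For reference, a minimal sketch of deriving the value from the page size at startup (an assumption, not code from this PR; the trade-off is that the constant becomes a var):

```go
package cache

import "os"

// backingBytesPerInCoreEntry is sized to one OS page rather than a hard-coded
// 4096; os.Getpagesize() returns 4096 on typical x86-64/arm64 systems unless
// hugepages are in play.
var backingBytesPerInCoreEntry = os.Getpagesize()
```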
)

// initialInternerSize is the size of the LRU cache (in #strings). This is HEAP, so
// don't let this get too big compared to the MMAP region.
don't mention "MMAP" here?
It's a series of 3 PRs, staged. Do you expect me to rewrite the first two as if there are no successors?
comp/dogstatsd/server/parse.go (Outdated)
@@ -38,7 +39,7 @@ var (
// parser parses dogstatsd messages
// not safe for concurent use
type parser struct {
-	interner *stringInterner
+	interner *cache.KeyedInterner
If we don't remove the old interner implementation, do we instead want to add an interface that will (for a time) let us choose between one implementation or the other? That would make sure all this code stays optional until we know it makes performance better.
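A rough sketch of what that seam could look like (the interface name is hypothetical; InternRetainer is the type from this PR, and the legacy interner would need an adapter or a widened signature to satisfy it):

```go
// stringInternerBackend is a hypothetical interface that both the legacy
// *stringInterner and the new *cache.KeyedInterner could implement, letting a
// config option pick the implementation when the parser is constructed.
type stringInternerBackend interface {
	LoadOrStore(key []byte, origin string, retainer InternRetainer) string
}
```

The parser would then hold a stringInternerBackend instead of a concrete type, keeping the new code path optional until the performance picture is clear.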
We will want to provide a write-up of how this change impacts low- and high-end use cases; see this notebook as an example. If we are going to swap out the implementation like this, I'd be interested in seeing the regression detector run here, but I also think that if the swap were configurable it'd be easier to rig experiments to understand the user implications of the work being proposed.
If someone can pass me appropriate API keys (the back-to-back summits have made this tricky) for ddev, I can get the benchmark to start uploading data to the benchmark notebook.
I can help you run this in the aml benchmark cluster, but I'd also like to see this run in the SMP regression detector. Once this passes enough of the lint/test/package build steps, the regression detector should run automatically on this PR.
const noFileCache = ""

// OriginTimeSampler marks allocations to the Time Sampler.
const OriginTimeSampler = "!Timesampler"
Should this really be defined here?
These are for internal diagnostic use. I don't care where they go, but it's easier to see the full list if they're all in one place. I can move them to their use sites if preferred.
)

// MaxValueSize is the largest possible value we can store.
const MaxValueSize = 4080
Where does it come from?
Copied from the mmap_hash_linux, which is based on hardware constraints & datastructure decisions that I don't expect to change across platforms.
High-level comment before going further: can you remove all the mmap code and just keep the in-memory part? I think we want to validate performance and behavior with the in-memory version before adding the mmap complexity. Also, I don't think we want to shard by origin with origin being
// LoadOrStore interns a byte-array to a string, for an origin
func (i *KeyedInterner) LoadOrStore(key []byte, origin string, retainer InternRetainer) string {
	sGlobalQueryCount.Add(1)
It's not clear to me that this needs to be atomic. We increment here only -- and you've got it atomic to skip taking the lock that's implicit in loadOrStore -- but the sum is only used in a debug log. I guess, how accurate does this need to be? This is going to be a sequentially consistent operation: we're forcing all CPUs to sync, then we skip some takes of the mutex, then we force all the CPUs to sync again (on x86, better on ARM).
I'm using a guideline of atomic operations on MESIF 'E' states being ~8ns in the common (L1, uncontested) case: https://arxiv.org/pdf/2010.09852.pdf. I think a good part of that 8ns is still the regular unexclusive op. Generally, I just wanted rough statistics without too much bother :)
Fair, but in a large piece of software like the Agent, as core counts ramp it's not always clear that atomics cheap in the micro are cheap in the macro. A nit flag, since we don't have data one way or the other yet and there are other low-hanging sync issues in the Agent.
	if Check(s) {
		return i.LoadOrStore(unsafe.Slice(unsafe.StringData(s), len(s)), origin, retainer)
	}
	sFailedInternalCount.Add(1)
Flagging another potential seqcst here. Although presumably rare if most strings are valid, pushing known invalid strings does ramp cost.
The only way they fail the Check() is if there's a fairly severe bug. I've been catching them with this function. sFailedInternalCount is zero in my internal testing. But I thought to leave the diagnostics in for any future changes that might need them. I'm happy to take them out, or to hide them behind a config flag.
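For what the config-flag option could look like, a minimal sketch (the flag name is hypothetical, and this is not code from the PR):

```go
package cache

import "sync/atomic"

var (
	// internerDebugStats would be read once at startup from a hypothetical
	// config key such as "dogstatsd_interner_debug_stats".
	internerDebugStats bool

	sFailedInternalCount atomic.Uint64
)

// countFailedIntern only touches the atomic when diagnostics are enabled, so
// the hot path pays a predictable branch instead of a cross-core sync.
func countFailedIntern() {
	if internerDebugStats {
		sFailedInternalCount.Add(1)
	}
}
```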
Nah, both are nit flags and this instance is less likely to fire than the other.
This is the in-memory part. The mmap code is (a) disabled by default and (b) stubbed out here. I reserved the
Hi, I still have to figure out what has happened in this
Force-pushed from ff93ef4 to 6468dc7.
AFAIK the OOM killer uses Working Set Memory for decisions (at least in
Sorry, can you point to any docs where working set includes
I don't have extensive knowledge of how it treats the different types. I do know the
This fixes an unhappy dep: we don't depend on a comp dep within pkg/util.
Ah! I think that's where we're missing each other. The regression detector is a performance regression detector - AFAIK it's a series of benchmarks to detect negative changes in performance.
It's all in https://github.com/DataDog/datadog-agent/tree/lally/exp-mem-metrefs - I've developed it against the stress-test benchmark. I agree it's necessary to verify the behavior against the memory-heavy situations. Especially when the current agent OOMs. I've been, frankly, bashing my head against the wall on the
The mmap changes aren't in this PR. There are some hooks for the mmap changes to go in, but the mmap code itself isn't there. See pkg/util/cache/mmap_hash.go - it's a stub that'll be used for non-Linux platforms. The Linux impl isn't in this PR.
That's a fine question. Generally, we want to find ways to throttle the component making us use excessive amounts of memory. I know nothing about the python checks. For them, the questions are: what controls do we have on them? If the checks' results linger in memory for a while, we can try reducing their frequency. Or we can try to find ways to contain their usage (e.g., move parallel ops to serial, split up the work into batches, run in a separate container with a mem limit and control over an IPC bridge, etc.).
Force-pushed from 3992888 to c7a9567.
Yes, I also consider this scenario the main candidate for benefits of using mmap as a backend for string interning. One of the main gains (subject to PoC'ing) is avoiding OOMs caused by k8s hard memory limits and Go's GC behavior. For reference: in System-Probe, we are using this pkg for string interning (in heap, no LRU).
What does this PR do?
This is 1 of 3 PRs for moving strings out of the heap and into MMAP'd files.
Together they will provide explicit tracking of primary (strings in contexts
and metrics) memory use per container. Additionally, each container
will have a separate allocation pool drawing from temporary files on disk.
This should alleviate primary OOM concerns, as the primary driver for memory
use is taken out of the process's RSS. Memory pressure will result in IOPs
instead of agent termination. Each container's usage is separated and can be used
as a basis for throttling / backpressure decisions when IOPs go above desired
levels.
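To make the interning ideas concrete, below is a rough, hypothetical sketch of an interner that evicts gradually through an LRU and attributes interned bytes to an origin (e.g. a container ID). Names, structure, and the absence of locking and mmap backing are illustrative only; the real implementation lives under pkg/util/cache in this PR.

```go
package cache

import "container/list"

type lruEntry struct {
	origin string
	value  string
}

// keyedLRUInterner is a sketch: a bounded interner that evicts one least
// recently used string at a time and tracks interned bytes per origin.
type keyedLRUInterner struct {
	maxEntries int
	order      *list.List               // front = most recently used
	entries    map[string]*list.Element // interned string -> LRU node
	perOrigin  map[string]int64         // origin -> bytes currently attributed
}

func newKeyedLRUInterner(maxEntries int) *keyedLRUInterner {
	return &keyedLRUInterner{
		maxEntries: maxEntries,
		order:      list.New(),
		entries:    make(map[string]*list.Element),
		perOrigin:  make(map[string]int64),
	}
}

// loadOrStore returns the shared copy of key, evicting at most one old entry
// instead of flushing the whole map. Not safe for concurrent use as written.
func (i *keyedLRUInterner) loadOrStore(key []byte, origin string) string {
	if el, ok := i.entries[string(key)]; ok {
		i.order.MoveToFront(el)
		return el.Value.(*lruEntry).value
	}
	s := string(key) // the single allocation all callers will share
	i.entries[s] = i.order.PushFront(&lruEntry{origin: origin, value: s})
	i.perOrigin[origin] += int64(len(s))
	if i.order.Len() > i.maxEntries {
		oldest := i.order.Back()
		ent := oldest.Value.(*lruEntry)
		i.order.Remove(oldest)
		delete(i.entries, ent.value)
		i.perOrigin[ent.origin] -= int64(len(ent.value))
	}
	return s
}
```

The per-origin byte counts are the kind of signal that would feed the throttling / backpressure decisions described above.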
Motivation
This PR in isolation provides a better interner than we previously had, via gradual eviction instead of dumping the entire map. It also provides tracking mechanisms for per-container usage, plus a stub implementation of the mmap backend to show how the actual one would integrate. The next PR will show a Linux implementation, although it should be fully usable for a macOS implementation as well. The interface exposed should allow a Windows CreateFileMapping()-based implementation if desired.
Additional Notes
The agent should scale to handle higher workloads, but it has the same deployment for tiny and gigantic machines. Those machines are almost exclusively idle, and their ample resources are left unused: the agent is held to a constant memory cap, and workloads avoid the machine for fear of overwhelming the agent.
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Test cases were provided, and the full test suite (inv test) was run on a linux-arm64 VM.
Reviewer's Checklist
- Triage milestone is set.
- major_change label if your change either has a major impact on the code base, is impacting multiple teams, or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a release note.
- changelog/no-changelog label has been applied.
- qa/skip-qa label is not applied.
- team/.. label has been applied, indicating the team(s) that should QA this change.
- need-change/operator and need-change/helm labels have been applied.
- k8s/<min-version> label, indicating the lowest Kubernetes version compatible with this feature.