Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unique: large string still referenced, after interning only a small substring #69370

Closed
Deleplace opened this issue Sep 10, 2024 · 7 comments
Closed
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@Deleplace
Copy link
Contributor

Go version

go version go1.23.1 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/deleplace/Library/Caches/go-build'
GOENV='/Users/deleplace/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/deleplace/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/deleplace/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/deleplace/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/p4/g7fnpss96g5708_mmv3cxzgh0000gn/T/go-build3365171445=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

  • Create a large string book (300MB)
  • Intern a string word which is a small substring of book, using unique.Make
  • Run the garbage collector with runtime.GC, after the last use of book, before the last use of word
  • Print the memory currently allocated by the program, using runtime.ReadMemStats

What did you see happen?

Program is still using 300MB

What did you expect to see?

Program should be using less than 10MB

Discussion

This issue has already been fixed by @mknyszek in CL 610738. The change will be included in the future Go 1.24.

The current issue is about documenting the behavior of Go 1.23.0 and Go 1.23.1 when interning strings, and retaining more memory than expected.

The problem arises only with strings, not with other types comparable with ==. String is the only comparable type that supports slicing.

When interning a small substring y of a large string x, 2 strategies are possible in the implementation of unique.Make:

  1. Keep the original memory location of y, within x. x will not be garbage collected, because the object pool in unique is keeping a reference to a part of it. The user is responsible about the memory consequences of interning a substring. The user may decide to call strings.Clone beforehand themself.
  2. Clone y into a fresh string y2, and intern y2. x may then be garbage collected.

unique implements strategy (2).

Before the fix 610738, unique.Make did proactively clone input strings. However, an internal map was still referencing the original input strings, which was unintended.

image

(in this diagram, Bwords is a []unique.Handle[string])

After the fix 610738, unique.Make proactively clones input strings, and then keeps only references to the clones. The large original input strings are then properly garbage collected.

image
@gabyhelp
Copy link

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@mknyszek
Copy link
Contributor

As @Deleplace states, this is already fixed. The only question is whether or not to backport it. I'll come back with a decision today.

@mknyszek mknyszek self-assigned this Sep 10, 2024
@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 10, 2024
@mknyszek mknyszek added this to the Go1.24 milestone Sep 10, 2024
@mknyszek
Copy link
Contributor

Ah and, thank you @Deleplace for filing an issue. :)

@ianlancetaylor
Copy link
Contributor

@gopherbot Please open backport to 1.23

Since the unique package is new in 1.23, we should implement this optimization there as well, for consistent behavior.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #69383 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@mknyszek
Copy link
Contributor

Thanks @ianlancetaylor!

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/612295 mentions this issue: [release-branch.go1.23] unique: don't retain uncloned input as key

gopherbot pushed a commit that referenced this issue Sep 11, 2024
Currently the unique package tries to clone strings that get stored in
its internal map to avoid retaining large strings.

However, this falls over entirely due to the fact that the original
string is *still* stored in the map as a key. Whoops. Fix this by
storing the cloned value in the map instead.

This change also adds a test which fails without this change.

For #69370.
Fixes #69383.

Change-Id: I1a6bb68ed79b869ea12ab6be061a5ae4b4377ddb
Reviewed-on: https://go-review.googlesource.com/c/go/+/610738
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit 21ac23a)
Reviewed-on: https://go-review.googlesource.com/c/go/+/612295
Auto-Submit: Tim King <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants