Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unique: "run out of hash bits" bug in internal HashTrieMap #69534

Closed
GiedriusS opened this issue Sep 19, 2024 · 12 comments
Closed

unique: "run out of hash bits" bug in internal HashTrieMap #69534

GiedriusS opened this issue Sep 19, 2024 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@GiedriusS
Copy link

GiedriusS commented Sep 19, 2024

Go version

go version devel go1.24-3d33437c45 Sun Sep 15 02:05:37 2024 +0000 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/giedriusstatkevicius/.cache/go-build'
GOENV='/home/giedriusstatkevicius/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/giedriusstatkevicius/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/giedriusstatkevicius/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/giedriusstatkevicius/dev/goroot'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/giedriusstatkevicius/dev/goroot/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.24-3d33437c45 Sun Sep 15 02:05:37 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/giedriusstatkevicius/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/giedriusstatkevicius/dev/thanos/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2862491001=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Trying to intern all strings coming through gRPC using the new unique.Make(): new vtprotobuf extension. It seems like it is very easy to break some limits? The machine has 128 cores and 256GiB+ of RAM so resources shouldn't be a problem.

I tried creating a toy program that interns random strings and couldn't reproduce it, unfortunately.

What did you see happen?

I see this after some time:

panic: internal/concurrent.HashMapTrie: ran out of hash bits while inserting                                                
 goroutine 119167 [running]:                                                                                                 
 internal/concurrent.(*HashTrieMap[...]).expand(0x3869440?, 0xc004e117d0, 0xc00c21ed20, 0x7eee22e697634db7, 0x0, 0xc0151414a0
)                                                                                                                                                                                             
       /usr/local/go/src/internal/concurrent/hashtriemap.go:163 +0x1ea                                                       
 internal/concurrent.(*HashTrieMap[...]).LoadOrStore(0x3869440, {0xc00155e107, 0x12}, {0xc000fe1540})
       /usr/local/go/src/internal/concurrent/hashtriemap.go:142 +0x40a
 unique.Make[...]({0xc00155e107, 0x12})
       /usr/local/go/src/unique/handle.go:67 +0x1ea
 github.com/thanos-io/thanos/pkg/store/labelpb.(*Label).UnmarshalVT(0xc001935bd0, {0xc00155e0fe, 0x1b, 0x3f02})
       /home/giedriusstatkevicius/dev/thanos/pkg/store/labelpb/types_vtproto.pb.go:303 +0x4e5

Same happens with 1.23.1.

What did you expect to see?

I expected no panic.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 19, 2024
@cagedmantis cagedmantis added this to the Go1.24 milestone Sep 19, 2024
@cagedmantis
Copy link
Contributor

cc @mknyszek

@mknyszek
Copy link
Contributor

There's no internal limit, this is just a bug. ("Run out of hash bits" means that the implementation would've expected a hash collision given that it just walked over (what it thinks is) the maximum depth of the tree and ran out of hash bits. Hash collisions are still handled, though.)

Looking into it.

@mknyszek mknyszek self-assigned this Sep 19, 2024
@mknyszek mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 19, 2024
@mknyszek mknyszek moved this to In Progress in Go Compiler / Runtime Sep 19, 2024
@GiedriusS
Copy link
Author

I'll try to add an atomic counter to see how many calls it takes to trigger this panic

@mknyszek
Copy link
Contributor

My suspicion is an off-by-one error in creating new nodes. Like, the hash matches down to the last 4 bits, but the last 4 don't and we give up too early. It occurs to me that this codepath isn't well-tested because actual collisions (which are well-tested) short-circuit to appending onto the overflow list for a node.

I'll try to write a test for this case and see if that's the problem.

@mknyszek
Copy link
Contributor

mknyszek commented Sep 19, 2024

Yeah, actually, I'm feeling more confident in this diagnosis. If it's the case, I should have a fix and a test shortly.

EDIT: No, that doesn't seem to be it.

@mknyszek
Copy link
Contributor

mknyszek commented Sep 19, 2024

I took a brief look at planetscale/vtprotobuf#140 and I noticed that you're using unsafe.String to pass a byte slice through to unique.Make. This is fine, but one thing that occurs to me is that on insert, the node we're colliding with is re-hashed. If the byte slice you passed to unique.Make is mutated prior to this commit then it's possible that that produces a different hash.

It already looks like you already tried at tip though, past that commit, and ran into the same issue. Can you confirm that?

@mknyszek mknyszek changed the title unique: easy to "run out of hash bits" unique: "run out of hash bits" bug in internal HashTrieMap Sep 19, 2024
@GiedriusS
Copy link
Author

GiedriusS commented Sep 19, 2024

I took a brief look at planetscale/vtprotobuf#140 and I noticed that you're using unsafe.String to pass a byte slice through to unique.Make. This is fine, but one thing that occurs to me is that on insert, the node we're colliding with is re-hashed. If the byte slice you passed to unique.Make is mutated prior to this commit then it's possible that that produces a different hash.

It already looks like you already tried at tip though, past that commit, and ran into the same issue. Can you confirm that?

Yeah, that's why I tried the tip version. Without that commit the panic occurs within a few seconds of me triggering that code path. With that commit it's a bit harder and occurs in 30-60 seconds.

@mknyszek
Copy link
Contributor

mknyszek commented Sep 19, 2024

If you have any instructions to reproduce, or if you're willing to share a core file or something (so that I can inspect the state of the HashTrieMap on crash) that would help me iterate on this faster.

I can also write a small patch to dump the state of the HashMapTrie on crash, if you're willing to run that and provide the output. (EDIT: If it helps, I can remove the actual key/value output, I think I just need the hash anyway.)

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/614475 mentions this issue: unique,internal/concurrent: add some more tests

@GiedriusS
Copy link
Author

Let me try to work on reproducing this and creating a core dump file.

gopherbot pushed a commit that referenced this issue Sep 26, 2024
One is a test of unsafe.String usage, which was broken before CL 610738
was merged.

The other is trying to improve coverage of "near collision" scenarios in
the HashTrieMap where only the last few bits differ. This is intended to
catch off-by-one errors in iterating down the tree.

For #69534.

Change-Id: I3f302e148e81269a50e93b5edf83cafc2d291098
Reviewed-on: https://go-review.googlesource.com/c/go/+/614475
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Cuong Manh Le <[email protected]>
@GiedriusS
Copy link
Author

You are correct, I couldn't reproduce this anymore with the newest version after that fix. I probably accidentally ran a build with an older version. Sorry for the noise and thank you so much for all your hard work. In case this resurfaces, I will reopen this issue.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Compiler / Runtime Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

5 participants