runtime: scanobject can use incorrect span metadata #11267

Closed
aclements opened this issue Jun 18, 2015 · 7 comments

@aclements (Member)

When scanobject looks up the span for a pointer it read from the object it's scanning, it's possible for it to get incorrect information for the span, causing it to ignore the pointer, or, much worse, attempt to mark uninitialized memory.

  1. scanobject loads a pointer slot out of the object it's scanning, and this happens to be one of the special pointers from the heap into a stack. Call the pointer p and suppose it points into X's stack.
  2. X, running on another thread, grows its stack and frees its old stack.
  3. The old stack happens to be large or was the last stack in its span, so X frees this span, setting it to state _MSpanFree.
  4. scanobject calls heapBitsForObject, which loads the span containing p, which is now in state _MSpanFree, so it ignores it. (There's currently a disabled test here that would panic; see issue #10591, "runtime: reenable bad pointer check in GC".)

or

  1. The span gets reused as a heap span.
  2. scanobject calls heapBitsForObject, which loads the span containing p, which is now in state _MSpanInUse, but doesn't necessarily have an object at p. The not-object at p gets marked, and at this point all sorts of things can go wrong.
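
Concretely, both scenarios exploit the same check-then-use window. Here's a minimal, self-contained sketch of that window (the types, the spans map, and lookupObject below are simplified stand-ins, not the runtime's actual code):

```go
package main

// Simplified stand-ins for the runtime's span bookkeeping.
type spanState int

const (
	mSpanFree  spanState = iota // span returned to the heap
	mSpanInUse                  // span holds heap objects
	mSpanStack                  // span holds a goroutine stack
)

type mspan struct {
	base, limit uintptr
	state       spanState
}

// spans stands in for the runtime's page-indexed h_spans table.
var spans map[uintptr]*mspan

func spanOf(p uintptr) *mspan { return spans[p>>13] } // 13 ~ _PageShift

// lookupObject mimics the shape of heapBitsForObject's span check.
func lookupObject(p uintptr) *mspan {
	s := spanOf(p)
	// RACE WINDOW: p was loaded from a heap object earlier. Between
	// that load and this check, another thread can grow its stack and
	// free the old one. If the span is now mSpanFree (scenario 1),
	// the pointer is silently ignored; if the span was reused for the
	// heap (scenario 2), the check passes but p names a not-object.
	if s == nil || p < s.base || p >= s.limit || s.state != mSpanInUse {
		return nil
	}
	return s
}

func main() {}
```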

This is quite easy to reproduce with a well-placed sleep:

```diff
diff --git a/src/runtime/mbitmap.go b/src/runtime/mbitmap.go
index 146ffbf..fe36f3a 100644
--- a/src/runtime/mbitmap.go
+++ b/src/runtime/mbitmap.go
@@ -190,6 +190,9 @@ func heapBitsForObject(p uintptr) (base uintptr, hbits heapBits, s *mspan) {
        // Consult the span table to find the block beginning.
        k := p >> _PageShift
        s = h_spans[idx]
+       if s != nil && s.state == _MSpanStack {
+               usleep(100)
+       }
        if s == nil || pageID(k) < s.start || p >= s.limit || s.state != mSpanInUse {
                if s == nil || s.state == _MSpanStack {
                        // If s is nil, the virtual address has never been part of the heap.
@@ -201,7 +204,7 @@ func heapBitsForObject(p uintptr) (base uintptr, hbits heapBits, s *mspan) {
                // The following ensures that we are rigorous about what data
                // structures hold valid pointers.
                // TODO(rsc): Check if this still happens.
-               if false {
+               if true {
                        // Still happens sometimes. We don't know why.
                        printlock()
                        print("runtime:objectstart Span weird: p=", hex(p), " k=", hex(k))
```

With this sleep, GOGC=10 ./runtime.test -test.short crashes reliably for me.

@RLH @rsc

@aclements (Member, Author)

I can think of a few possible ways to fix this.

  1. Somehow synchronize stack growth and scanobject. I don't think this is a good idea unless we can find a way that doesn't slow down scanobject.
  2. Delay freeing stack spans during a GC until the end of that GC. This would give stack spans the same monotonic property that makes heapBitsForObject safe on heap spans. I think we do something like this for some runtime data structure, though I forget what it is. This has the downside that, if the user is creating goroutines at a rapid clip during a GC, they'll eat up memory faster than they do now.
  3. Replace all heap-to-stack pointers with uintptrs. The garbage collector simply won't trace these, and it can even blow up if it finds a pointer into a stack span.

3 seems like the cleanest solution, assuming it doesn't have other ramifications.
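
For illustration, option 3 at a use site might look like this sketch (gobufLike and its field are hypothetical stand-ins, not the runtime's actual declarations):

```go
package main

import "unsafe"

// gobufLike is a hypothetical stand-in for a runtime structure that
// today holds a real pointer into a goroutine's stack.
type gobufLike struct {
	// Stored as uintptr rather than as a typed pointer: the garbage
	// collector does not trace uintptr fields, so scanobject would
	// never follow this value into a stack span.
	sp uintptr
}

func main() {
	var local int
	buf := gobufLike{sp: uintptr(unsafe.Pointer(&local))}
	_ = buf
}
```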

@aclements (Member, Author)

> ... and it can even blow up if it finds a pointer into a stack span.

... from a heap object, that is. This code path is also used for scanning stacks, which can of course contain pointers into themselves (and, in the runtime, maybe pointers into another stack). We don't want to blow up on those. But if we're not scanning a stack, we shouldn't see a stack pointer.

@randall77 (Contributor)

We already do #2. See stack1.go:620-628.
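
A compilable sketch of the shape of that code, for readers without the source at hand (the names and the queue representation are illustrative, not the actual stack1.go identifiers):

```go
package main

type stack struct{ lo, hi uintptr }

var stackFreeQueue []stack // illustrative stand-in for the deferred-free queue

func stackfree(s stack) { /* return the memory to the stack pools now */ }

// freeOldStack shows the asymmetry discussed below: the shrinking
// branch (shrinks are performed by the GC itself) defers the free
// until the end of mark termination, but the growing branch frees
// immediately, which is the branch that races with scanobject.
func freeOldStack(old stack, growing bool) {
	if growing {
		stackfree(old) // immediate free: the racy branch
	} else {
		stackFreeQueue = append(stackFreeQueue, old) // freed after GC
	}
}

func main() {}
```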

@aclements (Member, Author)

I see. Did we just miss the fact that the same logic applies equally to the growing branch of that if?

Still, the number of stacks that can shrink during a GC cycle is limited, since the GC is in charge of doing the shrink; these zombie stacks can't last more than the duration of mark termination. If we do the same for growing stacks, user code can leak an arbitrary amount of stack memory during a GC cycle, and we don't have a mechanism for pushing back on that like we do for pushing back on allocation. Hence, I'm still leaning toward solution 3. I think that may also eliminate the need for the delayed free of shrunk stacks.

@aclements (Member, Author)

I took a stab at 3 and quickly ran into pointers that can point into either the stack or the heap for deep reasons (like g.sched.ctxt), which throws a monkey wrench into this solution.

But there's a fourth solution that's like delayed freeing of stacks, but that I think solves its downsides: free stacks immediately, but delay freeing stack spans. It's perfectly fine to reuse the stacks. It's fine to reuse the stack spans. The danger is only the state transition from stack span to free span (and possibly to heap span after that), so that's all we need to prevent.

This looks easy to do for small stacks: just don't immediately free them in stackpoolfree when ref==0. At the end of mark termination, walk the stackpools and free spans with ref==0. For large stacks, I'm not sure what the best way to do this is that also lets us reuse that memory for new stacks.
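
A minimal sketch of that idea for the small-stack pools (gcRunning, the ref field, and the helper names are assumptions for illustration, not actual runtime code):

```go
package main

type mspan struct {
	ref  int // stacks still allocated out of this span
	next *mspan
}

var (
	stackpool *mspan // illustrative singly-linked pool of small-stack spans
	gcRunning bool   // stand-in for checking the GC phase
)

func freeSpanToHeap(s *mspan) { /* transition the span to _MSpanFree */ }

// stackpoolfree sketches the proposal: while GC is running, a span
// whose ref count hits zero stays in the pool, so its memory can still
// be reused for new stacks but never transitions to _MSpanFree under a
// concurrent scanobject.
func stackpoolfree(s *mspan) {
	s.ref--
	if s.ref == 0 && !gcRunning {
		freeSpanToHeap(s)
	}
}

// sweepStackPools runs at the end of mark termination and frees the
// spans that stackpoolfree held back.
func sweepStackPools() {
	for s := stackpool; s != nil; s = s.next {
		if s.ref == 0 {
			freeSpanToHeap(s)
		}
	}
}

func main() {}
```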

@aclements (Member, Author)

I chatted with @rsc about this. The plan is to simply return small stacks to the stack cache without freeing their spans until the end of GC. That way small stacks can be reused, so a program rapidly creating and exiting goroutines won't chew through memory. This should be a fairly small change. For large stacks, for now, we'll simply delay freeing them and not attempt to reuse that memory during a GC cycle. We should be able to reuse the existing delayed-free mechanism for this, so that should also be a low-risk change. When 1.6 opens, we can reconsider whether and how to reuse the memory of large stacks.
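
The large-stack half of that plan might look like the following sketch (the queue representation and names are assumptions for illustration; the actual change is in the CL mentioned below):

```go
package main

type stack struct{ lo, hi uintptr }

var largeStackFreeQueue []stack // flushed when the GC cycle ends

func releaseToHeap(s stack) { /* return the span to the heap */ }

// freeLargeStack sketches the plan for large stacks: during a GC cycle
// the memory is not reused at all, only queued; outside a cycle it is
// released immediately.
func freeLargeStack(s stack, gcRunning bool) {
	if gcRunning {
		largeStackFreeQueue = append(largeStackFreeQueue, s)
		return
	}
	releaseToHeap(s)
}

func main() {}
```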

@gopherbot (Contributor)

CL https://golang.org/cl/11502 mentions this issue.

golang locked and limited conversation to collaborators Jun 28, 2016