Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure finalizers don't deallocate GPGME objects while C code is still using them #23

Merged
merged 3 commits into from
Jan 15, 2020

Conversation

mtrmac
Copy link
Contributor

@mtrmac mtrmac commented Jan 10, 2020

For some time, we’ve been struggling with more or less occasional memory corruption, e.g. when running the tests of github.com/containers/image/signature

GO111MODULE="on" go test -c ./signature/
for i in $(seq 1 10000); do
    echo =====$i
    (cd signature; GOTRACEBACK=crash GODEBUG=clobberfree=1,cgocheck=2 ../signature.test) || break
done

eventually crashes on most platforms.


Thanks to Ulrich Obergfell [email protected], we now have a strong candidate for the cause:

As far as Go is concerned, Data.dh and Data.cbc are just values, so it is OK to implement

cgo_call(d.dh)

as

tmp := d.dh
dropReferenceTo(d)
// d's finalizer can be called at any point from now on, possibly deallocating d.dh
cgo_call(tmp) // possibly using free memory

To fix this, explicitly extend the lifetime of Data:

cgo_call(d.dh)
runtime.KeepAlive(d)

becomes

tmp := d.dh
cgo_call(tmp) // d.dh is certainly still valid
dropReferenceTo(d)
// d's finalizer can be called from now on, possibly deallocating d.dh

and the like.


This PR implements that throughout the codebase; see the individual commit messages for details.

I’m filing this PR for ~early review, I’ll run a stress-test to validate the fixes (at least on the code paths relevant to us) over the weekend.

As far as Go is concerned, Data.dh and Data.cbc are just values, so it is
OK to implement
> cgo_call(d.dh)
as
> tmp := d.dh
> dropReferenceTo(d)
> // d's finalizer can be called at any point from now on, possibly deallocating d.dh
> cgo_call(tmp) // possibly using free memory

To fix this, explicitly extend the lifetime of Data:
> cgo_call(d.dh)
> runtime.KeepAlive(d)
becomes
> tmp := d.dh
> cgo_call(tmp) // d.dh is certainly still valid
> dropReferenceTo(d)
> // d's finalizer can be called from now on, possibly deallocating d.dh

Signed-off-by: Miloslav Trmač <[email protected]>
As far as Go is concerned, Context.ctx and Context.cbc are just values, so it is
OK to implement
> cgo_call(c.ctx)
as
> tmp := c.ctx
> dropReferenceTo(c)
> // c's finalizer can be called at any point from now on, possibly deallocating c.ctx
> cgo_call(tmp) // possibly using free memory

To fix this, explicitly extend the lifetime of Context:
> cgo_call(c.ctx)
> runtime.KeepAlive(c)
becomes
> tmp := c.ctx
> cgo_call(tmp) // c.ctx is certainly still valid
> dropReferenceTo(c)
> // c's finalizer can be called from now on, possibly deallocating c.ctx

To be extra careful, this adds runtime.KeepAlive calls even in cases where the current
code path necessarily keeps c alive afterwards, so that the patern is clear and so
that future possible changes to the code don't invalidate this assumption.

Secondarily, in a few cases GPGMe returns internal pointers inside Context.ctx;
in that case, extend the lifetime of Context until we make a complete copy of
the data.

As a special case, EngineInfo is now not just a wrapper around a C pointer with
an unknown lifetime, but we make a full copy of the data when obtaining the pointer
so that we can safely use it after dropping the context that returned the information,
or after changing the information and possibly invalidating the pointers.

Signed-off-by: Miloslav Trmač <[email protected]>
As far as Go is concerned, Key.k is just a value, so it is
OK to implement
> cgo_call(k.k)
as
> tmp := k.k
> dropReferenceTo(k)
> // k's finalizer can be called at any point from now on, possibly deallocating k.k
> cgo_call(tmp) // possibly using free memory

To fix this, explicitly extend the lifetime of Key:
> cgo_call(k.k)
> runtime.KeepAlive(k)
becomes
> tmp := k.k
> cgo_call(tmp) // k.k is certainly still valid
> dropReferenceTo(k)
> // k's finalizer can be called from now on, possibly deallocating k.k

Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac
Copy link
Contributor Author

mtrmac commented Jan 10, 2020

A more explicit demonstration:

package main

import (
	"fmt"
	"math/rand"
	"runtime"
)

type pointerBinding struct {
	managed uint64
}

func newPointerBinding() *pointerBinding {
	ptr := &pointerBinding{}
	runtime.SetFinalizer(ptr, (*pointerBinding).Finalizer)
	ptr.managed = rand.Uint64() // Could come from CGo
	fmt.Printf("Created %p, managed value %v\n", ptr, ptr.managed)
	return ptr
}

func (ptr *pointerBinding) Finalizer() {
	fmt.Printf("Finalizing %p, managed value %v\n", ptr, ptr.managed)
	// Free ptr.managed via CGo
	ptr.managed = 0
}

func (ptr *pointerBinding) functionBinding() {
	fmt.Printf("Calling function for %p, managed value %v\n", ptr, ptr.managed)
	underlyingFunction(ptr.managed)
	fmt.Printf("Done\n") // No reference to ptr here, or the behavior would change
	// HERE runtime.KeepAlive(ptr)
}

func underlyingFunction(pointer uint64) {
	runtime.GC()
	fmt.Printf("Underlying function, managed value %v\n", pointer)
}

func main() {
	ptr := newPointerBinding()
	ptr.functionBinding()
	runtime.GC()
}

As is:

$ GOTRACEBACK=crash GODEBUG=clobberfree=1,cgocheck=2 go run ./standalone.go 
Created 0xc000014080, managed value 5577006791947779410
Calling function for 0xc000014080, managed value 5577006791947779410
Finalizing 0xc000014080, managed value 5577006791947779410
Underlying function, managed value 5577006791947779410
Done

note that “Finalizing” happens before “Underlying function” is done with the managed value.

Uncommenting the runtime.KeepAlive at “HERE”:

$ GOTRACEBACK=crash GODEBUG=clobberfree=1,cgocheck=2 go run ./standalone.go 
Created 0xc000014080, managed value 5577006791947779410
Calling function for 0xc000014080, managed value 5577006791947779410
Underlying function, managed value 5577006791947779410
Done
Finalizing 0xc000014080, managed value 5577006791947779410

works as expected. (Without the GO* variables, the finalizer is actually never run, but that’s fine for our purposes as well.)

@mtrmac
Copy link
Contributor Author

mtrmac commented Jan 13, 2020

I’m filing this PR for ~early review, I’ll run a stress-test to validate the fixes (at least on the code paths relevant to us) over the weekend.

I have encountered no crashes during a 3-day test run of the tests mentioned above; before this PR the tests would crash in 1–4 iterations on macOS, and in <2 thousand iterations on Linux; now it survived 1053 iterations on macOS, and 160931 iterations on Linux.

(The different number of iterations is caused by the Linux build of GPG having https://dev.gnupg.org/T3380 applied.)

Copy link
Owner

@proglottis proglottis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the PR @mtrmac - that must have been an inordinate amount of effort 💚

If I knew this would have been the result I definitely would not have used finalisers. In fact I'm wondering if it would be better to remove them, better safe than sorry?

I've left a couple of questions

gpgme.go Show resolved Hide resolved
gpgme.go Show resolved Hide resolved
gpgme.go Show resolved Hide resolved
@mtrmac mtrmac changed the title Ensure finalizers don't deallocate GPGMe objects while C code is still using them Ensure finalizers don't deallocate GPGME objects while C code is still using them Jan 14, 2020
@mtrmac
Copy link
Contributor Author

mtrmac commented Jan 15, 2020

If I knew this would have been the result I definitely would not have used finalisers. In fact I'm wondering if it would be better to remove them, better safe than sorry?

In general, I think it’s nice for the Go bindings to do the work to make the API Go-like, including Go assumptions like no need to worry about memory references/lifetimes.

The code in containers/image that calls these bindings apparently relies on the finalizers, there are no explicit Close/Release calls (I’m actually rather surprised at this… I don’t know why I never thought to look for them or call them, my best guess guess because the Context/Data objects are in-memory-only), and the containers/image library is used in long-term daemons. So, at least for this code, removing finalizers now would lead to undesirable memory leaks unless the callers were adjusted.

That does not necessarily mean that other consumers are the same, but it’s a possibility, and nothing in the documentation comments suggests the need to call the Close/Release methods.

@proglottis proglottis merged commit 92153bc into proglottis:master Jan 15, 2020
@mtrmac mtrmac deleted the lifetimes branch January 18, 2020 21:01
mtrmac added a commit to mtrmac/image that referenced this pull request Jan 18, 2020
This primarily includes proglottis/gpgme#23 ,
fixing use-after-free crashes.

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/docker that referenced this pull request Feb 20, 2020
This fixes CVE-2020-8945 by incorporating
proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op with the RHEL 7 GPGME 1.3.2)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)
- Better error reporting in Context.GetKey

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by updating vendor.conf, and
$ mkdir -p _build/src/github.com/docker
$ ln -s $(pwd) _build/src/github.com/docker/docker
$ GOPATH=$(pwd)/_build:$GOPATH vndr github.com/mtrmac/gpgme

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 20, 2020
This fixes CVE-2020-8945 by incorporating proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op on Linux)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by
$ go get github.com/mtrmac/[email protected]
$ make vendor

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 20, 2020
This fixes CVE-2020-8945 by incorporating proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op on Linux)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by
$ go get github.com/mtrmac/[email protected]
$ make vendor
and manually backing out unrelated deletions of files (which Go 1.13 does
but Go 1.11, used for testing the old version, does not).

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 21, 2020
This fixes CVE-2020-8945 by incorporating proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op on Linux)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by updating vendor conf,
$ vndr github.com/mtrmac/gpgme
and manually backing out unrelated deletions of files.

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 21, 2020
This fixes CVE-2020-8945 by incorporating proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op on Linux)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by updating vendor conf and
$ vndr github.com/mtrmac/gpgme

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/skopeo that referenced this pull request Feb 21, 2020
This fixes CVE-2020-8945 by incorporating proglottis/gpgme#23 .

Other changes included by the rebase:
- Support for gpgme_off_t (~no-op on Linux)
- Wrapping a few more GPGME functions (irrelevant if we don't call them)

Given how invasive the CVE fix is (affecting basically all binding
code), it seems safer to just update the package (and be verifiably
equivalent with upstream) than to backport and try to back out the few
other changes.

Performed by updating vendor conf and
$ vndr github.com/mtrmac/gpgme

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/machine-config-operator that referenced this pull request Feb 27, 2020
This "fixes" CVE-2020-8945 by incorporating proglottis/gpgme#23 .

The code is not actually used, for two reasons:
- Nothing in this repository invokes signature verification
  (the subpackage is only used to generate contents of policy.json)
- Builds use the 'containers_image_openpgp' build tag, which
  switches to the non-gpgme signature backend.

This updates the vendored code anyway
- to avoid false positives when scanning for vulnerabilities
- so that we don't have to worry about any future changes in this
  repository enabling those code paths.

Performed by
$ go get github.com/mtrmac/[email protected] && make go-deps
in a golang:1.12 container.

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/machine-config-operator that referenced this pull request Jun 20, 2020
This "fixes" CVE-2020-8945 by incorporating proglottis/gpgme#23 .

The code is not actually used, for two reasons:
- Nothing in this repository invokes signature verification
  (the subpackage is only used to generate contents of policy.json)
- Builds use the 'containers_image_openpgp' build tag, which
  switches to the non-gpgme signature backend.

This updates the vendored code anyway
- to avoid false positives when scanning for vulnerabilities
- so that we don't have to worry about any future changes in this
  repository enabling those code paths.

Performed by
$ go get github.com/mtrmac/[email protected] && make go-deps

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/machine-config-operator that referenced this pull request Jun 20, 2020
This "fixes" CVE-2020-8945 by incorporating proglottis/gpgme#23 .

The code is not actually used, for two reasons:
- Nothing in this repository invokes signature verification
  (the subpackage is only used to generate contents of policy.json)
- Builds use the 'containers_image_openpgp' build tag, which
  switches to the non-gpgme signature backend.

This updates the vendored code anyway
- to avoid false positives when scanning for vulnerabilities
- so that we don't have to worry about any future changes in this
  repository enabling those code paths.

Performed by
$ GOPROXY=https://proxy.golang.org GO111MODULE=on go get github.com/mtrmac/[email protected] && make go-deps
in a golang:1.12 container

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added a commit to mtrmac/machine-config-operator that referenced this pull request Jun 20, 2020
This "fixes" CVE-2020-8945 by incorporating proglottis/gpgme#23 .

The code is not actually used, for two reasons:
- Nothing in this repository invokes signature verification
  (the subpackage is only used to generate contents of policy.json)
- Builds use the 'containers_image_openpgp' build tag, which
  switches to the non-gpgme signature backend.

This updates the vendored code anyway
- to avoid false positives when scanning for vulnerabilities
- so that we don't have to worry about any future changes in this
  repository enabling those code paths.

Performed by updating Gopgg.tompl and
$ dep ensure

Signed-off-by: Miloslav Trmač <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants