Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/pebbleiter: guard against double-closed Iterators #90859

Merged
merged 1 commit into from
Jan 12, 2023

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Oct 28, 2022

Replace direct uses of the *pebble.Iterator type with a pebbleiter.Iterator
type definition. In production builds, this type is a simple type definition for
*pebble.Iterator which suffers no overhead. In crdb_test builds, the
pebbleiter.Iterator is defined as a *pebbleiter.AssertionIter type that can
assert proper use of the *pebble.Iterator type and its contract.

Initially, use this new assertion iterator to ensure that a *pebble.Iterator
is never double-closed, which may cause the Iterator to be added to a sync pool
twice leading to races. This type of assertion cannot be done by
*pebble.Iterator, because the iterator may already be reused by the time the
second Close is run.

In the future, I'd like to also introduce assertions that pkg/storage always
checks validity before calling HasPointAndRange and RangeKeyChanged (see
cockroachdb/pebble#2205).

Release note: None
Epic: None

Close #90643.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jbowens jbowens self-assigned this Oct 31, 2022
@jbowens jbowens force-pushed the assertion-iter branch 2 times, most recently from 642dce5 to 4485497 Compare January 5, 2023 17:07
@jbowens jbowens marked this pull request as ready for review January 5, 2023 17:09
@jbowens jbowens requested review from a team as code owners January 5, 2023 17:09
Copy link
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

In the future, would it also make sense to run some of our roachtests with these new assertions enabled?

Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jbowens)


pkg/storage/pebbleiter/BUILD.bazel line 7 at r1 (raw file):

# gazelle:exclude gen-crdb_test_on.go

# keep

Is this a necessary build directive? Same for the keep below in the go_test target.


pkg/storage/pebbleiter/BUILD.bazel line 19 at r1 (raw file):

)

REMOVE_GO_BUILD_CONSTRAINTS = "cat $< | grep -v '//go:build' | grep -v '// +build' > $@"

I don't understand the build system enough to be able to tell at a glance why this is needed. Worth a comment? Or is this common enough knowledge for people that speak Bazel?


pkg/storage/pebbleiter/crdb_test_off.go line 22 at r1 (raw file):

// MaybeWrap returns the provided Iterator, unmodified.
func MaybeWrap(iter *pebble.Iterator) Iterator {

Do these inline? If they don't, do we care about the overhead?


pkg/storage/pebbleiter/crdb_test_on.go line 31 at r1 (raw file):

// AssertionIter wraps a *pebble.Iterator with assertion checking.
type AssertionIter struct {

nit: does this need to be exported?

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 14 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens and @nicktrav)


pkg/storage/pebbleiter/crdb_test_off.go line 22 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Do these inline? If they don't, do we care about the overhead?

worth adding // gcassert:inline

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nicktrav, @rickystewart, and @sumeerbhola)


pkg/storage/pebbleiter/BUILD.bazel line 7 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Is this a necessary build directive? Same for the keep below in the go_test target.

I believe so, but I copied this from the buildutil package.


pkg/storage/pebbleiter/BUILD.bazel line 19 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

I don't understand the build system enough to be able to tell at a glance why this is needed. Worth a comment? Or is this common enough knowledge for people that speak Bazel?

I couldn't tell you what they do either, but I lifted them from the only other build-tag-constrained file. Maybe @rickystewart knows.


pkg/storage/pebbleiter/crdb_test_off.go line 22 at r1 (raw file):

Previously, sumeerbhola wrote…

worth adding // gcassert:inline

The production one does, and done.


pkg/storage/pebbleiter/crdb_test_on.go line 31 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: does this need to be exported?

unexported

Copy link
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, 7 of 7 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rickystewart)

Copy link
Collaborator

@rickystewart rickystewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, the comment is more food for thought than a suggestion you need to take.

// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

//go:build !crdb_test || crdb_test_off
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth considering whether eschewing the conditional compilation and just sticking with a straightforward wrapping design might be fine. For example, something like:

import "github.com/cockroachdb/cockroach/pkg/util/buildutil"

type Iterator struct {
    iter *pebble.Iterator
    closed bool
}

func Wrap(iter *pebble.Iterator) Iterator {
    return Iterator { iter: iter }
}

func (i *Iterator) Clone(cloneOpts pebble.CloneOptions) (*Iterator, error) { ... }

func (i *Iterator) Close() error {
    if buildutil.CrdbTestBuild {
        if i.closed { 
            panic("pebble.Iterator already closed")
        }
        err := i.iter.Close()
        i.closed = true
        return err
    } else {
        return i.iter.Close()
    }
}

I imagine your thinking is that allocating the extra struct is not a cost we should pay if we're not running a test build, and maybe you're right. However I wonder if the extra cost (another little struct and maybe one additional indirection for Clone() and Close()) is not minimal enough that we can just eat it. I don't know enough about the implementation to say so -- maybe these iterators are opened and closed in a really hot loop and we can't afford taking any extra time.

Another thing to consider: if we just decide to keep the wrapping in non-test builds, then we could have an extra guardrail, right?

func (i *Iterator) Close() error {
    if i.closed {
         if buildutil.CrdbTestBuild {
            panic(...)
        } else {
            return fmt.Errorf("iterator has already been closed") // can attach extra data to the error here too
        }
    }
    err := i.iter.Close()
    i.closed = true
    return err
}

We can avoid panicking immediately in this case but presumably throwing an error if we're doing something that's definitely illegal is better than just continuing on to likely corrupt data?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: the last bit -- if we're going to keep track of whether each individual iterator is closed to warn about double-closes, I also wonder whether that logic makes more sense as a part of the pebble.Iterator itself. That struct could carry around the closed bool and give a sensible error if the method is called more than once.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I wonder if the extra cost (another little struct and maybe one additional indirection for Clone() and Close()) is not minimal enough that we can just eat it. I don't know enough about the implementation to say so -- maybe these iterators are opened and closed in a really hot loop and we can't afford taking any extra time.

This layer is particularly performance sensitive and any additional allocations would be prohibitive.

Re: the last bit -- if we're going to keep track of whether each individual iterator is closed to warn about double-closes, I also wonder whether that logic makes more sense as a part of the pebble.Iterator itself. That struct could carry around the closed bool and give a sensible error if the method is called more than once.

Since both storage.pebbleIterator and pebble.Iterator are pooled to reduce allocations, the iterators structs may already be in use again by the time Close is called a second time. We can't distinguish between a Close issued by the new user, versus a double-Close by the old user.

It does seem like it'd be feasible to maintain monotonically increasing counts of opens and closes and assert that closes never exceeds opens, but the ability to make crdb_test-gated assertions of pkg/storage's use of pebble.Iterator is generally useful. We could move some of those assertions into pebble.Iterator, but they still need to be gated on crdb_test.

@rickystewart would it be feasible to add the Pebble invariants build tag when the Cockroach crdb_test build tag is set? That could allow us to take advantage of the existing Pebble invariant assertions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As mentioned, don't let this comment block you, this is fine to be checked-in as is.)

@rickystewart would it be feasible to add the Pebble invariants build tag when the Cockroach crdb_test build tag is set? That could allow us to take advantage of the existing Pebble invariant assertions.

It is possible, but it might result in a lot of stuff being re-compiled for developers when swapping between builds and tests if they happen to be using pebble. You can file a follow-up issue to discuss this if you like.

Replace direct uses of the `*pebble.Iterator` type with a `pebbleiter.Iterator`
type definition. In production builds, this type is a simple type definition for
`*pebble.Iterator` which suffers no overhead. In `crdb_test` builds, the
`pebbleiter.Iterator` is defined as a `*pebbleiter.AssertionIter` type that can
assert proper use of the `*pebble.Iterator` type and its contract.

Initially, use this new assertion iterator to ensure that a `*pebble.Iterator`
is never double-closed, which may cause the Iterator to be added to a sync pool
twice leading to races. This type of assertion cannot be done by
`*pebble.Iterator`, because the iterator may already be reused by the time the
second Close is run.

In the future, I'd like to also introduce assertions that pkg/storage always
checks validity before calling HasPointAndRange and RangeKeyChanged (see
cockroachdb/pebble#2205).

Release note: None
Epic: None

Close cockroachdb#90643.
@jbowens jbowens merged commit 8330bdb into cockroachdb:master Jan 12, 2023
@jbowens jbowens deleted the assertion-iter branch January 12, 2023 03:12
jbowens added a commit to jbowens/cockroach that referenced this pull request Feb 23, 2023
Backport cockroachdb#90859, cockroachdb#96685 and cockroachdb#97222 to mangle the buffers returned by Pebble
iterators to ensure MVCC and Cockroach code respects the Pebble iterator memory
lifetimes.

The RangeBounds() and RangeKeys() validity assertions are omitted, since 22.2
allowed these methods to be invoked on an invalid iterator or an iterator
positioned at a point key.

Release note: None
Release justification: Non-production code changes
jbowens added a commit to jbowens/cockroach that referenced this pull request Feb 24, 2023
Backport cockroachdb#90859, cockroachdb#96685 and cockroachdb#97222 to mangle the buffers returned by Pebble
iterators to ensure MVCC and Cockroach code respects the Pebble iterator memory
lifetimes.

The RangeBounds() and RangeKeys() validity assertions are omitted, since 22.2
allowed these methods to be invoked on an invalid iterator or an iterator
positioned at a point key.

Release note: None
Release justification: Non-production code changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: guard against double Iterator Close
5 participants