Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rditer: replace ReplicaDataEngineIterator with IterateReplicaKeySpans #84070

Merged
merged 2 commits into from
Jul 27, 2022

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Jul 8, 2022

See #83747 for prior discussion.


rditer: rename "key range" to "key span" and use roachpb.Span

This patch renames the concept "key range" to "key span", and the
functions now return regular roachpb.Span instead of rditer.KeyRange.
"Range" is already an overloaded term (I have used "range keys in the
range's key ranges" in a sentence!), and "key span" is a more widely used
term in the code.

Release note: None

rditer: replace ReplicaDataEngineIterator with IterateReplicaKeySpans

This patch replaces ReplicaDataEngineIterator with
IterateReplicaKeySpans().

The Raft snapshot protocol requires an unusual iteration order: for each
of the range's key spans, first send all point keys then send all range
keys. This allows ingesting SSTs that only span each key span (rather
than across wide swaths of the keyspace), with efficient, separate
iteration over point/range keys.

While the current ReplicaDataEngineIterator already does this, the
iteration order is rather peculiar, which can be unexpected to users.
IterateReplicaKeySpans attempts to make this iteration order more
explicit.

Other approaches were considered but rejected:

  • Use an iterator generator, which returns the next iterator in the
    sequence. This was very unergonomic to use, since it required nested
    iteration loops (one over the iterators, and one per iterator) which
    make variable naming/reuse and iterator closing awkward.

  • Use a visitor for each key instead of keyspan. This would again end up
    hiding the iteration sequence/structure. It also would make it
    impossible to skip across key spans, or do partial processing of a key
    span, since it could only abort the entire iteration.

Touches #82591.

Release note: None

@erikgrinaker erikgrinaker requested review from a team as code owners July 8, 2022 12:57
@erikgrinaker erikgrinaker self-assigned this Jul 8, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the rditer-generator branch 2 times, most recently from f8f533a to ec16753 Compare July 8, 2022 13:40
@erikgrinaker erikgrinaker changed the title rditer: replace ReplicateDataEngineIterator with IterateReplicaKeySpans rditer: replace ReplicaDataEngineIterator with IterateReplicaKeySpans Jul 8, 2022
Copy link
Contributor

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cleaner that iterator flipping between spans and key types to me.
Updated tests seem to ignore range key type iterators mostly by expecting points. Does that need fixing?

pkg/kv/kvserver/replica_test.go Show resolved Hide resolved
pkg/kv/kvserver/replica_learner_test.go Show resolved Hide resolved
pkg/kv/kvserver/store_snapshot.go Show resolved Hide resolved
@erikgrinaker
Copy link
Contributor Author

Updated tests seem to ignore range key type iterators mostly by expecting points. Does that need fixing?

I think it's fine. We don't need range keys in every test, as long as we have sufficient test coverage, which I think we do (at least for them being included in snapshots) -- see e.g. integration tests in client_raft_test.go.

Copy link
Contributor

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Looks good!

This patch renames the concept "key range" to "key span", and the
functions now return regular `roachpb.Span` instead of `rditer.KeyRange`.
"Range" is already an overloaded term (I have used "range keys in the
range's key ranges" in a sentence!), and "key span" is a more widely used
term in the code.

Release note: None
@erikgrinaker
Copy link
Contributor Author

TFTR!

bors r=aliher1911

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)


pkg/kv/kvserver/rditer/replica_data_iter.go line 366 at r4 (raw file):

// iterutil.StopIteration().
//
// Must use a reader with consistent iterators.

With some resumption information in the visitor interface we could also in the future solve #75824

@erikgrinaker
Copy link
Contributor Author

bors r-

This leaks an iterator if the seek errors.

@craig
Copy link
Contributor

craig bot commented Jul 26, 2022

Canceled.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @sumeerbhola)


pkg/kv/kvserver/rditer/replica_data_iter.go line 366 at r4 (raw file):

Previously, sumeerbhola wrote…

With some resumption information in the visitor interface we could also in the future solve #75824

Good point. I'll pick that up shortly, it should be fairly trivial.

…ans`

This patch replaces `ReplicaDataEngineIterator` with
`IterateReplicaKeySpans()`.

The Raft snapshot protocol requires an unusual iteration order: for each
of the range's key spans, first send all point keys then send all range
keys. This allows ingesting SSTs that only span each key span (rather
than across wide swaths of the keyspace), with efficient, separate
iteration over point/range keys.

While the current `ReplicaDataEngineIterator` already does this, the
iteration order is rather peculiar, which can be unexpected to users.
`IterateReplicaKeySpans` attempts to make this iteration order more
explicit.

Other approaches were considered but rejected:

* Use an iterator generator, which returns the next iterator in the
  sequence. This was very unergonomic to use, since it required nested
  iteration loops (one over the iterators, and one per iterator) which
  make variable naming/reuse and iterator closing awkward.

* Use a visitor for each key instead of keyspan. This would again end up
  hiding the iteration sequence/structure. It also would make it
  impossible to skip across key spans, or do partial processing of a key
  span, since it could only abort the entire iteration.

Release note: None
@erikgrinaker
Copy link
Contributor Author

bors r=aliher1911

@craig
Copy link
Contributor

craig bot commented Jul 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 27, 2022

Build succeeded:

@craig craig bot merged commit 4753c67 into cockroachdb:master Jul 27, 2022
@erikgrinaker erikgrinaker deleted the rditer-generator branch July 27, 2022 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants