Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanconfig: support rangefeeds for dynamic system tables #74555

Merged
merged 3 commits into from
Jan 31, 2022

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Jan 6, 2022

Fixes #73045.

We're running out of system table IDs (see #57531), and as a result
we're now introducing the notion of dynamic system IDs throughout the
system. Previously KV baked-in the assumption of static system IDs at
two points:

  • When deciding to allow rangefeeds on a given range;
  • When enforcing strict GC TTL;

It did so by decoding the range's key span and comparing against the
hard-coded maximum system ID, all to determine whether the range in
question contained system tables. If so, we allowed rangefeeds to be
declared over it, and also did not enforce strict GC TTL (only really
applies to user tables). This way of doing things does not compose with
dynamically allocated system table IDs. With arbitrary, possibly
non-contiguous IDs, we don't have the convenient key-comparison
properties to rely on.

To that end, we use the span configs infrastructure to to delegate
control of whether rangefeeds are enabled over a given range and whether
strict GC is enforced. This scheme allows SQL code to declare "system
table configs" over arbitrary schemas, and have KV still respect it.
This PR does not expose these span config settings as part of zone
configs -- there's no need to (though we could in the future). To
account for the asynchronous nature of the span configs infra, we need
to ensure that ranges without an available config default to enabling
rangefeeds.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif
Copy link
Contributor Author

This is "done", but still marking it as draft; it's building on top of 3 open PRs that I'd like to land first.

@irfansharif irfansharif force-pushed the 220105.rangefeed-span-cfgs branch from a475ea7 to eb6ff6b Compare January 18, 2022 03:23
@irfansharif irfansharif marked this pull request as ready for review January 18, 2022 03:23
@irfansharif irfansharif requested a review from a team as a code owner January 18, 2022 03:23
@irfansharif irfansharif requested a review from ajwerner January 18, 2022 03:23
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code :lgtm:, interesting that the tests are failing in the way that they are. Any ideas what's going on there?

Reviewed 13 of 92 files at r1, 79 of 79 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)

@irfansharif
Copy link
Contributor Author

Not yet, but thankfully it's contained only to this PR. It'll take some time to munch through, and I'm a bit occupied with the fallout from #73876. I'll let this PR sit on ice for a bit.

@irfansharif irfansharif removed the request for review from a team January 29, 2022 20:55
@irfansharif irfansharif force-pushed the 220105.rangefeed-span-cfgs branch 2 times, most recently from e4bd589 to 3996065 Compare January 31, 2022 04:23
@irfansharif irfansharif requested a review from a team January 31, 2022 04:23
Fixes cockroachdb#73045.

We're running out of system table IDs (see cockroachdb#57531), and as a result
we're now introducing the notion of dynamic system IDs throughout the
system. Previously KV baked-in the assumption of static system IDs at
two points:
- When deciding to allow rangefeeds on a given range;
- When enforcing strict GC TTL;

It did so by decoding the range's key span and comparing against the
hard-coded maximum system ID, all to determine whether the range in
question contained system tables. If so, we allowed rangefeeds to be
declared over it, and also did not enforce strict GC TTL (only really
applies to user tables). This way of doing things does not compose with
dynamically allocated system table IDs. With arbitrary, possibly
non-contiguous IDs, we don't have the convenient key-comparison
properties to rely on.

To that end, we use the span configs infrastructure to to delegate
control of whether rangefeeds are enabled over a given range and whether
strict GC is enforced. This scheme allows SQL code to declare "system
table configs" over arbitrary schemas, and have KV still respect it.
This PR does not expose these span config settings as part of zone
configs -- there's no need to (though we could in the future). To
account for the asynchronous nature of the span configs infra, we need
to ensure that ranges without an available config default to enabling
rangefeeds.

Release note: None
@irfansharif irfansharif force-pushed the 220105.rangefeed-span-cfgs branch from 3996065 to 4b8258c Compare January 31, 2022 05:18
@irfansharif
Copy link
Contributor Author

TFTR!

interesting that the tests are failing in the way that they are. Any ideas what's going on there?

Just a few tests that needed updating.

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 31, 2022

Build failed:

Now that we're using span configs to control whether rangefeeds are
enabled on ranges or not, this test need some additional synchrony
(waiting for span configs to be applied).

Release note: None
@irfansharif
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 31, 2022

Build failed:

@ajwerner
Copy link
Contributor

This most recent flake looks plausibly legit.

=== RUN   TestProposalOverhead/user-key_overhead
=== CONT  TestProposalOverhead/user-key_overhead
    client_replica_test.go:3603:
          Error Trace:  client_replica_test.go:3603
          Error:        Not equal:
                        expected: 0x2a
                        actual  : 0x45
          Test:         TestProposalOverhead/user-key_overhead

https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests/4256737?showRootCauses=false&expandBuildChangesSection=true&expandBuildProblemsSection=true&expandBuildTestsSection=true

I imagine this could happen if the logical op log was still enabled.

bors r-

@irfansharif
Copy link
Contributor Author

Ugh, thanks for flagging. I keep catching new flakes every CI run.

..TestReplicaRangefeedRetryErrors.

- TestProposalOverhead relies on configs applying to the scratch range
  now that we're using configs to determine whether rangefeeds are
  enabled or not. We just opt into the right testing knob.
- TestReplicaRangefeedRetryErrors creates a split from "underneath" the
  span configs infra, which previously ended up applying the default
  config (rangefeed-enabled=true) for the freshly created replicas. This
  is not what the test wants. Short of rewriting the test, we introduce
  an interceptor to manually override the replica's span config.

Release note: None
@irfansharif irfansharif force-pushed the 220105.rangefeed-span-cfgs branch from d1c6356 to 43a96b3 Compare January 31, 2022 15:59
@irfansharif
Copy link
Contributor Author

This most recent flake looks plausibly legit.

Just more tests that needed updating (in appended commits). Will merge on green.

@irfansharif
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 31, 2022

Build succeeded:

@craig craig bot merged commit 6a0bdec into cockroachdb:master Jan 31, 2022
@irfansharif irfansharif deleted the 220105.rangefeed-span-cfgs branch January 31, 2022 19:56
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Feb 12, 2022
\cockroachdb#74555 starts using the span configs infrastructure to control whether
rangefeeds are enabled over a given range. Before dynamic system table
IDs (cockroachdb#76003), we used the range's key boundaries to determine whether
the range in question was for a system table ID. In mixed-version
clusters, it's possible to have both forms of this check. To ensure
things work in this form (something we suspected in cockroachdb#76331), we add a
test.

NB: The reason things still work is because in cockroachdb#74555 we modified the
system config span to hard code the relevant config fields for constant
system table IDs -- behaving identically to previous version nodes.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Mar 29, 2022
\cockroachdb#74555 starts using the span configs infrastructure to control whether
rangefeeds are enabled over a given range. Before dynamic system table
IDs (cockroachdb#76003), we used the range's key boundaries to determine whether
the range in question was for a system table ID. In mixed-version
clusters, it's possible to have both forms of this check. To ensure
things work in this form (something we suspected in cockroachdb#76331), we add a
test.

NB: The reason things still work is because in cockroachdb#74555 we modified the
system config span to hard code the relevant config fields for constant
system table IDs -- behaving identically to previous version nodes.

Release note: None
craig bot pushed a commit that referenced this pull request Mar 30, 2022
76466: spanconfig: verify migration for rangefeed enablement r=irfansharif a=irfansharif

\#74555 starts using the span configs infrastructure to control whether
rangefeeds are enabled over a given range. Before dynamic system table
IDs (#76003), we used the range's key boundaries to determine whether
the range in question was for a system table ID. In mixed-version
clusters, it's possible to have both forms of this check. To ensure
things work in this form (something we suspected in #76331), we add a
test.

NB: The reason things still work is because in #74555 we modified the
system config span to hard code the relevant config fields for constant
system table IDs -- behaving identically to previous version nodes.

Release note: None


Co-authored-by: irfan sharif <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Mar 30, 2022
\#74555 starts using the span configs infrastructure to control whether
rangefeeds are enabled over a given range. Before dynamic system table
IDs (#76003), we used the range's key boundaries to determine whether
the range in question was for a system table ID. In mixed-version
clusters, it's possible to have both forms of this check. To ensure
things work in this form (something we suspected in #76331), we add a
test.

NB: The reason things still work is because in #74555 we modified the
system config span to hard code the relevant config fields for constant
system table IDs -- behaving identically to previous version nodes.

Release note: None
fqazi pushed a commit to fqazi/cockroach that referenced this pull request Apr 4, 2022
\cockroachdb#74555 starts using the span configs infrastructure to control whether
rangefeeds are enabled over a given range. Before dynamic system table
IDs (cockroachdb#76003), we used the range's key boundaries to determine whether
the range in question was for a system table ID. In mixed-version
clusters, it's possible to have both forms of this check. To ensure
things work in this form (something we suspected in cockroachdb#76331), we add a
test.

NB: The reason things still work is because in cockroachdb#74555 we modified the
system config span to hard code the relevant config fields for constant
system table IDs -- behaving identically to previous version nodes.

Release note: None
@yuzefovich
Copy link
Member

Tracked down one of the regressions in microbenchmark to the first commit here:

Switching to 4b8258c4b4a8f89fe8a4ea87fc8ee7cd5c916b6d^
+ make bench PKG=./pkg/bench BENCHTIMEOUT=24h BENCHES=BenchmarkSQL/MultinodeCockroach/InsertSecondaryIndex/count=1000 'TESTFLAGS=-count 10 -benchmem'
Switching to 4b8258c4b4a8f89fe8a4ea87fc8ee7cd5c916b6d
+ make bench PKG=./pkg/bench BENCHTIMEOUT=24h BENCHES=BenchmarkSQL/MultinodeCockroach/InsertSecondaryIndex/count=1000 'TESTFLAGS=-count 10 -benchmem'
name                                                       old time/op    new time/op    delta
SQL/MultinodeCockroach/InsertSecondaryIndex/count=1000-24    59.9ms ±12%    64.1ms ±15%     ~     (p=0.123 n=10+10)

name                                                       old alloc/op   new alloc/op   delta
SQL/MultinodeCockroach/InsertSecondaryIndex/count=1000-24     198MB ±31%     249MB ±30%  +25.77%  (p=0.011 n=10+10)

name                                                       old allocs/op  new allocs/op  delta
SQL/MultinodeCockroach/InsertSecondaryIndex/count=1000-24      394k ± 3%      562k ± 3%  +42.84%  (p=0.000 n=10+10)

Not sure if we can do anything about it.

@irfansharif
Copy link
Contributor Author

I looked into this a bit, no smoking gun other than it being clearly due to more rangefeed processing. Filed #79602 to track investigation. (+cc @ajwerner just in case the additional mem usage looks patently obvious to you.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zoneconfig,sql/catalog: ensure dynamic system tables support rangefeeds
5 participants