-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: deprecate gossip-based physical planning #98527
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, and @rharding6373)
-- commits
line 4 at r1:
nit: s/differnt/different/g
.
pkg/sql/distsql_physical_planner.go
line 1289 at r1 (raw file):
} // partitionSpansTenant assigns SQL instances in a tenant to spans. It performs
nit: this comment needs a minor adjustment.
pkg/sql/distsql_physical_planner.go
line 1371 at r1 (raw file):
// this server is in mixed mode, we can safely assume every other server // is as well, and thus has IDs matching node IDs. resolver = func(nodeID roachpb.NodeID) base.SQLInstanceID {
nit: does this result in an allocation? If so, perhaps it'd be worth to make this function non-anonymous?
b0743fd
to
6ed3014
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so might not be quite as quick and easy as it initially looked to get this green/merged: Two main issues making this red right now:
- there's lots of tests where the test setup / mocking / dependency plumbing / etc needs to be updated to just the test to even execute without nil-pointering
- there are some legitimate behavior differences, namely around the distSQL version, that mean some tests are "correctly" failing
On (2), I'd forgotten this version filter behavior was still here; somehow I thought we'd moved away from this already in favor of more granular per-processor version difference handling. We discussed this on slack though and we do have incompatible changes in some processors between 22.2 and 23.1 that are relying on this version and the filtering for correctness. That said, this change here only uses the new no-version-filtering tenant implementation of planing for system tenants, who would not have previously used it, when the cluster version is 23.1, meaning that mixed clusters where 22.2 nodes are still participating in flows are not a concern. So we do not need to version-filtering to the tenant-implementation of planning at all, as long as that planning is only used when V23.1 is active.
So I think that means the tests that specifically expect to see version filtering can just be configured to use a <23.1 cluster version, so they won't try to use tenant planning, with its lack of version filtering.
So I think that just leaves all the stuff in (1) -- all the tests that currently setup gossip or whatever but not the instance reader or nodeid -- and need to be updated. I can start working my way through those, but just wanted to check that this sounds right before I sink a bunch of time into it?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @rharding6373, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/differnt/different/g
.
Done.
pkg/sql/distsql_physical_planner.go
line 1289 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this comment needs a minor adjustment.
Done.
pkg/sql/distsql_physical_planner.go
line 1371 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: does this result in an allocation? If so, perhaps it'd be worth to make this function non-anonymous?
not sure but I lifted anyway since that also lets me move the comment to live on it and then the logic in this function fits on one screen which seems nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the test setup will need updating with the distsql changes here.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @yuzefovich)
I'm going to work on the tests this week, but I'm feeling like I might want an escape hatch here in case we discover any regressions we can't fix quickly in instance-based planning; I assume we'd want to fix them, since tenants use instance-based planning so it does need to work, but it might be nice to have a feature flag/session var/cluster setting/etc we can toggle quickly as well. Is there an example of how to plumb a session var down here? or should I just go cluster setting? |
If |
Nope, that was exactly the answer I was looking for -- I just didn't know that the extended eval ctx is what I was looking for (it has been awhile since I spent any time in SQL code, sorry!). But I guess other question: does session var sound like the right escape hatch? or cluster setting? |
I think nowadays a session var is more preferable than a cluster setting (we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
I elected to comment out the 23.1 check part of this, so this is now just a code re-org, but not a behavior change in this diff, i.e. all system tenants keep using gossip for now, regardless of version. I still want to add the version check and/or a session var so that even system tenants will start using the same (instance backed) planning as everyone else, but I think I'd rather wait and do that later -- at the moment I have a BACKUP diff I want to get out that is on top of this change, and then I'll revisit the real behavior change that was here -- and fixing all these tests -- later. So I think this is a boring code-moving-only PR now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let yahor do the final review
Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
My current thinking: stop here with this diff, that just moves the code around so we're ready to flip the switch to turn off gossip-backed planning for system tenants and have them use the same planning as everyone else, and then a follow-up diff can flip that switch with one line (and maybe an escape-hatch var), while mostly focusing on the testing changes it implies. |
@yuzefovich does your previous LGTM still stand here with the scaled-back ambitions reflected in the code now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt)
pkg/sql/distsql_physical_planner_test.go
line 1029 at r4 (raw file):
dsp := DistSQLPlanner{ planVersion: tc.planVersion, st: cluster.MakeTestingClusterSettingsWithVersions(
nit: is the change to versions here still necessary?
Previously system tenants and secondary tenants used different physical planning implementations, with the system tenant and only the system tenant using nodeIDs while other tenants used the instance table. This unifies those implementations such that all tenants use NodeIDs if running in mixed mode and use the instance table if not. Release note: none. Epic: CRDB-16910
TFTR! bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @yuzefovich)
pkg/sql/distsql_physical_planner_test.go
line 1029 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: is the change to versions here still necessary?
nope, removed all changes to this file
Build succeeded: |
Previously system tenants and secondary tenants used differnt physical planning implementations, with the system tenant and only the system tenant using nodeIDs while other tenants used the instance table. This unifies those implementations such that all tenants use NodeIDs if running in mixed mode and use the instance table if not.
Release note: none.
Epic: CRDB-16910