Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deprecate gossip-based physical planning #98527

Merged
merged 1 commit into from
Mar 21, 2023
Merged

Conversation

dt
Copy link
Member

@dt dt commented Mar 13, 2023

Previously system tenants and secondary tenants used differnt physical planning implementations, with the system tenant and only the system tenant using nodeIDs while other tenants used the instance table. This unifies those implementations such that all tenants use NodeIDs if running in mixed mode and use the instance table if not.

Release note: none.
Epic: CRDB-16910

@dt dt requested review from knz, yuzefovich and rharding6373 March 13, 2023 20:37
@dt dt requested a review from a team as a code owner March 13, 2023 20:37
@blathers-crl
Copy link

blathers-crl bot commented Mar 13, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, and @rharding6373)


-- commits line 4 at r1:
nit: s/differnt/different/g.


pkg/sql/distsql_physical_planner.go line 1289 at r1 (raw file):

}

// partitionSpansTenant assigns SQL instances in a tenant to spans. It performs

nit: this comment needs a minor adjustment.


pkg/sql/distsql_physical_planner.go line 1371 at r1 (raw file):

		// this server is in mixed mode, we can safely assume every other server
		// is as well, and thus has IDs matching node IDs.
		resolver = func(nodeID roachpb.NodeID) base.SQLInstanceID {

nit: does this result in an allocation? If so, perhaps it'd be worth to make this function non-anonymous?

@dt dt force-pushed the unify branch 3 times, most recently from b0743fd to 6ed3014 Compare March 13, 2023 22:37
Copy link
Member Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so might not be quite as quick and easy as it initially looked to get this green/merged: Two main issues making this red right now:

  1. there's lots of tests where the test setup / mocking / dependency plumbing / etc needs to be updated to just the test to even execute without nil-pointering
  2. there are some legitimate behavior differences, namely around the distSQL version, that mean some tests are "correctly" failing

On (2), I'd forgotten this version filter behavior was still here; somehow I thought we'd moved away from this already in favor of more granular per-processor version difference handling. We discussed this on slack though and we do have incompatible changes in some processors between 22.2 and 23.1 that are relying on this version and the filtering for correctness. That said, this change here only uses the new no-version-filtering tenant implementation of planing for system tenants, who would not have previously used it, when the cluster version is 23.1, meaning that mixed clusters where 22.2 nodes are still participating in flows are not a concern. So we do not need to version-filtering to the tenant-implementation of planning at all, as long as that planning is only used when V23.1 is active.

So I think that means the tests that specifically expect to see version filtering can just be configured to use a <23.1 cluster version, so they won't try to use tenant planning, with its lack of version filtering.

So I think that just leaves all the stuff in (1) -- all the tests that currently setup gossip or whatever but not the instance reader or nodeid -- and need to be updated. I can start working my way through those, but just wanted to check that this sounds right before I sink a bunch of time into it?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @rharding6373, and @yuzefovich)


-- commits line 4 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/differnt/different/g.

Done.


pkg/sql/distsql_physical_planner.go line 1289 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this comment needs a minor adjustment.

Done.


pkg/sql/distsql_physical_planner.go line 1371 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: does this result in an allocation? If so, perhaps it'd be worth to make this function non-anonymous?

not sure but I lifted anyway since that also lets me move the comment to live on it and then the logic in this function fits on one screen which seems nice.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the test setup will need updating with the distsql changes here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @yuzefovich)

@dt
Copy link
Member Author

dt commented Mar 15, 2023

I'm going to work on the tests this week, but I'm feeling like I might want an escape hatch here in case we discover any regressions we can't fix quickly in instance-based planning; I assume we'd want to fix them, since tenants use instance-based planning so it does need to work, but it might be nice to have a feature flag/session var/cluster setting/etc we can toggle quickly as well.

Is there an example of how to plumb a session var down here? or should I just go cluster setting?

@yuzefovich
Copy link
Member

yuzefovich commented Mar 15, 2023

Is there an example of how to plumb a session var down here? or should I just go cluster setting?

If here refers to partitionSpans method, then we have the extended eval context in PlanningCtx which should have the session data available, so it looks like no additional plumbing is needed. Am I misunderstanding your question?

@dt
Copy link
Member Author

dt commented Mar 15, 2023

Am I misunderstanding your question?

Nope, that was exactly the answer I was looking for -- I just didn't know that the extended eval ctx is what I was looking for (it has been awhile since I spent any time in SQL code, sorry!).

But I guess other question: does session var sound like the right escape hatch? or cluster setting?

@yuzefovich
Copy link
Member

I think nowadays a session var is more preferable than a cluster setting (we have ALTER ROLE ALL that modifies a session var for all except root, if curious). But overall either sounds reasonable to me.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

@dt
Copy link
Member Author

dt commented Mar 20, 2023

I elected to comment out the 23.1 check part of this, so this is now just a code re-org, but not a behavior change in this diff, i.e. all system tenants keep using gossip for now, regardless of version.

I still want to add the version check and/or a session var so that even system tenants will start using the same (instance backed) planning as everyone else, but I think I'd rather wait and do that later -- at the moment I have a BACKUP diff I want to get out that is on top of this change, and then I'll revisit the real behavior change that was here -- and fixing all these tests -- later.

So I think this is a boring code-moving-only PR now.

@dt dt changed the title sql: unify physical planning sql: deprecate gossip-based physical planning Mar 21, 2023
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let yahor do the final review

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

@dt
Copy link
Member Author

dt commented Mar 21, 2023

My current thinking: stop here with this diff, that just moves the code around so we're ready to flip the switch to turn off gossip-backed planning for system tenants and have them use the same planning as everyone else, and then a follow-up diff can flip that switch with one line (and maybe an escape-hatch var), while mostly focusing on the testing changes it implies.

@dt
Copy link
Member Author

dt commented Mar 21, 2023

@yuzefovich does your previous LGTM still stand here with the scaled-back ambitions reflected in the code now?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt)


pkg/sql/distsql_physical_planner_test.go line 1029 at r4 (raw file):

			dsp := DistSQLPlanner{
				planVersion: tc.planVersion,
				st: cluster.MakeTestingClusterSettingsWithVersions(

nit: is the change to versions here still necessary?

Previously system tenants and secondary tenants used different physical planning
implementations, with the system tenant and only the system tenant using nodeIDs
while other tenants used the instance table. This unifies those implementations
such that all tenants use NodeIDs if running in mixed mode and use the instance
table if not.

Release note: none.
Epic: CRDB-16910
@dt
Copy link
Member Author

dt commented Mar 21, 2023

TFTR!

bors r+

Copy link
Member Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @yuzefovich)


pkg/sql/distsql_physical_planner_test.go line 1029 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: is the change to versions here still necessary?

nope, removed all changes to this file

@craig craig bot merged commit 77029b1 into cockroachdb:master Mar 21, 2023
@craig
Copy link
Contributor

craig bot commented Mar 21, 2023

Build succeeded:

@dt dt deleted the unify branch March 23, 2023 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants