Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix performance regression in user authn #58671

Merged
merged 1 commit into from
Jan 11, 2021

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Jan 8, 2021

The authn code needs to query system.users and system.role_options.
These queries are run by the internal executor, which has a current DB
of "". This causes the name to be resolved as "".system.users and
"".system.role_options. The lookup for the "" DB always fails, but that
result is not cached, so the lookup occurs on every authn attempt. There
is fallback logic that then looks up the correct name.

Now we specify the fully-qualified 3-part name for these two queries.

This is a low-risk change that can be backported. A more robust fix that
will prevent this class of mistaken lookups will follow later, but probably
can't be backported.

Release note (bug fix): The user authentication flow no longer performs
extraneous name lookups. This performance regression was present since
v20.2.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss rafiss requested review from knz and a team January 8, 2021 20:55
@rafiss
Copy link
Collaborator Author

rafiss commented Jan 8, 2021

I applied this patch to 20.2.3 and tested on a multi-region cluster and confirmed better login latencies. Not sure how/if to test in unit tests.

@thoszhang
Copy link
Contributor

Can we add something to the bench/ddl_analysis tests to test the number of round-trips?

@ajwerner
Copy link
Contributor

ajwerner commented Jan 8, 2021

I applied this patch to 20.2.3 and tested on a multi-region cluster and confirmed better login latencies. Not sure how/if to test in unit tests.

We have a thing for this. Add a benchmark for the "ddl_analyis". I'll unskip the verification part of that too. It's a bad name. It has nothing to do with DDL.

Grr. got sniped by jordan's stream and didn't submit this until lucy got there.

@rafiss
Copy link
Collaborator Author

rafiss commented Jan 8, 2021

Awesome! will work on those tests then.

See #58674 for follow-up work

@rafiss
Copy link
Collaborator Author

rafiss commented Jan 11, 2021

i wrote a new benchmark under bench/ddl_analysis but for some reason it always reports 0 roundtrips.

I've added logging and confirmed it does descriptor lookups in LookupObjectID and GetDescriptorByID, and I've used the debugger to see that it executes TxnCoordSender.Send(). (The benchmark test looks for txn coordinator send spans to count the number of round trips.) It looks like the counting logic only includes txn coordinator sends that are under the topmost flow, so it does not count some of the roundtrips that happen:

// Find the topmost "flow" span to start traversing from.
for _, sp := range r {
if sp.ParentSpanID == root.SpanID && sp.Operation == "flow" {
return countKvBatchRequestsInSpan(r, sp)
}

The test cases look like this

		{
			name: "select system.users without schema name",
			stmt: `SELECT username, "hashedPassword" FROM system.users WHERE username = 'root'`,
		},
		{
			name:  "select system.users with empty database name",
			setup: `SET sql_safe_updates = false; USE "";`,
			stmt:  `SELECT username, "hashedPassword"  FROM system.users WHERE username = 'root'`,
		},
		{
			name: "select system.users with schema name",
			stmt: `SELECT username, "hashedPassword" FROM system.public.users WHERE username = 'root'`,
		},

@knz
Copy link
Contributor

knz commented Jan 11, 2021

You may need help from @ajwerner for the ddl_analysis test.

Then, separately, can you please file an issue to also add a .public schema prefix into all the remaining queries to system tables in the remainder of the source code? A quick grep check revealed we have plenty of them and some of them are even perfomance sensitive too. Thanks.

@rafiss
Copy link
Collaborator Author

rafiss commented Jan 11, 2021

@knz I'm getting help from @RichardJCai who wrote the benchmarks and just re-joined us today.

I can file such an issue, though my initial thought was to not do that since it would be rather tedious to fix, and also not future-proof, as someone could quite easily add another system query that doesn't specify the schema. That's why I created this change instead: #58674

@knz
Copy link
Contributor

knz commented Jan 11, 2021

oh cool that works too. carry on then

@rafiss rafiss force-pushed the fix-user-authn-lookup branch from d108784 to aa8925c Compare January 11, 2021 18:58
The authn code needs to query system.users and system.role_options.
These queries are run by the internal executor, which has a current DB
of "". This causes the name to be resolved as "".system.users and
"".system.role_options. The lookup for the "" DB always fails, but that
result is not cached, so the lookup occurs on every authn attempt. There
is fallback logic that then looks up the correct name.

Now we specify the fully-qualified 3-part name for these two queries.

To show that this fix is important, new benchmarks are added to the
bench/ddl_analysis tests.

Release note (bug fix): The user authentication flow no longer performs
extraneous name lookups. This performance regression was present since
v20.2.
@rafiss rafiss force-pushed the fix-user-authn-lookup branch from aa8925c to 4106e33 Compare January 11, 2021 20:20
@rafiss
Copy link
Collaborator Author

rafiss commented Jan 11, 2021

bors r=otan

@craig
Copy link
Contributor

craig bot commented Jan 11, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants