sql: fix performance regression in user authn #58671

rafiss · 2021-01-08T20:10:40Z

The authn code needs to query system.users and system.role_options.
These queries are run by the internal executor, which has a current DB
of "". This causes the name to be resolved as "".system.users and
"".system.role_options. The lookup for the "" DB always fails, but that
result is not cached, so the lookup occurs on every authn attempt. There
is fallback logic that then looks up the correct name.

Now we specify the fully-qualified 3-part name for these two queries.
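
As a rough illustration of why the 3-part name helps, here is a toy model of the resolution order described above. The `catalog` type, its methods, and the one-lookup-per-call accounting are hypothetical and are not the actual resolver code; the sketch only assumes that each uncached lookup costs one round trip.

```go
package main

import "fmt"

// catalog is a hypothetical stand-in for the descriptor store.
// It counts every lookup, since none of them (including misses) is cached.
type catalog struct {
	objects map[string]bool // keys are "db.schema.table"
	lookups int
}

// lookup reports whether the fully qualified name exists and counts one lookup.
func (c *catalog) lookup(db, schema, table string) bool {
	c.lookups++
	return c.objects[db+"."+schema+"."+table]
}

// resolveTwoPart models a name like system.users: first try it as
// schema.table in the current database, then fall back to db.public.table.
func (c *catalog) resolveTwoPart(currentDB, first, second string) bool {
	if c.lookup(currentDB, first, second) {
		return true
	}
	return c.lookup(first, "public", second)
}

// resolveThreePart models a name like system.public.users: one direct lookup.
func (c *catalog) resolveThreePart(db, schema, table string) bool {
	return c.lookup(db, schema, table)
}

func main() {
	objects := map[string]bool{"system.public.users": true}

	// The internal executor's current DB is "", so the first attempt misses.
	twoPart := &catalog{objects: objects}
	twoPart.resolveTwoPart("", "system", "users")
	fmt.Println("system.users lookups:", twoPart.lookups)

	threePart := &catalog{objects: objects}
	threePart.resolveThreePart("system", "public", "users")
	fmt.Println("system.public.users lookups:", threePart.lookups)
}
```

In this model the two-part name costs one wasted lookup on every call, which is the per-login overhead the change removes.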

This is a low-risk change that can be backported. A more robust fix that
will prevent this class of mistaken lookups will follow later, but probably
can't be backported.

Release note (bug fix): The user authentication flow no longer performs
extraneous name lookups. This performance regression was present since
v20.2.

rafiss · 2021-01-08T20:57:57Z

I applied this patch to 20.2.3 and tested on a multi-region cluster and confirmed better login latencies. Not sure how/if to test in unit tests.

thoszhang · 2021-01-08T21:24:10Z

Can we add something to the bench/ddl_analysis tests to test the number of round-trips?

ajwerner · 2021-01-08T21:47:56Z

> I applied this patch to 20.2.3 and tested on a multi-region cluster and confirmed better login latencies. Not sure how/if to test in unit tests.

We have a thing for this. Add a benchmark under "ddl_analysis". I'll unskip the verification part of that too. It's a bad name. It has nothing to do with DDL.

Grr. got sniped by jordan's stream and didn't submit this until lucy got there.

rafiss · 2021-01-08T22:03:33Z

Awesome! will work on those tests then.

See #58674 for follow-up work

rafiss · 2021-01-11T18:01:37Z

I wrote a new benchmark under bench/ddl_analysis, but for some reason it always reports 0 round trips.

I've added logging and confirmed it does descriptor lookups in LookupObjectID and GetDescriptorByID, and I've used the debugger to see that it executes TxnCoordSender.Send(). (The benchmark test looks for txn coordinator send spans to count the number of round trips.) It looks like the counting logic only includes txn coordinator sends that are under the topmost flow, so it does not count some of the round trips that happen:

cockroach/pkg/bench/ddl_analysis/ddl_analysis_bench.go, lines 104 to 108 in f0b5774:

	// Find the topmost "flow" span to start traversing from.
	for _, sp := range r {
		if sp.ParentSpanID == root.SpanID && sp.Operation == "flow" {
			return countKvBatchRequestsInSpan(r, sp)
		}
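
To make the suspected gap concrete, here is a small self-contained sketch. The `span` type, the `sendOp` operation name, and both counting functions are hypothetical simplifications, not the benchmark's real tracing types. It compares counting only under the first "flow" child of the root with counting over the whole trace, assuming name resolution issues its sends outside the flow span.

```go
package main

import "fmt"

// span is a simplified, hypothetical trace span.
type span struct {
	SpanID       uint64
	ParentSpanID uint64
	Operation    string
}

// sendOp is an assumed operation name for a txn coordinator send span.
const sendOp = "txn coordinator send"

// countUnder counts spans with operation op anywhere below the span parentID.
func countUnder(spans []span, parentID uint64, op string) int {
	n := 0
	for _, sp := range spans {
		if sp.ParentSpanID != parentID {
			continue
		}
		if sp.Operation == op {
			n++
		}
		n += countUnder(spans, sp.SpanID, op)
	}
	return n
}

// countUnderTopmostFlow counts only below the first "flow" child of the root,
// which is the shape of the logic quoted above.
func countUnderTopmostFlow(spans []span, rootID uint64, op string) int {
	for _, sp := range spans {
		if sp.ParentSpanID == rootID && sp.Operation == "flow" {
			return countUnder(spans, sp.SpanID, op)
		}
	}
	return 0
}

func main() {
	const rootID = 1
	trace := []span{
		{SpanID: 2, ParentSpanID: rootID, Operation: sendOp}, // failed "" DB lookup
		{SpanID: 3, ParentSpanID: rootID, Operation: sendOp}, // fallback lookup
		{SpanID: 4, ParentSpanID: rootID, Operation: "flow"}, // no sends below it
	}
	fmt.Println("under topmost flow:", countUnderTopmostFlow(trace, rootID, sendOp))
	fmt.Println("whole trace:", countUnder(trace, rootID, sendOp))
}
```

If the real trace has this shape, a flow-only count would report 0 even though two sends happened, which would be consistent with the benchmark result described here.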

The test cases look like this:

		{
			name: "select system.users without schema name",
			stmt: `SELECT username, "hashedPassword" FROM system.users WHERE username = 'root'`,
		},
		{
			name:  "select system.users with empty database name",
			setup: `SET sql_safe_updates = false; USE "";`,
			stmt:  `SELECT username, "hashedPassword" FROM system.users WHERE username = 'root'`,
		},
		{
			name: "select system.users with schema name",
			stmt: `SELECT username, "hashedPassword" FROM system.public.users WHERE username = 'root'`,
		},

knz · 2021-01-11T18:18:04Z

You may need help from @ajwerner for the ddl_analysis test.

Then, separately, can you please file an issue to also add a .public schema prefix to all the remaining queries against system tables in the rest of the source code? A quick grep revealed we have plenty of them, and some of them are performance-sensitive too. Thanks.

rafiss · 2021-01-11T18:22:49Z

@knz I'm getting help from @RichardJCai who wrote the benchmarks and just re-joined us today.

I can file such an issue, though my initial thought was to not do that since it would be rather tedious to fix, and also not future-proof, as someone could quite easily add another system query that doesn't specify the schema. That's why I created this change instead: #58674
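
For illustration only, a guard of the following shape would cover every caller at once instead of relying on each query being fully qualified. The `resolver` type and `lookupDatabase` method are hypothetical; this is an assumption about the general idea of skipping lookups for empty names, not the code in #58674.

```go
package main

import "fmt"

// resolver is a hypothetical stand-in that counts database lookups.
type resolver struct {
	known   map[string]bool
	lookups int
}

// lookupDatabase skips the lookup entirely for the empty name, since a
// database named "" can never exist; other names cost one lookup.
func (r *resolver) lookupDatabase(name string) bool {
	if name == "" {
		return false
	}
	r.lookups++
	return r.known[name]
}

func main() {
	r := &resolver{known: map[string]bool{"system": true}}
	fmt.Println(r.lookupDatabase(""), r.lookups)
	fmt.Println(r.lookupDatabase("system"), r.lookups)
}
```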

knz · 2021-01-11T18:28:53Z

oh cool that works too. carry on then

The authn code needs to query system.users and system.role_options. These queries are run by the internal executor, which has a current DB of "". This causes the name to be resolved as "".system.users and "".system.role_options. The lookup for the "" DB always fails, but that result is not cached, so the lookup occurs on every authn attempt. There is fallback logic that then looks up the correct name.

Now we specify the fully-qualified 3-part name for these two queries.

To show that this fix is important, new benchmarks are added to the bench/ddl_analysis tests.

Release note (bug fix): The user authentication flow no longer performs extraneous name lookups. This performance regression was present since v20.2.

rafiss · 2021-01-11T21:59:36Z

bors r=otan

craig · 2021-01-11T22:59:24Z

Build succeeded:

GitHub CI (Cockroach)

rafiss requested review from knz and a team January 8, 2021 20:55

otan approved these changes Jan 10, 2021

rafiss force-pushed the fix-user-authn-lookup branch from d108784 to aa8925c on January 11, 2021 18:58

rafiss force-pushed the fix-user-authn-lookup branch from aa8925c to 4106e33 on January 11, 2021 20:20

craig bot merged commit fc2be3a into cockroachdb:master Jan 11, 2021

This was referenced Jan 11, 2021

sql: use follower reads for internal authentication-related queries #58497

Closed

release-20.2: sql: fix performance regression in user authn #58739

Merged

rafiss deleted the fix-user-authn-lookup branch January 12, 2021 04:06

rafiss mentioned this pull request Jan 13, 2021

release-20.2: sql: avoid looking up empty-string object names #58900

Merged

This was referenced Jan 25, 2021

sql: Add ddl_analysis/bench test for connection attempts #59393

Closed

roachtest: create a workload that tests latencies of repeated connection attempts #59394

Closed
