Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: GetRequest not used when querying secondary index without stored columns #72735

Closed
nvanbenschoten opened this issue Nov 14, 2021 · 4 comments · Fixed by #72794
Closed

sql: GetRequest not used when querying secondary index without stored columns #72735

nvanbenschoten opened this issue Nov 14, 2021 · 4 comments · Fixed by #72794
Assignees
Labels
A-sql-execution Relating to SQL execution. C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Nov 14, 2021

In #61583, @jordanlewis landed a major improvement to perform point KV GetRequests over ranged KV ScanRequests in some cases. This came with a number of benefits, which were listed in the PR description.

While looking at TPC-E, I've seen traces that indicate that this optimization has had a very positive impact. However, I've also noticed a few cases where point lookups should be used when querying secondary indexes, but aren't being used. In these cases, we're resorting to Scan requests instead of Get requests.

I believe I've narrowed this down to some logic in span.Builder.CanSplitSpanIntoFamilySpans, which may be too restrictive:

// If we're looking at a secondary index...
if s.index.GetID() != s.table.GetPrimaryIndexID() {
// * The index constraint must not contain null, since that would cause the
// index key to not be completely knowable.
if containsNull {
return false
}
// * The index cannot be inverted.
if s.index.GetType() != descpb.IndexDescriptor_FORWARD {
return false
}
// * The index must store some columns.
if s.index.NumSecondaryStoredColumns() == 0 {
return false
}
// * The index is a new enough version.
if s.index.GetVersion() < descpb.SecondaryIndexFamilyFormatVersion {
return false
}
}

Specifically, I think there's something off with this NumSecondaryStoredColumns check. I don't understand why it's needed, and it seems to break cases where a unique secondary index that does not store any columns is being queried to retrieve the primary key column(s).

Here's a reproduction:

./cockroach demo --empty

CREATE TABLE kv (
	k INT PRIMARY KEY,
	v INT NOT NULL,
	v2 INT NOT NULL,
	UNIQUE INDEX idx1 (v),
	UNIQUE INDEX idx2 (v) STORING (v2)
);

SET tracing = on; SELECT k FROM kv@idx1 WHERE v = 10; SET tracing = off; SHOW kv TRACE FOR session;

            timestamp           |       age       |                message                 |                             tag                             |                location                 |    operation     | span
--------------------------------+-----------------+----------------------------------------+-------------------------------------------------------------+-----------------------------------------+------------------+-------
  2021-11-14 21:52:15.674302+00 | 00:00:00.004366 | Scan /Table/52/2/1{0-1}                | [n=1,client=127.0.0.1:49475,hostssl,user=demo]              | sql/row/kv_batch_fetcher.go:441         | colbatchscan     |    7
  2021-11-14 21:52:15.6774+00   | 00:00:00.007464 | querying next range at /Table/52/2/10  | [n=1,client=127.0.0.1:49475,hostssl,user=demo,txn=ac105482] | kv/kvclient/kvcoord/range_iter.go:175   | dist sender send |    9
  2021-11-14 21:52:15.678542+00 | 00:00:00.008606 | r44: sending batch 1 Scan to (n1,s1):1 | [n=1,client=127.0.0.1:49475,hostssl,user=demo,txn=ac105482] | kv/kvclient/kvcoord/dist_sender.go:1938 | dist sender send |    9
  2021-11-14 21:52:15.683729+00 | 00:00:00.013793 | rows affected: 0                       | [n=1,client=127.0.0.1:49475,hostssl,user=demo]              | sql/conn_executor_exec.go:709           | exec stmt        |    3

SET tracing = on; SELECT k FROM kv@idx2 WHERE v = 10; SET tracing = off; SHOW kv TRACE FOR session;

            timestamp           |       age       |                 message                 |                             tag                             |                location                 |    operation     | span
--------------------------------+-----------------+-----------------------------------------+-------------------------------------------------------------+-----------------------------------------+------------------+-------
  2021-11-14 21:52:16.857978+00 | 00:00:00.004353 | Scan /Table/52/3/10/0                   | [n=1,client=127.0.0.1:49475,hostssl,user=demo]              | sql/row/kv_batch_fetcher.go:441         | colbatchscan     |    7
  2021-11-14 21:52:16.858828+00 | 00:00:00.005203 | querying next range at /Table/52/3/10/0 | [n=1,client=127.0.0.1:49475,hostssl,user=demo,txn=cab3dead] | kv/kvclient/kvcoord/range_iter.go:175   | dist sender send |    9
  2021-11-14 21:52:16.859335+00 | 00:00:00.00571  | r44: sending batch 1 Get to (n1,s1):1   | [n=1,client=127.0.0.1:49475,hostssl,user=demo,txn=cab3dead] | kv/kvclient/kvcoord/dist_sender.go:1938 | dist sender send |    9
  2021-11-14 21:52:16.866307+00 | 00:00:00.012682 | rows affected: 0                        | [n=1,client=127.0.0.1:49475,hostssl,user=demo]              | sql/conn_executor_exec.go:709           | exec stmt        |    3
@nvanbenschoten nvanbenschoten added C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-execution Relating to SQL execution. labels Nov 14, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Nov 14, 2021
@nvanbenschoten
Copy link
Member Author

Removing the if s.index.NumSecondaryStoredColumns() == 0 condition resolves the issue, but I'd like to understand why that was there to begin with before doing anything about this.

@cucaroach cucaroach self-assigned this Nov 15, 2021
@cucaroach
Copy link
Contributor

This was added by b852e92. I don't know why, author and primary reviewer are both gone (Rohan and Solon). @RaduBerinde @postamar and @jordanlewis any ideas why CanSplitSpanIntoFamilySpans would return false when the index doesn't store any columns?

@jordanlewis
Copy link
Member

No, I don't understand why that condition is there either. I can't think of any circumstance where a secondary index without stored columns would behave differently in this case.

@rohany I'm pinging you since I know you are around on our GitHub from time to time - if you happen to see this and have any recollection of why this condition was there that would be great but it's no big deal either way! Hope you are doing well :)

In some cases, we have to be careful to scan all column families because queries that are just looking for "row existence" can't rely on any given column family to be there - any column family might be. There's a sentinel key concept for this, as well. I think this only really matters for the primary index, though.

I'll think some more about this, but we could put up a PR with the condition removed and see if tests pass at least.

@rohany
Copy link
Contributor

rohany commented Nov 15, 2021

Hey, nice to hear from you!

I believe the original intention of that code was pretty explicit for the original use case -- we can't split a secondary index into separate families if the index isn't storing any columns. Nathan's use case seems valid, where family 0 can be the target of a GetRequest of a secondary index with no families. I think the code is just being used in a more general situation than originally envisioned.

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 17, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 22, 2021
craig bot pushed a commit that referenced this issue Nov 23, 2021
72794: sql: allow point lookups on secondary indexes with 0 stored columns r=cucaroach a=cucaroach

Remove an unnecessary check that was preventing point lookups in some
TPCE queries.

Fixes #72735

Release note: None


Co-authored-by: Tommy Reilly <[email protected]>
@craig craig bot closed this as completed in 6955df3 Nov 23, 2021
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants