Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support lookup joins on secondary indexes #25431

Closed
petermattis opened this issue May 11, 2018 · 14 comments
Closed

sql: support lookup joins on secondary indexes #25431

petermattis opened this issue May 11, 2018 · 14 comments
Assignees
Labels
A-sql-execution Relating to SQL execution. C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
Milestone

Comments

@petermattis
Copy link
Collaborator

Lookup joins are currently only supported when the right side of the join (the lookup side) is for the primary index. Supporting secondary index lookup joins requires an extra level of lookup (if the secondary index doesn't cover the columns in the query): first level lookup finds the primary key and the second level lookup reads the row. We have at least one user with a query which can benefit from this. The workaround is to perform the join "in the application", which is unsatisfactory.

If adding support for lookup joins on secondary indexes is not too intrusive, we should consider backporting this to 2.0.x.

@petermattis petermattis added C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-sql-execution Relating to SQL execution. labels May 11, 2018
@petermattis petermattis added this to the 2.1 milestone May 11, 2018
@petermattis
Copy link
Collaborator Author

This looks relatively straightforward to do. Currently, joinReader only uses a single additional lookup for the right-hand side. Specifically, joinReader.mainLoop calls joinReader.indexLookup with a list of keys (spans) to read from the right-hand side. joinReader.indexLookup performs a scan (using a RowFetcher) for those keys and then outputs the resulting rows. Allowing lookup joins on secondary indexes would require another RowFetcher that takes the output of the first step and uses it to lookup rows in the primary index.

@jordanlewis
Copy link
Member

There's also a small matter of allowing joinReader to construct secondary index keys in the first place. It's currently hardcoded to construct primary index keys.

Finally, joinReader should understand when the secondary index it's looking up into is covering, and not create a second index lookup step in that case.

@petermattis
Copy link
Collaborator Author

There's also a small matter of allowing joinReader to construct secondary index keys in the first place. It's currently hardcoded to construct primary index keys.

Where does that hardcoding occur? I see where the primaryKeyIndex variable is created, but besides the name I'm not seeing anything that is assuming that is the primary index.

@jordanlewis
Copy link
Member

I think you're right, I was going on memory and the variable name confused me.

@knz
Copy link
Contributor

knz commented May 14, 2018

I looked into this as well. Before going the route of the full indirect lookup, some mileage could be gained by extending verifyLookupJoin to accept secondary indexes that are covering of the requested columns. I think the entire execution logic would work in that case but there's a predicate against that in verifyLookupJoin for a reason that escapes me.

@petermattis
Copy link
Collaborator Author

@knz My understanding is that the full indirect lookup join is necessary for the customer query we were looking at last week. We could ask for the schema to be changed in order to be covering, though that might have negative effects (one of the fields was a string field of unknown size). My point in looking at this and filing this bug was to quantify the work involved in supporting indirect lookup joins. My conclusion is that it is straightforward work on deterministic code that we can knock out quickly.

@solongordon
Copy link
Contributor

The approach @petermattis described makes sense to me, but it seems like we could also handle the second lookup via physical planning by adding an index join after the lookup join. I think that would keep the joinreader logic simpler since we wouldn't have to add the nested RowFetcher, but can anyone think of disadvantages?

@petermattis
Copy link
Collaborator Author

@solongordon That's a good idea. I can't think of any disadvantages, though you might run into practical problems as you explore this approach.

@petermattis
Copy link
Collaborator Author

Also, be aware that you'll have to adjust the distsql physical planning code (regardless of which approach to enhancing joinReader you take).

@solongordon
Copy link
Contributor

The case where the secondary index doesn't cover all output columns may be trickier than we thought. Currently the logical planner plans an index join prior to the inner join, which seems problematic since we want it to come after.

root@:26257/> CREATE DATABASE d;
root@:26257/> CREATE TABLE d.t (x INT, y INT, z INT, PRIMARY KEY (x, y), INDEX y_idx (y));
root@:26257/> EXPLAIN SELECT t2.z FROM d.t t1 JOIN d.t@y_idx t2 ON t1.y = t2.y;
+----------------------+----------+-------------+
|         Tree         |  Field   | Description |
+----------------------+----------+-------------+
| render               |          |             |
|  └── join            |          |             |
|       │              | type     | inner       |
|       │              | equality | (y) = (y)   |
|       ├── scan       |          |             |
|       │              | table    | t@primary   |
|       │              | spans    | ALL         |
|       └── index-join |          |             |
|            ├── scan  |          |             |
|            │         | table    | t@y_idx     |
|            │         | spans    | ALL         |
|            └── scan  |          |             |
|                      | table    | t@primary   |
+----------------------+----------+-------------+

(I'm specifying d.t@y_idx for the right side since otherwise it just selects the primary index.)

I'm going to focus on the covering case for now but interested to hear thoughts on this.

@knz
Copy link
Contributor

knz commented May 15, 2018 via email

@petermattis
Copy link
Collaborator Author

@solongordon I thought the point of your earlier suggestion was to change the planning so that the index-join is performed after the lookup-join. That is, the plan described in #25431 (comment) needs to be converted to one that looks like:

+------------------------+----------+-------------+
|         Tree           |  Field   | Description |
+------------------------+----------+-------------+
| render                 |          |             |
|  └── index-join        |          |             |
|        ├── lookup-join |          |             |
|        │     │         | type     | inner       |
|        │     │         | equality | (y) = (y)   |
|        │     ├── scan  |          |             |
|        │     │         | table    | t@primary   |
|        │     │         | spans    | ALL         |
|        │     └── scan  |          |             |
|        │               | table    | t@y_idx     |
|        │               | spans    | ALL         |
|        └── scan        |          |             |
|                        | table    | t@primary   |
+------------------------+----------+-------------+

I was imagining you would do this during distsql physical planning. @knz might have a better suggestion to do this during expandPlan. But mucking with the planning code might be more difficult than adding smarts to joinReader to be able to handle a lookup join on a secondary index during execution.

@solongordon
Copy link
Contributor

Yes, that plan is what I had in mind though I hadn't realized that an index join would be logically planned below the inner join. It just makes things a bit more complicated since like @knz says we need to identify and "lift" the index join. I think this is the case even if we add more logic to joinReader since we still need to avoid performing the index join beforehand. I'll explore these options though.

@petermattis
Copy link
Collaborator Author

Yes, we need some tweak to planning even if we add more logic to joinReader. I was initially imagining this would happen in DistSQLPlanner.createPlanForJoin. If lookup-joins are enabled, then look for a structure where the right side of the join as an index-join and the left side of the join is a scan. If a lookup join is feasible, we convert this into either a lookup-join where the right side is a secondary index and joinReader is enhanced to support that, or we lift the index-join about the lookup-join. You should consult with @RaduBerinde and @knz about what is easier to do.

solongordon added a commit to solongordon/cockroach that referenced this issue May 17, 2018
joinReader now supports lookup joins on secondary indexes. This was a
trivial change for queries where all the output columns are included in
the secondary index. I just modified the physical planner to specify the
secondary index in the JoinReaderSpec and removed checks which prevented
secondary indexes from being used.

The more complicated situation is when we want to do a lookup join
against a non-covering index. In this case, the logical planner plans an
index join before the inner join, but we want to perform the lookup join
first. We now handle this by only planning the lookup join during
physical planning, not the index join. During execution, the joinReader
detects that there are output columns not covered by the secondary
index, and it performs primary index lookups as necessary to retrieve
the additional columns.

Fixes cockroachdb#25431
solongordon added a commit to solongordon/cockroach that referenced this issue May 18, 2018
joinReader now supports lookup joins on secondary indexes. This was a
trivial change for queries where all the output columns are included in
the secondary index. I just modified the physical planner to specify the
secondary index in the JoinReaderSpec and removed checks which prevented
secondary indexes from being used.

The more complicated situation is when we want to do a lookup join
against a non-covering index. In this case, the logical planner plans an
index join before the inner join, but we want to perform the lookup join
first. We now handle this by only planning the lookup join during
physical planning, not the index join. During execution, the joinReader
detects that there are output columns not covered by the secondary
index, and it performs primary index lookups as necessary to retrieve
the additional columns.

Fixes cockroachdb#25431
solongordon added a commit to solongordon/cockroach that referenced this issue May 21, 2018
joinReader now supports lookup joins on secondary indexes. This was a
trivial change for queries where all the output columns are included in
the secondary index. I just modified the physical planner to specify the
secondary index in the JoinReaderSpec and removed checks which prevented
secondary indexes from being used.

The more complicated situation is when we want to do a lookup join
against a non-covering index. In this case, the logical planner plans an
index join before the inner join, but we want to perform the lookup join
first. We now handle this by only planning the lookup join during
physical planning, not the index join. During execution, the joinReader
detects that there are output columns not covered by the secondary
index, and it performs primary index lookups as necessary to retrieve
the additional columns.

Fixes cockroachdb#25431

Release note (sql change): The experimental lookup join feature now
supports secondary indexes.
solongordon added a commit to solongordon/cockroach that referenced this issue May 22, 2018
joinReader now supports lookup joins on secondary indexes. This was a
trivial change for queries where all the output columns are included in
the secondary index. I just modified the physical planner to specify the
secondary index in the JoinReaderSpec and removed checks which prevented
secondary indexes from being used.

The more complicated situation is when we want to do a lookup join
against a non-covering index. In this case, the logical planner plans an
index join before the inner join, but we want to perform the lookup join
first. We now handle this by only planning the lookup join during
physical planning, not the index join. During execution, the joinReader
detects that there are output columns not covered by the secondary
index, and it performs primary index lookups as necessary to retrieve
the additional columns.

Fixes cockroachdb#25431

Release note (sql change): The experimental lookup join feature now
supports secondary indexes.
craig bot pushed a commit that referenced this issue May 22, 2018
25005: ui: add top-level LoginContainer: require login before rendering anything r=couchand a=vilterp

Depends on #25057 
Touches #24939

![](https://user-images.githubusercontent.com/7341/39150096-38b4cd28-470f-11e8-9d67-e1832d35a211.gif)

Shown above:
1. go to admin UI; see login screen
2. error message when you type in the wrong password
3. you can't hit an authenticated endpoint because you don't have a valid session (this checking is turned on by #24944, only in secure mode)
4. once you login with the right password, you can see the UI (the temporary "connection lost" banner shouldn't be there)
5. now you (and the UI itself) can hit endpoints, because you have a valid session

Todos:
- [ ] redirect to login page instead of current wrapper component switching thing
- [ ] logout
    - [ ] logout button (make API call; redirect to login & blow away redux store)
    - [ ] log out other tabs (if they get a 403, redirect to login & blow away redux store)
- [ ] styling

Release note: None

25628: distsql: support lookup join on secondary index r=solongordon a=solongordon

joinReader now supports lookup joins on secondary indexes. This was a
trivial change for queries where all the output columns are included in
the secondary index. I just modified the physical planner to specify the
secondary index in the JoinReaderSpec and removed checks which prevented
secondary indexes from being used.

The more complicated situation is when we want to do a lookup join
against a non-covering index. In this case, the logical planner plans an
index join before the inner join, but we want to perform the lookup join
first. We now handle this by only planning the lookup join during
physical planning, not the index join. During execution, the joinReader
detects that there are output columns not covered by the secondary
index, and it performs primary index lookups as necessary to retrieve
the additional columns.

Fixes #25431

Co-authored-by: Pete Vilter <[email protected]>
Co-authored-by: Andrew Couch <[email protected]>
Co-authored-by: Solon Gordon <[email protected]>
@craig craig bot closed this as completed in #25628 May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
Projects
None yet
Development

No branches or pull requests

4 participants