sql: support lookup joins on secondary indexes #25431

petermattis · 2018-05-11T19:51:39Z

Lookup joins are currently only supported when the right side of the join (the lookup side) is for the primary index. Supporting secondary index lookup joins requires an extra level of lookup (if the secondary index doesn't cover the columns in the query): first level lookup finds the primary key and the second level lookup reads the row. We have at least one user with a query which can benefit from this. The workaround is to perform the join "in the application", which is unsatisfactory.

If adding support for lookup joins on secondary indexes is not too intrusive, we should consider backporting this to 2.0.x.

petermattis · 2018-05-12T21:14:22Z

This looks relatively straightforward to do. Currently, joinReader only uses a single additional lookup for the right-hand side. Specifically, joinReader.mainLoop calls joinReader.indexLookup with a list of keys (spans) to read from the right-hand side. joinReader.indexLookup performs a scan (using a RowFetcher) for those keys and then outputs the resulting rows. Allowing lookup joins on secondary indexes would require another RowFetcher that takes the output of the first step and uses it to lookup rows in the primary index.

jordanlewis · 2018-05-13T16:25:09Z

There's also a small matter of allowing joinReader to construct secondary index keys in the first place. It's currently hardcoded to construct primary index keys.

Finally, joinReader should understand when the secondary index it's looking up into is covering, and not create a second index lookup step in that case.

petermattis · 2018-05-13T16:27:52Z

There's also a small matter of allowing joinReader to construct secondary index keys in the first place. It's currently hardcoded to construct primary index keys.

Where does that hardcoding occur? I see where the primaryKeyIndex variable is created, but besides the name I'm not seeing anything that is assuming that is the primary index.

jordanlewis · 2018-05-13T16:34:11Z

I think you're right, I was going on memory and the variable name confused me.

knz · 2018-05-14T10:37:36Z

I looked into this as well. Before going the route of the full indirect lookup, some mileage could be gained by extending verifyLookupJoin to accept secondary indexes that are covering of the requested columns. I think the entire execution logic would work in that case but there's a predicate against that in verifyLookupJoin for a reason that escapes me.

petermattis · 2018-05-14T13:34:47Z

@knz My understanding is that the full indirect lookup join is necessary for the customer query we were looking at last week. We could ask for the schema to be changed in order to be covering, though that might have negative effects (one of the fields was a string field of unknown size). My point in looking at this and filing this bug was to quantify the work involved in supporting indirect lookup joins. My conclusion is that it is straightforward work on deterministic code that we can knock out quickly.

solongordon · 2018-05-14T16:54:55Z

The approach @petermattis described makes sense to me, but it seems like we could also handle the second lookup via physical planning by adding an index join after the lookup join. I think that would keep the joinreader logic simpler since we wouldn't have to add the nested RowFetcher, but can anyone think of disadvantages?

petermattis · 2018-05-14T16:58:23Z

@solongordon That's a good idea. I can't think of any disadvantages, though you might run into practical problems as you explore this approach.

petermattis · 2018-05-14T16:59:00Z

Also, be aware that you'll have to adjust the distsql physical planning code (regardless of which approach to enhancing joinReader you take).

solongordon · 2018-05-15T03:20:57Z

The case where the secondary index doesn't cover all output columns may be trickier than we thought. Currently the logical planner plans an index join prior to the inner join, which seems problematic since we want it to come after.

root@:26257/> CREATE DATABASE d;
root@:26257/> CREATE TABLE d.t (x INT, y INT, z INT, PRIMARY KEY (x, y), INDEX y_idx (y));
root@:26257/> EXPLAIN SELECT t2.z FROM d.t t1 JOIN d.t@y_idx t2 ON t1.y = t2.y;
+----------------------+----------+-------------+
|         Tree         |  Field   | Description |
+----------------------+----------+-------------+
| render               |          |             |
|  └── join            |          |             |
|       │              | type     | inner       |
|       │              | equality | (y) = (y)   |
|       ├── scan       |          |             |
|       │              | table    | t@primary   |
|       │              | spans    | ALL         |
|       └── index-join |          |             |
|            ├── scan  |          |             |
|            │         | table    | t@y_idx     |
|            │         | spans    | ALL         |
|            └── scan  |          |             |
|                      | table    | t@primary   |
+----------------------+----------+-------------+

(I'm specifying d.t@y_idx for the right side since otherwise it just selects the primary index.)

I'm going to focus on the covering case for now but interested to hear thoughts on this.

knz · 2018-05-15T11:47:59Z

Solon perhaps we can make a heuristic that lifts the index join above the join in specific cases, in expandPlan(). Ping me or Radu if you want suggestions. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

petermattis · 2018-05-15T12:29:43Z

@solongordon I thought the point of your earlier suggestion was to change the planning so that the index-join is performed after the lookup-join. That is, the plan described in #25431 (comment) needs to be converted to one that looks like:

+------------------------+----------+-------------+
|         Tree           |  Field   | Description |
+------------------------+----------+-------------+
| render                 |          |             |
|  └── index-join        |          |             |
|        ├── lookup-join |          |             |
|        │     │         | type     | inner       |
|        │     │         | equality | (y) = (y)   |
|        │     ├── scan  |          |             |
|        │     │         | table    | t@primary   |
|        │     │         | spans    | ALL         |
|        │     └── scan  |          |             |
|        │               | table    | t@y_idx     |
|        │               | spans    | ALL         |
|        └── scan        |          |             |
|                        | table    | t@primary   |
+------------------------+----------+-------------+

I was imagining you would do this during distsql physical planning. @knz might have a better suggestion to do this during expandPlan. But mucking with the planning code might be more difficult than adding smarts to joinReader to be able to handle a lookup join on a secondary index during execution.

solongordon · 2018-05-15T13:03:38Z

Yes, that plan is what I had in mind though I hadn't realized that an index join would be logically planned below the inner join. It just makes things a bit more complicated since like @knz says we need to identify and "lift" the index join. I think this is the case even if we add more logic to joinReader since we still need to avoid performing the index join beforehand. I'll explore these options though.

petermattis · 2018-05-15T13:30:26Z

Yes, we need some tweak to planning even if we add more logic to joinReader. I was initially imagining this would happen in DistSQLPlanner.createPlanForJoin. If lookup-joins are enabled, then look for a structure where the right side of the join as an index-join and the left side of the join is a scan. If a lookup join is feasible, we convert this into either a lookup-join where the right side is a secondary index and joinReader is enhanced to support that, or we lift the index-join about the lookup-join. You should consult with @RaduBerinde and @knz about what is easier to do.

joinReader now supports lookup joins on secondary indexes. This was a trivial change for queries where all the output columns are included in the secondary index. I just modified the physical planner to specify the secondary index in the JoinReaderSpec and removed checks which prevented secondary indexes from being used. The more complicated situation is when we want to do a lookup join against a non-covering index. In this case, the logical planner plans an index join before the inner join, but we want to perform the lookup join first. We now handle this by only planning the lookup join during physical planning, not the index join. During execution, the joinReader detects that there are output columns not covered by the secondary index, and it performs primary index lookups as necessary to retrieve the additional columns. Fixes cockroachdb#25431

joinReader now supports lookup joins on secondary indexes. This was a trivial change for queries where all the output columns are included in the secondary index. I just modified the physical planner to specify the secondary index in the JoinReaderSpec and removed checks which prevented secondary indexes from being used. The more complicated situation is when we want to do a lookup join against a non-covering index. In this case, the logical planner plans an index join before the inner join, but we want to perform the lookup join first. We now handle this by only planning the lookup join during physical planning, not the index join. During execution, the joinReader detects that there are output columns not covered by the secondary index, and it performs primary index lookups as necessary to retrieve the additional columns. Fixes cockroachdb#25431 Release note (sql change): The experimental lookup join feature now supports secondary indexes.

25005: ui: add top-level LoginContainer: require login before rendering anything r=couchand a=vilterp Depends on #25057 Touches #24939 ![](https://user-images.githubusercontent.com/7341/39150096-38b4cd28-470f-11e8-9d67-e1832d35a211.gif) Shown above: 1. go to admin UI; see login screen 2. error message when you type in the wrong password 3. you can't hit an authenticated endpoint because you don't have a valid session (this checking is turned on by #24944, only in secure mode) 4. once you login with the right password, you can see the UI (the temporary "connection lost" banner shouldn't be there) 5. now you (and the UI itself) can hit endpoints, because you have a valid session Todos: - [ ] redirect to login page instead of current wrapper component switching thing - [ ] logout - [ ] logout button (make API call; redirect to login & blow away redux store) - [ ] log out other tabs (if they get a 403, redirect to login & blow away redux store) - [ ] styling Release note: None 25628: distsql: support lookup join on secondary index r=solongordon a=solongordon joinReader now supports lookup joins on secondary indexes. This was a trivial change for queries where all the output columns are included in the secondary index. I just modified the physical planner to specify the secondary index in the JoinReaderSpec and removed checks which prevented secondary indexes from being used. The more complicated situation is when we want to do a lookup join against a non-covering index. In this case, the logical planner plans an index join before the inner join, but we want to perform the lookup join first. We now handle this by only planning the lookup join during physical planning, not the index join. During execution, the joinReader detects that there are output columns not covered by the secondary index, and it performs primary index lookups as necessary to retrieve the additional columns. Fixes #25431 Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: Andrew Couch <[email protected]> Co-authored-by: Solon Gordon <[email protected]>

petermattis added C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-sql-execution Relating to SQL execution. labels May 11, 2018

petermattis added this to the 2.1 milestone May 11, 2018

petermattis assigned jordanlewis May 11, 2018

solongordon assigned solongordon and unassigned jordanlewis May 14, 2018

lnhsingh mentioned this issue May 16, 2018

Lookup Joins cockroachdb/docs#3153

Closed

solongordon mentioned this issue May 17, 2018

distsql: support lookup join on secondary index #25628

Merged

craig bot closed this as completed in #25628 May 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: support lookup joins on secondary indexes #25431

sql: support lookup joins on secondary indexes #25431

petermattis commented May 11, 2018

petermattis commented May 12, 2018

jordanlewis commented May 13, 2018

petermattis commented May 13, 2018

jordanlewis commented May 13, 2018

knz commented May 14, 2018

petermattis commented May 14, 2018

solongordon commented May 14, 2018

petermattis commented May 14, 2018

petermattis commented May 14, 2018

solongordon commented May 15, 2018

knz commented May 15, 2018 via email

petermattis commented May 15, 2018

solongordon commented May 15, 2018

petermattis commented May 15, 2018

sql: support lookup joins on secondary indexes #25431

sql: support lookup joins on secondary indexes #25431

Comments

petermattis commented May 11, 2018

petermattis commented May 12, 2018

jordanlewis commented May 13, 2018

petermattis commented May 13, 2018

jordanlewis commented May 13, 2018

knz commented May 14, 2018

petermattis commented May 14, 2018

solongordon commented May 14, 2018

petermattis commented May 14, 2018

petermattis commented May 14, 2018

solongordon commented May 15, 2018

knz commented May 15, 2018 via email

petermattis commented May 15, 2018

solongordon commented May 15, 2018

petermattis commented May 15, 2018