Commit
35561: opt: make lookup join more expensive r=justinj a=justinj

This change adds a constant factor to the cost of fetching a row for a lookup join scan. This constant was picked somewhat arbitrarily and can be made more principled going forward.

This change was motivated by this query:

```
SELECT count(*) FROM lineitem JOIN supplier ON l_suppkey = s_suppkey
```

This query performs an approximately 1:600 join: supplier has around 10,000 rows and lineitem has around 6,000,000. (There's an FK relationship, so technically we could replace this join with a scan.) Previously, a lookup join was chosen. After this change, a merge join is chosen, which is roughly 2x as fast.

Run locally, the times for each join strategy are as follows. Recall that `INNER HASH JOIN` and `INNER LOOKUP JOIN` are not commutative, so both input orders are listed:

```
SELECT count(*) FROM lineitem INNER HASH JOIN supplier ON l_suppkey = s_suppkey
=> 4.67s
=> cost: 6,446,371

SELECT count(*) FROM supplier INNER HASH JOIN lineitem ON l_suppkey = s_suppkey
=> 4.48s
=> cost: 6,476,327

SELECT count(*) FROM lineitem INNER MERGE JOIN supplier ON l_suppkey = s_suppkey
=> 3.68s
=> cost: 6,431,793

SELECT count(*) FROM lineitem INNER LOOKUP JOIN supplier ON l_suppkey = s_suppkey
=> 10.16s
=> old cost: 36,765,282
=> new cost: 42,746,162

SELECT count(*) FROM supplier INNER LOOKUP JOIN lineitem ON l_suppkey = s_suppkey
=> 6.53s
=> old cost: 6,330,224
=> new cost: 12,311,104
```

This was validated experimentally by running the following three parameterized queries via exprgen:

Merge Join
----------

```
(MergeJoin
  (Scan [ (Table "lineitem") (Cols "l_suppkey") (Index "lineitem@l_sk") (HardLimit $lineitem_rows) ] )
  (Scan [ (Table "supplier") (Cols "s_suppkey") (HardLimit $supplier_rows) ] )
  [ ]
  [
    (JoinType "inner-join")
    (LeftEq "+l_suppkey")
    (RightEq "+s_suppkey")
    (LeftOrdering "+l_suppkey")
    (RightOrdering "+s_suppkey")
  ]
)
```

Hash Join
---------

```
(InnerJoin
  (Scan [ (Table "supplier") (Cols "s_suppkey") (Index "supplier@s_nk") (HardLimit $supplier_rows) ] )
  (Scan [ (Table "lineitem") (Cols "l_suppkey") (Index "lineitem@l_sk") (HardLimit $lineitem_rows) ] )
  [ (Eq (Var "l_suppkey") (Var "s_suppkey")) ]
  [ ]
)
```

Lookup Join
-----------

```
(MakeLookupJoin
  (Scan [ (Table "supplier") (Index "supplier@s_nk") (Cols "s_suppkey") (HardLimit $supplier_rows) ] )
  [
    (JoinType "inner-join")
    (Table "lineitem")
    (Index "lineitem@l_sk")
    (KeyCols "s_suppkey")
    (Cols "l_suppkey")
  ]
  [ ]
)
```

Varying the input sizes, the original plot of estimated cost versus actual cost looked like this:

<img width="394" alt="image" src="https://user-images.githubusercontent.com/409075/54059872-b1935c00-41c8-11e9-8799-0482da48ff5f.png">

After making this change, the plot looks like this:

<img width="397" alt="image" src="https://user-images.githubusercontent.com/409075/54059990-12bb2f80-41c9-11e9-8879-8f0586bae673.png">

(Green = Lookup, Purple = Hash, Red = Merge.)

The change is somewhat unprincipled, but it holds up experimentally, at least for this query. It doesn't appear to have any overtly bad effects on the tests that I can see. The one exception is a sort-merge join that bears further investigation: that query, TPC-H Q2, is slightly slower after this change (~2.05s -> ~2.15s in my testing).

Release note (sql change): the cost-based optimizer will now pick lookup joins less frequently.

Co-authored-by: Justin Jaffray <[email protected]>
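The cost figures quoted above can be checked with a little arithmetic. The minimal Go sketch below uses only the estimated costs listed in this commit message. The `plan` type and `cheapest` helper are hypothetical illustrations, not the optimizer's coster. The sketch assumes that hash and merge join costs did not change, since only lookup-join costs were touched. It computes the penalty added to each lookup join. It then shows which plan has the minimum estimated cost before and after the change.

```go
package main

import "fmt"

// plan pairs a join strategy from the timings above with the optimizer's
// estimated cost before and after this change. Hash and merge joins are
// assumed unaffected (only lookup-join costs changed), so their old and new
// costs are the same.
type plan struct {
	name    string
	oldCost int64
	newCost int64
}

// plans holds the estimated costs quoted in this commit message.
var plans = []plan{
	{"lineitem HASH JOIN supplier", 6446371, 6446371},
	{"supplier HASH JOIN lineitem", 6476327, 6476327},
	{"lineitem MERGE JOIN supplier", 6431793, 6431793},
	{"lineitem LOOKUP JOIN supplier", 36765282, 42746162},
	{"supplier LOOKUP JOIN lineitem", 6330224, 12311104},
}

// cheapest returns the plan with the lowest cost according to costOf, the
// way a cost-based optimizer picks the minimum-cost alternative.
func cheapest(ps []plan, costOf func(plan) int64) plan {
	best := ps[0]
	for _, p := range ps[1:] {
		if costOf(p) < costOf(best) {
			best = p
		}
	}
	return best
}

func main() {
	// The penalty added to each lookup join is its new cost minus its old cost.
	for _, p := range plans {
		if delta := p.newCost - p.oldCost; delta != 0 {
			fmt.Printf("%s: +%d\n", p.name, delta)
		}
	}
	before := cheapest(plans, func(p plan) int64 { return p.oldCost })
	after := cheapest(plans, func(p plan) int64 { return p.newCost })
	fmt.Printf("chosen before: %s\n", before.name)
	fmt.Printf("chosen after: %s\n", after.name)
}
```

Running it prints:

```
lineitem LOOKUP JOIN supplier: +5980880
supplier LOOKUP JOIN lineitem: +5980880
chosen before: supplier LOOKUP JOIN lineitem
chosen after: lineitem MERGE JOIN supplier
```

Both lookup orientations receive the same +5,980,880 penalty. This is consistent with a fixed per-row charge applied to a similar estimated number of fetched rows, roughly one per lineitem row in either direction. That reading is an inference, not something the commit states.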
Showing 12 changed files with 123 additions and 77 deletions.