Tracking: parity with Postgres for TPC-H cardinality estimations #127
wangpatrick57 changed the title from "Tracking: parity with Postgres for cardinality estimation" to "Tracking: parity with Postgres for TPC-H cardinality estimations" on Mar 24, 2024
wangpatrick57 added a commit that referenced this issue on Mar 30, 2024
**Summary**: Using magic numbers from Postgres in various selectivity edge cases.

**Demo**: Different (unfortunately worse) q-error on TPC-H SF1. See #127 for per-query details on how this PR affects q-error.

![Screenshot 2024-03-30 at 11 27 24](https://github.com/cmu-db/optd/assets/20631215/b0cce5d4-6ac8-4cd5-b0cf-48f86db14d26)

**Details**:
* Fixed the cardinality of Q10!
* `INVALID_SEL` is **no longer used** at all during cardtest. It is still used during plannertest, as some plannertests use the optd optimizer instead of the datafusion logical optimizer. This can be checked by replacing all instances of `INVALID_SEL` with a `panic!()` and seeing that cardtest still runs.
* Using magic number from Postgres for `LIKE`.
* Using magic number from Postgres for equality with various complex expressions.
* Using magic number from Postgres for range comparison with various complex expressions.
* Replaced `INVALID_SEL` with `panic!()` and `unreachable!()` statements in places where it makes sense.
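The fallback behaviour described in the details above can be sketched roughly as follows. The three constants are the default selectivities defined in PostgreSQL's `selfuncs.h`. Whether the commit uses exactly these values is an assumption. The `ComplexPredicate` enum and `fallback_selectivity` function are hypothetical names for illustration, not optd's API.

```rust
// Default selectivities from PostgreSQL's selfuncs.h.
// Assumption: these are the "magic numbers" the commit refers to.
const DEFAULT_EQ_SEL: f64 = 0.005;
const DEFAULT_INEQ_SEL: f64 = 0.3333333333333333;
const DEFAULT_MATCH_SEL: f64 = 0.005;

/// Hypothetical classification of predicates for which no usable
/// column statistics exist (e.g. comparisons on complex expressions).
#[derive(Debug, Clone, Copy)]
enum ComplexPredicate {
    Equality,
    RangeComparison,
    Like,
}

/// Returns a fixed fallback selectivity instead of a sentinel such as
/// `INVALID_SEL` when the predicate cannot be estimated from statistics.
fn fallback_selectivity(pred: ComplexPredicate) -> f64 {
    match pred {
        ComplexPredicate::Equality => DEFAULT_EQ_SEL,
        ComplexPredicate::RangeComparison => DEFAULT_INEQ_SEL,
        ComplexPredicate::Like => DEFAULT_MATCH_SEL,
    }
}

/// Estimated output rows of a filter over `input_rows` rows.
fn estimate_filter_rows(input_rows: f64, pred: ComplexPredicate) -> f64 {
    input_rows * fallback_selectivity(pred)
}
```

Under these assumptions, `estimate_filter_rows(200_000.0, ComplexPredicate::Like)` estimates that 0.5% of the input rows survive the `LIKE` filter.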
wangpatrick57 added a commit that referenced this issue on Mar 31, 2024
**Summary**: Implemented join selectivity formulas for inner joins, left/right outer joins, and cross joins. Also properly accounts for filters in the join condition.

**Demo**: We now match Postgres on our median Q-error. See #127 for more details on what queries this PR affected.

![Screenshot 2024-03-31 at 13 13 48](https://github.com/cmu-db/optd/assets/20631215/fae590a6-8c55-4016-b924-c697a1c25070)

**Details**:
* We only consider equality checks on columns of different tables to be "join on conditions".
* Join selectivity formulas are from [Rogov 2022](https://postgrespro.com/blog/pgsql/5969618).
* If there are multiple on conditions, we multiply their selectivities together.
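A minimal sketch of how such join estimates might be computed, assuming the common `1 / max(ndistinct)` estimate for a single equality on-condition (no MCV lists, no NULL handling) and outer-join cardinalities clamped to the size of the preserved side. The names `JoinType`, `eq_on_selectivity`, and `join_cardinality` are illustrative and not taken from the commit. The exact formulas in the commit may differ.

```rust
/// Join kinds covered by the summary above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    LeftOuter,
    RightOuter,
    Cross,
}

/// Selectivity of one equality on-condition `l.a = r.b`.
/// Assumption: 1 / max(ndistinct(l.a), ndistinct(r.b)).
fn eq_on_selectivity(ndistinct_left: f64, ndistinct_right: f64) -> f64 {
    // Clamp to 1.0 so an empty column never causes division by zero.
    1.0 / ndistinct_left.max(ndistinct_right).max(1.0)
}

/// Estimated join output rows.
/// `on_sels` holds one selectivity per on-condition; they are multiplied
/// together. `filter_sel` is the combined selectivity of the remaining
/// (non-equijoin) filters in the join condition.
fn join_cardinality(
    join_type: JoinType,
    left_rows: f64,
    right_rows: f64,
    on_sels: &[f64],
    filter_sel: f64,
) -> f64 {
    // The product of an empty slice is 1.0, i.e. no on-conditions.
    let on_sel: f64 = on_sels.iter().product();
    let inner = left_rows * right_rows * on_sel * filter_sel;
    match join_type {
        JoinType::Inner => inner,
        // Every row of the preserved side appears at least once.
        JoinType::LeftOuter => inner.max(left_rows),
        JoinType::RightOuter => inner.max(right_rows),
        // A cross join has no condition at all.
        JoinType::Cross => left_rows * right_rows,
    }
}
```

Multiplying the per-condition selectivities treats the on-conditions as independent, which is a simplification when the joined columns are correlated.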
Notes
Queries
`EXTRACT(year FROM orders.o_orderdate)`
and it simply uses the N-Distinct of `orders.o_orderdate` as the cardinality of the query.
`p_name like '%forest'`
and just use `o_orderdate as o_year`, you get exactly 60150 rows.