unsupported TPC-DS queries tracking issue #37464

yuzefovich · 2019-05-11T01:36:17Z

This issue tracks features that are unimplemented but needed for TPC-DS queries (also some of the queries that didn't finish within 5 minutes on scale factor 1 - 1 GB of data - on a 3 node roachprod cluster).

Q1, Q30, Q81 sql: internal error: needed column not produced when a CTE is used multiple times #39708
Q4, Q39, Q78 - hit the memory limit of ~4GB.
Q5, Q14, Q18, Q22, Q67, Q77, Q80 - unknown function: rollup() sql: add support for GROUPING SETS, CUBE, and ROLLUP #46280
Q7, Q13, Q17, Q19, Q25, Q26, Q29, Q45, Q46, Q48, Q50, Q61, Q64, Q66, Q68, Q84, Q85 - didn't complete in 5 minutes
Q64 - didn't complete in 5 minutes
Q17, Q35, Q39 - pq: unknown function: stddev_samp()
Q27, Q36, Q70, Q86 - skipped due to parsing errors (queries need modifications)

The queries were run one at a time, and here are the runtimes of all queries that completed within 5 minutes:

[q1] 98.21 seconds
[q2] 6.55 seconds
[q3] 0.04 seconds
[q4] 48.08 seconds
[q6] 8.29 seconds
[q7] 6.65 seconds
[q8] 0.21 seconds
[q9] 36.30 seconds
[q10] 2.18 seconds
[q11] 24.68 seconds
[q12] 0.89 seconds
[q13] 5.48 seconds
[q15] 1.74 seconds
[q16] 3.18 seconds
[q17] 2.36 seconds
[q19] 0.25 seconds
[q20] 1.61 seconds
[q21] 0.58 seconds
[q23] 44.95 seconds
[q24] 0.02 seconds
[q25] 1.54 seconds
[q26] 8.69 seconds
[q28] 18.92 seconds
[q29] 0.82 seconds
[q30] 2.02 seconds
[q31] 8.29 seconds
[q32] 0.08 seconds
[q33] 3.69 seconds
[q34] 2.79 seconds
[q35] 2.39 seconds
[q37] 0.02 seconds
[q38] 2.13 seconds
[q39] 53.34 seconds
[q40] 0.61 seconds
[q41] 0.04 seconds
[q42] 0.21 seconds
[q43] 3.23 seconds
[q44] 11.73 seconds
[q45] 1.17 seconds
[q46] 5.44 seconds
[q47] 7.76 seconds
[q48] 4.30 seconds
[q49] 0.30 seconds
[q50] 0.33 seconds
[q51] 6.89 seconds
[q52] 0.21 seconds
[q53] 0.72 seconds
[q54] 1.21 seconds
[q55] 0.16 seconds
[q56] 1.82 seconds
[q57] 4.59 seconds
[q58] 3.92 seconds
[q59] 7.24 seconds
[q60] 7.52 seconds
[q61] 0.10 seconds
[q62] 0.90 seconds
[q63] 0.73 seconds
[q65] 4.16 seconds
[q66] 2.66 seconds
[q68] 5.13 seconds
[q69] 2.11 seconds
[q71] 3.30 seconds
[q72] 3.37 seconds
[q73] 2.61 seconds
[q74] 10.34 seconds
[q75] 2.38 seconds
[q76] 3.63 seconds
[q78] 10.29 seconds
[q79] 4.76 seconds
[q81] 7.04 seconds
[q82] 0.45 seconds
[q83] 0.47 seconds
[q84] 0.20 seconds
[q85] 1.35 seconds
[q87] 2.16 seconds
[q88] 10.08 seconds
[q89] 0.89 seconds
[q90] 0.86 seconds
[q91] 0.24 seconds
[q92] 0.07 seconds
[q93] 0.03 seconds
[q94] 2.22 seconds
[q95] 26.16 seconds
[q96] 2.22 seconds
[q97] 2.40 seconds
[q98] 2.24 seconds
[q99] 2.04 seconds

Features that are missing "native" support in the vectorized engine:

Q16, Q94: right semi hash join with ON expression (exec: add support for ON expression for the joiners #38018).

Jira issue: CRDB-4428

The text was updated successfully, but these errors were encountered:

46279: sql, workload, compose: miscellaneous cleanups r=yuzefovich a=yuzefovich **sql: add stddev_samp alias for stddev aggregate builtin** Release justification: low-risk new functionality. Addresses: #37464. Release note (sql change): CockroachDB now supports `stddev_samp` aggregate builtin function which is the same as `stddev` (actually, the latter is the historical alias of the former, according to Postgres documentation). **workload, compose: miscellaneous cleanups** Release justification: non-production code changes. This commit cleans up a few things: 1. we recently renamed `experimental_on` vectorize setting to `on`, but a couple of places were missed. 2. compose-compare test (of randomized land) was skipping some queries due to bugs which have been fixed. 3. `tpcds` workload has been slightly enhanced (added `vectorize` option and refactored the way statement timeout is set). Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>

yuzefovich · 2020-08-26T18:30:05Z

Updating the issue since

we no longer hit memory limit on Q4, Q39, Q78
we now can run Q7, Q13, Q17, Q19, Q25, Q26, Q29, Q45, Q46, Q48, Q50, Q61, Q66, Q68, Q84, Q85 within 5 minutes

yuzefovich · 2021-04-22T06:19:24Z

Recent runs have shown that we have regressed between 9ba1a81 and b32bbb5 on the following queries:

Q1: no longer completes in 5 minutes
~~- Q68: 19x slowdown~~ cannot repro
Q6: 4.5x slowdown
Q37, Q61: about 2x slowdown (still extremely fast)
Q81: 1.6x slowdown
Q28, Q30, Q51, Q63, Q72, Q87, Q97, Q98: 1.2-1.5x slowdown

DrewKimball · 2021-07-06T06:45:06Z

For Q6, disabling the vectorized hash aggregator causes a significant speedup. Nearly all the time for the vectorized hash aggregator is spent in hashBasedPartitioner.partitionBatch, specifically when the tuples are enqueued into the partitioned disk queues. So, the slowdown is because the external hash aggregator has to load all tuples onto disk, whereas the row-wise aggregator maintains as many buckets as are required for the entire input.

DrewKimball · 2021-07-06T06:47:15Z

A possible solution might be to have the external aggregator perform a pass with the in-memory aggregator for each batch before partitioning, and then only partition the tuples that don't fit into the existing buckets.

DrewKimball · 2021-07-06T07:00:20Z

With regard to Q64, changing the join reorder limit to 10 joins gets the runtime down to a manageable ~17 seconds on my machine. Though, changing it to 20 brings it back up to ~1 minute - maybe because of overestimating predicate selectivity?

I'm not sure it's a good idea to increase the default join reorder limit, but it might be worth exploring left-deep trees for some number of joins beyond the limit, maybe proportional to the ratio between the number of bushy and the number of left-deep trees.

yuzefovich · 2023-04-07T20:00:08Z

Updated the numbers using 23.1.0-alpha.7.

Notable (at least 33%) speedups since the last update:

- Q1:	before: 2074.44s after: 98.21s	 -95.27%
- Q4:	before: 75.39s	after: 48.08s	 -36.22%
- Q6:	before: 20.44s	after: 8.29s	 -59.44%
- Q16:	before: 5.15s	after: 3.18s	 -38.25%
- Q24:	before: 0.04s	after: 0.02s	 -50.00%
- Q30:	before: 3.73s	after: 2.02s	 -45.84%
- Q31:	before: 12.47s	after: 8.29s	 -33.52%
- Q37:	before: 0.15s	after: 0.02s	 -86.67%
- Q39:	before: 97.26s	after: 53.34s	 -45.16%
- Q40:	before: 1.12s	after: 0.61s	 -45.54%
- Q41:	before: 0.07s	after: 0.04s	 -42.86%
- Q45:	before: 2.01s	after: 1.17s	 -41.79%
- Q49:	before: 0.46s	after: 0.30s	 -34.78%
- Q60:	before: 11.75s	after: 7.52s	 -36.00%
- Q61:	before: 0.24s	after: 0.10s	 -58.33%
- Q75:	before: 5.76s	after: 2.38s	 -58.68%
- Q81:	before: 15.51s	after: 7.04s	 -54.61%

Notable (at least 33%) slowdowns:

- Q17:	before: 1.53s	after: 2.36s	 54.25%
- Q25:	before: 1.10s	after: 1.54s	 40.00%
- Q28:	before: 7.14s	after: 18.92s	 164.99%
- Q32:	before: 0.06s	after: 0.08s	 33.33%
- Q34:	before: 2.04s	after: 2.79s	 36.76%
- Q44:	before: 6.35s	after: 11.73s	 84.72%
- Q56:	before: 1.18s	after: 1.82s	 54.24%
- Q73:	before: 1.95s	after: 2.61s	 33.85%
- Q88:	before: 7.28s	after: 10.08s	 38.46%
- Q92:	before: 0.04s	after: 0.07s	 75.00%
- Q96:	before: 1.60s	after: 2.22s	 38.75%

yuzefovich added the meta-issue Contains a list of several other issues. label May 11, 2019

yuzefovich mentioned this issue May 11, 2019

workload: add TPC-DS queries and workload #37465

Merged

yuzefovich mentioned this issue May 16, 2019

Add TPC-DS benchmark #37436

Closed

awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jun 21, 2019

yuzefovich mentioned this issue Mar 19, 2020

sql, workload, compose: miscellaneous cleanups #46279

Merged

yuzefovich mentioned this issue Feb 17, 2021

roachtest: run more queries in tpcdsvec #60630

Merged

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

yuzefovich mentioned this issue Mar 21, 2022

roachtest: tpcdsvec failed #78128

Closed

yuzefovich added this to SQL Queries May 2, 2024

yuzefovich moved this from Triage to Backlog in SQL Queries May 2, 2024

github-project-automation bot moved this to Triage in SQL Queries May 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unsupported TPC-DS queries tracking issue #37464

unsupported TPC-DS queries tracking issue #37464

yuzefovich commented May 11, 2019 •

edited

Loading

yuzefovich commented Aug 26, 2020

yuzefovich commented Apr 22, 2021 •

edited

Loading

DrewKimball commented Jul 6, 2021

DrewKimball commented Jul 6, 2021 •

edited

Loading

DrewKimball commented Jul 6, 2021

yuzefovich commented Apr 7, 2023

unsupported TPC-DS queries tracking issue #37464

unsupported TPC-DS queries tracking issue #37464

Comments

yuzefovich commented May 11, 2019 • edited Loading

yuzefovich commented Aug 26, 2020

yuzefovich commented Apr 22, 2021 • edited Loading

DrewKimball commented Jul 6, 2021

DrewKimball commented Jul 6, 2021 • edited Loading

DrewKimball commented Jul 6, 2021

yuzefovich commented Apr 7, 2023

yuzefovich commented May 11, 2019 •

edited

Loading

yuzefovich commented Apr 22, 2021 •

edited

Loading

DrewKimball commented Jul 6, 2021 •

edited

Loading