distsql: support window functions in distsql #24425

nvanbenschoten · 2018-04-02T19:13:44Z

We currently don't support window functions in DistSQL queries. This means that any query which includes a window function will be forced into local SQL execution. This is a problem for OLAP-style workloads because these often include window functions when performing complex analytics.

We should fix this. The design of window function processing in DistSQL would need to be carefully thought out. To start, we would want to consider how they could be distributed and decomposed. To get an intuition about this, we can look at how window functions are implemented in local SQL. Each function begins with a scan over all input rows in order to compute partitions. This is similar to the partitioning step in a hash join, which DistSQL already knows how to do. After this, each partition can be operated on in isolation. The per-partition work can be decomposed into two steps - a sort and an aggregation. It would be ideal if these two steps could re-use existing distSQL processors. This should be possible but will require that aggregation processors be taught about window frames. Finally, we'll need to merge all of the partitions together. This again could rely on some existing processors like the UNION processor and the HASH JOIN processor (depending on how many columns in each row are propagated through the window function computation steps).

One complication of window functions is that a given query can have multiple window functions. This is similar to how a query can have multiple aggregation functions, but can get a lot more complicated because each window function can specify a different partitioning and sorting scheme. This means that we can't always combine window function aggregations. Instead, it's probably easiest to run each window function serially and not worry about multiple partitioning phases.

Adds support of window functions in DistSQL. A stage of windower processors is added for a particular PARTITION BY scheme, all window functions with the same partitioning are processed by that stage. A doc describing all the details of the implementation will be out soon. Note: memory accounting is not done, I plan on adding that after this PR. Resolves: cockroachdb#24425. Release note: None

Adds support of window functions in DistSQL. A stage of windower processors is added for a particular PARTITION BY scheme, all window functions with the same partitioning are processed by that stage. We also cache sorted partitions if more than one window function has the same ORDER BY clause. A doc describing the details of the implementation is out. Note: memory accounting is not done, I plan on adding that after this PR. Resolves: cockroachdb#24425. Release note: None

27140: distsql: support window functions in distsql r=yuzefovich a=yuzefovich Adds support of window functions in DistSQL. A stage of windower processors is added for a particular PARTITION BY scheme, all window functions with the same partitioning are processed by that stage. We also cache sorted partitions if more than one window function has the same ORDER BY clause. A doc describing the details of the implementation is out. Note: memory accounting is not done, I plan on adding that after this PR. Resolves: #24425. Release note: None Co-authored-by: yuzefovich <[email protected]>

jordanlewis · 2018-08-06T21:42:21Z

Boom!!!

Adds memory accounting and stats collection to windowers. Related to: cockroachdb#24425. Release note: None

Using slices instead of maps gives significant performance improvement. On several quiries I ran with profiling, time spent while populating output rows reduced from 10s or to 1-2s (in terms of performance degradation: the difference went down from 90-100% to about 20%). Related to: cockroachdb#24425. Release note: None

@benesch

27978: sql: improve handling of decimal zeroes r=mjibson a=mjibson Previously we had inconsistent handling of decimal zeroes. Various forms (-0, -0.00, +0.00) were correctly handled from a "what is this equal to" perspective, but we were not consistent in keeping the digits after the decimal point. Furthermore, postgres always removes negatives when making -0 for ints and decimals (not floats). This behavior (removing the negative from -0 automatically) existed in the higher level apd tests but we were subverting it by using apd.Decimal.Neg directly. apd has been updated to do this for Decimals now. Our constant evaluation has also caused us some problems. We use the go/constant package, but it evaluates numbers as floats. When we applied the + operator our NumVal type would remove its OrigString representation and thus use only the constant's Value, which, as a float, didn't contain the trailing digits after the decimal (in cases like +0.000). Now the + operator is a pass through. The - operator remains the same, but has special logic to detect -0 and leave the original string The added logic tests have been compared to postgres and now we match them exactly for many more things. Of note is that, in postgres, `SELECT -0::float` returns -0 (in opposition to the decimal behavior). This is why the test in TestParseDatumStringAs removed only the -0 test for decimals and not floats. Release note (sql change): More correct handling of decimal 0s. Specifically, -0 is coerced to 0 and values like 0.00 retain the digits after the decimal point. 28145: sql: hopefully reduce the memory usage of TestDropWhileBackfill r=vivekmenezes a=vivekmenezes related to #28084 Release note: None 28229: rangefeed: add resolved timestamp initialization scan and registration catch-up scan r=nvanbenschoten a=nvanbenschoten This PR includes a number of small improvements to the `rangefeed` package, culminating in the introduction of two scans that a rangefeed will need to perform. The first is a scan over the entire key range from the previously known resolved timestamp onward to learn about all existing unresolved intents. The second is a scan over a new registration's key range to backfill in all values from before it was registered but after its starting timestamp. Both of these scans will be able to take advantage of TimeBound iterators. However, the way they're represented in the `rangefeed` package is pretty abstract, which made testing pretty easy. The first commit and the last two are substantial, but the rest are just small logical changes so I hope this is easy to review. @benesch thanks for offering to take a look at this while @tschottdorf is out! There's only been one PR merged so far so it shouldn't take much to catch up to speed: #28072. 28313: workload: revert --init idempotency if initial rows are loaded r=nvanbenschoten a=nvanbenschoten Fixes #27981. Release note: None 28326: cli: set a timeout delay for SQL connections r=knz a=knz Fixes #20097. This patch leverages a feature in the underlying SQL driver `lib/pq` to set a connection timeout for all CLI commands that use a SQL connection. This includes `cockroach sql`/`demo`, `cockroach node ls`, `cockroach zone set`, etc. The timeout is set to 5 seconds. If the client cannot connect within 5 seconds it fails with the following error: ``` Error: cannot dial server. Is the server running? If the server is running, check --host client-side and --advertise server-side. read tcp [::1]:60205->[::1]:26257: i/o timeout ``` Release note (cli change): the various client commands that use a SQL connection (including e.g. `cockroach sql`, `cockroach node`, `cockroach user`) now produce an error if a connection could not be established within 5 seconds instead of waiting forever. 28341: distsql: use slices for populating output of windowers r=yuzefovich a=yuzefovich Using slices instead of maps gives significant performance improvement. On several quiries I ran with profiling, time spent while populating output rows reduced from 10s or to 1-2s (in terms of performance degradation: the difference went down from 90-100% to about 20%). Related to: #24425. Release note: None Co-authored-by: Matt Jibson <[email protected]> Co-authored-by: Vivek Menezes <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: yuzefovich <[email protected]>

Adds memory accounting and stats collection to windowers. Related to: cockroachdb#24425. Release note: None

28328: distsql: add memory accounting and stats to windowers r=yuzefovich a=yuzefovich Adds memory accounting and stats collection to windowers. Related to: #24425. Release note: None Co-authored-by: yuzefovich <[email protected]>

yuzefovich · 2018-08-13T21:08:01Z

Here are benchmarks on 4 node roachprod cluster with TPCH dataset (after adding memory accounting and fixing some inefficiencies during population of output rows). Performance improvements are 2x and more. The numbers are even higher if a query has several window functions with the same partitioning scheme.

Please note that on 1 node cluster, the performance of DistSQL window functions will be worse than of local execution which is expected (decrease in performance is usually around 20%).

Comparison on orders table (1500000 rows)

Query	Local	DistSQL	Difference
`SELECT count(*) OVER (PARTITION BY o_orderdate) FROM orders ORDER BY o_orderkey LIMIT 10;`	3.770270427s	1.579288617s	-58.11%
`SELECT sum(o_totalprice) OVER (PARTITION BY o_orderdate ORDER BY o_orderkey) FROM orders ORDER BY o_orderkey LIMIT 10;`	6.319257886s	2.457607566s	-61.11%
`SELECT avg(o_totalprice) OVER (PARTITION BY o_orderdate ORDER BY o_orderkey) FROM orders ORDER BY o_orderkey LIMIT 10;`	18.839226118s	5.485205625s	-70.88%
`SELECT avg(o_totalprice) OVER w, sum(o_custkey) OVER w FROM orders WINDOW w AS (PARTITION BY o_orderdate ORDER BY o_orderkey) ORDER BY o_orderkey LIMIT 10;`	22.03540549s	6.03339383s	-72.62%
`SELECT count(*) OVER w1, rank() OVER w2, avg(o_totalprice) OVER w1, sum(o_custkey) OVER w2, max(o_shippriority) OVER w2 FROM orders WINDOW w1 AS (PARTITION BY o_orderdate), w2 AS (PARTITION BY o_orderpriority ORDER BY o_orderkey) ORDER BY o_orderkey LIMIT 10;`	29.909022153s	10.799945246s	-63.89%

Comparison on lineitem table (6001215 rows)

Query	Local	DistSQL	Difference
`SELECT count(*) OVER (PARTITION BY l_shipdate) FROM lineitem ORDER BY l_orderkey LIMIT 10;`	15.852753941s	8.356666513s	-47.29%
`SELECT sum(l_quantity) OVER (PARTITION BY l_shipdate ORDER BY l_orderkey) FROM lineitem ORDER BY l_orderkey LIMIT 10;`	28.248049396s	11.769740243s	-58.33%
`SELECT avg(l_quantity) OVER (PARTITION BY l_shipdate ORDER BY l_orderkey) FROM lineitem ORDER BY l_orderkey LIMIT 10;`	77.418174762s	23.919147975s	-69.10%
`SELECT avg(l_quantity) OVER w, sum(l_linenumber) OVER w FROM lineitem WINDOW w AS (PARTITION BY l_receiptdate ORDER BY l_orderkey) ORDER BY l_orderkey LIMIT 10;`	OOM error	26.874768298s	-infinity%
`SELECT row_number() OVER w1, sum(l_discount) OVER w2, last_value(l_comment) OVER w1, min(l_suppkey) OVER w2 FROM lineitem WINDOW w1 AS (PARTITION BY l_shipmode ORDER BY l_orderkey), w2 AS (PARTITION BY l_commitdate) ORDER BY l_orderkey LIMIT 10;`	OOM error	34.302523343s	-infinity%

bladefist · 2018-08-13T22:40:49Z

@yuzefovich is a stud. thank you so much for this.

nvanbenschoten added this to the 2.1 milestone Apr 2, 2018

RaduBerinde mentioned this issue Apr 2, 2018

distsql: tracking issue for feature parity gaps #19320

Closed

13 tasks

rjnn self-assigned this Apr 2, 2018

knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-execution Relating to SQL execution. labels Apr 27, 2018

jordanlewis assigned yuzefovich and unassigned rjnn May 21, 2018

yuzefovich mentioned this issue Jul 3, 2018

distsql: support window functions in distsql #27140

Merged

craig bot closed this as completed in #27140 Aug 6, 2018

yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 7, 2018

distsql: add memory accounting and stats to windowers

2f2a83e

Adds memory accounting and stats collection to windowers. Related to: cockroachdb#24425. Release note: None

yuzefovich mentioned this issue Aug 7, 2018

distsql: add memory accounting and stats to windowers #28328

Merged

yuzefovich mentioned this issue Aug 7, 2018

distsql: use slices for populating output of windowers #28341

Merged

yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 9, 2018

distsql: add memory accounting and stats to windowers

fef1a65

Adds memory accounting and stats collection to windowers. Related to: cockroachdb#24425. Release note: None

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

distsql: support window functions in distsql #24425

distsql: support window functions in distsql #24425

nvanbenschoten commented Apr 2, 2018

jordanlewis commented Aug 6, 2018

yuzefovich commented Aug 13, 2018

bladefist commented Aug 13, 2018

distsql: support window functions in distsql #24425

distsql: support window functions in distsql #24425

Comments

nvanbenschoten commented Apr 2, 2018

jordanlewis commented Aug 6, 2018

yuzefovich commented Aug 13, 2018

bladefist commented Aug 13, 2018