Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql: support window functions in distsql #24425

Closed
nvanbenschoten opened this issue Apr 2, 2018 · 3 comments
Closed

distsql: support window functions in distsql #24425

nvanbenschoten opened this issue Apr 2, 2018 · 3 comments
Assignees
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Milestone

Comments

@nvanbenschoten
Copy link
Member

We currently don't support window functions in DistSQL queries. This means that any query which includes a window function will be forced into local SQL execution. This is a problem for OLAP-style workloads because these often include window functions when performing complex analytics.

We should fix this. The design of window function processing in DistSQL would need to be carefully thought out. To start, we would want to consider how they could be distributed and decomposed. To get an intuition about this, we can look at how window functions are implemented in local SQL. Each function begins with a scan over all input rows in order to compute partitions. This is similar to the partitioning step in a hash join, which DistSQL already knows how to do. After this, each partition can be operated on in isolation. The per-partition work can be decomposed into two steps - a sort and an aggregation. It would be ideal if these two steps could re-use existing distSQL processors. This should be possible but will require that aggregation processors be taught about window frames. Finally, we'll need to merge all of the partitions together. This again could rely on some existing processors like the UNION processor and the HASH JOIN processor (depending on how many columns in each row are propagated through the window function computation steps).

One complication of window functions is that a given query can have multiple window functions. This is similar to how a query can have multiple aggregation functions, but can get a lot more complicated because each window function can specify a different partitioning and sorting scheme. This means that we can't always combine window function aggregations. Instead, it's probably easiest to run each window function serially and not worry about multiple partitioning phases.

@nvanbenschoten nvanbenschoten added this to the 2.1 milestone Apr 2, 2018
@rjnn rjnn self-assigned this Apr 2, 2018
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-execution Relating to SQL execution. labels Apr 27, 2018
@jordanlewis jordanlewis assigned yuzefovich and unassigned rjnn May 21, 2018
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 20, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage.

A doc describing all the details of the
implementation will be out soon.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 24, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 28, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 29, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 2, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 2, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 6, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 6, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 6, 2018
Adds support of window functions in DistSQL.
A stage of windower processors is added
for a particular PARTITION BY scheme, all
window functions with the same partitioning
are processed by that stage. We also cache
sorted partitions if more than one window
function has the same ORDER BY clause.

A doc describing the details of the
implementation is out.

Note: memory accounting is not done,
I plan on adding that after this PR.

Resolves: cockroachdb#24425.

Release note: None
craig bot pushed a commit that referenced this issue Aug 6, 2018
27140: distsql: support window functions in distsql r=yuzefovich a=yuzefovich

Adds support of window functions in DistSQL. A stage of windower processors is added for a particular PARTITION BY scheme, all window functions with the same partitioning are processed by that stage. We also cache sorted partitions if more than one window function has the same ORDER BY clause.

A doc describing the details of the implementation is out.

Note: memory accounting is not done, I plan on adding that after this PR.

Resolves: #24425.

Release note: None

Co-authored-by: yuzefovich <[email protected]>
@craig craig bot closed this as completed in #27140 Aug 6, 2018
@jordanlewis
Copy link
Member

Boom!!!

yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 7, 2018
Adds memory accounting and stats collection to
windowers.

Related to: cockroachdb#24425.

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 7, 2018
Using slices instead of maps gives significant
performance improvement. On several quiries I
ran with profiling, time spent while populating
output rows reduced from 10s or to 1-2s (in terms
of performance degradation: the difference went
down from 90-100% to about 20%).

Related to: cockroachdb#24425.

Release note: None
craig bot pushed a commit that referenced this issue Aug 7, 2018
27978: sql: improve handling of decimal zeroes r=mjibson a=mjibson

Previously we had inconsistent handling of decimal zeroes. Various forms
(-0, -0.00, +0.00) were correctly handled from a "what is this equal to"
perspective, but we were not consistent in keeping the digits after the
decimal point. Furthermore, postgres always removes negatives when making
-0 for ints and decimals (not floats).

This behavior (removing the negative from -0 automatically) existed in the
higher level apd tests but we were subverting it by using apd.Decimal.Neg
directly. apd has been updated to do this for Decimals now.

Our constant evaluation has also caused us some problems. We use the
go/constant package, but it evaluates numbers as floats. When we applied
the + operator our NumVal type would remove its OrigString representation
and thus use only the constant's Value, which, as a float, didn't contain
the trailing digits after the decimal (in cases like +0.000). Now the +
operator is a pass through.

The - operator remains the same, but has special logic to detect -0 and
leave the original string

The added logic tests have been compared to postgres and now we match
them exactly for many more things.

Of note is that, in postgres, `SELECT -0::float` returns -0 (in opposition
to the decimal behavior). This is why the test in TestParseDatumStringAs
removed only the -0 test for decimals and not floats.

Release note (sql change): More correct handling of decimal
0s. Specifically, -0 is coerced to 0 and values like 0.00 retain the
digits after the decimal point.

28145: sql: hopefully reduce the memory usage of TestDropWhileBackfill r=vivekmenezes a=vivekmenezes

related to #28084

Release note: None

28229: rangefeed: add resolved timestamp initialization scan and registration catch-up scan r=nvanbenschoten a=nvanbenschoten

This PR includes a number of small improvements to the `rangefeed` package, culminating in the introduction of two scans that a rangefeed will need to perform. The first is a scan over the entire key range from the previously known resolved timestamp onward to learn about all existing unresolved intents. The second is a scan over a new registration's key range to backfill in all values from before it was registered but after its starting timestamp. Both of these scans will be able to take advantage of TimeBound iterators. However, the way they're represented in the `rangefeed` package is pretty abstract, which made testing pretty easy.

The first commit and the last two are substantial, but the rest are just small logical changes so I hope this is easy to review.

@benesch thanks for offering to take a look at this while @tschottdorf is out! There's only been one PR merged so far so it shouldn't take much to catch up to speed: #28072.

28313: workload: revert --init idempotency if initial rows are loaded r=nvanbenschoten a=nvanbenschoten

Fixes #27981.

Release note: None

28326: cli: set a timeout delay for SQL connections r=knz a=knz

Fixes #20097.

This patch leverages a feature in the underlying SQL driver `lib/pq`
to set a connection timeout for all CLI commands that use a SQL
connection. This includes `cockroach sql`/`demo`, `cockroach node ls`,
`cockroach zone set`, etc.

The timeout is set to 5 seconds. If the client cannot connect within 5
seconds it fails with the following error:

```
Error: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.

read tcp [::1]:60205->[::1]:26257: i/o timeout
```

Release note (cli change): the various client commands that use a SQL
connection (including e.g. `cockroach sql`, `cockroach node`,
`cockroach user`) now produce an error if a connection could not be
established within 5 seconds instead of waiting forever.

28341: distsql: use slices for populating output of windowers r=yuzefovich a=yuzefovich

Using slices instead of maps gives significant
performance improvement. On several quiries I
ran with profiling, time spent while populating
output rows reduced from 10s or to 1-2s (in terms
of performance degradation: the difference went
down from 90-100% to about 20%).

Related to: #24425.

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Vivek Menezes <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: yuzefovich <[email protected]>
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 9, 2018
Adds memory accounting and stats collection to
windowers.

Related to: cockroachdb#24425.

Release note: None
craig bot pushed a commit that referenced this issue Aug 9, 2018
28328: distsql: add memory accounting and stats to windowers r=yuzefovich a=yuzefovich

Adds memory accounting and stats collection to
windowers.

Related to: #24425.

Release note: None

Co-authored-by: yuzefovich <[email protected]>
@yuzefovich
Copy link
Member

Here are benchmarks on 4 node roachprod cluster with TPCH dataset (after adding memory accounting and fixing some inefficiencies during population of output rows). Performance improvements are 2x and more. The numbers are even higher if a query has several window functions with the same partitioning scheme.

Please note that on 1 node cluster, the performance of DistSQL window functions will be worse than of local execution which is expected (decrease in performance is usually around 20%).

Comparison on orders table (1500000 rows)

Query Local DistSQL Difference
SELECT count(*) OVER (PARTITION BY o_orderdate) FROM orders ORDER BY o_orderkey LIMIT 10; 3.770270427s 1.579288617s -58.11%
SELECT sum(o_totalprice) OVER (PARTITION BY o_orderdate ORDER BY o_orderkey) FROM orders ORDER BY o_orderkey LIMIT 10; 6.319257886s 2.457607566s -61.11%
SELECT avg(o_totalprice) OVER (PARTITION BY o_orderdate ORDER BY o_orderkey) FROM orders ORDER BY o_orderkey LIMIT 10; 18.839226118s 5.485205625s -70.88%
SELECT avg(o_totalprice) OVER w, sum(o_custkey) OVER w FROM orders WINDOW w AS (PARTITION BY o_orderdate ORDER BY o_orderkey) ORDER BY o_orderkey LIMIT 10; 22.03540549s 6.03339383s -72.62%
SELECT count(*) OVER w1, rank() OVER w2, avg(o_totalprice) OVER w1, sum(o_custkey) OVER w2, max(o_shippriority) OVER w2 FROM orders WINDOW w1 AS (PARTITION BY o_orderdate), w2 AS (PARTITION BY o_orderpriority ORDER BY o_orderkey) ORDER BY o_orderkey LIMIT 10; 29.909022153s 10.799945246s -63.89%

Comparison on lineitem table (6001215 rows)

Query Local DistSQL Difference
SELECT count(*) OVER (PARTITION BY l_shipdate) FROM lineitem ORDER BY l_orderkey LIMIT 10; 15.852753941s 8.356666513s -47.29%
SELECT sum(l_quantity) OVER (PARTITION BY l_shipdate ORDER BY l_orderkey) FROM lineitem ORDER BY l_orderkey LIMIT 10; 28.248049396s 11.769740243s -58.33%
SELECT avg(l_quantity) OVER (PARTITION BY l_shipdate ORDER BY l_orderkey) FROM lineitem ORDER BY l_orderkey LIMIT 10; 77.418174762s 23.919147975s -69.10%
SELECT avg(l_quantity) OVER w, sum(l_linenumber) OVER w FROM lineitem WINDOW w AS (PARTITION BY l_receiptdate ORDER BY l_orderkey) ORDER BY l_orderkey LIMIT 10; OOM error 26.874768298s -infinity%
SELECT row_number() OVER w1, sum(l_discount) OVER w2, last_value(l_comment) OVER w1, min(l_suppkey) OVER w2 FROM lineitem WINDOW w1 AS (PARTITION BY l_shipmode ORDER BY l_orderkey), w2 AS (PARTITION BY l_commitdate) ORDER BY l_orderkey LIMIT 10; OOM error 34.302523343s -infinity%

@bladefist
Copy link

@yuzefovich is a stud. thank you so much for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

6 participants