Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Add weak keys and rule to eliminate DISTINCT #24451

Merged
merged 2 commits into from
Apr 6, 2018

Conversation

andy-kimball
Copy link
Contributor

This PR contains two commits, the first to track weak keys in expressions, and the second to use that information to eliminate unnecessary DISTINCT group by operators.

@andy-kimball andy-kimball requested a review from a team as a code owner April 3, 2018 17:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 3, 2018

:lgtm:


Reviewed 32 of 32 files at r1, 7 of 7 files at r2.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


pkg/sql/opt/metadata_column.go, line 87 at r1 (raw file):

// WeakKeys are combinations of columns that form a weak key. No two non-null
// rows are equal if it contains columns from a weak key. For more details, see

it contains -> they contain


pkg/sql/opt/memo/logical_props.go, line 74 at r1 (raw file):

	// weak key is a key if "(WeakKeys[i] & NotNullCols) == WeakKeys[i]".
	//
	// An empty key set is valid (an empty key it implies there is at most one

key it implies -> key implies


pkg/sql/opt/memo/logical_props_factory.go, line 365 at r1 (raw file):

// filterWeakKeys ensures that each weak key is a subset of the output columns.
// It respects immutability by making a coy of the weak keys if they need to be

coy -> copy


pkg/sql/opt/memo/logical_props_factory.go, line 371 at r1 (raw file):

	var filtered opt.WeakKeys
	if !additional.Empty() {
		filtered = make(opt.WeakKeys, 0, len(props.WeakKeys))

why not make the capacity len(props.WeakKeys)+1?


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/logical_props.go, line 63 at r1 (raw file):

	//
	// A column set is a key if no two rows are equal after projection onto that
	// set. A requirement for a column set to be a key is for no columns in the

I would add (under the assumption that NULL equals NULL) to the first sentence instead of the NULL != NULL sentence below.

Note that in some cases, we can have what for all intents and purposes is a (strong) key even though some columns are nullable. For example DISTINCT de-duplicates between NULLs like any other value so the output of DISTINCT always has a key, regardless of nullability. I doubt these cases are important enough to support (it would require keeping track of strong keys explicitly) but maybe we should keep the definition general.


pkg/sql/opt/memo/logical_props.go, line 69 at r1 (raw file):

	// NOT NULL).
	//
	// A weak key is a set of columns where no two rows containing non-NULL

This is ambiguous, they have to have non-NULL values just on these specific columns. Maybe define it like key but with the assumption that NULL does not equal NULL.


pkg/sql/opt/memo/logical_props.go, line 74 at r1 (raw file):

	// weak key is a key if "(WeakKeys[i] & NotNullCols) == WeakKeys[i]".
	//
	// An empty key set is valid (an empty key it implies there is at most one

I'd say just "empty key" (set suggests you're talking about the entire WeakKeys)


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/testdata/logprops/groupby, line 56 at r1 (raw file):

 ├── columns: column5:5(decimal) column6:6(int)
 ├── stats: [rows=1]
 ├── keys: ()

What does this mean? A key with no columns?


pkg/sql/opt/memo/testdata/logprops/join, line 136 at r1 (raw file):

 ├── columns: x:1(int!null) y:2(int) s:3(string) d:4(decimal!null) x:5(int!null) y:6(int) s:7(string) d:8(decimal!null)
 ├── stats: [rows=1000000]
 ├── keys: (1) weak(3,4) (5) weak(7,8)

Is this correct?

CREATE DATABASE t;
SET DATABASE = t;
CREATE TABLE t (
  k INT PRIMARY KEY
);
INSERT INTO t VALUES (1), (2);
SELECT * FROM t AS a, t AS b;

The SELECT produces:

+---+---+
| k | k |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
+---+---+

Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/testdata/logprops/groupby, line 56 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

What does this mean? A key with no columns?

yeah. empty key = single row


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/logical_props_factory.go, line 243 at r1 (raw file):

		// add new weak key that consists of all the grouping columns.
		props.Relational.WeakKeys = inputProps.WeakKeys
		filterWeakKeys(props.Relational, groupingColSet)

Here we actually want to add groupingColSet even if it is empty no?


pkg/sql/opt/memo/logical_props_factory.go, line 389 at r1 (raw file):

	}

	if !additional.Empty() {

If all keys were "good", filtered is nil here, we do the Add on it and we are effectively dropping all the other keys.

I don't think the minor optimization is worth the additional argument (which has dubious semantics because an empty key is valid in other contexts). Callers could just do props.WeakKeys.Add(xx) after calling the function. We could allocate filtered to cap(props.WeakKeys) instead of len.


Comments from Reviewable

@RaduBerinde
Copy link
Member

pkg/sql/opt/memo/logical_props_factory.go, line 389 at r1 (raw file):

Previously, RaduBerinde wrote…

If all keys were "good", filtered is nil here, we do the Add on it and we are effectively dropping all the other keys.

I don't think the minor optimization is worth the additional argument (which has dubious semantics because an empty key is valid in other contexts). Callers could just do props.WeakKeys.Add(xx) after calling the function. We could allocate filtered to cap(props.WeakKeys) instead of len.

Ah, sorry, if additional is non-empty, filtered can't be nil here.


Comments from Reviewable

@RaduBerinde
Copy link
Member

pkg/sql/opt/memo/logical_props_factory.go, line 243 at r1 (raw file):

Previously, RaduBerinde wrote…

Here we actually want to add groupingColSet even if it is empty no?

Sorry, missed that we already checked for that above.


Comments from Reviewable

@RaduBerinde
Copy link
Member

:lgtm:


Review status: 17 of 39 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/logical_props_factory.go, line 196 at r4 (raw file):

		props.Relational.NotNullCols.UnionWith(leftProps.NotNullCols)
	}

[nit] Add a TODO to detect keys when we can (for example, if the equality columns are keys on both sides)


pkg/sql/opt/memo/logical_props_factory.go, line 228 at r4 (raw file):

		props.Relational.Stats.RowCount = 1
	} else {
		// The grouping columns always form a key because the GroupBy operation

Nice.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 17 of 39 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


pkg/sql/opt/memo/logical_props_factory.go, line 196 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] Add a TODO to detect keys when we can (for example, if the equality columns are keys on both sides)

+1. I think there are some other cases here too. For instance, an inner join where the join condition is a key in either the left or right relation. I'm not sure if I have that right. I haven't thought about how keys propagate through joins recently and I punted on doing this in opttoy. Might be worthwhile searching literature or other systems to see if they have a list of rules.


Comments from Reviewable

@RaduBerinde
Copy link
Member

pkg/sql/opt/memo/logical_props_factory.go, line 196 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

+1. I think there are some other cases here too. For instance, an inner join where the join condition is a key in either the left or right relation. I'm not sure if I have that right. I haven't thought about how keys propagate through joins recently and I punted on doing this in opttoy. Might be worthwhile searching literature or other systems to see if they have a list of rules.

You can only do that if you know that each row on the key side won't get matched with multiple rows on the other side. E.g. ab INNER JOIN ac USING (a): even if a is a key in ab, it won't be a key in the result if ac has multiple rows with the same a value.

One way to know that is if it's a key on both sides, there may be others..


Comments from Reviewable

Add a new WeakKeys logical property. Weak keys are derived from
unique indexes and sometimes from operators (e.g. GroupBy). Cache
the weak keys in metadata and propagate them through various
relational operators.

Release note: None
@andy-kimball
Copy link
Contributor Author

Review status: 17 of 39 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


pkg/sql/opt/metadata_column.go, line 87 at r1 (raw file):

Previously, rytaft wrote…

it contains -> they contain

Done.


pkg/sql/opt/memo/logical_props.go, line 63 at r1 (raw file):

Previously, RaduBerinde wrote…

I would add (under the assumption that NULL equals NULL) to the first sentence instead of the NULL != NULL sentence below.

Note that in some cases, we can have what for all intents and purposes is a (strong) key even though some columns are nullable. For example DISTINCT de-duplicates between NULLs like any other value so the output of DISTINCT always has a key, regardless of nullability. I doubt these cases are important enough to support (it would require keeping track of strong keys explicitly) but maybe we should keep the definition general.

I grabbed this definition from opttoy. I did some tweaking to it to incorporate your feedback.


pkg/sql/opt/memo/logical_props.go, line 69 at r1 (raw file):

Previously, RaduBerinde wrote…

This is ambiguous, they have to have non-NULL values just on these specific columns. Maybe define it like key but with the assumption that NULL does not equal NULL.

See if my comment rewrite addresses your concerns.


pkg/sql/opt/memo/logical_props.go, line 74 at r1 (raw file):

Previously, rytaft wrote…

key it implies -> key implies

Done.


pkg/sql/opt/memo/logical_props.go, line 74 at r1 (raw file):

Previously, RaduBerinde wrote…

I'd say just "empty key" (set suggests you're talking about the entire WeakKeys)

Done.


pkg/sql/opt/memo/logical_props_factory.go, line 365 at r1 (raw file):

Previously, rytaft wrote…

coy -> copy

Done.


pkg/sql/opt/memo/logical_props_factory.go, line 371 at r1 (raw file):

Previously, rytaft wrote…

why not make the capacity len(props.WeakKeys)+1?

I got rid of this for other reasons.


pkg/sql/opt/memo/logical_props_factory.go, line 389 at r1 (raw file):

Previously, RaduBerinde wrote…

Ah, sorry, if additional is non-empty, filtered can't be nil here.

I changed the logic to do it differently. See what you think.


pkg/sql/opt/memo/logical_props_factory.go, line 196 at r4 (raw file):

Previously, RaduBerinde wrote…

You can only do that if you know that each row on the key side won't get matched with multiple rows on the other side. E.g. ab INNER JOIN ac USING (a): even if a is a key in ab, it won't be a key in the result if ac has multiple rows with the same a value.

One way to know that is if it's a key on both sides, there may be others..

Done.


pkg/sql/opt/memo/testdata/logprops/groupby, line 56 at r1 (raw file):

Previously, RaduBerinde wrote…

yeah. empty key = single row

	// An empty key is valid (an empty key implies there is at most one row). Note
	// that an empty key is always the only key in the set, since it's a subset of
	// every other key (i.e. every other key would be redundant).

pkg/sql/opt/memo/testdata/logprops/join, line 136 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is this correct?

CREATE DATABASE t;
SET DATABASE = t;
CREATE TABLE t (
  k INT PRIMARY KEY
);
INSERT INTO t VALUES (1), (2);
SELECT * FROM t AS a, t AS b;

The SELECT produces:

+---+---+
| k | k |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
+---+---+

Good catch, it's wrong. It's only OK in case where the inner-join predicate compares two weak keys for equality. For now, I'll just remove the join weakkey logic, and we can circle back once we start taking predicates into account.


Comments from Reviewable

Add a new EliminateDistinct rule. EliminateDistinct discards a
GroupBy operator that is eliminating duplicate rows by using
grouping columns that are statically known to form a strong key. By
definition, a strong key does not allow duplicate values, so the
GroupBy is redundant and can be eliminated.

Release note: None
@RaduBerinde
Copy link
Member

Reviewed 17 of 32 files at r1, 14 of 22 files at r3, 8 of 8 files at r6.
Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


Comments from Reviewable

@andy-kimball andy-kimball merged commit cc882bf into cockroachdb:master Apr 6, 2018
@andy-kimball andy-kimball deleted the weakkey branch June 8, 2018 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants