Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Resolve attributes recursively for improved subquery support #69765

Merged
merged 9 commits into from
Mar 11, 2021

Conversation

palesz
Copy link
Contributor

@palesz palesz commented Mar 2, 2021

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because ReferenceAttributes were pointing to non-existing attributes during query translation.

For example the query

SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j

failed during translation because the OrderBy resolved the j ReferenceAttribute to another i ReferenceAttribute that was later removed by an Optimization:

OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 

By resolving the Attributes recursively both j{r} and i{r} will resolve to test.int{f} above:

OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 

The scope of recursive resolution depends on how the AttributeMap is constructed and populated.

Fixes #67237

Code simplification in the AttributeMap
@palesz palesz force-pushed the subquery-npe-fix branch from f80b9de to aba2f63 Compare March 2, 2021 17:26
@palesz palesz changed the title SQL: Resolve attributes recursively to improve subquery support SQL: Resolve attributes recursively for improved subquery support Mar 2, 2021
@palesz palesz added >bug :Analytics/SQL SQL querying Team:QL (Deprecated) Meta label for query languages team labels Mar 2, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@palesz palesz requested review from astefan, costin, matriv and bpintea March 3, 2021 14:23
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see progress on this however the breakage to the map contract isn't the way forward (see my comments).

Comment on lines 251 to 276
/**
* @param key {@link NamedExpression} to look up
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key
* is another key in the map, it will follow it. It will return the final value or null if the key
* does not exist in the map.
*/
@Override
public E get(Object key) {
if (key instanceof NamedExpression) {
return delegate.get(new AttributeWrapper(((NamedExpression) key).toAttribute()));
}
return null;
return getOrDefault(key, null);
}

/**
* @param key {@link NamedExpression} to look up
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key
* is another key in the map, it will follow it. It will return the final value or the
* specified default value if the key does not exist in the map.
*/
@Override
public E getOrDefault(Object key, E defaultValue) {
E e;
return (((e = get(key)) != null) || containsKey(key))
? e
: defaultValue;
E candidate = defaultValue;
E value = null;
while (((value = lookup(key)) != null || containsKey(key)) && value != key) {
key = candidate = value;
}
return candidate;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backwards.
getOrDefault should delegate to get not vice-versa. The former is an extension of the latter historically and conceptually.

Moreover by making get and getOrDefault recursive the Map contract has been fundamentally broken:

m.put(a, b);
assertTrue(b, m.get(a));
m.put(b,c);
assertTrue(b, m.get(a)); // fails

Instead of renaming get to lookup and modifying the contract, simply introduce a separate method (e.g. traverse(key), getResolvedValue(key), getRecursive(key) or something else) to the original map that does the recursive lookup and use that instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree as well on this, I'd prefer to have a dedicated method that does this more complex logic rather than overriding the default get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, but maybe the solution here is not just a simple rename of the method (resolve(key)), but to not implement the Map<Attribute, E> interface anymore. The Map contract was broken since the beginning because instead of equals() the semanticEquals() is used for key equality checks (which for example violates "containsKey(key) returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)).")
After noget()s are required anymore, unless you can think of a place where recursive key resolution would actually cause harm (if that is the case, I'd say the AttributeMap building should be fixed instead to limit the scope from where the attributes are picked up from).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeMap is a Map for AttributeWrappers; the get, contains rely on equals. For convenience the underlying class is not exposed instead Attribute is used with the wrapping happening automatically.
It's similar to IdentityHashMap for example.

resolve(key) is a compound method that relies on get, the latter might not be used publicly now however I see too many downsides in removing it (and the Map contract) instead of keeping it.
For example get/put do per key read/write but resolve, which does multi-key read has not write counter-party; removing/replacing get breaks too many assumptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the downsides or the breaking assumptions, since we barely utilize anything from the Map interface, practically only some read methods: isEmpty, size and get/getOrDefault (which can be renamed to resolve). The building of the map (puts) happen through the Builder methods today, here is how a full minimalization of the public methods would look like (at this point the AttributeMap is rather an AttributeResolver): palesz@fe814ec

Nonetheless, I can keep the current Map implementation and create a new resolve method for the recursive resolution, switch out all the get/getOrDefault calls to resolve and leave the gets there between the other unused methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -316,10 +316,11 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur
Map<Expression, Node<?>> missing = new LinkedHashMap<>();

o.order().forEach(oe -> {
Expression e = oe.child();
final Expression e = oe.child();
final Expression resolvedE = attributeRefs.getOrDefault(e, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above - it's misleading for a map lookup to do resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 133 to 138
Batch refs = new Batch("Replace References", Limiter.ONCE,
new ReplaceReferenceAttributeWithSource()
);
new ReplaceReferenceAttributeWithSource()
);

Batch operators = new Batch("Operator Optimization",
// combining
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -222,7 +222,7 @@ public LogicalPlan apply(LogicalPlan plan) {
AttributeMap.Builder<Expression> builder = AttributeMap.builder();
// collect aliases
plan.forEachExpressionUp(Alias.class, a -> builder.put(a.toAttribute(), a.child()));
final Map<Attribute, Expression> collectRefs = builder.build();
final AttributeMap<Expression> collectRefs = builder.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -600,10 +600,12 @@ else if (target.foldable()) {
else {
GroupByKey matchingGroup = null;
if (groupingContext != null) {
matchingGroup = groupingContext.groupFor(target);
final Expression resolvedTarget = queryC.aliases().getOrDefault(target, target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of code is confusing due to the left overs.
Target doesn't seem to be used anywhere or at least its initial value is no good - instead of adding a resolvedTarget just reinitialize it:
target = queryC.aliases().getOrDefault(target, target)

Same for id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2658,4 +2657,29 @@ public void testSubqueryWithAliasOrderByAlias() throws Exception {
"( SELECT int AS i FROM test ) AS s " +
"ORDER BY s.i > 10");
}

public void testReferenceResolutionInSubqueries() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break this method into multiple tests that indicate the difference between them in their name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Andras Palinkas added 2 commits March 5, 2021 19:32
Recursive get/getDefault() to resolve() in AttributeMap
Removed noise
Test breakdown
@palesz palesz requested a review from costin March 6, 2021 00:38
Comment on lines 269 to 278
@SuppressWarnings({ "unchecked" })
public E resolve(Object key, E defaultValue) {
AttributeMap<Object> map = (AttributeMap<Object>) this;
Object candidate = defaultValue;
Object value = null;
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) {
key = candidate = value;
}
return (E) candidate;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method builds on other types methods in this class, it shouldn't have unchecked warnings nor used NOT_FOUND. Also not sure why the candidate is returned instead of value, should be the other way around.

Further more this implementation looks buggy: if the map contains (e,e) calling resolve(e, default) would incorrectly return default and not e.
Further more there's no handling of cycles which leads to an infinite loop: (a, b), (b, a).

Below a suggestion to fix both issues:

Suggested change
@SuppressWarnings({ "unchecked" })
public E resolve(Object key, E defaultValue) {
AttributeMap<Object> map = (AttributeMap<Object>) this;
Object candidate = defaultValue;
Object value = null;
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) {
key = candidate = value;
}
return (E) candidate;
}
public E resolveOrDefault(Object key, E defaultValue) {
E value = defaultValue;
E candidate;
int allowedLookups = 10;
while (key != value && ((candidate = get(key)) != null || containsKey(key))) {
if (--allowedLookups == 0) {
throw new QlIllegalArgumentException("Potential cycle detected");
}
key = candidate;
value = candidate;
}
return value;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the default value. I was thinking of adding the limitation on the number of lookups, but seemed we would need quite a huge bug in our AttributeMap population for it to happen with more than one hops in the cycle. Anyways, I added this defensive check.

The suggested code won't really work, because:

1.resolveOrDefault(e, e) will always return e even if the map has e -> f mapping.
2. The loop will always terminate after the first lookup (key == value == candidate)

Anyways, the updated code fixes the bug you found.

@@ -61,6 +63,30 @@ public void testAttributeMapWithSameAliasesCanResolveAttributes() {
assertTrue(newAttributeMap.get(param2.toAttribute()) == param2.child());
}

public void testResolve() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more tests for dealing with:
a. same key, value association - (e,e)
b. handling of default value across association: resolveOrDefault(k, default) for a map with (k, default), (default, value) and also resolveOrDefault(k, k)
c. handling of cycles: (a, b), (b, c), (c,a ) with a mixture of default value in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✓ Done.

@@ -406,7 +411,7 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Set<Expressio

// resolve FunctionAttribute to backing functions
if (e instanceof ReferenceAttribute) {
e = attributeRefs.get(e);
e = attributeRefs.resolve(e, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null variant is used enough times to make a resolve method, in addition to resolveOrDefault on the main class similar to get and getOrDefault (even if it's just syntactic sugar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✓ Done.

@palesz palesz requested a review from costin March 10, 2021 23:32
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (--allowedLookups == 0) {
throw new QlIllegalArgumentException("Potential cycle detected");
}
key = value = candidate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break the assignment in two pieces.

@palesz palesz merged commit 6e1eb44 into elastic:master Mar 11, 2021
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Mar 11, 2021
…astic#69765)

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}elastic#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}elastic#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..]
    \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..]
    \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes elastic#67237
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Mar 11, 2021
…astic#69765)

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}elastic#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}elastic#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..]
    \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..]
    \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes elastic#67237
palesz pushed a commit that referenced this pull request Mar 11, 2021
…9765) (#70325)

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes #67237
palesz pushed a commit that referenced this pull request Mar 11, 2021
…9765) (#70322)

Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.

For example the query

```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```

failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:

```
OrderBy[[Order[j{r}#4,ASC,LAST]]]                                             ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
```

By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:

```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]]                                     = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]]                                                                = \_Project[[j]]
  \_Project[[i]]                                                              !   \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
    \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! 
 ```

The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.

Fixes #67237
@gwbrown gwbrown removed the :Analytics/EQL EQL querying label Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: NPE with ORDER BY re-aliased reference from subquery
6 participants