-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Resolve attributes recursively for improved subquery support #69765
Conversation
Code simplification in the AttributeMap
Pinging @elastic/es-ql (Team:QL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see progress on this however the breakage to the map contract isn't the way forward (see my comments).
/** | ||
* @param key {@link NamedExpression} to look up | ||
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key | ||
* is another key in the map, it will follow it. It will return the final value or null if the key | ||
* does not exist in the map. | ||
*/ | ||
@Override | ||
public E get(Object key) { | ||
if (key instanceof NamedExpression) { | ||
return delegate.get(new AttributeWrapper(((NamedExpression) key).toAttribute())); | ||
} | ||
return null; | ||
return getOrDefault(key, null); | ||
} | ||
|
||
/** | ||
* @param key {@link NamedExpression} to look up | ||
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key | ||
* is another key in the map, it will follow it. It will return the final value or the | ||
* specified default value if the key does not exist in the map. | ||
*/ | ||
@Override | ||
public E getOrDefault(Object key, E defaultValue) { | ||
E e; | ||
return (((e = get(key)) != null) || containsKey(key)) | ||
? e | ||
: defaultValue; | ||
E candidate = defaultValue; | ||
E value = null; | ||
while (((value = lookup(key)) != null || containsKey(key)) && value != key) { | ||
key = candidate = value; | ||
} | ||
return candidate; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is backwards.
getOrDefault should delegate to get not vice-versa. The former is an extension of the latter historically and conceptually.
Moreover by making get and getOrDefault recursive the Map contract has been fundamentally broken:
m.put(a, b);
assertTrue(b, m.get(a));
m.put(b,c);
assertTrue(b, m.get(a)); // fails
Instead of renaming get to lookup
and modifying the contract, simply introduce a separate method (e.g. traverse(key)
, getResolvedValue(key)
, getRecursive(key)
or something else) to the original map that does the recursive lookup and use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree as well on this, I'd prefer to have a dedicated method that does this more complex logic rather than overriding the default get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, but maybe the solution here is not just a simple rename of the method (resolve(key)
), but to not implement the Map<Attribute, E>
interface anymore. The Map contract was broken since the beginning because instead of equals()
the semanticEquals()
is used for key equality checks (which for example violates "containsKey(key) returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)).")
After noget()
s are required anymore, unless you can think of a place where recursive key resolution would actually cause harm (if that is the case, I'd say the AttributeMap building should be fixed instead to limit the scope from where the attributes are picked up from).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttributeMap
is a Map
for AttributeWrapper
s; the get
, contains
rely on equals
. For convenience the underlying class is not exposed instead Attribute
is used with the wrapping happening automatically.
It's similar to IdentityHashMap
for example.
resolve(key)
is a compound method that relies on get
, the latter might not be used publicly now however I see too many downsides in removing it (and the Map
contract) instead of keeping it.
For example get
/put
do per key read/write but resolve
, which does multi-key read has not write counter-party; removing/replacing get
breaks too many assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the downsides or the breaking assumptions, since we barely utilize anything from the Map interface, practically only some read methods: isEmpty
, size
and get/getOrDefault
(which can be renamed to resolve
). The building of the map (put
s) happen through the Builder
methods today, here is how a full minimalization of the public methods would look like (at this point the AttributeMap
is rather an AttributeResolver
): palesz@fe814ec
Nonetheless, I can keep the current Map
implementation and create a new resolve
method for the recursive resolution, switch out all the get/getOrDefault
calls to resolve
and leave the get
s there between the other unused methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -316,10 +316,11 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur | |||
Map<Expression, Node<?>> missing = new LinkedHashMap<>(); | |||
|
|||
o.order().forEach(oe -> { | |||
Expression e = oe.child(); | |||
final Expression e = oe.child(); | |||
final Expression resolvedE = attributeRefs.getOrDefault(e, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above - it's misleading for a map lookup to do resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Batch refs = new Batch("Replace References", Limiter.ONCE, | ||
new ReplaceReferenceAttributeWithSource() | ||
); | ||
new ReplaceReferenceAttributeWithSource() | ||
); | ||
|
||
Batch operators = new Batch("Operator Optimization", | ||
// combining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -222,7 +222,7 @@ public LogicalPlan apply(LogicalPlan plan) { | |||
AttributeMap.Builder<Expression> builder = AttributeMap.builder(); | |||
// collect aliases | |||
plan.forEachExpressionUp(Alias.class, a -> builder.put(a.toAttribute(), a.child())); | |||
final Map<Attribute, Expression> collectRefs = builder.build(); | |||
final AttributeMap<Expression> collectRefs = builder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -600,10 +600,12 @@ else if (target.foldable()) { | |||
else { | |||
GroupByKey matchingGroup = null; | |||
if (groupingContext != null) { | |||
matchingGroup = groupingContext.groupFor(target); | |||
final Expression resolvedTarget = queryC.aliases().getOrDefault(target, target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit of code is confusing due to the left overs.
Target doesn't seem to be used anywhere or at least its initial value is no good - instead of adding a resolvedTarget
just reinitialize it:
target = queryC.aliases().getOrDefault(target, target)
Same for id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2658,4 +2657,29 @@ public void testSubqueryWithAliasOrderByAlias() throws Exception { | |||
"( SELECT int AS i FROM test ) AS s " + | |||
"ORDER BY s.i > 10"); | |||
} | |||
|
|||
public void testReferenceResolutionInSubqueries() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please break this method into multiple tests that indicate the difference between them in their name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Recursive get/getDefault() to resolve() in AttributeMap Removed noise Test breakdown
@SuppressWarnings({ "unchecked" }) | ||
public E resolve(Object key, E defaultValue) { | ||
AttributeMap<Object> map = (AttributeMap<Object>) this; | ||
Object candidate = defaultValue; | ||
Object value = null; | ||
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) { | ||
key = candidate = value; | ||
} | ||
return (E) candidate; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this method builds on other types methods in this class, it shouldn't have unchecked warnings nor used NOT_FOUND
. Also not sure why the candidate
is returned instead of value
, should be the other way around.
Further more this implementation looks buggy: if the map contains (e,e)
calling resolve(e, default) would incorrectly return default
and not e
.
Further more there's no handling of cycles which leads to an infinite loop: (a, b), (b, a).
Below a suggestion to fix both issues:
@SuppressWarnings({ "unchecked" }) | |
public E resolve(Object key, E defaultValue) { | |
AttributeMap<Object> map = (AttributeMap<Object>) this; | |
Object candidate = defaultValue; | |
Object value = null; | |
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) { | |
key = candidate = value; | |
} | |
return (E) candidate; | |
} | |
public E resolveOrDefault(Object key, E defaultValue) { | |
E value = defaultValue; | |
E candidate; | |
int allowedLookups = 10; | |
while (key != value && ((candidate = get(key)) != null || containsKey(key))) { | |
if (--allowedLookups == 0) { | |
throw new QlIllegalArgumentException("Potential cycle detected"); | |
} | |
key = candidate; | |
value = candidate; | |
} | |
return value; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch with the default value. I was thinking of adding the limitation on the number of lookups, but seemed we would need quite a huge bug in our AttributeMap population for it to happen with more than one hops in the cycle. Anyways, I added this defensive check.
The suggested code won't really work, because:
1.resolveOrDefault(e, e)
will always return e
even if the map has e -> f
mapping.
2. The loop will always terminate after the first lookup (key == value == candidate
)
Anyways, the updated code fixes the bug you found.
@@ -61,6 +63,30 @@ public void testAttributeMapWithSameAliasesCanResolveAttributes() { | |||
assertTrue(newAttributeMap.get(param2.toAttribute()) == param2.child()); | |||
} | |||
|
|||
public void testResolve() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see more tests for dealing with:
a. same key, value association - (e,e)
b. handling of default value across association: resolveOrDefault(k, default) for a map with (k, default), (default, value) and also resolveOrDefault(k, k)
c. handling of cycles: (a, b), (b, c), (c,a ) with a mixture of default value in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✓ Done.
@@ -406,7 +411,7 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Set<Expressio | |||
|
|||
// resolve FunctionAttribute to backing functions | |||
if (e instanceof ReferenceAttribute) { | |||
e = attributeRefs.get(e); | |||
e = attributeRefs.resolve(e, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null variant is used enough times to make a resolve
method, in addition to resolveOrDefault
on the main class similar to get
and getOrDefault
(even if it's just syntactic sugar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✓ Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (--allowedLookups == 0) { | ||
throw new QlIllegalArgumentException("Potential cycle detected"); | ||
} | ||
key = value = candidate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please break the assignment in two pieces.
…astic#69765) Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation. For example the query ```sql SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j ``` failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization: ``` OrderBy[[Order[j{r}elastic#4,ASC,LAST]]] ! OrderBy[[Order[i{r}elastic#2,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! ``` By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above: ``` OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]] = OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! ``` The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated. Fixes elastic#67237
…astic#69765) Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation. For example the query ```sql SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j ``` failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization: ``` OrderBy[[Order[j{r}elastic#4,ASC,LAST]]] ! OrderBy[[Order[i{r}elastic#2,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! ``` By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above: ``` OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]] = OrderBy[[Order[test.int{f}elastic#22,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] \_EsRelation[test][date{f}elastic#6, some{f}elastic#7, some.string{f}elastic#8, some.string..] ! ``` The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated. Fixes elastic#67237
…9765) (#70325) Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation. For example the query ```sql SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j ``` failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization: ``` OrderBy[[Order[j{r}#4,ASC,LAST]]] ! OrderBy[[Order[i{r}#2,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! ``` By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above: ``` OrderBy[[Order[test.int{f}#22,ASC,LAST]]] = OrderBy[[Order[test.int{f}#22,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! ``` The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated. Fixes #67237
…9765) (#70322) Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation. For example the query ```sql SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j ``` failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization: ``` OrderBy[[Order[j{r}#4,ASC,LAST]]] ! OrderBy[[Order[i{r}#2,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! ``` By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above: ``` OrderBy[[Order[test.int{f}#22,ASC,LAST]]] = OrderBy[[Order[test.int{f}#22,ASC,LAST]]] \_Project[[j]] = \_Project[[j]] \_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] ! ``` The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated. Fixes #67237
Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because
ReferenceAttribute
s were pointing to non-existing attributes during query translation.For example the query
failed during translation because the
OrderBy
resolved thej
ReferenceAttribute to anotheri
ReferenceAttribute that was later removed by an Optimization:By resolving the
Attributes
recursively bothj{r}
andi{r}
will resolve totest.int{f}
above:The scope of recursive resolution depends on how the
AttributeMap
is constructed and populated.Fixes #67237