-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL version of unnest native druid function #13576
SQL version of unnest native druid function #13576
Conversation
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidUnnestDatasourceRel.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 🚀
I suggest adding some more tests with some other query types such as topN or group by
.idea/misc.xml
Outdated
@@ -46,7 +46,7 @@ | |||
<option name="myDefaultNotNull" value="javax.annotation.Nonnull" /> | |||
<option name="myNullables"> | |||
<value> | |||
<list size="12"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume changes to this file were unintended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
// This exception is caught while returning false from isValidDruidQuery() method | ||
if (ref.getField().getIndex() > rowSignature.size()) { | ||
throw new CannotBuildQueryException( | ||
"Cannot build query as index is higher than row size" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest showing column/field name that is missing from row signature instead of saying anything about index since is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
final String columnName = rowSignature.getColumnName(ref.getField().getIndex()); | ||
final Optional<ColumnType> columnType = rowSignature.getColumnType(ref.getField().getIndex()); | ||
if (columnName == null) { | ||
throw new ISE("Expression referred to nonexistent index[%d]", ref.getField().getIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the error messages
public class DruidCorrelateUnnestRel extends DruidRel<DruidCorrelateUnnestRel> | ||
{ | ||
// This may be needed for the explain plan later | ||
// private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__unnest__"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is this needed? else remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code has been modified
unnestDatasourceRel.getUnnestProject().getRowType() | ||
); | ||
|
||
String dimensionToUnnest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final? also suggest using inputToUnnest
or something since the input doesn't have to be a dimension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final Project project = Preconditions.checkNotNull(partialQuery.getUnnestProject(), "unnestProject"); | ||
|
||
if (partialQuery.getAggregate() != null) { | ||
throw new ISE("Cannot have both 'unnestProject' and 'aggregate', how can this be?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
@@ -54,6 +54,8 @@ | |||
private final RelNode scan; | |||
private final Filter whereFilter; | |||
private final Project selectProject; | |||
// add an unnestProject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove comment
} | ||
|
||
|
||
// todo: make new projects with druidQueryRel projects + unnestRel projects shifted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this todo seems done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java
Fixed
Show resolved
Hide resolved
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java
Fixed
Show fixed
Hide fixed
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java
Fixed
Show fixed
Hide fixed
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java
Fixed
Show fixed
Hide fixed
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java
Fixed
Show fixed
Hide fixed
.idea/misc.xml
Outdated
<component name="SuppressKotlinCodeStyleNotification"> | ||
<option name="disableForAll" value="true" /> | ||
</component> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file appear to continue to exist in this PR, calling it out as it should likely be reverted before a merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I'll merge the master into this branch and update this
@@ -125,7 +125,7 @@ private static boolean rowsEqual(final Iterable<Object[]> rowsA, final Iterable< | |||
final Object[] rowA = listA.get(i); | |||
final Object[] rowB = listB.get(i); | |||
|
|||
if (!Arrays.equals(rowA, rowB)) { | |||
if (!Arrays.deepEquals(rowA, rowB)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a hashCode
method that claims that it is compatible with this method. I'm not sure that is actually true anymore because I don't believe it walks into Arrays-of-Arrays the same way that this one does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change got percolated here, I'll move to the equals
here
@@ -181,7 +181,10 @@ public void inspectRuntimeShape(RuntimeShapeInspector inspector) | |||
@Override | |||
public Object getObject() | |||
{ | |||
if (indexedIntsForCurrentRow == null) { | |||
if (dimSelector.getObject() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we switch away from using the stored reference? getObject()
can do work, work we've already done, doing the same work multiple times is bad.
It looks to me like you just needed to switch it to be
if (indexedIntsForCurrentRow == null || indexedIntsForCurrentRow.size() == 0)
And then, you could perhaps make it even simpler if you check the size once when setting indexedIntsForCurrentRow
and set to null
when size is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving to stored reference and setting the indexedIntsForCurrentRow
to null at the time of assignment if there are no elements
// to support rows which have only null values | ||
// need to check if the value is not null and the size is greater than 0 | ||
if (indexedIntsForCurrentRow != null && indexedIntsForCurrentRow.size() > 0) { | ||
return indexedIntsForCurrentRow.get(index); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we be getting any rows at all when unnesting a data point that was null to begin with? That is, if we had the array-of-null, we should be getting an IndexedInts back with 1 entry, which is null. If we didn't have anything to begin with, then we should be getting back null
and probably don't have any work to do on the row anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is misleading, I'll update the comment we process the row only if it is not null and has atleast 1 value
int k = 0; | ||
while (!unnestCursor.isDone()) { | ||
if (k < 8) { | ||
Assert.assertEquals(unnestDimSelector.getValue(), expectedResults.get(k).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have expected and actual inverted. The junit assert has expected come first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, actually in this file all were inverted. Good catch ! I'll update all of them
// This case arises in the case of a correlation where the rexNode points to a table from the left subtree | ||
// while the underlying datasource is the scan stub created from LogicalValuesRule | ||
// In such a case we throw a CannotBuildQueryException so that Calcite does not go ahead with this path | ||
// This exception is caught while returning false from isValidDruidQuery() method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this comment has been separated from its home. Is it inside of the if down below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to inside of the if
); | ||
} | ||
|
||
final String columnName = rowSignature.getColumnName(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we just search the rowSignature for the name? Why will we get a different name back than the one that we searched for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used the rexNodes name to find the index before. Now we are just using the column name of the row signature at the particular index
if (columnName == null) { | ||
throw new ISE("Expression referred to nonexistent index [%d] in row [%s]", index, rowSignature); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this if block can ever be run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True we are ensuring the index is always within the rowSignature length, will remove this
throw new ISE("Expression referred to nonexistent index [%d] in row [%s]", index, rowSignature); | ||
} | ||
|
||
return DruidExpression.ofColumn(columnType.orElse(null), columnName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we ever not have the columnType? Why do we need to orElse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the index in bounds this should be just columnType.get()
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java
Fixed
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went to merge this, then realized that there isn't a great commit message to combine all of the commits under. I tend to try to write the very first commit message to be something that can be used for the whole PR, but here each of the commit messages tends to be a terse statement of fact. So, instead of merge, I'm going to just approve.
If that lets you merge, then please merge and come up with a good commit message for the squashed files. If you still cannot merge, please write a good commit message as a comment in the PR here and I will merge with that.
This PR implements the SQL component of the native unnest functionality in Druid. Unnest in SQL through Calcite has been implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part). Here we have introduced rules to handle unnest SQL queries on a table dimension, virtual column or a constant array and appropriate rels to convert them into Druid convention to translate correctly into native Druid queries. |
* adds the SQL component of the native unnest functionality in Druid to unnest SQL queries on a table dimension, virtual column or a constant array and convert them into native Druid queries * unnest in SQL is implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part)
The goal here is to develop rules to add
LogicalCorrelate
andUncollect
to support the query form we have decided for SQL Unnest.The SQL query that takes the form
SELECT * FROM UNNEST(ARRAY['1','2','3'])
orSELECT * FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3))
generates two components in the Calcite's logical plan (LogicalCorrelate
andUncollect
) which do not have DruidConverters.The logical plans that calcite generates for these type of queries are
and
So we add a set of rules which helps convert these to Druid queries. In a nutshell the chain of rules look like the following:
In this PR we define the rules and develop appropriate DruidRels with partial queries that converts these plans to druid queries. Testcases are added to establish the different use cases we deal with
This PR has: