SQL version of unnest native druid function #13576

somu-imply · 2022-12-15T22:25:29Z

The goal here is to develop rules that add support for LogicalCorrelate and Uncollect, so that the query form we have decided on for SQL UNNEST can be planned.

A SQL query of the form SELECT * FROM UNNEST(ARRAY['1','2','3']) or SELECT * FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) generates two components in Calcite's logical plan (LogicalCorrelate and Uncollect) that do not have DruidConverters.

The logical plans that Calcite generates for these types of queries are as follows.

SELECT * FROM UNNEST(ARRAY['1','2','3'])

Generates the plan

25:Uncollect
  23:LogicalProject(subset=[rel#24:Subset#1.NONE.[]], EXPR$0=[ARRAY('1', '2', '3')])
    4:LogicalValues(subset=[rel#22:Subset#0.NONE.[0]], tuples=[[{ 0 }]])

and

SELECT * FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3))

Generates

80:LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
  6:LogicalTableScan(subset=[rel#74:Subset#0.NONE.[]], table=[[druid, numfoo]])
  78:Uncollect(subset=[rel#79:Subset#3.NONE.[]])
    76:LogicalProject(subset=[rel#77:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
      7:LogicalValues(subset=[rel#75:Subset#1.NONE.[0]], tuples=[[{ 0 }]])

So we add a set of rules that help convert these plans to Druid queries. In a nutshell, the chain of rules looks like the following:

In this PR we define the rules and develop appropriate DruidRels with partial queries that convert these plans to Druid queries. Test cases are added to establish the different use cases we deal with.
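As a rough illustration of what the two plan nodes mean at the row level (a sketch of the semantics only, not the planner rules or rels from this PR): Uncollect turns one array value into one row per element, and an inner Correlate re-evaluates its right side once per left row and pairs the results. The class and method names below are hypothetical, rows are modelled as plain lists, and treating a null array like an empty one is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical row-level model of Correlate + Uncollect; not Druid or Calcite code.
class UnnestSemanticsSketch
{
  // Uncollect: one output row per array element (assumption: null behaves like empty).
  static List<Object> uncollect(List<?> array)
  {
    List<Object> elements = new ArrayList<>();
    if (array != null) {
      elements.addAll(array);
    }
    return elements;
  }

  // Inner Correlate: for every left row, unnest the required column and
  // append each element to a copy of that left row.
  static List<List<Object>> correlateUnnest(List<List<Object>> leftRows, int requiredColumn)
  {
    List<List<Object>> out = new ArrayList<>();
    for (List<Object> left : leftRows) {
      List<?> array = (List<?>) left.get(requiredColumn);
      for (Object element : uncollect(array)) {
        List<Object> joined = new ArrayList<>(left);
        joined.add(element);
        out.add(joined);
      }
    }
    return out;
  }

  public static void main(String[] args)
  {
    List<List<Object>> numfoo = Arrays.asList(
        Arrays.<Object>asList("row1", Arrays.asList("a", "b")),
        Arrays.<Object>asList("row2", Arrays.asList())
    );
    // row1 yields two joined rows; row2 has nothing to unnest and yields none.
    for (List<Object> row : correlateUnnest(numfoo, 1)) {
      System.out.println(row);
    }
  }
}
```

The requiredColumn argument plays the role of requiredColumns=[{3}] in the second plan: it is the one left-side column that the right side needs in order to be evaluated.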

This PR has:

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidUnnestDatasourceRel.java

clintropolis

nice 🚀

I suggest adding some more tests with some other query types such as topN or group by

clintropolis · 2022-12-22T21:34:19Z

.idea/misc.xml

@@ -46,7 +46,7 @@
    <option name="myDefaultNotNull" value="javax.annotation.Nonnull" />
    <option name="myNullables">
      <value>
-        <list size="12">


I assume changes to this file were unintended?

clintropolis · 2023-01-03T20:29:01Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+    // This exception is caught while returning false from isValidDruidQuery() method
+    if (ref.getField().getIndex() > rowSignature.size()) {
+      throw new CannotBuildQueryException(
+          "Cannot build query as index is higher than row size"


suggest showing the column/field name that is missing from the row signature instead of saying anything about the index, since that is confusing
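A minimal sketch of the kind of message being suggested, assuming the row signature can be modelled as a plain list of column names. FieldLookupSketch and resolveColumn are hypothetical names, and IllegalStateException stands in for whatever exception type the planner actually uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the signature is modelled as a list of column names.
class FieldLookupSketch
{
  static String resolveColumn(List<String> signatureColumns, String fieldName, int index)
  {
    // A valid index is strictly less than the number of columns.
    if (index < 0 || index >= signatureColumns.size()) {
      throw new IllegalStateException(
          String.format(
              "Cannot build query: field [%s] is not present in row signature %s",
              fieldName,
              signatureColumns
          )
      );
    }
    return signatureColumns.get(index);
  }

  public static void main(String[] args)
  {
    System.out.println(resolveColumn(Arrays.asList("dim1", "dim3"), "dim3", 1));
  }
}
```

Naming the field and printing the signature tells the reader what was looked for and where, which a bare index does not.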

clintropolis · 2023-01-03T20:29:10Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+    final String columnName = rowSignature.getColumnName(ref.getField().getIndex());
+    final Optional<ColumnType> columnType = rowSignature.getColumnType(ref.getField().getIndex());
+    if (columnName == null) {
+      throw new ISE("Expression referred to nonexistent index[%d]", ref.getField().getIndex());


same comment about error message

Fixed the error messages

clintropolis · 2023-01-03T20:29:50Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java

+public class DruidCorrelateUnnestRel extends DruidRel<DruidCorrelateUnnestRel>
+{
+  // This may be needed for the explain plan later
+  // private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__unnest__");


nit: is this needed? else remove

The code has been modified

clintropolis · 2023-01-03T20:30:31Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java

+        unnestDatasourceRel.getUnnestProject().getRowType()
+    );
+
+    String dimensionToUnnest;


nit: final? also suggest using inputToUnnest or something since the input doesn't have to be a dimension

clintropolis · 2023-01-03T20:35:52Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java

+    final Project project = Preconditions.checkNotNull(partialQuery.getUnnestProject(), "unnestProject");
+
+    if (partialQuery.getAggregate() != null) {
+      throw new ISE("Cannot have both 'unnestProject' and 'aggregate', how can this be?");


clintropolis · 2023-01-03T20:36:32Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/PartialDruidQuery.java

@@ -54,6 +54,8 @@
  private final RelNode scan;
  private final Filter whereFilter;
  private final Project selectProject;
+  // add an unnestProject


nit: remove comment

clintropolis · 2023-01-03T20:37:38Z

sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidCorrelateUnnestRule.java

+    }
+
+
+    // todo: make new projects with druidQueryRel projects + unnestRel projects shifted


nit: this todo seems done?

imply-cheddar · 2023-01-19T02:10:10Z

.idea/misc.xml

+  <component name="SuppressKotlinCodeStyleNotification">
+    <option name="disableForAll" value="true" />
+  </component>


The changes in this file appear to continue to exist in this PR, calling it out as it should likely be reverted before a merge.

Good call, I'll merge the master into this branch and update this

imply-cheddar · 2023-01-19T02:15:13Z

processing/src/main/java/org/apache/druid/query/InlineDataSource.java

@@ -125,7 +125,7 @@ private static boolean rowsEqual(final Iterable<Object[]> rowsA, final Iterable<
        final Object[] rowA = listA.get(i);
        final Object[] rowB = listB.get(i);

-        if (!Arrays.equals(rowA, rowB)) {
+        if (!Arrays.deepEquals(rowA, rowB)) {


There's a hashCode method that claims that it is compatible with this method. I'm not sure that is actually true anymore because I don't believe it walks into Arrays-of-Arrays the same way that this one does.

This change got percolated here, I'll move to the equals here
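For context on the concern above, here is a small standalone sketch of how the shallow and deep variants in java.util.Arrays differ on rows that contain nested arrays. It does not show what InlineDataSource's own hashCode does; it only illustrates why an equals based on deepEquals needs a hash that also walks nested arrays. The class name is hypothetical.

```java
import java.util.Arrays;

// Standalone illustration; not code from InlineDataSource.
class NestedArrayEquality
{
  static boolean shallowEqual(Object[] a, Object[] b)
  {
    // Nested arrays are compared with Object.equals, i.e. by reference.
    return Arrays.equals(a, b);
  }

  static boolean deepEqual(Object[] a, Object[] b)
  {
    // Nested arrays are compared element by element.
    return Arrays.deepEquals(a, b);
  }

  public static void main(String[] args)
  {
    Object[] rowA = {"x", new Object[]{"a", "b"}};
    Object[] rowB = {"x", new Object[]{"a", "b"}};

    System.out.println(shallowEqual(rowA, rowB)); // false
    System.out.println(deepEqual(rowA, rowB));    // true
    // deepHashCode is the hash that stays consistent with deepEquals.
    System.out.println(Arrays.deepHashCode(rowA) == Arrays.deepHashCode(rowB)); // true
  }
}
```

Arrays.hashCode, by contrast, uses the identity-based hashCode of each nested array, so two rows that are deepEquals are not guaranteed to hash alike under it.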

imply-cheddar · 2023-01-19T02:18:34Z

processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java

@@ -181,7 +181,10 @@ public void inspectRuntimeShape(RuntimeShapeInspector inspector)
          @Override
          public Object getObject()
          {
-            if (indexedIntsForCurrentRow == null) {
+            if (dimSelector.getObject() == null) {


Why did we switch away from using the stored reference? getObject() can do work, work we've already done, doing the same work multiple times is bad.

It looks to me like you just needed to switch it to be

if (indexedIntsForCurrentRow == null || indexedIntsForCurrentRow.size() == 0)

And then, you could perhaps make it even simpler if you check the size once when setting indexedIntsForCurrentRow and set to null when size is 0.

Moving to stored reference and setting the indexedIntsForCurrentRow to null at the time of assignment if there are no elements
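The approach described in this thread can be sketched as follows, under the assumption that the per-row values can be modelled as an int[] (a stand-in for Druid's IndexedInts). UnnestRowStateSketch and its methods are hypothetical names, not the cursor's real API.

```java
// Hypothetical model of "normalize once at assignment, then only check for null".
class UnnestRowStateSketch
{
  // Stand-in for indexedIntsForCurrentRow; null means "nothing to unnest".
  private int[] valuesForCurrentRow;

  // Called once per row: an empty row is stored as null, so readers
  // never need a separate size check.
  void advanceTo(int[] rowValues)
  {
    if (rowValues == null || rowValues.length == 0) {
      valuesForCurrentRow = null;
    } else {
      valuesForCurrentRow = rowValues;
    }
  }

  boolean hasValues()
  {
    return valuesForCurrentRow != null;
  }

  // Returns null when the current row has no values.
  Integer valueAt(int index)
  {
    if (valuesForCurrentRow == null) {
      return null;
    }
    return valuesForCurrentRow[index];
  }

  public static void main(String[] args)
  {
    UnnestRowStateSketch state = new UnnestRowStateSketch();
    state.advanceTo(new int[0]);
    System.out.println(state.hasValues()); // false
    state.advanceTo(new int[]{7, 9});
    System.out.println(state.valueAt(1));  // 9
  }
}
```

Doing the emptiness check at assignment time keeps every later read to a single null comparison on the stored reference, without calling back into the selector.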

imply-cheddar · 2023-01-19T02:23:13Z

processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java

+      // to support rows which have only null values
+      // need to check if the value is not null and the size is greater than 0
+      if (indexedIntsForCurrentRow != null && indexedIntsForCurrentRow.size() > 0) {
+        return indexedIntsForCurrentRow.get(index);
+      }


Why would we be getting any rows at all when unnesting a data point that was null to begin with? That is, if we had the array-of-null, we should be getting an IndexedInts back with 1 entry, which is null. If we didn't have anything to begin with, then we should be getting back null and probably don't have any work to do on the row anyway?

The comment is misleading. I'll update it to say that we process the row only if it is not null and has at least 1 value.

imply-cheddar · 2023-01-19T02:24:32Z

processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java

+    int k = 0;
+    while (!unnestCursor.isDone()) {
+      if (k < 8) {
+        Assert.assertEquals(unnestDimSelector.getValue(), expectedResults.get(k).toString());


I think you have expected and actual inverted. The junit assert has expected come first.

Thanks. Actually, all of the asserts in this file were inverted. Good catch! I'll update all of them.
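To show why the argument order matters, here is a self-contained sketch that uses a hypothetical assertEquals helper. It only imitates the expected-first convention and a simplified failure message; it is not JUnit's implementation.

```java
import java.util.Objects;

// Hypothetical stand-in for an expected-first assertion helper.
class AssertOrderSketch
{
  static void assertEquals(Object expected, Object actual)
  {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
    }
  }

  // Returns the failure message, or null when the values match.
  static String failureMessage(Object expected, Object actual)
  {
    try {
      assertEquals(expected, actual);
      return null;
    }
    catch (AssertionError e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args)
  {
    String expectedFromFixture = "a";
    String actualFromSelector = "b";
    // Correct order: the message blames the selector's value.
    System.out.println(failureMessage(expectedFromFixture, actualFromSelector));
    // Inverted order: the same mismatch is reported the wrong way round.
    System.out.println(failureMessage(actualFromSelector, expectedFromFixture));
  }
}
```

With the arguments inverted the test still passes or fails in the same cases, but a failure report points at the fixture instead of the code under test.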

imply-cheddar · 2023-01-19T02:27:10Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+    // This case arises in the case of a correlation where the rexNode points to a table from the left subtree
+    // while the underlying datasource is the scan stub created from LogicalValuesRule
+    // In such a case we throw a CannotBuildQueryException so that Calcite does not go ahead with this path
+    // This exception is caught while returning false from isValidDruidQuery() method


I feel like this comment has been separated from its home. Is it inside of the if down below?

Moving this to inside of the if

imply-cheddar · 2023-01-19T02:28:33Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+      );
+    }
+
+    final String columnName = rowSignature.getColumnName(index);


Didn't we just search the rowSignature for the name? Why will we get a different name back than the one that we searched for?

We used the rexNode's name to find the index before. Now we are just using the column name of the row signature at the particular index.

imply-cheddar · 2023-01-19T02:29:18Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+    if (columnName == null) {
+      throw new ISE("Expression referred to nonexistent index [%d] in row [%s]", index, rowSignature);
+    }


I don't believe this if block can ever be run.

True, we are ensuring the index is always within the rowSignature length. Will remove this.

imply-cheddar · 2023-01-19T02:29:47Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/Expressions.java

+      throw new ISE("Expression referred to nonexistent index [%d] in row [%s]", index, rowSignature);
+    }
+
+    return DruidExpression.ofColumn(columnType.orElse(null), columnName);


Why would we ever not have the columnType? Why do we need to orElse it?

With the index in bounds this should be just columnType.get()
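As a plain java.util.Optional illustration of the two access styles discussed here (whether a column type can actually be absent for an in-bounds index is not established in this thread, so treat that as an open assumption): orElse(null) silently hands a null onward, while get() fails immediately if the value is missing. The class and method names are hypothetical, and String stands in for the column type.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

// Standalone Optional illustration; String stands in for the column type.
class OptionalAccessSketch
{
  // Lenient: an absent type becomes null and travels onward unnoticed.
  static String lenient(Optional<String> columnType)
  {
    return columnType.orElse(null);
  }

  // Strict: an absent type fails at the point of access.
  static String strict(Optional<String> columnType)
  {
    return columnType.get();
  }

  static boolean strictThrowsOnEmpty()
  {
    try {
      strict(Optional.empty());
      return false;
    }
    catch (NoSuchElementException e) {
      return true;
    }
  }

  public static void main(String[] args)
  {
    System.out.println(lenient(Optional.empty()));   // null
    System.out.println(strict(Optional.of("LONG"))); // LONG
    System.out.println(strictThrowsOnEmpty());       // true
  }
}
```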

cheddar

I went to merge this, then realized that there isn't a great commit message to combine all of the commits under. I tend to try to write the very first commit message to be something that can be used for the whole PR, but here each of the commit messages tends to be a terse statement of fact. So, instead of merge, I'm going to just approve.

If that lets you merge, then please merge and come up with a good commit message for the squashed files. If you still cannot merge, please write a good commit message as a comment in the PR here and I will merge with that.

somu-imply · 2023-01-23T17:43:43Z

This PR implements the SQL component of the native unnest functionality in Druid. Unnest in SQL through Calcite has been implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part). Here we have introduced rules to handle unnest SQL queries on a table dimension, virtual column or a constant array and appropriate rels to convert them into Druid convention to translate correctly into native Druid queries.

* adds the SQL component of the native unnest functionality in Druid to unnest SQL queries on a table dimension, virtual column or a constant array and convert them into native Druid queries
* unnest in SQL is implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part)

somu-imply added 10 commits December 11, 2022 20:34

Some changes for sql unnest

2e72fd3

Updating partial query

812133f

Updating a test case

80846d3

Adding select project of left to top of correlate

828bf95

Handling the correlate data type check to not blow up

2a308be

temp

cce6aa8

Working version of sql unnest

f0ac1a3

temp changes to debug virtual column creation

6385f16

Adding sql support for virtual columns

92e161a

Fixing some test cases

6767913

github-advanced-security bot found potential problems Dec 21, 2022

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java (Fixed)

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidUnnestDatasourceRel.java (Fixed)

clintropolis added the Area - SQL label Dec 22, 2022

clintropolis reviewed Jan 3, 2023

somu-imply added 4 commits January 4, 2023 15:21

More test cases and some changes to support virtual columns and null values

de943e9

fixing group by on unnested virtual columns

ff561dd

Making constructor for correlate slimmer

fbb5d5b

Fixed for grouping by on virtual columns for unnest

f820307

github-advanced-security bot found potential problems Jan 11, 2023

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java (Fixed)

Refactoring code to use abstract class correlate instead of LogicalCorrelate

94f47a5

github-advanced-security bot found potential problems Jan 12, 2023

somu-imply added 2 commits January 12, 2023 09:51

adding explain terms to fix the limit issue

3afd027

test case for unnest on top of a join datasource

6356ba3

somu-imply marked this pull request as ready for review January 12, 2023 20:30

Adding javadocs and fixing one test

8a854dc

github-advanced-security bot found potential problems Jan 13, 2023

processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java (Fixed)

processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java (Fixed)

somu-imply added 3 commits January 13, 2023 13:07

Removing two unused variables and one speel check

4f99571

Test with grouping on virtual column

0376170

Fixing a test case to generate more line coverage

a1e0ea8

imply-cheddar reviewed Jan 19, 2023

Addressing comments part 1

06b3811

somu-imply added 7 commits January 19, 2023 09:41

Merge remote-tracking branch 'upstream/master' into sql_unnnest_error_investigation

8c03d38

Updating the misc.xml

2bb22fa

Addressing review comments part 2

df05cf4

Fixing one code comment

533be68

spotbugs fix

16c750c

Removing throws clause in the clone method

bf0e27c

One last fix to reuse a columnName

301cbaa

cheddar approved these changes Jan 23, 2023

Updating the field expression to catch missing index and not pursuing that query path

5320349

clintropolis approved these changes Jan 23, 2023

somu-imply mentioned this pull request Jan 23, 2023

Additional native query tests for unnest datasource #13554

Merged

10 tasks

clintropolis merged commit 90d4455 into apache:master Jan 23, 2023

317brian mentioned this pull request Feb 1, 2023

docs: sql unnest and cleanup unnest datasource #13736

Merged

1 task

clintropolis added this to the 26.0 milestone Apr 10, 2023

techdocsmith mentioned this pull request Apr 12, 2023

[DRAFT] 26.0.0 release notes #14064

Closed
