Move INSERT & REPLACE validation to the Calcite validator #15908
Conversation
…ct source node. Implemented a validateInsert method in DruidSqlValidator to validate the respective node, and moved much of the validation previously done in the ingest handlers into this overridden method. In the next commit I will try to pull some of this validation back out to make this code change smaller and easier to review.
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlToRelConverter.java
sql/src/main/java/org/apache/druid/sql/calcite/planner/IngestHandler.java
Outdated
sql/src/main/java/org/apache/druid/sql/calcite/planner/IngestHandler.java
left some comments; I'm not sure I understand every bit of it...
is this code covered by tests?
there are quite a few new invalid-state checks; it would be nice to cover at least some of them with directed tests
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
Outdated
(SqlNodeList) operands[5],
// Must match DruidSqlReplace.getOperandList()
operands[6],
null // fix this
what needs to be fixed? I would recommend either using FIXME and addressing it before merging the patch, or removing this comment and making sure the bad case can't happen by throwing a reasonable exception
I think that in order for this to work properly, the exportFileFormat needs to be changed into an SqlNode and added as an operand. Without this, I don't think parameterized queries using export capabilities will work. cc @adarshsanjeev
fixed
{
  return new DruidSqlReplace(
      pos,
      // Must match SqlInsert.getOperandList()
I don't really understand these comments - they don't help me understand... do we need them?
you could create a bunch of local variables with the cast types and name them accordingly - that might help, or even provide a place to add comments...
removed the comments, let me know if it's ok now.
if (!query.isA(SqlKind.QUERY)) {
  throw InvalidSqlInput.exception("Unexpected SQL statement type [%s], expected it to be a QUERY", query.getKind());
}
return DruidSqlInsert.create(new SqlInsert(
I think the contents of this method could just be pushed into DruidSqlInsert; it seems like all of this is supposed to belong there...
...or there is something I've missed...
See comment here: #15908 (comment)
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
}
}

private void validateSegmentGranularity(
seeing a method which is void and named validateSegmentGranularity, I have a feeling that the method has lost its context;
meaning: ideally this should be named something like getEffectiveGranularity and return a Granularity - but in the process of getting the actual granularity it must also identify invalid cases...
do you see any possible better place for it?
Good point! Fixed, let me know if it's better now.
final List<RelDataTypeField> sourceFields = sourceType.getFieldList();
for (final RelDataTypeField sourceField : sourceFields) {
  // Check that there are no unnamed columns in the insert.
  if (UNNAMED_COLUMN_PATTERN.matcher(sourceField.getName()).matches()) {
are there any tests for this exception? what if the pattern doesn't match anymore?
Yes! there are tests in CalciteInsertDmlTest, for example, org.apache.druid.sql.calcite.CalciteInsertDmlTest#testInsertWithUnnamedColumnInSelectStatement
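For reference, the check under discussion can be sketched outside of Calcite. This is a hypothetical, self-contained illustration: the pattern and helper name here are assumptions (the exact UNNAMED_COLUMN_PATTERN in DruidSqlValidator may differ), relying on Calcite's convention of naming anonymous SELECT expressions EXPR$0, EXPR$1, and so on:

```java
import java.util.List;
import java.util.regex.Pattern;

public class UnnamedColumnCheck {
  // Assumed shape of the pattern: Calcite auto-names unnamed SELECT
  // expressions as EXPR$0, EXPR$1, ... so the validator can detect them.
  static final Pattern UNNAMED_COLUMN_PATTERN =
      Pattern.compile("^EXPR\\$\\d+$", Pattern.CASE_INSENSITIVE);

  // Returns true if any source field name looks like an unnamed column.
  static boolean hasUnnamedColumn(List<String> fieldNames) {
    for (String name : fieldNames) {
      if (UNNAMED_COLUMN_PATTERN.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // "SELECT __time, 1+1 FROM foo" would surface field names like [__time, EXPR$1]
    System.out.println(hasUnnamedColumn(List.of("__time", "EXPR$1")));  // true
    System.out.println(hasUnnamedColumn(List.of("__time", "added")));   // false
  }
}
```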
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
if (query instanceof SqlOrderBy) {
  SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
  SqlNodeList orderByList = sqlOrderBy.orderList;
  if (!(orderByList == null || orderByList.equals(SqlNodeList.EMPTY))) {
why have all this here? isn't DruidSqlIngest subclassed to have DruidSqlInsert?
shouldn't this happen in DruidSqlInsert when it gets created?
doing it in DruidSqlInsert construction changes the output of unparse. Not sure if this is ok. There are a few tests that check the expected output of unparse in DruidSqlUnparseTest.java. I can update these tests, but I'm just not sure if doing this has other ramifications?
* allow granularity mismatch
* remove duplicate validation around unnamed columns
// Copied here from MSQE since that extension is not visible here.
public static final String CTX_ROWS_PER_SEGMENT = "msqRowsPerSegment";

public interface ValidatorContext
note: unused interface
+1
}
}

// The target namespace is both the target table ID and the row type for that table.
final SqlValidatorNamespace targetNamespace = getNamespace(insert);
could we copy getNamespaceOrThrow over from SqlValidatorImpl, or use requireNonNull (just to avoid a possible issue if it ends up being null)?
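The requireNonNull suggestion could look roughly like this. A minimal JDK-only sketch: the generic helper and its name are stand-ins for the real getNamespace lookup, not the actual Calcite API:

```java
import java.util.Objects;

public class NamespaceGuard {
  // Sketch of a getNamespaceOrThrow-style guard: fail fast with a
  // descriptive message instead of a later NullPointerException far
  // from the lookup site.
  static <T> T getNamespaceOrThrow(T maybeNamespace, String nodeDescription) {
    return Objects.requireNonNull(
        maybeNamespace,
        "namespace for " + nodeDescription + " must not be null"
    );
  }

  public static void main(String[] args) {
    System.out.println(getNamespaceOrThrow("targetTableNamespace", "INSERT target"));
  }
}
```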
// know names and we match by name.) Thus, we'd have to validate (to know names and types)
// to get the target types, but we need the target types to validate. Catch-22. So, we punt.
final SqlValidatorScope scope;
if (source instanceof SqlSelect) {
note: it seems like most of this was copied over from SqlValidatorImpl#validateInsert; extended / refactored / methods extracted / commented on...
I think it might be challenging to maintain this in the long run - however validateInsert doesn't seem to change very often.
I think it would be useful to leave some comments about the origins of this method in the Javadoc of the validateInsert method
for (final RelDataTypeField sourceField : sourceFields) {
  // Check that there are no unnamed columns in the insert.
  if (UNNAMED_COLUMN_PATTERN.matcher(sourceField.getName()).matches()) {
    throw InvalidSqlInput.exception(
could you replace these exceptions with ones communicating the offending SqlNode, if that's interesting/valuable?
if it's a general error it doesn't matter... but this error is specific to a selected column:

throw buildCalciteContextException(
    "Insertion requires columns to be named....",
    getSqlNodeFor(insert, sourceFields.indexOf(sourceField))
);

rough getSqlNodeFor method:

SqlNode getSqlNodeFor(SqlInsert insert, int idx)
{
  SqlNode src = insert.getSource();
  if (src instanceof SqlSelect) {
    SqlSelect sqlSelect = (SqlSelect) src;
    SqlNodeList selectList = sqlSelect.getSelectList();
    if (idx < selectList.size()) {
      return selectList.get(idx);
    }
  }
  return src;
}
@@ -1765,7 +1762,6 @@ public void testErrorWhenInputSourceInvalid()
    + "partitioned by DAY\n"
    + "clustered by channel";
HashMap<String, Object> context = new HashMap<>(DEFAULT_CONTEXT);
I can't comment on the testInsertWithInvalidColumnNameInIngest testcase; but the intention of the check is to ensure that something like:
INSERT INTO t SELECT __time, 1+1 FROM foo PARTITIONED BY ALL
is caught; could you change or add something like this as a testcase?
@zachjsh - This was merged without a committer approval. Can you revert it and open another PR so that you can get a committer approval?
Retroactive approval - LGTM overall. Left a few minor comments which can be addressed in a follow-up.
public void testInsertHourGrain()
{
  testIngestionQuery()
      .sql("INSERT INTO hourDs\n" +
In a follow-up, could you also please add a test for a REPLACE query where the PARTITIONED BY clause is omitted?
}

/**
 * If the segment grain is given in the catalog then use this value is used.
Suggested change:
- * If the segment grain is given in the catalog then use this value is used.
+ * If the segment grain is given in the catalog and absent in the PARTITIONED BY clause in the query, then use the value from the catalog.
@Nullable SqlIdentifier exportFileFormat,
@Nullable SqlNode replaceTimeQuery
curious, is the order of args swapped for a reason?
@@ -1679,7 +1679,6 @@ public void testErrorWithUnableToConstructColumnSignatureWithExtern()
    + "partitioned by DAY\n"
    + "clustered by channel";
HashMap<String, Object> context = new HashMap<>(DEFAULT_CONTEXT);
context.put(PlannerContext.CTX_SQL_OUTER_LIMIT, 100);
Curious, are these context parameters not required by these tests anymore?
…urce input expressions (#15962)
* address remaining comments from #15836
* address remaining comments from #15908
* add test that exposes relational algebra issue
* simplify test exposing issue
* fix
* add tests for sealed / non-sealed
* update test descriptions
* fix test failure when -Ddruid.generic.useDefaultValueForNull=true
* check type assignment based on native Druid types
* add tests that cover missing jacoco coverage
* add replace tests
* add more tests and comments about column ordering
* simplify tests
* review comments
* remove commented line
* STRING family types should be validated as non-null
Description
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner, #13686 from @paul-rogers. It refactors the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the Calcite validator. The validation implemented in this PR covers only the source node, plus some of the validation previously done in the ingest handlers. As part of this change, the PARTITIONED BY clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest-time query in that case.
This PR has: