-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate the catalog with the Calcite planner #13686
Conversation
sql/src/main/java/org/apache/druid/sql/calcite/table/DatasourceTable.java
Fixed
Show fixed
Hide fixed
server/src/main/java/org/apache/druid/catalog/model/facade/DatasourceFacade.java
Fixed
Show fixed
Hide fixed
extensions-core/druid-catalog/src/test/java/org/apache/druid/catalog/sql/LiveCatalogTest.java
Fixed
Show fixed
Hide fixed
...sions-core/druid-catalog/src/main/java/org/apache/druid/catalog/sql/LiveCatalogResolver.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Fixed
Show fixed
Hide fixed
* Validate INSERT, REPLACE against the catalog (partial) * Obtain partitioning, clustering, etc. from catalog * Strict schema enforcement on ingest * Parameters work with MSQ queries * String literals for PARTITIONED BY * Revise handlers to use new validation logic * Catalog-defined external tables as MSQ input sources * Array-valued function arguments (improvements) * Documentation
264aa28
to
e13fbba
Compare
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Fixed
Show fixed
Hide fixed
PARTITIONED BY 'HOUR' | ||
|
||
-- Or | ||
PARTITIOND BY 'PT1H' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it possible to make this not work? Like if the user wants to do a PARTITIOND BY <period>
they have to do PARTITIONED BY TIME_FLOOR(__time, <period>)
the reasoning for this is that people should really be sticking to the existing partition types. When people partition by whacky periods they are asking for trouble...
One of the biggest offenders of this that I have encountered is users wanting to partition by "week" (P1W or P7D). It seems like a cool zany middle ground between day and month, but in reality it is asking for trouble later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great comment. As it turns out, allowing a string doesn't change much. The user can still say TIME_FLOOR(__time, 'P1W')
. There is code within the new validation code to accept a limited set of times. So, if we don't like P1W
, we simply don't accept it. The code currently accepts only the strings that match the SQL symbols we already support.
This change, by the way is a "nice to have" to support moving validation out of ad-hoc code into the Calcite validator. Basically, Calcite literals know about strings, but not Druid Granularities
so we now translate the SQL literals to strings, and convert those literals to a Granularity
during validation. Previously, we created a Granularity
in the parser as a "special" field, but that doesn't play by the Calcite rules. Easy enough to not accept a string literal in the parser; leave it as the SQL keywords and TIME_FLOOR
.
Note that "partitioning" is defined as time partitioning. Thus, we don't really need the TIME_FLOOR(__time
part of the expression: we know that partitioning means __time
and and means using TIME_FLOOR
.
Also, in the catalog, partitioning is defined as an ISO 8601 string: P1D
. Given that, allowing the same form in the SQL syntax isn't quite as crazy as it might first look. The catalog, also, validates the string to ensure it is one of the supported forms.
That said, aside from the code, do we have a list of the "actually" supported forms? Does anyone actually partition by second or minute, say?
Build fixes Cleaned up warnings in some IT code
* Revised Broker Guice integration * Added flush and resync operations to catalog cache * Fixed issues with listing catalog tables in info schema * Tests for info schema with the catalog
Simple Python-based Druid SDK More catalog IT coverage Partial fixes to handling catalog partial tables Added a PARAMTERS table to INFORMATION_SCHEMA Changed info schema numeric columns to BIGINT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked over DruidSqlValidator. Had various commentary on error messages and a few other things. Nothing seemed egregious, mostly just suggestions about wording, etc.
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
Show resolved
Hide resolved
@paul-rogers , this PR #14023 moved some things around, including |
This pull request has been marked as stale due to 60 days of inactivity. |
… partitioning (#15836) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner #13686 from @paul-rogers, extending the PARTITION BY clause to accept string literals for the time partitioning
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner #13686 from @paul-rogers, Refactoring the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the calcite validator. The validation that is implemented as part of this pr, is only that for the source node, and some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest time query in this case.
This pull request/issue has been closed due to lack of activity. If you think that |
This pull request has been marked as stale due to 60 days of inactivity. |
This pull request/issue has been closed due to lack of activity. If you think that |
This PR is currently a draft. Resolving merge conflicts after splitting out some of the code to other PRs.
Prior PRs added the catalog (table metadata) foundations, and an improved set of table functions. This PR brings it all together:
To allow all the above to work:
Release note
This PR introduces the full catalog functionality. See the documentation files for the details. In this version, the catalog is an extension: you must enable the catalog extension to use the catalog. Enabling the extension creates an additional table in your metadata database. We consider the catalog to be experimental, and the metadata table schema is subject to change.
Table functions, introduced in a prior PR, are production ready and independent of the catalog. "Partial table functions" (define some of the properties in the catalog, some in SQL) are new in this PR and are experimental, along with the catalog itself.
Hints to reviewers
Much of this PR is doc files, test code and minor cleanup. The core changes (those that could break a running system if done wrong) are:
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/*
sql/src/main/*
The real core of this PR is
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
: the place where we moved the former ad-hocINSERT
andREPLACE
validation to instead run within the SQL validator.No runtime code was changed: all the non-trivial changes are in the SQL planner.
This PR has: