
Allow broker to use catalog for datasource schemas for SQL queries #15469

Merged: 3 commits merged into apache:master on Jan 8, 2024

Conversation

@jon-wei (Contributor) commented on Dec 1, 2023

This PR contains a portion of the changes from #13686, @paul-rogers's inactive draft PR for integrating the catalog with the Calcite planner. It allows the datasource table schemas defined in the catalog to be synced to the broker.

This PR does not include the MSQ INSERT and REPLACE validation using the catalog functionality from #13686; it is just the portion of changes needed to sync the queryable schemas to the broker.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions bot added labels Area - Batch Ingestion, Area - Querying, Area - Dependencies, and Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262) on Dec 1, 2023
Comment on lines +55 to +65
SqlSchema schema = SqlSchema.builder()
.column("__time", "TIMESTAMP(3) NOT NULL")
.column("extra1", "VARCHAR")
.column("dim2", "VARCHAR")
.column("dim1", "VARCHAR")
.column("cnt", "BIGINT NOT NULL")
.column("m1", "DOUBLE NOT NULL")
.column("extra2", "BIGINT NOT NULL")
.column("extra3", "VARCHAR")
.column("m2", "DOUBLE NOT NULL")
.build();

Check notice (Code scanning / CodeQL): Unread local variable (note, test)

Variable 'SqlSchema schema' is never read.
@jon-wei force-pushed the catalog_schema_sync branch 3 times, most recently from 8eb5d50 to 7f2bf29 on December 4, 2023 at 19:55
@zachjsh self-requested a review on December 5, 2023 at 17:39
import java.util.Map;
import java.util.Set;

public class LiveCatalogResolver implements CatalogResolver
Contributor: javadoc

/**
* Create a {@link DruidTable} based on the physical segments, catalog entry, or both.
*/
@Override
Contributor: @Nullable

columns.put(col.name(), colMetadata);
}

// Mark any hidden columns. Assumes that the hidden columns are a disjoint set
Contributor: Does anything enforce this disjoint property? What if hidden columns are also defined?

Contributor Author (@jon-wei): There isn't any enforcement of this right now. As-is, if hidden columns overlap with the declared columns, the hiding is ignored; the user would need to remove the column from the declared set for the hiding to take effect.
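For illustration, a minimal sketch of that rule, assuming hypothetical names (effectiveColumns, physical, declared, hidden); this is not the PR's actual code:

import java.util.LinkedHashSet;
import java.util.Set;

static Set<String> effectiveColumns(Set<String> physical, Set<String> declared, Set<String> hidden)
{
  // Catalog-declared columns are always visible.
  Set<String> result = new LinkedHashSet<>(declared);
  for (String col : physical) {
    // Hiding only takes effect for columns outside the declared set; a hidden
    // column that is also declared stays visible until it is removed from the
    // declared set.
    if (declared.contains(col) || !hidden.contains(col)) {
      result.add(col);
    }
  }
  return result;
}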

// Merge columns. All catalog-defined columns come first,
// in the order defined in the catalog.
final RowSignature.Builder builder = RowSignature.builder();
Map<String, EffectiveColumnMetadata> columns = new HashMap<>();
Contributor: Does inserting keys and values into the HashMap preserve the order in which they were added? I assume that the order of the columns in the effectiveMetadata is important?

Contributor Author (@jon-wei), Dec 7, 2023: columns is just used as a column-name lookup; the order-preserving component in a DatasourceTable is the RowSignature, which stores the columns in a list.
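As a small illustration using Druid's standard RowSignature API (the builder preserves insertion order, so column ordering never depends on the HashMap):

// The RowSignature builder keeps columns in the order they are added.
RowSignature.Builder builder = RowSignature.builder();
builder.add("__time", ColumnType.LONG);   // catalog-defined columns first,
builder.add("dim1", ColumnType.STRING);   // in the order defined in the catalog
RowSignature signature = builder.build();
// signature.getColumnNames() returns ["__time", "dim1"], in insertion order.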

@@ -86,7 +86,7 @@ public CatalogClient(
@Override
public List<TableMetadata> tablesForSchema(String dbSchema)
{
String url = StringUtils.replace(SCHEMA_SYNC_PATH, "{dbSchema}", dbSchema);
String url = StringUtils.replace(SCHEMA_SYNC_PATH, "{schema}", dbSchema);
Contributor: nit: Good catch. Should we move "{schema}" to a constant here?
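A sketch of that suggestion; the constant name SCHEMA_PLACEHOLDER is an assumption for illustration, not something this PR adds:

// Hypothetical shared constant for the URL placeholder.
private static final String SCHEMA_PLACEHOLDER = "{schema}";

// Both call sites would then become:
String url = StringUtils.replace(SCHEMA_SYNC_PATH, SCHEMA_PLACEHOLDER, dbSchema);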

@@ -96,7 +96,7 @@ public List<TableMetadata> tablesForSchema(String dbSchema)
@Override
public TableMetadata table(TableId id)
{
String url = StringUtils.replace(TABLE_SYNC_PATH, "{dbSchema}", id.schema());
String url = StringUtils.replace(TABLE_SYNC_PATH, "{schema}", id.schema());
Contributor: ditto

* Coordinator. To prevent slowing initial queries, this class loads the
* current catalog contents into the local cache on lifecycle start, which
* avoids the on-demand reads that would otherwise occur. After the first load,
* events from the Coordinator keep the local cache evergreen.
Contributor: Where are events from the Coordinator pushed after this receiver loads on lifecycle start? It may be good to note that here.
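For context, a minimal sketch of the load-then-listen pattern the javadoc describes; the method names are assumptions, and only the @LifecycleStart annotation is standard Druid:

@LifecycleStart
public void start()
{
  // Bulk-load the full catalog once so initial queries avoid on-demand reads.
  loadFullCatalog();
  // Afterwards, change events pushed from the Coordinator keep the cache fresh.
  listenForCoordinatorEvents();
}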

*/
private final String sqlType;
private final String dataType;
Contributor: What is the reasoning behind changing from the SQL type to the Druid type here?

Contributor Author (@jon-wei): It now accepts both SQL and Druid native type strings in the catalog definition.

.put(VARCHAR, ColumnType.STRING)
.build();
.put(SQL_BIGINT, ColumnType.LONG)
.put(SQL_VARCHAR, ColumnType.STRING)
Contributor: Do you mean to remove the mappings for SQL_FLOAT, SQL_DOUBLE, and TIMESTAMP here?

Contributor Author (@jon-wei): This part actually wasn't being used right now, so I removed it.

return null;
}
ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(sqlType));
ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(dataType));
Contributor: It looks like we changed the dataType of the ColumnSpec to be the Druid type instead of the SQL type now, so should we take the type as-is instead of looking up the mapping here?

Contributor Author (@jon-wei): It checks whether it's a SQL type first and attempts to convert; if it isn't, it does ColumnType.fromString(dataType) to treat it as a Druid native type.
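Paraphrased as a sketch (the constant and method names come from the diff above, but the exact control flow here is an assumption):

ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(dataType));
if (druidType == null) {
  // Not a recognized SQL type string: treat the value as a Druid native type
  // name such as "STRING", "LONG", or "DOUBLE".
  druidType = ColumnType.fromString(dataType);
}
return druidType;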

this.spec = spec;
if (Columns.isTimeColumn(spec.name()) && spec.dataType() == null) {
// For __time only, force a type if type is null.
this.sqlType = Columns.LONG;
Contributor: Should this be COLUMNS.SQL_BIGINT?

Contributor Author (@jon-wei): No, this is using the native type.

@@ -68,16 +68,6 @@ public class DatasourceDefn extends TableDefn
*/
public static final String HIDDEN_COLUMNS_PROPERTY = "hiddenColumns";

/**
* By default: columns are optional hints. If a datasource has columns defined,
Contributor: Why does this description no longer apply?

Contributor Author (@jon-wei): I'm not sure why this was removed in Paul's original PR.

/**
* Column type for external tables.
*/
public static final String EXTERNAL_COLUMN_TYPE = "extern";
Contributor: What is this used for?

Contributor Author (@jon-wei): This is from another part of the original PR related to external table handling; it isn't used currently, but future changes will use it.

return columns.get(name);
}

public boolean isEmpty()
Contributor: Does isEmpty signify whether or not the table has any data in it?

Contributor Author (@jon-wei): Yes, it's set to true when there are no physical segments.
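A hedged sketch of those semantics; the backing field is an assumption, not the actual implementation:

// True when no physical segments back the datasource, i.e. the table exists
// only as a catalog definition so far.
private final boolean isEmpty;

public boolean isEmpty()
{
  return isEmpty;
}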

@zachjsh (Contributor) left a comment: LGTM 🚀

@jon-wei merged commit 5d1e66b into apache:master on Jan 8, 2024
83 checks passed
@LakshSingla added this to the 29.0.0 milestone on Jan 29, 2024
Labels: Area - Batch Ingestion, Area - Dependencies, Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262), Area - Querying
Projects: None yet

3 participants