
Allow broker to use catalog for datasource schemas for SQL queries #15469

Merged: 3 commits merged into apache:master on Jan 8, 2024

Conversation

@jon-wei (Contributor) commented on Dec 1, 2023

This PR contains a portion of the changes from #13686, @paul-rogers's inactive draft PR for integrating the catalog with the Calcite planner. It allows the datasource table schemas defined in the catalog to be synced to the broker.

This PR does not include the MSQ INSERT and REPLACE validation using the catalog functionality from #13686; it is just the portion of changes needed to sync the queryable schemas to the broker.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions bot added labels Area - Batch Ingestion, Area - Querying, Area - Dependencies, and Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262) on Dec 1, 2023
Comment on lines +55 to +65
SqlSchema schema = SqlSchema.builder()
.column("__time", "TIMESTAMP(3) NOT NULL")
.column("extra1", "VARCHAR")
.column("dim2", "VARCHAR")
.column("dim1", "VARCHAR")
.column("cnt", "BIGINT NOT NULL")
.column("m1", "DOUBLE NOT NULL")
.column("extra2", "BIGINT NOT NULL")
.column("extra3", "VARCHAR")
.column("m2", "DOUBLE NOT NULL")
.build();

Check notice (Code scanning / CodeQL): Unread local variable (note, test)

Variable 'SqlSchema schema' is never read.
@jon-wei force-pushed the catalog_schema_sync branch 3 times, most recently from 8eb5d50 to 7f2bf29 on December 4, 2023 at 19:55
@zachjsh self-requested a review on December 5, 2023 at 17:39
import java.util.Map;
import java.util.Set;

public class LiveCatalogResolver implements CatalogResolver
Contributor: javadoc

/**
* Create a {@link DruidTable} based on the physical segments, catalog entry, or both.
*/
@Override
Contributor: @Nullable

columns.put(col.name(), colMetadata);
}

// Mark any hidden columns. Assumes that the hidden columns are a disjoint set
Contributor: Does anything enforce this disjoint property? What if hidden columns are also defined?

Contributor Author (@jon-wei): There isn't any enforcement of this right now. As-is, if hidden columns overlap with the declared columns, the hiding is ignored; the user would need to remove the column from the declared set for the hiding to take effect.
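For illustration, a minimal sketch of that rule, assuming hypothetical names (effectiveColumns, physical, declared, hidden); this is not the PR's actual code:

import java.util.LinkedHashSet;
import java.util.Set;

static Set<String> effectiveColumns(Set<String> physical, Set<String> declared, Set<String> hidden)
{
  // Catalog-declared columns are always visible.
  Set<String> result = new LinkedHashSet<>(declared);
  for (String col : physical) {
    // Hiding only takes effect for columns outside the declared set; a hidden
    // column that is also declared stays visible until it is removed from the
    // declared set.
    if (declared.contains(col) || !hidden.contains(col)) {
      result.add(col);
    }
  }
  return result;
}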

// Merge columns. All catalog-defined columns come first,
// in the order defined in the catalog.
final RowSignature.Builder builder = RowSignature.builder();
Map<String, EffectiveColumnMetadata> columns = new HashMap<>();
Contributor: Does inserting keys and values into the HashMap preserve the order in which they were added? I assume that the order of the columns in the effectiveMetadata is important?

Contributor Author (@jon-wei), Dec 7, 2023: columns is just used as a column-name lookup; the order-preserving component in a DatasourceTable is the RowSignature, which stores the columns in a list.
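As a small illustration using Druid's standard RowSignature API (the builder preserves insertion order, so column ordering never depends on the HashMap):

// The RowSignature builder keeps columns in the order they are added.
RowSignature.Builder builder = RowSignature.builder();
builder.add("__time", ColumnType.LONG);   // catalog-defined columns first,
builder.add("dim1", ColumnType.STRING);   // in the order defined in the catalog
RowSignature signature = builder.build();
// signature.getColumnNames() returns ["__time", "dim1"], in insertion order.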

@@ -86,7 +86,7 @@ public CatalogClient(
@Override
public List<TableMetadata> tablesForSchema(String dbSchema)
{
String url = StringUtils.replace(SCHEMA_SYNC_PATH, "{dbSchema}", dbSchema);
String url = StringUtils.replace(SCHEMA_SYNC_PATH, "{schema}", dbSchema);
Contributor: nit: Good catch. Should we move "{schema}" to a constant here?
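A sketch of that suggestion; the constant name SCHEMA_PLACEHOLDER is an assumption for illustration, not something this PR adds:

// Hypothetical shared constant for the URL placeholder.
private static final String SCHEMA_PLACEHOLDER = "{schema}";

// Both call sites would then become:
String url = StringUtils.replace(SCHEMA_SYNC_PATH, SCHEMA_PLACEHOLDER, dbSchema);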

@@ -96,7 +96,7 @@ public List<TableMetadata> tablesForSchema(String dbSchema)
@Override
public TableMetadata table(TableId id)
{
String url = StringUtils.replace(TABLE_SYNC_PATH, "{dbSchema}", id.schema());
String url = StringUtils.replace(TABLE_SYNC_PATH, "{schema}", id.schema());
Contributor: ditto

* Coordinator. To prevent slowing initial queries, this class loads the
* current catalog contents into the local cache on lifecycle start, which
* avoids the on-demand reads that would otherwise occur. After the first load,
* events from the Coordinator keep the local cache evergreen.
Contributor: Where are events from the Coordinator pushed after this receiver loads on lifecycle start? It may be good to note that here.
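For context, a minimal sketch of the load-then-listen pattern the javadoc describes; the method names are assumptions, and only the @LifecycleStart annotation is standard Druid:

@LifecycleStart
public void start()
{
  // Bulk-load the full catalog once so initial queries avoid on-demand reads.
  loadFullCatalog();
  // Afterwards, change events pushed from the Coordinator keep the cache fresh.
  listenForCoordinatorEvents();
}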

*/
private final String sqlType;
private final String dataType;
Contributor: What is the reasoning behind changing from the SQL type to the Druid type here?

Contributor Author (@jon-wei): It now accepts both SQL and Druid native type strings in the catalog definition.

.put(VARCHAR, ColumnType.STRING)
.build();
.put(SQL_BIGINT, ColumnType.LONG)
.put(SQL_VARCHAR, ColumnType.STRING)
Contributor: Do you mean to remove the mappings for SQL_FLOAT, SQL_DOUBLE, and TIMESTAMP here?

Contributor Author (@jon-wei): This part actually wasn't being used right now, so I removed it.

return null;
}
ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(sqlType));
ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(dataType));
Contributor: It looks like we changed the dataType of the ColumnSpec to be the Druid type instead of the SQL type now, so should we take the type as-is instead of looking up the mapping here?

Contributor Author (@jon-wei): It checks whether it's a SQL type first and attempts to convert; if it isn't, it does ColumnType.fromString(dataType) to treat it as a Druid native type.
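Paraphrased as a sketch (the constant and method names come from the diff above, but the exact control flow here is an assumption):

ColumnType druidType = SQL_TO_DRUID_TYPES.get(StringUtils.toUpperCase(dataType));
if (druidType == null) {
  // Not a recognized SQL type string: treat the value as a Druid native type
  // name such as "STRING", "LONG", or "DOUBLE".
  druidType = ColumnType.fromString(dataType);
}
return druidType;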

this.spec = spec;
if (Columns.isTimeColumn(spec.name()) && spec.dataType() == null) {
// For __time only, force a type if type is null.
this.sqlType = Columns.LONG;
Contributor: Should this be COLUMNS.SQL_BIGINT?

Contributor Author (@jon-wei): No, this is using the native type.

@@ -68,16 +68,6 @@ public class DatasourceDefn extends TableDefn
*/
public static final String HIDDEN_COLUMNS_PROPERTY = "hiddenColumns";

/**
* By default: columns are optional hints. If a datasource has columns defined,
Contributor: Why does this description no longer apply?

Contributor Author (@jon-wei): I'm not sure why this was removed in Paul's original PR.

/**
* Column type for external tables.
*/
public static final String EXTERNAL_COLUMN_TYPE = "extern";
Contributor: What is this used for?

Contributor Author (@jon-wei): This is from another part of the original PR related to external table handling; it isn't used currently, but future changes will use it.

return columns.get(name);
}

public boolean isEmpty()
Contributor: Does isEmpty signify whether or not the table has any data in it?

Contributor Author (@jon-wei): Yes, it's set to true when there are no physical segments.
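A hedged sketch of those semantics; the backing field is an assumption, not the actual implementation:

// True when no physical segments back the datasource, i.e. the table exists
// only as a catalog definition so far.
private final boolean isEmpty;

public boolean isEmpty()
{
  return isEmpty;
}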

@zachjsh (Contributor) left a comment: LGTM 🚀

@jon-wei merged commit 5d1e66b into apache:master on Jan 8, 2024
83 checks passed
@LakshSingla added this to the 29.0.0 milestone on Jan 29, 2024
Labels: Area - Batch Ingestion, Area - Dependencies, Area - MSQ (for multi-stage queries: https://github.com/apache/druid/issues/12262), Area - Querying
Projects: None yet

3 participants