feat: specifying table schema when creating source/table from avro/protobuf encode whose full schema is externally defined #12199

Open
3 tasks
st1page opened this issue Sep 11, 2023 · 6 comments

@st1page
Contributor

st1page commented Sep 11, 2023

Currently, we do not allow users to define a schema in the column clause:

CREATE SOURCE(a int, b int) FORMAT PLAIN ENCODE PROTOBUF;
 ERROR: ExecuteError: Protocol error: User-defined schema is not allowed with FORMAT PLAIN ENCODE PROTOBUF

But some users need it to prune columns, especially when creating a table with a connector, where the selected columns determine how many columns are materialized in storage. Casting the source data into the expected data type is also needed.
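To make the pruning and casting request concrete, here is a minimal sketch in Python. It is only a model of the idea, not RisingWave code: `project_and_cast`, the `CASTS` table, and the type names are hypothetical, and a decoded Avro/Protobuf message is assumed to be representable as a plain dictionary.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical casts keyed by a declared type name (illustration only).
CASTS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "double": float,
    "varchar": str,
}


def project_and_cast(
    record: Dict[str, Any], columns: List[Tuple[str, str]]
) -> Dict[str, Any]:
    """Keep only the declared columns of a decoded record and cast each value."""
    row: Dict[str, Any] = {}
    for name, type_name in columns:
        value = record.get(name)  # a field missing from the message is treated as NULL
        row[name] = None if value is None else CASTS[type_name](value)
    return row


# The external schema has four fields, but the user declares only two of them.
decoded = {"a": "1", "b": 2, "c": "unused", "d": 3.5}
declared = [("a", "int"), ("b", "int")]
row = project_and_cast(decoded, declared)
print(row)  # {'a': 1, 'b': 2}
```

Only `a` and `b` would reach storage in this model; `c` and `d` are dropped before materialization.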

Another issue is that if a user wants to define a generated column, they must specify columns in the CREATE SOURCE/TABLE statement. We might need to introduce another syntax (#12209, fixed).

  • avro
  • protobuf
  • json with schema registry
@github-actions github-actions bot added this to the release-1.2 milestone Sep 11, 2023
@fuyufjh fuyufjh modified the milestones: release-1.2, release-1.3 Sep 11, 2023
@fuyufjh
Member

fuyufjh commented Sep 11, 2023

How can a user specify a subset of the schema when using a schema registry?

It feels like this is not necessary. They may just ignore these columns when creating MVs on this source.

How can a generated column be used with a schema registry?

+1 for your proposed syntax.

@st1page
Contributor Author

st1page commented Oct 10, 2023

How can a user specify a subset of the schema when using a schema registry?

It feels like this is not necessary. They may just ignore these columns when creating MVs on this source.

But when they create tables with a primary key, the specified columns determine which columns will be materialized in storage.

@st1page st1page modified the milestones: release-1.3, release-1.4 Oct 10, 2023
@fuyufjh fuyufjh modified the milestones: release-1.4, release-1.5 Nov 8, 2023
@hzxa21
Collaborator

hzxa21 commented Nov 9, 2023

How can a user specify a subset of the schema when using a schema registry?

It feels like this is not necessary. They may just ignore these columns when creating MVs on this source.

One thing semi-related to this issue is #10949. If users can specify a subset of columns, we may be able to filter out unnecessary changes (new row == old row) in the table's materialize executor during the conflict check.
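A rough sketch of that filtering idea, in Python, under the assumption that rows are compared after projection onto the selected columns. The `Table` class and its `upsert` method are hypothetical and only model the behavior; they are not the materialize executor.

```python
from typing import Any, Dict, List, Optional


def project(row: Dict[str, Any], columns: List[str]) -> Dict[str, Any]:
    """Keep only the selected columns of an incoming row."""
    return {c: row.get(c) for c in columns}


class Table:
    """Toy keyed table that stores only the selected columns."""

    def __init__(self, pk: str, columns: List[str]) -> None:
        self.pk = pk
        self.columns = columns
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def upsert(self, incoming: Dict[str, Any]) -> Optional[str]:
        """Apply a row; return the emitted change kind, or None for a no-op."""
        new_row = project(incoming, self.columns)
        key = new_row[self.pk]
        old_row = self.rows.get(key)
        if old_row == new_row:
            return None  # new row == old row on the selected columns: nothing to emit
        self.rows[key] = new_row
        return "insert" if old_row is None else "update"


table = Table(pk="id", columns=["id", "a"])
changes = [
    table.upsert({"id": 1, "a": 10, "noise": "x"}),
    table.upsert({"id": 1, "a": 10, "noise": "y"}),  # differs only in a pruned column
    table.upsert({"id": 1, "a": 11, "noise": "y"}),
]
print(changes)  # ['insert', None, 'update']
```

The second upsert changes only a column that was never selected, so in this model it produces no downstream change.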

@tabVersion tabVersion modified the milestones: release-1.5, release-1.6 Dec 6, 2023
@tabVersion tabVersion modified the milestones: release-1.6, release-1.7 Jan 9, 2024
@tabVersion
Contributor

tabVersion commented Jan 9, 2024

How can a user specify a subset of the schema when using a schema registry?

It feels like this is not necessary. They may just ignore these columns when creating MVs on this source.

But when they create tables with a primary key, the specified columns determine which columns will be materialized in storage.

This feature does not seem to be a pain point, but I still have some concerns about compatibility when the schema is updated.
Let's keep this open and make it a ramp-up task.

@tabVersion
Contributor

We can allow user-defined parts only when both the name and the type match the ones mapped from Avro/Protobuf. Let's take another look, cc @st1page
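The name-and-type matching rule could look roughly like the following Python sketch. The function `validate_user_columns`, the error wording, and the string type names are all hypothetical; the resolved schema is assumed to be a mapping from column name to the type already derived from the external Avro/Protobuf definition.

```python
from typing import Dict, List, Tuple


def validate_user_columns(
    resolved: Dict[str, str], user_columns: List[Tuple[str, str]]
) -> List[str]:
    """Return one error per user column that does not match the resolved schema."""
    errors: List[str] = []
    for name, type_name in user_columns:
        if name not in resolved:
            errors.append(f"column {name!r} not found in the external schema")
        elif resolved[name] != type_name:
            errors.append(
                f"column {name!r} declared as {type_name}, "
                f"but the external schema maps it to {resolved[name]}"
            )
    return errors


# Types as they would be mapped from the external schema (assumed values).
resolved_schema = {"a": "int", "b": "varchar", "c": "double"}

ok = validate_user_columns(resolved_schema, [("a", "int"), ("b", "varchar")])
bad = validate_user_columns(resolved_schema, [("a", "bigint"), ("z", "int")])
print(ok)        # []
print(len(bad))  # 2
```

A subset such as `(a int, b varchar)` passes, while a type mismatch or an unknown column name is rejected. This strict form does not cover the casting use case raised earlier in the thread, which would need a looser rule.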

Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
