frontend: refactor source schema resolution #9828

Open
BugenZhao opened this issue May 16, 2023 · 5 comments

@BugenZhao
Member

BugenZhao commented May 16, 2023

Generally, there are multiple sources of truth for the catalog derived from a CREATE SOURCE or CREATE TABLE statement:

  • Column definitions and the constraints on them, e.g. key INT PRIMARY KEY
  • Table definitions (table constraints), e.g. PRIMARY KEY (key)
  • Properties along with the row format, e.g. ROW SCHEMA LOCATION '..'
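To make the overlap concrete, here is a minimal Rust sketch over a hypothetical, heavily simplified AST; ColumnDef, CreateStmt and collect_primary_key are illustrative names, not the real frontend types. It shows that the primary key alone can be declared by two of the three sources, so the binder has to reconcile them. This sketch simply rejects declaring it in more than one place, as PostgreSQL does.

```rust
// Illustrative sketch only: hypothetical types, not the real frontend AST.

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ColumnDef {
    name: String,
    data_type: String,
    /// Column constraint, e.g. `key INT PRIMARY KEY`.
    primary_key: bool,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct CreateStmt {
    /// Source of truth 1: column definitions and their constraints.
    columns: Vec<ColumnDef>,
    /// Source of truth 2: table constraint, e.g. `PRIMARY KEY (key)`.
    table_pk: Option<Vec<String>>,
    /// Source of truth 3: properties / row format, e.g. `ROW SCHEMA LOCATION '..'`.
    row_schema_location: Option<String>,
}

/// Reconcile the primary key declared by column constraints with the one
/// declared by the table constraint; declaring it twice is rejected.
fn collect_primary_key(stmt: &CreateStmt) -> Result<Vec<String>, String> {
    let from_columns: Vec<String> = stmt
        .columns
        .iter()
        .filter(|c| c.primary_key)
        .map(|c| c.name.clone())
        .collect();
    match (&stmt.table_pk, from_columns.len()) {
        // Both a table constraint and at least one column constraint.
        (Some(_), n) if n > 0 => Err("multiple primary keys are not allowed".into()),
        (Some(pk), _) => {
            // Every table-level key column must name a defined column.
            for name in pk {
                if !stmt.columns.iter().any(|c| &c.name == name) {
                    return Err(format!("column {name} not found"));
                }
            }
            Ok(pk.clone())
        }
        // More than one column marked PRIMARY KEY.
        (None, n) if n > 1 => Err("multiple primary keys are not allowed".into()),
        (None, _) => Ok(from_columns),
    }
}

fn main() {
    let stmt = CreateStmt {
        columns: vec![
            ColumnDef { name: "key".into(), data_type: "INT".into(), primary_key: true },
            ColumnDef { name: "v".into(), data_type: "VARCHAR".into(), primary_key: false },
        ],
        table_pk: None,
        row_schema_location: None,
    };
    println!("{:?}", collect_primary_key(&stmt)); // Ok(["key"])
}
```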

Given a parsed CREATE statement, the order in which to resolve each of these is not obvious. In the current implementation, we:

  1. bind column definitions without constraints
  2. bind PRIMARY KEY column constraints
  3. bind PRIMARY KEY table constraint
  4. generate a row_id column if necessary
  5. resolve source schema
    • check if valid according to the connector type
    • may add extra columns (for example, debezium mongo json)
    • may directly overwrite or discard the binding results so far (😨), depending on the connector type
  6. bind GENERATED column constraints (only at this point are all columns resolved)
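The six steps can be modeled roughly as below. This is a hedged Rust sketch with hypothetical types: RowFormat, the extra payload column for the debezium mongo json case, the _row_id name and the SERIAL type are all illustrative placeholders, not the actual connector logic. It shows why step 5 is fragile: when it overwrites the columns, results from earlier steps (here, the primary key pointing at the row id column from step 4) can silently become inconsistent.

```rust
// Illustrative sketch only: hypothetical types, not the real frontend code.

#[allow(dead_code)]
enum RowFormat {
    /// Schema given entirely by the user's column definitions.
    Json,
    /// Resolution appends extra columns (placeholder for debezium mongo json).
    DebeziumMongoJson,
    /// Schema fetched from `ROW SCHEMA LOCATION`; passed in pre-fetched here.
    SchemaLocation(Vec<(String, String)>),
}

#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    data_type: String,
}

struct ColumnDef<'a> {
    name: &'a str,
    data_type: &'a str,
    primary_key: bool,
    /// Simplified `AS <expr>`: the expression is just the column it reads.
    generated_from: Option<&'a str>,
}

#[derive(Debug)]
struct Bound {
    columns: Vec<Column>,
    pk: Vec<String>,
}

fn bind_current(defs: &[ColumnDef<'_>], table_pk: &[&str], format: &RowFormat) -> Result<Bound, String> {
    // 1. Bind column definitions without constraints (generated ones later).
    let mut columns: Vec<Column> = defs
        .iter()
        .filter(|d| d.generated_from.is_none())
        .map(|d| Column { name: d.name.to_string(), data_type: d.data_type.to_string() })
        .collect();
    // 2. PRIMARY KEY column constraints.
    let mut pk: Vec<String> = defs.iter().filter(|d| d.primary_key).map(|d| d.name.to_string()).collect();
    // 3. PRIMARY KEY table constraint.
    pk.extend(table_pk.iter().map(|s| s.to_string()));
    // 4. Generate a row id column if no primary key was declared.
    if pk.is_empty() {
        columns.push(Column { name: "_row_id".into(), data_type: "SERIAL".into() });
        pk.push("_row_id".into());
    }
    // 5. Resolve source schema according to the row format.
    match format {
        RowFormat::Json => {}
        RowFormat::DebeziumMongoJson => columns.push(Column { name: "payload".into(), data_type: "JSONB".into() }),
        RowFormat::SchemaLocation(fetched) => {
            // Overwrites everything bound so far, including step 4's column,
            // while `pk` still refers to it.
            columns = fetched.iter().map(|(n, t)| Column { name: n.clone(), data_type: t.clone() }).collect();
        }
    }
    // 6. Bind GENERATED columns, now that all other columns are known.
    for d in defs {
        if let Some(src) = d.generated_from {
            let ty = columns
                .iter()
                .find(|c| c.name == src)
                .map(|c| c.data_type.clone())
                .ok_or(format!("column {src} not found"))?;
            columns.push(Column { name: d.name.to_string(), data_type: ty });
        }
    }
    Ok(Bound { columns, pk })
}

fn main() {
    let defs = [ColumnDef { name: "v", data_type: "INT", primary_key: false, generated_from: None }];
    let fetched = RowFormat::SchemaLocation(vec![("a".into(), "INT".into())]);
    let bound = bind_current(&defs, &[], &fetched).unwrap();
    // `pk` still names `_row_id`, but step 5 discarded that column.
    println!("{:?}", bound);
}
```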

The procedure is clearly complicated and can be confusing, especially step 5, which is too ad hoc. I propose that step 5 operate directly on the AST ahead of time, so that all other binding steps work on an immutable, final CREATE statement AST. The advantages could be...
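A rough Rust sketch of the proposed ordering, under the same kind of hypothetical, simplified types (resolve_source_schema, bind, RowFormat and the _row_id/SERIAL placeholders are illustrative, not existing APIs). Schema resolution takes the AST by value and returns a rewritten one; binding then only reads the final AST and never looks at the connector. Keeping PRIMARY KEY flags for columns whose names survive the rewrite is just this sketch's choice. Generated columns are omitted for brevity.

```rust
// Illustrative sketch only: hypothetical types, not the real frontend code.

#[derive(Debug, Clone, PartialEq)]
struct ColumnDef {
    name: String,
    data_type: String,
    primary_key: bool,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum RowFormat {
    Json,
    DebeziumMongoJson,
    /// Schema fetched from `ROW SCHEMA LOCATION`; passed in pre-fetched here.
    SchemaLocation(Vec<(String, String)>),
}

#[derive(Debug, Clone)]
struct CreateStmt {
    columns: Vec<ColumnDef>,
    table_pk: Vec<String>,
    format: RowFormat,
}

/// Former step 5, moved first: rewrite the AST according to the row format.
fn resolve_source_schema(mut stmt: CreateStmt) -> CreateStmt {
    match &stmt.format {
        RowFormat::Json => {}
        RowFormat::DebeziumMongoJson => stmt.columns.push(ColumnDef {
            name: "payload".into(),
            data_type: "JSONB".into(),
            primary_key: false,
        }),
        RowFormat::SchemaLocation(fetched) => {
            let new_columns: Vec<ColumnDef> = fetched
                .iter()
                .map(|(n, t)| ColumnDef {
                    name: n.clone(),
                    data_type: t.clone(),
                    // Keep a user-declared PRIMARY KEY on a same-named column.
                    primary_key: stmt.columns.iter().any(|c| &c.name == n && c.primary_key),
                })
                .collect();
            stmt.columns = new_columns;
        }
    }
    stmt
}

#[derive(Debug, PartialEq)]
struct Bound {
    columns: Vec<(String, String)>,
    pk: Vec<String>,
}

/// Remaining steps: they only read the final, immutable AST.
fn bind(stmt: &CreateStmt) -> Bound {
    let mut columns: Vec<(String, String)> =
        stmt.columns.iter().map(|c| (c.name.clone(), c.data_type.clone())).collect();
    let mut pk: Vec<String> = stmt.columns.iter().filter(|c| c.primary_key).map(|c| c.name.clone()).collect();
    pk.extend(stmt.table_pk.iter().cloned());
    if pk.is_empty() {
        columns.push(("_row_id".into(), "SERIAL".into()));
        pk.push("_row_id".into());
    }
    Bound { columns, pk }
}

fn main() {
    let stmt = CreateStmt {
        columns: vec![ColumnDef { name: "v".into(), data_type: "INT".into(), primary_key: false }],
        table_pk: vec![],
        format: RowFormat::SchemaLocation(vec![("a".into(), "BIGINT".into())]),
    };
    // `_row_id` is generated after resolution, so it is never discarded.
    println!("{:?}", bind(&resolve_source_schema(stmt)));
}
```

The design point is that each binding step now sees one consistent input, so the row id column and the primary key can no longer be invalidated by connector-specific logic that runs afterwards.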

Feel free to comment. cc @tabVersion @st1page @xiangjinwu @yuhao-su

@st1page
Contributor

st1page commented May 16, 2023

LGTM

@st1page
Contributor

st1page commented May 17, 2023

Before we decide to use the AST as the interface for source schema resolution, I'd like to refactor the function and try to improve it. 🤔

@st1page st1page modified the milestones: release-1.0, release-1.1 Jul 12, 2023
@st1page st1page modified the milestones: release-1.1, release-1.2 Aug 8, 2023
@st1page st1page modified the milestones: release-1.2, release-1.3 Sep 11, 2023
@st1page st1page modified the milestones: release-1.5, release-1.6 Dec 5, 2023
@xxchan
Member

xxchan commented May 19, 2024

Is it finished?

@st1page st1page modified the milestones: release-1.6, release-1.10 May 20, 2024
@st1page
Contributor

st1page commented May 20, 2024

Is it finished?

I think not

Projects
None yet
Development

No branches or pull requests

3 participants