-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(cdc): only allow ADD
and DROP
in auto schema change
#18245
Conversation
ADD
and DROP
in auto schema change
@@ -1123,6 +1124,7 @@ async fn derive_schema_for_cdc_table( | |||
constraints: &Vec<TableConstraint>, | |||
connect_properties: WithOptionsSecResolved, | |||
need_auto_schema_map: bool, | |||
original_catalog: Option<Arc<TableCatalog>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments for this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
let allowed_ddl = ["ADD COLUMN", "DROP COLUMN"]; | ||
let upper_upstream_ddl = upstream_ddl.to_uppercase(); | ||
let is_allowed = allowed_ddl | ||
.iter() | ||
.any(|&allowed_ddl| upper_upstream_ddl.contains(allowed_ddl)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fragile. Shall we instead compare the previous and current schema to check whether one is the subset of the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compare the previous and current schema to check whether one is the subset of the other?
Not quite get it. Here the goal is to skip upstream messages generated by unsupported ALTER
statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we only support ADD
and DROP
, in which cases we must have...
the previous and current schema to check whether one is the subset of the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds ok to me, based on we only support ADD
and DROP
.
let allowed_ddl = ["ADD COLUMN", "DROP COLUMN"]; | ||
let upper_upstream_ddl = upstream_ddl.to_uppercase(); | ||
let is_allowed = allowed_ddl | ||
.iter() | ||
.any(|&allowed_ddl| upper_upstream_ddl.contains(allowed_ddl)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fragile.
+1.
What if there are both ALTER
and ADD/DROP COLUMN
statements in upstream_ddl
? From debezium's doc: "The ddl field can contain multiple DDL statements."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ddl field can contain multiple DDL statements.
I am aware of this statement, but I cannot construct this case even with transaction. And it seems mysql doesn't support DDL transactions. So actually the parser assumes each message only contains a single DDL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…18305) Co-authored-by: StrikeW <[email protected]>
if !(original_column_names.is_subset(&new_column_names) | ||
|| original_column_names.is_superset(&new_column_names)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realize that comparing the name sets is not sufficient, as users may alter the data type of a column, or even set a new default value. Those functionalities are not supported by us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So there is one more check in below.
// skip the schema change if there is no change to original columns
if original_column_names == new_column_names {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if
- alter column type, we skip since there's no change on the names
- then, add column, we do schema change.
Will we also accidentally apply the changes happened in step 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we also accidentally apply the changes happened in step 1?
Yes, that would be a problem. Since we apply the latest version column schema from upstream. Maybe we should also check column type besides the names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
ADD
andDROP
column, we check whether the new columns and the original columns is a subset of the other in MetaChecklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.