-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(planner): correctly handle hidden columns for SourceBackfill #19578
Conversation
57fc1c4
to
fd2f157
Compare
fcd5805
to
64cc6ce
Compare
Shall we separate the newly introduced test cases into a different PR? 🤡 |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Done |
e61510f
to
39e77fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rubber stamp
let mut last_column_id = if skip_col_id { | ||
// col id will be filled outside later. Here just use a placeholder. | ||
ColumnId::new(0) | ||
} else { | ||
max_column_id(columns) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where we fill the ColumnId
, can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
risingwave/src/frontend/src/handler/create_source.rs
Lines 1600 to 1605 in 39e77fb
// XXX: why do we use col_id_gen here? It doesn't seem to be very necessary. | |
// XXX: should we also chenge the col id for struct fields? | |
for c in &mut columns { | |
c.column_desc.column_id = col_id_gen.generate(c.name()) | |
} | |
debug_assert_column_ids_distinct(&columns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄🤔 What is chenge
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link to the function in create_source.rs
from the comment of ColumnId::new(0)
? I don't feel confident in terms of future maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, why we need a placeholder here
shall we always place a dummy column_id when handling additional columns and fill the real one by col_id_gen.generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use ColumnId::placeholder()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because at this stage the col ids are ColumnId::placeholder()
(i32::MAX - 1
), and will overflow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because at this stage the col ids are
ColumnId::placeholder()
(i32::MAX - 1
), and will overflow here.
Then I suppose doing incrementation on a dummy column ID can be worse.🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
So previously, |
// For shared sources, we will include partition and offset cols in the SourceExecutor's *output*, to be used by the SourceBackfillExecutor. | ||
// For shared CDC source, the schema is different. See debezium_cdc_source_schema, CDC_BACKFILL_TABLE_ADDITIONAL_COLUMNS | ||
if is_shared_non_cdc { | ||
let (columns_exist, additional_columns) = source_add_partition_offset_cols( | ||
&columns, | ||
&with_properties.get_connector().unwrap(), | ||
true, // col_id filled below at col_id_gen.generate | ||
); | ||
for (existed, c) in columns_exist.into_iter().zip_eq_fast(additional_columns) { | ||
if !existed { | ||
columns.push(ColumnCatalog::hidden(c)); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that previously due to the logic in the planner, shared CDC source will also emit columns _rw_mysql-cdc_offset
, _rw_mysql-cdc_partition
(besides _rw_table_name
and _rw_offset
).
.. Or maybe somehow not included for CDC source. Since at the beginning of this PR, (i.e., using is_shared
, instead of is_shared_non_cdc
here), CDC will actually fail with index out of range at Java side..
I'm not 100% sure how it works. But it seems current stage is working.
@st1page I can't be confident without your approval |
:lark_cry: To be honest I do not very familiar with the shared source but every details in this PR LGTM. |
fix #19575 Signed-off-by: xxchan <[email protected]> fix test Signed-off-by: xxchan <[email protected]> add tpch test Signed-off-by: xxchan <[email protected]> fix Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
ee18a7e
to
fd14a95
Compare
Merge activity
|
) Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
fix #19575
Previously we add the hidden column in
StreamSource/StreamSourceScan::new
.It works for
StreamSource
(i.e., when create source, because it's a simple plan), but not forStreamSourceScan
(create MV). The index will be wrong when there are JOIN, etc above.We cannot put in
LogicalSource
neither, it should be in binder.Also add e2e & planner nexmark & tpch testsMoved to #19589Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.