Flink-CDC 2.3.0 consumes data based on SPECIFIC_OFFSETS. If the table structure is changed after the starting offset, it will not be able to consume the data correctly. #1962
Comments
This is expected behavior, and it is documented in the MySQL CDC connector docs.
#1724 is the PR for this part.
Flink 1.14.5: `2023-03-14 11:19:41,676 WARN com.ververica.cdc.connectors.mysql.debezium.reader.BinlogSplitReader - Failed to close the binlog split reader in 30 seconds.` Process finished with exit code -1
Thank you very much, @ruanhang1993. I understand your point, so I removed the
Hello, @leonardBang, is there a plan to solve this issue?
Is this fixed? I have the same problem.
Search before asking
Flink version
1.14.5
Flink CDC version
2.3.0
Database and its version
5.7.37
Minimal reproduce step
What did you expect to see?
The CDC program consumes the binlog normally from the specified GTIDs
What did you see instead?
The CDC program throws an error:
Anything else?
In real business scenarios, it is very common for the table structure to have changed after the starting binlog offset by the time data is recovered from SPECIFIC_OFFSETS. If this issue is not resolved, recovery from SPECIFIC_OFFSETS becomes much less useful.
After reading the CDC and Debezium source code, I found that when recovering data from SPECIFIC_OFFSETS, the table structure is still taken from the latest schema. To solve this, I believe a schema snapshot at the starting binlog timestamp is needed. This can be achieved using the following approach:
The main drawback of this approach is that recovering the historical schema snapshot could take a long time. Perhaps there is a better solution?
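To make the mismatch concrete, here is a minimal, self-contained sketch of what "schema as of the starting offset" means. It is not Flink CDC or Debezium code, and it is not the approach proposed in this issue. Everything in it is hypothetical: the `SCHEMA_HISTORY` list, the `schema_at` and `decode_row` helpers, the file names and the positions. It also assumes that row values arrive by position, and that binlog file names sort lexicographically in the order they were written.

```python
from bisect import bisect_right

# Hypothetical schema history: each entry is ((binlog_file, position), columns),
# meaning "these columns are in effect from this binlog position onward".
# Entries are kept sorted by binlog position.
SCHEMA_HISTORY = [
    (("mysql-bin.000001", 4), ["id", "name"]),
    (("mysql-bin.000003", 1200), ["id", "name", "email"]),  # after an ADD COLUMN
]


def schema_at(history, offset):
    """Return the columns in effect at `offset`: the last entry at or before it."""
    keys = [pos for pos, _ in history]
    idx = bisect_right(keys, offset) - 1
    if idx < 0:
        raise LookupError(f"no schema recorded at or before {offset}")
    return history[idx][1]


def decode_row(columns, values):
    """Pair positional row values with column names; fail on a count mismatch."""
    if len(columns) != len(values):
        raise ValueError(
            f"row has {len(values)} values but schema has {len(columns)} columns"
        )
    return dict(zip(columns, values))


# Start reading before the schema change.
start_offset = ("mysql-bin.000002", 500)
old_row = (1, "alice")  # written under the two-column schema

latest_schema = SCHEMA_HISTORY[-1][1]                   # what the issue describes
as_of_schema = schema_at(SCHEMA_HISTORY, start_offset)  # what recovery needs

decoded = decode_row(as_of_schema, old_row)
```

In this sketch, decoding `old_row` with `latest_schema` raises a `ValueError`, while decoding it with `as_of_schema` succeeds. The lookup itself is cheap; the expensive part in practice would be building such a history in the first place, which is the cost concern raised above.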
Are you willing to submit a PR?