Bug Report: Vttablet is writing invalid _vt.schema_version rows #12981
Comments
Thank you for the amazing bug report, @brendar ! 😍 Are you still able to repeat it with this patch against main?
@mattlord that doesn't fix the issue. The pointers are being modified in I've put up a reproduction example here: Shopify@8f7b006 And a prototype fix here (copying the fields in |
Thanks, @brendar! I do think we'll need to do that, as there are bigger "race" related issues with the schema tracker that we need to address. Do you want to open a PR?
Opened a PR here: #13045
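The prototype fix mentioned in this thread is described as copying the fields. The following is a minimal sketch of that copy-before-mutate idea, not the code from the PR: `Field` is a toy struct standing in for the real message type, and `withColumnTypes` is a hypothetical helper invented for illustration.

```go
package main

import "fmt"

// Field is a toy stand-in for a schema field message (NOT the real type).
type Field struct {
	Name       string
	ColumnType string
}

// withColumnTypes (hypothetical helper) returns copies of the shared fields
// with ColumnType filled in. The shared originals are never written to, so a
// concurrent reader of the originals cannot observe a half-updated field.
func withColumnTypes(shared []*Field, columnTypes map[string]string) []*Field {
	out := make([]*Field, 0, len(shared))
	for _, f := range shared {
		cp := *f // value copy is enough here because the toy struct only holds strings
		if ct, ok := columnTypes[f.Name]; ok {
			cp.ColumnType = ct
		}
		out = append(out, &cp)
	}
	return out
}

func main() {
	shared := []*Field{{Name: "id"}, {Name: "status"}}
	local := withColumnTypes(shared, map[string]string{"status": "enum('a','b')"})
	fmt.Printf("shared=%q local=%q\n", shared[1].ColumnType, local[1].ColumnType)
}
```

For real generated protobuf messages a plain value copy is not appropriate; a deep copy such as `proto.Clone` would be the analogous step.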
Enable support for enums during vstream copy phase.
There are two reasons that the connector does not handle `enum` for PSDB branches.
1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 (debezium#20), and support for snapshots (VStream Copy) landed in 2022 (debezium#112), without taking the former into account. This is easily fixed by finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream.
2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches.
Signed-off-by: Max Englander <[email protected]>
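The lookup described in the commit message (finding the index of a string value in the list taken from `column_type`) can be sketched as follows. This is an illustration in Go, not the connector's actual code: `enumValues` and `enumIndex` are hypothetical names, the parser assumes a lowercase `enum('a','b')` form with `''` as the only quote escape, and the 1-based result is an assumption that follows MySQL's own enum numbering.

```go
package main

import (
	"fmt"
	"strings"
)

// enumValues (hypothetical) extracts the quoted labels from a column type
// string such as enum('small','medium','large').
func enumValues(columnType string) ([]string, error) {
	const prefix = "enum("
	if !strings.HasPrefix(columnType, prefix) || !strings.HasSuffix(columnType, ")") {
		return nil, fmt.Errorf("not an enum column type: %q", columnType)
	}
	body := columnType[len(prefix) : len(columnType)-1]
	var values []string
	var cur strings.Builder
	inQuote := false
	for i := 0; i < len(body); i++ {
		c := body[i]
		switch {
		case c == '\'' && inQuote && i+1 < len(body) && body[i+1] == '\'':
			cur.WriteByte('\'') // '' inside a label is an escaped quote
			i++
		case c == '\'' && inQuote:
			values = append(values, cur.String()) // closing quote ends the label
			cur.Reset()
			inQuote = false
		case c == '\'':
			inQuote = true
		case inQuote:
			cur.WriteByte(c)
		}
	}
	if inQuote {
		return nil, fmt.Errorf("unterminated quote in %q", columnType)
	}
	return values, nil
}

// enumIndex (hypothetical) returns the 1-based position of value in the enum.
func enumIndex(columnType, value string) (int, error) {
	values, err := enumValues(columnType)
	if err != nil {
		return 0, err
	}
	for i, v := range values {
		if v == value {
			return i + 1, nil
		}
	}
	return 0, fmt.Errorf("value %q not found in %q", value, columnType)
}

func main() {
	idx, err := enumIndex("enum('small','medium','large')", "medium")
	fmt.Println(idx, err)
}
```

Note that the lookup only works if `column_type` is actually populated in the fields the stream delivers, which is why the second point in the commit message depends on the bugfix.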
Overview of the Issue
With vttablet schema tracking enabled (`--watch_replication_stream=true` and `--track_schema_versions=true`), we're seeing invalid `schemax` data being written to `_vt.schema_version` rows.
Symptoms
On v15.0.3 this causes vttablets to fail to start serving at startup with errors like this in their logs: `Historian failed to open: proto: cannot parse invalid wire-format data`. It also causes running tablets to silently stop loading new `_vt.schema_version` rows, which effectively stops schema tracking.
On main (tested on edb702b), which includes the switch from `proto.Marshal()` to `MinimalSchema.MarshalVT()`, the invalid `schema_version` row is never written because the call to `MarshalVT()` panics. I'm not sure what state this leaves the tablet in.
Cause
We believe the cause of the invalid `schemax` data is a race condition between marshaling the schema data in `Tracker.saveCurrentSchemaToDb()` and modifying `Field.ColumnType` in `vstreamer.buildTableColumns()`.
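To illustrate why such a race can corrupt the output, here is a minimal, deterministic sketch. It assumes a marshaler that first computes the message size and then writes the bytes, which is the general shape of protobuf marshaling. The `Field` struct and the encoding are toy stand-ins, not the real `query.Field` or the protobuf wire format, and the `between` callback simulates a concurrent writer landing between the two phases.

```go
package main

import "fmt"

// Field is a toy stand-in for a message with two string fields.
type Field struct {
	Name       string
	ColumnType string
}

// encodedSize mimics the sizing phase: one length byte plus the string bytes
// for every non-empty field.
func encodedSize(f *Field) int {
	n := 0
	for _, s := range []string{f.Name, f.ColumnType} {
		if s != "" {
			n += 1 + len(s)
		}
	}
	return n
}

// appendEncoded mimics the write phase, reading the fields again.
func appendEncoded(buf []byte, f *Field) []byte {
	for _, s := range []string{f.Name, f.ColumnType} {
		if s != "" {
			buf = append(buf, byte(len(s)))
			buf = append(buf, s...)
		}
	}
	return buf
}

// marshal sizes the message, runs between (standing in for another goroutine
// mutating the shared field), then writes. It returns the declared size and
// the bytes actually produced.
func marshal(f *Field, between func()) (int, []byte) {
	declared := encodedSize(f)
	if between != nil {
		between()
	}
	written := appendEncoded(make([]byte, 0, declared), f)
	return declared, written
}

func main() {
	shared := &Field{Name: "status"}
	declared, written := marshal(shared, func() {
		shared.ColumnType = "enum('a','b')" // the "concurrent" update
	})
	// The declared size no longer matches what was written.
	fmt.Printf("declared=%d written=%d\n", declared, len(written))
}
```

In a nested format, the stale size would already have been recorded as a length prefix in the enclosing message, so a reader would split the bytes in the wrong place. A fixed-size buffer written without bounds checks could instead fail outright.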
Details
Using a mix of debugging and protoscope to unpack a corrupt `schemax` protobuf message, we found that the corruption occurred in the `MinimalTable` message for the table that was migrated, and that the message size did not match the data written. We also noticed that the `ColumnType` fields of its `query.Field` messages did not appear to be included in the message sizes. This led us to believe that something was setting `ColumnType` between when protobuf message sizes were calculated and when data was actually written to the buffer.
We found that `ColumnType` is being set in `vstreamer.buildTableColumns()`, and we believe this leads to a race condition when `vstreamer.buildTableColumns()` is invoked concurrently with `Tracker.saveCurrentSchemaToDb()` when the client's vstreamer encounters a table map event.
Reproduction Steps
We have been able to reproduce this race condition on both v15.0.3 and main (edb702b).
I can clean up our reproduction example and share it if that would be helpful, but the process was as follows:
v15.0.3
`historian.loadFromDB` to log an error when it encounters a corrupt row
(`--watch_replication_stream=true` and `--track_schema_versions=true`)
We were able to see corrupt schema_version rows being created every minute or so.
After reproducing the issue, we confirmed that commenting out `field.ColumnType = extColInfo.columnType` in `vstreamer.buildTableColumns()` prevented reoccurrence (but obviously we're not proposing that as the fix).
main
We followed the same reproduction steps, but main includes the switch from `proto.Marshal()` to `MinimalSchema.MarshalVT()`, so the `schema_version` row is never written because the call to `MarshalVT()` panics:
Binary Version
main
vttablet --version Version: 17.0.0-SNAPSHOT (Git revision edb702b039f6c1c67446bfbe32aaf7b0c166693e branch 'main') built on Wed Apr 26 14:32:28 UTC 2023 by spin@localhost using go1.20.3 linux/amd64
Operating System and Environment details
Log Fragments
No response