Proper handling of CHAR columns with binary collations #8730
I'm actually unsure in which endtoend test this test data lands. Right now we're seeing a flare of failing tests, but none of them seems to indicate anything
Binlog events without the COLLATION clause from the test case (vreplication works fine):
Max storage/display width bytes went from 12 to 9 -- because this is MySQL 5.7 where
Awesome! I see the same error when I do manual tests, and we need to fix this. What concerns me is that I can't see which CI test fails due to the changes in this PR. We should find at least one failing test first, and if there isn't one, write one, then work on fixing the problem and ensure that the test passes. It's easy for me to produce a test case in
@mattlord, some character set info can be found here: Lines 590 to 678 in 7d0607c
which is currently mostly used to map a character set to
Question: I don't really understand the code in https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/sql/sql_show.cc#L5468-L5482 and in https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/sql/field.cc#L6410-L6440. I wonder: would it be correct to just trim right the
I created #8749 as a draft PR, which introduces a test that is known to fail CI. I prefer the tests in this PR, but I'm unsure where they land and which test they fail.
What confuses me is that
So the … This means that we need to apply the padding, or trim the previously added padding, when forming the SQL statement on the vplayer side, where we should have the character set information for the table and column(s). @shlomi-noach, does that make sense? I think that's what you were alluding to with:
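The trim option described in the comment above might look something like the following minimal sketch. It assumes the value has already been padded with null bytes up to the column's full byte width (12 for CHAR(3) utf8mb4), that the declared length in characters is known where the SQL statement is built, and that the value is UTF-8 encoded. `trimToChars` is a hypothetical helper for illustration, not part of Vitess.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimToChars (hypothetical) keeps at most maxChars characters of value and
// drops the remaining bytes, e.g. null padding that was added up to the
// column's maximum byte width. Invalid UTF-8 bytes count as one character
// each, which is how utf8.DecodeRune reports them.
func trimToChars(value []byte, maxChars int) []byte {
	offset := 0
	for chars := 0; chars < maxChars && offset < len(value); chars++ {
		_, size := utf8.DecodeRune(value[offset:])
		offset += size
	}
	return value[:offset]
}

func main() {
	// "a" padded with null bytes to the 12-byte width of CHAR(3) utf8mb4.
	padded := append([]byte("a"), make([]byte, 11)...)
	fmt.Printf("%q\n", trimToChars(padded, 3)) // "a\x00\x00"
}
```

Counting characters rather than bytes matters here: a multi-byte character such as ü still counts as only one of the three characters a CHAR(3) column accepts.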
@mattlord I have to say, it's so nice to have someone so authoritative! 😊
I'm not familiar with binlog parsing, but how does the mysqlbinlog tool generate the following info?
Is this information usable for auto-padding?
Hi @tokikanno!
At the code level, mysqlbinlog parses the events in essentially the same manner as we do in Vitess: https://github.com/mysql/mysql-server/blob/5.7/client/mysqlbinlog.cc
At a logical level, you can see the docs here: https://dev.mysql.com/doc/refman/5.7/en/mysqlbinlog-row-events.html
thx, got it 🤨
Force-pushed from 8a1da75 to 437f77a
Force-pushed from e08c25f to 82e4788
We need to take the max bytes per character into account, or we end up trying to insert too many characters into the fixed-length column.
Signed-off-by: Matt Lord <[email protected]>
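A minimal sketch of the adjustment this commit message describes: divide the column's maximum byte width by the character set's maximum bytes per character before padding with null bytes. The `maxBytesPerChar` table and the `padToCharLength` function are hypothetical names for illustration, not Vitess APIs; only the per-character maxima themselves (4 for utf8mb4, 3 for utf8/utf8mb3, 1 for latin1 and binary) come from MySQL.

```go
package main

import (
	"bytes"
	"fmt"
)

// maxBytesPerChar is a hypothetical lookup table for illustration; the values
// are the maximum number of bytes per character MySQL uses for these sets.
var maxBytesPerChar = map[string]int{
	"binary":  1,
	"latin1":  1,
	"utf8":    3,
	"utf8mb3": 3,
	"utf8mb4": 4,
}

// padToCharLength (hypothetical) right-pads value with 0x00 bytes until its
// length reaches maxBytes / maxBytesPerChar[charset], i.e. the declared
// column length in characters rather than the column's maximum byte width.
func padToCharLength(value []byte, maxBytes int, charset string) ([]byte, error) {
	perChar, ok := maxBytesPerChar[charset]
	if !ok {
		return nil, fmt.Errorf("unknown character set %q", charset)
	}
	maxChars := maxBytes / perChar // e.g. 12 / 4 = 3 for CHAR(3) utf8mb4
	if len(value) >= maxChars {
		return value, nil // already long enough; this sketch never truncates
	}
	padding := bytes.Repeat([]byte{0x00}, maxChars-len(value))
	// Copy first so the caller's slice is not modified by append.
	return append(append([]byte{}, value...), padding...), nil
}

func main() {
	padded, err := padToCharLength([]byte("a"), 12, "utf8mb4")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", padded) // "a\x00\x00"
}
```

Because BINARY(N) columns use the binary character set (1 byte per character), the same formula still pads BINARY(3) to 3 bytes in this sketch, while CHAR(3) COLLATE utf8mb4_bin is padded to 3 rather than 12.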
Force-pushed from 82e4788 to fe9ed55
Looks like
I don't have insight into how binary values are encoded in MySQL; tests were added, and if they pass, the PR is good to merge!
If we do the padding higher in the call stack, then we have the column schema, which includes the character set, and we can use it to do the padding correctly.
Signed-off-by: Matt Lord <[email protected]>
Force-pushed from 5822063 to e7c8cfb
There are some test failures that I don't fully understand: this method is causing VDiffs to fail, along with the onlineddl_vrepl varbinary test. Reverting to the post-trim method, as it now seems safer, if less elegant.
Signed-off-by: Matt Lord <[email protected]>
Force-pushed from e7c8cfb to 639acae
And also ensure that both the value and the column type are binary.
Signed-off-by: Matt Lord <[email protected]>
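The guard this commit adds could be sketched as below, assuming a hypothetical `column` struct that carries the column's collation ID. The one MySQL fact it relies on is that the `binary` character set has a single collation, also named `binary`, with ID 63; `utf8mb4_bin` (ID 46) is a binary collation of a non-binary character set, so a CHAR column using it is not null-padded here. `shouldNullPad` is an illustrative name, not the Vitess function.

```go
package main

import "fmt"

// binaryCollationID is the ID MySQL assigns to the "binary" collation of the
// "binary" character set, which BINARY/VARBINARY columns use.
const binaryCollationID = 63

// column is a hypothetical stand-in for whatever schema information is
// available where the padding decision is made.
type column struct {
	Name        string
	CollationID int
}

// shouldNullPad (hypothetical) reports whether a fixed-length value should be
// right-padded with null bytes: only when the binlog value is binary AND the
// column really uses the binary character set. A CHAR column with a binary
// collation such as utf8mb4_bin (ID 46) fails the second check.
func shouldNullPad(valueIsBinary bool, col column) bool {
	return valueIsBinary && col.CollationID == binaryCollationID
}

func main() {
	fmt.Println(shouldNullPad(true, column{Name: "bin_col", CollationID: 63})) // true
	fmt.Println(shouldNullPad(true, column{Name: "foo", CollationID: 46}))     // false
}
```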
…yCollation Proper handling of CHAR columns with binary collations Signed-off-by: Matt Lord <[email protected]>
Backport [#8730 to 12.0]: Proper handling of CHAR columns with binary collations
Backport [#8730 to 11.0]: Proper handling of CHAR columns with binary collations
Description
The work in #8137 seems to have introduced a bug in how we vreplicate CHAR columns that use a binary collation. If we have, e.g., a column defined as
foo CHAR(3) COLLATE UTF8MB4_BIN
then the column gets padded to the maximum byte length / display width of 12 (utf8mb4 uses 1-4 bytes per character), which causes the SQL statement to fail because we're then trying to insert 12 characters into a 3-character column:
Test Case
Final results:
Originating Binlog events:
It's the max length of 12 bytes you see there that is the issue. We have to take the max bytes per character into account when padding the byte array with null bytes (\0s). So in this case we have a max of 4 bytes per character with utf8mb4, and we should be calculating this to map the max bytes to max chars when padding with null bytes: 12 / 4 = 3. In effect, we need to make the same kind of adjustment done here and here in MySQL. I don't yet see where we have character set info available here in Vitess, though...
Related Issue(s)
#8743
Checklist
Deployment Notes