-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set the default charset at startup #7757
Conversation
I am now running tidb with the arguments: -character-set-server utf8mb4 -collation-server utf8mb4_unicode_ci My understaning is that this doesn't change the behavior of TiDB (it is equivalent to the current configuration). However, it allows TiDB to be compatible with standard MySQL setups. Previously, I would see this failure message when using an application: unsupported modify column charset utf8 not match origin utf8mb4 Using the above command line flags, the application now works. There is still more to do to have proper charset support, but this is enough to fix my use case.
5b26f52
to
350fc0e
Compare
/run-all-tests |
if err != nil { | ||
return errors.Trace(err) | ||
} | ||
err = variable.SetSessionSystemVar(sessionVars, variable.CollationDatabase, types.NewStringDatum("utf8_unicode_ci")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @XuHuaiyu is any by design in here to using utf8_unicode_ci
instead of utf8_bin
like other place...and can we change this~?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackysp its hard for me to tell which question was being answered. Is this a good change to make?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's a good change.
@@ -164,7 +164,7 @@ var defaultSysVars = []*SysVar{ | |||
{ScopeGlobal, "innodb_max_undo_log_size", ""}, | |||
{ScopeGlobal | ScopeSession, "range_alloc_block_size", "4096"}, | |||
{ScopeGlobal, ConnectTimeout, "10"}, | |||
{ScopeGlobal | ScopeSession, "collation_server", charset.CollationUTF8}, | |||
{ScopeGlobal | ScopeSession, "collation_server", mysql.DefaultCollationName}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found charset.CharsetUTF8, charset.CollationUTF8
was refered in another pkg, e.g. expression/ builtin_*
, does them also need change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it in more places now :)
@gregwebs the question that we want to solve is how abount only modify this PR seems open a door to modify global charset configuration, user maybe use this to set if we just want to support alter column from maybe need PTAL @winkyao |
I think making a change in Otherwise it is simple to validate that only utf8 and utf8mb4 are the allowed charsets for this setting: I can push a commit for that. |
41666dd
to
8883e5d
Compare
8883e5d
to
e8be196
Compare
@@ -557,7 +557,7 @@ func (s *testEvaluatorSuite) TestConv(c *C) { | |||
tp := f.GetType() | |||
c.Assert(tp.Tp, Equals, mysql.TypeVarString) | |||
c.Assert(tp.Charset, Equals, charset.CharsetUTF8) | |||
c.Assert(tp.Collate, Equals, charset.CharsetUTF8) | |||
c.Assert(tp.Collate, Equals, charset.CollationUTF8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bugfix from this PR, although I am not sure what part of this changset fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My pr also fixed it... :)
@gregwebs if we start tidb-server with
can we change |
@gregwebs |
@lysu no, you cannot change it back. That would be an additional feature to make the global variables work, but hopefully an easy change we can add on. Maybe that is needed to make this PR look right? Right now this changeset is focused on being good enough to solve the most common real-world usage where utf8mb4 is used for everything. We could also solve this by just switching our default charset from utf8 -> utf8mb4, is there a reason why we default to utf8? @coocood can you outline the changes to support modify column? But after that it is quite possible for there to be confusion and bugs because the real problem is that the column was supposed to be created with utf8mb4, not utf8. The modify issue is just an artifact of that. But it is useful to properly support modify, and I can test to see if that solves my problem for now. I agree this PR touches a lot for solving just my issue. Maybe you are thinking of a more elegant way of doing it? However, I think this change is necessary at some point to have proper MySQL compatibility, particularly if we add some of the related global variable setting. |
This will also become an issue for MySQL 8.0 compatibility, where utf8mb4 (not utf8) is the default character set. Some applications expect/depend on utf8mb4. |
Maybe instead of flags for setting charset, we should have |
Where possible, I think it's better to target a new GA as 8.0 compatible. Having modes can hurt usability, and users rarely change from defaults. |
I think we should separate UX from internal implementation. Internally, I doubt TiDB can afford to have to backport fixes to an old 5.7 only release and we will want one system that supports both modes, particularly as we start to add 8.0 compatibility but have not yet completed it. This could be exposed to the user via a single config option (compatibility mode), a collection of config options (config files), or two different binary builds with different default settings built-in. With any of these options, at some point we can switch to 8.0 as default (in the binary case that is more about naming the binaries). |
@@ -337,7 +352,7 @@ func overrideConfig() { | |||
} | |||
} | |||
|
|||
func validateConfig() { | |||
func validateConfig() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think It's better to print the error log then exit instead of return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its really problematic to have exits anywhere other then the main function. The linters warn against this. In this case it is a private function in main.go, but still any function which exits is not testable.
I agree that being in an intermediate state where some return error and some still exit is not great, but I don't want to start cleaning up other code in this PR. Although it is still possible I might be forced to when I start writing test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see. But it still looks so strange. We need to clean up the other code anyway, but may not in this PR. It's better to add a TODO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
} | ||
|
||
func setGlobalVars() { | ||
func setGlobalVars() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also avoid this return value.
I am still hoping for agreement that this PR can be merged once the quality is good enough. Once agreed, I can write tests (write down any specific test cases you have in mind here). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It seems there is no need to modify the ddl package, so I remove the label |
I will remove the command line option to change the default charset. We need to just switch the default to utf8mb4, which is the correct MySQL terminology. |
Maybe we should merge #7965 first? |
@@ -81,19 +80,19 @@ func inferType4ControlFuncs(lhs, rhs *types.FieldType) *types.FieldType { | |||
} | |||
} | |||
if types.IsNonBinaryStr(lhs) && !types.IsBinaryStr(rhs) { | |||
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0 | |||
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = mysql.DefaultCharset, mysql.DefaultCollationName, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change to mysql.DefaultCharset, mysql.DefaultCollationName. This type is string, so the charset is definite utf8mb4(after my pr merged.) and my PR will change here to charset.CharsetUTF8MB4, charset.CollationUTF8MB4
@@ -557,7 +557,7 @@ func (s *testEvaluatorSuite) TestConv(c *C) { | |||
tp := f.GetType() | |||
c.Assert(tp.Tp, Equals, mysql.TypeVarString) | |||
c.Assert(tp.Charset, Equals, charset.CharsetUTF8) | |||
c.Assert(tp.Collate, Equals, charset.CharsetUTF8) | |||
c.Assert(tp.Collate, Equals, charset.CollationUTF8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My pr also fixed it... :)
@gregwebs I will support change any charset to utf8 or utf8mb4 later, I think it will solve your problem. |
if mysql.HasBinaryFlag(lhs.Flag) || !types.IsNonBinaryStr(rhs) { | ||
resultFieldType.Flag |= mysql.BinaryFlag | ||
} | ||
} else if types.IsNonBinaryStr(rhs) && !types.IsBinaryStr(lhs) { | ||
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0 | ||
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = mysql.DefaultCharset, mysql.DefaultCollationName, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Allow setting of the default collation on the CLI. This allows setting -collation-server utf8mb4_unicode_ci, which is not correct, but could enhance compatibility
@winkyao should we go forward with this PR? We decided that we don't need to let the user set the charset at startup if we just default to utf8mb4. |
I am just going to close this for now. We can revisit this if we have a concrete use case for collation support, etc. |
What problem does this PR solve?
This adds MySQL compatibility for setting a default charset (and collation).
To some extent it is just a work around for the lack of charset support that makes existing applications that set utf8mb4 fail. Here is a bug report when someone tried to store data from Grafana.
I am a unsure if the current approach is good, so I haven't followed through with writing tests yet. I would be happy if the TiDB team took this over and used a better approach.
What is changed and how it works?
I am now running tidb with the arguments:
My understaning is that this doesn't change the behavior of TiDB (it is
equivalent to the current configuration).
However, it allows TiDB to be compatible with standard MySQL setups.
Previously, I would see this failure message when using an application:
Using the above command line flags, the application now works.
There is still more to do to have proper charset support,
but this is enough to fix my use case.
Check List
Tests
As per the above description, I manually tested that this fixed the issue with the application I am running (prisma).
Code changes
This change creates backwards compatible additions/enhancements.
Side effects
Related changes