*: change default charset and collation from 'utf8 utf8_bin' to 'utf8mb4 utf8mb4_bin' #7965
types/field_type.go
Outdated
@@ -365,8 +365,8 @@ func DefaultTypeForValue(value interface{}, tp *FieldType) {
 // TODO: tp.Flen should be len(x) * 3 (max bytes length of CharsetUTF8)
 tp.Flen = len(x)
 tp.Decimal = UnspecifiedLength
-tp.Charset = mysql.DefaultCharset
-tp.Collate = mysql.DefaultCollationName
+tp.Charset = charset.CharsetUTF8MB4
Can we use GetDefaultCharsetAndCollate?
This change makes sense to me for this PR. My PR will change it back though :)
LGTM. However, I think we are treading on very dangerous ground with this change until we just make utf8mb4 the default: #7757
@shenli PTAL
The CI will be fixed after pingcap/parser#13 is merged.
/run-all-tests
/run-all-tests -tidb-test=pr/646
/run-integration-ddl-test -tidb-test=pr/646
/run-sqllogic-test -tidb-test=pr/646
/run-integration-ddl-test -tidb-test=pr/646
LGTM
expression/aggregation/descriptor.go
Outdated
-a.RetTp.Charset = charset.CharsetUTF8
-a.RetTp.Collate = charset.CollationUTF8
+a.RetTp.Charset = charset.CharsetUTF8MB4
+a.RetTp.Collate = charset.CollationUTF8MB4
s/charset.CharsetUTF8MB4/mysql.DefaultCharset/
s/charset.CollationUTF8MB4/mysql.DefaultCollationName/
or: charset.GetDefaultCharsetAndCollate()
expression/builtin.go
Outdated
 Flag: mysql.BinaryFlag,
 }
 }
 if mysql.HasBinaryFlag(fieldType.Flag) && fieldType.Tp != mysql.TypeJSON {
 fieldType.Charset, fieldType.Collate = charset.CharsetBin, charset.CollationBin
 } else {
-fieldType.Charset, fieldType.Collate = charset.CharsetUTF8, charset.CharsetUTF8
+fieldType.Charset, fieldType.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
how about: charset.GetDefaultCharsetAndCollate()
expression/builtin.go
Outdated
@@ -199,7 +199,7 @@ func (b *baseBuiltinFunc) getRetTp() *types.FieldType {
 b.tp.Tp = mysql.TypeMediumBlob
 }
 if len(b.tp.Charset) <= 0 {
-b.tp.Charset, b.tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
+b.tp.Charset, b.tp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
ditto
expression/builtin_cast.go
Outdated
-Charset: charset.CharsetUTF8,
-Collate: charset.CollationUTF8,
+Charset: charset.CharsetUTF8MB4,
+Collate: charset.CollationUTF8MB4,
ditto
expression/builtin_cast.go
Outdated
@@ -1741,7 +1741,7 @@ func WrapWithCastAsString(ctx sessionctx.Context, expr Expression) Expression {
 argLen = mysql.MaxIntWidth
 }
 tp := types.NewFieldType(mysql.TypeVarString)
-tp.Charset, tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
+tp.Charset, tp.Collate = charset.CharsetUTF8MB4, charset.CollationUTF8MB4
ditto
@@ -141,15 +141,15 @@ func newBaseBuiltinFuncWithTp(ctx sessionctx.Context, args []Expression, retType
 Tp: mysql.TypeJSON,
 Flen: mysql.MaxBlobWidth,
 Decimal: 0,
-Charset: charset.CharsetUTF8,
-Collate: charset.CollationUTF8,
+Charset: mysql.DefaultCharset,
Why not charset.CharsetUTF8MB4?
-Charset: charset.CharsetUTF8,
-Collate: charset.CollationUTF8,
+Charset: mysql.DefaultCharset,
+Collate: mysql.DefaultCollationName,
Why not charset.CollationUTF8MB4?
expression/builtin.go
Outdated
 Flag: mysql.BinaryFlag,
 }
 }
 if mysql.HasBinaryFlag(fieldType.Flag) && fieldType.Tp != mysql.TypeJSON {
 fieldType.Charset, fieldType.Collate = charset.CharsetBin, charset.CollationBin
 } else {
-fieldType.Charset, fieldType.Collate = charset.CharsetUTF8, charset.CharsetUTF8
+fieldType.Charset, fieldType.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
ditto
expression/builtin.go
Outdated
@@ -199,7 +199,7 @@ func (b *baseBuiltinFunc) getRetTp() *types.FieldType {
 b.tp.Tp = mysql.TypeMediumBlob
 }
 if len(b.tp.Charset) <= 0 {
-b.tp.Charset, b.tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
+b.tp.Charset, b.tp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
ditto
 if mysql.HasBinaryFlag(rhs.Flag) || !types.IsNonBinaryStr(lhs) {
 resultFieldType.Flag |= mysql.BinaryFlag
 }
 } else if types.IsBinaryStr(lhs) || types.IsBinaryStr(rhs) || !evalType.IsStringKind() {
 types.SetBinChsClnFlag(resultFieldType)
 } else {
-resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0
+resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = mysql.DefaultCharset, mysql.DefaultCollationName, 0
ditto
@@ -164,7 +164,7 @@ func (c *caseWhenFunctionClass) getFunction(ctx sessionctx.Context, args []Expre
 }
 fieldTp.Decimal, fieldTp.Flen = decimal, flen
 if fieldTp.EvalType().IsStringKind() && !isBinaryStr {
-fieldTp.Charset, fieldTp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
+fieldTp.Charset, fieldTp.Collate = charset.CharsetUTF8MB4, charset.CollationUTF8MB4
/cc @XuHuaiyu, use mysql.DefaultCharset or charset.CharsetUTF8MB4?
Will mysql.DefaultCharset change? If so, we should use utf8mb4.
@XuHuaiyu, DefaultCharset is better. It hides the implementation details, so if we change the default charset again, the code modification can be minimized.
No, this place should be utf8mb4. If the charset is not set, it can be the default charset, but here it should definitely be utf8mb4.
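One way to read the distinction drawn in this thread, as a hedged sketch rather than TiDB's actual code: an unset charset falls back to the configurable default, while a computed string result is pinned to utf8mb4 no matter what the default is. Every name below (fieldType, defaultCharset, fillUnsetCharset, pinStringResult) is a hypothetical stand-in, and latin1 is used only to simulate a later change of the default.

```go
package main

import "fmt"

// Hypothetical stand-ins for charset.CharsetUTF8MB4 and charset.CollationUTF8MB4.
const (
	charsetUTF8MB4   = "utf8mb4"
	collationUTF8MB4 = "utf8mb4_bin"
)

// Hypothetical stand-ins for mysql.DefaultCharset and mysql.DefaultCollationName.
// They are variables here only so the example can simulate a future change.
var (
	defaultCharset   = charsetUTF8MB4
	defaultCollation = collationUTF8MB4
)

// fieldType is a simplified stand-in for types.FieldType.
type fieldType struct {
	Charset string
	Collate string
}

// fillUnsetCharset applies the default only when no charset was set.
func fillUnsetCharset(ft *fieldType) {
	if ft.Charset == "" {
		ft.Charset, ft.Collate = defaultCharset, defaultCollation
	}
}

// pinStringResult always uses utf8mb4, regardless of the default.
func pinStringResult(ft *fieldType) {
	ft.Charset, ft.Collate = charsetUTF8MB4, collationUTF8MB4
}

func main() {
	// Simulate a later change of the default (latin1 is only an example value).
	defaultCharset, defaultCollation = "latin1", "latin1_bin"

	unset := &fieldType{}
	fillUnsetCharset(unset)
	fmt.Println(unset.Charset, unset.Collate)

	result := &fieldType{}
	pinStringResult(result)
	fmt.Println(result.Charset, result.Collate)
}
```

With this split, a future change of the default only affects the fallback path; the pinned path has to be edited deliberately.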
@zz-jason @crazycs520 PTAL
executor/show.go
Outdated
@@ -618,7 +618,7 @@ func (e *ShowExec) fetchShowCreateTable() error {
 buf.WriteString(") ENGINE=InnoDB")
 charsetName := tb.Meta().Charset
 if len(charsetName) == 0 {
-charsetName = charset.CharsetUTF8
+charsetName = charset.CharsetUTF8MB4
s/charset.CharsetUTF8MB4/mysql.DefaultCharset/
@@ -81,19 +81,19 @@ func inferType4ControlFuncs(lhs, rhs *types.FieldType) *types.FieldType {
 }
 }
 if types.IsNonBinaryStr(lhs) && !types.IsBinaryStr(rhs) {
-resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0
+resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8MB4, charset.CollationUTF8MB4, 0
Use mysql.DefaultCharset, mysql.DefaultCollationName instead.
-cs = mysql.DefaultCharset
-cl = mysql.DefaultCollationName
+cs = charset.CharsetUTF8MB4
+cl = charset.CollationUTF8MB4
cs, cl = charset.GetDefaultCharsetAndCollate()
ditto
-mCharset = mysql.DefaultCharset
-mCollation = mysql.DefaultCollationName
+mCharset = charset.CharsetUTF8MB4
+mCollation = charset.CollationUTF8MB4
charset.GetDefaultCharsetAndCollate()
This should definitely be utf8mb4.
server/server.go
Outdated
@@ -153,6 +153,7 @@ func NewServer(cfg *config.Config, driver IDriver) (*Server, error) {
 var err error
 if cfg.Socket != "" {
 if s.listener, err = net.Listen("unix", cfg.Socket); err == nil {
+// job.SnapshotVer == 0 means
?
Rest LGTM
LGTM
What problem does this PR solve?
Fix #7920.
Change the TiDB default charset and collation to "utf8mb4 utf8mb4_bin". TiDB actually treats all data as utf8mb4, but the previous default charset was "utf8". Inserting a 4-byte Unicode string into TiDB succeeds, but if we use mysqldump to restore the data into MySQL, the charset will be utf8 and MySQL will report an error:
ERROR 1366 (HY000): Incorrect string value: '\xF0\xA4\x8B\xAE' for column 'v' at row 1
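The rejected value is the 4-byte UTF-8 sequence \xF0\xA4\x8B\xAE. MySQL's utf8 charset stores at most 3 bytes per character, so such a value only fits in utf8mb4. A minimal Go sketch of that check, using only the standard library; the function name needsUTF8MB4 is hypothetical and not part of TiDB:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsUTF8MB4 reports whether s contains a character whose UTF-8
// encoding is 4 bytes long, which MySQL's 3-byte utf8 charset cannot store.
// Hypothetical helper for illustration; not a TiDB API.
func needsUTF8MB4(s string) bool {
	for _, r := range s {
		if utf8.RuneLen(r) == 4 {
			return true
		}
	}
	return false
}

func main() {
	// The byte sequence from the error message in the PR description.
	fmt.Println(needsUTF8MB4("\xF0\xA4\x8B\xAE"))
	// ASCII text fits in MySQL's utf8.
	fmt.Println(needsUTF8MB4("abc"))
}
```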
How does it work?
Change mysql.DefaultCharset from UTF8Charset to UTF8MB4Charset.
Change mysql.DefaultCollationName from UTF8DefaultCollation to UTF8MB4DefaultCollation.
Find the usages of charset.CharsetUTF8 and CollationUTF8, and modify them to charset.CharsetUTF8MB4 or mysql.DefaultCharset.
Then fix the corresponding test cases.
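The first two steps amount to changing a pair of constants in one place; call sites that read the defaults through a single accessor pick up the new values without further edits. A minimal sketch, assuming simplified local stand-ins for mysql.DefaultCharset, mysql.DefaultCollationName, and the GetDefaultCharsetAndCollate helper mentioned in review (their real definitions are not shown in this thread):

```go
package main

import "fmt"

// Simplified stand-ins for the constants named in the PR description.
const (
	utf8MB4Charset          = "utf8mb4"
	utf8MB4DefaultCollation = "utf8mb4_bin"

	// Before this PR, the defaults were the utf8 equivalents.
	defaultCharset       = utf8MB4Charset
	defaultCollationName = utf8MB4DefaultCollation
)

// getDefaultCharsetAndCollate is a hypothetical accessor modeled on the
// helper suggested in review; it returns the default charset and collation.
func getDefaultCharsetAndCollate() (string, string) {
	return defaultCharset, defaultCollationName
}

func main() {
	cs, cl := getDefaultCharsetAndCollate()
	fmt.Printf("DEFAULT CHARSET=%s COLLATE=%s\n", cs, cl)
}
```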
Check List
Tests
Code changes
Related changes