MDEV-32336 deb default config - use collation-server = utf8mb4_uca1400_ai_ci #2775

Merged

Conversation

grooverdan
Member

  • The Jira issue number for this PR is: MDEV-32336

Description

utf8mb4_general_ci has been outdated for a while. Let's use our modern standard.

How can this PR be tested?

Run my_print_defaults --mysqld after installing on Debian.
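
A minimal sketch of that check, assuming a POSIX shell and that my_print_defaults is on the PATH after installation. The helper name show_collation_defaults is hypothetical, and matching character-set-collations alongside collation-server is an assumption based on the utf8mb4=uca1400_ai_ci notation mentioned later in this thread:

```shell
# Hypothetical helper: list only the collation-related options that the
# server would read from its option files.
show_collation_defaults() {
  # --mysqld reads the same option groups as the server binary.
  # The pattern accepts both dash and underscore spellings of each option.
  my_print_defaults --mysqld |
    grep -E -- '^--(collation[-_]server|character[-_]set[-_]collations)='
}
```

Calling show_collation_defaults on a freshly installed system prints whichever of these options the packaged configuration sets. It prints nothing and returns non-zero if neither option is present.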

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • [] This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@grooverdan grooverdan added the "MariaDB Foundation" label (Pull requests created by MariaDB Foundation) Oct 1, 2023
Contributor

@ottok ottok left a comment

Debian migrated to UTF-8 by shipping Debian defaults that override the upstream defaults (latin1), because the upstream defaults seemed outdated and unsuitable for most users. According to https://mariadb.com/kb/en/server-system-variables/#character_set_server and https://mariadb.com/kb/en/server-system-variables/#collation_server this still seems to be the case.

Would it not be the best solution to simply remove these customizations in debian/ and default to upstream server defaults, and ensure upstream has modern and sensible values?

If changing upstream default is not an option, the change in debian/ should be properly documented. The Jira https://jira.mariadb.org/browse/MDEV-32336 does not explain why the new default should be specifically utf8mb4_uca1400_ai_ci nor does the commit message explain why this change should be done to this value:

MDEV-32336 deb default config - use collation-server =
 utf8mb4_uca1400_ai_ci

utf8mb4_general_ci has been outdated for a while. Lets use our modern
standard.

See tips 2 and 3 on writing a good commit message. The commit message should say something about why utf8mb4_uca1400_ai_ci is the best value and why this change is done now in 11.3 and not, say, in 11.4 as https://jira.mariadb.org/browse/MDEV-25829 says.

@grooverdan
Member Author

Good point on the commit message. Being newer and an actual standard are the general criteria; however, I'll find some better words.

Looking back at why utf8mb4_general_ci was used: it originated as a utf8 comment from 2012 in 438ed04, and in 7c2079f it was uncommented and updated to utf8mb4.

I'm not sure 11.4 is a genuine target version for MDEV-25829, but I hadn't looked for it either.

I'd certainly welcome better upstream defaults too.

11.3 was chosen because only non-packaged releases have been done on it, so I was assuming it's still compatible and not causing packaging regressions.

@grooverdan grooverdan force-pushed the bb-11.3-MDEV-32336-deb-collation-server branch from f1a83e2 to 410f86b Compare October 5, 2023 14:32
@illuusio
Contributor

illuusio commented Oct 10, 2023

What is the real difference between utf8mb4_uca1400_ai_ci and utf8mb4_general_ci? Is it just a newer standard, or is it that all the pages are supported?

@grooverdan
Member Author

Mainly from https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci - utf8mb4_general_ci isn't a standard, it was just a (slightly) quicker (and dirty) implementation.
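
One way to see the difference on a given server is to compare the same pair of strings under both collations. The following is a rough sketch, not part of this PR: the helper name collation_compare_sql is hypothetical, it assumes the arguments contain no single quotes, and it only generates the SQL so that nothing is assumed about the result.

```shell
# Hypothetical helper: emit SQL that tests two strings for equality
# under each collation. Arguments must not contain single quotes.
collation_compare_sql() {
  printf "SELECT '%s' = '%s' COLLATE utf8mb4_general_ci AS general_ci,\n" "$1" "$2"
  printf "       '%s' = '%s' COLLATE utf8mb4_uca1400_ai_ci AS uca1400_ai_ci;\n" "$1" "$2"
}
```

For example, collation_compare_sql 'ß' 'ss' | mariadb --default-character-set=utf8mb4 would run the comparison, assuming a running server recent enough to provide the uca1400 collations and a client that is allowed to connect.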

@grooverdan grooverdan force-pushed the bb-11.3-MDEV-32336-deb-collation-server branch from 410f86b to 2194317 Compare October 17, 2023 04:11
@ottok
Contributor

ottok commented Oct 19, 2023

You probably want to update the git commit message with the text you posted in the PR comments here, and use a title something like "MDEV-32336: Use utf8mb4_uca1400_ai_ci as default collation in Debian"

utf8mb4_general_ci has been outdated for a while and contained loosely
standardized collations.

UCA-14.0.0 has a more defined collation with multiple benefit that new
users may not immediately consider, or may assume to be default.

By defining default collation for utf8mb4 to be uc1400_ai_ci newly
created tables will have a modern standard collation.
@grooverdan grooverdan force-pushed the bb-11.3-MDEV-32336-deb-collation-server branch from 2194317 to ae122c7 Compare October 19, 2023 04:09
@grooverdan
Member Author

Acceptable?

@grooverdan grooverdan merged commit 0e8dfcf into MariaDB:11.3 Oct 19, 2023
12 of 13 checks passed
@grooverdan grooverdan deleted the bb-11.3-MDEV-32336-deb-collation-server branch October 19, 2023 21:54
@illuusio
Contributor

Is utf8mb4=uca1400_ai_ci a correct new notation that I don't know about? I suppose so. I have to admit it looks very weird, but if it works then I'm cool with it.

Labels
MariaDB Foundation Pull requests created by MariaDB Foundation
3 participants