Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Charset on a per column basis cannot be set. #2326

Closed
rquadling opened this issue Feb 22, 2016 · 13 comments
Closed

Charset on a per column basis cannot be set. #2326

rquadling opened this issue Feb 22, 2016 · 13 comments

Comments

@rquadling
Copy link
Contributor

https://github.com/doctrine/dbal/blob/master/lib/Doctrine/DBAL/Platforms/AbstractPlatform.php#L2549

returns an empty string, and there are no overrides on a per plaform basis, so, when we get back to https://github.com/doctrine/dbal/blob/master/lib/Doctrine/DBAL/Platforms/AbstractPlatform.php#L2212, the charset is always empty.

@mhujer
Copy link

mhujer commented Nov 8, 2016

Here are links to specific commit, so you don't have to hunt for the lines which has already changed (you can press "Y" and github will change the current URL to the one containing hash)

This returns empty string:

public function getColumnCharsetDeclarationSQL($charset)
{
return '';
}

Called from:

' ' . $this->getColumnCharsetDeclarationSQL($field['charset']) : '';


As for the issue, I used collation instead and it worked for me (MySQL):
@ORM\Column(type="string", options={"collation":"ascii_bin"})
->
application_url VARCHAR(255) NOT NULL COLLATE ascii_bin

@Ocramius
Copy link
Member

Ocramius commented Nov 8, 2016

Possibly related: #2551

@Tobion
Copy link
Contributor

Tobion commented Nov 18, 2017

You can set the collation option on column (see #2919). The collation also afffects the charset. @rquadling can you give an example where you need to use charset but cannot use collation instead? Otherwise I'd consider this problem as Won't fix as there is a more flexible solution already.

@rquadling
Copy link
Contributor Author

I can't remember the exact use case, but the charset was UTF-8 and I wanted to use a different collation (utf8_unicode_ci rather than utf8_general_ci) ... I think. Not exactly sure.

Or, put it another way, whatever the issue was at the time that led me to post this ... we're 21 months on and I've not been held up by this issue. :-/

@Tobion
Copy link
Contributor

Tobion commented Apr 10, 2018

I guess it can be closed then

@JakobEichler
Copy link

This still seems to be an issue, because people like to set use an unique index on e-mail fields with a length of 254 and it will lead with utf8 to this error: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes

Due to an IRC discussion, the error can be avoided with the charset latin1.

@Ocramius
Copy link
Member

can be avoided with the charset latin1.

Please don't do that. Just use utf8_unicode_ci, unless you really are 100% sure of what you are doing.

@JakobEichler
Copy link

JakobEichler commented Dec 21, 2018

We solved it by restricting the length of the e-mail address to 767/4 → 191, because doctrine used utf8mb4.

A smart helper in the #mysql irc @freenode pointed out that an other option to solve the issue is to enable "long prefixes".

In my special case the issue is solved and no further change of doctrine is required.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_large_prefix

@rquadling
Copy link
Contributor Author

I have some info which may or may not be relevant.

In mysql, when you talk about utf8 encodings, you are really using utf8mb3 : https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8.html

And to add to the confusion, utf8mb3 will soon be removed from mysql in preference to utf8mb4 and the alias utf8 repointed to utf8mb4 : https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html

It looks like the best advice is to not use utf8 or utf8mb3 anymore and, instead, use utf8mb4.

But which collation? And which version of the collation as that is now a thing!

Prior to mysql 8, we have decided to use utf8mb4_unicode_520_ci : https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html. That provides (allegedly), the most uptodate version of Unicode, with support for the widest characters (so people can put emojis in their content and have it sorted if used in an indexable column ... yep! .... don't look at my, I'm just the developer, not the user). But we can't set the DB default connection or server collation to this on an RDS Mysql instance.

But in mysql 8, there is now support for later versions. Also, with the existing encodings, you can be case-insensitive (the _ci bit). Now you have accents to deal with. So, accent-insensitive (_ai).

And then there's padding to be taken into account : https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

Things seem to have gotten a whole lot more complicated if you intend to have data from all over the world.

@Tobion
Copy link
Contributor

Tobion commented Apr 10, 2020

@Ocramius Issue can be closed IMO

@gggeek
Copy link

gggeek commented Nov 2, 2020

Maybe tangential to the issue at hand, but what if it was easier to make Doctrine not specify the charset and collation at table creation time and rely on the database default - would it not help in most scenarios?

@morozov
Copy link
Member

morozov commented Feb 17, 2022

Closing based on #2326 (comment). The DBAL will not provide any default options in 4.0 (#4644).

@morozov morozov closed this as completed Feb 17, 2022
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants