SQLServer - fix unicode support for text type - nvarchar(max) instead of varchar(max) #6421
base: 4.2.x
I initially thought that VARCHAR/CHAR were deprecated in SQLServer, but reading the documentation again, this is far from true:
Only TEXT, NTEXT and IMAGE are deprecated and will be removed. I've created a small test script:
CREATE DATABASE test2 COLLATE Latin1_General_100_BIN2
GO
USE test2
CREATE TABLE collation_test (
id INT NOT NULL IDENTITY,
name1 VARCHAR(100) NOT NULL,
name2 NVARCHAR(100) NOT NULL
)
INSERT INTO collation_test (name1, name2) VALUES (
'UmlautsÄÜÖß',
'UmlautsÄÜÖß'
)
SELECT LEN(ct.name1), LEN(ct.name2), DATALENGTH(ct.name1), DATALENGTH(ct.name2) FROM collation_test ct
GO
CREATE DATABASE test COLLATE Latin1_General_100_BIN2_UTF8
GO
USE test
CREATE TABLE collation_test (
id INT NOT NULL IDENTITY,
name1 VARCHAR(100) NOT NULL,
name2 NVARCHAR(100) NOT NULL
)
INSERT INTO collation_test (name1, name2) VALUES (
'UmlautsÄÜÖß',
'UmlautsÄÜÖß'
)
SELECT LEN(ct.name1), LEN(ct.name2), DATALENGTH(ct.name1), DATALENGTH(ct.name2) FROM collation_test ct
GO
Result:
So even though I have followed the initial issue #2323 for years now and use NVARCHAR() everywhere, I think this shouldn't be merged, as it is based on outdated information.
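The gap the script probes between LEN and DATALENGTH can be approximated outside the database. The following Python sketch is not SQL Server output; it only encodes the same test string with Python's codecs, assuming `cp1252` stands in for the code page of a Latin1 collation, `utf-8` for VARCHAR under a `_UTF8` collation, and `utf-16-le` for NVARCHAR storage.

```python
# Illustrative only: approximates storage sizes with Python codecs,
# it does not query SQL Server.
TEST_STRING = "UmlautsÄÜÖß"

# Assumed stand-ins for the storage encodings discussed above.
ENCODINGS = {
    "VARCHAR, Latin1 collation (assumed cp1252)": "cp1252",
    "VARCHAR, _UTF8 collation (assumed utf-8)": "utf-8",
    "NVARCHAR (assumed utf-16-le)": "utf-16-le",
}


def byte_lengths(text: str) -> dict[str, int]:
    """Return the encoded size in bytes of `text` for each assumed encoding."""
    return {label: len(text.encode(codec)) for label, codec in ENCODINGS.items()}


if __name__ == "__main__":
    print("characters:", len(TEST_STRING))
    for label, size in byte_lengths(TEST_STRING).items():
        print(f"{label}: {size} bytes")
```

For this string the codecs give 11 characters, and 11, 15 and 22 bytes for the three assumed encodings. Under these assumptions, a UTF-8 VARCHAR would hold the umlauts in fewer bytes than NVARCHAR.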
@fabiang thanks for your comment. Just to summarise:
Is that right? I am not sure I can see your point - if
What then about users who are stuck (for whatever reason) on a version prior to 15.x?
I still use NVARCHAR because I had outdated or wrong information. As of SQLServer 2019 I'll use VARCHAR with the correct collation set. Therefore I think doctrine/dbal should now only use VARCHAR(length)/VARCHAR(MAX) everywhere. And imho support for SQLServer < 15.x should be dropped, instead of having a BC break.
Summary
A very old bug in the SQLServer platform prevents the use of Unicode characters with the "text" type.
Background
Here are previously reported issues and attempts to fix it:
Previously it was unfortunately not classified as a bug - see #5237 (comment) - but let me shed a bit more light here on why it actually IS a bug.
First of all, in SQLServer we have types like:
- varchar (variable length)
- char (fixed length)
- text (deprecated in favour of varchar(max))
But these types do not support Unicode under the traditional (non-UTF-8) collations, so there are the following types:
- nvarchar (variable length)
- nchar (fixed length)
- ntext (deprecated in favour of nvarchar(max))
At some point, when doctrine evolved the SQLServer platform from MSSQL, the change was made, but not in all places - see 8b29ffe:
So it was recognised that we are supposed to use nvarchar in some places, but not ntext in the others.
And it has stayed that way ever since.
Later it was just updated to use varchar(max) instead of text (as text became deprecated): #451
Current situation
Now, on version 4.0.x, we still cannot use a text field with SQLServer for Unicode characters.
The other fields:
dbal/src/Platforms/SQLServerPlatform.php
Line 875 in cbd0e9a
dbal/src/Platforms/SQLServerPlatform.php
Line 890 in cbd0e9a
do use the n version of the columns, but the text one does not:
dbal/src/Platforms/SQLServerPlatform.php
Lines 910 to 913 in cbd0e9a
There is no easy way to use it other than overriding the type to support length=-1 at the type level, so that migrations do not try to create nvarchar(-1) but use nvarchar(max). -1 is what the schema reports back for a max-length column, and the varchar field behaves the same way.
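The length handling described above can be sketched as follows. This is a hypothetical helper written in Python for illustration, not the DBAL implementation: the function name is invented, and treating a missing length the same as -1 is an assumption of this sketch.

```python
from typing import Optional


def nvarchar_declaration(length: Optional[int]) -> str:
    """Build an NVARCHAR column type for a given length.

    Hypothetical helper: a length of -1 (what the schema reports for a
    MAX column, per the description above) or no length at all (an
    assumption of this sketch) maps to NVARCHAR(MAX).
    """
    if length is None or length == -1:
        return "NVARCHAR(MAX)"
    if length < 1:
        raise ValueError(f"invalid column length: {length}")
    return f"NVARCHAR({length})"
```

With this mapping, a column introspected with length -1 round-trips to `NVARCHAR(MAX)` instead of producing the invalid `NVARCHAR(-1)`.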
Additional considerations
Although it would solve the current issue, it's not a perfect solution.
It might even be considered a BC break by some, especially when taking into account the storage differences between varchar and nvarchar types.
Not everyone needs to store unicode characters, but I haven't seen any issues reported that "(var)char should be used instead of n(var)char".
If we want to provide full flexibility, we should be able to choose the exact type we want - either unicode or non-unicode - BUT the default solution should at least be consistent and not a mix of the two.
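One way to picture a consistent default with an explicit opt-out is a single switch that selects the type family for every string-like column. The sketch below is hypothetical Python, not a proposal for the actual DBAL API; the `unicode` flag, the `kind` names and the function name are all assumptions made for illustration.

```python
def sqlserver_string_type(kind: str, unicode: bool = True) -> str:
    """Pick a SQL Server type name for a string-like column.

    `kind` is one of "fixed", "variable" or "text" (names invented for
    this sketch). With unicode=True every kind uses the n-prefixed
    type, so the default never mixes the two families.
    """
    base = {"fixed": "CHAR", "variable": "VARCHAR", "text": "VARCHAR(MAX)"}
    if kind not in base:
        raise ValueError(f"unknown column kind: {kind}")
    return ("N" if unicode else "") + base[kind]
```

The point of the design is that the unicode/non-unicode decision is made once and applied uniformly, rather than being hard-coded differently per column kind.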