-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema validation error (and diff error) when collation is set #5338
Comments
I have a similar issue that seems to happen when you define a When the column specifies the same as the Table defaults, they are removed by Doctrine\DBAL\Platforms\MySQL\Comparator::normalizeColumns. When comparing them to the database schema, it's always different that causes the diff tool to propose an update. I think this normalization is naive. When a user explicitly added the values, they should not be removed (normalized). You could say: "Why define charset/collate on the column when it's the same as the table default?" Well, if you are coming from a legacy application you often have old tables that are still in a different charset/collation. The only way to get JoinColumns to work, is when the foreign column and local column have the same charset/collation. That used to work when defining the charset/collation on the column that is referenced, it would then use that for the other column as well. |
Found a workaround: always define -#[ORM\Column(type: 'string', length: 20, options: ['collation' => 'ascii_bin'])]
+#[ORM\Column(type: 'string', length: 20, options: ['charset' => 'ascii', 'collation' => 'ascii_bin'])] And the garbage output of It looks like the diff algorithm fails to infer the |
Unfortunately, that workaround does not work in my case. I am using I have the following ORM-Definition in my Application entity: /**
* @ORM\Column(type="string", nullable=false, options={"collation"="utf8mb4_unicode_ci"})
*/
private string $status; This results in the following generated diff: // up
$this->addSql('ALTER TABLE application CHANGE status status VARCHAR(255) NOT NULL COLLATE `utf8mb4_unicode_ci`');
// down
$this->addSql('ALTER TABLE application CHANGE status status VARCHAR(255) NOT NULL'); If I apply the workaround and change the ORM definition to: +@ORM\Column(type="string", nullable=false, options={"charset"="utf8mb4", "collation"="utf8mb4_unicode_ci"})
-@ORM\Column(type="string", nullable=false, options={"collation"="utf8mb4_unicode_ci"}) The diff tool generates the same faulty migrations, but with the added charset definitions. // up
+$this->addSql('ALTER TABLE application CHANGE status status VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_unicode_ci`');
-$this->addSql('ALTER TABLE application CHANGE status status VARCHAR(255) NOT NULL COLLATE `utf8mb4_unicode_ci`');
// down
$this->addSql('ALTER TABLE application CHANGE status status VARCHAR(255) NOT NULL'); |
Note that it would help more if anyone affected by this issue could provide a way to reproduce it instead of exchanging workarounds. |
@BenMorel already set up a reproduction example and explained it in the initial post:
|
@Dherlou the project in the initial post reproduces the issue by using quite some dependencies that the DBAL doesn't depend on. In order to address the issue in the DBAL, we need to reproduce it by using only the DBAL APIs. See #5338 (comment). |
@morozov Per your request I created a reproduction example repo: https://github.com/Dherlou/doctrine-collation-bug-repro As far as I understand it, the
I think that the case in bold is not implemented in DBAL. If I leave out the charset definition, it is detected as a diff, because the unspecified charset in my column definition (doesn't matter if it comes from ORM or directly from the DBAL-only code) is not equal to the current charset of the database table (in my case utf8mb4). Therefore a diff like: ALTER TABLE Test CHANGE name name VARCHAR(255) NOT NULL COLLATE `utf8mb4_unicode_ci` without a charset definition (= "null") is generated. If it is executed, it doesn't change the table definition, because the collation is already the same and a charset change is not specified. This way it can be run in an endless loop of |
Alright, so this is the reproducer from the test repo: <?php
use Doctrine\DBAL\Tools\Console\ConnectionProvider;
use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Types\Types;
require 'vendor/autoload.php';
$connection = /* get connection */;
$connection->executeQuery('CREATE TABLE IF NOT EXISTS Test (id INT AUTO_INCREMENT NOT NULL, name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` NOT NULL, PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE `utf8_unicode_ci` ENGINE = InnoDB;');
$schemaManager = $connection->createSchemaManager();
$fromSchema = $schemaManager->createSchema();
$toSchema = new Schema([], [], $schemaManager->createSchemaConfig());
$table = $toSchema->createTable('test');
$table->addColumn(
'id',
Types::INTEGER,
[
'autoincrement' => true
]
);
$table->setPrimaryKey(["id"]);
$table->addColumn(
'name',
Types::STRING,
[
'length' => 255,
'notnull' => true,
'platformOptions' => [
'version' => false
],
'customSchemaOptions' => [
// 'charset' => 'utf8mb4', // this is the workaround. uncommenting it results in no diff
'collation' => 'utf8mb4_unicode_ci'
]
]
);
$up = $schemaManager->createComparator()->compareSchemas($fromSchema, $toSchema);
$diffSql = $up->toSql($connection->getDatabasePlatform());
echo("Up Diffs:\n");
var_dump($diffSql);
$down = $schemaManager->createComparator()->compareSchemas($toSchema, $fromSchema);
$diffSql = $down->toSql($connection->getDatabasePlatform());
echo("Down Diffs:\n");
var_dump($diffSql); |
So the charset is specified in the column definition in SQL: name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` And the charset is not specified in the column definition in the DBAL: $table->addColumn(
'name',
Types::STRING,
[
'length' => 255,
'notnull' => true,
'platformOptions' => [
'version' => false
],
'customSchemaOptions' => [
// 'charset' => 'utf8mb4', // this is the workaround. uncommenting it results in no diff
'collation' => 'utf8mb4_unicode_ci'
]
]
); The resulting migration up will remove the charset (specified → not specified): ALTER TABLE Test CHANGE name name VARCHAR(255) NOT NULL COLLATE `utf8mb4_unicode_ci` The resulting migration down will add the charset (not specified → specified): ALTER TABLE test CHANGE name name VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_unicode_ci` What is the problem? Why do you consider specifying the charset a workaround? |
As far as I understand the general concept, a charset is the encoding of characters (i.e. how they are stored as bytes) whereas the collations are the ordering/sorting rules for their respective charset. A collation depends on a charset and cannot work without it. Therefore (at least in the MySQL/MariaDB world), it is not possible to "remove the charset". Specifying the charset is optional and therefore my example DBAL/ORM code does not specify a charset explicitly. If it is not specified, it is inferred from the provided collation (in this example: I find it hard to differentiate, whether this is an actual "bug" or just a "missing vendor-specific convenience feature", but with the current state of implementation it is required to explicitly specify the charset (in contrast to the MySQL specification where it is optional) to have DBAL work as expected. |
My understanding of the meaning of the charset/collation is the same as yours. By "removing" I mean removing from the definition, not from the column as such.
Why do you specify it in SQL then? Will the issue still persist if it's omitted in the SQL as well? |
Fair point. I removed the charset definition from the table creation SQL statement. (It also still had a charset/collation of |
Here's a simplified reproducer w/o SQL: <?php
use Doctrine\DBAL\Schema\Table;
use Doctrine\DBAL\Tools\Console\ConnectionProvider;
use Doctrine\DBAL\Types\Types;
require 'vendor/autoload.php';
$connection = /* get connection */;
$table = new Table('test');
$table->addColumn(
'name',
Types::STRING,
[
'length' => 255,
'customSchemaOptions' => [
'collation' => 'utf8mb4_unicode_ci',
]
]
);
$connection->executeStatement('DROP TABLE IF EXISTS test');
$schemaManager = $connection->createSchemaManager();
$schemaManager->createTable($table);
$t2 = $schemaManager->listTableDetails('test');
$diff = $schemaManager->createComparator()->diffTable($t2, $table);
var_dump($connection->getDatabasePlatform()->getAlterTableSQL($diff));
// array(1) {
// [0]=>
// string(84) "ALTER TABLE test CHANGE name name VARCHAR(255) NOT NULL COLLATE `utf8mb4_unicode_ci`"
// }
Yeah, that's what's happening. We probably need to auto-detect the charset if the collation is specified but the charset isn't using something from the documentation. For instance, SELECT CHARACTER_SET_NAME
FROM information_schema.COLLATIONS
WHERE COLLATION_NAME = 'utf8mb4_unicode_ci';
-- utf8mb4 It's possible that the same issue will exist if the collation is also omitted but it's inherited from some defaults as it works on SQL Server. |
The issue should be fixed in the |
The new test fails on |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Bug Report
Not sure if this belongs here or to ORM / Migrations, but this looks directly related to #5322.
Summary
When an entity has a
collation
set for astring
column:doctrine:schema:validate
always reports:and
doctrine:migrations:diff
generates the same migration again and again, even after executing it.How to reproduce
Small reproduce project: https://github.com/BenMorel/doctrine-bug-2022-03-29
DATABASE_URL
in.env
or.env.local
bin/console doctrine:database:create
bin/console doctrine:migrations:diff
bin/console doctrine:migrations:migrate -n
bin/console doctrine:schema:validate
At this point, running
doctrine:migrations:diff
again generates the same migration again and again:The text was updated successfully, but these errors were encountered: