Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate relying on the current implementation of the database object name parser #6592

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

morozov
Copy link
Member

@morozov morozov commented Nov 12, 2024

Q A
Type deprecation

Currently, when building DDL, the DBAL quotes only the identifies that are explicitly quoted or are reserved keywords. The reason for not quoting all of them is that the databases that respect the SQL-92 standard normalize the case of unquoted identifiers but preserve the case of the quoted ones. If the DBAL quoted all identifiers, then an object containing lower-case letters in its name would have to be referenced as quoted in SQL (e.g, "id") on Oracle and IBM DB2. Similarly, a column containing upper-case letters would have to be quoted on PostgreSQL.

The current design has the following major downsides:

  1. The need to maintain lists of reserved keywords for supported platforms.
  2. Inconsistency of the resulting object names on Oracle and IBM DB2 depending on whether the name is a reserved keyword. For instance, an unquoted column named "id" will be created in the database as upper-case ID but unquoted (and implicitly quoted) "select" will be created as lower-case select.
  3. Lack of auto-quoting the names that aren't keywords but need to be quoted (e.g. the ones that begin with a digit).
  4. Potential security issues.

Additionally, the current object name parser is rudimentary and exposes various issue (see #4357 for examples).

While the solution proposed in #4772 is hard to pull off at once, this change should be a manageable first step. The implementation is drafted in #6589.

The logic is the following:

  1. Parse the name according to simplified SQL syntax (see the details in the parser's PHPDoc).
  2. Normalize unquoted identifiers according to the rules of the destination database platforms.
  3. Consistently quote all identifiers. At this point, knowing whether the identifier is a reserved keyword is unnecessary.

The upgrade path:

  1. Compare the results of parsing the name using the current and the new implementations.
  2. Trigger a deprecation notice in case of mismatch. Besides incorrect parsing of quoted identifiers containing dots, all other deprecations should be possible to address by properly formatting the object name.

The overhead of using two parsers should be insignificant since schema management is usually not a hot path.

@morozov morozov force-pushed the new-asset-name-parser branch from 0fc75bb to 4e67695 Compare November 12, 2024 02:04
@morozov morozov force-pushed the new-asset-name-parser branch from 4e67695 to 358c6ea Compare November 12, 2024 02:07
@morozov morozov marked this pull request as ready for review November 12, 2024 02:17
@morozov morozov added this to the 4.3.0 milestone Nov 12, 2024
@morozov morozov requested review from greg0ire and derrabus November 12, 2024 02:17
@morozov morozov merged commit 43d8490 into doctrine:4.3.x Nov 13, 2024
90 of 91 checks passed
@morozov morozov deleted the new-asset-name-parser branch November 13, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants