Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean table alias generation and generate unique aliases up-front #32784

Closed
roji opened this issue Jan 11, 2024 · 0 comments · Fixed by #32785
Closed

Clean table alias generation and generate unique aliases up-front #32784

roji opened this issue Jan 11, 2024 · 0 comments · Fixed by #32785
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Jan 11, 2024

Table aliases need to be unique in the SQL tree (at least within a given SELECT) Our current table alias management works as follows:

  • SelectExpression has private _usedAliases state that tracks which aliases have already been used.
  • During translation, tables are created with aliases that don't yet take into account aliases used further up. As a result, when a new SQL clause (predicate, ordering...) is added to a SelectExpression, we run a visitor (AliasUniquifier) over the clause, which accepts the containing select's _usedAliases and rewrites aliases recursively inside the clause to make sure they don't conflict.
  • This has the following consequences:
    • This notably means that the same node can get visited multiple times for uniquification, as tree fragments get added to a select, and then that select gets added to another select, and so on.
    • Since we uniquify aliases late - after the table has already been created - the alias is mutable, and AliasUniquifier simply sets it. This is one place where our SQL tree isn't fully immutable, and can cause the usual issues related to mutability.
    • When we added support for UpdateExpression, we forgot to invoke the AliasUniquifier, causing #31208.

We're now exploring ways to simplify the query tree and its processing, and specifically to get rid of TableReferenceExpression and have ColumnExpression just contain its table alias directly as a string. For this to be feasible, the alias on a table cannot change after its constructed, since a column may already be referencing it via its alias (or more precisely, it can be changed, but then we need to recursively look for column in the tree and change them as well). We also generally want to make things more immutable, as well as remove private state from SelectExpression, making it more like a regular expression.

As a result, we should do the following:

  • When instantiating a table, pass an already generated, fully unique alias to it.
    • This allows referencing the table immediately from a column, and removes the need to do alias uniquification in an additional pass every time we add a clause to a SelectExpression.
  • Track used aliases globally in the query, rather than storing that state on SelectExpression.
    • We'll do this by introducing a new SqlAliasManager service, which is owned by QueryCompilationContext (so scoped to the entire compilation). The holds the state of which aliases are already in use, and is responsible for generating unique ones.
    • This also concentrates alias generation in a single, pluggable place - the actual policy for generating aliases (i.e. take the 1st character and lower-case) is currently spread all over the place, including in table expression types. Providers/users will be able to replace this service to plug in new alias generation policies.
  • Tables can get pruned out of the tree because they're not needed. In this case, we currently leave gaps in the alias numbering (#32757)
    • While this in itself isn't a big problem, uniquifying aliases eagerly as proposed above causes even more gaps, since during translation we sometimes translate a fragment and then throw it away (or retranslate it). This causes the alias to be considered "taken", although it never actually made its way into the tree.
    • To make things nice and consecutive, we'll add a post-processing step where SqlAliasManager gets a chance to go over the tree and rewrite, or "finalize" aliases. This will close any numbering gaps, and can also do fancier things in the future, like turn some aliases into wider, multi-character ones for readability rather than just adding numbers (#31134).
    • Note that alias post-processing is fully optional - removing that step would never cause any queries to break, since aliases are already uniquified at the very start. It would just cause uglier/less consecutive aliases to be generated.
roji added a commit to roji/efcore that referenced this issue Jan 11, 2024
roji added a commit to roji/efcore that referenced this issue Jan 11, 2024
roji added a commit to roji/efcore that referenced this issue Jan 11, 2024
@roji roji self-assigned this Jan 14, 2024
@roji roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 14, 2024
roji added a commit to roji/efcore that referenced this issue Jan 15, 2024
roji added a commit that referenced this issue Jan 15, 2024
@ajcvickers ajcvickers added this to the 9.0.0 milestone Jan 17, 2024
@ajcvickers ajcvickers modified the milestones: 9.0.0, 9.0.0-preview1 Jan 31, 2024
@roji roji modified the milestones: 9.0.0-preview1, 9.0.0 Oct 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants