Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look into making SelectExpression (more) immutable, getting rid of TableReferenceExpression #32558

Closed
roji opened this issue Dec 8, 2023 · 1 comment · Fixed by #32812
Closed
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Dec 8, 2023

Our ColumnExpressions currently reference their tables via TableReferenceExpressions, which are themselves mutable; this is the source of many bugs, e.g. when the SelectExpression is copied (e.g. when a visitor changes something inside), but the columns remain shared between the two copies, causing various referential integrity problems (e.g. #32234). This system also adds substantial complexity to keep everything in sync, again adding a source of bugs.

To simplify this, we'll consider removing TableReferenceExpression altogether, and have ColumnExpression reference its TableExpressionBase via its string alias only; there would be no actual .NET reference between the column and the table.

  • This makes the SQL tree almost immutable
    • SelectExpression still has a temporary mutable mode during translation - this is orthogonal to the table/column relationship and their mutability.
    • At the moment, the table alias is a property on TableExpressionBase; there's no easy way to immutably change that alias - it's a mutable property. We need to make that immutable as well.
  • Any visitor that needs the actual TableExpressionBase given a ColumnExpression would need to track aliases/tables as part of its state and perform lookups.
    • This effectively moves the referential relationship between column and table out of the expression tree, and into visitor state, which is temporary/point-in-time.
    • There are only a few such actual needs; the type mapping inference visitor is one, and the pruner makes an optional use of this as well. In any case, the additional work for tracking alias/table mappings is negligible and does not add any considerable complexity.
  • Once we're fully immutable, we can consider removing expression cloning altogether. We may still decide to continue doing it only so that cloned fragments inside the same tree have different table aliases (though note that we've never had fully unique aliases in the tree). Note that we'd ideally stop having cloned duplicates within the same query tree (i.e. GroupBy), at which point this will become totally moot.
  • Note that this depends on a change in how we manage aliases (see #32784).
    • Currently, we generate temporary aliases for tables created in SQL fragments, and then run a uniquifying visitor when that fragment is added to a SelectExpression in the tree (predicate, ordering...).
    • This "late" alias uniquification would require us to also replace all columns referencing the table (and is generally inefficient even without that).
    • Instead, we should generate a unique alias up-front, and pass it to the table when it's created, not changing it later (except for post-processing, at which point we'd need a pass to rewrite the referencing columns).

Note that we considered two alternatives:

  • Have ColumnExpression directly reference its TableExpressionBase (without TableReferenceExpression).
    • This is also a fully immutable model; but it means that any change to a TableExpressionBase requires a additional recursive visit to all the SelectExpression's clauses to replace any ColumnExpressions (and their containing expressions). Recall that a table can be a SELECT subquery, at which point any modification within would trigger such a recursive rewrite. This is quite a heavy alternative.
    • Referencing the table via its alias is a bit similar, but only requires recursive rewriting when the alias changes, and otherwies allows effort-less replacing of tables. The only visitor which currently modifies aliases is the alias manager post-processor (which closes alias gaps due to e.g. truncated tables); this seems very acceptable (and the postprocessing step is also completely optional/esthetic).
  • Keep TableReferenceExpression, but make it owned by TableExpressionBase rather than the SelectExpression
    • Each TableExpressionBase would reference its TableReferenceExpression.
    • When a table needs to be replaced, we go to the old table's TRE and set it to point to the new table. Crucially, the new table must also share the same TRE from the moment it's created, to ensure that we don't have two TREs.
    • This would resolve a lot of the book-keeping and complexity concerns associated with maintaining a list of TableReferenceExpressions on SelectExpression, mirroring the list of tables.
    • This keeps the SQL tree mutable, meaning that we need to clone to reuse, etc.

Note that this has nothing to do with the SelectExpression having a "mutable mode", where its projection hasn't yet been applied (that's completely orthogonal to this).

@roji
Copy link
Member Author

roji commented Jan 13, 2024

Note: I largely rewrote the above to take into account the more recent plan to switch to string table aliases.

@roji roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 14, 2024
roji added a commit to roji/efcore that referenced this issue Jan 14, 2024
And do some cleanup around table annotations

Continues work from dotnet#32558
roji added a commit to roji/efcore that referenced this issue Jan 14, 2024
And do some cleanup around table annotations

Continues work from dotnet#32558
@ajcvickers ajcvickers added this to the 9.0.0 milestone Jan 17, 2024
roji added a commit to roji/efcore that referenced this issue Jan 18, 2024
And do some cleanup around table annotations

Continues work from dotnet#32558
roji added a commit to roji/efcore that referenced this issue Jan 18, 2024
And do some cleanup around table annotations

Continues work from dotnet#32558
roji added a commit to roji/efcore that referenced this issue Jan 18, 2024
And do some cleanup around table annotations

Continues work from dotnet#32558
roji added a commit that referenced this issue Jan 19, 2024
And do some cleanup around table annotations

Continues work from #32558
@ajcvickers ajcvickers modified the milestones: 9.0.0, 9.0.0-preview1 Jan 31, 2024
@roji roji modified the milestones: 9.0.0-preview1, 9.0.0 Oct 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants