Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airframe-sql: Fix qualifier resolution #2653

Merged
merged 45 commits into from
Dec 23, 2022
Merged

airframe-sql: Fix qualifier resolution #2653

merged 45 commits into from
Dec 23, 2022

Conversation

xerial
Copy link
Member

@xerial xerial commented Dec 20, 2022

Purpose

  • Propagate qualifiers (table name) properly so that join keys can be resolved correctly.
  • Merge the common logic for Attribute handling (AllColumn, SingleColumn, MultiColumn, Unresolved/ResolvedAttribute)

Major changes

  • Renamed MultiColumn -> MultiSourceColumn to clarify the meaning
  • TypeResolver.resolveTableScan -> Do not add qualifiers to ResolvedAttribute. Qualifiers can be added if they are given in the original SQL
  • Fixed Join outputAttributes:
    • JoinUsing(keys, ..): merge join keys as MultiSourceColumn. Added ResolveJoinUsing(...)
    • JoinOn(cond): Output all input child relation outputs
  • Changed Attribute class structure
    • Add common methods (e.g., qualifier, withQualifier, withAlias, etc.)
    • ResolvedAttribute is changed to take a single source column because we can merge multiple ResolvedAttributes with MultiSourceColumn
    • Removed alias parameters. Added Alias expression instead for simplicity
    • Moved matches, matched methods into Attribute base class
      • Use ColumnPath(database, table, column) pairs to properly match columns. With this change, Attributes belonging to different tables can be found as expected
    • UnresolvedAttribute now can take a qualifier given in the original SQL
    • Improved toString to distinguish different Attribute types:
      • SingleColumn: (fullName):(dataType) := (expr)
      • MultSourceColumn: (fullName):(dataType) := { c1, c2, ...}
      • Alias: <fullName> := (expr)
      • ResolvedAttribute *(fullName):(dataType) <- (sourceColumn)
  • Simplified Set operation's outputAttributes method by using transpose and the newly added Attribute methods

TypeResolver changes

  • TypeResolver
    • Fixed join, sort, and projection column resolvers to use new Attribute
    • Extracted attribute match and merge logic to Attribute class methods.
    • When resolving attributes, if there is a source column without any change, propagate it to the resolved attribute
    • Optimize nested SingleColumn(ResolvedAttribute, ...) as a single ResolvedAttribute
    • Reorder resolver rules. GroupBy, SortBy need to be processed last after resolving select items

Minor changes

  • SQLGenerator
    • Changed the behavior within SELECT ... and other part (e.g., GROUP BY ...). In generating select items, printSelectItem will be used first to avoid using alias (e.g., c as a1) in ORDER BY, GROUP BY clauses.

@github-actions github-actions bot added the bug label Dec 20, 2022
@xerial xerial changed the title airframe-sql: Fix qualifier airframe-sql: Fix qualifier resolution Dec 20, 2022
@xerial xerial marked this pull request as ready for review December 21, 2022 21:24
@xerial xerial requested a review from takezoe December 21, 2022 21:49
@codecov
Copy link

codecov bot commented Dec 21, 2022

Codecov Report

Merging #2653 (11a8d23) into master (9db537a) will increase coverage by 0.03%.
The diff coverage is 83.24%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2653      +/-   ##
==========================================
+ Coverage   82.14%   82.17%   +0.03%     
==========================================
  Files         334      334              
  Lines       14050    14019      -31     
  Branches     2213     2187      -26     
==========================================
- Hits        11541    11520      -21     
+ Misses       2509     2499      -10     
Impacted Files Coverage Δ
...cala/wvlet/airframe/sql/analyzer/SQLAnalyzer.scala 90.00% <ø> (ø)
.../scala/wvlet/airframe/sql/model/ResolvedPlan.scala 55.00% <57.14%> (-1.25%) ⬇️
...in/scala/wvlet/airframe/sql/model/Expression.scala 70.91% <76.47%> (+3.70%) ⬆️
...la/wvlet/airframe/sql/analyzer/SQLAnonymizer.scala 73.17% <80.00%> (+0.19%) ⬆️
...ala/wvlet/airframe/sql/analyzer/TypeResolver.scala 91.36% <84.00%> (-2.10%) ⬇️
...n/scala/wvlet/airframe/sql/model/LogicalPlan.scala 88.88% <92.30%> (-0.56%) ⬇️
...a/wvlet/airframe/parquet/ParquetQueryPlanner.scala 89.36% <100.00%> (ø)
...cala/wvlet/airframe/sql/analyzer/CTEResolver.scala 94.11% <100.00%> (ø)
.../scala/wvlet/airframe/sql/analyzer/Optimizer.scala 78.57% <100.00%> (ø)
...cala/wvlet/airframe/sql/analyzer/RewriteRule.scala 66.66% <100.00%> (-16.67%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9db537a...11a8d23. Read the comment docs.

@xerial xerial added internal Internal changes (usually non-user facing) breaking Breaking changes and removed internal Internal changes (usually non-user facing) labels Dec 21, 2022
c1 shouldBe ra1.withAlias("p1")
c2 shouldBe rb1
test("ru2: resolve union with column alias") {
val p = analyze("select p1 from (select id as p1 from A union all select id as p1 from B)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two queries should be resolved in the same way. But the results are different:

select p1 from (select id as p1 from A union all select id as p1 from B)
  -> List(p1:long := {<p1> := *id:long <- A.id, <p1> := *id:long <- B.id})

select p1 from (select id as p1 from A union all select id from B)
  -> List(*p1:long <- A.id)

Also, p2 shouldn't be resolved in the following query but it's wrongly resolved.

select p2 from (select id as p1 from A union all select id as p2 from B)
  -> List(*p2:long <- B.id)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Let me check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases for them #2663 and working on the fix.

@xerial xerial merged commit b5bbc28 into master Dec 23, 2022
@xerial xerial deleted the fix-qualifier branch December 23, 2022 06:02
xerial added a commit that referenced this pull request Dec 23, 2022
- Propagate qualifiers (table name) properly so that join keys can be
resolved correctly.
- Merge the common logic for Attribute handling (AllColumn,
SingleColumn, MultiColumn, Unresolved/ResolvedAttribute)

- Renamed MultiColumn -> MultiSourceColumn to clarify the meaning
- TypeResolver.resolveTableScan -> Do not add qualifiers to
ResolvedAttribute. Qualifiers can be added if they are given in the
original SQL
- Fixed Join outputAttributes:
- JoinUsing(keys, ..): merge join keys as MultiSourceColumn. Added
ResolveJoinUsing(...)
  - JoinOn(cond): Output all input child relation outputs
- Changed Attribute class structure
  - Add common methods (e.g., qualifier, withQualifier, withAlias, etc.)
- ResolvedAttribute is changed to take a single source column because we
can merge multiple ResolvedAttributes with MultiSourceColumn
- Removed alias parameters. Added Alias expression instead for
simplicity
  - Moved matches, matched methods into Attribute base class
- Use ColumnPath(database, table, column) pairs to properly match
columns. With this change, Attributes belonging to different tables can
be found as expected
- UnresolvedAttribute now can take a qualifier given in the original SQL
  - Improved toString to distinguish different Attribute types:
    - SingleColumn:   `(fullName):(dataType) := (expr)`
    - MultSourceColumn: `(fullName):(dataType) := { c1, c2, ...}`
    - Alias: `<fullName> := (expr)`
    - ResolvedAttribute `*(fullName):(dataType) <- (sourceColumn)`
- Simplified Set operation's `outputAttributes` method by using
transpose and the newly added Attribute methods

- TypeResolver
- Fixed join, sort, and projection column resolvers to use new Attribute
- Extracted attribute match and merge logic to Attribute class methods.
- When resolving attributes, if there is a source column without any
change, propagate it to the resolved attribute
- Optimize nested SingleColumn(ResolvedAttribute, ...) as a single
ResolvedAttribute
- Reorder resolver rules. GroupBy, SortBy need to be processed last
after resolving select items

- SQLGenerator
- Changed the behavior within SELECT ... and other part (e.g., GROUP BY
...). In generating select items, printSelectItem will be used first to
avoid using alias (e.g., c as a1) in ORDER BY, GROUP BY clauses.

Reformat
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking changes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants