Generalize TRowSet generators #5851
Codecov Report

Coverage diff, master vs. #5851:

| Metric | master | #5851 | +/- |
| --- | --- | --- | --- |
| Coverage | 61.28% | 61.35% | +0.07% |
| Complexity | 23 | 23 | |
| Files | 609 | 612 | +3 |
| Lines | 36282 | 36238 | -44 |
| Branches | 4976 | 4967 | -9 |
| Hits | 22235 | 22234 | -1 |
| Misses | 11655 | 11601 | -54 |
| Partials | 2392 | 2403 | +11 |
import org.apache.kyuubi.shaded.hive.service.rpc.thrift._
import org.apache.kyuubi.shaded.hive.service.rpc.thrift.TTypeId._

class RowSetGenerator
Can we have a different name for each module, e.g. ChatRowSetGenerator, TrinoRowSetGenerator, KyuubiRowSetGenerator (for the server module)?
Previously, each module had a class with the same name, RowSet, for TRowSet generation. Since RowSetGenerator follows the same style and is not shared across modules, it's not required to put the engine name in the class name.
This actually confuses developers when they search the code base for RowSetGenerator.
OK, I have added the engine name as a class name prefix.
- val taken = iter.take(rowSetSize)
- val resultRowSet = RowSet.toTRowSet(taken.toSeq, 1, getProtocolVersion)
+ val taken = iter.take(rowSetSize).map(_.toSeq)
+ val resultRowSet = new RowSetGenerator().toTRowSet(
Why create a new instance each time? Is it stateful? If not, I suppose we can define object RowSetGenerator instead of class RowSetGenerator.
It's designed this way on purpose. Spark's RowSetGenerator will hold TimeFormatters instances, which should not be shared with other callers because they may not be thread-safe. Flink's RowSetGenerator is initialized with the time zone ID instance.
That makes sense. Thanks for the explanation.
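The point about per-instance state can be illustrated with a minimal sketch. It is not the Kyuubi implementation: `DemoRowSetGenerator` and `toStringRows` are hypothetical names, and `java.text.SimpleDateFormat` is used only as a well-known example of a formatter that is not thread-safe.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical, simplified stand-in for an engine's row set generator.
// It is a class (not an object) so that each caller owns its formatter.
class DemoRowSetGenerator(timeZoneId: String) {
  // SimpleDateFormat keeps mutable internal state and is not thread-safe,
  // so it lives in the instance rather than in a shared singleton.
  private val timestampFormat: SimpleDateFormat = {
    val f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    f.setTimeZone(TimeZone.getTimeZone(timeZoneId))
    f
  }

  // Render every timestamp cell as a string with this instance's formatter.
  def toStringRows(rows: Seq[Seq[Date]]): Seq[Seq[String]] =
    rows.map(_.map(d => timestampFormat.format(d)))
}
```

With this shape, `new DemoRowSetGenerator("UTC")` and `new DemoRowSetGenerator("GMT+08:00")` can format the same rows concurrently without sharing a formatter, which a singleton `object` would not allow without extra synchronization.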
kyuubi-common/src/main/scala/org/apache/kyuubi/engine/schema/AbstractRowSetGenerator.scala
Thanks, merged to master (1.9.0).
…ects

# 🔍 Description
## Issue References 🔗
As described.

## Describe Your Solution 🔧
- Introduced JdbcTRowSetGenerator extending `AbstractTRowSetGenerator` introduced in #5851 in JDBC engine.
- Provide a DefaultJdbcTRowSetGenerator as default implementation for mapping the JDBC data types to TRowSet generation
- Make JDBC dialect providing TRowSetGenerator extending DefaultJdbcTRowSetGenerator to adapt detailed differences

## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist
- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist
- [ ] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5861 from bowenliang123/jdbc-rowgen.
Closes #5861
7f8658d [Bowen Liang] generalize jdbc TRowSet generator

Authored-by: Bowen Liang <[email protected]>
Signed-off-by: liangbowen <[email protected]>
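The dialect arrangement described in this follow-up (a default generator, with each dialect overriding only what differs) might look roughly like the sketch below. Every class and method name here is hypothetical and the rendering to strings is a simplification; only the `java.sql.Types` constants are real.

```scala
import java.sql.Types

// Hypothetical default mapping from a JDBC type code and value to a rendered cell.
class DemoDefaultJdbcGenerator {
  // Dispatch on the JDBC type code; dialects normally leave this untouched.
  def render(sqlType: Int, value: Any): String = sqlType match {
    case Types.INTEGER => renderInteger(value)
    case Types.BOOLEAN | Types.BIT => renderBoolean(value)
    case _ => renderDefault(value)
  }

  // Per-type hooks that a dialect may override individually.
  protected def renderInteger(value: Any): String = s"$value"
  protected def renderBoolean(value: Any): String = s"$value"
  protected def renderDefault(value: Any): String = s"$value"
}

// Hypothetical dialect whose driver reports booleans as 0/1 numbers.
class DemoNumericBooleanDialectGenerator extends DemoDefaultJdbcGenerator {
  override protected def renderBoolean(value: Any): String = value match {
    case n: Number => (n.intValue() != 0).toString
    case other => super.renderBoolean(other)
  }
}
```

The design choice this illustrates is that a dialect touches one hook and inherits the rest, so differences between databases stay local to the dialect class.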
# 🔍 Description
## Issue References 🔗
As described.

## Describe Your Solution 🔧
- Introduced a generalized RowSet generator `AbstractTRowSetGenerator[SchemaT, RowT, ColumnT]`
  - extract common methods for looping and assembling the rows to TRowSet
  - support generation for either column-based or row-based TRowSet
- Each engine creates a sub-generator of `AbstractTRowSetGenerator`
  - focus on mapping and conversion from the engine's data type to the relative Thrift type
  - implements the schema data type and column value methods
- create a generator instance instead of the previously used `RowSet` object, for isolated session-aware or thread-aware configs or context, e.g. Timezone ID for Flink, and the Hive time formatters for Spark.
- This PR covers the TRowSet generation for the server and the engines of Spark/Flink/Trino/Chat, except the JDBC engine which will be supported in the follow-ups with JDBC dialect support.

## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
No behavior changes.

#### Behavior With This Pull Request 🎉
No behavior changes.

#### Related Unit Tests
CI tests.

---

# Checklists
## 📝 Author Self Checklist
- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist
- [ ] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes apache#5851 from bowenliang123/rowset-gen.
Closes apache#5851
1d2f73a [Bowen Liang] common RowSetGenerator

Authored-by: Bowen Liang <[email protected]>
Signed-off-by: Bowen Liang <[email protected]>
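The split between shared assembly logic and engine-specific value mapping can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `AbstractTRowSetGenerator`: the `Demo*` and `Toy*` names, the method names, and the string-only result types are all hypothetical stand-ins for the real Thrift structures.

```scala
// Simplified stand-ins for the Thrift result types (hypothetical names).
sealed trait DemoRowSet
final case class RowBasedSet(rows: Seq[Seq[String]]) extends DemoRowSet
final case class ColumnBasedSet(columns: Seq[Seq[String]]) extends DemoRowSet

// Looping and assembly live in the base class; subclasses only say how to
// read a column type from the schema and a value from a row.
abstract class DemoAbstractRowSetGenerator[SchemaT, RowT, ColumnT] {
  protected def getColumnSize(schema: SchemaT): Int
  protected def getColumnType(schema: SchemaT, ordinal: Int): ColumnT
  protected def toValue(row: RowT, ordinal: Int, colType: ColumnT): String

  def toRowSet(schema: SchemaT, rows: Seq[RowT], columnar: Boolean): DemoRowSet = {
    val ordinals = 0 until getColumnSize(schema)
    if (columnar) {
      // Column-based: one sequence of values per column.
      ColumnBasedSet(ordinals.map { i =>
        val colType = getColumnType(schema, i)
        rows.map(r => toValue(r, i, colType))
      })
    } else {
      // Row-based: one sequence of values per row.
      RowBasedSet(rows.map { r =>
        ordinals.map(i => toValue(r, i, getColumnType(schema, i)))
      })
    }
  }
}

// A toy "engine": the schema is a list of type names, a row is a Seq[Any].
class ToyRowSetGenerator
  extends DemoAbstractRowSetGenerator[Seq[String], Seq[Any], String] {
  protected def getColumnSize(schema: Seq[String]): Int = schema.length
  protected def getColumnType(schema: Seq[String], ordinal: Int): String = schema(ordinal)
  protected def toValue(row: Seq[Any], ordinal: Int, colType: String): String =
    colType match {
      case "int" => row(ordinal).toString
      case _ => s"'${row(ordinal)}'"
    }
}
```

An engine module would then call something like `new ToyRowSetGenerator().toRowSet(schema, rows, columnar = true)`, choosing the row-based or column-based layout per request while the base class does the iteration.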