Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize TRowSet generators #5851

Closed
wants to merge 1 commit into from

Conversation

bowenliang123
Copy link
Contributor

@bowenliang123 bowenliang123 commented Dec 13, 2023

🔍 Description

Issue References 🔗

As described.

Describe Your Solution 🔧

  • Introduced a generalized RowSet generator AbstractTRowSetGenerator[SchemaT, RowT, ColumnT]
    • extract common methods for looping and assembling the rows to TRowSet
    • support generation for either column-based or row-based TRowSet
  • Each engine creates a sub-generator of AbstractTRowSetGenerator
    • focus on mapping and conversion from the engine's data type to the relative Thrift type
    • implements the schema data type and column value methods
    • create a generator instance instead of the previously used RowSet object, for isolated session-aware or thread-aware configs or context, eg. Timezone ID for Flink, and the Hive time formatters for Spark.
  • This PR covers the TRowSet generation for the server and the engines of Spark/Flink/Trino/Chat, except the JDBC engine which will be supported in the follow-ups with JDBC dialect support.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

No behavior changes.

Behavior With This Pull Request 🎉

No behavior changes.

Related Unit Tests

CI tests.


Checklists

📝 Author Self Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

  • Pull request title is okay.
  • No license issues.
  • Milestone correctly set?
  • Test coverage is ok
  • Assignees are selected.
  • Minimum number of approvals
  • No changes are requested

Be nice. Be informative.

@bowenliang123 bowenliang123 changed the title Extact common RowSetGenerator for engines Generalized RowSet generator Dec 13, 2023
@bowenliang123 bowenliang123 marked this pull request as ready for review December 13, 2023 15:44
@codecov-commenter
Copy link

codecov-commenter commented Dec 14, 2023

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (a0fdead) 61.28% compared to head (1d2f73a) 61.35%.
Report is 2 commits behind head on master.

Files Patch % Lines
...che/kyuubi/sql/schema/ServerTRowSetGenerator.scala 28.12% 21 Missing and 2 partials ⚠️
...gine/spark/schema/SparkArrowTRowSetGenerator.scala 47.82% 8 Missing and 4 partials ⚠️
...yuubi/engine/schema/AbstractTRowSetGenerator.scala 93.80% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5851      +/-   ##
============================================
+ Coverage     61.28%   61.35%   +0.07%     
  Complexity       23       23              
============================================
  Files           609      612       +3     
  Lines         36282    36238      -44     
  Branches       4976     4967       -9     
============================================
- Hits          22235    22234       -1     
+ Misses        11655    11601      -54     
- Partials       2392     2403      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bowenliang123 bowenliang123 force-pushed the rowset-gen branch 2 times, most recently from 37c2bf6 to 8f5ebe9 Compare December 15, 2023 03:19
import org.apache.kyuubi.shaded.hive.service.rpc.thrift._
import org.apache.kyuubi.shaded.hive.service.rpc.thrift.TTypeId._

class RowSetGenerator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a different name for each module? e.g. ChatRowSetGenerator TrinoRowSetGenerator, KyuubiRowSetGenerator(for server module)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, each module had the same class RowSet for TRowSet generation. As RowSetGenerator follows the same style and it's not shared crossing the modules, it's not required to put the engine name as part of the class name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually confuses developers when searching the code base with RowSetGenerator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have added the engine name as a class name prefix.

val taken = iter.take(rowSetSize)
val resultRowSet = RowSet.toTRowSet(taken.toSeq, 1, getProtocolVersion)
val taken = iter.take(rowSetSize).map(_.toSeq)
val resultRowSet = new RowSetGenerator().toTRowSet(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should create a new instance each time? is it stateful? if not, I suppose we can define object RowSetGenerator instead of class RowSetGenerator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's designed on purpose. As for Spark's RowSetGenerator, we will have TimeFormatters instance inside, and should not be shared by others as they may not be thread-safe. As for Flink's RowSetGenerator, we init the RowSetGenerator with the timezone id instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense. thanks for explanation

@bowenliang123 bowenliang123 changed the title Generalized RowSet generator Generalize TRowSet generators Dec 15, 2023
@bowenliang123
Copy link
Contributor Author

Thanks, merged to master (1.9.0).

@bowenliang123 bowenliang123 deleted the rowset-gen branch December 15, 2023 09:44
bowenliang123 added a commit that referenced this pull request Dec 18, 2023
…ects

# 🔍 Description
## Issue References 🔗

As described.

## Describe Your Solution 🔧

- Introduced JdbcTRowSetGenerator extending `AbstractTRowSetGenerator ` introduced in #5851 in JDBC engine.
- Provide a DefaultJdbcTRowSetGenerator as default implementation for mapping the JDBC data types to TRowSet generation
- Make JDBC dialect providing TRowSetGenerator extending DefaultJdbcTRowSetGenerator to adapt detailed differences

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [ ] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5861 from bowenliang123/jdbc-rowgen.

Closes #5861

7f8658d [Bowen Liang] generalize jdbc TRowSet generator

Authored-by: Bowen Liang <[email protected]>
Signed-off-by: liangbowen <[email protected]>
zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
# 🔍 Description
## Issue References 🔗

As described.

## Describe Your Solution 🔧

- Introduced a generalized RowSet generator `AbstractTRowSetGenerator[SchemaT, RowT, ColumnT]`
   - extract common methods for looping and assembling the rows to TRowSet
   - support generation for either column-based or row-based TRowSet
- Each engine creates a sub-generator of  `AbstractTRowSetGenerator`
   - focus on mapping and conversion from the engine's data type to the relative Thrift type
   - implements the schema data type and column value methods
   - create a generator instance instead of the previously used `RowSet` object, for isolated session-aware or thread-aware configs or context, eg. Timezone ID for Flink, and the Hive time formatters for Spark.
- This PR covers the TRowSet generation for the server and the engines of Spark/Flink/Trino/Chat, except the JDBC engine which will be supported in the follow-ups with JDBC dialect support.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️
No behavior changes.

#### Behavior With This Pull Request 🎉
No behavior changes.

#### Related Unit Tests
CI tests.

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [ ] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes apache#5851 from bowenliang123/rowset-gen.

Closes apache#5851

1d2f73a [Bowen Liang] common RowSetGenerator

Authored-by: Bowen Liang <[email protected]>
Signed-off-by: Bowen Liang <[email protected]>
zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
…h dialects

# 🔍 Description
## Issue References 🔗

As described.

## Describe Your Solution 🔧

- Introduced JdbcTRowSetGenerator extending `AbstractTRowSetGenerator ` introduced in apache#5851 in JDBC engine.
- Provide a DefaultJdbcTRowSetGenerator as default implementation for mapping the JDBC data types to TRowSet generation
- Make JDBC dialect providing TRowSetGenerator extending DefaultJdbcTRowSetGenerator to adapt detailed differences

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [ ] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes apache#5861 from bowenliang123/jdbc-rowgen.

Closes apache#5861

7f8658d [Bowen Liang] generalize jdbc TRowSet generator

Authored-by: Bowen Liang <[email protected]>
Signed-off-by: liangbowen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants