-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Umbrella] Improvements and evaluation for TRowSet generation of Spark Engine #5808
Open
3 of 12 tasks
Labels
Comments
This was referenced Dec 4, 2023
pan3793
pushed a commit
that referenced
this issue
Dec 7, 2023
…ization of TRowSet generation # 🔍 Description ## Issue References 🔗 Subtask of #5808. ## Describe Your Solution 🔧 Value serialization to Hive style string by `HiveResult.toHiveString` requires a `TimeFormatters` instance handling the date/time data types. Reusing the pre-created time formatters's instance, it dramatically reduces the overhead and improves the TRowset generation. This may help to reduce memory footprints and fewer operations for TRowSet generation performance. This is also aligned to the Spark's `RowSetUtils` in the implementation of RowSet generation for [SPARK-39041](https://issues.apache.org/jira/browse/SPARK-39041) by yaooqinn , with explicitly declared TimeFormatters instance. ## Types of changes 🔖 - [x] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [x] I have performed a self-review - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [x] Pull request title is okay. - [x] No license issues. - [x] Milestone correctly set? - [x] Test coverage is ok - [x] Assignees are selected. - [x] Minimum number of approvals - [x] No changes are requested **Be nice. Be informative.** Closes #5811 from bowenliang123/rowset-timeformatter. Closes #5811 2270991 [Bowen Liang] Reuse time formatters instance in value serialization of TRowSet Authored-by: Bowen Liang <[email protected]> Signed-off-by: Cheng Pan <[email protected]>
pan3793
pushed a commit
that referenced
this issue
Dec 7, 2023
…ization of TRowSet generation # 🔍 Description ## Issue References 🔗 Subtask of #5808. ## Describe Your Solution 🔧 Value serialization to Hive style string by `HiveResult.toHiveString` requires a `TimeFormatters` instance handling the date/time data types. Reusing the pre-created time formatters's instance, it dramatically reduces the overhead and improves the TRowset generation. This may help to reduce memory footprints and fewer operations for TRowSet generation performance. This is also aligned to the Spark's `RowSetUtils` in the implementation of RowSet generation for [SPARK-39041](https://issues.apache.org/jira/browse/SPARK-39041) by yaooqinn , with explicitly declared TimeFormatters instance. ## Types of changes 🔖 - [x] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [x] I have performed a self-review - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [x] Pull request title is okay. - [x] No license issues. - [x] Milestone correctly set? - [x] Test coverage is ok - [x] Assignees are selected. - [x] Minimum number of approvals - [x] No changes are requested **Be nice. Be informative.** Closes #5811 from bowenliang123/rowset-timeformatter. Closes #5811 2270991 [Bowen Liang] Reuse time formatters instance in value serialization of TRowSet Authored-by: Bowen Liang <[email protected]> Signed-off-by: Cheng Pan <[email protected]> (cherry picked from commit 4463cc8) Signed-off-by: Cheng Pan <[email protected]>
18 tasks
bowenliang123
added a commit
that referenced
this issue
Dec 8, 2023
…umnValues in TRowSet generation # 🔍 Description ## Issue References 🔗 Subtask of #5808. ## Describe Your Solution 🔧 To avoid unnecessary possible array copy in ensuring capacity of the array list, pre-allocate array list capacity for TColumns and TColumnValues in TRowSet generation. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 No behaviour changes. #### Related Unit Tests RowSetSuite of Spark Engine. --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [ ] I have performed a self-review - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [x] Pull request title is okay. - [x] No license issues. - [x] Milestone correctly set? - [x] Test coverage is ok - [x] Assignees are selected. - [x] Minimum number of approvals - [ ] No changes are requested **Be nice. Be informative.** Closes #5831 from bowenliang123/rowset-inplace. Closes #5831 c1c6c0f [liangbowen] avoid possible array copying with growing of array list in TRowSet generation Lead-authored-by: Bowen Liang <[email protected]> Co-authored-by: liangbowen <[email protected]> Signed-off-by: Bowen Liang <[email protected]>
bowenliang123
added a commit
that referenced
this issue
Dec 8, 2023
…umnValues in TRowSet generation # 🔍 Description ## Issue References 🔗 Subtask of #5808. ## Describe Your Solution 🔧 To avoid unnecessary possible array copy in ensuring capacity of the array list, pre-allocate array list capacity for TColumns and TColumnValues in TRowSet generation. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 No behaviour changes. #### Related Unit Tests RowSetSuite of Spark Engine. --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [ ] I have performed a self-review - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [x] Pull request title is okay. - [x] No license issues. - [x] Milestone correctly set? - [x] Test coverage is ok - [x] Assignees are selected. - [x] Minimum number of approvals - [ ] No changes are requested **Be nice. Be informative.** Closes #5831 from bowenliang123/rowset-inplace. Closes #5831 c1c6c0f [liangbowen] avoid possible array copying with growing of array list in TRowSet generation Lead-authored-by: Bowen Liang <[email protected]> Co-authored-by: liangbowen <[email protected]> Signed-off-by: Bowen Liang <[email protected]> (cherry picked from commit 1d6e653) Signed-off-by: Bowen Liang <[email protected]>
18 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Code of Conduct
Search before asking
Describe the proposal
RowSet generation is that taking the results from result iterator and serializing them into column-based or row-based TRowSet, which is the key point for transportation and performance in most common cases.
SparkOperation
(https://github.com/apache/kyuubi/blob/master/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala#L253C21-L253C26)Task list
toSeq
(toStream
of Iterator) to immutable collectionUse toIndexedSeq for preparing taken collection of result rowset generation #5804DecimalType with column-based mode [SPARK] Improvement for DecimalType serialization in toTColumn for column-based TRowSet generation #5810ArrayType of primitive data types with column-based modeAre you willing to submit PR?
The text was updated successfully, but these errors were encountered: