[Spark] Reuse the column value skipping redundant access on row(i) #5812

bowenliang123 · 2023-12-04T05:23:06Z

🔍 Description

Issue References 🔗

Subtask of #5808

Describe Your Solution 🔧

Avoid redundant access to row(i) in both isNullAt(ordinal) and row.getAs[T](ordinal) by reusing the column value.

Types of changes 🔖

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests

Checklists

📝 Author Self Checklist

My code follows the style guidelines of this project
I have performed a self-review
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

Be nice. Be informative.

pan3793 · 2023-12-04T05:33:55Z

...ls/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/schema/RowSet.scala

@@ -169,12 +169,13 @@ object RowSet {
    var idx = 0
    while (idx < size) {
      val row = rows(idx)
-      val isNull = row.isNullAt(ordinal)
+      val value = row.get(ordinal)
+      val isNull = value == null


I don't think we can make that assumption. Implementations may differ: for example, a developer may use a custom value to represent NULL in their own CustomRow, especially in non-JVM implementations.

Meanwhile, the default implementation of Row is array-based, so indexed access is cheap.

No behavior change here, since Spark's row.isNullAt checks precisely whether the value equals Java's null.

Thanks for the hint. The default implementation of Row IS array-based.
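To make the two points in this thread concrete, here is a minimal, self-contained sketch. It does not use Spark: `SketchRow`, `ArrayRow`, `SentinelRow`, `MissingValue`, and `NullCheckSketch` are hypothetical names invented for illustration only. It assumes a row interface whose default `isNullAt` compares the stored value with JVM null, plus a custom row that encodes NULL with a sentinel object. Under those assumptions, `row.get(ordinal) == null` and `row.isNullAt(ordinal)` agree for the array-backed row but can disagree for the custom one.

```scala
// Hypothetical stand-in for a row interface; NOT Spark's actual Row trait.
trait SketchRow {
  def get(i: Int): Any
  // Default null check: compare the stored value with JVM null.
  def isNullAt(i: Int): Boolean = get(i) == null
}

// Array-backed row: get(i) is a plain indexed array read, so repeating it is cheap.
final class ArrayRow(values: Array[Any]) extends SketchRow {
  override def get(i: Int): Any = values(i)
}

// Hypothetical sentinel that a custom row might use instead of JVM null.
case object MissingValue

// Hypothetical custom row: NULL is encoded as MissingValue, so isNullAt is overridden.
final class SentinelRow(values: Array[Any]) extends SketchRow {
  override def get(i: Int): Any = values(i)
  override def isNullAt(i: Int): Boolean = values(i) == MissingValue
}

object NullCheckSketch {
  // Pattern proposed in the diff: read the value once, then compare it with null.
  def isNullByValue(row: SketchRow, ordinal: Int): Boolean =
    row.get(ordinal) == null

  // Original pattern: delegate to the row's own notion of NULL.
  def isNullByContract(row: SketchRow, ordinal: Int): Boolean =
    row.isNullAt(ordinal)
}
```

For `new ArrayRow(Array[Any](1, null))` both helpers return the same answer at every ordinal. For `new SentinelRow(Array[Any](1, MissingValue))`, `isNullByContract` reports ordinal 1 as NULL while `isNullByValue` does not, which is the kind of divergence the review comment warns about.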

bowenliang123 · 2023-12-04T05:41:30Z

Based on the second point in the discussion above, access to the array-based columns inside the row should be cheap, so this PR is not necessary. Closing this PR.

Reuse the column value skipping redundant access on row(i)

ba141d8

github-actions bot added the module:spark label Dec 4, 2023

pan3793 reviewed Dec 4, 2023

bowenliang123 closed this Dec 4, 2023

bowenliang123 deleted the rowset-nullcheck branch December 4, 2023 05:41
