Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Reuse the column value skipping redundant access on row(i) #5812

Closed
wants to merge 1 commit into from

Conversation

bowenliang123
Copy link
Contributor

🔍 Description

Issue References 🔗

Subtask of #5808

Describe Your Solution 🔧

  • Avoid redundant access on row(i) in both the isNullAt(ordinal) and row.getAs[T](ordinal) by Reusing the column value

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklists

📝 Author Self Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

  • Pull request title is okay.
  • No license issues.
  • Milestone correctly set?
  • Test coverage is ok
  • Assignees are selected.
  • Minimum number of approvals
  • No changes are requested

Be nice. Be informative.

@@ -169,12 +169,13 @@ object RowSet {
var idx = 0
while (idx < size) {
val row = rows(idx)
val isNull = row.isNullAt(ordinal)
val value = row.get(ordinal)
val isNull = value == null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can not make such an assumption, the implementation may be different, for example, the developer uses a custom value to represent NULL in its own CustomRow, especially for those non-JVM implementations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meanwhile, the default implementation of Row is array-based, so indexed-access is cheap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. No changes here, as the Spark's row.isNullAt checks the value equals to null of Java precisely
  2. Thanks for hint. The default implementation of Row IS array-based.

@bowenliang123
Copy link
Contributor Author

Based on the second point above in the discussion, access to the array-based columns inside the row should be cheap. this PR is not necessary. Closing this PR.

@bowenliang123 bowenliang123 deleted the rowset-nullcheck branch December 4, 2023 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants