Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23284][SQL] Document the behavior of several ColumnVector's get APIs when accessing null slot #20455

Closed
wants to merge 9 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Jan 31, 2018

What changes were proposed in this pull request?

For some ColumnVector get APIs such as getDecimal, getBinary, getStruct, getArray, getInterval, getUTF8String, we should clearly document their behaviors when accessing null slot. They should return null in this case. Then we can remove null checks from the places using above APIs.

For the APIs of primitive values like getInt, getInts, etc., this also documents their behaviors when accessing null slots. Their returning values are undefined and can be anything.

How was this patch tested?

Added tests into ColumnarBatchSuite.

@viirya viirya changed the title [SPARK-23284][SQL] Document several get API of ColumnVector's behavior when accessing null slot [SPARK-23284][SQL] Document the behavior of several ColumnVector's get APIs when accessing null slot Jan 31, 2018
@viirya
Copy link
Member Author

viirya commented Jan 31, 2018

Once map support is added later, we should also document getMap's behavior.

@SparkQA
Copy link

SparkQA commented Jan 31, 2018

Test build #86878 has finished for PR 20455 at commit 5ef0ba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Feb 1, 2018

cc @cloud-fan @ueshin @kiszk

@ueshin
Copy link
Member

ueshin commented Feb 1, 2018

LGTM so far. Let's wait for map type support.

@cloud-fan
Copy link
Contributor

LGTM so far, one thing I wanna add is to also document the behavior of accessing null primitive values, e.g. getInt. We can say that the return value is undefined and can be anything, if that slot is null. Same to the batched version like getInts.

@cloud-fan
Copy link
Contributor

BTW we should also update ColumnarBatchSuite for the "return null" behavior

@kiszk
Copy link
Member

kiszk commented Feb 1, 2018

LGTM for this behavior and comments.

@viirya
Copy link
Member Author

viirya commented Feb 1, 2018

Since the map support is added, I'll do related change later.

@ueshin
Copy link
Member

ueshin commented Feb 1, 2018

We also need to add if (isNullAt(rowId)) return null; to WritableColumnVector.getMap()?

@viirya
Copy link
Member Author

viirya commented Feb 1, 2018

@ueshin Yes, missing it. Thanks. I'll add it.

@@ -1261,4 +1261,140 @@ class ColumnarBatchSuite extends SparkFunSuite {
batch.close()
allocator.close()
}

testVector("getUTF8String should return null for null slot", 4, StringType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have test cases for each type, can we just change the existing test cases a little to add this null check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds ok to me. I'll commit the change by tonight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we don't have individual tests for binary type and decimal type?

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86918 has finished for PR 20455 at commit 7a1fd57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86921 has finished for PR 20455 at commit 5246fcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1261,4 +1269,38 @@ class ColumnarBatchSuite extends SparkFunSuite {
batch.close()
allocator.close()
}

testVector("getDecimal should return null for null slot", 4, DecimalType.IntDecimal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make it a normal test case for decimal type? we can follow the other tests, e.g. create a decimal array, and check the value of column vector at the same index.

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86923 has finished for PR 20455 at commit 35548e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86926 has finished for PR 20455 at commit 923d0fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86935 has finished for PR 20455 at commit 6d5f7ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@gatorsmile
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Feb 1, 2018

Test build #86941 has finished for PR 20455 at commit 6d5f7ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Feb 2, 2018

LGTM.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Feb 2, 2018
…t APIs when accessing null slot

## What changes were proposed in this pull request?

For some ColumnVector get APIs such as getDecimal, getBinary, getStruct, getArray, getInterval, getUTF8String, we should clearly document their behaviors when accessing null slot. They should return null in this case. Then we can remove null checks from the places using above APIs.

For the APIs of primitive values like getInt, getInts, etc., this also documents their behaviors when accessing null slots. Their returning values are undefined and can be anything.

## How was this patch tested?

Added tests into `ColumnarBatchSuite`.

Author: Liang-Chi Hsieh <[email protected]>

Closes #20455 from viirya/SPARK-23272-followup.

(cherry picked from commit 90848d5)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 90848d5 Feb 2, 2018
@viirya viirya deleted the SPARK-23272-followup branch December 27, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants