[SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in ArrowColumnVector #20239

icexelloss · 2018-01-11T20:45:47Z

What changes were proposed in this pull request?

This PR changes the usage of MapVector in the Spark codebase to NullableMapVector.

MapVector is an internal Arrow class that is not supposed to be used directly. We should use NullableMapVector instead.

How was this patch tested?

Existing test.

icexelloss · 2018-01-11T20:46:13Z

cc @BryanCutler @ueshin

icexelloss · 2018-01-11T20:46:43Z

@BryanCutler I think this came up in the Arrow sync yesterday.

BryanCutler

I think it's preferable to use NullableMapVector to be consistent, but since MapVector is the superclass, the way it currently is shouldn't cause any errors, right? I believe all vectors are created through the VectorSchemaRoot and not directly, so those would already be the nullable version.

SparkQA · 2018-01-12T00:06:05Z

Test build #85989 has finished for PR 20239 at commit 0e59098.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

icexelloss · 2018-01-12T00:51:10Z

@BryanCutler Yes, there is no error currently. This should make the code cleaner, though.

ueshin · 2018-01-12T05:36:34Z

I'm not sure we can change to NullableMapVector; I'm just worried about whether a plain MapVector can ever show up here.
LGTM if you are sure that a plain MapVector never occurs here.

ueshin · 2018-01-12T07:59:05Z

Btw, I don't mean to block this PR, but just out of curiosity: why does only MapVector have a Nullable version?

icexelloss · 2018-01-12T16:48:02Z

@ueshin and @BryanCutler I took another look, and the class StructAccessor defined in ArrowColumnVector never gets used for getStruct. The ArrowColumnVector.getStruct() method just calls ColumnVector.getStruct(), which does the right thing. StructAccessor is used for isNullAt and does the right thing there.

The branch here: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L250 does get taken. As @BryanCutler mentioned, this is because MapVector is the parent of NullableMapVector, and NullableMapVector is the actual class that gets passed in.

@ueshin With regard to naming: in Arrow 0.8, most "Nullable" prefixes on vector classes were removed, with the exception of MapVector, which we plan to clean up in later releases.
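
To illustrate the point about the branch, here is a minimal, self-contained sketch of the dispatch behavior being discussed. The classes `ValueVec`, `MapVec`, and `NullableMapVec` are hypothetical stand-ins invented for this example; they are not the Arrow classes, and the two `choose...` methods only approximate the kind of `instanceof` check made when picking an accessor. The sketch shows that a check against the parent type already matches instances of the child type, and that narrowing the check only changes the outcome for a bare parent instance, which, per the thread, should not reach user code.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Arrow classes.
final class AccessorDispatchSketch {

    static class ValueVec {}

    // Plays the role of the parent ("internal") struct vector type.
    static class MapVec extends ValueVec {}

    // Plays the role of the nullable subclass that callers actually receive.
    static class NullableMapVec extends MapVec {
        private final boolean[] valid;

        NullableMapVec(boolean... valid) {
            this.valid = valid;
        }

        boolean isNull(int rowId) {
            return !valid[rowId];
        }
    }

    // Check against the parent type: matches both MapVec and NullableMapVec.
    static String chooseBefore(ValueVec v) {
        if (v instanceof MapVec) {
            return "struct";
        }
        return "unsupported";
    }

    // Check against the subclass: matches only NullableMapVec.
    static String chooseAfter(ValueVec v) {
        if (v instanceof NullableMapVec) {
            return "struct";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        ValueVec nullable = new NullableMapVec(true, false);
        ValueVec bare = new MapVec();

        // The subclass instance takes the struct branch under both checks.
        System.out.println(chooseBefore(nullable)); // prints "struct"
        System.out.println(chooseAfter(nullable));  // prints "struct"

        // Only a bare parent instance is treated differently after narrowing.
        System.out.println(chooseBefore(bare));     // prints "struct"
        System.out.println(chooseAfter(bare));      // prints "unsupported"
    }
}
```

Under these assumptions, narrowing the check is behavior-preserving as long as every struct vector handed to the accessor code is the nullable subclass, which is the condition the reviewers ask about in this thread.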

icexelloss · 2018-01-12T17:13:48Z

MapVector is still used in Arrow internal code, but it should not be returned to users directly. https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/types/Types.java#L134

@BryanCutler Do you agree?

I also added a test, "non nullable struct", in ArrowColumnVectorSuite.

SparkQA · 2018-01-12T20:11:14Z

Test build #86043 has finished for PR 20239 at commit e068966.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-12T20:35:04Z

Test build #86048 has finished for PR 20239 at commit ab2a309.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin · 2018-01-17T09:14:37Z

@BryanCutler Any comments on this? Thanks!

HyukjinKwon

Just double checked. Another LGTM from me.

BryanCutler

Yes, I think it is better to use NullableMapVector to be consistent. I'm not sure the test really adds anything, but it doesn't hurt, I suppose. LGTM

…rrowColumnVector

## What changes were proposed in this pull request?

This PR changes usage of `MapVector` in Spark codebase to use `NullableMapVector`.

`MapVector` is an internal Arrow class that is not supposed to be used directly. We should use `NullableMapVector` instead.

## How was this patch tested?

Existing test.

Author: Li Jin <[email protected]>

Closes #20239 from icexelloss/arrow-map-vector.

(cherry picked from commit 4e6f8fb)
Signed-off-by: hyukjinkwon <[email protected]>

HyukjinKwon · 2018-01-17T22:27:21Z

Merged to master and branch-2.3.

icexelloss · 2018-01-17T22:34:41Z

Thanks, everyone, for the review!

Change MapVector to NullableMapVector in ArrowColumnVector

0e59098

BryanCutler reviewed Jan 12, 2018

Add comment to StructAccessor

e068966

Add test

ab2a309

HyukjinKwon approved these changes Jan 17, 2018

BryanCutler approved these changes Jan 17, 2018

asfgit closed this in 4e6f8fb Jan 17, 2018
