[SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in ArrowColumnVector #20239
@BryanCutler I think this came up in the Arrow sync yesterday.
I think it's preferable to use `NullableMapVector` to be consistent, but since `MapVector` is the super class, the way it currently is shouldn't cause any errors, right? I believe all vectors are created through the `VectorSchemaRoot` and not directly, so those would be the nullable version already.
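To make the subclass argument concrete, here is a minimal sketch. The classes below are hypothetical stand-ins that only mirror the inheritance relationship described in this comment (the nullable class extending the base class); they are not the Arrow classes and carry none of their behavior. The sketch shows why a type check against the superclass accepts both kinds of vector, while a check against the subclass accepts only the nullable one, so narrowing the check is safe only if every vector that reaches it is the nullable variant.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Arrow classes.
final class MapVectorDispatchSketch {
    static class ValueVector {}
    static class MapVector extends ValueVector {}
    static class NullableMapVector extends MapVector {}

    // Check against the superclass: accepts the base and the nullable variant.
    static boolean matchesBase(ValueVector v) {
        return v instanceof MapVector;
    }

    // Check against the subclass: accepts only the nullable variant.
    static boolean matchesNullable(ValueVector v) {
        return v instanceof NullableMapVector;
    }

    public static void main(String[] args) {
        ValueVector nullable = new NullableMapVector();
        ValueVector base = new MapVector();

        System.out.println(matchesBase(nullable));     // true
        System.out.println(matchesNullable(nullable)); // true
        System.out.println(matchesBase(base));         // true
        System.out.println(matchesNullable(base));     // false
    }
}
```

In this toy model, switching the check from the base type to the nullable type changes the result only for vectors constructed directly as the base type, which is the case this comment suggests does not arise.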
Test build #85989 has finished for PR 20239 at commit
@BryanCutler Yes, there is no error currently. This should make the code cleaner though.
I'm not sure we can change to
Btw, I don't mean to block this pr but why does only
@ueshin and @BryanCutler I took another look and the class The branch here: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L250 does happen. As @BryanCutler mentioned, this is because @ueshin with regard to naming, in Arrow 0.8 most "Nullable" prefixes on vector classes were removed, with the exception of
@BryanCutler Do you agree? I also added a test "non nullable struct" in
Test build #86043 has finished for PR 20239 at commit
Test build #86048 has finished for PR 20239 at commit
@BryanCutler Any comments on this? Thanks!
Just double checked. Another LGTM from me.
Yes, I think it is better to use `NullableMapVector` to be consistent. I'm not sure the test really adds anything, but it doesn't hurt I suppose. LGTM
…rrowColumnVector

## What changes were proposed in this pull request?

This PR changes usage of `MapVector` in Spark codebase to use `NullableMapVector`.

`MapVector` is an internal Arrow class that is not supposed to be used directly. We should use `NullableMapVector` instead.

## How was this patch tested?

Existing test.

Author: Li Jin <[email protected]>

Closes #20239 from icexelloss/arrow-map-vector.

(cherry picked from commit 4e6f8fb)
Signed-off-by: hyukjinkwon <[email protected]>
Merged to master and branch-2.3.
Thanks, everyone, for the review!
## What changes were proposed in this pull request?

This PR changes usage of `MapVector` in the Spark codebase to use `NullableMapVector`.

`MapVector` is an internal Arrow class that is not supposed to be used directly. We should use `NullableMapVector` instead.

## How was this patch tested?

Existing test.