Ensure levels are maintained when joining df's with Categorical cols #1266
I've added a note to the docstring for |
Thanks! I'd really like to avoid special-casing `CategoricalArray` in DataFrames, but I trust you if you say that doesn't work. Maybe we could create an API to pseudo-concatenate two arrays, something like `similar` but taking two arrays instead of one. But for now it will be fine.
test/join.jl
@test levels(join(B, A, on=:b, kind = :semi)[:b]) == ["a", "b", "c"]
end

@testset "maintain Categorical levels ordering on join - only 1 is categorical" begin
"Only LHS is categorical" (IIUC).
test/join.jl
@@ -325,4 +325,50 @@ module TestJoin
@test all(isa.(o(on).columns,
           [CategoricalVector{Union{T, Null}} for T in (Int, Float64)]))
end

@testset "maintain Categorical levels ordering on join - non-`on` cols" begin
CategoricalArray
test/join.jl
A = DataFrame(a = [1, 2, 3], b = ["a", "b", "c"])
c = levels!(categorical(["a", "b", "b"]), ["b", "a"])
B = DataFrame(b = ["a", "b", "c"], c = c)
@test levels(join(A, B, on=:b, kind=:inner)[:c]) == ["b", "a"]
It would be nice to test `isordered` too, since that's easy to get wrong. While you're at it, testing the actual contents of the resulting column wouldn't hurt either, if that's not too much additional code.
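As a rough sketch of the three checks suggested in this comment (levels, ordered flag, contents), the snippet below works on a standalone categorical vector rather than on a join result. The vector `c` is only a hypothetical stand-in for the joined column, the variable names are illustrative, and the explicit `ordered!` call is an assumption added so that the `isordered` check is meaningful; the PR's actual tests may look different.

```julia
using CategoricalArrays

# Hypothetical stand-in for a joined column; a real test would take it from the join result.
c = levels!(categorical(["a", "b", "b"]), ["b", "a"])
ordered!(c, true)   # assumption: mark the levels as ordered so there is a flag to verify

levels_ok  = levels(c) == ["b", "a"]   # the custom level order is kept
ordered_ok = isordered(c)              # the ordered flag is kept
values_ok  = c == ["a", "b", "b"]      # the contents are unchanged
```

In a test file each of these comparisons would sit inside its own `@test`, applied to the column extracted from the joined data frame.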
Ignore what I said before, a combination of I think I remember why we went with |
Ah, great, that's really the most natural and generic API. Using |
850574f to 9c3ed1b
This is working for me locally with JuliaData/CategoricalArrays.jl#97. We'll need to merge that, tag a new release and update this PR with a new minimum requirement on CategoricalArrays.
CI won't pass until we move to Missings. I'm preparing a PR for that.
9c3ed1b to c180053
Thanks! I really like how you have been able to fix a bug while removing code.
Fixes #1257
cc @alyst