Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uncompact #41

Merged
merged 2 commits into from
Nov 3, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/CategoricalArrays.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ module CategoricalArrays
NullableCategoricalArray, NullableCategoricalVector, NullableCategoricalMatrix
export LevelsException

export categorical, compact, droplevels!, levels, levels!, isordered, ordered!
export categorical, compact, uncompact, droplevels!, levels, levels!, isordered, ordered!

using Compat

Expand Down
15 changes: 15 additions & 0 deletions src/array.jl
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,21 @@ function compact{T, N}(A::CatArray{T, N})
convert(arraytype(typeof(A)){T, N, R}, A)
end

"""
uncompact(A::CategoricalArray)
uncompact(A::NullableCategoricalArray)

Return a copy of categorical array `A` using the default reference type. If `A` is using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"default reference type ($DefaultRefType)" should work.

a small reference type (such as UInt8 or UInt16) the uncompact array will have room for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backticks around type names. Also, "uncompacted" would be better.

more levels.

Avoid using compact to avoid having to call uncompact.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"compact" should be a link, see how @ref is used elsewhere. Though I find this phrasing a bit weird. Maybe something along the lines of "To avoid the need to call uncompact, ensure compact is not called during creation or importation of data as categorical arrays."?

"""
uncompact{T, N}(A::CatArray{T, N}) =
convert(arraytype(typeof(A)){T, N, DefaultRefType}, A)
uncompact{T}(P::CategoricalPool{T}) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into pool.jl. But it is really needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, no. I'll remove it

convert(CategoricalPool{T, DefaultRefType}, P)

arraytype(A::CategoricalArray...) = CategoricalArray
arraytype(A::CatArray...) = NullableCategoricalArray

Expand Down
2 changes: 1 addition & 1 deletion src/pool.jl
Original file line number Diff line number Diff line change
Expand Up @@ -182,5 +182,5 @@ ordered!(pool::CategoricalPool, ordered) = (pool.ordered = ordered; pool)
# LevelsException
function Base.showerror{T, R}(io::IO, err::LevelsException{T, R})
levs = join(repr.(err.levels), ", ", " and ")
print(io, "cannot store level(s) $levs since reference type $R can only hold $(typemax(R)) levels. Convert categorical array to a larger reference type to add more levels.")
print(io, "cannot store level(s) $levs since reference type $R can only hold $(typemax(R)) levels. Convert categorical array to a larger reference type to add more levels (see uncompact).")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simply say "Use the uncompact function to add more levels".

end
8 changes: 5 additions & 3 deletions test/07_levels.jl
Original file line number Diff line number Diff line change
Expand Up @@ -207,18 +207,20 @@ module TestLevels

res = @test_throws LevelsException{Int, UInt8} CategoricalPool{Int, UInt8}(collect(256:-1:1))
@test res.value.levels == [1]
@test sprint(showerror, res.value) == "cannot store level(s) 1 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 1 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."

pool = CategoricalPool(collect(30:288))
res = @test_throws LevelsException{Int, UInt8} convert(CategoricalPool{Int, UInt8}, pool)
@test res.value.levels == collect(285:288)
@test sprint(showerror, res.value) == "cannot store level(s) 285, 286, 287 and 288 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 285, 286, 287 and 288 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."

pool = CategoricalPool{String, UInt8}(string.(318:-1:65))
res = @test_throws LevelsException{String, UInt8} levels!(pool, vcat("az", levels(pool), "bz", "cz"))
@test res.value.levels == ["bz", "cz"]
@test sprint(showerror, res.value) == "cannot store level(s) \"bz\" and \"cz\" since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) \"bz\" and \"cz\" since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."
lev = copy(levels(pool))
levels!(pool, vcat(lev, "az"))
@test levels(pool) == vcat(lev, "az")
pool2 = uncompact(pool)
levels!(pool2, vcat(levels(pool2), "bz", "cz"))
end
14 changes: 8 additions & 6 deletions test/13_arraycommon.jl
Original file line number Diff line number Diff line change
Expand Up @@ -239,41 +239,43 @@ for (CA, A) in ((CategoricalArray, Array), (NullableCategoricalArray, NullableAr

res = @test_throws LevelsException{Int, UInt8} CA{Int, 1, UInt8}(256:-1:1)
@test res.value.levels == [1]
@test sprint(showerror, res.value) == "cannot store level(s) 1 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 1 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."

x = CA{Int, 1, UInt8}(254:-1:1)
x[1] = 1000
res = @test_throws LevelsException{Int, UInt8} x[1] = 1001
@test res.value.levels == [1001]
@test sprint(showerror, res.value) == "cannot store level(s) 1001 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 1001 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."
@test x == A(vcat(1000, 253:-1:1))
@test levels(x) == vcat(1:254, 1000)

x = CA{Int, 1, UInt8}(1:254)
res = @test_throws LevelsException{Int, UInt8} x[1:2] = 1000:1001
@test res.value.levels == [1001]
@test sprint(showerror, res.value) == "cannot store level(s) 1001 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 1001 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."
@test x == A(vcat(1000, 2:254))
@test levels(x) == vcat(1:254, 1000)

x = CA{Int, 1, UInt8}([1, 3, 256])
res = @test_throws LevelsException{Int, UInt8} levels!(x, collect(1:256))
@test res.value.levels == [255]
@test sprint(showerror, res.value) == "cannot store level(s) 255 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 255 since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."

x = CA(30:2:131115)
res = @test_throws LevelsException{Int, UInt16} CategoricalVector{Int, UInt16}(x)
@test res.value.levels == collect(131100:2:131114)
@test sprint(showerror, res.value) == "cannot store level(s) 131100, 131102, 131104, 131106, 131108, 131110, 131112 and 131114 since reference type UInt16 can only hold 65535 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) 131100, 131102, 131104, 131106, 131108, 131110, 131112 and 131114 since reference type UInt16 can only hold 65535 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."

x = CA{String, 1, UInt8}(string.(Char.(65:318)))
res = @test_throws LevelsException{String, UInt8} levels!(x, vcat(levels(x), "az", "bz", "cz"))
@test res.value.levels == ["bz", "cz"]
@test sprint(showerror, res.value) == "cannot store level(s) \"bz\" and \"cz\" since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels."
@test sprint(showerror, res.value) == "cannot store level(s) \"bz\" and \"cz\" since reference type UInt8 can only hold 255 levels. Convert categorical array to a larger reference type to add more levels (see uncompact)."
@test x == A(string.(Char.(65:318)))
lev = copy(levels(x))
levels!(x, vcat(lev, "az"))
@test levels(x) == vcat(lev, "az")
x2 = uncompact(x)
levels!(x2, vcat(levels(x2), "bz", "cz"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this test in a different block. You don't actually need to test calling levels!: just ensure the type matches what is expected (the rest is tested elsewhere).

end

end