uncompact #41

Merged: 2 commits into JuliaData:master from gustafsson:uncompact on Nov 3, 2016

gustafsson
Contributor

Adds `uncompact`, which returns a copy of categorical array `A` using the default reference type. This is needed when adding levels beyond what a compact pool has room for. The `LevelsException` message now also suggests using `uncompact`.

Related: JuliaData/DataFrames.jl#990

Example:

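a = categorical([1;1:255], true)  # compact: 255 levels, stored with a small reference type
a[1] = 0                          # error: LevelsException, no room for another level
a2 = uncompact(a)                 # copy using the default reference type
a2[1] = 0                         # works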
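
The limit hit in the example above follows from the reference type: as the `LevelsException` message in this PR puts it, a reference type `R` can only hold `typemax(R)` levels. A minimal sketch of that arithmetic, using only Base; `smallest_reftype` is a hypothetical helper written for illustration, not a function from CategoricalArrays:

```julia
# Hypothetical helper (not part of CategoricalArrays): return the smallest
# unsigned integer type able to index `nlevels` distinct levels.
function smallest_reftype(nlevels::Integer)
    for R in (UInt8, UInt16, UInt32, UInt64)
        # typemax(R) is the largest reference value, hence the level capacity
        nlevels <= typemax(R) && return R
    end
    throw(ArgumentError("too many levels: $nlevels"))
end

smallest_reftype(255)  # UInt8: the 255 levels of the example just fit
smallest_reftype(256)  # UInt16: one more level needs a wider type
```

Under this reading, assigning `0` in the example asks for a 256th level, which is one more than a `UInt8`-backed pool can index.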

Using uncompact in the first place would probably be an anti-pattern, so I'm not even sure it's a good idea to add this function. Nonetheless, I did run into this myself when I used compact to keep the array as small as possible in the inner loop, yet wanted to merge my output with other results later. I also believe it fits well with the description of what to do when you hit a LevelsException.

Return a copy of categorical array `A` using the default reference type.
Member

@nalimilan left a comment

Yeah, this could be useful, in particular if we decide not to widen the reference type by default. It's always good to be able to give a simple solution to the user from error messages.

"""
uncompact{T, N}(A::CatArray{T, N}) =
    convert(arraytype(typeof(A)){T, N, DefaultRefType}, A)
uncompact{T}(P::CategoricalPool{T}) =
Member

This should go into pool.jl. But is it really needed?

Contributor Author

Not really, no. I'll remove it

uncompact(A::NullableCategoricalArray)

Return a copy of categorical array `A` using the default reference type. If `A` is using
a small reference type (such as UInt8 or UInt16) the uncompact array will have room for
Member

Backticks around type names. Also, "uncompacted" would be better.

@@ -182,5 +182,5 @@ ordered!(pool::CategoricalPool, ordered) = (pool.ordered = ordered; pool)
 # LevelsException
 function Base.showerror{T, R}(io::IO, err::LevelsException{T, R})
     levs = join(repr.(err.levels), ", ", " and ")
-    print(io, "cannot store level(s) $levs since reference type $R can only hold $(typemax(R)) levels. Convert categorical array to a larger reference type to add more levels.")
+    print(io, "cannot store level(s) $levs since reference type $R can only hold $(typemax(R)) levels. Convert categorical array to a larger reference type to add more levels (see uncompact).")
Member

I would simply say "Use the uncompact function to add more levels".
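
For readers unfamiliar with the `levs` line in the hunk above: the three-argument form of `join` takes a regular delimiter and a separate delimiter for the last pair, and `repr.` quotes each level the way it would appear in code. A small sketch using only Base; the sample levels are made up for illustration:

```julia
# Sample levels are illustrative only; they do not come from the PR.
levs = join(repr.(["x", "y", "z"]), ", ", " and ")
println(levs)    # "x", "y" and "z"

# With a single offending level no delimiter is inserted at all.
single = join(repr.([256]), ", ", " and ")
println(single)  # 256
```

This is why the message reads naturally whether one level or several could not be stored.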

uncompact(A::CategoricalArray)
uncompact(A::NullableCategoricalArray)

Return a copy of categorical array `A` using the default reference type. If `A` is using
Member

"default reference type ($DefaultRefType)" should work.

a small reference type (such as UInt8 or UInt16) the uncompact array will have room for
more levels.

Avoid using compact to avoid having to call uncompact.
Member

"compact" should be a link, see how @ref is used elsewhere. Though I find this phrasing a bit weird. Maybe something along the lines of "To avoid the need to call uncompact, ensure compact is not called during creation or importation of data as categorical arrays."?

@test x == A(string.(Char.(65:318)))
lev = copy(levels(x))
levels!(x, vcat(lev, "az"))
@test levels(x) == vcat(lev, "az")
x2 = uncompact(x)
levels!(x2, vcat(levels(x2), "bz", "cz"))
Member

Put this test in a different block. You don't actually need to test calling levels!: just ensure the type matches what is expected (the rest is tested elsewhere).

coveralls commented Nov 1, 2016

Coverage increased (+0.06%) to 86.788% when pulling 5f90070 on gustafsson:uncompact into d3f4101 on JuliaData:master.

codecov-io commented Nov 1, 2016

Current coverage is 86.75% (diff: 100%)

Merging #41 into master will increase coverage by 0.03%

@@             master        #41   diff @@
==========================================
  Files             8          8          
  Lines           437        438     +1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            379        380     +1   
  Misses           58         58          
  Partials          0          0          

Powered by Codecov. Last update d3f4101...b59161c

coveralls commented Nov 1, 2016

Coverage increased (+0.03%) to 86.758% when pulling b59161c on gustafsson:uncompact into d3f4101 on JuliaData:master.

@nalimilan merged commit 77cdd97 into JuliaData:master on Nov 3, 2016
@nalimilan
Member

Thanks! I think I'll rename compact and uncompact to compress and decompress, which sound more natural, in a later commit.

@gustafsson deleted the uncompact branch on November 4, 2016 09:25