new columns via transform and missing values #84

Closed
tbeason opened this issue Jan 18, 2018 · 7 comments

@tbeason

tbeason commented Jan 18, 2018

I think @transform from DataFramesMeta may have a problem when the new column contains missing. When I do an operation involving a column of type Union{<:Real,Missing}, the new column is of type Any rather than Union{<:Real,Missing}.

julia> dd = DataFrame(a=[1,2,3],b=[1,2,missing])
3×2 DataFrames.DataFrame
│ Row │ a │ b       │
├─────┼───┼─────────┤
│ 1   │ 1 │ 1       │
│ 2   │ 2 │ 2       │
│ 3   │ 3 │ missing │

julia> eltypes(dd)
2-element Array{Type,1}:
 Int64
 Union{Int64, Missings.Missing}

julia> dd=@transform(dd,aa=2*:a,bb=2*:b)
3×4 DataFrames.DataFrame
│ Row │ a │ b       │ aa │ bb      │
├─────┼───┼─────────┼────┼─────────┤
│ 1   │ 1 │ 1       │ 2  │ 2       │
│ 2   │ 2 │ 2       │ 4  │ 4       │
│ 3   │ 3 │ missing │ 6  │ missing │

julia> eltypes(dd)
4-element Array{Type,1}:
 Int64
 Union{Int64, Missings.Missing}
 Int64
 Any
@tshort
Contributor

tshort commented Jan 19, 2018

I think this is an upstream problem. Note the result type of the following:

julia> 2*dd[:b]
3-element Array{Any,1}:
 2
 4
  missing

@tshort
Contributor

tshort commented Jan 19, 2018

Note that I edited the example above. It's the same answer with * and .*.

@tbeason
Author

tbeason commented Jan 19, 2018

You're right. This is much more pervasive than I thought. I'll post this in Missings.jl I suppose.

julia> a=[1,missing,2]
3-element Array{Union{Int64, Missings.Missing},1}:
 1
  missing
 2

julia> a+2
3-element Array{Any,1}:
 3
  missing
 4

@nalimilan
Member

It's even more upstream than that, see JuliaLang/julia#25553.

@tbeason
Author

tbeason commented Jan 22, 2018

Until the upstream issue is resolved, I've written a workaround (admittedly quite slow). With simple arrays you can just convert to the correct type, but with a DataFrame column that doesn't happen. The idea is to run your operations, which return vectors of Any, and then fill a preallocated vector of the correct type with the values from the resulting column. I wrote it so that you can keep using it inside chained @linq statements.

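# Julia 0.6 workaround: rebuild column x of d as a Vector{Union{Float64,Missing}}.
function anytounion!(d, x::Union{Integer,Symbol})
    LL = length(d[x])
    # Preallocate a vector of the desired element type (0.6 constructor syntax).
    tmp = Vector{Union{Float64,Missing}}(LL)
    @inbounds for i = 1:LL
        tmp[i] = d[i, x]  # values are converted on assignment
    end
    d[:, x] = tmp  # assign the typed vector back to column x
    return d       # returning d allows use in a chain
end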

Note that it isn't flexible (it always creates Vector{Union{Float64,Missing}}) but you could extend it easily to other types.
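
One way that extension might look, as a sketch that works on a plain vector rather than a DataFrame column. The name `any_to_union` is hypothetical, and the `undef` constructor assumes Julia 0.7 or later; on 0.6 the allocation would be written `Vector{Union{T,Missing}}(length(v))`, as in the function above.

```julia
# Hypothetical helper, not part of DataFrames or DataFramesMeta.
# Copies a loosely typed vector into a Vector{Union{T,Missing}}.
function any_to_union(v::AbstractVector, ::Type{T}) where {T}
    out = Vector{Union{T,Missing}}(undef, length(v))
    for (i, x) in enumerate(v)
        out[i] = x  # non-missing values are converted to T; missing is stored as is
    end
    return out
end

raw = Any[2, 4, missing]             # same shape as the Array{Any,1} shown earlier
ints = any_to_union(raw, Int)        # element type Union{Int, Missing}
floats = any_to_union(raw, Float64)  # element type Union{Float64, Missing}
```

Passing the target type as an argument keeps the copy loop identical to the original and only removes the hard-coded Float64.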

Just thought this might be helpful to anybody on 0.6 / finding this issue frustrating.

@tshort
Contributor

tshort commented Jan 22, 2018

Another workaround that's speedy is to preallocate the result.

dd[:c] = similar(dd[:b])
@with dd :c .= :b + 3

Another option for preallocating is to use byrow! with @newcol. You can't use @linq for either option, though.

@tbeason
Author

tbeason commented Feb 3, 2018

This is fixed on Julia 0.7 master branch, so I'm going to close the issue.
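
For anyone wanting to confirm this on a newer Julia, a minimal check might look like the following. It assumes Julia 1.x, where `missing` is part of Base and no package is needed; the variable names are only for illustration.

```julia
# Assumes Julia 1.x; on 0.6 the same operations returned Array{Any,1}.
a = [1, missing, 2]   # Vector{Union{Missing, Int64}} on a 64-bit machine

shifted = a .+ 2      # broadcasting keeps the narrow Union element type
scaled = 2 .* a

println(eltype(shifted))
println(eltype(scaled))
```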

tbeason closed this as completed Feb 3, 2018