Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect missing in Dict constructor #25551

Closed
bkamins opened this issue Jan 14, 2018 · 9 comments
Closed

Detect missing in Dict constructor #25551

bkamins opened this issue Jan 14, 2018 · 9 comments
Labels
missing data Base.missing and related functionality

Comments

@bkamins
Copy link
Member

bkamins commented Jan 14, 2018

We have that array constructors correctly detect missing and create a proper Union type of the resulting object:

julia> [1,2,missing, 1.0]
4-element Array{Union{Missing, Float64},1}:
 1.0
 2.0
  missing
 1.0

However, Dict constructor does not follow this pattern and uses Any:

julia> Dict(1=>1, 2=>missing)
Dict{Int64,Any} with 2 entries:
  2 => missing
  1 => 1

julia> Dict(1=>1, missing=>2)
Dict{Any,Int64} with 2 entries:
  missing => 2
  1       => 1

but if we wrap the pairs in a vector it does:

julia> Dict([1=>1, 2=>missing])
Dict{Int64,Union{Missing, Int64}} with 2 entries:
  2 => missing
  1 => 1

julia> Dict([1=>1, missing=>2])
Dict{Union{Missing, Int64},Int64} with 2 entries:
  missing => 2
  1       => 1

My proposal is to make Dict constructors consistent and always create an appropriate Union.

I see the reason in the source code why currently we have this bad behavior, i.e. eltype for <:Tuple uses typejoin and not promote_type (like arrays do), but I could not find any efficient way to fix it.

Of course eltype for tuple should use typejoin, but then maybe typejoin(::Missing,T) should return Union{Missing, T} if !(T>:Missing). I do not know if this was discussed and this is breaking.

CC @nalimilan

@nalimilan nalimilan added the missing data Base.missing and related functionality label Jan 14, 2018
@nalimilan
Copy link
Member

Yes, that inconsistency is annoying. eltype for tuples should also return Union{T, Missing}. Should be fixed by either #25423 or #24332, depending on which solution we eventually retain.

It could be worth thinking more generally about whether it's OK that Array constructor uses promote but not Dict.

@nalimilan
Copy link
Member

Actually, looks like Dict(1=>1, missing=>2) will need more changes, as there are currently specialized methods for when pairs all have the same type, and the fallback is Any. Shouldn't be hard to fix, but since it's a separate issue I won't fix it in the PRs I mentioned above to keep things simple.

@bkamins
Copy link
Member Author

bkamins commented Jan 14, 2018

Yes, but I would update constructors after we have a target behavior of typejoin because this will influence the design.

@JeffBezanson
Copy link
Member

The == method for dicts could also be updated to handle missing.

@bkamins
Copy link
Member Author

bkamins commented Jan 19, 2018

I have a related question (I could not locate it discussed; if it was and there is some conclusions or if there is a better place to discuss it I would be obliged if you let me know). The problem is that broadcast also loses union with Missing, e.g.:

julia> x = [1, missing, 2]
3-element Array{Union{Missing, Int64},1}:
 1
  missing
 2

julia> identity.(x)
3-element Array{Any,1}:
 1
  missing
 2

@bkamins
Copy link
Member Author

bkamins commented Jan 19, 2018

Sorry - now I see - it is #25423 - right?

@nalimilan
Copy link
Member

Yes, that's it.

@nalimilan
Copy link
Member

The == method for dicts could also be updated to handle missing.

See #25661.

@nalimilan
Copy link
Member

See #25805 for the Dict constructors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

No branches or pull requests

3 participants