Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dict x == x to return missing if x contains it #34809

Merged
merged 1 commit into from
Feb 21, 2020
Merged

Conversation

JeffBezanson
Copy link
Member

closes #34744

Also use isequal to compare keys in ImmutableDict. I'm not sure why it was using ==, but this is not exported so we might as well change it to be more similar to Dict.

@JeffBezanson JeffBezanson added missing data Base.missing and related functionality collections Data structures holding multiple items, e.g. sets labels Feb 19, 2020
Copy link
Member

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this seems closer to the kind of semantic people seem to be aiming for.

As a general note - I still find it interesting that missing (and NaN) have to get handled in generic code rather than just falling out. It also seems unfortunate that I can't create my_missing in a user package (or have == return anything other than Bool or Missing) and have it work well with the Base containers (when having most stuff possible in user space seemed to be a design goal of the language/standard library).

Would you recommend us to write generic code that assumes only missing and NaN are allowed to violate == being reflexive (and not user-defined types)?

@@ -474,14 +474,16 @@ function isequal(l::AbstractDict, r::AbstractDict)
end

function ==(l::AbstractDict, r::AbstractDict)
l === r && return true
if l === r
return any(ismissing, values(l)) ? missing : true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about when NaN is a value of the Dict? Perhaps something like this:

return any(ismissing, values(l)) ? missing : !(any(isnan, values(l))

Copy link
Member

@andyferris andyferris Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And out of curiousity, do you know if the compiler can elide the entire any when the eltype doesn't intersect Missing or AbstractFloat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can:

julia> f(a) = any(ismissing, a)
julia> @code_llvm f([1])

;  @ REPL[2]:1 within `f'
define i8 @julia_f_17290(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) {
top:
  ret i8 0
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand

julia> @code_llvm f(Dict("foo" => 1))

;  @ REPL[10]:1 within `f'
define i8 @julia_f_17814(%jl_value_t addrspace(10)* nonnull align 8 dereferenceable(64)) {
top:
; ┌ @ reduce.jl:765 within `any'
   %1 = call i8 @julia__any_17815(%jl_value_t addrspace(10)* nonnull %0)
; └
  ret i8 %1
}

julia> using BenchmarkTools

julia> data = Dict([randstring(10) => rand(1:10) for i in 1:10^6]);

julia> @btime f(data)
  8.170 ms (0 allocations: 0 bytes)
false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but we can't do this for any(ismissing, values(d)). Dict iteration seems to be too complex. Another reason to switch to the ordered representation perhaps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now there's another problem. For maximum generality I'd like to just remove the ===, but we have tests for circular dictionaries that then break 🤦‍♂️ . Do we really need that? Arrays etc. stack overflow in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't care about the circular dictionaries, personally, but I dunno - maybe someone does? The === is not a complete protection in any case, depth two nestings give a stack overflow (I'm guessing you need to maintain a stack of identities to do it properly? We seem to do that with IOContexts in show...)

(Out of interest, I'd also mention that ===, at least on the keys, is a nice shortcut for isequal on each key, which is used heavily as a (massive) optimization in Dictionaries.jl as it lets you both skip comparing keys as well as having to look up values by key (you get fast co-iteration of the values via vectors, so you may even get that ellision of any, I'll have to look into that...))

if isa(l,IdDict) != isa(r,IdDict)
return false
end
length(l) != length(r) && return false
anymissing = false
for pair in l
isin = in(pair, r, ==)
isin = in(pair, r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the key be compared with isequal and value compared with ==?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what this does. I just want to phase out the third argument to in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, thanks.

So are we using == everywhere for in except for AbstractSet using isequal?

@JeffBezanson
Copy link
Member Author

As a general note - I still find it interesting that missing (and NaN) have to get handled in generic code rather than just falling out.

That's a good point. We could do a little better by checking whether == returns true or false, and if it returns something other than those, use that as the result. But then you might get multiple missing-like values --- do we promote them, or what?

NaN is a bit different. There I would definitely discourage people from defining new types where x==x returns false (unless isnan is true). But, that's also easier to handle generically since the right behavior falls out just by handling true/false results from ==.

closes #34744

use `isequal` to compare keys in `ImmutableDict`
@tkf
Copy link
Member

tkf commented Feb 20, 2020

Maybe I'm missing something but why Base functions use anymissing flag pattern? Why not let & (and |) handle it? Something like this:

function ==′(l::AbstractDict, r::AbstractDict)
    if isa(l,IdDict) != isa(r,IdDict)
        return false
    end
    length(l) != length(r) && return false
    acc = true
    for pair in l
        isin = in(pair, r)
        isin === false && return false
        acc &= isin
    end
    return acc
end

This way, we only require that missing-like object to implement a sane & (i.e., false is an absorbing element). I think this is nice as the reduction code itself doesn't have to mention missing at all.

Maybe anymissing-pattern is used for type-stability? Though maybe it's OK here since in(pair, r) is the non-trivial inner-most operation?

@andyferris
Copy link
Member

andyferris commented Feb 20, 2020

It's a good point, @tkf - that does seem more generic.

Further curiosity: in that code snippet, will the compiler know when isin is inferred as Bool that its value in the last line inside the loop must be true since it can't be false?

@JeffBezanson JeffBezanson merged commit 5cd7482 into master Feb 21, 2020
@JeffBezanson JeffBezanson deleted the jb/dictcompare branch February 21, 2020 16:05
birm pushed a commit to birm/julia that referenced this pull request Feb 22, 2020
closes JuliaLang#34744

use `isequal` to compare keys in `ImmutableDict`
KristofferC pushed a commit that referenced this pull request Apr 11, 2020
closes #34744

use `isequal` to compare keys in `ImmutableDict`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants