fix dict `x == x` to return missing if `x` contains it #34809

JeffBezanson · 2020-02-19T00:33:35Z

Also use isequal to compare keys in ImmutableDict. I'm not sure why it was using ==, but this is not exported so we might as well change it to be more similar to Dict.

andyferris

Nice, this seems closer to the kind of semantic people seem to be aiming for.

As a general note - I still find it interesting that missing (and NaN) have to get handled in generic code rather than just falling out. It also seems unfortunate that I can't create my_missing in a user package (or have == return anything other than Bool or Missing) and have it work well with the Base containers (when having most stuff possible in user space seemed to be a design goal of the language/standard library).

Would you recommend us to write generic code that assumes only missing and NaN are allowed to violate == being reflexive (and not user-defined types)?

andyferris · 2020-02-19T11:27:16Z

base/abstractdict.jl

@@ -474,14 +474,16 @@ function isequal(l::AbstractDict, r::AbstractDict)
 end

 function ==(l::AbstractDict, r::AbstractDict)
-    l === r && return true
+    if l === r
+        return any(ismissing, values(l)) ? missing : true


What about when NaN is a value of the Dict? Perhaps something like this:

return any(ismissing, values(l)) ? missing : !(any(isnan, values(l))

And out of curiousity, do you know if the compiler can elide the entire any when the eltype doesn't intersect Missing or AbstractFloat?

It can:

julia> f(a) = any(ismissing, a) julia> @code_llvm f([1]) ; @ REPL[2]:1 within `f' define i8 @julia_f_17290(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) { top: ret i8 0 }

On the other hand

julia> @code_llvm f(Dict("foo" => 1)) ; @ REPL[10]:1 within `f' define i8 @julia_f_17814(%jl_value_t addrspace(10)* nonnull align 8 dereferenceable(64)) { top: ; ┌ @ reduce.jl:765 within `any' %1 = call i8 @julia__any_17815(%jl_value_t addrspace(10)* nonnull %0) ; └ ret i8 %1 } julia> using BenchmarkTools julia> data = Dict([randstring(10) => rand(1:10) for i in 1:10^6]); julia> @btime f(data) 8.170 ms (0 allocations: 0 bytes) false

Ah, but we can't do this for any(ismissing, values(d)). Dict iteration seems to be too complex. Another reason to switch to the ordered representation perhaps.

Ok, now there's another problem. For maximum generality I'd like to just remove the ===, but we have tests for circular dictionaries that then break 🤦‍♂️ . Do we really need that? Arrays etc. stack overflow in that case.

I wouldn't care about the circular dictionaries, personally, but I dunno - maybe someone does? The === is not a complete protection in any case, depth two nestings give a stack overflow (I'm guessing you need to maintain a stack of identities to do it properly? We seem to do that with IOContexts in show...)

(Out of interest, I'd also mention that ===, at least on the keys, is a nice shortcut for isequal on each key, which is used heavily as a (massive) optimization in Dictionaries.jl as it lets you both skip comparing keys as well as having to look up values by key (you get fast co-iteration of the values via vectors, so you may even get that ellision of any, I'll have to look into that...))

andyferris · 2020-02-19T11:29:11Z

base/abstractdict.jl

    if isa(l,IdDict) != isa(r,IdDict)
        return false
    end
    length(l) != length(r) && return false
    anymissing = false
    for pair in l
-        isin = in(pair, r, ==)
+        isin = in(pair, r)


Should the key be compared with isequal and value compared with ==?

Yes, that's what this does. I just want to phase out the third argument to in.

Oh I see, thanks.

So are we using == everywhere for in except for AbstractSet using isequal?

JeffBezanson · 2020-02-19T16:30:10Z

As a general note - I still find it interesting that missing (and NaN) have to get handled in generic code rather than just falling out.

That's a good point. We could do a little better by checking whether == returns true or false, and if it returns something other than those, use that as the result. But then you might get multiple missing-like values --- do we promote them, or what?

NaN is a bit different. There I would definitely discourage people from defining new types where x==x returns false (unless isnan is true). But, that's also easier to handle generically since the right behavior falls out just by handling true/false results from ==.

closes #34744 use `isequal` to compare keys in `ImmutableDict`

tkf · 2020-02-20T05:47:03Z

Maybe I'm missing something but why Base functions use anymissing flag pattern? Why not let & (and |) handle it? Something like this:

function ==′(l::AbstractDict, r::AbstractDict)
    if isa(l,IdDict) != isa(r,IdDict)
        return false
    end
    length(l) != length(r) && return false
    acc = true
    for pair in l
        isin = in(pair, r)
        isin === false && return false
        acc &= isin
    end
    return acc
end

This way, we only require that missing-like object to implement a sane & (i.e., false is an absorbing element). I think this is nice as the reduction code itself doesn't have to mention missing at all.

Maybe anymissing-pattern is used for type-stability? Though maybe it's OK here since in(pair, r) is the non-trivial inner-most operation?

andyferris · 2020-02-20T06:04:46Z

It's a good point, @tkf - that does seem more generic.

Further curiosity: in that code snippet, will the compiler know when isin is inferred as Bool that its value in the last line inside the loop must be true since it can't be false?

closes JuliaLang#34744 use `isequal` to compare keys in `ImmutableDict`

closes #34744 use `isequal` to compare keys in `ImmutableDict`

JeffBezanson added missing data Base.missing and related functionality collections Data structures holding multiple items, e.g. sets labels Feb 19, 2020

JeffBezanson force-pushed the jb/dictcompare branch from 386fcb9 to 1c8f885 Compare February 19, 2020 00:34

andyferris reviewed Feb 19, 2020

View reviewed changes

fix dict x == x to return missing if x contains it

167ca59

closes #34744 use `isequal` to compare keys in `ImmutableDict`

JeffBezanson force-pushed the jb/dictcompare branch from 1c8f885 to 167ca59 Compare February 19, 2020 17:34

JeffBezanson merged commit 5cd7482 into master Feb 21, 2020

JeffBezanson deleted the jb/dictcompare branch February 21, 2020 16:05

birm pushed a commit to birm/julia that referenced this pull request Feb 22, 2020

fix dict x == x to return missing if x contains it (JuliaLang#34809)

1c1c708

closes JuliaLang#34744 use `isequal` to compare keys in `ImmutableDict`

tkf mentioned this pull request Mar 20, 2020

Add a 2-arg flavor, argmax(f, domain) #27613

Closed

KristofferC pushed a commit that referenced this pull request Apr 11, 2020

fix dict x == x to return missing if x contains it (#34809)

4e9447b

closes #34744 use `isequal` to compare keys in `ImmutableDict`

KristofferC mentioned this pull request Jun 1, 2020

StackOverflow when testing == of a circular dict with itself #36101

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix dict `x == x` to return missing if `x` contains it #34809

fix dict `x == x` to return missing if `x` contains it #34809

JeffBezanson commented Feb 19, 2020

andyferris left a comment

andyferris Feb 19, 2020

andyferris Feb 19, 2020 •

edited

Loading

JeffBezanson Feb 19, 2020

KristofferC Feb 19, 2020

JeffBezanson Feb 19, 2020

JeffBezanson Feb 19, 2020

andyferris Feb 20, 2020

andyferris Feb 19, 2020

JeffBezanson Feb 19, 2020

andyferris Feb 20, 2020

JeffBezanson commented Feb 19, 2020

tkf commented Feb 20, 2020

andyferris commented Feb 20, 2020 •

edited

Loading

fix dict x == x to return missing if x contains it #34809

fix dict x == x to return missing if x contains it #34809

Conversation

JeffBezanson commented Feb 19, 2020

andyferris left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyferris Feb 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeffBezanson commented Feb 19, 2020

tkf commented Feb 20, 2020

andyferris commented Feb 20, 2020 • edited Loading

fix dict `x == x` to return missing if `x` contains it #34809

fix dict `x == x` to return missing if `x` contains it #34809

andyferris Feb 19, 2020 •

edited

Loading

andyferris commented Feb 20, 2020 •

edited

Loading