ensure promotion rules don't alter eltype values #26109
Conversation
This also helps to re-synchronize `cat` and `map`. Closes #25924
```diff
@@ -189,11 +189,11 @@ end
 for T in (Nothing, Missing)
     x = [(1, T()), (1, 2)]
     y = map(v -> (v[1], v[2]), [(1, T()), (1, 2)])
-    @test y isa Vector{Tuple{Int,Union{T,Int}}}
+    @test y isa Vector{Tuple{Int, Any}}
```
Looks like this is simply a very minor known optimization miss (specifically, that we don't yet implement the obvious optimization to split `x isa Union{Int, Missing}` to `x isa Int || x isa Missing`), so we spend almost all of the time running subtyping. This is quite trivial – and non-breaking – to fix, so we haven't worked on it yet. We can verify that this PR doesn't actually affect performance by forcing away the handling of `missing` at the end (and also reducing the expense of the kernel operation from `sin` to `*`):
```julia
getx(x::Missing) = 0
getx(x) = x
f(x) = begin
    @time [ getx(2*(x[1])) for x in x ]
    @time Any[ getx(2*(x[1])) for x in x ]
    @time map(x -> getx(2*(x[1])), x)
    nothing
end; f(x); f(x);
```
Note that the data vector used to benchmark in #25924 isn't quite correct for general comparisons. I'm using `x = map(i -> (isodd(i) ? missing : 12345,), 1:10_000);`, which ensures I have the same percent missingness and avoids the small-integer cache.
By contrast, I don't really know that the issue at #25925 has a good solution. It's possible just to widen incrementally – which will give fairly decent performance on micro-benchmarks – but that will be impossible to precompile effectively, so load times might always be very long.
Addendum: if you actually do care about performance, doing this column-wise (e.g. as a `Tuple{Vector...}`) would give me a significant (3-10x) speedup*.
```julia
xx = map(x -> x[1], x);
f(x) = begin
    @time [ getx(2*(x)) for x in x ]
    @time Any[ getx(2*(x)) for x in x ]
    @time map(x -> getx(2*(x)), x)
    nothing
end; f(xx);
```
* this is failing to inline, which will also prevent vectorization, and is a known performance issue (#23338) – it should be even faster yet.
> Looks like this is simply a very minor known optimization miss (specifically, that we don't yet implement the obvious optimization to split `x isa Union{Int, Missing}` to `x isa Int || x isa Missing`), so we spend almost all of the time running subtyping. This is quite trivial – and non-breaking – to fix, so we haven't worked on it yet. We can verify that this PR doesn't actually affect performance by forcing away the handling of `missing` at the end (and also reducing the expense of the kernel operation from `sin` to `*`):
Interesting. But how is the compiler supposed to detect that it can do `x isa Int || x isa Missing` given that the only thing it knows about the input is that it's a `Tuple{Any}`? I guess in your example it's possible since `getx` has only two methods, but what happens for e.g. `sin` or `+`?
Also the return type of `map` is only inferred as `::AbstractArray` when the input is a `Vector{Tuple{Any}}`, which can be very problematic down the line.
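A hypothetical sketch of the union split under discussion – hand-writing the branch structure that the compiler could in principle emit for a `Union{Int, Missing}` check (the name `getx_split` is illustrative only, not anything from this PR):

```julia
getx(x::Missing) = 0
getx(x) = x

# Hand-written equivalent of splitting `x isa Union{Int, Missing}` into
# `x isa Int || x isa Missing`: each arm is a cheap concrete-type test,
# so no general subtyping query is needed at runtime.
function getx_split(x)
    if x isa Int
        return x        # x is statically known to be Int in this arm
    elseif x isa Missing
        return 0        # x is statically known to be Missing in this arm
    else
        return getx(x)  # generic fallback for any other type
    end
end
```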
> Note that the data vector used to benchmark in #25924 isn't quite correct for general comparisons. I'm using `x = map(i -> (isodd(i) ? missing : 12345,), 1:10_000);`, which ensures I have the same percent missingness and avoids the small-integer cache.
OK. I'd rather use something like `map(i -> (rand(Bool) ? missing : i,), 1:10_000)` though, to make sure the repetition of `(missing, 12345)` doesn't allow the CPU to predict the result.
> By contrast, I don't really know that the issue at #25925 has a good solution. It's possible just to widen incrementally – which will give fairly decent performance on micro-benchmarks – but will be impossible to precompile effectively, so load times might always be very long.
That's precisely why I propose using inference rather than widening incrementally. But can we have that discussion there instead?
> given that the only thing it knows about the input
The input type doesn't actually play a role here, it's the output type that is failing to codegen. There's an open PR about this, although I can't find it right now.
> which can be very problematic down the line
As long as there's a function call boundary (or we finally implement loop out-lining), this has no performance significance.
> using inference
Many inference optimizations currently rely on being able to compute the expanded form of the input unions, so I don't know that this will make much of a difference.
> The input type doesn't actually play a role here, it's the output type that is failing to codegen. There's an open PR about this, although I can't find it right now.
But how can the compiler infer the output type given that the input is `Tuple{Any}`?
> As long as there's a function call boundary (or we finally implement loop out-lining), this has no performance significance
Well, yeah, but by this line of reasoning we wouldn't care about type stability of any functions, would we?
> Many inference optimizations currently rely on being able to compute the expanded form of the input unions, so I don't know that will make much of a difference.
Sorry, but I'm not sure what this means. Could you try to explain this for mere mortals like me? Also this PR affects the public API, while inference could be improved without breaking the API in 1.x releases. So I think the question is more: what public API will allow for the best optimizations in the future?
> But how can the compiler infer the output type given that the input is `Tuple{Any}`?
It doesn't need to – we don't lose any performance from this. All that matters is that the output type is something easy to optimize (like a concrete type or a trivial union).
> we wouldn't care about type stability of any functions
Well, actually, no. It's critical to be able to write kernels with "type stability" so that for-loops over Array are optimized. That translates to requiring that most Base code be coded very carefully. But it doesn't actually carry over to mattering much for a significant portion of stdlib and beyond.
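A minimal sketch of the "type-stable kernel behind a function barrier" pattern being described (the names `kernel` and `data` are illustrative, not from Base):

```julia
# The caller may be type-unstable, but calling into a separate function
# re-establishes concrete type information at the boundary, so the
# for-loop over the Array compiles to a tight, specialized loop.
function kernel(v::Vector{Int})
    s = 0
    for x in v
        s += 2 * x   # fully specialized on Int
    end
    return s
end

data = Any[1, 2, 3]      # poorly-typed container
v = Vector{Int}(data)    # convert once at the boundary
kernel(v)                # fast, despite the untyped caller
```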
> while inference could be improved
There are limits to what will be possible. We optimize unions by computing the fully expanded version. In general, this has `O(2^n)` components, so that will never help here – it's really just present to handle simple cases like iteration (and even there, it's not simple; ref. the redesign in #26079 needed to handle just this low-order case). The best optimizations in the future will be for code kernels that operate directly on columns of `Array{T}` or `Array{Union{T, Nothing}}` – which also happens to be the fastest way to do it now (cf. the addendum above).
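As a rough, hypothetical sketch of that column-wise layout (reusing the half-missing data pattern from the benchmark above; `total` is an illustrative name):

```julia
# One Vector{Union{Int, Missing}} column instead of a vector of
# one-element tuples; Base stores such small unions compactly
# (a values array plus a type-tag byte array).
col = Union{Int, Missing}[isodd(i) ? missing : 12345 for i in 1:10_000];

# Kernels can then operate directly on the column:
total(v) = sum(x -> coalesce(x, 0), v)   # treat missing as 0
total(col)
```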
If I understand the PR title correctly, I agree that promotion rules should avoid changing eltypes of inner containers. But this doesn't seem to do that --- it just gives
I think it should be giving
My understanding of what we concluded from previous discussions (for example in #25553) is that
IMO, this should only be true in specific limited cases. For more general cases, I think we need more expressive syntax (such as some form of
We've been having the discussion about missing data for so long now. Literally the entire point of
I'd really like to avoid rehashing this argument but I think it's important in this case to reiterate what's already been decided, implemented, and is now in use. Please don't regress this case just because you don't agree with it.
I believe these promotion rules are totally orthogonal to the issue of whether
Triage decided we should go with this PR, and probably also remove the promote rules for Array eltype (but that's a separate PR).
Did you intend to keep this method as-is? (Lines 80 to 89 in b1b5066)
I hadn't looked at that method. It's not a bad definition, but it's completely invalid to mark it
This also helps to re-synchronize `cat` and `map` and restore v0.6 behavior. Since I don't think it's obvious if this type can be made efficient (aka fastest), I think it seems best to avoid changing this to a more complicated type. Closes #25924